MetagenOmic read Re-Assigner and abundance quantifier

Overview

Mora

Mora is a read re-assigner: it assigns each query read to a single reference.

Main steps of Mora:

  1. Calculate the expected abundance levels of the references based on the input SAM file.
  2. Assign each query that had at least one valid mapping to a reference based on their mapping scores and the expected abundance levels.
  3. Output the results into a txt file.
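
To make the three steps above more concrete, here is a deliberately simplified sketch. It is not Mora's actual algorithm (see the paper for that). It assumes a hypothetical tab-separated input of query, reference, and mapping score, estimates abundance crudely as the number of mappings per reference, and breaks score ties in favour of the more abundant reference. The function name assign_reads is made up for illustration.

```shell
# Toy illustration only: input is a TSV of <query> <reference> <score>.
assign_reads() {
    awk -F'\t' '
        # Pass 1 (first read of the file): crude abundance = mappings per reference.
        NR == FNR { abundance[$2]++; next }
        # Pass 2: keep, per query, the mapping with the best score and
        # break score ties in favour of the more abundant reference.
        {
            q = $1; r = $2; s = $3 + 0
            if (!(q in best_ref) || s > best_score[q] ||
                (s == best_score[q] && abundance[r] > abundance[best_ref[q]])) {
                best_ref[q] = r
                best_score[q] = s
            }
        }
        # Step 3: write one "query<TAB>reference" line per assigned query.
        END { for (q in best_ref) printf "%s\t%s\n", q, best_ref[q] }
    ' "$1" "$1" | sort
}
```

The file is passed to awk twice so that abundances are known before any query is assigned, which mirrors the order of steps 1 and 2.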

For more details, please consult the (preprint) paper.

Requirements

Rust and Cargo need to be installed and added to PATH.
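
A quick way to confirm this requirement is to ask the shell whether both tools resolve through PATH. The helper name check_tools is an assumption made for this sketch and is not part of Mora.

```shell
# Return 0 if every named tool is reachable through PATH, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools rustc cargo || echo "Install Rust and Cargo and add them to PATH before building Mora."
```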

Installation

To be able to use the full pipeline, run the following commands. If you only want to run Mora as a Rust program with a SAM/BAM file as input, you do not need to run the bash install.sh command.

git clone https://github.com/AfZheng126/MORA.git
cd MORA
bash install.sh
cargo build --release

Running Mora

After everything in the config files (see below) is updated according to your directories, run

snakemake --snakefile MORA --cores 24 --resources mem_mb=140000

Running Mora as a Rust Program

If you already have a SAM file that has mapping scores stored in the AS:i: optional field, you can run the Rust program directly and skip the indexing and mapping steps. To do this and get outputs without taxonomic information, run the following command in the Mora directory.

cargo run --release -- -s samfile -o output

If you are running from another directory, or want to call the compiled binary directly, run

target/release/mora -s samfile -o output

A sample SAM file is provided in the samples directory. To use it, run

target/release/mora -s samples/test.sam -o test.txt

For more options and customization, run

target/release/mora -h
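
Because the Rust program relies on the AS:i: optional field, it can be worth checking a SAM file for that tag before running Mora on it. The sketch below counts alignment records (non-header lines) and how many of them carry an AS:i: tag among the optional fields, which start at column 12 in the SAM format. The function name count_as_tags is an assumption made for this example.

```shell
# Print "<records with AS:i:> <total records>" for a SAM file.
count_as_tags() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        {
            total++
            for (i = 12; i <= NF; i++) {    # optional fields start at column 12
                if ($i ~ /^AS:i:-?[0-9]+$/) { tagged++; break }
            }
        }
        END { printf "%d %d\n", tagged, total }
    ' "$1"
}

# Example use on the bundled sample, if it is present in the current directory.
if [ -f samples/test.sam ]; then
    count_as_tags samples/test.sam
fi
```

If the first number is much smaller than the second, many records have no alignment score, for example because the reads are unmapped or the mapper was not configured to write the tag.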

Config File

The parameters of the config.yaml file used for the snakemake pipeline are listed below:

Parameter                 Description
BINARIES                  Binary folder directory (default: binaries) - do not edit
REFERENCES                Path to the reference fasta file
SAMPLES_DIR               Directory containing the query fasta files
RESULTS                   Directory to write the results
FILES_EXT                 Query file extension, e.g. .fq, .fq.gz etc.
MAPPING_MODE              Algorithm for the initial mapping (pufferfish, bowtie2, minimap2)
STRATEGY                  "PE" for paired-end samples or "SE" for single-end samples
TYPE                      RNA or DNA host-specific samples - right now only DNA is supported
MIN_CNT                   Minimum number of counts for a reference to be considered valid
MIN_SCORE_DIFFERENCE      Minimum score difference for a query to be assigned second
MAX_ABUNDANCE_DIFFERENCE  Maximum difference allowed between the initial abundance estimation and the abundances created from assignments
SEGMENT_SIZE              Size of the bins that references are split into
ABUNDANCE_OUTPUT          Whether to output estimated abundance levels
TAXONOMY                  Directory of taxonomic information to write results with taxonomic classes (NA to not include taxonomic information in the results)
MEM_MB                    Amount of memory to be allocated to snakemake
TPS                       Number of threads to be used per sample
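
As an illustration of how these parameters might fit together, the snippet below writes a placeholder file named config.example.yaml (a made-up name, chosen so the real config.yaml is not overwritten). It assumes the config is a flat list of KEY: value pairs. All paths and numeric values are hypothetical placeholders, not recommended settings or Mora defaults; only BINARIES, the NA value for TAXONOMY, and the list of mapping modes come from the table above.

```shell
# Write a placeholder config; every path and number below is illustrative.
cat > config.example.yaml <<'EOF'
BINARIES: binaries                  # default from the table; do not edit
REFERENCES: /path/to/reference.fa   # hypothetical path
SAMPLES_DIR: /path/to/samples       # hypothetical path
RESULTS: /path/to/results           # hypothetical path
FILES_EXT: ".fq"
MAPPING_MODE: bowtie2               # one of pufferfish, bowtie2, minimap2
STRATEGY: PE                        # PE or SE
TYPE: DNA                           # only DNA is supported right now
MIN_CNT: 10                         # placeholder value
MIN_SCORE_DIFFERENCE: 0             # placeholder value
MAX_ABUNDANCE_DIFFERENCE: 0.1       # placeholder value
SEGMENT_SIZE: 1000                  # placeholder value
ABUNDANCE_OUTPUT: true              # value type is an assumption
TAXONOMY: NA                        # NA leaves taxonomic information out
MEM_MB: 140000                      # placeholder value
TPS: 4                              # placeholder value
EOF
```

Compare the result against the config.yaml shipped with the repository before using any of these values, since the real file may expect different types or formats.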

Query Files

The program requires a list of query files. These can be .fasta, .fq, or even compressed files. If the query files are paired-end queries, their names must be of the form *_1.fq and *_2.fq, where the file extension can be something else. The directory containing these query files must be written into the config file.
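
For paired-end data, a small check like the one below can catch files that are missing their mate before the pipeline starts. It is only a sketch: the function name check_pairs and its two arguments (the samples directory and the file extension) are assumptions made for this example.

```shell
# Return 1 if any <name>_1<ext> file in the directory lacks a <name>_2<ext> mate.
check_pairs() {
    dir=$1
    ext=$2
    status=0
    for first in "$dir"/*_1"$ext"; do
        [ -e "$first" ] || continue          # the glob matched nothing
        second="${first%_1"$ext"}_2$ext"     # swap the _1 suffix for _2
        if [ ! -e "$second" ]; then
            echo "missing mate for $first (expected $second)" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_pairs /path/to/samples .fq
```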

Reference File

If you have a reference file, its path must also be written into the config file. If you do not, you can download the fasta file of the complete representative and reference bacterial genomes from the NCBI RefSeq database by following the Microbial reference preparation instructions on the Agamemnon Wiki. The index is built when you run the program, so you do not need to build it manually.

Taxonomic Information

Normally, the output of the program is two columns telling you which reference each query came from. If taxonomic information about the assigned reference is wanted as well, extra files must be made. To do this, navigate to the scripts directory and run

bash taxonomy.sh reference.fa

where reference.fa is your reference file. After this is done, set TAXONOMY in the config file to Taxonomy.

Use Case

Sample data and the results from the Mora paper can be found here. To run the data, update the configuration file with the location of the downloaded data and run the snakemake command.

Owner
Andrew Zheng