Tumour-only somatic mutation calling using long reads

smrest

smrest is a prototype somatic mutation caller for single-molecule long reads. It uses haplotype phasing patterns to identify somatic mutations in tumour samples that contain a significant proportion of normal cells (purity between 0.3 and 0.8). For more details, see the preprint cited below.

Citation

Simpson, J.T., Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads, BioRxiv

Compiling

This program is written in Rust and uses the Cargo build system. After you have installed Cargo, you can compile this software from GitHub as follows:

git clone https://github.com/jts/smrest.git
cd smrest
cargo build --release

Usage

smrest has three steps: first it finds heterozygous SNPs using a panel of known population variants from gnomAD, then it phases these SNPs using whatshap, and finally it calls somatic mutations. These steps can be run manually, or with a Snakemake pipeline we have provided for convenience. We describe both methods here, using a small demo dataset that is described in the following section.

Demo data preparation

To demonstrate the usage of this program, we have prepared a small dataset consisting of ONT reads for chromosome 20 of COLO829/COLO829BL. You can get the demo data using the Snakemake pipeline. For simplicity, all commands shown below assume you are running in the smrest/workflow directory; if you are running from a different path, you will need to adjust the commands.

snakemake prepare_demo

This command will place the reads in data/COLO829.mixture.chr20.bam. smrest needs a set of population variants to estimate the locations of heterozygous SNPs, and a BED file describing the callable regions of the genome. You can download these resources using snakemake as well:

snakemake prepare_resources
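
Before calling mutations it can help to confirm that both preparation steps produced their files. The sketch below is one way to do that from the smrest/workflow directory. check_inputs is a hypothetical helper, not part of smrest, and the resource paths are assumed to be the ones used by the manual commands in this README.

```shell
#!/usr/bin/env bash
# Sketch: report which expected input files are missing.
# check_inputs is a hypothetical helper, not part of smrest or its pipeline.

# check_inputs FILE...: print each missing file to stderr,
# return 0 only if every file exists.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Paths assumed from the manual commands in this README.
if check_inputs \
    data/COLO829.mixture.chr20.bam \
    resources/genotype_sites.vcf \
    resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna
then
    echo "all inputs present"
else
    echo "run 'snakemake prepare_demo' and 'snakemake prepare_resources' first"
fi
```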

Mutation calling (manual)

There are three steps to calling somatic mutations with smrest. First, we find heterozygous SNPs with smrest genotype-hets:

smrest genotype-hets -c resources/genotype_sites.vcf -r chr20 -g resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna data/COLO829.mixture.chr20.bam > COLO829.gnomad_genotype.vcf

Next, we phase these hets using whatshap:

whatshap phase --ignore-read-groups -r resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna -o COLO829.gnomad_genotype_whatshap_phased.vcf COLO829.gnomad_genotype.vcf data/COLO829.mixture.chr20.bam

Finally, we call somatic mutations:

smrest call -m haplotype-likelihood --purity 0.5 -r chr20 -g resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna -p COLO829.gnomad_genotype_whatshap_phased.vcf -o COLO829.smrest_called_regions.bed data/COLO829.mixture.chr20.bam > COLO829.smrest_somatic_calls.vcf
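
The three commands above can be chained in a small wrapper script. This is a minimal sketch rather than anything shipped with smrest: the variable names, the run helper and the DRY_RUN switch are illustrative assumptions, while the commands and flags are copied from the steps above. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: chain the three manual steps for one sample and region.
# Variable names, run() and DRY_RUN are illustrative, not part of smrest.
set -euo pipefail

SAMPLE="${SAMPLE:-COLO829}"
REGION="${REGION:-chr20}"
PURITY="${PURITY:-0.5}"
BAM="${BAM:-data/COLO829.mixture.chr20.bam}"
GENOME="${GENOME:-resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna}"
SITES="${SITES:-resources/genotype_sites.vcf}"
DRY_RUN="${DRY_RUN:-1}"

HETS="${SAMPLE}.gnomad_genotype.vcf"
PHASED="${SAMPLE}.gnomad_genotype_whatshap_phased.vcf"

# run OUTFILE CMD...: print the command (dry run) or execute it,
# redirecting stdout to OUTFILE. Pass "-" when the tool writes its own output.
run() {
    local out="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        if [ "$out" = "-" ]; then echo "$*"; else echo "$* > $out"; fi
    elif [ "$out" = "-" ]; then
        "$@"
    else
        "$@" > "$out"
    fi
}

smrest_pipeline() {
    # Step 1: genotype heterozygous SNPs at the population variant sites.
    run "$HETS" smrest genotype-hets -c "$SITES" -r "$REGION" -g "$GENOME" "$BAM"
    # Step 2: phase the heterozygous SNPs.
    run - whatshap phase --ignore-read-groups -r "$GENOME" -o "$PHASED" "$HETS" "$BAM"
    # Step 3: call somatic mutations using the phased VCF.
    run "${SAMPLE}.smrest_somatic_calls.vcf" smrest call -m haplotype-likelihood \
        --purity "$PURITY" -r "$REGION" -g "$GENOME" -p "$PHASED" \
        -o "${SAMPLE}.smrest_called_regions.bed" "$BAM"
}

smrest_pipeline
```

Printing the commands first makes it easy to compare them against the manual steps before committing to a long run.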

Mutation calling (pipeline)

A Snakemake pipeline is provided in workflow/Snakemake to automate these three steps. It also parallelizes the process across 10 Mb segments of the genome. It assumes the BAM file is in data/ (as in the demo data). The pipeline is run by building the smrest_calls/<sample>/<sample>.whatshap.final_q20_pass_calls.vcf target, where <sample> is the prefix of the BAM file. For example:

snakemake smrest_calls/COLO829.mixture.chr20/COLO829.mixture.chr20.whatshap.final_q20_pass_calls.vcf
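
Because the target name repeats the BAM prefix twice, it can be convenient to derive it from the BAM path. The sketch below shows one way; target_for_bam is a hypothetical helper, not part of the pipeline, and it assumes the prefix is the file name with the .bam extension removed.

```shell
#!/usr/bin/env bash
# Sketch: derive the pipeline target from a BAM file in data/.
# target_for_bam is a hypothetical helper, not part of smrest.
set -euo pipefail

# target_for_bam BAM_PATH: print the final calls target for that BAM.
target_for_bam() {
    local sample
    sample="$(basename "$1" .bam)"   # strip directory and .bam extension
    echo "smrest_calls/${sample}/${sample}.whatshap.final_q20_pass_calls.vcf"
}

target_for_bam data/COLO829.mixture.chr20.bam
# To build it: snakemake "$(target_for_bam data/COLO829.mixture.chr20.bam)"
```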

License

MIT

Acknowledgements

This program reuses code originally developed by Edge et al. for the Longshot variant caller.
