use variant nesting information to filter overlapping sites from vg deconstruct output

vcfbub

popping bubbles in vg deconstruct VCFs

overview

The VCF output produced by a command like vg deconstruct -e -a -H '#' ... includes information about the nesting of variants. With -a, --all-snarls, we obtain not just the top level bubbles, but all nested ones. This exposed snarl tree information can be used to filter the VCF to obtain a set of non-overlapping sites (n.b. "snarl" is a generic model of graph bubbles including tips and loops).
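Before choosing a --max-level, it can help to see how many sites sit at each depth of the snarl tree. The following is a minimal sketch, not part of vcfbub. It assumes that the level is stored as an `LV=<int>` entry in the INFO column (column 8) of each record; the function name `count_sites_per_level` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_sites_per_level(vcf_lines: Iterable[str]) -> Counter:
    """Count VCF records per snarl-tree level.

    Assumption: the level is an INFO entry of the form LV=<int>.
    Records without an LV entry are not counted.
    """
    levels = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key == "LV":
                levels[int(value)] += 1
                break
    return levels
```

Calling it on an open file handle, for example `count_sites_per_level(open("var.vcf"))`, gives a mapping from level to number of sites.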

vcfbub lets us do two common operations on these VCFs:

  1. We can filter sites by maximum level in the snarl tree. For instance, --max-level 0 would keep only sites with LV=0. In practice, vg's snarl finder ensures that these are sites rooted on the main linear axis of the pangenome graph. Those at higher levels occur within larger variants.
  2. We can filter sites by maximum allele size, either for the reference allele or any allele. For instance, --max-ref-length 10000 would keep only sites where the reference allele is less than 10kb long. Setting --max-ref-length or --max-allele-length additionally ensures that the output contains the bubbles nested inside any popped bubble, even if they are at a level greater than --max-level.

vcfbub accomplishes a simple task: we keep sites that are the children of those which we "pop" due to their size. The popped sites tend to occur around complex large SVs, such as multi-Mbp inversions and segmental duplications. We often need to remove them, as they provide little information for many downstream applications, such as haplotype panels or other imputation references.
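The popping logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not vcfbub's actual implementation (which is written in Rust). It assumes that each record's ID column names its snarl, that the INFO column carries `LV` (level) and `PS` (the ID of the parent snarl), and that a site is popped when its reference allele is longer than the threshold. The function names `parse_info` and `pop_bubbles` are hypothetical, and only the reference-length filter is modelled.

```python
from typing import Dict, Iterable, List, Set


def parse_info(info: str) -> Dict[str, str]:
    """Turn an INFO string such as 'LV=1;PS=>1>9' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def pop_bubbles(vcf_lines: Iterable[str],
                max_level: int = 0,
                max_ref_length: int = 10000) -> List[str]:
    """Return header lines plus the records that survive filtering.

    Assumptions (not confirmed by the source): column 3 (ID) names the
    snarl, and INFO holds LV=<level> and PS=<parent ID>.
    """
    records = [line.rstrip("\n") for line in vcf_lines if line.strip()]

    # Pass 1: collect the IDs of sites whose reference allele is too long.
    popped: Set[str] = set()
    for line in records:
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols[3]) > max_ref_length:
            popped.add(cols[2])

    # Pass 2: drop popped sites; keep sites that are shallow enough
    # or whose direct parent was popped.
    kept: List[str] = []
    for line in records:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.split("\t")
        if cols[2] in popped:
            continue
        info = parse_info(cols[7])
        level = int(info.get("LV", "0"))
        if level <= max_level or info.get("PS") in popped:
            kept.append(line)
    return kept
```

Two passes are used because a child record can only be judged once we know whether its parent was popped. Only direct children of a popped site are rescued in this sketch.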

usage

This removes all non-top-level variant sites (-l 0) unless they are inside of variants with reference length > 10kb (-r 10000):

vcfbub -l 0 -r 10000 var.vcf >filt.vcf
Comments
  • vcfbub installation/usage

    Hello to all,

    I am struggling with vcfbub installation. I've looked for documentation but could not find something that helped me through it (I should point out that I am a biologist and may as well be missing something quite straightforward).

    I first thought that vcfbub may be integrated in another pangenome tool (I looked in pggb, vg, vcflib), but did not find it.

    I then tried to copy the git repository, but this led me to the conclusion that I require rust with cargo to run the main.rs file. For some reason, the rust installation is not working (neither locally nor on the hpc cluster I am using).

    How should I proceed to be able to use vcfbub ? Should I keep trying with rust installation or is there another way ?

    Thank you for your time on this issue.

    Best, Luca Soldini

    opened by lsoldini 2
Releases(v0.1.0)
Owner
graphical pangenomic methods