A Rust BED-to-GFF3 translator

Overview

bed2gff

A Rust BED-to-GFF3 translator.

translates

chr7 56766360 56805692 ENST00000581852.25 1000 + 56766360 56805692 0,0,200 3 3,135,81, 0,496,39251,

into

chr7 bed2gff gene 56399404 56805692 . + . ID=ENSG00000166960;gene_id=ENSG00000166960

chr7 bed2gff transcript 56766361 56805692 . + . ID=ENST00000581852.25;Parent=ENSG00000166960;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25

chr7 bed2gff exon 56766361 56766363 . + . ID=exon:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1

chr7 bed2gff CDS 56766361 56766363 . + 0 ID=CDS:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1

...

chr7 bed2gff start_codon 56766361 56766363 . + 0 ID=start_codon:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1

chr7 bed2gff stop_codon 56805690 56805692 . + 0 ID=stop_codon:ENST00000581852.25.3;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=3

...

in a few seconds.

Usage

Usage: bed2gff[EXE] --bed <BED> --isoforms <ISOFORMS> --output <OUTPUT>

Arguments:
    --bed <BED>: a .bed file
    --isoforms <ISOFORMS>: a tab-delimited file
    --output <OUTPUT>: path to output file

Options:
    --help: print help
    --version: print version

Warning

All the transcripts in the .bed file must appear in the isoforms file.

crate: https://crates.io/crates/bed2gff

bed2gff just needs two files:

  1. a .bed file

    tab-delimited files with 3 required and 9 optional fields:

    chrom   chromStart  chromEnd      name    ...
      |         |           |           |
    chr20   50222035    50222038    ENST00000595977    ...
    

    see BED format for more information

  2. a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):

    > cat isoforms.txt
    
    ENSG00000198888 ENST00000361390
    ENSG00000198763 ENST00000361453
    ENSG00000198804 ENST00000361624
    ENSG00000188868 ENST00000595977
    

    you can build a custom file for your preferred species using Ensembl BioMart.
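The requirement that every transcript in the .bed file appears in the isoforms file can be checked before running the conversion. The following is a minimal sketch, assuming the gene ID is in the first column and the transcript ID in the second, as in the example above. `parse_isoforms` and `missing_transcripts` are hypothetical helpers written for illustration and are not part of the bed2gff API.

```rust
use std::collections::HashMap;

/// Parse the contents of an isoforms file (gene<TAB>transcript per line)
/// into a transcript -> gene lookup table.
/// Hypothetical helper for illustration; not bed2gff's actual code.
fn parse_isoforms(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .filter_map(|line| {
            let mut fields = line.trim().split('\t');
            let gene = fields.next()?;
            // Lines without a second column (e.g. blank lines) are skipped.
            let transcript = fields.next()?;
            Some((transcript.to_string(), gene.to_string()))
        })
        .collect()
}

/// Return the BED transcript names that have no gene in the table.
fn missing_transcripts<'a>(
    bed_names: &[&'a str],
    isoforms: &HashMap<String, String>,
) -> Vec<&'a str> {
    bed_names
        .iter()
        .copied()
        .filter(|name| !isoforms.contains_key(*name))
        .collect()
}
```

An empty result from `missing_transcripts` means every name taken from the fourth BED column has a matching gene.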

Installation

to install bed2gff on your system follow these steps:

  1. get rust: curl https://sh.rustup.rs -sSf | sh on unix, or see the official Rust installation instructions for other options
  2. run cargo install bed2gff (make sure ~/.cargo/bin is in your $PATH before running it)
  3. use bed2gff with the required arguments
  4. enjoy!

Library

to include bed2gff as a library and use it within your project follow these steps:

  1. include bed2gff = "0.1.0" under [dependencies] in the Cargo.toml file

  2. the library name is bed2gff, to use it just write:

    use bed2gff::bed2gff; 

    or

    use bed2gff::*;
  3. invoke

    // bed, isoforms and output are file paths held as String values
    let gff = bed2gff(&bed, &isoforms, &output);

Build

to build bed2gff from this repo, do:

  1. get rust (as described above)
  2. run git clone https://github.com/alejandrogzi/bed2gff.git && cd bed2gff
  3. run cargo run --release <BED> <ISOFORMS> <OUTPUT> (arguments are positional, so you do not need to specify --bed/--isoforms)

Output

bed2gff writes the result to the output path you specify, which can be the same directory as the .bed file:

bed2gff annotation.bed isoforms.txt output.gff3

.
├── ...
├── isoforms.txt
├── annotation.bed
└── output.gff3

where output.gff3 is the result.

FAQ

Why?

Converting formats is a daily practice in bioinformatics. It is even more common when working with gene annotations, as tools differ in their input/output layouts. GTF, GFF and BED are the most used formats for storing gene-related annotations, and the conversion needs are not well covered by available software.

A considerable portion of genomic tools reduce the software space by accepting only GTF/GFF3 files, forcing BED users to translate their files into other formats. While some of these issues have already been covered for GTF files (e.g. bed2gtf), the GFF3 layout lacks stable conversion tools (1, 2).

bed2gff is presented as a straightforward option to convert BED files into ready-to-use GFF3 files, closing that gap.

How?

bed2gff builds on the code base of bed2gtf, which is essentially a reimplementation of UCSC's C binaries merged into a single step (bedToGenePred + genePredToGtf). Before any conversion, the tool sorts the .bed file internally using an algorithmic approach similar to the one in gtfsort. This step allows bed2gff to write the output file already sorted in a natural and convenient way. It then evaluates the positions of exons and other features (CDS, start/stop codons, UTRs), preserving reading frames and adjusting the coordinate indexing.
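The index adjustment can be illustrated with the exon coordinates. BED stores a 0-based start plus per-block sizes and offsets, while GFF3 uses 1-based inclusive coordinates. The sketch below is an assumption about how such a conversion can be written, not bed2gff's actual code; `Exon`, `parse_list` and `bed_blocks_to_exons` are hypothetical names.

```rust
/// An exon in 1-based, inclusive coordinates, as used by GFF3.
#[derive(Debug, PartialEq)]
struct Exon {
    start: u64,
    end: u64,
}

/// Parse a comma-separated BED list such as "3,135,81," into numbers.
/// Returns None if any entry is not a valid unsigned integer.
fn parse_list(field: &str) -> Option<Vec<u64>> {
    field
        .split(',')
        .filter(|s| !s.is_empty()) // BED lists usually end with a trailing comma
        .map(|s| s.parse::<u64>().ok())
        .collect()
}

/// Convert the BED12 fields chromStart, blockSizes and blockStarts
/// into GFF3-style exon coordinates.
fn bed_blocks_to_exons(
    chrom_start: u64,
    block_sizes: &str,
    block_starts: &str,
) -> Option<Vec<Exon>> {
    let sizes = parse_list(block_sizes)?;
    let starts = parse_list(block_starts)?;
    if sizes.len() != starts.len() {
        return None;
    }
    Some(
        starts
            .iter()
            .zip(&sizes)
            .map(|(offset, size)| Exon {
                // 0-based start becomes 1-based: add 1.
                start: chrom_start + offset + 1,
                // Half-open end equals the 1-based inclusive end.
                end: chrom_start + offset + size,
            })
            .collect(),
    )
}
```

For the example record at the top of this document (chromStart 56766360, blockSizes 3,135,81, blockStarts 0,496,39251), the first exon comes out as 56766361-56766363 and the last one ends at 56805692, matching the GFF3 lines shown there.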

Following the rationale of bed2gtf, bed2gff produces a ready-to-use GFF3 file by using an isoforms file, which works like the refTable in the C binaries and maps each transcript to its respective gene.
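As a rough illustration of that mapping, the sketch below builds a transcript line whose Parent attribute comes from a transcript-to-gene table. The attribute layout is copied from the example output at the top of this document; `transcript_line` is a hypothetical helper and the table is assumed to have been loaded from the isoforms file beforehand.

```rust
use std::collections::HashMap;

/// Build a GFF3 `transcript` line whose Parent is looked up in a
/// transcript -> gene table. Returns None when the transcript is unknown.
/// Hypothetical helper; `start` and `end` are expected to be 1-based already.
fn transcript_line(
    chrom: &str,
    start: u64,
    end: u64,
    strand: char,
    transcript_id: &str,
    genes: &HashMap<String, String>,
) -> Option<String> {
    let gene_id = genes.get(transcript_id)?;
    Some(format!(
        "{}\tbed2gff\ttranscript\t{}\t{}\t.\t{}\t.\tID={};Parent={};gene_id={};transcript_id={}",
        chrom, start, end, strand, transcript_id, gene_id, gene_id, transcript_id
    ))
}
```

Returning `None` for an unmapped transcript makes the missing-isoform case explicit instead of emitting a line without a parent.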

To Do's

  • Allow users to input compressed files (e.g. .gz, .bgzip)
  • Test GFF3 with different types of aligners
  • Improve the error module
  • Add test modules for most of the scripts
  • Allow users to specify their parent/child relationships (?)

References

  1. https://bioinformatics.stackexchange.com/questions/2242/how-to-convert-bed-to-gff3
  2. https://www.biostars.org/p/2/