Rust crate `needleman_wunsch` of the `fasebare` package: reading FASTA sequences, Needleman-Wunsch alignment

Overview

Synopsis

The crate needleman_wunsch of the fasebare package consists of two Rust modules:

  • fasta_multiple_cmp provides functions to read biological sequences (DNA, RNA or proteins) from FASTA-formatted files;
  • sequences_matrix provides functions to build an alignment matrix of two sequences and to compute their similarity score, according to the Needleman-Wunsch algorithm.
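As an illustration of what reading FASTA data involves, here is a minimal sketch. It is not the code of fasta_multiple_cmp: the names `FastaRecord` and `parse_fasta` are hypothetical, and the sketch assumes the usual FASTA convention that a line starting with `>` is a header and the following lines hold the sequence.

```rust
/// One FASTA record: the header line (without '>') and the sequence letters.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct FastaRecord {
    pub header: String,
    pub sequence: String,
}

/// Parse FASTA-formatted text into records.
/// A line starting with '>' opens a new record; the lines that follow,
/// up to the next '>', are concatenated into its sequence.
pub fn parse_fasta(text: &str) -> Vec<FastaRecord> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        if let Some(header) = line.strip_prefix('>') {
            records.push(FastaRecord {
                header: header.trim().to_string(),
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            current.sequence.push_str(line);
        }
        // Sequence data found before any header is ignored in this sketch.
    }
    records
}
```

To read a file, the text can be obtained with `std::fs::read_to_string(path)` and passed to `parse_fasta`. A file holding several records, such as a bank, yields one `FastaRecord` per header.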

The Data directory contains test data: artificial sequences, and also real sequences extracted from the GenBank database, so that you can try the programs.

The cargo build system builds a standalone program to be invoked from the command line. The program has been built and run only on Linux; it may run on other operating systems, but this has not been tested.

Motivation

The first aim of this package was for the author to learn the Rust programming language and to apply it to a domain he knows a little, bioinformatics. The author's site gives some explanations of this approach (in French).

These programs are intended for pedagogic use. If you use them for professional or scientific projects, you do so at your own risk.

Credits

These programs invoke the following crate:

  • simple_matrix

To declare the dependency on this crate, the Cargo.toml file must be:

[package]
name = "needleman_wunsch"
version = "0.1.0"
authors = ["Laurent Bloch <[email protected]>"]
edition = "2018"

# See more keys and their definitions at
# https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
simple-matrix = "0.1"

The author found hints and inspiration from totof2000, Unknown, Nicolas Memeint.

Principle of operations (summary)

Biologists usually work on a sequence of interest, which we will call the “query sequence”, and compare it with a batch of sequences, the “bank”, in order to select the sequences of the bank with the highest similarity scores.
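The query-against-bank workflow can be sketched as follows. This is an illustration, not the program's code: the function `rank_bank` is hypothetical, and the scoring function is passed in as a parameter so that any similarity measure can be plugged in.

```rust
/// Score every bank sequence against the query and return
/// (index in the bank, score) pairs, best score first.
/// `score` is any similarity function taking two sequences.
pub fn rank_bank<F>(query: &str, bank: &[&str], score: F) -> Vec<(usize, f64)>
where
    F: Fn(&str, &str) -> f64,
{
    let mut ranked: Vec<(usize, f64)> = bank
        .iter()
        .enumerate()
        .map(|(i, &seq)| (i, score(query, seq)))
        .collect();
    // Sort by decreasing score; total_cmp gives a total order on f64.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

The first element of the returned vector points to the bank sequence most similar to the query. Keeping the index rather than the sequence itself makes it easy to look up the matching FASTA header afterwards.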

The similarity score between two sequences is computed according to the Needleman-Wunsch algorithm. This algorithm builds an alignment matrix. The letters of one sequence are placed horizontally along the top of the matrix, one letter above each column. The letters of the second sequence are placed vertically along the left of the matrix, one letter beside each row. One extra row is placed just below the top sequence, and one extra column just to the right of the left sequence. Each cell of the matrix will contain the score for the corresponding pair of letters.
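The layout described above can be sketched as follows. This is an assumption-laden illustration: it uses plain nested vectors rather than the simple_matrix crate, the function name `init_matrix` is hypothetical, and it fills the extra row and column with cumulative gap penalties, which is the usual Needleman-Wunsch convention but is not confirmed to be what this program does.

```rust
/// Allocate the alignment matrix for two sequences.
/// `top` runs along the columns, `left` along the rows; the matrix has
/// one extra row and one extra column, filled with cumulative gap penalties.
pub fn init_matrix(top: &str, left: &str, gap_penalty: f64) -> Vec<Vec<f64>> {
    let cols = top.chars().count() + 1;
    let rows = left.chars().count() + 1;
    let mut matrix = vec![vec![0.0; cols]; rows];
    // Extra row: aligning the first j letters of `top` against gaps only.
    for j in 0..cols {
        matrix[0][j] = gap_penalty * j as f64;
    }
    // Extra column: aligning the first i letters of `left` against gaps only.
    for i in 0..rows {
        matrix[i][0] = gap_penalty * i as f64;
    }
    matrix
}
```

For a top sequence of 4 letters and a left sequence of 2 letters, `init_matrix` returns 3 rows of 5 cells each; the remaining cells are left at 0.0, to be filled by the scoring step.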

To fill the matrix, the program computes each score for each individual pair of letters according to one of three situations (definitions borrowed from Wikipedia):

  • Match: The two letters at the current index are the same.
  • Mismatch: The two letters at the current index are different.
  • Gap: The best alignment involves one letter aligning to a gap in the other sequence.

So the algorithm needs two parameters to work: the value of the gap penalty, and the value of the mismatch penalty (or, alternatively, the value of the match bonus, which is the solution adopted for our program).
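With these two parameters, the score computation can be sketched as below. This is a minimal illustration of the standard Needleman-Wunsch recurrence, not the code of sequences_matrix: the name `nw_score` is hypothetical, a mismatch is assumed to score 0.0 (since only a match bonus is given), and only two rows of the matrix are kept in memory because the sketch returns the final score without reconstructing the alignment.

```rust
/// Needleman-Wunsch similarity score of two sequences.
/// Assumptions: a match adds `match_bonus`, a mismatch adds 0.0,
/// and each gap adds `gap_penalty` (negative or zero).
pub fn nw_score(top: &str, left: &str, match_bonus: f64, gap_penalty: f64) -> f64 {
    let top: Vec<char> = top.chars().collect();
    let left: Vec<char> = left.chars().collect();
    // `prev` holds row i-1 of the matrix; it starts as the extra row.
    let mut prev: Vec<f64> = (0..=top.len()).map(|j| gap_penalty * j as f64).collect();
    for i in 1..=left.len() {
        // `curr` holds row i; its first cell belongs to the extra column.
        let mut curr = vec![gap_penalty * i as f64; top.len() + 1];
        for j in 1..=top.len() {
            // Match or mismatch: align the two current letters.
            let pair = if top[j - 1] == left[i - 1] { match_bonus } else { 0.0 };
            let diagonal = prev[j - 1] + pair;
            // Gap: one letter aligned to a gap in the other sequence.
            let from_above = prev[j] + gap_penalty;
            let from_left = curr[j - 1] + gap_penalty;
            curr[j] = diagonal.max(from_above).max(from_left);
        }
        prev = curr;
    }
    // The bottom-right cell is the similarity score.
    prev[top.len()]
}
```

For example, with a match bonus of 1.0 and a gap penalty of -0.5, `nw_score("ACGT", "AGT", 1.0, -0.5)` gives 2.5: three matches and one gap.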

Refer to the Wikipedia article on the Needleman-Wunsch algorithm for further explanations and details.

To build and invoke the program:

For the developer, the command line to build the program is (from the base directory of the project):

cargo build

Then, you can invoke the program as follows:

cargo run <path to the file of the query sequence>
          <path to the file of the sequences bank>
          <value of the match bonus>
          <value of the gap penalty (negative or zero)>
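How these four arguments might be read and checked can be sketched as follows. This is not the program's actual argument handling: the `Config` struct and the `parse_args` function are hypothetical names, and the rule that the gap penalty must be negative or zero is taken from the synopsis above.

```rust
/// The four command-line parameters (hypothetical structure).
#[derive(Debug, PartialEq)]
pub struct Config {
    pub query_path: String,
    pub bank_path: String,
    pub match_bonus: f64,
    pub gap_penalty: f64,
}

/// Build a Config from the arguments that follow the program name.
pub fn parse_args(args: &[String]) -> Result<Config, String> {
    if args.len() != 4 {
        return Err(format!("expected 4 arguments, got {}", args.len()));
    }
    let match_bonus: f64 = args[2]
        .parse()
        .map_err(|e| format!("invalid match bonus '{}': {}", args[2], e))?;
    let gap_penalty: f64 = args[3]
        .parse()
        .map_err(|e| format!("invalid gap penalty '{}': {}", args[3], e))?;
    // The gap penalty is documented as negative or zero.
    if gap_penalty > 0.0 {
        return Err(format!("gap penalty must be negative or zero, got {}", gap_penalty));
    }
    Ok(Config {
        query_path: args[0].clone(),
        bank_path: args[1].clone(),
        match_bonus,
        gap_penalty,
    })
}
```

In a `main` function, the arguments could be collected with `std::env::args().skip(1).collect::<Vec<String>>()` and passed to `parse_args` by reference.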

For instance, with test files from this repository:

cargo run Data/seq_orchid2.fasta Data/sequences_orchid.fasta 1.0 -0.5

To build an executable binary file proceed as follows:

cargo build --release

The executable file will be there:

./target/release/needleman_wunsch

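Remember that a Rust program needs no separate runtime to be installed, so this executable can be copied and run with your data on any compatible system (same operating system and processor architecture).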

Owner
Laurent Bloch
Informatics since 1969, Bioinformatics since 1994