k-mer counter in Rust using the rust-bio and rayon crates

Overview

krust is a k-mer counter written in Rust and run from the command line. It outputs canonical k-mers and their frequencies across the records in a fasta file.
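A k-mer and its reverse complement describe the same stretch of double-stranded DNA, so counters usually report one canonical representative for both. Below is a minimal std-only sketch. It assumes the common convention that the canonical form is the lexicographically smaller of the two strands; this README does not spell out krust's exact rule. `complement` and `canonical` are hypothetical helper names and are not part of krust or rust-bio.

```rust
/// Complement of one nucleotide (case-insensitive); None for anything other than A/C/G/T.
fn complement(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(b'T'),
        b'C' => Some(b'G'),
        b'G' => Some(b'C'),
        b'T' => Some(b'A'),
        _ => None,
    }
}

/// Canonical form of `kmer`: the lexicographically smaller of the upper-cased k-mer
/// and its reverse complement. Returns None if the k-mer contains a non-ACGT byte.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let forward: Vec<u8> = kmer.iter().map(|b| b.to_ascii_uppercase()).collect();
    // Reverse the k-mer and complement each base; `?` bails out on the first invalid byte.
    let revcomp: Vec<u8> = kmer
        .iter()
        .rev()
        .map(|&b| complement(b))
        .collect::<Option<Vec<u8>>>()?;
    Some(if revcomp < forward { revcomp } else { forward })
}
```

Under this convention, ACG and CGT are reverse complements of each other, so both are counted as ACG.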

Run krust on the test data in the krust GitHub repo, searching for k-mers of length 5, like this:
$ cargo run --release 5 cerevisae.pan.fa > output.tsv
or, searching for k-mers of length 21:
$ cargo run --release 21 cerevisae.pan.fa > output.tsv

krust prints to stdout, writing, on alternate lines:
>{frequency}
{canonical k-mer}
>{frequency}
{canonical k-mer}
...
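Here is a minimal std-only sketch of producing that layout. It assumes the counts are already held in a HashMap<Vec<u8>, usize> keyed by canonical k-mer. `write_counts` is a hypothetical name. Sorting entries by k-mer is an added choice that makes the output deterministic; this README does not specify krust's actual output order.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Writes each entry as two lines, `>{frequency}` then `{canonical k-mer}`.
/// Entries are sorted by k-mer because HashMap iteration order is unspecified.
fn write_counts<W: Write>(out: &mut W, counts: &HashMap<Vec<u8>, usize>) -> io::Result<()> {
    let mut entries: Vec<(&Vec<u8>, &usize)> = counts.iter().collect();
    entries.sort();
    for (kmer, freq) in entries {
        writeln!(out, ">{}", freq)?;
        out.write_all(kmer)?;
        out.write_all(b"\n")?;
    }
    Ok(())
}
```

Rust's stdout is line-buffered. In a binary, passing something like `&mut io::BufWriter::new(io::stdout().lock())` therefore avoids a flush on every line.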

krust uses rust-bio, rayon, and dashmap.
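krust's own parallelism comes from rayon and dashmap. To keep the example free of external crates, the sketch below swaps in a different technique: std::thread::scope (Rust 1.63+) with one private HashMap per thread, merged at the end. It illustrates the general idea of splitting records across threads and is not krust's implementation. `count_parallel` and `canonical` are hypothetical names, and sequences are assumed to be upper-case.

```rust
use std::collections::HashMap;
use std::thread;

/// Canonical k-mer for upper-case input: the smaller of the k-mer and its reverse
/// complement, or None if it contains a byte other than A/C/G/T.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let revcomp = kmer
        .iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect::<Option<Vec<u8>>>()?;
    Some(if revcomp.as_slice() < kmer { revcomp } else { kmer.to_vec() })
}

/// Splits `records` into roughly `n_threads` chunks, counts canonical k-mers in each
/// chunk on its own scoped thread with a private HashMap, then merges the partial maps.
fn count_parallel(records: &[Vec<u8>], k: usize, n_threads: usize) -> HashMap<Vec<u8>, usize> {
    if k == 0 {
        return HashMap::new(); // slice::windows(0) would panic
    }
    let n = n_threads.max(1);
    let chunk_size = ((records.len() + n - 1) / n).max(1);
    let partials: Vec<HashMap<Vec<u8>, usize>> = thread::scope(|s| {
        // Spawn every worker first so they run concurrently, then join them all.
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<Vec<u8>, usize> = HashMap::new();
                    for seq in chunk {
                        for window in seq.windows(k) {
                            if let Some(kmer) = canonical(window) {
                                *local.entry(kmer).or_insert(0) += 1;
                            }
                        }
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("counting thread panicked"))
            .collect()
    });
    // Merge the per-thread maps into one total.
    let mut total: HashMap<Vec<u8>, usize> = HashMap::new();
    for partial in partials {
        for (kmer, count) in partial {
            *total.entry(kmer).or_insert(0) += count;
        }
    }
    total
}
```

A concurrent map such as dashmap lets every thread update one shared table directly. Per-thread maps avoid contention on shared entries but need the final merge step.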

Future:
A function like fn single_sequence_canonical_kmers(filepath: String, k: usize) {}
would return k-mer counts for individual sequences in a fasta file.
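Here is a hypothetical sketch of that per-sequence variant. To stay self-contained, it takes fasta text instead of a file path; a real version could read the file first with std::fs::read_to_string. It uses a deliberately minimal parser rather than rust-bio, and it returns each record's id alongside that record's own canonical k-mer counts. The name `per_record_canonical_kmers` and the id rule (the first whitespace-delimited token of the header) are assumptions.

```rust
use std::collections::HashMap;

/// Canonical k-mer for upper-case input, or None if it contains a non-ACGT byte.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let revcomp = kmer
        .iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect::<Option<Vec<u8>>>()?;
    Some(if revcomp.as_slice() < kmer { revcomp } else { kmer.to_vec() })
}

/// Hypothetical per-record counter: parses fasta text (headers start with '>',
/// sequences may span several lines) and returns each record's id together with
/// that record's own canonical k-mer counts. Lines before the first header are ignored.
fn per_record_canonical_kmers(fasta: &str, k: usize) -> Vec<(String, HashMap<Vec<u8>, usize>)> {
    let mut records: Vec<(String, Vec<u8>)> = Vec::new();
    for line in fasta.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // Record id = first whitespace-delimited token of the header line.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            records.push((id, Vec::new()));
        } else if let Some((_, seq)) = records.last_mut() {
            seq.extend(line.bytes().map(|b| b.to_ascii_uppercase()));
        }
    }
    records
        .into_iter()
        .map(|(id, seq)| {
            let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
            if k > 0 {
                for window in seq.windows(k) {
                    if let Some(kmer) = canonical(window) {
                        *counts.entry(kmer).or_insert(0) += 1;
                    }
                }
            }
            (id, counts)
        })
        .collect()
}
```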

Comments
  • needletail rather than bio to accelerate fasta parsing

    Hello Team,

    It seems needletail is much faster than bio for fasta file parsing. For larger fasta files, parsing can also be parallelized. Is this doable?

    Thanks,

    Jianshu

    opened by jianshu93 3
  • Explore using the bytes crate

    bytes

    The biggest feature it adds over Vec is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data. Instead, a Bytes instance is a reference-counted handle to some underlying data. The Bytes type is roughly an Arc<Vec<u8>> but with some added capabilities.

    opened by suchapalaver 1
  • speed up by changing the utf8 processing, reverse-comp, and storage

    Speed up the utf8 processing of the k-mers: the k-mer iterator itself should check that it yields valid k-mers while iterating. Also, instead of storing the reverse complement in heap-allocated strings, you can make a lazy reverse-complemented object. Alternatively, store the k-mers in u64s: one of the reasons for using k-mers in the first place is that they can be packed into machine integers for speed.

    opened by suchapalaver 0
  • It would be nice to have a better error message when you call it from command-line with wrong arguments

    Something that explains which arguments to pass in. When the output directory exists, you just write "File exists", which is very confusing if you have an unrelated file called "output".

    opened by suchapalaver 0
  • avoid panicking at all in your library code

    If anyone wants to import your function, they won't be happy with something that crashes the whole application when it fails. You can panic in the executable portion of the program, though.

    opened by suchapalaver 0
  • change the Config struct member kmer_len to be a usize

    Rather than doing let kmer_len = config.kmer_len.parse::<usize>().unwrap();, I would change the Config struct member kmer_len to be a usize and perform the parsing while constructing Config, since Config::new already returns Result.

    opened by suchapalaver 0
  • speed up using hashmaps

    Writing a line per k-mer is too inefficient and rarely needed. It is much better to just return a vector of k-mer hashmaps. Alternatively, make a hashmap containing n -> m pairs, where n is the number of times some k-mer has been seen, and m is the number of distinct k-mers that have been seen n times.

    opened by suchapalaver 0
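For the "Explore using the bytes crate" issue above, the sketch below does not use the bytes crate itself. It is a hypothetical std-only stand-in, `SharedSeq`, built on Arc<[u8]> to show the shallow-cloning idea the issue quotes: a clone or a sub-slice shares the underlying allocation and only increments a reference count.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical std-only stand-in for the idea behind `bytes::Bytes`: a shared,
/// reference-counted buffer plus a window into it. Cloning or slicing copies no
/// sequence data; it only increments the Arc's reference count.
#[derive(Clone, Debug)]
struct SharedSeq {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSeq {
    /// Moves `bytes` into a single shared allocation (this is the only copy made).
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        SharedSeq { data: Arc::from(bytes), range: 0..len }
    }

    /// A view of `sub` (relative to this view) over the same allocation,
    /// or None if `sub` is out of bounds.
    fn slice(&self, sub: Range<usize>) -> Option<Self> {
        if sub.start > sub.end || sub.end > self.range.len() {
            return None;
        }
        Some(SharedSeq {
            data: Arc::clone(&self.data),
            range: self.range.start + sub.start..self.range.start + sub.end,
        })
    }

    fn as_bytes(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}
```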
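For the "speed up by changing the utf8 processing, reverse-comp, and storage" issue above, here is a sketch of the u64 suggestion. It assumes the common 2-bit encoding A=0, C=1, G=2, T=3, which limits k to at most 32. The forward k-mer and its reverse complement are updated incrementally as each base arrives, so no reverse-complement string is ever allocated, and the iterator itself skips windows that contain non-ACGT bytes. `PackedCanonicalKmers`, `encode`, and `decode` are hypothetical names.

```rust
/// 2-bit code for a nucleotide (A=0, C=1, G=2, T=3); None for anything else.
fn encode(base: u8) -> Option<u64> {
    match base {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Yields (start position, canonical packed k-mer) for every window of length k
/// made only of A/C/G/T; windows containing any other byte (e.g. N) are skipped.
struct PackedCanonicalKmers<'a> {
    seq: &'a [u8],
    pos: usize,   // index of the next byte to read
    k: usize,
    valid: usize, // consecutive valid bases seen so far, capped at k
    fw: u64,      // forward-strand k-mer, newest base in the low bits
    rc: u64,      // reverse complement, newest base's complement in the high bits
    mask: u64,    // keeps only the low 2k bits of `fw`
}

impl<'a> PackedCanonicalKmers<'a> {
    /// Returns None unless 1 <= k <= 32 (32 bases fill a u64).
    fn new(seq: &'a [u8], k: usize) -> Option<Self> {
        if k == 0 || k > 32 {
            return None;
        }
        let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
        Some(Self { seq, pos: 0, k, valid: 0, fw: 0, rc: 0, mask })
    }
}

impl Iterator for PackedCanonicalKmers<'_> {
    type Item = (usize, u64);

    fn next(&mut self) -> Option<Self::Item> {
        while self.pos < self.seq.len() {
            let byte = self.seq[self.pos];
            self.pos += 1;
            match encode(byte) {
                Some(code) => {
                    // Forward strand: shift the new base in on the right.
                    self.fw = ((self.fw << 2) | code) & self.mask;
                    // Reverse complement: shift right, put the complement on the left.
                    self.rc = (self.rc >> 2) | ((3 - code) << (2 * (self.k - 1)));
                    self.valid = (self.valid + 1).min(self.k);
                    if self.valid == self.k {
                        // A<C<G<T in this encoding, so the numeric minimum is the
                        // lexicographically smaller strand.
                        return Some((self.pos - self.k, self.fw.min(self.rc)));
                    }
                }
                None => {
                    // Invalid base: restart the window after it.
                    self.valid = 0;
                    self.fw = 0;
                    self.rc = 0;
                }
            }
        }
        None
    }
}

/// Turns a packed k-mer back into text (only needed for output or debugging).
fn decode(packed: u64, k: usize) -> String {
    (0..k)
        .map(|i| {
            // Base i sits 2*(k-1-i) bits from the right.
            let code = (packed >> (2 * (k - 1 - i))) & 3;
            b"ACGT"[code as usize] as char
        })
        .collect()
}
```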
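For the issue above about a better error message on wrong arguments, here is a hypothetical `parse_args` for the `<k> <fasta-file>` invocation shown in the README. It returns a message that says what was wrong and how to call the program. The usage text and the argument order are assumptions based on the README's cargo run examples.

```rust
/// Hypothetical argument parser for `krust <k> <fasta-file>`. On bad input it returns
/// a message naming the problem plus a usage line, instead of a bare error.
fn parse_args(args: &[String]) -> Result<(usize, String), String> {
    let program = args.first().map(String::as_str).unwrap_or("krust");
    let usage = format!("usage: {program} <k> <fasta-file>\n  e.g. {program} 21 cerevisae.pan.fa");
    match args {
        [_, k_arg, path] => {
            let k: usize = k_arg.parse().map_err(|_| {
                format!("error: k-mer length must be a positive integer, got {k_arg:?}\n{usage}")
            })?;
            if k == 0 {
                return Err(format!("error: k-mer length must be at least 1\n{usage}"));
            }
            Ok((k, path.clone()))
        }
        _ => Err(format!(
            "error: expected 2 arguments (k-mer length, fasta file), got {}\n{usage}",
            args.len().saturating_sub(1)
        )),
    }
}
```

In a binary, main could print the message and exit with a nonzero status, for example `parse_args(&std::env::args().collect::<Vec<_>>()).unwrap_or_else(|msg| { eprintln!("{msg}"); std::process::exit(2) })`.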
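For the "avoid panicking at all in your library code" issue above, here is a sketch of the pattern using a hypothetical `KmerError` type and a hypothetical `count_kmers` function. Invalid input is reported through Result, which leaves the executable free to decide whether to print an error and exit.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for a k-mer counting library.
#[derive(Debug, PartialEq)]
enum KmerError {
    /// k must be at least 1 (`slice::windows(0)` would panic).
    ZeroK,
    /// The sequence contains a byte other than A, C, G or T.
    InvalidBase { position: usize, byte: u8 },
}

impl fmt::Display for KmerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KmerError::ZeroK => write!(f, "k-mer length must be at least 1"),
            KmerError::InvalidBase { position, byte } => {
                write!(f, "invalid base {:?} at position {}", *byte as char, position)
            }
        }
    }
}

impl std::error::Error for KmerError {}

/// Library-style counter: bad input comes back as Err instead of a panic, so the
/// caller (e.g. the executable's main) decides whether to report it and exit.
fn count_kmers(seq: &[u8], k: usize) -> Result<HashMap<&[u8], usize>, KmerError> {
    if k == 0 {
        return Err(KmerError::ZeroK);
    }
    if let Some(position) = seq.iter().position(|&b| !matches!(b, b'A' | b'C' | b'G' | b'T')) {
        return Err(KmerError::InvalidBase { position, byte: seq[position] });
    }
    let mut counts = HashMap::new();
    for window in seq.windows(k) {
        *counts.entry(window).or_insert(0) += 1;
    }
    Ok(counts)
}
```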
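For the "change the Config struct member kmer_len to be a usize" issue above, here is a sketch of parsing once inside Config::new. Only the `kmer_len` field and Config::new returning Result come from the issue. The `filepath` field, the argument layout, and the String error type are assumptions. The let-else syntax needs Rust 1.65 or later.

```rust
/// Hypothetical shape of krust's Config: `kmer_len` is stored as a usize, so it is
/// parsed exactly once. The `filepath` field name is an assumption.
#[derive(Debug, PartialEq)]
struct Config {
    kmer_len: usize,
    filepath: String,
}

impl Config {
    /// Builds a Config from `[program, kmer_len, filepath]`, parsing kmer_len here so
    /// that later code can read `config.kmer_len` without `.parse::<usize>().unwrap()`.
    fn new(args: &[String]) -> Result<Config, String> {
        let [_, kmer_arg, filepath] = args else {
            return Err(format!(
                "expected 2 arguments (k-mer length, fasta file), got {}",
                args.len().saturating_sub(1)
            ));
        };
        let kmer_len = kmer_arg
            .parse::<usize>()
            .map_err(|e| format!("invalid k-mer length {kmer_arg:?}: {e}"))?;
        Ok(Config { kmer_len, filepath: filepath.clone() })
    }
}
```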
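For the "speed up using hashmaps" issue above, here is a sketch of both suggestions. It returns one HashMap of counts per record instead of writing lines, and it summarizes a map as n -> m pairs (often called a k-mer spectrum). For brevity it counts raw rather than canonical k-mers. `counts_per_record` and `count_histogram` are hypothetical names.

```rust
use std::collections::{BTreeMap, HashMap};

/// One HashMap of raw (non-canonical) k-mer counts per record, returned to the caller
/// instead of being written out line by line. Records shorter than k yield an empty map.
/// Requires k >= 1 (`slice::windows(0)` panics).
fn counts_per_record<'a>(records: &[&'a [u8]], k: usize) -> Vec<HashMap<&'a [u8], usize>> {
    records
        .iter()
        .map(|&seq| {
            let mut counts: HashMap<&'a [u8], usize> = HashMap::new();
            for window in seq.windows(k) {
                *counts.entry(window).or_insert(0) += 1;
            }
            counts
        })
        .collect()
}

/// The "n -> m" summary from the issue: for each occurrence count n, the number m of
/// distinct k-mers seen exactly n times. BTreeMap keeps the n values sorted.
fn count_histogram<K>(counts: &HashMap<K, usize>) -> BTreeMap<usize, usize> {
    let mut hist: BTreeMap<usize, usize> = BTreeMap::new();
    for &n in counts.values() {
        *hist.entry(n).or_insert(0) += 1;
    }
    hist
}
```

For example, the 2-mer spectrum of ACAC is {1: 1, 2: 1}: one distinct 2-mer (CA) occurs once, and one (AC) occurs twice.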