Just another minhash implementation.

Overview


jam-rs

Just another minhash (jam) implementation: a high-performance minhash variant for screening extremely large (metagenomic) datasets in a very short timeframe. It implements parts of the ScaledMinHash / FracMinHash algorithm described in sourmash.

Unlike traditional implementations such as sourmash or mash, this version specialises in estimating the containment of small sequences in large sets. It is intended for screening terabytes of data in a few seconds to minutes.

Comparison

  • xxhash3 or ahash-fallback (for kmer < 31) instead of murmurhash3
  • No Jaccard similarity, since it is meaningless when comparing small embedded sequences against large sets
  • (coming soon) optimisations for specificity, sensitivity and speed, specifically for searching small sequences in assembled metagenomes
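
To illustrate why Jaccard similarity says little when a small sequence is embedded in a large set, here is a minimal sketch that computes both measures on plain hash sets. The function names are illustrative and are not taken from the jam-rs code base.

```rust
use std::collections::HashSet;

/// Jaccard similarity: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Containment of `query` in `reference`: |Q ∩ R| / |Q|.
/// Only the size of the query appears in the denominator.
fn containment(query: &HashSet<u64>, reference: &HashSet<u64>) -> f64 {
    if query.is_empty() {
        return 0.0;
    }
    query.intersection(reference).count() as f64 / query.len() as f64
}
```

For a query of 10 hashes that is fully contained in a reference of 10,000 hashes, containment is 1.0 while Jaccard is only 0.001, because the union is dominated by the large set.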

Scaling methods

Multiple scaling methods are available:

  • FracMinHash (fscale): Restricts the hash-space to a maximum of scale * u64::MAX
  • KmerCountScaling (kscale): Restricts the overall maximum number of hashes to a factor of scale
  • MinMaxAbsoluteScaling (nscale): Uses a minimum or maximum number of hashes per sequence record

If KmerCountScaling and MinMaxAbsoluteScaling are used together, the minimum number of hashes (per sequence record) is guaranteed. FracMinHash and KmerCountScaling produce similar results; the former is mainly provided for sourmash compatibility.
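
The scaling methods can be sketched roughly as follows. This is one possible reading of the descriptions above, not the jam-rs implementation: the function names and the `min_hashes` parameter are hypothetical, and the count-based variant assumes that "a factor of scale" means `scale` times the number of input hashes.

```rust
use std::collections::BTreeSet;

/// FracMinHash-style filter (fscale): keep hashes at or below `scale * u64::MAX`.
fn frac_min_hash(hashes: &[u64], scale: f64) -> BTreeSet<u64> {
    // Float-to-integer `as` casts saturate, so scale = 1.0 maps to u64::MAX.
    let threshold = (scale * u64::MAX as f64) as u64;
    hashes.iter().copied().filter(|&h| h <= threshold).collect()
}

/// Count-based filter (assumed reading of kscale combined with a per-record minimum):
/// keep the smallest `ceil(scale * n)` distinct hashes, but allow at least
/// `min_hashes` of them when that many are available.
fn kmer_count_scaled(hashes: &[u64], scale: f64, min_hashes: usize) -> BTreeSet<u64> {
    let budget = ((hashes.len() as f64 * scale).ceil() as usize).max(min_hashes);
    // A BTreeSet iterates in ascending order, so `take` yields the smallest hashes.
    let distinct: BTreeSet<u64> = hashes.iter().copied().collect();
    distinct.into_iter().take(budget).collect()
}
```

With uniformly distributed hashes, both filters retain roughly the same fraction of the input. The difference is that the hash-space threshold is independent of the record, whereas the count-based budget depends on the record's size and can be combined with a lower bound.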

Usage

$ jam
Just another minhasher, obviously blazingly fast

Usage: jam [OPTIONS] <COMMAND>

Commands:
  sketch   Sketches one or more files and writes the result to an output file
  merge    Merge multiple input sketches into a single sketch
  dist     Calculate distance of a (small) sketch against one or more sketches as database
  help     Print this message or the help of the given subcommand(s)

Options:
  -t, --threads <THREADS>  Number of threads to use [default: 1]
  -f, --force              Overwrite output files
  -h, --help               Print help (see more with '--help')
  -V, --version            Print version

Sketching

The easiest way to sketch files is the jam sketch command. It accepts one or more input files (fastx / fastx.gz) or a .list file containing a full list of input files, and sketches all inputs into the specified output sketch file.
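
Conceptually, sketching turns each sequence record into a set of k-mer hashes before any scaling is applied. The following is a minimal sketch of that step under stated assumptions: it hashes canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), which is a common convention but not confirmed for jam-rs, and it uses the standard library's `DefaultHasher` as a stand-in for xxhash3 / ahash. The function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Reverse complement of a DNA sequence; returns None on non-ACGT bytes.
fn reverse_complement(seq: &[u8]) -> Option<Vec<u8>> {
    seq.iter()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Hashes every canonical k-mer of `seq`, skipping windows with ambiguous bases.
fn kmer_hashes(seq: &[u8], k: usize) -> Vec<u64> {
    if k == 0 || seq.len() < k {
        return Vec::new();
    }
    seq.windows(k)
        .filter_map(|window| {
            let forward = window.to_ascii_uppercase();
            let reverse = reverse_complement(&forward)?;
            // Assumption: both strands map to the same (canonical) k-mer.
            let canonical = if forward <= reverse { forward } else { reverse };
            // Stand-in hasher; the README names xxhash3 / ahash for the real tool.
            let mut hasher = DefaultHasher::new();
            canonical.hash(&mut hasher);
            Some(hasher.finish())
        })
        .collect()
}
```

Because of the canonical form, a sequence and its reverse complement produce the same multiset of hashes, so the strand of the input does not affect the sketch.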

$ jam sketch
Sketches one or more files and writes the result to an output file

Usage: jam sketch [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
  -i, --input <INPUT>          Input file, directory or file with list of files to be hashed
  -o, --output <OUTPUT>        Output file
  -k, --kmer-size <KMER_SIZE>  kmer size all sketches to be compared must have the same size [default: 21]
  -s, --scale <SCALE>          The estimated scaling factor to apply [default: 0.001]
  -t, --threads <THREADS>      Number of threads to use [default: 1]
  -f, --force                  Overwrite output files
  -h, --help                   Print help

Dist

Calculates the distance for one or more inputs against a large set of database sketches. Optionally specify a minimum cutoff in percent of matching kmers. The output file is optional; if none is specified, the result is printed to stdout.
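
The screening step with a cutoff can be pictured as follows. This is a simplified in-memory sketch: the `Sketch` struct and the `screen` function are hypothetical, and the cutoff is interpreted as the percentage of query hashes found in a database sketch, which is an assumption based on the description above.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory sketch: a name plus its retained hashes.
struct Sketch {
    name: String,
    hashes: HashSet<u64>,
}

/// Returns (database name, percent of query hashes found) for every
/// database sketch that reaches `cutoff_percent`.
fn screen(query: &Sketch, database: &[Sketch], cutoff_percent: f64) -> Vec<(String, f64)> {
    let mut hits = Vec::new();
    if query.hashes.is_empty() {
        return hits;
    }
    for entry in database {
        let shared = query.hashes.intersection(&entry.hashes).count();
        let percent = 100.0 * shared as f64 / query.hashes.len() as f64;
        if percent >= cutoff_percent {
            hits.push((entry.name.clone(), percent));
        }
    }
    hits
}
```

With a cutoff of 0.0 every database sketch is reported, which matches the documented default; raising the cutoff drops entries that share only a small part of the query.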

$ jam dist
Calculate distance of a (small) sketch against one or more sketches as database. Requires all sketches to have the same kmer size

Usage: jam dist [OPTIONS] --input <INPUT> --database <DATABASE>

Options:
  -i, --input <INPUT>        Input sketch or raw file
  -d, --database <DATABASE>  Database sketch(es)
  -o, --output <OUTPUT>      Output to file instead of stdout
  -c, --cutoff <CUTOFF>      Cut-off value for similarity [default: 0.0]
  -t, --threads <THREADS>    Number of threads to use [default: 1]
  -f, --force                Overwrite output files
  -h, --help                 Print help

Merge

Merges multiple sketches into one large sketch.

$ jam merge
Merge multiple input sketches into a single sketch

Usage: jam merge [OPTIONS] --output <OUTPUT> [INPUTS]...

Arguments:
  [INPUTS]...  One or more input sketches

Options:
  -o, --output <OUTPUT>    Output file
  -t, --threads <THREADS>  Number of threads to use [default: 1]
  -f, --force              Overwrite output files
  -h, --help               Print help

License

This project is licensed under the MIT license. See the LICENSE file for more info.

Disclaimer

jam-rs is still in early, active development and is not ready for production use. Use at your own risk. Once a stable version is released, additional information and installation guidelines will be added.

Credits

This tool is heavily inspired by finch-rs (License) and sourmash (License). Check them out if you need a more mature ecosystem with well-tested hash functions and more features.

Owner
Sebastian Beyvers
Rustacean, distributed systems & metagenomics