PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.

🐍 ⛓️ 🧬 Pyskani

πŸ—ΊοΈ Overview

skani is a method developed by Jim Shaw and Yun William Yu for fast and robust metagenomic sequence comparison through sparse chaining. It improves on FastANI by being more accurate and much faster, while requiring less memory.

pyskani is a Python module, implemented using the PyO3 framework, that provides bindings to skani. It directly links to the skani code, which has the following advantages over CLI wrappers:

  • pre-built wheels: pyskani is distributed on PyPI and features pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyskani as a dependency to your project, and stop worrying about the skani binary being present on the end-user machine.
  • sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to skani without having to write them to a temporary file.

This library is still an experimental work in progress, but it should already pack enough features to be used in a standard pipeline.

🔧 Installing

Pyskani can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well as the code required to compile from source with Rust:

$ pip install pyskani

If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
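
After installing, it can be handy to confirm that the distribution is visible to the interpreter you intend to use. The following is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of pyskani:

```python
from importlib import metadata


def installed_version(distribution="pyskani"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "pyskani is not installed")
```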

💡 Examples

πŸ“ Creating a database

A database can be created either in memory or using a folder on the machine filesystem to store the sketches. Independently of the storage, a database can be used immediately for querying, or saved to a different location.

Here is how to create a database in memory, using Biopython to load the record:

import Bio.SeqIO
import pyskani

database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))

For draft genomes made of several contigs, pass each contig as an additional argument to the sketch method, for instance with the splat operator:

database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
sequences = (bytes(record.seq) for record in records)
database.sketch("E. coli O157", *sequences)
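
If Biopython is not available, the same calls can be fed from a small FASTA reader, since sketch only needs a name followed by one or more byte strings. The following is a minimal sketch under that assumption: `read_fasta` and `sketch_fasta` are hypothetical helpers, not part of pyskani, and the reader handles only plain, uncompressed FASTA text.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence bytes) for each record of a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).encode("ascii")
            # The identifier is the first word of the header line.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).encode("ascii")


def sketch_fasta(database, genome_name, handle):
    """Sketch all records of one FASTA stream as a single genome.

    `database` is assumed to expose sketch(name, *sequences), as in the
    examples above. Returns the number of contigs passed to it.
    """
    contigs = [sequence for _, sequence in read_fasta(handle)]
    if not contigs:
        raise ValueError("no FASTA record found")
    database.sketch(genome_name, *contigs)
    return len(contigs)


if __name__ == "__main__":
    # Tiny in-memory example; real genomes are far longer than this.
    text = io.StringIO(">contig1 first\nACGT\nACGT\n>contig2\nTTGA\n")
    for identifier, sequence in read_fasta(text):
        print(identifier, len(sequence))
```

With a real database, calling `sketch_fasta(database, "E. coli O157", handle)` on an open FASTA file would stand in for the Biopython-based snippet.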

πŸ—’οΈ Loading a database

To load a database created with either skani or pyskani, you can load all sketches into memory for fast querying:

database = pyskani.Database.load("path/to/sketches")

Or load the files lazily to save memory, at the cost of slower querying:

database = pyskani.Database.open("path/to/sketches")
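
The choice between the two loaders can be wrapped in a small helper. The following is a minimal sketch: `open_database` and its `lazy` flag are hypothetical names, and only the `Database.load` and `Database.open` calls shown above are assumed to exist.

```python
import os


def open_database(path, lazy=False):
    """Load a sketch database eagerly (default) or lazily.

    Eager loading keeps all sketches in memory for fast querying; lazy
    loading reads files on demand, trading query speed for memory.
    """
    import pyskani  # imported here so the module is only needed when called

    loader = pyskani.Database.open if lazy else pyskani.Database.load
    # Convert path-like objects such as pathlib.Path to a plain path first,
    # since the examples above pass plain strings.
    return loader(os.fspath(path))
```

A pipeline could then expose `lazy` as a configuration option and switch strategies without touching the querying code.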

🔎 Querying a database

Once a database has been created or loaded, use the Database.query method to compute ANI for some query genomes:

record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))
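
The returned hits can then be filtered or ranked. The following is a minimal sketch, assuming each hit exposes a numeric `identity` attribute; this document does not state the attribute names, so check the pyskani API reference before relying on it. `best_hit` is a hypothetical helper, and the `FakeHit` stand-in exists only so the example runs without pyskani.

```python
from collections import namedtuple


def best_hit(hits, min_identity=0.0):
    """Return the hit with the highest identity, or None if none qualifies.

    Assumes every hit has a numeric `identity` attribute.
    """
    candidates = [hit for hit in hits if hit.identity >= min_identity]
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit.identity)


if __name__ == "__main__":
    # Stand-in objects with made-up values, used instead of real pyskani hits.
    FakeHit = namedtuple("FakeHit", ["reference_name", "identity"])
    hits = [FakeHit("genome A", 0.91), FakeHit("genome B", 0.97)]
    print(best_hit(hits))
    print(best_hit(hits, min_identity=0.99))
```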

🔎 See Also

Computing ANI for closed genomes? You may also be interested in pyfastani, a Python package for computing ANI using the FastANI method developed by Chirag Jain et al.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

πŸ—οΈ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

βš–οΈ License

This library is provided under the MIT License.

The skani code was written by Jim Shaw and is distributed under the terms of the MIT License as well. See vendor/skani/LICENSE for more information. Source distributions of pyskani vendor additional sources under their own terms using the cargo vendor command.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original skani authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
