Chemical structure generation for protein sequences as SMILES string.

proteinogenic

🔌 Usage

This crate builds on top of purr, a crate providing primitives for reading and writing SMILES.

Use the AminoAcid enum to encode the sequence residues, and build a SMILES string with proteinogenic::smiles. For example, with divergicin 750:

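extern crate proteinogenic;

// Convert each one-letter code into an AminoAcid; unwrapping panics
// if a character is not a recognized residue code.
let residues = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA"
  .chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);
// Render the linear peptide as a SMILES string.
let s = proteinogenic::smiles(residues)
  .expect("failed to generate SMILES string");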
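The example above unwraps each conversion, which panics on the first character that is not a recognized residue code. When the sequence comes from user input, it may be preferable to collect the conversions into a single Result instead. The sketch below shows that pattern using only the standard library. Note that `parse_residue` and `InvalidResidue` are hypothetical stand-ins for a fallible per-character conversion; they are not part of proteinogenic, and the stand-in only knows the 20 standard one-letter codes.

```rust
/// Hypothetical error type, standing in for whatever error a
/// per-character conversion reports (not a proteinogenic type).
#[derive(Debug, PartialEq)]
struct InvalidResidue(char);

/// Hypothetical stand-in for a fallible one-letter-code conversion.
/// It only accepts the 20 standard one-letter amino acid codes.
fn parse_residue(c: char) -> Result<char, InvalidResidue> {
    if "ACDEFGHIKLMNPQRSTVWY".contains(c) {
        Ok(c)
    } else {
        Err(InvalidResidue(c))
    }
}

/// Convert a whole sequence, stopping at the first invalid character
/// instead of panicking like `.map(Result::unwrap)` would.
fn parse_sequence(seq: &str) -> Result<Vec<char>, InvalidResidue> {
    seq.chars().map(parse_residue).collect()
}
```

Calling `parse_sequence` on the divergicin 750 sequence yields an `Ok` value holding its 34 residues, while a sequence containing a character such as `?` yields an `Err` naming that character, which the caller can report instead of crashing.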

Additional modifications can be carried out by using a Protein struct to configure the rendering of the peptide. So far, disulfide bonds and lanthionine bridges are supported, as well as head-to-tail cyclization. For instance, we can generate the SMILES string of a cyclotide such as kalata B1:

extern crate proteinogenic;

let residues = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"
  .chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);

// Configure a head-to-tail cyclic peptide with three disulfide bonds.
let mut p = proteinogenic::Protein::new(residues);
p.cyclization(proteinogenic::Cyclization::HeadToTail);
p.cross_link(proteinogenic::CrossLink::Cystine(5, 19)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(9, 21)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(14, 26)).unwrap();

// Render the configured peptide as a SMILES string.
let s = p.smiles()
  .expect("failed to generate SMILES string");
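The positions given to CrossLink::Cystine in this example read as 1-based residue positions: counting from 1, each of 5, 9, 14, 19, 21 and 26 falls on a cysteine in the kalata B1 sequence. The standard-library sketch below checks this kind of pairing before any cross-link is declared. The helpers `residue_at` and `is_cysteine_pair` are hypothetical and not functions of proteinogenic, and the 1-based convention is inferred from the example rather than taken from the crate documentation.

```rust
/// Return the residue at a 1-based position, or `None` if out of range.
/// (Hypothetical helper; assumes an ASCII one-letter sequence.)
fn residue_at(seq: &str, pos: usize) -> Option<char> {
    if pos == 0 {
        return None;
    }
    seq.as_bytes().get(pos - 1).map(|&b| b as char)
}

/// Check that both 1-based positions point at a cysteine (`C`).
fn is_cysteine_pair(seq: &str, i: usize, j: usize) -> bool {
    residue_at(seq, i) == Some('C') && residue_at(seq, j) == Some('C')
}

// Sequence and disulfide bonds taken from the kalata B1 example above.
const KALATA_B1: &str = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN";
const KALATA_B1_BONDS: [(usize, usize); 3] = [(5, 19), (9, 21), (14, 26)];
```

Running `is_cysteine_pair` over `KALATA_B1_BONDS` confirms that all three pairs land on cysteines, whereas shifting the positions by one in either direction does not.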

This SMILES string can be used in conjunction with other cheminformatics toolkits, for instance OpenBabel, which can generate a PNG figure:

[Figure: skeletal formula of divergicin 750]

Note that proteinogenic is not limited to building a SMILES string; it can actually use any purr::walk::Follower implementor to generate an in-memory representation of a protein formula. If your code is already compatible with purr, then you'll be able to use protein sequences quite easily.

extern crate proteinogenic;
extern crate purr;

let sequence = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA";
let residues = sequence.chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);

// Feed the residues to a purr graph builder instead of a SMILES writer.
let mut builder = purr::graph::Builder::new();
proteinogenic::visit(residues, &mut builder);

// Finalize the in-memory graph representation.
builder.build()
  .expect("failed to create a graph representation");
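To illustrate why a follower-based design makes the output format pluggable, here is a deliberately simplified sketch that uses only the standard library. The names `ToyFollower`, `SmilesWriter`, `AtomCounter` and `visit_chain` are hypothetical and invented for this illustration. The actual purr::walk::Follower trait is not reproduced here, so consult the purr documentation for the real interface.

```rust
/// Hypothetical, much-reduced stand-in for a follower: it receives a
/// linear walk over atoms, one event at a time.
trait ToyFollower {
    /// Called once with the first atom of the walk.
    fn root(&mut self, atom: &str);
    /// Called for each following atom, with the bond leading to it
    /// ("" for a single bond, "=" for a double bond).
    fn extend(&mut self, bond: &str, atom: &str);
}

/// A follower that writes the walk out as a SMILES-like string.
#[derive(Default)]
struct SmilesWriter(String);

impl ToyFollower for SmilesWriter {
    fn root(&mut self, atom: &str) {
        self.0.push_str(atom);
    }
    fn extend(&mut self, bond: &str, atom: &str) {
        self.0.push_str(bond);
        self.0.push_str(atom);
    }
}

/// A follower that only counts atoms, ignoring bonds.
#[derive(Default)]
struct AtomCounter(usize);

impl ToyFollower for AtomCounter {
    fn root(&mut self, _atom: &str) {
        self.0 += 1;
    }
    fn extend(&mut self, _bond: &str, _atom: &str) {
        self.0 += 1;
    }
}

/// Drive any follower through the same fixed walk: N, C, C, then a
/// double-bonded O. The visitor does not know what the follower builds.
fn visit_chain<F: ToyFollower>(follower: &mut F) {
    follower.root("N");
    follower.extend("", "C");
    follower.extend("", "C");
    follower.extend("=", "O");
}
```

The same `visit_chain` call fills a `SmilesWriter` with a string and an `AtomCounter` with a count. In this toy version, that mirrors how one visitor over residues can target either a SMILES writer or a graph builder.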

The API is not yet stable, and may change to follow changes introduced by purr or to improve the interface ergonomics.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

🔍 See Also

If you're a bioinformatician and a Rustacean, you may be interested in these other libraries:

  • uniprot.rs: Rust data structures for the UniProtKB databases.
  • obofoundry.rs: Rust data structures for the OBO Foundry.
  • fastobo: Rust parser and abstract syntax tree for Open Biomedical Ontologies.
  • pubchem.rs: Rust data structures and API client for the PubChem API.

📜 License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

Releases
  • v0.2.0 (Feb 17, 2022)

    Fixed

    • Kekulization of the imidazole ring of L-histidine residues.

    Changed

    • Refactored API to make fallible operations return a Result.
    • Renamed AminoAcid::from_code1 to AminoAcid::from_char.
    • Renamed AminoAcid::from_code3 to AminoAcid::from_code.

    Added

    • AminoAcid::as_code to view the 3-letter code of an AminoAcid variant.
    • L-pyrrolysine, dehydroalanine and (Z)-dehydrobutyrine amino acids.
    • Support for cross-link modifications like cystine or lanthionine.
    • Support for head-to-tail homodetic cyclization.
    • Dedicated error type for the new possible errors.
  • v0.1.0 (Feb 16, 2022)

Owner
Martin Larralde
PhD candidate in Bioinformatics, passionate about programming, Pythonista, Rustacean. I write poems, and sometimes they are executable.