Chemical structure generation for protein sequences as SMILES string.

Overview

proteinogenic Star me

Chemical structure generation for protein sequences as SMILES string.

Actions Codecov License Source Crate Documentation Changelog GitHub issues

🔌 Usage

This crate builds on top of purr, a crate providing primitives for reading and writing SMILES.

Use the AminoAcid enum to encode the sequence residues, and build a SMILES string with proteinogenic::smiles. For example with divergicin 750:

extern crate proteinogenic;

let residues = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA"
  .chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);
let s = proteinogenic::smiles(residues)
  .expect("failed to generate SMILES string");

Additional modifications can be carried out by using a Peptide struct to configure the rendering of the peptide. So far, disulfide bonds as well as lanthionine bridges are supported, as well as head-to-tail cyclization. For instance. we can generate the SMILES string of a cyclotide such as kalata B1:

extern crate proteinogenic;

let residues = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"
  .chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);

let mut p = proteinogenic::Protein::new(residues);
p.cyclization(proteinogenic::Cyclization::HeadToTail);
p.cross_link(proteinogenic::CrossLink::Cystine(5, 19)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(9, 21)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(14, 26)).unwrap();

let s = p.smiles()
  .expect("failed to generate SMILES string");

This SMILES string can be used in conjunction with other cheminformatics toolkits, for instance OpenBabel which can generate a PNG figure:

Skeletal formula of divergicin 750

Note that proteinogenic is not limited to building a SMILES string; it can actually use any purr::walk::Follower implementor to generate an in-memory representation of a protein formula. If your code is already compatible with purr, then you'll be able to use protein sequences quite easily.

extern crate proteinogenic;
extern crate purr;

let sequence = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA";
let residues = sequence.chars()
  .map(proteinogenic::AminoAcid::from_char)
  .map(Result::unwrap);

let mut builder = purr::graph::Builder::new();
proteinogenic::visit(residues, &mut builder);

builder.build()
  .expect("failed to create a graph representation");

The API is not yet stable, and may change to follow changes introduced by purr or to improve the interface ergonomics.

💭 Feedback

⚠️ Issue Tracker

Found a bug ? Have an enhancement request ? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing in on a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

🔍 See Also

If you're a bioinformatician and a Rustacean, you may be interested in these other libraries:

  • uniprot.rs: Rust data structures for the UniProtKB databases.
  • obofoundry.rs: Rust data structures for the OBO Foundry.
  • fastobo: Rust parser and abstract syntax tree for Open Biomedical Ontologies.
  • pubchem.rs: Rust data structures and API client for the PubChem API.

📜 License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

You might also like...
Write a simple CLI script, that when given a 64-byte encoded string

Write a simple CLI script, that when given a 64-byte encoded string, it finds a suitable 4-byte prefix so that, a SHA256 hash of the prefix combined with the original string of bytes, has two last bytes as 0xca, 0xfe. Script should expect the original content of the string to be passed in hexadecimal format and should return two lines, first being the SHA256 string found and second 4-byte prefix used (in hexadecimal format).

`matchable` provides a convenient enum for checking if a piece of text is matching a string or a regex.

matchable matchable provides a convenient enum for checking if a piece of text is matching a string or a regex. The common usage of this crate is used

Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem

An annotated string type in Rust, made up of string slices

A string type made up of multiple annotated string slices.

`rusty_regex` takes an input string and produces a `regex` string representing what was provided.

rusty_regex This project provides a binary that takes an input string, and preps it for regex usage, effectively replacing known generics and producin

A CLI utility installed as
A CLI utility installed as "ansi" to quickly get ANSI escape sequences. Supports the most basic ones, like colors and styles as bold or italic.

'ansi' - a CLI utility to quickly get ANSI escape codes This Rust project called ansi-escape-sequences-cli provides an executable called ansi which ca

Rust crate `needleman_wunsch` of the `fasebare` package: reading FASTA sequences, Needleman-Wunsch alignment

fasebare Rust crate needleman_wunsch of the fasebare package: reading FASTA sequences, Needleman-Wunsch alignment. Synopsis The crate needleman_wunsch

Terminal text styling via ANSI escape sequences.
Terminal text styling via ANSI escape sequences.

Iridescent Features iridescent is a library for styling terminal text easily. It supports basic ANSI sequences, Xterm-256 colors, and RGB. You can ope

A special web app to render fancy UTF-8 sequences. :hindu_temple: :scroll:

UTF RENDER 🛕 📜 A special web app to render fancy UTF-8 sequences. 🛕 📜 ABOUT 📚 Emojis and fancy symbols are part of the UTF-8 character standard (

Dataframe structure and operations in Rust

Utah Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface. Note: This crate w

Redis Tree(Ploytree) Structure Module

RedisTree is a Redis module that implements Polytree as a native data type. It allows creating,locating,pushing and detaching tree from Redi

Proof-of-concept for a memory-efficient data structure for zooming billion-event traces

Proof-of-concept for a gigabyte-scale trace viewer This repo includes: A memory-efficient representation for event traces An unusually simple and memo

SegVec data structure for rust. Similar to Vec, but allocates memory in chunks of increasing size.

segvec This crate provides the SegVec data structure. It is similar to Vec, but allocates memory in chunks of increasing size, referred to as "segment

Graph data structure library for Rust.

petgraph Graph data structure library. Supports Rust 1.41 and later. Please read the API documentation here Crate feature flags: graphmap (default) en

Structure-aware, in-process, coverage-guided, evolutionary fuzzing engine for Rust functions.

fuzzcheck Fuzzcheck is a structure-aware, in-process, coverage-guided, evolutionary fuzzing engine for Rust functions. Given a function test: (T) - b

Build a config structure from environment variables in Rust without boilerplate

Yasec Yet another stupid environment config (YASEC) creates settings from environment variables. (Envconig-rs fork) Features Nested configuration stru

Hypergraph is a data structure library to generate directed hypergraphs.

Hypergraph is data structure library to create a directed hypergraph in which a hyperedge can join any number of vertices.

Comments
Releases(v0.2.0)
  • v0.2.0(Feb 17, 2022)

    Fixed

    • Kekulization of imidazole cycle of L-histidine residues.

    Changed

    • Refactored API to make failible operations return a result.
    • Renamed AminoAcid::from_code1 to AminoAcid::from_char.
    • Renamed AminoAcid::from_code3 to AminoAcid::from_code.

    Added

    • AminoAcid::as_code to view the 3-letter code of an AminoAcid variant.
    • L-pyrrolysine, dehydroalanine and (Z)-dehydrobutyrine amino acids.
    • Support for cross-link modifications like cystine or lanthionine.
    • Support for head-to-tail homodetic cyclization.
    • Dedicated error type for the new possible errors.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Feb 16, 2022)

Owner
Martin Larralde
PhD candidate in Bioinformatics, passionate about programming, Pythonista, Rustacean. I write poems, and sometimes they are executable.
Martin Larralde
A CLI utility installed as "ansi" to quickly get ANSI escape sequences. Supports the most basic ones, like colors and styles as bold or italic.

'ansi' - a CLI utility to quickly get ANSI escape codes This Rust project called ansi-escape-sequences-cli provides an executable called ansi which ca

Philipp Schuster 5 Jul 28, 2022
Rust crate `needleman_wunsch` of the `fasebare` package: reading FASTA sequences, Needleman-Wunsch alignment

fasebare Rust crate needleman_wunsch of the fasebare package: reading FASTA sequences, Needleman-Wunsch alignment. Synopsis The crate needleman_wunsch

Laurent Bloch 2 Nov 19, 2021
Terminal text styling via ANSI escape sequences.

Iridescent Features iridescent is a library for styling terminal text easily. It supports basic ANSI sequences, Xterm-256 colors, and RGB. You can ope

Rob 2 Oct 20, 2022
A command-line utility that creates project structure.

petridish A command-line utility that creates project structure. If you have heard of the cookiecutter project, petridish is a rust implementation of

null 11 Dec 29, 2022
A Rust-based shell script to create a folder structure to use for a single class every semester. Mostly an excuse to use Rust.

A Rust Course Folder Shell Script PROJECT IN PROGRESS (Spring 2022) When completed, script will create a folder structure of the following schema: [ro

Sebastián Romero Cruz 1 Apr 10, 2022
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

python-daachorse daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. This is a Python wrap

Koichi Akabe 11 Nov 30, 2022
Quickly find all blackhole directories with a huge amount of filesystem entries in a flat structure

findlargedir About Findlargedir is a tool specifically written to help quickly identify "black hole" directories on an any filesystem having more than

Dinko Korunic 24 Jan 1, 2023
A structure editor for a simple functional programming language, with Vim-like shortcuts and commands.

dilim A structure editor for a simple functional programming language, with Vim-like shortcuts and commands. Written in Rust, using the Yew framework,

Joomy Korkut 6 Nov 18, 2022
Next-generation, type-safe CLI parser for Rust

Next-generation, type-safe CLI parser for Rust

0918nobita 19 Jul 20, 2022
UnixString is An FFI-friendly null-terminated byte string

UnixString is an FFI-friendly null-terminated byte string that may be constructed from a String, a CString, a PathBuf, an OsString or a collection of bytes.

Vinícius Miguel 20 Apr 14, 2022