38 Repositories
Rust NLP Libraries
The Bytepiece Tokenizer Implemented in Rust.
bytepiece Implementation of Su's bytepiece. Bytepiece is a new tokenization method, which uses UTF-8 bytes as unigrams to process text. It needs little prep
Believe in AI democratization. LLaMA for Node.js, backed by llama-rs; works locally on your laptop CPU. Supports LLaMA/Alpaca models.
llama-node Large Language Model LLaMA on Node.js. This project is at an early stage; the Node.js API may change in the future, so use it with caution.
A Rust library for interacting with OpenAI's ChatGPT API, providing an easy-to-use interface and strongly typed structures.
ChatGPT Rust Library A Rust library for interacting with OpenAI's ChatGPT API. This library simplifies the process of making requests to the ChatGPT A
Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition). It is written in Rust and accessible through a Python API.
Quickner ⚡ A simple, fast, and easy-to-use NER annotator for Python Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition
Viterbi-based accelerated tokenizer (Python wrapper)
🐍 python-vibrato 🎤 Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wra
A lightning-fast Sanskrit toolkit. For Python bindings, see `vidyut-py`.
Vidyut मा भूदेवं क्षणमपि च ते विद्युता विप्रयोगः ॥ ("May you never be parted from your lightning, even for a moment.") Vidyut is a lightning-fast toolkit for processing Sanskrit text. Vidyut aims to provide standard co
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
🐍 python-vaporetto 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. Installation
A rule based sentence segmentation library.
cutters A rule based sentence segmentation library. 🚧 This library is experimental. 🚧 Features Full UTF-8 support. Robust parsing. Language specific
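A minimal sketch of what rule-based sentence segmentation can look like, using only the standard library. It is not cutters' API or rule set. It assumes two illustrative rules. The first splits after `.`, `!` or `?` when whitespace and then an uppercase letter follow. The second never splits after a word found in a small, hypothetical abbreviation list (`ABBREVIATIONS`).

```rust
/// Hypothetical abbreviation list; real segmenters use larger, language-specific lists.
const ABBREVIATIONS: &[&str] = &["Dr.", "Mr.", "Mrs.", "Ms.", "e.g.", "i.e."];

/// Splits `text` into sentences using two simple rules.
fn split_sentences(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut sentences = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        current.push(c);
        if !matches!(c, '.' | '!' | '?') {
            continue;
        }
        // Rule 1: a terminator must be followed by whitespace, then an uppercase letter.
        let next_is_space = chars.get(i + 1).map_or(false, |n| n.is_whitespace());
        let next_word_upper = chars[i + 1..]
            .iter()
            .find(|n| !n.is_whitespace())
            .map_or(false, |n| n.is_uppercase());
        if !(next_is_space && next_word_upper) {
            continue;
        }
        // Rule 2: never split right after a known abbreviation such as "Dr.".
        let last_word = current.split_whitespace().last().unwrap_or("");
        if ABBREVIATIONS.iter().any(|&a| a == last_word) {
            continue;
        }
        sentences.push(current.trim().to_string());
        current.clear();
    }
    // Whatever is left (usually the final sentence) is kept as-is.
    if !current.trim().is_empty() {
        sentences.push(current.trim().to_string());
    }
    sentences
}

fn main() {
    for s in split_sentences("Dr. Smith arrived. He sat down! Was it late? Nobody knew.") {
        println!("{}", s);
    }
}
```

Real segmenters need many more rules, such as rules for quotations, numbers and language-specific abbreviations. This sketch only shows the overall shape of the approach.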
Ultra-fast, spookily accurate text summarizer that works on any language
pithy 0.1.0 - an absurdly fast, strangely accurate summariser. Quick example: pithy -f your_file_here.txt --sentences 4 --help: Print this help messa
tongrams-rs: Tons of N-grams in Rust
tongrams-rs: Tons of N-grams in Rust This is a Rust port of tongrams to index and query large language models in compressed space, in which the data s
Talk with your machine in this minimalistic Rust crate!
Speak Speak is a simple, easy-to-use Natural Language Processor (NLP) written in Rust. Why use Speak? Speak uses a custom engine, and to set up you jus
Common stop words in a variety of languages
About Stop words are words that don't carry much meaning, and are typically removed as a preprocessing step before text analysis or natural language p
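The preprocessing step described above amounts to a set-membership filter. The sketch below uses a tiny, hypothetical English list (`english_stop_words`) rather than the repository's own data. It lowercases tokens before the lookup and does not strip punctuation.

```rust
use std::collections::HashSet;

/// A tiny, hypothetical English stop-word list; published lists are much longer.
fn english_stop_words() -> HashSet<&'static str> {
    ["a", "an", "and", "in", "is", "of", "the", "to"].iter().copied().collect()
}

/// Lowercases each whitespace-separated token and drops the stop words.
fn remove_stop_words(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text.split_whitespace()
        .map(|token| token.to_lowercase())
        .filter(|token| !stop_words.contains(token.as_str()))
        .collect()
}

fn main() {
    let stop_words = english_stop_words();
    // prints ["cat", "hat"]
    println!("{:?}", remove_stop_words("The cat is in the hat", &stop_words));
}
```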
A naive (read: slow) implementation of Word2Vec. Uses BLAS behind the scenes for speed.
SloWord2Vec This is a naive implementation of Word2Vec in Rust. The goal is to learn the basic principles and formulas behind Word2Vec. BT
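The entry does not describe SloWord2Vec's internals, so the following is only a generic illustration of one Word2Vec step. It assumes the skip-gram variant and shows how (center, context) training pairs are built from a sliding window. It does not train vectors or use BLAS.

```rust
/// Builds (center, context) training pairs for the skip-gram variant of Word2Vec.
/// `window` is how many words on each side of the center word count as context.
fn skip_gram_pairs<'a>(tokens: &[&'a str], window: usize) -> Vec<(&'a str, &'a str)> {
    let mut pairs = Vec::new();
    for (i, &center) in tokens.iter().enumerate() {
        let start = i.saturating_sub(window);
        let end = (i + window + 1).min(tokens.len());
        // Visit indices start..end, skipping the center word itself.
        for (j, &context) in tokens.iter().enumerate().take(end).skip(start) {
            if j != i {
                pairs.push((center, context));
            }
        }
    }
    pairs
}

fn main() {
    for (center, context) in skip_gram_pairs(&["the", "quick", "fox"], 1) {
        println!("{} -> {}", center, context);
    }
}
```

Training then adjusts the vectors so that each center word predicts its context words. Those updates are where a BLAS backend would do the heavy linear algebra.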
An official Sudachi clone in Rust 🦀
sudachi.rs - English README 2021-12-09 UPDATE: 0.6.2 Release. Try it: pip install --upgrade 'sudachipy==0.6.2' sudachi.rs is a Rust implementation of Su
A fast, searchable knowledge engine using various machine learning models to aggregate information based on importance, association, and relevance
NewsAggregator We live in an era where both the demand and quantity of information are enormous. However, the way we store and access that information
Checks all your documentation for spelling and grammar mistakes with hunspell and an nlprule-based grammar checker
cargo-spellcheck Check your spelling with hunspell and/or nlprule. Use Cases Run cargo spellcheck --fix or cargo spellcheck fix to fix all your docume
Composable n-gram combinators that are ergonomic and bare-metal fast
CREATURE FEATUR(ization) A crate for polymorphic ML & NLP featurization that leverages zero-cost abstraction. It provides composable n-gram combinator
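A minimal, std-only sketch of n-gram featurization follows. It builds eager `Vec`s rather than using the crate's zero-cost combinators, and all function names are hypothetical. Composition here simply means concatenating the n-grams of several orders, such as unigrams plus bigrams.

```rust
/// Word n-grams of order `n`, each joined with a single space.
fn word_ngrams(tokens: &[&str], n: usize) -> Vec<String> {
    if n == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    tokens.windows(n).map(|w| w.join(" ")).collect()
}

/// Character n-grams of order `n`, counted in Unicode scalar values, not bytes.
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    if n == 0 {
        return Vec::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Composes several n-gram orders into one feature list, e.g. unigrams plus bigrams.
fn word_features(tokens: &[&str], orders: &[usize]) -> Vec<String> {
    orders.iter().flat_map(|&n| word_ngrams(tokens, n)).collect()
}

fn main() {
    let tokens = ["the", "quick", "brown", "fox"];
    println!("{:?}", word_features(&tokens, &[1, 2]));
    println!("{:?}", char_ngrams("fox", 2));
}
```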
This is a simple Telegram bot with an interface to Firefly III to process and store simple transactions.
Firefly Telegram Bot Fireflies are free, so beautiful. (Les lucioles sont libres, donc belles.) ― Charles de Leusse, Les Contes de la nuit This is a s
Vaporetto: a fast and lightweight pointwise prediction based tokenizer
🛥 VAporetto: POintwise pREdicTion based TOkenizer Vaporetto is a fast and lightweight pointwise prediction based tokenizer. Overview This repository
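Pointwise prediction decides each gap between two characters independently, instead of searching over whole segmentations. The sketch below assumes a linear scorer over a few character features around each gap and splits when the score is positive. The feature templates, `toy_weights` and the bias are hypothetical and hand-picked. A real model learns its weights from annotated text and uses richer features, and this is not Vaporetto's API.

```rust
use std::collections::HashMap;

/// Features for the boundary between chars[i] and chars[i + 1].
fn boundary_features(chars: &[char], i: usize) -> Vec<String> {
    vec![
        format!("L:{}", chars[i]),                  // character left of the boundary
        format!("R:{}", chars[i + 1]),              // character right of the boundary
        format!("LR:{}{}", chars[i], chars[i + 1]), // bigram spanning the boundary
    ]
}

/// Hand-picked, hypothetical weights; a trained model would learn these from data.
fn toy_weights() -> HashMap<String, f64> {
    [("LR:ec", 2.0), ("R:c", 0.5), ("L:e", 0.3)]
        .iter()
        .map(|&(feature, weight)| (feature.to_string(), weight))
        .collect()
}

/// Decides each boundary independently: split when bias + sum of feature weights > 0.
fn tokenize(text: &str, weights: &HashMap<String, f64>, bias: f64) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut current = String::new();
    for (i, &c) in chars.iter().enumerate() {
        current.push(c);
        if i + 1 == chars.len() {
            break; // no boundary after the last character
        }
        let score: f64 = bias
            + boundary_features(&chars, i)
                .iter()
                .map(|f| weights.get(f).copied().unwrap_or(0.0))
                .sum::<f64>();
        if score > 0.0 {
            tokens.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("thecat", &toy_weights(), -1.0));
}
```

Because every boundary is scored on its own, tokenization runs in a single linear pass. This independence is what the "pointwise" in the name refers to.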
Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.
Untanglr Untanglr takes in some mangled words and makes sense out of them so you don't have to. It goes through the input and splits it probabilistic
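A common way to split concatenated words with unigram frequencies is a Viterbi-style dynamic program. It picks the split whose words have the highest product of probabilities. Whether Untanglr uses exactly this search is an assumption. The counts in `toy_counts` are invented for illustration and are not Wikipedia frequencies.

```rust
use std::collections::HashMap;

/// Hypothetical unigram counts; Untanglr itself uses English Wikipedia frequencies.
fn toy_counts() -> HashMap<&'static str, u64> {
    [("no", 100u64), ("where", 50), ("now", 200), ("here", 60)]
        .iter()
        .copied()
        .collect()
}

/// Splits `text` into known words, choosing the split with the highest product of
/// unigram probabilities (equivalently, the lowest total cost -ln p(word)).
/// Returns `None` when no split into known words exists.
fn split_words(text: &str, counts: &HashMap<&str, u64>) -> Option<Vec<String>> {
    let total = counts.values().sum::<u64>() as f64;
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[end] = (lowest cost for chars[..end], start index of the last word)
    let mut best: Vec<Option<(f64, usize)>> = vec![None; n + 1];
    best[0] = Some((0.0, 0));
    for end in 1..=n {
        for start in 0..end {
            let prefix_cost = match best[start] {
                Some((cost, _)) => cost,
                None => continue, // chars[..start] cannot be split into known words
            };
            let word: String = chars[start..end].iter().collect();
            if let Some(&count) = counts.get(word.as_str()) {
                let cost = prefix_cost - (count as f64 / total).ln();
                if best[end].map_or(true, |(old, _)| cost < old) {
                    best[end] = Some((cost, start));
                }
            }
        }
    }
    // Follow the back-pointers from the end of the text to recover the words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end]?;
        words.push(chars[start..end].iter().collect::<String>());
        end = start;
    }
    words.reverse();
    Some(words)
}

fn main() {
    println!("{:?}", split_words("nowhere", &toy_counts()));
}
```

With these toy counts, "nowhere" can be read as "no where" or "now here". The second reading wins because 200 × 60 exceeds 100 × 50.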
AI-powered search engine for Rust
txtai: AI-powered search engine for Rust txtai executes machine-learning workflows to transform data and build AI-powered text indices to perform simi
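Similarity search over embeddings usually reduces to ranking stored vectors by cosine similarity to a query vector. The sketch assumes the embeddings already exist. In txtai they would come from a machine-learning model, whereas here they are hand-written, hypothetical vectors. It uses a brute-force scan, which is only adequate for small collections, and is not txtai's API.

```rust
use std::cmp::Ordering;

/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns (document index, score) for the `k` documents most similar to `query`.
fn search(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, doc)| (i, cosine(query, doc)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored
}

fn main() {
    // Hypothetical 2-dimensional "embeddings"; a real engine gets these from a model.
    let docs: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    for (id, score) in search(&[1.0, 0.2], &docs, 2) {
        println!("doc {} score {:.3}", id, score);
    }
}
```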
fastText Rust binding
fasttext-rs fastText Rust binding Installation Add it to your Cargo.toml: [dependencies] fasttext = "0.6" Add extern crate fasttext to your crate root
A Japanese Morphological Analyzer written in pure Rust
Yoin - A Japanese Morphological Analyzer yoin is a Japanese morphological analysis engine written in pure Rust. mecab-ipadic is embedded in yoin. :)
An official Sudachi clone in Rust (incomplete) 🦀
2021-07-07 UPDATE: The official Sudachi team will take over this project (cf. 日本語形態素解析器 SudachiPy の 現状と今後について [On the current status and future of the Japanese morphological analyzer SudachiPy] - Speaker Deck) sudachi.rs An official S
Rust-nlp is a library for using Natural Language Processing algorithms with Rust
nlp Rust-nlp Implemented algorithms. Distance: Levenshtein, Jaro / Jaro-Winkler. Phonetics: Soundex, Metaphone
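Two of the algorithms listed above are short enough to sketch directly. These are std-only, textbook versions of Levenshtein distance and American Soundex, not Rust-nlp's own functions or signatures.

```rust
/// Levenshtein distance: the minimum number of single-character insertions,
/// deletions, and substitutions needed to turn `a` into `b`.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the previous prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let substitution = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + substitution); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// American Soundex digit for a letter. 'h' marks h/w, which are skipped without
/// separating equal digits; '0' marks vowels (and y), which separate but are not written.
fn soundex_digit(c: char) -> char {
    match c.to_ascii_lowercase() {
        'b' | 'f' | 'p' | 'v' => '1',
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => '2',
        'd' | 't' => '3',
        'l' => '4',
        'm' | 'n' => '5',
        'r' => '6',
        'h' | 'w' => 'h',
        _ => '0',
    }
}

/// Four-character American Soundex code, or `None` if `word` has no ASCII letters.
fn soundex(word: &str) -> Option<String> {
    let letters: Vec<char> = word.chars().filter(|c| c.is_ascii_alphabetic()).collect();
    let first = *letters.first()?;
    let mut code = first.to_ascii_uppercase().to_string();
    let mut last = soundex_digit(first);
    for &c in &letters[1..] {
        let digit = soundex_digit(c);
        if digit == 'h' {
            continue; // equal digits on either side of h/w are coded once
        }
        if digit != '0' && digit != last {
            code.push(digit);
            if code.len() == 4 {
                break;
            }
        }
        last = digit;
    }
    while code.len() < 4 {
        code.push('0');
    }
    Some(code)
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting"));
    println!("{:?}", soundex("Ashcraft"));
}
```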
A Rust implementation of some popular Snowball stemming algorithms
Rust Stemmers This crate implements some stemmer algorithms found in the Snowball project, which are compiled to Rust using the Rust backend of the sno
An HPSG-inspired symbolic natural language parser written in Rust
Treebender A symbolic natural language parsing library for Rust, inspired by HPSG. What is this? This is a library for parsing natural or constructed
🦀 A Rust implementation of a RoBERTa classification model for the SNLI dataset
RustBERTa-SNLI A Rust implementation of a RoBERTa classification model for the SNLI dataset, with support for fine-tuning, predicting, and serving. Th
Rust wrapper for the BlingFire tokenization library
BlingFire in Rust blingfire is a thin Rust wrapper for the BlingFire tokenization library. Add the library to Cargo.toml to get started cargo add blin
Read and modify constituency trees in Rust.
lumberjack Read and process constituency trees in various formats. Install: From crates.io: cargo install lumberjack-utils From GitHub: cargo install
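One widely used format for constituency trees is Penn Treebank-style bracketing, such as `(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))`. As a hedged illustration of reading such trees, not lumberjack's API, the sketch parses that format with recursive descent. It then collects the words at the leaves. The `Tree` type and all function names are hypothetical.

```rust
/// A constituency tree: labelled inner nodes and word leaves.
#[derive(Debug, PartialEq)]
enum Tree {
    Node(String, Vec<Tree>),
    Leaf(String),
}

/// Splits bracketed text into "(", ")", and atom tokens.
fn tokenize(s: &str) -> Vec<String> {
    s.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Recursive-descent parser for one bracketed tree starting at `*pos`.
fn parse(tokens: &[String], pos: &mut usize) -> Option<Tree> {
    let token = tokens.get(*pos)?;
    if token != "(" {
        *pos += 1;
        return Some(Tree::Leaf(token.clone()));
    }
    *pos += 1; // consume "("
    // Treebank files often wrap the root in an unlabelled "( ... )"; use "" as its label.
    let label = if tokens.get(*pos)? == "(" {
        String::new()
    } else {
        *pos += 1;
        tokens[*pos - 1].clone()
    };
    let mut children = Vec::new();
    while tokens.get(*pos)? != ")" {
        children.push(parse(tokens, pos)?);
    }
    *pos += 1; // consume ")"
    Some(Tree::Node(label, children))
}

/// Collects the words (leaves) of a tree from left to right.
fn leaves(tree: &Tree, out: &mut Vec<String>) {
    match tree {
        Tree::Leaf(word) => out.push(word.clone()),
        Tree::Node(_, children) => {
            for child in children {
                leaves(child, out);
            }
        }
    }
}

fn main() {
    let tokens = tokenize("(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))");
    let mut pos = 0;
    if let Some(tree) = parse(&tokens, &mut pos) {
        let mut words = Vec::new();
        leaves(&tree, &mut words);
        println!("{}", words.join(" "));
    }
}
```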
An NLP-suite powered by deep learning
DeepFrog - NLP Suite Introduction DeepFrog aims to be a (partial) successor of the Dutch-NLP suite Frog. Whereas the various NLP modules in Frog were b
Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)
Rust SBert Rust port of sentence-transformers using rust-bert and tch-rs. Supports both rust-tokenizers and Hugging Face's tokenizers. Supported model
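Many sentence-transformers models turn per-token embeddings into one sentence vector by mean pooling. This averages the token vectors while ignoring padding positions, as marked by the attention mask. Whether a given model used with Rust SBert pools this way is an assumption. The function below is a hypothetical, std-only sketch, not part of rust-bert or tch-rs.

```rust
/// Averages token embeddings, ignoring padding positions (mask value 0).
/// `token_embeddings[t]` is the embedding of token t; all rows have the same length.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u8]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |e| e.len());
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (embedding, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 0 {
            continue; // padding token: excluded from the average
        }
        for (s, x) in sum.iter_mut().zip(embedding) {
            *s += x;
        }
        count += 1.0;
    }
    // Guard against division by zero for an all-padding input.
    let count = count.max(1e-9);
    sum.iter().map(|s| s / count).collect()
}

fn main() {
    // Hypothetical token embeddings for a 2-token sentence padded to length 3.
    let tokens = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![0.0, 0.0]];
    println!("{:?}", mean_pool(&tokens, &[1, 1, 0]));
}
```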
Simple NLP in Rust with Python bindings
vtext NLP in Rust with Python bindings This package aims to provide a high performance toolkit for ingesting textual data for machine learning applica
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
rust-bert Rust native Transformer-based models implementation. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok
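Byte-Pair Encoding (BPE) is one of the tokenizer models this library provides. The sketch below shows how a trained BPE vocabulary is applied to a single word: adjacent symbols are merged repeatedly, always taking the highest-priority applicable merge rule. The merge rules are hypothetical, and the code is not the library's API or its exact merging strategy.

```rust
/// Applies ordered BPE merge rules to one word; earlier rules have higher priority.
fn bpe(word: &str, merges: &[(&str, &str)]) -> Vec<String> {
    // Start from single characters.
    let mut symbols: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    loop {
        // Find the adjacent pair whose merge rule has the lowest rank (leftmost on ties).
        let mut best: Option<(usize, usize)> = None; // (rank, position)
        for i in 0..symbols.len().saturating_sub(1) {
            let rank = merges
                .iter()
                .position(|&(left, right)| left == symbols[i] && right == symbols[i + 1]);
            if let Some(rank) = rank {
                if best.map_or(true, |(best_rank, _)| rank < best_rank) {
                    best = Some((rank, i));
                }
            }
        }
        match best {
            None => return symbols, // no applicable rule left
            Some((_, i)) => {
                let right = symbols.remove(i + 1);
                symbols[i].push_str(&right);
            }
        }
    }
}

fn main() {
    // Hypothetical merge rules, highest priority first.
    let merges = [("l", "o"), ("lo", "w"), ("e", "r")];
    // prints ["low", "er"]
    println!("{:?}", bpe("lower", &merges));
}
```

In practice the merge list is learned during training by repeatedly merging the most frequent adjacent pair in a corpus. Applying it, as above, only replays those merges in order.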
Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/
Whatlang Natural language detection for Rust with a focus on simplicity and performance.