Home / Rust Text processing
154 Repositories
Sortby
What The Fuck Was That? Find out what just happened to your Substrate Transaction by attaching a Rust debugger and re-executing it in a unit test envi
Mainline Simple, robust, BitTorrent's Mainline DHT implementation. This library is focused on being the best and simplest Rust client for Mainline, es
Memofante (Clique aqui ver em português) Memofante is here, a biiiig help: Do you often forget japanese words you really didn't want to forget? Do you
expandable An opinionated attribute-macro based macro_rules! expansion checker. Textbook example rustc treats macro definitions as some opaque piece o
SHA256 sentence "The SHA256 for this sentence begins with: one, eight, two, a, seven, c and nine." Inspired by @lauriewired post Inspired by @humbleha
candle-lora LoRA (low rank adaptation) implemented in Rust for use with Candle. This technique interchanges the fully-trainable layers of the model wi
ferrugem Aren't you pistola from writing Rust programs in English? Do you like saying "caralho" a lot? Would you like to try something different, in a
bytepiece Implementation of Su's bytepiece. Bytepiece is a new tokenize method, which uses UTF-8 Byte as unigram to process text. It needs little prep
Sentence Transformers in Burn This library provides an implementation of the Sentence Transformers framework for computing text representations as vec
A "Naive" Implementation of the Wavefront Algorithm for Sequence Alignment with Gap-Affine Scoring This repository contains some simple code that I wr
🎼 🧬 lightmotif A lightweight platform-accelerated library for biological motif scanning using position weight matrices. 🗺️ Overview Motif scanning
murasaki: Nostr to Speech ⚠ このソフトウェアはα版です ⚠ VOICEVOX を利用したタイムライン読み上げツールです。 指定したリレーのグローバルタイムライン、または指定した公開鍵でフォローしているユーザのタイムラインを読み上げます。 つかいかた Rust をインストー
Quickner ⚡ A simple, fast, and easy to use NER annotator for Python Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition
mtop mtop: top for Memcached. Features Display real-time statistics about your memcached servers such as Memory usage/limit Current/max connections Hi
tx-decoder A quick way to decode a contract's transaction data with only the contract address and abi. E.g, let tx_data = "0xe70dd2fc00000000000000000
Is It Planet Yet? This is just an experiment: me trying to learn how to build real-time procedural planets. Any suggestions/contributions is welcome.
rust-bert Rust native Transformer-based models implementation. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing
nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app
regex A Rust library for parsing, compiling, and executing regular expressions. Its syntax is similar to Perl-style regular expressions, but lacks a f
simdutf8 – High-speed UTF-8 validation for Rust Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementa
triple_accel Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance cal
Noq Not Coq. Simple expression transformer that is not Coq. Quick Start $ cargo run ./examples/add.noq Main Idea The Main Idea is being able to define
frawk frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of t
A backend for mdBook written in Rust for generating PDF based on headless chrome and Chrome DevTools Protocol.
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok
PDFRip Fast PDF password cracking utility equipped with commonly encountered password format builders and dictionary attacks. 📖 Table of Contents Int
cpc calculation + conversion cpc parses and evaluates strings of math, with support for units and conversion. 128-bit decimal floating points are used
glam A simple and fast 3D math library for games and graphics. Development status glam is in beta stage. Base functionality has been implemented and t
Table of Contents What does this library do? Why does this library exist? Which languages are supported? How good is it? Why is it better than other l
vtext NLP in Rust with Python bindings This package aims to provide a high performance toolkit for ingesting textual data for machine learning applica
Difftastic is an experimental structured diff tool that compares files based on their syntax.
rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc. rga is a line-oriented search tool that allows you to look for a r
Introduction finalfusion is a crate for reading, writing, and using embeddings in Rust. finalfusion primarily works with its own format which supports
Source text parsing, lexing, and AST related functionality for Deno.
rust-tokenizers Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigra
aho-corasick A library for finding occurrences of many patterns at once with SIMD acceleration in some cases. This library provides multiple pattern s
ipa_renamer A command line tool for renaming your ipa files quickly and easily. Usage ipa_renamer 0.0.1 A command line tool for renaming your ipa file
sabi In Japanese version https://github.com/bnjbvr/rouille. Shamelessly copied and updated from it. 日本語で Rust プログラムを書くことができます! 例 main.rs sabi::sabi! {
Table of Contents What does this tool do? Do I still need to learn to write regexes then? Current features How to install? 4.1 The command-line tool 4
The fastest way to identify anything lemmeknow ⚡ Identify any mysterious text or analyze strings from a file, just ask lemmeknow. lemmeknow can be use
html5gum html5gum is a WHATWG-compliant HTML tokenizer. use std::fmt::Write; use html5gum::{Tokenizer, Token}; let html = "title hello world/tit
sudachi.rs - English README 2021-12-09 UPDATE: 0.6.2 Release Try it: pip install --update 'sudachipy=0.6.2' sudachi.rs is a Rust implementation of Su
N-grams Documentation This crate takes a sequence of tokens and generates an n-gram for it. For more information about n-grams, check wikipedia: https
lingua-py lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike. Installation pip install l
Vidyut मा भूदेवं क्षणमपि च ते विद्युता विप्रयोगः ॥ Vidyut is a lightning-fast toolkit for processing Sanskrit text. Vidyut aims to provide standard co
bottom encodes UTF-8 text into a sequence comprised of bottom emoji (with , sprinkled in for good measure) followed by 👉👈. It can encode any valid UTF-8 - being a bottom transcends language, after all - and decode back into UTF-8.
finalfrontier Introduction finalfrontier is a Rust program for training word embeddings. finalfrontier currently has the following features: Models: s
Rust subtitle utilities Are you looking for substudy? Try here. (substudy has been merged into the subtitles-rs project.) This repository contains a n
🐍 python-vibrato 🎤 Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wra
uwuify fastest text uwuifier in the west transforms Hey... I think I really love you. Do you want a headpat? into hey... i think i w-weawwy wuv you.
NNSplit A tool to split text using a neural network. The main application is sentence boundary detection, but e. g. compound splitting for German is a
2021-07-07 UPDATE: The official Sudachi team will take over this project (cf. 日本語形態素解析器 SudachiPy の 現状と今後について - Speaker Deck) sudachi.rs An official S
Find Files (ff) Find Files (ff) utility recursively searches the files whose names match the specified RegExp pattern in the provided directory (defau
Ruplacer Find and replace text in source files: $ ruplacer old new src/ Patching src/a_dir/sub/foo.txt -- old is everywhere, old is old ++ new is ever