127 Repositories
glam A simple and fast 3D math library for games and graphics. Development status glam is in beta stage. Base functionality has been implemented and t
aho-corasick A library for finding occurrences of many patterns at once with SIMD acceleration in some cases. This library provides multiple pattern s
frawk frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of t
Difftastic is an experimental structured diff tool that compares files based on their syntax.
SSS - so stupid search tool (Ah Q's grep) English Documentation install install from source code 1.install rust toolchain curl --proto '=https' --tlsv1.2 -
2021-07-07 UPDATE: The official Sudachi team will take over this project (cf. "The Current State and Future of the Japanese Morphological Analyzer SudachiPy" - Speaker Deck) sudachi.rs An official S
sudachi.rs - English README 2021-12-09 UPDATE: 0.6.2 Release Try it: pip install --upgrade 'sudachipy>=0.6.2' sudachi.rs is a Rust implementation of Su
duckscript duckscript SDK CLI Simple, extendable and embeddable scripting language. Overview Language Goals Installation Homebrew Binary Release Ducks
Table of Contents What does this library do? Why does this library exist? Which languages are supported? How good is it? Why is it better than other l
A backend for mdBook written in Rust for generating PDF based on headless chrome and Chrome DevTools Protocol.
Textwrap Textwrap is a library for wrapping and indenting text. It is most often used by command-line programs to format dynamic output nicely so it l
🛥 VAporetto: POintwise pREdicTion based TOkenizer Vaporetto is a fast and lightweight pointwise prediction based tokenizer. Overview This repository
WTF Is I was tired of looking up Metaplex program errors. wtf-is hex_code_no_prefix Example: $ wtf-is 37 0x37: Token Metadata
uwuify fastest text uwuifier in the west transforms Hey... I think I really love you. Do you want a headpat? into hey... i think i w-weawwy wuv you.
helfsteal Simple Data Stealer Hi All, I published a basic data stealer malware written in Rust. FOR EDUCATIONAL PURPOSES. You can use it for Red Team operatio
bottom encodes UTF-8 text into a sequence comprised of bottom emoji (with , sprinkled in for good measure) followed by 👉👈. It can encode any valid UTF-8 - being a bottom transcends language, after all - and decode back into UTF-8.
nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app
This is an implementation of reactive streams, which, at the high level, is patterned off of the interfaces and protocols defined in http://reactive-s
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok
simdutf8 – High-speed UTF-8 validation for Rust Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementa
The fastest way to identify anything lemmeknow ⚡ Identify any mysterious text or analyze strings from a file, just ask lemmeknow. lemmeknow can be use
Noq Not Coq. Simple expression transformer that is not Coq. Quick Start $ cargo run ./examples/add.noq Main Idea The Main Idea is being able to define
cargo-spellcheck Check your spelling with hunspell and/or nlprule. Use Cases Run cargo spellcheck --fix or cargo spellcheck fix to fix all your docume
Ruplacer Find and replace text in source files: $ ruplacer old new src/ Patching src/a_dir/sub/foo.txt -- old is everywhere, old is old ++ new is ever
sweetpaste sweetpaste is a sweet n' simple pastebin server. It's completely server-side, with zero client-side code. Configuration The configuration w
rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc. rga is a line-oriented search tool that allows you to look for a r
regex A Rust library for parsing, compiling, and executing regular expressions. Its syntax is similar to Perl-style regular expressions, but lacks a f
rust-bert Rust native Transformer-based models implementation. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing
html5gum html5gum is a WHATWG-compliant HTML tokenizer. use std::fmt::Write; use html5gum::{Tokenizer, Token}; let html = "<title>hello world</tit
Source text parsing, lexing, and AST related functionality for Deno.
Google CP-SAT solver Rust bindings Rust bindings to the Google CP-SAT constraint programming solver. To use this library, you need a C++ compiler and
PDFRip Fast PDF password cracking utility equipped with commonly encountered password format builders and dictionary attacks. 📖 Table of Contents Int
Lindera A Japanese morphological analysis library in Rust. This project is a fork of fulmicoton's kuromoji-rs. Lindera aims to build a library which is e
Table of Contents What does this tool do? Do I still need to learn to write regexes then? Current features How to install? 4.1 The command-line tool 4
Find Files (ff) The Find Files (ff) utility recursively searches for files whose names match the specified RegExp pattern in the provided directory (defau
👽 Not Only a Translator 🌏 English·Chinese 🌏 This program is not just translation software; it has not been named yet. Supports conversion of input character
sabi A Japanese version of https://github.com/bnjbvr/rouille, shamelessly copied and updated from it. You can write Rust programs in Japanese! Example main.rs sabi::sabi! {
Whatlang Natural language detection for Rust with focus on simplicity and performance. Content Features Get started Documentation Supported languages
Rust subtitle utilities Are you looking for substudy? Try here. (substudy has been merged into the subtitles-rs project.) This repository contains a n
Snips NLU Rust Installation Add it to your Cargo.toml: [dependencies] snips-nlu-lib = { git = "https://github.com/snipsco/snips-nlu-rs", branch = "mas
🐍 python-vaporetto 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. Installation
runiq This project offers an efficient way (in both time and space) to filter duplicate entries (lines) from textual input. This project was born from
BlingFire in Rust blingfire is a thin Rust wrapper for the BlingFire tokenization library. Add the library to Cargo.toml to get started cargo add blin
rust-tokenizers Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigra
rs-natural Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey maybe something c
Automata Build On Ubuntu/Debian (or similar distributions on WSL), install the following packages: sudo apt-get update sudo apt-get install -y build-e
suffix Fast linear time & space suffix arrays for Rust. Supports Unicode! Dual-licensed under MIT or the UNLICENSE. Documentation https://docs.rs/suff
Rust SBert Rust port of sentence-transformers using rust-bert and tch-rs. Supports both rust-tokenizers and Hugging Face's tokenizers. Supported model
triple_accel Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance cal
NNSplit A tool to split text using a neural network. The main application is sentence boundary detection, but e.g. compound splitting for German is a
bytelines This library provides an easy way to read in input lines as byte slices for high efficiency. It's basically lines from the standard library,
webster-rs A Rust library containing an offline version of webster's dictionary. Add to Cargo.toml webster = 0.3.0 Simple example: fn main() { le
pq - query textual streams with PromQL Glossary Time Series - a stream of timestamped values, aka samples sharing the same metric name and, optionally
tabwriter is a crate that implements elastic tabstops. It provides both a library for wrapping Rust Writers and a small program that exposes the same