9 Repositories
Rust sentence-transformers Libraries
SHA256 sentence: discover a sentence whose SHA256 checksum matches the sentence's own spelled-out description of its leading hex digits.
SHA256 sentence: "The SHA256 for this sentence begins with: one, eight, two, a, seven, c and nine." Inspired by @lauriewired's post. Inspired by @humbleha…
Implementation of sentence embeddings with BERT in Rust, using the Burn library.
Sentence Transformers in Burn: This library provides an implementation of the Sentence Transformers framework for computing text representations as vectors…
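However the embeddings are produced, downstream use usually reduces to comparing vectors, most often by cosine similarity. A small plain-Rust sketch of that comparison step; the helper below is illustrative, not part of this crate's API:

```rust
/// Cosine similarity between two embedding vectors: dot product over
/// the product of the Euclidean norms. Values near 1.0 mean similar texts.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Tiny hand-made vectors stand in for real model output.
    let emb_a = [0.1_f32, 0.8, 0.3];
    let emb_b = [0.2_f32, 0.7, 0.4];
    println!("similarity: {:.3}", cosine_similarity(&emb_a, &emb_b));
}
```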
Algebraic structures, higher-kinded types and other category theory bad ideas
Algar: Algebraic structures, higher-kinded types and other category theory bad ideas. Yes, you'll have generalized functors, applicatives, monads, traversables…
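Rust has no native higher-kinded types, so libraries in this space have to emulate them. A hedged sketch of one common encoding via generic associated types; Algar's own machinery may differ:

```rust
/// A Functor emulated with a generic associated type: `Mapped<B>` plays
/// the role of "the same container, re-parameterized over B".
trait Functor {
    type Inner;
    type Mapped<B>: Functor<Inner = B>;

    fn fmap<B, F: FnMut(Self::Inner) -> B>(self, f: F) -> Self::Mapped<B>;
}

impl<A> Functor for Option<A> {
    type Inner = A;
    type Mapped<B> = Option<B>;

    fn fmap<B, F: FnMut(Self::Inner) -> B>(self, f: F) -> Option<B> {
        self.map(f)
    }
}

fn main() {
    let doubled = Some(21).fmap(|x| x * 2);
    assert_eq!(doubled, Some(42));
    println!("{doubled:?}");
}
```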
A rule-based sentence segmentation library.
cutters: A rule-based sentence segmentation library. 🚧 This library is experimental. 🚧 Features: Full UTF-8 support. Robust parsing. Language-specific…
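To illustrate what "rule-based" means here, a toy splitter that breaks on terminal punctuation but refuses to split after known abbreviations; this is hypothetical code, not cutters' API, and real rule sets are far more thorough:

```rust
/// Split on '.', '!' or '?', except after a short list of abbreviations.
fn split_sentences(text: &str) -> Vec<String> {
    const ABBREVIATIONS: [&str; 3] = ["Dr.", "e.g.", "etc."];
    let mut sentences = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
        let terminal = matches!(word.chars().last(), Some('.' | '!' | '?'));
        // Rule: a period ending a known abbreviation does not end the sentence.
        if terminal && !ABBREVIATIONS.contains(&word) {
            sentences.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        sentences.push(current);
    }
    sentences
}

fn main() {
    let text = "Dr. Smith arrived. He was late!";
    assert_eq!(split_sentences(text), vec!["Dr. Smith arrived.", "He was late!"]);
}
```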
Parse BNF grammar definitions
bnf: A library for parsing Backus–Naur form context-free grammars. What does a parsable BNF grammar look like? The following grammar from the Wikipedia…
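For illustration, a hedged usage sketch assuming the crate parses grammars through the standard FromStr trait; the DNA grammar is in the style of the small example from the crate's documentation:

```rust
use bnf::Grammar;

fn main() {
    // A tiny BNF grammar: a DNA strand is one or more of the four bases.
    let input = r#"<dna> ::= <base> | <base> <dna>
<base> ::= "A" | "C" | "G" | "T""#;
    match input.parse::<Grammar>() {
        Ok(grammar) => println!("parsed grammar:\n{}", grammar),
        Err(e) => eprintln!("failed to parse: {}", e),
    }
}
```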
An NLP-suite powered by deep learning
DeepFrog - NLP Suite. DeepFrog aims to be a (partial) successor of the Dutch NLP suite Frog. Whereas the various NLP modules in Frog were b…
Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)
Rust SBert: Rust port of sentence-transformers using rust-bert and tch-rs. Supports both rust-tokenizers and Hugging Face's tokenizers. Supported models…
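Since rust-sbert sits on top of rust-bert, the general usage pattern looks like rust-bert's sentence-embeddings pipeline. A hedged sketch of that underlying pattern, assuming the rust-bert and anyhow crates; this is not necessarily rust-sbert's own interface:

```rust
use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModelType,
};

fn main() -> anyhow::Result<()> {
    // Downloads the model weights from the Hugging Face Hub on first use.
    let model = SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL12V2)
        .create_model()?;

    let sentences = ["This is an example sentence", "Each sentence is converted"];
    // One embedding vector (Vec<f32>) per input sentence.
    let embeddings = model.encode(&sentences)?;
    println!("{} vectors of dim {}", embeddings.len(), embeddings[0].len());
    Ok(())
}
```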
Semantic text segmentation. For sentence boundary detection, compound splitting and more.
NNSplit: A tool to split text using a neural network. The main application is sentence boundary detection, but e.g. compound splitting for German is a…
🔥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tokenize…
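A hedged usage sketch of the Rust crate: load a pretrained tokenizer from the Hugging Face Hub (loading by identifier requires the crate's http feature) and encode a string:

```rust
use tokenizers::Tokenizer;

fn main() -> tokenizers::Result<()> {
    // Fetch the tokenizer definition for a hub model by identifier.
    let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?;
    // `false` = do not add special tokens such as [CLS]/[SEP].
    let encoding = tokenizer.encode("Hey there, how are you?", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}
```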