Rust wrapper for the BlingFire tokenization library

BlingFire in Rust

blingfire is a thin Rust wrapper for the BlingFire tokenization library.

To get started, add the library to your Cargo.toml:

cargo add blingfire

The library exposes two functions, text_to_words and text_to_sentences. Each takes the input text and a String buffer that receives the result:

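use blingfire;

fn main() {
    // Output buffer, reused across both calls.
    let mut parsed = String::new();

    // Tokenize into words: tokens come back separated by single spaces.
    blingfire::text_to_words("Cat,sat on   the mat.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat , sat on the mat .");

    // Split into sentences: one sentence per line. As the assertion shows,
    // the buffer now holds only the new result.
    blingfire::text_to_sentences("Cat sat. Dog barked.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat sat.\nDog barked.");
}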
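
Both functions return their result as a single delimited string, so callers often want individual tokens or sentences afterwards. The following is a minimal sketch using only the standard library. It assumes the output formats shown in the assertions above (space-separated words, newline-separated sentences). The helper names split_words and split_sentences are hypothetical and are not part of the blingfire crate.

```rust
/// Splits the space-delimited output of `text_to_words` into tokens.
/// Hypothetical helper, not part of the blingfire crate.
fn split_words(parsed: &str) -> Vec<&str> {
    // `split_whitespace` skips empty pieces, so stray spaces are harmless.
    parsed.split_whitespace().collect()
}

/// Splits the newline-delimited output of `text_to_sentences` into sentences.
/// Hypothetical helper, not part of the blingfire crate.
fn split_sentences(parsed: &str) -> Vec<&str> {
    // Drop empty lines in case the output carries a trailing newline.
    parsed.lines().filter(|s| !s.is_empty()).collect()
}
```

Passing the parsed buffer from the example above to split_words after the text_to_words call, or to split_sentences after the text_to_sentences call, yields borrowed slices into that buffer without copying the text.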

The code is licensed under the MIT License.

Owner
Re:infer
The communications intelligence platform that uses state-of-the-art machine learning technology to help businesses cut costs and drive revenue.