Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

Overview

Untanglr

Untanglr

Untanglr takes in a some mangled words and makes sense out of them so you dont have to. It goes through the input and splits it probabilistically into words. The crate includes both a bin.rs and a lib.rs to facilitate both usage as a command line utility, and as a library that you can use in your code.

Usage

Pass the tangled words as a cli argument:

$ untanglr thequickbrownfoxjumpedoverthelazydog
the quick brown fox jumped over the lazy dog

Or use it in your projects:

extern crate untanglr;

fn main() {
	let lm = untanglr::LanguageModel::new();
	println!("{:?}", lm.untangle("helloworld"));
}

Installation

If you find that untanglr might be useful on your machine you can install it. Just make sure cargo is installed and run:

$ cargo install untanglr

Note: Don't be discouraged if this project hasn't been updated in a while. I will address potential issues but the crate does not need regular updates.

Credits

I have developed this project around Derek Anderson's wordninja python implementation for some exercising in rust while producing something useful.

You might also like...
Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance calculations and string search.

triple_accel Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance cal

A crate using DeepSpeech bindings to convert mic audio from speech to text

DS-TRANSCRIBER Need an Offline Speech To Text converter? Records your mic, and returns a String containing what was said. Features Begins transcriptio

A Markdown to HTML compiler and Syntax Highlighter, built using Rust's pulldown-cmark and tree-sitter-highlight crates.

A blazingly fast( possibly the fastest) markdown to html parser and syntax highlighter built using Rust's pulldown-cmark and tree-sitter-highlight crate natively for Node's Foreign Function Interface.

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

🎼 🧬 lightmotif A lightweight platform-accelerated library for biological motif scanning using position weight matrices. 🗺️ Overview Motif scanning

Implementation of sentence embeddings with BERT in Rust, using the Burn library.
Implementation of sentence embeddings with BERT in Rust, using the Burn library.

Sentence Transformers in Burn This library provides an implementation of the Sentence Transformers framework for computing text representations as vec

What if we could check declarative macros before using them?
What if we could check declarative macros before using them?

expandable An opinionated attribute-macro based macro_rules! expansion checker. Textbook example rustc treats macro definitions as some opaque piece o

Neural network transition-based dependency parser (in Rust)

dpar Introduction dpar is a neural network transition-based dependency parser. The original Go version can be found in the oldgo branch. Dependencies

Simple STM32F103 based glitcher FW

Airtag glitcher (Bluepill firmware) Simple glitcher firmware running on an STM32F103 on a bluepill board. See https://github.com/pd0wm/airtag-dump for

Difftastic is an experimental structured diff tool that compares files based on their syntax.
Difftastic is an experimental structured diff tool that compares files based on their syntax.

Difftastic is an experimental structured diff tool that compares files based on their syntax.

Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models

rust-tokenizers Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigra

null 165 Jan 1, 2023
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

rust-bert Rust native Transformer-based models implementation. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing

null 1.3k Jan 8, 2023
Simple NLP in Rust with Python bindings

vtext NLP in Rust with Python bindings This package aims to provide a high performance toolkit for ingesting textual data for machine learning applica

Roman Yurchak 133 Jan 3, 2023
An NLP-suite powered by deep learning

DeepFrog - NLP Suite Introduction DeepFrog aims to be a (partial) successor of the Dutch-NLP suite Frog. Whereas the various NLP modules in Frog wre b

Maarten van Gompel 16 Feb 28, 2022
Rust-nlp is a library to use Natural Language Processing algorithm with RUST

nlp Rust-nlp Implemented algorithm Distance Levenshtein (Explanation) Jaro / Jaro-Winkler (Explanation) Phonetics Soundex (Explanation) Metaphone (Exp

Simon Paitrault 34 Dec 20, 2022
Which words can you spell using only element abbreviations from the periodic table?

Periodic Words Have you ever wondered which words you can spell using only element abbreviations from the periodic table? Well thanks to this extremel

J Spencer 11 Apr 26, 2021
Common stop words in a variety of languages

About Stop words are words that don't carry much meaning, and are typically removed as a preprocessing step before text analysis or natural language p

Chris McComb 9 Dec 9, 2022
Predict next words (`・ω・´)

Mocword Predict next words (`・ω・´) Installation Important: You must prepare Mocword dataset in advance. See below (Dataset and Environment Variable).

high-moctane 36 Dec 3, 2022
SHA256 sentence: discover a SHA256 checksum that matches a sentence's description of hex digit words.

SHA256 sentence "The SHA256 for this sentence begins with: one, eight, two, a, seven, c and nine." Inspired by @lauriewired post Inspired by @humbleha

Joel Parker Henderson 16 Oct 9, 2023
A small CLI utility for helping you learn japanese words made in rust 🦀

Memofante (Clique aqui ver em português) Memofante is here, a biiiig help: Do you often forget japanese words you really didn't want to forget? Do you

Tiaguinho 3 Nov 4, 2023