A Rust library containing an offline version of Webster's dictionary.

Overview

webster-rs

Add to Cargo.toml:

[dependencies]
webster = "0.3.0"

Simple example:

fn main() {
    let word = "silence";

    // unwrap() panics if the lookup does not return a definition.
    let definition = webster::dictionary(word).unwrap();

    println!("{} definition: {}", word, definition);
}

The definitions are not great but they'll do for simple projects if you need an open source local dictionary API.

This library uses the dictionary.json file from adambom's dictionary, which is adapted from Webster's Unabridged English Dictionary.

Runtime Decompression

To reduce binary size (naive storage weighs 9 MB), the dictionary is embedded in the executable in a compressed binary format (4 MB) and decompressed when it is accessed at runtime. The runtime container provides O(log n) lookups, with access times that are (anecdotally) faster than a BTreeMap.
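The crate's actual binary format and container are not described here, so the following is only a minimal sketch of the general idea, using nothing but the standard library: decode an embedded blob once on first access, keep the entries sorted, and answer each query with a binary search. The names `PACKED`, `ENTRIES`, `decode`, and `lookup` are hypothetical, and a plain-text blob stands in for the compressed bytes.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the compressed blob embedded in the binary.
// Here it is plain text with one `word<TAB>definition` entry per line;
// a real build would embed compressed bytes instead.
static PACKED: &str = "quiet\tMaking little or no noise.\n\
silence\tThe absence of sound.\n\
sound\tThat which is heard.\n";

// Decoded entries, built once on first access and reused afterwards.
static ENTRIES: OnceLock<Vec<(&'static str, &'static str)>> = OnceLock::new();

/// Stand-in for the decompression step: split the blob into
/// (word, definition) pairs and make sure they are sorted by word.
fn decode(blob: &'static str) -> Vec<(&'static str, &'static str)> {
    let mut entries: Vec<_> = blob
        .lines()
        .filter_map(|line| line.split_once('\t'))
        .collect();
    // Binary search requires sorted keys, so sort defensively.
    entries.sort_unstable_by_key(|&(word, _)| word);
    entries
}

/// Look up a word with O(log n) comparisons via binary search.
pub fn lookup(word: &str) -> Option<&'static str> {
    let entries = ENTRIES.get_or_init(|| decode(PACKED));
    entries
        .binary_search_by(|&(key, _)| key.cmp(word))
        .ok()
        .map(|index| entries[index].1)
}
```

Calling `lookup("silence")` pays the decoding cost once; later calls only perform the binary search. A sorted `Vec` keeps entries contiguous in memory, which is one plausible reason a container like this could outperform a tree-based map, though that is an assumption rather than a measured claim.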

License

The works in this repository are licensed under the MIT License, with the exception of the contents of dictionary.json, which are licensed under the terms of the Project Gutenberg License:

From Project Gutenberg:

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net

Comments
  • Reduction in binary size, access time.

    I have not profiled startup time; it's tricky because it only occurs once per run of the executable. Left as an exercise for the reader :)

    On my machine the amortized access times have been reduced to 142 ns. That's pretty good, and better than std containers like BTreeMap.

    Further improvements:

    • compressed representation can be made smaller: gzip -9 gives better results than using libflate in the build script. Go figure.
    • load time can probably be improved by using mostly stack-allocated buffers for words.

    Hopefully everything looks good to you.

    opened by wbrickner 6
  • [Opt] do not parse giant JSON string for each query

    Hi,

    I am wondering if a PR would be accepted that changes the behavior from reparsing a giant JSON file on every query to something better but more complex.

    The idea is to store compressed bytes as a static in the binary, as you have done now.

    The first dictionary access decompresses the bytes and stores them in a lazy_static BTree for low memory usage and high data locality and O(log n) access complexity.

    This would incur a small startup cost, but this startup cost is likely similar to the cost of every single call that we have now. This solution would be very fast for subsequent accesses and reduce binary bloat.

    opened by wbrickner 4
Owner
Grant Handy
I'm a high school student, self-taught programmer, linux/unix user, and Rust enthusiast.