English formatting for unsigned integer IDs

Overview

englishid

English formatting for unsigned integers. Useful for encoding large IDs in a human-readable and recognizable format. Uses a modified list of words based on a list created by the EFF.

Basic Usage

Generating an ID can be done from any primitive unsigned integer type:

use englishid::EnglishId;

let english_id = EnglishId::from(42_u16).to_string().unwrap();
assert_eq!(english_id, "accept-abacus");

Use the corresponding parse function to recover the encoded ID:

let parsed = englishid::parse_u16("accept-abacus").unwrap();
assert_eq!(parsed, 42);
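
To make the integer-to-words mapping concrete, here is a minimal, self-contained sketch that does not use englishid itself. The eight-word `TOY_WORDS` list, the 3 bits per word, and the low-order-chunk-first ordering are all assumptions chosen for illustration; the crate's real word list and bit layout may differ.

```rust
/// Hypothetical 8-word list: each word carries 3 bits. Not the crate's list.
const TOY_WORDS: [&str; 8] = [
    "abacus", "bison", "cactus", "dolphin", "ember", "falcon", "gravel", "harbor",
];
const BITS_PER_WORD: u32 = 3;

/// Encode `value` into exactly `words` words, low-order chunk first.
/// Returns None if the value needs more bits than the words can hold.
fn toy_encode(mut value: u32, words: u32) -> Option<String> {
    let mask = (1u32 << BITS_PER_WORD) - 1;
    let mut parts = Vec::new();
    for _ in 0..words {
        parts.push(TOY_WORDS[(value & mask) as usize]);
        value >>= BITS_PER_WORD;
    }
    if value != 0 {
        return None; // leftover bits: the value is out of range
    }
    Some(parts.join("-"))
}

/// Parse a string produced by `toy_encode`. Returns None on an unknown
/// word or when there are too many words to shift into a u32.
fn toy_parse(encoded: &str) -> Option<u32> {
    let mut value = 0u32;
    for (i, word) in encoded.split('-').enumerate() {
        let index = TOY_WORDS.iter().position(|w| *w == word)? as u32;
        value |= index.checked_shl(BITS_PER_WORD * i as u32)?;
    }
    Some(value)
}
```

With this toy list, `toy_encode(5, 2)` yields `"falcon-abacus"`: the low-order chunk comes first and the unused high-order chunk maps to the first word in the list.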

Restricting word-length

The wordlist used can encode 52 bits of information in 4 words. If you'd prefer to restrict your u64 IDs to 52 bits, you can set the number of words used:

use englishid::EnglishId;

let english_id = EnglishId::from(123456789_u64).words(4).to_string().unwrap();
assert_eq!(english_id, "haunt-subtitle-abandon-abacus");
assert_eq!(englishid::parse_u64(&english_id).unwrap(), 123456789_u64);

If a value is outside the acceptable range, Error::ValueOutOfRange is returned.
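
The figure of 52 bits in 4 words implies 13 bits per word. Under that assumption, the range check can be sketched as follows. `BITS_PER_WORD`, `max_value_for_words`, and `fits_in_words` are hypothetical names used only for this illustration, not part of the crate's API.

```rust
/// Assumed bits carried by each word (52 bits / 4 words, per the text above).
const BITS_PER_WORD: u32 = 13;

/// Largest value representable in `words` words, saturating at u64::MAX.
fn max_value_for_words(words: u32) -> u64 {
    let bits = words.saturating_mul(BITS_PER_WORD);
    if bits >= 64 {
        u64::MAX
    } else {
        (1u64 << bits) - 1
    }
}

/// Whether `value` fits in `words` words under the assumption above.
fn fits_in_words(value: u64, words: u32) -> bool {
    value <= max_value_for_words(words)
}
```

Under this assumption, `123456789_u64` fits comfortably in 4 words, while `u64::MAX` does not and would need a fifth.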

Encoding/decoding arbitrary data

This crate also offers functions that allow encoding arbitrary bytes of information using the same word list. If you will always know the data size, you can use the fixed_length functions:

let payload = b"hello world";
let encoded = englishid::encode_fixed_length(payload);
assert_eq!(encoded, "hatchback-reissue-residual-overbuilt-ladybug-tusk-buffing");
assert_eq!(englishid::decode_fixed_length(&encoded, payload.len()).unwrap(), payload);

If you are encoding payloads of differing lengths and want the length to be encoded into the resulting englishid string, encode() and decode() will do that for you:

let payload = b"hello world";
let encoded = englishid::encode(payload).unwrap();
assert_eq!(encoded, "able-hatchback-reissue-residual-overbuilt-ladybug-tusk-buffing");
assert_eq!(englishid::decode(&encoded).unwrap(), payload);
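
The word counts in the two examples above can be checked with a little arithmetic. The sketch below assumes 13 bits per word (inferred from the 52-bits-in-4-words figure) and a length header that occupies exactly one word (inferred from the extra leading word in the second example). The function names are hypothetical.

```rust
/// Assumed bits carried by each word.
const BITS_PER_WORD: usize = 13;

/// Words needed for `byte_len` bytes of payload, rounding up.
fn words_for_payload(byte_len: usize) -> usize {
    (byte_len * 8 + BITS_PER_WORD - 1) / BITS_PER_WORD
}

/// Words needed when a one-word length header is prepended (assumption).
fn words_with_length_header(byte_len: usize) -> usize {
    1 + words_for_payload(byte_len)
}

/// Count the hyphen-separated words in an encoded string.
fn word_count(encoded: &str) -> usize {
    encoded.split('-').count()
}
```

For the 11-byte payload `b"hello world"`, that is 88 bits, which rounds up to 7 words, or 8 with the header, matching the strings shown above.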

Or, if you have an enum that can correspond to a byte length, you can use a custom header value:

enum PrivateKey {
    Ed25519([u8; 32]),
    Ed448([u8; 56])
}

impl PrivateKey {
    fn as_bytes(&self) -> &[u8] {
        match self {
            Self::Ed25519(key) => key,
            Self::Ed448(key) => key,
        }
    }

    fn kind(&self) -> u16 {
        match self {
            Self::Ed25519(_) => 1,
            Self::Ed448(_) => 2,
        }
    }

    fn byte_length(kind: u16) -> usize {
        match kind {
            1 => 32,
            2 => 56,
            _ => 0,
        }
    }
}

let key = PrivateKey::Ed25519([0; 32]);
let encoded = englishid::encode_with_custom_header(key.as_bytes(), key.kind()).unwrap();
assert_eq!(englishid::decode_with_custom_header(&encoded, PrivateKey::byte_length).unwrap(), key.as_bytes());

Limits on data encoding

When encoding using the fixed_length APIs, there is no limit to the amount of data that can be encoded.

When using the automatic length header or a custom header, the value in the header cannot be larger than 8190. This limit may be removed in the future, but this crate is not intended for large payload encoding.
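
If you want to reject oversized header values before calling the encoder, a pre-check along these lines is one option. This is only a sketch: `MAX_HEADER_VALUE` and `checked_header` are hypothetical helpers built from the limit stated above, not functions provided by the crate.

```rust
/// Largest header value the text above allows.
const MAX_HEADER_VALUE: usize = 8190;

/// Hypothetical pre-check: returns the header as a u16 if it is within
/// the documented limit, or a descriptive error otherwise.
fn checked_header(value: usize) -> Result<u16, String> {
    if value > MAX_HEADER_VALUE {
        Err(format!(
            "header value {} exceeds the limit of {}",
            value, MAX_HEADER_VALUE
        ))
    } else {
        Ok(value as u16)
    }
}
```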

Open-source Licenses

This project, like all projects from Khonsu Labs, is open-source. This repository is available under the MIT License or the Apache License 2.0.


Comments
  • once_cell WORDLIST_LOOKUP init causes stack overflow on Windows

    Weird bug: I reliably get a stack overflow error from the initialization of WORDLIST_LOOKUP

    static WORDLIST_LOOKUP: Lazy<BTreeMap<&'static str, usize>> = Lazy::new(|| {
        let mut words = BTreeMap::new();
        for (index, word) in WORD_LIST.into_iter().enumerate() {
            words.insert(word, index);
        }
        words
    });
    

    Specifically just the for word in WORD_LIST.into_iter() { call

    • This only happens on Windows
    • Only in debug mode (in release mode it works fine)
    • If I change this to for word in WORD_LIST.iter() { ... words.insert(*word, ... it is also fine

    Just changing it to .iter() may be fine. .iter() is theoretically slower than .into_iter() (at least the generated assembly is a reasonable chunk longer according to godbolt), but I suspect the actual runtime overhead is pretty trivial.

    I'll try to do some timings comparing the two to make sure the difference is indeed minimal, and I also want to see if the errors are reproducible on GH Actions.

    opened by dbr 2
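
The by-reference iteration suggested in the report above can be sketched as follows. To stay self-contained this uses the standard library's `std::sync::OnceLock` rather than once_cell's `Lazy`, and a four-entry stand-in for `WORD_LIST`; both substitutions are assumptions made for illustration and this is not the crate's actual code.

```rust
use std::collections::BTreeMap;
use std::sync::OnceLock;

/// Stand-in for the crate's word list; the real list is far larger.
static WORD_LIST: [&str; 4] = ["abacus", "abandon", "able", "accept"];

/// Build the word -> index lookup once, borrowing each entry.
fn wordlist_lookup() -> &'static BTreeMap<&'static str, usize> {
    static LOOKUP: OnceLock<BTreeMap<&'static str, usize>> = OnceLock::new();
    LOOKUP.get_or_init(|| {
        let mut words = BTreeMap::new();
        // `.iter()` yields references into the static array, so the array
        // itself is never moved or copied; `.into_iter()` on an array
        // iterates by value instead.
        for (index, word) in WORD_LIST.iter().enumerate() {
            words.insert(*word, index);
        }
        words
    })
}
```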
  • Duplicates and hyphens in wordlist

    Hello! Thanks for this, was about to write something exactly like this :partying_face:

    I am randomly generating numbers and encoding them with this library - and a test case failed because

    unnerve-strongly-step-quartet-felt-tip

    failed to parse with the UnknownWord error - because felt-tip is in the word-list, the parsing based around .split("-") falls over

    There were a few others, like yo-yo and drop-down, which are commonly written without a hyphen, so I've just removed the -. A few others, like "felttip", didn't work unhyphenated, so I've changed them entirely, and did the same with the duplicates.

    opened by dbr 2
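
The parsing failure described in the report above is easy to reproduce in isolation. The sketch below uses a hypothetical four-word `TOY_WORDS` list containing one hyphenated entry and a naive split-and-lookup parser; it illustrates the failure mode and is not the crate's parser.

```rust
/// Hypothetical toy list that contains a hyphenated entry.
const TOY_WORDS: [&str; 4] = ["felt-tip", "quartet", "step", "strongly"];

/// Naive parse: split on '-' and look each piece up. On failure, returns
/// the first piece that is not in the list.
fn parse_indices(encoded: &str) -> Result<Vec<usize>, String> {
    encoded
        .split('-')
        .map(|piece| {
            TOY_WORDS
                .iter()
                .position(|w| *w == piece)
                .ok_or_else(|| piece.to_string())
        })
        .collect()
}
```

Here `"strongly-step-quartet"` parses, but appending `-felt-tip` fails on `"felt"`, because the separator splits the hyphenated word into two pieces that are not in the list.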
  • Tracking Issue: Wordlist v2

    While at first glance the wordlist we started with is good, we might discover changes that we want to accumulate into an improved "v2" wordlist. Ways I can imagine the wordlist being improved:

    • Homophones: I discovered "aids" and "aide" in the original list. I can imagine there are others. Homophones make it harder to have a friend help you read in a large encoded piece of data.
    • Obscure words: Words that are obscure may take a bit more thought to enter, if the person is unfamiliar with the words.

    If you have any suggestions for words that should be replaced, please just leave a comment below. If the suggestion is because of multiple words being too similar, please list all of the words that you consider too similar, not just the one you think should be replaced.

    • Blocked by #2
    opened by ecton 0
Releases(v0.3.1)
  • v0.3.1(Sep 7, 2022)

    Fixes

    • #4: Fixed stack overflow issue on Windows (and maybe other platforms) in debug mode. This was due to an accidental copy of the word list to the stack during iteration. Thank you @dbr for reporting this and suggesting the correct fix!
  • v0.3.0(Nov 1, 2021)

    Fixes / Breaking Changes

    • Issues with the word list were addressed by @dbr in pull request #1. The words "opt" and "try" were in the list twice and five words included the hyphen character. Both would generate englishid-encoded data successfully, but in almost all circumstances where those words were selected, parsing would fail.
Owner
Khonsu Labs