Fast and compact sets of bytes or ASCII characters

Overview

bset

Crates.io status Docs

Fast and compact sets of bytes and ASCII characters, useful for searching, parsing and determining membership of a given byte in the given set. They don't use any allocation, nor even any std features. In fact, all of the provided functions are const, so they can be freely constructed at the compile time

Sets

This crate exports two set types - ByteSet for a set of general bytes and AsciiSet for a set restricted to the range of ASCII characters. The is two times smaller in the memory, but comes with a slight performance trade-off in having to check whether the given character belongs to the ASCII range.

use ascii_set::AsciiSet;

const OP: AsciiSet = AsciiSet::new().add_bytes(b"+-*/%&|^");
assert!(OP.contains(b'%'));

The sets are implemented as an array of pointer-sized bit masks.

Stack of sets

Inspired by this article by Maciej Hirsz, this crate provides a way to stack multiple sets into one structure to gain even more performance. To not slow it down, sets are "obtained" from the given stack at the type level. For this, the module bits contains types B0, B1, ..., B7 representing indices of a set in the stack. Because const fns currently don't support generic functions, the sets are indexed by the order they were added to the stack. Type aliases can be used to identify the sets within the stack:

	use ascii_set::{bits::*, ByteSet, ByteStack};

const BYTE_STACK: ByteStack<B3> = ByteStack::new()
    .add_set(ByteSet::DIGITS)
    .add_set(ByteSet::ALPHABETIC)
    .add_set(ByteSet::new().add_bytes(b"+-*/%&|^"));
type Digits = B0;
type Alphabetic = B1;
type Operations = B2;
assert!(BYTE_STACK.contains::<Operations>(b'%'));

Again, there are two versions, ByteStack for all bytes and AsciiStack restricted to the ASCII range. Benchmarks show that testing the set membership is about 20% faster with stacked sets. They come with 8 times larger memory size (128/256 bytes vs. 16/32), which does not increase with the stacks added, so when 8 sets (the maximum number) are used in one stack, the memory size is equivalent.

Benchmarks

Stacked full byte set version consistently outperforms both matching and std is_ascii_* functions. For some simple sets, the set version can be a bit slower.

Alphanumeric characters:

test alnum_ascii_set        ... bench:       1,051 ns/iter (+/- 48) = 974 MB/s
test alnum_ascii_stack      ... bench:         801 ns/iter (+/- 33) = 1278 MB/s
test alnum_byte_set         ... bench:         839 ns/iter (+/- 50) = 1220 MB/s
test alnum_byte_stack       ... bench:         620 ns/iter (+/- 33) = 1651 MB/s
test alnum_is_alnum         ... bench:       1,574 ns/iter (+/- 70) = 650 MB/s
test alnum_match            ... bench:       1,573 ns/iter (+/- 86) = 650 MB/s

Alphabetic characters:

test letter_ascii_set       ... bench:       1,027 ns/iter (+/- 42) = 997 MB/s
test letter_ascii_stack     ... bench:         943 ns/iter (+/- 45) = 1085 MB/s
test letter_byte_set        ... bench:         839 ns/iter (+/- 34) = 1220 MB/s
test letter_byte_stack      ... bench:         619 ns/iter (+/- 29) = 1654 MB/s
test letter_is_alphabetic   ... bench:         820 ns/iter (+/- 42) = 1248 MB/s
test letter_match           ... bench:         825 ns/iter (+/- 36) = 1241 MB/s

Lowercase characters:

test lowercase_ascii_set    ... bench:       1,197 ns/iter (+/- 52) = 855 MB/s
test lowercase_ascii_stack  ... bench:         893 ns/iter (+/- 45) = 1146 MB/s
test lowercase_byte_set     ... bench:         890 ns/iter (+/- 44) = 1150 MB/s
test lowercase_byte_stack   ... bench:         451 ns/iter (+/- 14) = 2270 MB/s
test lowercase_is_lowercase ... bench:         752 ns/iter (+/- 33) = 1361 MB/s
test lowercase_match        ... bench:         771 ns/iter (+/- 67) = 1328 MB/s

URI reserved characters (per RFC 3986, section 2.2):

test uri_ascii_set          ... bench:       1,243 ns/iter (+/- 87) = 823 MB/s
test uri_ascii_stack        ... bench:         887 ns/iter (+/- 103) = 1154 MB/s
test uri_byte_set           ... bench:         905 ns/iter (+/- 84) = 1131 MB/s
test uri_byte_stack         ... bench:         610 ns/iter (+/- 35) = 1678 MB/s
test uri_match              ... bench:       1,294 ns/iter (+/- 45) = 791 MB/s

License: MIT

You might also like...
Decode Mode S and ADS-B signals in Rust

rs1090 rs1090 is a Rust library to decode Mode S and ADS-B messages. It takes its inspiration from the Python pyModeS library, and uses deku in order

A more compact and intuitive ASCII table in your terminal: an alternative to
A more compact and intuitive ASCII table in your terminal: an alternative to "man 7 ascii" and "ascii"

asciit A more compact and intuitive ASCII table in your terminal: an alternative to man 7 ascii and ascii. Colored numbers and letters are much more e

nombytes is a library that provides a wrapper for the bytes::Bytes byte container for use with nom.

NomBytes nombytes is a library that provides a wrapper for the bytes::Bytes byte container for use with nom. I originally made this so that I could ha

Tiny Rust library to draw pretty line graphs using ascii characters.

rasciigraph Tiny Rust library to draw pretty line graphs using ascii characters. Usage Add this to your Cargo.toml [dependencies] rasciigraph = "0.1.1

A Rust library for drawing grid-based user interfaces using ASCII characters.

grux A library for drawing grid-based user interfaces using ASCII characters. // Provides a uniform interface for drawing to a 2D grid. use grux::Grid

Perspective projection of a spinning cube, using just ASCII characters.

Spinning Cube Perspective projection of a spinning cube, using just ASCII characters. spinning_cube.mp4 Instalation cargo install spinning_cube Basic

tai (Terminal Ascii Image) tool to convert images to ascii written in Rust
tai (Terminal Ascii Image) tool to convert images to ascii written in Rust

TAI Terminal Ascii Image A tool to convert images to ascii art written in Rust 🦀 Notes This tool is still in development stage. Contributions All Con

A terminal ASCII media player. View images, gifs, videos, webcam, YouTube, etc.. directly in the terminal as ASCII art.
A terminal ASCII media player. View images, gifs, videos, webcam, YouTube, etc.. directly in the terminal as ASCII art.

Terminal Media Player View images, videos (files or YouTube links), webcam, etc directly in the terminal as ASCII. All images you see below are just m

A series of compact encoding schemes for building small and fast parsers and serializers

A series of compact encoding schemes for building small and fast parsers and serializers

Sets of libraries and tools to write applications and libraries mixing OCaml and Rust

Sets of libraries and tools to write applications and libraries mixing OCaml and Rust. These libraries will help keeping your types and data structures synchronized, and enable seamless exchange between OCaml and Rust

Fast, compact and all-around subdomain enumeration tool written in Rust
Fast, compact and all-around subdomain enumeration tool written in Rust

Fast, compact and all-around subdomain enumeration tool written in Rust, which uses dns bruteforce, internet search and recursive http content search.

A Rust library for building modular, fast and compact indexes over genomic data

mazu A Rust library for building modular, fast and compact indexes over genomic data Mazu (媽祖)... revered as a tutelary deity of seafarers, including

Baryon is a compact 3D engine focused on fast prototyping in code.
Baryon is a compact 3D engine focused on fast prototyping in code.

baryon Baryon is a compact 3D engine focused on fast prototyping in code. No big dependencies, no fancy run-times, GUI editors, or other magic. Depend

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

python-daachorse daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. This is a Python wrap

Represent large sets and maps compactly with finite state transducers.

fst This crate provides a fast implementation of ordered sets and maps using finite state machines. In particular, it makes use of finite state transd

rsdate connects to an ntp server, printing the returned time and/or sets the system clock.

🦀 📅 rsdate rsdate connects to an ntp server, printing the returned time and/or sets the system clock.

A command line application which sets your wall paper with new image generating pollens once they arrive.

pollenwall Table of Contents pollenwall About Installation Binary releases Build from source Usage Command Line Arguments Running as a service MacOS L

Cakecutter - a utility tool that quickly sets up a project from a pre-built template
Cakecutter - a utility tool that quickly sets up a project from a pre-built template

Cakecutter Create projects from pre-built cakes (templates)! Supports files, packages, content, running commands and more! Cakecutter is a utility too

Sort (key, value) data sets that don't fit in memory

kv-par-merge-sort Key-Value Parallel Merge Sort Sort Pod (key, value) data sets that don't fit in memory. This crate provides the kv_par_merge_sort li

Owner
null
Astro Format is a library for efficiently encoding and decoding a set of bytes into a single buffer format.

Astro Format is a library for efficiently transcoding arrays into a single buffer and native rust types into strings

Stelar Labs 1 Aug 13, 2022
Decode SCALE bytes into custom types using a scale-info type registry and a custom Visitor impl.

scale-decode This crate attempts to simplify the process of decoding SCALE encoded bytes into a custom data structure given a type registry (from scal

Parity Technologies 6 Sep 20, 2022
A fast, performant implementation of skip list in Rust.

Subway A fast, performant implementation of skip list in Rust. A skip list is probabilistic data structure that provides O(log N) search and insertion

Sushrut 16 Apr 5, 2022
Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)

crfs-rs Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) Currently only support prediction, model training is not

messense 24 Nov 23, 2022
Encoding and decoding support for BSON in Rust

bson-rs Encoding and decoding support for BSON in Rust Index Overview of BSON Format Usage BSON Values BSON Documents Modeling BSON with strongly type

mongodb 304 Dec 30, 2022
Rust library for reading/writing numbers in big-endian and little-endian.

byteorder This crate provides convenience methods for encoding and decoding numbers in either big-endian or little-endian order. Dual-licensed under M

Andrew Gallant 811 Jan 1, 2023
Crate to parse and emit EDN

edn-rs Near Stable no breaking changes expected. Crate to parse and emit EDN This lib does not make effort to conform the EDN received to EDN Spec. Th

Julia Naomi 61 Dec 19, 2022
pem-rs pem PEM jcreekmore/pem-rs [pem] — A Rust based way to parse and encode PEM-encoded data

pem A Rust library for parsing and encoding PEM-encoded data. Documentation Module documentation with examples Usage Add this to your Cargo.toml: [dep

Jonathan Creekmore 30 Dec 27, 2022
Variable-length signed and unsigned integer encoding that is byte-orderable for Rust

ordered-varint Provides variable-length signed and unsigned integer encoding that is byte-orderable. This crate provides the Variable trait which enco

Khonsu Labs 7 Dec 6, 2022
Free Rust-only Xbox ADPCM encoder and decoder

XbadPCM Safe (and optionally no-std) Rust crate for encoding and decoding Xbox ADPCM blocks. Decoding example Here is example code for decoding stereo

Snowy 5 Nov 20, 2022