Compact, efficient data structures in contiguous byte arrays

Ellipsis Labs

Last update: Jun 5, 2023

Related tags

Overview

Sokoban

Compact, efficient data structures in contiguous byte arrays.

Benchmarks

Based on simple benchmarks, the naive performance of Sokoban data structures are on par with, but slightly slower than, the Rust Standard Library.

test bench_tests::bench_sokoban_avl_tree_insert_1000_u128             ... bench:     134,301 ns/iter (+/- 4,033)
test bench_tests::bench_sokoban_avl_tree_insert_1000_u128_stack       ... bench:     134,135 ns/iter (+/- 3,620)
test bench_tests::bench_sokoban_avl_tree_insert_20000_u128            ... bench:   2,744,853 ns/iter (+/- 158,364)
test bench_tests::bench_sokoban_avl_tree_remove_u128                  ... bench:     355,992 ns/iter (+/- 22,770)
test bench_tests::bench_sokoban_critbit_insert_1000_u128              ... bench:      90,306 ns/iter (+/- 590)
test bench_tests::bench_sokoban_critbit_insert_1000_u128_stack        ... bench:      76,819 ns/iter (+/- 661)
test bench_tests::bench_sokoban_critbit_insert_20000_u128             ... bench:   2,839,050 ns/iter (+/- 207,241)
test bench_tests::bench_sokoban_critbit_remove_1000_u128              ... bench:      97,366 ns/iter (+/- 6,124)
test bench_tests::bench_sokoban_hash_map_insert_1000_u128             ... bench:      46,828 ns/iter (+/- 1,928)
test bench_tests::bench_sokoban_hash_map_insert_1000_u128_stack       ... bench:      46,686 ns/iter (+/- 1,691)
test bench_tests::bench_sokoban_hash_map_insert_20000_u128            ... bench:   1,492,742 ns/iter (+/- 43,362)
test bench_tests::bench_sokoban_hash_map_remove_1000_u128             ... bench:      59,896 ns/iter (+/- 1,782)
test bench_tests::bench_sokoban_red_black_tree_insert_1000_u128       ... bench:      69,574 ns/iter (+/- 8,581)
test bench_tests::bench_sokoban_red_black_tree_insert_1000_u128_stack ... bench:      66,057 ns/iter (+/- 8,853)
test bench_tests::bench_sokoban_red_black_tree_insert_20000_u128      ... bench:   1,905,406 ns/iter (+/- 25,546)
test bench_tests::bench_sokoban_red_black_tree_remove_1000_u128       ... bench:     128,889 ns/iter (+/- 13,508)
test bench_tests::bench_std_btree_map_insert_1000_u128                ... bench:      51,353 ns/iter (+/- 10,240)
test bench_tests::bench_std_btree_map_insert_20000_u128               ... bench:   1,535,224 ns/iter (+/- 21,645)
test bench_tests::bench_std_btree_map_remove_1000_u128                ... bench:     131,879 ns/iter (+/- 19,325)
test bench_tests::bench_std_hash_map_insert_1000_u128                 ... bench:      38,775 ns/iter (+/- 237)
test bench_tests::bench_std_hash_map_insert_20000_u128                ... bench:     797,904 ns/iter (+/- 10,719)
test bench_tests::bench_std_hash_map_remove_1000_u128                 ... bench:      57,452 ns/iter (+/- 364)

Why compact data structures?

For most applications, there is no reason to look past the Rust standard library for data structures. However, when the application has limited or expensive memory and is bottlenecked by performance, programmers will often need to design custom solutions to address those constraints. These types of constraints come up quite frequently in high frequency trading, embedded systems, and blockchain development.

Enter Sokoban: A library of data structures designed to simplify this exact problem.

Generic Node Allocator

Almost all data structures can be represented by some sort of connected graph of nodes and edges. The node-allocator module implements a raw node allocation data structure for contiguous buffers. Each entry in the buffer must contain objects of the same underlying type. Each entry will also have a fixed number of registers that contain metadata relating to the current node. These registers will usually be interpreted as graph edges.

#[repr(C)]
#[derive(Copy, Clone)]
pub struct NodeAllocator<
    T: Default + Copy + Clone + Pod + Zeroable,
    const MAX_SIZE: usize,
    const NUM_REGISTERS: usize,
> {
    /// Size of the allocator
    pub size: u64,
    /// Furthest index of the allocator
    bump_index: u32,
    /// Buffer index of the first element in the free list
    free_list_head: u32,
    pub nodes: [Node<NUM_REGISTERS, T>; MAX_SIZE],
}

#[repr(C)]
#[derive(Copy, Clone)]
pub struct Node<T: Copy + Clone + Pod + Zeroable + Default, const NUM_REGISTERS: usize> {
    /// Arbitrary registers (generally used for pointers)
    /// Note: Register 0 is ALWAYS used for the free list
    registers: [u32; NUM_REGISTERS],
    value: T,
}

The templated NodeAllocator object is flexible primitive data structure for implementing more complex types. Here's how one might use the NodeAllocator to implement a doubly-linked list:

// Register aliases
pub const PREV: u32 = 0;
pub const NEXT: u32 = 1;

#[derive(Copy, Clone)]
pub struct DLL<T: Default + Copy + Clone + Pod + Zeroable, const MAX_SIZE: usize> {
    pub head: u32,
    pub tail: u32,
    allocator: NodeAllocator<T, MAX_SIZE, 2>,
}

The DLL is essentially just a node allocator with 2 registers per node. These registers represent the prev and next pointers of a DLL node. The logic for how edges are created and removed are specific to the type, but the allocator struct provides an interface for implementing arbitrary types that have this property (trees and graphs).

Comments

add generic heap implementation

PR to add a generic heap data structure that fits into fixed buffers.

Tried to make the tests comparable to data structures implemented with NodeAllocator.

opened by fabioibanez 2
Add avl tree implementation
PR to add an AVL tree implementation using the NodeAllocator. AVL trees (theoretically) offer faster lookups as trees are more strictly balanced compared to red-black ones – although they can be slower at insert/remove operations given that they require more rotations to maintain the tree balanced.

The tests included are the same ones as the red-black tree. Adding lookup tests for both avl and red-black trees can illustrate the difference in performance:

for _ in 0..200 { for k in keys.iter() { assert!(tree.get(k) != None); } }
opened by febo 2
Jxiao/fix rbt bugs and add test
Massive refactor:

Fixed glaring bug in insert: rotated the child node instead of the parent

Resolved a number of issues in remove:

Add special logic to handle the case where the pivot node is a leaf to allow the fix_remove function to properly rebalance (a)

Explicitly handle the case where the removed node is a leaf node (b)

Added comments

Added specialized test cases for inserts and remove

Added functions to pretty print the tree as well as verify that it follows the RBT property
opened by jarry-xiao 0

Improvements to bench tests

This PR includes changes to the bench tests to:

Change the label of bench_sokoban_avl_tree_remove_u128 to bench_sokoban_avl_tree_remove_1000_u128 to match the name of the other remove tests.
Move the insert step outside of the benchmark on the remove tests, so the measure only includes the time to remove the items.
Add lookup tests.

test bench_tests::bench_sokoban_avl_tree_insert_1000_u128             ... bench:      95,849 ns/iter (+/- 3,441)
test bench_tests::bench_sokoban_avl_tree_insert_1000_u128_stack       ... bench:     108,724 ns/iter (+/- 1,801)
test bench_tests::bench_sokoban_avl_tree_insert_20000_u128            ... bench:   2,175,486 ns/iter (+/- 26,504)
test bench_tests::bench_sokoban_avl_tree_lookup_20000_u128            ... bench:     946,922 ns/iter (+/- 22,870)
test bench_tests::bench_sokoban_avl_tree_remove_1000_u128             ... bench:       1,874 ns/iter (+/- 139)
test bench_tests::bench_sokoban_critbit_insert_1000_u128              ... bench:      65,728 ns/iter (+/- 1,217)
test bench_tests::bench_sokoban_critbit_insert_1000_u128_stack        ... bench:      66,342 ns/iter (+/- 2,142)
test bench_tests::bench_sokoban_critbit_insert_20000_u128             ... bench:   1,963,832 ns/iter (+/- 84,912)
test bench_tests::bench_sokoban_critbit_lookup_20000_u128             ... bench:   1,652,995 ns/iter (+/- 28,951)
test bench_tests::bench_sokoban_critbit_remove_1000_u128              ... bench:       2,110 ns/iter (+/- 22)
test bench_tests::bench_sokoban_hash_map_insert_1000_u128             ... bench:      28,473 ns/iter (+/- 476)
test bench_tests::bench_sokoban_hash_map_insert_1000_u128_stack       ... bench:      28,533 ns/iter (+/- 473)
test bench_tests::bench_sokoban_hash_map_insert_20000_u128            ... bench:   1,033,449 ns/iter (+/- 10,907)
test bench_tests::bench_sokoban_hash_map_lookup_20000_u128            ... bench:     825,924 ns/iter (+/- 35,330)
test bench_tests::bench_sokoban_hash_map_remove_1000_u128             ... bench:      13,045 ns/iter (+/- 665)
test bench_tests::bench_sokoban_red_black_tree_insert_1000_u128       ... bench:      49,026 ns/iter (+/- 549)
test bench_tests::bench_sokoban_red_black_tree_insert_1000_u128_stack ... bench:      49,218 ns/iter (+/- 859)
test bench_tests::bench_sokoban_red_black_tree_insert_20000_u128      ... bench:   1,450,428 ns/iter (+/- 24,906)
test bench_tests::bench_sokoban_red_black_tree_lookup_20000_u128      ... bench:   1,043,305 ns/iter (+/- 141,596)
test bench_tests::bench_sokoban_red_black_tree_remove_1000_u128       ... bench:       2,275 ns/iter (+/- 143)
test bench_tests::bench_std_btree_map_insert_1000_u128                ... bench:      39,719 ns/iter (+/- 465)
test bench_tests::bench_std_btree_map_insert_20000_u128               ... bench:     971,612 ns/iter (+/- 12,728)
test bench_tests::bench_std_btree_map_lookup_20000_u128               ... bench:     987,608 ns/iter (+/- 22,162)
test bench_tests::bench_std_btree_map_remove_1000_u128                ... bench:       1,201 ns/iter (+/- 10)
test bench_tests::bench_std_hash_map_insert_1000_u128                 ... bench:      20,465 ns/iter (+/- 462)
test bench_tests::bench_std_hash_map_insert_20000_u128                ... bench:     484,015 ns/iter (+/- 9,921)
test bench_tests::bench_std_hash_map_lookup_20000_u128                ... bench:     315,190 ns/iter (+/- 12,477)
test bench_tests::bench_std_hash_map_remove_1000_u128                 ... bench:      11,380 ns/iter (+/- 217)

test result: ok. 0 passed; 0 failed; 0 ignored; 28 measured; 0 filtered out; finished in 61.36s

opened by febo 0

add succinct data structures

It looks like using succinct data structures in this context might be a good fit. Especially, if you are looking for tries, filters with range queries functionality.

opened by KirillLykov 1

Releases(v0.3.0)

v0.3.0(Feb 13, 2023)

Source code(tar.gz)
Source code(zip)

Owner

Ellipsis Labs

GitHub https://docs.rs/lib-sokoban/latest/sokoban

svgcleaner could help you to clean up your SVG files from the unnecessary data.

svgcleaner svgcleaner helps you clean up your SVG files, keeping them free from unnecessary data. Table of Contents Purpose Goals Alternatives Charts

1.5k Jan 9, 2023

Graph data structure library for Rust.

petgraph Graph data structure library. Supports Rust 1.41 and later. Please read the API documentation here Crate feature flags: graphmap (default) en

2k Jan 9, 2023

A fast, lightweight and extensible implementation of a graph data structure.

fast-graph A fast, lightweight and extensible implementation of a graph data structure. Note ⚠️ There will be some breaking changes in the coming 1-2

34 Jul 6, 2024

concat-arrays: a rust macro for concatenating fixed-size arrays

concat-arrays: a rust macro for concatenating fixed-size arrays This crate defines concat_arrays!, a macro that allows you to concatenate arrays.

5 May 25, 2021

Rust lib for a Vec-like structure that can store different types of different sizes contiguous with each other in memory.

hvec In memory of Anna Harren, who coined the term turbofish - which you'll see a lot of if you use this crate. The main purpose of this crate is the

2 Oct 23, 2022

wait-free 4-level 64-bit pagetable for contiguous low-contention concurrent metadata

pagetable Wait-free 4-level page table that maps from a u64 key to an &AtomicU64 value. Page fan-out is 2^16. If a key doesn't exist, intermediate pag

21 Nov 20, 2022

This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

2 Nov 2, 2022

Fast, efficient, and robust memory reclamation for concurrent data structures

Seize Fast, efficient, and robust memory reclamation for concurrent data structures. Introduction Concurrent data structures are faced with the proble

240 Dec 23, 2022

An efficient method of heaplessly converting numbers into their string representations, storing the representation within a reusable byte array.

NumToA #![no_std] Compatible with Zero Heap Allocations The standard library provides a convenient method of converting numbers into strings, but thes

42 Sep 6, 2022

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

python-daachorse daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. This is a Python wrap

11 Nov 30, 2022

A compact generational arena data structure for Rust.

Compact Generational Arena This crate provides Arena<T>, a contiguous growable container which assigns and returns IDs to values when they are added t

17 Dec 6, 2022

A Rust library for building modular, fast and compact indexes over genomic data

mazu A Rust library for building modular, fast and compact indexes over genomic data Mazu (媽祖)... revered as a tutelary deity of seafarers, including

6 Aug 15, 2023

stringsext - search for multi-byte encoded strings in binary data

title stringsext - search for multi-byte encoded strings in binary data stringsext is a Unicode enhancement of the GNU strings tool with additional fu

89 Dec 14, 2022

Fast suffix arrays for Rust (with Unicode support).

suffix Fast linear time & space suffix arrays for Rust. Supports Unicode! Dual-licensed under MIT or the UNLICENSE. Documentation https://docs.rs/suff

207 Dec 26, 2022

This crate allows writing a struct in Rust and have it derive a struct of arrays layed out in memory according to the arrow format.

Arrow2-derive - derive for Arrow2 This crate allows writing a struct in Rust and have it derive a struct of arrays layed out in memory according to th

29 Dec 27, 2022

Built on the Substrate-based runtime, this repository develops the Paralink Network, a cross-chain Oracle platform for secure and scalable data integration. Our goal is to create a robust infrastructure that ensures efficient and reliable data transmission across blockchain networks.

Getting Started This repository contains Substrate based runtime for Paralink Network. Makefile This project uses a Makefile to document helpful comma

4 Aug 14, 2024

Compact, efficient data structures in contiguous byte arrays

Related tags

Overview

Sokoban

Benchmarks

Why compact data structures?

Generic Node Allocator

Comments

add generic heap implementation

Add avl tree implementation

Jxiao/fix rbt bugs and add test

Improvements to bench tests

add succinct data structures

Releases(v0.3.0)

v0.3.0(Feb 13, 2023)

Owner

Ellipsis Labs

svgcleaner could help you to clean up your SVG files from the unnecessary data.

Graph data structure library for Rust.

A fast, lightweight and extensible implementation of a graph data structure.

concat-arrays: a rust macro for concatenating fixed-size arrays

Rust lib for a Vec-like structure that can store different types of different sizes contiguous with each other in memory.

wait-free 4-level 64-bit pagetable for contiguous low-contention concurrent metadata

This library provides a data view for reading and writing data in a byte array.

Fast, efficient, and robust memory reclamation for concurrent data structures

An efficient method of heaplessly converting numbers into their string representations, storing the representation within a reusable byte array.

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

A compact generational arena data structure for Rust.

A Rust library for building modular, fast and compact indexes over genomic data

stringsext - search for multi-byte encoded strings in binary data

Fast suffix arrays for Rust (with Unicode support).

This crate allows writing a struct in Rust and have it derive a struct of arrays layed out in memory according to the arrow format.

Vim plugin to quickly parse strings into arrays.

Rust implementations of Fast Fourier Transform convolution and correlation for n-dimensional arrays

Serde support for n-dimensional arrays from self-describing formats

Extension trait to chunk iterators into const-length arrays.

Built on the Substrate-based runtime, this repository develops the Paralink Network, a cross-chain Oracle platform for secure and scalable data integration. Our goal is to create a robust infrastructure that ensures efficient and reliable data transmission across blockchain networks.