Rust Persistent Data Structures

Overview

Rust Persistent Data Structures provides fully persistent data structures with structural sharing.

Setup

To use rpds, add the following to your Cargo.toml:

[dependencies]
rpds = "<version>"

Data structures

This crate offers the following data structures:

  1. List
  2. Vector
  3. Stack
  4. Queue
  5. HashTrieMap
  6. HashTrieSet
  7. RedBlackTreeMap
  8. RedBlackTreeSet

List

List documentation

Your classic functional list.

Example

use rpds::List;

let list = List::new().push_front("list");

assert_eq!(list.first(), Some(&"list"));

let a_list = list.push_front("a");

assert_eq!(a_list.first(), Some(&"a"));

let list_dropped = a_list.drop_first().unwrap();

assert_eq!(list_dropped, list);

Vector

Vector documentation

A sequence that can be indexed. The implementation is described in Understanding Persistent Vector Part 1 and Understanding Persistent Vector Part 2.

Example

use rpds::Vector;

let vector = Vector::new()
    .push_back("I’m")
    .push_back("a")
    .push_back("vector");

assert_eq!(vector[1], "a");

let screaming_vector = vector
    .drop_last().unwrap()
    .push_back("VECTOR!!!");

assert_eq!(screaming_vector[2], "VECTOR!!!");

Stack

Stack documentation

A LIFO (last in, first out) data structure. This is just a List in disguise.

Example

use rpds::Stack;

let stack = Stack::new().push("stack");

assert_eq!(stack.peek(), Some(&"stack"));

let a_stack = stack.push("a");

assert_eq!(a_stack.peek(), Some(&"a"));

let stack_popped = a_stack.pop().unwrap();

assert_eq!(stack_popped, stack);

Queue

Queue documentation

A FIFO (first in, first out) data structure.

Example

use rpds::Queue;

let queue = Queue::new()
    .enqueue("um")
    .enqueue("dois")
    .enqueue("tres");

assert_eq!(queue.peek(), Some(&"um"));

let queue_dequeued = queue.dequeue().unwrap();

assert_eq!(queue_dequeued.peek(), Some(&"dois"));

HashTrieMap

HashTrieMap documentation

A map implemented with a hash array mapped trie. See Ideal Hash Trees for details.

Example

use rpds::HashTrieMap;

let map_en = HashTrieMap::new()
    .insert(0, "zero")
    .insert(1, "one");

assert_eq!(map_en.get(&1), Some(&"one"));

let map_pt = map_en
    .insert(1, "um")
    .insert(2, "dois");

assert_eq!(map_pt.get(&2), Some(&"dois"));

let map_pt_binary = map_pt.remove(&2);

assert_eq!(map_pt_binary.get(&2), None);

HashTrieSet

HashTrieSet documentation

A set implemented with a HashTrieMap.

Example

use rpds::HashTrieSet;

let set = HashTrieSet::new()
    .insert("zero")
    .insert("one");

assert!(set.contains(&"one"));

let set_extended = set.insert("two");

assert!(set_extended.contains(&"two"));

let set_positive = set_extended.remove(&"zero");

assert!(!set_positive.contains(&"zero"));

RedBlackTreeMap

RedBlackTreeMap documentation

A map implemented with a red-black tree.

Example

use rpds::RedBlackTreeMap;

let map_en = RedBlackTreeMap::new()
    .insert(0, "zero")
    .insert(1, "one");

assert_eq!(map_en.get(&1), Some(&"one"));

let map_pt = map_en
    .insert(1, "um")
    .insert(2, "dois");

assert_eq!(map_pt.get(&2), Some(&"dois"));

let map_pt_binary = map_pt.remove(&2);

assert_eq!(map_pt_binary.get(&2), None);

assert_eq!(map_pt_binary.first(), Some((&0, &"zero")));

RedBlackTreeSet

RedBlackTreeSet documentation

A set implemented with a RedBlackTreeMap.

Example

use rpds::RedBlackTreeSet;

let set = RedBlackTreeSet::new()
    .insert("zero")
    .insert("one");

assert!(set.contains(&"one"));

let set_extended = set.insert("two");

assert!(set_extended.contains(&"two"));

let set_positive = set_extended.remove(&"zero");

assert!(!set_positive.contains(&"zero"));

assert_eq!(set_positive.first(), Some(&"one"));

Other features

Mutable methods

When you change a data structure, you often do not need its previous versions. For those cases, rpds offers mutable methods, which are generally faster:

use rpds::HashTrieSet;

let mut set = HashTrieSet::new();

set.insert_mut("zero");
set.insert_mut("one");

let set_0_1 = set.clone();
let set_0_1_2 = set.insert("two");

Initialization macros

There are convenient initialization macros for all data structures:

use rpds::*;

let vector = vector![3, 1, 4, 1, 5];
let map = ht_map!["orange" => "orange", "banana" => "yellow"];

Check the documentation for initialization macros of other data structures.

Thread safety

All data structures in this crate can be shared between threads, but that ability is opt-in: making a data structure thread safe carries a performance cost, which is worth avoiding when you are not actually sharing it between threads.

Of course, if you try to share an rpds data structure across threads, you can count on the Rust compiler to ensure that it is safe to do so: if you are using the version of a data structure that is not thread safe, you will get a compile-time error.

To create a thread-safe version of any data structure, use new_sync():

let vec = Vector::new_sync()
    .push_back(42);

Or use the _sync variant of the initialization macro:

let vec = vector_sync!(42);

no_std support

This crate supports no_std. To enable it, disable the default std feature:

[dependencies]
rpds = { version = "<version>", default-features = false }

Further details

Internally, the data structures in this crate maintain many reference-counting pointers, used both for the links between internal nodes and for the values they store.

There are two implementations of reference-counting pointers in the standard library: Rc and Arc. They behave the same way, but Arc allows the data it points to to be shared across multiple threads. The downside is that it is significantly slower to clone and drop than Rc, and persistent data structures do a lot of both. In some microbenchmarks with rpds data structures, using Rc instead of Arc can make some operations twice as fast! You can see this for yourself by running cargo bench.

To support both, we parameterize the type of reference-counting pointer (Rc or Arc) as a type argument of the data structure, using the archery crate to do this in a convenient way.

The pointer type can be parameterized like this:

let vec: Vector<u32, archery::ArcK> = Vector::new_with_ptr_kind();
//                              ↖
//                                This will use `Arc` pointers.
//                                Change it to `archery::RcK` to use a `Rc` pointer.

Serialization

We support serialization through serde. To use it, enable the serde feature by changing the rpds dependency in your Cargo.toml to:

[dependencies]
rpds = { version = "<version>", features = ["serde"] }