A proof of concept implementation of cyclic data structures in stable, safe, Rust.

Overview

A proof of concept implementation of cyclic data structures in stable, safe, Rust.

This demonstrates the combined power of the static-rc crate and the ghost-cell crate.

Motivation

Why the focus on safety?

A simple combination of 2 facts about data structures:

  • They are pervasive and often used to store user-controlled input.
  • They are extremely error-prone, juggling lifetime and aliasing concerns at the same time.

The combination of this two factors means that a single logical bug can result in Undefined Behavior, opening up the door to a multitude of attacks.

How can you write safe cyclic data structures?

There are typically two obstacles, as mentioned: lifetime, and aliasing.

The state of the art recommendation today is one of:

  • A Vec with indices instead of pointers.
  • An arena to handle lifetime, and some form of cell to handle aliasing.
  • Rc + RefCell.

What's wrong with those approaches?

Vec based solutions will blow up the algorithmic complexity of splice/split operations. In a linked list, splicing in another list, or splitting a part of the list, are both O(1) operations, but transferring N elements from one Vec to another is at least an O(N) operation.

Arena based solution will allow implementing all operations with their expected algorithmic complexity, but this comes at the cost of never being able to reclaim memory from the arena as long as a single data structure exists which references the arena.

Finally Rc + RefCell suffer from increased memory consumption and runtime overhead:

  • Rc typically embeds 2 usize counters: the strong and weak count.
    • An Rc implementation optimized for data structures could ditch the weak count.
  • RefCell typically embeds an isize counter: to count the number of readers and writers.

Does ghost-collections offer a better alternative then?

Maybe.

Lighter-weight building bricks

This crate aims at refining the Rc + RefCell solution by substituting both of them with StaticRc and GhostCell respectively:

  • GhostCell is a compile-time cell, splitting the aliasing compile-time checks to the zero-sized GhostToken.
  • StaticRc is a compile-time reference counted pointer, so is mostly free at run-time.
    • One exception: joining 2 StaticRc requires checking that they both refer to the same allocation.

On the face of it, this seems fairly interesting:

  • No memory overhead.
  • No run-time overhead.

And a catch.

There's a catch, though.

GhostCell is a straightforward replacement, StaticRc is a bit more complicated. Typical implementations based on Rc will clone it whenever necessary -- but cloning a StaticRc is impossible.

If you have split your StaticRc in halves, that is, you have 2 instances of StaticRc, then you only have those two instances. You can split one, but it changes its type. This makes temporarily holding onto one more instance of a reference-counted pointer... impossible. There's no temporary instance.

To solve this problem, the nodes of the TripodList and TripodTree contain one extra instance of a pointer, compared to the typical implementation. That is, a node from a TripodList contains 3 pointers, and one from a TripodTree contains 4 pointers.

That's the same memory overhead as a special-purpose Rc with no weak count. And mostly the same run-time overhead too as deploying the tripod pointer is a memory write, just as cloning a Rc, and retracting the tripod pointer is another memory write, just as destroying a Rc. It may be slightly more efficient, and linear logic keeps one honest...

It may be possible to avoid this overhead altogether. LinkedList avoids it... and uses the experimental GhostCursor instead. It's not clear whether GhostCursor is quite sufficient to implement all the functionality of a list, though. And then there's the experimental issue...

At least it's safe, no?

Mostly?

The core functionality of GhostCell and StaticRc are really strong contenders:

  • GhostCell (the paper) was developed by Ralf Jung and co, and proven safe. The implementation hopefully is also safe.
  • StaticRc is really basic reference-counting, just at compile-time.

The core functionality, though, is also slightly insufficient. This forces this crate to use extra, experimental functionality from its 2 core crates, and those are much less proven:

  • static_rc::lift* is at the core of this crate. None of the collections are too useful without it. Its core idea looks sound (scary eh?), though even if it is, the implementation may not be...
  • GhostCursor is even more sketchy, safety-wise. If safe, it may allow writing the collections without an extra tripod pointer... May.

Is ghost-collections really helpful then?

Yes.

GhostCell is practical.

At the very least, this crate is a proof-of-concept that GhostCell is really usable.

To the best of our knowledge, this is the first crate making extensive use of GhostCell, and there were some questions as to whether it was possible to express algorithms with GhostCell. Having implemented a complete linked list and binary tree around GhostCell I can confirm that it is practical enough.

Replacing RefCell by GhostCell means avoiding both the memory and run-time overhead of RefCell. That's a significant advantage.

There are some ergonomic downsides, though:

  • Using lifetime as a brand means that all the entire program needs to be wrapped within a closure.
  • Having to pass the extra GhostToken everywhere means that many traits -- such as Debug -- cannot be implemented on the collections.

Those were all foreseen, so no surprise there, but of course it's still annoying.

StaticRc is practical.

This crate also proves that StaticRc works, although it is perhaps not the best tool for cyclic data structures.

Test bed.

Finally, this crate provides a test bed for both of the above.

After putting them through their paces, I am happy to report that MIRI finds no issue when running the full test-suite.

And beyond?

One interesting avenue of investigation is automatic translation of the current code, stripping out StaticRc and/or GhostCell.

As long as:

  • The current code is safe -- meaning, the 2 crates used are safe.
  • The translation preserves safety.

Then this opens up the ability to write entirely safe code and automatically translate it into simpler (though unsafe) code -- not unlike the work that compilers do.

Exciting, isn't it?

There is one difficulty, though, it's unclear whether the remove/split operations can be translated automatically:

  • Merging 2 pools of nodes -- through splice -- is trivially safe: the result is a coarser grain.
  • Splitting 1 pool of nodes into two, however, is not: the result is finer grain.

Today, the reference-counting and borrowing works globally, regardless of which pools of nodes a node belongs to. However translating StaticRc, N, D> into just NonNull breaks this assumptions, and functional bugs such as accidentally keeping a link between nodes across pools could lead to aliasing/lifetime issues.

Note: and even though tracing could solve the issue, it would be linear in the number of elements to trace at best, so would not make sense algorithmically.

That's all folks!

And thanks for reading.

You might also like...
Collection of Data Structures in Rust

cds - Collection of Data Structures !!! UNDER CONSTRUCTION !!! The version v0.0.1 is a crates.io placeholder. License Licensed under either of Apache

Succinct data structures in Rust

sucds: Succinct data structures in Rust sucds contains some succinct data structures written in Rust. Data structures So far, the following data struc

Coding-challenge - Algorithms and Data-structures, problems and solutions in Rust language using cargo-workspaces

Coding Challenge LeetCode/Hackerrank e.t.c Using this as an opportunity to improve my knowledge of rust lang If you found this repo useful to you, add

Common data structures and algorithms in Rust

Contest Algorithms in Rust A collection of classic data structures and algorithms, emphasizing usability, beauty and clarity over full generality. As

Rust data structures and client for the PubChem REST API

pubchem.rs Rust data structures and client for the PubChem REST API. πŸ”Œ Usage πŸ’Š Compound Create a Compound to query the PubChem API for a single comp

Dade is data definition for Rust structures.

dade dade is data definition for Rust structures. For the easy handle of data, the following will support it. Data validation. Data schema conforms Js

rust_aads - Rust Algorithms And Data Structures

rust_aads - Rust Algorithms And Data Structures rust_aads is an open repository with algorithms and data structures, used in computer science and comp

This repository aims to organize codes related to data structures in Rust. πŸ¦€
This repository aims to organize codes related to data structures in Rust. πŸ¦€

Rust Data Structure A project with the objective of introducing the main concepts about data structure using Rust! Explore the docs and learn Rust Β» R

Obake is a procedural macro for declaring and maintaining versioned data-structures.

Obake is a procedural macro for declaring and maintaining versioned data-structures. The name 'obake' is taken from the Japanese 'γŠεŒ–γ‘ (γŠγ°γ‘)', a class of supernatural beings in Japanese folklore that shapeshift.

Owner
null
NixEl is a Rust library that turns Nix code into a variety of correct, typed, memory-safe data-structures

?? NixEL Lexer, Parser, Abstract Syntax Tree and Concrete Syntax Tree for the Nix Expressions Language. NixEl is a Rust library that turns Nix code in

Kevin Amado 56 Dec 29, 2022
Garbage Collector(Hyaline- Safe Memory Reclaimation) for lock free data structures

Hyaline-SMR This crate provides garbage collection using hyaline algorithm for building concurrent data structures. When a thread removes an object fr

Abishek 2 Dec 21, 2022
Rust-algorithm-club - Learn algorithms and data structures with Rust

Rust Algorithm Club ?? ?? This repo is under construction. Most materials are written in Chinese. Check it out here if you are able to read Chinese. W

Weihang Lo 360 Dec 28, 2022
Rust Persistent Data Structures

Rust Persistent Data Structures Rust Persistent Data Structures provides fully persistent data structures with structural sharing. Setup To use rpds a

Diogo Sousa 883 Dec 31, 2022
Rust library for string parsing of basic data structures.

afmt Simple rust library for parsing basic data structures from strings. Usage You can specify string formats to any strucute, via the use of the fmt

Eduard 4 May 8, 2021
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

null 6.5k Dec 31, 2022
Algorithms and Data Structures of all kinds written in Rust.

Classic Algorithms in Rust This repo contains the implementation of various classic algorithms for educational purposes in Rust. Right now, it is in i

Alexander GonzΓ‘lez 49 Dec 14, 2022
Library containing various Data Structures implemented using Rust.

rust-data-structures Library containing various Data Structures implemented using Rust. Running You can test the library by running cargo test, an exa

c1m50c 1 Jan 6, 2022
A library for comparing data structures in Rust, oriented toward testing

Delta: Structural differencing in Rust The delta crate defines the trait Delta, along with a derive macro for auto-generating instances of this trait

John Wiegley 19 Oct 7, 2022
A library for comparing data structures in Rust, oriented toward testing

The comparable crate defines the trait [Comparable], along with a derive macro for auto-generating instances of this trait for most data types. Primar

John Wiegley 19 Oct 7, 2022