Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor.

Related tags

Database elise
Overview

Für Elise

What is elise?

Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor. The goal is to define an API for precise, tracing garbage collection in Rust which upholds all of Rust's safety guarantees. A user using the API defined in this library will not be at risk for any of the kinds of memory errors that Rust can prevent.

What kind of access does elise provide to data?

Some previous garbage collector APIs resolve some of the safety issues with garbage collection by only allowing you to copy data out of them, rather than allowing you to hold references directly into the managed memory. This is very convenient for copying collectors as they don't have to implement pinning.

Elise is not like those APIs. With elise, you can have direct references into the managed heap. In fact, you can have arbitrary references between the stack, the managed heap, and the unmanaged heap:

  • Garbage collected objects can own data allocated in the unmanaged heap, and that data will be dropped when those objects are collected.
  • Garbage collected objects can own references into the stack, and you are guaranteed not to be able to read from those references after they have gone out of scope in safe code.
  • You can store pointers to garbage collected objects in the heap or on the stack.

The transitive combination of all of these is true: for example, a GC'd object can own a heap-allocated vector of references to objects on the stack which themselves have GC'd objects inside them.

Note that, like all garbage collection in Rust (e.g. Rc), elise only provides immutable access to data it is managing. See the section on interior mutability later.

What kind of garbage collector is elise?

Elise provides a garbage collector, but that is not what is interesting about elise. The garbage collector here is a mark-and-sweep of the simplest and least optimized possible variety. However, the API which makes it safe could apply to much more performant garbage collectors, specifically with these properties:

  • This is an API for tracing garbage collectors, not for other garbage collection techniques like reference counting.
  • This is an API for precise tracing collectors, not a conservative collector like the Boehme GC.
  • The API could be trivially adapted to support concurrent GCs, the current implementation provides a thread safe HeapRoot.
  • The API can support moving collectors as long as they implement a pinning mechanism. A moving collector which does not support pinning is incompatible with elise's API goals.

What is the state of the project?

Code has been written, sometimes frantically. A few basic smoke tests of things that should work working correctly has been done. No attempts at proofs have been made. It likely has glaring bugs. It might seg fault, ruin your laundry, halt and catch fire, etc.

You should not use this for anything that you depend on (e.g. "in production")! But if you want to play around with it for fun, by all means.

What is elise going to be used for?

This is currently an experimental project for my JavaScript engine.

Why is it called elise?

"Für Elise" is a well known compositions from Beethoven. It's always played when there's a garbage truck in Taiwan.

How does elise work?

In brief, a precise tracing garbage collector like elise is designed for works like this:

  • All of the references from the unmanaged portion of memory (stack and heap, in our context) into the managed portion of memory are tracked. These are called "roots."
  • From those roots, the collector "traces" through the graph of objects to find all of the objects that can still be accessed from those roots (and therefore, the objects which are still "alive.")

Our API needs to properly account for both rooting objects and tracing through them to transitively rooted objects.

Rooting

Given our limitations (i.e. no language support & the existence of a dynamic, unmanaged heap), it is necessary that we track our roots through some sort of intrusive collection. As a result, our roots cannot be moved around.

Fortunately, we have recently made a lot of progress on supporting intrusive data structures in Rust, thanks to the pinning API. The rooting layer sits on top of an underlying pinning API, which we use to guarantee that roots are dropped in a deterministic stack order.

Roots are created with a special macro called letroot!. The roots created with this macro carry a special lifetime called 'root, which is the lifetime of the scope they are created in. You can use the gc method on a root to begin garbage collecting some data:

// root: Root<'root>;
letroot!(root);

let x: Gc<'root, i32> = root.gc(0);

The Gc pointer is a copyable reference to the data which proves that the data has been rooted. It carries the lifetime of the root, and therefore can't outlive the root you used to create it.

In order to return Gc'd data from a function, you need to pass a root into the function:

fn foo(root: Root<'root>) -> Gc<'root, i32> {
    root.gc(0);
}

You can also use a root to reroot data that has already been rooted once, extending its lifetime:

fn foo(outer: Root<'root1>) -> Gc<'root1, i32> {
    // This root is only alive for the frame of this function call
    //
    // inner: Gc<'root2, i32>
    letroot!(inner);
    let x: Gc<'root2, i32> = inner.gc(0);

    // But you can extend a Gc rooted only for this function using the outer root:
    let x: Gc<'root1, i32> = outer.reroot(x);
    return x;
}

Tracing

Its not enough to be able to root objects in the Gc, you also need to be able to trace from the root to other objects transitively. For example, you might want a struct, stored in the Gc, with fields pointing to other objects which are also being garbage collected.

The problem that emerges is ensuring that you can only access transitively rooted objects when you know they are actually being traced from a rooted object. A few components enable us to solve this:

  • First, to put a type in the garbage collector it must implement a trait which defines how to trace through it.
  • Second, instead of only having a Gc type, we have a second type: GcStore.
  • Using derived accessors, we can guarantee a safe API; let me explain:

The Gc type implements Deref and Copy, it functionally acts like a normal reference, except that you can extend its lifetime by rerooting it. It does not expose a safe API for constructing it: the only constructor is an unsafe Gc::rooted constructor: to safely call this constructor, you must prove that this will be rooted for the lifetime 'root.

The GcStore type is more like a Box, except that it does not implement Deref. You can safely construct a GcStore, which will have Box semantics until it is rooted - that is, if you drop a GcStore without having rooted it first, it will deallocate what you have put into it.

Finally, as a part of the same derive which implements the traits necessary to garbage collect your type, you can implement an accessor to transform your GcStore fields into Gc fields. For example:

#[derive(GC)]
struct Foo<'root> {
    #[gc] bar: GcStore<'root, Bar>,
}

This code gives generates this method on Foo:

fn bar(self: Gc<'root, Foo<'_>>) -> Gc<'root, Bar>

Because the derive also guarantees that this field is traced properly, if you have a Gc, it is safe to construct a Gc from it.

This behavior is also implemented for several container types. For example, you can transform a Vec> to a Vec in the same way:

#[derive(GC)]
struct Foo<'root> {
    #[gc] vec: Vec<GcStore<'root, Bar>>,
}

// Generates:
fn vec<'root>(self: Gc<'root, Self>) -> Vec<Gc<'root, Bar>>;

Destructors

Destructors present a troubling problem for garbage collectors. Destructors are safe because we can guarantee that they are run when the struct is dropped, but something garbage collected will not actually be dropped (and the destructor run) until much later. This can cause two problems:

  • If the destructor accesses other Gc'd data, that data might have been freed earlier by the collector.
  • If the destructor accesses data on the stack, that data might have been freed when the stack was popped before the collector ran.

As a result, the GC does not run destructors on its objects. Instead, it runs a finalizer just before collecting each object. You can define what happens in the finalizer by implementing the Finalize trait for your type and adding a #[gc(finalize)] attribute to your struct:

#[derive(GC)]
#[gc(finalize)]
struct Foo;

impl elise::Finalize for Foo {
    fn finalize(&mut self) {
        println!("Collecting a Foo");
    }
}

Because Finalize does not give you a Gc pointer to your type, you cannot access other Gc pointers (in other words, you cannot "prove rootedness" because you are no longer rooted in the finalizer.) However, this is insufficient for preventing you from accessing other non-owned data, like stack references.

As a result, if your type contains any lifetimes other than 'root, attempting to implement a finalizer like this will fail. Instead, you will need to implement an unsafe finalizer:

#[derive(GC)]
#[gc(unsafe_finalize)]
struct Foo<'a>(&'a i32);

unsafe impl elise::UnsafeFinalize for Foo {
    fn finalize(&mut self) {
        println!("Collecting a Foo");
    }
}

You must audit these finalizers and guarantee that your finalizer never reads from the any of the borrowed references inside of it, otherwise your code is not safe and contains undefined behavior.

Interior mutability

The final problem is interior mutability: you can only get a shared reference to a GC'd pointer, ideally you would be able to mutate things inside of it using some form of interior mutability.

The unique problem has to do with tracing. Let's say you have a RefCell>> inside of your type:

let x: Gc<RefCell<Option<GcStore<i32>>>>;

let moved: GcStore<i32> = x.borrow_mut().take().unwrap();

// The value behind `x` is now `None`. The `moved` variable is not being traced
// at all, its entirely unrooted!

// Run the garbage collector. Because `moved` is unrooted, it will be
// collected. `moved` is now a dangling pointer
elise::collect();

// Put the moved and dangling pointer back into `x`:
*x.borrow_mut() = Some(moved);

// Observe `x`, which is now dangling. Segfault!
println!("{}", x);

We cannot allow you to move traced GcStore pointers around without some other mechanism of rooting them.

For this reason, elise currently provides only partial support for interior mutability:

  • There is a separate trait called NullTrace, which indicates that tracing through this type is a no-op (i.e. it contains no Gc'd pointers). You are free to have Cell and RefCell types containing NullTrace data.
  • PinCell is trace safe, because it does not allow you to move the data it gives you. If you can't move the data, you can't unroot it.

In other words, you are free to have normal interior mutability of anything that doesn't contain a Gc pointer, and you can have partial interior mutability (only pinned mutable references) for things that do contain Gc pointers.

Note that PinCell introduces some problems for copying collectors, because it gives you a Pin<&mut T>, which other code (e.g. async/await code) might rely on memory stability (as opposed to semantic stability, which we rely on).

Its an open problem to find new abstractable APIs which allow moving data only between traced memory locations, which would allow you to safely move Gc pointers around.

You might also like...
A LSM-based Key-Value Store in Rust

CobbleDB A LSM-based Key-Value Store in Rust Motivation There is no open-source LSM-based key-value store in Rust natively. Some crates are either a w

PackDb is a simple messagepack based database in rust

PackDb is a simple key value messagepack store Inspired by kwik It uses your local storage

Simple document-based NoSQL DBMS from scratch

cudb (a.k.a. cuda++) Simple document-based noSQL DBMS modelled after MongoDB. (Has nothing to do with CUDA, has a lot to do with the Cooper Union and

Rust async runtime based on io-uring.

Monoio A thread-per-core Rust runtime with io_uring. 中文说明 Design Goal As a runtime based on io_uring, Monoio is designed to be the most efficient and

A rust Key-Value store based on Redis.

Key-Value Store A Key-Value store that uses Redis to store data. Built using an async web framework in Rust with a full Command-Line interface and log

A fast, searchable, knowledge engine using various machine learning models to aggregate based on importance, association and relevance

NewsAggregator We live in an era where both the demand and quantity of information are enormous. However, the way we store and access that information

rinflux is Rust based influx client implementation that have been inspired from influx other language implementation, developed with 💖
rinflux is Rust based influx client implementation that have been inspired from influx other language implementation, developed with 💖

Unofficial InfluxDB Driver for Rust This library is a work in progress. This means a feature you might need is not implemented yet or could be handled

This is an Oracle database driver for Rust based on ODPI-C

Rust-oracle This is an Oracle database driver for Rust based on ODPI-C. Change Log See ChangeLog.md. Build-time Requirements C compiler. See Compile-t

General basic key-value structs for Key-Value based storages

General basic key-value structs for Key-Value based storages

Owner
Ngo Iok Ui (Wu Yu Wei)
Ngo Iok Ui (Wu Yu Wei)
Sharded, concurrent mini redis that support http interface implemented in rust

Rudis A mini version of redis server that provides http interface implemented in Rust. The in-memorry kv-storage is sharded and concurrent safe. Inspi

Lorenzo Cao 43 May 30, 2023
High-performance, lock-free local and concurrent object memory pool with automated allocation, cleanup, and verification.

Opool: Fast lock-free concurrent and local object pool Opool is a high-performance Rust library that offers a concurrent and local object pool impleme

Khashayar Fereidani 8 Jun 3, 2023
The most efficient, scalable, and fast production-ready serverless REST API backend which provides CRUD operations for a MongoDB collection

Optimal CRUD Mongo Goals of This Project This is meant to be the most efficient, scalable, and fast production-ready serverless REST API backend which

Evaluates2 1 Feb 22, 2022
Rust and TypeScript example code for finding all members from a collection id.

get-collection-rs Use the Crawler cargo build --release ./target/release/get-collection <rpc_node> <collection_id> Example: ./target/release/get-col

Metaplex Foundation 16 Dec 22, 2022
A collection of example project using Njord.

Example Projects A collection of example project using Njord. Contributors The following contributors have either helped to start this project, have c

Njord 3 Dec 6, 2023
Asyncronous Rust Mysql driver based on Tokio.

mysql-async Tokio based asynchronous MySql client library for rust programming language. Installation Library hosted on crates.io. [dependencies] mysq

Anatoly I 292 Dec 30, 2022
An async executor based on the Win32 thread pool API

wae An async executor based on the Win32 thread pool API use futures::channel::oneshot; #[wae::main] async fn main() { let (tx, rx) = oneshot::ch

Raphaël Thériault 10 Dec 10, 2021
ForestDB - A Fast Key-Value Storage Engine Based on Hierarchical B+-Tree Trie

ForestDB is a key-value storage engine developed by Couchbase Caching and Storage Team, and its main index structure is built from Hierarchic

null 1.2k Dec 26, 2022
Owlyshield is an open-source AI-driven behaviour based antiransomware engine written in Rust.

Owlyshield (mailto:[email protected]) We at SitinCloud strongly believe that cybersecurity products should always be open-source: Critical decis

SitinCloud 255 Dec 25, 2022
Provides a Rust-based SQLite extension for using Hypercore as the VFS for your databases.

SQLite and Hypercore A Rust library providing SQLite with an virtual file system to enable Hypercore as a means of storage. Contributing The primary r

Jacky Alciné 14 Dec 5, 2022