A concurrent GC.

Related tags

Database elise
Overview

Für Elise

What is elise?

Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor. The goal is to define an API for precise, tracing garbage collection in Rust which upholds all of Rust's safety guarantees. A user using the API defined in this library will not be at risk for any of the kinds of memory errors that Rust can prevent.

What kind of access does elise provide to data?

Some previous garbage collector APIs resolve some of the safety issues with garbage collection by only allowing you to copy data out of them, rather than allowing you to hold references directly into the managed memory. This is very convenient for copying collectors as they don't have to implement pinning.

Elise is not like those APIs. With elise, you can have direct references into the managed heap. In fact, you can have arbitrary references between the stack, the managed heap, and the unmanaged heap:

  • Garbage collected objects can own data allocated in the unmanaged heap, and that data will be dropped when those objects are collected.
  • Garbage collected objects can own references into the stack, and you are guaranteed not to be able to read from those references after they have gone out of scope in safe code.
  • You can store pointers to garbage collected objects in the heap or on the stack.

The transitive combination of all of these is true: for example, a GC'd object can own a heap-allocated vector of references to objects on the stack which themselves have GC'd objects inside them.

Note that, like all garbage collection in Rust (e.g. Rc), elise only provides immutable access to data it is managing. See the section on interior mutability later.

What kind of garbage collector is elise?

Elise provides a garbage collector, but that is not what is interesting about elise. The garbage collector here is a mark-and-sweep of the simplest and least optimized possible variety. However, the API which makes it safe could apply to much more performant garbage collectors, specifically with these properties:

  • This is an API for tracing garbage collectors, not for other garbage collection techniques like reference counting.
  • This is an API for precise tracing collectors, not a conservative collector like the Boehme GC.
  • The API could be trivially adapted to support concurrent GCs, the current implementation provides a thread safe HeapRoot.
  • The API can support moving collectors as long as they implement a pinning mechanism. A moving collector which does not support pinning is incompatible with elise's API goals.

What is the state of the project?

Code has been written, sometimes frantically. A few basic smoke tests of things that should work working correctly has been done. No attempts at proofs have been made. It likely has glaring bugs. It might seg fault, ruin your laundry, halt and catch fire, etc.

You should not use this for anything that you depend on (e.g. "in production")! But if you want to play around with it for fun, by all means.

What is elise going to be used for?

This is currently an experimental project for my JavaScript engine.

Why is it called elise?

"Für Elise" is a well known compositions from Beethoven. It's always played when there's a garbage truck in Taiwan.

How does elise work?

In brief, a precise tracing garbage collector like elise is designed for works like this:

  • All of the references from the unmanaged portion of memory (stack and heap, in our context) into the managed portion of memory are tracked. These are called "roots."
  • From those roots, the collector "traces" through the graph of objects to find all of the objects that can still be accessed from those roots (and therefore, the objects which are still "alive.")

Our API needs to properly account for both rooting objects and tracing through them to transitively rooted objects.

Rooting

Given our limitations (i.e. no language support & the existence of a dynamic, unmanaged heap), it is necessary that we track our roots through some sort of intrusive collection. As a result, our roots cannot be moved around.

Fortunately, we have recently made a lot of progress on supporting intrusive data structures in Rust, thanks to the pinning API. The rooting layer sits on top of an underlying pinning API, which we use to guarantee that roots are dropped in a deterministic stack order.

Roots are created with a special macro called letroot!. The roots created with this macro carry a special lifetime called 'root, which is the lifetime of the scope they are created in. You can use the gc method on a root to begin garbage collecting some data:

// root: Root<'root>;
letroot!(root);

let x: Gc<'root, i32> = root.gc(0);

The Gc pointer is a copyable reference to the data which proves that the data has been rooted. It carries the lifetime of the root, and therefore can't outlive the root you used to create it.

In order to return Gc'd data from a function, you need to pass a root into the function:

fn foo(root: Root<'root>) -> Gc<'root, i32> {
    root.gc(0);
}

You can also use a root to reroot data that has already been rooted once, extending its lifetime:

fn foo(outer: Root<'root1>) -> Gc<'root1, i32> {
    // This root is only alive for the frame of this function call
    //
    // inner: Gc<'root2, i32>
    letroot!(inner);
    let x: Gc<'root2, i32> = inner.gc(0);

    // But you can extend a Gc rooted only for this function using the outer root:
    let x: Gc<'root1, i32> = outer.reroot(x);
    return x;
}

Tracing

Its not enough to be able to root objects in the Gc, you also need to be able to trace from the root to other objects transitively. For example, you might want a struct, stored in the Gc, with fields pointing to other objects which are also being garbage collected.

The problem that emerges is ensuring that you can only access transitively rooted objects when you know they are actually being traced from a rooted object. A few components enable us to solve this:

  • First, to put a type in the garbage collector it must implement a trait which defines how to trace through it.
  • Second, instead of only having a Gc type, we have a second type: GcStore.
  • Using derived accessors, we can guarantee a safe API; let me explain:

The Gc type implements Deref and Copy, it functionally acts like a normal reference, except that you can extend its lifetime by rerooting it. It does not expose a safe API for constructing it: the only constructor is an unsafe Gc::rooted constructor: to safely call this constructor, you must prove that this will be rooted for the lifetime 'root.

The GcStore type is more like a Box, except that it does not implement Deref. You can safely construct a GcStore, which will have Box semantics until it is rooted - that is, if you drop a GcStore without having rooted it first, it will deallocate what you have put into it.

Finally, as a part of the same derive which implements the traits necessary to garbage collect your type, you can implement an accessor to transform your GcStore fields into Gc fields. For example:

#[derive(GC)]
struct Foo<'root> {
    #[gc] bar: GcStore<'root, Bar>,
}

This code gives generates this method on Foo:

fn bar(self: Gc<'root, Foo<'_>>) -> Gc<'root, Bar>

Because the derive also guarantees that this field is traced properly, if you have a Gc<Foo>, it is safe to construct a Gc<Bar> from it.

This behavior is also implemented for several container types. For example, you can transform a Vec<GcStore<_>> to a Vec<Gc> in the same way:

#[derive(GC)]
struct Foo<'root> {
    #[gc] vec: Vec<GcStore<'root, Bar>>,
}

// Generates:
fn vec<'root>(self: Gc<'root, Self>) -> Vec<Gc<'root, Bar>>;

Destructors

Destructors present a troubling problem for garbage collectors. Destructors are safe because we can guarantee that they are run when the struct is dropped, but something garbage collected will not actually be dropped (and the destructor run) until much later. This can cause two problems:

  • If the destructor accesses other Gc'd data, that data might have been freed earlier by the collector.
  • If the destructor accesses data on the stack, that data might have been freed when the stack was popped before the collector ran.

As a result, the GC does not run destructors on its objects. Instead, it runs a finalizer just before collecting each object. You can define what happens in the finalizer by implementing the Finalize trait for your type and adding a #[gc(finalize)] attribute to your struct:

#[derive(GC)]
#[gc(finalize)]
struct Foo;

impl elise::Finalize for Foo {
    fn finalize(&mut self) {
        println!("Collecting a Foo");
    }
}

Because Finalize does not give you a Gc pointer to your type, you cannot access other Gc pointers (in other words, you cannot "prove rootedness" because you are no longer rooted in the finalizer.) However, this is insufficient for preventing you from accessing other non-owned data, like stack references.

As a result, if your type contains any lifetimes other than 'root, attempting to implement a finalizer like this will fail. Instead, you will need to implement an unsafe finalizer:

#[derive(GC)]
#[gc(unsafe_finalize)]
struct Foo<'a>(&'a i32);

unsafe impl elise::UnsafeFinalize for Foo {
    fn finalize(&mut self) {
        println!("Collecting a Foo");
    }
}

You must audit these finalizers and guarantee that your finalizer never reads from the any of the borrowed references inside of it, otherwise your code is not safe and contains undefined behavior.

Interior mutability

The final problem is interior mutability: you can only get a shared reference to a GC'd pointer, ideally you would be able to mutate things inside of it using some form of interior mutability.

The unique problem has to do with tracing. Let's say you have a RefCell<Option<GcStore<i32>>> inside of your type:

let x: Gc<RefCell<Option<GcStore<i32>>>>;

let moved: GcStore<i32> = x.borrow_mut().take().unwrap();

// The value behind `x` is now `None`. The `moved` variable is not being traced
// at all, its entirely unrooted!

// Run the garbage collector. Because `moved` is unrooted, it will be
// collected. `moved` is now a dangling pointer
elise::collect();

// Put the moved and dangling pointer back into `x`:
*x.borrow_mut() = Some(moved);

// Observe `x`, which is now dangling. Segfault!
println!("{}", x);

We cannot allow you to move traced GcStore pointers around without some other mechanism of rooting them.

For this reason, elise currently provides only partial support for interior mutability:

  • There is a separate trait called NullTrace, which indicates that tracing through this type is a no-op (i.e. it contains no Gc'd pointers). You are free to have Cell and RefCell types containing NullTrace data.
  • PinCell is trace safe, because it does not allow you to move the data it gives you. If you can't move the data, you can't unroot it.

In other words, you are free to have normal interior mutability of anything that doesn't contain a Gc pointer, and you can have partial interior mutability (only pinned mutable references) for things that do contain Gc pointers.

Note that PinCell introduces some problems for copying collectors, because it gives you a Pin<&mut T>, which other code (e.g. async/await code) might rely on memory stability (as opposed to semantic stability, which we rely on).

Its an open problem to find new abstractable APIs which allow moving data only between traced memory locations, which would allow you to safely move Gc pointers around.

You might also like...
A concurrent, append-only vector
A concurrent, append-only vector

The vector provided by this crate suports concurrent get and push operations. Reads are always lock-free, as are writes except when resizing is required.

wait-free 4-level 64-bit pagetable for contiguous low-contention concurrent metadata

pagetable Wait-free 4-level page table that maps from a u64 key to an &AtomicU64 value. Page fan-out is 2^16. If a key doesn't exist, intermediate pag

Log for concurrent workloads, with support for atomic batches and in-order recovery

sharded-log A batch-oriented multi-threaded sharded log for workloads that occasionally flush logs into some other system. All batches have a 32-bit C

A collection of high performance concurrent channels.

A collection of high performance concurrent channels.

Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor.

Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor. The goal is to define an API for precise, tracing garbage collection in Rust which upholds all of Rust's safety guarantees. A user using the API defined in this library will not be at risk for any of the kinds of memory errors that Rust can prevent.

A concurrent GC.

Für Elise What is elise? Für Elise (short for Elise) is a concurrent garbage collection attempt based on shifgrethor. The goal is to define an API for

Concurrent and multi-stage data ingestion and data processing with Rust+Tokio

TokioSky Build concurrent and multi-stage data ingestion and data processing pipelines with Rust+Tokio. TokioSky allows developers to consume data eff

A lock-free, partially wait-free, eventually consistent, concurrent hashmap.
A lock-free, partially wait-free, eventually consistent, concurrent hashmap.

A lock-free, partially wait-free, eventually consistent, concurrent hashmap. This map implementation allows reads to always be wait-free on certain pl

Implementation of CSP for concurrent programming.
Implementation of CSP for concurrent programming.

CSPLib Communicating Sequential Processes (CSP) Background Communicating Sequential Processes (CSP) is a way of writing a concurrent application using

syncmap is a fast, concurrent cache library built with a focus on performance and correctness.

syncmap syncmap syncmap is a fast, concurrent cache library syncmap is a fast, concurrent cache library built with a focus on performance and correctn

A terminal-based password manager, generator, and importer/exporter (Firefox, Chrome) backed with a concurrent hashmap
A terminal-based password manager, generator, and importer/exporter (Firefox, Chrome) backed with a concurrent hashmap

rucksack A terminal-based password manager, generator, and importer/exporter (Firefox, Chrome) backed with a concurrent hashmap Features Password gene

Thread-safe clone-on-write container for fast concurrent writing and reading.

sync_cow Thread-safe clone-on-write container for fast concurrent writing and reading. SyncCow is a container for concurrent writing and reading of da

Fast, concurrent, arena-based allocator with drop support

blink-alloc Blink-alloc is extremely fast allocator based on the common idea of allocating linearly zipping a cursor through memory chunk and reset ev

Fast, Concurrent, Rust based Tidal-Media-Downloader implementation.

tdl tdl is a rust implementation of the Python Script Tidal-Media-Downloader. Overview tdl offers significant performance improvements over the origin

Sharded, concurrent mini redis that support http interface implemented in rust

Rudis A mini version of redis server that provides http interface implemented in Rust. The in-memorry kv-storage is sharded and concurrent safe. Inspi

High-performance, lock-free local and concurrent object memory pool with automated allocation, cleanup, and verification.

Opool: Fast lock-free concurrent and local object pool Opool is a high-performance Rust library that offers a concurrent and local object pool impleme

Rust library for concurrent data access, using memory-mapped files, zero-copy deserialization, and wait-free synchronization.

mmap-sync mmap-sync is a Rust crate designed to manage high-performance, concurrent data access between a single writer process and multiple reader pr

A lightweight async Web crawler in Rust, optimized for concurrent scraping while respecting `robots.txt` rules.

🕷️ crawly A lightweight and efficient web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules. 🚀 Features Concurren

Owner
Ngo Iok Ui (Wu Yu Wei)
Ngo Iok Ui (Wu Yu Wei)
Sharded, concurrent mini redis that support http interface implemented in rust

Rudis A mini version of redis server that provides http interface implemented in Rust. The in-memorry kv-storage is sharded and concurrent safe. Inspi

Lorenzo Cao 43 May 30, 2023
High-performance, lock-free local and concurrent object memory pool with automated allocation, cleanup, and verification.

Opool: Fast lock-free concurrent and local object pool Opool is a high-performance Rust library that offers a concurrent and local object pool impleme

Khashayar Fereidani 8 Jun 3, 2023
An Extensible, Concurrent Web Framework for Rust

Iron Extensible, Concurrency Focused Web Development in Rust. Response Timer Example Note: This example works with the current iron code in this repos

null 6.1k Dec 27, 2022
Shuttle is a library for testing concurrent Rust code

Shuttle Shuttle is a library for testing concurrent Rust code. It is an implementation of a number of randomized concurrency testing techniques, inclu

Amazon Web Services - Labs 373 Dec 27, 2022
Spawn multiple concurrent unix terminals in Discord

Using this bot can be exceedingly dangerous since you're basically granting people direct access to your shell.

Simon Larsson 11 Jun 1, 2021
A Rust synchronisation primitive for "Multiplexed Concurrent Single-Threaded Read" access

exit-left verb; 1. To exit or disappear in a quiet, non-dramatic fashion, making way for more interesting events. 2. (imperative) Leave the scene, and

Jonathan de Jong 0 Dec 5, 2021
A Multitask Parallel Concurrent Executor for ns-3 (network simulator)

A Multitask Parallel Concurrent Executor for ns-3 (network simulator)

BobAnkh 9 Oct 27, 2022
Real Time For the Masses (RTFM), a framework for building concurrent applications, for MSP430 MCUs

msp430-rtfm Real Time For the Masses (RTFM), a framework for building concurrent applications, for MSP430 MCUs License Licensed under either of Apache

Jorge Aparicio 9 Sep 10, 2022
Cogo is a high-performance library for programming stackful coroutines with which you can easily develop and maintain massive concurrent programs.

Cogo is a high-performance library for programming stackful coroutines with which you can easily develop and maintain massive concurrent programs.

co-rs 47 Nov 17, 2022
Fast, efficient, and robust memory reclamation for concurrent data structures

Seize Fast, efficient, and robust memory reclamation for concurrent data structures. Introduction Concurrent data structures are faced with the proble

Ibraheem Ahmed 240 Dec 23, 2022