## Running
WORK IN PROGRESS!
0. (If Rust is not already installed) `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
1. `cargo test`
## Motivation
In the urbit memory model every cell is stored as, effectively,

```c
struct _u3a_boxed_cell {
  c3_w siz_w;             // size of this box (6)
  c3_w use_w;             // reference count
  struct u3a_cell {
    c3_w    mug_w;        // hash
    u3_noun hed;          // car
    u3_noun tal;          // cdr
  };
  c3_w siz_w;             // size of this box (still 6)
};
```
Assuming cells are something like 80% of the loom, this means at any given point a quarter of nonfree memory consists of the word `0x0000_0006` repeated over and over and over, which has at times struck me as… inelegant.

(`use_w` is almost always 1, and `mug_w` is frequently uninitialized at 0, but there are certainly merits in including them - if perhaps not once per byte of a "tape" linked list or per usually-nybble of nock formula. `cdr` - we'll get to that.)
One simple way to reclaim those `siz` bytes is to store them in the pointer: e.g. devote alternating 4kb pages to cells and noncells, such that a bit test can discriminate them. (You can of course still have indirect atoms and other structures >4kb; they just have to begin on an odd-numbered page.) This produces a physical memory (cache etc.) savings of 25%, and about breaks even on virtual memory (better the closer you are to 50% of your boxes being cells), at the cost of some fragmentation (and 25% more address use if you have only cells or noncells).
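As a sketch of that bit test (Rust, with invented names; the even/odd page convention here is just one way to realize the scheme described above, not necessarily what this repository does):

```rust
const PAGE_BITS: u32 = 12; // 4 KiB pages

/// Under the alternating-pages convention, a single address bit says whether
/// a pointer refers to a cell or to some other kind of box.
fn is_cell(ptr: usize) -> bool {
    ((ptr >> PAGE_BITS) & 1) == 0 // even pages: cells; odd pages: everything else
}

fn main() {
    assert!(is_cell(0x0000_0ff8));  // somewhere in page 0
    assert!(!is_cell(0x0000_1010)); // somewhere in page 1
}
```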
A more flexible, but slower, scheme is to use a bitvector of type tags - even a full-page granularity of 2^19 bits is only 65kb, fitting easily into L2 cache (at the cost of whatever other things you might on the margin want in L2 cache).
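The bitvector variant could look something like this (again a hypothetical sketch, assuming 4kb pages and the 2^19 page-granularity tag bits mentioned above):

```rust
/// One tag bit per 4 KiB page: 2^19 bits = 64 KiB of metadata for a 2 GB loom.
struct PageTags {
    bits: Vec<u64>, // (1 << 19) / 64 = 8192 words
}

impl PageTags {
    fn new() -> Self {
        PageTags { bits: vec![0u64; (1 << 19) / 64] }
    }
    fn mark_cell_page(&mut self, page: usize) {
        self.bits[page / 64] |= 1u64 << (page % 64);
    }
    fn is_cell(&self, ptr: usize) -> bool {
        let page = ptr >> 12;
        ((self.bits[page / 64] >> (page % 64)) & 1) == 1
    }
}
```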
## The pesky `cdr` field
Of course we are not constrained to only the type tags currently present in u3 (cell vs indirect atom). An ancient idea in the lisp world is CDR Coding, where you store `[A B C D]` as a linear array, despite its logical structure of `[A [B [C D]]]`.

One can imagine a third "long cell" type, `struct u3a_quad { u3_noun car; u3_noun cadr; u3_noun caddr; u3_noun cdddr; }`, for storing tuples of 3 or 4 elements. (Aligned so that a pointer to its `cadr` is distinguishable from a pointer to its `car`, respectively.) This introduces overhead when only 3 elements are present, though not more than if you were to store them as a pair linking to a second pair - but what is "overhead"? The `use` and `mug` have been swept under the rug.
A minimal answer is to add a header for `use_dot; use_cdr; use_cddr; mug_dot; mug_cdr; mug_cddr`, which rolls back the space savings to only the `cdr` pointers.
The three `use` counts can also be consolidated into a total "references anywhere inside this structure" count, at the cost of occasionally keeping alive whole quad-cells whose only live data is a `[caddr cdddr]` at the end. As for `mug`, caching only `mug_cddr` and forbidding placing quad-cell `dot`/`cdr` pointers in another quad-cell's `car` or `cadr` positions would maintain the constant-time bound. (You can, after all, always allocate a regular cell instead.)
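Putting that together, a consolidated quad-cell box might look like the following (a hypothetical layout with invented field names, assuming a one-word noun type; not necessarily how this repository lays it out):

```rust
type Noun = u64; // assumption: one machine word per noun reference

/// A "quad cell" with the consolidated header described above: one reference
/// count covering every interior position, and only the cddr's mug cached.
#[repr(C)]
struct QuadCell {
    use_all: u32,  // references anywhere inside this structure
    mug_cddr: u32, // cached hash of the [caddr cdddr] tail pair
    car: Noun,
    cadr: Noun,
    caddr: Noun,
    cdddr: Noun,
}
```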
## Vlists
This repository implements an elaboration on the "quad cell" scheme, the VList data structure. The core conceit is to store a list `~[1 2 3 4 5 6 7 8 9]` as an exponentially flatter sequence of segments:

```
a: {(-:! +<:!) +>-:1 +>+<:2 +>+>-:3 +>+>+<:4 +>+>+>-:5 +>+>+>+<:&b}
b: {-:6 +<:7 +>-:8 +>+:&c}
c: {-:9 +:~}
```
That is, whenever you find yourself `cons`ing onto the middle of a list segment, insert your car as the previous element (if unoccupied) and return a pointer to that; if you `cons` onto an entire list segment, allocate a new segment twice as big - about the current length of the entire list. This uses at most twice as many words as list elements (compare the original linked-list scheme, which always uses twice as many words), averaging 1.5x. Placing a bound of 1 page on segment size restricts the maximum absolute overhead to that one page. And now whenever you have a list with a thousand elements, they're mostly sequential in memory - operations like `lent` and `snag` admit `log N` jets, and unoptimized `cdr` in general-purpose code leaves the cache-line prefetcher much happier.
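Here is a minimal sketch of that cons rule (ordinary Rust vectors standing in for loom slabs, bare `u64`s for list elements; names and representation are invented for illustration, not the repository's actual layout):

```rust
/// A list value is either nil or a (segment, slot) pointer.
#[derive(Clone, Copy)]
enum List {
    Nil,
    Ptr { seg: usize, idx: usize },
}

/// One slab. Slots fill from high index toward 0, so the newest element of
/// the segment sits at `first` and cdr walks upward toward `tail`.
struct Segment {
    slots: Vec<u64>, // element slots; slots[first..] are occupied
    first: usize,    // lowest occupied slot
    tail: List,      // what the end of this segment continues into
}

struct VList {
    segs: Vec<Segment>,
}

impl VList {
    fn cons(&mut self, car: u64, list: List) -> List {
        if let List::Ptr { seg, idx } = list {
            // Consing onto the current head of a segment that still has a
            // free slot in front of it: just claim that slot.
            if idx > 0 && self.segs[seg].first == idx {
                self.segs[seg].slots[idx - 1] = car;
                self.segs[seg].first = idx - 1;
                return List::Ptr { seg, idx: idx - 1 };
            }
        }
        // Otherwise allocate a new segment twice as big as the one we hit
        // (roughly the current length of the whole list) and prepend there.
        let new_len = match list {
            List::Ptr { seg, .. } => 2 * self.segs[seg].slots.len(),
            List::Nil => 1,
        };
        let mut slots = vec![0u64; new_len];
        slots[new_len - 1] = car;
        self.segs.push(Segment { slots, first: new_len - 1, tail: list });
        List::Ptr { seg: self.segs.len() - 1, idx: new_len - 1 }
    }

    /// cdr: step to the next slot, or follow the segment's tail link.
    fn cdr(&self, list: List) -> List {
        match list {
            List::Nil => panic!("cdr of ~"),
            List::Ptr { seg, idx } => {
                if idx + 1 < self.segs[seg].slots.len() {
                    List::Ptr { seg, idx: idx + 1 }
                } else {
                    self.segs[seg].tail
                }
            }
        }
    }
}
```

Consing a tenth element onto the example above would claim the still-free `+<` slot of segment `a`; only once `a` fills up does the doubling allocation kick in.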
## Metadata
Reference counts are kept in a per-page parallel array of `u16` (overflow scheme tbd for adversarial inputs), figuring that the locality savings for the simple case of read-only access of outer-road data (e.g. library code) outweigh the cache misses on updating them.
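For concreteness, the parallel array might be sketched like this (field names and the one-count-per-word granularity are assumptions; the overflow handling is exactly the open question noted above):

```rust
const SLOTS_PER_PAGE: usize = 1 << 10; // assumption: one u16 count per 4-byte word of a 4 KiB page

/// Reference counts live in their own array rather than next to the noun
/// data, so read-only traversal of outer-road data never dirties those lines.
struct PageRc {
    counts: [u16; SLOTS_PER_PAGE],
}

impl PageRc {
    fn incref(&mut self, slot: usize) {
        // Overflow scheme is tbd; saturating here simply pins the
        // allocation forever (leak rather than corrupt).
        self.counts[slot] = self.counts[slot].saturating_add(1);
    }
    /// Returns true when the allocation just became free.
    fn decref(&mut self, slot: usize) -> bool {
        if self.counts[slot] == u16::MAX {
            return false; // pinned by an earlier overflow
        }
        self.counts[slot] -= 1;
        self.counts[slot] == 0
    }
}
```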
Further work could include implementing "immutable bean" borrow-inference in the bytecode compiler to further reduce reference count thrashing.
`mug` is not presently implemented, but could work in a manner similar to `rc`.
## History
Originally implemented as a gist, which may be easier to follow.
## Bytes?
In the canonical urbit allocator, `(trip 'abcdefghijklmnop')`, producing `"abcdefghijklmnop"`, takes a 16-byte indirect atom value (+ 16 bytes of overhead) and converts it into a 384-byte linked-list value, either a 12x or 24x increase. Under the proposed scheme (assuming small-but-indirect atoms are stuffed into the corresponding power-of-two segment slabs and pointer-tagged), the 16-byte value (+ 2-byte reference count) will as a linked list occupy ~64 bytes - we've gotten rid of the `cdr`, but the `car` still stores each character as a 31-bit direct atom.
You could instead imagine yet another pointer tag for "byte vector, but pretend it's a linked list" - `car` is "read the character pointed to, as a direct atom", `cdr` is "increment the pointer, and if it ends up too round (i.e. runs off the end of the vector) replace it with `~`", `is_atom` is false, and `successor` is crash; technically that's all you need for nock.
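A sketch of those four operations over a plain byte slice (hypothetical names; `Option` stands in for the tagged-pointer-versus-`~` distinction):

```rust
/// A "tape view": a (buffer, offset) pair that answers the only questions
/// nock ever asks of a list, without materializing any cells.
#[derive(Clone, Copy)]
struct TapeView<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> TapeView<'a> {
    /// car: the character under the pointer, read as a direct atom.
    fn car(&self) -> u64 {
        self.bytes[self.pos] as u64
    }
    /// cdr: step the pointer; once it runs off the end, the tail is ~.
    fn cdr(&self) -> Option<TapeView<'a>> {
        if self.pos + 1 < self.bytes.len() {
            Some(TapeView { bytes: self.bytes, pos: self.pos + 1 })
        } else {
            None
        }
    }
    /// The view always presents as a cell, never an atom.
    fn is_atom(&self) -> bool {
        false
    }
    /// Incrementing a cell is a crash, as in nock.
    fn successor(&self) -> ! {
        panic!("cannot increment a cell")
    }
}
```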
This suggests a broader field of data-encoding jets, to match nock execution jets - but this "readme" is getting long as is :)