a collection of well-tested, serializable CRDTs for Rust

Overview

crates.io docs.rs

crdts: family of thoroughly tested hybrid crdt's.

A family of CRDT's supporting both State and Op based replication.

what is a CRDT?

CRDT's are the solution to highly available mutable state.

CRDT expands to Conflict Free Replicated Data Type, it refers to a family of structures that know how to merge without conflicts. There are two main sub-families of CRDT's: CvRDT and CmRDT. They differ in how they replicate. CvRDT's are state based, meaning you would ship the entire CRDT state across the network to your peers. CmRDT's instead replicate by distributing all edits (called Op's) to each node in your system.

Here we'll take a quick look at CRDT's, for the sake of clarity and brevity, we'll focusing only on CvRDT (all ideas still apply to CmRDT's). CvRDT structures define a merge(a, b) operation which takes states a and b produces a merged state. A simple example is the GSet (grow-only set), it's merge is the union of the two sets.

an attempt to understanding CRDT's by building one

CRDT's are all about partial orders, to turn a structure into a CRDT, you must first define a special kind of partial order over the state space of your structure. You must do this carefully as the partial order also defines how your merge behaves. For example lets take a look at the state space of a 2-tuple like structure that stores cubes in two slots, it's state space looks like so:

To make this structure a CRDT, we need a partial order over the state space that satisfies the folowing constraint:

∀ s ⊆ S where S is your state space  # for any subset of the state space ...
∃ lub s and lub s ∈ S                # .. the least-upper-bound of the set is also in the state space

lub is the Least Upper Bound operation, it takes a subset of the state space and produces a unique state that is greater than or equal to all states in the subset. Here's a partial order that satisfies the constraints:

Now say we want to merge two instances of this structure, well turns out we've already done the hard part as the partial order tells us what the final merged state will be.

merge(a, b) = lub { a, b }

The merge(a, b) operation is exactly the same as computing the least-upper-bound of the set {a, b}.

Looking over this partial order, we can derive a few other properties of CRDT's.

  1. merge(a, b) always causes us to go up or stay the same
  2. By 1. merge's are idempotent, since a previous state will be below or equal to the current state, remerging stale states will have no effect.
  3. merge(a, b) is reflexive, commutative and associative

How to use this library

Interacting with the CRDT's

Working with a CRDT is a bit different from datastructures you may be used to. Since we may be acting on data that is concurrently being edited by others, we need to make sure that your local edits only effect the data that you've seen.

Bad way of interacting with CRDT's

For example, if you clear a Map, we want to be able to say that this clear operation will only effect entries in the map that you are aware of. If you are not tracking this causal history of your edits correctly, you could end up deleting data that you are not aware of. e.g. a good way to lose data would be to do something like this:

  1. you receive a Map CRDT from across the network.
  2. you read the Map's key/value pairs and display them to the user.
  3. you receive an updated version of the Map CRDT but the user has not refreshed their view.
  4. The user chooses to clear the values of the Map. So you call Map::clear() on your CRDT.

At this point you've potentially cleared data that the user didn't want to clear. To fix this, we need to include a Causal context with the clear operation. This causal context is a vector clock (VClock) that stores the version of the Map that was seen by this user when they decided to Map::clear().

Good way to interact with CRDT's

Lets take a look at what interacting with CRDT's looks like when using crdts.

  1. First create an instance of the CRDT, we'll use the MVReg (Multi-Value Register) CRDT for this example. It allows us to store a value and resolves concurrently set values by keeping both values.
let mut reg = MVReg::new();
  1. To set a value in your CRDT, you'll need to provide a context (even for the initial value), the only way to get a context is to first read from the CRDT.
let read_ctx = reg.read();
assert_eq!(read_ctx.val, vec![]); // the registers is empty!
  1. Reading any state from a CRDT will produces a ReadCtx.to access the value from the ReadCtx, use the .val field. From the example above we see the register is currently not storing any values (empty Vec).

Now to make your edit to the reg, you'll derive the appropriate context for the edit you want to make, for edits that remove data, you'll need to use .derive_rm_ctx(), for adding new data you'll need .derive_add_ctx( ) where is a unique identifier of whatever is acting on the CRDT.

let add_ctx = read_ctx.derive_add_ctx(123);
let rm_ctx = read_ctx.derive_rm_ctx();

reg.set("Value".to_string(), add_ctx);                // We set the value of the register using the Add context
reg.clear(rm_ctx);                                    // We remove using the (stale) Rm context
assert_eq!(reg.read().val, vec!["Value".to_string()]) // and we see that the MVReg::clear() did not remove the new value

Now you may be wondering why we have a "Value" after we've cleared the register. The "Value" string was added with an AddContext that included a marker showing that new information from actor 123 was present. The clear operation used an RmCtx that was derived from a read where we did not have this information from actor 123, only data that was seen at the time of the read that the RmCtx was derived from is removed.

Another trap you may fall into is reusing a context derived from one part of the CRDT to edit another part of the CRDT. Steps to lose data:

let read_ctx = map.get(&"key 1".to_string());
map.rm(&"key 2".to_string(), read_ctx.derive_rm_ctx());

Now you're using an RmCtx derived from another key, this RmCtx should only be used to remove the same data that it read. Same goes for the AddCtx, it should only be used to overwrite data that it had been derived from.

If you keep these things in mind, you'll have a good time :)

Further reading

If you want to learn about how CRDTs work, I suggest starting with the readme from aphyr's meangirls repo. Afterwards, either check out the riak dt source code or A comprehensive study of CRDTs depending on if you like to read papers or jump straight to source code examples.

references

Comments
  • Version Vector with Exceptions

    Version Vector with Exceptions

    Here is an implementation of a causality buffer based on the algorithm presented in

    D. Malkhi and D. Terry, “Concise Version Vectors in WinFS,” p. 13.

    The idea is to ensure we re-order messages so that they respect a causal order, which is important for datastructures like LSEQ where we need to receive each peer's messages in order but don't want to pay the cost of sending a full vector clock with each operation.

    The Causality Barrier is composed of two components: a version vector (with exceptions) and an unapplied operations buffer.

    The Version Vector with Exceptions works by tracking two pieces of information for every known peer: the maximum message timestamp and a list of exceptions. The list of exceptions is composed of the timestamps less than the maximum which we haven't seen yet.

    When we receive a new operation we follow the following algorithm in pseudo-pseudo-code:

    1. Update the version vector with A's timestamp, if A > max then we set max = A and we add the range [old max, A) to the exceptions. Otherwise if A is in the exception set, we remove it's timestamp.
    2. Check if the operation A must occur after another operation B
    3. Check if we've seen B a. If yes, then allow A, and check the unapplied buffer for opeations that transitively depend on A b. If no, buffer A
    opened by xldenis 12
  • Sequence CRDTs

    Sequence CRDTs

    Hi,

    I've been implementing a version of LSEQ focused on correctness (I've found bugs in every OSS implementation so far :/). Would there be any interest in having it merged into this library?

    I still need to rewrite the id generation code to more cleanly integrate all the fixes to all the bugs I discovered along the way and write more property tests, but once that's done I can prepare a PR.

    opened by xldenis 9
  • Add LSeq type and implementation

    Add LSeq type and implementation

    I'm opening this pull request now so that everyone can see the implementation.

    The major thing I'd like to do before considering this ready to merge is to write more property tests. I'd also like to write longer docs and cite the relevant [2][3]papers in those.

    It could also be possible to use an actual exponential tree structure to store the LSEQ but I'm not convinced that it would be more efficient at least without some significant unsafe code.

    If you have a causal network layer then this implementation is correct :tm:. I also have an implementation of a "Version Vector with Exceptions"[1] which turns any reliable network into a causal one with low overhead.

    relevant: #58 #70

    cc: @FintanH

    [1]D. Malkhi and D. Terry, “Concise Version Vectors in WinFS,” p. 13. [2]S. Weiss, P. Urso, and P. Molli, “Logoot: A Scalable Optimistic Replication Algorithm for Collaborative Editing on P2P Networks,” in 2009 29th IEEE International Conference on Distributed Computing Systems, Montreal, Quebec, Canada, Jun. 2009, pp. 404–412, doi: 10.1109/ICDCS.2009.75. [3]B. Nédelec, P. Molli, A. Mostefaoui, and E. Desmontils, “LSEQ: an adaptive structure for sequences in distributed collaborative editing,” in Proceedings of the 2013 ACM symposium on Document engineering - DocEng ’13, Florence, Italy, 2013, p. 37, doi: 10.1145/2494266.2494278.

    opened by xldenis 8
  • LSEQ Dot counter is 0-based in operations

    LSEQ Dot counter is 0-based in operations

    @davidrusu , I'm not fully sure, but I'm trying to use the Dot of each LSEQ operation to populate a VClock that I'm planning to use as a context.

    When I get the Dot from the very first append op on an LSEQ its counter is 0, so when I want to compare that with the context, where this op didn't happen, (using VClock::partial_cmp) they happen to be Equal. Thus, shouldn't the Dot's counter of the very first op be 1 instead of 0?

    opened by bochaco 4
  • Orswot add/rm mutation

    Orswot add/rm mutation

    As I was browsing through the Orswot implementation it struck me by surprise that the set is not updated underneath.

    This could be because I have a different idea of what would happen while using these CRDTs. In my head, I would think that if I call add that it gets added to the set and I now have my view of the world. I also get the Op as an output which I broadcast to other's that I collaborate with. They can then apply this to their view.

    The current behaviour, however, leaves me with an empty set and two ops. Do I have to apply the ops myself?

    opened by FintanH 4
  • Vector-like CRDT

    Vector-like CRDT

    I wanted to check if there might be some known state of the art for a vector-like CRDT.

    I believe what I want is a grow-only vector where I can also edit the items using an index. I started playing around with the concept of a grow-only vector using sorted-vec. This seemed to work and make sense as a CvRDT. It covers the case of creating a new vector and inserting based on ordering. Introducing editing, however, would mean that I need a CmRDT.

    Does this make sense? Have you heard of something like this? And would someone like to help implement this with me? :grin:

    opened by FintanH 4
  • How sync (sql) tables/records?

    How sync (sql) tables/records?

    I need to implement (again!) a sync solution on top of SQL (sqlite/postgresql) and have found about CRDt and wonder if is a good fit. Next, how actually could be used.

    My apps are traditional business apps (like invoices) that have central databases and only sync with stable schemas (if a schema changes, the clients MUST first migrate).

    So this is maybe simpler than https://github.com/rust-crdt/rust-crdt/issues/51 (JSON), where things go recursive.

    P.D: I need to load all the database records to apply CRDt, and if yes, is possible to separate the storage and querying from the resolution algorithm?

    opened by mamcx 3
  • LWWReg: Why are timestamps bad?

    LWWReg: Why are timestamps bad?

    The documentation for crdts::lwwreg::LWWReg mentions not to use timestamps "unless you are comfortable with divergence" without explaining why.

    But to me, timestamps are a fairly obvious way of determining the "last" write. If they are precise enough, you should never have an equal timestamp (and if they do happen to be equal, you could probably just choose arbitrary).

    Is there something I'm not seeing here that would cause timestamps to be more of an issue?

    opened by ColonelThirtyTwo 3
  • lseq: do not apply op automatically, require apply method call

    lseq: do not apply op automatically, require apply method call

    Right now it seems LSeq is the only CRDT type to automatically apply ops when funcs are called on a given instance.

    This PR brings Lseq inline in requiring that lseq.apply(op) to be called to change the lseq itself

    opened by joshuef 3
  • Trait Bounds

    Trait Bounds

    Hey! :wave:

    I've just picked up your library to implement a convergent issue tracker and it's been nice to work with so far. So thank you for providing such a useful crate in the Rust ecosystem :grin:

    What I wanted to bring up was that the bounds imposed on stuff like mvreg::Val, i.e Debug and Clone seem very restrictive. I've cloned the repo locally and tried removing them. Debug wasn't used anywhere and Clone is only used on a few of the functions.

    I think it might be better to localise the trait bounds where they are needed rather than have blanket bounds. This localises the needs for the traits indicating to the caller when they are needed. Maybe they don't even need the bound in the first place :man_shrugging:

    Just wanted to get your thoughts on this and ask would you mind a pull request that included these changes?

    opened by FintanH 3
  • Feature Request: Support addition and removal of multiple Orswot members at a time

    Feature Request: Support addition and removal of multiple Orswot members at a time

    In some use-cases I would like to be able to add/remove multiple members to/from an Orswot at a time. It seems that Op::Rm already tracks multiple members and Op::Add (and the corresponding apply logic) could be adapted as well to apply the actor's dot to each individual member's clock.

    Do you think this would cause any unforeseen problems?

    Happy to submit a PR if interested.

    opened by pnehrer 3
  • source contains non-free image

    source contains non-free image

    The source file art/crdt_merge.png contains an embedded ICC profile which is copyright Apple and not freely licensed, effectively making that image non-free.

    Please strip that ICC profile from the image (assuming there is no actual need for it).

    In case you are curious how to check for (e.g. license and copyright) metadata in ICC profiles embedded in images, the most convenient way I am aware of is to use commandline tool exiftool part of perl CPAN module Image::ExifTool.

    opened by jonassmedegaard 0
  • Docs: Explain `ResetRemove`

    Docs: Explain `ResetRemove`

    I have read most of the A comprehensive study of Convergent and Commutative Replicated Data Types paper, but as far as I can tell, it contains no notion similar to ResetRemove, and I can't figure out what purpose that trait serves.

    The following explanation can be found in the docs for Map, but not for ResetRemove:

    Reset-remove means that if one replica removes an entry while another actor concurrently edits that entry, once we sync these two maps, we will see that the entry is still in the map but all edits seen by the removing actor will be gone.

    1. For some reason, only CRDTs which fulfill ResetRemove are a Val and can be used in Map. Why?
    2. The above quote should be in ResetRemove probably?
    3. I don't understand why the property as described in the quote makes sense. There is no info on it in the paper either. How does that help maintain the CRDT properties? I am confused.
    opened by Kiiyya 3
  • `Map::update` does not play well with `GCounter::inc`

    `Map::update` does not play well with `GCounter::inc`

    It seems like the intended way to use the Map::update function is to use the context given in the closure, e.g. |set, c| set.add("erik", c):

    https://github.com/rust-crdt/rust-crdt/blob/b405f77361d8cddc8d0b5375e7d6c50f2de16698/examples/reset_remove.rs#L18-L22

    However, with GCounter instead of Orswot as the inner CRDT, this is more awkward, since the GCounter::inc function takes an actor directly, and not a context. Feels like an inconsistency. Or maybe I just missed docs somewhere, since I'm still exploring CRDTs.

    My code so far, which feels wrong:

    type A = &'static str;
    let mut m: Map<String, GCounter<A>, A> = Map::new();
    let add_ctx: AddCtx<A> = m.read_ctx().derive_add_ctx("Actor1");
    m.apply(m.update(
        "Kiiya",
        add_ctx,
        |previous: &GCounter<A>, add_ctx: AddCtx<A>| {
            // return type of this closure is a `Dot<A>`.
            previous.inc(add_ctx.dot.actor) // This feels very wrong. I should be able to use the add_ctx directly here.
        }
    ));
    
    opened by Kiiyya 0
  • Chain CRDT - Append only data structure

    Chain CRDT - Append only data structure

    This is a POC for an append only CRDT.

    The idea is to use VClock partial order when possible, and fall back to ordering by Actor when VClock's are concurrent.

    Take the example below:

    chain-crdt-fork

    1. We start with the empty chain
    2. A then appends a1 with context {A1}
    3. Then B and C concurrently append b1 and c1 respectively.

    Note that B and C's context dominates A's context.

    If we try to resolve the final chain order, it's clear that the chain begins with a1, ... but it's not clear how to order b1 and c1 since their contexts are concurrent ({A1, B1} is not bigger than {A1, C1} and vice-versa).

    To resolve this ambiguity, we need a total order over the competing contexts. The rule used for this chain CRDT here is as follows:

    1. first try to order by VClock's partial order, if that succeeds, use that ordering.
    2. Otherwise, remove any dots that have been seen by the other side
    3. sort the remaining dots by (Actor, Counter) for each side respectively.
    4. Pairwise compare the two sorted sets to find the final ordering.

    In our case, step 2. produces {A1, B1} => {B1} and {A1, C1} => {C1} since A1 has been seen by both sides. Then we see that B's value will come before C's since B < C lexigraphically.

    So the final sequence is a1, b1, c1.

    opened by davidrusu 0
  • Mutating Ops

    Mutating Ops

    As discussed in #72 it can be perceived as weird by users that a function such as Orswot::add does not actually add to the underlying set.

    For example, if I were to call add followed by read the addition would not be seen on my side. One idea would be to have another function called apply_add that mutates the underlying set as well as outputting the Op to write to a log/broadcast the changes.

    opened by FintanH 1
Releases(7.2.0)
Owner
null
An experimental, well-documented and expansion-ready virtual machine written in Rust.

Notice ivm is undergoing a complete refactor. ivm ivm is a rich, well-documented virtual machine written in Rust. Pros Documentation everywhere. Every

Imajin 4 Jul 9, 2022
The utility is designed to check the availability of peers and automatically update them in the Yggdrasil configuration file, as well as using the admin API - addPeer method.

Yggrasil network peers checker / updater The utility is designed to check the availability of peers and automatically update them in the Yggdrasil con

null 6 Dec 25, 2022
A peer-reviewed collection of articles/talks/repos which teach concise, idiomatic Rust.

This repository collects resources for writing clean, idiomatic Rust code. Please bring your own. ?? Idiomatic coding means following the conventions

Matthias 4.2k Dec 30, 2022
A Collection of Rust Tips and Tricks

Table of Contents Using Nested Paths in Rust Struct Update Syntax in Rust Field Init Shorthand in Rust Shadowing in Rust Using Nested Paths in Rust So

Vandad Nahavandipoor 156 Dec 31, 2022
"Crates for Cheese" is a Rust collection library of those crates I consider a useful "extended standard".

cfc The purpose of this library is to provide a minimal list of currated crates which enhance the std library. In addition, most or all crates in this

null 0 Dec 23, 2021
A collection of crates to make minecraft development (client, server) with rust possible.

rust-craft rust-craft is a collection of crates to make minecraft development (client, server) with rust possible. Motivation There's no better way of

João Victor 15 Mar 23, 2023
A collection (eventually) of examples that use some non-beginner things.

nannou examples A collection (eventually) of examples that use some non-beginner things. Right now the only example combines nannou's standard draw AP

Alexis Andre 22 Oct 21, 2022
A SCALE-compatible collection of bits

scale-bits · This small utility crate provides two separate things: A Bits type that can be SCALE encoded and decoded, and is fully SCALE compatible w

Parity Technologies 3 Sep 25, 2022
A simple collection of formatters focused on speed, reliability, correctness and most importantly readability.

EFC The Enoki Formatter Collection (name inspired by the GNU Compiler Collection) is a simple to use colelction of next generation language agnostic f

Enoki 3 Nov 21, 2022
Diosic is an open source web-based music collection server and streamer

Diosic is an open source web-based music collection server and streamer. Mainly suitable for users who need to deploy on servers with low hardware specifications.

Jinker 45 Jan 28, 2023
A collection of comparison-benchmarks for Nova & related Proving systems

Nova benchmarks Here's a list of some of the benchmarks we've been taking to better understand how Nova performs vs other proof systems. Live version:

Privacy & Scaling Explorations (formerly known as appliedzkp) 18 Apr 17, 2023
Leetcode Solutions in Rust, Advent of Code Solutions in Rust and more

RUST GYM Rust Solutions Leetcode Solutions in Rust AdventOfCode Solutions in Rust This project demostrates how to create Data Structures and to implem

Larry Fantasy 635 Jan 3, 2023
Simple autoclicker written in Rust, to learn the Rust language.

RClicker is an autoclicker written in Rust, written to learn more about the Rust programming language. RClicker was was written by me to learn more ab

null 7 Nov 15, 2022
Rust programs written entirely in Rust

mustang Programs written entirely in Rust Mustang is a system for building programs built entirely in Rust, meaning they do not depend on any part of

Dan Gohman 561 Dec 26, 2022
Rust 核心库和标准库的源码级中文翻译,可作为 IDE 工具的智能提示 (Rust core library and standard library translation. can be used as IntelliSense for IDE tools)

Rust 标准库中文版 这是翻译 Rust 库 的地方, 相关源代码来自于 https://github.com/rust-lang/rust。 如果您不会说英语,那么拥有使用中文的文档至关重要,即使您会说英语,使用母语也仍然能让您感到愉快。Rust 标准库是高质量的,不管是新手还是老手,都可以从中

wtklbm 493 Jan 4, 2023
A library for extracting #[no_mangle] pub extern "C" functions (https://docs.rust-embedded.org/book/interoperability/rust-with-c.html#no_mangle)

A library for extracting #[no_mangle] pub extern "C" functions In order to expose a function with C binary interface for interoperability with other p

Dmitrii - Demenev 0 Feb 17, 2022
clone of grep cli written in Rust. From Chapter 12 of the Rust Programming Language book

minigrep is a clone of the grep cli in rust Minigrep will find a query string in a file. To test it out, clone the project and run cargo run body poem

Raunak Singh 1 Dec 14, 2021
Rust-blog - Educational blog posts for Rust beginners

pretzelhammer's Rust blog ?? I write educational content for Rust beginners and Rust advanced beginners. My posts are listed below in reverse chronolo

kirill 5.2k Jan 1, 2023
The ray tracer challenge in rust - Repository to follow my development of "The Raytracer Challenge" book by Jamis Buck in the language Rust

The Ray Tracer Challenge This repository contains all the code written, while step by implementing Ray Tracer, based on the book "The Ray Tracer Chall

Jakob Westhoff 54 Dec 25, 2022