A model checker for implementing distributed systems.

Stateright Actor Framework

Last update: Dec 15, 2022

Related tags

Overview

Correctly implementing distributed algorithms such as the Paxos and Raft consensus protocols is notoriously difficult due to inherent nondetermism such as message reordering by network devices. Stateright is a Rust actor library that aims to solve this problem by providing an embedded model checker, a UI for exploring system behavior (demo), and a lightweight actor runtime. It also features a linearizability tester that can be run within the model checker for more exhaustive test coverage than similar solutions such as Jepsen.

Getting Started

Please see the book, "Building Distributed Systems With Stateright."
A video introduction is also available.
Stateright also has detailed API docs.
Consider also joining the Stateright Discord server for Q&A or other feedback.

Examples

Stateright includes a variety of examples, such as a Single Decree Paxos cluster and an abstract two phase commit model.

Passing a check CLI argument causes each example to validate itself using Stateright's model checker:

# Two phase commit with 3 resource managers.
cargo run --release --example 2pc check 3
# Paxos cluster with 3 clients.
cargo run --release --example paxos check 3
# Single-copy (unreplicated) register with 3 clients.
cargo run --release --example single-copy-register check 3
# Linearizable distributed register (ABD algorithm) with 2 clients.
# (please be patient with this one, as it takes longer to check)
cargo run --release --example linearizable-register check 2

Passing an explore CLI argument causes each example to spin up the Stateright Explorer web UI locally on port 3000, allowing you to browse system behaviors:

cargo run --release --example 2pc explore
cargo run --release --example paxos explore
cargo run --release --example single-copy-register explore
cargo run --release --example linearizable-register explore

Passing a spawn CLI argument to the examples leveraging the actor functionality will cause each to spawn actors using the included runtime, transmitting JSON messages over UDP:

cargo run --release --example paxos spawn
cargo run --release --example single-copy-register spawn
cargo run --release --example linearizable-register spawn

Features

Stateright contains a general purpose model checker offering:

Invariant checks via "always" properties.
Nontriviality checks via "sometimes" properties.
Liveness checks via "eventually" properties.
A web browser UI for interactively exploring state space.
Linearizability and sequential consistency testers.

Stateright's actor system features include:

An actor runtime that can execute actors outside the model checker in the "real world."
A model for lossy/lossless duplicating/non-duplicating networks with the ability to capture actor message history to check an actor system against an expected consistency model.
An optional network adapter that provides a lossless non-duplicating ordered virtual channel for messages between a pair of actors.

In contrast with other actor libraries, Stateright enables you to formally verify the correctness of your implementation, and in contrast with model checkers such as TLC for TLA+, systems implemented using Stateright can also be run on a real network without being reimplemented in a different language.

Contribution

Contributions are welcome! Please fork the library, push changes to your fork, and send a pull request. All contributions are shared under an MIT license unless explicitly stated otherwise in the pull request.

License

To avoid the need for a Javascript package manager, the Stateright repository includes code for the following Javascript dependency used by Stateright Explorer:

KnockoutJS is copyright 2010 Steven Sanderson, the Knockout.js team, and other contributors. It is made available under the MIT License.

Comments

Relax trait bounds for messages and actor state

Hi. I was upgrading to the latest version and noticed that new trait bounds were introduced (e.g. Ord for Actor::Message). If I understand correctly, this prevents the use of e.g. HashMap's in such messages. Would it be possible to relax these trait bounds?

(I tried to perform such change, but it turned out more complex than what I was expecting (probably because I'm not familiar with the codebase). If you have ideas on how to go about this, I can give it a try. Thanks.)

opened by vitorenesduarte 12
Add symmetry checking
This PR adds permutation based symmetry checking.

It currently works by checking whether there are any permutations of the current state, who's fingerprint is within generated and in that case not persuing that state further.

This is not an efficient approach for doing symmetry checking, but for low numbers of symmetric elements, this can improve the model checking time. For an example of this test the increment_lock.rs example (which models multiple threads trying to increment a variable), goes from 38169180 states to 41 states. Unfortunately this takes longer to check since it fingerprints 10! states per actual state.

Additionally the actual interface for this feature is not very user friendly, requiring the user to manually implement a permutation function. This may be possible to fix using hash maps.

A more efficient approach would be to canonicalise each state before fingerprinting (sort hashmaps etc by non-symmetric elements) such that two symmetric states should end up having the same hash in most cases.

TODO

[ ] add tests

[ ] optimise
opened by Cjen1 5
Infinite generation of states

I recently tried to write the Ballot Leader Election with Gossiping using Stateright and encountered some issues along the way.

I can not stop Stateright from generating new states. Even though I added a maximum of rounds to stop the set-up of the timer, it keeps generating states.

Are there issues with using the function on_timeout?

Repo: https://github.com/Nightcro/ble-stateright

opened by Nightcro 5
Leveraging future's state machines

Hello! Love this project. I had a thought the other day...Futures are state machines, and stateright explores state space. Rather than having to manually hoist state into the State associated type forActors, would it possible to automatically plug stateright into the generated state machine for futures? Essentially treating futures as actors, where the state is always Poll::Ready(val) or Poll::Pending?

opened by LegNeato 3
Order-preserving network flows

This is a note about an addition I'm thinking of making -- thought I'd post here before I work on it to ask if it makes sense to you and/or would be welcome.

(Context: the actor system I'm driving with stateright exhibits pretty bad state-space explosion -- 3 peers and 2 consensus cycles finish in "only" a few hundred thousand states, but 3 consensus cycles exceeds 100m states and exhausts memory on my 128gb test box -- so I'm kinda grasping at straws to figure out things to reduce it, and/or whether there's a bug in my code. I'll file a couple other issues as possible avenues I'm considering for discussion of those also.)

The simplest change I can think of is to add another flag concerning the network semantics, such that messages in a given flow don't get reordered: if peer 1 sends messages A then B to peer 2, I'd like to only explore delivery schedules that have A arriving at peer 2 before B. I think this is not an unreasonable variant -- lots of network abstractions preserve order per-flow -- and it'll cut out a lot of the state space, for a reasonably small change.

This would not alter the interleaving of separate flows: if peer 3 sends message C to peer 2, then we'd still explore arrival schedules at peer 2 of each of [C,A,B], [A,C,B], [A,B,C]. Just not [C,B,A], [B,C,A], or [B,A,C].

What do you think?

opened by graydon 3
Dynamic property feedback
Adds properties field per-state in the explorer API

Renders property icons per-state (toggleable)

Renders properties for currently selected state (toggleable)

Renders breadcrumbs for states on a path to a discovery

Fixes #22
opened by jeffa5 2
Explorer UI: dynamic property feedback

When exploring states in the UI it shows the overall properties for all states so far, could it also show the status of the properties at the current state being explored.

My hope is that this will allow us to automatically be told (when clicking through) that we are now in a bad state (violating properties).

opened by jeffa5 2
Add support for "eventually" (at least acyclic liveness) properties.
This PR adds a simple form of liveness-proerty checking -- "eventually" properties. They're implemented by assigning a bit-number to each such property and propagating a compact bitset (IdSet from the id-set crate) representing properties not-yet-established on a given path, along with the state, in the queue of states. Any property still not-yet-established when we arrive at a terminal state is marked as a counterexample discovery, as with "always" properties, and reported the same way.

These properties suffer from two possible false negatives:

Cyclic paths are not disambiguated from DAG joins, so neither is considered terminal. If an eventually property is not met by the cycle-closing edge of a cyclic path, it is forgotten.

DAG joins with different sets of liveness properties at the join point are not differentiated. If an eventually property was met on the first visit through a state but not on the second visit that joins it, it is not reported.

In theory both of these could be converted to a false positive by considering revisited states as terminal. I'd be open to the argument that that's more-useful / more-correct. Not sure what's more annoying to users in practice; the code is anyways factored in such a way that you'd only have to add one more call to note_terminal_state to switch to that way, if you prefer.
opened by graydon 2
Add duplicating_network flag.

Hi! Thanks so much for building this library, it's great!

As a first contribution, here's a change that lets users prune the portion of search space that consists of duplicate messages -- some systems have trivially idempotent messages, where exploring all the extra delivery schedules is just a waste of model-checking time.

opened by graydon 2
Add an on-demand checker

This checker waits for requests to check fingerprints before doing the checking. This makes exploring less resource-intensive, without the need to check every state to just explore a few paths.

There is a 'run to completion' button in the UI for letting the checker run without waiting, back to the normal functionality.

The checker itself is mostly a copy of the BFS checker, besides the modifications to wait for fingerprints.

A potential future change would be to merge this control-flow functionality into both the DFS and BFS checkers directly

opened by jeffa5 1
Add a sentence in the spawn help for using 'tcpdump -D`

The -i lo0 parameter referrers to the device/interface that tcpdump will listen on and not all systems use lo0 to refer to the loop back interface. Executing tcpdump -D will display a list of devices available on the system and the user can find the current loop back device:

For instance on my computer device 3, lo, is the loop back device:

$ tcpdump -D 1.wlp170s0 [Up, Running, Wireless, Associated] 2.any (Pseudo-device that captures on all interfaces) [Up, Running] 3.lo [Up, Running, Loopback] 4.bluetooth0 (Bluetooth adapter number 0) [Wireless, Association status unknown] 5.bluetooth-monitor (Bluetooth Linux Monitor) [Wireless] 6.nflog (Linux netfilter log (NFLOG) interface) [none] 7.nfqueue (Linux netfilter queue (NFQUEUE) interface) [none] 8.dbus-system (D-Bus system bus) [none] 9.dbus-session (D-Bus session bus) [none]

opened by winksaville 1
Go code implementation of stateright project

Hello author, I am very interested in the stateright project, especially its model checker, and I have also tried it. I personally think that Rust is easier to use than TLA+ in terms of writing specifications. However, most of our project teams are familiar with Go and Java code, so we would like to try to use the Go language to implement the stateright project to facilitate its promotion and use. Could you please provide the structure flow chart of the Stateright project, or other information related to Stateright, so as to facilitate our development. If you are interested, we especially welcome you to join to develop the go version of stateright to make it more valuable. thank you!

opened by niuzhi 1
Add ability to save and load checker state

It would be really nice to be able to have the results of a check be saved (including states created/visited etc) so that they could be shared without others having to rerun the check.

Maybe this could lead to the web explorer having an option to upload a file with the check that it can process into an explorable version.

I guess this wouldn't directly mean that the checker can continue from a persisted state but could also enable snapshots while checking is running to enable investigating issues later.

opened by jeffa5 2
Question about stateright states

In stateright, is it possible for the "environment" to have state? Or is all of the state represented in the actors?

For example, could the environment have a clock that is non-deterministically updated?

(Sorry for the naive question.)

opened by smoelius 2
DFS and/or DPOR
A fair number of explicit-state model checkers use DFS not BFS. I realize this would be a fairly big revision to the structure (at least of the checker module) but I think there are good reasons for it:

We can spit the sources (visited-state-tracking) hashtable in two: one piece that's a strictly-consulted and unbounded (but probably quite small), containing the current DFS path, used for detecting re-visits on a single path; and a second piece that's a size-bounded (randomly evicting on overflow) set of previously-visited states not on the current DFS path. A false negative on the second piece (due to an eviction) causes accidental revisiting of a subspace of the state-space but doesn't affect termination of the algorithm. This allows trading time and space: specifically avoiding memory exhaustion, at the cost of a longer runtime.

I think it might better-enable distributed operation in the future (on a cluster of machines) by delegating entire subtrees to separate nodes. Not sure about this.

More significantly, I think it's a path towards doing dynamic partial-order reduction (as described here: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.1291). Specifically in an actor system we could classify the choice-points in a given DFS path as splitting or not-splitting the future state space into disjoint sets of communicating actors and only backtrack and reorder within a given disjoint set.

I'm not sure about drawbacks. Maybe a slightly more complicated stack-of-choice-points structure to track progress? What do you think?
opened by graydon 2
Pending-state randomization

Currently it looks like the checker always splits chunks off the end of the pending vecdeque, and then pushes new extensions back on the end. But it doesn't necessarily split off everything pending, and it can often grow the end by more than it tried to consume. When this happens, it ignores (potentially for a very long time, until much later in the run) entries earlier in the deque.

I'm not sure this is a problem (I guess if we're going to explore all states eventually the order might not matter?) but I worry a bit vaguely that there's some possibility that an imbalance in the shape the frontier gets expanded at will wind up costing more work or more memory than if it were more balanced.

Might it make sense to randomize or otherwise permute the pending vector from time to time?

opened by graydon 4

Owner

Stateright Actor Framework

GitHub https://docs.rs/stateright

Fluvio is a high-performance distributed streaming platform that's written in Rust

Fluvio is a high-performance distributed streaming platform that's written in Rust, built to make it easy to develop real-time applications.

1.6k Dec 30, 2022

The lightest distributed consensus library. Run your own replicated state machine! ❤️

Little Raft The lightest distributed consensus library. Run your own replicated state machine! ❤️ Installing Simply import the crate. In your Cargo.to

359 Dec 26, 2022

Raft distributed consensus for WebAssembly in Rust

WRaft: Raft in WebAssembly What is this? A toy implementation of the Raft Consensus Algorithm for WebAssembly, written in Rust. Basically, it synchron

60 Oct 22, 2022

Paxakos is a pure Rust implementation of a distributed consensus algorithm

Paxakos is a pure Rust implementation of a distributed consensus algorithm based on Leslie Lamport's Paxos. It enables distributed systems to consistently modify shared state across their network, even in the presence of failures.

2 Jul 5, 2022

Sorock is an experimental "so rocking" scale-out distributed object storage

6 Jun 13, 2022

A universal, distributed package manager

39 Dec 30, 2022

Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

294 Dec 23, 2022

Tight Model format is a lossy 3D model format focused on reducing file size as much as posible without decreasing visual quality of the viewed model or read speeds.

What is Tight Model Format The main goal of the tmf project is to provide a way to save 3D game assets compressed in such a way, that there are no not

59 Mar 6, 2023

Low-level Rust library for implementing terminal command line interface, like in embedded systems.

Terminal CLI Need to build an interactive command prompt, with commands, properties and with full autocomplete? This is for you. Example, output only

47 Nov 25, 2022

Learn Rust black magics by implementing basic types in database systems

Type Exercise in Rust (In Chinese) 数据库表达式执行的黑魔法：用 Rust 做类型体操 This is a short lecture on how to use the Rust type system to build necessary components

996 Jan 3, 2023

Build SQLite virtual file systems (VFS) by implementing a simple Rust trait.

sqlite-vfs Build SQLite virtual file systems (VFS) by implementing a simple Rust trait. Documentation | Example This library is build for my own use-c

56 Dec 19, 2022

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

0 Mar 29, 2022

open source training courses about distributed database and distributed systemes

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

8.3k Dec 30, 2022

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

Damavand is a code that simulates quantum circuits. In order to learn more about damavand, refer to the documentation. Development status Core feature

6 Mar 29, 2022

Magical Automatic Deterministic Simulator for distributed systems in Rust.

MadSim Magical Automatic Deterministic Simulator for distributed systems. Deterministic simulation MadSim is a Rust async runtime similar to tokio, bu

249 Dec 28, 2022

Canary - Distributed systems library for making communications through the network easier, while keeping minimalism and flexibility.

Canary Canary is a distributed systems and communications framework, focusing on minimalism, ease of use and performance. Development of Canary utiliz

28 Nov 3, 2022

Cap'n Proto is a type system for distributed systems

Cap'n Proto for Rust documentation blog Introduction Cap'n Proto is a type system for distributed systems. With Cap'n Proto, you describe your data an

1.5k Jan 1, 2023

A simple to use rust package to generate or parse Twitter snowflake IDs,generate time sortable 64 bits unique ids for distributed systems

A simple to use rust package to generate or parse Twitter snowflake IDs,generate time sortable 64 bits unique ids for distributed systems (inspired from twitter snowflake)

5 Oct 6, 2022

The fly.io distributed systems challenges solved in Rust

The fly.io distributed systems challenges solved in Rust. Live-streamed in https://youtu.be/gboGyccRVXI License Licensed under either of Apache Licens

162 Apr 19, 2023

NodeCraft - Crafting seamless node operations for distributed systems

NodeCraft Crafting seamless node operations for distributed systems, which provides foundational traits for node identification and address resolution

3 Oct 9, 2023