A model checker for implementing distributed systems.

Overview

chat crates.io docs.rs LICENSE

Correctly implementing distributed algorithms such as the Paxos and Raft consensus protocols is notoriously difficult due to inherent nondetermism such as message reordering by network devices. Stateright is a Rust actor library that aims to solve this problem by providing an embedded model checker, a UI for exploring system behavior (demo), and a lightweight actor runtime. It also features a linearizability tester that can be run within the model checker for more exhaustive test coverage than similar solutions such as Jepsen.

Stateright Explorer screenshot

Getting Started

  1. Please see the book, "Building Distributed Systems With Stateright."
  2. A video introduction is also available.
  3. Stateright also has detailed API docs.
  4. Consider also joining the Stateright Discord server for Q&A or other feedback.

Examples

Stateright includes a variety of examples, such as a Single Decree Paxos cluster and an abstract two phase commit model.

Passing a check CLI argument causes each example to validate itself using Stateright's model checker:

# Two phase commit with 3 resource managers.
cargo run --release --example 2pc check 3
# Paxos cluster with 3 clients.
cargo run --release --example paxos check 3
# Single-copy (unreplicated) register with 3 clients.
cargo run --release --example single-copy-register check 3
# Linearizable distributed register (ABD algorithm) with 2 clients.
# (please be patient with this one, as it takes longer to check)
cargo run --release --example linearizable-register check 2

Passing an explore CLI argument causes each example to spin up the Stateright Explorer web UI locally on port 3000, allowing you to browse system behaviors:

cargo run --release --example 2pc explore
cargo run --release --example paxos explore
cargo run --release --example single-copy-register explore
cargo run --release --example linearizable-register explore

Passing a spawn CLI argument to the examples leveraging the actor functionality will cause each to spawn actors using the included runtime, transmitting JSON messages over UDP:

cargo run --release --example paxos spawn
cargo run --release --example single-copy-register spawn
cargo run --release --example linearizable-register spawn

Features

Stateright contains a general purpose model checker offering:

  • Invariant checks via "always" properties.
  • Nontriviality checks via "sometimes" properties.
  • Liveness checks via "eventually" properties.
  • A web browser UI for interactively exploring state space.
  • Linearizability and sequential consistency testers.

Stateright's actor system features include:

  • An actor runtime that can execute actors outside the model checker in the "real world."
  • A model for lossy/lossless duplicating/non-duplicating networks with the ability to capture actor message history to check an actor system against an expected consistency model.
  • An optional network adapter that provides a lossless non-duplicating ordered virtual channel for messages between a pair of actors.

In contrast with other actor libraries, Stateright enables you to formally verify the correctness of your implementation, and in contrast with model checkers such as TLC for TLA+, systems implemented using Stateright can also be run on a real network without being reimplemented in a different language.

Contribution

Contributions are welcome! Please fork the library, push changes to your fork, and send a pull request. All contributions are shared under an MIT license unless explicitly stated otherwise in the pull request.

License

Stateright is copyright 2018 Jonathan Nadal and other contributors. It is made available under the MIT License.

To avoid the need for a Javascript package manager, the Stateright repository includes code for the following Javascript dependency used by Stateright Explorer:

  • KnockoutJS is copyright 2010 Steven Sanderson, the Knockout.js team, and other contributors. It is made available under the MIT License.
Issues
  • Relax trait bounds for messages and actor state

    Relax trait bounds for messages and actor state

    Hi. I was upgrading to the latest version and noticed that new trait bounds were introduced (e.g. Ord for Actor::Message). If I understand correctly, this prevents the use of e.g. HashMap's in such messages. Would it be possible to relax these trait bounds?

    (I tried to perform such change, but it turned out more complex than what I was expecting (probably because I'm not familiar with the codebase). If you have ideas on how to go about this, I can give it a try. Thanks.)

    opened by vitorenesduarte 12
  • Discord link seems to be not working

    Discord link seems to be not working

    Title explains it all :)

    opened by tekjar 5
  • Add duplicating_network flag.

    Add duplicating_network flag.

    Hi! Thanks so much for building this library, it's great!

    As a first contribution, here's a change that lets users prune the portion of search space that consists of duplicate messages -- some systems have trivially idempotent messages, where exploring all the extra delivery schedules is just a waste of model-checking time.

    opened by graydon 2
  • Add support for

    Add support for "eventually" (at least acyclic liveness) properties.

    This PR adds a simple form of liveness-proerty checking -- "eventually" properties. They're implemented by assigning a bit-number to each such property and propagating a compact bitset (IdSet from the id-set crate) representing properties not-yet-established on a given path, along with the state, in the queue of states. Any property still not-yet-established when we arrive at a terminal state is marked as a counterexample discovery, as with "always" properties, and reported the same way.

    These properties suffer from two possible false negatives:

    • Cyclic paths are not disambiguated from DAG joins, so neither is considered terminal. If an eventually property is not met by the cycle-closing edge of a cyclic path, it is forgotten.
    • DAG joins with different sets of liveness properties at the join point are not differentiated. If an eventually property was met on the first visit through a state but not on the second visit that joins it, it is not reported.

    In theory both of these could be converted to a false positive by considering revisited states as terminal. I'd be open to the argument that that's more-useful / more-correct. Not sure what's more annoying to users in practice; the code is anyways factored in such a way that you'd only have to add one more call to note_terminal_state to switch to that way, if you prefer.

    opened by graydon 2
  • derive deault and deserialize for Id as well

    derive deault and deserialize for Id as well

    Minor change but necessary for interop with the library I'm using stateright with and, I think, relatively harmless?

    opened by graydon 1
  • Add missing 'to' in the sentence

    Add missing 'to' in the sentence

    opened by abhinav-upadhyay 1
  • introduce incomplete `replicated-processor` example

    introduce incomplete `replicated-processor` example

    This commit outlines the design and code structure.

    opened by jonnadal 2
  • DFS and/or DPOR

    DFS and/or DPOR

    A fair number of explicit-state model checkers use DFS not BFS. I realize this would be a fairly big revision to the structure (at least of the checker module) but I think there are good reasons for it:

    • We can spit the sources (visited-state-tracking) hashtable in two: one piece that's a strictly-consulted and unbounded (but probably quite small), containing the current DFS path, used for detecting re-visits on a single path; and a second piece that's a size-bounded (randomly evicting on overflow) set of previously-visited states not on the current DFS path. A false negative on the second piece (due to an eviction) causes accidental revisiting of a subspace of the state-space but doesn't affect termination of the algorithm. This allows trading time and space: specifically avoiding memory exhaustion, at the cost of a longer runtime.
    • I think it might better-enable distributed operation in the future (on a cluster of machines) by delegating entire subtrees to separate nodes. Not sure about this.
    • More significantly, I think it's a path towards doing dynamic partial-order reduction (as described here: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.1291). Specifically in an actor system we could classify the choice-points in a given DFS path as splitting or not-splitting the future state space into disjoint sets of communicating actors and only backtrack and reorder within a given disjoint set.

    I'm not sure about drawbacks. Maybe a slightly more complicated stack-of-choice-points structure to track progress? What do you think?

    opened by graydon 2
  • Pending-state randomization

    Pending-state randomization

    Currently it looks like the checker always splits chunks off the end of the pending vecdeque, and then pushes new extensions back on the end. But it doesn't necessarily split off everything pending, and it can often grow the end by more than it tried to consume. When this happens, it ignores (potentially for a very long time, until much later in the run) entries earlier in the deque.

    I'm not sure this is a problem (I guess if we're going to explore all states eventually the order might not matter?) but I worry a bit vaguely that there's some possibility that an imbalance in the shape the frontier gets expanded at will wind up costing more work or more memory than if it were more balanced.

    Might it make sense to randomize or otherwise permute the pending vector from time to time?

    opened by graydon 4
  • Smaller (and lazy/persistent) states

    Smaller (and lazy/persistent) states

    The system state representation in the actor subsystem is relatively large and eagerly copies vectors and btrees full of contents when cloned. This makes it relatively memory-hungry, which in turn limits the size of the state-space it's possible to explore.

    How would you feel about changes that reworked the representation to use lazy/persistent types (that copy-on-write only the substructures written) like those in the im crate? They are a bit slower than the eager types but I think they can take up substantially less room.

    opened by graydon 1
  • Order-preserving network flows

    Order-preserving network flows

    This is a note about an addition I'm thinking of making -- thought I'd post here before I work on it to ask if it makes sense to you and/or would be welcome.

    (Context: the actor system I'm driving with stateright exhibits pretty bad state-space explosion -- 3 peers and 2 consensus cycles finish in "only" a few hundred thousand states, but 3 consensus cycles exceeds 100m states and exhausts memory on my 128gb test box -- so I'm kinda grasping at straws to figure out things to reduce it, and/or whether there's a bug in my code. I'll file a couple other issues as possible avenues I'm considering for discussion of those also.)

    The simplest change I can think of is to add another flag concerning the network semantics, such that messages in a given flow don't get reordered: if peer 1 sends messages A then B to peer 2, I'd like to only explore delivery schedules that have A arriving at peer 2 before B. I think this is not an unreasonable variant -- lots of network abstractions preserve order per-flow -- and it'll cut out a lot of the state space, for a reasonably small change.

    This would not alter the interleaving of separate flows: if peer 3 sends message C to peer 2, then we'd still explore arrival schedules at peer 2 of each of [C,A,B], [A,C,B], [A,B,C]. Just not [C,B,A], [B,C,A], or [B,A,C].

    What do you think?

    opened by graydon 2
Owner
Stateright Actor Framework
Stateright Actor Framework
Fluvio is a high-performance distributed streaming platform that's written in Rust

Fluvio is a high-performance distributed streaming platform that's written in Rust, built to make it easy to develop real-time applications.

InfinyOn 313 Jun 16, 2021
A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka

rust-rdkafka A fully asynchronous, futures-enabled Apache Kafka client library for Rust based on librdkafka. The library rust-rdkafka provides a safe

Federico Giraud 693 Jun 15, 2021