Shuttle is a library for testing concurrent Rust code

Overview

Shuttle

Shuttle is a library for testing concurrent Rust code. It implements several randomized concurrency testing techniques, including the PCT scheduler described in the paper "A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs".

Getting started

Consider this simple piece of concurrent code:

use std::sync::{Arc, Mutex};
use std::thread;

let lock = Arc::new(Mutex::new(0u64));
let lock2 = lock.clone();

thread::spawn(move || {
    *lock.lock().unwrap() = 1;
});

assert_eq!(0, *lock2.lock().unwrap());

There is an obvious race condition here: if the spawned thread runs before the assertion, the assertion will fail. But writing a unit test that finds this execution is tricky. We could run the test many times and try to "get lucky" by finding a failing execution, but that's not a very reliable testing approach. Even if the test does fail, it will be difficult to debug: we won't be able to easily catch the failure in a debugger, and every time we make a change, we will need to run the test many times to decide whether we fixed the issue.

Randomly testing concurrent code with Shuttle

Shuttle avoids this issue by controlling the scheduling of each thread in the program, and scheduling those threads randomly. By controlling the scheduling, Shuttle allows us to reproduce failing tests deterministically. By using random scheduling, with appropriate heuristics, Shuttle can still catch most (non-adversarial) concurrency bugs even though it is not an exhaustive checker.

A Shuttle version of the above test just wraps the test body in a call to Shuttle's check_random function, and replaces the concurrency-related imports from std with imports from shuttle:

use shuttle::sync::{Arc, Mutex};
use shuttle::thread;

shuttle::check_random(|| {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = lock.clone();

    thread::spawn(move || {
        *lock.lock().unwrap() = 1;
    });

    assert_eq!(0, *lock2.lock().unwrap());
}, 100);

This test detects the assertion failure with extremely high probability (over 99.9999%).
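To get a feel for where a figure like this can come from, here is a back-of-the-envelope sketch. It assumes (this is an illustrative assumption, not something stated in Shuttle's documentation) that each random iteration independently hits the failing interleaving with a fixed probability; the helper name `detection_probability` is hypothetical.

```rust
/// Probability that at least one of `iterations` independent runs fails,
/// assuming each single run hits the bug with probability `p_single`.
fn detection_probability(p_single: f64, iterations: u32) -> f64 {
    // The bug is missed only if every single run misses it.
    1.0 - (1.0 - p_single).powi(iterations as i32)
}

fn main() {
    // Hypothetical per-run probabilities, chosen only for illustration.
    for p in [0.5, 0.2, 0.05] {
        println!("p = {p}: {:.8}", detection_probability(p, 100));
    }
}
```

Under this model, a bug that shows up in even a modest fraction of schedules is almost certain to be caught within 100 iterations, while rarer bugs need more iterations.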

Shuttle is inspired by the Loom library for testing concurrent Rust code. Shuttle focuses on randomized testing, rather than the exhaustive testing that Loom offers. This is a soundness—scalability trade-off: Shuttle is not sound (a passing Shuttle test does not prove the code is correct), but it scales to much larger test cases than Loom. Empirically, randomized testing is successful at finding most concurrency bugs, which tend not to be adversarial.

License

This project is licensed under the Apache-2.0 License.

Security

See CONTRIBUTING for more information.

Comments
  • shuttle::thread_local isn't dropped when threads exit the scheduler

    I have a crate that sets up cross-thread channels via thread_local and lazy_static. Shuttle reports a spurious deadlock, however, because it only frees a thread's thread_local value when the thread actually exits, not when it "exits" the thread scheduler (such as by reaching the end of shuttle::check_random(|| { })). I rely on the thread_local Drop impl to drop the last sender of a channel, which unblocks another thread's receiver so that the other thread can exit once there are no more clients. Because the thread_locals don't drop, the sender is never removed, and so the receiver is reported as a deadlock, even though this cannot happen in normal operation.

    I have a manual workaround, where I just RefCell::take() the channel at the end of my shuttle tests in order to manually drop the sender, but it would be nice to not need this (and it took me more than a little while to figure out what exactly was going on and realize the problem, which may hit other people).

    opened by chc4 4
  • Implement `Mutex::try_lock`

    Without try_lock it was easier to justify context switches, because acquire was a right mover (we only needed a context switch before) and release was a left mover (we only needed a context switch after). However, with try_lock that is not the case anymore. This commit argues why we need a context switch at the end of lock and try_lock (both in the success and failure case), and why we do not need a context switch at the beginning of try_lock and MutexGuard::drop.


    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by bkragl 1
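As a reminder of why try_lock complicates the mover argument in the pull request above, the sketch below uses std::sync::Mutex (not Shuttle's Mutex) to show that the outcome of try_lock depends on whether the lock is held at that instant. The helper name `is_contended` is hypothetical.

```rust
use std::sync::{Mutex, TryLockError};

/// Returns true if `try_lock` reports that the mutex is currently held.
fn is_contended(m: &Mutex<u64>) -> bool {
    // The guard from a successful try_lock is a temporary and is released
    // again as soon as this expression has been evaluated.
    matches!(m.try_lock(), Err(TryLockError::WouldBlock))
}

fn main() {
    let m = Mutex::new(0u64);
    assert!(!is_contended(&m)); // free: try_lock succeeds

    let guard = m.lock().unwrap();
    assert!(is_contended(&m)); // held: try_lock fails without blocking
    drop(guard);

    assert!(!is_contended(&m)); // free again after the guard is dropped
}
```

Because a failed try_lock is an observable result rather than a blocked thread, a scheduler has to consider interleavings around it that a plain blocking lock would never expose.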
  • Small cleanups

    Just a couple of small things: removing the Debug bound from the Scheduler trait, and removing some clippy allows that are now fixed upstream.


    opened by jamesbornholt 1
  • Friendlier error messages for re-entrancy failures

    Currently, if code tries to re-acquire a lock it already holds, it correctly panics but with an unhelpful error message about Shuttle's internals. Instead, let's provide a nicer error message about why the calling code is buggy.

    This came up while trying out the super neat example from a fasterthanlime blog post, which Shuttle has no trouble finding the deadlock in, but wasn't giving a helpful error message for.


    opened by jamesbornholt 1
  • Release a new version?

    Hi, first of all, thanks for the great project!

    I tried to use shuttle in one of my projects, and it failed to resolve dependencies. The reason is that the version on crates.io pins its dependencies to minor versions. I noticed that this was fixed in this commit, but the fix has not been released to crates.io.

    Is it possible to release a new minor version to reflect the changes made to shuttle since last release?

    opened by XiangpengHao 1
  • Remove implicit `Sized` bounds on `Mutex` and `RwLock`

    This is to match the standard library, which doesn't require the contents of a lock to be Sized.


    opened by jamesbornholt 1
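For context on the Sized change above, this is what the standard library already allows: a Mutex whose contents are an unsized trait object. The sketch uses std::sync only; the `Counter` trait, the `Simple` type, and `bump_shared` are hypothetical names for illustration.

```rust
use std::sync::{Arc, Mutex};

trait Counter {
    fn bump(&mut self) -> u64;
}

struct Simple(u64);

impl Counter for Simple {
    fn bump(&mut self) -> u64 {
        self.0 += 1;
        self.0
    }
}

/// Works on any counter behind the lock; `dyn Counter` is not `Sized`.
fn bump_shared(shared: &Arc<Mutex<dyn Counter>>) -> u64 {
    shared.lock().unwrap().bump()
}

fn main() {
    // The sized `Mutex<Simple>` coerces to the unsized `Mutex<dyn Counter>`.
    let shared: Arc<Mutex<dyn Counter>> = Arc::new(Mutex::new(Simple(0)));
    assert_eq!(bump_shared(&shared), 1);
    assert_eq!(bump_shared(&shared), 2);
}
```

A lock type with an implicit Sized bound on its contents would reject the `Mutex<dyn Counter>` type in this example.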
  • Fix waker behavior when invoked before a `poll` finishes

    It's possible for a task's waker to be invoked in the middle of a call to that task's poll by the executor. We had accounted for that possibility if the task called its own waker, but that's not good enough: the waker can escape to other threads that can invoke it before poll finishes (e.g., if the task blocks to acquire a lock).

    This change fixes the waker behavior by clarifying the semantics of a call to wake: a task whose waker is invoked should not be blocked when it next returns Pending to the executor, and should be woken if that has already happened. To do this, we introduce a new Sleeping state for tasks, that has the same semantics as Blocked but that is recognized by waker invocations, which will only unblock a task in Sleeping state. This also removes the special case "woken by self" behavior -- being woken by any thread should be enough to trigger this sleep logic.


    opened by jamesbornholt 1
  • Implement `Drop` for async task `JoinHandle`

    When an async task's JoinHandle is dropped, we mark the task as dropped. Dropped tasks are not allowed to force a deadlock, meaning that execution can end successfully when all tasks are either finished or dropped. At the same time, dropped tasks should still be runnable by the executor so long as some non-dropped task has not finished.


    opened by JacobVanGeffen 1
  • Print a failing schedule even during double panics

    Double panics are a common problem in concurrent Rust code. For example, a thread might panic while holding a lock, and then need to acquire that lock again during stack unwinding, but the lock is already poisoned. In these cases, Shuttle currently can't print the schedule that led to the original panic, because the double panic aborts the process before our catch_unwind has a chance to run.

    This change adds a new panic hook that gets a chance to run before the double panic happens. We use this hook to print the schedule, so that even if we double panic in future at least the user gets some output they can use to reproduce the problem. There's some trickiness here (outlined in a module comment for failure.rs) around running different panic handlers at different times, which makes the code a little more complex than I would have hoped, but it gets the job done.


    opened by jamesbornholt 1
  • Add support for tracking thread causality.

    This change adds support for tracking causality among thread operations. We do this in the standard way, by associating a vector clock with each thread. The i'th element of a thread's vector clock denotes its knowledge of the clock of thread i. Clocks are partially ordered using a pointwise ordering <. The main property we want is that for any pair of events p, q: (p causally precedes q) iff (clock at p < clock at q).

    We update the code for thread spawn and join, as well as the various synchronization objects (Atomics, Barriers, CondVars, Mutexes, RwLocks and mpsc channels) to track causality by updating vector clocks appropriately.

    This change does not currently properly track causality for async interactions; those will be done in a subsequent PR.


    opened by jorajeev 1
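The vector clock scheme described in the causality pull request above can be sketched in a few lines. This is a generic textbook version, not Shuttle's internal implementation; the type `VectorClock` and its methods `tick` and `merge` are hypothetical names, and all clocks are assumed to have the same length.

```rust
use std::cmp::Ordering;

#[derive(Clone, Debug, PartialEq, Eq)]
struct VectorClock(Vec<u32>);

impl VectorClock {
    fn new(threads: usize) -> Self {
        VectorClock(vec![0; threads])
    }

    /// Record a local event on thread `id`.
    fn tick(&mut self, id: usize) {
        self.0[id] += 1;
    }

    /// Pointwise maximum: absorb everything `other` knows (e.g. on join or lock acquire).
    fn merge(&mut self, other: &VectorClock) {
        for (mine, theirs) in self.0.iter_mut().zip(&other.0) {
            *mine = (*mine).max(*theirs);
        }
    }
}

impl PartialOrd for VectorClock {
    /// Pointwise ordering; `None` means the two events are concurrent.
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        let le = self.0.iter().zip(&other.0).all(|(a, b)| a <= b);
        let ge = self.0.iter().zip(&other.0).all(|(a, b)| a >= b);
        match (le, ge) {
            (true, true) => Some(Ordering::Equal),
            (true, false) => Some(Ordering::Less),
            (false, true) => Some(Ordering::Greater),
            (false, false) => None,
        }
    }
}

fn main() {
    let mut t0 = VectorClock::new(2);
    let mut t1 = VectorClock::new(2);

    t0.tick(0);
    let p = t0.clone(); // event p on thread 0: [1, 0]

    t1.tick(1);
    let r = t1.clone(); // event r on thread 1: [0, 1], concurrent with p

    // Thread 1 synchronizes with thread 0 (for example, acquires a lock
    // that thread 0 released), then performs event q.
    t1.merge(&t0);
    t1.tick(1);
    let q = t1.clone(); // event q: [1, 2]

    assert!(p < q); // p causally precedes q
    assert!(p.partial_cmp(&r).is_none()); // p and r are unordered
}
```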
  • Add hints to `yield_now` to allow schedulers to deprioritize a yielding thread

    Tasks that implement busy-wait loops or other infinite loops are a fairness issue for schedulers like PCT, which will run a thread indefinitely until it blocks. This change makes both sync and async versions of yield_now act as a hint to the scheduler that the current task should be deprioritized. Only the PCT scheduler acts on this hint, by moving the current task to the lowest priority. We could also make the DFS scheduler react to this hint by not allowing it to re-schedule the current task immediately, but that would be unsound.


    opened by jamesbornholt 1
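The effect of the yield hint described above can be illustrated with a toy priority scheduler. This is a deliberately simplified sketch: it omits PCT's random priorities and priority change points, it is not Shuttle's code, and the names `PriorityScheduler`, `next_task`, and `yield_hint` are hypothetical.

```rust
/// Toy scheduler: always runs the runnable task with the highest priority.
struct PriorityScheduler {
    /// priorities[task_id]; a larger value runs first. Values are kept distinct.
    priorities: Vec<i64>,
}

impl PriorityScheduler {
    fn new(priorities: Vec<i64>) -> Self {
        PriorityScheduler { priorities }
    }

    /// Pick the highest-priority task among the runnable ones.
    fn next_task(&self, runnable: &[usize]) -> Option<usize> {
        runnable.iter().copied().max_by_key(|&t| self.priorities[t])
    }

    /// Hint from `yield_now`: move `task` below every other task.
    fn yield_hint(&mut self, task: usize) {
        let lowest = self.priorities.iter().copied().min().unwrap_or(0);
        self.priorities[task] = lowest - 1;
    }
}

fn main() {
    let mut sched = PriorityScheduler::new(vec![3, 2, 1]);
    let runnable = [0, 1, 2];

    // Task 0 never blocks, so without a hint it is chosen on every step.
    assert_eq!(sched.next_task(&runnable), Some(0));
    assert_eq!(sched.next_task(&runnable), Some(0));

    // Task 0 spins in a busy-wait loop and calls yield_now.
    sched.yield_hint(0);
    assert_eq!(sched.next_task(&runnable), Some(1));
}
```

Without the hint, a task that spins while waiting for a lower-priority task would be scheduled forever, which is the fairness issue the pull request describes.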
  • Make `future::JoinHandle::abort` actually cancel task

    The implementation of abort in #87 as detaching the task does not seem right, because detached tasks may still continue to run. Consider the following test, which should not fail but currently does.

    use shuttle::{check_dfs, future};

    #[test]
    fn join_handle_abort_bug() {
        check_dfs(
            || {
                let (sender, receiver) = futures::channel::oneshot::channel();
                // The spawned task must never run past the await once aborted.
                let t = future::spawn({
                    async move {
                        receiver.await.unwrap();
                        panic!("should not get here");
                    }
                });
                t.abort();
                sender.send(()).unwrap();
                shuttle::thread::yield_now();
            },
            None,
        );
    }

    (The yield_now is needed because otherwise the main task would immediately finish, in which case also the execution finishes because there are no attached tasks left.)

    We need a way to actually cancel a task, which drops its continuation and returns a JoinError indicating cancellation.

    opened by bkragl 1
  • Can't load replay schedule for max_steps failures

    I have a shuttle test for a crate I wrote. It occasionally hits a deadlock that shuttle reports as "exceeded max_steps bound". It gives a (very big) failing schedule that I should pass to replay in order to reproduce the issue.

    The problems are twofold:

    1. replay_from_file can't load the outputted schedule string, always panicking with "invalid schedule"
    2. ~~reducing the max_steps via a custom Config.max_steps so that the schedule is able to be embedded as an argument to replay directly ends with shuttle erroring out with "expected context switch but next schedule step is random choice".~~

    This unfortunately makes shuttle kind of useless for trying to fix this bug, since I can't exercise the reported deadlock to try and debug it under gdb or something to get a stacktrace of the stuck thread.

    opened by chc4 2
  • Is shuttle planning to support fork mode?

    My application mutates global state (e.g., allocators, buffer pools) and may benefit from testing with separate processes. I'm wondering if it's possible for Shuttle to support a fork mode. The first step could be making the PortfolioRunner run in different processes.

    opened by XiangpengHao 0
  • Implement thread park(), unpark(), and panicking()

    There is already a TODO in the code: https://github.com/awslabs/shuttle/blob/4a00174db03b2ef9dc4a1e0707b94827bac15f7a/src/thread.rs#L173. This would help make shuttle::thread a drop-in substitute for std::thread.

    opened by kvark 1
  • `wait_timeout` and friends shouldn't count as deadlocking

    Shuttle complains that this test deadlocks, but in reality it doesn't:

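    use shuttle::check_dfs;
    use shuttle::sync::{Arc, Condvar, Mutex};
    use std::time::Duration;

    #[test]
    fn wait_timeout_deadlock() {
        check_dfs(
            || {
                let lock = Arc::new(Mutex::new(false));
                let cond = Arc::new(Condvar::new());

                // Nobody ever notifies `cond`, so the only way out is the timeout.
                let guard = lock.lock().unwrap();
                let (_guard, result) = cond.wait_timeout(guard, Duration::from_secs(1)).unwrap();
                assert!(result.timed_out());
            },
            None,
        )
    }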

    The problem is that, while wait_timeout temporarily blocks the thread, it's not permanent, and so shouldn't count as deadlock.

    We already knew that our modeling of wait_timeout wasn't complete because it doesn't test the timeout case, but this test shows the effects of such incompleteness—spurious failures.

    We need to have a better notion of "blocked but can be unblocked". One cheap-ish idea would be for wait_timeout to spawn another (internal to Shuttle) "thread" that, when executed, causes the thread blocked in wait_timeout to unblock and return timeout. This would let the scheduler "naturally" decide when to trigger a timeout rather than us having to build any fancy time handling into the scheduler itself.

    opened by jamesbornholt 0
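For comparison with the issue above, the sketch below shows the behavior of the standard library: a timed wait that nobody ever signals returns after the timeout instead of blocking forever. It uses std::sync rather than Shuttle, and it uses wait_timeout_while instead of wait_timeout so that spurious wakeups cannot end the wait early; the helper name `wait_without_notifier` is hypothetical.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Waits on a condition that nobody ever signals and reports whether the
/// wait ended because the timeout elapsed.
fn wait_without_notifier(timeout: Duration) -> bool {
    let lock = Mutex::new(false);
    let cond = Condvar::new();

    let guard = lock.lock().unwrap();
    // Keep waiting while the flag is still false; it never becomes true here.
    let (_guard, result) = cond
        .wait_timeout_while(guard, timeout, |ready| !*ready)
        .unwrap();
    result.timed_out()
}

fn main() {
    // The thread is blocked only temporarily, so this is not a deadlock.
    assert!(wait_without_notifier(Duration::from_millis(20)));
}
```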
  • Determinism Check Scheduler

    Added a DeterminismCheckScheduler to check functions for determinism by recording runnable tasks at each step and comparing on future iterations of the scheduler. Wraps an inner scheduler, which can be Random, RoundRobin, or PCT. Added unit tests as well.


    opened by jdeans289 0
Releases(v0.5.0)
  • v0.5.0(Nov 23, 2022)

    This version updates the embedded rand library to v0.8. Tests that use shuttle::rand will need to update to the v0.8 interface of rand, which included some breaking changes.

    • Update rand and other dependencies (#89)
    • Implement abort for future::JoinHandle (#87)
    • Correctly handle the main thread's thread-local storage destructors (#88)
  • v0.4.1(Nov 23, 2022)

  • v0.4.0(Oct 29, 2022)

  • v0.3.0(Sep 1, 2022)

    Note that clients using async primitives provided by Shuttle (task spawn, block_on, yield_now) will need to be updated due to the renaming of the asynch module to future in this release.

    • Rust 2021 conversion and dependency bumps (#76)
    • Implement thread::park and thread::unpark (#77)
    • Implement std::hint (#78)
    • Rename the asynch module to future (#79)
  • v0.2.0(Jul 27, 2022)

    Note that failing test schedules created by versions of Shuttle before 0.2.0 will not successfully replay on version 0.2.0, and vice versa, as the changes below affect Mutex and RwLock scheduling decisions.

    • Implement Mutex::try_lock (#71)
    • Implement RwLock::{try_read, try_write} (#72)
    • Export a version of std::sync::Weak (#69)
    • Provide better error messages for deadlocks caused by non-reentrant locking (#66)
  • v0.1.0(Apr 11, 2022)

    • Implement Condvar::wait_while and Condvar::wait_timeout_while (#59)
    • Remove implicit Sized bounds on Mutex and RwLock (#62)
    • Dependency updates (#58, #60)
  • v0.0.7(Sep 21, 2021)

    • Fix a number of issues in support for async tasks (#50, #51, #52, #54)
    • Improve error messages when using Shuttle primitives outside a Shuttle test (#42)
    • Add support for thread local storage (the thread_local! macro) (#43, #53)
    • Add support for Once cells (#49)
    • Simplify some dependencies to improve build times (#55)
    • Move context_switches and my_clock functions into a new current module (#56)
  • v0.0.6(Jul 8, 2021)

    • Add support for std::sync::atomic (#33)
    • Add shuttle::context_switches to get a logical clock for an execution (#37)
    • Track causality between threads (#38)
    • Better handling for double panics and poisoned locks (#30, #40)
    • Add option to not persist failures (#34)
  • v0.0.5(Jun 11, 2021)

    • Fix a performance regression with tracing introduced by #24 (#31)
    • Include default features for the rand crate to fix compilation issues (#29)
  • v0.0.4(Jun 2, 2021)

    • Add a timeout option to run tests for a fixed amount of time (#25)
    • Include task ID in all tracing log output (#24)
    • Implement thread::current (#23)
  • v0.0.3(Apr 14, 2021)

    • Update for Rust 1.51 (#11)
    • Add option to bound how many steps a test runs on each iteration (#14)
    • Remove option to configure the maximum number of threads/tasks (#16, #19)
    • Make yield_now a hint to the scheduler to allow validating busy loops (#18)
    • Add ReplayScheduler::new_from_file (#20)
  • v0.0.2(Mar 19, 2021)

  • v0.0.1(Mar 3, 2021)

Owner
Amazon Web Services - Labs
AWS Labs