tachyonix

An asynchronous, multi-producer, single-consumer (MPSC) bounded channel that operates at tachyonic speeds.

This library is an offshoot of Asynchronix, an ongoing effort at a high performance asynchronous computation framework for system simulation.

No laws of physics were broken in the making of this library.


Overview

This is a no-frills async channel whose only claim to fame is to be extremely fast (see benchmarks), without taking any shortcuts on correctness and implementation quality. Its performance mainly results from its focus on the MPSC use-case and from a number of careful optimizations, among which:

  • aggressively optimized notification primitives for full-queue and empty-queue events (the latter is courtesy of diatomic-waker, a fast, spinlock-free alternative to atomic-waker),
  • no allocation after channel creation except for blocked sender notifications,
  • no spinlocks whatsoever, and no mutex in the hot path (the only mutex is a std::sync::Mutex used for blocked sender notifications),
  • underlying queue optimized for single receiver.

Usage

Add this to your Cargo.toml:

[dependencies]
tachyonix = "0.2.0"

Example

use tachyonix;
use futures_executor::{block_on, ThreadPool};

let pool = ThreadPool::new().unwrap();

let (s, mut r) = tachyonix::channel(3);

block_on(async move {
    pool.spawn_ok(async move {
        assert_eq!(s.send("Hello").await, Ok(()));
    });

    assert_eq!(r.recv().await, Ok("Hello"));
});

Limitations

The original raison d'être of this library was to provide a less idiosyncratic sibling to the channels developed for Asynchronix that could be easily benchmarked against other channel implementations. The experiment turned out better than anticipated, so a slightly more fleshed-out version was released for public consumption in the hope that others may find it useful. However, its API surface is intentionally kept small, and it does not aspire to become much more than it is today. More importantly, it makes trade-offs that may or may not be acceptable depending on your use-case:

  • just like most other async channels except the MPSC channels in the tokio and futures crates, fairness for blocked senders is not enforced: while the first sender blocked on a full channel is indeed notified first, it may still be outrun if another sender happens to be scheduled before; if your application requires better fairness guarantees, you should use tokio's or futures's channels.
  • just like most other async channels except the MPSC channels in the futures crate, the effective capacity of the channel decreases with each "forgotten" blocked sender (i.e. blocked senders which, for some reason, were not polled to completion but were not dropped either) and the channel will eventually deadlock if the effective capacity drops to zero; if this can happen in your application, you should use futures's channels.
  • just like most other async channels with the exception of flume, its low-level primitives rely on unsafe (see the dedicated section),
  • zero-capacity channels (a.k.a. rendez-vous channels) are not supported.

Safety

Despite the focus on performance, implementation quality and correctness are the highest priority.

The library comes with a decent battery of tests; in particular, all low-level (unsafe) concurrency primitives are extensively tested with Loom, complemented by MIRI for integration tests. As amazing as they are, however, Loom and MIRI cannot formally prove the absence of data races, so soundness issues remain possible. You should therefore exercise caution before using it in mission-critical software until it receives more testing in the wild.

Benchmarks

Benchmarks overview

A custom benchmarking suite was implemented that can test a number of popular MPSC and MPMC channels with several executors (Tokio, async-std, smolscale and Asynchronix).

At the moment, it contains 2 benchmarks:

  • pinball: an upgraded version of the classical ping-pong benchmark where messages ("balls") perform a random walk between 13 vertices ("pins") of a fully connected graph; it is parametrized by the total number of balls within the graph,
  • funnel: the most common MPSC benchmark where messages are sent in a tight loop from 13 senders to a unique receiver; it is parametrized by the channel capacity.

Each benchmark executes 61 instances of an elementary bench rig, which ensures that all executor threads are busy at nearly all times. The pinball benchmark is a relatively good proxy for performance in situations where channel receivers are often starved but senders are never blocked (i.e. the channel capacity is always sufficient).

Despite its popularity, the funnel benchmark is less realistic and less objective, as it is sensitive not only to the absolute speed of enqueue, dequeue and notification operations, but also to their relative speed and to other subtle details. Its extrapolation to real-life performance is rather debatable.

More information about these benchmarks can be found in the bench repo.

Benchmark results

Keep in mind that raw speed is not everything: every channel makes design choices and trade-offs (regarding e.g. unsafety, fairness, mpmc support, ...) which can have a significant impact on performance. Be sure to read the sections about limitations and safety.

The benchmarks were run on EC2 instances of comparable performance but different micro-architectures (Intel Ice Lake, AMD Zen 3, ARM Graviton 2). The reported performance is the mean number of messages per microsecond after averaging over 10 benchmark runs (higher is better).

The reported results were obtained with Tokio, which in practice was found significantly faster than either async-std or smolscale. Asynchronix is faster yet, but less relevant as a baseline as it is not meant for general-purpose async programming.

EC2 c6i.2xlarge

[benchmark results chart]

EC2 c6a.2xlarge

[benchmark results chart]

EC2 c6g.2xlarge

[benchmark results chart]

License

This software is licensed under the Apache License, Version 2.0 or the MIT license, at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Intrusive linked lists should be stored in a pinned location

    By storing the Notifier inside the sender instead of in the send future, it is possible to exhibit a use-after-free in the following manner. Note that this requires use of mem::forget.

    // Fails with use-after-free on miri.
    use futures_executor::block_on;
    use std::future::{poll_fn, Future};
    use std::task::Poll;

    fn main() {
        let (s, mut r) = tachyonix::channel(1);
        let mut s_box = Box::new(s);
        block_on(s_box.send(10i32)).unwrap();

        let mut send_fut = Box::pin(s_box.send(10i32));
        assert!(block_on(poll_fn(|cx| Poll::Ready(send_fut.as_mut().poll(cx)))).is_pending());
        std::mem::forget(send_fut);
        drop(s_box);

        assert!(r.try_recv().is_ok());
    }


    I believe that making the notifier a local variable in the send future should solve this issue. In that case, the memory containing the forgotten send_fut remains valid since it is leaked. This is how Tokio implements the same thing.

    bug · opened by Darksonn
  • Always allocate notifiers for blocked senders

    Previous versions of tachyonix would cache the notifier and its associated waker in the sender. While in theory this spares both an allocation and the cloning of the waker, it appears to prevent some optimizations and actually results in worse performance across nearly all benchmarks and benchmark parameters.

    So this is a "worse" version which appears to perform better and makes it possible to have the sender take self by shared reference rather than by mutable reference.

    Also, this commit removes the UnsafeCell around the waker, which was there only to enable tracking through loom but which turned out to have a non-negligible impact on performance.

    opened by sbarral
  • Fix notifier list

    So this is my 3rd rewrite of a fix for issue #1. Hopefully 3rd time is the charm...

    What I tried

    First attempt

    I initially tried the approach which I suggested at the bottom of the issue: box the notifiers and make Sender ensure that the notifier is no longer in the list before it is deallocated. While I believe this was a valid approach, it made the WaitUntil future an unsafe leaky abstraction and made it necessary to mark pretty much everything around it as unsafe, making the code ugly and brittle.

    Second attempt

    I then decided to take advantage of structural pinning in the WaitUntil future and store the notifier inline, which is I guess the approach hinted at by @Darksonn. It looked great since no allocation was needed, and I even figured I could still cache the waker with an approach similar to what I use in this PR. The problem is the cognitive load necessary to make this sound and keep it sound as the code evolves. A good reading on this topic is Sabrina Jewson's pinned-aliasable crate and her ultimate tour-de-force, the pin-list crate.

    Third attempt

    After this sobering dive into pin-based intrusive lists, I decided to keep things simple and use the tried-and-true approach to intrusive lists, using boxed notifiers but this time passing them by value so that the WaitUntil future owns the boxed notifier until it is removed from the intrusive list (or forever, if forgotten). On future completion, the boxed notifier is returned by value to the sender to be cached with its waker, so the notifier is only ever boxed once unless a previous send was not polled to completion.

    What now?

    I will leave this PR sit here a couple of days in case anyone was willing to review it (many thanks in advance if anyone does!), and to give myself a little more time to think about it.

    The only interesting files to review are lib.rs and event.rs (second commit). The other changes only relate to adding new tests that would catch this issue. They made the diff bigger than I'd like because I had to segregate tests that can legitimately leak memory and must run with -Zmiri-ignore-leaks.

    What after

    After this is merged, I will proceed with a small commit adding inlining hints, because for some reason this fix throws off rustc a little and impacts performance even though it shouldn't, but I have identified the culprit and this can be easily repaired.

    opened by sbarral