A high-performance SPSC bounded circular buffer of bytes

Related tags

Utilities cueue
Overview

Cueue

A high performance, single-producer, single-consumer, bounded circular buffer of contiguous elements, that supports lock-free atomic batch operations, suitable for inter-thread communication.

Example

fn main() {
    let (mut w, mut r) = cueue::cueue(1 << 20).unwrap();

    let buf = w.write_chunk();
    assert!(buf.len() >= 9);
    buf[..9].copy_from_slice(b"foobarbaz");
    w.commit(9);

    let read_result = r.read_chunk();
    assert_eq!(read_result, b"foobarbaz");
    r.commit();
}

A bounded cueue of requested capacity is referenced by a single Writer and a single Reader. The Writer can request space to write (write_chunk), limited by the queue capacity minus the already committed but unread space. Requested space can written to, then committed (end_write). A special feature of this container is that stored elements are always initialized, (in the beginning, defaulted, therefore T must implement Default), and only dropped when the queue is dropped. Therefore, the writer can reuse previously written, but then consumed elements (useful if the elements own e.g: heap allocated memory), and in those cases, contention on the producers heap lock is avoided (that is otherwise present, if the consumer drops heap allocated elements continuously that the producer allocated).

The Reader can check out the written elements (read_chunk), process it at will, then mark it as consumed (commit). The returned slice of elements might be a result of multiple writer commits (i.e: the reading is batched), but it never include uncommitted elements (i.e: write commits are atomic). This prevents the reader observing partial messages.

Use-case

This data structure is designed to allow one thread (actor) sending variable-sized messages (bytes) to a different thread (actor), that processes the messages in batches (e.g: writes them to a file, sends them over the network, etc.). For example, asynchronous logging.

Alternative options:

  • Use a standard channel of Strings (or Vec<u8>). This is slow, because strings require memory allocations, and with one thread allocating and the other deallocating, quickly yields to contention on the heap lock.

  • Use a standard channel of fixed size arrays. Works, but bounds the size of the messages and wastes memory.

  • Use two ringbuffers of Vec<u8> containers (one for sending data, one for reusing the consumed vectors). Does not allow efficient reading (separate messages are not contiguous). Requires to estimate the max number of messages in flight, instead of the max sum of size of messages.

This data structure uses a single array of the user specified capacity. At any given time, this array is sliced into three logical parts: allocated for writing, ready for reading, unwritten. (Any maximum two of the three can be zero sized)

write_chunk joins the unwritten part to the part already allocated for writing: the result is limited by the capacity minus the space ready for reading. Writer::commit makes the written space ready for reading, zeroing the slice allocated for writing. read_chunk determines the boundary of the space ready for reading, Reader::commit marks this space unwritten. Thanks for the truly circular nature of cueue, the writer and reader can freely chase each other around.

How Does it Work

The cueue constructor creates a memory area, and maps it into virtual memory twice, the two maps next to each other. This means that for the resulting map of capacity cap, map[0] and map[cap], refers to the same byte. (In general, map[N] and map[cap+N] are the same for every 0 <= N < cap indices)

With this double map, there's no need to wrap around, this maximises the useful capacity of the queue during any point of usage, and simplifies the indexing logic of the code. Synchronization between writer and reader is done by atomic operations, there are no mutexes or lock ASM instruction prefixes (on the tested platforms: x86 and M1).

(Not shown here, but this structure also allows inter-process communication using shared memory, and data recovery from coredumps)

Limitations

  • Supported platforms: Linux (3.17) and macOS
  • rust 1.63
  • Uses unsafe operations

Build and Test

$ cargo build
$ cargo test
$ cargo run --example basics
$ cargo fmt
$ cargo clippy
$ cargo bench
$ cargo doc --open

Acknowledgments

This is a rust port of the binlog C++ cueue. The interface names are changed to match rtrb, to make it more familiar for Rust developers.

You might also like...
Membrane is an opinionated crate that generates a Dart package from a Rust library. Extremely fast performance with strict typing and zero copy returns over the FFI boundary via bincode.

Membrane is an opinionated crate that generates a Dart package from a Rust library. Extremely fast performance with strict typing and zero copy returns over the FFI boundary via bincode.

A Diablo II library for core and simple client functionality, written in Rust for performance, safety and re-usability

A Diablo II library for core and simple client functionality, written in Rust for performance, safety and re-usability

syncmap is a fast, concurrent cache library built with a focus on performance and correctness.

syncmap syncmap syncmap is a fast, concurrent cache library syncmap is a fast, concurrent cache library built with a focus on performance and correctn

Let Tauri's transparent background rendering window be stacked on Bevy's rendering window in real time to run the game with native-level performance!

Native Bevy with Tauri HUD DEMO 将 Tauri 的透明背景渲染窗口实时叠在 Bevy 的渲染窗口上,以使用原生级别性能运行游戏! Let Tauri's transparent background rendering window be stacked on Bev

A high level diffing library for rust based on diffs
A high level diffing library for rust based on diffs

Similar: A Diffing Library Similar is a dependency free crate for Rust that implements different diffing algorithms and high level interfaces for it.

High level rust abstractions for the libretro API

libretro-rs Design Philosophy The approach to this crate can best be summarized as wanting to expose all functionality, even if not idiomatically. The

hy-rs, pronounced high rise, provides a unified and portable to the hypervisor APIs provided by various platforms.

Introduction The hy-rs crate, pronounced as high rise, provides a unified and portable interface to the hypervisor APIs provided by various platforms.

High-level documentation for rerun

rerun-docs This is the high-level documentation for rerun that is hosted at https://www.rerun.io/docs Other documentation API-level documentation for

A code generator to reduce repetitive tasks and build high-quality Rust libraries. 🦀

LibMake A code generator to reduce repetitive tasks and build high-quality Rust libraries Welcome to libmake 👋 Website • Documentation • Report Bug •

Comments
  • Change write_pos()/read_pos() from &mut to &

    Change write_pos()/read_pos() from &mut to &

    Previously, this allowed two threads to have a &mut to the same thing at the same time, which isn't really Rust's thing.

    I'm not sure if it's actually UB, because it was only used in a non-mutable fashion, but since it doesn't have to be mutable, why make it mutable?

    I wanted to test this with Miri, but sadly this doesn't work because libc::memfd_create() is not supported by Miri.

    opened by mgeier 2
  • Make is_abandoned non-mut

    Make is_abandoned non-mut

    I don't think is_abandoned() has to take &mut self, nor should it.

    That's how I'm doing it in rtrb: https://github.com/mgeier/rtrb/blob/7390730529fc474691846480de3c4ea59d4688fb/src/lib.rs#L429-L431

    opened by mgeier 1
  • Get rid of lifetime specifiers

    Get rid of lifetime specifiers

    • Release 0.1.1
    • Make Cueue generic over the contained elements
    • Use is_abandoned in bench
    • Add Writer push method
    • Get rid of lifetime specifiers from Writer/Reader
    opened by erenon 0
Owner
Thaler Benedek
Thaler Benedek
Stack buffer provides alternatives to Buf{Reader,Writer} allocated on the stack instead of the heap.

StackBuf{Reader,Writer} Stack buffer provides alternatives to BufReader and BufWriter allocated on the stack instead of the heap. Its implementation i

Alex Saveau 14 Nov 20, 2022
Simple async codec for rkyv. Reuses streaming buffer for maximum speed

rkyv_codec Simple async codec for rkyv. Reuses streaming buffer for maximum speed! This crate provides a makeshift adaptor for streaming &Archived<Obj

Zyansheep 19 Jun 14, 2022
A memory efficient immutable string type that can store up to 24* bytes on the stack

compact_str A memory efficient immutable string type that can store up to 24* bytes on the stack. * 12 bytes for 32-bit architectures About A CompactS

Parker Timmerman 342 Jan 2, 2023
Rc version `tokio-rs/bytes`

RcBytes The aim for this crate is to implement a Rc version bytes, which means that the structs in this crate does not implement the Sync and Send. Th

Al Liu 2 Aug 1, 2022
LinkedBytes is a linked list of Bytes and BytesMut.

LinkedBytes LinkedBytes is a linked list of Bytes and BytesMut (though we use VecDeque to implement it now). It is primarily used to manage Bytes and

Volo 5 Dec 9, 2022
Monorep for fnRPC (high performance serverless rpc framework)

fnrpc Monorep for fnRPC (high performance serverless rpc framework) cli Cli tool help build and manage functions Create RPC functions Create & Manage

Faasly 3 Dec 21, 2022
High-performance BitTorrent tracker compatible with UNIT3D tracker software

UNIT3D-Announce High-performance backend BitTorrent tracker compatible with UNIT3D tracker software. Usage # Clone this repository $ git clone https:/

HDInnovations 4 Feb 6, 2023
High-performance QEMU memory and instruction tracing

Cannoli Cannoli is a high-performance tracing engine for qemu-user. It can record a trace of both PCs executed, as well as memory operations. It consi

Margin Research 412 Oct 18, 2023
A high-performance Lambda authorizer for API Gateway that can validate OIDC tokens

oidc-authorizer A high-performance token-based API Gateway authorizer Lambda that can validate OIDC-issued JWT tokens. ?? Use case This project provid

Luciano Mammino 4 Oct 30, 2023
High-performance, Reliable ChatGLM SDK natural language processing in Rust-Lang

RustGLM for ChatGLM Rust SDK - 中文文档 High-performance, high-quality Experience and Reliable ChatGLM SDK natural language processing in Rust-Language 1.

Blueokanna 3 Feb 29, 2024