Rust library for concurrent data access, using memory-mapped files, zero-copy deserialization, and wait-free synchronization.

mmap-sync

mmap-sync is a Rust crate for high-performance, concurrent data access between a single writer process and multiple reader processes. It combines memory-mapped files, wait-free synchronization, and zero-copy deserialization. We use mmap-sync for large-scale machine learning, as detailed in our blog post: "Every Request, Every Microsecond: Scalable machine learning at Cloudflare".

Overview

At the core of mmap-sync is a Synchronizer structure that offers a simple interface with "write" and "read" methods, allowing users to read and write any Rust struct (T) that implements or derives certain rkyv traits.

impl Synchronizer {
    /// Write a given `entity` into the next available memory mapped file.
    pub fn write<T>(&mut self, entity: &T, grace_duration: Duration) -> Result<(usize, bool), SynchronizerError> {}

    /// Reads and returns `entity` struct from mapped memory wrapped in `ReadResult`
    pub fn read<T>(&mut self) -> Result<ReadResult<T>, SynchronizerError> {}
}

Data is stored in shared mapped memory, so the Synchronizer in the writer process can write to it while Synchronizers in reader processes read from it concurrently. This makes mmap-sync an efficient and flexible tool for managing shared, concurrent data access.

Mapped Memory

The use of memory-mapped files offers several advantages over other inter-process communication (IPC) mechanisms. It allows different processes to access the same memory space, bypassing the need for costly serialization and deserialization. This design allows mmap-sync to provide extremely fast, low-overhead data sharing between processes.

Wait-free Synchronization

Our wait-free data access pattern draws inspiration from the Linux kernel's Read-Copy-Update (RCU) pattern and the Left-Right concurrency control technique. We maintain two copies of the data in separate memory-mapped files. A single writer manages write access to this data, while multiple readers can access it concurrently.
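
The two-copy idea can be modeled inside a single process with standard-library atomics. The sketch below is not the crate's implementation: `TwoCopy`, its fields, and its methods are hypothetical names, a single `u64` stands in for each mapped data file, and the meaning given to the returned flag (a stale reader counter was reset once the grace duration expired) is an assumption based on the `reset: false` field in the example output. It also ignores the window between a reader loading the active index and registering itself, which a real implementation has to account for.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical in-process model: two data slots, one reader counter per
/// slot, and the index of the slot that readers should currently use.
struct TwoCopy {
    slots: [AtomicU64; 2],
    readers: [AtomicU32; 2],
    active: AtomicUsize,
}

impl TwoCopy {
    fn new(initial: u64) -> Self {
        TwoCopy {
            slots: [AtomicU64::new(initial), AtomicU64::new(initial)],
            readers: [AtomicU32::new(0), AtomicU32::new(0)],
            active: AtomicUsize::new(0),
        }
    }

    /// Reader side: a fixed number of atomic operations, never waits on the writer.
    fn read(&self) -> u64 {
        let idx = self.active.load(Ordering::Acquire);
        self.readers[idx].fetch_add(1, Ordering::AcqRel);
        let value = self.slots[idx].load(Ordering::Acquire);
        self.readers[idx].fetch_sub(1, Ordering::AcqRel);
        value
    }

    /// Writer side: fill the inactive slot, then flip the active index.
    /// Returns true if readers were still registered on the inactive slot
    /// when `grace` ran out and their counter had to be reset (assumed semantics).
    fn write(&self, value: u64, grace: Duration) -> bool {
        let next = 1 - self.active.load(Ordering::Acquire);
        let deadline = Instant::now() + grace;
        let mut reset = false;
        while self.readers[next].load(Ordering::Acquire) > 0 {
            if Instant::now() >= deadline {
                self.readers[next].store(0, Ordering::Release);
                reset = true;
                break;
            }
            thread::yield_now();
        }
        self.slots[next].store(value, Ordering::Release);
        self.active.store(next, Ordering::Release);
        reset
    }
}

fn main() {
    let shared = TwoCopy::new(0);
    let reset = shared.write(7, Duration::from_millis(10));
    println!("value: {} | reset: {}", shared.read(), reset);
}
```

Only the writer ever waits in this model; a reader that crashed while registered delays the writer by at most the grace duration instead of blocking it forever.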

We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state". This file contains an atomic 64-bit integer, which represents an InstanceVersion, and a pair of atomic 32-bit variables that track the number of active readers for each data copy. The InstanceVersion consists of the index of the currently active data file (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits).
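
As a rough illustration of how three fields of 1, 39, and 24 bits can share one 64-bit integer, here is a minimal sketch. The bit order (index in the lowest bit, then size, then checksum) and the names `pack`, `unpack`, and `State` are assumptions made for this example, not taken from the crate. The struct at the end shows why a state made of one 64-bit and two 32-bit atomics occupies 16 bytes, which matches the size of /tmp/hello_world_state in the example output.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicU32, AtomicU64};

const SIZE_BITS: u32 = 39;
const CHECKSUM_BITS: u32 = 24;
const SIZE_MASK: u64 = (1 << SIZE_BITS) - 1;
const CHECKSUM_MASK: u64 = (1 << CHECKSUM_BITS) - 1;

/// Packs (index, size, checksum) into one u64.
/// Assumed layout: bit 0 = index, bits 1..=39 = size, bits 40..=63 = checksum.
/// Returns None if any field does not fit in its bit range.
fn pack(idx: u64, size: u64, checksum: u64) -> Option<u64> {
    if idx > 1 || size > SIZE_MASK || checksum > CHECKSUM_MASK {
        return None;
    }
    Some(idx | (size << 1) | (checksum << (1 + SIZE_BITS)))
}

/// Splits a packed value back into (index, size, checksum).
fn unpack(v: u64) -> (u64, u64, u64) {
    (
        v & 1,
        (v >> 1) & SIZE_MASK,
        (v >> (1 + SIZE_BITS)) & CHECKSUM_MASK,
    )
}

/// Hypothetical in-memory shape of the state file described above.
#[allow(dead_code)]
#[repr(C)]
struct State {
    version: AtomicU64,
    readers: [AtomicU32; 2],
}

fn main() {
    let packed = pack(1, 36, 0xABCDEF).expect("fields fit their bit ranges");
    println!("packed: {:#018x} -> {:?}", packed, unpack(packed));
    println!("largest representable size: {} bytes", SIZE_MASK);
    println!("state size: {} bytes", size_of::<State>());
}
```

Keeping index, size, and checksum in a single word means a reader can pick up all three with one atomic load, so it never observes an index from one write paired with a size from another.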

Zero-copy Deserialization

To store and fetch data efficiently, mmap-sync uses zero-copy deserialization via the rkyv library, which references bytes directly in their serialized form. This significantly reduces the time and memory required to access and use data. The generic type T used with Synchronizer can be any Rust struct implementing the required rkyv traits.
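
To make "directly referencing bytes in the serialized form" concrete, the following sketch uses a hand-rolled layout rather than rkyv, whose archive format and traits are different. `MessageView`, `encode`, `view`, and the layout itself (a little-endian `u32` version, a little-endian `u32` length, then UTF-8 bytes) are all hypothetical. The point is that `view` validates the buffer and hands back a borrowed `&str` that points into it, instead of allocating a new `String`.

```rust
use std::convert::TryInto;
use std::str;

/// Borrowed view over a serialized message; `text` points into the buffer.
struct MessageView<'a> {
    version: u32,
    text: &'a str,
}

/// Serializes into the hypothetical layout:
/// [u32 LE version][u32 LE text length][text bytes].
fn encode(version: u32, text: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + text.len());
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&(text.len() as u32).to_le_bytes());
    out.extend_from_slice(text.as_bytes());
    out
}

/// Validates the buffer and returns a view that borrows from it.
/// No payload bytes are copied; returns None on truncated or invalid input.
fn view<'a>(bytes: &'a [u8]) -> Option<MessageView<'a>> {
    let version = u32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?);
    let len = u32::from_le_bytes(bytes.get(4..8)?.try_into().ok()?) as usize;
    let end = 8usize.checked_add(len)?;
    let text = str::from_utf8(bytes.get(8..end)?).ok()?;
    Some(MessageView { version, text })
}

fn main() {
    let buf = encode(7, "Hello");
    let msg = view(&buf).expect("buffer was produced by encode");
    // The view's text starts at the same address as the payload in `buf`.
    let same_memory = msg.text.as_ptr() == buf[8..].as_ptr();
    println!("version: {} text: {:?} zero-copy: {}", msg.version, msg.text, same_memory);
}
```

With a memory-mapped file as the buffer, the same approach lets a reader use data in place as soon as it is mapped and validated.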

Getting Started

To use mmap-sync, add it to your Cargo.toml under [dependencies]:

[dependencies]
mmap-sync = "1.0.0"

Then, import mmap-sync in your Rust program:

use mmap_sync::synchronizer::Synchronizer;

Check out the provided examples for detailed usage:

  • Writer process example: This example demonstrates how to define a Rust struct and write it into shared memory using mmap-sync.
  • Reader process example: This example shows how to read data written into shared memory by a writer process.

These examples share a common module that defines the data structure being written and read.

To run these examples, follow these steps:

  1. Open a terminal and navigate to your project directory.
  2. Execute the writer example with the command cargo run --example writer.
  3. In the same way, run the reader example using cargo run --example reader.

Upon successful execution of these examples, the terminal output should resemble:

# Writer example
written: 36 bytes | reset: false
# Reader example
version: 7 messages: ["Hello", "World", "!"]

Moreover, the following files will be created:

$ stat -c '%A %s %n' /tmp/hello_world_*
-rw-r----- 36 /tmp/hello_world_data_0
-rw-r----- 36 /tmp/hello_world_data_1
-rw-rw---- 16 /tmp/hello_world_state

With these steps, you can start utilizing mmap-sync in your Rust applications for efficient concurrent data access across processes.
