Lazy Binary Serialization

Overview

LBS

The library name stands for Lazy Binary Serialization. We call it "lazy" because it does not serialize/deserialize struct fields that hold default values such as 0, 0.0, "", Option::None, empty containers and so on. For large, sparsely populated structures, this simple technique makes LBS considerably faster than the other binary serialization libraries we measured (see the benchmark results).

LBS emerged from a high-load project that demands very cheap deserialization of large structures (about 160 fields) in which only some fields are explicitly filled and the rest are initialized with default values.

Why no serde?

Serde is a great and convenient framework, but, being an additional layer between the user's code and the serialization library's code, it incurs a substantial performance penalty. Moreover, serde 1.0.125 still does not allow numeric values to be used as field names (or identifiers), which is another performance penalty if we only want to serialize some fields and omit others.

Why not just use other libraries without serde?

Indeed, there are some binary serialization libraries out there that do not require serde. Good examples are speedy and borsh, but, unfortunately, they do not support conditional serialization: every field must be serialized/deserialized. There is also msgpack-rust (rmp), which does support conditional serialization by encoding structs as maps, but it is still not fast enough for our use case.

Format specification

LBS uses a very simple format that is not self-describing. Byte order is always little-endian.

Every struct field or enum variant is assigned a numeric ID (u16), either implicitly, using the field/variant index, or explicitly, with the lbs attribute.

Struct fields with default values are omitted during serialization/deserialization. A field may be explicitly omitted with the lbs(omit) attribute.

During struct deserialization each field is initialized with the default() method of its type. A custom constructor may be explicitly defined with the lbs_default attribute.
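
The rules above can be sketched by hand for a small struct. This is a minimal illustration written against the standard library only, not the code the derive macros generate: `Sample`, `write_sample` and `read_sample` are hypothetical names, and the layout (a u8 field count followed by u8 field IDs, as in the struct row of the type table) is an assumption about the wire format.

```rust
use std::io::{self, Read, Write};

// Hypothetical struct; field IDs 0, 1, 2 follow the member index.
#[derive(Debug, Default, PartialEq)]
struct Sample {
    id: u32,
    done: bool,
    name: String,
}

// Writes only the fields that differ from their default value.
fn write_sample<W: Write>(s: &Sample, w: &mut W) -> io::Result<()> {
    let count = (s.id != 0) as u8 + s.done as u8 + (!s.name.is_empty()) as u8;
    w.write_all(&[count])?;
    if s.id != 0 {
        w.write_all(&[0])?; // field ID
        w.write_all(&s.id.to_le_bytes())?; // little-endian value
    }
    if s.done {
        w.write_all(&[1, 1])?; // field ID, then bool as u8
    }
    if !s.name.is_empty() {
        w.write_all(&[2])?; // field ID
        w.write_all(&(s.name.len() as u32).to_le_bytes())?; // length (u32)
        w.write_all(s.name.as_bytes())?; // content
    }
    Ok(())
}

// Starts from Default and overwrites only the fields present in the input.
fn read_sample<R: Read>(r: &mut R) -> io::Result<Sample> {
    let mut s = Sample::default();
    let mut byte = [0u8; 1];
    r.read_exact(&mut byte)?;
    let count = byte[0];
    for _ in 0..count {
        r.read_exact(&mut byte)?;
        let field_id = byte[0];
        match field_id {
            0 => {
                let mut b = [0u8; 4];
                r.read_exact(&mut b)?;
                s.id = u32::from_le_bytes(b);
            }
            1 => {
                r.read_exact(&mut byte)?;
                s.done = byte[0] != 0;
            }
            2 => {
                let mut len = [0u8; 4];
                r.read_exact(&mut len)?;
                let mut buf = vec![0u8; u32::from_le_bytes(len) as usize];
                r.read_exact(&mut buf)?;
                s.name = String::from_utf8(buf)
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            }
            other => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("unknown field ID {}", other),
                ))
            }
        }
    }
    Ok(s)
}
```

In this sketch a value whose fields are all default serializes to a single zero byte, which is where the savings for sparsely filled structures come from.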

| Type | Omitted if | Representation |
|---|---|---|
| () | always | void |
| u{8-128}, i{8-128}, f{32-64}, usize | 0 | as is |
| bool | false | u8 |
| char | '\0' | u32 |
| String / str | is_empty() | length (u32) + content ([u8]) |
| std::time::Duration | as_nanos() == 0 | secs (u64) + subsec_nanos (u32) |
| std::time::SystemTime | never | as Duration since UNIX_EPOCH |
| std::net::Ipv4Addr | is_unspecified() | u32 |
| std::net::Ipv6Addr | is_unspecified() | u128 |
| std::net::IpAddr | is_unspecified() | is v4 (u8) + u32 or u128 |
| std::ops::Range<T> | is_empty() | start (T) + end (T) |
| Option<T> | is_none() | 0u8 or (1u8, T) |
| Box<T> / Rc<T> / Arc<T> / Cow<'a, T> | T is omitted | T |
| Vec<T> | is_empty() | length (u32) + content ([T]) |
| HashMap<K, V> / BTreeMap<K, V> | is_empty() | length (u32) + content ([(K, V)]) |
| HashSet<T> / BTreeSet<T> | is_empty() | length (u32) + content ([T]) |
| chrono::DateTime (feature chrono) | never | secs (i64) + subsec_nanos (u32) |
| smallvec::SmallVec<T> (feature smallvec) | is_empty() | length (u32) + content ([T]) |
| ipnet::IpNet | is_unspecified() | is v4 (u8) + u32 or u128 |
| struct | never | field_count (u8) + field IDs and values ([(u8, T)]) |
| enum | never | variant ID (u8) + optional value (T) |
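
To make a few rows of the table concrete, here is a small sketch of how such values could be laid out byte by byte. It only follows the representations listed above and assumes little-endian encoding throughout; the helper names (`put_str`, `put_duration`, `put_opt_u32`, `put_vec_u16`) are hypothetical and are not part of the LBS API.

```rust
use std::time::Duration;

// String / str: length (u32) + content ([u8]).
fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

// Duration: secs (u64) + subsec_nanos (u32).
fn put_duration(out: &mut Vec<u8>, d: Duration) {
    out.extend_from_slice(&d.as_secs().to_le_bytes());
    out.extend_from_slice(&d.subsec_nanos().to_le_bytes());
}

// Option<T>: 0u8 for None, or 1u8 followed by the value.
fn put_opt_u32(out: &mut Vec<u8>, v: Option<u32>) {
    match v {
        None => out.push(0),
        Some(x) => {
            out.push(1);
            out.extend_from_slice(&x.to_le_bytes());
        }
    }
}

// Vec<T>: length (u32) + each element in order.
fn put_vec_u16(out: &mut Vec<u8>, items: &[u16]) {
    out.extend_from_slice(&(items.len() as u32).to_le_bytes());
    for item in items {
        out.extend_from_slice(&item.to_le_bytes());
    }
}
```

For example, under these assumptions the string "hi" would occupy six bytes: the length 2 as a little-endian u32, followed by the two content bytes.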

Third-party types coverage

Obviously, it is impossible for a library author to cover all possible types. Currently, the only way to enable LBS for an unsupported third-party type is to wrap it using the newtype pattern and implement the LBSWrite and LBSRead traits for the wrapper manually.
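
The newtype pattern itself can be sketched as follows. Since the exact method signatures of LBSWrite and LBSRead are not spelled out here, the example uses two stand-in traits, `WireWrite` and `WireRead`, whose shapes are assumptions modeled on the `lbs_write`/`lbs_read` calls in the usage section. The `third_party::Rgb` type and the `RgbWire` wrapper are likewise hypothetical.

```rust
use std::io::{self, Read, Write};

// Stand-in traits (assumed shapes, NOT the real LBSWrite/LBSRead definitions).
trait WireWrite {
    fn wire_write<W: Write>(&self, w: &mut W) -> io::Result<()>;
}

trait WireRead: Sized {
    fn wire_read<R: Read>(r: &mut R) -> io::Result<Self>;
}

// Pretend this module is a foreign crate whose type we cannot modify.
mod third_party {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Rgb {
        pub r: u8,
        pub g: u8,
        pub b: u8,
    }
}

// The newtype: a local wrapper we are allowed to implement traits for.
#[derive(Debug, PartialEq)]
struct RgbWire(third_party::Rgb);

impl WireWrite for RgbWire {
    fn wire_write<W: Write>(&self, w: &mut W) -> io::Result<()> {
        // Three raw bytes, one per channel.
        w.write_all(&[self.0.r, self.0.g, self.0.b])
    }
}

impl WireRead for RgbWire {
    fn wire_read<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut b = [0u8; 3];
        r.read_exact(&mut b)?;
        Ok(RgbWire(third_party::Rgb { r: b[0], g: b[1], b: b[2] }))
    }
}
```

With the real crate, the wrapper would implement LBSWrite and LBSRead instead of the stand-in traits, and the wrapper type would then be used as the field type in the derived struct.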

Safety

No unsafe code is used.

Status

Format or API changes may be introduced until v1.0.0.

Minimal Rust Version

1.52.1

Benchmark results

Disclaimer. Never trust third-party benchmarks. Always make your own measurements using your specific data/workload/configuration/hardware.

Hardware: 2.6 GHz 6-Core Intel Core i7 (12 vCPU), 16 GB 2667 MHz DDR4

OS: macOS Big Sur

Allocator: mimalloc, without encryption

Data: struct with 160 fields, mostly strings. Only 20 fields are initialized with non-default values. The other fields (with default values) are omitted by serde_json, rmp_serde and LBS (speedy does not allow this).

| Library | Serialization | Deserialization | Size when serialized |
|---|---|---|---|
| serde_json | 823.87 ns | 2.9540 us | 606 bytes |
| rmp_serde | 550.54 ns | 2.3190 us | 522 bytes |
| speedy | 620.72 ns | 1.2825 us | 944 bytes |
| LBS | 242.62 ns | 683.75 ns | 307 bytes |
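
For readers who want to take their own measurements, as the disclaimer suggests, a very rough timing helper might look like the sketch below. It is not the harness used for the numbers above (that harness is not described here); `mean_time` and `encode_demo` are hypothetical names, and a dedicated benchmarking framework will give more reliable results than this kind of wall-clock loop.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Runs `f` for a warm-up pass and then `iters` timed iterations,
// returning the mean wall-clock time per timed iteration.
fn mean_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "iters must be positive");
    // Warm-up: at most 1000 untimed calls.
    for _ in 0..iters.min(1000) {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

// Example workload: encode a short string as length (u32) + content.
fn encode_demo() -> Duration {
    let payload = String::from("test");
    mean_time(100_000, || {
        let mut buf = Vec::with_capacity(16);
        buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        buf.extend_from_slice(payload.as_bytes());
        // Keep the optimizer from discarding the work.
        black_box(&buf);
    })
}
```

To compare libraries, the closure would be swapped for the serialization call under test, using data that resembles the real workload.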

Usage

The LBSWrite and LBSRead traits can be derived for structs and enums.

use bytes::{Buf, BufMut, Bytes, BytesMut};
use lbs::{LBSRead, LBSWrite};
use std::{
    borrow::Cow,
    collections::{BTreeMap, BTreeSet, HashMap, HashSet},
    net::{IpAddr, Ipv4Addr, Ipv6Addr},
    ops::Range,
    rc::Rc,
    str::FromStr,
    sync::Arc,
    time::{Duration, SystemTime},
};

// IDs for most fields are assigned explicitly, using #[lbs(<id>)] attribute.
// Other fields receive implicit IDs (member index).
#[derive(LBSWrite, LBSRead)]
struct SomeStruct<'a> {
    #[lbs(0)]
    f0: u8,
    #[lbs(1)]
    f1: u16,
    #[lbs(2)]
    f2: u32,
    #[lbs(3)]
    f3: u64,
    #[lbs(4)]
    f4: usize,
    #[lbs(5)]
    f5: u128,
    #[lbs(6)]
    f6: i8,
    #[lbs(7)]
    f7: i16,
    #[lbs(8)]
    f8: i32,
    #[lbs(9)]
    f9: i64,
    #[lbs(10)]
    f10: f32,
    #[lbs(11)]
    f11: f64,
    #[lbs(12)]
    f12: (),
    #[lbs(13)]
    f13: bool,
    #[lbs(14)]
    f14: char,
    #[lbs(15)]
    f15: String,
    #[lbs(16)]
    f16: Duration,
    #[lbs(17)]
    #[lbs_default(SystemTime::now())]
    f17: SystemTime,
    #[lbs(18)]
    #[lbs_default(Ipv4Addr::UNSPECIFIED)]
    f18: Ipv4Addr,
    #[lbs(19)]
    #[lbs_default(Ipv6Addr::UNSPECIFIED)]
    f19: Ipv6Addr,
    #[lbs(20)]
    #[lbs_default(IpAddr::V4(Ipv4Addr::UNSPECIFIED))]
    f20: IpAddr,
    #[lbs(21)]
    #[lbs_default(Range{start:0, end:0})]
    f21: Range<u64>,
    #[lbs(22)]
    f22: Box<Vec<u64>>,
    #[lbs(23)]
    f23: Rc<String>,
    #[lbs(24)]
    f24: Arc<String>,
    #[lbs(25)]
    #[lbs_default(Arc::from(""))]
    f25: Arc<str>,
    #[lbs(26)]
    f26: Cow<'a, str>,
    #[lbs(27)]
    f27: Option<SystemTime>,
    #[lbs(28)]
    f28: Vec<String>,
    #[lbs(29)]
    f29: HashMap<String, u64>,
    #[lbs(30)]
    f30: BTreeMap<u64, String>,
    #[lbs(31)]
    f31: HashSet<String>,
    #[lbs(32)]
    f32: BTreeSet<u64>,
    #[lbs_default(chrono::Utc::now())]
    f33: chrono::DateTime<chrono::Utc>,
    f34: smallvec::SmallVec<[i64; 4]>,
    f35: AnotherStruct,
    #[lbs_default(SomeEnum::One)]
    f36: SomeEnum,
    #[lbs(omit)]
    f37: bool,
    f38: ipnet::
}

// Field IDs are assigned implicitly, using their index
#[derive(LBSWrite, LBSRead, Default)]
struct AnotherStruct {
    id: String,
    done: bool,
}

// Variant IDs are assigned implicitly, using their index
#[derive(LBSWrite, LBSRead)]
enum SomeEnum {
    One,
    Two,
    Three(String),
}

impl Default for SomeEnum {
    fn default() -> Self {
        SomeEnum::One
    }
}

#[test]
fn usage() {
    let mut original = SomeStruct {
        f0: 0,
        f1: 1,
        f2: 2,
        f3: 3,
        f4: 4,
        f5: 5,
        f6: 0,
        f7: -1,
        f8: -2,
        f9: -3,
        f10: 0.0,
        f11: -3.14,
        f12: (),
        f13: true,
        f14: 'a',
        f15: String::from("test"),
        f16: Duration::from_millis(1000),
        f17: SystemTime::now(),
        f18: Ipv4Addr::new(192, 168, 1, 2),
        f19: Ipv6Addr::from_str("2001:0db8:85a3:0000:0000:8a2e:0370:7334").unwrap(),
        f20: IpAddr::V4(Ipv4Addr::UNSPECIFIED),
        f21: Range { start: 0, end: 1 },
        f22: Box::new(vec![1, 2, 3]),
        f23: Rc::new(String::from("test_rc")),
        f24: Arc::new(String::from("test_arc")),
        f25: Arc::from("test_str_arc"),
        f26: Cow::Owned(String::from("test_cow")),
        f27: None,
        f28: Vec::new(),
        f29: HashMap::new(),
        f30: BTreeMap::new(),
        f31: HashSet::new(),
        f32: BTreeSet::new(),
        f33: chrono::Utc::now(),
        f34: smallvec::smallvec![0, 1],
        f35: AnotherStruct::default(),
        f36: SomeEnum::Three(String::from("test_enum")),
        f37: true,
    };

    original.f29.insert(String::from("key1"), 1);
    original.f29.insert(String::from("key2"), 2);

    original.f30.insert(1, String::from("key1"));
    original.f30.insert(2, String::from("key2"));

    original.f31.insert(String::from("key1"));
    original.f31.insert(String::from("key2"));

    original.f32.insert(1);
    original.f32.insert(2);

    // Serialize
    let mut buf = Vec::with_capacity(128);
    original.lbs_write(&mut buf).unwrap();

    // Deserialize
    let decoded = SomeStruct::lbs_read(&mut buf.as_slice()).unwrap();

    assert_eq!(decoded.f0, original.f0);
    assert_eq!(decoded.f1, original.f1);
    assert_eq!(decoded.f2, original.f2);
    assert_eq!(decoded.f3, original.f3);
    assert_eq!(decoded.f4, original.f4);
    assert_eq!(decoded.f5, original.f5);
    assert_eq!(decoded.f6, original.f6);
    assert_eq!(decoded.f7, original.f7);
    assert_eq!(decoded.f8, original.f8);
    assert_eq!(decoded.f9, original.f9);
    assert_eq!(decoded.f10, original.f10);
    assert_eq!(decoded.f11, original.f11);
    assert_eq!(decoded.f12, original.f12);
    assert_eq!(decoded.f13, original.f13);
    assert_eq!(decoded.f14, original.f14);
    assert_eq!(decoded.f15, original.f15);
    assert_eq!(decoded.f16, original.f16);
    assert_eq!(decoded.f17, original.f17);
    assert_eq!(decoded.f18, original.f18);
    assert_eq!(decoded.f19, original.f19);
    assert_eq!(decoded.f20, original.f20);
    assert_eq!(decoded.f21, original.f21);
    assert_eq!(decoded.f22, original.f22);
    assert_eq!(decoded.f23, original.f23);
    assert_eq!(decoded.f24, original.f24);
    assert_eq!(decoded.f25, original.f25);
    assert_eq!(decoded.f26, original.f26);
    assert_eq!(decoded.f27, original.f27);
    assert_eq!(decoded.f28, original.f28);
    assert_eq!(decoded.f29, original.f29);
    assert_eq!(decoded.f30, original.f30);
    assert_eq!(decoded.f31, original.f31);
    assert_eq!(decoded.f32, original.f32);
    assert_eq!(decoded.f33, original.f33);
    assert_eq!(decoded.f34, original.f34);
    assert_eq!(decoded.f35.id, original.f35.id);
    assert_eq!(decoded.f35.done, original.f35.done);

    if let SomeEnum::Three(s) = decoded.f36 {
        assert_eq!(s, "test_enum")
    } else {
        panic!("not SomeEnum::Three")
    }

    assert_eq!(decoded.f37, false);
}

#[test]
fn usage_batch() {
    let o1 = AnotherStruct {
        id: "1".to_string(),
        done: false,
    };

    let o2 = AnotherStruct {
        id: "2".to_string(),
        done: true,
    };

    // Serialize batch
    let batch = BytesMut::new();
    let mut w = batch.writer();
    o1.lbs_write(&mut w).unwrap();
    o2.lbs_write(&mut w).unwrap();

    // Deserialize batch
    let batch = w.into_inner();
    let mut r = batch.reader();
    let mut decoded = Vec::new();

    while r.get_ref().has_remaining() {
        decoded.push(AnotherStruct::lbs_read(&mut r).unwrap());
    }

    assert_eq!(decoded.len(), 2);
    assert_eq!(decoded[0].id, o1.id);
    assert_eq!(decoded[0].done, o1.done);
    assert_eq!(decoded[1].id, o2.id);
    assert_eq!(decoded[1].done, o2.done);
}

Releases: v0.3.0
Owner: Roman Kuzmin