rust-fr

'rust-fr' (aka rust for real) is a simple, non-self-describing data-interchange format.

installation.

You can use either of these methods.

  • Add via cargo add rust-fr
  • Add via Cargo.toml
[dependencies]
rust-fr = "1"

usage.

use serde::{Serialize, Deserialize};
use rust_fr::{serializer, deserializer};

// define some data
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq)]
struct Human {
    name: String,
    age: u8,
}

fn main() {
    let human = Human {
        name: "Ayush".to_string(),
        age: 19,
    };

    // serialize the data to bytes (Vec<u8>)
    let human_bytes = serializer::to_bytes(&human).unwrap();

    // deserialize the data from the serialized bytes
    let deserialized_human = deserializer::from_bytes::<Human>(&human_bytes).unwrap();

    // the round trip must give back the original value
    assert_eq!(human, deserialized_human);
}

benchmark.

  • Run cargo test -- --nocapture --ignored to run the benchmark tests.
running 3 tests
---- Small Data ----
rust_fr:        218 bytes
serde_json:     332 bytes
rmp_serde:      146 bytes
ciborium:       170 bytes
test tests::length_test_small_data ... ok
---- Medium Data ----
rust_fr:        14264 bytes
serde_json:     30125 bytes
rmp_serde:      10731 bytes
ciborium:       18347 bytes
test tests::length_test_medium_data ... ok
---- Large Data ----
rust_fr:        139214 bytes
serde_json:     367595 bytes
rmp_serde:      157219 bytes
ciborium:       198277 bytes
test tests::length_test_large_data ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 4 filtered out; finished in 0.01s
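
To put these numbers in perspective, the sizes can be expressed relative to serde_json. The following is a small sketch using only the byte counts copied from the output above; `RESULTS`, `percent_of` and `summary` are names made up for this example and are not part of the crate or its test suite.

```rust
/// Byte counts copied from the benchmark output above:
/// (data set, rust_fr, serde_json, rmp_serde, ciborium).
const RESULTS: [(&str, u64, u64, u64, u64); 3] = [
    ("small", 218, 332, 146, 170),
    ("medium", 14_264, 30_125, 10_731, 18_347),
    ("large", 139_214, 367_595, 157_219, 198_277),
];

/// Size of `bytes` as a whole-number percentage of `baseline`, rounded down.
fn percent_of(bytes: u64, baseline: u64) -> u64 {
    bytes * 100 / baseline
}

/// One summary line per data set, with every format relative to serde_json.
fn summary() -> Vec<String> {
    RESULTS
        .iter()
        .map(|&(name, fr, json, rmp, cbor)| {
            format!(
                "{name}: rust_fr {}%, rmp_serde {}%, ciborium {}% of serde_json",
                percent_of(fr, json),
                percent_of(rmp, json),
                percent_of(cbor, json),
            )
        })
        .collect()
}
```

On these inputs rust_fr is smaller than serde_json in all three cases, and smaller than rmp_serde only on the large data set.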

why?

The goal was to learn. I wrote this to understand how serde works internally and how to encode data into bytes that can ultimately be transferred over the wire or stored elsewhere.

format specification.

  • The format is non-self-describing.
  • Primitive types are serialized as is.
    • bool: 0 -> false, 1 -> true (1 bit)
    • i8, i16, i32, i64: as is.
    • u8, u16, u32, u64: as is.
    • f32, f64: as is.
    • char: as u32 (4 bytes)
  • Delimiters are used to separate different types of data.
  • String, Byte and Map Delimiters are 1 byte long while all other delimiters are 3 bits long.
  • Delimiters:
    • String = 134; 0b10000110
    • Byte = 135; 0b10000111
    • Unit = 2; 0b010
    • Seq = 3; 0b011
    • SeqValue = 4; 0b100
    • Map = 139; 0b10001011
    • MapKey = 6; 0b110
    • MapValue = 7; 0b111
  • String, Bytes, Unit, Option are serialized as:
    • str: bytes + STRING_DELIMITER
    • bytes: bytes + BYTE_DELIMITER
    • unit: UNIT (null)
    • option: None -> unit(), Some -> self
  • Structs are serialized as:
    • unit_struct: unit()
    • newtype_struct: self
    • tuple_struct: seq()
  • Enums are serialized as:
    • unit_variant: variant_index
    • newtype_variant: variant_index + self
    • tuple_variant: variant_index + tuple()
    • struct_variant: variant_index + struct()
  • seq(): Sequences are serialized as:
    • SEQ_DELIMITER + value_1 + SEQ_VALUE_DELIMITER + value_2 + SEQ_VALUE_DELIMITER + ... + SEQ_DELIMITER
  • map(): Maps are serialized as:
    • key_1 + MAP_KEY_DELIMITER + value_1 + MAP_VALUE_DELIMITER + key_2 + MAP_KEY_DELIMITER + value_2 + MAP_VALUE_DELIMITER + ... + MAP_DELIMITER
  • Tuples and Structs are serialized as:
    • tuple: seq()
    • struct: map()
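
The delimiter table and the seq() and map() layouts above can be sketched in a few lines of Rust. This is an illustration of the specification, not the crate's actual code: `Delimiter`, `Token`, `seq_layout`, `map_layout` and `delimiter_bits` are hypothetical names, values are kept as opaque labels rather than packed bits, and the sketch assumes that every sequence element (including the last one) is followed by SEQ_VALUE_DELIMITER.

```rust
/// Delimiters from the table above (hypothetical type, not the crate's API).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delimiter {
    String = 134,
    Byte = 135,
    Unit = 2,
    Seq = 3,
    SeqValue = 4,
    Map = 139,
    MapKey = 6,
    MapValue = 7,
}

impl Delimiter {
    /// Numeric value as listed in the specification.
    fn value(self) -> u8 {
        self as u8
    }

    /// Width on the wire: String, Byte and Map take 8 bits, the rest 3 bits.
    fn bit_width(self) -> u32 {
        match self {
            Delimiter::String | Delimiter::Byte | Delimiter::Map => 8,
            _ => 3,
        }
    }
}

/// One element of a layout: an opaque value label or a delimiter.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Value(&'static str),
    Delim(Delimiter),
}

/// SEQ + (value + SEQ_VALUE)* + SEQ
/// Assumption: the last value is also followed by SEQ_VALUE.
fn seq_layout(values: &[&'static str]) -> Vec<Token> {
    let mut out = vec![Token::Delim(Delimiter::Seq)];
    for &v in values {
        out.push(Token::Value(v));
        out.push(Token::Delim(Delimiter::SeqValue));
    }
    out.push(Token::Delim(Delimiter::Seq));
    out
}

/// (key + MAP_KEY + value + MAP_VALUE)* + MAP
fn map_layout(entries: &[(&'static str, &'static str)]) -> Vec<Token> {
    let mut out = Vec::new();
    for &(k, v) in entries {
        out.push(Token::Value(k));
        out.push(Token::Delim(Delimiter::MapKey));
        out.push(Token::Value(v));
        out.push(Token::Delim(Delimiter::MapValue));
    }
    out.push(Token::Delim(Delimiter::Map));
    out
}

/// Total number of bits spent on delimiters in a layout.
fn delimiter_bits(tokens: &[Token]) -> u32 {
    tokens
        .iter()
        .map(|t| match t {
            Token::Delim(d) => d.bit_width(),
            Token::Value(_) => 0,
        })
        .sum()
}
```

Under these assumptions, `seq_layout(&["a", "b"])` spends four 3-bit delimiters (12 bits) on framing, while `map_layout(&[("k", "v")])` spends 3 + 3 + 8 = 14 bits.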

license.

It's MIT so you can do whatever you want. You can still read it here.

Comments
  • feat: bit optimization

    the problem.

    Currently our tokens are 8 bits (u8) long. We only have 8 tokens, and 8 tokens fit perfectly into 3 bits. We can substantially reduce the size of the encoded data by migrating from 8-bit tokens to 3-bit tokens.

    so what.

    This PR switches from 8-bit tokens to 3-bit tokens.

    gotchas.

    • We have to be extra careful around String and Bytes, since the only way to decode them is to march forward through them. While marching forward you can encounter string bits that resemble token bits, which causes you to exit early. The only workaround that works is to use fixed-length tokens in the 8-bit space. Hence, for strings and bytes, we go back to the 8-bit space.

    opened by is-it-ayush
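
The trade-off described in this PR can be sketched as follows. This is a rough illustration, not the crate's decoder: `bits_needed`, `bit_at` and `count_matches` are hypothetical helpers, and the sketch assumes bits are read most-significant-bit first. It shows why 8 tokens fit in 3 bits, and how easily a 3-bit token pattern turns up inside ordinary string data.

```rust
/// Minimum number of bits needed to tell `n` distinct tokens apart.
fn bits_needed(n: u32) -> u32 {
    if n <= 1 {
        0
    } else {
        32 - (n - 1).leading_zeros()
    }
}

/// Bit `i` of `data`, counting from the most significant bit of byte 0.
/// Assumption: MSB-first bit order.
fn bit_at(data: &[u8], i: usize) -> u8 {
    (data[i / 8] >> (7 - i % 8)) & 1
}

/// Number of bit offsets at which the `width`-bit `pattern` appears in `data`.
fn count_matches(data: &[u8], pattern: u8, width: usize) -> usize {
    let total_bits = data.len() * 8;
    if width == 0 || width > 8 || total_bits < width {
        return 0;
    }
    (0..=total_bits - width)
        .filter(|&start| {
            // Rebuild the window bit by bit, most significant bit first.
            let window = (0..width).fold(0u8, |acc, k| (acc << 1) | bit_at(data, start + k));
            window == pattern
        })
        .count()
}
```

For example, the single ASCII byte `A` (0b01000001) already contains the 3-bit patterns 0b010 (Unit) and 0b100 (SeqValue), so `count_matches(b"A", 0b010, 3)` is non-zero, while the 8-bit String delimiter 134 does not appear in it. A decoder that scans a string for a 3-bit token would therefore risk stopping inside the payload.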
Releases(v1.0.1)
  • v1.0.1(Feb 28, 2024)

    What's Changed

    • release: patch 1.0.1 by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/13

    Full Changelog: https://github.com/is-it-ayush/rust-fr/compare/v1.0.0...v1.0.1

  • v1.0.0(Feb 26, 2024)

    What's Changed

    • fix: readme links by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/2
    • feat: bit optimization by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/3
    • chore: docs by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/6
    • chore: version bump by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/7
    • release: v1.0.0 by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/8

    Full Changelog: https://github.com/is-it-ayush/rust-fr/compare/v0.1.0...v1.0.0

  • v0.1.0(Feb 20, 2024)

    • The initial release

    What's Changed

    • feat: add ci by @is-it-ayush in https://github.com/is-it-ayush/rust-fr/pull/1

    New Contributors

    • @is-it-ayush made their first contribution in https://github.com/is-it-ayush/rust-fr/pull/1

    Full Changelog: https://github.com/is-it-ayush/rust-fr/commits/v0.1.0

Owner
Ayush