CBOR: Concise Binary Object Representation

Overview

CBOR 0x(4+4)9 0x49


“The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.”

See RFC 8949.

Compatibility

The core module should be fully compatible with RFC 8949, but some extensions, such as datetime, bignum, and bigfloat, are not implemented in this crate.

The serde module defines how Rust types are expressed in CBOR. This mapping is not part of any standard, so different crates may behave inconsistently.

This library is intended to be compatible with serde_cbor, but it does not follow some of serde_cbor's unreasonable designs:

  • cbor4ii expresses the unit type as an empty array instead of null. This avoids the problem that serde_cbor cannot distinguish between None and Some(()). See https://github.com/pyfisch/cbor/issues/185
  • cbor4ii does not support packed mode. It may be implemented in the future, but it may not be compatible with serde_cbor. If you want packed mode, you should look at bincode.
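
To illustrate the first point, here is a minimal hand-written sketch of the two encodings at the byte level. It does not use cbor4ii or serde_cbor. The function names are hypothetical, and the only facts assumed are that CBOR null is the single byte 0xf6 and an empty array is the single byte 0x80 (RFC 8949).

```rust
/// Hypothetical helper: encode `Option<()>` when the unit value is written
/// as CBOR null (0xf6). Both variants end up as the same byte.
fn encode_unit_as_null(value: Option<()>) -> Vec<u8> {
    match value {
        None => vec![0xf6],     // null
        Some(()) => vec![0xf6], // unit is also written as null
    }
}

/// Hypothetical helper: encode `Option<()>` when the unit value is written
/// as an empty array (0x80), so the two variants stay distinguishable.
fn encode_unit_as_empty_array(value: Option<()>) -> Vec<u8> {
    match value {
        None => vec![0xf6],     // null
        Some(()) => vec![0x80], // array of length 0
    }
}
```

With the first scheme a decoder sees identical input for None and Some(()), while the second scheme keeps them apart.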

Performance

It is not specifically optimized for performance, but benchmarks show that it performs slightly better than serde_cbor.

It also supports zero-copy deserialization and serde's deserialize_ignored_any, so in some scenarios it may perform better than crates that do not support these features.

Robustness

The decoder has been fuzz tested, and it should not crash or panic while decoding.

The decoder in the serde module has a depth limit to prevent stack overflow or OOM caused by specially crafted input. If you want to turn off the depth check or adjust its parameters, you can implement the dec::Read trait yourself.
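
The idea behind a depth limit can be sketched without the crate. The following is a standalone illustration, not cbor4ii's implementation: `DepthGuard`, `step_in`, `step_out`, and `skip_value` are hypothetical names, and the toy parser only understands small unsigned integers (0x00 to 0x17) and arrays with fewer than 24 elements (0x80 to 0x97).

```rust
/// Hypothetical depth counter: `step_in` fails once the budget is used up.
struct DepthGuard {
    remaining: usize,
}

impl DepthGuard {
    fn new(limit: usize) -> Self {
        DepthGuard { remaining: limit }
    }

    /// Enter one nesting level; returns false if the limit is exceeded.
    fn step_in(&mut self) -> bool {
        match self.remaining.checked_sub(1) {
            Some(n) => {
                self.remaining = n;
                true
            }
            None => false,
        }
    }

    /// Leave one nesting level.
    fn step_out(&mut self) {
        self.remaining += 1;
    }
}

/// Skip one value of the toy subset, recursing into arrays under the guard.
fn skip_value(input: &[u8], pos: &mut usize, guard: &mut DepthGuard) -> Result<(), &'static str> {
    let byte = *input.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    match byte {
        // Small unsigned integer: nothing more to read.
        0x00..=0x17 => Ok(()),
        // Array with 0..=23 elements: recurse once per element.
        0x80..=0x97 => {
            if !guard.step_in() {
                return Err("depth limit exceeded");
            }
            for _ in 0..(byte - 0x80) {
                skip_value(input, pos, guard)?;
            }
            guard.step_out();
            Ok(())
        }
        _ => Err("unsupported type in this sketch"),
    }
}
```

Because the guard is checked before every recursive call, deeply nested input is rejected with an error after a bounded number of stack frames instead of exhausting the stack.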

License

This project is licensed under the MIT license.

Comments
  • Support for RFC-7049 Canonical CBOR key ordering

    This library explicitly specifies RFC-8949, so this request might be out of scope.

    In the project I want to use cbor4ii for, I'm stuck with RFC-7049 Canonical CBOR key ordering. This means that keys are sorted by their length first. I wonder if that could perhaps be added behind a feature flag. Here is an implementation that seems to work. I didn't create a PR as this clearly needs more discussion first.

        fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
        where
            K: ser::Serialize,
            V: ser::Serialize,
            I: IntoIterator<Item = (K, V)>,
        {
            #[cfg(not(feature = "use_std"))]
            use crate::alloc::vec::Vec;
            use serde::ser::SerializeMap;
    
            // TODO vmx 2022-04-04: This could perhaps be upstreamed, or the
            // `cbor4ii::serde::buf_writer::BufWriter` could be made public.
            impl enc::Write for Vec<u8> {
                type Error = crate::alloc::collections::TryReserveError;
    
                fn push(&mut self, input: &[u8]) -> Result<(), Self::Error> {
                    self.try_reserve(input.len())?;
                    self.extend_from_slice(input);
                    Ok(())
                }
            }
    
            // CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
            // This was later revised with RFC-8949, but we need to stick to the original order to stay
            // compatible with existing data.
            // We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
            // comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
            // with the length. Once sorted they are written to the actual output.
            let mut buffer: Vec<u8> = Vec::new();
            let mut mem_serializer = Serializer::new(&mut buffer);
            let mut serializer = Collect {
                bounded: true,
                ser: &mut mem_serializer,
            };
            let mut entries = Vec::new();
            for (key, value) in iter {
                serializer.serialize_entry(&key, &value)
                   .map_err(|_| enc::Error::Msg("Map entry cannot be serialized.".into()))?;
                entries.push(serializer.ser.writer.clone());
                serializer.ser.writer.clear();
            }
    
            TypeNum::new(major::MAP << 5, entries.len() as u64).encode(&mut self.writer)?;
            entries.sort_unstable();
            for entry in entries {
                self.writer.push(&entry)?;
            }
    
            Ok(())
        }
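
The length-first ordering itself can be shown in isolation. This is a minimal sketch that sorts raw key bytes directly, rather than sorting serialized map entries as the snippet above does; `cmp_length_first` and `sort_keys_length_first` are hypothetical names and not part of cbor4ii.

```rust
use std::cmp::Ordering;

/// Compare two keys the length-first way: shorter keys sort first,
/// keys of equal length are compared byte by byte.
fn cmp_length_first(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Sort a list of keys in place using the length-first ordering.
fn sort_keys_length_first(keys: &mut [Vec<u8>]) {
    keys.sort_unstable_by(|a, b| cmp_length_first(a, b));
}
```

For the keys "aa", "b", and "a", this ordering yields "a", "b", "aa", whereas a plain byte-wise sort of the raw keys would yield "a", "aa", "b".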
    

    I'd also like to note that I need even more changes for my use case (it's a subset of CBOR), for which I will need to fork this library. Nonetheless, I think it would be a useful addition, and I'd prefer the fork to be as minimal as possible. I'm bringing this up to make clear that it won't be a showstopper if this change isn't accepted.

    opened by vmx 12
  • Making more things public

    I've now created a working Serde implementation for my needs that is based on cbor4ii core. However, I still need to patch core, as some things that I need for the deserializer are not public.

    core::marker, core::dec::peek_one, core::dec::pull_one and core::dec::decode_len are currently pub(crate), but are needed by my deserializer (which is more or less a copy of yours). Could those be made public?

    Besides those, I've also copied and pasted other things. I'm OK with having those duplicated if you don't think they should be part of the public interface, though I'd certainly prefer having less code in my crate. Those are:

    • core::util::ScopeGuard: I have a full copy of that without any changes.
    • serde::io_buf_reader::IoReader: I've added a constructor. As I don't use the serde1 feature, it would be cool if it could perhaps be added to core::utils.
    • serde::io_writer::IoWrite: I've added a constructor. Same as with the reader, having it in core::utils would be great.
    opened by vmx 6
  • Should this work?

    use serde::{Deserialize, Serialize};
    
    #[derive(Deserialize, Serialize)]
    enum Foo {
        A(String),
    }
    
    fn main() {
        let foo = Foo::A(String::new());
    
        let mut data = Vec::new();
        cbor4ii::serde::to_writer(&mut data, &foo).unwrap();
    
        let reader = std::io::BufReader::new(data.as_slice());
        let _: Foo = cbor4ii::serde::from_reader(reader).unwrap();
    }
    

    I get:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RequireBorrowed { name: "str" }', src/main.rs:15:54
    

    Can you tell me what am I doing wrong?

    opened by smoelius 4
  • core: tag parsing without value

    With the current public API, it's not easy to parse a tag without its value. This commit introduces a TagStart, which is similar to ArrayStart and MapStart. It only parses the tag into a u64, without advancing to the value.

    This is the API I came up with. As always, if there's a better way to solve my problem of parsing a tag value with the current public API, I'm happy to hear about it.
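
For reference, reading only the tag number from a CBOR tag header can be sketched without any crate. This is not the TagStart API from the PR: `decode_tag_start` is a hypothetical standalone function that follows the RFC 8949 header layout (major type 6 in the top three bits, argument length selected by the low five bits) and reports how many bytes the header occupies, leaving the tagged value untouched.

```rust
/// Parse a CBOR tag header and return `(tag_number, header_len)`.
/// Returns `None` if the input is not a well-formed tag header.
fn decode_tag_start(input: &[u8]) -> Option<(u64, usize)> {
    let first = *input.first()?;
    // The top three bits hold the major type; tags are major type 6.
    if first >> 5 != 6 {
        return None;
    }
    let info = first & 0x1f;
    // The low five bits either hold the tag directly or say how many
    // big-endian bytes follow.
    let extra: usize = match info {
        0..=23 => return Some((u64::from(info), 1)),
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        _ => return None,
    };
    let bytes = input.get(1..1 + extra)?;
    let mut tag = 0u64;
    for &b in bytes {
        tag = (tag << 8) | u64::from(b);
    }
    Some((tag, 1 + extra))
}
```

A caller can use the returned header length to decide how to decode the value that follows, based on the tag number it just read.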

    opened by vmx 3
  • serde-derived `Serialize`/`Deserialize` impl for `Value` is not working as expected

    The Value type is expected to serialize to and deserialize from the data it represents. For the following test code:

        fn test_deserialize_null_value() {
            let val = crate::core::Value::Null;
            let ret = crate::serde::to_vec(vec![], &val).unwrap();
            eprintln!("{:02x?}", &ret);
        }
    

    Expected Output:

    [f6]
    

    which is the primitive value null, but

    Actual Output:

    [64, 4e, 75, 6c, 6c]
    

    which is the 4-character UTF-8 text Null.

    It looks like the implementation generated by the derive directives interprets Value as an externally tagged enum. Is this a mistake or a feature by design?
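
To make the two outputs above easier to compare, here is a small sketch of how a short text string is laid out in CBOR. It is independent of cbor4ii, and `encode_short_text` is a hypothetical helper that handles only strings shorter than 24 bytes: the header byte is major type 3 shifted into the top three bits, combined with the length.

```rust
/// The CBOR primitive value null.
const CBOR_NULL: u8 = 0xf6;

/// Encode a UTF-8 string shorter than 24 bytes as a CBOR text string.
/// Returns `None` for longer strings, which need a wider length field.
fn encode_short_text(s: &str) -> Option<Vec<u8>> {
    if s.len() >= 24 {
        return None;
    }
    // Header: major type 3 (text string) with the length in the low bits.
    let header = (3u8 << 5) | (s.len() as u8);
    let mut out = vec![header];
    out.extend_from_slice(s.as_bytes());
    Some(out)
}
```

Encoding the variant name "Null" this way gives the five bytes 64 4e 75 6c 6c from the actual output, while the expected output is the single byte f6.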

    opened by bdbai 3
  • serde: use scopeguard dependency

    Instead of having a custom scope guard implementation, use the established scopeguard crate.

    I made this change to my Serde implementation and thought I'd upstream it. I'm a fan of having less code to maintain. scopeguard itself doesn't have any further dependencies, so the overall build time and code size should stay similar. Though I can also understand if you prefer keeping your dependencies to a minimum.

    opened by vmx 2
  • serde: give access to underlying writer

    When using the Serializer from an external crate it can be useful to be able to access the underlying writer.

    It turns out that the code I posted at https://github.com/quininer/cbor4ii/issues/13#issuecomment-1095342912 only works with this patch.

    opened by vmx 2
  • core: expose a public in-memory writer

    BufWriter can be used to serialize things into memory. The API is inspired by std::io::BufWriter.

    In case you wonder why the decode tests suddenly need use_std, that's been the case even before this change.

    opened by vmx 2
  • core: fix decoding of the maximum negative 64-bit value as i128

    Decoding of the maximum negative 64-bit value in CBOR (-2^64 = -18446744073709551616) wasn't possible and resulted in an overflow error.

    This commit also adds tests for smaller values, e.g. decoding the maximum negative 32-bit value as i64. Those were already working correctly.
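
The arithmetic behind this can be sketched in a few lines. CBOR stores a negative integer as an unsigned argument n and defines the value as -1 - n (RFC 8949), so the full 64-bit argument range only fits in a wider signed type. `negative_to_i128` is a hypothetical helper, not the function changed by this PR.

```rust
/// Convert the unsigned argument `n` of a CBOR negative integer
/// (major type 1) into its value, -1 - n. An i128 holds every result,
/// so no overflow check is needed.
fn negative_to_i128(n: u64) -> i128 {
    -1 - i128::from(n)
}
```

With n = u64::MAX this yields -18446744073709551616, the value mentioned above, and with n = u32::MAX it yields -4294967296.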

    opened by vmx 1
  • core: decode_len can be private

    core::dec::decode_len doesn't need to be public to the crate, it can be private. This PR is based on the comment by @quininer at https://github.com/quininer/cbor4ii/issues/16#issuecomment-1098666029

    opened by vmx 1
  • serde: fix overflow on i64::MIN

    When decoding the minimum value of i64 (-9223372036854775808), there is an overflow error. With this fix it can be deserialized correctly.

    The reason for the failure was the order of the operations. Prior to this change the decoding had these steps:

    1. Decode the bytes into a u64 => 9223372036854775807
    2. Add 1 => 9223372036854775808
    3. Cast to i64 => error as i64::MAX is 9223372036854775807.
    4. Negate the number

    The new steps are:

    1. Decode the bytes into a u64 => 9223372036854775807
    2. Cast to i64 => 9223372036854775807
    3. Negate the number => -9223372036854775807
    4. Subtract 1 => -9223372036854775808
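
The two orderings of operations can be compared side by side in a standalone sketch. Both functions are hypothetical illustrations of the steps listed above, not the crate's code; they take the decoded u64 argument and return `None` where the conversion would fail.

```rust
/// Old order: add 1, cast to i64, negate.
/// For the argument of i64::MIN the cast fails, because
/// 9223372036854775808 is larger than i64::MAX.
fn decode_negative_old(n: u64) -> Option<i64> {
    let plus_one = n.checked_add(1)?;
    if plus_one > i64::MAX as u64 {
        return None;
    }
    Some(-(plus_one as i64))
}

/// New order: cast to i64, negate, subtract 1.
/// Every intermediate result stays inside the i64 range.
fn decode_negative_new(n: u64) -> Option<i64> {
    if n > i64::MAX as u64 {
        return None;
    }
    let value = n as i64;
    Some(-value - 1)
}
```

For the argument 9223372036854775807 the old order gives up, while the new order produces i64::MIN. For smaller arguments both orders agree.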
    opened by vmx 1
  • Support zero copy using the bytes crate

    I have a few cases where I would like to deserialize CBOR from an incoming BytesMut into a struct that looks something like this:

    struct Foo {
       pub data: Bytes,
       // other small fields
    }
    

    where I very much need to avoid copying the data (the data is often on the order of hundreds of MB).

    I tried hacking something together, but didn't quite get anywhere, so I'm wondering if you could give me some hints on how this could be done. Thanks.

    opened by dignifiedquire 4
  • Remove `*Start` types from dec module

    Instead of having TagStart, ArrayStart and MapStart, implement functions directly on the concrete types.

    This PR tries to address the comment at https://github.com/quininer/cbor4ii/pull/22#issuecomment-1108791827.

    I've based it on the 0.3 branch. This also means that I haven't tested it with my own Serde implementation (which is still on 0.2.x), but I don't see a reason why it shouldn't work there. I didn't want to spend too much time on this PR, as I'm not sure whether it's a good approach. I'm happy to spend more time on it if it's the right direction.

    opened by vmx 2
  • improve some case

    I noticed some cases where Cow<str> is not enough.

    For example, decoding a struct with a short-lifetime reader requires a memory allocation for each field's name. This is unnecessary, because we only need to check whether the key is the expected one; we don't need to keep it. Also, the automatic allocation of memory on the heap makes it difficult for us to improve this.

    I'm thinking of exposing the decode_buf interface in some form to get around this.

    Change Decode trait

    I considered changing the Decode trait to allow this optimization, like this:

    trait Decode<'de, T> {
        fn decode<R: Read<'de>>(&mut self, reader: &mut R) -> Result<T, Error>;
    }
    

    This allows decoding the object without allocating any memory, just identifying whether it is as expected. It would look like this:

    struct Expect<'a> {
        expect: &'a str,
        count: usize
    }
    
    impl<'a, 'de> Decode<'de, bool> for Expect<'a> {
        fn decode<R: Read<'de>>(&mut self, reader: &mut R) -> Result<bool, Error> {
            let mut result = true;
            // Compare the input with the expected key chunk by chunk, without allocating.
            while self.count < self.expect.len() {
                let want_len = self.expect.len() - self.count;
                let buf = reader.fill(want_len)?;
                if buf.is_empty() { return Err(Error::Eof) };
                let len = cmp::min(buf.len(), want_len);
                if self.expect.as_bytes()[self.count..][..len] != buf[..len] {
                    // Mismatch: the key is not the expected one.
                    result = false;
                    break
                }
                self.count += len;
                reader.advance(len);
            }
            Ok(result)
        }
    }
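
The chunk-by-chunk comparison can be tried out in isolation. The following is a minimal runnable sketch under stated assumptions: `ChunkReader` is a hypothetical stand-in for a reader that hands out at most `chunk` bytes per `fill` call, and `matches_expected` is a free function rather than an implementation of the proposed Decode trait. Neither is part of cbor4ii.

```rust
use std::cmp;

/// Hypothetical reader that exposes its data in pieces of at most `chunk` bytes.
struct ChunkReader<'a> {
    data: &'a [u8],
    pos: usize,
    chunk: usize,
}

impl<'a> ChunkReader<'a> {
    /// Return up to `want` buffered bytes without consuming them.
    fn fill(&self, want: usize) -> &'a [u8] {
        let end = cmp::min(self.pos + cmp::min(want, self.chunk), self.data.len());
        &self.data[self.pos..end]
    }

    /// Consume `n` bytes that were previously returned by `fill`.
    fn advance(&mut self, n: usize) {
        self.pos += n;
    }
}

/// Check whether the next bytes of `reader` equal `expect`, without allocating.
/// Returns `None` if the input ends before the comparison is finished.
fn matches_expected(reader: &mut ChunkReader<'_>, expect: &str) -> Option<bool> {
    let expect = expect.as_bytes();
    let mut count = 0;
    while count < expect.len() {
        let buf = reader.fill(expect.len() - count);
        if buf.is_empty() {
            return None;
        }
        // `fill` never returns more than requested, so this slice is in bounds.
        if &expect[count..][..buf.len()] != buf {
            return Some(false);
        }
        count += buf.len();
        reader.advance(buf.len());
    }
    Some(true)
}
```

On a mismatch the reader is left at the start of the chunk that differed, which mirrors the behavior of the loop in the proposal.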
    

    This also allows for more precise memory allocation, such as decoding bytes onto the stack:

    struct StackVec([u8; 1024]);
    
    impl<'de> Decode<'de, &[u8]> for StackVec {
        fn decode<R: Read<'de>>(&mut self, reader: &mut R) -> Result<&[u8], Error> {
            let mut len = decode_len(reader)?;
            let mut count = 0;
            // Copy the byte string into the fixed-size buffer chunk by chunk.
            while len != 0 {
                let buf = reader.fill(len)?;
                if buf.is_empty() { return Err(Error::Eof) };
                let buf_len = cmp::min(buf.len(), len);
                if buf_len + count > self.0.len() { return Err(Error::Eof) };
                self.0[count..][..buf_len].copy_from_slice(&buf[..buf_len]);
                count += buf_len;
                len -= buf_len;
                reader.advance(buf_len);
            }
            Ok(&self.0[..count])
        }
    }
    
    enhancement 
    opened by quininer 1
  • Remove decode_with

    The idea of decode_with is to avoid peeking a byte multiple times, which might be good for performance, but it makes the API a little more complicated.

    We should try removing it and see if there is a negative impact on performance.

    opened by quininer 0
  • Error refactor

    Error seems a bit confusing now, especially DecodeError.

    • Split core::Error and serde::Error so that you don't need to care about the Msg kind when using the core module.
    • Merge kinds such as Eof/RequireLength, Mismatch/TypeMismatch, etc.

    Since this affects the API, it will be in the next version.

    opened by quininer 2
Owner
quininer
Born in the twentieth century, before the end of the world drew near.