Deser: an experimental serialization and deserialization library for Rust

Overview

Deser is an experimental serialization system for Rust. It explores the possibilities of serializing and deserializing structural formats such as JSON or msgpack. It intentionally does not support non-self-describing formats such as bincode.

This is not production ready yet.

use deser::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
#[deser(rename_all = "camelCase")]
pub struct Account {
    id: usize,
    account_holder: String,
    is_deactivated: bool,
}

This generates the necessary Serialize and Deserialize implementations.

To see what this looks like behind the scenes there are two examples that show how structs are implemented:

  • derive: shows an example using automatic deriving
  • manual-struct: shows the same example with a manual implementation

Design Goals

  • Fast Compile Times: deser avoids excessive monomorphization by encouraging dynamic dispatch.
  • Unlimited Recursion: the real world is nasty and incoming data might be badly nested. deser should not exhaust the call stack no matter how deeply nested your data is. It accomplishes this with a trait design that differs from serde's: handles to "sinks" or "serializable" objects are returned, which leaves it up to the caller to manage the recursion.
  • Simple Data Model: deser simplifies the data model on the serialization and deserialization interface. For instance, instead of making a distinction between u8 and u64, they are represented the same in the model. To compensate for this, it provides type descriptors that carry auxiliary information for when a serializer needs it.
  • Meta Information: deser compensates for the simplified data model by providing a space to hold meta information. This can be used, for instance, to automatically keep track of the "path" to the current structure during serialization and deserialization.
  • Native Byte Serialization: deser has built-in specialization for serializing bytes and byte vectors as distinct formats from slices and vectors.
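
The "Unlimited Recursion" goal can be illustrated with a minimal sketch of caller-managed recursion. The `Value` and `Event` types and the `walk` function below are hypothetical and are not deser's API; they only show how an explicit, heap-allocated stack can replace recursive calls so that nesting depth does not consume call stack.

```rust
/// Hypothetical value tree; this is not deser's data model.
pub enum Value {
    Atom(u64),
    Seq(Vec<Value>),
}

/// Flat events that a driver could hand to a sink one at a time.
#[derive(Debug, PartialEq)]
pub enum Event {
    Atom(u64),
    SeqStart,
    SeqEnd,
}

/// Walks `root` depth-first without any recursive function calls.
pub fn walk(root: &Value) -> Vec<Event> {
    let mut events = Vec::new();
    // One iterator per sequence that is currently open.
    let mut stack: Vec<std::slice::Iter<'_, Value>> = Vec::new();
    let mut current = Some(root);

    loop {
        // Emit the pending value, if there is one.
        match current.take() {
            Some(Value::Atom(n)) => events.push(Event::Atom(*n)),
            Some(Value::Seq(items)) => {
                events.push(Event::SeqStart);
                stack.push(items.iter());
            }
            None => {}
        }
        // Pick the next value from the innermost open sequence.
        match stack.last_mut() {
            None => break,
            Some(iter) => match iter.next() {
                Some(child) => current = Some(child),
                None => {
                    stack.pop();
                    events.push(Event::SeqEnd);
                }
            },
        }
    }
    events
}
```

Only the traversal is iterative in this sketch. Dropping a very deeply nested `Value` still goes through Rust's default recursive drop glue, so a real implementation would need care there as well.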

Future Plans

  • Extensible Data Model: deser wants to make it possible to extend the data model with types that are not native to the serialization interface. For instance if a data format wants to support arbitrarily sized integers this should be possible without falling back to in-band signalling.

Crates

  • deser: the core crate providing the base functionality
  • deser-path: a crate that extends deser to track the path during serialization
  • deser-debug: formats a serializable to the std::fmt debug format

Inspiration

This crate heavily borrows from miniserde, serde and Sentry Relay's meta system. The general trait design was modelled after miniserde.

Safety

Deser (currently) uses excessive amounts of unsafe code internally. It is not vetted and it is likely completely wrong. If this design turns out to be useful, the internals will need a redesign.

License and Links

Comments
  • Add support for `#[deser(flatten)]`

    Hi, I am not sure if this is the correct place for feedback. I could not find out whether this feature is being addressed or whether there is a plan for providing something similar in deser. At the least, I want to raise it as a point to be aware of.

    serde has a long-standing issue about internal buffering which interacts with a lot of features and tends to break deserialization unexpectedly: https://github.com/serde-rs/serde/issues/1183 (Example). Internal buffering is used to implement the flatten attribute as well as untagged and adjacently tagged enums.

    The problem in the example is that serde_json "knows" how to deserialize a HashMap<i64, i64> (by treating string keys as numbers), but due to buffering it turns into a Map<String, i64> which then cannot be deserialized. The problem is also that the Inner type works well on its own, but by being included in the Outer type it starts failing.

    design 
    opened by jonasbb 3
  • Redesign ignore

    deser currently inherits the design for ignore from miniserde. It involves creating a mutable reference to some static Sink. Miri complains about this and it does sound dodgy. I definitely get miri failures from this when I use ignore and I was also able to reproduce the same issue in miniserde: https://github.com/dtolnay/miniserde/issues/24

    The solution for deser would be to embed a zero sized type Ignore directly in the SinkHandle like so:

    pub enum SinkHandle<'a> {
        Borrowed(&'a mut dyn Sink),
        Owned(Box<dyn Sink + 'a>),
        Null(Ignore),
    }
    

    Ignore can stay internal and SinkHandle gets a new method to create it (SinkHandle::null()) to replace SinkHandle::to(ignore()). This is also more convenient to use for a user.

    Size of the enum should stay the same I think.
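
A minimal, self-contained sketch of the proposed shape may help here. The one-method `Sink` trait, the `atom` forwarding method and the `Collect` sink are assumptions made for illustration only; deser's real trait has more methods and the real `SinkHandle` may expose its sink differently.

```rust
/// Reduced stand-in for deser's `Sink` trait (assumption: one method only).
pub trait Sink {
    fn atom(&mut self, value: u64);
}

/// Zero-sized sink that discards everything it receives. The private
/// field keeps construction inside this module.
pub struct Ignore(());

impl Sink for Ignore {
    fn atom(&mut self, _value: u64) {}
}

pub enum SinkHandle<'a> {
    Borrowed(&'a mut dyn Sink),
    Owned(Box<dyn Sink + 'a>),
    Null(Ignore),
}

impl<'a> SinkHandle<'a> {
    /// Creates a handle that ignores all input without touching any
    /// shared static.
    pub fn null() -> SinkHandle<'a> {
        SinkHandle::Null(Ignore(()))
    }

    /// Forwards an atom to whichever sink the handle holds.
    pub fn atom(&mut self, value: u64) {
        match self {
            SinkHandle::Borrowed(sink) => sink.atom(value),
            SinkHandle::Owned(sink) => sink.atom(value),
            SinkHandle::Null(sink) => sink.atom(value),
        }
    }
}

/// Hypothetical example sink that records the atoms it receives.
pub struct Collect<'a>(pub &'a mut Vec<u64>);

impl Sink for Collect<'_> {
    fn atom(&mut self, value: u64) {
        self.0.push(value);
    }
}
```

With this shape, a caller that wants to discard a value writes `SinkHandle::null()` and no mutable reference to a static is ever created.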

    enhancement defect 
    opened by mitsuhiko 2
  • Consider moving finish() on Serialize

    It seems a bit odd that the main purpose of finish on Serialize is to undo state changes for the nested emitters, but it can't be conditional on the state of the emitters. If the method were to move onto the emitters then the state of the emitter can be used to undo the state in the serializer.

    Relatedly, recent changes now call finish for atoms as well on the deserializing sink. This seems wasteful.

    design 
    opened by mitsuhiko 1
  • Remove MapSink and SeqSink

    This is a followup to #6. At the moment MapSink and SeqSink primarily exist because before the introduction of the SinkHandle it was impossible for a Sink to hold state. The solution inherited from miniserde was to create a map/seq sink when map/seq was called which in turn creates the boxed allocation and writes into the slot on finalization.

    Now that we can hold a boxed sink directly in the SinkHandle this indirection is not helpful any more. More than that, this indirection causes one new challenge: MapSink has its lifetime bound to the outer Sink, which makes it hard to compose deserializers. For instance, for flattening (#9) it would be nice to be able to ask another sink whether it is interested in a key. This basically requires that a sink also starts another sink so it can drive that one as well. With the current MapSink indirection this is not possible due to lifetimes.

    So the plan could be to inline the logic for the map and seq sinks directly onto the main sink. Example for BTreeMap:

    impl<K, V> Deserialize for BTreeMap<K, V>
    where
        K: Ord + Deserialize,
        V: Deserialize,
    {
        fn deserialize_into(out: &mut Option<Self>) -> SinkHandle {
            struct MapSink<'a, K: 'a, V: 'a> {
                slot: &'a mut Option<BTreeMap<K, V>>,
                map: BTreeMap<K, V>,
                key: Option<K>,
                value: Option<V>,
            }
    
            impl<'a, K, V> MapSink<'a, K, V>
            where
                K: Ord,
            {
                fn flush(&mut self) {
                    if let (Some(key), Some(value)) = (self.key.take(), self.value.take()) {
                        self.map.insert(key, value);
                    }
                }
            }
    
            impl<'a, K, V> Sink for MapSink<'a, K, V>
            where
                K: Ord + Deserialize,
                V: Deserialize,
            {
                fn map(&mut self, _state: &DeserializerState) -> Result<(), Error> {
                    Ok(())
                }
    
                fn key(&mut self) -> Result<SinkHandle, Error> {
                    self.flush();
                    Ok(Deserialize::deserialize_into(&mut self.key))
                }
    
                fn value(&mut self) -> Result<SinkHandle, Error> {
                    Ok(Deserialize::deserialize_into(&mut self.value))
                }
    
                fn finish(&mut self, _state: &DeserializerState) -> Result<(), Error> {
                    self.flush();
                    *self.slot = Some(take(&mut self.map));
                    Ok(())
                }
            }
    
            SinkHandle::boxed(MapSink {
                slot: out,
                map: BTreeMap::new(),
                key: None,
                value: None,
            })
        }
    }
    

    For structs one could then introduce a value_for_key method which could be reached through to implement flattening. In the following example the StructSink holds an inner struct which should be flattened:

    impl<'a, K, V> Sink for StructSink<'a, K, V>
    where
        K: Ord + Deserialize,
        V: Deserialize,
    {
        fn map(&mut self, _state: &DeserializerState) -> Result<(), Error> {
            Ok(())
        }
    
        fn key(&mut self) -> Result<SinkHandle, Error> {
            self.flush();
            Ok(Deserialize::deserialize_into(&mut self.key))
        }
    
        fn value(&mut self) -> Result<SinkHandle, Error> {
            let key = self.key.take().unwrap();
            if let Some(sink) = self.value_for_key(&key) {
                Ok(sink)
            } else {
                Ok(SinkHandle::null())
            }
        }
    
        fn value_for_key(&mut self, key: &str) -> Option<SinkHandle> {
            match key {
                "x" => Some(Deserialize::deserialize_into(&mut self.x)),
                "y" => Some(Deserialize::deserialize_into(&mut self.y)),
                other => self.nested.value_for_key(other),
            }
        }
    
        fn finish(&mut self, _state: &DeserializerState) -> Result<(), Error> {
            self.flush();
            *self.slot = Some(take(&mut self.map));
            Ok(())
        }
    }
    
    opened by mitsuhiko 1
  • Reconsider MapSink and SeqSink

    Currently a Sink has two sub sinks MapSink and SeqSink which are returned from map and seq respectively. The challenge with these is that their lifetime has to be constrained to the sink itself which means that the MapSink and SeqSink returned can't reference data which is not on &mut self.

    This seems like a non-issue but is actually problematic where an indirection is needed during deserialization. For greater flexibility our Deserialize::deserialize_into method returns a SinkHandle which can also contain a boxed Sink. This means that the following piece of code does not work:

    impl<T: Deserialize> Sink for SlotWrapper<Option<T>> {
        fn map(&mut self, state: &DeserializerState) -> Result<Box<dyn MapSink + '_>, Error> {
            **self = Some(None);
            Deserialize::deserialize_into(self.as_mut().unwrap()).map(state)
        }
    }
    

    This fails because a SinkHandle::Owned returned by deserialize_into cannot be held on to. The following code would compile:

    impl<T: Deserialize> Sink for SlotWrapper<Option<T>> {
        fn map(&mut self, state: &DeserializerState) -> Result<Box<dyn MapSink + '_>, Error> {
            **self = Some(None);
            let handle = Deserialize::deserialize_into(self.as_mut().unwrap());
            match handle {
                SinkHandle::Borrowed(handle) => handle.map(state),
                SinkHandle::Owned(_) => panic!(),
            }
        }
    }
    

    The challenge now is to come up with a better design that retains the general usefulness of the SinkHandle. Currently this complexity is largely pushed into the highly unsafe internals of the Driver, but for quite a few sinks it is resurfacing. What's worse is that there is really no safe way to fulfill the contract of a MapSink + '_ while holding on to a SinkHandle::Owned without violating some rules.

    defect 
    opened by mitsuhiko 1
  • Implement skip_serializing_if

    This is the most common attribute we currently use in serde. This relates to #3; however, one of the most common cases is to skip the serialization of defaulted or nullable values.

    enhancement 
    opened by mitsuhiko 1
  • Implement Option Handling

    Optional values are not supported yet. This requires some explicit design consideration, as in many situations users want different behavior with regard to how undefined/missing and null values are handled. Likewise there should be some consideration about whether optional values can automatically be skipped on serialization if desired.
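
To make the missing-versus-null distinction concrete, here is a minimal sketch of a three-state field. The `Field` type and its methods are hypothetical and not part of deser; they only illustrate the kind of policy decisions the design has to settle.

```rust
/// Hypothetical three-state field; not part of deser.
#[derive(Debug, PartialEq)]
pub enum Field<T> {
    /// The key was absent from the input.
    Missing,
    /// The key was present with an explicit null.
    Null,
    /// The key was present with a value.
    Value(T),
}

impl<T> Field<T> {
    /// Builds a field from a lookup result: the outer `Option` says whether
    /// the key existed, the inner one whether its value was non-null.
    pub fn from_lookup(slot: Option<Option<T>>) -> Self {
        match slot {
            None => Field::Missing,
            Some(None) => Field::Null,
            Some(Some(value)) => Field::Value(value),
        }
    }

    /// Lenient policy: missing and null both collapse to `None`.
    pub fn into_option(self) -> Option<T> {
        match self {
            Field::Value(value) => Some(value),
            Field::Missing | Field::Null => None,
        }
    }

    /// One possible serialization policy: omit the key only when it was
    /// never set, and still write explicit nulls.
    pub fn skip_on_serialize(&self) -> bool {
        matches!(self, Field::Missing)
    }
}
```

A user who does not care about the difference would call `into_option`, while a user who needs to round-trip explicit nulls would keep the `Field` as is.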

    enhancement design 
    opened by mitsuhiko 1
  • Add benchmarks

    This adds the benchmarks from miniserde. Results at the time of writing on an M1 mac:

    test bench_deserialize_deser_json ... bench:   1,757,745 ns/iter (+/- 24,935)
    test bench_deserialize_miniserde  ... bench:     779,037 ns/iter (+/- 28,255)
    test bench_deserialize_serdejson  ... bench:     691,487 ns/iter (+/- 20,234)
    test bench_serialize_deser_json   ... bench:   1,311,733 ns/iter (+/- 14,458)
    test bench_serialize_miniserde    ... bench:     493,408 ns/iter (+/- 7,772)
    test bench_serialize_serdejson    ... bench:     321,022 ns/iter (+/- 6,945)
    

    Refs #34

    opened by mitsuhiko 0
  • Try to avoid an allocation for struct keys

    This avoids boxing the key for structs during serialization. There might be better ways to do this later that do not involve managing so many vectors and significantly blowing up the size of the SerializableOnStack.

    Refs #34

    opened by mitsuhiko 0
  • Enable Default Optionals in Deserialization

    Currently optionals on deserialization are only optional if #[deser(default)] is set. This is unexpected coming from serde. This should probably be changed and a separate flag could be provided if this is not intended.

    design 
    opened by mitsuhiko 0
  • Deriving requires traits in scope

    Currently if one does not have the right traits in scope deriving fails:

    error[E0599]: no method named `finish` found for struct `Attributes` in the current scope
       --> src/main.rs:1:10
        |
    1   | #[derive(deser::Serialize, deser::Deserialize)]
        |          ^^^^^^^^^^^^^^^^ method not found in `Attributes`
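
The error class can be reproduced without deser. The sketch below uses a hypothetical `library` module with a `Finish` trait (these names are assumptions, not deser's internals) and shows a fully qualified call, which is one common way for generated code to avoid depending on the caller's imports.

```rust
mod library {
    /// Hypothetical trait standing in for whatever the derive output calls.
    pub trait Finish {
        fn finish(&self) -> &'static str;
    }

    pub struct Attributes;

    impl Finish for Attributes {
        fn finish(&self) -> &'static str {
            "finished"
        }
    }
}

// There is deliberately no `use library::Finish;` in this scope, so the
// method-call form `attrs.finish()` would fail here with E0599.
pub fn finish_without_import(attrs: &library::Attributes) -> &'static str {
    // A fully qualified call names the trait explicitly and needs no import.
    <library::Attributes as library::Finish>::finish(attrs)
}
```

Whether the derive macro should emit calls in this form is a design decision for deser; the sketch only shows that the language permits it.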
    
    bug 
    opened by mitsuhiko 0
  • Spanned elements

    One thing which might be worth thinking about is whether the difference in data model allows deser to implement Spanned. In the following comment dtolnay says serde's model is not necessarily byte oriented: https://github.com/serde-rs/serde/issues/1811#issuecomment-629595336

    If deser's model is, perhaps it could be worth including.

    A few serde serializer/deserializer implementations implement it directly.

    https://crates.io/crates/json-spanned-value https://docs.rs/toml/latest/toml/struct.Spanned.html

    But even with these, there are for instance odd interactions with Default. Presumably a span for a default element could be one of None for an optional span type, or alternately the Span of any enclosing element?

    Anyhow, it seemed worth thinking about perhaps, but doesn't seem to be essential.

    opened by ratmice 1
  • Remove current descriptors

    This PR removes the current descriptor support. I'm not sure yet if I want to remove it or try to replace it with a better alternative right away, but as descriptors are right now they are not very useful.

    opened by mitsuhiko 0
  • Experimental string tunneling and new extension system

    This is an experimental implementation for #39.

    Right now the runtime cost of the extension system is prohibitive, which is why I'm not entirely sure something like this is the correct approach. This also does not address serialization, where the open question is whether this is the responsibility of the serializer or the serializable.

    opened by mitsuhiko 0
  • Numbers in JSON Serializer Keys

    It's not clear right now how numbers as keys in JSON are best supported. Right now this would fail as the number serializer and deserializer do not accept numbers encoded in strings. However, that's effectively what one would need to do to support integers as hash map keys.

    This problem is somewhat tricky because there is no way right now to customize behavior within nested structures. Customization is only available at the level of the derive. So this hypothetical example does not work:

    #[derive(Serialize, Deserialize)]
    pub struct MyThing {
        map: HashMap<#[deser(as_string)] u32, bool>,
    }
    

    One hypothetical option would be to make the concept of "funnel through string" a property of the serialization system. In that case the u32 serializer and deserializer could probe the state to figure out if the current context requires supporting deserializing from a string. Something like this:

    impl Sink for SlotWrapper<u32> {
        fn atom(&mut self, atom: Atom, state: &DeserializerState) -> Result<(), Error> {
            match atom {
                Atom::Str(s) if state.uses_string_tunneling() => {
                    if let Ok(value) = s.parse() {
                        **self = Some(value);
                        Ok(())
                    } else {
                        Err(Error::new(
                            ErrorKind::Unexpected,
                            "invalid value for number",
                        ))
                    }
                }
                Atom::U64(value) => {
                    let truncated = value as u32;
                    if truncated as u64 == value {
                        **self = Some(truncated);
                        Ok(())
                    } else {
                        Err(Error::new(
                            ErrorKind::OutOfRange,
                            "value out of range for type",
                        ))
                    }
                }
                Atom::I64(value) => {
                    let truncated = value as u32;
                    if truncated as i64 == value {
                        **self = Some(truncated);
                        Ok(())
                    } else {
                        Err(Error::new(
                            ErrorKind::OutOfRange,
                            "value out of range for type",
                        ))
                    }
                }
                other => self.unexpected_atom(other, state),
            }
        }
    }
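
The same idea can be tried out in isolation. The following is a minimal runnable sketch with hypothetical reduced `Atom` and `NumberError` types, and with a plain `bool` standing in for `state.uses_string_tunneling()`. It uses `u32::try_from` for the range checks, which is a swapped-in alternative to the cast-and-compare approach above.

```rust
use std::convert::TryFrom;

/// Hypothetical reduced atom type; deser's real `Atom` has more variants.
pub enum Atom<'a> {
    Str(&'a str),
    U64(u64),
    I64(i64),
}

/// Hypothetical error type standing in for deser's `Error`/`ErrorKind`.
#[derive(Debug, PartialEq)]
pub enum NumberError {
    Unexpected,
    OutOfRange,
}

/// Converts an atom into a `u32`. Strings are only accepted when the
/// surrounding context (for example a JSON map key) enables tunneling.
pub fn u32_from_atom(atom: Atom<'_>, string_tunneling: bool) -> Result<u32, NumberError> {
    match atom {
        Atom::Str(s) if string_tunneling => {
            s.parse::<u32>().map_err(|_| NumberError::Unexpected)
        }
        Atom::Str(_) => Err(NumberError::Unexpected),
        Atom::U64(value) => u32::try_from(value).map_err(|_| NumberError::OutOfRange),
        Atom::I64(value) => u32::try_from(value).map_err(|_| NumberError::OutOfRange),
    }
}
```

A map deserializer would pass `true` while driving the key sink and `false` everywhere else, so `"42"` is accepted as a key but still rejected as a regular value.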
    
    opened by mitsuhiko 0
  • Improve Performance

    The JSON serializer/deserializer currently demonstrates that the performance of the entire system is pretty abysmal. Running the same benchmark as with serde/miniserde yields results that are roughly 2.3 to 4.6 times slower:

    running 6 tests
    test bench_deserialize_deser_json ... bench:   1,777,264 ns/iter (+/- 93,338)
    test bench_deserialize_miniserde  ... bench:     776,057 ns/iter (+/- 49,367)
    test bench_deserialize_serdejson  ... bench:     674,177 ns/iter (+/- 2,023)
    test bench_serialize_deser_json   ... bench:   1,471,595 ns/iter (+/- 53,628)
    test bench_serialize_miniserde    ... bench:     482,137 ns/iter (+/- 46,288)
    test bench_serialize_serdejson    ... bench:     317,567 ns/iter (+/- 23,359)
    
    enhancement defect design 
    opened by mitsuhiko 3
Owner
Armin Ronacher
Software developer and Open Source nut. Creator of the Flask framework. Engineering at @getsentry. Other things of interest: @pallets and @rust-lang