Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a RawValue type #355

Closed
alteous opened this issue Sep 3, 2017 · 17 comments
Closed

Add a RawValue type #355

alteous opened this issue Sep 3, 2017 · 17 comments

Comments

@alteous
Copy link

alteous commented Sep 3, 2017

It would be helpful to have a type similar to Go's json.RawMessage that is not tokenized during deserialization, but rather its raw contents stored as a Vec<u8> or &'de [u8].

The following pseudo-code demonstrates the idea.

#[derive(Deserialize)]
struct Struct {
    /// Deserialized normally.
    core_data: Vec<i32>,

    /// Contents of `user_data` are copied / borrowed directly from the input
    /// with no modification.
    ///
    /// `RawValue<'static>` is akin to `Vec<u8>`.
    /// `RawValue<'a>` is akin to `&'a [u8]`.
    user_data: serde_json::RawValue<'static>,
}

fn main() {
    let json = r#"
    {
        "core_data": [1, 2, 3],
        "user_data": { "foo": {}, "bar": 123, "baz": "abc" }
    }
    "#;
    
    let s: Struct = serde_json::from_bytes(&json).unwrap();
    println!("{}", s.user_data); // "{ \"foo\": {}, \"bar\": 123, \"baz\": \"abc\" }"
}

The main advantage of this is would be to have 'lazily-deserialized' values.

@shepmaster
Copy link

@dtolnay mentions that the work happening in #416 might lay some groundwork for such a change to take place.

@alexreg
Copy link
Contributor

alexreg commented Mar 19, 2018

Yeah, this sounds useful. Ideally we could get some of the building blocks for this sort of thing (custom 'built-in' types) integrated into serde itself, in time.

@srijs
Copy link
Contributor

srijs commented Sep 9, 2018

I need this for a project I'm working on, and would be happy to give it shot following the same approach as #416. Is this something that you'd accept a PR for?

@dtolnay
Copy link
Member

dtolnay commented Sep 9, 2018

Yes I would like a PR for this. When you say you need this feature, is it because deserializing to serde_json::Value is too slow in your project and you are expecting this to have better performance, or some other reason?

@srijs
Copy link
Contributor

srijs commented Sep 9, 2018

Yeah, I've got an http web server that's backed by a postgres database which stores JSONB values, and would like to shuffle data back and forth (and embed into other payloads) with as few allocations as possible.

@dtolnay
Copy link
Member

dtolnay commented Sep 9, 2018

As a first step, please put together a short benchmark that you trust as reasonably representative of the shuffling data back and forth piece of your application. Ideally it would depend on serde / serde_derive / serde_json but nothing else -- so no postgres or http stuff. Focus on the data structures so that they have the same ratio and pattern of raw vs meaningful content as your real use case. Write it using serde_json::Value for now, and then the goal will be to swap in a raw value once it's implemented and observe a speedup.

@srijs
Copy link
Contributor

srijs commented Sep 13, 2018

Wrote a small microbenchmark here: https://gist.github.com/srijs/7fe21b43c2cf5d2ceeb593ae4591c6f5

Please let me know if this is what you had in mind!

@dtolnay
Copy link
Member

dtolnay commented Sep 13, 2018

👍 looks good, let's make it work.

@dtolnay
Copy link
Member

dtolnay commented Sep 20, 2018

I released one possible implementation of this in serde_json_experimental 1.0.29-rc1. Here is the documentation. I would be interested in hearing people's experience with this API before committing to it.

[dependencies]
serde_json_experimental = { version = "1.0.29-rc1", features = ["raw_value"] }

@shepmaster
Copy link

From the linked PR:

using the RawValue type instead of deserializing/serializing via Value results in a 3-4x speed-up.

test deserialize_and_serialize_raw_value ... bench:       1,759 ns/iter (+/- 354)
test deserialize_and_serialize_value     ... bench:       6,091 ns/iter (+/- 1,776)

@srijs
Copy link
Contributor

srijs commented Sep 20, 2018

@dtolnay will this be behind a feature flag when released in serde_json proper as well? I'm worried that it might be difficult to get other crates (such as postgres) to add support for RawValue when it's feature-gated like this.

@dtolnay
Copy link
Member

dtolnay commented Sep 20, 2018

I would like for this to be feature gated because it adds code to the critical path of applications even that do not use RawValue. In what ways do you see crates needing to add support for RawValue where the feature gate would be an obstacle?

@srijs
Copy link
Contributor

srijs commented Sep 20, 2018

That makes sense to me from a performance standpoint.

I'm not super familiar with how feature flags interact in cargo, but my concern is that in order to e.g. add an impl ToSql for RawValue, postgres needs to opt-in into the raw_value feature. What does that mean for crates wanting to use this impl? Do they need also need to enable the raw_value feature? Will they get type errors since the serde_json crate that postgres depends on (and exposes impls on) is be different from the serde_json dependency that their crate has?

On a different note, I've been giving the experimental release a shot in my project, and it's been working well so far. The only thing I ran into that wasn't easy to solve was converting a String into a Box<RawValue>. To solve this, I've used owning_ref for now, but that was a bit cumbersome and has it's own drawbacks. Do you think it might be worth adding a Box<str> -> Result<Box<RawValue>> method to cover this use-case?

@shepmaster
Copy link

Do they need also need to enable the raw_value feature

No; Cargo features are globally additive.

@dtolnay
Copy link
Member

dtolnay commented Sep 21, 2018

As @shepmaster wrote, I believe Cargo handles that situation correctly. If the postgres crate enables the feature and provides that impl, any crate that depends on postgres can use the impl.

Regarding Box<str> -> Result<Box<RawValue>>, is this needed due to a performance bottleneck? Otherwise you would write serde_json::from_str::<Box<RawValue>>(&string). It should be fine to provide something like the following constructor but I would want to know how much faster it is than from_str::<Box<RawValue>> in practice.

impl RawValue {
    pub fn from_string(json: String) -> Result<Box<Self>> {
        {
            let borrowed = ::from_str::<&Self>(&json)?;
            if borrowed.json.len() < json.len() {
                return Ok(borrowed.to_owned());
            }
        }
        Ok(Self::from_owned(json.into_boxed_str()))
    }
}

@srijs
Copy link
Contributor

srijs commented Sep 22, 2018

Thanks, I wasn't quite sure, but it sounds like cargo will do the right thing 👍

Yeah, I'd like to avoid the extra malloc + memcpy if possible. Haven't had a chanche to benchmark it yet, just wanted to prevent it from becoming a bottleneck for larger documents in the first place.

@dtolnay
Copy link
Member

dtolnay commented Sep 23, 2018

Released in 1.0.29: https://docs.rs/serde_json/1.0/serde_json/value/struct.RawValue.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants