Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolves #334 - Bumping Deserializer's default recursion limit to 1024 #346

Closed
wants to merge 1 commit into from

Conversation

Yoric
Copy link

@Yoric Yoric commented Aug 21, 2017

No description provided.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! From the issue you filed, it sounds like you are interested in increasing the recursion limit for your use case. If this is going to be configurable, we need to evaluate whether u8 is the right type. How do you know 255 is enough for your use case?

@Yoric
Copy link
Author

Yoric commented Aug 21, 2017

I do not have enough samples to be sure, but you are right, I suspect that 255 might not be sufficient.

@oli-obk
Copy link
Member

oli-obk commented Aug 21, 2017

Is there any advantage to using a small integer? The value is stored right next to a Vec. The u8 will probably make the Deserializer bigger by a pointer size. So we can just use usize.

@Yoric
Copy link
Author

Yoric commented Aug 21, 2017

Fine by me.

@dtolnay
Copy link
Member

dtolnay commented Aug 21, 2017

Usize seems fine. But we will need to check benchmarks before merging because such a change would affect even users who do not change the recursion limit.

@Yoric
Copy link
Author

Yoric commented Aug 28, 2017

@dtolnay How do you check this? Do you have benchmarks at hand?

@dtolnay
Copy link
Member

dtolnay commented Aug 28, 2017

I would look at json-benchmark before and after the usize change.

cargo build \
    --release \
    --bin json-benchmark \
    --no-default-features \
    --features lib-serde,parse-dom,parse-struct,all-files

@dtolnay dtolnay changed the title Resolves #334 - Implementing Serializer::with_recursion_limit Resolves #334 - Implementing Deserializer::with_recursion_limit Aug 28, 2017
@Yoric
Copy link
Author

Yoric commented Aug 29, 2017

Benchmark results on my machine:

Before change:

                                DOM                STRUCT
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json          13.2ms    12.7ms     4.8ms     7.7ms
data/citm_catalog.json     7.7ms     1.5ms     2.3ms     1.0ms
data/twitter.json          2.7ms     0.8ms     1.4ms     0.7ms

======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           8.9ms     4.4ms
data/citm_catalog.json     4.6ms     1.0ms
data/twitter.json          1.6ms     0.6ms

==== rustc_serialize ===== parse|stringify === parse|stringify ===
data/canada.json          22.4ms    48.0ms    27.1ms    64.8ms
data/citm_catalog.json    16.9ms     4.1ms    21.4ms     2.9ms
data/twitter.json          7.7ms     1.7ms     9.8ms     1.5ms

Changing u8 to u32:

                                DOM                STRUCT
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json          13.5ms    12.7ms     4.8ms     7.7ms
data/citm_catalog.json     7.7ms     1.5ms     2.3ms     1.0ms
data/twitter.json          2.7ms     0.8ms     1.4ms     0.7ms

======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           8.9ms     4.4ms
data/citm_catalog.json     4.7ms     1.0ms
data/twitter.json          1.6ms     0.6ms

==== rustc_serialize ===== parse|stringify === parse|stringify ===
data/canada.json          22.7ms    47.8ms    26.7ms    65.2ms
data/citm_catalog.json    17.0ms     4.3ms    21.4ms     2.9ms
data/twitter.json          7.7ms     1.7ms     9.9ms     1.6ms

There's apparently a regression of 0.3ms on parsing canada.json.

Changing u8 to usize:

                                DOM                STRUCT
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json          13.1ms    12.5ms     4.7ms     7.7ms
data/citm_catalog.json     7.6ms     1.5ms     2.3ms     1.0ms
data/twitter.json          2.7ms     0.8ms     1.3ms     0.7ms

======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           9.0ms     4.5ms
data/citm_catalog.json     4.7ms     1.0ms
data/twitter.json          1.6ms     0.6ms

==== rustc_serialize ===== parse|stringify === parse|stringify ===
data/canada.json          21.9ms    47.6ms    26.9ms    64.9ms
data/citm_catalog.json    17.0ms     4.2ms    21.4ms     2.9ms
data/twitter.json          7.8ms     1.7ms     9.9ms     1.6ms

which looks like a 0.1ms improvement on canada.json and citm_catalog.json.

So, I'd go ahead with usize, if that's ok with you, @dtolnay.

@dtolnay
Copy link
Member

dtolnay commented Aug 29, 2017

Yes let's go with usize. Thanks for taking care of that!

@Yoric
Copy link
Author

Yoric commented Aug 30, 2017

@dtolnay Ready when you are :)

@dtolnay
Copy link
Member

dtolnay commented Sep 3, 2017

I think the remaining things to figure out here are:

  • Naming convention for builder methods. In the standard library, typically with_ functions are constructors like Vec::with_capacity, and builder methods do not use a with_ prefix. Maybe fn recursion_limit would be more idiomatic.
  • Self by value or &mut. In Strengthen the recommendation to use &mut self builders rust-lang/api-guidelines#81 we observed that self-by-value builders are almost never ideal. If this is one of the exceptions, how do we justify it? In your use case, would you always be invoking this on the same line as the Deserializer constructor?
  • How to reconcile with an eventual serde_json::Format. I would feel more comfortable if we make more progress on Abstract over formats that deal with bytes serde#644 before merging this. That issue will involve some data structure that describes JSON as a format, including whether to use indentation when serializing and presumably the recursion limit for deserializing. It may turn out to be a better API to treat that data structure as a builder for serde_json::Deserializer, rather than having builder methods directly on Deserializer.

@Yoric
Copy link
Author

Yoric commented Sep 3, 2017

src/de.rs Outdated
}

impl<'de, R> Deserializer<R>
where
R: read::Read<'de>,
{
/// Create a JSON deserializer from one of the possible serde_json input
/// sources.
/// sources. By default, it has a recursion limit of 128.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say 1024.

@dtolnay
Copy link
Member

dtolnay commented Sep 4, 2017

Could you check on the failing test case?

Benchmarks indicate that changing `remaining_depth` from `u8` to
`usize` doesn't cause any performance regression (on my computer, it
actually marginally improves performance), so we perform this change
and take the opportunity to bump the value to a more generous depth of 1024.

Resolves serde-rs#334 - Implementing Serializer::with_recursion_limit
@Yoric
Copy link
Author

Yoric commented Sep 4, 2017

@dtolnay Ah, I hadn't seen that test. Fixed.

@Yoric Yoric changed the title Resolves #334 - Implementing Deserializer::with_recursion_limit Resolves #334 - Bumping Deserializer's default recursion limit to 1024 Sep 4, 2017
@Yoric
Copy link
Author

Yoric commented Sep 4, 2017

Of course, I realize that this since the default value of 1024 actually blows up Rust's stack, the patch may be problematic.

@dtolnay
Copy link
Member

dtolnay commented Sep 5, 2017

Yes, the whole point of the recursion limit is to limit the ability of untrusted JSON input to crash your program. 😿 Are you able to get by for now with a fork of serde_json that exposes this setting? I can try to get things moving on serde-rs/serde#644 again.

@dtolnay
Copy link
Member

dtolnay commented Sep 8, 2017

Let's continue to track this in #334.

@dtolnay dtolnay closed this Sep 8, 2017
remexre added a commit to remexre/json that referenced this pull request Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants