Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arbitrary-precision numerics support #416

Merged
merged 30 commits into from Mar 21, 2018

Conversation

alexreg
Copy link
Contributor

@alexreg alexreg commented Mar 2, 2018

Added support for arbitrary-precision numerics, in a similar way that the toml crate does for date-times (using an internal special struct).

@alexreg
Copy link
Contributor Author

alexreg commented Mar 2, 2018

@dtolnay The main outstanding issue is the following test failure when the feature is on. (All tests pass when the feature is off.)

---- src/value/mod.rs - value::Value::is_f64 (line 598) stdout ----
	thread 'rustc' panicked at 'test executable failed:

thread 'main' panicked at 'assertion failed: !v["b"].is_f64()', src/value/mod.rs:11:1
note: Run with `RUST_BACKTRACE=1` for a backtrace.

', librustdoc/test.rs:330:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.

This is of course due to the fact that the old logic for as_f64 does return Some for a PosInt or NegInt value, but the parse logic for arbitrary precision numerics does. (I presume you remember discussing this briefly on IRC.) So, we can either add some hacky logic to check to check if a string is "float-like" (this would be more than just checking for a decimal point, because of exponential notation), or break backwards compatibility. What do you think?

Once this is done, and I can do a quick rebase on master, I think we're ready for a proper full review.

@alexreg alexreg force-pushed the arbitrary-precision branch 2 times, most recently from 8435ba8 to 20d55dd Compare March 3, 2018 18:43
@alexreg alexreg changed the title WIP: Arbitrary-precision numerics support Arbitrary-precision numerics support Mar 3, 2018
@alexreg
Copy link
Contributor Author

alexreg commented Mar 3, 2018

Okay, I decided to resolve that in a backwards-compatible manner, in the simplest way I thought possible.

Ready for review now, @dtolnay!

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I began looking through this and everything looks great so far. I continue to be on board with this approach. Will try to get through the rest by ~midweek.

Could you check on why the Rust 1.15.0 build is failing in Travis?

@dtolnay
Copy link
Member

dtolnay commented Mar 4, 2018

I agree with the decision to conservatively support is_f64 as close to backward compatibly as possible.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 4, 2018

Thanks! I began looking through this and everything looks great so far. I continue to be on board with this approach. Will try to get through the rest by ~midweek.

Could you check on why the Rust 1.15.0 build is failing in Travis?

Okay, glad to hear!

The Rust 1.15 failure was due to a currently stable feature not being stable back then (error: struct field shorthands are unstable). I've fixed that with a new commit.

The clippy lint causing the failure for the nightly build is confusing me. What do you want to do about that?

@dtolnay
Copy link
Member

dtolnay commented Mar 8, 2018

This one?

error: redundant closure found
    --> src/ser.rs:1554:52
     |
1554 |         writer.write_all(value.as_bytes()).map_err(|e| Error::io(e))
     |                                                    ^^^^^^^^^^^^^^^^ help: remove closure as shown: `Error::io`
     |
     = note: #[deny(redundant_closure)] implied by #[deny(clippy)]
     = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.187/index.html#redundant_closure

We should follow what it says and use map_err(Error::io).

The 1.15.0 build is going to need another look.

src/number.rs Outdated
{
for c in self.n.chars() {
if c == '.' || c == 'e' || c == 'E' {
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check that parse::<f64>() is finite. For example 1e999 should not be considered representable as f64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/number.rs Outdated
#[cfg(feature = "arbitrary_precision")]
{
let mut buf = Vec::new();
dtoa::write(&mut buf, f).ok()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an unwrap() because Vec<u8> as io::Write never fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.

src/number.rs Outdated
{
let mut buf = Vec::new();
dtoa::write(&mut buf, f).ok()?;
String::from_utf8(buf).ok()?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an unwrap() because the dtoa representation of a float is always UTF-8.

src/number.rs Outdated
where
V: de::Visitor<'de>,
{
visitor.visit_str(&self.n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an intentional behavior change or required somewhere else in the PR? The following fails without arbitrary_precision but gives you a string with arbitrary_precision.

println!("{:?}", String::deserialize(Number::from(0)).unwrap());

We have tended to stay away from such implicit type conversions because they make other functionality fail in surprising ways, for example deserializing to the following untagged enum would produce the string variant even though the input is a number.

#[derive(Deserialize)]
#[serde(untagged)]
enum E {
    S(String),
    N(i64),
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. It was so long ago I originally wrote this, I don't fully remember my motivations, but I think it was mainly for convenience. Your point is well taken though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the methods that are resulting in a behavior change.

src/number.rs Outdated
}
}

impl<'de, 'a> Deserializer<'de> for &'a Number {
impl<'de, 'a: 'de> Deserializer<'de> for &'a Number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? Looks like a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly yes. This is the most liberal bound that allows for it to compile. We could make the bound dependent on the feature, I suppose, but that might cause confusion for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposing that we cannot make this breaking change (whether or not it depends on arbitrary_precision), is there any other way to write this impl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to need to be fixed. We cannot release with this breaking change.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 9, 2018

We should follow what it says and use map_err(Error::io).

The 1.15.0 build is going to need another look.

Oh, I misunderstood that hint. It shouldn't say "remove closure", but rather "change closure to"... anything, that's an issue for the clippy people.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 9, 2018

Okay, I've pushed a new revision, which should resolve your above concerns. I suspect everything is in order now, but I'm getting some strange errors on testing now, even with the old version (previously confirmed working).

@dtolnay
Copy link
Member

dtolnay commented Mar 9, 2018

The "unnecessary parentheses" warnings should go away with a cargo update. The "expected expression" failures are from a change in the nightly compiler's diagnostic output and are fixed if you rebase.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 9, 2018

Okay, that worked. (Had to clobber the target/ directories too, which was an issue.)

Thanks for the review. LGTM? 😀

(And maybe roll a new release soon, if you think things are in a good place?)

src/number.rs Outdated
{
let value = visitor.next_key::<NumberKey>()?;
if value.is_none() {
return Err(de::Error::custom("number key was not found"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the map being visited does not have the right key, I think we need to assume not that "they probably misspelled the $__serde_private_number field name" but that "they must be deserializing something that is not a JSON number." As such, would it be possible to show exactly the same error as they would have gotten without the arbitrary_precision feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds like a good idea.

src/number.rs Outdated
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Error>
where V: Visitor<'de>
{
visitor.visit_map(NumberDeserializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be somewhat more cautious here to avoid breaking existing code. For example in the untagged enum case which relies on deserialize_any:

#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum E {
    N(i64),
}

fn main() {
    println!("{:?}", E::deserialize(Number::from(0)).unwrap());
}

One approach would be to cascade a sequence of checks: if self.n can parse as u64, use visit_u64; if self.n can parse as i64, use visit_i64; if self.n parsed to f64 and then to_string'd gives you the same string as self.n, use visit_f64; only otherwise use visit_map.

That way for the most part only code that relies on large numbers (which previously failed to deserialize into serde_json::Number) would be affected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is there something special about the way untagged enums work that causes this behaviour? It doesn't seem like the most elegant solution, but I'll have a go at this fix.

src/number.rs Outdated
}
}

impl<'de, 'a> Deserializer<'de> for &'a Number {
impl<'de, 'a: 'de> Deserializer<'de> for &'a Number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to need to be fixed. We cannot release with this breaking change.

src/number.rs Outdated
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Error>
where V: Visitor<'de>
{
visitor.visit_map(NumberDeserializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as <Number as Deserializer<'de>>::deserialize_any.

src/number.rs Outdated
where
V: de::Visitor<'de>,
{
visitor.visit_str(&self.n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the methods that are resulting in a behavior change.

src/number.rs Outdated
} else {
Number { n: N::PosInt(i as u64) }
}
impl FromPrimitive for Number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The num-traits crate is not stable yet so we cannot have these trait impls in the serde_json public API. Please find a different way to implement this.

Copy link
Contributor Author

@alexreg alexreg Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just hide the impls then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not have these impls in the public API even if we hide them. Is it possible to implement a different way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although FromPrimitive/ToPrimitive does the heavy lifting for us, which is quite nice. I also don't see what's "unstable" about them... is the fact the version is 0.2 really that bad? Serde was 0.x not so long ago, after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't understand what heavy lifting you are referring to. Just looking at the current implementation of the From impls it looks like making them work in the arbitrary_precision case should be not much more than 1 line of code: Number { n: i.to_string() }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been addressed yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment in the main thread.

src/number.rs Outdated
}
}

impl ToPrimitive for Number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto num-traits is not stable yet.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 9, 2018

I don't see any way to get rid of the 'a: 'de bound above... do you?

@dtolnay
Copy link
Member

dtolnay commented Mar 9, 2018

Yes, the NumberDeserializer struct should not need a lifetime in the first place because the implementation eventually calls to_owned() to make an owned string anyway. If you implement struct NumberDeserializer without a lifetime and have it store Option<String>, you no longer need to make that breaking change.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 9, 2018

Ah, fair point that. I presume you're suggesting Option<String> so that one can then do number.take().unwrap()?

@alexreg
Copy link
Contributor Author

alexreg commented Mar 10, 2018

@dtolnay All should be in order now, except for the issue with num-traits. I await your response re. my above comment. :-)

src/number.rs Outdated
where
V: de::DeserializeSeed<'de>,
{
seed.deserialize(self.number.to_owned().take().unwrap().into_deserializer())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may or may not be doing what you meant -- the to_owned() call resolves to impl<T> ToOwned for T where T: Clone so basically this is equivalent to self.number.clone().take().unwrap().into_deserializer(). Either the clone should not be there or the take should not be there. Seems likely that you intended to take but not clone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the to_owned() call was just a relic from the old code using a Cow value.

src/number.rs Outdated
// Not public API. Should be pub(crate).
#[doc(hidden)]
pub struct NumberDeserializer {
pub visited: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that self.number is an Option, this visited flag is redundant. The next_key_seed method can check self.number.is_some() instead.

src/value/de.rs Outdated
V: Visitor<'de>,
{
match self {
Value::Number(n) => n.deserialize_str(visitor),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in the impl Deserializer for Number case, let's not implement these implicit conversions from number to string.

src/value/de.rs Outdated
V: Visitor<'de>,
{
match *self {
Value::Number(ref n) => n.deserialize_str(visitor),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto comment about not implementing implicit conversions from number to string.

src/number.rs Outdated
}

macro_rules! from_unsigned {
($($unsigned_ty:ident)*) => {
impl ToPrimitive for Number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ToPrimitive impl needed by any other code in the PR or is it here for symmetry with FromPrimitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly for symmetry, and because I think it would be nice for consumers. I really don't see the problem with implementing FromPrimitive and ToPrimitive... people who use num-traits will appreciate it, in fact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People who use num-traits 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9 may not appreciate it as much as you think. Look at how much people love the Serde 0.8 impls in the url crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fair point. But we'll still have the num-traits dependency, as before. Its usage just won't be exposed. Are you okay with that?

src/value/de.rs Outdated
visitor: V) -> Result<V::Value, Error>
where V: Visitor<'de>,
{
if name == SERDE_STRUCT_NAME && fields == &[SERDE_STRUCT_FIELD_NAME] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this case here because the Deserialize impls for Number and Value both call deserialize_any, not deserialize_struct (which is fine). AIUI nothing would ever hit this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been addressed yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tests/test.rs Outdated
);
]);

if cfg!(not(feature = "arbitrary_precision")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also going to want to add some test cases that test the arbitrary precision behavior, i.e. tests that pass only if arbitrary_precision is enabled. What use cases do you think would be valuable to test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been addressed yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will add these. A few integers and floats that are too big for u64/i64/f64 should be good enough, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be addressed.

Copy link
Contributor Author

@alexreg alexreg Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It essentially has been: there is a test_parse_arbitrary_precision_number fn in tests, it just has code commented out for the time being. Hope it looks alright to you.

src/de.rs Outdated
visitor: V
) -> Result<V::Value>
where
V: de::Visitor<'de>,
{
#[cfg(feature = "arbitrary_precision")]
{
if name == SERDE_STRUCT_NAME && fields == &[SERDE_STRUCT_FIELD_NAME] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is reachable because nothing calls deserialize_struct with this name and field names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure, but I think you're right. Removed now.

src/de.rs Outdated
@@ -184,7 +198,7 @@ impl<'de, R: Read<'de>> Deserializer<R> {
}

#[cold]
fn peek_invalid_type(&mut self, exp: &Expected) -> Error {
fn peek_invalid_type(&mut self, exp: &Expected, prim: bool) -> Error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am reading this correctly, there are 8 calls to peek_invalid_type and all of them pass prim=false. So do we need the new parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, an oversight of mine. Gone now.

src/de.rs Outdated
@@ -240,7 +254,7 @@ impl<'de, R: Read<'de>> Deserializer<R> {
self.fix_position(err)
}

fn deserialize_number<V>(&mut self, visitor: V) -> Result<V::Value>
fn deserialize_number<V>(&mut self, visitor: V, prim: bool) -> Result<V::Value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method is only called when deserializing a primitive so it shouldn't need the new prim parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wasn't true before, but now that I've removed the block you suggested above in deserialize_struct, it should be fine to get rid of.

Cargo.toml Outdated
@@ -37,3 +37,5 @@ default = []
# This allows data to be read into a Value and written back to a JSON string
# while preserving the order of map keys in the input.
preserve_order = ["linked-hash-map"]

arbitrary_precision = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining what is (and maybe what is not) guaranteed when you enable this feature.

Copy link
Contributor Author

@alexreg alexreg Mar 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like the following?

# Use `String` to represent numbers internally, thus allowing the serialization
# and deserialization of numbers of arbitrary size/precision, instead of just
# the primitive integers and floats.
# This feature does not provide provide the ability to do arithmetic on
# arbitrary-precision numbers; this is left to the discretion of the consumer.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 10, 2018

@dtolnay The above review should be resolved now, modulo one comment.

@dtolnay
Copy link
Member

dtolnay commented Mar 11, 2018

You can't fool me. 😜 There are 4 comments still visible in https://github.com/serde-rs/json/pull/416/files that all need to be addressed. I will respond to them individually. You need to expand all the files and search for dtolnay.

@dtolnay
Copy link
Member

dtolnay commented Mar 11, 2018

I still need to review the serialization side -- src/ser.rs and src/value/ser.rs -- as well as the fn scan_... part of src/de.rs.

src/de.rs Outdated
fn parse_any_number(
&mut self,
pos: bool,
prim: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the callers of this function pass a boolean literal value for prim. I think this would be simpler if we remove the prim argument and callers that need a primitive value call parse_integer directly instead of calling parse_any_number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Since the changes in the code during the review, this makes more sense now.

src/value/ser.rs Outdated
@@ -247,23 +266,33 @@ impl serde::Serializer for Serializer {
}
}

/// Not public API. Should be pub(crate).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this and the following few similar comments are meaningful because these types are effectively already pub(crate). They are not accessible from outside of serde_json. Please remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I can get rid of the doc(hidden) attributes too...

where
T: Serialize,
{
Err(key_must_be_a_string())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be invalid_number() like the other serializer methods here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. Not 100% sure, since I can't remember my motivation for ever writing that... or if it was sloppy copy & paste coding, perhaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/value/ser.rs Outdated
}
}

impl serde::ser::SerializeStructVariant for SerializeStructVariant {
type Ok = Value;
type Error = Error;

fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Error>
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Self::Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to make this change in this PR. The existing code is inconsistent but let's leave it to a later PR to pick one usage and standardize everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure. I'll leave that to someone else. ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this and similar unnecessary formatting changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I thought this was in code I introduced for some reason. Reverted, and left the inconsistency as-is.

src/value/ser.rs Outdated
fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> {
let n = Number::from_string_unchecked(value.to_owned());
*self.0 = Some(Value::Number(n));
Ok(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer if we use the Self::Ok return value to hand back the resulting Number. The caller can take care of placing that number where they want it.

Ok(Number::from_string...)

So-called "out parameters" like &'a mut Option<Value> are not idiomatic in Rust when we can help it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Not sure why I did it this way originally... possibly my knowledge of Rust/Serde wasn't very good when I first wrote this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I believe I still need to use an "out value" internally for impl serde::ser::SerializeStruct for SerializeMap.

src/value/ser.rs Outdated
}

fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> {
let n = Number::from_string_unchecked(value.to_owned());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this from_string_unchecked method was copied from the specialization-based implementation. But in the specialization case it was not necessary to check the string because we could know it came from a number. Here that is not the case and checking would be sensible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also make sense to check in the NumberFromString Visitor in the number module?

Anyway, how do you recommend doing this check? I think we should add a FromStr implementation for Number to be honest, which does the checking... perhaps using the scan_integer function? Although this would necessitate the construction of a Deserializer value. If that doesn't seem to "heavy" an approach for you, I can give it a go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems the use of from_string_unchecked is okay here. Wouldn't test_parse_number_errors fail otherwise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of checking would be to uphold the invariant that in safe code you cannot get a Number holding anything other than what we consider a JSON number. This allows other code in serde_json to assume that the string inside of a Number contains something that is a number.

#[macro_use]
extern crate serde_derive;

extern crate serde_json;

#[derive(Serialize)]
#[serde(rename = "$__serde_private_Number")]
struct S {
    #[serde(rename = "$__serde_private_number")]
    f: &'static str,
}

fn main() {
    println!("{:#?}", serde_json::to_value(&S { f: "serde" }).unwrap());
}

A rename attribute is the simplest way to illustrate the failure mode; a more realistic example would be transcoding from RON to JSON in which malicious RON input fed by an attacker is able to trigger exploitable undefined behavior by violating assumptions in downstream code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtolnay What were your thoughts on my above comment by the way?

Anyway, how do you recommend doing this check? I think we should add a FromStr implementation for Number to be honest, which does the checking... perhaps using the scan_integer function? Although this would necessitate the construction of a Deserializer value. If that doesn't seem to "heavy" an approach for you, I can give it a go.

I could even make it simpler and just do value::from_str and extract the Number from that... but again, is that too heavy? Would appreciate your thoughts here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not consider it heavy to construct a Deserializer. Implementing FromStr for Number in terms of scan seems like it would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll give that a go, ta.

src/value/ser.rs Outdated
next_key: None,
},
)
Ok(SerializeMap::Map {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert unnecessary whitespace changes like this one. We can run the new rustfmt after the PR lands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure how that happened!

src/value/ser.rs Outdated
Ok(())
},
#[cfg(feature = "arbitrary_precision")]
SerializeMap::Number { .. } => panic!(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reachable? A malicious input should not be able to trigger a panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not reachable AFAIK. I should probably change to unreachable!().

src/ser.rs Outdated
@@ -346,24 +361,20 @@ where
.end_array(&mut self.writer)
.map_err(Error::io)
);
Ok(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert these formatting changes. If rustfmt is formatting this code differently these days, we can run rustfmt separately from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/value/ser.rs Outdated
}
}

impl serde::ser::SerializeStruct for SerializeMap {
type Ok = Value;
type Error = Error;

fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Error>
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Self::Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert anywhere that you changed Error to Self::Error or vice versa. We can leave it to a later PR to pick one usage and standardize everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I'm having a hard time spotting where I did this, so I think you'll just have to continue pointing them out, I'm afraid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/value/ser.rs Outdated
@@ -404,16 +631,186 @@ impl serde::ser::SerializeStructVariant for SerializeStructVariant {
where
T: Serialize,
{
self.map
.insert(String::from(key), try!(to_value(&value)));
self.map.insert(String::from(key), try!(to_value(&value)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this formatting change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tests/test.rs Outdated
@@ -568,6 +569,19 @@ fn test_write_newtype_struct() {
test_encode_ok(&[(outer, r#"{"outer":{"inner":123}}"#)]);
}

#[test]
fn test_deserialize_number_to_untagged_enum() {
use serde_json::Number;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already imported on line 42.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/value/ser.rs Outdated
}

fn end(self) -> Result<Value, Error> {
serde::ser::SerializeMap::end(self)
fn end(self) -> Result<Self::Ok, Self::Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the change in this method signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/value/ser.rs Outdated
Ok(())
}

fn end(self) -> Result<Value, Error> {
fn end(self) -> Result<Self::Ok, Self::Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 18, 2018

@dtolnay Okay, I believe all outstanding issues are in order now! I think we're very close to being able to merge. Please tell me I'm right. 😉

@shepmaster shepmaster mentioned this pull request Mar 19, 2018
type Err = Error;

fn from_str(s: &str) -> Result<Self, Self::Err> {
super::de::Deserializer::from_str(s).parse_any_signed_number().map(|n| n.into())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this act as much as possible like the FromStr for primitive types. For example the leading whitespace behavior should be the same. Compare:

extern crate serde_json;

fn main() {
    println!("{}", " 0".parse::<serde_json::Number>().unwrap());
    println!("{}", " 0".parse::<u64>().unwrap_err());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll give it a go. Unfortunately, the ui tests are failing again now...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. As for the primitive types, leading and trailing whitespace now cause errors.

src/number.rs Outdated
formatter.write_str("string containing a number")
}

fn visit_string<E>(self, s: String) -> Result<NumberFromString, E>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Visitor::visit_string documentation:

It is never correct to implement visit_string without implementing visit_str.

In this case I think you want just visit_str because the implementation does not benefit from taking ownership of the String data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 19, 2018

@dtolnay Question: should we implement the experimental TryFrom trait for floats to Numbers?

@alexreg
Copy link
Contributor Author

alexreg commented Mar 20, 2018

@dtolnay How are we looking now?

src/ser.rs Outdated
@@ -1304,6 +1554,14 @@ pub trait Formatter {
dtoa::write(writer, value).map(|_| ())
}

/// Writes a number that has already been rendered to a string.
#[inline]
fn write_number_str<W: ?Sized>(&mut self, writer: &mut W, value: &str) -> Result<()>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned Result type should be io::Result instead of serde_json::Result to be consistent with the rest of the Formatter methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/ser.rs Outdated
fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok>
where T: Serialize
{
Err(key_must_be_a_string())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we saw this somewhere else before, the error here is inconsistent with the invalid_number() in all the other methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, copy & paste coding. :-P

@dtolnay
Copy link
Member

dtolnay commented Mar 20, 2018

Question: should we implement the experimental TryFrom trait for floats to Numbers?

I would not do this yet until someone has a compelling real use case that benefits from TryFrom.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 20, 2018

@dtolnay Fair enough.

@dtolnay
Copy link
Member

dtolnay commented Mar 20, 2018 via email

@alexreg
Copy link
Contributor Author

alexreg commented Mar 20, 2018

Fix the two review comments first. :-)

Serves me right for replying to your comment before I open up the issue. (GitHub notifies me for normal replies, but not review comments.) All in order now... I think!

@dtolnay
Copy link
Member

dtolnay commented Mar 20, 2018

You may need to check the box for "Pull Request reviews" on https://github.com/settings/notifications.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 20, 2018

@dtolnay Yep. I think I turned it off because I was getting spam from reviews on another project. Seems it can't be enabled on a per-PR/repo basis. Anyway, I think we're coming to the end of this review now. Let me know if there's anything else to do before we can get this merged, and I'll try to fix it all tonight!

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for sticking with this.

@dtolnay dtolnay merged commit 31bba4b into serde-rs:master Mar 21, 2018
@alexreg
Copy link
Contributor Author

alexreg commented Mar 21, 2018

No problem, @dtolnay. Thanks for your patience, and guiding me through. A review for such a large PR was never going to be simple.

Glad to see you already made a new release, as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants