Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rc<> gets deserialized incorrectly #194

Closed
shahn opened this issue Dec 8, 2015 · 31 comments
Closed

Rc<> gets deserialized incorrectly #194

shahn opened this issue Dec 8, 2015 · 31 comments
Assignees
Milestone

Comments

@shahn
Copy link

shahn commented Dec 8, 2015

The way serde currently treats ref-counted data structures means that it creates copies of the references objects. This is bad for several reasons:

  • Potentially increased memory usage
  • Equality comparison that relies on comparison of address breaks
  • Interior mutability is not reflected in copies
    and maybe more that I can't think of right away. I think there should at least be big warnings in the documentation about this behaviour, but the support should be either fixed or deprecated, imo.
@dylanede
Copy link

IMO this is kind of a big deal. Any word on progress towards dealing with this?

@shahn
Copy link
Author

shahn commented Feb 12, 2017

I agree. I think serde either needs to drop Rc support or add an annotation that users can specify (or something so people are aware of this issue). It's pretty hard to come up with a general solution that doesn't have other issues. In any case, this is a big backwards compatibility hazard because this will require breaking changes to fix.

@dtolnay
Copy link
Member

dtolnay commented Feb 12, 2017

I am open to dropping Serialize and Deserialize impls for Rc in the next breaking release, and requiring explicit serialize_with and deserialize_with to opt into serializing them.

@alexbool
Copy link

Whoa, I personally find Rc/Arc serialization scheme perfectly valid, probably we just have to stress this moment in the docs?

My application will suffer from removal of these implementations. I use serde for deserializing plain-old-data configs with Arc<String> and then clone that Arc throughout the application.
With this impl broken, I will have to either use the hypothetical opt-in mechanism (I don't know what it will be like), or deserialize plain String, then destructore config struct, move the string into Arc - the code is becoming more complex.

On the other hand, one can always write his own personal fancy deserializer if he wants interning.

@oli-obk
Copy link
Member

oli-obk commented Feb 14, 2017

@alexbool: the reverse argument is that you could use #[serde(deserialize_with="val_to_arc")]

Maybe we should provide a #[serde(deserialize_map="Arc::new")]

@alexbool
Copy link

@oli-obk while this is clearly possible, I still think this is not worth breaking projects

@oli-obk
Copy link
Member

oli-obk commented Feb 14, 2017

@alexbool It's not a silent breaking change, so everyone will notice and it will be in a major version bump which will break projects most likely anyway.

The reverse is a silent source of bugs. The advantage of breaking it is that in the future we can re-add it when/if we figure out a way to serialize/deserialize the duplication.

@alexbool
Copy link

@oli-obk what do you call "not silent"? Do we have a deprecation that is a compilation warning? At the moment I doubt so.

@oli-obk
Copy link
Member

oli-obk commented Feb 14, 2017

@alexbool The following will stop compiling, because you can only have fields that implement Serialize and Deserialize in a struct that derives these traits

#[derive(Deserialize, Serialize)]
struct Foo {
    a: i32,
    b: Arc<String>, //~ERROR Arc<String> does not implement Deserialize
}

@alexbool
Copy link

@oli-obk In my understanding, this is simply a breaking change, a silent one. No one is expecting that until we can throw a compilation warning some time before the actual breakage will occur.
By the way, is it possible to add such a warning now?

@oli-obk
Copy link
Member

oli-obk commented Feb 14, 2017

By the way, is it possible to add such a warning now?

#[deprecate] doesn't work on impls to the best of my knowledge.

In my understanding, this is simply a breaking change, a silent one

a silent breaking change is a change that only occurs at runtime

No one is expecting that until we can throw a compilation warning some time before the actual breakage will occur.

When you do a major version bump, you should always expect breakage.

@alexbool
Copy link

Can we throw a deprcation message during proc-macro expansion?

@shahn
Copy link
Author

shahn commented Feb 14, 2017

I think there's a disconnect what "silent" means. The current behaviour is a bug in my book, because it means a serialize/deserialize cycle breaks programs "silently" - it appears to work, but doesn't. A breaking change that breaks the compile is not silent, independent of whether a deprecation note happened earlier or not. Under semver, when you update to a version with breaking changes, you are supposed to read the release notes to learn whether you're affected, then decide whether you want to upgrade and if you decide you want to you deal with the breakage. That seems normal and expected?

@nox
Copy link
Contributor

nox commented Mar 12, 2017

I'm -1 on removing the impls. Why would one assume that serialising and then deserialising a Rc<T> would share the T inside it?

@oli-obk
Copy link
Member

oli-obk commented Mar 13, 2017

None would assume that, but you might accidentally end up creating code that expects that it would. Generic structures which derive Serialize and Deserialize might get inialized with Rc.... Also, in the few cases you actually want cloning serialization, you can use serialize_with.

@dtolnay
Copy link
Member

dtolnay commented Mar 17, 2017

I put together a proof of concept of an efficient Rc<> DAG serialization/deserialization approach here in case anyone is interested in pursuing this further: see bincode-org/bincode#139 (comment).

@erickt
Copy link
Member

erickt commented Mar 19, 2017

One idea I had for this was that we could add an API to Deserializer that provided a typemap-ish store. This would allow an impl for Rc<T> store a map from some unique id to the deduplicated pointer. This might also be useful for some zero-copy situations, where the creator of the deserializer could prepopulate the typemap with shared data that was used in the Deserialize impl.

This does have some smell to it though.

@dtolnay
Copy link
Member

dtolnay commented Mar 19, 2017

@erickt take a look at the implementation in bincode-org/bincode#139 (comment) (the hashmap could be swapped out for a typemap). I think this should be a separate library on top of Serde. It does not need to be baked into the Deserializer API.

@dtolnay dtolnay added this to the v1.0 milestone Apr 5, 2017
@dtolnay
Copy link
Member

dtolnay commented Apr 10, 2017

I am still torn on this. I understand the concern that implicitly duplicating Rc data may be unexpected and is not aligned with Rust's tendency to make expensive operations explicit. I also know there are use cases where serializing an Rc the current way is exactly what is desired and working around a missing impl would be obnoxious.

@shahn @dylanede or others, does anyone have an example of a library using Rc and Serde where serializing copies of an Rc would lead to undesirable behavior?

Perhaps putting Rc impls in Serde behind a cfg would be an acceptable middle ground?

@nox
Copy link
Contributor

nox commented Apr 10, 2017

@alexbool: the reverse argument is that you could use #[serde(deserialize_with="val_to_arc")]

What about Rc<T> in other types? Option<Rc<T>>? HashSet<Rc<T>>?

@shahn
Copy link
Author

shahn commented Apr 10, 2017

I don't have an example for a public crate, but this ticket exists because I was using serde to serialize the state of a toy compiler and deserializing broke it completely (I was using pointer equality to check for equality of two objects behind an Rc, nested into a much larger struct). Since Serde allowed me to just derive Serialize, I assumed it would work correctly.

@nox
Copy link
Contributor

nox commented Apr 10, 2017

I was using pointer equality to check for equality of two objects behind an Rc

This is not the default behaviour of PartialEq<Rc<T>> for Rc<T>, so allow me to argue that this breakage wasn't the serde impls' fault.

@shahn
Copy link
Author

shahn commented Apr 10, 2017

I don't think the point you're making is correct. I did not modify the way Rc checks equality at all (I could not do that in fact, because of coherence). Rc works for any T, completely independently of whether T's PartialEq impl is derived, missing, or custom-defined.

@dtolnay
Copy link
Member

dtolnay commented Apr 11, 2017

Thanks for the valuable feedback @shahn. Let me know whether you think #864 would have set more correct expectations around Serde's handling of Rc<T> and Arc<T>.

@shahn
Copy link
Author

shahn commented Apr 11, 2017

Yes definitely. Putting this behind a cfg transports the message that this might be what you want, but it might also break your assumptions. I think that's the right tradeoff. Plus it's a very easily fixed breaking change, which I hope won't be too hard for people to adapt to.

@dtolnay
Copy link
Member

dtolnay commented Apr 11, 2017

Cool. I am moving forward with the cfg and closing this issue but we have some time before this will be released, so I am open to continuing the discussion here if other people have strong opinions.

@dtolnay dtolnay closed this as completed Apr 11, 2017
@dtolnay dtolnay self-assigned this Apr 18, 2017
@whitequark
Copy link

@dtolnay Question. Given how cargo deduplicates dependencies, couldn't one accidentally opt into this feature by having a dependency that, as its implementation detail, requests the rc feature from serde?

@dtolnay
Copy link
Member

dtolnay commented Apr 20, 2017

That is correct. This is just a way to improve the chance that people understand what they are doing.

@shahn
Copy link
Author

shahn commented Apr 21, 2017

hm, that would be sad if a popular dependency used that feature. Are there better alternative ideas?

[edit]Ah. serde 1.0 is out, so this is probably it for now :)

@nox
Copy link
Contributor

nox commented Apr 21, 2017

Yeah it shipped so it's too late. I don't understand why you think it would be sad, the existence of this feature is what is sad. :)

@oli-obk
Copy link
Member

oli-obk commented Apr 21, 2017

As I see it, enabling the feature by default is not a breaking change, so the feature is the most sensible choice in the presence of uncertainty. If you want to make sure you don't accidentally end up with a derived impl containing an rc, open a clippy issue.

rubdos pushed a commit to rubdos/serde that referenced this issue Jun 20, 2017
Add a trait for floating-point constants

The pull request is to address issue serde-rs#194. In order to keep the library organized, I’ve introduced a new trait for the new functionality. The trait is supposed to closely follows the [`consts`](https://doc.rust-lang.org/std/f64/consts/index.html) module from the standard library. There are at least the following three open questions:

1. What should the name of the trait be? Currently, it’s `Constant`; an alternative is `Consts`.
2. What should the names of the getters be? Currently, they are lower-case versions of the constants defined in the `consts` module. Note that `Float` provides `log2` and `log10`, whereas `LOG_2` and `LOG_10` get translated into `log_2` and `log_10`, respectively.
3. Should `Float` require the new trait? Or should it be left up to the user?

Please let me know what you think. Thank you!

Regards,
Ivan
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

8 participants