Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(De)serialization of enums could use a simpler format #251

Closed
Yoric opened this issue Feb 24, 2016 · 32 comments
Closed

(De)serialization of enums could use a simpler format #251

Yoric opened this issue Feb 24, 2016 · 32 comments
Assignees

Comments

@Yoric
Copy link
Contributor

Yoric commented Feb 24, 2016

I have worked only with serde_json, to I may be missing something that applies to other (de)serializers.

Serialization

Serialization of enums seems to always produce objects, even for trivial constructors

#[derive(Serialize, Deserialize)]
pub enum Foo {
    A, B, C
}

fn main() {
  println!("{}", serde_json::to_string(&Foo::A).unwrap());
  // prints: `{"A":[]}`
}

This produces {"A":[]}. I would prefer if it produced just the string "A".

Deserialization

More importantly, deserialization of enums seems to always require objects, which is annoying when one attempts to implement a protocol.

#[derive(Serialize, Deserialize)]
pub enum Foo {
    A, B, C
}

fn main() {
  let a : Foo = self::serde_json::from_str("\"A\"").unwrap();
  // panics with a syntax error
}

I would have preferred if it managed to deserialize the string "A" into enum constructor A.

edit Fixed typo in the second example.

@erickt
Copy link
Member

erickt commented Feb 24, 2016

cc-ing serde-rs/json#37.

Yeah, it's unfortunately a bit painful. Serde used to have this serialization, where for a given enum like:

enum Foo {
    A,
    B(usize, usize),
}

It used to serialize this into "A" and {"B", [0]}. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

Given that users might want things to be serialized in different manners, I've come to the feeling that it's the responsibility of serializers to have "some" consistent format, but it might not be the correct choice in every case. Other users might want enums to be serialized as an integer instead of a string. This is what cloudflare/gose does in their benchmark.

Instead, I've been thinking that we this might be better solved if we add in some attributes that help to customize the serialization format. What if we had something like:

#[derive(Serialize, Deserialize)]
#[serde(serialize_as_string)]
pub enum Foo {
    A, B, C
}

#[derive(Serialize, Deserialize)]
#[serde(serialize_as_usize)]
pub enum Bar {
    A, B, C
}

That at compile time would verify that all variants have no arguments, and have them generate code that serializes them as a &str or a usize? I'm open to suggestions for better names.

@jwilm
Copy link
Contributor

jwilm commented Feb 24, 2016

I've run into this same situation where enum values must be deserialized into objects. My workaround so far has been a two step deserialization. First, serde is used to deserialize the data into a StructRaw where the enum member is typed as String. The second step tries to convert StructRaw into Struct and replace the stringified enum members with the actual enum variant. Naturally, this isn't ideal, and what's worse, it's surprising. Enum variants without data are being serialized and deserialized as if they did.

The old serialization @erickt mentions that was removed sounds like the correct default - type driven without surprising behavior. An application developer describes what their data looks like, and the de/serializer can figure out what to do. The format sometimes dictates whether they are encoded as usize or strings. Other times, as with JSON, the format is indifferent, and at that point it would be nice to hint to the serializer what to do as in @erickt's examples (#[serde(serialize_as_string)]).

If serde did revert to the previous strategy, servo would need a way to keep its current behavior. It doesn't feel unreasonable (if you agree with my argument so far) to require an annotation to make that happen; for example, #[serde(enum_constant_shape)], or something to that effect could trigger the servo behavior.

@Yoric
Copy link
Contributor Author

Yoric commented Feb 24, 2016

I tend to agree with @jwilm that the least surprising is to serialize as string.

Also, my biggest issue is deserialization, which doesn't accept a raw string, despite it being much more logical.

@jwilm
Copy link
Contributor

jwilm commented Feb 24, 2016

Least surprising seems subjective based on target format. I'm just saying that transforming the layout of non tuple variants into objects is surprising regardless.

@jwilm
Copy link
Contributor

jwilm commented Feb 24, 2016

Here are some examples to clarify my stance on the various issues.

enum Foo {
    A, B, C
}

Serde itself should not be opinionated about this. The target format should have a sensible default, (eg usize or &str). If it's a format like JSON that could easily encode them as either, serde should support specification via annotation as @erickt described.

enum Bar {
    A,
    B(usize)
}

Expected behavior for this case is serializing Bar::A as "A", and Bar::B(5) as "{B: [5]}" assuming JSON. Generally, the latter should serialize as a tuple variant, and the former as as a &str or usize. If serde doesn't want to revert to this standard behavior, an opt-in attribute would be nice in addition to the blanket usize/&str attributes. Though, I'm still of the opinion that this should be the default, and the current behavior be opt-in.

@fuchsnj
Copy link

fuchsnj commented Feb 25, 2016

Just want to make a small clarification to Yoric's deserialization example

let a : Foo = self::serde_json::from_str("A").unwrap();

This should be expected to fail considering A is not valid JSON. Perhaps you meant this? (Extra quotation marks)

let a : Foo = self::serde_json::from_str("\"A\"").unwrap();

@fuchsnj
Copy link

fuchsnj commented Feb 25, 2016

We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

This reasoning is absurd! The performance problems of a completely different language for a single Rust project should not be complicating every other project written in Rust. I understand if we want to support this use case, but certainly do not make it the default!

@Yoric
Copy link
Contributor Author

Yoric commented Feb 25, 2016

@fuchsnj Thanks, I'll fix the example.

@Yoric
Copy link
Contributor Author

Yoric commented Feb 26, 2016

@erickt If you point me at the code, I can fix at least the deserialization part, to try and make sure that we also accept "A" and not just { "A": [] }. I believe that being more relaxed in deserialization won't hurt. Mostly, I need to know if it's a problem of serde or serde_json.

@oli-obk
Copy link
Member

oli-obk commented Feb 26, 2016

Just implement a branch for quotes here: https://github.com/serde-rs/json/blob/master/json/src/de.rs#L593

@softprops
Copy link
Contributor

sigh I just ported a bunch of rustc-serialize based code over to serde 0.7 and hit this ( its a good thing I had unit tests! ). Is there a way to work around this to get the older behavior. The empty array behavior seems like a very special case. The old behavior seems like the common case.

@softprops
Copy link
Contributor

Found my fix here. Dropping a for anyone else that needs the same solution.

@Yoric
Copy link
Contributor Author

Yoric commented Mar 2, 2016

@SimonSapin , what's Servo's take on this issue?

@SimonSapin
Copy link
Contributor

@Yoric I can’t speak for everyone on the Servo team on this. In particular:

It used to serialize this into "A" and {"B", [0]}. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

@erickt, do you remember who said that and in what context? I wasn’t even aware that we had any serde-serialized JSON parsed by JavaScript.

My personal opinion of this issue is that it doesn’t matter. In my use cases so far, either:

  • A particular bit of data serialized by serde is also parsed by serde using the Serialize and Deserialize traits on custom types, so the wire format is an implementation detail. Servo switched to the more efficient bincode instead of JSON for IPC.
  • Or I want to parse JSON data given by something that is not serde (and likely not Rust) in a particular format that I can’t change, or need to serialize data for something that expects it in a particular format. I’ve found it easier to use the serde_json::value::Value or rustc_serialize::json::Json enum than try to shoehorn this format into the Serialize and Deserialize traits.

@Yoric
Copy link
Contributor Author

Yoric commented Mar 6, 2016

I’ve found it easier to use the serde_json::value::Value or rustc_serialize::json::Json enum than try to shoehorn this format into the Serialize and Deserialize traits.

Well, having derived Serialize and Deserialize to experiment is extremely useful. Unfortunately, the second I need to step out of Serialize and Deserialize and into manually using serde_json::value::Value for just one of the 15 enums and 40 structs involved in this AST, I need to do it for all these enums and structs, which kind of kills the fun of it.

@erickt I'd go for either rolling back to the previous behavior or accepting a hint serialize_as_string. Note that we probably want it to work per-variant, too, e.g. in the following, only variantA can be meaningfully serialized as string:

enum Foo {
  #[serde(serialize_as_string)]
  A,
  B(usize),
  C{min: usize, max:usize}
}

@jwilm
Copy link
Contributor

jwilm commented Apr 2, 2016

@erickt any updated thoughts since all of the feedback?

I've started implementing Serialize by hand for some enum types just to work around this behavior.

@jimmycuadra
Copy link
Contributor

This issue is complicating things for me in a few different projects. I'd prefer the old default that everyone seems to be asking for, but it would also be a workable replacement if we could use #[serde(serialize_with=$path)] for enum variants in addition to struct fields.

@jimmycuadra
Copy link
Contributor

Does anyone have a workaround for this until the Serde team addresses it? Is the only choice to implement the relevant traits manually?

@jwilm
Copy link
Contributor

jwilm commented Apr 17, 2016

@jimmycuadra I've been manually implementing the conversion when needed. Here's an example: https://gist.github.com/jwilm/30b62746a8f9dcd910cd470e2836a7cc. You might want to implement the conversion to a &str with AsRef<str> instead of a custom function as_str as in the example.

@jwilm
Copy link
Contributor

jwilm commented May 13, 2016

Since we're doing this in several places, I ended up refactoring the above into a macro relying on a FromStr implementation for the same type. Here's the macro for anyone interested.

@TimNN
Copy link

TimNN commented May 19, 2016

I've just spend some time trying to figure out why my enum was not deserializing (I'm consuming and external json api).

I find the current behaviour especially surprising when trying to deserialize a C-like enum from a json string value.

@jwilm
Copy link
Contributor

jwilm commented May 24, 2016

Hoping to move this forward with some reflection on what's been discussed so far.

Users who have spoken up in this thread suggest that the intuitive behavior of serde in the case of enums with unit-like variants is to serialize them as a String/usize and not as a map. @erickt mentioned that Servo actually depends on the new behavior which treats them as a map and suggested introducing annotations to bring back the old behavior. However, @SimonSapin suggested this wasn't true of Servo's serde use.

The annotations would still be useful for hinting at the Serializer/Deserializer implementations for choosing usize vs string, but to require annotations by default has been shown to be counter-intuitive based on the activity in this thread.

It would be great to get some consensus on this decision from the serde team so that someone could make an appropriate PR to implement it. I noticed that some issues are starting to get marked with breakage, and it would be nice to get this resolved for the next breaking release (or earlier depending on the decision here).

As to not leave this too open-ended or hand wavey, here is a concrete proposal.

  1. Revert to the original unit-like variant serialization/deserialization behavior. That is, expect a String or usize instead of a map.
  2. Support annotations on enums to choose usize/string behavior for unit-like variants. For example, #[serde(unit_variant_as_usize)] and #[serde(unit_variant_as_string)] could hint at serializer/deserializer implementations to choose one over the other. This probably only matters for human consumable serialization formats such as yaml and json since binary formats would almost certainly want to use the more compact usize serialization.

@SimonSapin
Copy link
Contributor

Servo isn’t a single person. I wrote above about my experience, but I can’t speak for everyone else on the Servo team.

@jwilm
Copy link
Contributor

jwilm commented May 24, 2016

d8fb2ab is the commit where enum serialization was initially changed to {"variant": ["fields, ...]}. Unfortunately the message doesn't elaborate on the Servo question.

I also poked around in the Servo script component a bit, and the only references to serde::{Serialize, Deserialize} traits were for empty JSTrace impls on ipc channel types.

The aforementioned commit is dated Date: Mon Jun 9 07:51:53 2014 -0700; the next step is looking for serde usage around that time, and we can maybe find out who requested this and whether its still relied upon.

@jwilm
Copy link
Contributor

jwilm commented May 24, 2016

The earliest commit I can find in Servo referencing serde are from @pcwalton servo/servo@6eacb0c almost 1 year after this change was made in serde.

Aside from serializing display lists, it looks like Servo's serde usage is almost exclusively related to IPC, and it might not care about the actual serialization layout as long as it's symmetric for IPC sender/receiver pairs. The IPC components don't use JSON anymore either, it's all bincode.

None of Servo's usage I can find appears reliant on unit-like enum variant serialization change being discussed here, but that's just what I can see as an outside observer.

There's not a lot of evidence to suggest that this is a real need (or maybe I'm just missing it):

We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes

@Lindenk
Copy link

Lindenk commented May 24, 2016

So, at this point, have we reached a consensus to restore the old behavior, and if not, who else should be in agreement before someone can submit a PR?

@alexbool
Copy link

alexbool commented Jun 5, 2016

+1 for restoring the old behavior

@dtolnay
Copy link
Member

dtolnay commented Jun 5, 2016

+1 from me. I don't see any compelling argument for {"A": []}.

The behavior from serde_codegen is correct though - it calls serialize_unit_variant. We need to fix serde_json (and serde_yaml and possibly others) to use the simpler format for unit variants. Let's resurrect serde-rs/json#37 and just make sure it is able to deserialize both formats.

@Lindenk
Copy link

Lindenk commented Jun 5, 2016

Well, would this PR solve the issue for serde_yaml then: dtolnay/serde-yaml#16?

@dtolnay
Copy link
Member

dtolnay commented Jun 5, 2016

Yes - although above all, I would like to keep serde_yaml consistent with serde_json because JSON is much more mainstream, so I would wait on serde_yaml until there is a solution in serde_json.

homu added a commit to serde-rs/json that referenced this issue Jul 3, 2016
Serialize unit variants as name only

Ref serde-rs/serde#251. This is a rebase of #37 with more tests to ensure backward-compatibility with the current format.
@dtolnay
Copy link
Member

dtolnay commented Jul 28, 2016

This has been fixed in serde_json 0.8.0 and serde_yaml 0.4.0.

@dtolnay dtolnay closed this as completed Jul 28, 2016
@jwilm
Copy link
Contributor

jwilm commented Aug 16, 2016

Thanks @dtolnay!

@dtolnay dtolnay self-assigned this Apr 9, 2017
rubdos pushed a commit to rubdos/serde that referenced this issue Jun 20, 2017
macros 1.1 is now stable

Fixes serde-rs#251. Also includes a patch to get the tests working again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests