Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture other unrecognized fields #941

Closed
dtolnay opened this issue May 21, 2017 · 31 comments
Closed

Capture other unrecognized fields #941

dtolnay opened this issue May 21, 2017 · 31 comments

Comments

@dtolnay
Copy link
Member

dtolnay commented May 21, 2017

#[derive(Serialize, Deserialize)]
struct S {
    a: u32,
    b: String,
    #[serde(other)]
    other: Map<String, Value>,
}
{"a":0,"b":"","c":true}

This would deserialize into other containing "c" => true.

@TedDriggs
Copy link
Contributor

I like this idea and might be able to submit a PR for it, but I have a few questions:

  1. What are the type constraints on other? I'm guessing it works with any type that supports entries (so that it can report an error on duplicate fields and insert non-duplicate values).
  2. Does the K type parameter of the map have to be String, or can it also be &'de str?
  3. Does the V type parameter of the map have to be Value, or can it be any type? The latter seems more powerful, and I think it would work as long as local type inference can figure out what type to deserialize into.
  4. Where is serde(other) valid?
    1. Named-field structs
    2. Named-field enum variants
    3. Anywhere else?
  5. Should using serde(other) with serde(deny_unknown_fields) on a struct produce a warning?
  6. What other attributes are valid with serde(other)?
    1. skip, skip_serializing, skip_serializing_if, and skip_deserializing
    2. rename_all
    3. borrow? (This is a new feature I'm not familiar with yet)
  7. During serialization, should an error be generated if a key occurs on a named field and in other? Is name collision allowed if the named field's skip_serializing_if resolves to true?

@dtolnay
Copy link
Member Author

dtolnay commented May 31, 2017

  1. It should support any type that implements visit_map (for deserialization) and uses serialize_map (for serialization). The handling of duplicate fields is baked into the Deserialize impl so doesn't need to be handled as part of this feature.

  2. This would be handled by the Deserialize impl in visit_map.

  3. This would be handled by the Deserialize impl in visit_map.

  4. Just those two I think.

  5. Yes, good idea.

  6. For now I would just reject any other attributes. We can follow up later if anybody makes a strong case for a particular one.

  7. Serde allows duplicate keys during serialization.

@clarfonthey
Copy link
Contributor

I personally would prefer naming this something more verbose like other_fields. The extra typing will make the attribute more understandable in the long run, and right now, I wouldn't know what other means without having it explained.

@TedDriggs
Copy link
Contributor

@dtolnay Right now, I only see errors in serde_derive_internals::Ctxt; is there a way to surface warnings?

@dtolnay
Copy link
Member Author

dtolnay commented Jun 1, 2017

No there is not. I guess make it an error.

@TedDriggs
Copy link
Contributor

This is starting to feel suspiciously similar to #119, in that it will allow reading and writing of keys from a nested object into a flat namespace in the serialized object. If we had #[serde(flatten)], I feel like we could allow that attribute on a generic map type and avoid needing this separate attribute at all.

Assuming this is still desirable, I'm starting to wonder if this should be a container-level attribute rather than a field-level attribute.

#[derive(Serialize, Deserialize)]
#[serde(unknown_fields_in="other")]
pub struct Sample {
    hello: String,
    other: HashMap<String, Value>
}

The user experience is similar: The duplicated identifier is unfortunate, but semantically I think it's a little clearer. From an implementation perspective, a container-level approach would be marginally simpler:

  1. Two fields can't both claim to be the recipient of unknown fields
  2. Validation errors about deny_unknown_fields and unknown_fields_in are easier to surface and explain
  3. Code generation in serde_derive::de::deserialize_map doesn't have to scan the fields to determine how to handle unknown values.

@daschl
Copy link

daschl commented Sep 12, 2017

I'm interested in developing this feature, but I haven't dug into the serde codebase at all up until this point. Is it a "beginner" issue? If so and with a little guidance I'd like to take this one up.

@dtolnay
Copy link
Member Author

dtolnay commented Sep 12, 2017

Yes, go for it!

@daschl
Copy link

daschl commented Sep 12, 2017

@dtolnay can you give me some pointers - like if the field or container level attribute is preferred

@dtolnay
Copy link
Member Author

dtolnay commented Sep 12, 2017

I would recommend handwriting some Deserialize impls for this and testing them against some JSON inputs to make sure they capture unrecognized fields the way you expected. I would recommend doing this before diving into attributes and proc macro stuff.

@daschl
Copy link

daschl commented Sep 12, 2017

@dtolnay okay thanks I'll see how far I get with this and if I make progress I'll share it here.

@mitsuhiko
Copy link
Contributor

I added a somewhat working PR for this now: #1178

I went with @TedDriggs's suggestion of a container attribute. However I do think that flattening and this feature cannot be merged for now. The reason for this is that in order to serialize unknown fields one needs to go through the map serialization system because of lifetime issues. As a result of that I decided to add a special mode to structs where structs are serialized into maps all the time and if that mode is on, deserialization into an Extends contained is possible.

For JSON it does not make a difference currently as the only thing you lose from this is the type name on the way to the serializer but JSON never used this anyways.

@Marwes
Copy link
Contributor

Marwes commented Mar 14, 2018

I think this issue is subset of #119 ? Probably worth considering forwards compatibility with solution for that issue.

@mitsuhiko
Copy link
Contributor

@Marwes i do not believe that with the current traits flattening can be combined with capturing leftover data for the reasons outlined above. At least the repr=map would have to stick around.

@Marwes
Copy link
Contributor

Marwes commented Mar 14, 2018

@mitsuhiko I were thinking about that but it might be possible to have a other/flatten/collect_into field in a struct if that is a struct itself (but not a map).

// OK
#[serde(unknown_fields_into="other")]
struct S {
   field: String,
    
   other: T,
}

struct T {
   field1: i32,
   field2: i32,
}

// ERROR
#[serde(unknown_fields_into="other")]
struct S {
   field: String,
    
   other: HashMap<String, i32>,
}

Failing that, the traits could be extended with alternative methods without the 'static lifetimes. These methods would only be called if the collect_into attribute is specified. To preserve backwards compatibility these added methods would need to return an error (or panic) in their default implementation and users would need to make sure that they use a format which has been updated to implement these methods.

@mitsuhiko
Copy link
Contributor

@Marwes i don't think forwarding unknown fields into anything other than a container that can eat anything makes sense. That would be a full overlap to a potential future flatten feature.

@dtolnay
Copy link
Member Author

dtolnay commented Mar 14, 2018

I would prefer to combine this with #119.

Sketch of a solution

@mitsuhiko
Copy link
Contributor

@dtolnay would all flatten downgrade into map serialization or only if flatten is used on a non struct?

@dtolnay
Copy link
Member Author

dtolnay commented Mar 14, 2018

Flatten would always downgrade your struct into a map. In practice people only care about JSON anyway.

@mitsuhiko
Copy link
Contributor

@dtolnay doesn't your solution require that data in inner can always be converted from the data in other?

@dtolnay
Copy link
Member Author

dtolnay commented Mar 14, 2018

Yes. I think that should be okay because in the case where there is more than one flatten field it will pretty much always be 0 or more structs/enums within which we ignore unrecognized fields by default, followed by 0 or 1 final catch-all map. The DeserializeMapAdapter will need to remove entries from the input vec that are deserialized at each step, and keep only the ones that are deserialized with deserialize_ignored_any. That way the final catch-all only sees fields that have not been claimed by any previous flattened field.

@mitsuhiko
Copy link
Contributor

That means though if other is u32 it will no longer work or am I misunderstanding this?

@dtolnay
Copy link
Member Author

dtolnay commented Mar 14, 2018

Like this?

#[derive(Serialize, Deserialize)]
struct S {
    s: String,
    #[serde(flatten)]
    other: u32,
}

I would expect that to not work.

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Mar 14, 2018 via email

@dtolnay
Copy link
Member Author

dtolnay commented Mar 14, 2018

HashMap<String, u32> will work just like HashMap<String, Value> -- its existing Deserialize impl will take care of deserializing from the input deserializer (DeserializeMapAdapter) and coming up with error messages as necessary if things are the wrong type.

@mitsuhiko
Copy link
Contributor

I see what I missed: that your sketch deserializes into the internal Content type. I somehow read the type inference as deserializing to serde json's Value. I will play around with this a bit. Not a huge fan that this means you potentially buffer up a huge amount of data because obviously this also means that efficiently ignoring items no longer works.

This was referenced Mar 15, 2018
@dtolnay
Copy link
Member Author

dtolnay commented Mar 22, 2018

Implemented in #1179, released in Serde 1.0.34.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::collections::BTreeMap as Map;
use serde_json::Value;

#[derive(Serialize, Deserialize)]
struct S {
    a: u32,
    b: String,
    #[serde(flatten)]
    other: Map<String, Value>,
}

fn main() {
    let s = S {
        a: 0,
        b: "".to_owned(),
        other: {
            let mut map = Map::new();
            map.insert("c".to_owned(), Value::Bool(true));
            map
        },
    };

    println!("{}", serde_json::to_string(&s).unwrap());
}

@nixpulvis
Copy link

Slightly simpler, but same idea working well for me as a debug with no other use or other statements required 😉

    #[serde(flatten)]
    other: serde_json::Value,

On a tangent; this confirmed for me the way that serde parsing consumes as it flattens. I'm kinda curious if there's a way to control that somewhat. For example I have another place with two structs I would like nested, who each both get a copy of the parent id, for example. If the parser consumes the id for the first flatten structure then it can't give it to the second.

bors added a commit to rust-lang/cargo that referenced this issue Mar 1, 2023
fix(toml): Provide a way to show unused manifest keys for dependencies

Dependencies have not been able to show unused manifest keys for some time, this problem partially resulted in #11329.

This problem is caused by having an `enum` when deserializing.  To get around this you can use:
```rust
#[serde(flatten)]
other: BTreeMap<String, toml::Value>,
```
This collects any unused keys into `other` that can later be used to show warnings. This idea was suggested in a thread I cannot find but is mentioned in [serde#941](serde-rs/serde#941).
@danielo515
Copy link

Is there any other way to achieve this without using serde_json?

@Mingun
Copy link
Contributor

Mingun commented Jun 18, 2023

If you only concerned about serde_json, then use serde_value.

@danielo515
Copy link

If you only concerned about serde_json, then use serde_value.

Yes, because I'm not currently using JSON for anything, it was more s debugging tool, so serde value will probably suffice if it allows me to inspect the captured values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

9 participants