Rust high performance xml reader and writer

Overview

quick-xml

High performance xml pull reader/writer.

The reader:

  • is almost zero-copy (uses Cow whenever possible)
  • is easy on memory allocation (the API provides a way to reuse buffers)
  • supports various encodings (with the encoding feature), namespace resolution, and special characters.
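
The "almost zero-copy" point can be illustrated without quick-xml itself. The following is a minimal, standard-library-only sketch of the Cow idea; `unescape_amp` is a hypothetical helper introduced here for illustration and is not part of the quick-xml API. It borrows the input when nothing needs rewriting and allocates only when an `&amp;` has to be replaced.

```rust
use std::borrow::Cow;

/// Hypothetical helper (NOT a quick-xml function): replaces `&amp;` with `&`.
/// Returns a borrowed `Cow` when the input contains nothing to unescape.
fn unescape_amp(input: &str) -> Cow<'_, str> {
    if input.contains("&amp;") {
        // An allocation is only paid for when the text actually changes.
        Cow::Owned(input.replace("&amp;", "&"))
    } else {
        // No escape sequence: hand back the original slice untouched.
        Cow::Borrowed(input)
    }
}

fn main() {
    let plain = unescape_amp("Test 2");
    let escaped = unescape_amp("a &amp; b");
    assert!(matches!(plain, Cow::Borrowed(_)));
    assert!(matches!(escaped, Cow::Owned(_)));
    println!("{} / {}", plain, escaped);
}
```

Because the result may borrow from its input, a caller has to keep the input alive while using it, which is the same constraint the reader's buffer-based API imposes.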

docs.rs

Syntax is inspired by xml-rs.

Example

Reader

use quick_xml::Reader;
use quick_xml::events::Event;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>
                    Test 2
                </tag2>
            </tag1>"#;

let mut reader = Reader::from_str(xml);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) => {
            match e.name() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value).collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        },
        Ok(Event::Text(e)) => txt.push(e.unescape_and_decode(&reader).unwrap()),
        Ok(Event::Eof) => break, // exits the loop when reaching end of file
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        _ => (), // There are several other `Event`s we do not consider here
    }

    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}

Writer

use quick_xml::Writer;
use quick_xml::Reader;
use quick_xml::events::{Event, BytesEnd, BytesStart};
use std::io::Cursor;
use std::iter;

let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
let mut buf = Vec::new();
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) if e.name() == b"this_tag" => {

            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::owned(b"my_elem".to_vec(), "my_elem".len());

            // collect existing attributes
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // add a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // writes the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(ref e)) if e.name() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::borrowed(b"my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
        // you can use either `e` or `&e` if you don't want to move the event
        Ok(e) => assert!(writer.write_event(&e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
    }
    buf.clear();
}

let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());

Serde

When using the serialize feature, quick-xml can be used with serde's Serialize/Deserialize traits.

Here is an example deserializing crates.io source:

// Cargo.toml
// [dependencies]
// serde = { version = "1.0", features = [ "derive" ] }
// quick-xml = { version = "0.21", features = [ "serialize" ] }
extern crate serde;
extern crate quick_xml;

use serde::Deserialize;
use quick_xml::de::{from_str, DeError};

#[derive(Debug, Deserialize, PartialEq)]
struct Link {
    rel: String,
    href: String,
    sizes: Option<String>,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
    En,
    Fr,
    De,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Head {
    title: String,
    #[serde(rename = "link", default)]
    links: Vec<Link>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Script {
    src: String,
    integrity: String,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Body {
    #[serde(rename = "script", default)]
    scripts: Vec<Script>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Html {
    lang: Option<String>,
    head: Head,
    body: Body,
}

fn crates_io() -> Result<Html, DeError> {
    let xml = "<!DOCTYPE html>
        <html lang=\"en\">
          <head>
            <meta charset=\"utf-8\">
            <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
            <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">

            <title>crates.io: Rust Package Registry</title>


        <!-- EMBER_CLI_FASTBOOT_TITLE --><!-- EMBER_CLI_FASTBOOT_HEAD -->
        <link rel=\"manifest\" href=\"/manifest.webmanifest\">
        <link rel=\"apple-touch-icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" sizes=\"227x227\">

            <link rel=\"stylesheet\" href=\"/assets/vendor-8d023d47762d5431764f589a6012123e.css\" integrity=\"sha256-EoB7fsYkdS7BZba47+C/9D7yxwPZojsE4pO7RIuUXdE= sha512-/SzGQGR0yj5AG6YPehZB3b6MjpnuNCTOGREQTStETobVRrpYPZKneJwcL/14B8ufcvobJGFDvnTKdcDDxbh6/A==\" >
            <link rel=\"stylesheet\" href=\"/assets/cargo-cedb8082b232ce89dd449d869fb54b98.css\" integrity=\"sha256-S9K9jZr6nSyYicYad3JdiTKrvsstXZrvYqmLUX9i3tc= sha512-CDGjy3xeyiqBgUMa+GelihW394pqAARXwsU+HIiOotlnp1sLBVgO6v2ZszL0arwKU8CpvL9wHyLYBIdfX92YbQ==\" >


            <link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\">
            <link rel=\"icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" type=\"image/png\">
            <link rel=\"search\" href=\"/opensearch.xml\" type=\"application/opensearchdescription+xml\" title=\"Cargo\">
          </head>
          <body>
            <!-- EMBER_CLI_FASTBOOT_BODY -->
            <noscript>
                <div id=\"main\">
                    <div class='noscript'>
                        This site requires JavaScript to be enabled.
                    </div>
                </div>
            </noscript>

            <script src=\"/assets/vendor-bfe89101b20262535de5a5ccdc276965.js\" integrity=\"sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==\" ></script>
            <script src=\"/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js\" integrity=\"sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==\" ></script>

          </body>
        </html>
";
    let html: Html = from_str(xml)?;
    assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
    Ok(html)
}

Credits

This has largely been inspired by serde-xml-rs. quick-xml follows its convention for deserialization, including the $value special name.

Parsing the "value" of a tag

If you have an input of the form <foo abc="xyz">bar</foo>, and you want to get at the bar, you can use the special name $value:

#[derive(Debug, Deserialize, PartialEq)]
struct Foo {
    pub abc: String,
    #[serde(rename = "$value")]
    pub body: String,
}

Performance

Note that despite not focusing on performance (there are several unnecessary copies), it remains about 10x faster than serde-xml-rs.

Features

  • encoding: support non-UTF-8 XML documents
  • serialize: support serde Serialize/Deserialize

Performance

Benchmarking is hard and the results depend on your input file and your machine.

On my particular file, quick-xml is around 50 times faster than the xml-rs crate.

// quick-xml benches
test bench_quick_xml            ... bench:     198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped    ... bench:     282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench:     389,977 ns/iter (+/- 32,045)

// same bench with xml-rs
test bench_xml_rs               ... bench:  14,468,930 ns/iter (+/- 321,171)

// serde-xml-rs vs serialize feature
test bench_serde_quick_xml      ... bench:   1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs         ... bench:  15,039,564 ns/iter (+/- 783,485)
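
To put the figures above in perspective, the ratios can be computed directly. This is a small arithmetic sketch over the numbers copied from the output above; `speedup` is a helper introduced here for illustration only.

```rust
/// Ratio of two benchmark timings in ns/iter; a value above 1.0 means
/// the second argument is the faster one.
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

fn main() {
    // Timings copied from the benchmark output above (ns/iter).
    let xml_rs = 14_468_930;
    let serde_xml_rs = 15_039_564;

    println!("plain:      {:.1}x", speedup(xml_rs, 198_866));
    println!("escaped:    {:.1}x", speedup(xml_rs, 282_740));
    println!("namespaced: {:.1}x", speedup(xml_rs, 389_977));
    println!("serde:      {:.1}x", speedup(serde_xml_rs, 1_181_198));
}
```

On these figures the plain, escaped, and namespaced quick-xml runs come out roughly 73x, 51x, and 37x faster than xml-rs, and the serde comparison roughly 13x.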

For a feature and performance comparison, you can also have a look at RazrFalcon's parser comparison table.

Contribute

Any PR is welcome!

License

MIT

Comments
  • Split `Reader` into `SliceReader` and `BufferedReader`

    This PR was split from #417.

    This splits Reader into two new structs, SliceReader and IoReader to better separate which kind of byte source the Reader uses to read bytes. Changes are based on https://github.com/tafia/quick-xml/pull/417#issuecomment-1181318331. A Reader<SliceReader> also explicitly doesn't have methods for buffered access anymore.

    opened by 999eagle 25
  • Losing attributes during serialization

    Hello,

    I have an XML with some elements that have attributes. It deserializes into a struct ok, but when I serialize to output, the attributes are no longer attributes, but rather individual elements. How do I correctly indicate that attributes need to stay as attributes and nothing else?

    For example, I have the following element with attributes:

    <Representation audioSamplingRate="48000" bandwidth="63700" codecs="mp4a.40.2" id="519678732693145249_AO-00-00-00-71">
    

    When I read the element into the struct:

    #[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
    #[serde(rename_all = "camelCase")]
    struct Representation{
        #[serde(rename = "audioSamplingRate")]
        a_audio_sampling_rate: Option<String>,
        #[serde(rename = "bandwidth")]
        a_bandwidth: String,
        #[serde(rename = "codecs")]
        a_codecs: String,
        #[serde(rename = "id")]
        a_id: String,
    }
    

    It looks and feels right. However, during the output, the serialization step, I get the output that looks like this:

    <Representation>
       <audioSamplingRate>48000</audioSamplingRate>
       <bandwidth>63700</bandwidth>
       <codecs>mp4a.40.2</codecs>
       <id>519678732693145249_AO-00-00-00-71</id>
    </Representation>
    

    What am I not doing right? Or, what am I missing in this process? I'm reading the element indicated above, as a string. Also, the output is a string as well.

    Thank you, Max

    enhancement serde 
    opened by mlevkov 19
  • Allow using tokio's AsyncBufRead

    I would like to use quick-xml together with tokio's AsyncRead, as I am fetching an object from S3 with rusoto_s3 and that returns a body that implements AsyncRead. This is an attempt at making that happen.

    opened by endor 18
  • Allow using tokio's AsyncBufRead [Rebased]

    This is a rebase of this PR https://github.com/tafia/quick-xml/pull/233 on top of the current master.

    I don't know if there was a better way to do this and my intention is not to step on anyone's work but rather keep it moving along. I'm happy to take this PR down if you want to update yours @endor or honestly whatever works for people. This PR is a proper rebase of endor/async on top of tafia/master, so line-by-line credit is still intact except for the few places where I had to fix rebase conflicts.

    @endor, @tafia happy to help with whatever y'all need help with here, lemme know!

    opened by itsgreggreg 17
  • Rewrite serializer

    This PR rewrites serializer and fixes the following issues:

    • Fixes #252
    • Fixes #280
    • Fixes #287
    • Fixes #343
    • Fixes #346
    • Invalidates #354 (but it still should be checked that it works aligned with the serializer)
    • Fixes #361
    • Partially addresses #368
    • Fixes #429
    • Fixes #430

    Notable changes:

    • Removed the $unflatten= prefix, as it is not used anymore
    • Removed the $primitive= prefix, as it is not used anymore
    • The $value special name is split into two special names, #text and ~#any~ #content. @dralley, please check especially: is the documentation clear about the differences?
    • Changed how attributes are defined. An attribute field must now have a name starting with the @ character. Fields without it are serialized as elements and can be deserialized only from elements. Deserializing from either an element or an attribute into one field is not possible anymore. It's a weird wish anyway
    bug enhancement serde arrays 
    opened by Mingun 16
  • Getting "invalid type: map, expected a sequence" when deserialising

    Hello. First of all, thanks for all the great work on this crate!

    I'm trying to use the serde implementation to deserialise some XML that looks like this:

    <Attribute Name="Example">
        <Array>
            <DataObject ObjectType="TestObject">A</DataObject>
            <DataObject ObjectType="TestObject">B</DataObject>
            <DataObject ObjectType="TestObject">C</DataObject>
        </Array>
    </Attribute>
    

    I would like to read this as:

    struct Example {
        value: Vec<TestObject>,
    }
    

    Reading the DataObject as a TestObject is easy since all DataObjects in this context will be TestObjects. The Example part was harder, but I got it working using an enum and serde's tag. However, I can't get the Vec to read correctly. I'm getting the error Custom("invalid type: map, expected a sequence").

    To make sure there wasn't anything more complex causing a problem, I also created an Array object which only contains the Vec, but still get the same error.

    If I remove the parent Example object and just try to deserialise the Array, it works as expected.

    I also tried serialising the data structure I want, and I get exactly the output I expect, but deserialising this output fails with the error. This is strange since (as far as I can tell) my serialisation logic is identical to my deserialisation logic.

    I also have some code here that reproduces this last example:

    Example
    use std::{error::Error, fmt::Display};
    
    use serde::{Serialize, Deserialize};
    use serde_with::serde_as;
    
    #[derive(Serialize, Deserialize, Debug)]
    struct Array<T> {
        #[serde(rename = "DataObject")]
        items: Vec<T>,
    }
    
    #[derive(Serialize, Deserialize, Debug, Default)]
    struct DataObject {
        #[serde(rename = "ObjectType")]
        object_type: String,
    
        #[serde(rename = "Attribute", default)]
        attributes: Vec<Attribute>,
    
        #[serde(rename = "$value")]
        data: String,
    }
    
    impl From<TestObject> for DataObject {
        fn from(value: TestObject) -> Self {
            Self { object_type: stringify!(TestObject).into(), attributes: Vec::new(), data: value.data }
        }
    }
    
    #[derive(Debug)]
    struct BadObjectTypeError {
        expected_object_type: String,
        actual_object_type: String,
    }
    
    impl Display for BadObjectTypeError {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            write!(f, "Bad object type. Expected {}, found {}", self.expected_object_type, self.actual_object_type)
        }
    }
    
    impl Error for BadObjectTypeError {}
    
    impl TryFrom<DataObject> for TestObject {
        type Error = BadObjectTypeError;
    
        fn try_from(value: DataObject) -> Result<Self, Self::Error> {
            if "TestObject" == value.object_type {
                Ok(Self {
                    data: value.data,
                })
            } else {
                Err(BadObjectTypeError {
                    expected_object_type: std::any::type_name::<Self>().into(),
                    actual_object_type: value.object_type,
                })
            }
        }
    }
    
    #[derive(Serialize, Clone, Deserialize, Debug)]
    #[serde(try_from = "DataObject", into = "DataObject")]
    struct TestObject {
        #[serde(rename = "$value")]
        data: String,
    }
    
    #[serde_as]
    #[derive(Serialize, Deserialize, Debug)]
    #[serde(tag = "Name", rename_all = "SCREAMING_SNAKE_CASE")]
    enum Attribute {
        Example {
            #[serde(rename = "Array")]
            value: Array<TestObject>
        }
    }
    
    fn main() {
        let attribute = Attribute::Example { value: Array { items: vec![TestObject { data: "A".into() }, TestObject { data: "B".into() }, TestObject { data: "C".into() }] } };
        let xml = quick_xml::se::to_string(&attribute).unwrap();
        println!("{}", &xml);
    
        let xd = &mut quick_xml::de::Deserializer::from_str(&xml);
        let import: Attribute = serde_path_to_error::deserialize(xd).unwrap();
        dbg!(&import);
    }
    

    Would really appreciate any guidance with this!

    wontfix serde arrays 
    opened by davystrong 16
  • Selectively deserializing XML elements with serde

    Hi!

    I want to parse a 3MB XML file with serde. I used serde-xml-rs, and it is painfully slow; I hacked my way through serde-xml-rs's code to make it work with quick-xml instead (the APIs are very similar after all), and that sped it up tremendously (from 1.5s to 0.1s).

    But! I don't want to deserialize an entire 3MB XML into a giant struct (which has loads of small heap-allocated vecs inside it), when all I want is to scan for a specific element inside it, and deserialize just that one.

    I thought of using quick-xml's events to reach the element I want, then read_to_end() to get the whole element as a big text block, and then use serde-xml-rs to parse the text block as xml; except this approach loses all namespacing/encoding info.

    I also thought of implementing some sort of from_element(xmlreader, start_element), which would give a partial Deserializer object, which is my current favorite option.

    Thoughts? Any better ways to do this?

    opened by andreivasiliu 16
  • Add support for empty elements

    Empty elements are represented as the combination of a start element with an end element. This means that reading a file with empty elements:

      <some_tag attr='1'/>
    

    would result in an empty start element, followed by an end element:

      <some_tag attr='1'></some_tag>
    

    This merge request changes that behaviour so that a read/write roundtrip results in an identical file.
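
A minimal sketch of the difference, using a simplified stand-in for the event type rather than quick-xml's real `Event` (the enum and `write_events` below are assumptions made for illustration): with a dedicated empty-element variant, writing back what was read reproduces the self-closing form.

```rust
/// Simplified stand-in for an XML event type (illustration only).
#[derive(Debug)]
enum Event<'a> {
    /// `<tag attrs>`: holds the tag name plus its attributes.
    Start(&'a str),
    /// `</name>`: holds the tag name only.
    End(&'a str),
    /// `<tag attrs/>`: a self-closing element.
    Empty(&'a str),
}

/// Serializes events back to text, one tag per event.
fn write_events(events: &[Event<'_>]) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Start(tag) => {
                out.push('<');
                out.push_str(tag);
                out.push('>');
            }
            Event::End(name) => {
                out.push_str("</");
                out.push_str(name);
                out.push('>');
            }
            Event::Empty(tag) => {
                out.push('<');
                out.push_str(tag);
                out.push_str("/>");
            }
        }
    }
    out
}

fn main() {
    // Without an `Empty` event the reader can only report Start + End.
    let expanded = [Event::Start("some_tag attr='1'"), Event::End("some_tag")];
    // With an `Empty` event the original form survives the roundtrip.
    let empty = [Event::Empty("some_tag attr='1'")];
    println!("{}", write_events(&expanded));
    println!("{}", write_events(&empty));
}
```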

    enhancement 
    opened by tmoers 16
  • Add Vec-less reading of events and borrowing deserialization

    Add a new reader.read_event_unbuffered() method that does not require a user-given Vec buffer, and borrows from the input, implemented if the input is a &[u8].

    Still needs more polishing, and it needs the buf_position branch to be pushed first, since it is based on those changes. I had to move all of the input-reading methods into a trait, and make them return a reference to the text that was read. Because of that, there's now a new requirement of at most 1 input-reading method being called per read_event(), so I had to rework whitespace skipping, and to move all of the bang element processing into yet another read_until-like function which doesn't return until it has all of the text.

    Next up is making a deserializer that can use this to remove the DeserializeOwned restriction and allow user structs to borrow from the input when possible, allowing for truly zero-copy parsing and deserialization.

    The only user-facing change is the 1 new method, the rest is completely hidden. I'm not fond of the new method's name, so I'd appreciate any help with figuring out a better name for it.

    Bikeshedding for the rest of the names would be appreciated too.

    opened by andreivasiliu 13
  • DeserializeOwned prevents deserialization into structs with lifetime bounds

    First of all, thanks for this great crate. It has been serving me very well and I am a big fan.

    I am trying to deserialize a struct I just serialized, just a basic round-trip. The struct, to be specific, is this one. You'll notice the lifetimes and Cow<'a, str>s in the nested structs. Serialization works fine, but deserialization does not.

    implementation of `_IMPL_DESERIALIZE_FOR_EdiConvertToRequest::_serde::Deserialize` is not general enough
       --> src/main.rs:102:28
        |
    102 |           DataFormat::Xml => quick_xml::de::from_str(&req.data)?,
        |                              ^^^^^^^^^^^^^^^^^^^^^^^ implementation of `_IMPL_DESERIALIZE_FOR_EdiConvertToRequest::_serde::Deserialize` is not general enough
        | 
       ::: /home/alex/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.104/src/de/mod.rs:531:1
        |
    531 | / pub trait Deserialize<'de>: Sized {
    532 | |     /// Deserialize this value from the given Serde deserializer.
    533 | |     ///
    534 | |     /// See the [Implementing `Deserialize`][impl-deserialize] section of the
    ...   |
    569 | |     }
    570 | | }
        | |_- trait `_IMPL_DESERIALIZE_FOR_EdiConvertToRequest::_serde::Deserialize` defined here
        |
        = note: `edi::edi_document::EdiDocument<'_, '_>` must implement `_IMPL_DESERIALIZE_FOR_EdiConvertToRequest::_serde::Deserialize<'0>`, for any lifetime `'0`...
        = note: ...but `edi::edi_document::EdiDocument<'_, '_>` actually implements `_IMPL_DESERIALIZE_FOR_EdiConvertToRequest::_serde::Deserialize<'1>`, for some specific lifetime `'1`
    
    

    Following the guidance on this SO post, I think changing this crate's Deserialize definition to one like serde_json's would work. Note that I am able to successfully deserialize this struct back from JSON without issue, it is just the XML that isn't working.

    I report this as an issue because as far as I can tell, there is no workaround besides forking the crate that defines the struct and making everything it contains owned.

    enhancement serde optimization 
    opened by sezna 13
  • panic on read_namespaced_event with different buffers

    Environment: Debian 9 amd64
    Reproducibility: Always
    Version: quick_xml 0.12.1. Also known to be reproducible with 0.11.0.
    Steps to reproduce: compile and run this:

    extern crate quick_xml; // version "0.11.0" or "0.12.1".
    
    fn main()
    {
        let xml = r#"<?xml version='1.0'?><a:a xmlns:a='http://example.org/something' xmlns='b:c'><a:d><hello xmlns='x:y:z'><world>earth</world></hello></a:d>"#;
        let mut parser = quick_xml::Reader::from_str(xml); // for quick_xml 0.11.0, use "quick_xml::reader::Reader".
        let mut buf = Vec::new();
        let mut buf_ns = Vec::new();
        for _ in 0..4 {
            let _ = parser.read_namespaced_event(&mut buf, &mut buf_ns);
        }
        let mut buf = Vec::new();
        let mut buf_ns = Vec::new();
        for _ in 0..4 {
            let _ = parser.read_namespaced_event(&mut buf, &mut buf_ns); // <-- panic
        }
    }
    

    Actual result: it panics:

    $ RUST_BACKTRACE=1 cargo run
        Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
         Running `/home/willem/e/crash-quick-xml/target/debug/crash-quick-xml`
    thread 'main' panicked at 'index 29 out of range for slice of length 0', libcore/slice/mod.rs:785:5
    stack backtrace:
       0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
                 at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
       1: std::sys_common::backtrace::_print
                 at libstd/sys_common/backtrace.rs:71
       2: std::panicking::default_hook::{{closure}}
                 at libstd/sys_common/backtrace.rs:59
                 at libstd/panicking.rs:380
       3: std::panicking::default_hook
                 at libstd/panicking.rs:396
       4: std::panicking::rust_panic_with_hook
                 at libstd/panicking.rs:576
       5: std::panicking::begin_panic
                 at libstd/panicking.rs:537
       6: std::panicking::begin_panic_fmt
                 at libstd/panicking.rs:521
       7: rust_begin_unwind
                 at libstd/panicking.rs:497
       8: core::panicking::panic_fmt
                 at libcore/panicking.rs:71
       9: core::slice::slice_index_len_fail
                 at libcore/slice/mod.rs:785
      10: <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index
                 at /checkout/src/libcore/slice/mod.rs:916
      11: core::slice::<impl core::ops::index::Index<I> for [T]>::index
                 at /checkout/src/libcore/slice/mod.rs:767
      12: quick_xml::reader::Namespace::prefix
                 at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:855
      13: quick_xml::reader::NamespaceBufferIndex::find_namespace_value::{{closure}}
                 at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:902
      14: core::iter::traits::DoubleEndedIterator::rfind::{{closure}}
                 at /checkout/src/libcore/iter/traits.rs:580
      15: <core::slice::Iter<'a, T> as core::iter::traits::DoubleEndedIterator>::try_rfold
                 at /checkout/src/libcore/slice/mod.rs:1319
      16: core::iter::traits::DoubleEndedIterator::rfind
                 at /checkout/src/libcore/iter/traits.rs:579
      17: <core::iter::Rev<I> as core::iter::iterator::Iterator>::find
                 at /checkout/src/libcore/iter/mod.rs:459
      18: quick_xml::reader::NamespaceBufferIndex::find_namespace_value
                 at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:899
      19: <quick_xml::reader::Reader<B>>::read_namespaced_event
                 at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:566
      20: crash_quick_xml::main
                 at src/main.rs:15
      21: std::rt::lang_start::{{closure}}
                 at /checkout/src/libstd/rt.rs:74
      22: std::panicking::try::do_call
                 at libstd/rt.rs:59
                 at libstd/panicking.rs:479
      23: __rust_maybe_catch_panic
                 at libpanic_unwind/lib.rs:102
      24: std::rt::lang_start_internal
                 at libstd/panicking.rs:458
                 at libstd/panic.rs:358
                 at libstd/rt.rs:58
      25: std::rt::lang_start
                 at /checkout/src/libstd/rt.rs:74
      26: main
      27: __libc_start_main
      28: _start
    

    Expected result:

    • if the API is not supposed to be used like this, and it is possible to enforce that with Rust's type system: the API should be written in such a way that the type system forbids using the library like this
    • if the API is not supposed to be used like this, and it is not possible to enforce that with Rust's type system: the English API specifications should state that it must not be used like this
    • if the API does not intend to forbid using it like this: it should not panic.
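
The panic message ("index 29 out of range for slice of length 0") is consistent with an offset recorded against one buffer being used to index a different, empty one, although the report does not confirm that this is the cause. The sketch below reproduces only that general pattern with the standard library; `Span`, `prefix_indexed`, and `prefix_checked` are hypothetical names and do not mirror quick-xml's internals.

```rust
/// Hypothetical record of where a namespace prefix sits inside a buffer.
struct Span {
    start: usize,
    len: usize,
}

impl Span {
    /// Plain indexing: panics if `buf` is shorter than the recorded range.
    fn prefix_indexed<'a>(&self, buf: &'a [u8]) -> &'a [u8] {
        &buf[self.start..self.start + self.len]
    }

    /// Checked access: returns `None` instead of panicking.
    fn prefix_checked<'a>(&self, buf: &'a [u8]) -> Option<&'a [u8]> {
        buf.get(self.start..self.start + self.len)
    }
}

fn main() {
    // The span is recorded while the first buffer is in use.
    let first_buf = b"xmlns:a='http://example.org/something'".to_vec();
    let span = Span { start: 6, len: 1 }; // points at the prefix `a`
    assert_eq!(span.prefix_indexed(&first_buf), &b"a"[..]);

    // A fresh, empty buffer no longer contains the recorded range.
    let second_buf: Vec<u8> = Vec::new();
    assert!(span.prefix_checked(&second_buf).is_none());
    // `span.prefix_indexed(&second_buf)` would panic here.
}
```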
    bug namespaces 
    opened by willempx 13
  • Deserialization of a doctype with very long content fails

    quick_xml::de::from_reader() parsing fails if the XML contains a doctype with content larger than the internal BufRead capacity. For instance

    <!DOCTYPE [
    <!-- A very very long comment *snipped* -->
    ]>
    <X></X>
    

    Here is a minimal code to reproduce this issue. It fails with an ExpectedStart error.

    use std::io::Write;
    use serde::Deserialize;
    
    #[derive(Deserialize)]
    struct X {}
    
    fn main() {
        {
            let mut file = std::fs::File::create("test.xml").unwrap();
            let header = &"<!DOCTYPE X [<!--";
            let footer = &"-->]><X></X>";
            let padding = 8192 - (header.len() + 2);
            write!(file, "{header}{:1$}{footer}", "", padding).unwrap();
        }
    
        let file = std::fs::File::open("test.xml").unwrap();
        let reader = std::io::BufReader::new(file);
        let _: X = quick_xml::de::from_reader(reader).unwrap();
    }
    

    Cargo.toml content

    [package]
    name = "test"
    version = "0.1.0"
    edition = "2021"
    
    [dependencies]
    quick-xml = { version = "0.27.1", features = ["serialize"] }
    serde = { version = "1.0", features = ["derive"] }
    
    • When decreasing the padding size, or using BufReader::with_capacity() to increase the buffer, even by 1 byte, there is no error.
    • Other BufRead implementations don't have this issue (checked with &[u8] and stdin).
    • Content does not have to be in one "block". The same issue occurs for a doctype split into multiple declarations and comments.
    • With a longer doctype with real content, the error may be different. For instance it may complain about an invalid ! from a !ENTITY tag.
    • No issue with serde-xml-rs, even for larger comments.
    • Tested on Windows, with rustc 1.66.0.
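
The padding arithmetic in the reproduction can be checked on its own. The sketch below rebuilds the same document in memory and locates the comment terminator, assuming a buffer capacity of 8192 bytes as the literal in the reproduction implies; `build_document` is a helper introduced here for illustration.

```rust
/// Rebuilds the document from the reproduction for a given buffer capacity.
fn build_document(capacity: usize) -> String {
    let header = "<!DOCTYPE X [<!--";
    let footer = "-->]><X></X>";
    // Same formula as in the reproduction: pad with spaces so that the
    // header plus padding ends two bytes short of the capacity.
    let padding = capacity - (header.len() + 2);
    format!("{}{}{}", header, " ".repeat(padding), footer)
}

fn main() {
    let capacity = 8192; // assumed buffer size, taken from the reproduction
    let doc = build_document(capacity);
    let close = doc.find("-->").unwrap();
    println!("`-->` occupies bytes {}..={}", close, close + 2);
    // The final `>` of the terminator is the first byte past the capacity.
    assert_eq!(close + 2, capacity);
}
```

With these numbers the two dashes of `-->` are the last two bytes of the first 8192-byte chunk and the `>` is the first byte of the next one, which fits the observation that changing the padding or the capacity by a single byte makes the error disappear.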
    bug help wanted 
    opened by benoitryder 0
  • Deserializing data into an untagged enum with serde

    Problem description

    Hello 👋 I've been setting up the data models/types to automatically deserialize XML data from an API into a set of types, but I'm struggling with a particular issue involving untagged enums.

    The XML data I'm receiving can include data of two formats, and I would like to deserialize the XML data into whichever type it can successfully deserialize into first.

    Example XML data

    Format 1: text containing a reference/href

           <ServiceLocation xmlns="http://naesb.org/espi/customer">
                <UsagePoints>
                    <UsagePoint>https://api.com/DataCustodian/espi/1_1/resource/Subscription/1/UsagePoint/100</UsagePoint>
                    <UsagePoint>https://api.com/DataCustodian/espi/1_1/resource/Subscription/1/UsagePoint/200</UsagePoint>
                </UsagePoints>
            </ServiceLocation>
    

    Format 2: fully-expanded

           <ServiceLocation xmlns="http://naesb.org/espi/customer">
                <UsagePoints>
                    <UsagePoint>
                      <serialNumber>100</serialNumber>
                      <status>On</status>
                    </UsagePoint>
                    <UsagePoint>
                      <serialNumber>200</serialNumber>
                      <status>On</status>
                    </UsagePoint>
                </UsagePoints>
            </ServiceLocation>
    

    Solution attempt

    I want to deserialize this data into a single type, where a field containing an enum can contain either of these variants.

    Type definitions

    Here's my attempt at a type definition:

    #[derive(Debug, serde::Deserialize)]
    struct ServiceLocation {
        #[serde(rename = "UsagePoints")]
        usage_points: Option<Vec<UsagePoint>>
    }
    
    #[derive(Debug, serde::Deserialize)]
    #[serde(untagged)]
    enum UsagePoint {
        UsagePointReference(UsagePointReference),
        UsagePointFull(UsagePointFull)
    }
    
    #[derive(Debug, Deserialize)]
    struct UsagePointReference {
        #[serde(rename = "#text")]
        href: Url,
    }
    
    #[derive(Debug, Deserialize)]
    struct UsagePointFull {
        #[serde(rename = "serialNumber")]
        serial_number: String,
        status: String
    }
    

    Deserialization error

    Here's the error I received when providing either of the XML blobs (shown in the previous section) to a call of quick_xml::de::from_str:

    called `Result::unwrap()` on an `Err` value: Custom("data did not match any variant of untagged enum UsagePoint")
    thread 'resources::customer::tests::should_parse_service_location_with_usagepoint_reference' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("data did not match any variant of untagged enum UsagePoint")', 
    

    If you could advise me on how I'm supposed to define the types to handle this variable situation regarding my data, or point me to docs where some similar use-case is elaborated on, I'd really appreciate it.

    Thanks!

    help wanted serde 
    opened by seanpianka 3
  • Merge consequent text and CDATA events into one string


    This PR fixes #474 and introduces a way to read the current parser configuration, which was impossible before.

    I've changed how configuration is accessed and modified: instead of having functions to change configuration flags, readers now provide a reference to a Config object. Both immutable and mutable references are available. This new feature is used to temporarily disable trimming while reading text events in the serde Deserializer.

    After fixing #516, all configuration flags are safe to change at any time, because changing them does not alter the internal state of a reader in a user-visible way. For example, expand_empty_elements changes the internal state of a reader, but that change is rolled back after the next call to read_event, so the user cannot see its consequences. It is safe to disable this setting just after reading a fake Start event and still get a fake End event after that.
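
    The access pattern described above can be sketched without the library itself. What follows is a minimal, hypothetical model: `Config`, `MiniReader`, and `read_text_untrimmed` are names invented for illustration and are not the types or signatures introduced by this PR. It only shows how handing out `&Config` and `&mut Config` lets a caller read a setting, change it temporarily, and put it back.

```rust
/// Hypothetical configuration holder (illustrative, not the library type).
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub trim_text: bool,
    pub expand_empty_elements: bool,
}

/// Hypothetical reader that hands out references to its configuration
/// instead of offering one setter function per flag.
pub struct MiniReader<'a> {
    input: &'a str,
    config: Config,
}

impl<'a> MiniReader<'a> {
    pub fn new(input: &'a str) -> Self {
        MiniReader {
            input,
            config: Config { trim_text: true, expand_empty_elements: false },
        }
    }

    /// Immutable access: lets callers inspect the current settings.
    pub fn config(&self) -> &Config {
        &self.config
    }

    /// Mutable access: lets callers change any flag at any time.
    pub fn config_mut(&mut self) -> &mut Config {
        &mut self.config
    }

    /// Returns the whole input as a single "text event", trimmed if configured.
    pub fn read_text(&self) -> &'a str {
        if self.config.trim_text { self.input.trim() } else { self.input }
    }
}

/// Reads text with trimming disabled, then restores the previous setting.
pub fn read_text_untrimmed<'a>(reader: &mut MiniReader<'a>) -> &'a str {
    let previous = reader.config().trim_text; // only possible with read access
    reader.config_mut().trim_text = false;
    let text = reader.read_text();
    reader.config_mut().trim_text = previous;
    text
}
```

    Being able to read the old value is what makes the restore step possible; with setter-only APIs the caller would have to track the setting separately.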

    bug enhancement serde 
    opened by Mingun 3
  • Allow to continue parsing after `Error::EndEventMismatch`


    I'm trying to parse a Netscape bookmark file which has unclosed tags (e.g. <DL>), so I'm explicitly ignoring the EndEventMismatch error during parsing:

    match reader.read_event() {
        Err(e) => match e {
            QuickXmlError::EndEventMismatch { expected: _, found: _ } => (),
    

    The problem is that an Eof event immediately follows the error, so the reader stops parsing the rest of the document.

    I couldn't find in the documentation whether this behaviour is expected, so I'm opening this issue.
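
    Independently of how the reader itself behaves after the error, the kind of recovery a lenient consumer needs can be sketched on its own. The example below is a hypothetical helper, not part of quick-xml: `LenientStack` and `EndOutcome` are invented names. It keeps its own stack of open element names and, when an end tag does not match the innermost element, closes the unclosed inner elements implicitly instead of failing.

```rust
/// Outcome of feeding one end tag to the lenient tracker (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum EndOutcome {
    /// The end tag matched the innermost open element.
    Matched,
    /// The end tag matched an outer element; the listed inner elements
    /// (innermost first) were closed implicitly.
    ImplicitlyClosed(Vec<String>),
    /// No open element has this name; the end tag is ignored.
    Stray,
}

/// Tracks open elements and tolerates missing end tags.
#[derive(Default)]
pub struct LenientStack {
    open: Vec<String>,
}

impl LenientStack {
    /// Records a start tag.
    pub fn start(&mut self, name: &str) {
        self.open.push(name.to_string());
    }

    /// Records an end tag, recovering from unclosed inner elements.
    pub fn end(&mut self, name: &str) -> EndOutcome {
        match self.open.iter().rposition(|n| n == name) {
            None => EndOutcome::Stray,
            Some(pos) => {
                // Everything above `pos` was opened later and never closed.
                let skipped: Vec<String> = self.open.drain(pos + 1..).rev().collect();
                self.open.pop(); // remove the matching element itself
                if skipped.is_empty() {
                    EndOutcome::Matched
                } else {
                    EndOutcome::ImplicitlyClosed(skipped)
                }
            }
        }
    }

    /// Number of elements that are still open.
    pub fn depth(&self) -> usize {
        self.open.len()
    }
}
```

    A consumer could call `start` and `end` from its own event loop with end-name checking disabled in the parser, assuming the parser keeps producing events in that mode.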

    enhancement help wanted 
    opened by moy2010 2
  • Fix #257 and allow $text to work with tags in the text


    This is a very early attempt at solving #257 (awful, but functional, code below). Unfortunately, I ran into a design issue so I'd like to open the discussion now and get your feedback on what to do.

    Suppose we have the following struct to deserialize (from the test case attached in serde-de.rs below):

    #[derive(Debug, Deserialize, PartialEq)]
    struct Trivial<T> {
        #[serde(rename = "$text")]
        value: T,
    }
    

    The test case has the following xml

    <root>style tags <em>in this document</em></root>
    

    which should deserialize into

    Trivial {
        value: "style tags <em>in this document</em>".to_string(),
    }
    

    The test case also assumes the xml

    <outer><root>style tags <em>in this document</em></root></outer>
    

    should not deserialize (missing field $text)

    However, if we change $text to include embedded tags, this document would become deserializable: outer would be treated as the root tag, and everything inside it, including the <root> element, could be fed into the text field as part of the string.

    Apart from accepting that as the new expected behavior, we could make the user declare which tags may be captured into text fields instead of being parsed:

    #[derive(Debug, Deserialize, PartialEq)]
    struct Trivial<T> {
        #[serde(rename = "$text$<EM>$<B>")] // only <EM>...</EM>s and <B>...</B>s are allowed to be part of the string
        value: T,
    }
    

    Let me know if you have any other ideas on what would be best to do here.
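
    To make the whitelist idea concrete, here is a rough standalone sketch of the check such a design would need. It is not code from this PR: `only_allowed_tags` is a hypothetical helper, and it is deliberately naive (it assumes well-formed input without comments, CDATA, or `>` inside attribute values, and it compares names case-sensitively).

```rust
/// Returns true if every tag found in `inner` has a name listed in `allowed`.
/// Hypothetical helper for illustration; not a real XML tokenizer.
pub fn only_allowed_tags(inner: &str, allowed: &[&str]) -> bool {
    let mut rest = inner;
    while let Some(open) = rest.find('<') {
        let after = &rest[open + 1..];
        let close = match after.find('>') {
            Some(i) => i,
            None => return false, // unterminated tag
        };
        // Strip the leading '/' of an end tag, then cut at the first
        // whitespace or '/' to drop attributes and the self-closing marker.
        let tag = after[..close].trim_start_matches('/');
        let name = tag
            .split(|c: char| c.is_whitespace() || c == '/')
            .next()
            .unwrap_or("");
        if !allowed.contains(&name) {
            return false;
        }
        rest = &after[close + 1..];
    }
    true
}
```

    With `allowed = ["em", "b"]`, the content of `<root>` in the first document passes the check, while the content of `<outer>` in the second document is rejected because it contains a `<root>` tag, which keeps the "missing field $text" outcome for that case.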

    enhancement serde 
    opened by JOSEPHGILBY 3
  • [Question]: Deserialize optional vector


    Afaik the following line in Cargo.toml should mean that I'm using the latest git version of quick-xml:

    quick-xml = { git = "https://github.com/tafia/quick-xml", features = ['serialize']}
    

    Although I've put "Question" in the title, this might be a bug; I can't tell whether I'm doing something wrong or whether it is indeed a bug.

    I've got the following xml:

    <ENTRY>
      <CUE_V2 NAME="foo"></CUE_V2>
      <CUE_V2 NAME="bar"></CUE_V2>
    </ENTRY>
    

    And I've got the following struct to deserialize:

    use serde::{Deserialize, Serialize};
    
    #[derive(Debug, Serialize, Deserialize)]
    #[serde(rename = "ENTRY")]
    struct Entry {
        #[serde(rename = "CUE_V2")]
        cues: Vec<Cue>,
    }
    
    #[derive(Debug, Serialize, Deserialize)]
    #[serde_with::serde_as]
    struct Cue {
        #[serde(rename = "@NAME")]
        name: String,
    }
    

    This works but is incorrect since the cue vector should be optional like so:

    #[derive(Debug, Serialize, Deserialize)]
    #[serde(rename = "ENTRY")]
    struct Entry {
        #[serde(rename = "CUE_V2")]
        cues: Option<Vec<Cue>>,
    }
    

    But this gives the error:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedEnd([69, 78, 84, 82, 89])', src/main.rs:330:56
    

    There might be a trivial solution, but after two days of using Rust in total I can't seem to find it :)

    If I may take the opportunity to ask another unrelated question, how do I build the documentation for the git version of the package locally? I've been looking through #490 but I think having the actual documentation would be a bit better :)

    bug serde arrays 
    opened by sandersantema 4
Releases (v0.27.1)
  • v0.27.1 (Dec 28, 2022)

    What's Changed

    Bug Fixes

    • #530: Fix an infinite loop leading to unbounded memory consumption that occurs when skipping events on malformed XML with the overlapped-lists feature active.
    • #530: Fix an error in the Deserializer::read_to_end when overlapped-lists feature is active and malformed XML is parsed

    Full Changelog: https://github.com/tafia/quick-xml/compare/v0.27.0...v0.27.1

  • v0.27.0 (Dec 25, 2022)

    What's Changed

    MSRV was increased from 1.46 to 1.52 in #521.

    New Features

    • #521: Implement Clone for all error types. This required changing Error::Io to contain Arc<std::io::Error> instead of std::io::Error since std::io::Error does not implement Clone.
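
    The reason for the `Arc` can be shown with a small standalone sketch. The enum below is a simplified, hypothetical stand-in rather than the library's full error definition, and `sample_error` is an invented helper. Because `std::io::Error` is not `Clone`, wrapping it in an `Arc` lets the whole enum derive `Clone`, with clones sharing the same underlying I/O error.

```rust
use std::io;
use std::sync::Arc;

/// Simplified, hypothetical error enum (not the library's full definition).
#[derive(Debug, Clone)]
pub enum Error {
    /// `io::Error` is not `Clone`, so it is shared behind an `Arc`.
    Io(Arc<io::Error>),
    /// A variant whose payload is already `Clone`.
    UnexpectedEof(String),
}

impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(Arc::new(e))
    }
}

/// Builds an I/O-backed error for demonstration purposes.
pub fn sample_error() -> Error {
    io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended").into()
}
```

    Cloning an `Error::Io` value only bumps a reference count; code that matched on the variant and expected an owned `std::io::Error` has to be adjusted to work through the `Arc`.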

    Bug Fixes

    • #490: Ensure that serialization of map keys always produces valid XML names. In particular, that means that maps with numeric and numeric-like keys (for example, "42") can no longer be serialized because an XML name cannot start with a digit
    • #500: Fix deserialization of top-level sequences of enums, like
      <?xml version="1.0" encoding="UTF-8"?>
      <!-- list of enum Enum { A, B, C } -->
      <A/>
      <B/>
      <C/>
      
    • #514: Fix incorrect reporting of Error::EndEventMismatch after disabling and enabling .check_end_names
    • #517: Fix swapped codes for \r and \n characters when escaping them
    • #523: Fix incorrect skipping of text and CDATA content before any map-like structures in the serde deserializer, like
      unwanted text<struct>...</struct>
      
    • #523: Fix incorrect handling of xs:lists with encoded spaces: they still act as delimiters, which is also confirmed by the mature XmlBeans Java library
    • #473: Fix a hidden requirement to enable serde's derive feature to get quick-xml's serialize feature for edition = 2021 or resolver = 2 crates
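
    As background for the #517 entry above: a numeric character reference carries the character's code point, so the line feed `\n` (U+000A) corresponds to 10 and the carriage return `\r` (U+000D) to 13. The sketch below illustrates that mapping only; `escape_line_breaks` is a hypothetical helper, not the library's escape function, which handles more characters and whose exact output format is not reproduced here.

```rust
/// Replaces line-break characters with decimal numeric character references.
/// Hypothetical helper for illustration; all other characters pass through.
pub fn escape_line_breaks(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '\n' => out.push_str("&#10;"), // LINE FEED, U+000A
            '\r' => out.push_str("&#13;"), // CARRIAGE RETURN, U+000D
            other => out.push(other),
        }
    }
    out
}
```

    Swapping the two numbers would turn every Windows-style `\r\n` pair into `\n\r` after a round trip, which is the kind of bug the entry describes.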

    Misc Changes

    • #490: Removed $unflatten= special prefix for fields for serde (de)serializer, because:

      • it is useless for deserializer
      • serializer was rewritten and does not require it anymore

      This prefix allowed you to serialize a struct field as an XML element. It is now replaced by a more thoughtful system that explicitly indicates that a field should be serialized as an attribute by prepending the @ character to its name

    • #490: Removed $primitive= prefix. That prefix allowed you to serialize a struct field as an attribute instead of an element. It is now replaced by a more thoughtful system that explicitly indicates that a field should be serialized as an attribute by prepending the @ character to its name

    • #490: In addition to the $value special name for a field a new $text special name was added:

      • $text is used if you want to map a field to text content only. No markup is expected (but the text can represent a list as defined by the xs:list type)
      • $value is used if you want to map elements with different names to one field, which should be represented either by an enum, or by a sequence of enums (Vec, tuple, etc.), or by a string. Use it when you want to map a field to any content of the field, text or markup

      Refer to documentation for details.

    • #521: MSRV bumped to 1.52.

    • #473: The serde feature, which was used to make some types serializable, was renamed to serde-types

    • #528: Added documentation for XML to serde mapping

    New Contributors

    • @sashka made their first contribution in https://github.com/tafia/quick-xml/pull/498
    • @ultrasaurus made their first contribution in https://github.com/tafia/quick-xml/pull/504
    • @zeenix made their first contribution in https://github.com/tafia/quick-xml/pull/521

    Full Changelog: https://github.com/tafia/quick-xml/compare/v0.26.0...v0.27.0

Owner
Johann Tuffe
A XML parser written in Rust

RustyXML Documentation RustyXML is a namespace aware XML parser written in Rust. Right now it provides a basic SAX-like API, and an ElementBuilder bas

null 97 Dec 27, 2022
An XML library in Rust

SXD-Document An XML library in Rust. Overview The project is currently broken into two crates: document - Basic DOM manipulation and reading/writing X

Jake Goulding 146 Nov 11, 2022
An XML library in Rust

xml-rs, an XML library for Rust Documentation xml-rs is an XML library for Rust programming language. It is heavily inspired by Java Streaming API for

Vladimir Matveev 417 Dec 13, 2022
An XPath library in Rust

SXD-XPath An XML XPath library in Rust. Overview The project is broken into two crates: document - Basic DOM manipulation and reading/writing XML from

Jake Goulding 107 Nov 11, 2022
A Rust OpenType manipulation library

fonttools-rs   This is an attempt to write an Rust library to read, manipulate and write TTF/OTF files. It is in the early stages of development. Cont

Simon Cozens 36 Nov 14, 2022
Single-reader, multi-writer & single-reader, multi-verifier; broadcasts reads to multiple writeable destinations in parallel

Bus Writer This Rust crate provides a generic single-reader, multi-writer, with support for callbacks for monitoring progress. It also provides a gene

Pop!_OS 26 Feb 7, 2022
xml-rs is an XML library for Rust programming language

xml-rs, an XML library for Rust Documentation xml-rs is an XML library for Rust programming language. It is heavily inspired by Java Streaming API for

Vladimir Matveev 417 Jan 3, 2023
Rust low-level minimalist APNG writer and PNG reader with just a few dependencies with all possible formats coverage (including HDR).

project Wiki https://github.com/js29a/micro_png/wiki at glance use micro_png::*; fn main() { // load an image let image = read_png("tmp/test.

jacek SQ6KBQ 8 Aug 30, 2023
GPIO reader, writer and listener

Unbothered gpio Everything is unwrapped under the hood for the precious prettiness of your code. It's more than a simple Rust crate, it's a philosophy

null 0 Nov 7, 2021
Stack buffer provides alternatives to Buf{Reader,Writer} allocated on the stack instead of the heap.

StackBuf{Reader,Writer} Stack buffer provides alternatives to BufReader and BufWriter allocated on the stack instead of the heap. Its implementation i

Alex Saveau 14 Nov 20, 2022
Dynamic csv reader, editor, writer

dcsv Dynamic csv reader, editor, and writer library. If you use structured csv data, use csv crate Feature Read csv which has undecided format Optiona

Simhyeon 2 May 10, 2022
A fetcher hook for the Plato document reader that syncs an e-reader with an OPDS catalogue.

plato-opds A fetcher hook for the Plato document reader that syncs an e-reader with an OPDS catalogue. Motivation I wanted to be able to sync my e-rea

null 11 Nov 8, 2023
A high-performance, high-reliability observability data pipeline.

Quickstart • Docs • Guides • Integrations • Chat • Download What is Vector? Vector is a high-performance, end-to-end (agent & aggregator) observabilit

Timber 12.1k Jan 2, 2023
serde-like serialization and deserialization of static Rust types in XML

static-xml static-xml is a serde-like serialization and deserialization library for XML, currently written as a layer on top of xml-rs. Status: in ear

Scott Lamb 8 Nov 22, 2022
Fast & Memory Efficient NodeJs Excel Writer using Rust Binding

FastExcel This project need Rust to be installed, check here for Rust installation instruction This project using Rust and Neon as a binding to Rust t

Aditya Kresna 2 Dec 15, 2022
Open Graphic Image Writer

Open Graphic Image Writer Documentation You can generate Open Graphic Image dynamically. A CSS-like API. You can generate image by using template imag

keiya sasaki 46 Dec 15, 2022
A library to create zip files on a non-seekable writer

A library to create zip files on a non-seekable writer

nyantec GmbH 2 Mar 17, 2022