An XML library in Rust

Overview

xml-rs, an XML library for Rust

xml-rs is an XML library for the Rust programming language. It is heavily inspired by the Java Streaming API for XML (StAX).

This library currently contains a pull parser much like the StAX event reader. It provides an iterator API, so you can leverage the features of Rust's existing iterator library.

It also provides a streaming document writer much like the StAX event writer. This writer consumes its own set of events, but reader events can be converted to writer events easily, so it is possible to write XML transformation chains in a pretty clean manner.

The parser is mostly full-featured; however, it has the following limitations:

  • encodings other than UTF-8 are not supported yet because no stream-based encoding library is currently available; when (or if) one becomes available, I'll try to make use of it;
  • DTD validation is not supported, and <!DOCTYPE> declarations are completely ignored; consequently, custom entities are not supported either, and internal DTD declarations are likely to cause parsing errors;
  • attribute value normalization is not performed, and end-of-line characters are not normalized either.

Other than that, the parser aims to be mostly XML 1.0 compliant.

The writer is also mostly full-featured, with the following limitations:

  • no support for encodings other than UTF-8, for the same reason as above;
  • no support for emitting <!DOCTYPE> declarations;
  • more input validation is needed, for example, checking that namespace prefixes are bound and that comments are well-formed.

What is planned (highest priority first, approximately):

  1. missing features required by the XML standard (e.g. the aforementioned normalization and proper DTD parsing);
  2. miscellaneous features of the writer;
  3. parsing into a DOM tree and its serialization back to XML text;
  4. SAX-like callback-based parser (fairly easy to implement on top of the pull parser);
  5. DTD validation;
  6. (let's dream a bit) XML Schema validation.

Building and using

xml-rs uses Cargo, so just add a dependency section in your project's manifest:

[dependencies]
xml-rs = "0.8"

The package exposes a single crate called xml:

extern crate xml;

Reading XML documents

xml::reader::EventReader requires an instance of Read to read from. When a proper stream-based encoding library becomes available, xml-rs will likely switch to whatever character stream structure that library provides, but currently it uses Read.

Using EventReader is very straightforward. Just provide a Read instance to obtain an iterator over events:

extern crate xml;

use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn indent(size: usize) -> String {
    const INDENT: &str = "    ";
    // One INDENT per nesting level.
    INDENT.repeat(size)
}

fn main() {
    let file = File::open("file.xml").unwrap();
    let file = BufReader::new(file);

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{}+{}", indent(depth), name);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{}-{}", indent(depth), name);
            }
            Err(e) => {
                println!("Error: {}", e);
                break;
            }
            _ => {}
        }
    }
}

EventReader implements the IntoIterator trait, so you can use it in a for loop directly. Document parsing can end normally or with an error. Regardless of the exact cause, the parsing process stops and the iterator terminates normally.

You can also have finer control over when to pull the next event from the parser by using its own next() method:

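match parser.next() {
    Ok(XmlEvent::EndDocument) => { /* the document has ended */ }
    Ok(event) => { /* handle any other event */ }
    Err(e) => { /* handle the parsing error */ }
}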

Upon reaching the end of the document or an error, the parser remembers that last event and returns it again from every subsequent next() call. If the iterator is used instead, it yields the error or end-of-document event once and produces None afterwards.

It is also possible to tweak the parsing process a little using the xml::reader::ParserConfig structure. See its documentation for more information and examples.

You can find a more extensive example of using EventReader in src/analyze.rs, a small program (it is built by cargo build and can be run afterwards) that shows various statistics about a specified XML document. It can also be used to check XML documents for well-formedness: if a document is not well-formed, the program exits with an error.

Writing XML documents

xml-rs also provides a streaming writer much like the StAX event writer. With it, you can write an XML document to any Write implementor.

extern crate xml;

use std::fs::File;
use std::io::{self, Write};

use xml::writer::{EventWriter, EmitterConfig, XmlEvent, Result};

fn handle_event<W: Write>(w: &mut EventWriter<W>, line: String) -> Result<()> {
    let line = line.trim();
    let event: XmlEvent = if line.starts_with("+") && line.len() > 1 {
        XmlEvent::start_element(&line[1..]).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(&line).into()
    };
    w.write(event)
}

fn main() {
    let mut file = File::create("output.xml").unwrap();

    let mut input = io::stdin();
    let mut output = io::stdout();
    let mut writer = EmitterConfig::new().perform_indent(true).create_writer(&mut file);
    loop {
        print!("> "); output.flush().unwrap();
        let mut line = String::new();
        match input.read_line(&mut line) {
            Ok(0) => break,
            Ok(_) => match handle_event(&mut writer, line) {
                Ok(_) => {}
                Err(e) => panic!("Write error: {}", e)
            },
            Err(e) => panic!("Input error: {}", e)
        }
    }
}

The code example above also demonstrates how to create a writer from its configuration. The same approach works for EventReader via ParserConfig::create_reader.

The library provides an XML event-building DSL which helps construct complex events, e.g. ones with namespace definitions. Some examples:

// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="value" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")

Of course, one can create XmlEvent enum variants directly instead of using the builder DSL. There are more examples in the xml::writer::XmlEvent documentation.

The writer has multiple configuration options; see EmitterConfig documentation for more information.

Other things

No performance tests or measurements have been done. The implementation is rather naive, and no specific optimizations have been made. Hopefully the library is fast enough to process documents of common size. I intend to add benchmarks in the future, but not until more important features are added.

Known issues

All known issues are tracked on the GitHub issue tracker: http://github.com/netvl/xml-rs/issues. Feel free to report any problems you find there.

License

This library is licensed under the MIT license.


Copyright (C) Vladimir Matveev, 2014-2020

Comments
  • Proposal: Stream support reopen parser

    Hello,

    I'm working on an XMPP library and I need to be able to reopen the parser to consume new bytes.

    It's a first step toward being able to reset some conditions. I use it like this:

    loop {
        // My parser is an instance of EventReader<Buffer>;
        // Buffer is a custom struct that holds my buffer and exposes some useful methods.
        // It implements Read.
        if self.parser.source().available_data() > 0 {
            self.parser.reopen_parser();
        }
        match self.parser.next() { ... }
    }
    

    What do you think?

    Signed-off-by: Freyskeyd [email protected]

    opened by Freyskeyd 13
  • very slow

    I took the usage example from the project's main page and applied it to my 3 MB XML file, with this result:

    Python: 00.968 sec (CPython/xml.etree; I have a Java parser that is ~10x faster than this one)
    Rust: 04.310 sec (traversal only, without prints!)

    • Win7 64 x
    • cargo build --release (rust 1.1)
    opened by s-trooper 10
  • Fails to parse /> as part of XML body

    xml-rs fails to parse XML that contains /> as part of its body.

    Per https://www.w3.org/TR/REC-xml/#syntax

    The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

    It seems like this should be allowed, since only & and < are reserved and must be escaped; > merely may be escaped. For example, xml-rs accepts <b>></b> as valid XML but rejects <b>/></b>.

    Minimally reproducible example:

    Cargo.toml

    [package]
    name = "xml-minimal-error"
    version = "0.1.0"
    authors = ["Jeff LaJoie <[email protected]>"]
    edition = "2018"
    
    # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
    
    [dependencies]
    xml-rs="0.8"
    

    main.rs

    use xml::reader::ParserConfig;
    
    fn main() {
        let xml_str: &[u8] = b"<b>/></b>";
        let parser_config = ParserConfig::default();
        let parser = parser_config.create_reader(xml_str);
    
        for event in parser {
            match event {
                Err(e) => {
                    println!("Error = {:?}", e);
                }
                _ => {},
            }
        }
    }
    

    Output

    ➜  xml-minimal-error git:(master) ✗ cargo run
       Compiling xml-minimal-error v0.1.0 (/Users/jlajoie/workspace/xml-minimal-error)
        Finished dev [unoptimized + debuginfo] target(s) in 0.23s
         Running `target/debug/xml-minimal-error`
    Error = Error { pos: 1:4, kind: Syntax("Unexpected token: />") }
    

    Additionally, this example works with XML parsers in other languages.

    NodeJS Example

    package.json

    {
      "name": "xml-minimal",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "scripts": {
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "author": "",
      "license": "ISC",
      "dependencies": {
        "fast-xml-parser": "^3.19.0"
      }
    }
    

    index.js

    const parser = require('fast-xml-parser');
    
    let jsonObj = parser.parse("<b>/></b>");
    
    console.dir(jsonObj);
    

    output

    ➜  node-xml-minimal node index.js
    { b: '/>' }
    
    opened by jlajoie 7
  • Allow peeking into next event and getting current event

    While parsing an XML document, it would be nice to know what the next event in the reader will be, so that I can dispatch it to the correct parser.

    Another way of doing this would be getting the current event (i.e. not having to call next to get an event).

    opened by pedrohjordao 7
  • EventWriter::flush()

    It would be handy if EventWriter implemented a flush() method, that would just flush the underlying sink.

    My use-case is a long-running measurement application that occasionally saves a complex XML element into a file backed by a BufWriter. I'd like to flush the writer each time I save a new element so I have the relevant info stored in case of failure.

    opened by dvtomas 6
  • Compile error using nightly compiler

    Looks like xml-rs stopped compiling with the nightly compiler from 2017-05-21. Nightly build from 2017-05-20 still worked.

    error: no rules expected the token `flags`
       --> /home/pg/.cargo/registry/src/github.com-1ecc6299db9ec823/xml-rs-0.1.26/src/writer/emitter.rs:115:5
        |
    115 |     flags IndentFlags: u8 {
        |     ^^^^^
    
    error: Could not compile `xml-rs`.
    
    opened by pgerber 6
  • Add a streaming API

    When a document is received in chunks (on XMPP for example), it makes sense to initialize the parser on the first chunk, and then feed it data as it comes.

    xml::ParserConfig would get a new streaming boolean that makes the reader never emit xml::XmlEvent::EndDocument before the root tag is closed; xml::EventReader would get a feed method that takes a string and emits new xml::XmlEvents as they are parsed; and finally, a method would be added to abort the stream.

    opened by linkmauve 6
  • Move EventReader integration tests into a separate file and enable them

    With these tests enabled people can hack on the parser more confidently. A drawback of using include_bytes! is that cargo doesn't notice changes to any included files (this might just be a cargo bug).

    opened by gkoz 6
  • Bump to 0.1.5 on crates.io for core::slice::Iter support

    Getting this error as of rust nightly 12-23:

    src/namespace.rs:2:5: 2:23 error: unresolved import `core::slice::Items`. There is no `Items` in `core::slice`
    src/namespace.rs:2 use core::slice::Items;
                           ^~~~~~~~~~~~~~~~~~
    

    Changed as of this commit: https://github.com/rust-lang/rust/commit/f8cfd2480b69a1cc266fc91d0b60c825a9dc18a7#diff-91f9d2237c7851d61911b0ca64792a88

    opened by tcr 6
  • Please expose a function to get the current position when parsing

    This would be very helpful to construct error messages when figuring out where in a multi-megabyte XML monstrosity a parser goes off the tracks. It looks like it should be a relatively straightforward change too.

    opened by Blei 6
  • Attempted to fix the parsing as characters of ]]> and ?>

    I was playing around with xml-rs in my own project (https://github.com/jdalberg/cwmp) and decided to use QuickCheck for validation: put random data into fields, generate some XML, parse it, and compare the result with the input across different permutations of content. Doing so, I came across what I consider to be a bug in the parser: it would not recognize "?>" as characters.

    Looking into xml-rs, I could see that the token for "?>" was missing from the function at line 127 of reader/lexer.rs. Following issue #32, I introduced two new test cases and added the two tokens "]]>" and "?>" to the list of tokens that may be characters.

    opened by jdalberg 5
  • EventReader never returns Result::Err after document end

    EventReader never returns Result::Err after the document ends. It returns Ok(EndDocument) over and over instead. This does not depend on the ignore_end_of_stream flag.

    So the following code gets stuck in an endless loop:

    use std::io::{BufReader, Cursor};
    use xml::{EventReader, ParserConfig};
    
    
    fn main() {
    
        let content = "<a></a>";
    
        let reader = BufReader::new(Cursor::new(content));
        let mut parser = EventReader::new_with_config(reader, ParserConfig::new().ignore_end_of_stream(true /* or false  */) );
    
        loop {
            if let Err(_) = parser.next() {
                break;
            }
        }
    }
    
    opened by chabapok 0
  • Parsing of comments <!-- <!-->

    The tricky case of <!-- <!--> should be parsed as a single comment, ignoring <! in the comment. xml-rs parses this as two unclosed comments.

    opened by kornelski 0
  • Restricted XmlEvent?

    Opening this to ask whether that's been considered (a search didn't turn up anything and discussions are not enabled): ParserConfig allows some fairly extensive customisation to the emittable XmlEvent variants, down to 6 (and up to 9, from a default of 8 if I'm reading everything right).

    However currently the user still has to "deal" with the un-emittable variants.

    Granted most applications will have a single loop processing the input events, but still, would a more type-heavy interface be an option?

    Sadly Rust still has no polymorphic or type-based variants, so the syntactic overhead would be fairly large, but it would also be fairly simple code, just annoying to write.

    opened by masklinn 0
Owner
Vladimir Matveev