PEG parser combinators using operator overloading without macros.

Overview

pom


What is PEG?

PEG stands for parsing expression grammar, a type of analytic formal grammar, i.e. one that describes a formal language in terms of a set of rules for recognizing strings in the language. Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree. Each parsing function conceptually takes an input string as its argument, and yields one of the following results:

  • success, in which case the function may move forward, consuming one or more characters of the input string supplied to it, or
  • failure, in which case no input is consumed.

Read more on Wikipedia.

What is a parser combinator?

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing.

Parsers built using combinators are straightforward to construct, readable, modular, well-structured and easily maintainable. With operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers to form a complete rule. Parser combinators thereby enable parsers to be defined in an embedded style, in code which is similar in structure to the rules of the formal grammar. The resulting code is also easier to debug than macro-generated code.

The main advantage is that you don't need to go through any kind of code generation step, you're always using the vanilla language underneath. Aside from build issues (and the usual issues around error messages and debuggability, which in fairness are about as bad with macros as with code generation), it's usually easier to freely intermix grammar expressions and plain code.
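The idea is easy to see without any library support. Below is a minimal, self-contained sketch in plain Rust closures (not pom's actual types): sym builds a primitive parser, and pair is a combinator that sequences two parsers into a new one.

```rust
// A parser is any function from (input, position) to Ok((value, new_position)) or Err(()).
type ParseResult<T> = Result<(T, usize), ()>;

// Primitive parser: match one expected byte at the current position.
fn sym(expected: u8) -> impl Fn(&[u8], usize) -> ParseResult<u8> {
    move |input: &[u8], pos: usize| match input.get(pos) {
        Some(&b) if b == expected => Ok((b, pos + 1)),
        _ => Err(()),
    }
}

// Combinator: run `p`, then `q` from where `p` stopped, returning both results as a pair.
fn pair<A, B>(
    p: impl Fn(&[u8], usize) -> ParseResult<A>,
    q: impl Fn(&[u8], usize) -> ParseResult<B>,
) -> impl Fn(&[u8], usize) -> ParseResult<(A, B)> {
    move |input: &[u8], pos: usize| {
        let (a, pos) = p(input, pos)?;
        let (b, pos) = q(input, pos)?;
        Ok(((a, b), pos))
    }
}

fn main() {
    let ab = pair(sym(b'a'), sym(b'b'));
    assert_eq!(ab(b"abc", 0), Ok(((b'a', b'b'), 2)));
    assert!(ab(b"ba", 0).is_err());
}
```

pom's `+` operator plays the role of `pair` here; operator overloading just gives combinators like this an infix spelling.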

List of predefined parsers and combinators

Basic Parsers Description
empty() Always succeeds, consumes no input.
end() Match end of input.
any() Match any symbol and return the symbol.
sym(t) Match a single terminal symbol t.
seq(s) Match sequence of symbols.
list(p,s) Match list of p, separated by s.
one_of(set) Success when current input symbol is one of the set.
none_of(set) Success when current input symbol is none of the set.
is_a(predicate) Success when predicate returns true on current input symbol.
not_a(predicate) Success when predicate returns false on current input symbol.
take(n) Read n symbols.
skip(n) Skip n symbols.
call(pf) Call a parser factory, can be used to create recursive parsers.
Parser Combinators Description
p | q Match p or q, return result of the first success.
p + q Match p and q, if both succeed return a pair of results.
p - q Match p and q, if both succeed return result of p.
p * q Match p and q, if both succeed return result of q.
p >> q Parse p and get result P, then parse q and return result of q(P).
-p Success when p succeeds, doesn't consume input.
!p Success when p fails, doesn't consume input.
p.opt() Make parser optional. Returns an Option.
p.repeat(m..n) Repeat p within the given range: p.repeat(0..) zero or more times, p.repeat(1..) one or more times, p.repeat(1..4) at least 1 and at most 3 times, p.repeat(5) exactly 5 times.
p.map(f) Convert parser result to desired value.
p.convert(f) Convert parser result to desired value, fails in case of conversion error.
p.pos() Get input position after matching p.
p.collect() Collect all matched input symbols.
p.discard() Discard parser output.
p.name(_) Give parser a name to identify parsing errors.
p.expect(_) Mark parser as expected, abort early when failed in ordered choice.

The choice of operators is driven by their precedence, arity and "meaning". Use * to discard the result of the first operand at the start of an expression; + and - cover keeping or discarding results in the rest of the expression.

For example, A * B * C - D + E - F will return the results of C and E as a pair.

Example code

extern crate pom;
use pom::parser::*;

let input = b"abcde";
let parser = sym(b'a') * none_of(b"AB") - sym(b'c') + seq(b"de");
let output = parser.parse(input);
assert_eq!(output, Ok( (b'b', vec![b'd', b'e']) ) );

Example JSON parser

extern crate pom;
use pom::parser::*;
use pom::Parser;

use std::collections::HashMap;
use std::str::{self, FromStr};

#[derive(Debug, PartialEq)]
pub enum JsonValue {
	Null,
	Bool(bool),
	Str(String),
	Num(f64),
	Array(Vec<JsonValue>),
	Object(HashMap<String,JsonValue>)
}

fn space() -> Parser<u8, ()> {
	one_of(b" \t\r\n").repeat(0..).discard()
}

fn number() -> Parser<u8, f64> {
	let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0');
	let frac = sym(b'.') + one_of(b"0123456789").repeat(1..);
	let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..);
	let number = sym(b'-').opt() + integer + frac.opt() + exp.opt();
	number.collect().convert(str::from_utf8).convert(|s|f64::from_str(&s))
}

fn string() -> Parser<u8, String> {
	let special_char = sym(b'\\') | sym(b'/') | sym(b'"')
		| sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C')
		| sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t');
	let escape_sequence = sym(b'\\') * special_char;
	let string = sym(b'"') * (none_of(b"\\\"") | escape_sequence).repeat(0..) - sym(b'"');
	string.convert(String::from_utf8)
}

fn array() -> Parser<u8, Vec<JsonValue>> {
	let elems = list(call(value), sym(b',') * space());
	sym(b'[') * space() * elems - sym(b']')
}

fn object() -> Parser<u8, HashMap<String, JsonValue>> {
	let member = string() - space() - sym(b':') - space() + call(value);
	let members = list(member, sym(b',') * space());
	let obj = sym(b'{') * space() * members - sym(b'}');
	obj.map(|members|members.into_iter().collect::<HashMap<_,_>>())
}

fn value() -> Parser<u8, JsonValue> {
	( seq(b"null").map(|_|JsonValue::Null)
	| seq(b"true").map(|_|JsonValue::Bool(true))
	| seq(b"false").map(|_|JsonValue::Bool(false))
	| number().map(|num|JsonValue::Num(num))
	| string().map(|text|JsonValue::Str(text))
	| array().map(|arr|JsonValue::Array(arr))
	| object().map(|obj|JsonValue::Object(obj))
	) - space()
}

pub fn json() -> Parser<u8, JsonValue> {
	space() * value() - end()
}

fn main() {
	let input = br#"
	{
        "Image": {
            "Width":  800,
            "Height": 600,
            "Title":  "View from 15th Floor",
            "Thumbnail": {
                "Url":    "http://www.example.com/image/481989943",
                "Height": 125,
                "Width":  100
            },
            "Animated" : false,
            "IDs": [116, 943, 234, 38793]
        }
    }"#;

	println!("{:?}", json().parse(input));
}

You can run this example with the following command:

cargo run --example json

Benchmark

Parser Time to parse the same JSON file
pom: json_byte 621,319 ns/iter (+/- 20,318)
pom: json_char 627,110 ns/iter (+/- 11,463)
pest: json_char 13,359 ns/iter (+/- 811)

Lifetimes and files

String literals have a 'static lifetime, so they work with the static version of Parser imported from pom::Parser. Input read from a file has a shorter lifetime; in that case, import pom::parser::Parser and declare lifetimes on your parser functions. So

fn space() -> Parser<u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}

would become

fn space<'a>() -> Parser<'a, u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}
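The reason the lifetime parameter matters can be sketched without pom itself. The stand-in Parser below boxes a closure over a borrowed slice, loosely modeled on the internal representation pom 3.0 documents in its release notes (not the real API); because letter_a is generic over 'a, the parser borrows its input only as long as the caller needs, so runtime data works.

```rust
// A stand-in for pom's Parser: a boxed closure over a borrowed input slice.
struct Parser<'a, I, O> {
    method: Box<dyn Fn(&'a [I], usize) -> Result<(O, usize), String> + 'a>,
}

impl<'a, I, O> Parser<'a, I, O> {
    fn parse(&self, input: &'a [I]) -> Result<O, String> {
        (self.method)(input, 0).map(|(out, _)| out)
    }
}

// Generic over 'a: the input only has to outlive this one parse, not the program.
fn letter_a<'a>() -> Parser<'a, u8, u8> {
    Parser {
        method: Box::new(|input: &[u8], pos: usize| match input.get(pos) {
            Some(&b'a') => Ok((b'a', pos + 1)),
            _ => Err(format!("expected 'a' at {}", pos)),
        }),
    }
}

fn main() {
    // Input created at runtime -- the same situation as reading from a file.
    let input: Vec<u8> = "abc".bytes().collect();
    assert_eq!(letter_a().parse(&input), Ok(b'a'));
    // `input` is dropped here; nothing required a 'static borrow.
}
```

With a hardcoded 'static lifetime instead of 'a, the same parse call would demand that `input` live for the whole program, which is exactly the error reported in the issues below.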
Comments
  • "utf8" module supporting matching UTF-8/returning &str

    I find "pom" very enjoyable to use but I find I have frustration around converting inputs and match-strings to/from UTF-8 &str (see #53). I think pom adding explicit support for UTF-8 would bring important advantages:

    • UX improvements when working with Rust strings and chars
    • Match primitives that guarantee at each step a valid UTF-8 string is being matched, for example an any() that matches UTF-8 chars only (yes, I know I can .convert(str::from_utf8) and it will correctly reject invalid UTF-8, but that bails out relatively late)

    This is a draft/first attempt at a utf8 module. (The regular parser is unchanged, utf8 is opt-in.) You can see what using it is like in the example examples/utf8.rs but it's much like normal pom. (.parse() still requires the input to be :as_bytes()ed, but seq() accepts normal Rust strings). The basic approach is

    • use pom::utf8::* contains functions that have the same names and usage as the ones in pom::parser::* (so it is mostly a drop-in replacement), but any returns or arguments that are &[I] in parser::Parser are &str in utf8::Parser.
    • pom::utf8::Parser<'a, O> is implemented as a thin wrapper around pom::parser::Parser<'a, u8, O>— it is a separate type because by keeping track of which patterns are pure utf8, collect() over a tree of utf8::Parsers can return a &str safely. But because at core it's still just parser::Parser<_, u8,_>, it can be combined into a single pattern with non-UTF8 parser::Parser (at the cost of no longer being able to do a collect() without re-verifying UTF-8).

    This prototype has just enough functions to implement the examples/utf8.rs example. It implements UTF-8 aware seq() and any() combinators, has the UTF-8 aware collect and convert, you can turn a utf8 Parser into parser::Parser with from/into, and it so far has methods passing discard, map, parse, repeat, | and * on to the underlying parser::Parser implementation.

    Next steps are:

    • Implement rest of parser:: functions/methods (I may do this with a macro? I think I would have to write the macro myself. There are some delegation macro crates but none of them seem exactly fit to this situation.)
    • sym needs to be special because this is the one function I intend to use a slightly different interface from parser::Parser:
      • pub fn sym<'a>(tag: char) -> Parser<'a, &'a str> will return a single-char string
      • pub fn sym_char<'a>(tag: char) -> Parser<'a, char> will return a parsed char, to make constructions like sym_char(ch).is_a(str::is_alphabetic) possible
    • The utf8 module uses unsafe {} because it calls str::from_utf8_unchecked on slices it has already confirmed contain complete UTF-8 characters. I would like to introduce a Cargo "feature" to remove use of unsafe from utf8, at the cost of a redundant str::from_utf8 check in places.
    • May create a utf8::Parser.parse_str(input:&str) that just calls parse(input:as_bytes()), for convenience (?)
    • Versions of +, - etc that take one parser::Parser and one utf8::Parser and return a parser::Parser, to make it easy to mix them; also I want to create an examples/utf8_mixed.rs demonstrating using parser::Parser and utf8::Parser in the same pattern (EG a simple MsgPack parser or something).

    Long term additions I'd be interested in attempting are:

    • Possibly a Unicode version of pom::char_class?
    • Possibly glyph support, or support for normalization forms (this would make possible things like seq_case_insensitive which would be very useful to me)

    What I need to know from @J-F-Liu:

    • Are you interested in this PR, in theory? If you do not think this belongs in pom, I will probably publish it as a second supplementary crate.
    • Should utf-8 support be behind a "feature" so it can be disabled? It does introduce complexities such as external crate support (it uses bstr) and unsafe.
    • The type of utf8::Parser is Parser<'a, O>. This makes sense because by definition it can only ever work on u8, but means mixing fns that define utf8::Parsers and parser::Parsers in the same file would be a little confusing because some functions would have 2 generic arguments and some would have 3. Would it make sense to put the I type argument back in with a where I=u8, and require the user to type the u8 generic argument every time? (My vote is no, it's fine the way it is now, but I wanted to ask.)

    Thank you for this neat library! I have used it a lot this month.
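The invariant behind that collect() argument can be illustrated standalone. In this sketch (not the PR's code), a primitive matches exactly one UTF-8 character and returns the &str slice it occupies; a range assembled only from such matches is valid UTF-8 by construction, which is what would let a utf8-aware collect() skip re-validation. The sketch uses the checked std::str::from_utf8 rather than the unsafe variant.

```rust
// Match exactly one UTF-8 character at `pos`, returning the &str slice it occupies.
fn any_char(input: &[u8], pos: usize) -> Option<(&str, usize)> {
    // Validate from the current position; a real implementation would decode
    // just one character instead of checking the whole remainder.
    let s = std::str::from_utf8(&input[pos..]).ok()?;
    let ch = s.chars().next()?;
    let len = ch.len_utf8();
    Some((&s[..len], pos + len))
}

fn main() {
    let input = "🐈ok".as_bytes();
    let (matched, next) = any_char(input, 0).unwrap();
    assert_eq!(matched, "🐈");
    assert_eq!(next, 4); // the cat emoji is 4 bytes in UTF-8
    assert_eq!(any_char(input, next).unwrap().0, "o");
}
```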

    opened by mcclure 10
  • Improper handling of `Incomplete`

    Pom does not propagate incomplete input properly. If the input is too short to parse completely, the parser must immediately return an Incomplete error, and not attempt to continue parsing.

    For example, the BitOr operator does not pay attention to why the left side failed to parse - it always goes on to try the right. If the left failed to parse because it returned Incomplete, then it should immediately return the same without trying to parse the right. Otherwise a parser of the form: seq(b"foobar") | seq(b"blitblat") will fail on the input b"foo", complaining that it expected a 'b' but got an 'f', because it improperly failed over to the right side when the left-hand parse was indeterminate.

    Similarly, repeat() has:

    			loop {
    //...
    				if let Ok(item) = self.parse(input) {
    					items.push(item)
    				} else {
    					break;
    				}
    			}
    			if let Included(&start) = range.start() {
    				if items.len() < start {
    					input.jump_to(start_pos);
    					return Err(Error::Mismatch {
    						message: format!("expect repeat at least {} times, found {} times", start, items.len()),
    						position: start_pos,
    					});
    				}
    			}
    

    which means it ends up reporting Error::Mismatch even if the underlying parser returned Incomplete.

    This problem seems common throughout the codebase; I think essentially every place that handles errors needs to special case Incomplete. Perhaps it shouldn't be represented as an error at all.

    Without this, pom is not useful for processing input that comes in incrementally in buffers that may or may not contain a complete parsable phrase.
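    A sketch of the proposed behavior (not pom's actual code): give the error type a distinct Incomplete case, and make ordered choice fall through to the alternative only on a definite mismatch.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    Mismatch,   // input definitely does not match
    Incomplete, // input ran out before the match could be decided
}

// Match a literal byte sequence starting at `pos`; return the new position.
fn match_seq(tag: &[u8], input: &[u8], pos: usize) -> Result<usize, ParseError> {
    for (i, &t) in tag.iter().enumerate() {
        match input.get(pos + i) {
            None => return Err(ParseError::Incomplete), // ran out mid-match
            Some(&b) if b != t => return Err(ParseError::Mismatch),
            _ => {}
        }
    }
    Ok(pos + tag.len())
}

// Ordered choice that propagates Incomplete instead of trying the alternative.
fn ordered_choice(
    input: &[u8],
    pos: usize,
    p: impl Fn(&[u8], usize) -> Result<usize, ParseError>,
    q: impl Fn(&[u8], usize) -> Result<usize, ParseError>,
) -> Result<usize, ParseError> {
    match p(input, pos) {
        Err(ParseError::Mismatch) => q(input, pos), // only a definite failure falls through
        other => other,                             // success or Incomplete returned as-is
    }
}

fn main() {
    let p = |i: &[u8], pos| match_seq(b"foobar", i, pos);
    let q = |i: &[u8], pos| match_seq(b"blitblat", i, pos);
    // b"foo" is a prefix of "foobar": report Incomplete, not a failed fallback to "blitblat".
    assert_eq!(ordered_choice(b"foo", 0, p, q), Err(ParseError::Incomplete));
    assert_eq!(ordered_choice(b"blitblat", 0, p, q), Ok(8));
}
```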

    enhancement discussion 
    opened by jsgf 10
  • Remove more unnecessary traits and generalize PartialEq usage

    These changes remove some unnecessary trait restrictions on the functions, and also generalize all uses of PartialEq (including the set implementation) to arbitrary types implementing PartialEq<T>. I think in most cases this introduces no breaking changes (and all tests still run and pass as before), however a larger corpus or test base may better confirm this. The only thing that makes me less sure is things like the change to sym which cause the returned parser to be a different type than types that are inferred from input arguments (i.e., the output type now is inferred from usage rather than from arguments, so it may be the case that existing code will need annotations to disambiguate). So to be safe it may be appropriate to put this toward a breaking API release.

    opened by afranchuk 7
  • Function parser parse input from IO cause lifetime issues

    Hi, I wrote a parser that parses a raw string correctly but can't parse content read from a file. To simplify the code, here is an example:

    fn crlf() -> Parser<u8, ()> {
        one_of(b"\r\n").repeat(0..).discard()
    }
    fn main() {
        let input = fs::read("test.txt").unwrap();
    
        let outputOk = crlf().parse("test".as_bytes());
        let outputErr = crlf().parse(&input);
    }
    

    The line with outputErr causes the error: input does not live long enough

    opened by hulufei 7
  • Not operator needs 'static

    This means that if you want to parse a string sourced from IO, you need to use lazy_static, which is not very convenient:

        lazy_static! {
            pub static ref STDIN_BUF: String = {
                let mut buf = String::new();
    
                std::io::stdin().lock().read_to_string(&mut buf).expect(
                    "Failed to read stdin!"
                );
    
                buf
            };
        }
    
        println!(
            "{}",
            serde_yaml::to_string(&my_parser().parse(STDIN_BUF.as_bytes())?)?
        );
    
    opened by Cokemonkey11 5
  • Tips on debugging

    I love this package, I'm using it (abusing it? :wink:) to parse structured PDF documents: using lopdf and a custom output to extract a struct token stream of characters and strokes, passing the token stream to a pom-based parser to convert to a data structure.

    I find that I often have errors in the form of Err(Mismatch { message: "expect end of input, found: Terminal { typ: Char(CharTerminal('(', CharTerminalKind(SecondaryTitle, Bold))), page_num: 123 }", position: 29254 }). Basically it means that the parsing of one of the items of the top-level item().repeat(0..) is not matched by the item() parser. But the error message doesn't say anything about why it failed. Usually, by doing the parsing by hand on a piece of paper I am able to find the bug in my parser, but not always.

    Do you have any tips on how to debug pom parsers efficiently? PEG.js has something like this which looks nice, but really any sort of tip or trick would be useful. My parsing is growing quite complex and even though I try to keep it organized, it can be tough to debug it.

    opened by ramonsnir 3
  • Does Parser.collect() return discarded Parsers?

    Hello,

    In the following code:

    use pom::Parser;
    use pom::parser::{sym, one_of};
    
    fn num() -> Parser<u8, &'static [u8]> {
        (sym(b'{') * one_of(b"0123456789").repeat(1..) - sym(b'}'))
            .collect()
    }
    
    fn main() {
        println!("{:?}", num().parse(b"{123}"));
    }
    

    I get the output:

    Ok([123, 49, 50, 51, 125])
    

    As you can see, in the output, the leading { (ordinal 123) and the trailing } (ordinal 125) are being returned as well, even though I am using the * and - combinators to try and ignore them.

    Is this a bug in the library, or am I just using it incorrectly?

    Thank you for creating pom.


    pom version 3.0.0, Rust stable 1.31.0, macOS

    opened by chrisdotcode 3
  • Exponential blow-up in compilation time

    Hi

    We are trying to use pom for a heavy/branchy grammar, and it looks like the intermediate types generated by '|' cause exponential compile times in the Rust compiler.

    I am wondering: is there any way to reduce their size? You might have experienced the same.

    BR

    opened by MageSlayer 3
  • Issue from "Parser::new" method on current version

    The legacy project clacks uses pom = "1.1.0". My current job is to merge some legacy code into a new project using the current pom version 3.0, which has core API changes. I'm stuck on this method defined on the old Parser; it works well in the original source:

    fn output<T: 'static>(inner: Parser<u8, T>) -> Parser<u8, Matched<T>> {
            Parser::new(move |input| {
                let start = input.position();
                let output = inner.parse(input)?;
                let end = input.position();
                Ok(Matched(output, utf8(input.segment(start, end))))
            })
        }
    #[derive(Debug, Clone)]
    pub struct Matched<T>(pub T, pub String);
    fn utf8(v: Vec<u8>) -> String {
         String::from_utf8(v).unwrap()
    }
    

    How can I implement this with Parser::new in the current version?

    opened by pictca 2
  • add whitespace-aware example parser using >>

    Greetings

    The point of this example is to demonstrate how to make whitespace-aware parsers in pom.

    I created this as a proof of concept when I wasn't sure if such a job was possible with pom's API.

    It works by using >> and take(0) to "tap" the parser and inject the output of an inner parser.

    In the process, I had some struggles with lifetimes, and nearly changed parser.rs to use FnMut, but eventually saw how it's possible without any changes needed.

    High-level description is something like this:

    Parse a white-space aware grammar by recursively parsing blocks of lines with the same indentation level

    opened by blakehawkins 2
  • Add example parsing ISO 8601 duration

    Inspired by this comparison of Nom, Combine, and Pest, I wrote an example parser for the ISO 8601 duration format using pom.

    I'd welcome any feedback you have to improve this parser. In particular, I don't like the map statement required to flatten the tuples after chaining multiple parsers with + in the date_part and time_part functions.

    Generally, I really enjoy working with pom. Thanks for your work on it!

    opened by JoshMcguigan 2
  • Documentation or examples for Range and Set

    I don't quite understand what Range and Set are for. It would be good if they had documentation strings, or at least an example. I could help write documentation as a PR if I had an example to compare to or an explanation.

    If "Range" is something used internally by the library and not intended to be messed with directly by users, the documentation strings should say that.

    I would very much like to be able to match a range of values, like (('a' as u8)..=('z' as u8)). Is that what Range does?

    opened by mcclure 0
  • Use lazy_format! to replace format! in constructing error messages

    lazy_format! captures its arguments and returns an opaque struct with a Display implementation, rather than immediately formatting them into a String. I think this can improve the parser performance.

    opened by J-F-Liu 0
  • Parse error human-readable messages maybe should translate u8s to ascii

    The recommended way to use pom (per your examples) is parsing u8s. If pom fails to match, it gives messages like: Err(Mismatch { message: "expect: 120, found: 101", position: 1 }). Here "120" is ASCII "x" and "101" is ASCII "e". Since pom will usually be interpreting text, it would make sense to convert these values to characters when they happen to match printable ASCII.

    Did I miss a feature that can do this already?
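    As far as I know pom has no such option built in, but the translation itself is a few lines. A hypothetical helper an error formatter could call (display_byte is an illustrative name, not a pom API):

```rust
// Render a byte as a readable token: printable ASCII as a quoted char, otherwise the number.
fn display_byte(b: u8) -> String {
    if b.is_ascii_graphic() || b == b' ' {
        format!("'{}'", b as char)
    } else {
        format!("{}", b)
    }
}

fn main() {
    assert_eq!(display_byte(120), "'x'");
    assert_eq!(display_byte(101), "'e'");
    assert_eq!(display_byte(10), "10"); // newline stays numeric
    println!("expect: {}, found: {}", display_byte(120), display_byte(101));
}
```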

    opened by mcclure 1
  • Feature request: Support .clone()

    Here is a simple sample program that uses Pom to decode a list of ranges, like "30-42,10-40".

    To parse an integer it adapts number() from the sample code:

    fn positive<'a>() -> Parser<'a, char, i64> {
    	let integer = one_of("123456789") - one_of("0123456789").repeat(0..) | sym('0');
    	integer.collect().convert(|s|String::from_iter(s.iter()).parse::<i64>())
    }
    

    Then it uses this like

    fn range<'a>() -> Parser<'a, char, (i64, i64)>
    	{ positive() - sym('-') + positive() }
    

    Notice it must call positive() twice. This is to make the borrow checker happy, but it also makes sense because (I expect?) each positive() will need internal state.

    What if instead the Parser objects supported .clone() and/or Copy? Then it would not need to execute the code in positive() every time, and also the "fn"s and their <'a> boilerplate would be unnecessary. For example the code could just say:

    let integerPattern = one_of("123456789") - one_of("0123456789").repeat(0..) | sym('0');
    let positive = integerPattern.collect().convert(|s|String::from_iter(s.iter()).parse::<i64>());
    let range = positive.clone() - sym('-') + positive.clone();

    This is a lot less typing because the signatures are not needed.
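    What this asks for can be sketched by storing the parser function in an Rc instead of a Box, so that clone() just bumps a reference count and shares the same closure. This is a standalone illustration, not pom's actual definition:

```rust
use std::rc::Rc;

// A parser wrapping its function in Rc instead of Box, so cloning is cheap:
// the clone shares the same underlying closure.
struct Parser<I, O> {
    method: Rc<dyn Fn(&[I], usize) -> Option<(O, usize)>>,
}

impl<I, O> Clone for Parser<I, O> {
    fn clone(&self) -> Self {
        Parser { method: Rc::clone(&self.method) }
    }
}

impl<I, O> Parser<I, O> {
    fn parse(&self, input: &[I]) -> Option<O> {
        (self.method)(input, 0).map(|(out, _)| out)
    }
}

// Match one ASCII digit and return its numeric value.
fn digit() -> Parser<u8, u8> {
    Parser {
        method: Rc::new(|input: &[u8], pos: usize| match input.get(pos) {
            Some(&b) if b.is_ascii_digit() => Some((b - b'0', pos + 1)),
            _ => None,
        }),
    }
}

fn main() {
    let d = digit();
    let d2 = d.clone(); // shares the same closure, no re-construction
    assert_eq!(d.parse(b"7"), Some(7));
    assert_eq!(d2.parse(b"9"), Some(9));
}
```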

    opened by mcclure 1
  • Static lifetime of pom::Parser does not specialize / example code should not encourage pom::Parser

    Today I tried to use pom. I got stuck for a long time on this problem:

    Say I take the sample code in the README and modify it so that instead of the input being hardcoded in the program, it comes from a string (code linked).

    I get this error:

    74 |         println!("{:?}", json().parse(input.as_bytes()));
       |                          -------------^^^^^^^^^^^^^^^^-
       |                          |            |
       |                          |            borrowed value does not live long enough
       |                          argument requires that `input` is borrowed for `'static`
    75 |     } else {
       |     - `input` dropped here while still borrowed
    

    I don't understand why I need to borrow because it should be okay to drop input immediately. But okay, I add the &. This doesn't work either:

    error[E0597]: `input` does not live long enough
      --> src\main.rs:74:34
       |
    74 |         println!("{:?}", json().parse(&input.as_bytes()));
       |                          --------------^^^^^^^^^^^^^^^^-
       |                          |             |
       |                          |             borrowed value does not live long enough
       |                          argument requires that `input` is borrowed for `'static`
    75 |     } else {
       |     - `input` dropped here while still borrowed
    

    I am new to Rust so maybe I am confused about lifetimes. But I think what is happening here is that the sample code is using pom::Parser. As the docs explain "pom::Parser<I, O> is alias of pom::parser::Parser<'static, I, O>". This means that when json() is created, the parser object is using lifetime 'static, and therefore it requires any data it parse()s to have lifetime 'static. In other words this code can only work on data that lives the entire lifetime of the application, such as an inline constant! This is not usually what you want.

    I showed this to more experienced Rust programmers and they seemed surprised that the Rust compiler did not automatically specialize the 'static to a narrower lifetime. But somehow it does not.

    The solution is to not use pom::Parser and instead use normal pom::parser::Parser. This requires rewriting every function signature to pass along the lifetime variable, for example fn space() -> Parser<u8, ()> becomes fn space<'a>() -> Parser<'a, u8, ()>. I made this change (code linked) and the code works, even without borrowing input.as_bytes().

    I think you need to do one of the following:

    1. Make whatever change is necessary for Rust to figure out, automatically, that 'static is actually something narrower. Did this used to work at some time in the past? (I do not know if this is possible.)

    2. Make it harder to use the static pom::Parser by accident. Probably pom::Parser should be named pom::StaticParser so it is obvious it must be used with static values. Also, you should link some sample code in the documentation (such as the pom-2 code I link above) demonstrating how to use Pom with runtime values. All the sample code right now seems to use pom::Parser.

    If someone copy-pastes the current sample code, like I did, they will probably waste a lot of time trying to figure out why it does not work on simple strings until they figure out they must add lifetimes. I think this is the real error being seen in #32.

    opened by mcclure 1
  • How to use with Unicode?

    I recently wrote a small program with pom. I found the API interface lovely, but I found it very hard to get string values into the library. All sample code is written with <u8, T> parsers and the literals are written like b"char". It is clear how to use this with ASCII, but not unicode.

    If I try to write my parsers instead as <char, T> then of course parse() cannot accept strings because then pom expects an array of chars and a string is UTF-8 bytes. I can convert the string to an array of chars, but for very long strings this will be inefficient.

    I see the convert() function can be used to easily (efficiently?) interpret a string as a sequence of bytes, so maybe it is okay to just use <u8, T>. However, then I have a different problem. What if I want to have unicode literals (maybe sym('🐈'), if for some reason 🐈 is a separator) or unicode ranges (for example codepoint U+1100 to U+11FF [ᄀ..ᇿ])?

    • Do I have to say seq("🐈".to_bytes()) every time? How then do I do character ranges?
    • Could pom be made to consume iterators instead of [T] arrays, so parse() could take string.chars() as an argument?
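    One standalone workaround for byte-level parsers (a sketch, not a pom API) is to match a char literal against its own UTF-8 encoding, so something like sym('🐈') never needs the whole input converted to chars:

```rust
// Match one specific char (e.g. '🐈') against UTF-8 byte input by comparing
// against the char's own UTF-8 encoding, without converting the whole input.
fn sym_char(expected: char, input: &[u8], pos: usize) -> Option<usize> {
    let mut buf = [0u8; 4];
    let bytes = expected.encode_utf8(&mut buf).as_bytes();
    if input[pos..].starts_with(bytes) {
        Some(pos + bytes.len()) // new position past the matched character
    } else {
        None
    }
}

fn main() {
    let input = "a🐈b".as_bytes();
    assert_eq!(sym_char('a', input, 0), Some(1));
    assert_eq!(sym_char('🐈', input, 1), Some(5)); // the emoji occupies 4 bytes
    assert_eq!(sym_char('x', input, 0), None);
}
```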
    opened by mcclure 1
Releases(v3.1.0)
  • v3.1.0(Aug 4, 2020)

  • v3.0.3(Dec 17, 2019)

  • v3.0.0(Dec 12, 2018)

    • 3.0 is based on 1.0 and changed:
    pub struct Parser<'a, I, O> {
        method: Box<Fn(&mut Input<I>) -> Result<O> + 'a>,
    }
    

    to

    pub struct Parser<'a, I, O> {
        method: Box<Fn(&'a [I], usize) -> Result<(O, usize)> + 'a>,
    }
    

    This is like the 2.0 version, but avoids potential issues such as #23.

    • Toolchain switched to Rust 2018 stable channel.
  • v1.1.0(Oct 29, 2017)

  • v2.0.0(Oct 29, 2017)

  • v2.0.0-beta(Mar 7, 2017)

    • Add p.many(range), like p.repeat(range) but return slice instead of vector.
    • Add p.cache(), can be used to remember parser output result in case of potential backtracking.
  • v1.0.0(Feb 15, 2017)

  • v2.0.0-alpha(Feb 10, 2017)

  • v0.9.0(Feb 3, 2017)

    • Can build on Rust stable 1.15.0.
    • p.repeat(n) repeat p exactly n times.
    • Implement Display and Error for pom::Error.

    Thanks to Jeremy Fitzhardinge for his contribution.

  • v0.8.0(Jan 24, 2017)

    • Add p.name(_), give parser a name to identify parsing errors.
    • Add p.convert(f), convert parser result to desired value, fail in case of conversion error.
    • list(p,s) backtrack to the last successfully matched element.
    • Merged 2 pull requests from new contributors.
  • v0.7.0(Jan 22, 2017)

    • Add p.pos() to get input position after matching p.
    • No longer use extra_requirement_in_impl.
    • Add pom::Parser<I, O> as alias of pom::parser::Parser<'static, I, O>.
  • v0.6.0(Jan 17, 2017)

  • v0.5.0(Jan 12, 2017)

  • v0.4.0(Jan 11, 2017)

  • v0.3.0(Jan 9, 2017)

    • seq(), one_of() and none_of() can accept either string literals or byte string literals.
    • Add json_char parser example.
    • JSON parser supports escaped UTF-16 character including surrogate pairs.
  • v0.2.0(Jan 3, 2017)

  • v0.1.0(Dec 30, 2016)

Owner
Junfeng Liu
https://toolkit.site/