PEG parser combinators using operator overloading without macros.

Overview

pom


What is PEG?

PEG stands for parsing expression grammar, a type of analytic formal grammar, i.e. one that describes a formal language in terms of a set of rules for recognizing strings in the language. Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree. Each parsing function conceptually takes an input string as its argument, and yields one of the following results:

  • success, in which case the function may move forward, consuming one or more characters of the input string supplied to it, or
  • failure, in which case no input is consumed.

Read more on Wikipedia.

What is a parser combinator?

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing.

Parsers built using combinators are straightforward to construct, readable, modular, well-structured and easily maintainable. With operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers to form a complete rule. Parser combinators thereby enable parsers to be defined in an embedded style, in code which is similar in structure to the rules of the formal grammar. The resulting code is also easier to debug than macro-generated code.

The main advantage is that you don't need to go through any kind of code generation step, you're always using the vanilla language underneath. Aside from build issues (and the usual issues around error messages and debuggability, which in fairness are about as bad with macros as with code generation), it's usually easier to freely intermix grammar expressions and plain code.
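The idea is easy to see without any library support. Below is a minimal, self-contained sketch in plain Rust closures (not pom's actual types): sym builds a primitive parser, and pair is a combinator that sequences two parsers into a new one.

```rust
// A parser is any function from (input, position) to Ok((value, new_position)) or Err(()).
type ParseResult<T> = Result<(T, usize), ()>;

// Primitive parser: match one expected byte at the current position.
fn sym(expected: u8) -> impl Fn(&[u8], usize) -> ParseResult<u8> {
    move |input: &[u8], pos: usize| match input.get(pos) {
        Some(&b) if b == expected => Ok((b, pos + 1)),
        _ => Err(()),
    }
}

// Combinator: run `p`, then `q` from where `p` stopped, returning both results as a pair.
fn pair<A, B>(
    p: impl Fn(&[u8], usize) -> ParseResult<A>,
    q: impl Fn(&[u8], usize) -> ParseResult<B>,
) -> impl Fn(&[u8], usize) -> ParseResult<(A, B)> {
    move |input: &[u8], pos: usize| {
        let (a, pos) = p(input, pos)?;
        let (b, pos) = q(input, pos)?;
        Ok(((a, b), pos))
    }
}

fn main() {
    let ab = pair(sym(b'a'), sym(b'b'));
    assert_eq!(ab(b"abc", 0), Ok(((b'a', b'b'), 2)));
    assert!(ab(b"ba", 0).is_err());
}
```

pom's `+` operator plays the role of `pair` here; operator overloading just gives combinators like this an infix spelling.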

List of predefined parsers and combinators

Basic Parsers Description
empty() Always succeeds, consumes no input.
end() Match end of input.
any() Match any symbol and return the symbol.
sym(t) Match a single terminal symbol t.
seq(s) Match sequence of symbols.
list(p,s) Match list of p, separated by s.
one_of(set) Success when current input symbol is one of the set.
none_of(set) Success when current input symbol is none of the set.
is_a(predicate) Success when predicate returns true on current input symbol.
not_a(predicate) Success when predicate returns false on current input symbol.
take(n) Read n symbols.
skip(n) Skip n symbols.
call(pf) Call a parser factory, can be used to create recursive parsers.
Parser Combinators Description
p | q Match p or q, return result of the first success.
p + q Match p and q, if both succeed return a pair of results.
p - q Match p and q, if both succeed return result of p.
p * q Match p and q, if both succeed return result of q.
p >> q Parse p and get result P, then parse q and return result of q(P).
-p Success when p succeeds, doesn't consume input.
!p Success when p fails, doesn't consume input.
p.opt() Make parser optional. Returns an Option.
p.repeat(m..n) Repeat p within the given range: p.repeat(0..) zero or more times, p.repeat(1..) one or more times, p.repeat(1..4) at least 1 and at most 3 times, p.repeat(5) exactly 5 times.
p.map(f) Convert parser result to desired value.
p.convert(f) Convert parser result to desired value, fails in case of conversion error.
p.pos() Get input position after matching p.
p.collect() Collect all matched input symbols.
p.discard() Discard parser output.
p.name(_) Give parser a name to identify parsing errors.
p.expect(_) Mark parser as expected, abort early when failed in ordered choice.

The choice of operators is driven by their precedence, arity and "meaning". Use * to discard the result of the first operand at the start of an expression; + and - cover keeping or discarding results in the rest of the expression.

For example, A * B * C - D + E - F will return the results of C and E as a pair.

Example code

extern crate pom;
use pom::parser::*;

let input = b"abcde";
let parser = sym(b'a') * none_of(b"AB") - sym(b'c') + seq(b"de");
let output = parser.parse(input);
assert_eq!(output, Ok( (b'b', vec![b'd', b'e']) ) );

Example JSON parser

extern crate pom;
use pom::parser::*;
use pom::Parser;

use std::collections::HashMap;
use std::str::{self, FromStr};

#[derive(Debug, PartialEq)]
pub enum JsonValue {
	Null,
	Bool(bool),
	Str(String),
	Num(f64),
	Array(Vec<JsonValue>),
	Object(HashMap<String,JsonValue>)
}

fn space() -> Parser<u8, ()> {
	one_of(b" \t\r\n").repeat(0..).discard()
}

fn number() -> Parser<u8, f64> {
	let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0');
	let frac = sym(b'.') + one_of(b"0123456789").repeat(1..);
	let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..);
	let number = sym(b'-').opt() + integer + frac.opt() + exp.opt();
	number.collect().convert(str::from_utf8).convert(|s|f64::from_str(&s))
}

fn string() -> Parser<u8, String> {
	let special_char = sym(b'\\') | sym(b'/') | sym(b'"')
		| sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C')
		| sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t');
	let escape_sequence = sym(b'\\') * special_char;
	let string = sym(b'"') * (none_of(b"\\\"") | escape_sequence).repeat(0..) - sym(b'"');
	string.convert(String::from_utf8)
}

fn array() -> Parser<u8, Vec<JsonValue>> {
	let elems = list(call(value), sym(b',') * space());
	sym(b'[') * space() * elems - sym(b']')
}

fn object() -> Parser<u8, HashMap<String, JsonValue>> {
	let member = string() - space() - sym(b':') - space() + call(value);
	let members = list(member, sym(b',') * space());
	let obj = sym(b'{') * space() * members - sym(b'}');
	obj.map(|members|members.into_iter().collect::<HashMap<_,_>>())
}

fn value() -> Parser<u8, JsonValue> {
	( seq(b"null").map(|_|JsonValue::Null)
	| seq(b"true").map(|_|JsonValue::Bool(true))
	| seq(b"false").map(|_|JsonValue::Bool(false))
	| number().map(|num|JsonValue::Num(num))
	| string().map(|text|JsonValue::Str(text))
	| array().map(|arr|JsonValue::Array(arr))
	| object().map(|obj|JsonValue::Object(obj))
	) - space()
}

pub fn json() -> Parser<u8, JsonValue> {
	space() * value() - end()
}

fn main() {
	let input = br#"
	{
        "Image": {
            "Width":  800,
            "Height": 600,
            "Title":  "View from 15th Floor",
            "Thumbnail": {
                "Url":    "http://www.example.com/image/481989943",
                "Height": 125,
                "Width":  100
            },
            "Animated" : false,
            "IDs": [116, 943, 234, 38793]
        }
    }"#;

	println!("{:?}", json().parse(input));
}

You can run this example with the following command:

cargo run --example json

Benchmark

Parser Time to parse the same JSON file
pom: json_byte 621,319 ns/iter (+/- 20,318)
pom: json_char 627,110 ns/iter (+/- 11,463)
pest: json_char 13,359 ns/iter (+/- 811)

Lifetimes and files

String literals have a 'static lifetime, so they work with the static version of Parser imported from pom::Parser. Input read from a file has a shorter lifetime; in that case, import pom::parser::Parser and declare lifetimes on your parser functions. So

fn space() -> Parser<u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}

would become

fn space<'a>() -> Parser<'a, u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}
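The reason the lifetime parameter matters can be sketched without pom itself. The stand-in Parser below boxes a closure over a borrowed slice, loosely modeled on the internal representation pom 3.0 documents in its release notes (not the real API); because letter_a is generic over 'a, the parser borrows its input only as long as the caller needs, so runtime data works.

```rust
// A stand-in for pom's Parser: a boxed closure over a borrowed input slice.
struct Parser<'a, I, O> {
    method: Box<dyn Fn(&'a [I], usize) -> Result<(O, usize), String> + 'a>,
}

impl<'a, I, O> Parser<'a, I, O> {
    fn parse(&self, input: &'a [I]) -> Result<O, String> {
        (self.method)(input, 0).map(|(out, _)| out)
    }
}

// Generic over 'a: the input only has to outlive this one parse, not the program.
fn letter_a<'a>() -> Parser<'a, u8, u8> {
    Parser {
        method: Box::new(|input: &[u8], pos: usize| match input.get(pos) {
            Some(&b'a') => Ok((b'a', pos + 1)),
            _ => Err(format!("expected 'a' at {}", pos)),
        }),
    }
}

fn main() {
    // Input created at runtime -- the same situation as reading from a file.
    let input: Vec<u8> = "abc".bytes().collect();
    assert_eq!(letter_a().parse(&input), Ok(b'a'));
    // `input` is dropped here; nothing required a 'static borrow.
}
```

With a hardcoded 'static lifetime instead of 'a, the same parse call would demand that `input` live for the whole program, which is exactly the error reported in the issues below.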
Comments
  • "utf8" module supporting matching UTF-8/returning &str

    I find "pom" very enjoyable to use but I find I have frustration around converting inputs and match-strings to/from UTF-8 &str (see #53). I think pom adding explicit support for UTF-8 would bring important advantages:

    • UX improvements when working with Rust strings and chars
    • Match primitives that guarantee at each step a valid UTF-8 string is being matched, for example an any() that matches UTF-8 chars only (yes, I know I can .convert(str::from_utf8) and it will correctly reject invalid UTF-8, but that bails out relatively late)

    This is a draft/first attempt at a utf8 module. (The regular parser is unchanged, utf8 is opt-in.) You can see what using it is like in the example examples/utf8.rs but it's much like normal pom. (.parse() still requires the input to be :as_bytes()ed, but seq() accepts normal Rust strings). The basic approach is

    • use pom::utf8::* contains functions that have the same names and usage as the ones in pom::parser::* (so it is mostly a drop-in replacement), but any returns or arguments that are &[I] in parser::Parser are &str in utf8::Parser.
    • pom::utf8::Parser<'a, O> is implemented as a thin wrapper around pom::parser::Parser<'a, u8, O>— it is a separate type because by keeping track of which patterns are pure utf8, collect() over a tree of utf8::Parsers can return a &str safely. But because at core it's still just parser::Parser<_, u8,_>, it can be combined into a single pattern with non-UTF8 parser::Parser (at the cost of no longer being able to do a collect() without re-verifying UTF-8).

    This prototype has just enough functions to implement the examples/utf8.rs example. It implements UTF-8 aware seq() and any() combinators, has the UTF-8 aware collect and convert, you can turn a utf8 Parser into parser::Parser with from/into, and it so far has methods passing discard, map, parse, repeat, | and * on to the underlying parser::Parser implementation.

    Next steps are:

    • Implement rest of parser:: functions/methods (I may do this with a macro? I think I would have to write the macro myself. There are some delegation macro crates but none of them seem exactly fit to this situation.)
    • sym needs to be special because this is the one function I intend to use a slightly different interface from parser::Parser:
      • pub fn sym<'a>(tag: char) -> Parser<'a, &'a str> will return a single-char string
      • pub fn sym_char<'a>(tag: char) -> Parser<'a, char> will return a parsed char, to make constructions like sym_char(ch).is_a(str::is_alphabetic) possible
    • The utf8 module uses unsafe {} because it calls str::from_utf8_unchecked on slices it has already confirmed contain complete UTF-8 characters. I would like to introduce a Cargo "feature" to remove use of unsafe from utf8, at the cost of a redundant str::from_utf8 check in places.
    • May create a utf8::Parser.parse_str(input:&str) that just calls parse(input:as_bytes()), for convenience (?)
    • Versions of +, - etc that take one parser::Parser and one utf8::Parser and return a parser::Parser, to make it easy to mix them; also I want to create an examples/utf8_mixed.rs demonstrating using parser::Parser and utf8::Parser in the same pattern (EG a simple MsgPack parser or something).

    Long term additions I'd be interested in attempting are:

    • Possibly a Unicode version of pom::char_class?
    • Possibly glyph support, or support for normalization forms (this would make possible things like seq_case_insensitive which would be very useful to me)

    What I need to know from @J-F-Liu:

    • Are you interested in this PR, in theory? If you do not think this belongs in pom, I will probably publish it as a second supplementary crate.
    • Should utf-8 support be behind a "feature" so it can be disabled? It does introduce complexities such as external crate support (it uses bstr) and unsafe.
    • The type of utf8::Parser is Parser<'a, O>. This makes sense because by definition it can only ever work on u8, but means mixing fns that define utf8::Parsers and parser::Parsers in the same file would be a little confusing because some functions would have 2 generic arguments and some would have 3. Would it make sense to put the I type argument back in with a where I=u8, and require the user to type the u8 generic argument every time? (My vote is no, it's fine the way it is now, but I wanted to ask.)

    Thank you for this neat library! I have used it a lot this month.
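The invariant behind that collect() argument can be illustrated standalone. In this sketch (not the PR's code), a primitive matches exactly one UTF-8 character and returns the &str slice it occupies; a range assembled only from such matches is valid UTF-8 by construction, which is what would let a utf8-aware collect() skip re-validation. The sketch uses the checked std::str::from_utf8 rather than the unsafe variant.

```rust
// Match exactly one UTF-8 character at `pos`, returning the &str slice it occupies.
fn any_char(input: &[u8], pos: usize) -> Option<(&str, usize)> {
    // Validate from the current position; a real implementation would decode
    // just one character instead of checking the whole remainder.
    let s = std::str::from_utf8(&input[pos..]).ok()?;
    let ch = s.chars().next()?;
    let len = ch.len_utf8();
    Some((&s[..len], pos + len))
}

fn main() {
    let input = "🐈ok".as_bytes();
    let (matched, next) = any_char(input, 0).unwrap();
    assert_eq!(matched, "🐈");
    assert_eq!(next, 4); // the cat emoji is 4 bytes in UTF-8
    assert_eq!(any_char(input, next).unwrap().0, "o");
}
```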

    opened by mcclure 10
  • Improper handling of `Incomplete`

    Pom does not propagate incomplete input properly. If the input is too short to parse completely, the parser must immediately return an Incomplete error, and not attempt to continue parsing.

    For example, the BitOr operator does not pay attention to why the left side failed to parse - it always goes on to try the right. If the left failed to parse because it returned Incomplete, then it should immediately return the same without trying to parse the right. Otherwise a parser of the form: seq(b"foobar") | seq(b"blitblat") will fail on the input b"foo", complaining that it expected a 'b' but got an 'f', because it improperly failed over to the right side when the left-hand parse was indeterminate.

    Similarly, repeat() has:

    			loop {
    //...
    				if let Ok(item) = self.parse(input) {
    					items.push(item)
    				} else {
    					break;
    				}
    			}
    			if let Included(&start) = range.start() {
    				if items.len() < start {
    					input.jump_to(start_pos);
    					return Err(Error::Mismatch {
    						message: format!("expect repeat at least {} times, found {} times", start, items.len()),
    						position: start_pos,
    					});
    				}
    			}
    

    which means it ends up reporting Error::Mismatch even if the underlying parser returned Incomplete.

    This problem seems common throughout the codebase; I think essentially every place that handles errors needs to special case Incomplete. Perhaps it shouldn't be represented as an error at all.

    Without this, pom is not useful for processing input that comes in incrementally in buffers that may or may not contain a complete parsable phrase.
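    A sketch of the proposed behavior (not pom's actual code): give the error type a distinct Incomplete case, and make ordered choice fall through to the alternative only on a definite mismatch.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    Mismatch,   // input definitely does not match
    Incomplete, // input ran out before the match could be decided
}

// Match a literal byte sequence starting at `pos`; return the new position.
fn match_seq(tag: &[u8], input: &[u8], pos: usize) -> Result<usize, ParseError> {
    for (i, &t) in tag.iter().enumerate() {
        match input.get(pos + i) {
            None => return Err(ParseError::Incomplete), // ran out mid-match
            Some(&b) if b != t => return Err(ParseError::Mismatch),
            _ => {}
        }
    }
    Ok(pos + tag.len())
}

// Ordered choice that propagates Incomplete instead of trying the alternative.
fn ordered_choice(
    input: &[u8],
    pos: usize,
    p: impl Fn(&[u8], usize) -> Result<usize, ParseError>,
    q: impl Fn(&[u8], usize) -> Result<usize, ParseError>,
) -> Result<usize, ParseError> {
    match p(input, pos) {
        Err(ParseError::Mismatch) => q(input, pos), // only a definite failure falls through
        other => other,                             // success or Incomplete returned as-is
    }
}

fn main() {
    let p = |i: &[u8], pos| match_seq(b"foobar", i, pos);
    let q = |i: &[u8], pos| match_seq(b"blitblat", i, pos);
    // b"foo" is a prefix of "foobar": report Incomplete, not a failed fallback to "blitblat".
    assert_eq!(ordered_choice(b"foo", 0, p, q), Err(ParseError::Incomplete));
    assert_eq!(ordered_choice(b"blitblat", 0, p, q), Ok(8));
}
```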

    enhancement discussion 
    opened by jsgf 10
  • Remove more unnecessary traits and generalize PartialEq usage

    These changes remove some unnecessary trait restrictions on the functions, and also generalize all uses of PartialEq (including the set implementation) to arbitrary types implementing PartialEq<T>. I think in most cases this introduces no breaking changes (and all tests still run and pass as before), however a larger corpus or test base may better confirm this. The only thing that makes me less sure is things like the change to sym which cause the returned parser to be a different type than types that are inferred from input arguments (i.e., the output type now is inferred from usage rather than from arguments, so it may be the case that existing code will need annotations to disambiguate). So to be safe it may be appropriate to put this toward a breaking API release.

    opened by afranchuk 7
  • Function parser parse input from IO cause lifetime issues

    Hi, I wrote a parser that parses a raw string correctly but can't parse content read from a file. To simplify the code, here is an example:

    fn crlf() -> Parser<u8, ()> {
        one_of(b"\r\n").repeat(0..).discard()
    }
    fn main() {
        let input = fs::read("test.txt").unwrap();
    
        let outputOk = crlf().parse("test".as_bytes());
        let outputErr = crlf().parse(&input);
    }
    

    The line with outputErr causes the error: input does not live long enough

    opened by hulufei 7
  • Not operator needs 'static

    This means that if you want to parse a string sourced from IO, you need to use lazy_static, which is not very convenient:

        lazy_static! {
            pub static ref STDIN_BUF: String = {
                let mut buf = String::new();
    
                std::io::stdin().lock().read_to_string(&mut buf).expect(
                    "Failed to read stdin!"
                );
    
                buf
            };
        }
    
        println!(
            "{}",
            serde_yaml::to_string(&my_parser().parse(STDIN_BUF.as_bytes())?)?
        );
    
    opened by Cokemonkey11 5
  • Tips on debugging

    I love this package, I'm using it (abusing it? :wink:) to parse structured PDF documents: using lopdf and a custom output to extract a struct token stream of characters and strokes, passing the token stream to a pom-based parser to convert to a data structure.

    I find that I often have errors in the form of Err(Mismatch { message: "expect end of input, found: Terminal { typ: Char(CharTerminal('(', CharTerminalKind(SecondaryTitle, Bold))), page_num: 123 }", position: 29254 }). Basically it means that the parsing of one of the items of the top-level item().repeat(0..) is not matched by the item() parser. But the error message doesn't say anything about why it failed. Usually, by doing the parsing by hand on a piece of paper I am able to find the bug in my parser, but not always.

    Do you have any tips on how to debug pom parsers efficiently? PEG.js has something like this which looks nice, but really any sort of tip or trick would be useful. My parsing is growing quite complex and even though I try to keep it organized, it can be tough to debug it.

    opened by ramonsnir 3
  • Does Parser.collect() return discarded Parsers?

    Hello,

    In the following code:

    use pom::Parser;
    use pom::parser::{sym, one_of};
    
    fn num() -> Parser<u8, &'static [u8]> {
        (sym(b'{') * one_of(b"0123456789").repeat(1..) - sym(b'}'))
            .collect()
    }
    
    fn main() {
        println!("{:?}", num().parse(b"{123}"));
    }
    

    I get the output:

    Ok([123, 49, 50, 51, 125])
    

    As you can see, in the output, the leading { (ordinal 123) and the trailing } (ordinal 125) are being returned as well, even though I am using the * and - combinators to try and ignore them.

    Is this a bug in the library, or am I just using it incorrectly?

    Thank you for creating pom.


    pom version 3.0.0, Rust stable 1.31.0, macOS

    opened by chrisdotcode 3
  • Exponential blow-up in compilation time

    Hi

    We are trying to use pom for a heavy/branchy grammar, and it looks like the intermediate types generated by '|' cause exponential compile times in the Rust compiler.

    I am wondering: is there any way to reduce their size? You might have experienced the same.

    BR

    opened by MageSlayer 3
  • Issue from "Parser::new" method on current version

    The legacy project clacks uses pom = "1.1.0". My current job is to merge some legacy code into a new project using the current pom version 3.0, which has core API changes. I'm stuck on this method defined on the old Parser; it works well in the original source:

    fn output<T: 'static>(inner: Parser<u8, T>) -> Parser<u8, Matched<T>> {
            Parser::new(move |input| {
                let start = input.position();
                let output = inner.parse(input)?;
                let end = input.position();
                Ok(Matched(output, utf8(input.segment(start, end))))
            })
        }
    #[derive(Debug, Clone)]
    pub struct Matched<T>(pub T, pub String);
    fn utf8(v: Vec<u8>) -> String {
         String::from_utf8(v).unwrap()
    }
    

    How can I implement this with Parser::new in the current version?

    opened by pictca 2
  • add whitespace-aware example parser using >>

    Greetings

    The point of this example is to demonstrate how to make whitespace-aware parsers in pom.

    I created this as a proof of concept when I wasn't sure if such a job was possible with pom's API.

    It works by using >> and take(0) to "tap" the parser and inject the output of an inner parser.

    In the process, I had some struggles with lifetimes, and nearly changed parser.rs to use FnMut, but eventually saw how it's possible without any changes needed.

    High-level description is something like this:

    Parse a white-space aware grammar by recursively parsing blocks of lines with the same indentation level

    opened by blakehawkins 2
  • Add example parsing ISO 8601 duration

    Inspired by this comparison of Nom, Combine, and Pest, I wrote an example parser for the ISO 8601 duration format using pom.

    I'd welcome any feedback you have to improve this parser. In particular, I don't like the map statement required to flatten the tuples after chaining multiple parsers with + in the date_part and time_part functions.

    Generally, I really enjoy working with pom. Thanks for your work on it!

    opened by JoshMcguigan 2
  • Documentation or examples for Range and Set

    I don't quite understand what Range and Set are for. It would be good if they had documentation strings, or at least an example. I could help write documentation as a PR if I had an example to compare to or an explanation.

    If "Range" is something used internally by the library and not intended to be messed with directly by users, the documentation strings should say that.

    I would very much like to be able to match a range of values, like (('a' as u8)..=('z' as u8)). Is that what Range does?

    opened by mcclure 0
  • Use lazy_format! to replace format! in constructing error messages

    lazy_format! captures its arguments and returns an opaque struct with a Display implementation, rather than immediately formatting them into a String. I think this can improve the parser performance.

    opened by J-F-Liu 0
  • Parse error human-readable messages maybe should translate u8s to ascii

    The recommended way to use pom (per your examples) is parsing u8s. If pom fails to match, it gives messages like: Err(Mismatch { message: "expect: 120, found: 101", position: 1 }). Here "120" is ASCII "x" and "101" is ASCII "e". Since pom will usually be interpreting text, it would make sense to convert these values to characters when they happen to match printable ASCII.

    Did I miss a feature that can do this already?
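    As far as I know pom has no such option built in, but the translation itself is a few lines. A hypothetical helper an error formatter could call (display_byte is an illustrative name, not a pom API):

```rust
// Render a byte as a readable token: printable ASCII as a quoted char, otherwise the number.
fn display_byte(b: u8) -> String {
    if b.is_ascii_graphic() || b == b' ' {
        format!("'{}'", b as char)
    } else {
        format!("{}", b)
    }
}

fn main() {
    assert_eq!(display_byte(120), "'x'");
    assert_eq!(display_byte(101), "'e'");
    assert_eq!(display_byte(10), "10"); // newline stays numeric
    println!("expect: {}, found: {}", display_byte(120), display_byte(101));
}
```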

    opened by mcclure 1
  • Feature request: Support .clone()

    Here is a simple sample program that uses Pom to decode a list of ranges, like "30-42,10-40".

    To parse an integer it adapts number() from the sample code:

    fn positive<'a>() -> Parser<'a, char, i64> {
    	let integer = one_of("123456789") - one_of("0123456789").repeat(0..) | sym('0');
    	integer.collect().convert(|s|String::from_iter(s.iter()).parse::<i64>())
    }
    

    Then it uses this like

    fn range<'a>() -> Parser<'a, char, (i64, i64)>
    	{ positive() - sym('-') + positive() }
    

    Notice it must call positive() twice. This is to make the borrow checker happy, but it also makes sense because (I expect?) each positive() will need internal state.

    What if instead the Parser objects supported .clone() and/or Copy? Then it would not need to execute the code in positive() every time, and also the "fn"s and their <'a> boilerplate would be unnecessary. For example the code could just say:

    let integerPattern = one_of("123456789") - one_of("0123456789").repeat(0..) | sym('0');
    let positive = integerPattern.collect().convert(|s|String::from_iter(s.iter()).parse::<i64>());
    let range = positive.clone() - sym('-') + positive.clone();

    This is a lot less typing because the signatures are not needed.
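    What this asks for can be sketched by storing the parser function in an Rc instead of a Box, so that clone() just bumps a reference count and shares the same closure. This is a standalone illustration, not pom's actual definition:

```rust
use std::rc::Rc;

// A parser wrapping its function in Rc instead of Box, so cloning is cheap:
// the clone shares the same underlying closure.
struct Parser<I, O> {
    method: Rc<dyn Fn(&[I], usize) -> Option<(O, usize)>>,
}

impl<I, O> Clone for Parser<I, O> {
    fn clone(&self) -> Self {
        Parser { method: Rc::clone(&self.method) }
    }
}

impl<I, O> Parser<I, O> {
    fn parse(&self, input: &[I]) -> Option<O> {
        (self.method)(input, 0).map(|(out, _)| out)
    }
}

// Match one ASCII digit and return its numeric value.
fn digit() -> Parser<u8, u8> {
    Parser {
        method: Rc::new(|input: &[u8], pos: usize| match input.get(pos) {
            Some(&b) if b.is_ascii_digit() => Some((b - b'0', pos + 1)),
            _ => None,
        }),
    }
}

fn main() {
    let d = digit();
    let d2 = d.clone(); // shares the same closure, no re-construction
    assert_eq!(d.parse(b"7"), Some(7));
    assert_eq!(d2.parse(b"9"), Some(9));
}
```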

    opened by mcclure 1
  • Static lifetime of pom::Parser does not specialize / example code should not encourage pom::Parser

    Today I tried to use pom. I got stuck for a long time on this problem:

    Say I take the sample code in the README and modify it so that instead of the input being hardcoded in the program, it comes from a string (code linked).

    I get this error:

    74 |         println!("{:?}", json().parse(input.as_bytes()));
       |                          -------------^^^^^^^^^^^^^^^^-
       |                          |            |
       |                          |            borrowed value does not live long enough
       |                          argument requires that `input` is borrowed for `'static`
    75 |     } else {
       |     - `input` dropped here while still borrowed
    

    I don't understand why I need to borrow because it should be okay to drop input immediately. But okay, I add the &. This doesn't work either:

    error[E0597]: `input` does not live long enough
      --> src\main.rs:74:34
       |
    74 |         println!("{:?}", json().parse(&input.as_bytes()));
       |                          --------------^^^^^^^^^^^^^^^^-
       |                          |             |
       |                          |             borrowed value does not live long enough
       |                          argument requires that `input` is borrowed for `'static`
    75 |     } else {
       |     - `input` dropped here while still borrowed
    

    I am new to Rust so maybe I am confused about lifetimes. But I think what is happening here is that the sample code is using pom::Parser. As the docs explain "pom::Parser<I, O> is alias of pom::parser::Parser<'static, I, O>". This means that when json() is created, the parser object is using lifetime 'static, and therefore it requires any data it parse()s to have lifetime 'static. In other words this code can only work on data that lives the entire lifetime of the application, such as an inline constant! This is not usually what you want.

    I showed this to more experienced Rust programmers and they seemed surprised that the Rust compiler did not automatically specialize the 'static to a narrower lifetime. But somehow it does not.

    The solution is to not use pom::Parser and instead use normal pom::parser::Parser. This requires rewriting every function signature to pass along the lifetime variable, for example fn space() -> Parser<u8, ()> becomes fn space<'a>() -> Parser<'a, u8, ()>. I made this change (code linked) and the code works, even without borrowing input.as_bytes().

    I think you need to do one of the following:

    1. Make whatever change is necessary for Rust to figure out, automatically, that 'static is actually something narrower. Did this used to work at some time in the past? (I do not know if this is possible.)

    2. Make it harder to use the static pom::Parser by accident. Probably pom::Parser should be named pom::StaticParser so it is obvious it must be used with static values. Also, you should link some sample code in the documentation (such as the pom-2 code I link above) demonstrating how to use Pom with runtime values. All the sample code right now seems to use pom::Parser.

    If someone copy-pastes the current sample code, like I did, they will probably waste a lot of time trying to figure out why it does not work on simple strings until they figure out they must add lifetimes. I think this is the real error being seen in #32.

    opened by mcclure 1
  • How to use with Unicode?

    I recently wrote a small program with pom. I found the API interface lovely, but I found it very hard to get string values into the library. All sample code is written with <u8, T> parsers and the literals are written like b"char". It is clear how to use this with ASCII, but not unicode.

    If I try to write my parsers instead as <char, T> then of course parse() cannot accept strings because then pom expects an array of chars and a string is UTF-8 bytes. I can convert the string to an array of chars, but for very long strings this will be inefficient.

    I see the convert() function can be used to easily (efficiently?) interpret a string as a sequence of bytes, so maybe it is okay to just use <u8, T>. However, then I have a different problem. What if I want to have unicode literals (maybe sym('🐈'), if for some reason 🐈 is a separator) or unicode ranges (for example codepoint U+1100 to U+11FF [ᄀ..ᇿ])?

    • Do I have to say seq("🐈".to_bytes()) every time? How then do I do character ranges?
    • Could pom be made to consume iterators instead of [T] arrays, so parse() could take string.chars() as an argument?
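    One standalone workaround for byte-level parsers (a sketch, not a pom API) is to match a char literal against its own UTF-8 encoding, so something like sym('🐈') never needs the whole input converted to chars:

```rust
// Match one specific char (e.g. '🐈') against UTF-8 byte input by comparing
// against the char's own UTF-8 encoding, without converting the whole input.
fn sym_char(expected: char, input: &[u8], pos: usize) -> Option<usize> {
    let mut buf = [0u8; 4];
    let bytes = expected.encode_utf8(&mut buf).as_bytes();
    if input[pos..].starts_with(bytes) {
        Some(pos + bytes.len()) // new position past the matched character
    } else {
        None
    }
}

fn main() {
    let input = "a🐈b".as_bytes();
    assert_eq!(sym_char('a', input, 0), Some(1));
    assert_eq!(sym_char('🐈', input, 1), Some(5)); // the emoji occupies 4 bytes
    assert_eq!(sym_char('x', input, 0), None);
}
```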
    opened by mcclure 1
Releases(v3.1.0)
  • v3.1.0(Aug 4, 2020)

  • v3.0.3(Dec 17, 2019)

  • v3.0.0(Dec 12, 2018)

    • 3.0 is based on 1.0 and changed:
    pub struct Parser<'a, I, O> {
        method: Box<Fn(&mut Input<I>) -> Result<O> + 'a>,
    }
    

    to

    pub struct Parser<'a, I, O> {
        method: Box<Fn(&'a [I], usize) -> Result<(O, usize)> + 'a>,
    }
    

    This is like the 2.0 version, but avoids potential issues such as #23.

    • Toolchain switched to Rust 2018 stable channel.
  • v1.1.0(Oct 29, 2017)

  • v2.0.0(Oct 29, 2017)

  • v2.0.0-beta(Mar 7, 2017)

    • Add p.many(range), like p.repeat(range) but return slice instead of vector.
    • Add p.cache(), can be used to remember parser output result in case of potential backtracking.
  • v1.0.0(Feb 15, 2017)

  • v2.0.0-alpha(Feb 10, 2017)

  • v0.9.0(Feb 3, 2017)

    • Can build on Rust stable 1.15.0.
    • p.repeat(n) repeat p exactly n times.
    • Implement Display and Error for pom::Error.

    Thanks to Jeremy Fitzhardinge for his contribution.

  • v0.8.0(Jan 24, 2017)

    • Add p.name(_), give parser a name to identify parsing errors.
    • Add p.convert(f), convert parser result to desired value, fail in case of conversion error.
    • list(p,s) backtrack to the last successfully matched element.
    • Merged 2 pull requests from new contributors.
  • v0.7.0(Jan 22, 2017)

    • Add p.pos() to get input position after matching p.
    • No longer use extra_requirement_in_impl.
    • Add pom::Parser<I, O> as alias of pom::parser::Parser<'static, I, O>.
  • v0.6.0(Jan 17, 2017)

  • v0.5.0(Jan 12, 2017)

  • v0.4.0(Jan 11, 2017)

  • v0.3.0(Jan 9, 2017)

    • seq(), one_of() and none_of() can accept either string literals or byte string literals.
    • Add json_char parser example.
    • JSON parser supports escaped UTF-16 character including surrogate pairs.
  • v0.2.0(Jan 3, 2017)

  • v0.1.0(Dec 30, 2016)

Owner
Junfeng Liu
https://toolkit.site/