combine
An implementation of parser combinators for Rust, inspired by the Haskell library Parsec. As in Parsec, the parsers are LL(1) by default, but they can opt in to arbitrary lookahead using the `attempt` combinator.
Example
extern crate combine;
use combine::{many1, Parser, sep_by};
use combine::parser::char::{letter, space};
// Construct a parser that parses *many* (and at least *1*) *letter*s
let word = many1(letter());
// Construct a parser that parses many *word*s where each word is *separated by* a (white)*space*
let mut parser = sep_by(word, space())
// Combine can collect into any type implementing `Default + Extend` so we need to assist rustc
// by telling it that `sep_by` should collect into a `Vec` and `many1` should collect to a `String`
.map(|mut words: Vec<String>| words.pop());
let result = parser.parse("Pick up that word!");
// `parse` returns `Result` where `Ok` contains a tuple of the parser's output and any remaining input.
assert_eq!(result, Ok((Some("word".to_string()), "!")));
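Since the parsers are LL(1) by default, a choice between alternatives will not backtrack once the first alternative has consumed input. Here is a minimal sketch (assuming the combine 3.x API) of opting in to extra lookahead with `attempt`:

```rust
extern crate combine;
use combine::{attempt, Parser};
use combine::parser::char::string;

// Both alternatives start with "in", so `string("input")` consumes input before
// failing on "insert". Wrapping it in `attempt` lets `or` backtrack and try the
// second alternative instead of reporting an error.
let mut parser = attempt(string("input")).or(string("insert"));
assert_eq!(parser.parse("insert"), Ok(("insert", "")));
```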
A tutorial as well as explanations on what goes on inside combine can be found in the wiki.
Larger examples can be found in the examples, tests and benches folders.
Features
- Parse arbitrary streams - Combine can parse anything from `&[u8]` and `&str` to iterators and `Read` instances. If none of the built-in streams fit your use case you can even implement a couple of traits yourself to create your own custom stream!
- zero-copy parsing - When parsing in-memory data, combine can parse without copying. See the range module for parsers specialized for zero-copy parsing, and the sketch after this list.
- partial parsing - Combine parsers can be stopped at any point during parsing and later be resumed without losing any progress. This makes it possible to start parsing partial data coming from an io device such as a socket without worrying about whether enough data is present to complete the parse. If more data is needed the parser will stop and may be resumed at the same point once more data is available. See the async example for an example and this post for an introduction.
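As a rough illustration of the zero-copy point above (a minimal sketch, assuming the `combine::parser::range` module of combine 3.x), a range parser returns a slice borrowed directly from the input instead of allocating a `String`:

```rust
extern crate combine;
use combine::Parser;
use combine::parser::range::take_while1;

// `take_while1` returns the matched range of the input, so parsing a `&str`
// yields a `&str` borrowed from the original buffer rather than a new allocation.
let mut word = take_while1(|c: char| c.is_alphabetic());
assert_eq!(word.parse("word!"), Ok(("word", "!")));
```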
About
A parser combinator is, broadly speaking, a function which takes several parsers as arguments and returns a new parser, created by combining those parsers. For instance, the `many` parser takes one parser, `p`, as input and returns a new parser which applies `p` zero or more times. Thanks to the modularity that parser combinators give, it is possible to define parsers for a wide range of tasks without needing to implement the low-level plumbing, while still having the full power of Rust when you need it.
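For instance, a minimal sketch of `many` (using the `digit` parser from `combine::parser::char`):

```rust
extern crate combine;
use combine::{many, Parser};
use combine::parser::char::digit;

// `many` turns the one-digit parser into a parser for zero or more digits,
// collecting the matched characters into a `String`.
let mut digits = many::<String, _>(digit());
assert_eq!(digits.parse("123abc"), Ok(("123".to_string(), "abc")));
```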
The library adheres to semantic versioning.
If you end up trying it, I welcome any feedback from your experience with it. I am usually reachable within a day by opening an issue, sending an email or posting a message on Gitter.
FAQ
Why do my errors contain inscrutable positions?
Since combine aims to create parsers with little to no overhead, streams over `&str` and `&[T]` do not carry any extra position information; instead, they only rely on comparing the pointer of the buffer to check which `Stream` is further ahead than another `Stream`. To retrieve a better position, either call `translate_position` on the `PointerOffset` which represents the position or wrap your stream with `State`.
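As a rough sketch of the second suggestion (assuming the `State` wrapper and `easy_parse` from combine 3.x), wrapping the input in `State` makes errors carry line and column numbers:

```rust
extern crate combine;
use combine::Parser;
use combine::parser::char::digit;
use combine::stream::state::State;

// `State` tracks a `SourcePosition` as the stream advances, so a failed parse
// reports a line and column instead of a pointer offset.
let result = digit().easy_parse(State::new("not a digit"));
if let Err(err) = result {
    println!("error at line {}, column {}", err.position.line, err.position.column);
}
```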
How does it compare to nom?
https://github.com/Marwes/combine/issues/73 contains discussion and links to comparisons to nom.
Parsers written in combine
Formats and protocols
- GraphQL https://github.com/graphql-rust/graphql-parser (Uses a custom tokenizer as input)
- DiffX https://github.com/brennie/diffx-rs
- Redis https://github.com/mitsuhiko/redis-rs/pull/141 (Uses partial parsing)
- Toml https://github.com/ordian/toml_edit
- Maker Interchange Format https://github.com/aidanhs/frametool (Uses combine as a lexer)
- Javascript https://github.com/freemasen/ress
- JPEG Metadata https://github.com/vadixidav/exifsd
Miscellaneous
- Template language https://github.com/tailhook/trimmer
- Code exercises https://github.com/dgel/adventOfCode2017
- Programming language
- Query parser (+ more) https://github.com/mozilla/mentat
- Query parser https://github.com/tantivy-search/tantivy
Extra
There is an additional crate which has parsers to lex and parse programming languages in combine-language.
You can find older versions of combine (parser-combinators) here.
Contributing
Current master is the 3.0.0 branch. If you want to submit a fix or feature to the 2.x version of combine, do so against the 2.x branch or submit the PR to master and request that it be backported.
The easiest way to contribute is to just open an issue about any problems you encounter using combine, but if you are interested in adding something to the library, here is a list of some of the easier things to work on to get started.
- Add additional parsers - If you have a suggestion for another parser just open an issue or a PR with an implementation.
- Add additional examples - More examples for using combine will always be useful!
- Add and improve the docs - Not the fanciest of work, but one cannot overstate the importance of good documentation.