Parse BNF grammar definitions

Overview

bnf

.github/workflows/ci.yml codecov Crates.io Version Crates.io LICENSE

A library for parsing Backus–Naur form context-free grammars.

What does a parsable BNF grammar look like?

The following grammar from the Wikipedia page on Backus-Naur form exemplifies a compatible grammar. (*Note: parser allows for an optional ';' to indicate the end of a production)

 <postal-address> ::= <name-part> <street-address> <zip-part>

      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                    | <personal-part> <name-part>

  <personal-part> ::= <initial> "." | <first-name>

 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= <apt-num> | ""

Output

Take the following grammar for DNA sequences to be input to this library's parse function.

<dna> ::= <base> | <base> <dna>
<base> ::= "A" | "C" | "G" | "T"

The output is a Grammar object representing a tree that looks like this:

Grammar {
    productions: [
        Production {
            lhs: Nonterminal(
                "dna"
            ),
            rhs: [
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        ),
                        Nonterminal(
                            "dna"
                        )
                    ]
                }
            ]
        },
        Production {
            lhs: Nonterminal(
                "base"
            ),
            rhs: [
                Expression {
                    terms: [
                        Terminal(
                            "A"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "C"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "G"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "T"
                        )
                    ]
                }
            ]
        }
    ]
}

Once the Grammar object is populated, to generate a random sentence from it call the object's generate function. grammar.generate(). For the above grammar you could expect something like TGGC or AG.

If the generate function can't find a production for a nonterminal it tries to evaluate it will print the identifer as a nonterminal, i.e. <identifier>.

The generate function will return an error if it detects an infinite loop caused by a production such as <PATTERN> ::= <PATTERN>.

Parse Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<postal-address> ::= <name-part> <street-address> <zip-part>

            <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                            | <personal-part> <name-part>

        <personal-part> ::= <initial> \".\" | <first-name>

        <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

            <zip-part> ::= <town-name> \",\" <state-code> <ZIP-code> <EOL>

        <opt-suffix-part> ::= \"Sr.\" | \"Jr.\" | <roman-numeral> | \"\"
            <opt-apt-num> ::= <apt-num> | \"\"";

    let grammar: Result<Grammar, _> = input.parse();
    match grammar {
        Ok(g) => println!("{:#?}", g),
        Err(e) => println!("Failed to make grammar from String: {}", e),
    }
}

Generate Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<dna> ::= <base> | <base> <dna>
        <base> ::= \"A\" | \"C\" | \"G\" | \"T\"";
    let grammar: Grammar = input.parse().unwrap();
    let sentence = grammar.generate();
    match sentence {
        Ok(s) => println!("random sentence: {}", s),
        Err(e) => println!("something went wrong: {}!", e)
    }
}
Comments
  • Feat/property tests from generate

    Feat/property tests from generate

    Revised generate test.

    Prior to this commit the generate tests only generated one random grammar to parse per run. This commit revises the approach and instead uses the generate function as an arbitrary property to test the the Grammar::from_str functionality with.

    Additions relevant to the reopening of this PR:

    First, a new error kind was introduced to indicate that the generation error isn't generic but the result of the limits of recursion being reached. This allows the property tests for the Grammar type to disregard problematic sentence generation, it is testing that what it gets is parsable.

    In order to manage meeting recursion limits being met in the property tests, a new use case was accounted for, one that I belive implementaion for is useful. Logic was added to allow the "from_str" trait for bnf types to be parallel to Type::new() in all cases except the Term type. Because the Term type does not have a sensical way to implement new(), "from_str" for Term types corresponds to a Terminal rule for the empty string which is allowed in parsable gramamrs.

    As this commit represents that addition of a new error kind it seemed that there is enough variety now to make our Error type public in order for users can match on them.

    opened by shnewto 20
  • Add

    Add "+" operator

    I implemented the BitOr Trait (| operator) for Term and Expression as described in #34. This allows syntax that mirrors actual BNF like: let exp= Term::Terminal::from_str("A") | Term::Terminal::from_str("B") | Term::Terminal::from_str("C")

    • Term | Term creates an Expression that contains both Terms.
    • Expression | Term and Term | Expression add the Term to the Expression.
    • Expression1 | Expression2 clones all terms from Expression2 and adds them to Expression1

    I made seperate methods for Expression and &Expression to prevent unnecessary clones as recommended in https://doc.rust-lang.org/core/ops/index.html

    While testing, I noticed that two expressions are only counted as equal when their .terms are in the exact same order. This causes strange behavior like: a | Expression::from_parts(vec![b,c]) != Expression::from_parts(vec![a,b,c]) This should probably be a new Issue, but the implementation for Term | Expression would have to be changed if this is intentional.

    opened by DrunkJon 11
  • add callbacks to string generation

    add callbacks to string generation

    I was using bnf as a fuzzer for a parser I made. I experienced a problem that was hard to solve without a change to the library. I'm using my own fork right now, but I thought it would be nice to propose these changes for upstream as well. The problem I had is that I have a grammar that allows variable names in the form [a-zAZ_][a-zA-Z0-9_]*. However, not certain names (such as 'test', which is a keyword). I could just reject all strings with the word 'test' in, however, that would never generate strings with this keyword.

    My solution is to add a set of alternatives to Grammar::generate() and Grammar::generate_seeded() which take a callback (though I'm not yet entirely sure about the current naming). For example Grammar::generate_callback().

    Grammar::generate() is a thin wrapper around Grammar::generate_callback(|_, _| true), making the callback essentially a no-op. However, if a user callback is provided it can function as a filter. For example:

    <01> ::= "0" | "1"
    <09> ::= <01>| "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <number> ::= <09> <number> | <09>
    
    g.generate_callback(|ident, value| match ident {
        "number" => value.parse::<u32>().is_ok()
        _ => true
    });
    

    would only generate integers which fit into a u32.

    If you don't like these changes, feel free to reject. Then I'll keep using my own fork.

    opened by jonay2000 11
  • make production ending semicolon optional

    make production ending semicolon optional

    Plenty of BNF examples, including the wikipedia example, do not require productions to terminate with a semicolon, but rather only require a newline. I believe the parser could be updated to optionally use the ending semicolon, which would support both formats.

    enhancement 
    opened by CrockAgile 9
  • How to parse a certain text according to a bnf grammar and get a list of token

    How to parse a certain text according to a bnf grammar and get a list of token

    The examples provided take a bnf grammar description and then are able to generate text that matches the grammar. How do I give the grammar a string compliant with bnf specification and then get a list of tokens out? This is something that pyparsing does.

    opened by prmammoth 8
  • Step by step example

    Step by step example

    Hello,

    First of all wanted to say thank you for your hard work on this project.

    What do you think about creating step by step example with using simple grammar? It seems to me this would help to see the usual work flow and may show general concepts better.

    FYI: There is a template to make a good README.md https://gist.github.com/PurpleBooth/109311bb0361f32d87a2

    enhancement 
    opened by sergeyklay 8
  • Add a possibility to parse the text and view how it was parsed

    Add a possibility to parse the text and view how it was parsed

    When I was in the university there was a program for learning grammatics in the programming languages theory course. It was very useful for learners to see how the code was parsed in the grammar tree. Can we do the same thing? It must not be so hard, basically, we already have it while we output Grammar as a tree in the README file, the same thing is needed but with real data which could come not from the random generator but from the user input - this is how learners learn how to write grammar correctly.

    enhancement help wanted 
    opened by vityafx 8
  • Detect and auto complete

    Detect and auto complete "nullable" productions

    Closes #112

    What

    #112 noted that empty/"nullable" parsing was encountering some issues. @Loup's Blog explains the root of the problem better than I can here. But the solution can be summarized: detect which non-terminals can be "null", and when "predicting", also automatically complete them.

    Detecting null productions is achievable. But BNF provides parse trees. So rather than simply "skipping" nullable nonterminals, BNF should provide how the nonterminal was nullable (possibly by many ways!).

    This required a significant refactor because currently matching nonterminals requires a corresponding State. But nullable rules have no directly associated State !

    enum TermMatch<'gram> {
        Terminal(&'gram str),
        NonTerminal(StateId), // <--- UH OH
    }
    

    So I must apologize because this is much too large of a PR 😓 I will do better next time!

    Changes

    • improve AppendOnlyVec to use it in more places
    • separate "term matching" from "state"
    • rename State to Traversal because it was confusing me (this may be controversial! please let's talk about it!)
    • detect "nullable" productions, and autocomplete them during parsing
    • added "tracing" !

    Tracing

    Tracing helped me a lot when tracking down a much larger performance regression that was introduced during development.

    automatica tracing is very helpful, but can be a bit much of noise. And because it uses rust debug symbols, the output can be pretty messy. That said, it does capture A LOT

    image

    tracing-flame allows for more "manual" flamegraphs, only using spans specifically made by developers allows for more intentional data, but often misses out due to human oversight.

    image

    Performance

    parsing benchmarks from my local laptop:

    // before
    parse polish calculator time:   [497.64 µs 578.29 µs 662.03 µs]                 
    
    // after
    parse polish calculator time:   [564.33 µs 650.73 µs 739.58 µs]                                    
                            change: [-13.197% +14.678% +49.938%] (p = 0.35 > 0.05)
                            No change in performance detected.
    

    So criterion isn't confident, but after many personal trial runs it does seem that +14% performance hit is "real".

    That said, I do have ideas of how to improve:

    • Grammar::parse is currently re-detecting "nullable" productions on every parse attempt. This duplicate work could be removed by adding a new Grammar::parser , which would provide a reusable parser instance
    • Term matching is using Vec<TermMatch> for non-terminals. Profiling has shown that the cloning (and later dropping) account for a fair amount of work. Term matching could instead use a tree/stemming approach, removing the need for Vec

    Thanks

    I want to thank @nskobelevs and @amascolo for helping by raising the problem and contributing good test cases! I could not have put this PR together without you <3

    opened by CrockAgile 7
  • Empty String Rules (Still) Fail to Match

    Empty String Rules (Still) Fail to Match

    Describe the bug Empty string rules still fail to match in some situations. This is the same as #108 although after it's fix it doesn't happen in all situations but still happens.

    To Reproduce

    1. Load grammar from tests/fixtures/bnf.bnf

    2. Try to parse the first line of the text: "<syntax> ::= <rule> | <rule> <syntax>\n" This match will fail.

    3. Try to parse the same line with whitespace before the newline "<syntax> ::= <rule> | <rule> <syntax> \n" This will parse properly

    The first example should pass given <line-end> may begin with <opt-whitespace> which has an empty string rule

    <line-end>       ::= <opt-whitespace> <EOL>
    <opt-whitespace> ::= "" | " " <opt-whitespace>
    <EOL>            ::= "
    "
    

    Unfortunately I haven't been able to find a smaller example. Weirdly it does parse "<syntax> ::= <rule>" correctly but not the more complicated example above.

    If there support for special characters like \n or a built in definition for <EOL> it could be good to have a test that a grammer made from tests/fixtures/bnf.bnf can parse the file itself.

    Desktop (please complete the following information):

    • MacOS Ventura 13.0.1
    opened by nskobelevs 7
  • Clippy check step for PRs

    Clippy check step for PRs

    Clippy has a lot of best practices, but BNF is in violation of a lot of them. Some would be breaking changes (like removing the non std::str::FromStr from_str public functions). Other changes require updating the parsers to nom 5 best practices.

    Once that is done, would just need to revert previous attempt to re-enable clippy CI.

        Checking bnf v0.2.6
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:6:1
       |
    6  | / named!(pub prod_lhs< &[u8], Term >,
    7  | |     do_parse!(
    8  | |             nt: delimited!(char!('<'), take_until!(">"), ws!(char!('>'))) >>
    9  | |             _ret: ws!(tag!("::=")) >>
    10 | |             (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    11 | |     )
    12 | | );
       | |__^
       |
       = note: `#[warn(deprecated)]` on by default
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:6:1
       |
    6  | / named!(pub prod_lhs< &[u8], Term >,
    7  | |     do_parse!(
    8  | |             nt: delimited!(char!('<'), take_until!(">"), ws!(char!('>'))) >>
    9  | |             _ret: ws!(tag!("::=")) >>
    10 | |             (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    11 | |     )
    12 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:14:1
       |
    14 | / named!(pub terminal< &[u8], Term >,
    15 | |     do_parse!(
    16 | |         t: alt!(
    17 | |             delimited!(char!('"'), take_until!("\""), ws!(char!('"'))) |
    ...  |
    21 | |     )
    22 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:14:1
       |
    14 | / named!(pub terminal< &[u8], Term >,
    15 | |     do_parse!(
    16 | |         t: alt!(
    17 | |             delimited!(char!('"'), take_until!("\""), ws!(char!('"'))) |
    ...  |
    21 | |     )
    22 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:24:1
       |
    24 | / named!(pub nonterminal< &[u8], Term >,
    25 | |     do_parse!(
    26 | |         nt: complete!(delimited!(char!('<'), take_until!(">"), ws!(char!('>')))) >>
    27 | |         ws!(not!(complete!(tag!("::=")))) >>
    28 | |         (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    29 | |     )
    30 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:24:1
       |
    24 | / named!(pub nonterminal< &[u8], Term >,
    25 | |     do_parse!(
    26 | |         nt: complete!(delimited!(char!('<'), take_until!(">"), ws!(char!('>')))) >>
    27 | |         ws!(not!(complete!(tag!("::=")))) >>
    28 | |         (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    29 | |     )
    30 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:42:1
       |
    42 | / named!(pub expression_next,
    43 | |     do_parse!(
    44 | |         ws!(char!('|')) >>
    45 | |         ret: recognize!(peek!(complete!(expression))) >>
    46 | |         (ret)
    47 | |     )
    48 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:50:1
       |
    50 | / named!(pub expression< &[u8], Expression >,
    51 | |     do_parse!(
    52 | |         peek!(term) >>
    53 | |         terms: many1!(complete!(term)) >>
    ...  |
    63 | |     )
    64 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:74:1
       |
    74 | / named!(pub production< &[u8], Production >,
    75 | |     do_parse!(
    76 | |         lhs: ws!(prod_lhs) >>
    77 | |         rhs: many1!(complete!(expression)) >>
    ...  |
    86 | |     )
    87 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:74:1
       |
    74 | / named!(pub production< &[u8], Production >,
    75 | |     do_parse!(
    76 | |         lhs: ws!(prod_lhs) >>
    77 | |         rhs: many1!(complete!(expression)) >>
    ...  |
    86 | |     )
    87 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: unneeded return statement
       --> src/grammar.rs:115:9
        |
    115 |         return Ok(result);
        |         ^^^^^^^^^^^^^^^^^^ help: remove `return`: `Ok(result)`
        |
        = note: `#[warn(clippy::needless_return)]` on by default
        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
    
    warning: useless use of `format!`
      --> src/error.rs:55:32
       |
    55 |             Needed::Unknown => format!("Data error: insufficient size, expectation unknown"),
       |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using .to_string(): `"Data error: insufficient size, expectation unknown".to_string()`
       |
       = note: `#[warn(clippy::useless_format)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
    
    warning: you should consider deriving a `Default` implementation for `expression::Expression`
      --> src/expression.rs:16:5
       |
    16 | /     pub fn new() -> Expression {
    17 | |         Expression { terms: vec![] }
    18 | |     }
       | |_____^
       |
       = note: `#[warn(clippy::new_without_default)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#new_without_default
    help: try this
       |
    10 | #[derive(Default)]
       |
    
    warning: defining a method called `from_str` on this type; consider implementing the `std::str::FromStr` trait or choosing a less ambiguous name
      --> src/expression.rs:26:5
       |
    26 | /     pub fn from_str(s: &str) -> Result<Self, Error> {
    27 | |         match parsers::expression_complete(s.as_bytes()) {
    28 | |             Result::Ok((_, o)) => Ok(o),
    29 | |             Result::Err(e) => Err(Error::from(e)),
    30 | |         }
    31 | |     }
       | |_____^
       |
       = note: `#[warn(clippy::should_implement_trait)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#should_implement_trait
    
    warning: you should consider deriving a `Default` implementation for `grammar::Grammar`
      --> src/grammar.rs:20:5
       |
    20 | /     pub fn new() -> Grammar {
    21 | |         Grammar {
    22 | |             productions: vec![],
    23 | |         }
    24 | |     }
       | |_____^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#new_without_default
    help: try this
       |
    14 | #[derive(Default)]
       |
    
    warning: defining a method called `from_str` on this type; consider implementing the `std::str::FromStr` trait or choosing a less ambiguous name
      --> src/grammar.rs:32:5
       |
    32 | /     pub fn from_str(s: &str) -> Result<Self, Error> {
    33 | |         match parsers::grammar_complete(s.as_bytes()) {
    34 | |             Result::Ok((_, o)) => Ok(o),
    35 | |             Result::Err(e) => Err(Error::from(e)),
    36 | |         }
    37 | |     }
       | |_____^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#should_implement_trait
    
    warning: writing `&String` instead of `&str` involves a new object where a slice will do.
      --> src/grammar.rs:74:31
       |
    74 |     fn traverse(&self, ident: &String, rng: &mut StdRng) -> Result<String, Error> {
       |                               ^^^^^^^
       |
       = note: `#[warn(clippy::ptr_arg)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
    help: change this to
       |
    74 |     fn traverse(&self, ident: &str, rng: &mut StdRng) -> Result<String, Error> {
       |                               ^^^^
    help: change `ident.clone()` to
       |
    86 |         let nonterm = Term::Nonterminal(ident.to_string());
       |                                         ^^^^^^^^^^^^^^^^^
    
    error: using `clone` on a double-reference; this will copy the reference instead of cloning the inner type
      --> src/grammar.rs:99:37
       |
    99 |             Some(e) => expression = e.clone(),
       |                                     ^^^^^^^^^
       |
       = note: `#[deny(clippy::clone_double_ref)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_double_ref
    help: try dereferencing it
       |
    99 |             Some(e) => expression = &(*e).clone(),
       |                                     ^^^^^^^^^^^^^
    help: or try being explicit about what type to clone
       |
    99 |             Some(e) => expression = &expression::Expression::clone(e),
       |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    enhancement 
    opened by CrockAgile 6
  • Impl from string

    Impl from string

    Closes #22 and #12 by implementing std::str::FromStr for the node types.

    Also removed look_ahead! in favor of not maintaining a separate macro when it seems like nom::recognize! offers similar behavior.

    ⚠️ do not merge: tests to be added

    opened by CrockAgile 6
  • Complete prior Earley predictions

    Complete prior Earley predictions

    Closes #117 and #118 (hopefully!)

    #117 and #118 raised examples of failures to parse input based on grammars. after some investigation, it seems the root cause was shared by both issues.

    Root Cause

    Consider the grammar:

    <start> ::= <shortfail> | <longsuccess>
    <shortfail> ::= <char> 'never'
    <char> ::= 'a'
    <longsuccess> ::= <long2>
    <long2> ::= <long3>
    <long3> ::= <long4>
    <long4> ::= <char>
    

    When parsing input "a", there are two routes: "shortfail" and "longsuccess". As the names suggest, "shortfail" requires fewer traversals, but always fails. "longsuccess" requires more traversal steps, and should eventually succeed.

    The issue is caused because both paths predict the non-terminal "char". Practical Earley parsing requires de-duplicating predictions or else recursive grammars fall into infinite loops. The existing Earley implementation in BNF does roughly the following:

    (work roughly alternates between the short and long routes because new traversals are appended to the end of the work queue)

    * <start> ::= • <shortfail>
    * <start> ::= • <longsuccess>
    * <shortfail> ::= • <char> 'never'
    * <longsuccess> ::= • <long2>
    * <char> ::= • 'a'
    * <long2> ::= • <long3>
    * <char> ::= 'a' •
    * <long3> ::= • <long4>
    * <shortfail> ::= <char> • 'never' // <--- notice this does NOT succeed
    * <long4> ::= • <char> // !!! this <char> prediction is IGNORED because an identical prediction was already made
    

    All the <longN> productions are necessary because otherwise the <longsuccess> route is able to predict before its completion.

    I am sorry if I have not explained this super clearly. It is a tricky problem! Funny aside, I actually thought of this bug while developing v0.4.0 . But I (wrongly) assumed there was something about the Earley algorithm which resolved it. I attempted to write a test, but I did not realize it would require so many intermediate non-terminals to expose. Woops!

    Solution

    The underlying problem is that because non-terminals can be shared across multiple traversal routes, "completion" must be addressed also during "prediction".

    Performance

    On my machine, there was a roughly 10% performance cost to the new parsing logic. I will experiment in the future for improvements, but in the meantime I believe passing tests and correct behavior outweigh the cost.

    Extra

    Basically every time I have worked on an Earley bug, I have ended up adding the same manual logging to help with debugging. I decided to commit that logging this time!

    There is a new tracing::event! which by default is a noop, but with the "tracing" feature enabled adds logging events.

    For Earley, traversal state events are logged when created and during predict/scan/complete. It helps quite a bit with debugging!

    opened by CrockAgile 2
  • Right-recursive productions Fail to Match

    Right-recursive productions Fail to Match

    Looks related to nullable productions: #108 #112 #117

    This grammar fails to match when we change a production from left-recursive to right-recursive.

    The left-recursive grammar failed on 0.4.0 ≤ bnf ≤ 0.4.1 – the right-recursive still fails on 0.4.0 ≤ bnf ≤ 0.4.3

    So this issue must have been partially addressed by @CrockAgile in #109 (0.4.2)

    Happy to help battle-testing any proposed fixes ⚔️

    use bnf::Grammar;
    
    #[test]
    fn test() {
        let input = "aa a";
    
        let left_recursive: &str = "
        <conjunction> ::= <conjunction> <ws> <predicate> | <predicate>
        <predicate> ::= <string_null_one> | <special-string> '.'
    
        <string_null_one> ::= <string_null_two>
        <string_null_two> ::= <string_null_three>
        <string_null_three> ::= <string>
    
        <string> ::= <char_null> | <string> <char_null>
        <special-string> ::= <special-char> | <special-string> <special-char>
    
        <char_null> ::= <char>
        <char> ::= 'a'
    
        <special-char> ::= <char_null> | <whitespace>
        <whitespace> ::= ' '
    
        <ws> ::= ' ' | ' ' <ws>
        ";
        assert!(left_recursive.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let right_recursive = left_recursive.replace(
            // rewrite production from left- to right- recursive
            "<string> ::= <char_null> | <string> <char_null>",
            "<string> ::= <char_null> | <char_null> <string>",
        );
        assert!(
            right_recursive.parse::<Grammar>().unwrap().parse_input(input).next().is_some(),
            "right-recursive grammar failed to parse: {input}"
        );
    }
    
    opened by amascolo 0
  • Nullable productions (Still, Still) Fail to Match

    Nullable productions (Still, Still) Fail to Match

    Related to #108 and #112 – failures reproduce on 0.4.0 ≤ bnf ≤ 0.4.3 so not a regression from #113 .

    Hit more cases where nullable productions aren't handled properly.

    This example looks contrived but was adapted from a larger grammar I'm writing.

    As usual, thanks for all your work @CrockAgile !

    Sadly, this correctness issue continues to be a total blocker preventing me from adopting bnf

    Hoping my battle-testing hasn't been in vain and there's an easy fix around the corner 🙏🏻

    use bnf::Grammar;
    
    #[test]
    fn test() {
        let fails: &str = "
        <disjunction> ::= <predicate> | <disjunction> <or> <predicate>
        <predicate> ::= <char_null_one> | <special-string> '.'
    
        <char_null_one> ::= <char_null_two>
        <char_null_two> ::= <char_null_three>
        <char_null_three> ::= <char>
    
        <or> ::= <ws> 'or' <ws>
        <ws> ::= <whitespace> | ' ' <ws>
        <whitespace> ::= ' '
    
        <special-string> ::= <special-char> | <special-char> <special-string>
        <special-char> ::= <char> | <whitespace>
        <char> ::= 'a'
        ";
    
        let input = "a or a";
    
        let passes_1 = fails.replace(
            // skip nullable production <char_null_two>
            "<char_null_one> ::= <char_null_two>",
            "<char_null_one> ::= <char_null_three>",
        );
        assert!(passes_1.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let passes_2 = fails.replace(
            // replace <whitespace> with its terminal ' '
            "<ws> ::= <whitespace> | ' ' <ws>",
            "<ws> ::= ' ' | ' ' <ws>",
        );
        assert!(passes_2.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let passes_3 = fails.replace(
            // again, replace <whitespace> with its terminal ' '
            "<special-char> ::= <char> | <whitespace>",
            "<special-char> ::= <char> | ' '",
        );
        assert!(passes_3.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        assert!(
            fails.parse::<Grammar>().unwrap().parse_input(input).next().is_some(),
            "all grammars except last one parsed: {input}"
        );
    }
    
    opened by amascolo 1
  • reusable grammar parser

    reusable grammar parser

    Is your feature request related to a problem? Please describe.

    The existing Grammar::parse_input method is useful for parsing input text according to a Grammar. But the current API builds "one use" parsers. This limitation forces some redundant work to be repeated when parsing multiple input texts.

    // each of these builds completely new `GrammarMatching` and other parsing tables,
    // which are then freed before the next `parse_input`, wasting computation
    grammar.parse_input(input_1);
    grammar.parse_input(input_2);
    grammar.parse_input(input_3);
    

    Describe the solution you'd like

    Instead, a single GrammarParser could be available which would reuse parser data efficiently:

    let parser = grammar.input_parser();
    
    // common immutable parser data is shared between calls
    parser.parse_input(input_1);
    parser.parse_input(input_2);
    parser.parse_input(input_3);
    

    Describe alternatives you've considered

    Maybe this just doesn't matter! I am not sure if this new API would even make much of a performance difference. I am not sure "how much faster" would warrant a new API method. 1% faster? 5% faster? 50% faster?

    enhancement help wanted 
    opened by CrockAgile 0
  • faster production matching while earley parsing

    faster production matching while earley parsing

    What

    while working on nullable rules (#113), profiling showed a potential for performance improvement.

    ProductionMatching owns matching productions with terms. currently, it is implemented:

    pub(crate) struct ProductionMatching<'gram> {
        pub prod_id: ProductionId,
        pub lhs: &'gram crate::Term,
        /// "right hand side" [`TermMatching`]s which are partitioned by the matched and unmatched.
        /// For example: [Matched, Matched, Matched, Unmatched, Unmatched]
        rhs: Vec<TermMatching<'gram>>,
        /// The progress cursor used to separate [`TermMatching`]s in the "right hand side"
        matched_count: usize,
    }
    

    The rhs vector requires a fair amount of dynamic memory alloc and free while parsing (current location where vector is cloned).

    There may be a better way to model matching terms.

    Brainstorm

    If you pick this up, do not take this brainstorm as "the only way". This is just one of many possibilities.

    Trie

    instead of cloning the matching vector, matching could be modeled as a Trie/Stemming tree, with each node a Term

    eg. <word> ::= 'B' 'A' 'D' | 'B' 'A' 'T'

    graph TD
        S[start matching prod] --> B
        B --> A
        A --> T
        A --> D
    

    Then each ProductionMatching references a single node in this matching tree.

    Results

    To check results, the criterion benchmarking can be used:

    cargo criterion

    Tracing can be used to investigate:

    CARGO_PROFILE_BENCH_DEBUG=true cargo flamegraph --bench bnf -- --bench

    enhancement help wanted 
    opened by CrockAgile 0
  • Ability to parse BNF grammars at compile time

    Ability to parse BNF grammars at compile time

    Is your feature request related to a problem? Please describe. There's no reason to include an entire BNF parser in the compiled application if I just have a set grammar that I want to use that is constant. However, this is what happens if I use the bnf crate in my application - it must parse and validate at runtime.

    Describe the solution you'd like Allow BNF syntax to be parsed at compile-time. Make the required functions const (FromStr::from_str is not const)

    Describe alternatives you've considered N/A

    Additional context N/A

    opened by LoganDark 3
Releases(0.4.3)
  • 0.4.3(Nov 30, 2022)

  • 0.4.0(Jul 9, 2022)

    What's Changed

    🎉 Add "+" operator for Term and Expression types @DrunkJon #88 🎉 Update to Rust edition 2021 @CrockAgile #90 🎉 Benchmark BNF examples via Criterion @CrockAgile #91 🎉 Parse grammar sentences via "Earley Parsing" and generate corresponding "parse forests" @CrockAgile #92 🎉 Mermaid formatting of parse trees @CrockAgile #99

    New Contributors

    @DrunkJon with the "+" operator in #88

    Full Changelog: https://github.com/shnewto/bnf/compare/0.3.4...0.4.0

    Source code(tar.gz)
    Source code(zip)
  • 0.3.4(Dec 20, 2021)

    What's Changed

    • Update to stable quickcheck by @shnewto in https://github.com/shnewto/bnf/pull/77
    • chore: factor long bnf text fixtures into their own files by @SKalt in https://github.com/shnewto/bnf/pull/80
    • remove coverage reports for now by @shnewto in https://github.com/shnewto/bnf/pull/83
    • Revert "remove coverage reports for now" by @shnewto in https://github.com/shnewto/bnf/pull/85
    • Update nom requirement from ^6.0.1 to ^7.0.0 by @dependabot in https://github.com/shnewto/bnf/pull/84
    • Remove unnecessary borrows and to_string() by @benarmstead in https://github.com/shnewto/bnf/pull/81
    • add callbacks to string generation by @jonay2000 in https://github.com/shnewto/bnf/pull/87

    New Contributors

    • @SKalt made their first contribution in https://github.com/shnewto/bnf/pull/80
    • @dependabot made their first contribution in https://github.com/shnewto/bnf/pull/84
    • @benarmstead made their first contribution in https://github.com/shnewto/bnf/pull/81
    • @jonay2000 made their first contribution in https://github.com/shnewto/bnf/pull/87

    Full Changelog: https://github.com/shnewto/bnf/compare/0.3.3...0.3.4

    Source code(tar.gz)
    Source code(zip)
  • 0.3.3(Dec 30, 2020)

  • 0.3.2(Dec 22, 2020)

    • Update nom and rand dependencies to latest.
    • Using eof parser from nom 6.0 instead of homerolled eoi
    • Increased threshold for "stack redzone" to address some issues where infinite loops could overflow the stack despite our checks to prevent it.
    Source code(tar.gz)
    Source code(zip)
  • 0.3.1(Oct 4, 2020)

  • 0.2.7(Oct 31, 2019)

  • 0.2.6(Aug 25, 2019)

Owner
Shea Newton
writes software, talks to hardware. ::= "🙃" | "🙀" | "😬" | "🐋"
Shea Newton
LR(1) grammar parser of simple expression

LR(1)语法分析程序 实验内容 编写LR(1)语法分析程序,实现对算术表达式的语法分析。要求所分析算数表达式由如下的文法产生: E -> E+T | E-T | T T -> T*F | T/F | F F -> (E) | num 程序设计与实现 使用方式:运行.\lr1-parser.exe

Gao Keyong 1 Nov 24, 2021
Rust grammar tool libraries and binaries

Grammar and parsing libraries for Rust grmtools is a suite of Rust libraries and binaries for parsing text, both at compile-time, and run-time. Most u

Software Development Team 318 Dec 26, 2022
WIP: Parse archived parler pages into structured html

parler-parse Parler HTML goes in (stdin), structured JSON comes out (stdout) Might be useful for feeding into elasticsearch or cross-referencing with

Christopher Tarquini 15 Feb 16, 2021
Parse byte size into integer accurately.

parse-size parse-size is an accurate, customizable, allocation-free library for parsing byte size into integer. use parse_size::parse_size; assert_eq

null 20 Aug 16, 2022
Generate and parse UUIDs.

uuid Here's an example of a UUID: 67e55044-10b1-426f-9247-bb680e5fe0c8 A UUID is a unique 128-bit value, stored as 16 octets, and regularly formatted

Rust Uuid 754 Jan 6, 2023
Parse RISC-V opcodes to provide more detailed structured data

riscv-opcodes-parser Parse RISC-V opcodes to provide more detailed structured data. License Licensed under either of Apache License, Version 2.0 (LICE

Sprite 2 Jul 30, 2022
Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar

cargo-spellcheck Check your spelling with hunspell and/or nlprule. Use Cases Run cargo spellcheck --fix or cargo spellcheck fix to fix all your docume

Bernhard Schuster 274 Nov 5, 2022
languagetool-code-comments integrates the LanguageTool API to parse, spell check, and correct the grammar of your code comments!

languagetool-code-comments integrates the LanguageTool API to parse, spell check, and correct the grammar of your code comments! Overview Install MacO

Dustin Blackman 17 Dec 25, 2022
Parsing Expression Grammar (PEG) parser generator for Rust

Parsing Expression Grammars in Rust Documentation | Release Notes rust-peg is a simple yet flexible parser generator that makes it easy to write robus

Kevin Mehall 1.2k Dec 30, 2022
A fast Rust-based safe and thead-friendly grammar-based fuzz generator

Intro fzero is a grammar-based fuzzer that generates a Rust application inspired by the paper "Building Fast Fuzzers" by Rahul Gopinath and Andreas Ze

null 203 Nov 9, 2022
a grammar based feedback fuzzer

Nautilus NOTE: THIS IS AN OUTDATE REPOSITORY, THE CURRENT RELEASE IS AVAILABLE HERE. THIS REPO ONLY SERVES AS A REFERENCE FOR THE PAPER Nautilus is a

Chair for Sys­tems Se­cu­ri­ty 157 Oct 26, 2022
LR(1) grammar parser of simple expression

LR(1)语法分析程序 实验内容 编写LR(1)语法分析程序,实现对算术表达式的语法分析。要求所分析算数表达式由如下的文法产生: E -> E+T | E-T | T T -> T*F | T/F | F F -> (E) | num 程序设计与实现 使用方式:运行.\lr1-parser.exe

Gao Keyong 1 Nov 24, 2021
A fast Rust-based safe and thead-friendly grammar-based fuzz generator

Intro fzero is a grammar-based fuzzer that generates a Rust application inspired by the paper "Building Fast Fuzzers" by Rahul Gopinath and Andreas Ze

null 203 Nov 9, 2022
a grammar based feedback fuzzer

Nautilus NOTE: THIS IS AN OUTDATE REPOSITORY, THE CURRENT RELEASE IS AVAILABLE HERE. THIS REPO ONLY SERVES AS A REFERENCE FOR THE PAPER Nautilus is a

Chair for Sys­tems Se­cu­ri­ty 157 Oct 26, 2022
A set of bison skeleton files that can be used to generate a Bison grammar that is written in Rust.

rust-bison-skeleton A set of bison skeleton files that can be used to generate a Bison grammar that is written in Rust. Technically it's more like a B

Ilya Bylich 12 Dec 14, 2022
Rust grammar tool libraries and binaries

Grammar and parsing libraries for Rust grmtools is a suite of Rust libraries and binaries for parsing text, both at compile-time, and run-time. Most u

Software Development Team 318 Dec 26, 2022
Mypyc DSL grammar for tree-sitter

tree-sitter-mypyc Mypyc DSL grammar for tree-sitter. Installing (Neovim) This is based on the Neovim Tree-sitter docs for adding new parsers. Basicall

dosisod 3 Dec 30, 2022
A generator for high-performance Pest parsers, bringing your grammar to the next level

Faster-Pest Welcome to faster-pest, a high-performance code generator for Parsing Expression Grammars. faster-pest is an unofficial pro-macro providin

null 3 Jan 13, 2023
Simple grammar-based test case generator

tree-splicer tree-splicer is a simple grammar-based test case generator. It parses a number of input files using tree-sitter grammars, and produces ne

Langston Barrett 5 Mar 19, 2023
Easy-to-use grammar-based black-box fuzzer. Has found dozens of bugs in important targets like Clang, Deno, and rustc.

tree-crasher tree-crasher is an easy-to-use grammar-based black-box fuzzer. It parses a number of input files using tree-sitter grammars, and produces

Langston Barrett 5 Mar 28, 2023