Parse BNF grammar definitions

Overview

bnf


A library for parsing Backus–Naur form context-free grammars.

What does a parsable BNF grammar look like?

The following grammar, from the Wikipedia page on Backus–Naur form, exemplifies a compatible grammar. (Note: the parser allows an optional ';' to indicate the end of a production.)

 <postal-address> ::= <name-part> <street-address> <zip-part>

      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                    | <personal-part> <name-part>

  <personal-part> ::= <initial> "." | <first-name>

 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= <apt-num> | ""

Output

Take the following grammar for DNA sequences as input to this library's parse function.

<dna> ::= <base> | <base> <dna>
<base> ::= "A" | "C" | "G" | "T"

The output is a Grammar object representing a tree that looks like this:

Grammar {
    productions: [
        Production {
            lhs: Nonterminal(
                "dna"
            ),
            rhs: [
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        ),
                        Nonterminal(
                            "dna"
                        )
                    ]
                }
            ]
        },
        Production {
            lhs: Nonterminal(
                "base"
            ),
            rhs: [
                Expression {
                    terms: [
                        Terminal(
                            "A"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "C"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "G"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "T"
                        )
                    ]
                }
            ]
        }
    ]
}

Once the Grammar object is populated, call its generate function, grammar.generate(), to produce a random sentence from it. For the above grammar you could expect something like TGGC or AG.

If the generate function can't find a production for a nonterminal it tries to evaluate, it will print the identifier as a nonterminal, i.e. <identifier>.

The generate function will return an error if it detects an infinite loop caused by a production such as <PATTERN> ::= <PATTERN>.
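The behavior described above can be illustrated with a small, self-contained sketch in plain Rust. This is not the crate's implementation; the `Term` enum, `generate` function, and recursion limit below are hypothetical stand-ins showing how such a generator can work:

```rust
use std::collections::HashMap;

// A term is either a terminal string or a reference to another production.
#[derive(Clone)]
enum Term {
    Terminal(&'static str),
    Nonterminal(&'static str),
}

// Sketch of sentence generation: expand a start symbol by picking
// alternatives pseudo-randomly, and return an error once a recursion
// limit is hit, as `<PATTERN> ::= <PATTERN>` would otherwise loop forever.
fn generate(
    rules: &HashMap<&'static str, Vec<Vec<Term>>>,
    start: &str,
    seed: &mut u64,
    depth: usize,
) -> Result<String, String> {
    if depth == 0 {
        return Err("recursion limit reached (possible infinite loop)".to_string());
    }
    let alternatives = match rules.get(start) {
        Some(alts) => alts,
        // No production found: emit the identifier as a nonterminal.
        None => return Ok(format!("<{}>", start)),
    };
    // Tiny deterministic linear congruential generator, so the sketch
    // needs no external crates.
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    let choice = &alternatives[(*seed >> 33) as usize % alternatives.len()];
    let mut out = String::new();
    for term in choice {
        match term {
            Term::Terminal(s) => out.push_str(s),
            Term::Nonterminal(nt) => out.push_str(&generate(rules, nt, seed, depth - 1)?),
        }
    }
    Ok(out)
}
```

Expanding a looping grammar exhausts the depth budget and yields an error, mirroring the crate's infinite-loop detection, while the DNA grammar terminates quickly in practice.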

Parse Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<postal-address> ::= <name-part> <street-address> <zip-part>

            <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                            | <personal-part> <name-part>

        <personal-part> ::= <initial> \".\" | <first-name>

        <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

            <zip-part> ::= <town-name> \",\" <state-code> <ZIP-code> <EOL>

        <opt-suffix-part> ::= \"Sr.\" | \"Jr.\" | <roman-numeral> | \"\"
            <opt-apt-num> ::= <apt-num> | \"\"";

    let grammar: Result<Grammar, _> = input.parse();
    match grammar {
        Ok(g) => println!("{:#?}", g),
        Err(e) => println!("Failed to make grammar from String: {}", e),
    }
}

Generate Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<dna> ::= <base> | <base> <dna>
        <base> ::= \"A\" | \"C\" | \"G\" | \"T\"";
    let grammar: Grammar = input.parse().unwrap();
    let sentence = grammar.generate();
    match sentence {
        Ok(s) => println!("random sentence: {}", s),
        Err(e) => println!("something went wrong: {}!", e)
    }
}
Comments
  • Feat/property tests from generate

    Feat/property tests from generate

    Revised generate test.

    Prior to this commit, the generate tests only generated one random grammar to parse per run. This commit revises the approach and instead uses the generate function as an arbitrary property to test the Grammar::from_str functionality with.

    Additions relevant to the reopening of this PR:

    First, a new error kind was introduced to indicate that a generation error isn't generic but the result of the recursion limit being reached. This allows the property tests for the Grammar type to disregard problematic sentence generation; they test that what they get is parsable.

    In order to manage recursion limits being met in the property tests, a new use case was accounted for, one whose implementation I believe is useful. Logic was added to make the "from_str" trait for bnf types parallel to Type::new() in all cases except the Term type. Because the Term type does not have a sensible way to implement new(), "from_str" for Term types corresponds to a Terminal rule for the empty string, which is allowed in parsable grammars.

    As this commit represents the addition of a new error kind, it seemed there is now enough variety to make our Error type public so that users can match on it.

    opened by shnewto 20
  • Add "+" operator

    Add "+" operator

    I implemented the BitOr trait (the | operator) for Term and Expression as described in #34. This allows syntax that mirrors actual BNF, like: let exp = Term::Terminal::from_str("A") | Term::Terminal::from_str("B") | Term::Terminal::from_str("C")

    • Term | Term creates an Expression that contains both Terms.
    • Expression | Term and Term | Expression add the Term to the Expression.
    • Expression1 | Expression2 clones all terms from Expression2 and adds them to Expression1

    I made separate methods for Expression and &Expression to prevent unnecessary clones, as recommended in https://doc.rust-lang.org/core/ops/index.html

    While testing, I noticed that two expressions are only counted as equal when their .terms are in exactly the same order. This causes strange behavior like: a | Expression::from_parts(vec![b,c]) != Expression::from_parts(vec![a,b,c]). This should probably be a new issue, but the implementation for Term | Expression would have to be changed if this is intentional.

    opened by DrunkJon 11
  • add callbacks to string generation

    add callbacks to string generation

    I was using bnf as a fuzzer for a parser I made, and I ran into a problem that was hard to solve without a change to the library. I'm using my own fork right now, but I thought it would be nice to propose these changes upstream as well. The problem I had is that my grammar allows variable names of the form [a-zA-Z_][a-zA-Z0-9_]*, but disallows certain names (such as 'test', which is a keyword). I could just reject every generated string with the word 'test' in it, but then strings containing this keyword would never be generated.

    My solution is to add a set of alternatives to Grammar::generate() and Grammar::generate_seeded() which take a callback (though I'm not yet entirely sure about the naming), for example Grammar::generate_callback().

    Grammar::generate() is a thin wrapper around Grammar::generate_callback(|_, _| true), making the callback essentially a no-op. However, if a user callback is provided it can function as a filter. For example:

    <01> ::= "0" | "1"
    <09> ::= <01> | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <number> ::= <09> <number> | <09>
    
    g.generate_callback(|ident, value| match ident {
        "number" => value.parse::<u32>().is_ok(),
        _ => true,
    });
    

    would only generate integers which fit into a u32.
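The effect of such a filter can be approximated without the bnf crate by the simpler generate-and-filter pattern it generalizes. Below is a sketch with a hypothetical `first_accepted` helper; note the PR's real callback runs per nonterminal during generation, which avoids discarding whole sentences:

```rust
// Generate-and-filter sketch: pull candidate sentences from any source and
// keep the first one the user-supplied predicate accepts.
fn first_accepted<I, F>(candidates: I, accept: F) -> Option<String>
where
    I: IntoIterator<Item = String>,
    F: Fn(&str) -> bool,
{
    candidates.into_iter().find(|s| accept(s))
}
```

With `|s| s.parse::<u32>().is_ok()` as the predicate, only sentences that fit into a u32 survive, matching the filtering described above.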

    If you don't like these changes, feel free to reject. Then I'll keep using my own fork.

    opened by jonay2000 11
  • make production ending semicolon optional

    make production ending semicolon optional

    Plenty of BNF examples, including the Wikipedia example, do not require productions to terminate with a semicolon, but rather only require a newline. I believe the parser could be updated to make the ending semicolon optional, which would support both formats.
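A minimal sketch of the normalization idea in plain Rust, under the simplifying assumption of one production per line (the crate's actual nom-based parser works differently):

```rust
// Treat a trailing ';' as an optional production terminator: with or
// without it, each non-empty line normalizes to the same production string.
fn split_productions(src: &str) -> Vec<String> {
    src.lines()
        .map(|line| line.trim().trim_end_matches(';').trim_end().to_string())
        .filter(|line| !line.is_empty())
        .collect()
}
```

Both the semicolon-terminated and newline-terminated forms of a grammar then yield identical production lists.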

    enhancement 
    opened by CrockAgile 9
  • How to parse a certain text according to a bnf grammar and get a list of token

    How to parse a certain text according to a bnf grammar and get a list of token

    The examples provided take a BNF grammar description and are then able to generate text that matches the grammar. How do I give the grammar a string compliant with the BNF specification and get a list of tokens out? This is something that pyparsing does.

    opened by prmammoth 8
  • Step by step example

    Step by step example

    Hello,

    First of all wanted to say thank you for your hard work on this project.

    What do you think about creating a step-by-step example using a simple grammar? It seems to me this would help to show the usual workflow and may present general concepts better.

    FYI: There is a template to make a good README.md https://gist.github.com/PurpleBooth/109311bb0361f32d87a2

    enhancement 
    opened by sergeyklay 8
  • Add a possibility to parse the text and view how it was parsed

    Add a possibility to parse the text and view how it was parsed

    When I was at university there was a program for learning grammars in the programming languages theory course. It was very useful for learners to see how code was parsed into the grammar tree. Can we do the same thing? It shouldn't be too hard: basically we already have it, since we output Grammar as a tree in the README file. The same thing is needed, but with real data coming not from the random generator but from user input; this is how learners learn to write grammars correctly.

    enhancement help wanted 
    opened by vityafx 8
  • Detect and auto complete "nullable" productions

    Detect and auto complete "nullable" productions

    Closes #112

    What

    #112 noted that empty/"nullable" parsing was encountering some issues. @Loup's Blog explains the root of the problem better than I can here. But the solution can be summarized: detect which non-terminals can be "null", and when "predicting", also automatically complete them.

    Detecting null productions is achievable. But BNF provides parse trees, so rather than simply "skipping" nullable nonterminals, BNF should provide how the nonterminal was nullable (possibly in many ways!).
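The detection half can be sketched as a classic fixpoint computation over the grammar. The `Sym` type and `nullable` function below are illustrative stand-ins, not the crate's code:

```rust
use std::collections::{HashMap, HashSet};

// A symbol is either a terminal string or a reference to a non-terminal.
#[derive(Clone)]
enum Sym {
    T(&'static str),
    N(&'static str),
}

// Compute the set of "nullable" non-terminals: those with some alternative
// whose every symbol is an empty terminal or an already-nullable
// non-terminal. Iterate until no new non-terminal is added (a fixpoint).
fn nullable(rules: &HashMap<&'static str, Vec<Vec<Sym>>>) -> HashSet<&'static str> {
    let mut null: HashSet<&'static str> = HashSet::new();
    loop {
        let mut changed = false;
        for (lhs, alts) in rules {
            if null.contains(lhs) {
                continue;
            }
            let is_null = alts.iter().any(|alt| {
                alt.iter().all(|sym| match sym {
                    Sym::T(t) => t.is_empty(),
                    Sym::N(n) => null.contains(n),
                })
            });
            if is_null {
                null.insert(*lhs);
                changed = true;
            }
        }
        if !changed {
            return null;
        }
    }
}
```

On the `<opt-whitespace> ::= "" | " " <opt-whitespace>` grammar from the related issue, this correctly marks `<opt-whitespace>` nullable while leaving `<line-end>` and `<EOL>` non-nullable.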

    This required a significant refactor: currently, matching nonterminals requires a corresponding State, but nullable rules have no directly associated State!

    enum TermMatch<'gram> {
        Terminal(&'gram str),
        NonTerminal(StateId), // <--- UH OH
    }
    

    So I must apologize because this is much too large of a PR 😓 I will do better next time!

    Changes

    • improve AppendOnlyVec to use it in more places
    • separate "term matching" from "state"
    • rename State to Traversal because it was confusing me (this may be controversial! please let's talk about it!)
    • detect "nullable" productions, and autocomplete them during parsing
    • added "tracing" !

    Tracing

    Tracing helped me a lot when tracking down a much larger performance regression that was introduced during development.

    Automatic tracing is very helpful, but can be a bit noisy. And because it uses Rust debug symbols, the output can be pretty messy. That said, it does capture a lot.


    tracing-flame allows for more "manual" flamegraphs: using only spans specifically created by developers gives more intentional data, but it often misses things due to human oversight.


    Performance

    parsing benchmarks from my local laptop:

    // before
    parse polish calculator time:   [497.64 µs 578.29 µs 662.03 µs]                 
    
    // after
    parse polish calculator time:   [564.33 µs 650.73 µs 739.58 µs]                                    
                            change: [-13.197% +14.678% +49.938%] (p = 0.35 > 0.05)
                            No change in performance detected.
    

    So criterion isn't confident, but after many personal trial runs it does seem that the +14% performance hit is "real".

    That said, I do have ideas of how to improve:

    • Grammar::parse is currently re-detecting "nullable" productions on every parse attempt. This duplicate work could be removed by adding a new Grammar::parser, which would provide a reusable parser instance
    • Term matching is using Vec<TermMatch> for non-terminals. Profiling has shown that the cloning (and later dropping) account for a fair amount of work. Term matching could instead use a tree/stemming approach, removing the need for Vec

    Thanks

    I want to thank @nskobelevs and @amascolo for helping by raising the problem and contributing good test cases! I could not have put this PR together without you <3

    opened by CrockAgile 7
  • Empty String Rules (Still) Fail to Match

    Empty String Rules (Still) Fail to Match

    Describe the bug: Empty string rules still fail to match in some situations. This is the same as #108; after its fix it no longer happens in all situations, but it still happens.

    To Reproduce

    1. Load grammar from tests/fixtures/bnf.bnf

    2. Try to parse the first line of the text: "<syntax> ::= <rule> | <rule> <syntax>\n" This match will fail.

    3. Try to parse the same line with whitespace before the newline "<syntax> ::= <rule> | <rule> <syntax> \n" This will parse properly

    The first example should pass, given that <line-end> may begin with <opt-whitespace>, which has an empty string rule:

    <line-end>       ::= <opt-whitespace> <EOL>
    <opt-whitespace> ::= "" | " " <opt-whitespace>
    <EOL>            ::= "
    "
    

    Unfortunately I haven't been able to find a smaller example. Weirdly, it does parse "<syntax> ::= <rule>" correctly, but not the more complicated example above.

    If there were support for special characters like \n or a built-in definition for <EOL>, it would be good to have a test that a grammar made from tests/fixtures/bnf.bnf can parse the file itself.

    Desktop (please complete the following information):

    • MacOS Ventura 13.0.1
    opened by nskobelevs 7
  • Clippy check step for PRs

    Clippy check step for PRs

    Clippy has a lot of best practices, and BNF is in violation of many of them. Some fixes would be breaking changes (like removing the non-std::str::FromStr from_str public functions). Other changes require updating the parsers to nom 5 best practices.

    Once that is done, we would just need to revert the previous attempt to re-enable clippy CI.

        Checking bnf v0.2.6
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:6:1
       |
    6  | / named!(pub prod_lhs< &[u8], Term >,
    7  | |     do_parse!(
    8  | |             nt: delimited!(char!('<'), take_until!(">"), ws!(char!('>'))) >>
    9  | |             _ret: ws!(tag!("::=")) >>
    10 | |             (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    11 | |     )
    12 | | );
       | |__^
       |
       = note: `#[warn(deprecated)]` on by default
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:6:1
       |
    6  | / named!(pub prod_lhs< &[u8], Term >,
    7  | |     do_parse!(
    8  | |             nt: delimited!(char!('<'), take_until!(">"), ws!(char!('>'))) >>
    9  | |             _ret: ws!(tag!("::=")) >>
    10 | |             (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    11 | |     )
    12 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:14:1
       |
    14 | / named!(pub terminal< &[u8], Term >,
    15 | |     do_parse!(
    16 | |         t: alt!(
    17 | |             delimited!(char!('"'), take_until!("\""), ws!(char!('"'))) |
    ...  |
    21 | |     )
    22 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:14:1
       |
    14 | / named!(pub terminal< &[u8], Term >,
    15 | |     do_parse!(
    16 | |         t: alt!(
    17 | |             delimited!(char!('"'), take_until!("\""), ws!(char!('"'))) |
    ...  |
    21 | |     )
    22 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:24:1
       |
    24 | / named!(pub nonterminal< &[u8], Term >,
    25 | |     do_parse!(
    26 | |         nt: complete!(delimited!(char!('<'), take_until!(">"), ws!(char!('>')))) >>
    27 | |         ws!(not!(complete!(tag!("::=")))) >>
    28 | |         (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    29 | |     )
    30 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:24:1
       |
    24 | / named!(pub nonterminal< &[u8], Term >,
    25 | |     do_parse!(
    26 | |         nt: complete!(delimited!(char!('<'), take_until!(">"), ws!(char!('>')))) >>
    27 | |         ws!(not!(complete!(tag!("::=")))) >>
    28 | |         (Term::Nonterminal(String::from_utf8_lossy(nt).into_owned()))
    29 | |     )
    30 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:42:1
       |
    42 | / named!(pub expression_next,
    43 | |     do_parse!(
    44 | |         ws!(char!('|')) >>
    45 | |         ret: recognize!(peek!(complete!(expression))) >>
    46 | |         (ret)
    47 | |     )
    48 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:50:1
       |
    50 | / named!(pub expression< &[u8], Expression >,
    51 | |     do_parse!(
    52 | |         peek!(term) >>
    53 | |         terms: many1!(complete!(term)) >>
    ...  |
    63 | |     )
    64 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:74:1
       |
    74 | / named!(pub production< &[u8], Production >,
    75 | |     do_parse!(
    76 | |         lhs: ws!(prod_lhs) >>
    77 | |         rhs: many1!(complete!(expression)) >>
    ...  |
    86 | |     )
    87 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: use of deprecated item 'ws': whitespace parsing only works with macros and will not be updated anymore
      --> src/parsers.rs:74:1
       |
    74 | / named!(pub production< &[u8], Production >,
    75 | |     do_parse!(
    76 | |         lhs: ws!(prod_lhs) >>
    77 | |         rhs: many1!(complete!(expression)) >>
    ...  |
    86 | |     )
    87 | | );
       | |__^
       |
       = note: this warning originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    warning: unneeded return statement
       --> src/grammar.rs:115:9
        |
    115 |         return Ok(result);
        |         ^^^^^^^^^^^^^^^^^^ help: remove `return`: `Ok(result)`
        |
        = note: `#[warn(clippy::needless_return)]` on by default
        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
    
    warning: useless use of `format!`
      --> src/error.rs:55:32
       |
    55 |             Needed::Unknown => format!("Data error: insufficient size, expectation unknown"),
       |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using .to_string(): `"Data error: insufficient size, expectation unknown".to_string()`
       |
       = note: `#[warn(clippy::useless_format)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
    
    warning: you should consider deriving a `Default` implementation for `expression::Expression`
      --> src/expression.rs:16:5
       |
    16 | /     pub fn new() -> Expression {
    17 | |         Expression { terms: vec![] }
    18 | |     }
       | |_____^
       |
       = note: `#[warn(clippy::new_without_default)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#new_without_default
    help: try this
       |
    10 | #[derive(Default)]
       |
    
    warning: defining a method called `from_str` on this type; consider implementing the `std::str::FromStr` trait or choosing a less ambiguous name
      --> src/expression.rs:26:5
       |
    26 | /     pub fn from_str(s: &str) -> Result<Self, Error> {
    27 | |         match parsers::expression_complete(s.as_bytes()) {
    28 | |             Result::Ok((_, o)) => Ok(o),
    29 | |             Result::Err(e) => Err(Error::from(e)),
    30 | |         }
    31 | |     }
       | |_____^
       |
       = note: `#[warn(clippy::should_implement_trait)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#should_implement_trait
    
    warning: you should consider deriving a `Default` implementation for `grammar::Grammar`
      --> src/grammar.rs:20:5
       |
    20 | /     pub fn new() -> Grammar {
    21 | |         Grammar {
    22 | |             productions: vec![],
    23 | |         }
    24 | |     }
       | |_____^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#new_without_default
    help: try this
       |
    14 | #[derive(Default)]
       |
    
    warning: defining a method called `from_str` on this type; consider implementing the `std::str::FromStr` trait or choosing a less ambiguous name
      --> src/grammar.rs:32:5
       |
    32 | /     pub fn from_str(s: &str) -> Result<Self, Error> {
    33 | |         match parsers::grammar_complete(s.as_bytes()) {
    34 | |             Result::Ok((_, o)) => Ok(o),
    35 | |             Result::Err(e) => Err(Error::from(e)),
    36 | |         }
    37 | |     }
       | |_____^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#should_implement_trait
    
    warning: writing `&String` instead of `&str` involves a new object where a slice will do.
      --> src/grammar.rs:74:31
       |
    74 |     fn traverse(&self, ident: &String, rng: &mut StdRng) -> Result<String, Error> {
       |                               ^^^^^^^
       |
       = note: `#[warn(clippy::ptr_arg)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
    help: change this to
       |
    74 |     fn traverse(&self, ident: &str, rng: &mut StdRng) -> Result<String, Error> {
       |                               ^^^^
    help: change `ident.clone()` to
       |
    86 |         let nonterm = Term::Nonterminal(ident.to_string());
       |                                         ^^^^^^^^^^^^^^^^^
    
    error: using `clone` on a double-reference; this will copy the reference instead of cloning the inner type
      --> src/grammar.rs:99:37
       |
    99 |             Some(e) => expression = e.clone(),
       |                                     ^^^^^^^^^
       |
       = note: `#[deny(clippy::clone_double_ref)]` on by default
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_double_ref
    help: try dereferencing it
       |
    99 |             Some(e) => expression = &(*e).clone(),
       |                                     ^^^^^^^^^^^^^
    help: or try being explicit about what type to clone
       |
    99 |             Some(e) => expression = &expression::Expression::clone(e),
       |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    enhancement 
    opened by CrockAgile 6
  • Impl from string

    Impl from string

    Closes #22 and #12 by implementing std::str::FromStr for the node types.

    Also removed look_ahead!, in favor of not maintaining a separate macro when nom::recognize! seems to offer similar behavior.

    ⚠️ do not merge: tests to be added

    opened by CrockAgile 6
  • Complete prior Earley predictions

    Complete prior Earley predictions

    Closes #117 and #118 (hopefully!)

    #117 and #118 raised examples of failures to parse input based on grammars. after some investigation, it seems the root cause was shared by both issues.

    Root Cause

    Consider the grammar:

    <start> ::= <shortfail> | <longsuccess>
    <shortfail> ::= <char> 'never'
    <char> ::= 'a'
    <longsuccess> ::= <long2>
    <long2> ::= <long3>
    <long3> ::= <long4>
    <long4> ::= <char>
    

    When parsing input "a", there are two routes: "shortfail" and "longsuccess". As the names suggest, "shortfail" requires fewer traversals, but always fails. "longsuccess" requires more traversal steps, and should eventually succeed.

    The issue arises because both paths predict the non-terminal "char". Practical Earley parsing requires de-duplicating predictions, or else recursive grammars fall into infinite loops. The existing Earley implementation in BNF does roughly the following:

    (work roughly alternates between the short and long routes because new traversals are appended to the end of the work queue)

    * <start> ::= • <shortfail>
    * <start> ::= • <longsuccess>
    * <shortfail> ::= • <char> 'never'
    * <longsuccess> ::= • <long2>
    * <char> ::= • 'a'
    * <long2> ::= • <long3>
    * <char> ::= 'a' •
    * <long3> ::= • <long4>
    * <shortfail> ::= <char> • 'never' // <--- notice this does NOT succeed
    * <long4> ::= • <char> // !!! this <char> prediction is IGNORED because an identical prediction was already made
    

    All the <longN> productions are necessary because otherwise the <longsuccess> route is able to predict before its completion.

    I am sorry if I have not explained this super clearly. It is a tricky problem! Funny aside: I actually thought of this bug while developing v0.4.0, but I (wrongly) assumed there was something about the Earley algorithm which resolved it. I attempted to write a test, but I did not realize it would require so many intermediate non-terminals to expose. Whoops!

    Solution

    The underlying problem is that, because non-terminals can be shared across multiple traversal routes, "completion" must also be addressed during "prediction".

    Performance

    On my machine, there was a roughly 10% performance cost to the new parsing logic. I will experiment in the future for improvements, but in the meantime I believe passing tests and correct behavior outweigh the cost.

    Extra

    Basically every time I have worked on an Earley bug, I have ended up adding the same manual logging to help with debugging. I decided to commit that logging this time!

    There is a new tracing::event! which is a no-op by default, but with the "tracing" feature enabled it adds logging events.

    For Earley, traversal state events are logged when created and during predict/scan/complete. It helps quite a bit with debugging!

    opened by CrockAgile 2
  • Right-recursive productions Fail to Match

    Right-recursive productions Fail to Match

    Looks related to nullable productions: #108 #112 #117

    This grammar fails to match when we change a production from left-recursive to right-recursive.

    The left-recursive grammar failed on 0.4.0 ≤ bnf ≤ 0.4.1 – the right-recursive still fails on 0.4.0 ≤ bnf ≤ 0.4.3

    So this issue must have been partially addressed by @CrockAgile in #109 (0.4.2)

    Happy to help battle-testing any proposed fixes ⚔️

    use bnf::Grammar;
    
    #[test]
    fn test() {
        let input = "aa a";
    
        let left_recursive: &str = "
        <conjunction> ::= <conjunction> <ws> <predicate> | <predicate>
        <predicate> ::= <string_null_one> | <special-string> '.'
    
        <string_null_one> ::= <string_null_two>
        <string_null_two> ::= <string_null_three>
        <string_null_three> ::= <string>
    
        <string> ::= <char_null> | <string> <char_null>
        <special-string> ::= <special-char> | <special-string> <special-char>
    
        <char_null> ::= <char>
        <char> ::= 'a'
    
        <special-char> ::= <char_null> | <whitespace>
        <whitespace> ::= ' '
    
        <ws> ::= ' ' | ' ' <ws>
        ";
        assert!(left_recursive.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let right_recursive = left_recursive.replace(
            // rewrite production from left- to right- recursive
            "<string> ::= <char_null> | <string> <char_null>",
            "<string> ::= <char_null> | <char_null> <string>",
        );
        assert!(
            right_recursive.parse::<Grammar>().unwrap().parse_input(input).next().is_some(),
            "right-recursive grammar failed to parse: {input}"
        );
    }
    
    opened by amascolo 0
  • Nullable productions (Still, Still) Fail to Match

    Nullable productions (Still, Still) Fail to Match

    Related to #108 and #112 – failures reproduce on 0.4.0 ≤ bnf ≤ 0.4.3, so this is not a regression from #113.

    Hit more cases where nullable productions aren't handled properly.

    This example looks contrived but was adapted from a larger grammar I'm writing.

    As usual, thanks for all your work @CrockAgile !

    Sadly, this correctness issue continues to be a total blocker preventing me from adopting bnf.

    Hoping my battle-testing hasn't been in vain and there's an easy fix around the corner 🙏🏻

    use bnf::Grammar;
    
    #[test]
    fn test() {
        let fails: &str = "
        <disjunction> ::= <predicate> | <disjunction> <or> <predicate>
        <predicate> ::= <char_null_one> | <special-string> '.'
    
        <char_null_one> ::= <char_null_two>
        <char_null_two> ::= <char_null_three>
        <char_null_three> ::= <char>
    
        <or> ::= <ws> 'or' <ws>
        <ws> ::= <whitespace> | ' ' <ws>
        <whitespace> ::= ' '
    
        <special-string> ::= <special-char> | <special-char> <special-string>
        <special-char> ::= <char> | <whitespace>
        <char> ::= 'a'
        ";
    
        let input = "a or a";
    
        let passes_1 = fails.replace(
            // skip nullable production <char_null_two>
            "<char_null_one> ::= <char_null_two>",
            "<char_null_one> ::= <char_null_three>",
        );
        assert!(passes_1.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let passes_2 = fails.replace(
            // replace <whitespace> with its terminal ' '
            "<ws> ::= <whitespace> | ' ' <ws>",
            "<ws> ::= ' ' | ' ' <ws>",
        );
        assert!(passes_2.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        let passes_3 = fails.replace(
            // again, replace <whitespace> with its terminal ' '
            "<special-char> ::= <char> | <whitespace>",
            "<special-char> ::= <char> | ' '",
        );
        assert!(passes_3.parse::<Grammar>().unwrap().parse_input(input).next().is_some());
    
        assert!(
            fails.parse::<Grammar>().unwrap().parse_input(input).next().is_some(),
            "all grammars except last one parsed: {input}"
        );
    }
    
    opened by amascolo 1
  • reusable grammar parser

    reusable grammar parser

    Is your feature request related to a problem? Please describe.

    The existing Grammar::parse_input method is useful for parsing input text according to a Grammar. But the current API builds "one use" parsers. This limitation forces some redundant work to be repeated when parsing multiple input texts.

    // each of these builds completely new `GrammarMatching` and other parsing tables,
    // which are then freed before the next `parse_input`, wasting computation
    grammar.parse_input(input_1);
    grammar.parse_input(input_2);
    grammar.parse_input(input_3);
    

    Describe the solution you'd like

    Instead, a single GrammarParser could be available which would reuse parser data efficiently:

    let parser = grammar.input_parser();
    
    // common immutable parser data is shared between calls
    parser.parse_input(input_1);
    parser.parse_input(input_2);
    parser.parse_input(input_3);
    

    Describe alternatives you've considered

    Maybe this just doesn't matter! I am not sure this new API would even make much of a performance difference, or how much speedup would warrant a new API method. 1% faster? 5% faster? 50% faster?
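    A minimal sketch of the "build once, parse many" shape this request describes. All names here (`Grammar`, `InputParser`, `input_parser`, `tables`) are illustrative stand-ins for the crate's internals such as `GrammarMatching`, not bnf's real API:

    ```rust
    // Hypothetical API shape: precompute shared immutable parsing data
    // once, then borrow it across many parse_input calls.
    struct Grammar {
        rules: Vec<String>,
    }

    struct InputParser<'g> {
        // shared, precomputed data borrowed from the grammar;
        // stands in for parsing tables like `GrammarMatching`
        tables: Vec<&'g str>,
    }

    impl Grammar {
        fn input_parser(&self) -> InputParser<'_> {
            // expensive table construction happens exactly once, here
            InputParser {
                tables: self.rules.iter().map(|r| r.as_str()).collect(),
            }
        }
    }

    impl<'g> InputParser<'g> {
        fn parse_input(&self, input: &str) -> bool {
            // every call reuses `self.tables`; nothing is rebuilt
            self.tables.iter().any(|&rule| rule == input)
        }
    }

    fn main() {
        let grammar = Grammar {
            rules: vec!["a".into(), "b".into()],
        };
        let parser = grammar.input_parser();
        assert!(parser.parse_input("a"));
        assert!(parser.parse_input("b"));
        assert!(!parser.parse_input("c"));
        println!("ok");
    }
    ```

    The borrowed lifetime ties the parser to its grammar, so the shared tables can hold references into the grammar without copying.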

    enhancement help wanted 
    opened by CrockAgile 0
  • faster production matching while earley parsing

    faster production matching while earley parsing

    What

    While working on nullable rules (#113), profiling revealed potential for performance improvement.

    ProductionMatching tracks how a production's terms are matched. Currently, it is implemented as:

    pub(crate) struct ProductionMatching<'gram> {
        pub prod_id: ProductionId,
        pub lhs: &'gram crate::Term,
        /// "right hand side" [`TermMatching`]s which are partitioned by the matched and unmatched.
        /// For example: [Matched, Matched, Matched, Unmatched, Unmatched]
        rhs: Vec<TermMatching<'gram>>,
        /// The progress cursor used to separate [`TermMatching`]s in the "right hand side"
        matched_count: usize,
    }
    

    The rhs vector requires a fair amount of dynamic memory allocation and deallocation while parsing (at the point where the vector is cloned).

    There may be a better way to model matching terms.

    Brainstorm

    If you pick this up, do not take this brainstorm as "the only way". This is just one of many possibilities.

    Trie

    Instead of cloning the matching vector, matching could be modeled as a trie/stemming tree, with each node a Term.

    e.g. <word> ::= 'B' 'A' 'D' | 'B' 'A' 'T'

    graph TD
        S[start matching prod] --> B
        B --> A
        A --> T
        A --> D
    

    Then each ProductionMatching references a single node in this matching tree.
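    One way to sketch that brainstorm (illustrative only, not bnf's real internals): store matched terms as nodes with parent links, so extending a partial match pushes one node instead of cloning a `Vec`, and shared prefixes like the "B A" in the example above are stored once:

    ```rust
    // Each node records one matched term plus a link to the previous
    // one; a `ProductionMatching` would then hold just a node index.
    struct MatchNode {
        term: char,            // stand-in for &'gram Term
        parent: Option<usize>, // index of the previously matched term
    }

    #[derive(Default)]
    struct MatchTrie {
        nodes: Vec<MatchNode>,
    }

    impl MatchTrie {
        /// Extend a partial match by one term; O(1), no vector cloning.
        fn advance(&mut self, parent: Option<usize>, term: char) -> usize {
            self.nodes.push(MatchNode { term, parent });
            self.nodes.len() - 1
        }

        /// Recover the full matched sequence by walking parent links.
        fn matched(&self, mut node: Option<usize>) -> Vec<char> {
            let mut terms = Vec::new();
            while let Some(i) = node {
                terms.push(self.nodes[i].term);
                node = self.nodes[i].parent;
            }
            terms.reverse();
            terms
        }
    }

    fn main() {
        // <word> ::= 'B' 'A' 'D' | 'B' 'A' 'T' shares the "B A" prefix
        let mut trie = MatchTrie::default();
        let b = trie.advance(None, 'B');
        let a = trie.advance(Some(b), 'A');
        let d = trie.advance(Some(a), 'D');
        let t = trie.advance(Some(a), 'T');
        assert_eq!(trie.matched(Some(d)), vec!['B', 'A', 'D']);
        assert_eq!(trie.matched(Some(t)), vec!['B', 'A', 'T']);
        println!("{:?} {:?}", trie.matched(Some(d)), trie.matched(Some(t)));
    }
    ```

    Whether this actually beats the clone-heavy version would need to be confirmed with the benchmarks below.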

    Results

    To check results, Criterion benchmarking can be used:

    cargo criterion

    Tracing can be used to investigate:

    CARGO_PROFILE_BENCH_DEBUG=true cargo flamegraph --bench bnf -- --bench

    enhancement help wanted 
    opened by CrockAgile 0
  • Ability to parse BNF grammars at compile time

    Ability to parse BNF grammars at compile time

    Is your feature request related to a problem? Please describe. There's no reason to include an entire BNF parser in the compiled application if I just have a fixed, constant grammar that I want to use. However, this is what happens if I use the bnf crate in my application - it must parse and validate at runtime.

    Describe the solution you'd like Allow BNF syntax to be parsed at compile time. Make the required functions const (FromStr::from_str is not const).

    Describe alternatives you've considered N/A

    Additional context N/A
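    Until const parsing exists, a common workaround is to parse the grammar once at first use and cache it for the lifetime of the process, e.g. with `std::sync::OnceLock`. In this sketch, `parse_grammar` is a dependency-free stand-in for the real `str::parse::<bnf::Grammar>()`:

    ```rust
    use std::sync::OnceLock;

    struct Grammar {
        productions: usize,
    }

    // Stand-in for the real (runtime) BNF parse; counts productions.
    fn parse_grammar(src: &str) -> Grammar {
        Grammar {
            productions: src.lines().filter(|l| l.contains("::=")).count(),
        }
    }

    static GRAMMAR: OnceLock<Grammar> = OnceLock::new();

    fn grammar() -> &'static Grammar {
        // the parse runs exactly once, on first access; later calls
        // return the cached result
        GRAMMAR.get_or_init(|| {
            parse_grammar(
                "<dna> ::= <base> | <base> <dna>\n\
                 <base> ::= \"A\" | \"C\" | \"G\" | \"T\"",
            )
        })
    }

    fn main() {
        assert_eq!(grammar().productions, 2);
        assert_eq!(grammar().productions, 2); // cached, not re-parsed
        println!("parsed once, productions = {}", grammar().productions);
    }
    ```

    This still pays the parse cost at runtime (once), so it doesn't replace true compile-time parsing, which would likely require a proc macro or const-capable parser.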

    opened by LoganDark 3
Releases(0.4.3)
  • 0.4.3(Nov 30, 2022)

  • 0.4.0(Jul 9, 2022)

    What's Changed

    • 🎉 Add "+" operator for Term and Expression types @DrunkJon #88
    • 🎉 Update to Rust edition 2021 @CrockAgile #90
    • 🎉 Benchmark BNF examples via Criterion @CrockAgile #91
    • 🎉 Parse grammar sentences via "Earley Parsing" and generate corresponding "parse forests" @CrockAgile #92
    • 🎉 Mermaid formatting of parse trees @CrockAgile #99

    New Contributors

    @DrunkJon with the "+" operator in #88

    Full Changelog: https://github.com/shnewto/bnf/compare/0.3.4...0.4.0

    Source code(tar.gz)
    Source code(zip)
  • 0.3.4(Dec 20, 2021)

    What's Changed

    • Update to stable quickcheck by @shnewto in https://github.com/shnewto/bnf/pull/77
    • chore: factor long bnf text fixtures into their own files by @SKalt in https://github.com/shnewto/bnf/pull/80
    • remove coverage reports for now by @shnewto in https://github.com/shnewto/bnf/pull/83
    • Revert "remove coverage reports for now" by @shnewto in https://github.com/shnewto/bnf/pull/85
    • Update nom requirement from ^6.0.1 to ^7.0.0 by @dependabot in https://github.com/shnewto/bnf/pull/84
    • Remove unnecessary borrows and to_string() by @benarmstead in https://github.com/shnewto/bnf/pull/81
    • add callbacks to string generation by @jonay2000 in https://github.com/shnewto/bnf/pull/87

    New Contributors

    • @SKalt made their first contribution in https://github.com/shnewto/bnf/pull/80
    • @dependabot made their first contribution in https://github.com/shnewto/bnf/pull/84
    • @benarmstead made their first contribution in https://github.com/shnewto/bnf/pull/81
    • @jonay2000 made their first contribution in https://github.com/shnewto/bnf/pull/87

    Full Changelog: https://github.com/shnewto/bnf/compare/0.3.3...0.3.4

    Source code(tar.gz)
    Source code(zip)
  • 0.3.3(Dec 30, 2020)

  • 0.3.2(Dec 22, 2020)

    • Update nom and rand dependencies to latest.
    • Use the eof parser from nom 6.0 instead of a home-rolled eoi.
    • Increase the "stack redzone" threshold to address some issues where infinite loops could overflow the stack despite our checks to prevent it.
    Source code(tar.gz)
    Source code(zip)
  • 0.3.1(Oct 4, 2020)

  • 0.2.7(Oct 31, 2019)

  • 0.2.6(Aug 25, 2019)

Owner
Shea Newton
writes software, talks to hardware. ::= "🙃" | "🙀" | "😬" | "🐋"
LR(1) grammar parser of simple expression

LR(1) grammar analysis program. Lab assignment: write an LR(1) parser that performs syntactic analysis of arithmetic expressions. The expressions to be analyzed are produced by the following grammar: E -> E+T | E-T | T T -> T*F | T/F | F F -> (E) | num Design and implementation. Usage: run .\lr1-parser.exe

Gao Keyong 1 Nov 24, 2021
Rust grammar tool libraries and binaries

Grammar and parsing libraries for Rust grmtools is a suite of Rust libraries and binaries for parsing text, both at compile-time, and run-time. Most u

Software Development Team 318 Dec 26, 2022
WIP: Parse archived parler pages into structured html

parler-parse Parler HTML goes in (stdin), structured JSON comes out (stdout) Might be useful for feeding into elasticsearch or cross-referencing with

Christopher Tarquini 15 Feb 16, 2021
Parse byte size into integer accurately.

parse-size parse-size is an accurate, customizable, allocation-free library for parsing byte size into integer. use parse_size::parse_size; assert_eq

null 20 Aug 16, 2022
Generate and parse UUIDs.

uuid Here's an example of a UUID: 67e55044-10b1-426f-9247-bb680e5fe0c8 A UUID is a unique 128-bit value, stored as 16 octets, and regularly formatted

Rust Uuid 754 Jan 6, 2023
Parse RISC-V opcodes to provide more detailed structured data

riscv-opcodes-parser Parse RISC-V opcodes to provide more detailed structured data. License Licensed under either of Apache License, Version 2.0 (LICE

Sprite 2 Jul 30, 2022
Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar

cargo-spellcheck Check your spelling with hunspell and/or nlprule. Use Cases Run cargo spellcheck --fix or cargo spellcheck fix to fix all your docume

Bernhard Schuster 274 Nov 5, 2022
languagetool-code-comments integrates the LanguageTool API to parse, spell check, and correct the grammar of your code comments!

languagetool-code-comments integrates the LanguageTool API to parse, spell check, and correct the grammar of your code comments! Overview Install MacO

Dustin Blackman 17 Dec 25, 2022
Easy protocol definitions in Rust

protocol Documentation Easy protocol definitions in Rust. This crate adds a custom derive that can be added to types, allowing structured data to be s

Dylan McKay 157 Dec 30, 2022
syntect is a syntax highlighting library for Rust that uses Sublime Text syntax definitions.

syntect is a syntax highlighting library for Rust that uses Sublime Text syntax definitions. It aims to be a good solution for any Rust project that needs syntax highlighting, including deep integration with text editors written in Rust.

Tristan Hume 1.5k Jan 8, 2023
Interface definitions for the Compute@Edge platform in witx.

    compute-at-edge-abi This repository contains witx definitions for the Compute@Edge platform ABI. About witx The witx file format is an experimental

Fastly 14 Apr 5, 2022
The Safe Network Core. API message definitions, routing and nodes, client core api.

safe_network The Safe Network Core. API message definitions, routing and nodes, client core api. License This Safe Network repository is licensed unde

MaidSafe 101 Dec 19, 2022
Experimental Rust tool for generating FFI definitions allowing many other languages to call Rust code

Diplomat is an experimental Rust tool for generating FFI definitions allowing many other languages to call Rust code. With Diplomat, you can simply define Rust APIs to be exposed over FFI and get high-level C, C++, and JavaScript bindings automatically!

null 255 Dec 30, 2022
AVR device definitions

    avrd AVR device definitions in Rust. Documentation This crate exposes information about different AVR microcontrollers so it can be used pragmatically.

The AVR-Rust project 32 Dec 28, 2022
Rust definitions corresponding to jni.h.

jni-sys Documentation Rust definitions corresponding to jni.h. License Licensed under either of Apache License, Version 2.0, (LICENSE-APACHE or http:/

Steven Fackler 36 Nov 6, 2022
A library for generating TypeScript definitions from rust code.

Tsify is a library for generating TypeScript definitions from rust code. Using this with wasm-bindgen will automatically output the types to .d.

Madono Haru 60 Dec 30, 2022
A CLI tool for fetching Urban Dictionary definitions

udict - Urban Dictionary CLI A CLI tool for fetching Urban Dictionary definitions. Installation cargo cargo install udict Arch Linux Using an AUR pack

null 15 Nov 14, 2022
Merge multiple Juniper object definitions into a single object type.

juniper-compose Merge multiple Juniper object definitions into a single object type. crates.io | docs | github Motivation You are building a GraphQL s

Kit Isaev 3 Aug 5, 2022
Nushell "extern" definitions for tab completion generated from Fish's

Nushell completions pack This is a system for generating extern defs (tab-completion) in nu. Background The fish shell project has a long, complicated

Scott Boggs 7 Feb 28, 2023
Code-shape is a tool for extracting definitions from source code files

Code-shape Code-shape is a tool that uses Tree-sitter to extract a shape of code definitions from a source code file. The tool uses the same language

Andrew Hlynskyi 3 Apr 21, 2023