LR(1) parser generator for Rust

Overview

LALRPOP

Join the chat at https://gitter.im/lalrpop/Lobby

LALRPOP is a Rust parser generator framework with usability as its primary goal. You should be able to write compact, DRY, readable grammars. To this end, LALRPOP offers a number of nifty features:

  1. Nice error messages when parser construction fails.
  2. Macros that let you extract common parts of your grammar. This means you can go beyond simple repetition like Id* and define things like Comma<Id> for a comma-separated list of identifiers.
  3. Macros can also create subsets, so that you can easily write something like Expr<"all"> to represent the full range of expressions, but Expr<"if"> to represent the subset of expressions that can appear in an if expression.
  4. Builtin support for operators like * and ?.
  5. Compact defaults so that you can avoid writing action code much of the time.
  6. Type inference so you can often omit the types of nonterminals.
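
As a rough illustration of item 2, the action code usually attached to a Comma<T> macro (items followed by a comma, plus an optional trailing item) can be read as ordinary Rust. This is a minimal sketch, not LALRPOP's own code; the function name comma_list is hypothetical.

```rust
/// Plain-Rust equivalent of the action typically attached to a `Comma<T>`
/// macro: `v` holds the items that were followed by a comma, `last` is the
/// optional final item without a trailing comma.
/// (`comma_list` is a hypothetical name used only for this sketch.)
fn comma_list<T>(mut v: Vec<T>, last: Option<T>) -> Vec<T> {
    // `Option<T>` implements `IntoIterator`, so this pushes zero or one item.
    v.extend(last);
    v
}

fn main() {
    // "a, b, c" parses as two comma-terminated items plus a trailing item.
    let ids = comma_list(vec!["a", "b"], Some("c"));
    println!("{:?}", ids);

    // "a, b," has a trailing comma and no final item.
    let ids = comma_list(vec!["a", "b"], None);
    println!("{:?}", ids);
}
```

Writing the macro once means every comma-separated list in the grammar shares this one action instead of repeating it per nonterminal.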

Despite its name, LALRPOP in fact uses LR(1) by default (though you can opt for LALR(1)), and really I hope to eventually move to something general that can handle all CFGs (like GLL, GLR, LL(*), etc).

Documentation

The LALRPOP book covers all things LALRPOP -- or at least it intends to! Here are some tips:

  • The tutorial covers the basics of setting up a LALRPOP parser.
  • If you are impatient, you may prefer the quick start guide section, which describes how to add LALRPOP to your Cargo.toml.
  • The advanced setup chapter shows how to configure other aspects of LALRPOP's preprocessing.
  • If you have any questions, join our Gitter lobby.

Example Uses

  • LALRPOP is itself implemented in LALRPOP.
  • Gluon is a statically typed functional programming language.
  • RustPython is Python 3.5+ rewritten in Rust.
  • Solang is Ethereum Solidity rewritten in Rust.

Contributing

You really should read CONTRIBUTING.md if you intend to change LALRPOP's own grammar.

Comments
  • larger grammars generate a LOT of code

    Hello,

    I am trying to parse PHP expressions using LALRPOP: https://github.com/timglabisch/rustphp/blob/c79060d6495a55174fc2ad5710d5774d8ec94d67/src/calculator1.lalrpop

    My problem is that cargo run becomes incredibly slow:

    time cargo run 1234.76s user 19.88s system 99% cpu 20:58.78 total

    The generated file is huge (~50 MB):

    cat src/calculator1.rs | wc -l 1044621

    Is there something fundamentally wrong, or is this expected?

    design-work-needed urgent 
    opened by timglabisch 52
  • Error recovery in 0.13 may end up in infinite loops

    I tried to upgrade to 0.13.1 (from 0.12.5), but I have two tests for error recovery which now loop forever.

    https://github.com/gluon-lang/gluon/blob/817d61eed462e27e196a7db6872ea57d801fd0ba/parser/tests/error_handling.rs#L64-L80

    cargo test --features test -p gluon_parser --test error_handling wrong_indent

    Looking at it in a debugger shows that after the 2 is received by lalrpop, it goes into error recovery, after which it pushes a reduce action, evaluates it, and then goes right back into error recovery.

    I am guessing this has to do with the lanetable implementation(?) breaking some assumption in the error recovery, so I may take a look over the weekend. I don't really have any guess as to why the lanetable would break it, though, so no promises that I know how to solve this!

    opened by Marwes 25
  • Consider generating .rs files into OUT_DIR by default

    This is coming from two angles.

    First, cargobomb can't compile crates which generate files into the source directory: https://github.com/rust-lang-nursery/crater/issues/157. While this is perhaps not really LALRPOP's problem, it may be something to consider. gluon couldn't compile with cargobomb, which meant that a recent improvement to rustc broke an assumption that I had (erroneously) made, causing tests to segfault on nightlies with that change.

    Second, with the grammar.rs file in the source tree it can be off-putting for new contributors looking for the parser to find an enormous, impenetrable file.

    (While the first may not be LALRPOP's problem and the second is more of a hypothetical, both of these problems were actually discovered when explaining how to contribute to gluon at a meetup, so I wanted to take some steps to protect against this in the future.)

    Both of these are solved by hiding the file away in OUT_DIR instead, which I now do explicitly with https://github.com/gluon-lang/gluon/pull/401, but it might be a good idea to make this the default.
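
To make the OUT_DIR idea concrete, here is a minimal sketch of a build-script-style program that places a generated .rs file under OUT_DIR instead of next to the grammar. It does not call LALRPOP at all. The helper generated_path, the file name grammar.lalrpop, and the temp-dir fallback are assumptions made only so the sketch runs outside of Cargo.

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// Map a grammar file name to the path of its generated `.rs` file inside
/// `out_dir`. (`generated_path` is a hypothetical helper for this sketch.)
fn generated_path(out_dir: &Path, grammar: &str) -> PathBuf {
    out_dir.join(grammar).with_extension("rs")
}

fn main() -> std::io::Result<()> {
    // Cargo sets OUT_DIR when it runs a build script. The fallback to the
    // system temp dir is an assumption so the sketch also runs standalone.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);

    let target = generated_path(&out_dir, "grammar.lalrpop");

    // Stand-in for the real generated parser source.
    fs::write(&target, "// generated parser would go here\n")?;
    println!("wrote {}", target.display());
    Ok(())
}
```

With this layout the source tree contains only the grammar, and the large generated file lives with the other build artifacts.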

    let's-do-this 
    opened by Marwes 24
  • feat: Implement error recovery

    Work in progress branch which aims to allow parsers to recover from parse errors.

    This adds the error keyword (bikesheddable) to LALRPOP's syntax. If an error is encountered during parsing, the parser goes into recovery mode, where it first pops states until it finds a state from which it can potentially recover. After finding a recoverable state, the next step is to skip tokens until a token is found that allows the parser to continue. Recovering from an error counts as a successful parse, so it is up to the user to record the error in some way.
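
The two phases described above (pop states, then skip tokens) can be sketched as a small standalone function. This is a toy model under stated assumptions, not the code in this PR: the names recover, Recovery, can_recover and can_continue are hypothetical, states and tokens are opaque values, and EOF handling is not modelled.

```rust
/// Summary of a successful recovery attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Recovery {
    states_popped: usize,
    tokens_skipped: usize,
}

/// Toy two-phase recovery:
/// 1. pop states until the top of the stack is one we can recover in;
/// 2. skip tokens until one lets the parser continue from that state.
/// Returns `None` if the stack runs out or no remaining token fits.
fn recover<S, T>(
    states: &mut Vec<S>,
    tokens: &[T],
    can_recover: impl Fn(&S) -> bool,
    can_continue: impl Fn(&S, &T) -> bool,
) -> Option<Recovery> {
    let before = states.len();

    // Phase 1: discard states that cannot handle an error.
    while let Some(top) = states.last() {
        if can_recover(top) {
            break;
        }
        states.pop();
    }
    let top = states.last()?; // stack exhausted: give up

    // Phase 2: count how many tokens must be dropped before parsing resumes.
    let tokens_skipped = tokens.iter().position(|t| can_continue(top, t))?;

    Some(Recovery {
        states_popped: before - states.len(),
        tokens_skipped,
    })
}

fn main() {
    // State 3 is the only recoverable state; ";" is the only token that
    // lets the parser continue from it.
    let mut states = vec![0, 3, 7];
    let tokens = ["+", "2", ";", "x"];
    let result = recover(&mut states, &tokens, |s| *s == 3, |_, t| *t == ";");
    println!("{:?}, stack = {:?}", result, states);
}
```

Because recovery is reported as success, a caller of something like this would still need to record the original error somewhere, as the description says.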

    Unresolved questions

    • [x] ~~Is it possible to have recovery for LALR grammars as well? (Need to look into this but haven't found anything concrete yet)~~ I guess this might come for free actually?

    TODO

    • [x] Recover from errors at EOF
    • [x] Allow the error recovered from to be inspected in its action "+" <e: error> => { /* use e */ }
    • [x] ~~Implement recovery for recursive ascent parsers~~ Skipping this until later when the parse_table implementation has seen some use.
    • [x] Implement unimplemented!() stubs (example generation)
    • [x] Documentation
    • [x] Clean up commits

    Reading

    https://arxiv.org/pdf/1010.1234.pdf
    https://www.cs.clemson.edu/course/cpsc827/material/LRk/LR%20Error%20Recovery.pdf
    https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0ahUKEwjt9cTt6JjQAhVGjSwKHV0bAigQFgg7MAQ&url=http%3A%2F%2Fwww.cs.princeton.edu%2Fcourses%2Farchive%2Fspr05%2Fcos320%2Fnotes%2F7-parsing-error.ppt&usg=AFQjCNF0I2l3VAMVyT47DIYGv6wZOUzP2A&sig2=0-c3w6nOFevP5nTamen3dg&cad=rja

    cc #156

    BREAKING CHANGE

    This adds the error keyword to grammar definitions.

    opened by Marwes 23
  • Performance degradation with noddy csv parser

    The performance of this csv parser code went from reading 1m lines in ~1 second to taking minutes with 13.1.

    The grammar is:

    grammar;

    pub Record = Comma<Field>;

    Comma<T>: Vec<T> = {
        <v:(<T> ",")*> <e:T?> => match e {
            None => v,
            Some(e) => {
                let mut v = v;
                v.push(e);
                v
            }
        }
    };

    Field: String = {
        <s:r#""[^"]+""#> => {
            s[1..s.len()-1].to_string()
        },
        <s:r#"[^",][^,]*"#> => {
            s.to_string()
        },
    };

    The test program counts the number of fields encountered:

    pub mod csv;
    
    use std::io::BufReader;
    use std::io::BufRead;
    use std::fs::File;
    
    fn main() {
        let fpath = ::std::env::args().nth(1).unwrap();
        let f = File::open(fpath).unwrap();
        let file = BufReader::new(&f);
        let mut sum = 0;
        for line in file.lines() {
            let l = line.unwrap();
            let rec = csv::parse_Record(&l).unwrap();
            sum += rec.len();
        }
        println!("{}", sum);
    }
    

    Results:

    $ for t in 1 10 100 1000 10000 100000 ; do time ./target/release/lalr-reader <(head -${t} /tmp/hello.csv ); done
    5
    
    real    0m0.006s
    user    0m0.002s
    sys     0m0.009s
    50
    
    real    0m0.006s
    user    0m0.004s
    sys     0m0.006s
    500
    
    real    0m0.014s
    user    0m0.014s
    sys     0m0.001s
    5000
    
    real    0m0.085s
    user    0m0.084s
    sys     0m0.001s
    50000
    
    real    0m0.838s
    user    0m0.837s
    sys     0m0.002s
    500000
    
    real    0m8.274s
    user    0m8.261s
    sys     0m0.007s
    
    $ time ./target/release/lalr-reader /tmp/hello.csv 
    5000000
    
    real    1m22.654s
    user    1m22.554s
    sys     0m0.007s
    
    urgent let's-do-this 
    opened by ehiggs 18
  • Allow #![...] attributes in grammar files (clippy)

    • lalrpop version 0.16.2
    • rustc 1.34.0 (91856ed52 2019-04-10)
    • cargo 1.34.0 (6789d8a0a 2019-04-01)

    Hi! Thanks a lot for this project! I am currently facing a problem where, if I use clippy in my project, there are pages of clippy errors that all come from the synthesized grammar file.

    Clippy's solution to these things is to add an attribute that will make clippy ignore the errors in that file:

    #![allow(clippy)]
    

    But when I try to add said attribute to the top of my .lalrpop grammar file in lalrpop 0.16, it throws an error.

    I notice there is an open PR that looks like it addresses this problem, https://github.com/lalrpop/lalrpop/pull/384, but it hasn't had much traction since Jan.

    I might be able to help fix this if the fix is good enough for my first issue in this repo :)

    Thanks! Fran

    opened by franleplant 17
  • Improve error report about @L @R tokens

    I just confused myself by creating a stupid grammar that would read two @L tokens successively. A minimal example:

    grammar;
    
    Spanned<T> = @L <T> @R;
    Word: String = <s:r"[a-z]+"> => <>;
    
    pub Foo = {
        Spanned<Word> "~",
        Spanned<Spanned<Word>> "+",
    };
    

    With many macros like Spanned, it is easy to accidentally nest two Spanned macros. The grammar above results in:

    test.lalrpop:3:14: 3:15: Local ambiguity detected
    
      The problem arises after having observed the following symbols in the input:
        @L
      At that point, if the next token is a `r#"[a-z]+"#`, then the parser can
      proceed in two different ways.
    
      First, the parser could execute the production at
      src/grammar/test.lalrpop:3:14: 3:15, which would consume the top 0 token(s)
      from the stack and produce a `@L`. This might then yield a parse tree like
        ╷      ╷ Word  @R
        ├─@L───┘        │
        └─Spanned<Word>─┘
    
      Alternatively, the parser could shift the `r#"[a-z]+"#` token and later use it
      to construct a `Word`. This might then yield a parse tree like
        @L r#"[a-z]+"# @R
        │  └─Word────┘  │
        └─Spanned<Word>─┘
    
      See the LALRPOP manual for advice on making your grammar LR(1).
    

    This is not really helpful right now (my real error message with my big grammar was even more confusing...).

    (the new error reporting is awesome though :3)

    design-work-needed wishlist 
    opened by LukasKalbertodt 15
  • table drive parser [WIP]

    This is a work-in-progress PR for #65. So far it does not do too much, but it already generates const arrays for interned strings and "reduced" items.

    The main intention behind this PR is for @nikomatsakis to get a rough idea of what my current approach looks like, and to discuss it.

    opened by fhahn 15
  • Split apart UnrecognizedEOF error variant from UnrecognizedToken

    Targets #322.

    Not sure if this is still relevant, since the linked issue is from a year ago.

    I have lalrpop-util compiling separately with the new error variant, but I don't understand the code generation logic in lalrpop well enough to fix the breakage. Would appreciate any help!

    opened by nwtnni 14
  • Environment variable 'OUT_DIR' not defined

    I get this error while following your tutorial:

    error: environment variable `OUT_DIR` not defined
     --> src/main.rs:3:1
      |
    3 | lalrpop_mod!(pub calculator1);
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    error: couldn't read src/0/calculator1.rs: No such file or directory (os error 2)
     --> src/main.rs:3:1
      |
    3 | lalrpop_mod!(pub calculator1);
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
    
    error: aborting due to 2 previous errors
    
    error: Could not compile `calculator`.
    

    Any advice to fix this?

    opened by Cthutu 13
  • lrgrammar.rs seems to be back in the crate

    As of 0.15.2, the file lalrpop/src/parser/lrgrammar.rs seems to be back in the crate. That's a 3.7 MB file that doesn't need to be there, and it prevents me from updating some dependencies in Firefox :/

    Can something be done about that?

    opened by Yoric 12
  • Stream errors to the terminal as they are computed

    I’m working on a project where LALRPOP typically takes many minutes or hours to compute all the error messages. It’s much more useful to show the messages as they’re computed, than to buffer them all and display them at the end.

    opened by andersk 0
  • `lalrpop_mod!` looks under build subdirectory instead of crate root

    Running cargo build results in couldn't read target/debug/build/xmd-lalrpop-<id>/out/src/calculator1.lalrpop: no such file or directory.

    I can get the build script to finish if I use lalrpop_mod!(pub calculator1, "/../../../../../src/calculator1.lalrpop") but that really doesn't look like the intended way to use the macro.

    Output of cargo -V: cargo 1.60.0 (d1fd9fe 2022-03-01)

    Project layout, based on the calculator example:

    .
    ├── build.rs
    ├── Cargo.lock
    ├── Cargo.toml
    └── src
       ├── lib.rs
       └── calculator1.lalrpop
    

    src/lib.rs:

    #[macro_use] extern crate lalrpop_util;
    
    lalrpop_mod!(pub calculator1, "/src/calculator1.lalrpop");
    
    opened by littlebenlittle 2
  • state requires copy trait

    As per the instructions: https://lalrpop.github.io/lalrpop/tutorial/009_state_parameter.html

    If I try to pass a state parameter that does not implement the copy trait, I get the error:

       Compiling robotica-node-rust v0.1.0 (/home/brian/tree/personal/robotica-node-rust)
    error[E0507]: cannot move out of `self.context` which is behind a mutable reference
       --> /home/brian/tree/personal/robotica-node-rust/target/debug/build/robotica-node-rust-a07a1b4494bd2ade/out/scheduling/conditions.rs:399:17
        |
    399 |                 self.context,
        |                 ^^^^^^^^^^^^ move occurs because `self.context` has type `Fields<T>`, which does not implement the `Copy` trait
    
    For more information about this error, try `rustc --explain E0507`.
    error: could not compile `robotica-node-rust` due to previous error
    warning: build failed, waiting for other jobs to finish...
    error: could not compile `robotica-node-rust` due to previous error
    

    I am trying to pass a type that is incompatible with the copy trait (clone would be OK however).

    This happens in the reduce function:

            fn reduce(
                &mut self,
                action: i8,
                start_location: Option<&Self::Location>,
                states: &mut alloc::vec::Vec<i8>,
                symbols: &mut alloc::vec::Vec<__state_machine::SymbolTriple<Self>>,
            ) -> Option<__state_machine::ParseResult<Self>> {
                __reduce(
                    self.context,
                    self.input,
                    action,
                    start_location,
                    states,
                    symbols,
                    core::marker::PhantomData::<(&(), T)>,
                )
            }
    

    I am wondering if this parameter can be passed as a reference, so the copy is not required.
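
A minimal sketch of why a reference would sidestep E0507, independent of LALRPOP's generated code: a shared reference &T is Copy even when T is not, so it can be handed out from behind &mut self. The types Fields and Machine and the function reduce_with below are hypothetical stand-ins; whether the real generated parser can be parameterised this way is an assumption.

```rust
/// Hypothetical stand-in for a user state type that is `Clone` but not `Copy`.
#[derive(Clone, Debug)]
struct Fields {
    names: Vec<String>,
}

/// Hypothetical stand-in for the generated state machine. It stores a shared
/// reference to the state rather than the state itself.
struct Machine<'a> {
    context: &'a Fields,
}

/// Free function that receives the context by value, like `__reduce` above.
/// Here the "value" is a reference, which is cheap to copy.
fn reduce_with(context: &Fields, action: usize) -> Option<String> {
    context.names.get(action).cloned()
}

impl<'a> Machine<'a> {
    fn reduce(&mut self, action: usize) -> Option<String> {
        // `self.context` is a `&Fields`; shared references are `Copy`, so
        // this copies the reference instead of moving out of `self`.
        reduce_with(self.context, action)
    }
}

fn main() {
    let fields = Fields {
        names: vec!["start".to_string(), "end".to_string()],
    };
    let mut machine = Machine { context: &fields };
    println!("{:?}", machine.reduce(1));
    // The same call works repeatedly, since nothing was moved.
    println!("{:?}", machine.reduce(0));
}
```

If the field were a plain Fields instead of &Fields, the call inside reduce would try to move out of self and fail with the same E0507 error shown above.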

    opened by brianmay 1
A typed parser generator embedded in Rust code for Parsing Expression Grammars

Oak Compiled on the nightly channel of Rust. Use rustup for managing compiler channels. You can download and set up the exact same version of the comp

Pierre Talbot 138 Nov 25, 2022
An LR parser generator, implemented as a proc macro

parsegen parsegen is an LR parser generator, similar to happy, ocamlyacc, and lalrpop. It currently generates canonical LR(1) parsers, but LALR(1) and

Ömer Sinan Ağacan 5 Feb 28, 2022
Yet Another Parser library for Rust. A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing strings and slices.

Yap: Yet another (rust) parsing library A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing input.

James Wilson 117 Dec 14, 2022
Website for Microformats Rust parser (using 'microformats-parser'/'mf2')

Website for Microformats Rust parser (using 'microformats-parser'/'mf2')

Microformats 5 Jul 19, 2022
A native Rust port of Google's robots.txt parser and matcher C++ library.

robotstxt A native Rust port of Google's robots.txt parser and matcher C++ library. Native Rust port, no third-party crate dependency Zero unsafe code

Folyd 72 Dec 11, 2022
Rust parser combinator framework

nom, eating data byte by byte nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromi

Geoffroy Couprie 7.6k Jan 7, 2023
A fast monadic-style parser combinator designed to work on stable Rust.

Chomp Chomp is a fast monadic-style parser combinator library designed to work on stable Rust. It was written as the culmination of the experiments de

Martin Wernstål 228 Oct 31, 2022
A parser combinator library for Rust

combine An implementation of parser combinators for Rust, inspired by the Haskell library Parsec. As in Parsec the parsers are LL(1) by default but th

Markus Westerlind 1.1k Dec 28, 2022
Rust query string parser with nesting support

What is Queryst? This is a fork of the original, with serde and serde_json updated to 0.9 A query string parsing library for Rust inspired by https://

Stanislav Panferov 67 Nov 16, 2022
Soon to be AsciiDoc parser implemented in rust!

pagliascii "But ASCII Doc, I am Pagliascii" Soon to be AsciiDoc parser implemented in rust! This project is the current implementation of the requeste

Lukas Wirth 49 Dec 11, 2022
PEG parser for YAML written in Rust 🦀

yaml-peg PEG parser (pest) for YAML written in Rust 🦀 Quick Start ⚡️ # Run cargo run -- --file example_files/test.yaml # Output { "xmas": "true",

Visarut Phusua 4 Sep 17, 2022
MRT/BGP data parser written in Rust.

BGPKIT Parser BGPKIT Parser aims to provides the most ergonomic MRT/BGP message parsing Rust API. BGPKIT Parser has the following features: performant

BGPKIT 46 Dec 19, 2022
This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCSS AST. Very early stage, do not use in production.

CSS(less like) parser written in rust (WIP) This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCS

Huang Liuhaoran 21 Aug 23, 2022
A feature-few, no-allocation JSON parser in `no_std` rust.

Small JSON Parser in no_std This library reads and parses JSON strings. Its intended use case is to read a JSON payload once. It does not serialise da

Robert Spencer 18 Nov 29, 2022
A Gura parser for Rust

Gura Rust parser IMPORTANT: if you need to use Gura in a more user-friendly way, you have at your disposal Serde Gura which allows you to perform Seri

Gura Config Lang 21 Nov 13, 2022
Front Matter parser for Rust.

fronma Front Matter parser for Rust. Usage Add this crate as a dependency: [dependencies] fronma = "~0.1" then use fronma::parser::parse to parse text

Ryo Nakamura 6 Nov 19, 2021
A Rust crate for LL(k) parser combinators.

oni-comb-rs (鬼昆布,おにこんぶ) A Rust crate for LL(k) parser combinators. Main project oni-comb-parser-rs Sub projects The following is projects implemented

Junichi Kato 24 Nov 3, 2022
gors is an experimental go toolchain written in rust (parser, compiler).

gors gors is an experimental go toolchain written in rust (parser, compiler). Install Using git This method requires the Rust toolchain to be installe

Aymeric Beaumet 12 Dec 14, 2022