Rust grammar tool libraries and binaries


Grammar and parsing libraries for Rust


grmtools is a suite of Rust libraries and binaries for parsing text, both at compile-time and run-time. Most users will probably be interested in the compile-time Yacc feature, which allows traditional .y files to be used (mostly) unchanged in Rust.


A minimal example using this library consists of two files (in addition to the grammar and lexing definitions). First we need to create a build.rs file in the root of our project with the following content:

use cfgrammar::yacc::YaccKind;
use lrlex::CTLexerBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    CTLexerBuilder::new()
        .lrpar_config(|ctp| {
            ctp.yacckind(YaccKind::Grmtools)
                .grammar_in_src_dir("calc.y")
                .unwrap()
        })
        .lexer_in_src_dir("calc.l")?
        .build()?;
    Ok(())
}

This will generate and compile a parser and lexer, where the definitions for the lexer can be found in src/calc.l:

%%
[0-9]+ "INT"
\+ "+"
\* "*"
\( "("
\) ")"
[\t ]+ ;

and where the definitions for the parser can be found in src/calc.y:

%start Expr
%avoid_insert "INT"
%%
Expr -> Result<u64, ()>:
      Expr '+' Term { Ok($1? + $3?) }
    | Term { $1 }
    ;

Term -> Result<u64, ()>:
      Term '*' Factor { Ok($1? * $3?) }
    | Factor { $1 }
    ;

Factor -> Result<u64, ()>:
      '(' Expr ')' { $2 }
    | 'INT'
      {
          let v = $1.map_err(|_| ())?;
          parse_int($lexer.span_str(v.span()))
      }
    ;
%%
// Any functions here are in scope for all the grammar actions above.

fn parse_int(s: &str) -> Result<u64, ()> {
    match s.parse::<u64>() {
        Ok(val) => Ok(val),
        Err(_) => {
            eprintln!("{} cannot be represented as a u64", s);
            Err(())
        }
    }
}

We can then use the generated lexer and parser within our src/main.rs file as follows:

use std::env;

use lrlex::lrlex_mod;
use lrpar::lrpar_mod;

// Using `lrlex_mod!` brings the lexer for `calc.l` into scope. By default the
// module name will be `calc_l` (i.e. the file name, minus any extensions,
// with a suffix of `_l`).
lrlex_mod!("calc.l");
// Using `lrpar_mod!` brings the parser for `calc.y` into scope. By default the
// module name will be `calc_y` (i.e. the file name, minus any extensions,
// with a suffix of `_y`).
lrpar_mod!("calc.y");

fn main() {
    // Get the `LexerDef` for the `calc` language.
    let lexerdef = calc_l::lexerdef();
    let args: Vec<String> = env::args().collect();
    // Now we create a lexer with the `lexer` method with which we can lex an
    // input.
    let lexer = lexerdef.lexer(&args[1]);
    // Pass the lexer to the parser and lex and parse the input.
    let (res, errs) = calc_y::parse(&lexer);
    for e in errs {
        println!("{}", e.pp(&lexer, &calc_y::token_epp));
    }
    match res {
        Some(r) => println!("Result: {:?}", r),
        _ => eprintln!("Unable to evaluate expression.")
    }
}

For more information on how to use this library please refer to the grmtools book, which also includes a more detailed quickstart guide.


lrpar contains several examples of how to use the lrpar/lrlex libraries, showing how to generate parse trees and ASTs, or to execute code while parsing.



Documentation for all past and present releases

  • Unused symbol

    I wasn't having much luck detecting unused symbols in the presence of conflicts, and I believe I'm starting to understand why bison limits its production of that warning (to shift/reduce conflicts). So here is a prototype for warnings, which adds an unused-symbol warning for Yacc grammars for symbols that are not involved in conflicts.

    It tries to include a mechanism through which we can also produce warnings as errors, and largely follows from all the YaccGrammarError work that has been done, minus the Error trait.

    A few things are noteworthy: SpansKind wanted to be on YaccGrammarWarningKind as well as YaccGrammarWarning, so that YaccGrammarError could leverage it when treating warnings as errors; SpansKind::Error is also an awkward choice of naming here.

    There are some subsequent patches left to do:

    1. Integrating this into various tools
    2. ~~I noticed that complete_and_validate is still returning a YaccGrammarError, and we should probably make it return Vec<YaccGrammarError>.~~ Realized this is just pub(crate), not pub, so we can do this whenever.
    3. How, or if, we should pass a warnings_as_errors flag to new_with_storaget, and whether the place I implemented warnings() on YaccGrammarError is actually the right place, or if this is something that callers should do using the From impl given.
    4. By the time complete_and_validate() runs and there are errors, this code is only callable too late to actually have both warnings and errors; implementing it to be callable earlier seemed tougher since it relies on PIdxs a bit.

    As such I'm going to mark this as draft pre-emptively, since it seems like there is some thought that needs to go into what the best thing to do here is, and whether this is even worth the additional types/impls?
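    The "warnings as errors" idea via a From impl, mentioned above, can be sketched as follows. This is an illustrative sketch only: the Warning/Error types and the check function are hypothetical stand-ins, not grmtools' actual YaccGrammarWarning/YaccGrammarError API.

```rust
// Hypothetical types standing in for YaccGrammarWarning/YaccGrammarError.
#[derive(Debug)]
struct Warning {
    msg: String,
}

#[derive(Debug)]
struct Error {
    msg: String,
}

// A `From` impl lets callers promote warnings to errors themselves.
impl From<Warning> for Error {
    fn from(w: Warning) -> Error {
        Error { msg: w.msg }
    }
}

// Promote warnings to errors when the flag is set; otherwise pass them through.
fn check(warnings_as_errors: bool, warnings: Vec<Warning>) -> Result<Vec<Warning>, Vec<Error>> {
    if warnings_as_errors && !warnings.is_empty() {
        Err(warnings.into_iter().map(Error::from).collect())
    } else {
        Ok(warnings)
    }
}

fn main() {
    let ws = vec![Warning { msg: "unused symbol".to_string() }];
    assert!(check(true, ws).is_err());
    let ws = vec![Warning { msg: "unused symbol".to_string() }];
    assert!(check(false, ws).is_ok());
    println!("ok");
}
```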

    opened by ratmice 53
  • Implement actions

    This PR is under construction and should not be merged in its current state. The goal of this PR is to implement grammar actions similar to YACC, for example:

    %start Expr
    %%
    Expr: Term 'PLUS' Expr { add($0, $2) }
        | Term { $0 };
    Term: Factor 'MUL' Term { mul($0, $2) }
        | Factor { $0 };
    Factor: 'LBRACK' Expr 'RBRACK' { $1 }
          | 'INT' { $0 };
    %%
    type TYPE = u64;

    fn add(arg1: u64, arg2: u64) -> u64 {
        arg1 + arg2
    }

    fn mul(arg1: u64, arg2: u64) -> u64 {
        arg1 * arg2
    }
    opened by ptersilie 38
  • Started work on conflict API

    ** PR in progress **

    The goal of this PR is to provide a nice API for the user (i.e. the language developer) to inspect shift/reduce and reduce/reduce conflicts and handle them manually.

    At the moment the API is very basic, but I thought I'd get some early thoughts on the direction before continuing. An example usage of the API can be found in nimbleparse, where it just pretty-prints the conflicts to the terminal.

    My main question would be where the user would be expected to handle these errors, for example within the calc examples. Or am I off track again, and we want to allow the user to handle the conflict as it happens, so that they can then manually change how the parser generator resolves those conflicts?

    opened by ptersilie 37
  • Add a name_span field to Rule.

    This adds a name_span field to lexerdef's Rule. I assume it is okay to depend on Span from lrpar in lrlex, since the dependency is already there.

    Notes about the impl: the span refers to the inside of the quotes of the quoted name; in the case of an anonymous rule with no name, it points to the empty string at the semicolon.

    This is why name is an Option but span is not. This still needs to be documented and tested, but I wanted to post it for comments before writing docs/tests, in case feedback on the impl causes different behavior. So for now I merely checked it manually with iter_rules().

    It doesn't add spans for re_str (re_span?), which is private, and I'm not sure it would be alright to have a public span for that private field; I'm also not sure I would actually need it -- I still need to play around with adding this info to diagnostics to get a feel for that.

    opened by ratmice 34
  • Use packed vectors to store state tables

    This PR replaces the state table HashMaps (goto and actions) with packed vectors. This sacrifices memory usage for performance, as it makes table lookups O(1).

    Previously, states were stored as (stidx, ridx) => stidx, where stidx is a state id and ridx is a rule id. Now, lookups are done by calculating an index into the vector from stidx and ridx, which yields the state, e.g. goto[stidx * nt_len + ridx], where nt_len is the number of nonterminals in the grammar.

    We then pack the vectors to reduce memory usage.
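    The indexing scheme above can be sketched as a toy dense goto table (this is an illustration of the idea, not lrtable's actual code): the HashMap keyed on (stidx, ridx) becomes a flat Vec indexed by stidx * nt_len + ridx.

```rust
// Toy dense goto table: one row per state, one column per nonterminal.
struct GotoTable {
    nt_len: usize,             // number of nonterminals in the grammar
    cells: Vec<Option<usize>>, // None = no goto entry for this (state, rule)
}

impl GotoTable {
    fn goto(&self, stidx: usize, ridx: usize) -> Option<usize> {
        // O(1) lookup: index = row * row_width + column.
        self.cells[stidx * self.nt_len + ridx]
    }
}

fn main() {
    // 2 states x 3 nonterminals.
    let t = GotoTable {
        nt_len: 3,
        cells: vec![
            Some(1), None, Some(0), // state 0
            None, Some(1), None,    // state 1
        ],
    };
    assert_eq!(t.goto(0, 2), Some(0));
    assert_eq!(t.goto(1, 1), Some(1));
    assert_eq!(t.goto(1, 0), None);
    println!("ok");
}
```

    Relative to a HashMap, the flat Vec wastes cells on absent entries (hence the follow-up packing step), but every lookup is a single multiply-add and index.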

    opened by ptersilie 33
  • WIP: %parse-param

    This merely adds some basic scaffolding for %parse-param (issue #212). It contains some ugly hacks/commented-out code to get tests compiling with the unimplemented feature. Mostly I wanted to ask if this looks like an OK syntax for lifetimes.

    opened by ratmice 30
  • Add a span field to LexBuildError

    This currently uses the offset given to build an empty span at (off, off). I've commented in the tests some spans that I believe might be the right thing, but without really digging into the parser code for each case and checking the results, I'm not certain of the accuracy of those comments.

    I mainly wanted to push this before starting any work on that, because that will probably take a bit of work, increase the size of the patch, and that work should be isolated to private API.

    My feeling is the next step would be to remove the line/col fields from the structure and the error text, moving that to callers using the span. In this patch we still need those, because by the time we format the message we no longer have the text to count newlines. Does that sound reasonable?

    opened by ratmice 29
  • Allow different rules to have different action types.

    This falls under the "why didn't this occur to me earlier?" heading.

    Our previous %actiontype solution is, more-or-less, useless in practise. It works in C because the union type that's created allows C programmers to treat chunks of memory as dynamically typed (happily seg-faulting if the programmer fished the wrong thing out of the union). Consider this (highly contrived) grammar:

      %start S
      S: A | S A;
      A: 'a';

    i.e. match one or more "a" characters. Let's assume that, for each "a" we match we want to produce some value. How should we write this? Our only option is "%actiontype Vec<...>" because "S" must return a Vec, even though "A" only sensibly returns a singular value. In this case, this feels inelegant, but it works.

    However, what happens if you've got a real programming language building up an AST? Consider a simple language which allow assignments and expressions:

      Assignments: Assignment | Assignments Assignment;
      Assignment: "ID" "=" Expr
      Expr: "INT" "+" Expr | "INT"

    "Assignments" should return "Vec" and "Expr" should return an "Expr". What should our %actiontype be now? Well, we can make an enum:

      enum AssignsOrExprs {
          Assigns(Vec<Assignment>),
          Expr(Expr)
      }
    and then we can have "%actiontype AssignsOrExprs". But we now have to scale this up to every AST type in our entire program which is horrible, and also turns our seemingly statically typed Rust into dynamically typed code: the number of "match" statements doesn't bear thinking about!

    How do other grammar generators deal with this? Well, they do the obvious (in retrospect) thing of allowing each different rule's actions to return different types (note: each production in a rule must return the same type!). That's what this commit does. Interestingly, this requires a simple tweak to cfgrammar, a big change to lrpar, and not much else.

    First we add YaccKind::Grmtools, a new variant on Yacc syntax. %actiontype is not valid in this grammar type. Instead, each rule in this grammar type must have a type after its name:

      %start S
      %%
      S::Vec<A>: A { vec![$1] }
        | S A { let mut v = $1; v.push($2); v }
        ;
      A::A: 'a' { A } ;
      %%
      pub struct A;

    How does this work? Before we translated each production's action into a rule which (for the first production in the grammar above) looked roughly like this:

      fn action_prod0(..., args: Drain<...>) -> Vec<A> {
         let arg1 = match ... {
            AStackType::ActionType(x) => x,
            _ => unreachable!()
         };
         ...
      }

    The crucial detail now is that we know the types of all rules in advance. So we generate an enum of all action types, and translate each action into a wrapper and an action:

      enum ActionsKind {
          ...
      }

      fn wrapper_prod0(..., args: Drain<...>) -> Vec<A> {
        let arg1 = match ... {
          AStackType::ActionType(x) => x,
          _ => unreachable!()
        };
        action_prod0(..., arg1)
      }
      fn action_prod0(..., mut arg1: A) -> Vec<A> { ... }

      fn wrapper_prod1(..., args: Drain<...>) -> Vec<A> {
        let arg1 = match ... {
          AStackType::ActionType(x) => x,
          _ => unreachable!()
        };
        let arg2 = match ... {
          AStackType::ActionType(x) => x,
          _ => unreachable!()
        };
        action_prod1(..., arg1, arg2)
      }
      fn action_prod1(..., mut arg1: Vec<A>, mut arg2: A) -> Vec<A> { ... }

      fn wrapper_prod2(..., args: Drain<...>) -> Vec<A> { ... }
      fn action_prod2(...) -> A { ... }

    The cunning thing about this is that we don't have to change the way the parser works: it receives "action" functions that are wrappers, all of which have the same function signature (i.e. from the parser's perspective it's a bit like we wrote %actiontype ActionsKind). The wrapper functions then unpack the Drain, and call the "actual" action functions which contain user code. The actual action functions have their function's arguments and return types statically typed, so Rust statically guarantees that you can't, for example, mix your Classes and Imports. This does incur a mild additional overhead because of the ActionsKind enum (one extra machine word, at worst, per ActionType we're holding), but not enough to hugely worry us. And now we can write grammars which generate ASTs with ease!

    As a happy bonus, I realised that we can make arguments to action functions mutable (hence the mut arg1: Vec<A> above), which makes doing things like flattening lists a lot more ergonomic.
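    The wrapper pattern described above can be demonstrated standalone. This is a simplified sketch, not the code lrpar actually generates: a plain Vec<ActionsKind> stands in for the parser's Drain, and all names here are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct A;

// One variant per distinct rule action type in the grammar.
enum ActionsKind {
    VecA(Vec<A>),
    SingleA(A),
}

// The uniform signature the parser sees for every production.
type Wrapper = fn(Vec<ActionsKind>) -> ActionsKind;

// Wrapper for `S: A`: unpack the dynamically-typed argument, call the
// statically-typed action, repack the result.
fn wrapper_prod0(mut args: Vec<ActionsKind>) -> ActionsKind {
    let arg1 = match args.remove(0) {
        ActionsKind::SingleA(x) => x,
        _ => unreachable!(),
    };
    ActionsKind::VecA(action_prod0(arg1))
}

// The "actual" action: statically typed arguments and return value, so the
// user's code can never mix up rule types.
fn action_prod0(arg1: A) -> Vec<A> {
    vec![arg1]
}

fn main() {
    // The parser only ever holds values of the single `Wrapper` type.
    let w: Wrapper = wrapper_prod0;
    match w(vec![ActionsKind::SingleA(A)]) {
        ActionsKind::VecA(v) => assert_eq!(v, vec![A]),
        _ => unreachable!(),
    }
    println!("ok");
}
```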

    opened by ltratt 25
  • Tentatively add a $span pseudo-variable

    This allows you to tell how much input the current production has matched, which can be useful for giving better debugging information to users. Its type is (usize, usize) where these represent (start, end) offsets in the input.

    For example if you have a rule:

      Expr: Term '+' Expr { println!("{:?}", $span); } ;

    and input along the lines of "2+3", you will get the output "(0, 3)".

    Interestingly, users can mostly calculate this same information themselves (by inspecting tokens' start/end positions), except for epsilon rules, where there's no way of knowing where in the input we are. So this production can't be made to work sensibly except with $span:

      R: { println!("{:?}", $span); }

    This is a bit tentative, because I haven't used this enough yet to know if it's the right design: feedback is welcome! The major commit is the one with documentation in; the other commits are mostly shuffling a few things around.
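    What $span conceptually computes can be sketched in a few lines (illustrative only; prod_span is a hypothetical helper, not lrpar's API): a production's span runs from the start of its first matched symbol to the end of its last.

```rust
// A production's span: (start of first symbol, end of last symbol), as byte
// offsets into the input. Returns None for an epsilon production, which
// matched no symbols at all.
fn prod_span(symbol_spans: &[(usize, usize)]) -> Option<(usize, usize)> {
    match (symbol_spans.first(), symbol_spans.last()) {
        (Some(f), Some(l)) => Some((f.0, l.1)),
        // This is exactly the case users cannot recover by inspecting
        // tokens themselves, which is why $span is needed there.
        _ => None,
    }
}

fn main() {
    // Input "2+3": Term at (0, 1), '+' at (1, 2), Expr at (2, 3).
    assert_eq!(prod_span(&[(0, 1), (1, 2), (2, 3)]), Some((0, 3)));
    assert_eq!(prod_span(&[]), None);
    println!("ok");
}
```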

    opened by ltratt 23
  • More flexible input lifetime

    This is based on #174: it tries to decouple the lifetime of a lexer from its input. That PR is a work of near genius: I simply would not have imagined that the outcome it achieves is possible without the PR as a proof-of-existence. The only minor problem was that I couldn't work out how it achieved its effect. I therefore tried to simplify the PR, but didn't get very far. I then tried reimplementing it, and didn't get very far with that either.

    This PR is the result of me taking a different approach. First I simplified and unified the existing lifetimes in grmtools, because there were several inconsistencies, which I thought might be responsible for some of the pain in #174. Once that was done I could then add an explicit 'input lifetime, which I think achieves the same effect as #174. Certainly it is enough to allow this program to compile:

      fn main() {
          let lexerdef = t_l::lexerdef();
          let input = "a";
          let t = {
              let lexer = lexerdef.lexer(&input);
              let lx = lexer.iter().next().unwrap().unwrap();
              lexer.span_str(lx.span())
          };
          println!("{}", t);
      }
    where previously rustc would complain:

    error[E0597]: `lexer` does not live long enough
      --> src/
    9  |     let t = {
       |         - borrow later stored here
    12 |         lexer.span_str(lx.span())
       |         ^^^^^ borrowed value does not live long enough
    13 |     };
       |     - `lexer` dropped here while still borrowed
    error: aborting due to previous error

    However, I am not sure if this PR is able to handle all the same cases as #174. I'm hoping that @valarauca will be able to let me know if this solves the problem that led him to create #174.
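    The 'input-lifetime idea itself can be shown in miniature (a sketch with hypothetical types, not grmtools' real Lexer API): strings handed out by the lexer borrow from the input, not from the lexer, so they can outlive the lexer.

```rust
// A toy "lexer" that borrows its input for 'input.
struct Lexer<'input> {
    s: &'input str,
}

impl<'input> Lexer<'input> {
    // Note the return type is tied to 'input, not to &self, so the result
    // outlives the Lexer itself.
    fn span_str(&self, span: (usize, usize)) -> &'input str {
        &self.s[span.0..span.1]
    }
}

fn main() {
    let input = "a";
    let t = {
        let lexer = Lexer { s: input };
        lexer.span_str((0, 1))
    }; // `lexer` is dropped here, but `t` still borrows from `input`
    assert_eq!(t, "a");
    println!("ok");
}
```

    Had span_str returned a &str tied to &self instead, the block above would fail to compile with exactly the E0597 error quoted in the PR description.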

    opened by ltratt 21
  • Add badges linking to


    The badges are practical in that they:

    • make it obvious at first glance which components are contained in this project
    • identify the current version numbers
    • link to the documentation of the individual projects and their READMEs
    opened by pablosichert 20
  • Adding a %grammar-kind declaration?

    Before I try to come up with a patch, I figured it would be good to discuss this in an issue. I was considering potentially adding a declaration such as %grammar-kind Original(NoAction), etc.

    One of the problems with this is that we likely want to parse the value by just using Deserialize on the YaccGrammarKind; this would at least be the easiest way. But it brings about a few issues:

    1. cfgrammar has optional deserialization support, so if we deserialized that way, %grammar-kind would only work with the serde feature enabled. Alternatively, we could just implement this by hand instead of using serde?
    2. Some declarations depend upon a specific %grammar-kind; we may have to move some checks from parse_declarations to ast.complete_and_validate.

    But it could potentially reduce the number of places that YaccGrammarKind needs to be specified (the nimbleparse command line, etc.). So it seemed like it might be worth considering.

    opened by ratmice 0
  • Permit stack operations on start conditions

    In #318, start state logic was added for start states defined by name. In the POSIX lex standard, start states can also be used by numeric id.

    Q: Should this include support for expanding the target start-state logic to support incrementing and decrementing the current start state, as well as setting an explicit target?
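    The operations the question proposes can be sketched as follows. This is a hypothetical illustration, not lrlex's actual API: a stack of numeric start-condition ids supporting push/pop alongside set/increment/decrement of the current condition.

```rust
// A stack of numeric start-condition ids; the active condition is the top.
struct StartConditions {
    stack: Vec<usize>,
}

impl StartConditions {
    fn current(&self) -> usize {
        *self.stack.last().unwrap()
    }
    fn push(&mut self, state: usize) {
        self.stack.push(state);
    }
    fn pop(&mut self) {
        // Never pop the initial condition off the stack.
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }
    fn set(&mut self, state: usize) {
        *self.stack.last_mut().unwrap() = state;
    }
    fn inc(&mut self) {
        *self.stack.last_mut().unwrap() += 1;
    }
    fn dec(&mut self) {
        let top = self.stack.last_mut().unwrap();
        *top = top.saturating_sub(1);
    }
}

fn main() {
    let mut sc = StartConditions { stack: vec![0] }; // 0 = INITIAL
    sc.push(1);
    assert_eq!(sc.current(), 1);
    sc.pop();
    assert_eq!(sc.current(), 0);
    sc.set(2);
    sc.inc();
    assert_eq!(sc.current(), 3);
    sc.dec();
    assert_eq!(sc.current(), 2);
    println!("ok");
}
```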

    opened by SMartinScottLogic 13
  • Remove debug formatting in non-debug locations

    In a couple of places we use debug formatting in a non-debug location. This feels somewhat unsatisfactory, particularly as there are fewer guarantees about the stability of Debug output.

    opened by ltratt 2
  • Error span improvements

    In PR #299, which adds spans to various Error types, the Spans returned are based off of the existing offset data, from which we can derive a line & column. As it is, we currently always return a span where start == end, since that is just getting us to the desired semver ABI.

    1. SemVer-compatible changes (after we add Spans to Errors): after that PR we could include in the error more information from the parse functions in YaccParserError and LexBuildError. This may require some reorganization of the various private parse functions.
    2. Potential SemVer-incompatible changes (after we add Spans to Errors): YaccErrorKind and LexErrorKind could sometimes have useful additional spans; for instance, LexErrorKind::DuplicateName could have a span pointing to the first occurrence of the duplicate entry.
    • [ ] SemVer compatible improvements
    • [x] SemVer incompatible improvements
    opened by ratmice 0
  • Apparently infinite recursive rule

    One of the "fun" things about my project is running the parser on strange, half-edited/incomplete inputs. Here is one such case I encountered that way, and have minimized.

    Given the input character a, this will cause an infinite loop pushing to pstack. Adding a case like Some(i) if i == usize::from(stidx) + 1 => None to goto fixes it (i.e. when the return value of goto == prior).

    Filing this as a bug report rather than sending a PR, though, because I haven't yet tested it against valid parsers, or as of yet tried to work out whether this case can only and always lead to infinite recursion, or whether it ever actually comes up in a valid way.

    The lexer:

    a "a"
    [\t\n ] ;

    and the grammar:

    Start: Bar;
    Foo: "a" | ;
    Bar: Foo | Foo Bar;
    opened by ratmice 5
  • Expose more than one rule?

    Question / Feature Request: Is there any way to parse a specific rule as the starting parser? For example, if I have:

    %start Expr
    Expr -> ...;
    Int -> ...;

    I also want to be able to parse a string as Int, not just Expr.

    (I'm trying to port my parser from LALRPOP, which exposes a parser for any rule prefixed with the keyword pub, to lrpar, mainly because of the operator precedence feature.)

    opened by utkarshkukreti 3
Software Development Team
Parse BNF grammar definitions

bnf A library for parsing Backus–Naur form context-free grammars. What does a parsable BNF grammar look like? The following grammar from the Wikipedia

Shea Newton 188 Dec 26, 2022
LR(1) grammar parser of simple expression

LR(1) grammar analysis program. Experiment: write an LR(1) parser that performs syntax analysis of arithmetic expressions. The arithmetic expressions to be analysed are produced by the following grammar: E -> E+T | E-T | T; T -> T*F | T/F | F; F -> (E) | num. Design and implementation. Usage: run .\lr1-parser.exe

Gao Keyong 1 Nov 24, 2021
Pure, simple and elegant HTML parser and editor.

HTML Editor Pure, simple and elegant HTML parser and editor. Examples Parse HTML segment/document let document = parse("<!doctype html><html><head></h

Lomirus 16 Nov 8, 2022
A native Rust port of Google's robots.txt parser and matcher C++ library.

robotstxt A native Rust port of Google's robots.txt parser and matcher C++ library. Native Rust port, no third-part crate dependency Zero unsafe code

Folyd 72 Dec 11, 2022
JsonPath engine written in Rust. Webassembly and Javascript support too

jsonpath_lib A Rust implementation of JsonPath that also provides a similar API interface in WebAssembly and JavaScript. A JsonPath engine written in Rust.

Changseok Han 95 Dec 29, 2022
Parsing and inspecting Rust literals (particularly useful for proc macros)

litrs: parsing and inspecting Rust literals litrs offers functionality to parse Rust literals, i.e. tokens in the Rust programming language that repre

Lukas Kalbertodt 31 Dec 26, 2022
A Rust crate for RDF parsing and inferencing.

RDF-rs This crate provides the tools necessary to parse RDF graphs. It currently contains a full (with very few exceptions) Turtle parser that can par

null 2 May 29, 2022
Yet Another Parser library for Rust. A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing strings and slices.

Yap: Yet another (rust) parsing library A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing input.

James Wilson 117 Dec 14, 2022
🕑 A personal git log and MacJournal output parser, written in rust.

🕑 git log and MacJournal export parser A personal project, written in rust. WORK IN PROGRESS; NOT READY This repo consolidates daily activity from tw

Steven Black 4 Aug 17, 2022
Sqllogictest parser and runner in Rust.

Sqllogictest-rs Sqllogictest parser and runner in Rust. License Licensed under either of Apache License, Version 2.0 (LICENSE-APACHE or http://www.apa

Singularity Data Inc. 101 Dec 21, 2022
A library to display rich (Markdown) snippets and texts in a rust terminal application

A CLI utilities library leveraging Markdown to format terminal rendering, allowing separation of structure, data and skin. Based on crossterm so works

Canop 614 Dec 29, 2022
A CSS parser, transformer, and minifier written in Rust.

@parcel/css A CSS parser, transformer, and minifier written in Rust. Features Extremely fast – Parsing and minifying large files is completed in milli

Parcel 3.1k Jan 9, 2023
An IRC (RFC1459) parser and formatter, built in Rust.

ircparser An IRC (RFC1459) parser and formatter, built in Rust. ircparser should work on basically any Rust version, but the earliest version checked

Ethan Henderson 2 Oct 18, 2022
A WIP svelte parser written in rust. Designed with error recovery and reporting in mind

Svelte(rs) A WIP parser for svelte files that is designed with error recovery and reporting in mind. This is mostly a toy project for now, with some v

James Birtles 3 Apr 19, 2023
A rusty, dual-wielding Quake and Half-Life texture WAD parser.

Ogre   A rusty, dual-wielding Quake and Half-Life texture WAD parser ogre is a rust representation and nom parser for Quake and Half-Life WAD files. I

Josh Palmer 16 Dec 5, 2022
A modern dialogue executor and tree parser using YAML.

A modern dialogue executor and tree parser using YAML. This crate is for building(ex), importing/exporting(ex), and walking(ex) dialogue trees. convo

Spencer Imbleau 27 Aug 3, 2022
Org mode structural parser/emitter with an emphasis on modularity and avoiding edits unrelated to changes.

Introduction Org mode structural parser/emitter with an emphasis on modularity and avoiding edits unrelated to changes. The goal of this library is to

Alex Roper 4 Oct 7, 2022
Generate and parse UUIDs.

uuid Here's an example of a UUID: 67e55044-10b1-426f-9247-bb680e5fe0c8 A UUID is a unique 128-bit value, stored as 16 octets, and regularly formatted

Rust Uuid 754 Jan 6, 2023
Parser for Object files that define the geometry and other properties for objects in Wavefront's Advanced Visualizer.

A Rust library to load a Blender OBJ file into a Rust NDArray. cargo run test\t10k-images.idx3-ubyte A png file will be generated for the fi

Nasser Eddine Idirene 1 Jan 3, 2022