A typed parser generator embedded in Rust code for Parsing Expression Grammars

Oak

Compiled on the nightly channel of Rust. Use rustup to manage compiler channels. You can download and set up the exact compiler version used by the project with rustup override add nightly-2021-01-06.

Please consult the Oak manual.

Features

  • Easy to install: PEG grammar description as a Rust procedural macro.
  • User-friendly: most of the types are automatically inferred from the parsing expressions.
  • Safe: Well-formedness analysis guarantees termination.
  • Modular: External parser rules can be called at any time.
  • Fast: Generation of both recognizer and parser functions for each rule.
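To illustrate the last point, here is a hand-written sketch of the two kinds of functions one could expect for a rule such as number = ["0-9"]+. This is not code generated by Oak, and the names recognize_number and parse_number are hypothetical. The recognizer only reports where the match ends. The parser also builds a value. Presumably the recognizer is used wherever the value would be discarded anyway; that motivation is our assumption.

```rust
/// Recognizer: returns the position just after the longest run of ASCII
/// digits starting at `pos`, or `None` if there is no digit there.
/// It never builds a value.
fn recognize_number(input: &str, pos: usize) -> Option<usize> {
    let bytes = input.as_bytes();
    let mut end = pos;
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    if end > pos { Some(end) } else { None }
}

/// Parser: same matching logic, but it also converts the matched text into
/// a value. Returns `None` on failure (including numeric overflow).
fn parse_number(input: &str, pos: usize) -> Option<(u64, usize)> {
    let end = recognize_number(input, pos)?;
    let value = input[pos..end].parse::<u64>().ok()?;
    Some((value, end))
}
```

The parser is written on top of the recognizer so that both always agree on what the rule matches.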

Build local documentation

You might want to build the manual or the code documentation from the repository, either because you need it to be synchronized with a specific version of Oak or simply for offline use. Here is how to do it.

Build the manual

You need the utility mdbook:

cargo install mdbook

Once it is installed, go inside oak/doc and execute mdbook build -o. The manual is generated in a local folder named book and opened directly in your browser.

Build the code documentation

As a user of Oak, you will be interested in the runtime documentation:

cd oak/runtime
cargo doc

The documentation is then available in oak/runtime/target/doc.

To build the internal documentation of Oak, you can type this command at the root of the project:

cd oak
rustdoc --document-private-items --output=target/dev-doc src/liboak/lib.rs

The documentation will be available in oak/target/dev-doc. It is useful for working on Oak :-)

Comments
  • Well-formed grammar analysis

    Well-formed grammar analysis

    It summarizes what we called the "non-consuming loop analysis" and the left-recursion analysis. See Ford's article, section "Well-Formed Grammars". It closes #22, #23 and #24.

    static-analysis 
    opened by ptal 8
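The non-consuming loop part of this analysis can be sketched as follows. The sketch uses a toy expression type, not Oak's internal AST. It has two steps:

- Compute, by fixpoint iteration, which rules can succeed without consuming input.
- Reject any e* whose body has that property.

Left recursion is not covered here. The nullable test is conservative for ordered choice.

```rust
use std::collections::HashMap;

/// Toy PEG expression type (hypothetical, not Oak's internal AST).
enum Expr {
    Lit(&'static str),  // string literal
    Rule(&'static str), // call to a named rule
    Seq(Vec<Expr>),     // e1 e2 ...
    Choice(Vec<Expr>),  // e1 / e2 / ...
    Star(Box<Expr>),    // e*
    Not(Box<Expr>),     // !e
}

type Grammar = HashMap<&'static str, Expr>;

/// Can `e` succeed without consuming input? `rules` holds the current
/// answer for each named rule.
fn nullable(e: &Expr, rules: &HashMap<&'static str, bool>) -> bool {
    match e {
        Expr::Lit(s) => s.is_empty(),
        Expr::Rule(r) => rules.get(r).copied().unwrap_or(false),
        Expr::Seq(es) => es.iter().all(|e| nullable(e, rules)),
        Expr::Choice(es) => es.iter().any(|e| nullable(e, rules)),
        // `e*` always succeeds; `!e` succeeds without consuming when `e` fails.
        Expr::Star(_) | Expr::Not(_) => true,
    }
}

/// Fixpoint: start with every rule non-nullable and iterate until stable.
fn nullable_rules(g: &Grammar) -> HashMap<&'static str, bool> {
    let mut table: HashMap<&'static str, bool> = g.keys().map(|k| (*k, false)).collect();
    loop {
        let mut changed = false;
        for (name, body) in g {
            if !table[name] && nullable(body, &table) {
                table.insert(*name, true);
                changed = true;
            }
        }
        if !changed {
            return table;
        }
    }
}

/// True if `e` contains some `x*` where `x` can succeed without consuming
/// input, i.e. a loop that could run forever.
fn has_non_consuming_loop(e: &Expr, table: &HashMap<&'static str, bool>) -> bool {
    match e {
        Expr::Star(inner) => nullable(inner, table) || has_non_consuming_loop(inner, table),
        Expr::Not(inner) => has_non_consuming_loop(inner, table),
        Expr::Seq(es) | Expr::Choice(es) => es.iter().any(|e| has_non_consuming_loop(e, table)),
        Expr::Lit(_) | Expr::Rule(_) => false,
    }
}
```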
  • Full tuple unpacking

    Full tuple unpacking

    The following example produces the type ((String, PExpr), PExpr); we would like (String, PExpr, PExpr) instead.

    let_expr = let_kw let_binding in_kw expression
    let_binding = identifier bind_op expression
    
    AST-generation typing 
    opened by ptal 6
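Full unpacking amounts to splicing nested tuple types into a single flat tuple. The sketch below assumes that let_kw, in_kw and bind_op have the unit type. Under that assumption, let_binding gives (String, PExpr) and the whole sequence gives ((String, PExpr), PExpr). The Ty type is a hypothetical stand-in for Oak's type representation.

```rust
/// Hypothetical representation of the type of a parsing expression.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Unit,                // () : contributes nothing to a tuple
    Named(&'static str), // e.g. String, PExpr
    Tuple(Vec<Ty>),
}

/// Recursively appends the non-unit leaves of `ty` to `out`.
fn push_flat(ty: Ty, out: &mut Vec<Ty>) {
    match ty {
        Ty::Unit => {}
        Ty::Named(name) => out.push(Ty::Named(name)),
        Ty::Tuple(items) => {
            for t in items {
                push_flat(t, out);
            }
        }
    }
}

/// Flattens nested tuples: ((A, B), C) becomes (A, B, C); a single
/// remaining component is returned unwrapped, none at all gives ().
fn flatten(ty: Ty) -> Ty {
    let mut out = Vec::new();
    push_flat(ty, &mut out);
    match out.len() {
        0 => Ty::Unit,
        1 => out.pop().unwrap(),
        _ => Ty::Tuple(out),
    }
}
```

One caveat: flattening every nested tuple would also split a tuple that a semantic action returns on purpose. A real design would probably only splice tuples produced by sequences.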
  • Better API interface of top-level functions and `ParseResult`

    Better API interface of top-level functions and `ParseResult`

    Instead of:

     assert_eq!(calculator::parse_expression("7+(7*2)", 0).unwrap().data, 21);
    

    We should propose a version without the 0 offset parameter:

    assert_eq!(calculator::parse_expression("7+(7*2)").unwrap().data, 21);
    
    code-generation 
    opened by ptal 6
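The proposed API can be sketched as a thin wrapper that keeps the offset-taking function and adds an entry point starting at 0. The names below are hypothetical: ParseOutput stands in for the result type, and parse_natural_at stands in for a generated rule. Since no calculator grammar is available here, the stand-in rule simply parses a natural number.

```rust
/// Hypothetical stand-in for the result type: the parsed value plus the
/// offset just after it (mirrors the `.unwrap().data` access above).
#[derive(Debug, PartialEq)]
pub struct ParseOutput<T> {
    pub data: T,
    pub offset: usize,
}

/// Offset-taking entry point, shaped like the current API.
/// Parses a natural number starting at `offset`.
pub fn parse_natural_at(input: &str, offset: usize) -> Result<ParseOutput<u64>, String> {
    let rest = input
        .get(offset..)
        .ok_or_else(|| format!("offset {} is out of bounds", offset))?;
    let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return Err(format!("expected a digit at offset {}", offset));
    }
    let data = rest[..len].parse::<u64>().map_err(|e| e.to_string())?;
    Ok(ParseOutput { data, offset: offset + len })
}

/// Proposed convenience entry point: no offset parameter, parsing starts at 0.
pub fn parse_natural(input: &str) -> Result<ParseOutput<u64>, String> {
    parse_natural_at(input, 0)
}
```

Keeping the offset-taking version alongside the new one would preserve the ability to resume parsing in the middle of an input.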
  • Seemingly odd

    Seemingly odd "error: Type mismatch between branches of the choice operator."

    I just started using oak. Probably I'm doing something wrong, but the following struck me as odd:

    Code:

    #![feature(plugin)]
    #![plugin(oak)]
    
    extern crate oak_runtime;
    use oak_runtime::*;
    
    grammar! my_grammar{
      quotation_mark = dquote / ("%22" > char_quotation_mark)
      dquote = ["\""] // %x22
    
      fn char_quotation_mark() -> char { '\"' }
    }
    
    fn main() {
      let state = my_grammar::parse_quotation_mark("%22".into_state());
      println!("{:?}", state.unwrap_data());
    }
    

    Output from running cargo run:

       Compiling oak_test v0.1.0 (file:///home/tstorch/src/oak_test)
    error: Type mismatch between branches of the choice operator.
     --> src/main.rs:8:20
      |
    8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
      |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
    note: char
     --> src/main.rs:8:20
      |
    8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
      |                    ^^^^^^
    note: char
     --> src/main.rs:8:36
      |
    8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
      |                                    ^^^^^^^^^^^^^^^^^^^^^^
    
    error: aborting due to previous error
    
    error: Could not compile `oak_test`.
    

    Both sides are char, which leaves me a bit puzzled. Is this a bug or intentional behavior?

    cargo and rust version:

    > rustc --version
    rustc 1.19.0-nightly (5b13bff52 2017-05-23)
    > cargo --version
    cargo 0.20.0-nightly (397359840 2017-05-18)
    

    Update: If I do this it works "just fine":

    #![feature(plugin)]
    #![plugin(oak)]
    
    extern crate oak_runtime;
    use oak_runtime::*;
    
    grammar! sum{
      quotation_mark = dquote / dquote_encoded
      dquote = "\"" > char_quotation_mark
      dquote_encoded = "%22" > char_quotation_mark
    
      fn char_quotation_mark() -> char { '\"' }
    }
    
    fn main() {
      let state = sum::parse_quotation_mark("%22".into_state());
      println!("{:?}", state.unwrap_data());
    }
    

    Somehow, the bracket notation does not work well with functions returning a char... My current workaround is something like this:

    a_to_f = ["A-F"] > char_id
    fn char_id(c: char) -> char { c }
    
    opened by tstorch 5
  • Type analysis of sum branches

    Type analysis of sum branches

    For example, r1 / r2 in a typed context cannot be typed because we don't have a name for the sum type. However, r1 > make_r1 / r2 > make_r2 can be typed, provided that the return types of make_r1 and make_r2 are the same.

    static-analysis typing 
    opened by ptal 5
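The rule above (no named sum type, so all branches must share one type) can be sketched as a small check. Each branch is given as its source text and the name of its type. This representation is a hypothetical simplification, not Oak's typing code.

```rust
/// Types a choice `e1 / e2 / ...`. Each branch is (source text, type name).
/// Without a named sum type, the choice is only typeable when every branch
/// has the same type, which is then the type of the whole choice.
fn check_choice<'a>(branches: &[(&str, &'a str)]) -> Result<&'a str, String> {
    let (first_expr, first_ty) = *branches.first().ok_or("empty choice")?;
    for &(expr, ty) in &branches[1..] {
        if ty != first_ty {
            return Err(format!(
                "type mismatch between branches: `{}` has type {} but `{}` has type {}",
                first_expr, first_ty, expr, ty
            ));
        }
    }
    Ok(first_ty)
}
```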
  • Testing Grammar

    Testing Grammar

    Hello, this is very cool, though I am not sure I understand it correctly. Does it mean that, with the help of rust.peg, one could generate valid input sequences for a given grammar? For example, given a grammar for a calculator, could it randomly generate input strings like '2+3'...?

    opened by Moondee 4
  • Parse string literal to Vec<char>

    Parse string literal to Vec<char>

    Why is a string literal parsed to -> (^)? It makes no sense. If I want to use a string literal, I need to look ahead with &"literal", then get a tuple of chars of the same length and write a function to convert that tuple of chars to a Vec, a String or whatever. But if a string literal returned a value, the value could easily be dropped with -> (^); alternatively, a new operator could be added that looks ahead and consumes the value.

    I tried to implement it, but I don't really understand how oak works.

    Sorry for my bad English.

    opened by jeizsm 3
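A sketch of the behaviour requested in this issue: match a string literal and keep its characters instead of discarding them. literal_chars is a hypothetical helper written in plain Rust. It is not an Oak operator.

```rust
/// Matches `lit` at the start of `input`. On success, returns the literal's
/// characters together with the remaining input; otherwise `None`.
fn literal_chars<'a>(input: &'a str, lit: &str) -> Option<(Vec<char>, &'a str)> {
    let rest = input.strip_prefix(lit)?;
    Some((lit.chars().collect(), rest))
}
```

Returning the value by default would still let a grammar discard it with (^), as the issue suggests.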
  • Allow using external functions as semantic actions

    Allow using external functions as semantic actions

    When you write semantic actions (such as e > f) in the grammar, they must actually be defined (not only declared) inside the grammar! macro. This becomes uncomfortable when the grammar is large (for example, a grammar for a whole programming language may require several dozen actions). So it is currently impossible to write something like this:

    // in actions/part1.rs
    pub fn action1() {
        // ...
    }
    // in actions/part2.rs
    pub fn action2() {
        // ...
    }
    
    // in grammar
    grammar! test {
        use super::actions::part1::*;
        use super::actions::part2::*;
        // ...
        some_test = e1 > action1 / e2 > action2
        // ...
    }
    

    So, for large grammars, allowing the use of functions from other modules would be useful.

    opened by trolley813 3
  • Tree pattern matching

    Tree pattern matching

    Extend the parser with "tree-shape" atoms. This would allow pattern matching on enum types and, for example, working directly with the Rust compiler's lexer. Several notes:

    • We require rules and Rust functions (rule names and semantic actions) to start with a lower-case letter to distinguish them from tree-shaped data.
    • Tree-shaped data structures start with an upper-case letter. This would also work in semantic actions, as suggested in #53.
    • ~~We leave the type checking of the branches of the choice combinator (e1 / e2, T1 must equal T2) to the Rust compiler. It closes #73.~~ In the end, we chose to check the branches.
    • We allow atom identifiers to be parsed by the host-language rule. The purpose is to allow identifiers with a path, such as mod::BinOp(mod2::Plus). It closes #54.
    • We require the function parsing these "full identifiers" to return the "core identifier", such as BinOp for mod::BinOp, so we can distinguish functions/rules from tree data.

    Some references on this topic of tree-pattern matching with PEG:

    • Warth, Alessandro. “Experimenting with Programming Languages,” 2009.
    • Warth, Alessandro, and Ian Piumarta. “OMeta: An Object-Oriented Language for Pattern Matching.” In Proceedings of the 2007 Symposium on Dynamic Languages, 11–19. ACM, 2007.
    grammar AST-generation parsing postponed 
    opened by ptal 3
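The naming convention and the "core identifier" requirement from the notes above can be sketched as a small classification routine. core_identifier, classify and AtomKind are hypothetical names. The sketch assumes paths arrive as plain strings separated by ::.

```rust
/// What an atom refers to, according to the case of its core identifier.
#[derive(Debug, PartialEq)]
enum AtomKind {
    RuleOrAction, // lower-case initial: a rule name or a semantic action
    TreeData,     // upper-case initial: tree-shaped data such as an enum variant
}

/// Returns the "core identifier" of a path, e.g. `BinOp` for `mod::BinOp`.
fn core_identifier(path: &str) -> &str {
    // `rsplit` always yields at least one piece, so this never falls back.
    path.rsplit("::").next().unwrap_or(path)
}

/// Classifies an atom by the first character of its core identifier;
/// returns `None` when the initial is neither lower- nor upper-case (e.g. `_`).
fn classify(path: &str) -> Option<AtomKind> {
    let first = core_identifier(path).chars().next()?;
    if first.is_lowercase() {
        Some(AtomKind::RuleOrAction)
    } else if first.is_uppercase() {
        Some(AtomKind::TreeData)
    } else {
        None
    }
}
```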
  • Update to use rust nightly-2017-04-26

    Update to use rust nightly-2017-04-26

    It's my first time in the internals of Rust, so I may have made a mistake here, but the code compiles and all the tests pass. The only thing I'm really unsure about is whether I'm using the correct constant for the ctxt member of the syntax::codemap::Span struct, but NO_EXPANSION seems the safest choice for now.

    opened by jasonl 2
  • Oak causes Rust to build any project including "oak_runtime" as dynamically linked

    Oak causes Rust to build any project including "oak_runtime" as dynamically linked

    TL;DR: Because oak_runtime includes extern crate syntax but is a runtime crate rather than a compiler plugin, rustc produces a dynamically linked executable instead of a statically linked one, which makes distribution difficult or impossible.

    According to Alex Crichton at https://github.com/rust-lang/rust/issues/32996, it is intended behaviour for rustc to switch to dynamic linking when syntax is referred to in a project. The reason is that the rlib for that crate is not shipped with the Rust standard library distribution, only a .so library, presumably on the assumption that only the compiler needs to link dynamically to that library when it loads a compiler plugin.

    However, because of how Rust resolves its linking strategy (as detailed in the comments of that issue), any final library referring to it causes Rust to link all libraries of the produced executable dynamically.

    This becomes a problem as it precludes the use of the x86_64-unknown-linux-musl target, which is useful for packaging a Rust executable with the very lightweight Alpine Linux in a Docker container for deployment. Even on a full distribution like Ubuntu, the produced executable won't run without the dynamic libraries of the Rust toolchain, again making it almost impossible to distribute. In effect, any Rust program that includes oak_runtime can only be executed with cargo run, which is not suitable for production systems.

    It seems the commit which introduced this issue is https://github.com/ptal/oak/commit/41e355feacd049cab6a525210f06eebf9d73633a

    opened by jasonl 2
  • Allow stable compilation

    Allow stable compilation

    I'd really like to use oak in Differential Datalog, but unfortunately it requires stable Rust. Allowing the nightly features to be toggled with a Cargo feature, or removing them entirely to allow stable compilation, would be really amazing.

    opened by Kixiron 1
  • Token parsing

    Token parsing

    Initially, #80 was opened to be able to parse directly on the Rust token stream and, taking advantage of techniques already used in OMeta, to parse arbitrary tree structures. However, we are in a typed context and it seems more difficult than I first thought. Similarly to #92, we propose to use string literals to parse enum variants.

    Design

    • Provide a stream structure encapsulating the Rust parser; it will also be useful for interacting with external parsers.
    • Implement the trait ConsumePrefix such that it consumes one or more tokens depending on the prefix.
    • A literal, for example "=", will correspond to the token Eq. If the literal cannot be mapped, an assertion will fail at runtime. General parsing on trees, as proposed by #80, would avoid this runtime error; however, for now, we believe such errors are easy enough to find and fix.
    • Tokens containing data (Ident(..), Literal(..), Lifetime(..), ...) must be parsed with an external parser if you want to extract the contained data. Otherwise, the token data will be compared against the string literal. Note that Ident("a_variable") does not match "a_var" "iable" but only "a_variable".
    runtime-lib 
    opened by ptal 0
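The design above can be sketched over a toy token type. Every name here is hypothetical. This includes the local ConsumePrefix trait, whose signature is an assumption and not necessarily that of the trait with the same name in oak_runtime. Literal pieces map to tokens, unmappable pieces panic (the runtime assert from the design), and identifiers are compared as whole tokens.

```rust
/// A few Rust-like tokens (hypothetical subset of a real token type).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Eq,
    EqEq,
    Semi,
    Ident(String),
}

/// Maps a grammar literal such as "=" or "a_variable =" to the tokens it
/// stands for. Unmappable pieces panic, mirroring the runtime assert.
fn literal_to_tokens(lit: &str) -> Vec<Token> {
    lit.split_whitespace()
        .map(|piece| match piece {
            "=" => Token::Eq,
            "==" => Token::EqEq,
            ";" => Token::Semi,
            p if p.chars().all(|c| c.is_alphanumeric() || c == '_') => Token::Ident(p.to_string()),
            p => panic!("literal piece `{}` cannot be mapped to a token", p),
        })
        .collect()
}

/// Assumed signature for this sketch only.
trait ConsumePrefix {
    fn consume_prefix(&mut self, prefix: &str) -> bool;
}

/// Stand-in for a stream wrapping the Rust parser's tokens.
struct TokenStream {
    tokens: Vec<Token>,
    pos: usize,
}

impl ConsumePrefix for TokenStream {
    /// Consumes one or more tokens if they match the literal exactly.
    fn consume_prefix(&mut self, prefix: &str) -> bool {
        let expected = literal_to_tokens(prefix);
        let end = self.pos + expected.len();
        // Whole-token comparison: Ident("a_variable") matches "a_variable" only.
        if end <= self.tokens.len() && self.tokens[self.pos..end] == expected[..] {
            self.pos = end;
            true
        } else {
            false
        }
    }
}
```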
  • List of expressions

    List of expressions

    A list of expressions is defined by e % sep, which is equivalent to e (sep e)* in terms of parsing but not in terms of types. Indeed, the first expression has the type Vec<E>, while the second has the type (E, Vec<E>), or (E, Vec<(Sep, E)>) if the separator has a type other than () or (^).

    • e % sep has the type Vec<E> and sep must have the type () (or (^)) or a typing error should be exposed.
    grammar code-generation typing 
    opened by ptal 0
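The parsing side of e % sep can be sketched as a small combinator-style function. Its typing matches the proposal: the separator parser yields no value, so the result is just Vec<E>. sep_by, digit and comma are hypothetical helpers written with plain closures over &str, not Oak-generated code.

```rust
/// Parses `item (sep item)*` and returns only the items. The separator
/// parser yields no value (just the remaining input), mirroring the rule
/// that `sep` must have type `()` for `e % sep` to have type `Vec<E>`.
fn sep_by<'a, T>(
    input: &'a str,
    item: impl Fn(&'a str) -> Option<(T, &'a str)>,
    sep: impl Fn(&'a str) -> Option<&'a str>,
) -> Option<(Vec<T>, &'a str)> {
    let (first, mut rest) = item(input)?;
    let mut items = vec![first];
    // Each `(sep item)` repetition must match completely; otherwise stop
    // and leave the input just before the separator.
    while let Some(after_sep) = sep(rest) {
        match item(after_sep) {
            Some((next, after_item)) => {
                items.push(next);
                rest = after_item;
            }
            None => break,
        }
    }
    Some((items, rest))
}

/// Example item parser: a single decimal digit.
fn digit(s: &str) -> Option<(u32, &str)> {
    let c = s.chars().next()?;
    Some((c.to_digit(10)?, &s[c.len_utf8()..]))
}

/// Example separator parser: a comma, which produces no value.
fn comma(s: &str) -> Option<&str> {
    s.strip_prefix(',')
}
```

For example, sep_by("1,2,3", digit, comma) yields the digits 1, 2 and 3 with no trace of the commas.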
  • Wrong error reporting for not syntactic predicate

    Wrong error reporting for not syntactic predicate

    For example take the grammar

    predicate = (!"b" .)+ 
    

    with the input b, it will output the wrong message:

    unexpected `b`, expecting .
    

    Other examples can be found as well. This happens because, when logging errors, we do not know that we are inside a not predicate.

    error-reporting 
    opened by ptal 1
  • Error reporting system

    Error reporting system

    Instead of building errors in different ways all over the code, we take the approach of implementing a file error.rs in each module (more or less as done in rustc). We use an error enum such as:

    enum TypingError {
      TypeMismatch(TypeMismatchInfo),
      ...
    }
    
    struct TypeMismatchInfo {
      span: Span,
      ...
    }
    

    Each error has a struct containing the relevant information for printing the error later. An error enum must implement a trait to retrieve an error code:

    trait ErrorCode {
      fn error_code(&self) -> String;
    }
    

    It returns the name of the variant, here TypeMismatch. It is used in the error attribute (#29).

    Each error structure must implement a trait for printing the error:

    trait DisplayError {
      fn display_error(&self, cx: &ExtCtxt);
    }
    

    This is flexible enough to allow different kinds of printing, even JSON output or in-depth explanations (as with --explain in rustc).

    error-reporting refactoring 
    opened by ptal 1
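A self-contained sketch of the proposed scheme follows. Two parts are our assumptions. The compiler's Span is replaced by a hypothetical line/column struct. DisplayError returns a String instead of printing through the compiler-internal ExtCtxt. The expected and found fields are also illustrative.

```rust
/// Hypothetical stand-in for the compiler's `Span`.
#[derive(Debug, Clone, Copy)]
struct Span {
    line: usize,
    column: usize,
}

/// Relevant information for printing a type mismatch later.
struct TypeMismatchInfo {
    span: Span,
    expected: String,
    found: String,
}

enum TypingError {
    TypeMismatch(TypeMismatchInfo),
}

trait ErrorCode {
    fn error_code(&self) -> String;
}

/// The error code is the name of the variant.
impl ErrorCode for TypingError {
    fn error_code(&self) -> String {
        match self {
            TypingError::TypeMismatch(_) => "TypeMismatch".to_string(),
        }
    }
}

/// Variant of the proposed trait: renders to a String instead of an ExtCtxt.
trait DisplayError {
    fn display_error(&self) -> String;
}

impl DisplayError for TypeMismatchInfo {
    fn display_error(&self) -> String {
        format!(
            "{}:{}: type mismatch: expected `{}`, found `{}`",
            self.span.line, self.span.column, self.expected, self.found
        )
    }
}
```

Returning a String keeps rendering separate from output, so the same error could later be emitted as plain text or JSON.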
Owner
Pierre Talbot
Yet Another Parser library for Rust. A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing strings and slices.

Yap: Yet another (rust) parsing library A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing input.

James Wilson 117 Dec 14, 2022
LR(1) grammar parser of simple expression

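LR(1) syntax analyzer. Lab task: write an LR(1) syntax analysis program that parses arithmetic expressions generated by the following grammar: E -> E+T | E-T | T, T -> T*F | T/F | F, F -> (E) | num. Program design and implementation. Usage: run .\lr1-parser.exe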

Gao Keyong 1 Nov 24, 2021
Extensible inline parser engine, the backend parsing engine for Lavendeux.

Lavendeux Parser - Extensible inline parser engine lavendeux-parser is an extensible parsing engine for mathematical expressions. It supports variable

Richard Carson 10 Nov 3, 2022
A parser combinator for parsing &[Token].

PickTok A parser combinator like nom but specialized in parsing &[Token]. It has similar combinators as nom, but also provides convenient parser gener

Mikuto Matsuo 6 Feb 24, 2023
LR(1) parser generator for Rust

LALRPOP LALRPOP is a Rust parser generator framework with usability as its primary goal. You should be able to write compact, DRY, readable grammars.

null 2.4k Jan 7, 2023
An LR parser generator, implemented as a proc macro

parsegen parsegen is an LR parser generator, similar to happy, ocamlyacc, and lalrpop. It currently generates canonical LR(1) parsers, but LALR(1) and

Ömer Sinan Ağacan 5 Feb 28, 2022
Website for Microformats Rust parser (using 'microformats-parser'/'mf2')

Microformats 5 Jul 19, 2022
This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCSS AST.

CSS(less like) parser written in rust (WIP) This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCS

Huang Liuhaoran 21 Aug 23, 2022
A Rust library for zero-allocation parsing of binary data.

Zero A Rust library for zero-allocation parsing of binary data. Requires Rust version 1.6 or later (requires stable libcore for no_std). See docs for

Nick Cameron 45 Nov 27, 2022
Parsing and inspecting Rust literals (particularly useful for proc macros)

litrs: parsing and inspecting Rust literals litrs offers functionality to parse Rust literals, i.e. tokens in the Rust programming language that repre

Lukas Kalbertodt 31 Dec 26, 2022
Rust library for parsing configuration files

configster Rust library for parsing configuration files Config file format The 'option' can be any string with no whitespace. arbitrary_option = false

The Impossible Astronaut 19 Jan 5, 2022
A Rust crate for RDF parsing and inferencing.

RDF-rs This crate provides the tools necessary to parse RDF graphs. It currently contains a full (with very few exceptions) Turtle parser that can par

null 2 May 29, 2022
rbdt is a python library (written in rust) for parsing robots.txt files for large scale batch processing.

rbdt rbdt is a work in progress, currently being extracted out of another (private) project for the purpose of open sourcing and better so

Knuckleheads' Club 0 Nov 9, 2021
A Rust crate for hassle-free Corosync's configuration file parsing

corosync-config-parser A Rust crate for hassle-free Corosync's configuration file parsing. Inspired by Kilobyte22/config-parser. Usage extern crate co

Alessio Biancalana 2 Jun 10, 2022
This crate provides fontconfig file parsing, but does not yet support all features

null 4 Dec 27, 2022
A native Rust port of Google's robots.txt parser and matcher C++ library.

robotstxt A native Rust port of Google's robots.txt parser and matcher C++ library. Native Rust port, no third-part crate dependency Zero unsafe code

Folyd 72 Dec 11, 2022
Rust parser combinator framework

nom, eating data byte by byte nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromi

Geoffroy Couprie 7.6k Jan 7, 2023
A fast monadic-style parser combinator designed to work on stable Rust.

Chomp Chomp is a fast monadic-style parser combinator library designed to work on stable Rust. It was written as the culmination of the experiments de

Martin Wernstål 228 Oct 31, 2022