A generator for high-performance Pest parsers, bringing your grammar to the next level

Last update: Jan 13, 2023

Overview

Faster-Pest

Welcome to faster-pest, a high-performance code generator for Parsing Expression Grammars. faster-pest is an unofficial proc-macro providing next-level implementations of Pest parsers. It uses low-level optimization tricks under the hood to generate highly optimized code that minimizes the overhead of the AST recognition process, resulting in much faster parsing.

faster-pest is compatible with the standard Pest syntax, so you can easily switch to it without having to change your existing grammar.

With faster-pest, you can enjoy the convenience and expressiveness of Pest while getting the performance of a low-level parsing library. Give it a try and experience the difference for yourself!

The parsing approach used under the hood has nothing in common with the original pest code. To be honest, I never looked at the pest codebase, because it was easier to start from scratch. There is still one thing that was not reimplemented: the parsing of the actual pest grammar. However, this might not last. I need to extend the grammar to enable more advanced tricks, like making it possible to define complex rules with Rust code and import them in a pest grammar.

Benchmarks

Only a week after its creation, faster-pest already parses JSON at 705% the speed of Pest and 137% the speed of Nom. This places faster-pest on par with serde_json. faster-pest allows you to approach limits that only SIMD-powered parsers can overcome.

Examples

See the example folder for examples.

It contains two examples from the Pest book: csv and ini.
These use the exact same code as in the Pest book, showing that faster-pest is a drop-in replacement for Pest.

If you don't have a legacy Pest codebase, it is recommended not to use the pest compatibility layer. See the other two examples instead: json and po.
These are the most efficient and idiomatic uses of faster-pest. They work rather similarly to the pest compatibility layer, but their implementation is nicer.

Limitations

faster-pest is still in its early stages of development, so it has some limitations. Here are the most important ones:

Limited syntax support (Missing: stack, insens, pospred)
The tokens API of Pest is not supported (you probably didn't use that)
Error printing is made for Linux
Errors can be obscure when a repetition ends prematurely
Not everything has been tested and there could be incorrect parsing behavior

Optimization tricks used (for curious people)

faster-pest generates two versions of every parsing component that exists. One version has error support, the other doesn't. There are many places where error support is not needed because the error would be discarded right away (like a failing branch). faster-pest will only retrieve errors if parsing completely fails, so any valid input will only result in calls to completely error-unaware code. From the developer's point of view, this optimization is completely transparent.
Several rules are sometimes merged into a single rule where pest would have kept them separate
Repetitions of simple character rules use iterator adapters instead of loops
Every unnecessary check is bypassed
Allocations are made in bulk which makes them fairly sporadic
Code is so small it is likely to get inlined often by the compiler
Parsing itself is entirely zero-copy
Iteration over parsed identifiers is almost free
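
To make the first trick more concrete, here is a minimal hand-written sketch of the idea. It is not faster-pest's actual generated code: the grammar (a comma-separated list of digit runs) and every name in it (`ParseError`, `digits_quick`, `digits_checked`, `list_quick`, `list_checked`, `parse_list`) are hypothetical and chosen only for illustration. It also shows a repetition of a simple character rule written with an iterator adapter, and zero-copy output (the results are slices borrowed from the input).

```rust
/// Hypothetical error type, only built by the error-aware code path.
#[derive(Debug, PartialEq)]
struct ParseError {
    position: usize,
    expected: &'static str,
}

/// Error-unaware rule `digits = ASCII_DIGIT+`.
/// Returns the matched slice and the remaining input, or `None` on failure.
fn digits_quick(input: &str) -> Option<(&str, &str)> {
    // Repetition of a simple character rule via an iterator adapter.
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    // Zero-copy: both halves borrow from `input`. The split index is a valid
    // char boundary because every counted byte is an ASCII digit.
    Some(input.split_at(len))
}

/// Error-aware twin of `digits_quick`; `full` is only used to compute a position.
fn digits_checked<'i>(full: &str, rest: &'i str) -> Result<(&'i str, &'i str), ParseError> {
    digits_quick(rest).ok_or(ParseError {
        position: full.len() - rest.len(),
        expected: "digit",
    })
}

/// Error-unaware rule `list = digits ("," digits)*`, anchored at end of input.
fn list_quick(input: &str) -> Option<Vec<&str>> {
    let (first, mut rest) = digits_quick(input)?;
    let mut items = vec![first];
    while let Some(after_comma) = rest.strip_prefix(',') {
        let (item, tail) = digits_quick(after_comma)?;
        items.push(item);
        rest = tail;
    }
    if rest.is_empty() { Some(items) } else { None }
}

/// Error-aware twin of `list_quick`, with the same control flow plus error details.
fn list_checked(input: &str) -> Result<Vec<&str>, ParseError> {
    let (first, mut rest) = digits_checked(input, input)?;
    let mut items = vec![first];
    while let Some(after_comma) = rest.strip_prefix(',') {
        let (item, tail) = digits_checked(input, after_comma)?;
        items.push(item);
        rest = tail;
    }
    if rest.is_empty() {
        Ok(items)
    } else {
        Err(ParseError {
            position: input.len() - rest.len(),
            expected: "',' or end of input",
        })
    }
}

/// Entry point: valid input only ever runs the error-unaware code.
/// The error-aware version runs a second time only when parsing has failed.
fn parse_list(input: &str) -> Result<Vec<&str>, ParseError> {
    match list_quick(input) {
        Some(items) => Ok(items),
        None => list_checked(input),
    }
}
```

With this sketch, `parse_list("1,22,333")` succeeds without ever constructing a `ParseError`, while `parse_list("1,,3")` fails on the quick path and is parsed again by `list_checked` to report the position of the missing digit. The cost of the second pass is paid only for invalid input, which is the trade-off described above.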

Licence: GPL-3.0
