Provably optimal zero-copy parsers using nondeterministic finite automata.

Related tags

Miscellaneous inator
Overview

inator: an evil parsing library

You supply the evil plan; we supply the -inator!

or, Provably Optimal Zero-Copy Parsers with Nondeterministic Finite Automata

Portrait of the eminent Dr. Heinz Doofenshmirtz

🚧 Development still ongoing! 🚧

Why?

Computability theory has known for ages that a certain class of languages have a unique and provably optimal representation as a graph. This is the insight behind regular-expression searchers like grep, but anything more than accepting or rejecting a string is difficult to reason about and impossible to optimize.

How?

This library uses the same high-level machinery as tools like grep but augmented with some extra goods:

Given a specification like "parse A after B" or "put this thing in parentheses" (and infinite combinations of similar things), we write a graph with a set of initial nodes, a set of accepting nodes, and some directed edges marked with input characters. From this, the computability theory stuff kicks in: we can minimize this to a graph with the provably minimal number of states that will tell us "is this valid syntax or not?"

Then, although we can't provably optimize your code, we let you write functions that effectively hitch a ride on the optimal decision graph we just made. Specifically, each edge—that is, each time you said "parse this token"—you can call a function on the data so far, using that token, that returns the next state.

Last of all, you can convert this optimal graph to a Rust source file: not only can we prove the implementation is optimal, we can leverage Rust's binary optimizer to make it run literally as fast as possible.

What does this whole process look like?

Surprisingly, it looks a lot like just writing down what you want. Here's the definition for "put this thing in parentheses":

pub fn parenthesized(p: Parser<char>) -> Parser<char> {
    ignore('(') + p + ignore(')')
}

So, if p is a parser that accepts ABC, then parenthesized(p) will accept (ABC). Simple as that.

The key idea here is that parsers are data, and you can pass them around, modify them, and combine them just like anything else.

See examples/tuple for a well-annotated crate that reads characters from a tuple representation ((), (A,), or (A, B, C, ...)), ignoring whitespace. In that example, the entire parser could be defined in a single line, but I split it up to illustrate, first, that you can do that—you don't have to have a once-and-for-all megaparser—and, two, to explain as much about using the library as I can in detail.

Other crates in examples extend the same technique to more complex parsers (e.g. phone numbers and email addresses).

Anything else cool you can do?

Yes! Since we're really just riding on top of a decision-problem automaton, you can take a specification and invert it to produce an infinite stream of random strings that are all guaranteed to be parsed correctly. If you're writing a language, this means automatically generating all possible valid source files.

And, you guessed it, this gets compiled down to Rust source as well, so your property-tests can be ridiculously effective.

Why not other parsing libraries?

Please try other parsing libraries! This is my favorite, mostly becuase it's pure Rust with zero macros, no new syntax, zero input copying, parsers as data, automatic input generation, and—well–I wrote it, but I'm not too familiar with other libraries, so I can't in good faith recommend this one too highly.

My primary goal was a personal tool, but it turned out much better than I expected, so I'd love to see you use it and get your feedback!

Acknowledgments

First off, to Haskell's parsing libraries (and UPenn's Haskell course) for showing me that parsers can even work this way. Most of the syntax is inspired by Haskell.

Second, to Rajeev Alur and Penn's CIS 262 for formally introducing me to nondeterministic finite automata.

Third, to Rust.

You might also like...
Private swaps for Secret Network using a private entropy pool & differential privacy.

WIP SecretSwap: Anon Edition Private swaps for Secret Network! Uses private entropy pool for differential privacy when reporting pools sizes. Swap amo

A template for kick starting a Cloudflare worker project using workers-rs.

Getting Started A template for kick starting a Cloudflare worker project using workers-rs. This template is designed for compiling Rust to WebAssembly

twilight-interactions is a set of macros and utilities to work with Discord Interactions using twilight.

Twilight interactions twilight-interactions is a set of macros and utilities to work with Discord Interactions using twilight. Note: This crate is not

Modrinth API is a simple library for using, you guessed it, the Modrinth API in Rust projects

Modrinth API is a simple library for using, you guessed it, the Modrinth API in Rust projects. It uses reqwest as its HTTP(S) client and deserialises responses to typed structs using serde.

A translation of Brendan Galea's Vulkan tutorial into Rust using the ash crate
A translation of Brendan Galea's Vulkan tutorial into Rust using the ash crate

Rust Light Vulkan Engine This is a translation of Brendan Galea's Vulkan tutorial into rust using the Ash crate. Original tutorial: Brendan Galea's Yo

Some tools for streaming frames to rpi-rgb-led-matrix using ZeroMQ, written in Rust.

led_matrix_zmq Some tools for streaming frames to rpi-rgb-led-matrix using ZeroMQ, written in Rust. This repository includes: Rust client and server l

Cross-platform GUI written in Rust using ADB to debloat non-rooted android devices
Cross-platform GUI written in Rust using ADB to debloat non-rooted android devices

Cross-platform GUI written in Rust using ADB to debloat non-rooted android devices. Improve your privacy, the security and battery life of your device.

This contract implements simple vote for the best coffe in indonesia using near protocol.

vote-coffe-near Description This contract implements simple vote for the best coffe in indonesia using near protocol. Contract in contract/src/lib.rs

Telegram bot for searching in Arch User Repository ( AUR ); Implemented using rust.

AurSearchBot A Telegram Inline Search Bot Written in Rust Introduction Telegram Bot that can search AUR ( Arch User Repository ) in inline mode. This

Comments
  • Remove UB from Lazy::PostponedReference

    Remove UB from Lazy::PostponedReference

    I know that you already reverted the commits, but if you want to try again this would probably be a better starting point. This is pretty much the same as your earlier approach, but without using raw pointers.

    opened by Seppel3210 1
Owner
Will Sturgeon
CS+CogSci @ UPenn
Will Sturgeon
Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Frame Language Transpiler v0.5.1 Hi! So very glad you are interested in Frame. Frame system design markdown language for software architects and engin

Mark Truluck 35 Dec 31, 2022
A massively parallel, optimal functional runtime in Rust

High-order Virtual Machine (HVM) is a pure functional compile target that is lazy, non-garbage-collected and massively parallel. It is also beta-optimal, meaning that, in several cases, it can be exponentially faster than most functional runtimes, including Haskell's GHC.

null 5.6k Jan 8, 2023
A zero-copy parser for the contents of the __unwind_info section of a mach-O binary.

A parser for Apple's Compact Unwinding Format, which is used in the __unwind_info section of mach-O binaries.

Markus Stange 10 May 31, 2022
Parsers based on lady-deirdre project

ld-exts Parsers based on lady-deirdre project Links Lady Deirdre Alternative - Tree Sitter Config for NeoVim - LunarVim Parsers: Language Progress Hig

Bankov Andrey 4 Feb 11, 2023
The most primitive and the fastest implementation of a fixed-size last-in-first-out stack on stack in Rust, for Copy-implementing types

This is the simplest and the fastest (faster than Vec!) implementation of a last-in-first-out stack data structure, on stack, when stack elements are

Yegor Bugayenko 10 Jun 18, 2023
A following of the book: Zero to Production In Rust

A following of the book: Zero to Production In Rust

Zack Penn 2 Mar 19, 2022
dm-jitaux is a Rust-based JIT compiler using modified auxtools, dmasm and Inkwell LLVM wrapper for boosting Byond DM performance without any hassle!

dm-jitaux is a Rust-based JIT compiler using modified auxtools, dmasm and Inkwell LLVM wrapper for boosting Byond DM performance without any hassle (such as rewriting/refactroing your DM code).

SS220 20 Dec 13, 2022
A Simple, But amazing telegram bot, Made using the Rust language!

Telegram bot in Rust A fun Telegram bot made using Rust language.

Deep Alchemy 2 Dec 21, 2021
Rust bindings to Cloudflare Worker KV Stores using wasm-bindgen and js-sys.

worker-kv Rust bindings to Cloudflare Worker KV Stores using wasm-bindgen and js-sys

Zeb Piasecki 39 Dec 4, 2022
Showing how to deploy a Terra smart contract using Chainlink Data Feeds

Chainlink Terra Developing Requirements This demo requires the following components: Rust: rustup with cargo 1.44.1+. rustc and cargo 1.44.1+. Install

SmartContract 6 Aug 22, 2022