Introduction

As a long time Haskell developer recently delving into Rust, something I've really missed is monads and do notation. One thing monadic do notation makes particularly nice is writing parsers. This article will explore the possibility of enabling Haskell-style do notation for writing parsers in Rust.

Probably the most popular crate for parsing in Rust is called Nom and its parser combinator functions requires manually threading the parser input. To be fair, I should mention that Nom also provides macro-based parser combinators and a do_parse! macro which is pretty similar to do notation in function.

There are also several crates which attempt to allow do notation using macros. I picked one called do-notation that seemed to take a general approach and had also been updated relatively recently. Note: do-notation uses m! rather than do! since do is a reserved word in Rust.

A simple example with Option (known as Maybe in the Haskell world) would be something like this:

// Type annotation included just for clarity.
let result: Option<i32> = m! {
  x <- Some(1);
  let y = 2;
  Some(x + y)
};

Bind, let and expressions are allowed within m!, although pattern matching the left side of a bind doesn't seem to work. The block needs to have an expression at the end and evaluate to the monad you're using.

To enable using a type with the m! macro, it needs to implement the Lift trait. Example for Option:

impl  Lift for Option {
  fn lift -> Self {
    Some(inner)
  }
}

If you're familiar with Haskell, this would be return or pure. Some in previous example could be written as <_>::lift (automatically inferring the type) or >::lift.

Also required is an and_then associated function for the type you're planning to use with m!. Rust's std::Option and std::Result already include this by default. Here is a link to Option::and_then and you can just view the source for it right there. We'd call this bind or >>= in Haskell.

Nom parsers are just state combined with the ability to fail, so let's work our way toward building a monad that will combine those traits. Starting with...

State

Source: src/state.rs

The approach I used here was precisely the same as the state monad from Haskell — the State type holds a closure which takes the current state and returns a tuple of the new state and the result of the action. The type looks like this:

pub struct State<'a, S, A>(Box<dyn FnOnce(S) -> (A, S) + 'a>);

In line with the Haskell state monad, I included a function to extract the current state:

pub fn getst<'a, S: Clone + 'a>() -> State<'a, S, S> {
  State(Box::new(|s| (s.clone(), s)))
}

And one to set the state:

pub fn putst<'a, SNEW: 'a>(snew: SNEW) -> State<'a, SNEW, ()> {
  State(Box::new(move |_| ((), snew)))
}

The function to run a state action is very simple — we just apply the state to the closure stored inside State to get back a tuple of the return value and last state.

pub fn run_state(s: S, ma: State) -> (A, S) {
  ma.0(s)
}

Using it looks like this:

// Type annotation here not actually needed.
let action: State<i32, i32> = m! {
  st <- getst();
  putst(st + 1);
  st <- getst();
  putst(st + 1);
  getst()
};
let result = run_state(10, action);

result here would be (12, 12)

State Inside Result

Source: src/stateresult.rs

For the State + Result monad, our type looks like:

pub struct StateResult<'a, S, A, E>(Box<dyn FnOnce(S) -> Result<(S, A), E> + 'a>);

Instead of simply returning the state and return value, we return Result with either state and return value or an error. The state manipulation functions haven't changed much:

pub fn getstres<'a, S: Clone, E>() -> StateResult<'a, S, S, E> {
  StateResult(Box::new(|s| Ok((s.clone(), s))))
}

pub fn putstres<'a, SNEW: 'a, E>(snew: SNEW) -> StateResult<'a, SNEW, (), E> {
  StateResult(Box::new(move |_| Ok((snew, ()))))
}

The only difference is now the value is wrapped by Result. Since our actions can fail, I included a function to throw an error and abort the computation:

pub fn throwstres<'a, S, A, E: 'a>(e: E) -> StateResult<'a, S, A, E> {
  StateResult(Box::new(|_| Err(e)))
}

Using the monad looks like this:

let action = m! {
  st <- getstres();
  putstres(st + 1);
  st <- getstres();
  putstres(st + 1);
  st <- getstres();
  // st is 12 here, so of course this will always fail.
  _ <- if st > 11 {
    throwstres("Oh no!")
  } else {
    <_>::lift(())
  };
  getstres()
};
let result = run_state_result(10, action);

The type for result in the previous example would be something like StateResult

Worth noting is that this approach throws away the current state when hitting Err. This way of doing it is most similar to Nom but there's another possibility:

State Beside Result

Source: src/stateeither.rs — no particular reason for that name.

In this case, the type looks like:

pub struct StateEither<'a, S, A, E>(Box<dyn FnOnce(S) -> (S, Result<A, E>) + 'a>);

The implementation is very similar, but we always get the last state back along with the result, whether or not it is Err.

NomParser!

Source: src/nomparser.rs

Now I can put together what I've learned and implement the required support for using do notation with Nom.

The type of a Nom parser looks like this:

pub struct NomParser<'a, S, A, E = Error>(Box<dyn FnOnce(S) -> IResult<S, A, E> + 'a>);

This is basically the same as StateResult from before. The state (S) is the data that we're parsing, the result (A) is the result of our parser operations and of course parsing is fallible so there's E. Nom parser combinators actually work very similar to the closure we store inside the State or NomParser type: they return a closure that takes the "state" (input). One simple example is the tag combinator which just expects a certain value in the input.

Example usage:

let (next_input, _result) = tag("abc")(current_input)?;

If tag matches, it consumes the input and returns the matched value — otherwise it fails.

And of course the next time you wanted to apply a parser combinator, you'd pass the input state that tag had returned and recieve the next one plus the return value and so on, which is a bit of a pain.

I added a function to wrap an existing Nom combinator with NomParser:

pub fn nomp<'a, F, S, A, E>(f: F) -> NomParser<'a, S, A, E> where F: FnOnce(S) -> IResult + 'a, { NomParser(Box::new(f)) }

The type matches up exactly with our state eating closure, which isn't an accident.

It's also possible to write a pre-wrapped version of a combinator so we don't have to write nomp(tag(etc)) every single time:

fn ptag<'a, S, T>(t: T) -> NomParser<'a, S, S> where S: 'a + nom::InputTake + nom::Compare, T: 'a + Clone + nom::InputLength, { nomp(tag(t)) }

For convenience I included a type alias for parsing &str using a standard error type:

pub type NompStr<'a, A, S = &'a str, E = Error~~> = NomParser<'a, S, A, E>;~~

Let's take the example from Nom's README and show what it looks like with do notation:

#[derive(Debug, PartialEq)] pub struct Color { pub red: u8, pub green: u8, pub blue: u8, } fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> { u8::from_str_radix(input, 16) } fn is_hex_digit(c: char) -> bool { c.is_digit(16) } fn hex_primary_parser<'a>() -> NompStr<'a, u8> { nomp(map_res(take_while_m_n(2, 2, is_hex_digit), from_hex)) } fn hex_color_parser<'a>() -> NompStr<'a, Color> { m! { nomp(tag("#")); red <- hex_primary_parser(); green <- hex_primary_parser(); blue <- hex_primary_parser(); <_>::lift(Color { red, green, blue }) } }

For comparison, the corresponding non-monadic approach with manual input threading would look like:

// Helper functions and type elided fn hex_primary(input: &str) -> IResult<&str, u8> { map_res( take_while_m_n(2, 2, is_hex_digit), from_hex )(input) } fn hex_color(input: &str) -> IResult<&str, Color> { let (input, _) = tag("#")(input)?; let (input, red) = hex_primary(input)?; let (input, green) = hex_primary(input)?; let (input, blue) = hex_primary(input)?; Ok((input, Color { red, green, blue })) }

Note: The example provided in the Nom repo uses the tuple combinator to avoid three separate calls to hex_primary.

Is It Practical?

I suspect not. There are several apparent disadvantages:

Wrapping/building up closures on every operation is very likely to sap performance and I wouldn't be surprised if it blows up the stack too.

Many Nom parsers take other parsers. When trying to use do notation, you're suddenly back in plain Nom combinator territory and have to either re-enter NomParser or just write the rest of your parser the old fashioned way. It is possible writing a library of wrapped combinators would alleviate the issue.

Code inside macros with special syntax presents a problem for development tools like IDEs, automatic formatting, etc. It's not clear the benefit of clearer syntax with monadic parsing (if everyone would even agree it is a benefit!) outweighs that downside.

On the plus side, it's pretty interesting that this is possible and I got further with it than I expected to!

Project

This project includes a binary that just runs some random test code. You can execute it using cargo run.

The different monads have a simple test block, which you can run with cargo test.

The project also works as a library and can be imported. It exports each module by name, and also mdoexperiments::prelude which pulls in everything and re-exports it.

Closing

I'm sure my approach isn't optimal, even within what's allowed by Rust's type system. I'd be very interested in seeing potential improvements. There's also a good change my use of lifetimes is incorrect or suboptimal — I only got it to the point where my examples would run.

Feedback welcome! The best way to contact me is via reddit with username KerfuffleV2

Northstar is a horizontally scalable and multi-tenant Kubernetes cluster provisioner and orchestrator

Northstar Northstar is a horizontally scalable and multi-tenant Kubernetes cluster provisioner and orchestrator. Explore the docs » View Demo · Report

1 Jan 22, 2022

Time related types (and conversions) for scientific and astronomical usage.

astrotime Time related types (and conversions) for scientific and astronomical usage. This library is lightweight and high performance. Features The f

3 Aug 22, 2022

UnTeX is both a library and an executable that allows you to manipulate and understand TeX files.

UnTeX UnTeX is both a library and an executable that allows you to manipulate and understand TeX files. Usage Executable If you wish to use the execut

1 Apr 5, 2022

A convenient tracing config and init lib, with symlinking and local timezone.

clia-tracing-config A convenient tracing config and init lib, with symlinking and local timezone. Use these formats default, and can be configured: pr

5 Jan 3, 2023

📮 load, write, and copy remote and local assets

axoasset This library offers read, write, and copy functions, for local and remote assets given a string that contains a relative or absolute local pa

7 Jan 25, 2023

A tool for investigating file system and folder contents and their changes.

Sniff A tool for investigating file systems and folder contents and their changes. Sniff can create snapshots of file systems and folders, storing has

6 Jan 14, 2023

Bolt is a desktop application that is designed to make the process of developing and testing APIs easier and more efficient.

Bolt ⚡ Bolt is a desktop application that is designed to make the process of developing and testing APIs easier and more efficient. Quick start 👩‍💻

6 Mar 26, 2023

k-mer counter in Rust using the rust-bio and rayon crates

krust is a k-mer counter written in Rust and run from the command line that will output canonical k-mers and their frequency across the records in a f

14 Jan 7, 2023

An n-tuple pendulum simulator in Rust + WebAssembly (and JavaScript)

An n-tuple pendulum simulator in Rust + WebAssembly (and JavaScript) Remaking this n-tuple pendulum simulator moving the math to Rust 🦀 and WebAssemb

27 Feb 19, 2022

Rust experiments involving Haskell-esque do notation, state, failure and Nom parsers!

Related tags

Overview

Introduction

State

State Inside Result

State Beside Result

NomParser!

Is It Practical?

Project

Closing

You might also like...

Northstar is a horizontally scalable and multi-tenant Kubernetes cluster provisioner and orchestrator

Time related types (and conversions) for scientific and astronomical usage.

UnTeX is both a library and an executable that allows you to manipulate and understand TeX files.

A convenient tracing config and init lib, with symlinking and local timezone.

📮 load, write, and copy remote and local assets

A tool for investigating file system and folder contents and their changes.

Bolt is a desktop application that is designed to make the process of developing and testing APIs easier and more efficient.

k-mer counter in Rust using the rust-bio and rayon crates

An n-tuple pendulum simulator in Rust + WebAssembly (and JavaScript)

Owner

Kerfuffle

hado-rshado — A little macro for writing haskell-like do expressions without too much ceremony

💫 Small microservice to handle state changes of Kubernetes pods and post them to Instatus or Statuspages

Director is a simple, versatile, ergonomic state machine in Rust-lang.

A Polkadot SDK-like state machine written from scratch in Rust.

A lightning fast state management module for Yew.

Component-based state machine plugin for Bevy

Fusion is a cross-platform App Dev ToolKit build on Rust . Fusion lets you create Beautiful and Fast apps for mobile and desktop platform.

A Diablo II library for core and simple client functionality, written in Rust for performance, safety and re-usability

Code examples, data structures, and links from my book, Rust Atomics and Locks.

List of Persian Colors and hex colors for CSS, SCSS, PHP, JS, Python, and Ruby.