Simplistic complier~virtual-machine that transforms AST into a Rust function, with basic type checking

Overview

rast-jit-vm

rast-jit-vm is a simplistic, proof-of-concept~ish compiler / virtual machine that transforms syntax tree into a Rust function, type-checking it on the way:

use rast_jit_vm::prelude::*;

fn main() {
    let mul2 = Program {
        input: Type::Int,
        output: Type::Int,
        body: Node::Mul {
            lhs: Box::new(Node::Var("input")),
            rhs: Box::new(Node::Const(Value::Int(2))),
        },
    };
    
    let mul2 = vm::compile::<_, i32>(mul2);
    
    println!("{}", mul2(15)); 
}

Instead of transforming AST into bytecode, rast-jit-vm uses a bit lesser known technique that's oriented around thunks - basically, instead of doing:

fn eval(node: Node) -> Value {
    match node {
        Node::Add { lhs, rhs } => lhs + rhs,
        /* ... */
    }
}

... what rast-jit-vm does is:

type Thunk = Box<dyn Fn() -> Value>;

fn compile(node: Node) -> (Type, Thunk) {
    match node {
        Node::Add { lhs, rhs } => {
            let (lhs_ty, lhs) = compile(lhs);
            let (rhs_ty, rhs) = compile(rhs);
 
            match (lhs_ty, rhs_ty) {
                (Type::Int, Type::Int) => {
                    let ty = Type::Int;
                    let thunk = Box::new(|| lhs() + rhs());
                    
                    (ty, thunk)
                }
                
                (lhs_ty, rhs_th) => {
                    panic!("unsupported op: {:?} + {:?}", lhs_ty, rhs_ty);
                }
            }
        },
        
        /* ... */
    }
}

This allows not only to type-check all of the code up-front, but also to perform otherwise impossible optimizations such as preallocating variables into stack-slots:

struct CompilationContext {
    stack: Vec<Type>,
    vars: BTreeMap<String, usize>, // var-name => stack-slot
}

struct RuntimeContext {
    stack: Vec<Value>,
}

type Thunk = Box<dyn Fn(&mut RuntimeContext) -> Value>;

fn compile(ctxt: &mut CompilationContext, node: Node) -> (Type, Thunk) {
    match node {
        /// `let name = value`
        Node::Let { name, value } => {
            let (ty, value) = compile(ctxt, value);
            let id = ctxt.stack.len();

            ctxt.stack.push(ty);
            ctxt.vars.insert(name, ty);

            let thunk = Box::new(move |ctxt: &mut RuntimeContext| {
                // Voilà, no HashMap / BTreeMap needed at runtime!
                ctxt.stack[id] = value(ctxt);

                Value::Unit
            });
            
            (Type::Unit, thunk)
        },
        
        /// `name`
        Node::Var(name) => {
            let id = *vars.get(name).unwrap_or_else(|| {
                panic!("variable not defined: {}", name);
            });
            
            let thunk = Box::new(move |ctxt: &mut RuntimeContext| {
                // Voilà, no HashMap / BTreeMap needed at runtime!
                ctxt.stack[id].clone()
            });
            
            (ty, thunk)
        },
        
        /* ... */
    }
}

Thanks to this approach, an interpreter / a virtual machine written in this way can be faster than a typical one, because the "compiled" program doesn't have to perform any var-name => var-value map-based lookups anymore - to read or write a variable, it simply accesses it by its index!

Running locally

$ git clone https://github.com/Patryk27/rast-jit-vm
$ cd rast-jit-vm

# Uses the compilation technique as described above:
$ cargo run --bin mandelbrot-compile

# Uses just a regular interpreter-based evaluation approach:
$ cargo run --bin mandelbrot-eval

# Note that printing Mandelbrot fractal at this scale is pretty fast, so if you
# want to see a difference in the times, compare:
$ cargo run --release --bin mandelbrot-compile-100k
$ cargo run --release --bin mandelbrot-eval-100k 

Thoughts

Performance

Using the Mandelbrot fractal as a benchmark, tuned to 100k iterations for extra synthetic-benchmarking-juice, I've been able to obtain the following times:

  • Rust¹: 1.5s
  • rast-jit-vm, compiled²: 50s
  • Python³: 115s
  • rast-jit-vm, evaluated⁴: 130s

(technology: time to render the fractal; less is better)

¹ cd benches && rustc -O mandelbrot.rs && ./mandelbrot
² cargo run --release --bin mandelbrot-compile-100k
³ cd benches && python3 ./mandelbrot.py
cargo run --release --bin mandelbrot-eval-100k

Missing features

  • pretty error handling (it's just panic!() all over the place),
  • both the compiler and the evaluator don't understand scoping - it's possible to declare a variable inside a while block and use it outside (to do something like while false { let x = true; } println!("{}", x);),
  • rast-jit-vm's AST is Turing-incomplete (rough proof: all programs have statically allocated stack and there's no runtime malloc/free-like facilities, making recursion impossible; though arguably that's somewhere in-between a missing feature and a design decision).

License

Copyright (c) 2022, Patryk Wychowaniec [email protected].
Licensed under the MIT license.

You might also like...
An open source virtual tabletop that is currently purely a concept.

An open source virtual tabletop that is currently purely a concept.

A Rust framework for building context-sensitive type conversion.

Xylem is a stateful type conversion framework for Rust.

The never type (the true one!) in stable Rust.

::never-say-never The ! type. In stable Rust. Since 1.14.0. Better than an enum Never {} definition would be, since an instance of type ! automagicall

This crate provides a convenient macro that allows you to generate type wrappers that promise to always uphold arbitrary invariants that you specified.

prae This crate provides a convenient macro that allows you to generate type wrappers that promise to always uphold arbitrary invariants that you spec

A fusion of OTP lib/dialyzer + lib/compiler for regular Erlang with type inference

Typed ERLC The Problem I have a dream, that one day there will be an Erlang compiler, which will generate high quality type-correct code from deduced

A bidirectional type checker

A bidirectional type inference system loosely based on Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism.

sblade or switchblade it's a multitool in one capable of doing simple analysis with any type of data, attempting to speed up ethical hacking activities

sblade or switchblade it's a multitool in one capable of doing simple analysis with any type of data, attempting to speed up ethical hacking activities

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits. Ruxnasm strives to be an alternative to Uxnasm, featuring more user-friendly error reporting, warnings, and helpful hints, reminiscent of those seen in modern compilers for languages such as Rust or Elm.

figure out who holds the nfts that came out of a candy machine.

Candy Holders This is far from finished, but can: find tokens with a given update authority find holders of those tokens Neither the Rust or Node APIs

Owner
Patryk Wychowaniec
doing (hopefully useful) computer things | he, him
Patryk Wychowaniec
a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

null 4 Oct 3, 2022
rlox-interpreter is an AST-walking implementation of Bob Nystrom's Lox language in Rust.

rlox-interpreter rlox-interpreter is an AST-walking implementation of Bob Nystrom's Lox language in Rust. Disclaimer: This is my first Rust project, d

Paul Fedorow 3 Oct 5, 2022
Easy-to-use optional function arguments for Rust

OptArgs uses const generics to ensure compile-time correctness. I've taken the liberty of expanding and humanizing the macros in the reference examples.

Jonathan Kelley 37 Nov 18, 2022
Lambda function to handle Bitbucket webhook payloads, extract relevant information and send notifications to Microsoft Teams

PR-Bot Lambda function to handle Bitbucket webhook payloads, extract relevant information, and send notifications to Microsoft Teams, saving you time

Irine 14 Sep 26, 2023
Basic Rust I2C demo on the TI Launchpad

msp430-i2c-oled-example An example project to talk to an SSD1306 OLED (Adafruit PiOLED) with an MSP430G2553 Launchpad via I2C.

Edward Shin 3 Jul 9, 2021
Basic ActivityPub Server (in Rust)

Basic ActivityPub Server (in Rust) This is a deep-dive on this blog post: https://blog.joinmastodon.org/2018/06/how-to-implement-a-basic-activitypub-s

Mat 10 Nov 24, 2022
basic multiple package manager

baka basic multiple package manager Docs Env baka_root_setting Windows: %USERPROFILE%/.baka/config Linux, Mac: $HOME/.baka/config baka_plugins (Just u

null 8 Dec 29, 2021
A basic rp2040-hal project with blinky and rtt logging example code.

A basic rp2040-hal project with blinky and rtt logging example code. With this you can quickly get started on a new rp2040 project

rp-rs 202 Jan 6, 2023
Alternative basic focus movement for the sway and i3 window managers.

sway-overfocus Alternative basic focus movement for the sway and i3 window managers. The primary goal of this program is to create one set of keybinds

null 42 Oct 23, 2022
Edgelord is a library for Cloudflare Workers. You can scaffold a basic bot for discord, slack, etc.

Edge Computing + chūnibyō = Edgelord ✨ ?? Edgelord Edgelord is now working. You can contribute for it. Edgelord is a Rust library for cloudflare worke

null 23 Dec 26, 2022