Lineiform

Lineiform is a meta-JIT library for Rust interpreters. Given an interpreter that uses closure generation, it allows an author to add minimal annotations and calls into Lineiform in order to automagically get an optimizing method JIT, via runtime function inlining and constant propagation to resolve dynamic calls to closed-environment members. Internally it lifts closure bodies from x86 assembly to Cranelift IR for dynamic recompilation.

The Big Idea

A normal tree-walking interpreter in Rust looks like this: (pseudocode but not much)

fn eval(ast: Ast) -> usize {
    match ast {
        Literal(l) => l,
        Add(left, right) => eval(left) + eval(right),
    }
}

let val = eval(Add(Literal(1), Literal(2)));
val // 3

You can instead write a closure generator interpreter like this, for the same language: (also pseudocode but not much)

type Environment = ();
type EvalAst = Box<dyn Fn(Environment) -> usize>;
fn eval(ast: Ast) -> EvalAst {
    match ast {
        Literal(l) => Box::new(move |e: Environment| l),
        Add(_left, _right) => {
            let left = eval(_left);
            let right = eval(_right);
            Box::new(move |e: Environment| left(e) + right(e))
        }
    }
}

let compiled_eval = eval(Add(Literal(1), Literal(2)));
let env: Environment = ();
compiled_eval(env) // 3

compiled_eval looks like [ return [ return 1 ] + [ return 2 ] ], where each [] is its own closure. The Add case in particular has a closed environment of left and right, each a closure specialized for the AST node it was built from - it's just that you still need a dynamic call to evaluate them. Lineiform is a library that you'd call (probably only in a Call language construct, for method-level optimization) to optimize an EvalAst closure, for example inlining those left and right dynamic calls into the single function body of compiled_eval, since the closed environment of an EvalAst is statically known at that point. compiled_eval then becomes [ return 1 + 2 ], which you can then optimize further to just [ return 3 ] - it "unrolled" multiple interpreter loops into one function, exposing them to a recompiler for global optimization of the actual evaluation of the closure. We don't fix the Environment that is actually passed to the closure, only the closed environment.
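
As a rough sketch of how that call might look from the interpreter's side (the Lineiform type, its constructor, and the optimize method name are assumptions for illustration, not the library's confirmed API):

// Hypothetical usage sketch: hand the generated closure to the JIT so it can
// lift its body, constant-propagate `left`/`right` from the closed environment,
// and hand back an inlined version. Names here are assumed, not the real API.
let mut jit = Lineiform::new();
let compiled_eval = eval(Add(Literal(1), Literal(2)));
let fast_eval = jit.optimize(compiled_eval); // assumed method name
assert_eq!(fast_eval(()), 3); // same result, fewer dynamic calls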

You can think of this as analogous to how a Forth compiler works: 1 is a word that pushes the value 1, 2 is a word that pushes the value 2. A dumb definition of 1 2 + is then three dynamic function calls, to function body addresses stored in its "contents" list (1, 2, +). If you have a concretely built word at runtime, however, you can inline the calls to all the function pointers you have and get the actual execution path of the function, allowing you to do global optimizations. The Rust closure compiled_eval looks much like this Forth word if you squint, but doesn't use a threaded interpreter execution mode, and stores the function pointers in the closure's closed environment instead of a word body array.

What Works

Currently, not much. You can create a JIT and optimize a closure, which does some janky constant propagation from the closure's environment in order to turn jmp rax into a statically known jmp, which allows us to inline. I basically only implemented as much x86 lifting as was needed for the PoC, and don't have features like "branches" or "eflags" or "calls" or "correct instruction memory widths". Don't use this yet.

A freezing allocator

Currently we inline loads from a reference to our closed environment. This is good, but it doesn't scale: we also want to inline the functions of the functions we call, which requires us to also inline loads for closures that we are closed over. We can't just recursively inline everything from our environment, however, because our upvalues may have interior mutability that would invalidate our compiled function. Instead, we can use a freezing allocator: you wrap every closure that is a candidate for inlining in jit.freeze(move |e: Env| { ... body ... }), which copies it to a read-only memory region. We can then inline any loads from pointers that fall inside the frozen memory region, allowing us to recursively inline environments from any closure starting point. This has the small downside of causing segfaults if you have a RefCell in your environment and try to mutate it. That's a niche enough case, and easy enough to debug, that I'm just not going to worry about it: more complex interior-mutable data structures would be a pointer chase or more away, and so unfrozen and non-inlinable.
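
A minimal sketch of the check this enables, with the type and method names assumed for illustration: a load is only treated as a constant if its bytes fall entirely inside the frozen, read-only range.

// Hypothetical sketch: the frozen, read-only region and the check a lifter
// could use before constant-folding a load from a closed environment.
struct FrozenRegion {
    base: usize,
    len: usize,
}

impl FrozenRegion {
    fn contains(&self, ptr: usize, size: usize) -> bool {
        ptr >= self.base && ptr + size <= self.base + self.len
    }

    // Only loads whose bytes are entirely inside the frozen region are safe to
    // inline as constants; anything outside it may be mutated at runtime.
    fn try_inline_load_u64(&self, addr: usize) -> Option<u64> {
        if self.contains(addr, 8) {
            Some(unsafe { (addr as *const u64).read_unaligned() })
        } else {
            None
        }
    }
}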

What Needs Work

  • Disassembling functions
  • Lifting x86 to Cranelift IR
    • Enough instructions that anything non-trivial doesn't crash
  • Branches
  • Loops
  • Calls (we need to "concretize" the SSA value map, so that an external call has the correct registers/stack, and restore them afterwards)
    • Lifting bailouts: if we hit an instruction we can't lift, we can instead bail out and emit a call back into the original version of the function, so we can, for example, optimize only half of a complex function.
    • Optimistic external calls: try to use DWARF to figure out calling convention + argument count for external functions, to avoid having to spill more registers/stack than we need!
  • Correct input/output function prototype detection: currently we just hardcode parameter/return value types, which is Bad.
  • Freezing allocator scheme
    • Freeing from the frozen allocator (probably using crossbeam_epoch)
  • Smarter function size fetching (currently we only search /proc/self/exe for them! Which doesn't work if you have any dynamic linking lol)
  • Performance counters for the "optimized" Fn trait, so we only optimize over some threshold.
  • A specialize function: we can have a variant of optimize that also fixes a value from the closure's arguments, for example to pin the type tag of an object and erase type checks. We can compile a variant for each concrete pinned value, and emit a jump table that dispatches to either the non-compiled generic version or the specialized variant (see the sketch after this list).
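
A hedged sketch of the dispatch such a specialize variant could emit, written as plain Rust rather than generated code; the tag values and function names are made up for illustration:

type Env = ();

// Stand-ins for variants compiled with the type tag pinned to a concrete value,
// plus the generic fallback. In reality these would be JIT-compiled closures.
fn specialized_for_int(_e: Env) -> usize { 1 }
fn specialized_for_float(_e: Env) -> usize { 2 }
fn generic_eval(tag: u8, _e: Env) -> usize { tag as usize }

// Hypothetical emitted dispatch: a jump table over the pinned argument value.
fn dispatch(tag: u8, env: Env) -> usize {
    match tag {
        0 => specialized_for_int(env),   // compiled with tag fixed to 0
        1 => specialized_for_float(env), // compiled with tag fixed to 1
        _ => generic_eval(tag, env),     // non-compiled generic version
    }
}
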
Comments
  • WIP: Temp refactor

    Refactors a bunch of code to handle instruction temporary values better - this allows us to do proper zero/sign extending for e.g. sub.

    The main difference is that operations are over JitTmp values now, which are variable-width; previously we were passing and returning JitValues for them, which we always normalized to HOST_WIDTH. This also lets us give Cranelift better type information for all its SSA values, since we're using the actual variable-width int types. Additionally, it gives our math operations correct condition flag settings: before, we assumed all math ops had usize operands (and the flag behavior of those operations), while now we have 8/16/32/64-bit assembly versions whose flag effects we capture properly.
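
    A minimal sketch of the distinction being described, with names assumed rather than taken from the actual patch: temporaries carry an explicit width instead of always being normalized to host width.

    // Hypothetical sketch (not the real types): width-tagged temporaries let the
    // lifter give Cranelift the true i8/i16/i32/i64 type and model flag effects
    // per operand width, instead of treating everything as host-width usize.
    enum JitTmp {
        I8(u8),
        I16(u16),
        I32(u32),
        I64(u64),
    }

    impl JitTmp {
        fn width_bits(&self) -> u8 {
            match self {
                JitTmp::I8(_) => 8,
                JitTmp::I16(_) => 16,
                JitTmp::I32(_) => 32,
                JitTmp::I64(_) => 64,
            }
        }
    }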

    This is a pretty major refactor, and includes a bunch of other things I felt like changing while doing so. lol woops

    opened by chc4 1
  • Recursive example incomplete?

    Hey! Shouldn't your README.md example for the recursive evaluation state

    fn eval(ast: Ast) -> usize {
        match ast {
            Literal(l) => l,
            Add(left, right) => eval(left) + eval(right),  // <-- recursive call to eval missing!
        }
    }
    
    let val = eval(Add(Literal(1), Literal(2)));
    val // 3
    

    ?

    opened by phorward 1
  • Use a different IR than Cranelift

    move |a: usize| {
        let val: usize;
        if Wrapping(a) + Wrapping(0x5) < Wrapping(a) {
            val = 1;
        } else {
           val = 2;
        }
        val
    }
    
    ---- cmp rsi, -0x5
    ---- mov eax, 0x2
    ---- adc rax, -0x1
    ---- ret
    
    function u0:0(i64, i64, i64) -> i64 system_v {
    block0(v0: i64, v1: i64, v2: i64):
        jump block1
    
    block1:
        v3 = iconst.i64 -5
        v4, v5 = isub_ifbout.i64 v2, v3
        v6 = iconst.i64 2
        v7 = iconst.i64 -1
        v8, v9 = iadd_ifcarry v6, v7, v5
        return v8
    }
    

    (Ignore the weird block0) This errors when compiling with thread 'test::test_passthrough_usize' panicked at 'ALU+imm and ALU+carry ops should not appear here!', /home/charlie/.cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-codegen-0.75.0/src/isa/x64/lower.rs:5977:13

    Technically for this specific case I could turn the cmp into a Cranelift icmp_imm instruction, but later code could depend on the rsi (v4) value. I'd have to re-emit an isub instruction right before its use, which I can do in this specific case, but really this is just a sign that Cranelift's IFLAGS mechanism plain doesn't work for me. The bigger issue is that Cranelift assumes that all math ops clobber all IFLAGS, so you can't have, for example

    ---- cmp rsi, -0x5
    ---- lea eax, [rsi+5]
    ---- mov eax, 0x2
    ---- adc rax, -0x1
    ---- ret
    

    since it would say that the flag produced by the cmp is invalid after the iadd from the lea. I'd have to do my own code movement and instruction selection to make it happy, and even then Cranelift's flag rules are hard enough to use that it's probably impossible.

    I'm not entirely sure where to go from here: I probably should've just been using LLVM this entire time, since that's what a lot of other x86_64 lifters target for a backend. Alternatively, I could try to use dynasm-rs: it doesn't have any optimizations, but then Cranelift barely does either (and its codegen is often worse than just re-emitting the same instructions!). By lifting and lowering to the exact same instructions, nothing would clobber flags - at worst I'd have to emit stc; adc if I constant fold the add part of an add; adc, and could probably do something smarter instead.

    opened by chc4 3
  • Better tracing

    Right now we just println a bunch, which sucks.

    • [ ] Switch to tracing so we have introspection into how long specific parts of the program are taking (see the sketch after this list)
    • [ ] Save cranelift_codegen::cfg_printer::CFGPrinter to disk so we can visualize the JIT'd program CFG better (and maybe build our own graphviz printout for JitValues?)
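
    A minimal sketch of what switching to tracing could look like; the span name, field, and tracing_subscriber::fmt setup are illustrative choices, not existing Lineiform code:

    use tracing::{info, info_span};

    fn lift_closure(bytes: &[u8]) {
        // A span gives us timing and introspection for this phase instead of println!.
        let span = info_span!("lift_closure", len = bytes.len());
        let _guard = span.enter();
        info!("lifted closure body");
    }

    fn main() {
        // Install a simple formatting subscriber so spans/events actually print.
        tracing_subscriber::fmt().init();
        lift_closure(&[0xc3]); // `ret`, just as a stand-in
    }
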
    opened by chc4 2
  • Persistent cache

    Optimally we'd cache a lot of the JIT metadata we collect, for things like loop entry points from #8, which control flows merge from #7, or DWARF ABI information from #3.

    All of this data should be identical across runs with the same binary. So hypothetically we should be able to not only cache the data for one run, but all future runs of the JIT.

    We should look into dropping the data to disk, and mmapping it back in when we start back up again to have the best information we can, and atomically updating the database if we get new information in a run.

    One slight problem is that we have to handle cache invalidation if the binary is updated, or if any cached data depended on a symbol from a dynamic library that was updated.
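
    A rough sketch of the load/update side under these assumptions, using the memmap2 crate; the cache path, on-disk format, and rename-for-atomicity are illustrative, not anything Lineiform does today:

    use std::fs::File;
    use memmap2::Mmap;

    // Hypothetical cache location; validation (including checking the binary and
    // dynamic libraries haven't changed) would happen on top of the raw bytes.
    const CACHE_PATH: &str = "/tmp/lineiform.cache";

    fn load_cache() -> std::io::Result<Option<Mmap>> {
        match File::open(CACHE_PATH) {
            // Map the previous run's metadata back in read-only at startup.
            Ok(file) => Ok(Some(unsafe { Mmap::map(&file)? })),
            Err(_) => Ok(None),
        }
    }

    fn update_cache(bytes: &[u8]) -> std::io::Result<()> {
        // Write to a temp file and rename for an atomic update of the database.
        let tmp = format!("{CACHE_PATH}.tmp");
        std::fs::write(&tmp, bytes)?;
        std::fs::rename(&tmp, CACHE_PATH)
    }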

    opened by chc4 1
  • Ahead-of-time compilation mode

    Ok, maybe this is a dumb idea, but it'd be nice to be able to do a JIT compilation of a script and then have Lineiform dump a snapshot to disk. Then running the snapshot executes the "JIT" compiled closures immediately, without having to parse + emit closures + JIT compile, giving you ahead-of-time compilation of scripts (while still having the full range of language features like eval).

    Lisp and Smalltalk have "images" that do this. I'm not entirely sure how you handle rebasing the JIT compiled closures, tho, if we're inlining pointers from the closed environment and stuff. Maybe you just don't have ASLR in the snapshotted executable lol.

    opened by chc4 0
  • Implement inlining heuristics

    Right now Lineiform just tries to inline every call unconditionally. This is pretty stupid. Most JITs use some combination of current function size, CFG complexity, and target function size to decide whether to inline a call, and we'd like to do the same. This is slightly hampered in that we don't have a CFG-building pass, but we do have the target function size from the symbol.
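
    A minimal sketch of the kind of heuristic described, with the field names and thresholds invented for illustration (no CFG-complexity term, since we don't have a CFG-building pass yet):

    // Hypothetical inlining heuristic: only inline small callees, and stop once
    // the function we're building has grown past a total budget.
    struct InlineHeuristic {
        max_callee_bytes: usize, // callees larger than this are never inlined
        max_total_bytes: usize,  // overall growth budget for the optimized function
    }

    impl InlineHeuristic {
        fn should_inline(&self, current_fn_bytes: usize, callee_bytes: usize) -> bool {
            callee_bytes <= self.max_callee_bytes
                && current_fn_bytes + callee_bytes <= self.max_total_bytes
        }
    }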

    opened by chc4 1