Symbolically Executable Stack Machines in Rust

TannrA

Last update: Dec 28, 2022

Related tags

Overview

Symbolically Executable Stack Machines in Rust

Symbolic Stack Machines is a library for implementing symbolically executable stack-based virtual machines.

This works by providing various traits for a Stack, Memory (including read-only and writeable memory), VMInstruction, as well as generic Val structure to represent stack or memory values. Finally, it provides a set of abstract Machines which can be instantiated with a specific instruction set and can execute a sequence of instructions concretely or symbolically.

Implementing a new VM involves, simply, implementing the VMInstruction trait for an enum that describes the instruction set of the operations.

Each operation should output a state diff (ExecRecord), which describes the updates to the stack, memory, any new path constraints (in the case of symbolic execution), and machine's program counter. The machine can apply the update with ExecRecord::apply.

Reachability

The Machine constructs the updates performed on its state by instructions. Operations can add path constraints via the ExecRecord structure. The Machine incrementally constructs a set of constraints for each possible path of execution. Once these paths are constructed, the machine checks whether such paths are reachable. See src/machine/mod.rs, specifically, the BaseMachine::run_sym method for the implementation.

Currently, there is no intermediate path pruning performed.

Supported Memory & Stack Models

Currently, two forms of symbolic memory are built-in: Memory based on the theory of arrays, and finite concrete memory that can store possibly symbolic values.

The only built-in stack model right now is a finite stack that can store symbolic or concrete values.

The only symbolic values that have built-in support right now are integers.

Usage

See lib.rs for a toy instruction set and its symbolic execution.

Open Questions

How to handle endianness of various machines w.r.t bit vectors?
Best approach for modular plug-and-play style machine creation (storage, mem, stack, etc)?
How to handle special, niche environments? E.g., EVM has GAS opcode which requires a notion of gas within the machine.
Copy on write when storing machine states?
Niche exec environments:
- Optional undefined context;
- Pass this as an implicit argument to VMInstruction::Exec
- Opcode author is responsible for writing the interaction
- ExecRecord is extensible by this generic context as well

Notes

Type constraints on Machine to ensure that the values stored on stack are convertible to the val type stored in memory as well as the val type used to index the memory
Remove direct z3 dependency and generate an IR + transformation from IR -> target (e.g., smtlib2, rust-z3 bindings)
For niche exec environment, provide custom context definition and access on the machine
Add a generic context switch method; useful for describing behavior of one program calling another (such as smart contract calls)

Comments

remove generic from stack

Remove the generic from stack. This also requires removing the test where we test when symbolic values are on the stack. We can add that back in when StackVal is the more general Val that uses an ast.

opened by williamberman 1
remove core/instructions/val in favor of core/value

core/instructions/val is being replaced with the core/value module some of the non abstract machine + tests were dependent on the core/instructions/val module so removed those as well

the contrib/tests/simple_lang.rs tests were dependent on core/instructions/val so I removed and replaced with contrib/tests/simple_langg.rs tests

also ran cargo fmt :P

opened by williamberman 1
Abstract Core
Decouple core features from convenient features.

Notable differences:

Since most built in Abstract components are actually well defined data structures that are only abstract over the data types they contain, we have replaced the reliance on type signatures as the primary way to configure a machine with, instead, a "universal" value type that can be lazily interpreted as a different type depending on the intent of the machine author. This dramatically simplifies the type signatures.

Configuration is thus done through a Config object that (will in the near future) contain the "hooks" provided by core::ast which enable customizing the way in which an abstract value is interpreted by the machine's interpreters.

Still TODO:

Make use of AST hooks to configure various Abstract-X structures via their Config objects

Replace the StackVal, MemVal etc., with ast::Sentence

Review the use of core::Constraint now that both values and constraints can be expressed as sentences

Review the inner_interpreter-outer_interpreter architecture to integrate its execution with ast::Sentence

Bring back the symbolic execution phase of the outer interpreter
opened by WilfredTA 0
Made return type for AbstractInstruction abstract; Separated Abstract…

…Machine into AbstractMachine, InnerInterpreter, and OuterInterpreter

Made return type for AbstractInstruction abstract.

Because instructions can now return different types, the InnerInterpreter trait constrains the set of possible return types from individual instructions. All existing instructions still return AbstractExecRecord. All existing instructions are concrete so they return ConcreteAbstractExecRecord which has its constraint type parameter set to the unit type. This allows concrete instructions to not have any generic type for constraints while still using the same AbstractExecRecord return type.

Separated AbstractMachine into AbstractMachine, InnerInterpreter, and OuterInterpreter.

AbstractMachine - purely encapsulates the current state of the machine. It does not include constraints as concrete machines do not have constraints.

InnerInterpreter - handles the return value and state update from execution of an individual instruction. We have two example inner interpreters. The concrete inner interpreter simply updates the current machine state. The symbolic inner interpreter returns a vec of new machine states and constraints.

OuterInterpreter - handles orchestration of which machine state(s) to run. For the example outer interpreters we implement, we embed the inner interpreter in the outer interpreter struct as a trait object. This allows for some flexibility in choosing different inner interpreters so long as they have a step function of the right type for the outer interpreter. The concrete outer interpreter just steps the concrete inner interpreter until the machine can’t execute any further. The symbolic outer interpreter performs DFS on the search graph of machine states. Note that this architecture allows us to substitute out an alternative symbolic outer interpreter that performs BFS or any other alternative search strategy.

I temporarily removed the PathSummary portion yet where we use the solver to find the reachable paths and return models. The type signature of SymbolicOuterInterpreter is already incredibly complex and I'd prefer to not add the additional generics that the solver is going to entail in a separate PR.

opened by williamberman 0
Add z3 solver
rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/16

Creates the abstract interface to the constraint solver. Running the symbolic machine now returns

the model mapping symbolic values to concrete values in the machine

the symbolic return value of the machine

the concrete return value of the machine
opened by williamberman 0
init symbolic machine

rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/15

This creates the symbolic machine that wraps a concrete machine. This initial version does not execute the SMT solver and only allows for collecting constraints and splitting the machine state.

opened by williamberman 0
implement concrete machine

rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/14

separating out the core concrete machine into its own trait+implementation

opened by williamberman 0
instruction implementations

rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/13

Opcode implementations. Opcodes are written so they can be dynamically dispatched to.

opened by williamberman 0
separate out concrete and symbolic instruction
rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/12

The separate concrete and symbolic instruction allows for clean interfaces and opcode implementations.

concrete instructions

only operate on concrete machine state

symbolic instructions

operate on concrete machine state

can split into multiple machine states

can add path constraints

This gives the nice ability to define both concrete and symbolic implementations of the same opcode. I.e. see {concrete, symbolic} JUMPI.

https://github.com/WilfredTA/symbolic-stack-machines/blob/aec8aa99c2c0c19370f8402c3976a41300ed8f70/src/instructions/sym.rs#L12-L48

https://github.com/WilfredTA/symbolic-stack-machines/blob/aec8aa99c2c0c19370f8402c3976a41300ed8f70/src/instructions/misc.rs#L47-L71

The concrete machine will only use the concrete instructions. The symbolic machine will use the hybrid instructions.
opened by williamberman 0
add concrete indexed memory

this is rebased on https://github.com/WilfredTA/symbolic-stack-machines/pull/11

Adds the ConcreteInt struct which just wraps i64 and implements some of the interfaces the machine needs

opened by williamberman 0
remove lifetime from machine

This is rebased on top of https://github.com/WilfredTA/symbolic-stack-machines/pull/9 and https://github.com/WilfredTA/symbolic-stack-machines/pull/10

Removing the lifetime from the machine means we have to remove the symbolic context. We'll separate out machines into two separate structs, one that is the concrete machine and a symbolic machine that wraps the concrete machine. So the current machine w/out the lifetime will become the concrete machine.

The two {concrete,symbolic} machines work together with the {concrete,symbolic} instructions. We setup the types so that both machines can take a reference to the same program, but the symbolic machine's job is to make sure that symbolic instructions are never executed by the concrete machine. When they are, the opcode panics.

https://github.com/WilfredTA/symbolic-stack-machines/blob/aec8aa99c2c0c19370f8402c3976a41300ed8f70/src/instructions/mod.rs#L59-L71

https://github.com/WilfredTA/symbolic-stack-machines/blob/aec8aa99c2c0c19370f8402c3976a41300ed8f70/src/machine/symbolic.rs#L65-L106

The symbolic machine will hold a list of constraints but again the constraints will be abstract and not have the lifetime.

opened by williamberman 0
Make Extension Environment Generic

In a previous version of symbolic-stack-machines (SSM), each modular component (memory, stack, etc) were all generic over Instruction. The idea was to use Rust's type system to configure the machine's behavior. This turned out to be detrimental to the library's ease of use & ergonomics more specifically.

While climbing out of the so-called "generics-hell", we realized that most of the generics in various components were due to being generic over the type of values stored in the stack, memory, storage, etc. Thus, we created a universal value type that is lazily evaluated; the universal value is actually an AST.

We also introduced a Configuration object for each modular component (StackConfig, MemConfig, etc). These config objects are meant to receive hooks which can be passed to the universal value's AST interpreter. This allows us to treat the universal value in Rust's type system as a single type (thereby drastically simplifying the library's API), while also allowing various components to customize the semantics of that type and specialize it to whichever type is relevant for that component. In other words, it gives us polymorphism across different components as well as polymorphism across different values within a particular component, since the hooks passed to the universal value's interpreter can conditionally evaluate the value as a different type based on the component's state.

The only component that does in fact need to be generic is the ExtEnv. This is because it is meant to be a catch all extension mechanism for implementing symbolic machines with more components, as well as those with bespoke features (e.g., the concept of "gas" in the EVM). Notably, the ExtEnv can also be leveraged to construct register-based machines. The crucial difference between the extension environment component and other components is that it does not have a predefined structure. While other components are well defined though generic over the relevant values on which they operate (a memory is treated as a flat array with some index set and value set; a stack is an abstract data type with API to push, pop, and peek), the ExtEnv has no predefined API nor structural constraints. Thus, we should re-introduce generics to the definition of this component.
enhancement help wanted

opened by WilfredTA 0

Owner

TannrA

Working on the distributed web

GitHub

ttasm is an assembler for assembling source code to TTVM executable.

1 Nov 8, 2021

Auto launch any application or executable at startup

Auto Launch Auto launch any application or executable at startup. Supports Windows, Mac (via AppleScript or Launch Agent), and Linux. How does it work

38 Dec 27, 2022

Executable memory allocator with support for dual mapping and W^X protection

jit-allocator A simple memory allocator for executable code. Use JitAllocator type to allocate/release memory and virtual_memory module functions to e

5 Jul 5, 2023

LIBFFM - field-aware factorization machines - in Rust

LIBFFM Rust LIBFFM - field-aware factorization machines - in Rust Getting Started LIBFFM Rust is available as a Rust library and a command line tool.

1 Jan 8, 2022

Cassette A simple, single-future, non-blocking executor intended for building state machines.

Cassette A simple, single-future, non-blocking executor intended for building state machines. Designed to be no-std and embedded friendly. This execut

50 Jan 2, 2023

Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Frame Language Transpiler v0.5.1 Hi! So very glad you are interested in Frame. Frame system design markdown language for software architects and engin

35 Dec 31, 2022

Envwoman is an application, to sync your .env-files across multiple machines

Envwoman is an application, to sync your .env-files across multiple machines. The main goal is to make Envwoman secure and trustworthy, so everything is open-source and the data will never in plain-text on the server. Encryption happens client-sided via aes-gcm.

3 Sep 28, 2022

Source code and documentation for our 'full stack on rust' meetup on 29-9-2022

Full stack on Rust This is the code and documentation repository for our 'Full stack on Rust' meetup on 29-9-2022. It includes step-by-step documentat

7 Oct 23, 2022

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits. Ruxnasm strives to be an alternative to Uxnasm, featuring more user-friendly error reporting, warnings, and helpful hints, reminiscent of those seen in modern compilers for languages such as Rust or Elm.

44 Oct 4, 2022

The language that eats the stack. Heavily inspired by porth which is inspired off of forth

Snack The language that eats the stack. Heavily inspired by porth which is inspired off of forth Install To use Snack you will need Rust and fasm Afte

2 Mar 20, 2022

A toy-level BLE peripheral stack

bleps - A toy-level BLE peripheral stack This is a BLE peripheral stack in Rust. (no-std / no-alloc) To use it you need an implementation of embedded-

4 Oct 17, 2022

Open Source Application Stack & PaaS

mycelia Open Source Application Stack & PaaS Installation cargo xtask build NOTE: We opted for cargo-xtask because Cargo build.rs is not supported for

3 Sep 6, 2023

First Git on Rust is reimplementation with rust in order to learn about rust, c and git.

First Git on Rust First Git on Rust is reimplementation with rust in order to learn about rust, c and git. Reference project This project refer to the

1 Jan 28, 2022

A stupid macro that compiles and executes Rust and spits the output directly into your Rust code

inline-rust This is a stupid macro inspired by inline-python that compiles and executes Rust and spits the output directly into your Rust code. There

19 Nov 29, 2022

Learn-rust - An in-depth resource to learn Rust 🦀

Learning Rust ?? Hello friend! ?? Welcome to my "Learning Rust" repo, a home for my notes as I'm learning Rust. I'm structuring everything into lesson

7 Jan 28, 2022

A highly modular Bitcoin Lightning library written in Rust. Its Rust-Lightning, not Rusty's Lightning!

Rust-Lightning is a Bitcoin Lightning library written in Rust. The main crate, lightning, does not handle networking, persistence, or any other I/O. Thus, it is runtime-agnostic, but users must implement basic networking logic, chain interactions, and disk storage. More information is available in the About section.

850 Jan 3, 2023

Symbolically Executable Stack Machines in Rust

Related tags

Overview

Symbolically Executable Stack Machines in Rust

Reachability

Supported Memory & Stack Models

Usage

Open Questions

Notes

Comments

Owner

TannrA

ttasm is an assembler for assembling source code to TTVM executable.

Auto launch any application or executable at startup

Executable memory allocator with support for dual mapping and W^X protection

LIBFFM - field-aware factorization machines - in Rust

Cassette A simple, single-future, non-blocking executor intended for building state machines.

Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Envwoman is an application, to sync your .env-files across multiple machines

Source code and documentation for our 'full stack on rust' meetup on 29-9-2022

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits

The language that eats the stack. Heavily inspired by porth which is inspired off of forth

A toy-level BLE peripheral stack

Open Source Application Stack & PaaS

First Git on Rust is reimplementation with rust in order to learn about rust, c and git.

A stupid macro that compiles and executes Rust and spits the output directly into your Rust code

Learn-rust - An in-depth resource to learn Rust 🦀

A highly modular Bitcoin Lightning library written in Rust. Its Rust-Lightning, not Rusty's Lightning!

Telegram bot help you to run Rust code in Telegram via Rust playground

`Debug` in rust, but only supports valid rust syntax and outputs nicely formatted using pretty-please

Playing with web dev in Rust. This is a sample Rust microservice that can be deployed on Kubernetes.