A Rust proc-macro crate which derives functions to compile and parse back enums and structs to and from a bytecode representation

Overview

Bytecode

A simple way to derive bytecode for your Enums and Structs.

What is this

This crate provides a proc macro which derives a bytecode representation of your enums and structs, along with compile and parse functions to convert to and from that bytecode. It also provides the necessary traits, in case you want to implement the conversion manually.

Note: The values of the fields are compiled as little-endian values, so in the bytecode the least significant byte is at the lowest position. The bytecode itself is Big-endian for certain reasons.
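As a small illustration of the field byte order described in the note, the standard library's `to_le_bytes` and `from_le_bytes` produce the same layout for a `u16`. The helper names below are made up for this sketch and are not part of the crate.

```rust
/// Hypothetical helper (not part of the crate): lay out a u16 field
/// with the least significant byte first.
fn field_to_bytes(value: u16) -> [u8; 2] {
    value.to_le_bytes()
}

/// Hypothetical helper (not part of the crate): read the field back
/// from its little-endian bytes.
fn field_from_bytes(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}

fn main() {
    let bytes = field_to_bytes(0x0075);
    // The low byte (0x75) sits at the smaller index.
    assert_eq!(bytes, [0x75, 0x00]);
    assert_eq!(field_from_bytes(bytes), 0x0075);
}
```

This matches the trailing `0x75, 0x00` bytes used for `imOffset: 0x0075` in the example section.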

Example

Cargo.toml

...
[dependencies]
...
bytecode = {git = "https://github.com/YJDoc2/Bytecode" }
...

Code

use bytecode::{Bytecodable, Bytecode};

#[derive(Bytecode, Debug, PartialEq, Eq)]
pub enum Register {
    AX,
    BX,
    CX,
    DX,
}

#[derive(Bytecode, Debug, PartialEq, Eq)]
pub struct Mem {
    segment: Register,
    offset: Register,
    imOffset: u16,
}

#[derive(Bytecode, Debug, PartialEq, Eq)]
pub enum Opcode {
    Hlt,
    Nop,
    Add(Register, Register),
    AddI(Register, u16),
    AddM(Register, Mem),
}

fn main() {
    let op1 = Opcode::AddI(Register::AX, 57);
    let compiled = op1.compile();

    // This is just for the example.
    // In practice you would call parse on already compiled
    // bytes to turn them back into the enum variant.
    let bytes = [4, 2, 1, 3, 0x75, 0x00];
    let op2 = Opcode::parse(&bytes);
    let op2_test = Opcode::AddM(
        Register::CX,
        Mem {
            segment: Register::BX,
            offset: Register::DX,
            imOffset: 0x0075,
        },
    );
    assert!(op2 == Ok((op2_test, 6)));
}
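The `6` in the assertion above is the number of bytes that parse consumed. One way that count can be used is to decode several values stored back to back. The sketch below is self-contained, so it declares its own stand-in `Bytecodable` trait and a unit `BytecodeError`; both are simplified assumptions rather than the crate's real definitions, and `parse_all` and `Flag` are hypothetical names used only for illustration.

```rust
// Stand-ins so this sketch compiles on its own. They are assumptions,
// not the crate's actual definitions.
#[derive(Debug, PartialEq, Eq)]
pub struct BytecodeError;

pub trait Bytecodable: Sized {
    fn compile(&self) -> Vec<u8>;
    fn parse(bytes: &[u8]) -> Result<(Self, usize), BytecodeError>;
}

/// Hypothetical helper: decode values one after another until the
/// slice is exhausted, advancing by the consumed byte count each time.
pub fn parse_all<T: Bytecodable>(mut bytes: &[u8]) -> Result<Vec<T>, BytecodeError> {
    let mut out = Vec::new();
    while !bytes.is_empty() {
        let (value, used) = T::parse(bytes)?;
        if used == 0 {
            // A parser that consumes nothing would loop forever.
            return Err(BytecodeError);
        }
        out.push(value);
        // Reject a consumed count that runs past the end of the input.
        bytes = bytes.get(used..).ok_or(BytecodeError)?;
    }
    Ok(out)
}

/// Toy type used only to exercise parse_all.
#[derive(Debug, PartialEq, Eq)]
pub enum Flag {
    Off,
    On,
}

impl Bytecodable for Flag {
    fn compile(&self) -> Vec<u8> {
        match self {
            Flag::Off => vec![0],
            Flag::On => vec![1],
        }
    }

    fn parse(bytes: &[u8]) -> Result<(Self, usize), BytecodeError> {
        match bytes.first().copied() {
            Some(0) => Ok((Flag::Off, 1)),
            Some(1) => Ok((Flag::On, 1)),
            _ => Err(BytecodeError),
        }
    }
}

fn main() {
    let decoded = parse_all::<Flag>(&[1, 0, 1]);
    assert_eq!(decoded, Ok(vec![Flag::On, Flag::Off, Flag::On]));
}
```

With the real crate, the same loop shape should apply to a derived type such as `Opcode`, as long as its parse function returns the consumed length as described here.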

Bytecodable Trait

This crate also exposes the Bytecodable trait, which is used to implement the compile and parse functions. If required, you can implement it yourself for your own structs/enums.

pub trait Bytecodable{
    /// This function compiles the value to its bytecode representation.
    /// It should return a Vec<u8> containing the bytecode of the value.
    fn compile(&self)->Vec<u8>;

    /// This function parses a u8 slice reference to the value, and
    /// returns a result :
    /// Ok with the parsed value and how many bytes were consumed to parse it
    /// Err with the BytecodeError
    fn parse(bytes: &[u8])-> Result<(Self,usize),BytecodeError>;
}
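A manual implementation for a custom field type might look like the sketch below. To keep it self-contained, the trait is re-declared locally with a `Sized` bound, and `BytecodeError` is a stand-in whose variants (`Empty`, `Truncated`) are assumptions; the crate's real error type may differ. `Port` is a hypothetical type invented for this example.

```rust
// Stand-ins so the sketch compiles on its own. In real code these would
// come from the bytecode crate; the error variants here are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum BytecodeError {
    Empty,
    Truncated,
}

pub trait Bytecodable: Sized {
    fn compile(&self) -> Vec<u8>;
    fn parse(bytes: &[u8]) -> Result<(Self, usize), BytecodeError>;
}

/// Hypothetical user type: a 16-bit I/O port number.
#[derive(Debug, PartialEq, Eq)]
pub struct Port(pub u16);

impl Bytecodable for Port {
    fn compile(&self) -> Vec<u8> {
        // Field values are stored little-endian: low byte first.
        self.0.to_le_bytes().to_vec()
    }

    fn parse(bytes: &[u8]) -> Result<(Self, usize), BytecodeError> {
        match bytes {
            [] => Err(BytecodeError::Empty),
            // Two bytes are needed; anything after them is left untouched.
            [lo, hi, ..] => Ok((Port(u16::from_le_bytes([*lo, *hi])), 2)),
            // Exactly one byte: not enough for a u16.
            _ => Err(BytecodeError::Truncated),
        }
    }
}

fn main() {
    let bytes = Port(0x1234).compile();
    assert_eq!(bytes, vec![0x34, 0x12]);
    assert_eq!(Port::parse(&bytes), Ok((Port(0x1234), 2)));
}
```

Returning the consumed length (2 here) lets a caller continue parsing whatever follows the value in the same slice.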

Use case

This shows why and when you would use this. To see how to use this, see the example section.

Consider that you are writing a VM or an interpreter, and you want an opcode-like representation of your instructions. A good way to do this is to use an Enum to represent the instructions, with the parameters of each instruction as the enum fields. It would look something like this:

pub enum Register{
    AX,
    BX,
    ...
}

pub enum Opcode{
    Nop,
    Hlt,
    Add(Register,Register),
    AddI(Register,u16),
    ...
}

Now that you have this representation, you can use it directly as an IR: you compile your language down to these Enum variants, then iterate over the list and take an action for each variant.
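A minimal sketch of such an IR loop is shown below. The `Vm` struct, its two-register state, and the meaning given to `Add` and `AddI` (destination plus source or immediate, wrapping on overflow) are assumptions made for illustration only.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Register {
    AX,
    BX,
}

pub enum Opcode {
    Nop,
    Hlt,
    Add(Register, Register),
    AddI(Register, u16),
}

/// Hypothetical machine state: one 16-bit value per register.
#[derive(Default)]
pub struct Vm {
    regs: [u16; 2],
}

impl Vm {
    pub fn get(&self, r: Register) -> u16 {
        self.regs[r as usize]
    }

    pub fn set(&mut self, r: Register, value: u16) {
        self.regs[r as usize] = value;
    }

    /// Walk the IR list and act on each variant; stop at Hlt.
    pub fn run(&mut self, program: &[Opcode]) {
        for op in program {
            match op {
                Opcode::Nop => {}
                Opcode::Hlt => break,
                Opcode::Add(dst, src) => {
                    let sum = self.get(*dst).wrapping_add(self.get(*src));
                    self.set(*dst, sum);
                }
                Opcode::AddI(dst, imm) => {
                    let sum = self.get(*dst).wrapping_add(*imm);
                    self.set(*dst, sum);
                }
            }
        }
    }
}

fn main() {
    let program = [
        Opcode::AddI(Register::AX, 57),
        Opcode::AddI(Register::BX, 3),
        Opcode::Add(Register::AX, Register::BX),
        Opcode::Hlt,
        Opcode::AddI(Register::AX, 1), // never reached
    ];
    let mut vm = Vm::default();
    vm.run(&program);
    assert_eq!(vm.get(Register::AX), 60);
}
```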

But if you are making a low-level emulator, where you need to store the opcodes as values in memory, or you want to store this representation as a compiled file, you will need a bytecode representation of this Enum. In the simplest form, this means assigning a value to each enum variant and to each enum field, then writing functions by hand to compile the enum values to a u8 array and to parse a u8 array back into enum values. It would look something like this:

impl Register{
    ...
    fn compile(&self)->Vec<u8>{
        match self{
            Register::AX => vec![0],
            Register::BX => vec![1],
            ...
        }
    }
    fn parse(bytes:&[u8])->Result<Self,&str>{
        match bytes[0]{
            0 => Ok(Register::AX),
            1 => Ok(Register::BX),
            ...
            _ => Err("Invalid register")
        }
    }
    ...
}

impl Opcode{
    ...
    fn compile(&self)->Vec<u8>{
        match self{
            Opcode::Nop => vec![0],
            Opcode::Hlt => vec![1],
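            Opcode::Add(r1,r2) => {
                // 1 byte for the variant + 1 byte per register
                let mut v = Vec::with_capacity(3);
                v.push(2);
                v.extend(&r1.compile());
                v.extend(&r2.compile());
                v
            }
            Opcode::AddI(r1,v1) =>{
                // 1 byte for the variant + 1 register byte + 2 bytes for the u16
                let mut v = Vec::with_capacity(4);
                v.push(3);
                v.extend(&r1.compile());
                v.extend(&v1.to_le_bytes());
                v
            }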
            ...
        }
    }
    fn parse(bytes:&[u8])->Result<Self,&str>{
        match bytes[0]{
            0 => Ok(Opcode::Nop),
            1 => Ok(Opcode::Hlt),
            2 =>{
                let r1 = Register::parse(&bytes[1..])?;
                ...
            }
            ...
            _ => Err("Invalid opcode")
        }
    }
}

Now consider doing this for even 25-ish opcodes, which is roughly the minimum you might need for a small instruction set. For a more complex instruction set, you will need to do this for 100+ instructions, and then for the field types of the individual variants, like Register in this example. This gets tedious, error prone, and quite boring, and it moves your focus from building the VM / interpreter, which was your original intention, to writing these functions.

The Bytecode macro derives these functions for you, for both Enums and Structs, without you having to write anything by hand.

Also imagine trying to remove a variant from the middle in the manual implementation :-| You will have to either:

  • remove that variant from the parse and compile functions and shift the values of the rest accordingly
  • keep the variant in the functions but never emit it when building the IR, leaving one dead spot
  • remove the variant but keep the values in the compile and parse functions the same, treating the value of the removed variant as a dead spot

None of these is particularly good, and thus delegating this manual work to a proc-macro makes that much more sense!

Why did I make this

  • The primary reason behind making this was to move the IR of my 8086 emulator from text-based to opcode-based. Currently the emulator uses a text-based IR, which works and gives about 90-95 % of 8086 functionality, but I would still like to make it as similar to an actual 8086 hardware emulator as possible. One of the improvements needed for this is to make the IR bytecode-based, so that it can be stored in the VM's memory. I didn't want to write the bytecode for about 100 odd instruction opcodes by hand, so instead I made this!
  • I had wanted to experiment with proc-macros for some time, and this seemed a good opportunity to try it.
  • In the long term, I want to make a general Hardware Emulation framework, which will allow connecting different emulated hardware components together, eg: connecting my 8086 emulator with someone's DMA controller. That will not only allow making hardware emulators a bit more approachable, but it will also allow developing modular emulators (at least that's what I think). I feel having such a macro to easily write the bytecode of different devices will be a step in that direction.

Restrictions

This macro has some restrictions :

  • For enums, there can be at most 1 << 15 = 32768 variants. In case you need more variants than this, this cannot help you.

  • By default this provides Bytecodable implementations for bool and all u_ and i_ numeric types except usize and isize. If you need other types as enum fields, you must make sure that they also implement the Bytecodable trait, either using the derive macro or manually (like the Register enum in the example).

  • This crate does not necessarily create an efficient representation of the values. For example

    enum Register{
        AX,
        BX,
        CX,
        DX,
    }
    enum Opcode{
        ...
        Add(Register,Register),
        ...
    }

    Ideally the Add instruction can be compiled into two bytes :

    • 1 byte for denoting the Add variant of Opcode Enum
    • 1 byte whose higher 4 bits denote the first register, and lower 4 bits denote the second register. This can be done, as we know that the Register enum has at most 4 values, and thus its bytecode representation will have at most the value 3, which can be represented in 4 bits.

    This bytecode macro will not create this efficient representation, but will instead allocate 1 byte for each register field, thus taking 3 bytes total for Add Opcode Variant. In case you need this kind of compression, the only way to do it currently is to use u8 instead of two Register i.e. Add(u8) and then manually extract and code the Registers in the u8.

  • This macro does not let you manually set the bytecode value of a specific field or variant. Currently the bytecode starts at 0 for the first variant of any enum and counts up from there. This is not a guarantee, though: ideally the generated bytecode should be treated as a black box and only handled through the compile and parse functions. That said, because there is currently no way to set a specific value for a specific variant, the values are allocated as generated. If you need a specific value for a specific opcode, such as 0 for Hlt (when the compiled bytecode is stored in the VM's memory, treating 0 as Hlt is a good way to stop random, potentially endless execution), you will need to make sure Hlt is the first variant in the enum.
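The manual packing mentioned in the restrictions (using `Add(u8)` and coding both registers into one byte) could be sketched as follows. The helper names `pack`, `unpack`, and `from_nibble` are hypothetical and not provided by the crate, and the numeric values assume the variants are numbered 0 to 3 in declaration order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Register {
    AX,
    BX,
    CX,
    DX,
}

impl Register {
    /// Hypothetical helper: map a 4-bit value back to a register.
    pub fn from_nibble(n: u8) -> Option<Register> {
        match n {
            0 => Some(Register::AX),
            1 => Some(Register::BX),
            2 => Some(Register::CX),
            3 => Some(Register::DX),
            _ => None,
        }
    }
}

/// Put the first register in the high 4 bits and the second in the low 4 bits.
pub fn pack(first: Register, second: Register) -> u8 {
    ((first as u8) << 4) | (second as u8)
}

/// Split a packed byte back into its two registers, if both nibbles are valid.
pub fn unpack(byte: u8) -> Option<(Register, Register)> {
    let first = Register::from_nibble(byte >> 4)?;
    let second = Register::from_nibble(byte & 0x0f)?;
    Some((first, second))
}

fn main() {
    let packed = pack(Register::CX, Register::BX);
    assert_eq!(packed, 0x21);
    assert_eq!(unpack(packed), Some((Register::CX, Register::BX)));
}
```

The packed byte would then be the single `u8` field of `Add(u8)`, so the instruction takes two bytes instead of three, at the cost of doing the register decoding yourself.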


License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
