Standalone JIT-style runtime for WebAssembly, using Cranelift

Overview

wasmtime

A standalone runtime for WebAssembly

A Bytecode Alliance project


Guide | Contributing | Website | Chat

Installation

The Wasmtime CLI can be installed on Linux and macOS with a small install script:

$ curl https://wasmtime.dev/install.sh -sSf | bash

Users on Windows, or anyone who prefers not to use the install script, can download installers and binaries directly from the GitHub Releases page.

Example

If you've got the Rust compiler installed then you can take some Rust source code:

fn main() {
    println!("Hello, world!");
}

and compile/run it with:

$ rustup target add wasm32-wasi
$ rustc hello.rs --target wasm32-wasi
$ wasmtime hello.wasm
Hello, world!

Features

  • Lightweight. Wasmtime is a standalone runtime for WebAssembly that scales with your needs. It fits on tiny chips as well as makes use of huge servers. Wasmtime can be embedded into almost any application too.

  • Fast. Wasmtime is built on the optimizing Cranelift code generator to quickly generate high-quality machine code at runtime.

  • Configurable. Whether you need to precompile your wasm ahead of time, or interpret it at runtime, Wasmtime has you covered for all your wasm-executing needs.

  • WASI. Wasmtime supports a rich set of APIs for interacting with the host environment through the WASI standard.

  • Standards Compliant. Wasmtime passes the official WebAssembly test suite, implements the official C API of wasm, and implements future proposals to WebAssembly as well. Wasmtime developers are intimately engaged with the WebAssembly standards process all along the way too.

Language Support

You can use Wasmtime from a variety of different languages through embeddings of the implementation, including Rust, C, Python, .NET, and Go.
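
For example, here is a minimal sketch of the Rust embedding (the wasmtime crate); the module text, export name, and use of the anyhow crate are illustrative only, and exact APIs can differ between Wasmtime versions:

use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Compile a module; WebAssembly text works here when the `wat` feature is enabled.
    let module = Module::new(&engine, r#"(module (func (export "run")))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    // Look up the export with a typed signature and call it.
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;
    Ok(())
}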

Documentation

📚 Read the Wasmtime guide here! 📚

The wasmtime guide is the best starting point to learn about what Wasmtime can do for you or to help answer your questions about Wasmtime. If you're curious about contributing to Wasmtime, it can also help you do that!


It's Wasmtime.

Comments
  • `wasmtime`: Implement fast Wasm stack walking

    Why do we want Wasm stack walking to be fast? Because we capture stacks whenever there is a trap and traps actually happen fairly frequently with short-lived programs and WASI's exit.

    Previously, we would rely on generating the system unwind info (e.g. .eh_frame) and using the system unwinder (via the backtrace crate) to walk the full stack and filter out any non-Wasm stack frames. This can, unfortunately, be slow for two primary reasons:

    1. The system unwinder is doing O(all-kinds-of-frames) work rather than O(wasm-frames) work.

    2. System unwind info and the system unwinder need to be much more general than a purpose-built stack walker for Wasm needs to be. They have to handle any kind of stack frame that any compiler might emit, whereas our Wasm frames are emitted by Cranelift and always have frame pointers. This translates into implementation complexity and general overhead. There can also be unnecessary-for-our-use-cases global synchronization and locks involved, further slowing down stack walking in the presence of multiple threads trying to capture stacks in parallel.

    This commit introduces a purpose-built stack walker for traversing just our Wasm frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore non-Wasm stack frames, we keep a linked list of (entry stack pointer, exit frame pointer) pairs. This linked list is maintained via Wasm-to-host and host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use frame pointers (which Cranelift preserves) to find the next older Wasm frame on the stack, and we keep doing this until we reach the entry stack pointer, meaning that the next older frame will be a host frame.
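
    As a rough illustration of that walk (not Wasmtime's actual code), the loop below follows saved frame pointers until it crosses the recorded entry stack pointer; the frame layout and names here are assumptions for the sketch:

    unsafe fn walk_wasm_frames(entry_sp: usize, exit_fp: usize, mut visit: impl FnMut(usize)) {
        let mut fp = exit_fp;
        // Keep following saved frame pointers until the next older frame would be a
        // host frame, i.e. until we reach this sequence's entry stack pointer.
        while fp != entry_sp {
            // With frame pointers preserved, the return address sits one word above
            // the saved frame pointer (the exact layout is architecture-dependent).
            let return_addr = *((fp + std::mem::size_of::<usize>()) as *const usize);
            visit(return_addr);
            // The saved caller frame pointer is stored at the current frame pointer.
            fp = *(fp as *const usize);
        }
    }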

    The trampolines need to avoid a couple stumbling blocks. First, they need to be compiled ahead of time, since we may not have access to a compiler at runtime (e.g. if the cranelift feature is disabled) but still want to be able to call functions that have already been compiled and get stack traces for those functions. Usually this means we would compile the appropriate trampolines inside Module::new and the compiled module object would hold the trampolines. However, we also need to support calling host functions that are wrapped into wasmtime::Funcs and there doesn't exist any ahead-of-time compiled module object to hold the appropriate trampolines:

    // Define a host function.
    let func_type = wasmtime::FuncType::new(
        vec![wasmtime::ValType::I32],
        vec![wasmtime::ValType::I32],
    );
    let func = Func::new(&mut store, func_type, |_, params, results| {
        // ...
        Ok(())
    });
    
    // Call that host function.
    let mut results = vec![wasmtime::Val::I32(0)];
    func.call(&mut store, &[wasmtime::Val::I32(0)], &mut results)?;
    

    Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline in assembly that work for all Wasm and host function signatures. These trampolines are careful to only use volatile registers, avoid touching any register that is an argument in the calling convention ABI, and tail call to the target callee function. This allows forwarding any set of arguments and any returns to and from the callee, while also allowing us to maintain our linked list of Wasm stack and frame pointers before transferring control to the callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing the host-Wasm boundary, so they do not impose overhead on regular calls. (And if using one trampoline for all host-Wasm boundary crossing ever breaks branch prediction enough in the CPU to become any kind of bottleneck, we can do fun things like have multiple copies of the same trampoline and choose a random copy for each function, sharding the functions across branch predictor entries.)

    Finally, this commit also ends the use of a synthetic Module and allocating a stubbed out VMContext for host functions. Instead, we define a VMHostFuncContext with its own magic value, similar to VMComponentContext, specifically for host functions.

    Benchmarks

    Traps and Stack Traces

    Large improvements to taking stack traces on traps, ranging from shaving off 64% to 99.95% of the time it used to take.

    multi-threaded-traps/0  time:   [2.5686 us 2.5808 us 2.5934 us]
                            thrpt:  [0.0000  elem/s 0.0000  elem/s 0.0000  elem/s]
                     change:
                            time:   [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
                            thrpt:  [+560.90% +573.56% +585.84%]
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      4 (4.00%) high mild
      4 (4.00%) high severe
    multi-threaded-traps/1  time:   [2.9021 us 2.9167 us 2.9322 us]
                            thrpt:  [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
                     change:
                            time:   [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
                            thrpt:  [+1023.1% +1048.6% +1070.3%]
                            Performance has improved.
    Found 6 outliers among 100 measurements (6.00%)
      1 (1.00%) high mild
      5 (5.00%) high severe
    multi-threaded-traps/2  time:   [2.9996 us 3.0145 us 3.0295 us]
                            thrpt:  [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
                     change:
                            time:   [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
                            thrpt:  [+1503.1% +1542.0% +1578.0%]
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      5 (5.00%) high severe
    multi-threaded-traps/4  time:   [5.5768 us 5.6052 us 5.6364 us]
                            thrpt:  [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
                     change:
                            time:   [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
                            thrpt:  [+1339.2% +1353.6% +1369.1%]
                            Performance has improved.
    multi-threaded-traps/8  time:   [8.6408 us 9.1212 us 9.5438 us]
                            thrpt:  [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
                     change:
                            time:   [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
                            thrpt:  [+1624.7% +1709.2% +1806.1%]
                            Performance has improved.
    multi-threaded-traps/16 time:   [10.152 us 10.840 us 11.545 us]
                            thrpt:  [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
                     change:
                            time:   [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
                            thrpt:  [+2821.5% +3048.1% +3281.1%]
                            Performance has improved.
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) high mild
    
    many-modules-registered-traps/1
                            time:   [2.6278 us 2.6361 us 2.6447 us]
                            thrpt:  [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
                     change:
                            time:   [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
                            thrpt:  [+562.65% +571.51% +580.76%]
                            Performance has improved.
    Found 9 outliers among 100 measurements (9.00%)
      3 (3.00%) high mild
      6 (6.00%) high severe
    many-modules-registered-traps/8
                            time:   [2.6294 us 2.6460 us 2.6623 us]
                            thrpt:  [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
                     change:
                            time:   [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
                            thrpt:  [+567.63% +588.95% +608.95%]
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      3 (3.00%) high mild
      5 (5.00%) high severe
    many-modules-registered-traps/64
                            time:   [2.6218 us 2.6329 us 2.6452 us]
                            thrpt:  [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
                     change:
                            time:   [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
                            thrpt:  [+1431.4% +1450.6% +1469.5%]
                            Performance has improved.
    Found 3 outliers among 100 measurements (3.00%)
      3 (3.00%) high mild
    many-modules-registered-traps/512
                            time:   [2.6569 us 2.6737 us 2.6923 us]
                            thrpt:  [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
                     change:
                            time:   [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
                            thrpt:  [+13417% +13566% +13731%]
                            Performance has improved.
    Found 4 outliers among 100 measurements (4.00%)
      4 (4.00%) high mild
    many-modules-registered-traps/4096
                            time:   [2.7258 us 2.7390 us 2.7535 us]
                            thrpt:  [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
                     change:
                            time:   [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
                            thrpt:  [+221417% +223380% +224881%]
                            Performance has improved.
    Found 2 outliers among 100 measurements (2.00%)
      1 (1.00%) high mild
      1 (1.00%) high severe
    
    many-stack-frames-traps/1
                            time:   [1.4658 us 1.4719 us 1.4784 us]
                            thrpt:  [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
                     change:
                            time:   [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
                            thrpt:  [+860.23% +894.72% +938.21%]
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      5 (5.00%) high mild
      3 (3.00%) high severe
    many-stack-frames-traps/8
                            time:   [2.4772 us 2.4870 us 2.4973 us]
                            thrpt:  [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
                     change:
                            time:   [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
                            thrpt:  [+575.65% +583.51% +592.03%]
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      4 (4.00%) high mild
      4 (4.00%) high severe
    many-stack-frames-traps/64
                            time:   [10.109 us 10.171 us 10.236 us]
                            thrpt:  [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
                     change:
                            time:   [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
                            thrpt:  [+341.22% +350.38% +357.55%]
                            Performance has improved.
    Found 7 outliers among 100 measurements (7.00%)
      5 (5.00%) high mild
      2 (2.00%) high severe
    many-stack-frames-traps/512
                            time:   [126.16 us 126.54 us 126.96 us]
                            thrpt:  [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
                     change:
                            time:   [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
                            thrpt:  [+181.32% +185.17% +188.71%]
                            Performance has improved.
    Found 4 outliers among 100 measurements (4.00%)
      4 (4.00%) high severe
    

    Calls

    There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call performance due to the new trampolines. It seems to be on the order of about 2-10 nanoseconds per call, depending on the benchmark.

    I believe this regression is ultimately acceptable because

    1. this overhead will be vastly dominated by whatever work a non-nop callee actually does,

    2. we will need these trampolines, or something like them, when implementing the Wasm exceptions proposal to do things like translate Wasm's exceptions into Rust's Results,

    3. and because the performance improvements to trapping and capturing stack traces are of a much larger magnitude than these call regressions.

    sync/no-hook/host-to-wasm - typed - nop
                            time:   [28.683 ns 28.757 ns 28.844 ns]
                            change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      1 (1.00%) low mild
      4 (4.00%) high mild
      5 (5.00%) high severe
    sync/no-hook/host-to-wasm - untyped - nop
                            time:   [42.515 ns 42.652 ns 42.841 ns]
                            change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      1 (1.00%) high mild
      10 (10.00%) high severe
    sync/no-hook/host-to-wasm - unchecked - nop
                            time:   [33.936 ns 34.052 ns 34.179 ns]
                            change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      7 (7.00%) high mild
      2 (2.00%) high severe
    sync/no-hook/host-to-wasm - typed - nop-params-and-results
                            time:   [34.290 ns 34.388 ns 34.502 ns]
                            change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 13 outliers among 100 measurements (13.00%)
      5 (5.00%) high mild
      8 (8.00%) high severe
    sync/no-hook/host-to-wasm - untyped - nop-params-and-results
                            time:   [62.546 ns 62.721 ns 62.919 ns]
                            change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      2 (2.00%) high mild
      10 (10.00%) high severe
    sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
                            time:   [42.609 ns 42.710 ns 42.831 ns]
                            change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      4 (4.00%) high mild
      7 (7.00%) high severe
    
    sync/hook-sync/host-to-wasm - typed - nop
                            time:   [29.546 ns 29.675 ns 29.818 ns]
                            change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 5 outliers among 100 measurements (5.00%)
      3 (3.00%) high mild
      2 (2.00%) high severe
    sync/hook-sync/host-to-wasm - untyped - nop
                            time:   [45.448 ns 45.699 ns 45.961 ns]
                            change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 14 outliers among 100 measurements (14.00%)
      4 (4.00%) high mild
      10 (10.00%) high severe
    sync/hook-sync/host-to-wasm - unchecked - nop
                            time:   [34.334 ns 34.437 ns 34.558 ns]
                            change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      5 (5.00%) high mild
      7 (7.00%) high severe
    sync/hook-sync/host-to-wasm - typed - nop-params-and-results
                            time:   [36.594 ns 36.763 ns 36.974 ns]
                            change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      3 (3.00%) high mild
      9 (9.00%) high severe
    sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
                            time:   [63.541 ns 63.831 ns 64.194 ns]
                            change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
                            No change in performance detected.
    Found 8 outliers among 100 measurements (8.00%)
      6 (6.00%) high mild
      2 (2.00%) high severe
    sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
                            time:   [43.968 ns 44.169 ns 44.437 ns]
                            change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 15 outliers among 100 measurements (15.00%)
      3 (3.00%) high mild
      12 (12.00%) high severe
    
    async/no-hook/host-to-wasm - typed - nop
                            time:   [4.9612 us 4.9743 us 4.9889 us]
                            change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      6 (6.00%) high mild
      4 (4.00%) high severe
    async/no-hook/host-to-wasm - untyped - nop
                            time:   [5.0030 us 5.0211 us 5.0439 us]
                            change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      3 (3.00%) high mild
      7 (7.00%) high severe
    async/no-hook/host-to-wasm - typed - nop-params-and-results
                            time:   [4.9273 us 4.9468 us 4.9700 us]
                            change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 14 outliers among 100 measurements (14.00%)
      5 (5.00%) high mild
      9 (9.00%) high severe
    async/no-hook/host-to-wasm - untyped - nop-params-and-results
                            time:   [5.1151 us 5.1338 us 5.1555 us]
                            change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      3 (3.00%) high mild
      13 (13.00%) high severe
    
    async/hook-sync/host-to-wasm - typed - nop
                            time:   [4.9330 us 4.9394 us 4.9467 us]
                            change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      5 (5.00%) high mild
      7 (7.00%) high severe
    async/hook-sync/host-to-wasm - untyped - nop
                            time:   [5.0073 us 5.0183 us 5.0310 us]
                            change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 8 outliers among 100 measurements (8.00%)
      3 (3.00%) high mild
      5 (5.00%) high severe
    async/hook-sync/host-to-wasm - typed - nop-params-and-results
                            time:   [4.9610 us 4.9839 us 5.0097 us]
                            change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 13 outliers among 100 measurements (13.00%)
      7 (7.00%) high mild
      6 (6.00%) high severe
    async/hook-sync/host-to-wasm - untyped - nop-params-and-results
                            time:   [5.0995 us 5.1272 us 5.1617 us]
                            change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      6 (6.00%) high mild
      4 (4.00%) high severe
    
    async-pool/no-hook/host-to-wasm - typed - nop
                            time:   [2.4242 us 2.4316 us 2.4396 us]
                            change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 8 outliers among 100 measurements (8.00%)
      5 (5.00%) high mild
      3 (3.00%) high severe
    async-pool/no-hook/host-to-wasm - untyped - nop
                            time:   [2.5102 us 2.5155 us 2.5210 us]
                            change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      4 (4.00%) high mild
      8 (8.00%) high severe
    async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
                            time:   [2.4203 us 2.4310 us 2.4440 us]
                            change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 14 outliers among 100 measurements (14.00%)
      5 (5.00%) high mild
      9 (9.00%) high severe
    async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
                            time:   [2.5501 us 2.5593 us 2.5700 us]
                            change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      5 (5.00%) high mild
      11 (11.00%) high severe
    
    async-pool/hook-sync/host-to-wasm - typed - nop
                            time:   [2.4135 us 2.4190 us 2.4254 us]
                            change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      6 (6.00%) high mild
      5 (5.00%) high severe
    async-pool/hook-sync/host-to-wasm - untyped - nop
                            time:   [2.5172 us 2.5248 us 2.5357 us]
                            change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 8 outliers among 100 measurements (8.00%)
      1 (1.00%) high mild
      7 (7.00%) high severe
    async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
                            time:   [2.4214 us 2.4353 us 2.4532 us]
                            change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 15 outliers among 100 measurements (15.00%)
      2 (2.00%) high mild
      13 (13.00%) high severe
    async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
                            time:   [2.5499 us 2.5607 us 2.5748 us]
                            change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 18 outliers among 100 measurements (18.00%)
      3 (3.00%) high mild
      15 (15.00%) high severe
    
    sync/no-hook/wasm-to-host - nop - typed
                            time:   [6.6135 ns 6.6288 ns 6.6452 ns]
                            change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 7 outliers among 100 measurements (7.00%)
      2 (2.00%) high mild
      5 (5.00%) high severe
    sync/no-hook/wasm-to-host - nop-params-and-results - typed
                            time:   [15.930 ns 15.993 ns 16.067 ns]
                            change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 12 outliers among 100 measurements (12.00%)
      11 (11.00%) high mild
      1 (1.00%) high severe
    sync/no-hook/wasm-to-host - nop - untyped
                            time:   [20.596 ns 20.640 ns 20.690 ns]
                            change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      5 (5.00%) high mild
      5 (5.00%) high severe
    sync/no-hook/wasm-to-host - nop-params-and-results - untyped
                            time:   [42.659 ns 42.882 ns 43.159 ns]
                            change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
                            No change in performance detected.
    Found 15 outliers among 100 measurements (15.00%)
      1 (1.00%) high mild
      14 (14.00%) high severe
    sync/no-hook/wasm-to-host - nop - unchecked
                            time:   [10.671 ns 10.691 ns 10.713 ns]
                            change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      2 (2.00%) high mild
      7 (7.00%) high severe
    sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.136 ns 11.190 ns 11.263 ns]
                            change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 14 outliers among 100 measurements (14.00%)
      4 (4.00%) high mild
      10 (10.00%) high severe
    
    sync/hook-sync/wasm-to-host - nop - typed
                            time:   [6.7964 ns 6.8087 ns 6.8226 ns]
                            change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 14 outliers among 100 measurements (14.00%)
      4 (4.00%) high mild
      10 (10.00%) high severe
    sync/hook-sync/wasm-to-host - nop-params-and-results - typed
                            time:   [15.865 ns 15.921 ns 15.985 ns]
                            change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      3 (3.00%) high mild
      13 (13.00%) high severe
    sync/hook-sync/wasm-to-host - nop - untyped
                            time:   [21.505 ns 21.587 ns 21.677 ns]
                            change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 8 outliers among 100 measurements (8.00%)
      4 (4.00%) high mild
      4 (4.00%) high severe
    sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
                            time:   [44.018 ns 44.128 ns 44.261 ns]
                            change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
                            No change in performance detected.
    Found 14 outliers among 100 measurements (14.00%)
      5 (5.00%) high mild
      9 (9.00%) high severe
    sync/hook-sync/wasm-to-host - nop - unchecked
                            time:   [11.264 ns 11.326 ns 11.387 ns]
                            change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 6 outliers among 100 measurements (6.00%)
      3 (3.00%) high mild
      3 (3.00%) high severe
    sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.816 ns 11.865 ns 11.920 ns]
                            change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 14 outliers among 100 measurements (14.00%)
      8 (8.00%) high mild
      6 (6.00%) high severe
    
    async/no-hook/wasm-to-host - nop - typed
                            time:   [6.6221 ns 6.6385 ns 6.6569 ns]
                            change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 13 outliers among 100 measurements (13.00%)
      6 (6.00%) high mild
      7 (7.00%) high severe
    async/no-hook/wasm-to-host - nop-params-and-results - typed
                            time:   [15.884 ns 15.929 ns 15.983 ns]
                            change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      3 (3.00%) high mild
      13 (13.00%) high severe
    async/no-hook/wasm-to-host - nop - untyped
                            time:   [20.615 ns 20.702 ns 20.821 ns]
                            change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      2 (2.00%) high mild
      8 (8.00%) high severe
    async/no-hook/wasm-to-host - nop-params-and-results - untyped
                            time:   [41.956 ns 42.207 ns 42.521 ns]
                            change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 14 outliers among 100 measurements (14.00%)
      3 (3.00%) high mild
      11 (11.00%) high severe
    async/no-hook/wasm-to-host - nop - unchecked
                            time:   [10.440 ns 10.474 ns 10.513 ns]
                            change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      5 (5.00%) high mild
      6 (6.00%) high severe
    async/no-hook/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.476 ns 11.512 ns 11.554 ns]
                            change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 12 outliers among 100 measurements (12.00%)
      1 (1.00%) low mild
      6 (6.00%) high mild
      5 (5.00%) high severe
    async/no-hook/wasm-to-host - nop - async-typed
                            time:   [26.427 ns 26.478 ns 26.532 ns]
                            change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      2 (2.00%) high mild
      7 (7.00%) high severe
    async/no-hook/wasm-to-host - nop-params-and-results - async-typed
                            time:   [28.557 ns 28.693 ns 28.880 ns]
                            change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 15 outliers among 100 measurements (15.00%)
      1 (1.00%) high mild
      14 (14.00%) high severe
    
    async/hook-sync/wasm-to-host - nop - typed
                            time:   [6.7488 ns 6.7630 ns 6.7784 ns]
                            change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      4 (4.00%) high mild
      5 (5.00%) high severe
    async/hook-sync/wasm-to-host - nop-params-and-results - typed
                            time:   [15.928 ns 16.031 ns 16.149 ns]
                            change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      9 (9.00%) high mild
      2 (2.00%) high severe
    async/hook-sync/wasm-to-host - nop - untyped
                            time:   [21.930 ns 22.114 ns 22.296 ns]
                            change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 4 outliers among 100 measurements (4.00%)
      3 (3.00%) high mild
      1 (1.00%) high severe
    async/hook-sync/wasm-to-host - nop-params-and-results - untyped
                            time:   [42.684 ns 42.858 ns 43.081 ns]
                            change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 14 outliers among 100 measurements (14.00%)
      2 (2.00%) high mild
      12 (12.00%) high severe
    async/hook-sync/wasm-to-host - nop - unchecked
                            time:   [11.026 ns 11.053 ns 11.086 ns]
                            change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 10 outliers among 100 measurements (10.00%)
      5 (5.00%) high mild
      5 (5.00%) high severe
    async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.840 ns 11.900 ns 11.982 ns]
                            change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 18 outliers among 100 measurements (18.00%)
      3 (3.00%) high mild
      15 (15.00%) high severe
    async/hook-sync/wasm-to-host - nop - async-typed
                            time:   [27.601 ns 27.709 ns 27.882 ns]
                            change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      2 (2.00%) low mild
      3 (3.00%) high mild
      6 (6.00%) high severe
    async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
                            time:   [28.955 ns 29.174 ns 29.413 ns]
                            change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 13 outliers among 100 measurements (13.00%)
      7 (7.00%) high mild
      6 (6.00%) high severe
    
    async-pool/no-hook/wasm-to-host - nop - typed
                            time:   [6.5626 ns 6.5733 ns 6.5851 ns]
                            change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      5 (5.00%) high mild
      4 (4.00%) high severe
    async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
                            time:   [15.820 ns 15.886 ns 15.969 ns]
                            change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 17 outliers among 100 measurements (17.00%)
      4 (4.00%) high mild
      13 (13.00%) high severe
    async-pool/no-hook/wasm-to-host - nop - untyped
                            time:   [20.481 ns 20.521 ns 20.566 ns]
                            change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      6 (6.00%) high mild
      5 (5.00%) high severe
    async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
                            time:   [41.834 ns 41.998 ns 42.189 ns]
                            change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
                            Change within noise threshold.
    Found 13 outliers among 100 measurements (13.00%)
      3 (3.00%) high mild
      10 (10.00%) high severe
    async-pool/no-hook/wasm-to-host - nop - unchecked
                            time:   [10.353 ns 10.380 ns 10.414 ns]
                            change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 7 outliers among 100 measurements (7.00%)
      4 (4.00%) high mild
      3 (3.00%) high severe
    async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.123 ns 11.168 ns 11.228 ns]
                            change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 12 outliers among 100 measurements (12.00%)
      11 (11.00%) high mild
      1 (1.00%) high severe
    async-pool/no-hook/wasm-to-host - nop - async-typed
                            time:   [27.442 ns 27.528 ns 27.638 ns]
                            change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 18 outliers among 100 measurements (18.00%)
      3 (3.00%) high mild
      15 (15.00%) high severe
    async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
                            time:   [29.014 ns 29.148 ns 29.312 ns]
                            change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 7 outliers among 100 measurements (7.00%)
      6 (6.00%) high mild
      1 (1.00%) high severe
    
    async-pool/hook-sync/wasm-to-host - nop - typed
                            time:   [6.7916 ns 6.8116 ns 6.8325 ns]
                            change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 11 outliers among 100 measurements (11.00%)
      5 (5.00%) high mild
      6 (6.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
                            time:   [15.917 ns 15.975 ns 16.051 ns]
                            change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      5 (5.00%) high mild
      11 (11.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop - untyped
                            time:   [21.558 ns 21.612 ns 21.679 ns]
                            change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 9 outliers among 100 measurements (9.00%)
      2 (2.00%) high mild
      7 (7.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
                            time:   [42.475 ns 42.614 ns 42.775 ns]
                            change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 18 outliers among 100 measurements (18.00%)
      3 (3.00%) high mild
      15 (15.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop - unchecked
                            time:   [11.150 ns 11.195 ns 11.247 ns]
                            change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 14 outliers among 100 measurements (14.00%)
      3 (3.00%) high mild
      11 (11.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                            time:   [11.639 ns 11.695 ns 11.760 ns]
                            change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 15 outliers among 100 measurements (15.00%)
      7 (7.00%) high mild
      8 (8.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop - async-typed
                            time:   [27.480 ns 27.712 ns 27.984 ns]
                            change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 8 outliers among 100 measurements (8.00%)
      6 (6.00%) high mild
      2 (2.00%) high severe
    async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
                            time:   [29.218 ns 29.380 ns 29.600 ns]
                            change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 16 outliers among 100 measurements (16.00%)
      2 (2.00%) high mild
      14 (14.00%) high severe
    
    wasmtime:api cranelift cranelift:area:machinst cranelift:meta fuzzing cranelift:area:aarch64 cranelift:area:x64 wasmtime:ref-types wasmtime:config 
    opened by fitzgen 40
  • wasm64 support

    We should consider supporting wasm64 modules, not just wasm32; people will want to run with large linear address spaces, both to process large amounts of data and to provide address space for shared mappings or file mappings.

    Opening this issue to start discussing what that support should look like, and how we can do that with minimal complexity or duplication.

    opened by joshtriplett 40
  • externref: implement stack map-based garbage collection

    For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

    When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.
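
    A simplified sketch of that GC step, with hypothetical names and references modeled as opaque words rather than real VMExternRefs:

    use std::collections::HashSet;

    fn gc(over_approximation: &mut HashSet<usize>, precise_roots: HashSet<usize>) {
        // Anything we conservatively assumed was live in a Wasm activation, but that
        // the stack maps did not actually find, gets its deferred decrement now.
        for stale in over_approximation.difference(&precise_roots) {
            decrement_ref_count(*stale);
        }
        // The precise set becomes the new over-approximation going forward.
        *over_approximation = precise_roots;
    }

    // Stand-in for dropping the table's clone of the reference, which decrements
    // the underlying reference count.
    fn decrement_ref_count(_raw_externref: usize) {}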

    The VMExternRefActivationsTable implements the over-approximated set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

    When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

    For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

    cc #929

    Fixes #1804

    Depends on https://github.com/rust-lang/backtrace-rs/pull/341

    wasmtime:api cranelift 
    opened by fitzgen 33
  • memfd/madvise-based CoW pooling allocator

    Add a pooling allocator mode based on copy-on-write mappings of memfds.

    As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with MAP_PRIVATE. The memfd mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, madvise(MADV_DONTNEED, ...) will discard the CoW overlay, returning the mapping to its original state.

    By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a mmap() mapping can be larger than the file itself, with accesses beyond the end generating a SIGBUS, and the fact that we can cheaply resize the file with ftruncate, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as memory.grow operations occur.

    The above CoW technique and heap-growth technique together allow us a fastpath of madvise() and ftruncate() only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the uffd heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead.
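
    A sketch of the primitives behind this fastpath, using the libc crate on Linux; the function names and omitted error handling are illustrative rather than Wasmtime's actual implementation:

    // MAP_PRIVATE gives a copy-on-write view of the memfd-backed image.
    unsafe fn map_image(image_fd: libc::c_int, image_len: usize) -> *mut u8 {
        libc::mmap(
            std::ptr::null_mut(),
            image_len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE,
            image_fd,
            0,
        )
        .cast()
    }

    // Instantiate-terminate fastpath: discard the dirty CoW pages so the mapping
    // reads the pristine image again, without changing the address space at all.
    unsafe fn reset_slot(addr: *mut u8, image_len: usize) {
        libc::madvise(addr.cast(), image_len, libc::MADV_DONTNEED);
    }

    // memory.grow on the separate extension mapping: just extend the file; the
    // existing larger-than-file mapping picks the new pages up.
    unsafe fn grow_heap(extension_fd: libc::c_int, new_len: libc::off_t) {
        libc::ftruncate(extension_fd, new_len);
    }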

    There are still a few loose ends in this PR, which I intend to tie up before merging:

    • There is no InstanceAllocationStrategy yet that attempts to actually reuse instance slots; that should be added ASAP. For testing so far, I have just instantiated the same one module repeatedly (so reuse naturally occurs).

    • The guard-page strategy is slightly wrong; I need to implement the pre-heap guard region as well. This will be done by performing another mapping once, to reserve the whole address range, then mmap'ing the image and extension file on top at appropriate offsets (2GiB, 2GiB plus image size).

    Thanks to Jan on Zulip (are you also @koute from #3691?) for the initial idea/inspiration! This PR is meant to demonstrate my thoughts on how to build the feature and spawn discussion; now that we see both approaches hopefully we can work out a way to meet the needs of both of our use-cases.

    [1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772

    wasmtime:api 
    opened by cfallin 32
  • support a few DWARF-5 only features

    See #932.

    • accept and pass DebugAddrIndex, DebugStrOffsetsIndex attributes
    • skip DebugAddrBase, DebugStrOffsetsBase attribute when transforming, these are managed by the compilation unit elsewhere
    • accept and resolve DebugLineStrRef in line programs
    • read .debug_addr
    • read .debug_rnglists
    • read .debug_loclists
    • read .debug_line_str
    • read .debug_str_offsets
    • perform the DebugAddrIndex and DebugStrOffsetsIndex indirections

    TODO:

    • [x] tests (added DWARF-5 test, but it needs a refresh, lldb test also needed).
    opened by ggreif 31
  • Implement path_link for Windows.

    This is probably the last missing syscall for Windows!

    This PR implements path_link for Windows and adds a non-strict version of the path_link integration test.

    I'm unsure about the error handling in path_link. MSDN doesn't say much about possible error codes for either CreateHardLinkA or CreateSymbolicLinkA. I mostly copied over the error conversion from path_symlink, but I'm not sure if it's correct. In particular, it's unclear to me what the purpose of strip_trailing_slashes_and_concatenate is.

    path_symlink will now also detect an attempt to create a dangling symlink and return ENOTSUP (is this the correct return code?).

    Currently the non-strictness of the test consists of:

    1. we use separate subdirectories subdir, subdir2, subdir3 for each test stage. This is due to the fact that Windows will not remove the directory and won't allow creating a directory with the same name until the previous one has been deleted. I don't see any way of circumventing this, because the application may still try to access the directory through the unclosed file descriptor.
    2. path_link will return EACCES instead of EPERM when trying to create a link to a subdirectory. This violates the POSIX spec. We could manually check if the source path is a directory in case of ERROR_ACCESS_DENIED but this would cost us an extra syscall.
    3. Tests for dangling symlinks or symlink loops have been disabled. Alternatively, we could check if the attempt to create a dangling symlink returns ENOTSUP, but this doesn't make much sense while 1&2 are an issue.

    Let me know what you think.

    Btw. @kubkon, according to this stackoverflow post Mac OS X 10.5+ permits hard links to directories, which our tests expect to fail.


    Notes about links and symlinks under Windows:

    • creating a symlink requires administrative privileges (SeCreateSymbolicLinkPrivilege). On Windows 10 this requirement may be removed, but this requires enabling developer mode
    • Windows distinguishes between file and directory symlinks
    • It's possible to create a dangling symlink, but the type (file/directory) has to be specified upon creation. The behavior in case of type mismatch is inconsistent. Precisely, suppose that a dangling file symlink is created foo -> bar and later, a directory bar is created. Then:
      • under msys64 bash, cd foo succeeds and the directory view is the same when accessed either directly or through the symlink
      • under cmd (both windowed and as a child process from msys64 bash), cd foo fails with The directory name is invalid
    wasi:impl wasi:tests wasi 
    opened by marmistrz 31
  • Implement lazy funcref table and anyfunc initialization.

    During instance initialization, we build two sorts of arrays eagerly:

    • We create an "anyfunc" (a VMCallerCheckedAnyfunc) for every function in an instance.

    • We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs.

    Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with ref.func. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks.

    This PR implements two basic ideas:

    • The anyfunc array can be lazily initialized as long as we retain the information needed to do so. A zero in the func-ptr part of the tuple means "uninitialized"; a null-check and slowpath does the initialization whenever we take a pointer to an anyfunc.

    • A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization.

    The use of all-zeroes to mean "uninitialized" means that we can use fast memory clearing techniques, like madvise(DONTNEED) on Linux or just freshly-mmap'd anonymous memory, to get to the initial state without a lot of memory writes.

    Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading.
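
    A sketch of that tagging scheme (illustrative only, not the exact code Cranelift emits):

    // Table elements are pointer-sized words; all zeroes means "never initialized".
    const FUNCREF_INIT_BIT: usize = 1;

    // On table store: mark the element as initialized, even if the funcref is null.
    fn tag_for_store(anyfunc_ptr: usize) -> usize {
        anyfunc_ptr | FUNCREF_INIT_BIT
    }

    // On table load: zero routes to the lazy-init slowpath; otherwise mask the bit
    // off to recover the real anyfunc pointer (which may be an explicitly stored null).
    fn load(elem: usize) -> Option<usize> {
        if elem == 0 {
            None // lazy-init slowpath
        } else {
            Some(elem & !FUNCREF_INIT_BIT)
        }
    }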

    Performance effect on instantiation in the on-demand allocator (pooling allocator effect should be similar as the table-init path is the same):

    sequential/default/spidermonkey.wasm
                            time:   [71.886 us 72.012 us 72.133 us]
    
    sequential/default/spidermonkey.wasm
                            time:   [22.243 us 22.256 us 22.270 us]
                            change: [-69.117% -69.060% -69.000%] (p = 0.00 < 0.05)
                            Performance has improved.
    

    So, 72µs to 22µs, or a 69% reduction.

    wasmtime:api cranelift cranelift:area:machinst cranelift:wasm cranelift:area:x64 
    opened by cfallin 28
  • Debug a wasm application with reasonable amount of RAM

    I'm trying to run a large (~15 MB) wasm application that crashes on ud2. Is there any way to determine which function the crash address corresponds to?

    opened by whitequark 28
  • Support records, variants, enums, unions, and flags in the component model

    I'm splitting this issue out of https://github.com/bytecodealliance/wasmtime/issues/4185 to write up some thoughts on how this can be done. Specifically, today the current Wasmtime support for the component model has mappings from many component model types to native Rust types, but not all of them. For example, integers, strings, lists, tuples, etc. are all mapped directly to Rust types. Basically, if a component model type's Rust equivalent is in the Rust standard library, that's already implemented. What that leaves to implement, however, is Rust-defined mappings for component model types that are "structural", like records.

    This issue is intended to document the current thinking of how we're going to expose this. The general idea is that we'll create a proc-macro crate, probably named something like wasmtime-component-macro, which is an internal dependency of the wasmtime crate. The various macros would then get reexported at the wasmtime::component::* namespace.

    Currently the bindings for host types are navigated through three traits: ComponentValue, Lift, and Lower. We'll want a custom derive for all three of these traits. Deriving Lift and Lower require a ComponentValue derive as well, but users should be able to pick one of Lift and Lower without the other one.

    record

    Records in the component model correspond to structs in Rust. The rough shape of this will be:

    use wasmtime::component::{ComponentValue, Lift, Lower};
    
    #[derive(ComponentValue, Lift, Lower)]
    #[component(record)]
    struct Foo {
        #[component(name = "foo-bar-baz")]
        a: i32,
        b: u32,
    }
    

    For now, to typecheck correctly, the record type must list its fields in the same order as the fields in the Rust code. Field reordering may be implemented at a later date, but for now we'll do strict matching. Fields must have both matching names and matching types.

    The #[component(record)] here may seem redundant but it's somewhat required below for variants/enums.

    The #[component(name = "...")] is intended to rename the field from the component model's perspective. The type-checking will test against the name specified.

    Using this derive on a tuple or empty struct will result in a compile-time error.

    variant

    Variants roughly correspond to Rust enums:

    use wasmtime::component::{ComponentValue, Lift, Lower};
    
    #[derive(ComponentValue, Lift, Lower)]
    #[component(variant)]
    enum Foo {
        #[component(name = "foo-bar-baz")]
        A(u32),
        B,
    }
    

    Typechecking, as with records, checks cases in order, and all cases must match in both name and payload. A missing payload in Rust is automatically interpreted as the unit payload in the component model.

    Variants with named fields (B { bar: u32 }) will be disallowed. Variants with multiple payloads (B(u32, u32)) will also be disallowed.

    Note that #[component(variant)] here distinguishes it from...

    enum

    use wasmtime::component::{ComponentValue, Lift, Lower};
    
    #[derive(ComponentValue, Lift, Lower)]
    #[component(enum)]
    enum Foo {
        #[component(name = "foo-bar-baz")]
        A,
        B,
    }
    

    Typechecking is similar to variants where the number/names of cases must all match.

    Variants with any payload are disallowed in this derive mode.

    union

    This will, perhaps surprisingly, still map to an enum in Rust since this is still a tagged union, not a literal C union:

    use wasmtime::component::{ComponentValue, Lift, Lower};
    
    #[derive(ComponentValue, Lift, Lower)]
    #[component(union)]
    enum Foo {
        A(u32),
        B(f32),
    }
    

    The number of cases and the types of each case must match a union definition to correctly typecheck. Union cases don't have names so renaming here isn't needed.

    A payload on each enum case in Rust is required, and as with variant it must be a tuple-variant with exactly one element. All other forms of payloads are disallowed. Note that the case names in Rust are purely informative; they don't affect the ABI or type-checking.

    flags

    These will be a bit "funkier" than the above since there's not something obvious to attach a #[derive] to:

    wasmtime::component::flags! {
        #[derive(Lift, Lower)]
        flags Foo {
            #[component(name = "...")]
            const A;
            const B;
            const C;
        }
    }
    

    The general idea here is to roughly take inspiration from the bitflags crate in terms of what the generated code does. Ideally this should have a convenient Debug implementation along with various constants to OR together and such in Rust. The exact syntax here is up for debate; this is just a strawman.

    Implementation Details

    One caveat is that the ComponentValue/Lift/Lower traits mention internal types in the wasmtime crate which aren't intended to be part of the public API. To solve this the macro will reference items in a path such as:

    wasmtime::component::__internal::the_name
    

    The __internal module will be #[doc(hidden)] and will only exist to reexport dependencies needed by the proc-macro. This may end up being a blanket pub use wasmtime_environ or reexports of individual items, whatever works best.
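
    A minimal sketch of what that hidden module could look like (the exact reexports here are assumptions for illustration, not a final list):

    // Inside the `wasmtime` crate's `component` module: the proc-macro emits
    // paths like `wasmtime::component::__internal::the_name`, so internal
    // types never have to appear in the documented public API.
    #[doc(hidden)]
    pub mod __internal {
        // Hypothetical reexport; the real set is whatever the derives need.
        pub use wasmtime_environ;
    }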

    The actual generated trait impls will probably look very similar to the implementations that already exist for tuples and Result<T, E> in typed.rs.

    Alternatives

    One alternative to the above is to have #[derive(ComponentRecord)] instead of #[derive(ComponentValue)] #[component(record)], or something like that (sketched below). While historically some discussions have leaned in this direction with the introduction of the Lift and Lower traits, I personally feel that the balance now tips slightly the other way: it would be nice to keep derive targeted at the specific traits, with configuration for the derive happening afterwards.
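
    For concreteness, that alternative shape would look roughly like this (purely illustrative; ComponentRecord is not a trait that exists today):

    #[derive(ComponentRecord)]
    struct Foo {
        a: i32,
        b: u32,
    }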

    wasm-proposal:component-model 
    opened by alexcrichton 27
  • Small wasm modules taking excessive amounts of time to compile

    Small wasm modules taking excessive amounts of time to compile

    In reviewing some fuzz bugs we've got a good number of test cases that end up timing out unfortunately. I believe that cranelift has a number of known issues about the speed of its compilation, particularly around register allocation. I figure it'd be good to collect a few concrete wasm files (discovered from fuzzing) which take an abnormally long amount of time to compile compared to the size of the input.

    It's worth noting that the timeout on the fuzzers is relatively high, I think something like 30 or 60 seconds. When fuzzing, though, binaries can be up to 50x slower, which means our time budget for passing the fuzzers is pretty small, generally less than 3 seconds I think (ish). It's also worth noting that fuzzers are compiled with debug assertions enabled, which enables, well, debug assertions, but also the cranelift verifier pass. I've seen the verifier pass be quite expensive on some of these modules below, but in general the modules still take an abnormally long time to compile even without the verifier pass.

    For the files below I'm testing with:

    $ cargo build --release && time ./target/release/wasmtime --disable-cache ./foo2.wasm 
    

    Most of them fail to instantiate or run, but it's the compilation that largely matters here, all of which happens as part of time. Which is to say, you can generally ignore the errors from the CLI.

    • file1.wasm[1].gz - takes 1s locally. Profiling shows lots of time in the register allocator.
    • file2.wasm[1].gz - this takes 300ms locally, but with the verifier and debug assertions enabled takes about 1s. The time looks to be largely in the verifier/register allocator like before.
    • file3.wasm[1].gz - same as previous
    • file4.wasm[1].gz - same as previous

    I'm assuming that there's generally not a huge amount we can do about this. When cranelift is looking to benchmark new register allocator implementations, however, we can perhaps use the files here as test beds to see how things are improving?

    In any case I figure it's good to start tracking files as we come across them if we can.

    cranelift cranelift:goal:compile-time performance 
    opened by alexcrichton 27
  • Draft: I128 support (partial) on x64.

    Draft: I128 support (partial) on x64.

    This PR generalizes all of the MachInst framework to reason about SSA Values as being located in multiple registers (one, two or four, currently, in an efficient packed form). This is necessary in order to handle CLIF with values wider than the machine width (I128 on 64/32-bit machines and I64 on 32-bit machines), unless we legalize it beforehand.

    It also adds support for some basic 128-bit ALU ops to the x64 backend, lowering these directly to open-coded instruction sequences (add/adc, sub/sbb, etc.).
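
    As a rough illustration of what "open-coded" means for the add/adc case, the decomposition of a 128-bit add into 64-bit halves looks like this (plain Rust to show the idea, not the actual Cranelift lowering code):

    // Add the low halves first, then add the high halves plus the carry out
    // of the low add; this is exactly the add/adc pair on x64.
    fn add128(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
        let (a_lo, a_hi) = a;
        let (b_lo, b_hi) = b;
        let (lo, carry) = a_lo.overflowing_add(b_lo);
        let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u64);
        (lo, hi)
    }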

    @julian-seward1 and @bnjbvr: this is the approach we had discussed a long time ago (in January, I think!). It follows from the "every backend accepts the same IR" philosophy, with the idea that maybe we will get to the point where legalization is largely not necessary.

    However: I must say that I'm not super-happy with the level of complexity this has added to the framework. The fact that the work we do in the x64 backend to support this will have to be repeated on aarch64 is kind of unfortunate; and this all feels somewhat silly given that we still have the legalization framework's narrowing support, and could use that instead. Philosophically, I think that legalization is actually the right approach here: we should be able to factor out "general machine-independent algorithm for 128-bit multiply with 64-bit pieces" from the specific machine backends.

    So I'm inclined not to go in this direction, but (i) want to see what thoughts anyone might have, and (ii) save this for the record, in case we wire up the old legalizations for now but reconsider or need multi-reg values in the future.

    cranelift cranelift:area:machinst cranelift:area:aarch64 cranelift:area:x64 
    opened by cfallin 26
  • WIP: egraph-based midend: draw the rest of the owl (productionized).

    WIP: egraph-based midend: draw the rest of the owl (productionized).

    This PR is a draft of an updated version of the egraph patch (and thus supersedes #4249) with the two parts already merged (multi-etors and the egraph crate proper) removed; it includes the Cranelift integration, the egraph build (CLIF to egraph) and elaboration (egraph to CLIF) algorithms, and rule application engine, as well as a set of rewrite rules that replaces the existing mid-end optimizations.

    It still needs a bit more productionizing:

    • removal of recursion in elaboration;
    • removal of recursion in rule application (This one is trickier! Immediate rule application on the sub-nodes created from constructors means more than one ISLE invocation can be on the stack, in a reentrant way. My thought is to use a sort of workqueue to "unstack" it; see the sketch after this list.);
    • generalization of the several ad-hoc egraph analyses (loop depth, etc) into a framework.
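
    A minimal, generic sketch of the "unstack via workqueue" idea from the second bullet above (the names and types here are made up for illustration; they are not the actual egraph or ISLE-generated APIs):

    // Instead of applying rules to newly created sub-nodes immediately
    // (which re-enters ISLE and grows the call stack), push them onto an
    // explicit worklist and drain it in a loop.
    struct NodeId(u32);

    fn apply_rules_iteratively(
        roots: Vec<NodeId>,
        mut apply_once: impl FnMut(NodeId) -> Vec<NodeId>,
    ) {
        let mut worklist = roots;
        while let Some(node) = worklist.pop() {
            // One non-reentrant round of rewrites; any sub-nodes it creates
            // go back on the worklist rather than being processed recursively.
            worklist.extend(apply_once(node));
        }
    }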

    The purpose of this draft PR is to be a place to do this work on a rebased and up-to-date basis. (Lots happened since the original egraph work branched off in May, including incremental compilation and a good number of smaller changes.)

    While patch-wrangling this week, I tried pulling this apart into smaller pieces, but the remaining bits are pretty cyclically entangled, and/or some of the intermediate points that might make sense (e.g. egraph build and elaboration without rule application) require re-synthesizing some scaffolding that would then disappear in the final state, so that seems a bit counterproductive. Once we have a polished state I can try pulling it apart into separate logical commits at least.

    cranelift cranelift:meta cranelift:area:aarch64 cranelift:area:x64 isle 
    opened by cfallin 1
  • wasi-parallel: implement CPU parallelism

    wasi-parallel: implement CPU parallelism

    This change implements wasi-parallel in Wasmtime. It addresses only CPU parallelism (not GPU), since the GPU side would introduce additional complexity to an already difficult review (if you're interested in the GPU progress, talk to @egalli). Each commit implements a separate piece of the puzzle. Though most of the changed lines are not the core implementation (tests, LICENSE, etc.), due to the size of the PR it may be convenient to review this commit by commit.

    I see several issues highlighted by this change:

    • lacking toolchain support: in the absence of any other feasible path, the WebAssembly bytes of the kernel code to be executed in parallel are embedded in the host WebAssembly module itself. This is problematic for many reasons, not least of which is the ability for the host WebAssembly module to tamper with the kernel (on the flip side: an intra-module JIT compiler feature!). But that is not the end of it: toolchain support is also lacking simply to produce modules that use atomics and shared memory. For C programs: https://github.com/WebAssembly/wasi-libc/issues/326. For Rust programs: https://github.com/rust-lang/rust/issues/102157. Due to all this, most tests and examples in this crate are meticulously hand-crafted WAT files — once toolchain support improves these difficult-to-maintain files should be replaced.
    • uncomfortable Wasmtime/Wiggle APIs: this implementation of wasi-parallel works by creating new instances of the kernel in each thread of a thread pool — the host module exports a shared memory and the kernel imports it. Getting access to the shared memory from within a wasi-parallel invocation is difficult with Wasmtime/Wiggle: the best way I could think of to resolve this is to teach Wiggle to skip a WITX function and then manually implement add_to_linker for that function (see the sketch after this list). This is tedious and error-prone, so I would be excited to discuss better solutions. Note that an almost identical version of this problem will arise when I try to implement wasi-threads.
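
    In the most generic terms, that manual registration looks something like the following sketch (the module name, function name, and signature here are placeholders, not the actual wasi-parallel interface):

    use wasmtime::{Caller, Engine, Linker};

    fn register_host_func() -> anyhow::Result<()> {
        let engine = Engine::default();
        let mut linker: Linker<()> = Linker::new(&engine);
        // Skip the generated binding for this one function and register it by
        // hand, so the host callback can reach state the generated glue can't
        // express (e.g. the exported shared memory).
        linker.func_wrap(
            "wasi_parallel",   // placeholder module name
            "parallel_exec",   // placeholder function name
            |_caller: Caller<'_, ()>, x: i32| -> i32 { x },
        )?;
        Ok(())
    }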

    Why now? The rationale behind merging this code behind the wasi-parallel feature flag is to enable users to try this out locally and to iterate on this in-tree (e.g., toolchain, GPU parts).

    opened by abrown 0
  • Implement x64 vector const with aligned loads

    Implement x64 vector const with aligned loads

    One other thought that came back to me while looking at this was that I really wanted to implement vconst (and all other constant accesses for vectors) with the aligned version (MOVDQA, MOVAPS, MOVAPD) instead of the unaligned version we currently use. At the time, it was impossible to ensure that the vector constants were actually aligned so these aligned loads would have trapped--that's why I abandoned that thought, IIRC. But now that may be more possible (?). Now, I'm not sure if this will result in any great latency improvement on most CPUs we use today, but I did wonder if it might improve the cache line story. Just a thought...

    Originally posted by @abrown in https://github.com/bytecodealliance/wasmtime/issues/2399#issuecomment-1235016953

    enhancement cranelift:goal:optimize-speed cranelift:area:x64 
    opened by jameysharp 0
  • fix: check filetype in `path_open`

    fix: check filetype in `path_open`

    Using the directory open flag to determine whether the entity being opened is a directory or a file is undefined behavior, since that means that the only way to acquire a usable directory is to specify the flag.

    Instead of relying on the open flag value to determine the filetype, issue get_path_filestat first to determine it, and then perform the appropriate operation depending on the value.

    ~Note that this also slightly changes the behavior, where create open flag is not dropped anymore, but rather passed through to the open_dir call.~

    @pchickey seems to be the last person making any significant changes in this method, so assigning him for review

    wasi 
    opened by rvolosatovs 2
  • Port branches to ISLE (AArch64)

    Port branches to ISLE (AArch64)

    Ported the existing implementations of the following opcodes for AArch64 to ISLE:

    • Brz
    • Brnz
    • Brif
    • Brff
    • BrIcmp
    • Jump
    • BrTable

    Copyright (c) 2022 Arm Limited

    cranelift cranelift:area:machinst cranelift:area:aarch64 
    opened by dheaton-arm 1
  • ISLE: Resolve overlap in prelude.isle and x64/inst.isle

    ISLE: Resolve overlap in prelude.isle and x64/inst.isle

    Resolve overlap in the ISLE prelude and the x64 inst module by introducing new types that allow better sharing of extractor results, or falling back on priorities.

    This PR makes the following changes in overlap counts for the different backends:

    | branch | x64 | aarch64 | s390x |
    | ------ | --- | ------- | ----- |
    | main   | 168 | 214     | 446   |
    | this   | 138 | 212     | 440   |

    cranelift cranelift:area:machinst cranelift:area:x64 isle 
    opened by elliottt 1
Releases(dev)
Owner
Bytecode Alliance
A (very experimental) WebAssembly backend for Cranelift.

cranelift_codegen_wasm Experimental code generation for WebAssembly from Cranelift IR. note: not ready for usage yet Setup Contains an item called Was

Teymour Aldridge 5 Aug 17, 2022
A standalone Forth interpreter/compiler for WebAssembly.

ForSM A standalone Forth interpreter/compiler for WebAssembly. Bootstrapped from a Rust program, but the ultimate goal for it is to be self-hosting. A

Simon Gellis 5 Jun 15, 2022
🚀Wasmer is a fast and secure WebAssembly runtime that enables super lightweight containers to run anywhere

Wasmer is a fast and secure WebAssembly runtime that enables super lightweight containers to run anywhere: from Desktop to the Cloud, Edge and IoT devices.

Wasmer 13k Sep 23, 2022
WebAssembly to Lua translator, with runtime

This is a WIP (read: absolutely not ready for serious work) tool for translating WebAssembly into Lua. Support is specifically for LuaJIT, with the se

null 36 Sep 20, 2022
Lunatic is an Erlang-inspired runtime for WebAssembly

Lunatic is a universal runtime for fast, robust and scalable server-side applications. It's inspired by Erlang and can be used from any language that

Lunatic 3.3k Sep 19, 2022
A prototype WebAssembly linker using module linking.

WebAssembly Module Linker Please note: this is an experimental project. wasmlink is a prototype WebAssembly module linker that can link together a mod

Peter Huene 18 Aug 18, 2022
Zaplib is an open-source library for speeding up web applications using Rust and WebAssembly.

⚡ Zaplib Zaplib is an open-source library for speeding up web applications using Rust and WebAssembly. It lets you write high-performance code in Rust

Zaplib 1.2k Sep 26, 2022
A template for kick starting a Rust and WebAssembly project using wasm-pack.

A template for kick starting a Rust and WebAssembly project using wasm-pack.

Haoxi Tan 1 Feb 14, 2022
Client for integrating private analytics in fast and reliable libraries and apps using Rust and WebAssembly

TelemetryDeck Client Client for integrating private analytics in fast and reliable libraries and apps using Rust and WebAssembly The library provides

Konstantin 2 Apr 20, 2022
Lumen - A new compiler and runtime for BEAM languages

An alternative BEAM implementation, designed for WebAssembly

Lumen 3k Sep 19, 2022
A high-performance, secure, extensible, and OCI-compliant JavaScript runtime for WasmEdge.

Run JavaScript in WebAssembly Now supporting wasmedge socket for HTTP requests and Tensorflow in JavaScript programs! Prerequisites Install Rust and w

Second State 166 Sep 27, 2022
Wasm runtime written in Rust

Wasm runtime written in Rust

Teppei Fukuda 1 Oct 29, 2021
Sealed boxes implementation for Rust/WebAssembly.

Sealed boxes for Rust/WebAssembly This Rust crate provides libsodium sealed boxes for WebAssembly. Usage: // Recipient: create a new key pair let reci

Frank Denis 16 Aug 28, 2022
WebAssembly on Rust is a bright future in making application runs at the Edge or on the Serverless technologies.

WebAssembly Tour WebAssembly on Rust is a bright future in making application runs at the Edge or on the Serverless technologies. We spend a lot of ti

Thang Chung 117 Sep 5, 2022
WebAssembly modules that use Azure services

This is an experimental repository containing WebAssembly modules running on top of WAGI (WebAssembly Gateway Interface, which allows you to run WebAssembly WASI binaries as HTTP handlers) and using Azure services.

null 7 Apr 18, 2022
WebAssembly Service Porter

WebAssembly Service Porter.

henrylee2cn 11 Sep 12, 2022
WAGI: WebAssembly Gateway Interface

Write HTTP handlers in WebAssembly with a minimal amount of work

null 633 Sep 30, 2022
A console and web-based Gomoku written in Rust and WebAssembly

rust-gomoku A console and web-based Gomoku written in Rust and WebAssembly Getting started with cargo & npm Install required program, run # install

namkyu1999 2 Jan 4, 2022
WebAssembly development with Trunk & Vite.js

Trunk & Vite.js Demo Trunk is a WASM web application bundler for Rust, and Vite.js is next Generation Frontend Tooling. Ok, they are together now for

Libing Chen 6 Nov 24, 2021