This is a really interesting project! Really nice work :)
I have some small suggestions to improve the Rust performance, and am willing to contribute by creating a PR if you like. I thought it better to post an issue for discussion before opening an unsolicited PR :)
Essentially, it's the following things:
- Full LTO is not being applied to the release build that we're measuring in the mini benchmark. We have `thin` LTO enabled, but it's quite a bit faster with full `fat` LTO and `codegen-units=1`, possibly due to the code residing in two different modules. This bumps the performance of the WASM build a little too.
- For the `test-rust` native target, there's a significant performance improvement if we allow LLVM to target more recent hardware; by default it just targets the "generic" baseline x64 instruction set, i.e. SSE/SSE2, but no AVX / AVX2 etc.
  - This may need some discussion -- it's debatable whether we should just set the `target-cpu` flag to `native` and compile for the exact machine, or whether it's more representative to target something more generic (but supporting AVX2) like the `x86-64-v3` target.
  - The same applies to the ARM builds, although I don't know offhand what a good representative target would be there.
- Benchmarks for the normal (JIT) dotnet are more representative if we run the benchmark a few times in a loop.
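
For reference, here is a rough sketch of the Rust-side config changes I have in mind. It assumes the release profile lives in the top-level `Cargo.toml` and that adding a `.cargo/config.toml` is acceptable for the CPU flag. The choice of `x86-64-v3` is just one of the options discussed above, not a settled decision.

```toml
# Cargo.toml (assumed location of the release profile)
[profile.release]
lto = "fat"        # full LTO instead of "thin"
codegen-units = 1  # one codegen unit: slower builds, more room for cross-function optimization
```

```toml
# .cargo/config.toml (hypothetical file, shown only for illustration)
[target.'cfg(target_arch = "x86_64")']
rustflags = ["-C", "target-cpu=x86-64-v3"]  # or "target-cpu=native" to compile for the exact machine
```

The `cfg` guard is there so the flag only applies to x86-64 builds and is not passed to the WASM or ARM targets.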
On the main branch, I'm getting results that look like this currently:
- test-rust: 775 ms
- test-wasm: 1060 ms
- test-dotnet: 900 ms
With the above applied, and after 10 iterations or so for the dotnet JIT to settle:
- test-rust: 542 ms
- test-wasm: 925 ms
- test-dotnet: 720 ms
If this looks like a useful contribution, let me know and I'll open a little PR to add the required configs.