Wonnx is a GPU-accelerated ONNX inference runtime written 100% in Rust, ready for the web. It is based on [wgpu](https://github.com/gfx-rs/wgpu).
## Supported Platforms (enabled by wgpu)

| API    | Windows | Linux & Android | macOS & iOS |
| ------ | ------- | --------------- | ----------- |
| Vulkan | ✅      | ✅              |             |
| Metal  |         |                 | ✅          |
| DX12   | ✅      |                 |             |
| DX11   | ✅      |                 |             |
| GLES3  |         | ✅              |             |
## Getting started

- Install Rust
- Install Vulkan, Metal, or DX12 for the GPU API
- Ensure Git LFS is installed
- `git clone` this repo:

```bash
git clone https://github.com/webonnx/wonnx.git
git lfs install
```
### From the command line

Ensure Git LFS is initialized and has downloaded the model files (in `wonnx/examples/data/models`). Then, you're all set! You can run an example:

```bash
cargo run --example squeeze --release
```

Or you can try the CLI (see the README for more information):

```bash
cargo run --release -- info ./data/models/opt-squeeze.onnx
cargo run --release -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --labels ./data/models/squeeze-labels.txt --top 3
```
### From Python

```bash
pip install wonnx
```

And then:

```python
from wonnx import PySession

session = PySession.from_path("../data/models/single_relu.onnx")
inputs = {"x": [-1.0, 2.0]}
assert session.run(inputs) == {"y": [0.0, 2.0]}
```
To build the Python module for development:

```bash
cd wonnx-py
python3 -m venv .env
source .env/bin/activate
pip install maturin
maturin develop
```

Then run `python3` with the above Python code!
## Running a model from scratch

- To run an ONNX model, first simplify it with onnx-simplifier:

```bash
# pip install -U pip && pip install onnx-simplifier
python -m onnxsim mnist-8.onnx opt-mnist.onnx
```

- Then you can run it following the example in the examples folder:

```bash
cargo run --example mnist --release
```
In Rust, a session is created from the model file and then run; the `squeeze` example looks like this:

```rust
use std::collections::HashMap;
use wonnx::utils::InputTensor;

fn main() {
    // Load the image and feed it in as the "data" input
    let mut input_data = HashMap::new();
    let image = load_squeezenet_image(); // Load image
    input_data.insert("data".to_string(), InputTensor::F32(image.as_slice().unwrap()));

    // Create a session from the ONNX model file and run inference
    let session = pollster::block_on(wonnx::Session::from_path(
        "examples/data/models/opt-squeeze.onnx",
    ))
    .expect("session did not create");
    let result = pollster::block_on(session.run(input_data)).unwrap();

    // Rank class indices by descending score; class 22 should win for this image
    let result = &result["squeezenet0_flatten0_reshape0"];
    let mut probabilities = result.iter().enumerate().collect::<Vec<_>>();
    probabilities.sort_unstable_by(|a, b| b.1.partial_cmp(a.1).unwrap());
    assert_eq!(probabilities[0].0, 22);
}
```
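To turn the winning index into a human-readable class name, you could read the labels file used by the CLI example above. A minimal, hypothetical sketch (the helper name and the assumption of one label per line are ours, not part of the example):

```rust
use std::fs;

// Hypothetical helper: map ranked (index, score) pairs to class names,
// assuming the labels file from the CLI example above, one label per line.
fn print_top_classes(probabilities: &[(usize, &f32)], top: usize) {
    let labels = fs::read_to_string("examples/data/models/squeeze-labels.txt")
        .expect("labels file not found");
    let labels: Vec<&str> = labels.lines().collect();
    for (index, score) in probabilities.iter().take(top) {
        println!("{}: {}", labels[*index], score);
    }
}
```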
Examples are available in the `examples` folder.
## Tested models

- Squeezenet
- MNIST
## GPU selection

You may set the following environment variables to influence GPU selection by WGPU:

- `WGPU_ADAPTER_NAME` with a substring of the name of the adapter you want to use (e.g. `1080` will match `NVIDIA GeForce 1080ti`).
- `WGPU_BACKEND` with a comma-separated list of the backends you want to use (`vulkan`, `metal`, `dx12`, `dx11`, or `gl`).
- `WGPU_POWER_PREFERENCE` with the power preference to choose when a specific adapter name isn't specified (`high` or `low`).
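For example, to force the Vulkan backend and prefer the more powerful GPU when running one of the examples:

```bash
WGPU_BACKEND=vulkan WGPU_POWER_PREFERENCE=high cargo run --example squeeze --release
```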
## Contribution: On implementing a new Operator

Contributions are very much welcome, even without prior experience in DL, WGSL, or Rust. I hope this project can be a sandbox for all of us to learn more about those technologies beyond its initial scope.
To implement an operator, all you have to do is:

- Add a new matching pattern in `compiler.rs`.
- Retrieve its attribute values using the `get_attribute` function:

```rust
let alpha = get_attribute("alpha", Some(1.0), node);
// or without a default value
let alpha = get_attribute::<f32>("alpha", None, node);
```
- Add any variable you want to use in the WGSL shader using `context`.
- Write a new WGSL template in the `templates` folder (see the template sketch at the end of this section). Available types are in `structs.wgsl`, but you can also generate new ones within your templates.
- Respect the binding layout: bindings are numbered consecutively starting from 0, with inputs first and the output last. If the number of bindings goes above 4, increment the binding group. You can change the inputs within `sequencer.rs`.
- Write the logic.
There are default variables available in the template context:

- `{{ i_lens[0] }}`: the length of input 0. This also works for outputs (`{{ o_lens[0] }}`) and other inputs (`{{ i_lens[1] }}`).
- `{{ i_shape[0] }}`: the array of dimensions of input 0. To get the first dimension of the array, just use `{{ i_shape[0][0] }}`.
- `{{ i_chunks[0] }}`: the chunk sizes of each dimension of input 0. By default, each variable is represented as a long array of values, and to get to a specific value you have to move by chunks. Those chunk sizes are stored in this variable. To get the chunk size of the first dimension, use `{{ i_chunks[0][0] }}`.
- `{{ op_type }}`: the op type, as some op types (such as activations) share the same template.
- Test it using the utils functions and place it in the `tests` folder. A test can look like this:

```rust
use std::collections::HashMap;
use wonnx::utils::{graph, model, node, tensor};

#[test]
fn test_matmul_square_matrix() {
    // USER INPUT
    let n = 16;
    let mut input_data = HashMap::new();

    let data_a = ndarray::Array2::eye(n);
    let mut data_b = ndarray::Array2::<f32>::zeros((n, n));
    data_b[[0, 0]] = 0.2;
    data_b[[0, 1]] = 0.5;

    // Compute the expected result on the CPU
    let sum = data_a.dot(&data_b);

    input_data.insert("A".to_string(), data_a.as_slice().unwrap());
    input_data.insert("B".to_string(), data_b.as_slice().unwrap());

    // Build a single-node model computing C = MatMul(A, B)
    let n = n as i64;
    let model = model(graph(
        vec![tensor("A", &[n, n]), tensor("B", &[n, n])],
        vec![tensor("C", &[n, n])],
        vec![],
        vec![],
        vec![node(vec!["A", "B"], vec!["C"], "MatMul", "MatMul", vec![])],
    ));

    let session =
        pollster::block_on(wonnx::Session::from_model(model)).expect("Session did not create");
    let result = pollster::block_on(session.run(input_data)).unwrap();
    assert_eq!(result["C"].as_slice(), sum.as_slice().unwrap());
}
```
Check out the Tera documentation for other templating operations: https://tera.netlify.app/docs/
- If at any point you want to optimize across several nodes, you can do it within `sequencer.rs`.
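Finally, to make the template steps concrete, here is a minimal, hypothetical WGSL template sketch (not one of the shipped templates) that simply copies input 0 to the output. It assumes an `Array` type from `structs.wgsl` and current WGSL attribute syntax, and uses the `{{ i_lens[0] }}` context variable described above:

```wgsl
{%- include "structs.wgsl" -%}

// Binding layout: inputs first, output last, bindings numbered from 0
@group(0) @binding(0)
var<storage, read> input_0: Array;

@group(0) @binding(1)
var<storage, read_write> output_0: Array;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let gidx = global_id.x;
    // {{ i_lens[0] }} expands to the length of input 0 when the template is rendered
    if (gidx < {{ i_lens[0] }}u) {
        output_0.data[gidx] = input_0.data[gidx];
    }
}
```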