(MERGED) Rust bindings for TVM runtime



The RFC is closed and this has been merge into TVM.

TVM Runtime Frontend Support

This crate provides an idiomatic Rust API for TVM runtime frontend as part of the ongoing RFC. Currently this requires Nightly Rust.

Checkout the docs.

What Does This Crate Offer?

Here is a major workflow

  1. Train your Deep Learning model using any major framework such as PyTorch, Apache MXNet or TensorFlow
  2. Use TVM to build optimized model artifacts on a supported context such as CPU, GPU, OpenCL, Vulkan, VPI, ROCM, etc.
  3. Deploy your models using Rust ❤️

Example: Deploy Image Classification from Pretrained Resnet18 on ImageNet1k

Please checkout examples/resnet for the complete end-to-end example.

Here's a Python snippet for downloading and building a pretrained Resnet18 via MXNet and TVM

block = get_model('resnet18_v1', pretrained=True)
sym, params = nnvm.frontend.from_mxnet(block)
# add the softmax layer for prediction
net = nnvm.sym.softmax(sym)
# compile the model
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(
        net, target, shape={"data": data_shape}, params=params)
# same the model artifacts
lib.save(os.path.join(target_dir, "deploy_lib.o"))
cc.create_shared(os.path.join(target_dir, "deploy_lib.so"),
                [os.path.join(target_dir, "deploy_lib.o")])

with open(os.path.join(target_dir, "deploy_graph.json"), "w") as fo:
with open(os.path.join(target_dir,"deploy_param.params"), "wb") as fo:

Now, we need to input the artifacts to create and run the Graph Runtime to detect our input cat image


as demostrated in the following Rust snippet

let graph = fs::read_to_string("deploy_graph.json")?;
// load module
let lib = Module::load(&Path::new("deploy_lib.so"))?;
// get the global TVM graph runtime function
let runtime_create_fn = Function::get_function("tvm.graph_runtime.create", true).unwrap();

let runtime_create_fn_ret = call_packed!(
// get graph runtime module
let graph_runtime_module = runtime_create_fn_ret.to_module();
// get the registered `load_params` from runtime module
let load_param_fn = graph_runtime_module
    .get_function("load_params", false)
// parse parameters and convert to TVMByteArray
let params: Vec<u8> = fs::read("deploy_param.params")?;
let barr = TVMByteArray::from(&params);
// load the parameters
call_packed!(load_param_fn, &barr)?;
// get the set_input function
let set_input_fn = graph_runtime_module
    .get_function("set_input", false)

call_packed!(set_input_fn, "data", &input)?;
// get `run` function from runtime module
let run_fn = graph_runtime_module.get_function("run", false).unwrap();
// execute the run function. Note that it has no argument.
// prepare to get the output
let output_shape = &mut [1, 1000];
let output = empty(output_shape, TVMContext::cpu(0), TVMType::from("float"));
// get the `get_output` function from runtime module
let get_output_fn = graph_runtime_module
    .get_function("get_output", false)
// execute the get output function
call_packed!(get_output_fn, &0, &output)?;
// flatten the output as Vec<f32>
let output = output.to_vec::<f32>()?;


Please follow TVM installations, export TVM_HOME=/path/to/tvm and add libtvm_runtime to your LD_LIBRARY_PATH.

Note: To run the end-to-end examples and tests, tvm, nnvm and topi need to be added to your PYTHONPATH or it's automatic via an Anaconda environment when install individually.

Supported TVM Functionalities

Use TVM to Generate Shared Library

One can use the following Python snippet to generate add_gpu.so which add two vectors on GPU.

import os
import tvm
from tvm.contrib import cc

def test_add(target_dir):
    if not tvm.module.enabled("cuda"):
        print(f"skip {__file__} because cuda is not enabled...")
    n = tvm.var("n")
    A = tvm.placeholder((n,), name='A')
    B = tvm.placeholder((n,), name='B')
    C = tvm.compute(A.shape, lambda i: A[i] + B[i], name="C")
    s = tvm.create_schedule(C.op)
    bx, tx = s[C].split(C.op.axis[0], factor=64)
    s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
    s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
    fadd_cuda = tvm.build(s, [A, B, C], "cuda", target_host="llvm", name="myadd")

    fadd_cuda.save(os.path.join(target_dir, "add_gpu.o"))
    fadd_cuda.imported_modules[0].save(os.path.join(target_dir, "add_gpu.ptx"))
    cc.create_shared(os.path.join(target_dir, "add_gpu.so"),
            [os.path.join(target_dir, "add_gpu.o")])

if __name__ == "__main__":
    import sys
    if len(sys.argv) != 2:

Run the Generated Shared Library

The following code snippet demonstrates how to load and test the generated shared library (add_gpu.so) in Rust.

extern crate tvm_frontend as tvm;

use tvm::*;

fn main() {
    let shape = &mut [2];
    let mut data = vec![3f32, 4.0];
    let mut arr = empty(shape, TVMContext::gpu(0), TVMType::from("float"));
    let mut ret = empty(shape, TVMContext::gpu(0), TVMType::from("float"));
    let path = Path::new("add_gpu.so");
    let ptx = Path::new("add_gpu.ptx");
    let mut fadd = Module::load(path).unwrap();
    let fadd_dep = Module::load(ptx).unwrap();
    function::Builder::from(&mut fadd)
        .set_output(&mut ret)

    assert_eq!(ret.to_vec::<f32>().unwrap(), vec![6f32, 8.0]);

Note: it is required to instruct the rustc to link to the generated add_gpu.so in runtime, for example by cargo:rustc-link-search=native=add_gpu.

See the tests and examples custom build.rs for more details.

Convert and Register a Rust Function as a TVM Packed Function

One can use register_global_func! macro to convert and register a Rust function of type fn(&[TVMArgValue]) -> Result<TVMRetValue> to a global TVM packed function as follows

extern crate tvm_frontend as tvm;

use tvm::*;

fn main() {
    register_global_func! {
        fn sum(args: &[TVMArgValue]) -> Result<TVMRetValue> {
            let mut ret = 0f32;
            let shape = &mut [2];
            for arg in args.iter() {
                let e = empty(shape, TVMContext::cpu(0), TVMType::from("float"));
                let arr = arg.to_ndarray().copy_to_ndarray(e).unwrap();
                let rnd: ArrayD<f32> = ArrayD::try_from(&arr).unwrap();
                ret += rnd.scalar_sum();
            let ret_val = TVMRetValue::from(&ret);

    let shape = &mut [2];
    let mut data = vec![3f32, 4.0];
    let mut arr = empty(shape, TVMContext::cpu(0), TVMType::from("float"));
    let mut registered = function::Builder::default();
        .get_function("sum", true)

    assert_eq!(registered.invoke().unwrap().to_float(), 14f64);
You might also like...
sparse linear algebra library for rust

sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro

PyO3-based Rust binding of NumPy C-API

rust-numpy Rust bindings for the NumPy C-API API documentation Latest release (possibly broken) Current Master Requirements Rust = 1.41.1 Basically,

DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

Datafuse Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture Datafuse is a Real-Time Data Processing & Analytics DBMS wit

ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

Fill Apache Arrow record batches from an ODBC data source in Rust.

arrow-odbc Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the dat

A new arguably faster implementation of Apache Spark from scratch in Rust

vega Previously known as native_spark. Documentation A new, arguably faster, implementation of Apache Spark from scratch in Rust. WIP Framework tested

Work out how to read Parquet files in a browser using web assembly (via the Rust toolchain)

wasm-pack-template A template for kick starting a Rust and WebAssembly project using wasm-pack. Tutorial | Chat Built with 🦀 🕸 by The Rust and WebAs

bspipe A Rust implementation of Bidirectional Secure Pipe

bspipe A Rust implementation of Bidirectional Secure Pipe

Ehsan M. Kermani
Science and software enthusiast
Ehsan M. Kermani
Orkhon: ML Inference Framework and Server Runtime

Orkhon: ML Inference Framework and Server Runtime Latest Release License Build Status Downloads Gitter What is it? Orkhon is Rust framework for Machin

Theo M. Bulut 129 Dec 21, 2022
Yet Another Technical Analysis library [for Rust]

YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t

Dmitry 197 Dec 29, 2022
Dataframe structure and operations in Rust

Utah Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface. Note: This crate w

Suchin 139 Sep 26, 2022
Rust DataFrame library

Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro

Ritchie Vink 11.9k Jan 8, 2023
A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Wakahisa 287 Nov 11, 2022
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

null 22 Nov 3, 2022
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023
Apache TinkerPop from Rust via Rucaja (JNI)

Apache TinkerPop from Rust An example showing how to call Apache TinkerPop from Rust via Rucaja (JNI). This repository contains two directories: java

null 8 Sep 27, 2022