A demo server serving BERT through ONNX on GPU, written in Rust with <3

Overview

Demo BERT ONNX server written in Rust

This demo showcases onnxruntime-rs running BERT on a GPU with CUDA 11, served by actix-web, with input tokenized by the Hugging Face tokenizers library.

Requirements

  • Linux x86_64
  • NVIDIA GPU with CUDA 11 (CUDA 10 is untested)
  • Rust (obviously)
  • git lfs for the models

Installation

export ORT_USE_CUDA=1
git lfs install
cargo build --release

Run

cargo run --release

or

export LD_LIBRARY_PATH=path/to/onnxruntime-linux-x64-gpu-1.8.0/lib:${LD_LIBRARY_PATH}
./target/release/onnx-server

Call

curl http://localhost:8080/\?data=Hello+World
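
The same call can be made from Rust using only the standard library. The sketch below is illustrative and not part of this repo: the names encode_query_value, request_target and query_server are hypothetical, and it assumes the server answers plain HTTP/1.1 on the host and port you pass in. It encodes the text the way the curl example does (a space becomes +) and returns the raw response, headers included.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Encode a value for a URL query string using form-urlencoding:
/// letters, digits and `* - . _` pass through, a space becomes `+`,
/// and every other byte is percent-encoded.
fn encode_query_value(text: &str) -> String {
    let mut out = String::new();
    for byte in text.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'*' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build the request target, e.g. "/?data=Hello+World".
fn request_target(text: &str) -> String {
    format!("/?data={}", encode_query_value(text))
}

/// Send one GET request and return the raw HTTP response (headers and body).
/// Assumption: the server speaks plain HTTP/1.1 and replies with UTF-8 text.
fn query_server(host: &str, port: u16, text: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    let request = format!(
        "GET {} HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n\r\n",
        request_target(text),
        host,
        port
    );
    stream.write_all(request.as_bytes())?;
    // `Connection: close` makes the server close the socket when done,
    // so reading to end-of-stream yields the complete response.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the server running, query_server("localhost", 8080, "Hello World") sends the same request as the curl command above.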

Python alternative

To compare against a standard Python server using FastAPI, I've added the same server in src as python_alternative.py.

Install

pip install -r requirements.txt

Run

cd src
uvicorn python_alternative:app --reload --workers 1

Call

curl http://localhost:8000/\?data=Hello+World
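
Since the Python server is there for comparison, one way to compare the two is to time repeated requests against each and summarize the latencies. The sketch below is a minimal, hypothetical helper, not something shipped in this repo: time_calls and summarize are illustrative names, and the closure passed to time_calls is assumed to perform one request against whichever server is being measured.

```rust
use std::time::{Duration, Instant};

/// Latency summary over a batch of timed calls.
#[derive(Debug, PartialEq)]
struct Summary {
    mean: Duration,
    median: Duration,
    max: Duration,
}

/// Run `call` `runs` times and record how long each invocation takes.
fn time_calls<F: FnMut()>(runs: usize, mut call: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            call();
            start.elapsed()
        })
        .collect()
}

/// Summarize recorded latencies; returns None for an empty slice.
/// For an even number of samples the upper of the two middle values
/// is reported as the median.
fn summarize(samples: &[Duration]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let total: Duration = sorted.iter().sum();
    Some(Summary {
        mean: total / sorted.len() as u32,
        median: sorted[sorted.len() / 2],
        max: sorted[sorted.len() - 1],
    })
}
```

Run the same closure shape against port 8080 and port 8000 with the same input text and number of runs, and discard the first few calls so that model warm-up does not skew the numbers.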

Training and converting to ONNX

The training pipeline is in another repo: https://github.com/haixuanTao/bert-onnx-rs-pipeline

Owner
Xavier Tao