Turn your webcam into a face detector with Rust, onnxruntime and actix_web!

InferCam ONNX

Turn your webcam into a face detector with Rust, onnxruntime and the lightweight ultraface network.

[InferCam example image]

Overview

  • Images are captured from the /dev/video0 interface using the libv4l-dev library on Linux with the rscam crate (a minimal capture sketch follows this list).
  • Captured frames are passed through a pre-trained network stored in the ONNX format, powered by tract, a no-frills, pure-Rust inference engine.
  • Post-processing (mainly non-maximum suppression) is done in native Rust.
  • Detected faces are drawn as bounding boxes on the frame, together with their confidences, using imageproc.
  • Streams of the raw video and of the face detection are served in the browser with the performant actix_web framework.
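
As a rough sketch of the capture step, grabbing MJPG frames with rscam could look like this (resolution and frame rate here are assumptions, not necessarily the project's settings):

// Minimal capture sketch with rscam; resolution/format are assumptions.
fn main() {
    let mut camera = rscam::new("/dev/video0").expect("failed to open /dev/video0");
    camera
        .start(&rscam::Config {
            interval: (1, 30), // 30 fps
            resolution: (640, 480),
            format: b"MJPG",
            ..Default::default()
        })
        .expect("failed to start capture");
    // Each captured frame dereferences to a byte slice (here: JPEG data).
    let frame = camera.capture().expect("failed to capture frame");
    println!("captured {} bytes", frame.len());
}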

Building & Running

  • Make sure that you have the libv4l-dev package installed on your system:
sudo apt update && sudo apt install -y libv4l-dev
  • Download a build of onnxruntime from Microsoft and install it on your system (e.g. by copying the .so files to `~/.local/lib`).

  • Download the pretrained ultraface networks to the root of the repository:

wget https://github.com/onnx/models/raw/master/vision/body_analysis/ultraface/models/version-RFB-320.onnx
wget https://github.com/onnx/models/raw/master/vision/body_analysis/ultraface/models/version-RFB-640.onnx
  • Build in release mode (on my system, there is roughly a factor-32 difference in fps between release and debug builds):
cargo build --release
  • Run the application:
# Without logging
./target/release/infercam_onnx

# With debug logging
RUST_LOG=debug ./target/release/infercam_onnx
  • You can see the face detection at http://127.0.0.1:8080/. The raw webcam stream is also available at http://127.0.0.1:8080/video_stream.
  • There are two command line arguments (a hypothetical parsing sketch follows this list):
    • --port XYZ binds the server to port XYZ.
    • --bindall publishes the routes on all network interfaces instead of only localhost.
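
A hypothetical sketch of how these two flags could be parsed with clap; the names mirror the flags above, but the actual implementation may differ:

use clap::Parser;

/// Hypothetical CLI definition; not the project's exact code.
#[derive(Parser)]
struct Args {
    /// Port to bind the HTTP server to.
    #[arg(long, default_value_t = 8080)]
    port: u16,
    /// Publish the routes on all network interfaces instead of only localhost.
    #[arg(long)]
    bindall: bool,
}

fn main() {
    let args = Args::parse();
    let host = if args.bindall { "0.0.0.0" } else { "127.0.0.1" };
    println!("serving on http://{}:{}/", host, args.port);
}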

Comments

I developed this project for fun and for self-education. Since I have not worked with Rust professionally, some things might not be completely idiomatic or optimally performant. Nevertheless, the application runs at around 8-9 fps on my private laptop with an i7-6600U and no dedicated GPU (when compiled with optimizations in release mode).

Initially, I considered using the onnxruntime crate, but it did not work out of the box, and when I checked on GitHub, the project seemed a lot less active than tract.

Not having a dedicated GPU on my private laptop, I did not go through the process of setting up GPU inference with onnxruntime, but it should not be much different. An accepted inefficiency in the current implementation is that the network output is cloned into vectors for sorting. With more time, I would look into sorting the output in place.

It also took a while to understand the exact meaning of the network output, since I could not find a paper or blog post explaining it at the level of detail I needed. In the end, I went through the Python demo code and reverse-engineered the meaning. I believe the output can be interpreted like this:

  • K: number of bounding box proposals.
  • result[0]: a 1xKx2 tensor of bounding box confidences. The confidence for a face being inside the bounding box is in the second column, i.e. at [:, :, 1].
  • result[1]: a 1xKx4 tensor of bounding box candidate border points.

Every candidate bounding box consists of the relative coordinates [x_top_left, y_top_left, x_bottom_right, y_bottom_right]. Multiplying them by the width and height of the original image yields the bounding box coordinates in the real frame.
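
Put together, decoding the raw output could be sketched like this (assuming the two output tensors have been flattened into f32 slices; the function and variable names are mine, not the project's):

/// Illustrative decoding of the ultraface outputs; names are hypothetical.
/// `confidences` is the flattened 1xKx2 tensor, `boxes` the flattened 1xKx4 tensor.
fn decode(
    confidences: &[f32],
    boxes: &[f32],
    img_w: f32,
    img_h: f32,
    threshold: f32,
) -> Vec<(f32, [f32; 4])> {
    let k = confidences.len() / 2;
    let mut detections = Vec::new();
    for i in 0..k {
        // Column 1 holds the face confidence, column 0 the background score.
        let conf = confidences[2 * i + 1];
        if conf < threshold {
            continue;
        }
        // Relative [x_tl, y_tl, x_br, y_br], scaled to pixel coordinates.
        let b = &boxes[4 * i..4 * i + 4];
        detections.push((conf, [b[0] * img_w, b[1] * img_h, b[2] * img_w, b[3] * img_h]));
    }
    detections
}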

Before this project, I had only used non-maximum suppression as a library function and had a rough idea of how it worked. Implementing it myself in Rust was fun :) All in all, it was a nice project for me and a valuable proof of concept that Rust is definitely a candidate language for writing inference applications on edge devices.
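
For illustration, a greedy IoU-based non-maximum suppression over the detections from the sketch above might look like this; it is not the project's exact implementation:

/// Greedy IoU-based NMS sketch; each detection is (confidence, [x_tl, y_tl, x_br, y_br]).
fn nms(mut dets: Vec<(f32, [f32; 4])>, iou_threshold: f32) -> Vec<(f32, [f32; 4])> {
    // Sort by confidence, highest first (assumes no NaN confidences).
    dets.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    let mut keep: Vec<(f32, [f32; 4])> = Vec::new();
    'outer: for d in dets {
        for k in &keep {
            if iou(&d.1, &k.1) > iou_threshold {
                continue 'outer; // Overlaps a stronger detection; suppress.
            }
        }
        keep.push(d);
    }
    keep
}

/// Intersection over union of two axis-aligned boxes.
fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let ix = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let iy = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = ix * iy;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    inter / (area_a + area_b - inter)
}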
