Orkhon: ML Inference Framework and Server Runtime


What is it?

Orkhon is a Rust framework for Machine Learning that runs inference and prediction code written in Python, serves frozen models, and processes unseen data. It is mainly focused on serving models and processing unseen data in a performant manner. Instead of using Python directly and facing the scalability problems this causes for servers, Orkhon tries to solve them with a built-in async API.

Main features

  • Sync & Async API for models.
  • Easily embeddable engine for well-known Rust web frameworks.
  • API contract for interacting with Python code.
  • High processing throughput:
    • ~4.8361 GiB/s prediction throughput
    • 3,000 concurrent requests take ~4ms on average
  • Python module caching

Installation

You can include Orkhon in your project with:

[dependencies]
orkhon = "0.2"

Dependencies

You will need:

  • If you use the pymodel feature, Python development dependencies must be installed and a working Python runtime must be available (see the Cargo.toml sketch after this list).
  • If you want TensorFlow inference, installing TensorFlow as a library for linking is required.
  • The ONNX interface doesn't need extra dependencies from the system side.
  • Point your PYTHONHOME environment variable to your Python installation.
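
As a minimal sketch of how these backends map to Cargo features: the pymodel feature is mentioned above, and the ONNX example below requires the onnxmodel feature. The exact feature names and their valid combinations are assumptions here and should be confirmed against the crate manifest:

[dependencies]
# Enable the ONNX backend (feature name as referenced in the ONNX example below).
orkhon = { version = "0.2", features = ["onnxmodel"] }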

Python API contract

For the Python API contract, take a look at the Project Documentation.

Examples

Request a TensorFlow prediction asynchronously

use orkhon::prelude::*;
use orkhon::tcore::prelude::*;
use orkhon::ttensor::prelude::*;
use rand::*;
use std::path::PathBuf;

let o = Orkhon::new()
    .config(
        OrkhonConfig::new()
            // Declare the input shape expected by the frozen model.
            .with_input_fact_shape(InferenceFact::dt_shape(f32::datum_type(), tvec![10, 100])),
    )
    // Register the frozen TensorFlow model under a lookup name.
    .tensorflow(
        "model_which_will_be_tested",
        PathBuf::from("tests/protobuf/manual_input_infer/my_model.pb"),
    )
    .shareable();

// Build a random 10x100 input tensor.
let mut rng = thread_rng();
let vals: Vec<_> = (0..1000).map(|_| rng.gen::<f32>()).collect();
let input = tract_ndarray::arr1(&vals).into_shape((10, 100)).unwrap();

let o = o.get();
let handle = async move {
    // Ask the named model for an asynchronous prediction.
    let processor = o.tensorflow_request_async(
        "model_which_will_be_tested",
        ORequest::with_body(TFRequest::new().body(input.into())),
    );
    processor.await
};
// Drive the future to completion and unwrap the response.
let resp = block_on(handle).unwrap();
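
The features list also mentions a sync API for models. As a minimal sketch, given a model registered and an input tensor built as above, the blocking counterpart might look like the following; a tensorflow_request method mirroring the synchronous onnx_request from the next example is an assumption here, so confirm it in the crate docs:

// Sketch only: `tensorflow_request` as the blocking counterpart of
// `tensorflow_request_async` is assumed, not confirmed by this README.
let resp = o.get()
    .tensorflow_request(
        "model_which_will_be_tested",
        ORequest::with_body(TFRequest::new().body(input.into())),
    )
    .unwrap();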

Request an ONNX prediction synchronously

This example needs the onnxmodel feature enabled.

use orkhon::prelude::*;
use orkhon::tcore::prelude::*;
use orkhon::ttensor::prelude::*;
use rand::*;
use std::path::PathBuf;

let o = Orkhon::new()
    .config(
        OrkhonConfig::new()
            // Declare the input shape expected by the model.
            .with_input_fact_shape(InferenceFact::dt_shape(f32::datum_type(), tvec![10, 100])),
    )
    // Register the ONNX model under a lookup name.
    .onnx(
        "model_which_will_be_tested",
        PathBuf::from("tests/protobuf/onnx_model/example.onnx"),
    )
    .build();

// Build a random 10x100 input tensor.
let mut rng = thread_rng();
let vals: Vec<_> = (0..1000).map(|_| rng.gen::<f32>()).collect();
let input = tract_ndarray::arr1(&vals).into_shape((10, 100)).unwrap();

// Request a blocking prediction from the named model.
let resp = o
    .onnx_request(
        "model_which_will_be_tested",
        ORequest::with_body(ONNXRequest::new().body(input.into())),
    )
    .unwrap();
assert_eq!(resp.body.output.len(), 1);

License

Orkhon is licensed under the MIT License.

Documentation

Official documentation is hosted on docs.rs.

Getting Help

Please head to our Gitter channel or ask on StackOverflow.

Discussion and Development

We use Gitter for development discussions. Also, please don't hesitate to open issues on GitHub to ask for features, report bugs, comment on the design, and more! More interaction and more ideas are better!

Contributing to Orkhon

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview of how to contribute can be found in the CONTRIBUTING guide on GitHub.

Releases

v0.2.0 (Nov 17, 2020)

This release comes with:

  • ONNX interface
  • New asynchronous servicing methods
  • Shareable server runtime
  • Nuclei asynchronous runtime
  • Inferring input facts for frozen models
  • Improved throughput:
    • ~4.8361 GiB/s prediction throughput
    • 3,000 concurrent requests take ~4ms on average