Orkhon: ML Inference Framework and Server Runtime

What is it?

Orkhon is a Rust framework for machine learning that runs inference/prediction code written in Python, serves frozen models, and processes unseen data. It is mainly focused on serving models and processing unseen data in a performant manner. Instead of using Python directly and running into server scalability problems, Orkhon addresses them with a built-in async API.

Main features

  • Sync & Async API for models.
  • Easily embeddable engine for well-known Rust web frameworks.
  • API contract for interacting with Python code.
  • High processing throughput:
    • ~4.8361 GiB/s prediction throughput
    • 3_000 concurrent requests take ~4ms on average
  • Python module caching.

Installation

You can include Orkhon in your project with:

[dependencies]
orkhon = "0.2"

Dependencies

You will need:

  • If you use the pymodel feature, the Python development packages must be installed and a working Python runtime must be available to use Orkhon in your project (see the feature flag sketch after this list).
  • If you want TensorFlow inference, TensorFlow must be installed as a library for linking.
  • The ONNX interface doesn't need any extra system-side dependencies.
  • Point your PYTHONHOME environment variable to your Python installation.
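
If you only need one backend, the corresponding Cargo feature can be enabled explicitly. This is a minimal sketch, assuming the pymodel and onnxmodel feature names mentioned above; check the crate documentation for the full and current feature list.

[dependencies]
# Enable only the ONNX backend; use "pymodel" instead for the Python backend.
orkhon = { version = "0.2", features = ["onnxmodel"] }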

Python API contract

For the Python API contract, take a look at the Project Documentation.

Examples

Request a Tensorflow prediction asynchronously

use orkhon::prelude::*;
use orkhon::tcore::prelude::*;
use orkhon::ttensor::prelude::*;
use rand::*;
use std::path::PathBuf;

// Configure the engine with a [10, 100] f32 input fact, register the frozen
// TensorFlow model under a name, and build a shareable runtime handle.
let o = Orkhon::new()
    .config(
        OrkhonConfig::new()
            .with_input_fact_shape(InferenceFact::dt_shape(f32::datum_type(), tvec![10, 100])),
    )
    .tensorflow(
        "model_which_will_be_tested",
        PathBuf::from("tests/protobuf/manual_input_infer/my_model.pb"),
    )
    .shareable();

// Generate random input data and reshape it to match the declared input fact.
let mut rng = thread_rng();
let vals: Vec<_> = (0..1000).map(|_| rng.gen::<f32>()).collect();
let input = tract_ndarray::arr1(&vals).into_shape((10, 100)).unwrap();

// Issue the prediction request asynchronously and wait for the response.
// `block_on` is assumed to come from your async executor of choice
// (e.g. `futures::executor::block_on`).
let o = o.get();
let handle = async move {
    let processor = o.tensorflow_request_async(
        "model_which_will_be_tested",
        ORequest::with_body(TFRequest::new().body(input.into())),
    );
    processor.await
};
let resp = block_on(handle).unwrap();
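
The asynchronous TensorFlow example above and the synchronous ONNX example below build the same random [10, 100] input. As a convenience, the input preparation can be factored into a small helper. This is a minimal sketch: random_input is not part of Orkhon's API, it only uses the tract_ndarray types re-exported through the preludes.

use rand::{thread_rng, Rng};

// Build a random f32 tensor whose shape matches the configured input fact.
fn random_input(rows: usize, cols: usize) -> tract_ndarray::Array2<f32> {
    let mut rng = thread_rng();
    let vals: Vec<f32> = (0..rows * cols).map(|_| rng.gen::<f32>()).collect();
    tract_ndarray::arr1(&vals)
        .into_shape((rows, cols))
        .expect("rows * cols must match the number of generated values")
}

With this helper, random_input(10, 100) replaces the three input-preparation lines in either example.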

Request an ONNX prediction synchronously

This example needs the onnxmodel feature enabled.

use orkhon::prelude::*;
use orkhon::tcore::prelude::*;
use orkhon::ttensor::prelude::*;
use rand::*;
use std::path::PathBuf;

// Configure the engine with a [10, 100] f32 input fact, register the ONNX
// model under a name, and build the runtime.
let o = Orkhon::new()
    .config(
        OrkhonConfig::new()
            .with_input_fact_shape(InferenceFact::dt_shape(f32::datum_type(), tvec![10, 100])),
    )
    .onnx(
        "model_which_will_be_tested",
        PathBuf::from("tests/protobuf/onnx_model/example.onnx"),
    )
    .build();

// Generate random input data and reshape it to match the declared input fact.
let mut rng = thread_rng();
let vals: Vec<_> = (0..1000).map(|_| rng.gen::<f32>()).collect();
let input = tract_ndarray::arr1(&vals).into_shape((10, 100)).unwrap();

// Issue the prediction request synchronously; the response body holds the
// model's output tensors.
let resp = o
    .onnx_request(
        "model_which_will_be_tested",
        ORequest::with_body(ONNXRequest::new().body(input.into())),
    )
    .unwrap();
assert_eq!(resp.body.output.len(), 1);

License

Orkhon is licensed under the MIT License.

Documentation

Official documentation is hosted on docs.rs.

Getting Help

Please head to our Gitter channel or ask on Stack Overflow.

Discussion and Development

We use Gitter for development discussions. Also, please don't hesitate to open issues on GitHub to ask for features, report bugs, comment on the design, and more! More interaction and more ideas are better!

Contributing to Orkhon

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview of how to contribute can be found in the CONTRIBUTING guide on GitHub.

Releases

  • v0.2.0 (Nov 17, 2020)

    This release comes with:

    • ONNX interface
    • New asynchronous servicing methods
    • Shareable server runtime
    • Nuclei asynchronous runtime
    • Inferring input facts for frozen models
    • Improved throughput:
      • ~4.8361 GiB/s prediction throughput
      • 3_000 concurrent requests take ~4ms on average