This is the first release of autograph rebuilt on SPIR-V compute shaders that can be compiled from Rust source with rust-gpu!
Compute Shaders
All computations are implemented in either Rust or GLSL (the latter to be replaced by Rust), and this API is publicly exposed so that external crates can develop their own routines. Shader code targeting SPIR-V is portable and is compiled at runtime for devices supporting the Vulkan, Metal, and DX12 APIs.
Datasets
The library includes the MNIST and Iris datasets to make it easy to get started; these are used in the examples.
Machine Learning
High level traits like Train, Test, and Infer are provided to create a common interface for different algorithms.
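As a rough illustration of how such traits can provide a common interface, here is a self-contained sketch with stand-in names and signatures (a trivial "model" that predicts the mean of its training targets; autograph's actual traits differ):

```rust
// Illustrative sketch of Train / Test / Infer-style traits.
// The model and signatures here are stand-ins, not autograph's real API.

struct MeanModel {
    mean: f32,
}

trait Train {
    fn train(&mut self, targets: &[f32]);
}

trait Infer {
    fn infer(&self) -> f32;
}

trait Test {
    // Returns the mean squared error against the given targets.
    fn test(&self, targets: &[f32]) -> f32;
}

impl Train for MeanModel {
    fn train(&mut self, targets: &[f32]) {
        self.mean = targets.iter().sum::<f32>() / targets.len() as f32;
    }
}

impl Infer for MeanModel {
    fn infer(&self) -> f32 {
        self.mean
    }
}

impl Test for MeanModel {
    fn test(&self, targets: &[f32]) -> f32 {
        targets.iter().map(|t| (t - self.mean).powi(2)).sum::<f32>() / targets.len() as f32
    }
}

fn main() {
    let mut model = MeanModel { mean: 0.0 };
    model.train(&[1.0, 2.0, 3.0]);
    println!("prediction: {}", model.infer()); // prediction: 2
}
```

Because the interface is shared, code that trains, evaluates, or runs inference can be written once against these traits and reused across algorithms.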
KMeans
An implementation of the KMeans classifier, demonstrated in the examples.
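For reference, the core of the algorithm can be sketched in plain Rust; this is a minimal 1-D host-side illustration only, since autograph's KMeans runs on the device via compute shaders:

```rust
// Minimal 1-D k-means sketch: alternate between assigning each point to
// its nearest centroid and moving each centroid to the mean of its points.

fn kmeans(points: &[f32], mut centroids: Vec<f32>, iters: usize) -> Vec<f32> {
    let k = centroids.len();
    for _ in 0..iters {
        let mut sums = vec![0.0f32; k];
        let mut counts = vec![0usize; k];
        // Assignment step.
        for &p in points {
            let nearest = (0..k)
                .min_by(|&a, &b| {
                    (p - centroids[a])
                        .abs()
                        .partial_cmp(&(p - centroids[b]).abs())
                        .unwrap()
                })
                .unwrap();
            sums[nearest] += p;
            counts[nearest] += 1;
        }
        // Update step.
        for i in 0..k {
            if counts[i] > 0 {
                centroids[i] = sums[i] / counts[i] as f32;
            }
        }
    }
    centroids
}

fn main() {
    let points = [1.0, 1.5, 0.5, 8.0, 8.5, 7.5];
    println!("{:?}", kmeans(&points, vec![0.0, 10.0], 10)); // [1.0, 8.0]
}
```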
Neural Networks
Networks can be constructed as a structure of Layers, including:
- Convolutions
- ReLU
- MaxPool
- Dense
Each of these layers implements the Layer and Forward traits, which can be derived to reduce boilerplate.
```rust
#[derive(Layer, Forward, Clone, Debug, Serialize, Deserialize)]
struct Lenet5 {
    #[autograph(layer)]
    conv1: Conv,
    #[autograph(layer)]
    relu1: Relu,
    #[autograph(layer)]
    pool1: MaxPool,
    #[autograph(layer)]
    conv2: Conv,
    #[autograph(layer)]
    relu2: Relu,
    #[autograph(layer)]
    pool2: MaxPool,
    #[autograph(layer)]
    dense1: Dense,
    #[autograph(layer)]
    relu3: Relu,
    #[autograph(layer)]
    dense2: Dense,
    #[autograph(layer)]
    relu4: Relu,
    #[autograph(layer)]
    dense3: Dense,
}
```
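Conceptually, the derived Forward impl chains forward() through each field tagged #[autograph(layer)], in declaration order. A simplified, self-contained sketch of what such generated code amounts to (the traits and layer types here are stand-ins, not autograph's real API):

```rust
// Stand-in Forward trait operating on plain Vec<f32> for illustration.
trait Forward {
    fn forward(&self, input: Vec<f32>) -> Vec<f32>;
}

// Two toy "layers": elementwise scale and elementwise bias.
struct Scale(f32);
struct Bias(f32);

impl Forward for Scale {
    fn forward(&self, input: Vec<f32>) -> Vec<f32> {
        input.into_iter().map(|x| x * self.0).collect()
    }
}

impl Forward for Bias {
    fn forward(&self, input: Vec<f32>) -> Vec<f32> {
        input.into_iter().map(|x| x + self.0).collect()
    }
}

struct Net {
    scale: Scale,
    bias: Bias,
}

// Hand-written version of the impl a derive would generate for Net:
// forward the input through each layer field in order.
impl Forward for Net {
    fn forward(&self, input: Vec<f32>) -> Vec<f32> {
        let x = self.scale.forward(input);
        self.bias.forward(x)
    }
}

fn main() {
    let net = Net { scale: Scale(2.0), bias: Bias(1.0) };
    println!("{:?}", net.forward(vec![1.0, 2.0])); // [3.0, 5.0]
}
```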
Similarly, backward ops can be defined using the Autograd and Backward traits, where Autograd can be derived in much the same way that Layer is.
```rust
#[derive(Autograd)]
struct DenseBackward {
    // Use vertex / optional_vertex for Variables and Parameters
    #[autograph(vertex)]
    input: Variable2,
    #[autograph(vertex)]
    weight: Parameter2,
    #[autograph(optional_vertex)]
    bias: Option<Parameter1>,
}
```
The intent is that users can write their own custom, modular layers and functions which can be defined from the high level down to custom shader code, all implemented in Rust.
Status
The crate is fairly minimal: implementations are missing for some data types, bf16 is not supported for convolution and pooling layers, and many functions such as matrix multiplication are internal and not publicly exposed. Potential work items:
- Fully support bf16 in Neural Networks, with a nicer means to convert from f32 to bf16 and back for Variables and Parameters.
- Render the backward "graph" using petgraph for visualization and debugging purposes.
- Profiling tools for evaluating key functions / shaders and for improving the engine itself.
- Port the remaining GLSL to Rust; rust-gpu barriers are not working yet, and the need for code duplication (particularly for bf16) must be reduced.
- Improve performance, particularly the GEMM implementation.
- Implement more operations and algorithms:
  - MeanPool is implemented but its backward pass is not yet working.
  - Binary ops like addition are easy but not yet implemented due to uncertainty over the API (with regard to Residual layers and others with more than two inputs).
  - SGD with momentum is not yet implemented; other optimizers should be implemented as well.
- Model parallelism is supported but not tested or optimized. Data parallelism is intended to override Layer::update() to perform an all-reduce (i.e. mean) over the gradients for each parameter duplicated on several devices prior to the optimization step.
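The all-reduce step described in the last item can be sketched on the host in plain Rust (illustrative only; the real implementation would operate on device buffers, and the function name here is hypothetical):

```rust
// Each device holds its own copy of a parameter's gradient. Before the
// optimization step, average the copies elementwise and write the mean
// back to every device, so all replicas apply the same update.

fn all_reduce_mean(grads_per_device: &mut [Vec<f32>]) {
    let n_devices = grads_per_device.len() as f32;
    let len = grads_per_device[0].len();
    for i in 0..len {
        let mean: f32 =
            grads_per_device.iter().map(|g| g[i]).sum::<f32>() / n_devices;
        for g in grads_per_device.iter_mut() {
            g[i] = mean;
        }
    }
}

fn main() {
    // Gradients for one parameter, duplicated on two devices.
    let mut grads = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    all_reduce_mean(&mut grads);
    println!("{:?}", grads); // [[2.0, 3.0], [2.0, 3.0]]
}
```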
Contributors
Thank you to those who have contributed to the project!