Machine Learning Library for Rust

Overview

autograph

Undergoing maintenance.

Features

  • Portable accelerated compute
  • Run SPIR-V shaders on GPUs that support Vulkan / Metal / DX12
  • Interop with ndarray; Tensor emulates Array
  • Lightweight async / non-blocking API

Currently using GLSL as a shader language. When rust-gpu gains enough compute shader support, it will be possible to write portable GPU code in Rust!

Platforms

Linux / Unix

Supports GPUs with Vulkan. Tested on Ubuntu 18.04 with AMD RX 580 / NVIDIA GTX 1060.

macOS / iOS

Supports GPUs with Metal. Vulkan support is planned. GPU execution is untested.

Windows

Supports GPUs with DX12. Vulkan support is planned. Tested on Windows 10 with AMD RX 580.

Note: run the Windows tests with cargo test -- --test-threads=1 to avoid creating too many GPU instances on too many threads. Sharing a Device across threads is safe, but creating a separate Device in each of several processes may fail.
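If the tests themselves can share a device, another option is to create it once per process and hand out references. Below is a minimal sketch using std::sync::OnceLock (stable since Rust 1.70). GpuHandle is a hypothetical stand-in for autograph's Device, not its real API, and the counter only demonstrates that the constructor runs once no matter how many threads ask for the device.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the hypothetical device has been constructed.
static CREATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive GPU handle such as autograph's Device.
#[derive(Debug)]
struct GpuHandle {
    id: usize,
}

impl GpuHandle {
    fn new() -> Self {
        // In real code this is where the GPU instance would be created.
        GpuHandle { id: CREATED.fetch_add(1, Ordering::SeqCst) }
    }
}

/// Returns one process-wide handle, created on first use and shared afterwards.
fn shared_device() -> &'static GpuHandle {
    static DEVICE: OnceLock<GpuHandle> = OnceLock::new();
    DEVICE.get_or_init(GpuHandle::new)
}

/// Simulates `n` test threads that all ask for the device; returns the ids they saw.
fn ids_seen_by_threads(n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| shared_device().id))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```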

Datasets

KMeans

See example.

Neural Networks

Coming soon!

Comments
  • Tests fail on Mac OS Monterey, Rust 1.57

    Tests fail to finish on M1 Mac

    $ cargo test device_new --features device_tests
    running 1 test
    test device::tests::device_new has been running for over 60 seconds
    error: test failed, to rerun pass '--lib'
    Caused by:
      process didn't exit successfully: `/Users/rjzak/Downloads/autograph/target/debug/deps/autograph-868587c6365604da device_new` (signal: 9, SIGKILL: kill)
    
    $ cargo test --features "full device_tests"
    test device::buffer::tests::device_buffer_copy_from_slice has been running for over 60 seconds
    test device::buffer::tests::device_buffer_serde has been running for over 60 seconds
    test device::buffer::tests::fill_bf16 has been running for over 60 seconds
    test device::buffer::tests::fill_f16 has been running for over 60 seconds
    test device::buffer::tests::fill_f32 has been running for over 60 seconds
    test device::buffer::tests::fill_f64 has been running for over 60 seconds
    test device::buffer::tests::fill_i16 has been running for over 60 seconds
    test device::buffer::tests::fill_i32 has been running for over 60 seconds
    error: test failed, to rerun pass '--lib'
    Caused by:
      process didn't exit successfully: `/Users/rjzak/Downloads/autograph/target/debug/deps/autograph-aa9dbc5e89ab94bc` (signal: 9, SIGKILL: kill)
    
    $ uname -a
    Darwin macmini.local 21.1.0 Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:24 PDT 2021; root:xnu-8019.41.5~1/RELEASE_ARM64_T8101 arm64
    $ rustc --version
    rustc 1.57.0 (f1edd0429 2021-11-29)
    
    bug help wanted 
    opened by rjzak 37
  • Build fails with oneDNN, missing CMakeLists.txt

    Downloaded as a zip file and ran cargo run --example mnist_dense --features "datasets" --release:

    CMake Error: The source directory "autograph-master/oneDNN" does not appear to contain CMakeLists.txt.

    This was just a quick test, maybe I’m missing some obvious step.

    opened by AlbertoGP 4
  • implement Model struct/trait to simplify library usage

    From your examples (https://github.com/charles-r-earp/autograph/blob/master/examples/mnist_lenet5.rs) it seems that one has to write a lot of boilerplate code to do any actual learning. Would it be possible to provide some kind of default Model struct with a builder that removes the need to write all this? I think it could look like this:

    struct Model {
        layers: Vec<Layer>
    }
    
    impl Model {
        fn build() -> Self
        fn fit(&mut self, ...)
        ...
    }
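
    A minimal, self-contained sketch of the builder idea requested above: layers are boxed behind a trait and applied in insertion order. The Layer trait, Scale, Relu and Model here are hypothetical and far simpler than autograph's own traits; a fit() method would wrap a training loop around predict() and is omitted.

    ```rust
    /// Hypothetical layer trait: a forward pass over a flat f32 vector.
    trait Layer {
        fn forward(&self, input: Vec<f32>) -> Vec<f32>;
    }

    /// Multiplies every element by a constant.
    struct Scale(f32);

    impl Layer for Scale {
        fn forward(&self, input: Vec<f32>) -> Vec<f32> {
            input.into_iter().map(|x| x * self.0).collect()
        }
    }

    /// Element-wise max(x, 0).
    struct Relu;

    impl Layer for Relu {
        fn forward(&self, input: Vec<f32>) -> Vec<f32> {
            input.into_iter().map(|x| x.max(0.0)).collect()
        }
    }

    struct Model {
        layers: Vec<Box<dyn Layer>>,
    }

    #[derive(Default)]
    struct ModelBuilder {
        layers: Vec<Box<dyn Layer>>,
    }

    impl ModelBuilder {
        /// Appends a layer; layers run in the order they are added.
        fn layer(mut self, layer: impl Layer + 'static) -> Self {
            self.layers.push(Box::new(layer));
            self
        }

        fn build(self) -> Model {
            Model { layers: self.layers }
        }
    }

    impl Model {
        fn builder() -> ModelBuilder {
            ModelBuilder::default()
        }

        /// Feeds the input through every layer in sequence.
        fn predict(&self, input: Vec<f32>) -> Vec<f32> {
            self.layers.iter().fold(input, |x, layer| layer.forward(x))
        }
    }
    ```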
    
    opened by Alexei-Kornienko 2
  • Parametrized tests?

    I was checking out cargo test on Metal and all non-f64 tests pass 🎉

    Going forward, I think tests parametrized on dtype and Device (include/exclude via test params?) would make for a DRY yet granular foundation. I'm no expert in Rust fixtures, but I have seen rstest and macro-based approaches.

    Do you have any opinion on that?
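
    As an illustration of the macro-based approach, here is a minimal sketch that stamps out one test per dtype with macro_rules!. The fill_* names and the Vec-based check are hypothetical stand-ins for autograph's buffer tests; a Device dimension could be added to the macro arguments in the same way. #[cfg_attr(test, test)] registers each function as a test under cargo test while leaving it an ordinary function in other builds.

    ```rust
    /// Generates one test function per `name: type = value` entry.
    macro_rules! fill_tests {
        ($($name:ident: $ty:ty = $value:expr),* $(,)?) => {
            $(
                #[cfg_attr(test, test)]
                fn $name() {
                    // Hypothetical stand-in for "fill a device buffer and read it back".
                    let buf: Vec<$ty> = vec![$value; 64];
                    assert_eq!(buf.len(), 64);
                    assert!(buf.iter().all(|&x| x == $value));
                }
            )*
        };
    }

    fill_tests! {
        fill_u8: u8 = 7,
        fill_i32: i32 = -3,
        fill_f32: f32 = 1.5,
        fill_f64: f64 = 2.25,
    }
    ```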

    opened by ahirner 1
  • Improved AllocatorConfig and tests

    Fixes #46

    Reworked this to be more robust and deterministic. The basic idea is to select all of the heaps that are DEVICE_LOCAL and of sufficient size, and use those for allocating "storage" (i.e. device) memory, which backs Buffers and compute operations. "Mapping" memory is then selected in a similar way from heaps that are not DEVICE_LOCAL: CPU_VISIBLE and COHERENT are required, and CPU_CACHED is preferred. The catch is that some drivers, including the Microsoft Basic Render Driver (used for testing on GitHub Actions), expose just one heap, which is DEVICE_LOCAL. So when there is no non-DEVICE_LOCAL heap, mapping memory is allowed to be DEVICE_LOCAL. Mapping memory is used for staging buffers for both writes and reads.
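
    The selection rules described above can be sketched in plain Rust. MemType and its boolean fields are hypothetical, loosely named after the flags in the description; they are not gfx-hal or Vulkan types, and min_size stands in for "of sufficient size".

    ```rust
    /// Hypothetical summary of one memory type and the size of its heap.
    #[derive(Clone, Copy, Debug)]
    struct MemType {
        heap_size: u64,
        device_local: bool,
        cpu_visible: bool,
        coherent: bool,
        cpu_cached: bool,
    }

    /// Memory types for "storage" (device) allocations: DEVICE_LOCAL and large enough.
    fn select_storage(types: &[MemType], min_size: u64) -> Vec<usize> {
        types
            .iter()
            .enumerate()
            .filter(|(_, t)| t.device_local && t.heap_size >= min_size)
            .map(|(i, _)| i)
            .collect()
    }

    /// Host access requirements for staging ("mapping") memory.
    fn host_usable(t: &MemType) -> bool {
        t.cpu_visible && t.coherent
    }

    /// Memory types for "mapping" (staging) allocations, CPU_CACHED first.
    fn select_mapping(types: &[MemType]) -> Vec<usize> {
        let mut picked: Vec<usize> = types
            .iter()
            .enumerate()
            .filter(|(_, t)| host_usable(t) && !t.device_local)
            .map(|(i, _)| i)
            .collect();
        if picked.is_empty() {
            // Fallback for drivers whose only heap is DEVICE_LOCAL.
            picked = types
                .iter()
                .enumerate()
                .filter(|(_, t)| host_usable(t))
                .map(|(i, _)| i)
                .collect();
        }
        // Stable sort: cached types first (false < true), otherwise original order.
        picked.sort_by_key(|&i| !types[i].cpu_cached);
        picked
    }
    ```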

    In theory, for configs with memory that is both DEVICE_LOCAL and CPU_VISIBLE, the host can write to and read from this memory directly (as long as the GPU isn't using it), which saves a copy. However, this is very complex: the storage allocator aliases Buffers so that temporaries can be reused, whereas writes and reads have to be fully allocated for a given frame. The allocation scheme also differs because of the different usage pattern, so it would be difficult to, for example, write directly into a buffer from the host, use it in a compute shader, and read back the results without staging buffers. In general the assumption is that this overhead is small when the device has a lot of work to do with the data, but it may be a consideration for inference on mobile.

    I added tests to verify the expected behavior on each of my dev platforms, and I think this should work on anything that is actually supported. Unfortunately, I can't find the information needed to create additional tests without extracting the memory config manually via cargo test allocator_config_diagnostic -- --ignored --nocapture, which prints out the MemoryProperties; from that output it is simple to construct a unit test for those properties.

    opened by charles-r-earp 0
  • 16 and 64 bit support

    Currently, 64-bit operations are not fully supported on Windows and macOS. Additionally, 8- and 16-bit ops require extensions and may not be fully supported either.

    For images, it is very beneficial to load data to the device as u8 and then convert it to floating point, as this increases effective bandwidth by a factor of 4 compared to f32. What I did before was simply pack the u8s into a u32 and then use bitwise operations to extract them into 4 u32s on the device. I'm not sure about the performance, but at least it's the most portable approach.
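
    A CPU-side sketch of that packing scheme: four bytes per u32 in little-endian order, unpacked with shifts and masks as a shader would, then scaled to f32. The function names are hypothetical, and dividing by 255 is an assumed normalization that the issue does not specify.

    ```rust
    /// Packs bytes four per u32 (byte i of a chunk goes to bits 8*i..8*i+8),
    /// zero-padding the final word.
    fn pack_u8(bytes: &[u8]) -> Vec<u32> {
        bytes
            .chunks(4)
            .map(|chunk| {
                chunk
                    .iter()
                    .enumerate()
                    .fold(0u32, |word, (i, &b)| word | (u32::from(b) << (8 * i)))
            })
            .collect()
    }

    /// Extracts the four bytes of a word as u32s, mirroring shader-side unpacking.
    fn unpack_u32(word: u32) -> [u32; 4] {
        [word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF, word >> 24]
    }

    /// Converts packed pixels to f32 in [0, 1], dropping padding beyond `len`.
    fn unpack_to_f32(words: &[u32], len: usize) -> Vec<f32> {
        words
            .iter()
            .flat_map(|&w| unpack_u32(w))
            .take(len)
            .map(|x| x as f32 / 255.0)
            .collect()
    }
    ```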

    I would like to support bf16 eventually, even if 16-bit values are simply converted to f32 for operations. This may be faster, since it doubles the effective bandwidth and halves memory use relative to f32.
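
    Since bf16 is the upper 16 bits of an f32, converting on load is cheap. A minimal sketch with round-to-nearest-even when narrowing; the function names are hypothetical, and a real kernel might simply truncate instead.

    ```rust
    /// Narrows f32 to bf16 bits, rounding the dropped 16 bits to nearest even.
    fn f32_to_bf16(x: f32) -> u16 {
        let bits = x.to_bits();
        if x.is_nan() {
            // Force a quiet-NaN mantissa bit so truncation cannot yield infinity.
            return ((bits >> 16) as u16) | 0x0040;
        }
        // 0x7FFF rounds halfway cases down; adding the kept LSB makes ties go to even.
        let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
        ((bits + rounding_bias) >> 16) as u16
    }

    /// Widens bf16 bits to f32 exactly by zero-filling the low 16 bits.
    fn bf16_to_f32(h: u16) -> f32 {
        f32::from_bits(u32::from(h) << 16)
    }
    ```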

    help wanted 
    opened by charles-r-earp 0
  • Added Sequential layer for reduced boilerplate.

    Addresses #21. Together with the branches flatten_layer and forward_requires_layer, this would let you write something like the following:

    fn lenet5(device: &Device) -> impl Forward<Ix4, OutputDim=Ix2> {
        Sequential::builder()
            .layer(
                Conv2d::builder()
                    .device(&device)
                    .inputs(1)
                    .outputs(6)
                    .kernel(5)
                    .build()
            )
            .layer(Relu::default())
            .layer(
                MaxPool2d::builder()
                    .args(
                        Pool2dArgs::default()
                            .kernel(2)
                            .strides(2)
                    )
                    .build()
            )
            .layer(
                Conv2d::builder()
                    .device(&device)
                    .inputs(6)
                    .outputs(16)
                    .kernel(5)
                    .build()
            )
            .layer(Relu::default())
            .layer(
                MaxPool2d::builder()
                    .args(
                        Pool2dArgs::default()
                            .kernel(2)
                            .strides(2)
                    )
                    .build()
            )
            .layer(Flatten::default())
            .layer(
                Dense::builder()
                    .device(&device)
                    .inputs(256)
                    .outputs(120)
                    .build()
            )
            .layer(Relu::default())
            .layer(
                Dense::builder()
                    .device(&device)
                    .inputs(120)
                    .outputs(84)
                    .build()
            )
            .layer(Relu::default())
            .layer(
                Dense::builder()
                    .device(&device)
                    .inputs(84)
                    .outputs(10)
                    .bias()
                    .build()
            )
            .build()
    }
    
    opened by charles-r-earp 0
  • Update README.md

    Fixes #26. Moving oneDNN to its own crate might make it easier to download and build manually. This could even be a sub-crate within this repo, which should trigger cargo to do the download for you.

    opened by charles-r-earp 0
  • [question] how to convert BufferBase to Vec

    Hi, I use the network defined in https://github.com/charles-r-earp/autograph/blob/main/examples/neural-network-mnist/src/main.rs. The output of the model in inference, via as_raw_slice(), is F32(BufferBase { device: Device(0), len: 14190, elem: "f32" }). Now I am trying to get a Vec<f32> from it to interpret the results. Can you help me understand how to get the data?

    Here is a piece of code:

    autograph=v0.1.1

    let prediction = self
            .net
            .clone()
            .into_device(device)
            .await
            .unwrap()
            .infer(&x)
            .unwrap();
    let prediction: FloatBuffer = prediction
            .as_raw_slice()
            .into_device(Device::host())
            .await
            .unwrap();
    println!("{:?}", prediction);
    // F32(BufferBase { device: Host, len: 14190, elem: "f32" })
    

    Some examples read data back with let output = y.read().await?; but FloatBuffer does not have a read() function.

    opened by chriamue 0
Releases (v0.1.1)
  • v0.1.1 (Dec 12, 2021)

    Profiling

    This currently requires nightly and the "profile" feature. Set the AUTOGRAPH_PROFILE environment variable to 1 or True to produce a table of statistics for the compute passes that are executed.

    AUTOGRAPH_PROFILE=1 cargo +nightly run --features profile
    

    Rust GEMM

    Improved performance on the Neural Network MNIST example (Lenet5) by 5x.

    • Implemented in Rust for u32, i32, f32
      • bf16 not yet implemented
    • Unrolled loops with crunchy
    • Micro tiles of work per thread (1x1, 2x2, 4x4)
    • SplitK variant (256) for small m or n and large k
      • Atomically accumulates with multiple work groups
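
    The SplitK idea can be sketched on the CPU: the k dimension is cut into chunks, each chunk (one work group on the GPU) computes a partial product, and the partials are summed into C. This is a plain sequential sketch with hypothetical function names, not autograph's shader; the += marks where concurrent work groups would need an atomic add.

    ```rust
    /// Reference row-major GEMM: c (m x n) = a (m x k) * b (k x n).
    fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut c = vec![0.0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                c[i * n + j] = (0..k).map(|p| a[i * k + p] * b[p * n + j]).sum();
            }
        }
        c
    }

    /// Split-K GEMM: each chunk of `split_k` along k adds its partial product into c.
    fn gemm_split_k(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, split_k: usize) -> Vec<f32> {
        assert!(split_k > 0, "split_k must be positive");
        let mut c = vec![0.0f32; m * n];
        for k0 in (0..k).step_by(split_k) {
            let k1 = (k0 + split_k).min(k);
            for i in 0..m {
                for j in 0..n {
                    let partial: f32 = (k0..k1).map(|p| a[i * k + p] * b[p * n + j]).sum();
                    // Sequential here; concurrent work groups would need an atomic add.
                    c[i * n + j] += partial;
                }
            }
        }
        c
    }
    ```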

    Tensor

    • Added Tensor::ones method.

    Neural Networks

    • Allowed SGD learning_rate = 1.0
    • MeanPool
    • Fixed correctness issues
      • Cross Entropy Loss
      • Sum
      • Test accuracy improved to ~99% on Neural Network MNIST example (Lenet5)

    Examples

    • Added shuffling of training batches

    Benchmark

    Added a Neural Network Benchmark to compare performance with other libraries. Training is now ~2.7x slower per epoch than tch (NVIDIA GeForce GTX 1060 with Max-Q Design), with similar test accuracy.

    +-----------+------------+---------------+-----------------------+----------------------------------+
    | Library   | Best Epoch | Best Accuracy | Time To Best Accuracy | Mean Epoch Time to Best Accuracy |
    +===========+============+===============+=======================+==================================+
    | autograph | 69         | 99.04%        | 127.38s               | 1.85s                            |
    +-----------+------------+---------------+-----------------------+----------------------------------+
    | tch       | 32         | 99.12%        | 22.03s                | 688.31ms                         |
    +-----------+------------+---------------+-----------------------+----------------------------------+
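
    For reference, the ~2.7x figure is the ratio of the mean epoch times in the table. Because autograph also needed more epochs to reach its best accuracy (69 vs 32), the gap in time to best accuracy is larger:

    $$\frac{1.85\ \mathrm{s}}{0.688\ \mathrm{s}} \approx 2.7, \qquad \frac{127.38\ \mathrm{s}}{22.03\ \mathrm{s}} \approx 5.8$$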
    
  • v0.1.0 (Oct 30, 2021)

    This is the first release of autograph rebuilt on SPIR-V compute shaders that can be compiled from Rust source with rust-gpu!

    Compute Shaders

    All computations are implemented in either Rust or GLSL (to be replaced by Rust), and this API is publicly exposed so that external crates can develop their own routines. Shader code targeting SPIR-V is portable and is compiled at runtime for devices supporting the Vulkan, Metal, and DX12 APIs.

    Datasets

    The library includes MNIST and Iris datasets to make it easy to get started and these are used in examples.

    Machine Learning

    High level traits like Train, Test, and Infer are provided to create a common interface for different algorithms.

    KMeans

    An implementation of KMeans clustering, demonstrated in the examples.
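
    For readers unfamiliar with the algorithm, here is a minimal CPU sketch of k-means (Lloyd's algorithm) on 2-D points: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is illustrative only; autograph's implementation runs on the device, and the function names here are hypothetical.

    ```rust
    /// Index of the centroid closest to `p` (squared Euclidean distance).
    fn nearest(p: [f32; 2], centroids: &[[f32; 2]]) -> usize {
        let dist = |c: &[f32; 2]| (p[0] - c[0]).powi(2) + (p[1] - c[1]).powi(2);
        let mut best = 0;
        for (j, c) in centroids.iter().enumerate() {
            if dist(c) < dist(&centroids[best]) {
                best = j;
            }
        }
        best
    }

    /// Runs `iters` rounds of Lloyd's algorithm from the given initial centroids.
    /// Returns the final centroids and each point's cluster label.
    fn kmeans(points: &[[f32; 2]], init: &[[f32; 2]], iters: usize) -> (Vec<[f32; 2]>, Vec<usize>) {
        let mut centroids = init.to_vec();
        let mut labels = vec![0; points.len()];
        for _ in 0..iters {
            // Assignment step: label every point with its nearest centroid.
            for (label, &p) in labels.iter_mut().zip(points) {
                *label = nearest(p, &centroids);
            }
            // Update step: move each centroid to the mean of its points.
            let mut sums = vec![[0.0f32; 2]; centroids.len()];
            let mut counts = vec![0usize; centroids.len()];
            for (&p, &l) in points.iter().zip(&labels) {
                sums[l][0] += p[0];
                sums[l][1] += p[1];
                counts[l] += 1;
            }
            // Empty clusters (n == 0) keep their previous centroid.
            for (c, (s, &n)) in centroids.iter_mut().zip(sums.iter().zip(&counts)) {
                if n > 0 {
                    *c = [s[0] / n as f32, s[1] / n as f32];
                }
            }
        }
        (centroids, labels)
    }
    ```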

    Neural Networks

    Networks can be constructed as a structure of Layers, including:

    • Convolutions
    • ReLU
    • MaxPool
    • Dense

    Each of these layers implements the Layer and Forward traits, which can be derived to reduce boilerplate.

    #[derive(Layer, Forward, Clone, Debug, Serialize, Deserialize)]
    struct Lenet5 {
        #[autograph(layer)]
        conv1: Conv,
        #[autograph(layer)]
        relu1: Relu,
        #[autograph(layer)]
        pool1: MaxPool,
        #[autograph(layer)]
        conv2: Conv,
        #[autograph(layer)]
        relu2: Relu,
        #[autograph(layer)]
        pool2: MaxPool,
        #[autograph(layer)]
        dense1: Dense,
        #[autograph(layer)]
        relu3: Relu,
        #[autograph(layer)]
        dense2: Dense,
        #[autograph(layer)]
        relu4: Relu,
        #[autograph(layer)]
        dense3: Dense,
    }
    

    Similarly, backward ops can be defined using the Autograd and Backward traits, where Autograd can be derived in much the same way that Layer is.

    #[derive(Autograd)]
    struct DenseBackward {
        // Use vertex / optional_vertex for Variables and Parameters
        #[autograph(vertex)]
        input: Variable2,
        #[autograph(vertex)]
        weight: Parameter2,
        #[autograph(optional_vertex)]
        bias: Option<Parameter1>,
    }
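
    To make concrete what a backward op like DenseBackward has to compute, here is a CPU sketch of the gradients of a dense layer, assuming y = x * W^T + b with x of shape [batch, inputs] and W of shape [outputs, inputs] (this layout is an assumption). Given dy, it returns dx = dy * W, dW = dy^T * x and db = the column sums of dy. The names are hypothetical; this is not autograph's Backward implementation.

    ```rust
    /// Gradients of a dense layer with respect to its input, weight and bias.
    #[derive(Debug)]
    struct DenseGrads {
        dx: Vec<f32>,
        dw: Vec<f32>,
        db: Vec<f32>,
    }

    /// Row-major buffers: x [batch, inputs], w [outputs, inputs], dy [batch, outputs].
    fn dense_backward(x: &[f32], w: &[f32], dy: &[f32], batch: usize, inputs: usize, outputs: usize) -> DenseGrads {
        let mut dx = vec![0.0; batch * inputs];
        let mut dw = vec![0.0; outputs * inputs];
        let mut db = vec![0.0; outputs];
        for b in 0..batch {
            for o in 0..outputs {
                let g = dy[b * outputs + o];
                db[o] += g;
                for i in 0..inputs {
                    dx[b * inputs + i] += g * w[o * inputs + i]; // dx = dy * W
                    dw[o * inputs + i] += g * x[b * inputs + i]; // dW = dy^T * x
                }
            }
        }
        DenseGrads { dx, dw, db }
    }
    ```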
    

    The intent is that users can write their own custom, modular layers and functions which can be defined from the high level down to custom shader code, all implemented in Rust.

    Status

    The crate is fairly minimal: it is missing implementations for some data types, does not support bf16 for convolution and pooling layers, and keeps many functions, like matrix multiplication, internal rather than publicly exposed. Potential work items:

    • Fully support bf16 in Neural Networks, with a nicer means to convert from f32 to bf16 and back for Variables and Parameters.
    • Render the backward "graph" using petgraph for visualization and debugging purposes.
    • Profiling tools for evaluating key functions / shaders and for improving the engine itself.
    • Port GLSL to Rust (rust-gpu barriers are not working yet) and reduce code duplication, particularly for bf16.
    • Improve performance, particularly the GEMM implementation.
    • Implement more operations and algorithms:
      • MeanPool is implemented but backward is not yet working.
      • Binary ops like addition are easy, but not yet implemented due to uncertainty over the API (regarding residual layers and other layers with more than 2 inputs).
      • SGD with momentum is not yet implemented; other optimizers should be added as well.
    • Model parallelism is supported but not tested or optimized. Data parallelism is intended to override Layer::update() to perform an all-reduce (i.e. mean) over the gradients of each parameter duplicated across several devices, prior to the optimization step.
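
    The mean all-reduce described in the last item can be sketched on the host: each replica holds the gradient for the same parameter, and afterwards every replica holds their element-wise mean. all_reduce_mean is a hypothetical helper working on plain Vec<f32> values, not part of autograph's API.

    ```rust
    /// Replaces every replica's gradient with the element-wise mean over all replicas.
    fn all_reduce_mean(replicas: &mut [Vec<f32>]) {
        let n = replicas.len();
        if n == 0 {
            return;
        }
        let len = replicas[0].len();
        assert!(replicas.iter().all(|g| g.len() == len), "replica shapes must match");
        // Reduce: sum the gradients element-wise.
        let mut mean = vec![0.0f32; len];
        for g in replicas.iter() {
            for (m, v) in mean.iter_mut().zip(g) {
                *m += v;
            }
        }
        for m in &mut mean {
            *m /= n as f32;
        }
        // Broadcast: every device gets the same averaged gradient.
        for g in replicas.iter_mut() {
            g.copy_from_slice(&mean);
        }
    }
    ```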

    Contributors

    Thank you to those that have contributed to the project!

    • @AlbertoGP
    • @nkconnor

Mars is a rust machine learning library. [Goal is to make Simple as possible]

Mars Mars (ma-rs) is a blazingly fast rust machine learning library. Simple and Powerful! Contribution: Feel free to build this project. This i

KoBruh 3 Dec 25, 2022
A machine learning library in Rust from scratch.

Machine Learning in Rust Learn the Rust programming language through implementing classic machine learning algorithms. This project is self-completed

Chi Zuo 39 Jan 17, 2023
convolutions-rs is a crate that provides a fast, well-tested convolutions library for machine learning

convolutions-rs convolutions-rs is a crate that provides a fast, well-tested convolutions library for machine learning written entirely in Rust with m

null 10 Jun 28, 2022
A machine learning library for supervised training of parametrized models

Vikos Vikos is a library for supervised training of parameterized, regression, and classification models Design Goals Model representations, cost func

Blue Yonder GmbH 10 May 10, 2022
A Rust machine learning framework.

Linfa linfa (Italian) / sap (English): The vital circulating fluid of a plant. linfa aims to provide a comprehensive toolkit to build Machine Learning

Rust-ML 2.2k Jan 2, 2023
Machine learning crate for Rust

rustlearn A machine learning package for Rust. For full usage details, see the API documentation. Introduction This crate contains reasonably effectiv

Maciej Kula 547 Dec 28, 2022
Machine learning in Rust.

Rustml Rustml is a library for doing machine learning in Rust. The documentation of the project with a description of the modules can be found here. F

null 60 Dec 15, 2022
Rust based Cross-GPU Machine Learning

HAL : Hyper Adaptive Learning Rust based Cross-GPU Machine Learning. Why Rust? This project is for those that miss strongly typed compiled languages.

Jason Ramapuram 83 Dec 20, 2022
Fwumious Wabbit, fast on-line machine learning toolkit written in Rust

Fwumious Wabbit is a very fast machine learning tool built with Rust inspired by and partially compatible with Vowpal Wabbit (much love! read more abo

Outbrain 115 Dec 9, 2022
A Machine Learning Framework for High Performance written in Rust

polarlight polarlight is a machine learning framework for high performance written in Rust. Key Features TBA Quick Start TBA How To Contribute Contrib

Chris Ohk 25 Aug 23, 2022
Example of Rust API for Machine Learning

rust-machine-learning-api-example Example of Rust API for Machine Learning API example that uses resnet224 to infer images received in base64 and retu

vaaaaanquish 16 Oct 3, 2022
High-level non-blocking Deno bindings to the rust-bert machine learning crate.

bertml High-level non-blocking Deno bindings to the rust-bert machine learning crate. Guide Introduction The ModelManager class manages the FFI bindin

Carter Snook 14 Dec 15, 2022
Machine learning Neural Network in Rust

vinyana vinyana - stands for mind in pali language. Goal To implement a simple Neural Network Library in order to understand the maths behind it. This

Alexandru Olaru 3 Dec 26, 2022
Source Code for 'Practical Machine Learning with Rust' by Joydeep Bhattacharjee

Apress Source Code This repository accompanies Practical Machine Learning with Rust by Joydeep Bhattacharjee (Apress, 2020). Download the files as a z

Apress 57 Dec 7, 2022
An example of using TensorFlow rust bindings to serve trained machine learning models via Actix Web

Serving TensorFlow with Actix-Web This repository gives an example of training a machine learning model using TensorFlow2.0 Keras in python, exporting

Kyle Kosic 39 Dec 12, 2022
🏆 A ranked list of awesome machine learning Rust libraries.

best-of-ml-rust A ranked list of awesome machine learning Rust libraries. This curated list contains 180 awesome open-source projects with a total

₸ornike 110 Dec 28, 2022
Machine learning crate in Rust

DeepRust - Machine learning in Rust Vision To create a deeplearning crate in rust aiming to create a great experience for ML researchers & developers

Vigneshwer Dhinakaran 8 Sep 6, 2022
BudouX-rs is a rust port of BudouX (machine learning powered line break organizer tool).

BudouX-rs BudouX-rs is a rust port of BudouX (machine learning powered line break organizer tool). Note: This project contains the deliverables of the

null 5 Jan 20, 2022