General matrix multiplication with custom configuration in Rust. Supports no_std and no_alloc environments.

Last update: Nov 6, 2023

Overview

microgemm

The implementation is based on the BLIS microkernel approach.
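As a rough illustration of the BLIS-style loop structure (not microgemm's actual code), the sketch below tiles a row-major product into mc x kc x nc cache blocks and MR x NR register tiles. The function name `blocked_gemm`, the tile sizes, and the argument order are assumptions made for this sketch, and the packing step that a real BLIS-style implementation performs on each block is left out for brevity.

```rust
/// Register-tile sizes (hypothetical values chosen for this sketch).
const MR: usize = 4;
const NR: usize = 4;

/// Computes c <- a * b + c for row-major matrices using BLIS-style blocking.
/// `mc`, `kc`, `nc` are the cache-block sizes; no packing is done here.
#[allow(clippy::too_many_arguments)]
fn blocked_gemm(
    m: usize,
    k: usize,
    n: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    mc: usize,
    kc: usize,
    nc: usize,
) {
    assert!(mc > 0 && kc > 0 && nc > 0);
    assert!(mc % MR == 0, "MC must be divisible by MR");
    assert!(nc % NR == 0, "NC must be divisible by NR");
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    for jc in (0..n).step_by(nc) {
        let nb = nc.min(n - jc); // width of the current column block
        for pc in (0..k).step_by(kc) {
            let kb = kc.min(k - pc); // depth of the current block
            for ic in (0..m).step_by(mc) {
                let mb = mc.min(m - ic); // height of the current row block
                for jr in (0..nb).step_by(NR) {
                    let nr = NR.min(nb - jr); // edge tiles may be narrower
                    for ir in (0..mb).step_by(MR) {
                        let mr = MR.min(mb - ir); // edge tiles may be shorter
                        // Microkernel: rank-kb update of one mr x nr tile of c.
                        for p in 0..kb {
                            for i in 0..mr {
                                let a_ip = a[(ic + ir + i) * k + pc + p];
                                for j in 0..nr {
                                    let col = jc + jr + j;
                                    c[(ic + ir + i) * n + col] += a_ip * b[(pc + p) * n + col];
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

The three outer loops pick blocks sized to fit the cache hierarchy, and the two inner loops walk register-sized tiles, so only the innermost tile update needs to be tuned for a given target.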

Getting Started

The Kernel trait is the main abstraction of microgemm. You can implement it yourself or use one of the kernels provided out of the box.

gemm

use microgemm as mg;
use microgemm::Kernel as _;

fn main() {
    let kernel = mg::kernels::Generic8x8Kernel::<f32>::new();
    assert_eq!(kernel.mr(), 8);
    assert_eq!(kernel.nr(), 8);

    let pack_sizes = mg::PackSizes {
        mc: 5 * kernel.mr(), // MC must be divisible by MR
        kc: 190,
        nc: 9 * kernel.nr(), // NC must be divisible by NR
    };
    let mut packing_buf = vec![0.0; pack_sizes.buf_len()];

    let alpha = 2.0;
    let beta = -3.0;
    let (m, k, n) = (100, 380, 250);

    let a = vec![2.0; m * k];
    let b = vec![3.0; k * n];
    let mut c = vec![4.0; m * n];

    let a = mg::MatRef::new(m, k, &a, mg::Layout::RowMajor);
    let b = mg::MatRef::new(k, n, &b, mg::Layout::RowMajor);
    let mut c = mg::MatMut::new(m, n, &mut c, mg::Layout::RowMajor);

    // c <- alpha a b + beta c
    kernel.gemm(alpha, &a, &b, beta, &mut c, &pack_sizes, &mut packing_buf);
    println!("{:?}", c.as_slice());
}

See also the no_alloc example for usage without Vec.
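To sanity-check the output of the example above, a plain triple-loop reference can be handy. The sketch below is independent of microgemm: `naive_gemm` is a hypothetical helper that works on row-major slices and computes the same c <- alpha a b + beta c update. With the inputs from the example (a filled with 2.0, b with 3.0, c with 4.0, k = 380, alpha = 2.0, beta = -3.0), every entry works out to 2 * (380 * 2 * 3) + (-3) * 4 = 4548.

```rust
/// Reference GEMM: c <- alpha * a * b + beta * c, all matrices row-major.
/// `a` is m x k, `b` is k x n, `c` is m x n.
#[allow(clippy::too_many_arguments)]
fn naive_gemm(
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c: &mut [f32],
    m: usize,
    k: usize,
    n: usize,
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of a with column j of b.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Comparing a kernel's result against such a reference (with a small tolerance for floating-point inputs that do not sum exactly) is a simple way to validate a custom kernel or a new PackSizes configuration.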

Implemented Kernels

Name                                     Scalar Types                        Target
GenericNxNKernel (N: 2, 4, 8, 16, 32)    T: Copy + Zero + One + Mul + Add    Any
NeonKernel                               f32                                 aarch64 and target feature neon
WasmSimd128Kernel                        f32                                 wasm32 and target feature simd128

Custom Kernel Implementation

use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct CustomKernel;

impl Kernel for CustomKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst
    fn microkernel(
        &self,
        alpha: f64,
        lhs: &MatRef<f64>,
        rhs: &MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        // lhs is col-major by default
        assert_eq!(lhs.row_stride(), 1);
        assert_eq!(lhs.nrows(), Self::MR);

        // rhs is row-major by default
        assert_eq!(rhs.col_stride(), 1);
        assert_eq!(rhs.ncols(), Self::NR);

        // dst is col-major by default
        assert_eq!(dst.row_stride(), 1);
        assert_eq!(dst.nrows(), Self::MR);
        assert_eq!(dst.ncols(), Self::NR);

        // your microkernel implementation...
    }
}
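For orientation, here is one way the body of such a microkernel could look, written against plain slices rather than MatRef and MatMut so that it stays self-contained. The function `microkernel_4x4` is hypothetical, and it assumes contiguous panels: lhs is MR x k column-major with column stride MR, rhs is k x NR row-major with row stride NR, and dst is MR x NR column-major with column stride MR. The real views may use different strides, so treat this as a sketch of the arithmetic only.

```rust
const MR: usize = 4;
const NR: usize = 4;

/// dst <- alpha * lhs * rhs + beta * dst on a single MR x NR tile.
/// lhs: MR x k, column-major, element (i, p) at lhs[p * MR + i]
/// rhs: k x NR, row-major,    element (p, j) at rhs[p * NR + j]
/// dst: MR x NR, column-major, element (i, j) at dst[j * MR + i]
fn microkernel_4x4(alpha: f64, lhs: &[f64], rhs: &[f64], beta: f64, dst: &mut [f64]) {
    assert_eq!(lhs.len() % MR, 0);
    let k = lhs.len() / MR;
    assert_eq!(rhs.len(), k * NR);
    assert_eq!(dst.len(), MR * NR);

    // Accumulate the product in a local tile, stored column-major like dst.
    let mut acc = [0.0f64; MR * NR];
    for p in 0..k {
        let a_col = &lhs[p * MR..(p + 1) * MR]; // column p of lhs
        let b_row = &rhs[p * NR..(p + 1) * NR]; // row p of rhs
        // Rank-1 update: acc += a_col * b_row^T.
        for j in 0..NR {
            for i in 0..MR {
                acc[j * MR + i] += a_col[i] * b_row[j];
            }
        }
    }

    // Scale and write back once, after the whole depth has been consumed.
    for (d, &x) in dst.iter_mut().zip(acc.iter()) {
        *d = alpha * x + beta * *d;
    }
}
```

With these layouts each step of the depth loop reads one contiguous column of lhs and one contiguous row of rhs, which is what makes the rank-1 update friendly to SIMD loads.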

Benchmarks

All benchmarks are performed on square matrices of dimension n.

f32

PackSizes { mc: n, kc: n, nc: n }

aarch64 (M1)

   n    NeonKernel    Generic4x4    Generic8x8  naive(rustc)
  32        10.7µs        13.9µs        12.7µs        53.2µs
  64        50.6µs          73µs        62.7µs       307.7µs
 128       257.5µs       482.8µs       379.8µs         2.5ms
 256           1ms           2ms         1.3ms         9.5ms
 512         3.4ms         8.4ms           6ms        94.5ms
1024          25ms        66.4ms        46.4ms       882.7ms
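The timings above can be converted into throughput by counting 2 * n^3 floating-point operations for an n x n x n product (one multiply and one add per inner step). The helper below, `gflops`, is a hypothetical name used only for this sketch. Because the table values are rounded, the results are approximate: 25ms at n = 1024 corresponds to roughly 85.9 GFLOP/s for NeonKernel, and 882.7ms to roughly 2.4 GFLOP/s for the naive loop.

```rust
/// Throughput of an n x n x n GEMM in GFLOP/s,
/// counting one multiply and one add per inner-loop step (2 * n^3 in total).
fn gflops(n: u64, seconds: f64) -> f64 {
    let flops = 2.0 * (n as f64).powi(3);
    flops / seconds / 1e9
}
```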

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

Releases (v0.1.3)

v0.1.3 (Nov 2, 2023)

v0.1.2 (Oct 31, 2023)
