RusticSOM

Rust library for Self Organising Maps (SOM).

Using this Crate

Add rusticsom as a dependency in Cargo.toml

[dependencies]
rusticsom = "1.1.0"

Include the crate

use rusticsom::SOM;

API

Use SOM::create to create an SOM object using the API call below. It creates an SOM with length x breadth cells, where each neuron holds a weight vector of length inputs.

pub fn create(
    length: usize,
    breadth: usize,
    inputs: usize,
    randomize: bool,
    learning_rate: Option<f32>,
    sigma: Option<f32>,
    decay_function: Option<fn(f32, u32, u32) -> f64>,
    neighbourhood_function: Option<fn((usize, usize), (usize, usize), f32) -> Array2<f64>>,
) -> SOM { ... }

randomize is a flag which, if true, initializes the weights of each cell to small random floating-point values.

learning_rate (optional) is the learning rate of the SOM; the default is 0.5.

sigma (optional) is the spread of the neighbourhood function; the default is 1.0.

decay_function (optional) is a function pointer; it accepts any function that takes three parameters of types f32, u32 and u32 (the current value, the current iteration, and the total number of iterations, respectively) and returns an f64. This function is used to "decay" both learning_rate and sigma. By default it is

new_value = old_value / (1 + current_iteration/total_iterations)

neighbourhood_function (optional) is likewise a function pointer. It takes three parameters: a tuple of type (usize, usize) representing the size of the SOM, another tuple of type (usize, usize) representing the position of the winner neuron, and an f32 representing sigma. It returns a 2D array containing the weights of the neighbours of the winning neuron, i.e., centered at the winner. The default is a Gaussian function, which returns a Gaussian centered at the winner neuron.
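
For example, a minimal sketch of creating an SOM (the call with all-None options uses the defaults; my_decay is a hypothetical user-defined function that simply mirrors the documented default):

    use rusticsom::SOM;

    // Create a 10 x 10 SOM for 4-dimensional inputs, with randomized weights
    // and the default learning_rate, sigma, decay and neighbourhood functions.
    let mut map = SOM::create(10, 10, 4, true, None, None, None, None);

    // A custom decay function, written to match the documented default:
    // new_value = old_value / (1 + current_iteration / total_iterations)
    fn my_decay(value: f32, current_iteration: u32, total_iterations: u32) -> f64 {
        value as f64 / (1.0 + current_iteration as f64 / total_iterations as f64)
    }
    let mut custom_map = SOM::create(10, 10, 4, true, Some(0.5), Some(1.0), Some(my_decay), None);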


    pub fn from_json(
        serialized: &str,
        decay_function: Option<fn(f32, u32, u32) -> f64>,
        neighbourhood_function: Option<fn((usize, usize), (usize, usize), f32) -> Array2<f64>>,
    ) -> serde_json::Result<SOM> { ... }

This function creates an SOM from JSON data previously exported with SOM::to_json().
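
A short sketch of a save/load round trip, assuming a trained SOM named map (passing None for both function arguments keeps the defaults):

    // Serialize a trained SOM to a JSON string...
    let json = map.to_json().unwrap();

    // ...and reconstruct it later from that string.
    let restored = SOM::from_json(&json, None, None).unwrap();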


Use SOM_Object.train_random() to train the SOM with the input dataset, picking samples from the dataset in random order.

pub fn train_random(&mut self, data: Array2<f64>, iterations: u32) { ... }

Samples (rows) from the 2D Array data are picked randomly, and the SOM is trained for iterations iterations.


Use SOM_Object.train_batch() to train the SOM with the input dataset, picking samples from the dataset in sequential order.

pub fn train_batch(&mut self, data: Array2<f64>, iterations: u32) { ... }

Samples (rows) from the 2D Array data are picked sequentially, and the SOM is trained for iterations iterations.
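
For instance, a minimal sketch of both training modes, assuming the map created earlier and using a zero-filled ndarray as a stand-in for a real dataset:

    use ndarray::Array2;

    // A stand-in dataset: 150 samples with 4 features each.
    let data: Array2<f64> = Array2::zeros((150, 4));

    // Train on randomly picked samples...
    map.train_random(data.clone(), 1000);

    // ...or on sequentially picked samples.
    map.train_batch(data, 1000);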


Use SOM_Object.winner() to find the winning neuron for a given sample.

pub fn winner(&mut self, elem: Array1<f64>) -> (usize, usize) { ... }

This function must be called with an SOM object.

Requires one parameter, a 1D Array of f64s representing the input sample.

Returns a tuple (usize, usize) representing the x and y coordinates of the winning neuron in the SOM.
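
For example, a sketch assuming a trained map (the sample values are illustrative iris measurements):

    use ndarray::Array1;

    let sample = Array1::from(vec![5.1, 3.5, 1.4, 0.2]);
    let (x, y) = map.winner(sample);
    println!("winner neuron at ({}, {})", x, y);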


Use SOM_Object.winner_dist() to find the winning neuron for a given sample, along with the sample's distance from that winner neuron.

pub fn winner_dist(&mut self, elem: Array1<f64>) -> ((usize, usize), f64) { ... }

This function must be called with an SOM object.

Requires one parameter, a 1D Array of f64s representing the input sample.

Returns a tuple ((usize, usize), f64): the x and y coordinates of the winning neuron in the SOM, and the distance of the input sample from that winner neuron.
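
A sketch of its usage, under the same assumptions as the winner() example above:

    let ((x, y), dist) = map.winner_dist(Array1::from(vec![5.1, 3.5, 1.4, 0.2]));
    println!("winner at ({}, {}), distance {}", x, y, dist);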


pub fn activation_response(&self) -> ArrayView2<usize> { ... }

This function returns the activation map of the SOM. The activation map is a 2D Array where each cell at (i, j) represents the number of times the (i, j) cell of the SOM was picked to be the winner neuron.
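
For example, after training, the returned view can be indexed like any 2D ndarray (a sketch):

    let hits = map.activation_response();
    // Number of times the neuron at (0, 0) was picked as the winner.
    println!("wins at (0, 0): {}", hits[[0, 0]]);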


pub fn get_size(&self) -> (usize, usize) { ... }

This function returns a tuple representing the size of the SOM. Format is (length, breadth).


pub fn distance_map(self) -> Array2<f64> { ... }

Returns the distance map of the SOM, i.e., the normalized distance of every neuron from every other neuron. Note that this method takes self by value, consuming the SOM.
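
A short sketch combining get_size() and distance_map(); since distance_map() takes self by value, it is called last:

    let (length, breadth) = map.get_size();
    println!("SOM is {} x {}", length, breadth);

    // Consumes `map`, so no further calls on it afterwards.
    let dmap = map.distance_map();
    println!("distance map shape: {:?}", dmap.shape());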


pub fn to_json(&self) -> serde_json::Result<String> { ... }

Returns the internal SOM data as pretty-printed JSON (using serde_json).


Primary Contributors

Aditi Srinivas
Avinash Shenoy


Example

We've tested this crate on the famous iris dataset (included in CSV format in the extras folder).

The t_full_test function in /tests/test.rs was used to produce this output. The following plots were obtained using matplotlib for Python.

Using a 5 x 5 SOM, trained for 250 iterations:

SOM1


Using a 10 x 10 SOM, trained for 1000 iterations:

SOM2

Symbol  | Represents
--------|-----------
Circle  | setosa
Square  | versicolor
Diamond | virginica
Comments
  • Add lint and formatting checks to CI

    This PR adds two new CI steps: one to check formatting (rustfmt), and another to check for code lints (clippy).

    I reformatted the code and fixed all lints. However, I purposely introduced one lint and formatting mistake to ensure the CI steps execute as expected. If these changes are desirable, I will remove the last commit so this PR can merge.

    opened by JayKickliter 4
  • implement serde_json export/import

    In order to reuse trained SOMs later, I needed import/export functionality. Using serde, I have implemented JSON as the import/export format. One simple test is added under tests/.

    If you like it, you can include it in the main branch.

    While I was at it, I added edition 2018 to Cargo.toml and fixed a couple of compiler warnings.

    opened by gin66 3
  • [docs] Reformat docstrings to be auto-generated

    Hey!

    Dunno if you're accepting PRs, but this is a documentation-only change that basically just uses /// instead of // for docstrings, so that the auto-generated documentation on docs.rs will actually display all the existing documentation.

    opened by beyarkay 2
  • Simplify code where possible

    This PR is a non-comprehensive[1] effort to:

    • reduce temporary heap allocations
    • delete unnecessary code
    • improve readability

    [1]: I have another branch with more aggressive, but API-breaking, optimizations that remove a lot of heap allocations. I can open a follow-up PR for that along with a major version bump.

    opened by JayKickliter 1
  • update ndarray to 0.13

    This updates the ndarray dependency to 0.13 for compatibility with minor changes to the public types. I also bumped the version number to 1.1.1 and fixed some unused warnings.

    Side note: I noticed the 1.1.0 version did not seem to be published to crates.io, but using the repository URL in the Cargo file is an easy workaround.

    opened by masonblier 1
  • Potential NaNs due to div by zero when normalising

    So these lines in the update method:

    for i in 0..self.data.x {
        for j in 0..self.data.y {
            for k in 0..self.data.z {
                self.data.map[[i, j, k]] += (elem[[k]] - self.data.map[[i, j, k]]) * g[[i, j]];
            }
    
            let norm = norm(self.data.map.index_axis(Axis(0), i).index_axis(Axis(0), j));
            for k in 0..self.data.z {
                self.data.map[[i, j, k]] /= norm;
            }
        }
    }
    

    These were causing me issues because norm was ending up as zero, and the resulting divide by zero made self.data.map[[i, j, k]] become f64::NaN, producing funky results later down the line (I've got NaN values in my input features).

    I'm not sure what the purpose of the normalization is. I understand that it would ensure each neuron's weights sum to 1, but I can't find where this is recommended.

    On my own fork I've wrapped the normalisation in a check that norm > 0, and that seems to have solved the issue, although I'm not sure how valid it is.
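
    For reference, a sketch of what that guard might look like, mirroring the snippet above (the exact code on the fork may differ):

        let norm = norm(self.data.map.index_axis(Axis(0), i).index_axis(Axis(0), j));
        // Only normalise when the norm is non-zero, avoiding a 0/0 = NaN.
        if norm > 0.0 {
            for k in 0..self.data.z {
                self.data.map[[i, j, k]] /= norm;
            }
        }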

    opened by beyarkay 0
  • Breaking changes: increase performance

    This PR significantly increases training and winner-lookup speed. The speedup is primarily achieved by reducing heap allocations. However, the cost is that it introduces API-breaking changes.

    The methodology was to first add benchmarking to the current code base, then measure performance deltas after every little tweak to the code. In some cases, removing intermediate heap allocations led to performance regressions; in those cases, I left comments explaining why they're necessary.

    Because this PR already introduces breaking changes, I added another breaking change: the non-default feature serde-1. This helps with build times for people not interested in serialization. Building with serde-1 enables this crate's old [to, from]_json support. I believe those functions are out of this crate's scope, but as long as they are disabled by default, I see no harm.

    Benchmarks

    I first ran cargo bench without any library modifications, and the output below is after rerunning it on the tip of this branch.

    Training/Random/10      time:   [68.191 us 68.893 us 69.746 us]
                            thrpt:  [143.38 Kelem/s 145.15 Kelem/s 146.65 Kelem/s]
                     change:
                            time:   [-4.4626% -3.2623% -2.0623%] (p = 0.00 < 0.05)
                            thrpt:  [+2.1057% +3.3724% +4.6710%]
                            Performance has improved.
    Found 6 outliers among 100 measurements (6.00%)
      1 (1.00%) high mild
      5 (5.00%) high severe
    Training/Batch/10       time:   [59.171 us 59.415 us 59.682 us]
                            thrpt:  [167.55 Kelem/s 168.31 Kelem/s 169.00 Kelem/s]
                     change:
                            time:   [-19.837% -18.782% -17.789%] (p = 0.00 < 0.05)
                            thrpt:  [+21.639% +23.126% +24.745%]
                            Performance has improved.
    Found 4 outliers among 100 measurements (4.00%)
      2 (2.00%) high mild
      2 (2.00%) high severe
    Training/Random/100     time:   [666.92 us 671.67 us 678.39 us]
                            thrpt:  [147.41 Kelem/s 148.88 Kelem/s 149.94 Kelem/s]
                     change:
                            time:   [-6.2380% -4.6824% -2.8067%] (p = 0.00 < 0.05)
                            thrpt:  [+2.8878% +4.9124% +6.6531%]
                            Performance has improved.
    Found 7 outliers among 100 measurements (7.00%)
      1 (1.00%) low mild
      1 (1.00%) high mild
      5 (5.00%) high severe
    Training/Batch/100      time:   [605.23 us 607.84 us 611.17 us]
                            thrpt:  [163.62 Kelem/s 164.52 Kelem/s 165.23 Kelem/s]
                     change:
                            time:   [-17.402% -16.414% -15.498%] (p = 0.00 < 0.05)
                            thrpt:  [+18.340% +19.637% +21.068%]
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      1 (1.00%) low mild
      5 (5.00%) high mild
      2 (2.00%) high severe
    Training/Random/1000    time:   [6.7615 ms 6.7930 ms 6.8285 ms]
                            thrpt:  [146.45 Kelem/s 147.21 Kelem/s 147.90 Kelem/s]
                     change:
                            time:   [-3.2269% -2.6307% -2.0150%] (p = 0.00 < 0.05)
                            thrpt:  [+2.0564% +2.7017% +3.3345%]
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      2 (2.00%) high mild
      3 (3.00%) high severe
    Training/Batch/1000     time:   [6.0276 ms 6.0493 ms 6.0753 ms]
                            thrpt:  [164.60 Kelem/s 165.31 Kelem/s 165.90 Kelem/s]
                     change:
                            time:   [-14.930% -14.487% -14.008%] (p = 0.00 < 0.05)
                            thrpt:  [+16.290% +16.942% +17.550%]
                            Performance has improved.
    Found 4 outliers among 100 measurements (4.00%)
      2 (2.00%) high mild
      2 (2.00%) high severe
    
    Winner/Plain/4          time:   [1.0489 us 1.0518 us 1.0548 us]
                            change: [-45.352% -44.837% -44.342%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 3 outliers among 100 measurements (3.00%)
      2 (2.00%) high mild
      1 (1.00%) high severe
    Winner/Distance/4       time:   [1.0665 us 1.0722 us 1.0785 us]
                            change: [-48.275% -47.959% -47.654%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 7 outliers among 100 measurements (7.00%)
      3 (3.00%) high mild
      4 (4.00%) high severe
    

    Notes

    I'm fairly certain that the lackluster speedup of random training is explained in this comment.

    opened by JayKickliter 1