Machine learning crate for Rust

Overview

rustlearn

A machine learning package for Rust.

For full usage details, see the API documentation.

Introduction

This crate contains reasonably effective implementations of a number of common machine learning algorithms.

At the moment, rustlearn uses its own basic dense and sparse array types, but I will be happy to use something more robust once a clear winner in that space emerges.

Features

Matrix primitives

Models

All the models support fitting and prediction on both dense and sparse data, and the implementations should be roughly competitive with Python sklearn implementations, both in accuracy and performance.

Cross-validation

Metrics

Parallelization

A number of models support both parallel model fitting and prediction.

Model serialization

Model serialization is supported via serde.

Using rustlearn

Usage should be straightforward.

  • import the prelude for all the linear algebra primitives and common traits:
use rustlearn::prelude::*;
  • import individual models and utilities from submodules:
use rustlearn::prelude::*;

use rustlearn::linear_models::sgdclassifier::Hyperparameters;
// more imports

Examples

Logistic regression

use rustlearn::prelude::*;
use rustlearn::datasets::iris;
use rustlearn::cross_validation::CrossValidation;
use rustlearn::linear_models::sgdclassifier::Hyperparameters;
use rustlearn::metrics::accuracy_score;


let (X, y) = iris::load_data();

let num_splits = 10;
let num_epochs = 5;

let mut accuracy = 0.0;

for (train_idx, test_idx) in CrossValidation::new(X.rows(), num_splits) {

    let X_train = X.get_rows(&train_idx);
    let y_train = y.get_rows(&train_idx);
    let X_test = X.get_rows(&test_idx);
    let y_test = y.get_rows(&test_idx);

    let mut model = Hyperparameters::new(X.cols())
                                    .learning_rate(0.5)
                                    .l2_penalty(0.0)
                                    .l1_penalty(0.0)
                                    .one_vs_rest();

    for _ in 0..num_epochs {
        model.fit(&X_train, &y_train).unwrap();
    }

    let prediction = model.predict(&X_test).unwrap();
    accuracy += accuracy_score(&y_test, &prediction);
}

accuracy /= num_splits as f32;

Random forest

use rustlearn::prelude::*;

use rustlearn::ensemble::random_forest::Hyperparameters;
use rustlearn::datasets::iris;
use rustlearn::trees::decision_tree;

let (data, target) = iris::load_data();

let mut tree_params = decision_tree::Hyperparameters::new(data.cols());
tree_params.min_samples_split(10)
    .max_features(4);

let mut model = Hyperparameters::new(tree_params, 10)
    .one_vs_rest();

model.fit(&data, &target).unwrap();

// Optionally serialize and deserialize the model

// let encoded = bincode::serialize(&model).unwrap();
// let decoded: OneVsRestWrapper<RandomForest> = bincode::deserialize(&encoded).unwrap();

let prediction = model.predict(&data).unwrap();

Contributing

Pull requests are welcome.

To run basic tests, run cargo test.

Running cargo test --features "all_tests" --release runs all tests, including generated and slow tests. Running cargo bench --features bench (only on the nightly branch) runs benchmarks.

Comments
  • Support conformal prediction

    My grad work deals a lot with conformal prediction, and there are few, if any, open source libraries that deal with it. I've been doing some work on it in Rust and, if it's okay, I'd like to add code to get some conformal prediction implementations out in the wild.

    I realize there are more pressing things to consider, and much more widely used things that you probably want to support first, but I'd like to put it out there.

    Some of the library needs to be structured to deal with it, unfortunately, but it's not too intrusive. For a classifier to work with CP, it needs to be able to provide a "conformal score", which could be part of a trait. While theoretically any classifier can be used with CP, the scores are classifier-specific. Logistic Regression is fine, and just requires the weights.

    LibSVM isn't sufficient to implement CP over an SVM since you need to gather the support vector coefficients from a solved QP problem, but it's not a big deal if not all classifiers we have support it.
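As a rough illustration of the "conformal score as part of a trait" idea, here is a minimal sketch. The trait name `ConformalScore`, the `Logistic` struct, and the `p_value` helper are all hypothetical and not part of rustlearn. It assumes a binary logistic model whose nonconformity score is one minus the predicted probability of the candidate label, and uses the usual inductive conformal p-value over a held-out calibration set.

```rust
/// Hypothetical trait: how "strange" it is to pair `x` with `label`
/// (higher means less conforming). Not part of rustlearn.
pub trait ConformalScore {
    fn conformal_score(&self, x: &[f32], label: bool) -> f32;
}

/// Toy binary logistic regression that only stores its weights.
pub struct Logistic {
    pub weights: Vec<f32>,
    pub bias: f32,
}

impl ConformalScore for Logistic {
    fn conformal_score(&self, x: &[f32], label: bool) -> f32 {
        // Linear score followed by the logistic function.
        let z = self.weights.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + self.bias;
        let p = 1.0 / (1.0 + (-z).exp());
        // Nonconformity: one minus the probability assigned to `label`.
        if label { 1.0 - p } else { p }
    }
}

/// Inductive conformal p-value: the fraction of calibration scores that are
/// at least as large as the test score, counting the test point itself.
pub fn p_value(calibration_scores: &[f32], test_score: f32) -> f32 {
    let at_least = calibration_scores
        .iter()
        .filter(|&&s| s >= test_score)
        .count();
    (at_least + 1) as f32 / (calibration_scores.len() + 1) as f32
}
```

A label would then be included in the prediction set whenever its p-value exceeds the chosen significance level.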

    opened by UserAB1236872 10
  • add two regression metrics and two ranking metrics

    Wanted to learn some rustlearn and Rust more generally, so I added:

    • dcg_score
    • ndcg_score
    • mean absolute error
    • mean squared error
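
For reference, the two regression metrics in this list can be sketched on plain slices as follows. The function names and slice-based signatures are assumptions for illustration; the versions added to rustlearn operate on its own `Array` type and may differ.

```rust
/// Mean absolute error: average of |y_true - y_hat|.
/// Panics if the inputs have different lengths or are empty.
pub fn mean_absolute_error(y_true: &[f32], y_hat: &[f32]) -> f32 {
    assert_eq!(y_true.len(), y_hat.len());
    assert!(!y_true.is_empty());
    let total: f32 = y_true.iter().zip(y_hat).map(|(t, p)| (t - p).abs()).sum();
    total / y_true.len() as f32
}

/// Mean squared error: average of (y_true - y_hat)^2.
/// Panics if the inputs have different lengths or are empty.
pub fn mean_squared_error(y_true: &[f32], y_hat: &[f32]) -> f32 {
    assert_eq!(y_true.len(), y_hat.len());
    assert!(!y_true.is_empty());
    let total: f32 = y_true.iter().zip(y_hat).map(|(t, p)| (t - p) * (t - p)).sum();
    total / y_true.len() as f32
}
```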

    One note, cargo test fails for me on something unrelated with:

    failures:
    
    ---- factorization::factorization_machines::tests::test_iris_parallel stdout ----
    	Accuracy 0.7
    Train accuracy 0.7
    thread 'factorization::factorization_machines::tests::test_iris_parallel' panicked at 'assertion failed: test_accuracy > 0.94', src/factorization/factorization_machines.rs:637
    
    
    failures:
        factorization::factorization_machines::tests::test_iris_parallel
    
    test result: FAILED. 68 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
    
    opened by travisbrady 7
  • Panic in decision_tree

    Increasing the max_features in my hyperparameters seems to make this more likely.

    let mut tree_params = decision_tree::Hyperparameters::new(feat_matrix.cols());
    tree_params.min_samples_split(10)
               .rng(StdRng::from_seed(&[100]));
    
    let mut model = Hyperparameters::new(tree_params, 20)
                        .rng(StdRng::from_seed(&[100]))
                        .one_vs_rest();
    

    The data is shape: rows: 2247 cols: 442

    thread 'main' panicked at 'index out of bounds: the len is 10 but the index is 11', ../src/libcore/slice.rs:442

    The len/index number changes, but it's always an off by one.

    9: 0x55ede9c87a43 - core::panicking::panic_bounds_check::ha883fe1527ce6884
    10: 0x55ede9a1e0a2 - <[T] as core..slice..SliceExt>::swap::h9bf7fe18e2fa8251
        at /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/obj/../src/libcore/slice.rs:442
    11: 0x55ede9a12fdc - collections::slice::<impl [T]>::swap::h6559c8ef586608a3
        at /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/obj/../src/libcollections/slice.rs:479
    12: 0x55ede9a27b9e - rustlearn::trees::decision_tree::FeatureIndices::mark_as_used::hc0584f311f525fd4
        at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/rustlearn-0.4.0/src/trees/decision_tree.rs:103
    13: 0x55ede9a298ae - rustlearn::trees::decision_tree::DecisionTree::build_tree::hdb922dd21da7568d

    opened by insanitybit 4
  • Correct the ROC AUC computations and add relevant tests.

    The old ROC AUC computations were wrong (at least) in cases of duplicate y_hat values. I added a test demonstrating said issue and fixed the computations.

    The table below summarizes the resulting AUCs for each test. Tests 0,1 and tests 2,3 differ only in data-point order, so they should obviously return the same AUCs. I checked the correctness of the new values on paper.

    EDIT: also checked using Python's sklearn.metrics.roc_auc_score

    | test | old AUC | new AUC |
    | -- | ------------- | ------------- |
    | 0 | Ok(1) | Ok(0.75) |
    | 1 | Ok(0.25) | Ok(0.75) |
    | 2 | Ok(0.625) | Ok(0.875) |
    | 3 | Ok(1) | Ok(0.875) |
    | 4 | Ok(0.16666666) | Ok(0.5) |
    | 5 | Ok(NaN) | Ok(0.25) |
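
To illustrate why duplicate y_hat values matter, here is a minimal sketch of a tie-aware ROC AUC on plain slices. It is not the code from this pull request: it uses the rank-based (Mann-Whitney U) formulation, in which tied scores share their average rank, so the result does not depend on the order of the data points. The function name and signature are assumptions.

```rust
/// ROC AUC via the Mann-Whitney U statistic. Tied scores receive their
/// average rank. Labels above 0.5 count as positive.
/// Returns None when only one class is present.
pub fn roc_auc(y_true: &[f32], y_hat: &[f32]) -> Option<f32> {
    assert_eq!(y_true.len(), y_hat.len());
    let n = y_true.len();
    let n_pos = y_true.iter().filter(|&&y| y > 0.5).count();
    let n_neg = n - n_pos;
    if n_pos == 0 || n_neg == 0 {
        return None;
    }

    // Indices sorted by ascending score.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| y_hat[a].partial_cmp(&y_hat[b]).expect("NaN score"));

    // Sum the (average) ranks of the positive examples.
    let mut pos_rank_sum = 0.0f64;
    let mut i = 0;
    while i < n {
        // Extend j to cover the whole run of scores equal to y_hat[order[i]].
        let mut j = i;
        while j + 1 < n && y_hat[order[j + 1]] == y_hat[order[i]] {
            j += 1;
        }
        // 1-based ranks i+1 ..= j+1 average to (i + j + 2) / 2.
        let avg_rank = (i + j + 2) as f64 / 2.0;
        for &idx in &order[i..=j] {
            if y_true[idx] > 0.5 {
                pos_rank_sum += avg_rank;
            }
        }
        i = j + 1;
    }

    let u = pos_rank_sum - (n_pos * (n_pos + 1)) as f64 / 2.0;
    Some((u / (n_pos as f64 * n_neg as f64)) as f32)
}
```

With every score tied, this returns 0.5 regardless of how the rows are ordered, which is the kind of order independence the tests above check for.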

    opened by potocpav 3
  • Ideas for LibSVM wrapper

    I've been using a LibSVM wrapper internally for my grad work, which I just put on Github to maybe provoke some design ideas:

    https://github.com/Jragonmiris/rust-libsvm/tree/master/src

    I like the way it turned out, except the serialization which is a bit hacky. It has a bit more of an API surface area than your implementation, and would require editing to deal with your traits, but I like the way I handled Kernel and SVM Type parameters, which exist as an Enum instead of allowing you to set unrelated hyperparams.
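
The enum-based design mentioned above can be sketched roughly as follows. These type and variant names are hypothetical and are not taken from the linked wrapper. The point is that each variant carries only the hyperparameters that apply to it, so unrelated parameters cannot be set. The kernel formulas are the standard LibSVM ones.

```rust
/// Hypothetical kernel enum: each variant holds only its own parameters.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kernel {
    Linear,
    Polynomial { degree: u32, gamma: f32, coef0: f32 },
    Rbf { gamma: f32 },
    Sigmoid { gamma: f32, coef0: f32 },
}

/// Hypothetical SVM type enum, following the same idea.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SvmType {
    CSvc { c: f32 },
    NuSvc { nu: f32 },
    OneClass { nu: f32 },
    EpsilonSvr { c: f32, epsilon: f32 },
    NuSvr { c: f32, nu: f32 },
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl Kernel {
    /// Evaluate the kernel on two feature vectors of equal length.
    pub fn eval(&self, a: &[f32], b: &[f32]) -> f32 {
        assert_eq!(a.len(), b.len());
        match *self {
            Kernel::Linear => dot(a, b),
            Kernel::Polynomial { degree, gamma, coef0 } => {
                (gamma * dot(a, b) + coef0).powi(degree as i32)
            }
            Kernel::Rbf { gamma } => {
                let sq_dist: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
                (-gamma * sq_dist).exp()
            }
            Kernel::Sigmoid { gamma, coef0 } => (gamma * dot(a, b) + coef0).tanh(),
        }
    }
}
```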

    opened by UserAB1236872 3
  • Unable to deserialize Random Forest model

    After fitting a model I serialize it to the disk using rustc_serialize json.

    I then try to load the model but I get a parse error. The reason is that the 'rng' field of the model looks like this:

    ... ,"feature_types":[],"rng":},{"dim":4 ...

    That is, some normal fields, then the rng field, a colon, and nothing - the object just closes.

    Using rustc_serialize = "0.3.*"

    Output of rustc -V: rustc 1.13.0-nightly (3c5a0fa45 2016-08-22)

    The following functions are used to load and store the random forest.

    pub fn serialize_to_file<T>(s: &T, path: &str)
        where T: Encodable
    {
        let serialized = json::encode(&s).unwrap();
    
        let mut f = File::create(path).unwrap();
        write!(f, "{}", serialized).unwrap();
        f.flush().unwrap();
    }
    
    pub fn load_json<T: Decodable>(path: &str) -> T {
        let mut f = File::open(path).unwrap();
        let mut json_str = String::new();
    
        let _ = f.read_to_string(&mut json_str).unwrap();
        json::decode(&json_str).unwrap()
    }
    
    opened by insanitybit 1
  • Try to keep consistent in Array::from() method

    When the input is a Vec<f32>, Array::from() takes it by value (a move), but when the input is a Vec<Vec<f32>>, it takes a reference. This is inconsistent.

    This is for performance, but it would be better to make this clear in the docs.

    let array = Array::from(vec![0.0, 1.0, 2.0, 3.0]);
    let array = Array::from(&vec![vec![0.0, 1.0],
                                  vec![2.0, 3.0]]);
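
One plausible reading of the performance argument, sketched on a toy type: a flat Vec<f32> can be moved in and reused as the backing storage with no copy, whereas a nested Vec<Vec<f32>> has to be flattened into new storage anyway, so borrowing it costs nothing extra. The `Dense` struct below is a hypothetical stand-in and not rustlearn's `Array` implementation.

```rust
/// Toy row-major dense matrix, used only to illustrate the two conversions.
#[derive(Debug, PartialEq)]
pub struct Dense {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f32>,
}

/// Flat input: the vector is moved in and becomes the storage (no copy).
impl From<Vec<f32>> for Dense {
    fn from(data: Vec<f32>) -> Self {
        Dense { rows: data.len(), cols: 1, data }
    }
}

/// Nested input: the rows must be copied into one contiguous buffer,
/// so taking a reference is enough and leaves the caller's data usable.
impl<'a> From<&'a Vec<Vec<f32>>> for Dense {
    fn from(nested: &'a Vec<Vec<f32>>) -> Self {
        let rows = nested.len();
        let cols = nested.first().map_or(0, |row| row.len());
        let mut data = Vec::with_capacity(rows * cols);
        for row in nested {
            assert_eq!(row.len(), cols, "all rows must have the same length");
            data.extend_from_slice(row);
        }
        Dense { rows, cols, data }
    }
}
```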
    
    opened by libratiger 1
  • Compatibility with `ndarray`

    Is there a functionality to use arrays from ndarray to fit / predict models within rustlearn without making a copy? Possibly by creating an ArrayView with underlying data from ndarray::ArrayView2.

    opened by mlondschien 0
  • Add OOB predictions to random forest

    I am using in-sample OOB predictions to estimate the KL-divergence between samples. In general, OOB predictions are an efficient alternative to CV to estimate out of sample prediction performance and can be used for tuning.

    Getting OOB predictions requires storing the samples used to build each tree (i.e. indices here). This could be made optional. We can then add up predictions only for samples that were OOB for a particular tree here, keeping track of the number of trees for which a particular sample was OOB.

    I could work on a PR, but might need some help with details and guidance on what you think the API should be.
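
The bookkeeping described above could look roughly like this. It is a sketch on plain vectors, not a proposal for the actual rustlearn API. The function name is hypothetical, and it assumes each tree's in-bag sample indices and its predictions for every training sample are already available.

```rust
/// Average each sample's predictions over only those trees for which the
/// sample was out-of-bag (not among the indices used to build the tree).
/// Returns None for samples that were in-bag for every tree.
pub fn oob_predictions(
    n_samples: usize,
    in_bag: &[Vec<usize>],
    tree_preds: &[Vec<f32>],
) -> Vec<Option<f32>> {
    assert_eq!(in_bag.len(), tree_preds.len());
    let mut sums = vec![0.0f32; n_samples];
    let mut counts = vec![0u32; n_samples];

    for (bag, preds) in in_bag.iter().zip(tree_preds) {
        assert_eq!(preds.len(), n_samples);
        // Mark which samples this tree saw during fitting.
        let mut used = vec![false; n_samples];
        for &i in bag {
            used[i] = true;
        }
        // Accumulate predictions only for the samples it did not see.
        for i in 0..n_samples {
            if !used[i] {
                sums[i] += preds[i];
                counts[i] += 1;
            }
        }
    }

    sums.iter()
        .zip(&counts)
        .map(|(&s, &c)| if c > 0 { Some(s / c as f32) } else { None })
        .collect()
}
```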

    opened by mlondschien 0
  • Update to Rust 2018

    Needs to be updated with current dependencies and brought up to date for Rust 2018.

    I have done this in my repo, and am happy to do a pull request. However, my updated version has three test regressions from current master, so it seems better to fix those first.

    If anyone is still interested in this crate and wants to help, please let me know.

    opened by BartMassey 0
  • Normalized cumulative gain computation possibly incorrect

    I was looking at the implementation here, copied below:

    /// Normalized Discounted Cumulative Gain
    ///
    /// # Panics
    /// Will panic if inputs are of unequal length.
    pub fn ndcg_score(y_true: &Array, y_hat: &Array, k: i32) -> f32 {
        assert!(y_true.rows() == y_hat.rows());
        let best = dcg_score(y_true, y_hat, k);
        let actual = dcg_score(y_true, y_hat, k);
        actual / best
    }
    

    Doesn't that always return 1.0f32 (unless it panics)? I would expect best to be computed as dcg_score(y_true, y_true, k).
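
A sketch of the fix suggested here, on plain slices rather than `Array`. The DCG definition used (gain 2^rel - 1, discount log2(position + 1), top k by predicted score) is a common one and is an assumption, since the crate's own `dcg_score` is not shown; `k` is also taken as `usize` instead of `i32` for simplicity.

```rust
/// Discounted cumulative gain of the top `k` items ranked by `y_hat`.
/// Assumes gain 2^rel - 1 and discount log2(position + 1).
pub fn dcg_score(y_true: &[f32], y_hat: &[f32], k: usize) -> f32 {
    assert_eq!(y_true.len(), y_hat.len());
    let mut order: Vec<usize> = (0..y_true.len()).collect();
    // Sort indices by descending predicted score.
    order.sort_by(|&a, &b| y_hat[b].partial_cmp(&y_hat[a]).expect("NaN score"));
    order
        .iter()
        .take(k)
        .enumerate()
        .map(|(pos, &idx)| (2f32.powf(y_true[idx]) - 1.0) / ((pos + 2) as f32).log2())
        .sum()
}

/// Normalized DCG: the actual DCG divided by the DCG of the ideal ranking,
/// which is obtained by ranking the items by their true relevance.
pub fn ndcg_score(y_true: &[f32], y_hat: &[f32], k: usize) -> f32 {
    let best = dcg_score(y_true, y_true, k);
    if best == 0.0 {
        return 0.0; // no relevant items, so the ratio is undefined
    }
    dcg_score(y_true, y_hat, k) / best
}
```

With this normalization a perfect ranking scores 1.0 and any worse ranking scores strictly less, instead of always returning 1.0.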

    opened by marcusklaas 0
  • How to run for one hot encoding

    In the examples using the iris dataset, y is a one-dimensional vector, which is essentially a label-encoded vector. Running the model on a one-hot encoded y is not working for me. Please help with this. Below is example code.

    
    use rustlearn::prelude::*;
    use rustlearn::ensemble::random_forest::Hyperparameters;
    use rustlearn::trees::decision_tree;
    
    fn main() {
        let data = Array::from(&vec![vec![0.0, 1.0], vec![2.0, 3.0], vec![3.0, 4.0], vec![5.0, 6.0], vec![7.0, 8.0], vec![9.0, 10.0]]);
        let target = Array::from(&vec![vec![0.0, 1.0], vec![0.0, 1.0], vec![0.0, 1.0], vec![1.0, 0.0], vec![1.0, 0.0], vec![1.0, 0.0]]);
        let test = Array::from(&vec![vec![0.0, 1.0]]);
    
        println!("{:?}", data);
        println!("{:?}", target);
    
        let mut tree_params = decision_tree::Hyperparameters::new(data.cols());
        tree_params.min_samples_split(2)
            .max_features(2);
    
        let mut model = Hyperparameters::new(tree_params, 2)
            .one_vs_rest();
    
        model.fit(&data, &target).unwrap();
    
        let prediction = model.predict(&test).unwrap();
        print!("{:?}", prediction);
    }
    

    The output of this code is

    Array { rows: 6, cols: 2, order: RowMajor, data: [0.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] }
    Array { rows: 6, cols: 2, order: RowMajor, data: [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0] }
    Array { rows: 1, cols: 1, order: RowMajor, data: [0.0] }
    

    As you can see the dimension of the predicted values is only 1.
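
Judging from the output above, the model appears to expect a single column of class labels. One possible workaround, sketched here on plain vectors, is to convert each one-hot row to its class index before fitting. The helper name is hypothetical, and whether this matches the intended rustlearn usage is an assumption.

```rust
/// Convert one-hot rows to class labels by taking the index of the largest
/// entry in each row (ties resolve to the first maximum; an empty row maps
/// to label 0).
pub fn one_hot_to_labels(one_hot: &[Vec<f32>]) -> Vec<f32> {
    one_hot
        .iter()
        .map(|row| {
            let mut best = 0;
            for (i, &value) in row.iter().enumerate() {
                if value > row[best] {
                    best = i;
                }
            }
            best as f32
        })
        .collect()
}
```

The resulting Vec<f32> could then be passed to Array::from to build a single-column target.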

    opened by infinite-Joy 0
Releases(v0.5.0)
Owner
Maciej Kula