Machine Learning in Rust
Learn the Rust programming language by implementing classic machine learning algorithms. This project is written from scratch without relying on any third-party libraries, and serves as a bootstrap machine learning library.
Basics
- NdArray Module, just as the name suggests. It implements broadcast, matrix operations, permute, etc. in arbitrary dimensions. SIMD is used in matrix multiplication thanks to Rust's auto-vectorization.
- Dataset Module, supporting customized data loading, re-formatting, normalize, shuffle and Dataloader. Several popular dataset pre-processing recipes are available.
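To make the broadcast feature concrete, here is a minimal sketch of how two array shapes could be combined. It assumes the NumPy-style rule (dimensions are aligned from the trailing end, and a dimension of size 1 stretches to match the other); the function name broadcast_shape is hypothetical and is not taken from this library's NdArray module.

```rust
/// Compute the broadcast shape of two array shapes.
/// Assumption: NumPy-style broadcasting; this is an illustrative helper,
/// not the NdArray module's actual API.
/// Returns None when the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Walk both shapes from the trailing dimension;
        // a missing leading dimension is treated as size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x, // equal sizes pass through
            (1, y) => y,           // size 1 stretches to the other size
            (x, 1) => x,
            _ => return None,      // mismatched sizes cannot be broadcast
        };
    }
    Some(out)
}
```

Under this rule, shapes [5, 1, 4] and [3, 1] combine to [5, 3, 4], while [2, 3] and [3, 2] are rejected.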
Algorithms
- Decision Tree, supporting both classification and regression tasks. Information-gain criteria such as gini or entropy are provided.
- Logistic Regression, supporting regularization (Lasso, Ridge and L-inf).
- Linear Regression, same as logistic regression, but for regression tasks.
- Naive Bayes, able to handle discrete or continuous feature values.
- SVM, with a linear kernel, optimized using SGD and hinge loss.
- nn Module, containing linear (MLP) and some activation functions, which can be freely stacked and optimized by gradient backpropagation.
- KNN, supporting both KdTree and vanilla BruteForceSearch.
- K-Means, clustering data with an unsupervised learning approach.
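As an illustration of the gini and entropy criteria mentioned for the decision tree, the following sketch computes both impurity measures and the gain of a binary split using only the standard library. The function names (class_counts, gini, entropy, information_gain) are hypothetical and chosen for this example; the library's own decision tree may be organized differently.

```rust
use std::collections::HashMap;

/// Count how many times each class label occurs.
fn class_counts(labels: &[usize]) -> HashMap<usize, usize> {
    let mut counts = HashMap::new();
    for &l in labels {
        *counts.entry(l).or_insert(0) += 1;
    }
    counts
}

/// Gini impurity: 1 - sum(p_i^2). An empty set is treated as pure.
fn gini(labels: &[usize]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    1.0 - class_counts(labels)
        .values()
        .map(|&c| (c as f64 / n).powi(2))
        .sum::<f64>()
}

/// Shannon entropy in bits: -sum(p_i * log2(p_i)).
fn entropy(labels: &[usize]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    -class_counts(labels)
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            p * p.log2()
        })
        .sum::<f64>()
}

/// Impurity reduction of splitting `parent` into `left` and `right`,
/// weighting each child by its share of the samples.
fn information_gain(
    parent: &[usize],
    left: &[usize],
    right: &[usize],
    impurity: fn(&[usize]) -> f64,
) -> f64 {
    if parent.is_empty() {
        return 0.0;
    }
    let n = parent.len() as f64;
    let weighted = (left.len() as f64 / n) * impurity(left)
        + (right.len() as f64 / n) * impurity(right);
    impurity(parent) - weighted
}
```

A tree builder would evaluate information_gain for each candidate split and keep the one with the largest value; passing gini or entropy as the impurity argument switches the criterion.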
Start
Let's use the KNN algorithm to solve a classification task. More examples can be found in the examples directory.
- Create some synthetic data for tests.

      use std::collections::HashMap;

      let features = vec![
          vec![0.6, 0.7, 0.8],
          vec![0.7, 0.8, 0.9],
          vec![0.1, 0.2, 0.3],
      ];
      // a binary classification task: 0 is the "large" label, 1 is the "small" label
      let labels = vec![0, 0, 1];
      let mut label_map = HashMap::new();
      label_map.insert(0, "large".to_string());
      label_map.insert(1, "small".to_string());
- Convert the data to the dataset.

      use mlinrust::dataset::Dataset;

      let dataset = Dataset::new(features, labels, Some(label_map));
- Split the dataset into train and valid sets and normalize them by Standard normalization.

      use mlinrust::dataset::utils::{normalize_dataset, ScalerType};

      // [2.0, 1.0] is the split fraction, 0 is the seed
      let mut temp = dataset.split_dataset(vec![2.0, 1.0], 0);
      let (mut train_dataset, mut valid_dataset) = (temp.remove(0), temp.remove(0));

      normalize_dataset(&mut train_dataset, ScalerType::Standard);
      normalize_dataset(&mut valid_dataset, ScalerType::Standard);
- Build and train our KNN model using KdTree.

      use mlinrust::model::knn::{KNNAlg, KNNModel, KNNWeighting};

      // KdTree is one implementation of KNN;
      // 1 defines the k of neighbours;
      // Weighting decides the way of ensemble prediction;
      // train_dataset is for training KNN;
      // Some(2) is the param of minkowski distance
      let model = KNNModel::new(KNNAlg::KdTree, 1, Some(KNNWeighting::Distance), train_dataset, Some(2));
- Evaluate the model.

      use mlinrust::utils::evaluate;

      let (correct, acc) = evaluate(&valid_dataset, &model);
      // report against the validation set that was actually evaluated
      println!("evaluate results\ncorrect {correct} / total {}, acc = {acc:.5}", valid_dataset.len());
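For readers who want to see what a KNN prediction involves, here is a minimal brute-force sketch that uses only the standard library. It is not the library's KNNModel: the function names minkowski and knn_predict are hypothetical, and weighting votes by inverse distance is an assumption about what a distance-based weighting might do.

```rust
use std::collections::HashMap;

/// Minkowski distance of order `p` (p = 2 gives Euclidean distance).
fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Brute-force KNN classification with inverse-distance vote weighting.
/// Assumption: illustrative only, not the library's KNNModel API.
/// Returns None for empty or mismatched training data or k == 0.
fn knn_predict(
    train_x: &[Vec<f64>],
    train_y: &[usize],
    query: &[f64],
    k: usize,
    p: f64,
) -> Option<usize> {
    if train_x.is_empty() || train_x.len() != train_y.len() || k == 0 {
        return None;
    }
    // Distance from the query to every training sample.
    let mut dists: Vec<(f64, usize)> = train_x
        .iter()
        .zip(train_y)
        .map(|(x, &y)| (minkowski(x, query, p), y))
        .collect();
    dists.sort_by(|a, b| a.0.total_cmp(&b.0));

    // The k nearest samples vote; closer neighbours get larger weights.
    // The small constant avoids division by zero on exact matches.
    let mut votes: HashMap<usize, f64> = HashMap::new();
    for &(d, y) in dists.iter().take(k) {
        *votes.entry(y).or_insert(0.0) += 1.0 / (d + 1e-12);
    }
    votes
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(label, _)| label)
}
```

Calling knn_predict with the three synthetic samples from the first step, k = 1 and p = 2.0 assigns a query near [0.6, 0.7, 0.8] to label 0 and a query near [0.1, 0.2, 0.3] to label 1. A KdTree reaches the same neighbours without scanning every training sample, which matters on larger datasets.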
Todo
- model weights serialization for saving and loading
- Boosting/bagging
- matrix multiplication with multi threads
- refactor the code; comments from senior developers are sincerely welcome
Reference
- scikit-learn
- The book Machine Learning (机器学习, known as the "watermelon book") by Prof. Zhihua Zhou
Thanks
The Rust community. I received a lot of help from the rust-lang Discord.
License
Licensed under GPL-v3. Commercial use is strictly prohibited.