Barnes-Hut t-SNE implementation written in Rust.

Overview

bhtsne

License: MIT Gethseman codecov

Barnes-Hut implementation of t-SNE written in Rust. The algorithm is described with fine detail in this paper by Laurens van der Maaten.

Installation

Add this line to your Cargo.toml:

[dependencies]
bhtsne = "0.4.0"

Example

use bhtsne;

// Parameters template. One can also use f64s.
let n: usize = 8;     // Number of vectors to embed.
let d: usize = 4;     // The dimensionality of the original space.
let theta: f32 = 0.5; // The parameter used by the Barnes-Hut algorithm. When set to 0.0
                      // the exact t-SNE version is used instead.
   
let perplexity: f32 = 1.0;      // The perplexity of the conditional distribution.
let max_iter: u64 = 2000;       // The number of fitting iterations.
let no_dims: usize = 2;         // The dimensionality of the embedded space.

// Loads data the from a csv file skipping the first row,
// treating it as headers and skipping the 5th column,
// treating it as a class label.
let mut data: Vec<f32> = bhtsne::load_csv("data.csv", true, Some(4));
// This is the vector for the resulting embedding.
let mut y: Vec<f32> = vec![0.0; n * no_dims];
// Runs t-SNE.
bhtsne::run(
    &mut data, n, d, &mut y, no_dims, perplexity, theta, false, max_iter, 250, 250,
);
// Writes the embedding to a csv file.
bhtsne::write_csv("embedding.csv", y, no_dims);

Also check the docs available on crates.io.

MNIST embedding

The following embedding has been obtained by preprocessing the MNIST dataset using PCA to reduce its dimensionality to 50. It took approximately 17 mins on a 2.0GHz quad-core 10th-generation i5 MacBook Pro. mnist

Comments
  • c++ benchmarks

    c++ benchmarks

    Dear Francesco,

    Thanks for this implementation. I assume it is faster than the C++ original version right considering the rayon parallel efficiency. I will have some tests later on but just want to know whether you have some benchmarks.

    Thanks,

    Jianshu

    opened by jianshu93 16
  • Integration into linfa

    Integration into linfa

    I just saw your post on Reddit, awesome work! I'm the maintainer of linfa and thought about implementing t-SNE as a transformative dimensionality reduction technique in the past, but never came to it. This crate can take off a lot of work for us. We would implement a wrapper which adepts your algorithm by:

    • implementing builder style pattern for configuration
    • using datasets for input/output
    • implementing transform trait for the algorithm

    Sounds good? I just quickly glanced at the source code and three things stood out which could be improved:

    • make csv dependency optional, sometimes it's not necessary to pull that in
    • make algorithm generic for num_traits::Float
    • what is about error handling? Can any part of your algorithm fail, especially what happens if there are NaNs in your data or parameters are mis-configured (e.g. perplexity negative)
    enhancement 
    opened by bytesnake 9
  • Replace pdqselect with select_nth_unstable_by

    Replace pdqselect with select_nth_unstable_by

    Switch from pdqselect dependency to official version of the algorithm. Issue with pdqselect is that the 0.1.1 version broke all older compilers by updating to the 2021 edition.

    opened by YuhanLiin 3
  • Typo in function name

    Typo in function name

    First, thanks so much for providing this crate! I've thought about implementing t-SNE in Rust before, but never took on the challenge. Kudos!

    I think you have a typo in one of your function names; bhtsne::wite_csv should probably be write_csv. Just wanted to let you know.

    Thanks, again!

    bug documentation 
    opened by vlmutolo 2
  • fix: Index out of bounds in symmetrize_spare_matrix

    fix: Index out of bounds in symmetrize_spare_matrix

    Received this error intermittently:

    thread 'main' panicked at 'index out of bounds: the len is 3173830 but the index is 3173830', /home/ehernva/.cargo/git/checkouts/bhtsne-5d23f839d3bdde38/69c58bf/src/tsne/mod.rs:347:17
    stack backtrace:
    0: rust_begin_unwind at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:498:5
    1: core::panicking::panic_fmt at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:107:14
    2: core::panicking::panic_bounds_check at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:75:5
    3: bhtsne::tsne::symmetrize_sparse_matrix
    4: bhtsne::tSNE<T,U>::barnes_hut
    5: embed::main
    

    I don't have an indepth understanding of this code, but it seems to me like the condition in the loop that calculates the number of columns per row is the opposite of the one that is used to increase the offsets in the main loop. From my testing, this seems to solve the problem, but probably worth thinking through that it's a valid fix before accepting this PR.

    opened by EmilHernvall 1
  • Add a method for getting the output directly

    Add a method for getting the output directly

    This PR adds a simple method for retrieving the results, without writing them to a csv-file. This seems to obvious to be an oversight, so maybe there's some conscious design reason for why it's not in there? Nevertheless, I figured I might as well send a PR rather than just file an issue.

    Thank you for this crate!

    opened by EmilHernvall 1
  • Missing file errors when running tests

    Missing file errors when running tests

    When I run cargo test for the first time I get the following:

    running 10 tests
    test test::set_embedding_dim ... ok
    test test::set_epochs ... ok
    test test::exact_tsne ... FAILED
    test test::set_final_momentum ... ok
    test test::set_learning_rate ... ok
    test test::set_momentum ... ok
    test test::set_momentum_switch_epoch ... ok
    test test::set_perplexity ... ok
    test test::set_stop_lying_epoch ... ok
    test test::barnes_hut_tsne ... FAILED
    
    failures:
    
    ---- test::exact_tsne stdout ----
    thread 'test::exact_tsne' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/test.rs:77:87
    
    ---- test::barnes_hut_tsne stdout ----
    thread 'test::barnes_hut_tsne' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/test.rs:107:87
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    
    
    failures:
        test::barnes_hut_tsne
        test::exact_tsne
    

    It looks like the tests expect several CSV data files, but they're not there.

    opened by YuhanLiin 0
  • Remove recursion from VPTree

    Remove recursion from VPTree

    The performance of this implementation is slightly better when the construction and the search though the VPTree is done in an iterative manner. However, by implementing the SPTree too in an iterative way the performance worsen noticeably.

    opened by frjnn 0
Owner
Francesco Iannelli
CompSci postgraduate student and ML enthusiast. Member, creator and maintainer of @neuronika.
Francesco Iannelli
Rust implementation of real-coded GA for solving optimization problems and training of neural networks

revonet Rust implementation of real-coded genetic algorithm for solving optimization problems and training of neural networks. The latter is also know

Yury Tsoy 19 Aug 11, 2022
Instance Distance is a fast pure-Rust implementation of the Hierarchical Navigable Small Worlds paper

Fast approximate nearest neighbor searching in Rust, based on HNSW index

Instant Domain Search, Inc. 135 Dec 24, 2022
A real-time implementation of "Ray Tracing in One Weekend" using nannou and rust-gpu.

Real-time Ray Tracing with nannou & rust-gpu An attempt at a real-time implementation of "Ray Tracing in One Weekend" by Peter Shirley. This was a per

null 89 Dec 23, 2022
A neural network, and tensor dynamic automatic differentiation implementation for Rust.

Corgi A neural network, and tensor dynamic automatic differentiation implementation for Rust. BLAS The BLAS feature can be enabled, and requires CBLAS

Patrick Song 20 Nov 7, 2022
Flexible, reusable reinforcement learning (Q learning) implementation in Rust

Rurel Rurel is a flexible, reusable reinforcement learning (Q learning) implementation in Rust. Release documentation In Cargo.toml: rurel = "0.2.0"

Milan Boers 60 Dec 29, 2022
A Rust🦀 implementation of CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning

craftml-rs A Rust implementation of CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning (Siblini et al., 2018). Perf

Tom Dong 15 Nov 6, 2022
An implementation of the Pair Adjacent Violators algorithm for isotonic regression in Rust

Pair Adjacent Violators for Rust Overview An implementation of the Pair Adjacent Violators algorithm for isotonic regression. Note this algorithm is a

Ian Clarke 3 Dec 25, 2021
Rust implementation for DBSCANSD, a trajectory clustering algorithm.

DBSCANSD Rust implementation for DBSCANSD, a trajectory clustering algorithm. Brief Introduction DBSCANSD (Density-Based Spatial Clustering of Applica

Nick Gu 2 Mar 14, 2021
A naive DBSCAN implementation in Rust

DBSCAN Density-Based Spatial Clustering of Applications with Noise Wikipedia link DBSCAN is a density-based clustering algorithm: given a set of point

Alan K 2 Dec 23, 2022
A random forest implementation in Rust

randomforest A random forest implementation in Rust. Examples use randomforest::criterion::Mse; use randomforest::RandomForestRegressorOptions; use ra

Takeru Ohta 3 Nov 19, 2022
Rust implementation of multi-index hashing for neighbor searches on binary codes in the Hamming space

mih-rs Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper Norouzi, Pu

Shunsuke Kanda 8 Sep 23, 2022
kdtree implementation for rust.

kdtree-rust kdtree implementation for rust. Implementation uses sliding midpoint variation of the tree. More Info here Implementation uses single Vec<

Aleksander 11 May 18, 2022
Rust implementation of user-based collaborative filtering

Rucommender Recommendation system written in Rust Overview An implementation in Rust of a collaborative filtering recommendations algorithm with a use

null 0 Sep 15, 2018
This repository features a simple Kalman filter and RTS smoother (KFS) implementation in Rust by using the ndarray library.

Kalman filter and RTS smoother in Rust (ndarray) This repository features a simple Kalman filter and RTS smoother (KFS) implementation in Rust by usin

SPDEs 3 Dec 1, 2022
Python+Rust implementation of the Probabilistic Principal Component Analysis model

Probabilistic Principal Component Analysis (PPCA) model This project implements a PPCA model implemented in Rust for Python using pyO3 and maturin. In

FindHotel 11 Dec 16, 2022
TopK algorithm implementation in Rust (Filtered Space-Saving)

TopK TopK algorithm implementation in Rust. This crate currently provides the Filtered Space-Saving algorithm. Version numbers follow the semver conve

null 6 Feb 24, 2023
Rust+OpenCL+AVX2 implementation of LLaMA inference code

RLLaMA RLLaMA is a pure Rust implementation of LLaMA large language model inference.. Supported features Uses either f16 and f32 weights. LLaMA-7B, LL

Mikko Juola 344 Apr 16, 2023
A rust implementation of the csl-next model.

Vision This is a project to write the CSL-Next typescript model and supporting libraries and tools in Rust, and convert to JSON Schema from there. At

Bruce D'Arcus 4 Jun 13, 2023
Rust implementation of @Qdrant/fastembed.

FastEmbed-rs ?? Rust implementation of @Qdrant/fastembed ?? Features Supports synchronous usage. No dependency on Tokio. Uses @huggingface/tokenizers

Anush 35 Oct 30, 2023