C library for finding nearest (most similar) element in a set

Overview

VP-tree nearest neighbor search

A relatively simple and readable Rust implementation of Vantage Point tree search algorithm.

The VP tree algorithm doesn't need to know coordinates of items, only distances between them. It can efficiently search multi-dimensional spaces and abstract things as long as you can define similarity between them (e.g. points, colors, and even images).

Please see the API reference or examples for details.

This algorithm does not work with squared distances. When implementing Euclidean distance, you MUST use sqrt(). Vantage Point trees require metric spaces.

#[derive(Copy, Clone)]
struct Point {
    x: f32, y: f32,
}

/// `MetricSpace` makes items comparable. It's a bit like Rust's `PartialOrd`.
impl vpsearch::MetricSpace for Point {
    type UserData = ();
    type Distance = f32;

    fn distance(&self, other: &Self, _: &Self::UserData) -> Self::Distance {
        let dx = self.x - other.x;
        let dy = self.y - other.y;

        // You MUST use sqrt here! The algorithm will give wrong results for squared distances.
        (dx*dx + dy*dy).sqrt()
    }
}

fn main() {
    let points = vec![Point{x:2.0,y:3.0}, Point{x:0.0,y:1.0}, Point{x:4.0,y:5.0}];

    let vp = vpsearch::Tree::new(&points);
    let (index, _) = vp.find_nearest(&Point{x:1.0,y:2.0});

    println!("The nearest point is at ({}, {})", points[index].x, points[index].y);
}

Implementing MetricSpace for Rust built-in types

This library includes a workaround for orphan rules. You need to add your crate's type when implementing MetricSpace:

struct MyImpl; // it can be any type, as long as it's yours
impl vpsearch::MetricSpace<MyImpl> for Vec<u8> {
    // continue as usual
}

Memory efficiency tip

Tree clones all the items and puts them in the tree. If the items are too big to clone and you'd rather keep the items elsewhere, you can!

Instead of storing the items, make the tree store indices into your items storage, and pass actual items as user_data to your MetricSpace::distance() function.

let items = /* your actual items are here */;
let indices: Vec<_> = (0...items.len() as u32).collect();
let tree = Tree::new_with_user_data_ref(&items, &indices);
let res = tree.find_nearest(&needle, &items);
Comments
  • Is it possible to return at most N nearest neighbors?

    Is it possible to return at most N nearest neighbors?

    According to the API doc, there is a find_nearest function. However, by default, it can only return the nearest neighbor node. How can I find out at most N nearest neighbors? This is import for interpolation.

    opened by astrojhgu 4
  • Add examples for custom search methods

    Add examples for custom search methods

    Thanks for the library! I found it really useful but thought it would be really nice to include some examples of the uses of find_nearest_custom, this pull request adds two new examples that can be used to find all the neighbours in a radius or the N nearest neighbours.

    This also adds the Default and Debug traits to the Distance item, this makes it easier to build / debug these BestCandidate implementations.

    This fixes: https://github.com/kornelski/vpsearch/issues/1

    opened by gijs-s 3
  • Query item's lifetime must same to the tree's lifetime?

    Query item's lifetime must same to the tree's lifetime?

    Hi,

    I'm a newbie to Rust, I want use this library to write an application for sequence processing. But there are somethig confusing me. Here is my simplified code:

    use strsim::{hamming};
    extern crate vpsearch;
    
    #[derive(Copy, Clone)]
    struct Barcode<'a> {
        code: &'a String
    }
    
    impl<'a> Barcode<'a> {
        fn new(code: &'a String) -> Self {
            Barcode {
                code: code
            }
        }
    }
    
    impl<'a> vpsearch::MetricSpace for Barcode<'a> {
        type UserData = ();
        type Distance = f32;
    
        fn distance<'b>(&self, other: &'b Self, _: &Self::UserData) -> Self::Distance {
            let d = hamming(self.code, other.code).unwrap();
            d as f32
        }
    }
    
    
    fn main() {
        let seqs: Vec<String> = vec!["AATT", "TTCC"].iter().map(|s| s.to_string()).collect();
        let mut items = vec![];
        for i in 0..seqs.len() {
            items.push(Barcode::new(&seqs[i]));
        }
        let tree = vpsearch::Tree::new(&items);
    
        {
            let q = "TATT".to_string();
            let q = Barcode::new(&q);
            let (index, dist) = tree.find_nearest(&q);
            println!("{} {}", index, dist);
        }
        {
            let q = "TTCA".to_string();
            let q = Barcode::new(&q);
            let (index, dist) = tree.find_nearest(&q);
            println!("{} {}", index, dist);
        }
    }
    
    

    Compile it, got:

    error[E0597]: `q` does not live long enough
      --> src/main.rs:38:30
       |
    38 |         let q = Barcode::new(&q);
       |                              ^^ borrowed value does not live long enough
    ...
    41 |     }
       |     - `q` dropped here while still borrowed
    ...
    45 |         let (index, dist) = tree.find_nearest(&q);
       |                             ---- borrow later used here
    
    error: aborting due to previous error
    
    For more information about this error, try `rustc --explain E0597`.
    error: could not compile `vptest`.
    

    In the real application, for saving the memory the query should be droped when it has been processed. Is there are any solution to make the query item and tree have independent lifetimes?

    opened by Nanguage 2
  • Implement Serialize and Deserialize for Tree

    Implement Serialize and Deserialize for Tree

    Would it be possible to implement serde's Serialize and Deserialize for the Tree struct? This would allow me to generate the tree, save it to a file and load it when I need it to speed up things.

    opened by Tirzono 1
  • Parallelise Tree creation with Rayon

    Parallelise Tree creation with Rayon

    First of all, I have to say thank you for this library; I have found it extremely useful, ergonomic and it is an essential part of the project I am using it in.

    As the size of the input data I am processing increases I have found the creation of the tree becomes a bottleneck as it is CPU bound and limited to only 1 core with the current implementation.

    I have only skimmed the codebase and am generally unfamiliar with the creation of VP trees, but it appears a few choice applications of Rayon's parallel iterators (in the index sorting for example) would greatly reduce the creation time of the tree structure.

    I'll be experimenting with it over the coming days but wanted to raise this issue to get your thoughts on this approach.

    Thank you again.

    opened by ntr-808 1
Owner
Kornel
Rust, image compression, web performance.
Kornel
🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

?? efficient approximate nearest neighbor search algorithm collections library written in Rust ?? .

Hora-Search 2.3k Jan 3, 2023
HNSW ANN from the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"

hnsw Hierarchical Navigable Small World Graph for fast ANN search Enable the serde feature to serialize and deserialize HNSW. Tips A good default for

Rust Computer Vision 93 Dec 30, 2022
Graph-based Approximate Nearest Neighbor Search

granne* granne (graph-based retrieval of approximate nearest neighbors) is a Rust library for approximate nearest neighbor search based on Hierarchica

null 283 Dec 21, 2022
Nearest neighbor search algorithms including a ball tree and a vantage point tree.

petal-neighbors Nearest neighbor search algorithms including a ball tree and a vantage point tree. Examples The following example shows how to find tw

Petabi 6 Oct 19, 2022
Hamming Weight Tree from the paper Online Nearest Neighbor Search in Hamming Space

hwt Hamming Weight Tree from the paper Online Nearest Neighbor Search in Hamming Space To understand how the data structure works, please see the docs

Rust Computer Vision 7 Oct 9, 2021
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Arthur·Thomas 13 Mar 10, 2022
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Fareast This library is a rust port of spotify/annoy , currently only index serving is supported. It also provides FFI bindings for jvm, dotnet and da

Arthur·Thomas 13 Mar 10, 2022
Multi-agent (path finding) planning framework

multi-agent (path finding) planning framework Mapf is a (currently experimental) Rust library for multi-agent planning, with a focus on cooperative pa

null 17 Dec 5, 2022
PyQIR is a set of APIs for generating, parsing, and evaluating Quantum Intermediate Representation (QIR).

PyQIR PyQIR is a set of APIs for generating, parsing, and evaluating Quantum Intermediate Representation (QIR). It consists of the following component

QIR Alliance 37 Dec 31, 2022
Machine Learning library for Rust

rusty-machine This library is no longer actively maintained. The crate is currently on version 0.5.4. Read the API Documentation to learn more. And he

James Lucas 1.2k Dec 31, 2022
Rust library for Self Organising Maps (SOM).

RusticSOM Rust library for Self Organising Maps (SOM). Using this Crate Add rusticsom as a dependency in Cargo.toml [dependencies] rusticsom = "1.1.0"

Avinash Shenoy 26 Oct 17, 2022
Rust numeric library with R, MATLAB & Python syntax

Peroxide Rust numeric library contains linear algebra, numerical analysis, statistics and machine learning tools with R, MATLAB, Python like macros. W

Tae Geun Kim 351 Dec 29, 2022
A deep learning library for rust

Alumina An experimental deep learning library written in pure rust. Breakage expected on each release in the short term. See mnist.rs in examples or R

zza 95 Nov 30, 2022
Machine Learning Library for Rust

autograph Machine Learning Library for Rust undergoing maintenance Features Portable accelerated compute Run SPIR-V shaders on GPU's that support Vulk

null 223 Jan 1, 2023
Simple neural network library for classification written in Rust.

Cogent A note I continue working on GPU stuff, I've made some interesting things there, but ultimately it made me realise this is far too monumental a

Jonathan Woollett-Light 41 Dec 25, 2022
Rust wrapper for the Fast Artificial Neural Network library

fann-rs Rust wrapper for the Fast Artificial Neural Network (FANN) library. This crate provides a safe interface to FANN on top of the low-level bindi

Andreas Fackler 12 Jul 17, 2022
Wrapper around Microsoft CNTK library

Bindings for CNTK library Simple low level bindings for CNTK library from Microsoft. API Documentation Status Currently exploring ways how to interact

Vlado Boza 21 Nov 30, 2021
RustFFT is a high-performance FFT library written in pure Rust.

RustFFT is a high-performance FFT library written in pure Rust. It can compute FFTs of any size, including prime-number sizes, in O(nlogn) time.

Elliott Mahler 411 Jan 9, 2023
Rust crate to create Anki decks. Based on the python library genanki

genanki-rs: A Rust Crate for Generating Anki Decks With genanki-rs you can easily generate decks for the popular open source flashcard platform Anki.

Yannick Funk 63 Dec 23, 2022