๐Ÿš€ efficient approximate nearest neighbor search algorithm collections library written in Rust ๐Ÿฆ€ .

Overview

Hora

[Homepage] [Document] [Examples]

Hora Search Everywhere!

Hora is an approximate nearest neighbor search algorithm (wiki) library. We implement all code in Rust๐Ÿฆ€ for reliability, high level abstraction and high speeds comparable to C++.

Hora, ใ€Œใปใ‚‰ใ€ in Japanese, sounds like [hลlษ™], and means Wow, You see! or Look at that!. The name is inspired by a famous Japanese song ใ€Œๅฐใ•ใชๆ‹ใฎใ†ใŸใ€.

Demos

๐Ÿ‘ฉ Face-Match [online demo], have a try!

๐Ÿท Dream wine comments search [online demo], have a try!

Features

  • Performant โšก๏ธ

    • SIMD-Accelerated (packed_simd)
    • Stable algorithm implementation
    • Multiple threads design
  • Supports Multiple Languages โ˜„๏ธ

    • Python
    • Javascript
    • Java
    • Go (WIP)
    • Ruby (WIP)
    • Swift (WIP)
    • R (WIP)
    • Julia (WIP)
    • Can also be used as a service
  • Supports Multiple Indexes ๐Ÿš€

    • Hierarchical Navigable Small World Graph Index (HNSWIndex) (details)
    • Satellite System Graph (SSGIndex) (details)
    • Product Quantization Inverted File(PQIVFIndex) (details)
    • Random Projection Tree(RPTIndex) (LSH, WIP)
    • BruteForce (BruteForceIndex) (naive implementation with SIMD)
  • Portable ๐Ÿ’ผ

    • Supports WebAssembly
    • Supports Windows, Linux and OS X
    • Supports IOS and Android (WIP)
    • Supports no_std (WIP, partial)
    • No heavy dependencies, such as BLAS
  • Reliability ๐Ÿ”’

    • Rust compiler secures all code
    • Memory managed by Rust for all language libraries such as Python's
    • Broad testing coverage
  • Supports Multiple Distances ๐Ÿงฎ

    • Dot Product Distance
      • equation
    • Euclidean Distance
      • equation
    • Manhattan Distance
      • equation
    • Cosine Similarity
      • equation
  • Productive โญ

    • Well documented
    • Elegant, simple and easy to learn API

Installation

Rust

in Cargo.toml

[dependencies]
hora = "0.1.1"

Python

$ pip install horapy

Javascript (WebAssembly)

$ npm i horajs

Building from source

$ git clone https://github.com/hora-search/hora
$ cargo build

Benchmarks

by aws t2.medium (CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz) more information

Examples

Rust example [more info]

use hora::core::ann_index::ANNIndex;
use rand::{thread_rng, Rng};
use rand_distr::{Distribution, Normal};

pub fn demo() {
    let n = 1000;
    let dimension = 64;

    // make sample points
    let mut samples = Vec::with_capacity(n);
    let normal = Normal::new(0.0, 10.0).unwrap();
    for _i in 0..n {
        let mut sample = Vec::with_capacity(dimension);
        for _j in 0..dimension {
            sample.push(normal.sample(&mut rand::thread_rng()));
        }
        samples.push(sample);
    }

    // init index
    let mut index = hora::index::hnsw_idx::HNSWIndex::<f32, usize>::new(
        dimension,
        &hora::index::hnsw_params::HNSWParams::<f32>::default(),
    );
    for (i, sample) in samples.iter().enumerate().take(n) {
        // add point
        index.add(sample, i).unwrap();
    }
    index.build(hora::core::metrics::Metric::Euclidean).unwrap();

    let mut rng = thread_rng();
    let target: usize = rng.gen_range(0..n);
    // 523 has neighbors: [523, 762, 364, 268, 561, 231, 380, 817, 331, 246]
    println!(
        "{:?} has neighbors: {:?}",
        target,
        index.search(&samples[target], 10) // search for k nearest neighbors
    );
}

Python example [more info]

(dimension: 50, dtype: usize, max_item: 1000000, n_neigh: 32, n_neigh0: 64, ef_build: 20, ef_search: 500, has_deletion: False) # has neighbors: [410, 736, 65, 36, 631, 83, 111, 254, 990, 161] print("{} in {} \nhas neighbors: {}".format( target, index, index.search(samples[target], 10))) # search ">
import numpy as np
from horapy import HNSWIndex

dimension = 50
n = 1000

# init index instance
index = HNSWIndex(dimension, "usize")

samples = np.float32(np.random.rand(n, dimension))
for i in range(0, len(samples)):
    # add node
    index.add(np.float32(samples[i]), i)

index.build("euclidean")  # build index

target = np.random.randint(0, n)
# 410 in Hora ANNIndex  (dimension: 50, dtype: usize, max_item: 1000000, n_neigh: 32, n_neigh0: 64, ef_build: 20, ef_search: 500, has_deletion: False)
# has neighbors: [410, 736, 65, 36, 631, 83, 111, 254, 990, 161]
print("{} in {} \nhas neighbors: {}".format(
    target, index, index.search(samples[target], 10)))  # search

JavaScript example [more info]

{ const dimension = 50; var bf_idx = horajs.BruteForceIndexUsize.new(dimension); // var hnsw_idx = horajs.HNSWIndexUsize.new(dimension, 1000000, 32, 64, 20, 500, 16, false); for (var i = 0; i < 1000; i++) { var feature = []; for (var j = 0; j < dimension; j++) { feature.push(Math.random()); } bf_idx.add(feature, i); // add point } bf_idx.build("euclidean"); // build index var feature = []; for (var j = 0; j < dimension; j++) { feature.push(Math.random()); } console.log("bf result", bf_idx.search(feature, 10)); //bf result Uint32Array(10) [704, 113, 358, 835, 408, 379, 117, 414, 808, 826] } (async () => { await horajs.default(); await horajs.init_env(); demo(); })(); ">
import * as horajs from "horajs";

const demo = () => {
    const dimension = 50;
    var bf_idx = horajs.BruteForceIndexUsize.new(dimension);
    // var hnsw_idx = horajs.HNSWIndexUsize.new(dimension, 1000000, 32, 64, 20, 500, 16, false);
    for (var i = 0; i < 1000; i++) {
        var feature = [];
        for (var j = 0; j < dimension; j++) {
            feature.push(Math.random());
        }
        bf_idx.add(feature, i); // add point 
    }
    bf_idx.build("euclidean"); // build index
    var feature = [];
    for (var j = 0; j < dimension; j++) {
        feature.push(Math.random());
    }
    console.log("bf result", bf_idx.search(feature, 10)); //bf result Uint32Array(10) [704, 113, 358, 835, 408, 379, 117, 414, 808, 826]
}

(async () => {
    await horajs.default();
    await horajs.init_env();
    demo();
})();

Java example [more info]

public void demo() {
    final int dimension = 2;
    final float variance = 2.0f;
    Random fRandom = new Random();

    BruteForceIndex bruteforce_idx = new BruteForceIndex(dimension); // init index instance

    List tmp = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
        for (int p = 0; p < 10; p++) {
            float[] features = new float[dimension];
            for (int j = 0; j < dimension; j++) {
                features[j] = getGaussian(fRandom, (float) (i * 10), variance);
            }
            bruteforce_idx.add("bf", features, i * 10 + p); // add point
            tmp.add(features);
          }
    }
    bruteforce_idx.build("bf", "euclidean"); // build index

    int search_index = fRandom.nextInt(tmp.size());
    // nearest neighbor search
    int[] result = bruteforce_idx.search("bf", 10, tmp.get(search_index));
    // [main] INFO com.hora.app.ANNIndexTest  - demo bruteforce_idx[7, 8, 0, 5, 3, 9, 1, 6, 4, 2]
    log.info("demo bruteforce_idx" + Arrays.toString(result));
}

private static float getGaussian(Random fRandom, float aMean, float variance) {
    float r = (float) fRandom.nextGaussian();
    return aMean + r * variance;
}

Roadmap

  • Full test coverage
  • Implement EFANNA algorithm to achieve faster KNN graph building
  • Swift support and iOS/macOS deployment example
  • Support R
  • support mmap

Related Projects and Comparison

  • Faiss, Annoy, ScaNN:

    • Hora's implementation is strongly inspired by these libraries.
    • Faiss focuses more on the GPU scenerio, and Hora is lighter than Faiss (no heavy dependencies).
    • Hora expects to support more languages, and everything related to performance will be implemented by Rust ๐Ÿฆ€ .
    • Annoy only supports the LSH (Random Projection) algorithm.
    • ScaNN and Faiss are less user-friendly, (e.g. lack of documentation).
    • Hora is ALL IN RUST ๐Ÿฆ€ .
  • Milvus, Vald, Jina AI

    • Milvus and Vald also support multiple languages, but serve as a service instead of a library
    • Milvus is built upon some libraries such as Faiss, while Hora is a library with all the algorithms implemented itself

Contribute

We appreciate your help!

We are glad to have you participate, any contributions are welcome, including documentations and tests. You can create a Pull Request or Issue on GitHub, and we will review it as soon as possible.

We use GitHub issues for tracking suggestions and bugs.

Clone the repo

git clone https://github.com/hora-search/hora

Build

cargo build

Test

cargo test --lib

Try the changes

cd examples
cargo run

License

The entire repository is licensed under the Apache License.

Comments
  • Saving Hora index to disk?

    Saving Hora index to disk?

    Hi, I'm using Horapy, and everything works great. But how do I save an index to disk after building it, so that I can use it elsewhere or later? Tried pickling the index but it gives an error that usize objects can't be pickled.

    opened by regstuff 2
  • BruteForceIndex load index from file error.

    BruteForceIndex load index from file error.

    index = HNSWIndex(dimension, "usize") 
    
    
    ---> 16     index.load(ipath)
         17 
         18     # Search
    
    C:\g\vr\lw\python\lib\site-packages\horapy\__init__.py in load(self, path)
         78             path: file path
         79         """
    ---> 80         self.ann_idx = self.ann_type.load(path)
         81 
         82     def dump(self, path):
    
    PanicException: called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
    
    opened by apiszcz 2
  • Can hora's index be serialized to disk?

    Can hora's index be serialized to disk?

    I have millions of vectors that need to be indexed, and the build speed is extremely slow. Can I serialize index to disk and then deserialize it into an index?

    opened by zzl221000 1
  • Code clean up

    Code clean up

    Mostly code clean up changes. Only logical one is replacing the match statement with the map method.

    Happy to make changes if these changes don't meet standards.

    opened by btv 1
  • Hey hello!! Project turismEC!!

    Hey hello!! Project turismEC!!

    I watch a great marketplace on my country. I live in Ecuador and the country have good opportunities on the turism market. I have a idea to connect the people, but I need some help. I have basic skills on development, and some money to start a project . Thanks. Psdt: is a great business

    question 
    opened by enriqueastu 1
  • Crash using cosine similarity when calling search

    Crash using cosine similarity when calling search

    When indexing a small number of vectors I am getting this error when specifying cosine_similarity (euclidean works fine for instance):

    thread 'hora_test' panicked at 'called `Option::unwrap()` on a `None` value', /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
       1: core::panicking::panic_fmt
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
       2: core::panicking::panic
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:48:5
       3: core::option::Option<T>::unwrap
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/option.rs:752:21
       4: <hora::core::neighbor::Neighbor<E,T> as core::cmp::Ord>::cmp
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:9
       5: <hora::core::neighbor::Neighbor<E,T> as core::cmp::PartialOrd>::partial_cmp
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:38:14
       6: core::cmp::PartialOrd::le
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1129:19
       7: core::cmp::impls::<impl core::cmp::PartialOrd<&B> for &A>::le
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1505:13
       8: alloc::collections::binary_heap::BinaryHeap<T>::sift_up
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:562:16
       9: alloc::collections::binary_heap::BinaryHeap<T>::push
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:496:18
      10: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer::{{closure}}
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:363:25
      11: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/slice/iter/macros.rs:211:21
      12: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:353:13
      13: hora::index::hnsw_idx::HNSWIndex<E,T>::search_knn
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:433:25
      14: <hora::index::hnsw_idx::HNSWIndex<E,T> as hora::core::ann_index::ANNIndex<E,T>>::node_search_k
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:615:55
      15: hora::core::ann_index::ANNIndex::search
                 at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/ann_index.rs:93:9
      16: hora_c::hora_test
                 at ./src/lib.rs:192:13
      17: hora_c::hora_test::{{closure}}
                 at ./src/lib.rs:168:1
      18: core::ops::function::FnOnce::call_once
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
      19: core::ops::function::FnOnce::call_once
                 at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    
    
    failures:
        hora_test
    
    opened by spullara 2
  • Possible to add sparse elements?

    Possible to add sparse elements?

    Thanks for the very nice library! I'm interested in using hora for doing nearest neighbor finding in single-cell genomics. The data of interest consist of very high dimensional points (D = 30,000), but for most points, most dimensions have value 0. Therefore, I'd like to avoid (it's not really feasible) to densify the elements before indexing them. Is there some way to provide a custom implementation of the relevant distance metrics for the indexed type such that I don't have to actually insert a dense representation of the points into the index?

    opened by rob-p 1
  • Dot production is not acting as a distance

    Dot production is not acting as a distance

    This line assumes, that dot function returns a simple dot production and negates it to make it work like a distance (smaller = closer) https://github.com/hora-search/hora/blob/a6759f8ae950a10d833735b9bca82d80435e60aa/src/core/metrics.rs#L56

    But in reality, dot function is already negated in the SIMDOptmized inplementation: https://github.com/hora-search/hora/blob/ce10e42bacbf29163fcb9a973e1a3f9dc528effb/src/core/simd_metrics.rs#L32

    Which leads to double inverting and incorrect use of Dot Production

    opened by generall 1
Owner
Hora-Search
Hora Search Everywhere!
Hora-Search
Graph-based Approximate Nearest Neighbor Search

granne* granne (graph-based retrieval of approximate nearest neighbors) is a Rust library for approximate nearest neighbor search based on Hierarchica

null 283 Dec 21, 2022
Nearest neighbor search algorithms including a ball tree and a vantage point tree.

petal-neighbors Nearest neighbor search algorithms including a ball tree and a vantage point tree. Examples The following example shows how to find tw

Petabi 6 Oct 19, 2022
Hamming Weight Tree from the paper Online Nearest Neighbor Search in Hamming Space

hwt Hamming Weight Tree from the paper Online Nearest Neighbor Search in Hamming Space To understand how the data structure works, please see the docs

Rust Computer Vision 7 Oct 9, 2021
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

ArthurยทThomas 13 Mar 10, 2022
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Fareast This library is a rust port of spotify/annoy , currently only index serving is supported. It also provides FFI bindings for jvm, dotnet and da

ArthurยทThomas 13 Mar 10, 2022
A neural network model that can approximate any non-linear function by using the random search algorithm for the optimization of the loss function.

random_search A neural network model that can approximate any non-linear function by using the random search algorithm for the optimization of the los

ph04 2 Apr 1, 2022
C library for finding nearest (most similar) element in a set

VP-tree nearest neighbor search A relatively simple and readable Rust implementation of Vantage Point tree search algorithm. The VP tree algorithm doe

Kornel 28 Aug 20, 2022
An approximate K-NN written in Rust.

small_knn This library is an approximate K-nearest neighbor search based on Hierarchical Navigable Small World (https://arxiv.org/pdf/1603.09320.pdf).

null 2 Nov 22, 2022
Rust implementation of multi-index hashing for neighbor searches on binary codes in the Hamming space

mih-rs Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper Norouzi, Pu

Shunsuke Kanda 8 Sep 23, 2022
A simple url checker for finding fraud url(s) or nearest url

urlchecker A simple url checker for finding fraud url(s) or nearest url while being fast (threading) Eg:- use std::collections::HashMap; use urlchecke

Subconscious Compute 2 Aug 7, 2022
Program implementing the approximate version of DBSCAN introduced by Gan and Tao

appr_dbscan_rust Rust implementation of the approximate version of DBSCAN introduced by Gan and Tao in this paper Notice An upated version of this lib

Ivano Donadi 1 May 18, 2021
A naive density-based clustering algorithm written in Rust

Density-based clustering This a pure Rust implementation of a naive density-based clustering algorithm similar to DBSCAN. Here, 50 points are located

chris m 0 Mar 19, 2020
A rust library inspired by kDDBSCAN clustering algorithm

kddbscan-rs Rust implementation of the kddbscan clustering algorithm. From the authors of kDDBSCAN algorithm. Due to the adoption of global parameters

WhizSid 2 Apr 28, 2021
Label Propagation Algorithm by Rust. Label propagation (LP) is graph-based semi-supervised learning (SSL). LGC and CAMLP have been implemented.

label-propagation-rs Label Propagation Algorithm by Rust. Label propagation (LP) is graph-based semi-supervised learning (SSL). A simple LGC and a mor

vaaaaanquish 4 Sep 15, 2021
An implementation of the Pair Adjacent Violators algorithm for isotonic regression in Rust

Pair Adjacent Violators for Rust Overview An implementation of the Pair Adjacent Violators algorithm for isotonic regression. Note this algorithm is a

Ian Clarke 3 Dec 25, 2021
Rust implementation for DBSCANSD, a trajectory clustering algorithm.

DBSCANSD Rust implementation for DBSCANSD, a trajectory clustering algorithm. Brief Introduction DBSCANSD (Density-Based Spatial Clustering of Applica

Nick Gu 2 Mar 14, 2021
k-Medoids clustering in Rust with the FasterPAM algorithm

k-Medoids Clustering in Rust with FasterPAM This Rust crate implements k-medoids clustering with PAM. It can be used with arbitrary dissimilarites, as

Erich Schubert 11 Oct 16, 2022
Rust port of the extended isolation forest algorithm for anomaly detection

Extended Isolation Forest This is a rust port of the anomaly detection algorithm described in Extended Isolation Forest and implemented in https://git

Nico Mandery 6 Oct 21, 2022
TopK algorithm implementation in Rust (Filtered Space-Saving)

TopK TopK algorithm implementation in Rust. This crate currently provides the Filtered Space-Saving algorithm. Version numbers follow the semver conve

null 6 Feb 24, 2023