Rust implementation of multi-index hashing for neighbor searches on binary codes in the Hamming space

Overview

mih-rs

Documentation Crates.io License: MIT

Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper

Norouzi, Punjani, and Fleet, Fast exact search in Hamming space with multi-index hashing, IEEE TPAMI, 36(6):1107– 1119, 2014.

As the benchmark result shows, on 10 million 64-bit codes, mih-rs can perform top-k searches 19−94 times faster than linear search when k = 1..100.

Features

  • Two types of neighbor searches: mih-rs provides the two search operations:

    • Range search finds neighbor codes whose Hamming distances to a given code are within a radius.
    • Top-K search finds the top-K codes that are closest to a given code.
  • Fast and memory-efficient implementation: The data structure is built on sparse hash tables, following the original implementation.

  • Parameter free: mih-rs automatically sets an optimal parameter of MIH depending on a given database (although you can also set this manually).

  • Serialization: mih-rs supports to serialize/deserialize the index.

Example

use mih_rs::Index;

// Database of codes
let codes: Vec<u64> = vec![
    0b1111111111111111111111011111111111111111111111111011101111111111, // #zeros = 3
    0b1111111111111111111111111111111101111111111011111111111111111111, // #zeros = 2
    0b1111111011011101111111111111111101111111111111111111111111111111, // #zeros = 4
    0b1111111111111101111111111111111111111000111111111110001111111110, // #zeros = 8
    0b1101111111111111111111111111111111111111111111111111111111111111, // #zeros = 1
    0b1111111111111111101111111011111111111111111101001110111111111111, // #zeros = 6
    0b1111111111111111111111111111111111101111111111111111011111111111, // #zeros = 2
    0b1110110101011011011111111111111101111111111111111000011111111111, // #zeros = 11
];

// Query code
let qcode: u64 = 0b1111111111111111111111111111111111111111111111111111111111111111; // #zeros = 0

// Construct the index
let index = Index::new(codes).unwrap();

// Find the ids of neighbor codes whose Hamming distances are within 2
let mut searcher = index.range_searcher();
let answers = searcher.run(qcode, 2);
assert_eq!(answers, vec![1, 4, 6]);

// Find the ids of the top-4 nearest neighbor codes
let mut searcher = index.topk_searcher();
let answers = searcher.run(qcode, 4);
assert_eq!(answers, vec![4, 1, 6, 0]);

// Serialization/Deserialization
let mut data = vec![];
index.serialize_into(&mut data).unwrap();
let other = Index::<u64>::deserialize_from(&data[..]).unwrap();
assert_eq!(index, other);

Binary code types

mih_rs::Index can be built from a vector of type mih_rs::CodeInt that is a primitive integer trait supporting a popcount operation. Currently, this library defines mih_rs::CodeInt for u8, u16, u32, and u64.

Benchmark

timeperf_topk.rs offers the benchmark of top-K search for MIH and LinearSearch algorithms on binary code types u32 and u64.

The following table shows the result of average search times in milliseconds per query, in the settings:

  • Database: N random codes from a uniform distribution.
  • Query set: 100 random codes from a uniform distribution.
  • Machine: MacBook Pro (2019) of Quad-Core Intel Core i5 @2.4 GHz with 16 GB of RAM.
  • Library version: v0.2.0

Result for u32

Algorithm N=10,000 N=100,000 N=1,000,000 N=10,000,000
MIH (K=1) 0.01 0.02 0.07 0.38
MIH (K=10) 0.04 0.08 0.30 1.06
MIH (K=100) 0.13 0.22 1.22 4.35
LinearSearch 0.36 4.40 50.96 626.87

Result for u64

Algorithm N=10,000 N=100,000 N=1,000,000 N=10,000,000
MIH (K=1) 0.10 0.36 1.46 6.7
MIH (K=10) 0.20 0.76 3.72 14.8
MIH (K=100) 0.41 1.53 7.02 33.2
LinearSearch 0.36 4.36 52.28 629.1

Licensing

This library is free software provided under MIT.

You might also like...
An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification

Omikuji An efficient implementation of Partitioned Label Trees (Prabhu et al., 2018) and its variations for extreme multi-label classification, writte

Damavand is a quantum circuit simulator. It can  run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.
Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

Damavand is a code that simulates quantum circuits. In order to learn more about damavand, refer to the documentation. Development status Core feature

Multi-agent (path finding) planning framework

multi-agent (path finding) planning framework Mapf is a (currently experimental) Rust library for multi-agent planning, with a focus on cooperative pa

Rust implementation of real-coded GA for solving optimization problems and training of neural networks
Rust implementation of real-coded GA for solving optimization problems and training of neural networks

revonet Rust implementation of real-coded genetic algorithm for solving optimization problems and training of neural networks. The latter is also know

Instance Distance is a fast pure-Rust implementation of the Hierarchical Navigable Small Worlds paper

Fast approximate nearest neighbor searching in Rust, based on HNSW index

A real-time implementation of
A real-time implementation of "Ray Tracing in One Weekend" using nannou and rust-gpu.

Real-time Ray Tracing with nannou & rust-gpu An attempt at a real-time implementation of "Ray Tracing in One Weekend" by Peter Shirley. This was a per

A neural network, and tensor dynamic automatic differentiation implementation for Rust.

Corgi A neural network, and tensor dynamic automatic differentiation implementation for Rust. BLAS The BLAS feature can be enabled, and requires CBLAS

Flexible, reusable reinforcement learning (Q learning) implementation in Rust

Rurel Rurel is a flexible, reusable reinforcement learning (Q learning) implementation in Rust. Release documentation In Cargo.toml: rurel = "0.2.0"

Barnes-Hut t-SNE implementation written in Rust.
Barnes-Hut t-SNE implementation written in Rust.

bhtsne Barnes-Hut implementation of t-SNE written in Rust. The algorithm is described with fine detail in this paper by Laurens van der Maaten. Instal

Owner
Shunsuke Kanda
TRIE Pokémon
Shunsuke Kanda
🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

?? efficient approximate nearest neighbor search algorithm collections library written in Rust ?? .

Hora-Search 2.3k Jan 3, 2023
HNSW ANN from the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"

hnsw Hierarchical Navigable Small World Graph for fast ANN search Enable the serde feature to serialize and deserialize HNSW. Tips A good default for

Rust Computer Vision 93 Dec 30, 2022
Graph-based Approximate Nearest Neighbor Search

granne* granne (graph-based retrieval of approximate nearest neighbors) is a Rust library for approximate nearest neighbor search based on Hierarchica

null 283 Dec 21, 2022
Nearest neighbor search algorithms including a ball tree and a vantage point tree.

petal-neighbors Nearest neighbor search algorithms including a ball tree and a vantage point tree. Examples The following example shows how to find tw

Petabi 6 Oct 19, 2022
A package for common types for Cargo index interactions, and conversion between them.

Development stream: https://youtu.be/zGS-HqcAvA4 License Licensed under either of Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org

Jon Gjengset 18 Apr 17, 2023
Locality Sensitive Hashing in Rust with Python bindings

lsh-rs (Locality Sensitive Hashing) Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time. For more informat

Ritchie Vink 65 Jan 2, 2023
TopK algorithm implementation in Rust (Filtered Space-Saving)

TopK TopK algorithm implementation in Rust. This crate currently provides the Filtered Space-Saving algorithm. Version numbers follow the semver conve

null 6 Feb 24, 2023
An 8080 Space Invaders emulator in Rust

Space Invade.rs An 8080 Space Invaders emulator written in Rust This is an 8080 emulator running the 1978 Space Invaders game by Taito, written in Rus

Cedric Beust 23 Dec 27, 2022
A stable, linearithmic sort in constant space written in Rust

A stable, linearithmic sort in constant space written in Rust. Uses the method described in "Fast Stable Merging And Sorting In Constant Extra Space"

Dylan MacKenzie 4 Mar 30, 2022
A Rust🦀 implementation of CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning

craftml-rs A Rust implementation of CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning (Siblini et al., 2018). Perf

Tom Dong 15 Nov 6, 2022