Hamming Weight Tree from the paper Online Nearest Neighbor Search in Hamming Space

Overview

To understand how the data structure works, please see the docs.

Benchmarks

Most recent benchmark for 1-NN:

1-NN Benchmark

You can find benchmark output here.

If you would like to run the benchmarks yourself, just run cargo bench at the command line. I recommend running RUSTFLAGS='-C target-cpu=native' cargo bench instead, since both linear search and this tree are significantly faster when modern instructions are available.

Comments
  • Move features into tree for performance

    Move features into tree for performance

    The current performance is unacceptable. Profiling the code makes it clear that there are cache locality issues. Here is the closure taking up 71.85% of the total runtime:

    let lookup_distance = |index| (lookup(index) ^ feature).count_ones();
    

    This performs a single trivial operation, so it looks innocuous, but the way hwt lets the user store features outside of the tree is working against it: every distance computation must chase a pointer out of the tree, which makes it much slower than linear search (except perhaps on huge data sets of more than 1B features).

    The API should be changed to give back features and not indices. The user can use their features to look up their data afterwards, which should have negligible performance impact since they only need to do a few lookups (k lookups for k-NN).
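The proposed change can be sketched as follows. This is a minimal illustration with a linear scan standing in for the tree; the Hwt type and method names here are hypothetical, not the actual hwt API:

```rust
// Hypothetical sketch: the tree owns its features and search returns
// the nearest feature itself rather than an index.
struct Hwt {
    features: Vec<u128>,
}

impl Hwt {
    fn new() -> Self {
        Hwt { features: Vec::new() }
    }

    fn insert(&mut self, feature: u128) {
        self.features.push(feature);
    }

    /// Returns the nearest feature, not an index; the caller looks up
    /// associated data afterwards (k lookups for k-NN).
    fn nearest(&self, query: u128) -> Option<u128> {
        self.features
            .iter()
            .copied()
            .min_by_key(|&f| (f ^ query).count_ones())
    }
}

fn main() {
    let mut tree = Hwt::new();
    tree.insert(0b1111);
    tree.insert(0b0001);
    // 0b0011 is at distance 1 from 0b0001 and distance 2 from 0b1111.
    assert_eq!(tree.nearest(0b0011), Some(0b0001));
}
```

Returning the feature itself means the distance computation only ever touches memory the tree already owns; the caller maps the k returned features back to their own data afterwards.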

    This has two negative impacts:

    • This could make the tree slightly larger.
    • This will make it harder for the user to store the data however they want.

    Seeing as performance is the most important thing here, that takes precedence over these issues.

    opened by vadixidav 4
  • Use packed_simd to speed up FeatureHeap::add

    Use packed_simd to speed up FeatureHeap::add

    Profiling currently shows that a lot of time is spent in FeatureHeap::add doing XOR, popcount, and comparisons. We can speed this up with SIMD by doing the XOR, popcnt, and compares in parallel.
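As a rough illustration of the loop shape involved (packed_simd itself is nightly-only, so this scalar sketch only shows the branch-free XOR + popcount pass that SIMD lanes would parallelize; the function name is made up, not the real FeatureHeap code):

```rust
// Illustrative loop shape only: FeatureHeap::add spends its time on XOR,
// popcount, and compares. A single branch-free pass like this is what
// explicit SIMD (or the auto-vectorizer) can run several lanes at a time.
fn hamming_distances(features: &[u128], query: u128, out: &mut Vec<u32>) {
    out.clear();
    // XOR then popcount per feature, with no data-dependent branches.
    out.extend(features.iter().map(|&f| (f ^ query).count_ones()));
}

fn main() {
    let features = [0u128, 1, 3, u128::MAX];
    let mut dists = Vec::new();
    hamming_distances(&features, 1, &mut dists);
    assert_eq!(dists, vec![1, 0, 1, 127]);
}
```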

    opened by vadixidav 0
  • Support approximate nearest neighbor

    Support approximate nearest neighbor

    We can trivially support approximate nearest neighbor search by allowing neighbors at the current search radius + n. This means the result can be off by a distance of up to n. The performance gain should be high, and it is easy to control precisely how far off the result can be (at most n).
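The acceptance rule being proposed amounts to a one-line predicate (a hedged sketch; the names are illustrative):

```rust
// Sketch of the proposed knob: a candidate is accepted if it lies within
// the current search radius plus a slack of n, so n = 0 keeps the search
// exact and larger n trades accuracy for speed.
fn accept(distance: u32, current_radius: u32, n: u32) -> bool {
    distance <= current_radius + n
}

fn main() {
    assert!(!accept(5, 3, 0)); // exact search rejects distance 5 at radius 3
    assert!(accept(5, 3, 2)); // a slack of 2 accepts it
    assert!(accept(3, 3, 0)); // on-radius candidates are always accepted
}
```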

    opened by vadixidav 0
  • failed to select a version for ahash

    failed to select a version for ahash

    $ cargo build
        Updating crates.io index
    error: failed to select a version for the requirement `ahash = "^0.1.18"`
    candidate versions found which didn't match: 0.8.0, 0.7.6, 0.7.5, ...
    location searched: crates.io index
    required by package `hwt v0.4.2`
        ... which is depended on by `bfp v0.1.0 (/workplace/amooren/sandbox/src/Amooren/bfp)`
    

    A clean crate, just adding the latest version of hwt.

    opened by mooreniemi 0
  • Dynamic sized features

    Dynamic sized features

    I'm interested in using this crate to collate hashes produced by my img_hash library. I started work on an implementation of a multiple vantage-point tree, but I haven't had the time to fill out the tests or even finish the documentation, and I have a feeling it's going to perform pretty abysmally by default. I really prefer your approach in this crate, as I feel like we're working in the same problem space.

    However, one of the features of img_hash is that it can produce hashes of varying sizes depending on a user's preference between performance and accuracy. By default, I have it producing 64-bit hashes so representing them as u128 would be pretty wasteful, and I also want to experiment with hash sizes of 256 bits and maybe greater so in that case u128 would require discarding information from the hash.

    In the 3.0.0-alpha I have yet to release I have a generic representation of hash bytes which gives the option of either allocating them on the heap (for dynamic sizes) or inline (to save a misdirection and possibly improve codegen). I think this kind of generic approach would benefit hwt as well but I'm wondering what your opinion is of it.

    I don't entirely understand the HWT algorithm so I don't know if 128-bit features are integral to its function but I have a feeling it wouldn't be too difficult to generalize over any size feature as long as they're all the same size in the tree.
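One way the generic representation could look, assuming features of the same size are compared word by word (this trait is purely hypothetical, not img_hash's or hwt's actual API):

```rust
// Hypothetical trait: any same-sized bit string with a Hamming distance
// could serve as a feature, whether a fixed-size integer stored inline or
// a heap-allocated vector of words for dynamic sizes.
trait Feature {
    fn hamming(&self, other: &Self) -> u32;
}

// 64-bit hashes, stored inline.
impl Feature for u64 {
    fn hamming(&self, other: &Self) -> u32 {
        (self ^ other).count_ones()
    }
}

// Dynamically sized hashes (e.g. 256 bits and beyond), stored on the heap;
// the distance is summed word by word.
impl Feature for Vec<u64> {
    fn hamming(&self, other: &Self) -> u32 {
        debug_assert_eq!(self.len(), other.len());
        self.iter()
            .zip(other)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}

fn main() {
    assert_eq!(1u64.hamming(&3), 1);
    assert_eq!(vec![1u64, 0].hamming(&vec![3, u64::MAX]), 65);
}
```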

    opened by abonander 10
  • Convert search module to use a hand-optimized iterator for each level of the tree

    Convert search module to use a hand-optimized iterator for each level of the tree

    Currently the search module is implemented in a convenient, but inefficient, way. It defines an iterator for 2-bit slices, then another for 4-bit slices that flat-maps two of the 2-bit ones, and so on. This makes the iterator type names grow out of control (a Rust compiler/type-length issue), which then requires Box to wrap the types. It is also very wasteful, since it performs up to 128 128-bit operations on 1-bit numbers. The search module should be refactored (possibly at the expense of code size) for speed, using SIMD where possible.
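A toy illustration of the type blow-up (not the actual search module code): each wider slice iterator flat-maps the narrower one, nesting the concrete type one level per width until boxing becomes unavoidable.

```rust
// Each wider slice iterator is built by flat-mapping the narrower one, so
// the concrete type gains a FlatMap<Map<...>> layer per width.
fn two_bit() -> impl Iterator<Item = u32> {
    0u32..4 // all 2-bit values
}

fn four_bit() -> Box<dyn Iterator<Item = u32>> {
    // One more layer like this per width: the type name grows until
    // Box<dyn Iterator> is forced to cap it.
    Box::new(two_bit().flat_map(|hi| two_bit().map(move |lo| (hi << 2) | lo)))
}

fn main() {
    assert_eq!(two_bit().count(), 4);
    assert_eq!(four_bit().count(), 16); // every 4-bit value, exactly once
}
```

A hand-optimized iterator per level would replace this composition with explicit loops, trading code size for speed as the issue suggests.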

    opened by vadixidav 0
  • Performance tracking issue

    Performance tracking issue

    Performance currently only beats linear search (which is incredibly fast on modern processors) for exact search at more than 1M features. This is probably beatable, but it will require some optimizations. Below is the list of optimizations that need to be made, with their issues:

    k-NN

    opened by vadixidav 1
  • Support Lowe's ratio retrieval

    Support Lowe's ratio retrieval

    Once the search distance for the second nearest neighbor exceeds the Lowe's ratio threshold, we know that the Lowe's ratio test has succeeded. We should return a Result<u128, (u128, u128)> for this operation. If the Lowe's ratio test succeeds, we return the nearest neighbor. If it fails, we return the nearest neighbor along with the feature that caused the test to fail (the second nearest neighbor).

    This will speed up photogrammetry algorithms, because we can stop searching as soon as the Lowe's ratio test has been satisfied.
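A sketch of the proposed return type, assuming Hamming distances are compared against a caller-supplied ratio (the function name and threshold are illustrative, not hwt's actual API):

```rust
// Sketch of the proposed Result API: Ok(nearest) when the Lowe's ratio
// test passes, Err((nearest, second)) with the offending second neighbor
// when it fails.
fn lowes_ratio_test(
    query: u128,
    nearest: u128,
    second: u128,
    ratio: f32,
) -> Result<u128, (u128, u128)> {
    let d1 = (query ^ nearest).count_ones() as f32;
    let d2 = (query ^ second).count_ones() as f32;
    if d1 < ratio * d2 {
        Ok(nearest)
    } else {
        Err((nearest, second))
    }
}

fn main() {
    // Nearest at distance 1, second at distance 4: 1 < 0.7 * 4, so the test passes.
    assert_eq!(lowes_ratio_test(0b0000, 0b0001, 0b1111, 0.7), Ok(0b0001));
    // Second neighbor also at distance 1: 1 >= 0.7 * 1, so the test fails.
    assert_eq!(lowes_ratio_test(0b0000, 0b0001, 0b0010, 0.7), Err((0b0001, 0b0010)));
}
```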

    opened by vadixidav 0
Owner
Rust Computer Vision
Dedicated to making computer vision easier than OpenCV and faster than C++
Rust Computer Vision
HNSW ANN from the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"

hnsw Hierarchical Navigable Small World Graph for fast ANN search Enable the serde feature to serialize and deserialize HNSW. Tips A good default for

Rust Computer Vision 93 Dec 30, 2022
Rust implementation of multi-index hashing for neighbor searches on binary codes in the Hamming space

mih-rs Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper Norouzi, Pu

Shunsuke Kanda 8 Sep 23, 2022
Graph-based Approximate Nearest Neighbor Search

granne* granne (graph-based retrieval of approximate nearest neighbors) is a Rust library for approximate nearest neighbor search based on Hierarchica

null 283 Dec 21, 2022
🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

Hora-Search 2.3k Jan 3, 2023
C library for finding nearest (most similar) element in a set

VP-tree nearest neighbor search A relatively simple and readable Rust implementation of Vantage Point tree search algorithm. The VP tree algorithm doe

Kornel 28 Aug 20, 2022
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Arthur·Thomas 13 Mar 10, 2022
Rust-port of spotify/annoy as a wrapper for Approximate Nearest Neighbors in C++/Python optimized for memory usage.

Fareast This library is a rust port of spotify/annoy , currently only index serving is supported. It also provides FFI bindings for jvm, dotnet and da

Arthur·Thomas 13 Mar 10, 2022
A simple url checker for finding fraud url(s) or nearest url

urlchecker A simple url checker for finding fraud url(s) or nearest url while being fast (threading) Eg:- use std::collections::HashMap; use urlchecke

Subconscious Compute 2 Aug 7, 2022
Instance Distance is a fast pure-Rust implementation of the Hierarchical Navigable Small Worlds paper

Fast approximate nearest neighbor searching in Rust, based on HNSW index

Instant Domain Search, Inc. 135 Dec 24, 2022
MO's Trading - an online contest for high frequency trading

MO's Trading - an online contest for high frequency trading

Runji Wang 29 Dec 14, 2022
An 8080 Space Invaders emulator in Rust

Space Invade.rs An 8080 Space Invaders emulator written in Rust This is an 8080 emulator running the 1978 Space Invaders game by Taito, written in Rus

Cedric Beust 23 Dec 27, 2022
A stable, linearithmic sort in constant space written in Rust

A stable, linearithmic sort in constant space written in Rust. Uses the method described in "Fast Stable Merging And Sorting In Constant Extra Space"

Dylan MacKenzie 4 Mar 30, 2022
TopK algorithm implementation in Rust (Filtered Space-Saving)

TopK TopK algorithm implementation in Rust. This crate currently provides the Filtered Space-Saving algorithm. Version numbers follow the semver conve

null 6 Feb 24, 2023
K-dimensional tree in Rust for fast geospatial indexing and lookup

kdtree K-dimensional tree in Rust for fast geospatial indexing and nearest neighbors lookup Crate Documentation Usage Benchmark License Usage Add kdtr

Rui Hu 154 Jan 4, 2023
An Implementation of the Context Tree Weighting (CTW) Sequence Prediction Algorithm

Context Tree Weighting (CTW) CTW is a lightweight, practical and well performing sequence prediction algorithm discovered by Frans Willems, Yuri Shtar

null 7 Dec 23, 2022
Qdrant - vector similarity search engine with extended filtering support

Vector Similarity Search Engine with extended filtering support Qdrant (read: quadrant ) is a vector similarity search engine. It provides a productio

qdrant 3.5k Dec 30, 2022
A neural network model that can approximate any non-linear function by using the random search algorithm for the optimization of the loss function.

random_search A neural network model that can approximate any non-linear function by using the random search algorithm for the optimization of the los

ph04 2 Apr 1, 2022
Machine learning framework for building object trackers and similarity search engines

Similari Similari is a framework that helps build sophisticated tracking systems. The most frequently met operations that can be efficiently implement

In-Sight 71 Dec 28, 2022
Nearest neighbor search algorithms including a ball tree and a vantage point tree.

petal-neighbors Nearest neighbor search algorithms including a ball tree and a vantage point tree. Examples The following example shows how to find tw

Petabi 6 Oct 19, 2022