Locality Sensitive Hashing in Rust with Python bindings

Overview

lsh-rs (Locality Sensitive Hashing)

Locality sensitive hashing can help retrieve approximate nearest neighbors in sub-linear time.

For more information on the subject see:

Implementations

  • Base LSH
    • Signed Random Projections (Cosine similarity)
    • L2 distance
    • MIPS (Dot products/ Maximum Inner Product Search)
    • MinHash (Jaccard Similarity)
  • Multi Probe LSH
    • Step wise probing
      • SRP (only bit shifts)
    • Query directed probing
      • L2
      • MIPS
  • Generic numeric types
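
To make the first entry above concrete, here is a minimal sketch of signed random projections in plain Python. It is an illustration only, not the lsh-rs or floky implementation; the helper names make_hyperplanes and srp_hash are hypothetical. Each random hyperplane contributes one bit (the sign of a dot product), so vectors separated by a small angle tend to agree on most bits.

```python
import random


def make_hyperplanes(n_projections, dim, seed=0):
    """Draw n_projections random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_projections)]


def srp_hash(vec, hyperplanes):
    """Return one bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0.0 else 0
        for plane in hyperplanes
    )


planes = make_hyperplanes(n_projections=9, dim=3)
a = [1.0, 1.5, 2.0]
b = [2.0, 3.0, 4.0]  # same direction as a, only the length differs
print(srp_hash(a, planes) == srp_hash(b, planes))
```

Because only the sign of each projection is kept, the hash depends on the direction of a vector and not on its length, which is why this family is associated with cosine similarity.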

Getting started

use lsh_rs::LshMem;
// 2 rows w/ dimension 3.
let p = &[vec![1., 1.5, 2.],
        vec![2., 1.1, -0.3]];

// Do one time expensive preprocessing.
let n_projections = 9;
let n_hash_tables = 30;
let dim = 3; // must match the length of each row in p
let mut lsh = LshMem::new(n_projections, n_hash_tables, dim)
    .srp()
    .unwrap();
lsh.store_vecs(p);

// Query in sublinear time.
let query = &[1.1, 1.2, 1.2];
lsh.query_bucket(query);

Documentation

Python

At the moment, the Python bindings are only compiled for Linux x86_64 systems.

$ pip install floky

from floky import SRP
import numpy as np

N = 10000
n = 100
dim = 10

# Generate some random data points
data_points = np.random.randn(N, dim)

# Do a one time (expensive) fit.
lsh = SRP(n_projections=19, n_hash_tables=10)
lsh.fit(data_points)

# Query approximate nearest neighbors in sub-linear time
query = np.random.randn(n, dim)
results = lsh.predict(query)
Comments
  • Does not build

    I'm trying to run the examples and it seems like the project doesn't build at the moment. The compiler is reporting a few places where what appears to be a private serde module is being used. Did serde update and remove that export? Or am I missing something in order to import private modules?

    error[E0603]: module `export` is private
       --> lsh-rs/src/hash.rs:8:12
        |
    8   | use serde::export::PhantomData;
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> /Users/isaac/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.120/src/lib.rs:275:5
        |
    275 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error[E0603]: module `export` is private
       --> lsh-rs/src/table/sqlite.rs:9:12
        |
    9   | use serde::export::PhantomData;
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> /Users/isaac/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.120/src/lib.rs:275:5
        |
    275 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error[E0603]: module `export` is private
       --> lsh-rs/src/data.rs:4:12
        |
    4   | use serde::export::fmt::{Debug, Display};
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> /Users/isaac/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.120/src/lib.rs:275:5
        |
    275 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    opened by ijsnow 11
  • How to use for vanilla K,L LSH with precomputed hashes

    Hello,

    I'm curious to see if I can get this to work with plain K, L parametrized LSH. The setup is that I already have L Vecs each with K (u32) hashes of the input sample, which I'd like to be able to feed directly into the LSH. The LSH hasher in this case then only queries for exact matches in each of the L hash tables and returns the union of all matches across all tables.

    Is such a setup possible using the current API?

    Thanks,

    Paul

    opened by paul-sud 6
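
One way to picture the setup described in this issue is a plain K, L index over precomputed signatures. The sketch below is plain Python and independent of lsh-rs and floky; the class PrecomputedLSH and its methods are hypothetical names, and it makes no claim about what the current API supports. Each of the L tables is keyed by the tuple of K hashes, and a query returns the union of exact matches across tables.

```python
from collections import defaultdict


class PrecomputedLSH:
    """Hypothetical K, L index over precomputed hashes (not part of lsh-rs)."""

    def __init__(self, n_tables):
        # One dictionary per table: signature tuple -> set of item ids.
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    def _check(self, signatures):
        if len(signatures) != len(self.tables):
            raise ValueError("expected one signature per hash table")

    def insert(self, item_id, signatures):
        """Store item_id under its signature in each of the L tables."""
        self._check(signatures)
        for table, sig in zip(self.tables, signatures):
            table[tuple(sig)].add(item_id)

    def query(self, signatures):
        """Return the union of exact signature matches over all tables."""
        self._check(signatures)
        found = set()
        for table, sig in zip(self.tables, signatures):
            found |= table.get(tuple(sig), set())
        return found


index = PrecomputedLSH(n_tables=2)       # L = 2
index.insert("doc-a", [[1, 2], [3, 4]])  # K = 2 hashes per table
index.insert("doc-b", [[1, 2], [9, 9]])
print(sorted(index.query([[1, 2], [0, 0]])))
```

Since insert only touches the buckets of the new item, points can be added at any time without rehashing what is already stored.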
  • Upgrade `rusqlite` to `0.25.3`

    Upgrades rusqlite to 0.25.3 and fixes deprecation warnings caused by the upgrade. Also removes a couple of unused dependencies (as reported by linter).

    opened by pixelami 1
  • Fix import issues

    Close #9 following https://github.com/ritchie46/lsh-rs/issues/9#issuecomment-828214377

    This same change was enacted in other repos: https://github.com/teloxide/teloxide/pull/331/files

    This is needed because of a change in serde: https://github.com/serde-rs/serde/commit/dd1f4b483ee204d58465064f6e5bf5a457543b54

    opened by bwindsor22 0
  • rust 1.57.0 compatibility issue

    Building using Rust-1.57.0 with Cargo.toml:

    [dependencies]
    clap = "*"
    log = "*"
    env_logger = "*"
    lsh-rs = {version = "*", features = ["blas"]}
    ndarray = {version = "*", features = ["blas"]}
    
    

    results in the following error:

       Compiling lsh-rs v0.4.0
    error[E0603]: module `export` is private
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/lsh-rs-0.4.0/src/hash.rs:8:12
        |
    8   | use serde::export::PhantomData;
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/lib.rs:276:5
        |
    276 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error[E0603]: module `export` is private
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/lsh-rs-0.4.0/src/table/sqlite.rs:9:12
        |
    9   | use serde::export::PhantomData;
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/lib.rs:276:5
        |
    276 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error[E0603]: module `export` is private
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/lsh-rs-0.4.0/src/data.rs:4:12
        |
    4   | use serde::export::fmt::{Debug, Display};
        |            ^^^^^^ private module
        |
    note: the module `export` is defined here
       --> XXX/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/lib.rs:276:5
        |
    276 | use self::__private as export;
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^
    

    This seems to be caused by a change to serde. Any chance of updating the implementation to work with the latest Rust?

    opened by rzolau 0
  • Can't obtain results using Rust implementation

    I'm roughly using the following code:

    let query_emb: Vec<f32>;
    let doc_emb: Vec<Vec<f32>>; // contains 3 document embeddings
    
    ...
    
    let mut lsh = LshMem::new(10, 30, 512).srp().unwrap();
    let _x = lsh.store_vecs(&doc_emb[..]);
    let result = lsh.query_bucket(&query_emb).unwrap();
    println!("lsh-rs: {:?}", result);
    

    Unfortunately, the result is empty. When I test the same query and documents with ngt-rs, I do get results (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?

    opened by paulbricman 2
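
For context on how the parameters affect recall, the standard analysis of signed random projections says that two vectors at angle θ agree on a single hash bit with probability 1 - θ/π. The sketch below works under stated assumptions: bits are independent, a table matches only when all n_projections bits agree, and a query returns the union of matches over all tables. The function name hit_probability is hypothetical, and the numbers are a textbook estimate, not a measurement of lsh-rs.

```python
import math


def hit_probability(cosine_similarity, n_projections, n_hash_tables):
    """Estimate the chance that a stored vector shares a bucket with the
    query in at least one table (idealized model, see assumptions above)."""
    cos = max(-1.0, min(1.0, cosine_similarity))
    p_bit = 1.0 - math.acos(cos) / math.pi  # one bit agrees
    p_table = p_bit ** n_projections         # all bits of one table agree
    return 1.0 - (1.0 - p_table) ** n_hash_tables


# Same parameters as the snippet in this issue: 10 projections, 30 tables.
for cos in (0.9, 0.5, 0.1):
    print(cos, round(hit_probability(cos, n_projections=10, n_hash_tables=30), 3))
```

Under this model, fewer projections per table or more tables raise the chance of a hit, at the cost of larger candidate sets.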
  • Make BLAS an optional version of install to avoid local build issues for some users

    Following https://github.com/ritchie46/lsh-rs/issues/9#issuecomment-828215274, tracking this separately. This makes sense: BLAS is only for squeezing out maximum performance and should definitely be opt-in.

    opened by bwindsor22 0
  • how to add more

    Hi, I'm a Python user and I enjoy using your wonderful algorithm.

    While using it, I had a question: is there a function for adding samples after indexing?

    For example:

    lsh = SRP(n_projections=19, n_hash_tables=10)
    lsh.fit(data_points)

    lsh.add_data_points(data) <-- something like this function

    After building the index like this, I want to add more data points without re-indexing the points that are already indexed. So I wondered whether there is a function for adding more data points.

    Thanks for your help.

    opened by JoungheeKim 1
  • Implement delete method for Sqlite backend.

    crate::table::general::HashTables has a delete method that is only implemented for the in-memory backend. It could be useful for the SQLite backend as well.

    See: https://github.com/ritchie46/lsh-rs/blob/master/lsh-rs/src/table/general.rs

    opened by ritchie46 0
Owner
Ritchie Vink
Machine Learning Engineer | Software Engineer