Rust wrappers for NGT approximate nearest neighbor search

Overview

ngt-rs

Rust wrappers for NGT, which provides high-speed approximate nearest neighbor searches against a large volume of data.

Note that NGT is built dynamically for your target, which requires cmake. NGT's shared memory and large dataset features are available through the cargo features shared_mem and large_data, respectively.

Usage

Defining the properties of a new index:

use ngt::{Properties, DistanceType, ObjectType};

// Default properties with vectors of dimension 3
let prop = Properties::dimension(3)?;

// Or customize values (here are the defaults)
let prop = Properties::dimension(3)?
    .creation_edge_size(10)?
    .search_edge_size(40)?
    .object_type(ObjectType::Float)?
    .distance_type(DistanceType::L2)?;

Creating/Opening an index and using it:

use ngt::{Index, Properties, EPSILON};

// Create a new index
let prop = Properties::dimension(3)?;
let index = Index::create("target/path/to/index/dir", prop)?;

// Open an existing index
let mut index = Index::open("target/path/to/index/dir")?;

// Insert two vectors and get their id
let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];
let id1 = index.insert(vec1)?;
let id2 = index.insert(vec2)?;

// Actually build the index (not yet persisted on disk)
// This is required in order to be able to search vectors
index.build(2)?;

// Perform a vector search (with 1 result)
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id1);
assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);

// Remove a vector and check that it is not present anymore
index.remove(id1)?;
let res = index.get_vec(id1);
assert!(matches!(res, Result::Err(_)));

// Verify that now our search result is different
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id2);
assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);

// Persist index on disk
index.persist()?;
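
To make the expected results of the search calls above concrete, here is a minimal exact (brute-force) nearest-neighbor sketch in plain Rust. It does not use NGT at all; `l2_squared` and `nearest` are hypothetical helper names introduced only for illustration, and the distance is assumed to be the L2 default shown in the properties example. NGT's search is approximate, so on large datasets its answers may differ from such an exact scan.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact nearest neighbor by linear scan.
/// Returns the position of the closest vector, or None if `vectors` is empty.
fn nearest(query: &[f32], vectors: &[Vec<f32>]) -> Option<usize> {
    vectors
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| l2_squared(query, a).total_cmp(&l2_squared(query, b)))
        .map(|(i, _)| i)
}

fn main() {
    // Same two vectors and query as in the usage example above.
    let vectors: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let query: [f32; 3] = [1.1, 2.1, 3.1];
    println!("{:?}", nearest(&query, &vectors));
}
```

The query is much closer to the first vector than to the second, which is why the usage example expects id1 before the removal and id2 after it.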

Comments
  • Statically link ngt?

    The idea

    Looking in /lib after a build, I see libngt.a:

    $ fd --hidden --no-ignore --glob libngt*
    ...
    target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.a
    target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so
    target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so.1
    target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so.1.14.7
    

    So I was wondering whether it would be possible to link ngt statically. This would remove the need to put libngt.so in a place where executables can find it.

    Naively, I changed a line in build.rs from this:

        println!("cargo:rustc-link-lib=dylib=ngt");
    

    to this:

        println!("cargo:rustc-link-lib=static=ngt");
    

    This fails

    After making the change to build.rs, this error occurs when running cargo build:

       <snip>
       Compiling proc-macro-crate v1.2.1
       Compiling num_enum_derive v0.5.7
       Compiling ngt-sys v1.14.8 (/home/caleb/tmp/ngt-rs/ngt-sys)
       Compiling num_enum v0.5.7
       Compiling ngt v0.4.4 (/home/caleb/tmp/ngt-rs)
    error[E0425]: cannot find function `ngt_get_number_of_objects` in crate `sys`
       --> src/index.rs:326:23
        |
    326 |         unsafe { sys::ngt_get_number_of_objects(self.index, self.ebuf) }
        |                       ^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `sys`
    
    error[E0425]: cannot find function `ngt_get_number_of_indexed_objects` in crate `sys`
       --> src/index.rs:331:23
        |
    331 |         unsafe { sys::ngt_get_number_of_indexed_objects(self.index, self.ebuf) }
        |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `sys`
    
    For more information about this error, try `rustc --explain E0425`.
    error: could not compile `ngt` due to 2 previous errors
    warning: build failed, waiting for other jobs to finish...
    

    Further investigation

    Checking the file sizes of the ngt build artifacts:

    $ ls -lah target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/
    Permissions Size User  Date Modified Name
    .rw-r--r--   56M caleb  4 Sep 19:17  libngt.a
    .rw-r--r--   22M caleb  4 Sep 19:18  libngt.so.1.14.7
    lrwxrwxrwx    16 caleb  4 Sep 19:18  libngt.so.1 -> libngt.so.1.14.7
    lrwxrwxrwx    11 caleb  4 Sep 19:18  libngt.so -> libngt.so.1
    

    libngt.a is about 56 MB. Checking for the rlib libraries produced by rust:

    $ ll target/debug/deps/ | rg ngt
    .rw-rw-r--   720 caleb  4 Sep 19:18  ngt_sys-88de9e97cd094267.d
    .rw-rw-r--  206k caleb  4 Sep 19:18  libngt_sys-88de9e97cd094267.rmeta
    .rw-rw-r--   485 caleb  4 Sep 19:18  ngt-1421cbd3da3a38b1.d
    .rw-rw-r--   57M caleb  4 Sep 19:18  libngt_sys-88de9e97cd094267.rlib
    

    libngt_sys-88de9e97cd094267.rlib is about 57 MB, which suggests that libngt.a may have been bundled into it.

    That's as far as I can go for now, but hopefully we can figure out a way to link ngt statically?

    opened by cjrh 11
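
For the static-linking question above, the following is a rough sketch of the directives a build script might print, not the actual ngt-sys build.rs. The helper name `link_directives` and the `out/lib` path are hypothetical. It assumes Linux with GCC, where a static C++ archive needs the C++ runtime named explicitly, and it assumes libngt was built with OpenMP (hence `gomp`). It does not address the E0425 errors quoted in the issue, which concern missing bindings rather than linking. As a side note, rustc bundles static native libraries into the rlib by default, which would be consistent with the 57 MB rlib observed.

```rust
use std::path::Path;

/// Build the `cargo:` directives a build script would print to link NGT.
/// `lib_dir` is wherever the NGT build placed libngt (hypothetical parameter).
fn link_directives(lib_dir: &Path, static_link: bool) -> Vec<String> {
    let mut out = vec![format!(
        "cargo:rustc-link-search=native={}",
        lib_dir.display()
    )];
    if static_link {
        out.push("cargo:rustc-link-lib=static=ngt".to_string());
        // A static archive does not record its own dependencies, so the
        // C++ runtime and (assumption) the OpenMP runtime are named here.
        out.push("cargo:rustc-link-lib=dylib=stdc++".to_string());
        out.push("cargo:rustc-link-lib=dylib=gomp".to_string());
    } else {
        out.push("cargo:rustc-link-lib=dylib=ngt".to_string());
    }
    out
}

fn main() {
    // In a real build.rs the directory would come from the cmake output.
    for line in link_directives(Path::new("out/lib"), true) {
        println!("{line}");
    }
}
```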
  • fatal error: 'NGT/Capi.h' file not found

    Hi @lerouxrgd

    I tried building ngt-rs today and it failed. This is the error I see:

      /home/caleb/tmp/ngt-rs/target/debug/build/ngt-sys-8ce361414492c0ac/out/include/NGT/NGTQ/Capi.h:107:10: fatal error: 'NGT/Capi.h' file not found
    

    I checked, and the file NGT/Capi.h is present:

    ~/tmp/ngt-rs/target/debug/build/ngt-sys-8ce361414492c0ac/out/include/NGT
    $ ls -lah
    Permissions Size User  Date Modified Name
    .rw-r--r--  6.2k caleb  3 Sep 21:44  SharedMemoryAllocator.h
    .rw-r--r--   13k caleb  3 Sep 21:44  Tree.h
    .rw-r--r--  1.4k caleb  3 Sep 21:44  Version.h
    .rw-r--r--   14k caleb  3 Sep 21:44  ObjectRepository.h
    .rw-r--r--   18k caleb  3 Sep 21:44  Node.h
    .rw-r--r--  7.7k caleb  3 Sep 21:44  Thread.h
    .rw-r--r--  5.8k caleb  3 Sep 21:44  ArrayFile.h
    .rw-r--r--   20k caleb  3 Sep 21:44  MmapManagerImpl.hpp
    .rw-r--r--  2.6k caleb  3 Sep 21:44  MmapManager.h
    .rw-r--r--   17k caleb  3 Sep 21:44  ObjectSpace.h
    .rw-r--r--   54k caleb  3 Sep 21:44  Optimizer.h
    .rw-r--r--   860 caleb  3 Sep 21:44  MmapManagerException.h
    .rw-r--r--  2.4k caleb  3 Sep 21:44  MmapManagerDefs.h
    .rw-r--r--   29k caleb  3 Sep 21:44  Clustering.h
    .rw-r--r--  7.9k caleb  3 Sep 21:44  Capi.h
    .rw-r--r--   56k caleb  3 Sep 21:44  Common.h
    .rw-r--r--   30k caleb  3 Sep 21:44  GraphReconstructor.h
    .rw-r--r--  149k caleb  3 Sep 21:44  half.hpp
    .rw-r--r--   22k caleb  3 Sep 21:44  GraphOptimizer.h
    .rw-r--r--  4.1k caleb  3 Sep 21:44  Command.h
    .rw-r--r--   40k caleb  3 Sep 21:44  Graph.h
    .rw-r--r--   254 caleb  3 Sep 21:44  version_defs.h
    .rw-r--r--   36k caleb  3 Sep 21:44  PrimitiveComparator.h
    .rw-r--r--   31k caleb  3 Sep 21:44  ObjectSpaceRepository.h
    .rw-r--r--   59k caleb  3 Sep 21:44  Index.h
    .rw-r--r--  2.6k caleb  3 Sep 21:44  HashBasedBooleanSet.h
    .rw-r--r--  1.7k caleb  3 Sep 21:44  defines.h
    drwxrwxr-x     - caleb  3 Sep 21:45  NGTQ
    

    It seems to me some kind of path problem. Do you have any suggestions for what I can try to fix this, or is it something that must be changed in ngt-rs?

    For completeness, I've attached the full build output.

    err.log

    opened by cjrh 5
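
On the include error above: the failing directive is for `NGT/Capi.h`, issued from inside `NGT/NGTQ/Capi.h`. It can only resolve if the parent of the `NGT` directory, here `out/include`, is itself on the header search path. The sketch below only computes the corresponding clang argument. It assumes the bindings are generated with bindgen, whose `Builder::clang_arg` method accepts such an argument. Whether ngt-sys needs or already does this is not stated in the issue, and `clang_include_arg` is a hypothetical helper name.

```rust
use std::path::Path;

/// Format the clang `-I` argument that puts `<out_dir>/include` on the
/// header search path, so that `#include "NGT/Capi.h"` can resolve.
fn clang_include_arg(out_dir: &Path) -> String {
    format!("-I{}", out_dir.join("include").display())
}

fn main() {
    // Cargo sets OUT_DIR for build scripts; fall back to a placeholder
    // so the sketch also runs outside of a build script.
    let out_dir = std::env::var("OUT_DIR").unwrap_or_else(|_| "out".to_string());
    println!("{}", clang_include_arg(Path::new(&out_dir)));
}
```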
  • Can cargo run, can't cargo build

    Hi @lerouxrgd,

    When using cargo run and cargo run --release, I can use the ngt-rs crate without any issues. Search works and it's really fast. However, when I use cargo build --release and run the binary, I get the following error:

    error while loading shared libraries: libngt.so.1: cannot open shared object file: No such file or directory

    I'm not sure if this is an issue with ngt-rs, an issue with the underlying NGT, or an issue with my setup. Do you have any thoughts on this?

    Ubuntu 18.04, ngt = "0.4.0"

    opened by paulbricman 4
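
A likely explanation for the difference above is that cargo run adds the library directories reported by build scripts to the dynamic library search path, while a binary started directly relies on the system loader alone. One possible workaround, sketched below and not confirmed by the thread, is for the application's own build.rs to embed an rpath of `$ORIGIN`, so that the loader also looks next to the executable, and then to copy libngt.so.1 there. It assumes Linux with a GNU-compatible linker; `rpath_directive` is a hypothetical helper name. Setting LD_LIBRARY_PATH when launching the binary is another option.

```rust
/// Directive for the application's build.rs: embed an rpath of `$ORIGIN`
/// so the loader searches the executable's own directory for libngt.so.1.
/// Cargo passes the argument to the linker as-is, without shell expansion.
fn rpath_directive() -> &'static str {
    "cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN"
}

fn main() {
    println!("{}", rpath_directive());
}
```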
  • How to reconstruct ANNG?

    I guess ngt::optim::refine_anng is the one, but I don't know how to use it. I tried passing ngt::optim::AnngRefineParams::default() as the second argument, but the program crashed.

    opened by nyapicom 3
  • feat: add openmp-sys to support static linking to openmp

    As you suggested here, I've made an attempt at introducing openmp-sys to restore openmp support with static linking.

    The changes in this PR appear to work. cargo build and cargo test succeed without error.

    I also created a local temporary --bin project, using the changes in this PR:

    # Cargo.toml
    [package]
    name = "ngttester"
    version = "0.1.0"
    edition = "2021"
    
    [dependencies]
    ngt = { path = "/home/caleb/Documents/repos/ngt-rs/" }
    

    The main.rs (the code is from the README tutorial):

    use ngt::{Index, Properties, EPSILON};
    
    fn main() -> Result<(), Box<dyn std::error::Error>> {
        println!("Hello, world!");
    
        // Create a new index
        let prop = Properties::dimension(3)?;
        let index = Index::create("db/", prop)?;
    
        // Open an existing index
        let mut index = Index::open("db/")?;
    
        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;
    
        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;
    
        // Perform a vector search (with 1 result)
        let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
        assert_eq!(res[0].id, id1);
        assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);
    
        // Remove a vector and check that it is not present anymore
        index.remove(id1)?;
        let res = index.get_vec(id1);
        assert!(matches!(res, Result::Err(_)));
    
        // Verify that now our search result is different
        let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
        assert_eq!(res[0].id, id2);
        assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);
    
        // Persist index on disk
        index.persist()?;
        Ok(())
    }
    

    cargo run works, and the assets created by ngt show up correctly:

    ~/tmp/ngttester  ±master|…8 
    $ ls -lah
    Permissions Size User  Date Modified Name
    .rw-rw-r--     8 caleb  6 Sep 21:42  .gitignore
    drwxrwxr-x     - caleb  6 Sep 21:42  target
    .rw-rw-r--   233 caleb  6 Sep 21:43  Cargo.toml
    .rw-rw-r--   13k caleb  6 Sep 21:45  Cargo.lock
    drwxrwxr-x     - caleb  6 Sep 21:52  src
    .rw-rw-r--   657 caleb  6 Sep 21:52  tags
    drwxr-xr-x     - caleb  6 Sep 21:52  db
    drwxrwxr-x     - caleb  6 Sep 21:52  .git
    ~/tmp/ngttester  ±master|…8 
    $ ls db
    grp  obj  prf  tre
    

    And, finally, the link table for the produced executable:

    $ ldd target/debug/ngttester
    	linux-vdso.so.1 (0x00007ffd1c0b1000)
    	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2b04600000)
    	/lib64/ld-linux-x86-64.so.2 (0x00007f2b04a08000)
    

    No dynamic links to libngt or libgomp 🎉

    opened by cjrh 1
Owner
Romain Leroux