probly-search
A full-text search library, optimized for insertion speed, that provides full control over the scoring calculations.
This started as a port of the Node library NDX.
Demo
Recipe (title) search with 50k documents.
https://quantleaf.github.io/probly-search-demo/
Features
- Three ways to do scoring:
  - BM25 ranking function to rank matching documents. The same ranking function that is used by default in Lucene >= 6.0.0.
  - zero-to-one, a scoring function unique to this library that provides a normalized score bounded by 0 and 1. Well suited to matching titles/labels with queries.
  - Ability to fully customize your own scoring function by implementing the ScoreCalculator trait.
- Trie-based dynamic inverted index.
- Multiple-field full-text indexing and searching.
- Per-field score boosting.
- Configurable tokenizer and term filter.
- Free-text queries with query expansion.
- Fast allocation, but latent deletion: a removed document is only purged from the index when the index is vacuumed.
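To make the BM25 bullet concrete, here is a minimal sketch of the classic BM25 score for a single term in a single field. It is not taken from this library's source: the function name `bm25_term_score`, its parameter names, and the values `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```rust
/// Classic BM25 score of one query term against one document field.
/// All names here are hypothetical; `k1` and `b` are the usual free parameters.
fn bm25_term_score(
    term_freq: f64,      // occurrences of the term in this field
    field_len: f64,      // number of tokens in this field
    avg_field_len: f64,  // average field length across the index
    doc_count: f64,      // total number of documents
    docs_with_term: f64, // number of documents containing the term
    k1: f64,
    b: f64,
) -> f64 {
    // Inverse document frequency in the always non-negative "1 + ..." form.
    let idf = (1.0 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();
    // Length normalization: longer-than-average fields are penalized.
    let norm = k1 * (1.0 - b + b * field_len / avg_field_len);
    // Term-frequency saturation scaled by the idf.
    idf * term_freq * (k1 + 1.0) / (term_freq + norm)
}

fn main() {
    // Two documents; the term occurs once, in one document,
    // in a one-token field of average length.
    let score = bm25_term_score(1.0, 1.0, 1.0, 2.0, 1.0, 1.2, 0.75);
    println!("score = {score}");
}
```

With these inputs the length normalization and saturation terms cancel, so the score reduces to the idf, which is about 0.693 (ln 2).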
Documentation
Documentation is under development. For now, read the source tests.
Example
Create an index of documents with 2 fields, query it, then remove a document and query again.
use std::collections::HashSet;
use probly_search::{
    // vacuum_index is called further down, so it has to be imported as well
    index::{add_document_to_index, create_index, remove_document_from_index, vacuum_index, Index},
    query::{
        query,
        score::default::{bm25, zero_to_one},
        QueryResult,
    },
};
// `Doc`, `title_extract`, `description_extract`, `tokenizer` and `filter`
// are user-defined and must be in scope.

// Create index with 2 fields
let mut index = create_index::<usize>(2);

// Create docs from a custom Doc struct (it must implement Clone for the call below)
let doc_1 = Doc {
    id: 0,
    title: "abc".to_string(),
    description: "dfg".to_string(),
};
let doc_2 = Doc {
    id: 1,
    title: "dfgh".to_string(),
    description: "abcd".to_string(),
};

// Add documents to index
add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_1.id,
    doc_1.clone(),
);
add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_2.id,
    doc_2,
);

// Search, expect 2 results
let mut result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    None,
);
assert_eq!(result.len(), 2);
assert_eq!(
    result[0],
    QueryResult {
        key: 0,
        score: 0.6931471805599453
    }
);
assert_eq!(
    result[1],
    QueryResult {
        key: 1,
        score: 0.28104699650060755
    }
);

// Remove a document from the index (it is only marked as removed)
let mut removed_docs = HashSet::new();
remove_document_from_index(&mut index, &mut removed_docs, doc_1.id);

// Vacuum to remove completely
vacuum_index(&mut index, &mut removed_docs);

// Search, expect 1 result
result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    Some(&removed_docs),
);
assert_eq!(result.len(), 1);
assert_eq!(
    result[0],
    QueryResult {
        key: 1,
        score: 0.1166450426074421
    }
);
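The example above uses `Doc`, `title_extract`, `description_extract`, `tokenizer` and `filter` without defining them. The following is one possible set of definitions, given as a sketch only: the function signatures are assumptions about what the library accepts and are not confirmed by this README, so check them against the source tests before relying on them.

```rust
// Hypothetical document type; field names mirror the example above.
#[derive(Clone, Debug)]
struct Doc {
    id: usize,
    title: String,
    description: String,
}

// One accessor per indexed field, listed in the same order as the boosts.
fn title_extract(d: &Doc) -> Option<&str> {
    Some(d.title.as_str())
}

fn description_extract(d: &Doc) -> Option<&str> {
    Some(d.description.as_str())
}

// Splits text into terms on whitespace.
fn tokenizer(s: &str) -> Vec<String> {
    s.split_whitespace().map(str::to_owned).collect()
}

// Normalizes a single term; here it only lowercases.
// The `&String` parameter type is an assumption about the expected signature.
fn filter(s: &String) -> String {
    s.to_lowercase()
}

fn main() {
    let doc = Doc {
        id: 0,
        title: "abc".to_string(),
        description: "dfg".to_string(),
    };
    println!("doc {} title terms: {:?}", doc.id, tokenizer(&doc.title));
    println!("description: {:?}", description_extract(&doc));
    println!("filtered: {}", filter(&"ABC".to_string()));
}
```

Plain functions are used rather than closures so that both extractors share one type and can sit together in a slice such as `&[title_extract, description_extract]`.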
See the source tests for the BM25 implementation and the zero-to-one implementation for more query examples.
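The remove-then-vacuum pattern in the example can be illustrated with a toy inverted index. This is a sketch of the general technique, not this library's implementation: `ToyIndex` and its methods are hypothetical names, and clearing the removed set during vacuum is an assumption of the sketch.

```rust
use std::collections::{HashMap, HashSet};

/// Toy inverted index: term -> keys of the documents that contain it.
struct ToyIndex {
    postings: HashMap<String, Vec<usize>>,
}

impl ToyIndex {
    fn new() -> Self {
        ToyIndex { postings: HashMap::new() }
    }

    fn add(&mut self, key: usize, terms: &[&str]) {
        for term in terms {
            self.postings.entry((*term).to_owned()).or_default().push(key);
        }
    }

    /// Latent removal: only record the key, leave the postings untouched.
    fn mark_removed(removed: &mut HashSet<usize>, key: usize) {
        removed.insert(key);
    }

    /// Lookup that skips keys marked as removed.
    fn search(&self, term: &str, removed: &HashSet<usize>) -> Vec<usize> {
        self.postings
            .get(term)
            .map(|keys| keys.iter().copied().filter(|k| !removed.contains(k)).collect())
            .unwrap_or_default()
    }

    /// Physically drop removed keys and empty terms, then reset the set.
    fn vacuum(&mut self, removed: &mut HashSet<usize>) {
        for keys in self.postings.values_mut() {
            keys.retain(|k| !removed.contains(k));
        }
        self.postings.retain(|_, keys| !keys.is_empty());
        removed.clear();
    }
}

fn main() {
    let mut index = ToyIndex::new();
    let mut removed = HashSet::new();
    index.add(0, &["abc"]);
    index.add(1, &["abc", "dfg"]);

    // Cheap removal: key 0 is still stored but no longer returned.
    ToyIndex::mark_removed(&mut removed, 0);
    println!("before vacuum: {:?}", index.search("abc", &removed));

    // Expensive cleanup, done whenever convenient.
    index.vacuum(&mut removed);
    println!("after vacuum: {:?}", index.search("abc", &removed));
}
```

Marking a key as removed is a single set insertion, while the full traversal of the index is deferred to `vacuum`. The cost is that every query has to filter against the removed set until the vacuum runs.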