Tantivy is a full text search engine library written in Rust.
It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.
Tantivy is, in fact, strongly inspired by Lucene's design.
Benchmark
The following benchmark breaks down performance for different types of queries and collections.
Your mileage WILL vary depending on the nature of queries and their load.
Features
- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder))
- Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
- Tiny startup time (<10ms), perfect for command line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. (michael AND jackson) OR "king of pop")
- Phrase queries search (e.g. "michael jackson")
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
- &[u8] fast fields
- Text, i64, u64, f64, dates, and hierarchical facet fields
- LZ4 compressed document store
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- Cheesy logo with a horse
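To make the BM25 scoring item above concrete, here is a minimal sketch of the textbook Okapi BM25 score for a single term, in plain Rust with no dependencies. It is not taken from Tantivy's source: the function names are hypothetical, and the constants k1 = 1.2 and b = 0.75 are the commonly used defaults, assumed here rather than read from the library.

```rust
/// Commonly used BM25 defaults (assumed for this sketch, not read from Tantivy).
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Inverse document frequency. The `1.0 +` inside the logarithm keeps the
/// value positive even when the term occurs in every document.
fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document (hypothetical helper).
///
/// * `term_freq`: occurrences of the term in the document
/// * `doc_len`: number of tokens in the document
/// * `avg_doc_len`: average number of tokens per document in the index
/// * `doc_count`: total number of documents
/// * `doc_freq`: number of documents containing the term
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f32,
    doc_count: u64,
    doc_freq: u64,
) -> f32 {
    let tf = term_freq as f32;
    // Documents longer than average are penalized, shorter ones boosted.
    let length_norm = K1 * (1.0 - B + B * doc_len as f32 / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + length_norm)
}
```

The score of a multi-term query is the sum of these per-term contributions. Rarer terms get a higher idf, repeated occurrences give diminishing returns, and a match in a long document counts for less than the same match in a short one.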
Non-features
- Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search. Serializable and mergeable collector state, for instance, is within the scope of Tantivy.
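As an illustration of why mergeable collector state matters for anyone building distributed search on top of a library, the sketch below merges per-shard top-k results into a global top-k. Everything in it is hypothetical: the Hit struct and merge_top_k function are not Tantivy types or APIs, only a way to show the idea in plain Rust.

```rust
/// Hypothetical search hit returned by one shard; not a Tantivy type.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    score: f32,
    shard: u32,
    doc_id: u32,
}

/// Merges the top-k lists produced independently by each shard into a single
/// global top-k list, highest score first.
fn merge_top_k(per_shard: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_shard.into_iter().flatten().collect();
    // `total_cmp` gives a total order on floats. Ties are broken by shard and
    // document id so the result is deterministic.
    all.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then(a.shard.cmp(&b.shard))
            .then(a.doc_id.cmp(&b.doc_id))
    });
    all.truncate(k);
    all
}
```

Because each shard only needs to ship its own top k hits, the coordinating node never sees more than k times the number of shards candidates, whatever the size of the index.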
Getting started
Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.
- Tantivy's simple search example
- tantivy-cli and its tutorial - tantivy-cli is an actual command line interface that makes it easy for you to create a search engine, index documents, and search via the CLI or a small server with a REST API. The tutorial walks you through getting a Wikipedia search engine up and running in a few minutes.
- Reference doc for the last released version
How can I support this project?
There are many ways to support this project.
- Use Tantivy and tell us about your experience on Discord or by email ([email protected])
- Report bugs
- Write a blog post
- Help with documentation by asking questions or submitting PRs
- Contribute code (you can join our Discord server)
- Talk about Tantivy around you
Contributing code
We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
Clone and build locally
Tantivy compiles on stable Rust but requires Rust >= 1.27. To check out and build the project, run:
git clone https://github.com/quickwit-oss/tantivy.git
cd tantivy
cargo build
Run tests
Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.
Debug
You might find it useful to step through the programme with a debugger.
A failing test
Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:
find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY
Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:
(gdb) run --test-threads 1 --test $NAME_OF_TEST
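The find pipeline in this section relies on -printf and -executable, which are GNU find extensions and may not be available on macOS or Windows. As a portable alternative, the selection step could be sketched in Rust using only the standard library. The function names here are hypothetical and not part of Tantivy or its tooling. On non-Unix platforms the executable-bit check is skipped, which is an assumption of this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Returns true if any execute bit is set (Unix only).
#[cfg(unix)]
fn is_executable(meta: &fs::Metadata) -> bool {
    use std::os::unix::fs::PermissionsExt;
    meta.permissions().mode() & 0o111 != 0
}

/// Assumption: on other platforms every regular file is accepted.
#[cfg(not(unix))]
fn is_executable(_meta: &fs::Metadata) -> bool {
    true
}

/// Picks the path with the latest modification time, if there is one.
fn pick_most_recent(candidates: Vec<(PathBuf, SystemTime)>) -> Option<PathBuf> {
    candidates
        .into_iter()
        .max_by_key(|(_, modified)| *modified)
        .map(|(path, _)| path)
}

/// Scans `dir` (non-recursively, like `-maxdepth 1`) for executable regular
/// files whose name starts with `prefix` and returns the newest one.
fn most_recent_debug_binary(dir: &Path, prefix: &str) -> io::Result<Option<PathBuf>> {
    let mut candidates = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        let name_matches = entry.file_name().to_string_lossy().starts_with(prefix);
        if meta.is_file() && name_matches && is_executable(&meta) {
            candidates.push((entry.path(), meta.modified()?));
        }
    }
    Ok(pick_most_recent(candidates))
}
```

Calling most_recent_debug_binary(Path::new("target/debug"), "tantivy") would then yield the path to hand to rust-gdb.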
An example
By default, rustc compiles everything in the examples/ directory in debug mode. This makes it easy for you to make examples to reproduce bugs:
rust-gdb target/debug/examples/$EXAMPLE_NAME
(gdb) run