DawnSearch
DawnSearch is an open source distributed web search engine that searches by meaning. It embeds text with the all-MiniLM-L6-v2 model for semantic search and uses USearch for vector search. It can index the Common Crawl data. DawnSearch is written in Rust.
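To illustrate what searching by meaning involves: the query and each page are represented as embedding vectors, and the pages whose vectors lie closest to the query are returned. The following is a minimal sketch of that idea as a brute-force cosine-similarity search over tiny hand-made vectors. It is not DawnSearch's code; the function names `cosine_similarity` and `top_k` are hypothetical, and a real instance would use model embeddings and an approximate index such as USearch instead of scanning every vector.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero length (norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Indices of the `k` documents most similar to the query, best match first.
/// This scans every document; it is for illustration only.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Sort by similarity, highest first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Toy 3-dimensional "embeddings" (made up for this example).
    let docs = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.9, 0.1, 0.0],
    ];
    let query = [1.0, 0.0, 0.0];
    println!("{:?}", top_k(&query, &docs, 2));
}
```

Documents 0 and 2 point in nearly the same direction as the query, so they rank above document 1, regardless of whether they share any exact keywords.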
A public instance is available at dawnsearch.org.
Project Status
DawnSearch currently functions as a distributed (semantic) vector search. When you start an instance, it will register with the tracker. The instance can then participate in the network by searching. Optionally, it can index the Common Crawl dataset and answer queries.
Main items still to do:
- Better error handling. There are still many .unwrap() calls in the code.
- Robustness against malfunctioning or malicious instances.
- Packet encryption.
- Increase search efficiency by distributing indexed pages to instances that are semantically close to the content.
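For contributors looking at the error-handling item above, here is a minimal sketch of the usual way to replace .unwrap() with a typed error and the `?` operator. The `ConfigError` type and `parse_port` function are hypothetical and only illustrate the pattern; they are not taken from the DawnSearch codebase.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for this example.
#[derive(Debug)]
enum ConfigError {
    MissingPort,
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingPort => write!(f, "no port configured"),
            ConfigError::InvalidPort(e) => write!(f, "invalid port: {e}"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Lets `?` convert a ParseIntError into a ConfigError automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidPort(e)
    }
}

/// Instead of `raw.unwrap().parse().unwrap()`, return an error the caller can handle.
fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let text = raw.ok_or(ConfigError::MissingPort)?;
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    match parse_port(Some("8080")) {
        Ok(port) => println!("listening on port {port}"),
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```

With this shape, a bad value produces a readable message at the call site instead of a panic in the middle of the program.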
Help needed!
DawnSearch is looking for:
- People to use the search on dawnsearch.org and give feedback on the usability and quality of results.
- People with Rust experience to take a look at the codebase and give tips on making it easier to read and more idiomatic Rust.
- A UI/UX designer to create designs for the main and search results pages.
- Rust developers who can tackle some of the problems mentioned under 'Project Status'.
- People who want to run their own instance.
Please open issues for any questions or feedback. If you want to contribute something big, like a feature or a refactor, open an issue before you start so you don't do duplicate work!
Quick start
This will build and run an 'access terminal' DawnSearch instance on a recent Ubuntu, without GPU acceleration. See Modes for examples of other configurations.
sudo apt-get update && sudo apt-get install -y build-essential pkg-config
# Install Rust if you don't have it already:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
mv DawnSearch.toml.example DawnSearch.toml
RUSTFLAGS='-C target-cpu=native' cargo run --release
Now, go to http://localhost:8080 to access your own DawnSearch instance. You will be able to perform searches, but you will not contribute to the network yet. Take a look at Modes to see how you can do so.
For GPU acceleration, build with the cuda feature instead. This requires CUDA to be installed:
RUSTFLAGS='-C target-cpu=native' cargo run --release --features cuda
Note that on an M1/M2 Mac, 'cargo install' does NOT work, but 'cargo build' does.
Feel free to open an issue if you encounter problems!
Configuration
You can configure DawnSearch through DawnSearch.toml or through environment variables like DAWNSEARCH_INDEX_CC.
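As a rough illustration of how a file setting and an environment variable can be combined, here is a minimal sketch. It makes several assumptions that the text above does not confirm: that DAWNSEARCH_INDEX_CC holds a boolean, that the environment variable takes precedence over the file, and which spellings count as true or false. The helper names `parse_bool` and `resolve_bool` are hypothetical; check the DawnSearch source for the actual behavior.

```rust
use std::env;

/// Interpret a few common textual spellings of a boolean (assumed, not DawnSearch's rules).
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" => Some(true),
        "0" | "false" | "no" => Some(false),
        _ => None,
    }
}

/// Resolve a setting: a parseable environment value wins (assumed precedence),
/// otherwise fall back to the value read from the config file.
fn resolve_bool(env_value: Option<&str>, file_value: bool) -> bool {
    env_value.and_then(parse_bool).unwrap_or(file_value)
}

fn main() {
    // Stand-in for a value loaded from DawnSearch.toml.
    let file_value = false;
    let env_value = env::var("DAWNSEARCH_INDEX_CC").ok();
    let index_cc = resolve_bool(env_value.as_deref(), file_value);
    println!("index_cc = {index_cc}");
}
```

Keeping the resolution logic in a pure function that takes the environment value as an argument makes it easy to test without modifying the process environment.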
Documentation
Work in progress!
- DawnSearch Architecture
- Additional information on building DawnSearch
- Data - Location of the data stored by DawnSearch.
- DawnSearch Modes - The different ways you can run DawnSearch.
- Optimizing - Profiling and optimizing.
See also
- How to build a Semantic Search Engine in Rust - Excellent tutorial on how to do semantic search with rust-bert.