The open source distributed web search engine that searches by meaning.

Overview

DawnSearch


DawnSearch is an open source distributed web search engine that searches by meaning (semantic search). It embeds text with the all-MiniLM-L6-v2 model, uses USearch for vector search, and can index the Common Crawl data. DawnSearch is written in Rust.

A public instance is available at dawnsearch.org.

Project Status

DawnSearch currently functions as a distributed (semantic) vector search. When you start an instance, it registers with the tracker. The instance can then participate in the network by searching. Optionally, it can index the Common Crawl dataset and answer queries.
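
To illustrate what "vector search" means here, the following is a minimal sketch and not DawnSearch's actual code. It assumes documents and queries have already been turned into embedding vectors, and it ranks documents by cosine similarity with a plain linear scan. The function names `cosine_similarity` and `top_k` and the tiny 3-dimensional vectors are made up for the example; a real instance would use all-MiniLM-L6-v2 embeddings and an approximate index such as USearch instead of a linear scan.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero length (norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Indices of the `k` documents most similar to `query`, best match first.
/// This is a brute-force scan, used here only for illustration.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Toy "embeddings"; real ones would come from the embedding model.
    let docs = vec![
        vec![0.9, 0.1, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.8, 0.2, 0.1],
    ];
    let query = [1.0, 0.0, 0.0];
    println!("{:?}", top_k(&query, &docs, 2));
}
```

In a distributed setting, each instance would run a search like this over its own part of the index and the results would be merged; how DawnSearch does that merging is not covered by this sketch.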

Main items still to do:

  1. Better error handling. There are still many .unwrap() calls in the code.
  2. Robustness against malfunctioning or malicious instances.
  3. Packet encryption.
  4. Increase search efficiency by distributing indexed pages to instances that are semantically close to the content.

Help needed!

DawnSearch is looking for:

  1. People to use the search on dawnsearch.org and give feedback on the usability and quality of results.
  2. People with Rust experience to take a look at the codebase and give tips on making it easier to read and more idiomatic Rust.
  3. A UI/UX designer to create designs for the main and search results pages.
  4. Rust developers who can tackle some of the problems mentioned under 'Project Status'.
  5. People who want to run their own instance.

Please open issues for any questions or feedback. If you want to contribute something big, like a feature or a refactor, open an issue before you start so you don't do duplicate work!

Quick start

This will build and run an 'access terminal' DawnSearch instance on a recent Ubuntu, without GPU acceleration. See Modes for examples of other configurations.

sudo apt-get update && sudo apt-get install -y build-essential pkg-config

# Install rust if you don't have it already:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

mv DawnSearch.toml.example DawnSearch.toml
RUSTFLAGS='-C target-cpu=native' cargo run --release

Now, go to http://localhost:8080 to access your own DawnSearch instance. You will be able to perform searches, but you will not contribute to the network yet. Take a look at Modes to see how you can do so.

To enable GPU acceleration, build with the cuda feature. This requires CUDA to be installed:

RUSTFLAGS='-C target-cpu=native' cargo run --release --features cuda

Note that on an M1/M2 Mac, 'cargo install' does NOT work. 'cargo build' does though!

Feel free to open an issue if you encounter problems!

Configuration

You can configure DawnSearch through DawnSearch.toml or through environment variables like DAWNSEARCH_INDEX_CC.
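
As a rough sketch of how a file setting and an environment variable can be combined, the example below resolves one boolean option. It is not taken from the DawnSearch codebase: the helper names `parse_bool` and `resolve_flag`, the accepted spellings of true/false, and the rule that the environment variable wins over the file value are all assumptions made for illustration. Only the variable name DAWNSEARCH_INDEX_CC comes from this README.

```rust
use std::env;

/// Parse a boolean from a string. The accepted spellings are an assumption.
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" => Some(true),
        "0" | "false" | "no" => Some(false),
        _ => None,
    }
}

/// Use the environment value if it is set and parses; otherwise fall back
/// to the value read from the config file. (Assumed precedence.)
fn resolve_flag(env_value: Option<&str>, file_value: bool) -> bool {
    env_value.and_then(parse_bool).unwrap_or(file_value)
}

fn main() {
    // Stand-in for a value loaded from DawnSearch.toml (hypothetical).
    let file_value = false;
    let env_value = env::var("DAWNSEARCH_INDEX_CC").ok();
    let index_cc = resolve_flag(env_value.as_deref(), file_value);
    println!("index_cc = {index_cc}");
}
```

Under these assumptions, running the program with `DAWNSEARCH_INDEX_CC=true` set would enable the option even though the file value is false. Check DawnSearch.toml.example for the real option names and accepted values.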

Documentation

Work in progress!
