The open source distributed web search engine that searches by meaning.

Overview

DawnSearch

Build Status Crates.io Crates.io License

DawnSearch is an open source distributed web search engine that searches by meaning. It uses semantic search (searching on meaning), using all-MiniLM-L6-v2. It uses USearch for vector search. It can index the Common Crawl data. DawnSearch is written in Rust.

A public instance is available at dawnsearch.org.

Project Status

DawnSearch currently functions as a distributed (semantic) vector search. When you start an instance, it will register with the tracker. The instance can then participate in the network by searching. Optionally, it can index the common crawl dataset and answer queries.

Main items still to do:

  1. Better error handling. There still is a lot of .unwrap() in the code.
  2. Robustness against malfunctioning or malicious instances.
  3. Packet encryption.
  4. Increase search efficiency by distributing indexed pages to instances that are semantically close to the content.

Help needed!

DawnSearch is looking for:

  1. People to use the search on dawnsearch.org and give feedback on the useability and quality of results.
  2. People with Rust experience to take a look at the codebase and give tips on making it easier to read and more ideomatic Rust.
  3. A UI/UX designer to create designs for the main and search results pages.
  4. Rust developers who can tackle some of the problems mentioned under 'Project Status'
  5. People who want to run their own instance.

Please open issues for any questions or feedback. If you want to contribute something big, like a feature or a refactor, open an issue before you start so you don't do duplicate work!

Quick start

This will build and run an 'access terminal' DawnSearch instance on a recent Ubuntu, without GPU acceleration. See Modes for examples of other configurations.

sudo apt-get update && sudo apt-get install -y build-essential pkg-config

# Install rust if you don't have it already:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

mv DawnSearch.toml.example DawnSearch.toml
RUSTFLAGS='-C target-cpu=native'  cargo run --release

Now, go to http://localhost:8080 to access your own DawnSearch instance. You will be able to perform searches, but you will not contribute to the network yet. Take a look at Modes to see how you can do so.

If you want to upgrade to GPU acceleration try this. You need to have CUDA installed:

RUSTFLAGS='-C target-cpu=native'  cargo run --release --features cuda

Note that on an M1/M2 Mac, 'cargo install' does NOT work. 'cargo build' does though!

Feel free to open an issue if you encounter problems!

Configuration

You can configure DawnSearch through DawnSearch.toml or through environment variables like DAWNSEARCH_INDEX_CC.

Documentation

Work in progress!

See also

You might also like...
Rust crate for configurable parallel web crawling, designed to crawl for content

url-crawler A configurable parallel web crawler, designed to crawl a website for content. Changelog Docs.rs Example extern crate url_crawler; use std:

 Crusty - polite && scalable broad web crawler
Crusty - polite && scalable broad web crawler

Broad web crawling is an activity of going through practically boundless web by starting from a set of locations(urls) and following outgoing links. Usually it doesn't matter where you start from as long as it has outgoing links to external domains.

A multiplayer web based roguelike built on Rust and WebRTC
A multiplayer web based roguelike built on Rust and WebRTC

Gorgon A multiplayer web-based roguelike build on Rust and WebRTC. License This project is licensed under either of Apache License, Version 2.0, (LICE

Getting the token's holder info and pushing to a web server.

Purpose of this program I've made this web scraper so you can use it to get the holder's amount from BSCscan and it will upload for you in JSON format

Multithreaded Web Server Made with Rust

Multithreaded Web Server Made with Rust The server listens for TCP connections at address 127.0.0.1:7878. Several pages can be accessed: 127.0.0.1:787

An efficient web server for TiddlyWikis.

Tiddlywiki Server This is a web backend for TiddlyWiki. It uses TiddlyWiki's web server API to save tiddlers in a [SQLite database]. It should come wi

A simple web server(and library) to display server stats over HTTP and Websockets/SSE or stream it to other systems.

x-server-stats A simple web server(and library) to display server stats over HTTP and Websockets/SSE or stream it to other systems. x-server(in x-serv

open source training courses about distributed database and distributed systemes
open source training courses about distributed database and distributed systemes

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine
MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box. For more information about features go to our documentation.

H2O Open Source Kubernetes operator and a command-line tool to ease deployment (and undeployment) of H2O open-source machine learning platform H2O-3 to Kubernetes.
H2O Open Source Kubernetes operator and a command-line tool to ease deployment (and undeployment) of H2O open-source machine learning platform H2O-3 to Kubernetes.

H2O Kubernetes Repository with official tools to aid the deployment of H2O Machine Learning platform to Kubernetes. There are two essential tools to b

Elemental System Designs is an open source project to document system architecture design of popular apps and open source projects that we want to study
Elemental System Designs is an open source project to document system architecture design of popular apps and open source projects that we want to study

Elemental System Designs is an open source project to document system architecture design of popular apps and open source projects that we want to study

Blueboat is an open-source alternative to Cloudflare Workers. The monolithic engine for serverless web apps.

Blueboat Blueboat is an open-source alternative to Cloudflare Workers. Blueboat aims to be a developer-friendly, multi-tenant platform for serverless

Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.

Spacedrive A file explorer from the future. spacedrive.com » Download for macOS · Windows · Linux · iOS · watchOS · Android ~ Links will be added once

ripgrep recursively searches directories for a regex pattern while respecting your gitignore
ripgrep recursively searches directories for a regex pattern while respecting your gitignore

ripgrep (rg) ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep will resp

ripgrep recursively searches directories for a regex pattern while respecting your gitignore
ripgrep recursively searches directories for a regex pattern while respecting your gitignore

ripgrep (rg) ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep will resp

Rust implementation of multi-index hashing for neighbor searches on binary codes in the Hamming space

mih-rs Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper Norouzi, Pu

🌊 ~ seaward is a crawler which searches for links or a specified word in a website.
🌊 ~ seaward is a crawler which searches for links or a specified word in a website.

🌊 seaward Installation cargo install seaward On NetBSD a pre-compiled binary is available from the official repositories. To install it, simply run:

Shogun search - Learning the principle of search engine. This is the first time I've written Rust.

shogun_search Learning the principle of search engine. This is the first time I've written Rust. A search engine written in Rust. Current Features: Bu

Filen.io is a cloud storage provider with an open-source desktop client.

Library to call Filen.io API from Rust Filen.io is a cloud storage provider with an open-source desktop client. My goal is to write a library which ca

Konstantin Zakharov 5 Nov 15, 2022
SpringQL: Open-source stream processor for IoT devices and in-vehicle computers

What is SpringQL? SpringQL is an open-source stream processor specialized in memory efficiency. It is supposed to run on embedded systems like IoT dev

SpringQL 25 Dec 26, 2022
Lightweight p2p library. Support build robust stable connection on p2p/distributed network.

Chamomile Build a robust stable connection on p2p network features Support build a robust stable connection between two peers on the p2p network. Supp

CympleTech 94 Jan 6, 2023
Bioyino is a distributed statsd-protocol server with carbon backend.

Bioyino The StatsD server written in Rust Description Bioyino is a distributed statsd-protocol server with carbon backend. Features all basic metric t

avito.tech 206 Dec 13, 2022
Open Internet Service to store transaction history for NFTs/Tokens on the Internet Computer

CAP - Certified Asset Provenance Transaction history & asset provenance for NFT’s & Tokens on the Internet Computer CAP is an open internet service pr

Psychedelic 42 Nov 10, 2022
Cover is an open internet service for canister code verification on the Internet Computer

Cover Cover (short for Code Verification) is an open internet service that helps verify the code of canisters on the Internet Computer. Visit our webs

Psychedelic 14 Oct 31, 2022
A open port scanner.

opscan A open port scanner. Install With cargo cargo install --force opscan With docker docker run --rm -it sigoden/opscan opscan.nmap.org Binaries

null 17 Feb 19, 2023
Tiny CLI application in rust to scan ports from a given IP and find how many are open. You can also pass the amount of threads for that scan

Port Scanner A simple multi-threaded port scanner written in Rust. Usage Run the port scanner by providing the target IP address and optional flags. $

nicolas lopes 4 Aug 29, 2023
A BitTorrent V1 engine library for Rust (and currently Linux)

cratetorrent Cratetorrent is a Rust crate implementing the BitTorrent version 1 protocol. It can be used as a library and also provides a simple examp

null 401 Dec 28, 2022
Afterglow-Server provides back-end APIs for the Afterglow workflow engine.

Afterglow-Server ?? Note: This project is still heavily in development and is at an early stage. Afterglow-Server provides back-end APIs for the After

梦歆 0 Jun 2, 2022