🔍TinySearch is a lightweight, fast, full-text search engine. It is designed for static websites.

Overview

tinysearch

TinySearch is written in Rust, and then compiled to WebAssembly to run in the browser.
It can be used together with static site generators such as Jekyll, Hugo, Zola, Cobalt, or Pelican.

Demo

How it works

tinysearch is a Rust/WASM port of the Python code from the article "Writing a full-text search engine using Bloom filters". It can be seen as an alternative to lunr.js and elasticlunr, which are quite heavy for smaller websites and require a lot of JavaScript.

The idea of tinysearch is to generate a small, self-contained WASM module from a list of articles on your website and run it directly on the frontend inside browsers.
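To make the idea concrete, here is a minimal Python sketch of per-article Bloom filter search in the spirit of the article mentioned above. It is an illustration only, not tinysearch's actual Rust code: the class name, the filter size, the number of hashes, and the use of SHA-256 are all assumptions chosen for readability, and the shipped engine may use a different filter type.

```python
import hashlib
import re


class BloomFilter:
    """A fixed-size Bloom filter backed by a Python integer used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Both parameters are illustrative defaults, not tinysearch settings.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, word):
        # Derive num_hashes bit positions by salting a SHA-256 hash of the word.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def __contains__(self, word):
        # True means "probably present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))


def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return re.findall(r"\w+", text.lower())


def build_index(articles):
    """Map each article title to a Bloom filter holding that article's words."""
    index = {}
    for title, body in articles.items():
        bloom = BloomFilter()
        for word in tokenize(title + " " + body):
            bloom.add(word)
        index[title] = bloom
    return index


def search(index, query):
    """Return titles whose filter probably contains every word of the query."""
    words = tokenize(query)
    return [title for title, bloom in index.items() if all(w in bloom for w in words)]


articles = {
    "Rust and WebAssembly": "Compiling Rust code to WASM for the browser",
    "Static site generators": "Hugo, Zola and Jekyll build static websites",
}
index = build_index(articles)
print(search(index, "rust wasm"))
```

Because only whole words are hashed into each filter, a partial word will normally not match, and a filter can occasionally report a word that was never added (a false positive). Larger filters lower that rate at the cost of size.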

Users

Limitations

  • Only searches for entire words. There are no search suggestions (yet).
  • Since we bundle all search indices for all articles into one static binary, we recommend using it only for small- to medium-sized websites. Expect around 4 kB (uncompressed) per article.
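As a rough back-of-the-envelope check, the per-article figure can be turned into a size estimate. The helper below is a hypothetical illustration, not part of tinysearch, and the real size depends on how much text each article contains.

```python
def estimated_index_size_kb(num_articles, kb_per_article=4):
    """Rough uncompressed index size, using the ~4 kB per article figure."""
    return num_articles * kb_per_article


for n in (50, 500, 5000):
    print(f"{n} articles -> about {estimated_index_size_kb(n)} kB uncompressed")
```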

Installation

wasm-pack is required to build the WASM module. Install it with

cargo install wasm-pack

To optimize the JavaScript output, you'll also need terser:

npm install terser -g

If you want to make the WebAssembly as small as possible, we recommend installing binaryen as well. On macOS you can install it with Homebrew:

brew install binaryen

Alternatively, you can download the binary from the release page or use your OS package manager.

After that, you can install tinysearch itself:

cargo install tinysearch

Usage

As an input, we require a JSON file, which contains the content to index. Check out this example file.

tinysearch fixtures/index.json

ℹ️ You can take a look at the code examples for different static site generators here.
ℹ️ The body field in the JSON document is optional and can be skipped to just index post titles.

This will create a WASM module and the JavaScript glue code to integrate it into your homepage. You can open demo.html through any web server to see the result.

For example, Python has a built-in webserver for testing:

python3 -m http.server

Then browse to http://0.0.0.0:8000/demo.html to see the result.
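If your site generator has no ready-made template, the JSON input described above could also be produced by a small script. The sketch below is an assumption-laden illustration: the body field is mentioned above, while the "title" and "url" field names and the top-level array layout are assumptions that should be checked against the example file.

```python
import json


def build_index_entries(posts, include_body=True):
    """Convert post dicts into the list-of-objects shape assumed for the index file."""
    entries = []
    for post in posts:
        # "title" and "url" are assumed field names; verify them against the example file.
        entry = {"title": post["title"], "url": post["url"]}
        if include_body:
            # The body is optional; leaving it out indexes post titles only.
            entry["body"] = post.get("body", "")
        entries.append(entry)
    return entries


# Hypothetical posts used purely for illustration.
posts = [
    {"title": "Hello world", "url": "https://example.com/hello/", "body": "My first post"},
    {"title": "About", "url": "https://example.com/about/", "body": "Who writes this blog"},
]

index_json = json.dumps(build_index_entries(posts), indent=2)
print(index_json)
```

Writing `index_json` to a file and passing that file to tinysearch would then follow the same steps as with the fixture.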

For advanced usage options, try

tinysearch --help

Please check what's required to host WebAssembly in production; you will need to explicitly set gzip MIME types.
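For local experiments, the MIME type part can be sketched with Python's standard library. This is only a testing aid under stated assumptions: the class name `WasmHandler` and the `serve` helper are made up for this example, gzip handling is not covered, and a production server such as nginx or Apache needs its own configuration.

```python
import functools
import http.server


class WasmHandler(http.server.SimpleHTTPRequestHandler):
    # Copy the default map and make sure .wasm files get the WebAssembly MIME type.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }


def serve(directory=".", port=8000):
    """Serve `directory` on localhost until interrupted (this call blocks)."""
    handler = functools.partial(WasmHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `serve()` from the directory that contains demo.html starts the server. Browsers generally expect `application/wasm` when a module is loaded with streaming compilation, which is why the mapping is set explicitly here.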

Docker

If you don't have a full Rust setup, you can also use our nightly-built Docker images.

Build

Available build args:

Demo

wget https://raw.githubusercontent.com/tinysearch/tinysearch/master/fixtures/index.json
docker run -v $PWD:/tmp tinysearch/cli index.json

Custom repo/branch build

docker build --build-arg WASM_BRANCH=master --build-arg TINY_MAGIC=64 -t tinysearch/cli .

By default, the most recent stable Alpine Rust image is used. To build with nightly instead, run

docker build --build-arg RUST_IMAGE=rustlang/rust:nightly-alpine -t tinysearch/cli:nightly .

Maintainers

  • Matthias Endler (@mre)
  • Jorge-Luis Betancourt (@jorgelbg)
  • Mad Mike (@fluential)

License

tinysearch is licensed under either of

at your option.

Comments
  • Error: NotEnoughSpace

    Error

    ./tinysearch public/index.json
    Error: NotEnoughSpace
    

    Index file

     ls -lha public/index.json
    -rw-r--r-- 1 q q 6,7M Jun 11 22:46 public/index.json
    

    strace results

    strace ./tinysearch public/index.json -p /tmp/
    
    ...
    
    getrandom("", 0, GRND_NONBLOCK)         = 0
    getrandom("\x1d\xee\x9e\xa5\x0c\x2d\xf4\x99\x9c\x9a\xe5\xb8\xc4\x52\x05\xb7\x4c\x3c\x2a\x79\xe4\xee\x5f\x24\xaf\x1e\xcf\x62\xb8\x49\x35\x97", 32, GRND_NONBLOCK) = 32
    write(2, "Error: ", 7Error: )                  = 7
    write(2, "NotEnoughSpace", 14NotEnoughSpace)          = 14
    write(2, "\n", 1
    )                       = 1
    sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
    munmap(0x7f7fffdd5000, 12288)           = 0
    exit_group(1)                           = ?
    +++ exited with 1 +++
    
    bug 
    opened by Lusitaniae 45
  • Is there a way to return the page description or body in the results?

    How difficult would it be to also return the page description in the results?

    I have been integrating tinysearch into the abridge Zola theme.

    Here is a demo using tinysearch: https://jieiku.github.io/abridge-tinysearch/

    Here is the normal demo, which uses elasticlunr (Zola's default): https://abridge.netlify.app/

    abridge tinysearch branch: https://github.com/Jieiku/abridge/tree/tinysearch

    Here is how elasticlunr looks:

    abridge-elasticlunr

    I have modified the tinysearch js example to more tightly integrate with abridge, so it looks VERY similar, but I don't have a description to go along with the title:

    abridge-tinysearch

    opened by Jieiku 19
  • Fix copying engine files (based on #153)

    Might be a fix for https://github.com/tinysearch/tinysearch/issues/152, https://github.com/tinysearch/tinysearch/issues/151, https://github.com/tinysearch/tinysearch/issues/147. Would be delighted if someone could test this.

    opened by mre 9
  • Search results seems a bit random

    I'm evaluating tinysearch for my Hugo blog. I built the JSON by slightly tweaking the given example and compiled tinysearch from git.

    I installed it here for now so you can try it. The index is here if you want to find something in here :-)

    • It finds results for words that don't exist at all (try "xugn.ui").
    • It finds pages without the term (try "morse").
    • It can't find pages containing the term (try "midsommar").

    Is the index not built correctly, is this a bug, or is it something else?

    opened by lord-re 9
  • Error: Engine directory could not be created

    Trying to run tinysearch on Windows 11 (with a Hugo blog):

    > tinysearch.exe .\public\index.json
    Unpacking tinysearch WASM engine into temporary directory "C:\\Users\\stop_\\AppData\\Local\\Temp\\.tmpZx9gHh"
    Starting unpack
    Name: \\?\C:\Users\stop_\(redacted)\blog\.git
    Name: \\?\C:\Users\stop_\(redacted)\blog\.gitmodules
    Name: \\?\C:\Users\stop_\(redacted)\blog\.hugo_build.lock
    Name: \\?\C:\Users\stop_\(redacted)\blog\archetypes
    Name: \\?\C:\Users\stop_\(redacted)\blog\config.toml
    Name: \\?\C:\Users\stop_\(redacted)\blog\content
    Name: \\?\C:\Users\stop_\(redacted)\blog\data
    Name: \\?\C:\Users\stop_\(redacted)\blog\go.mod
    Name: \\?\C:\Users\stop_\(redacted)\blog\go.sum
    Name: \\?\C:\Users\stop_\(redacted)\blog\layouts
    Name: \\?\C:\Users\stop_\(redacted)\blog\public
    Name: \\?\C:\Users\stop_\(redacted)\blog\resources
    Name: \\?\C:\Users\stop_\(redacted)\blog\static
    Name: \\?\C:\Users\stop_\(redacted)\blog\storage
    Name: \\?\C:\Users\stop_\(redacted)\blog\themes
    Error: Engine directory could not be created at C:\Users\stop_\AppData\Local\Temp\.tmpZx9gHh\engine
    

    This is quite an incomplete error message, in my opinion...

    opened by dertuxmalwieder 8
  • building docker image gives error from wasm-pack in step 19/28

    Hello,

    I am trying to test the Docker image, but when running docker build --build-arg RUST_IMAGE=rustlang/rust:nightly-alpine -t tinysearch/cli:nightly . the build stops at step 19 of 28 with this error:

    Step 19/28 : RUN wasm-pack --version
     ---> Running in d0787575a530
    The command '/bin/sh -c wasm-pack --version' returned a non-zero code: 139
    

    I tried other Docker builds, such as docker build --build-arg WASM_BRANCH=master --build-arg TINY_MAGIC=64 -t tinysearch/cli . and other solutions from this thread: https://github.com/tinysearch/tinysearch/issues/111

    But it always leads to this error.

    Does anyone know why this happens?

    Thanks for the help.

    opened by suppadeliux 5
  • Filters no longer statically built into the binary, are dynamically loaded in JS

    Just a proof of concept for now - should solve the compile times issue since the wasm can be compiled once and then filters generated separately. To test, build the project as normal then overwrite the .wasm files by running (in the engine dir):

    tinysearch/engine $ wasm-pack build --target web --release --out-dir ../

    Next steps:

    1. Separate out the wasm from the filter generation, publish to npm
    2. Build a cli for generating the filters

    Let me know if you think it's worth continuing

    Cheers,

    P

    opened by petertrotman 5
  • Improve compile times

    At the moment, it takes ages to generate the Wasm binary. That's because we download an entire crate and compile it whenever we run tinysearch corpus.json. Maybe someone has an idea on how to improve that?

    enhancement help wanted good first issue hacktoberfest 
    opened by mre 5
  • change default output folder to ./wasm_output instead of local dir

    When I was running tinysearch locally and ran cargo run fixtures/index.json, it was wiping my local repository and replacing it with the wasm output.

    The above PR changes the default directory when no output directory is specified to a "wasm_output" dir created in the local directory.

    I'm curious about how the current tinysearch cargo build would work in another web project and if it would delete all the files and replace them with the wasm output.

    Also please excuse if there's anything wrong with my code, I'm new to Rust.

    opened by lspdrz 4
  • Please tag release 0.7.0

    Hi! Can you please add the tag v0.7.0 to point to https://github.com/tinysearch/tinysearch/commit/5d98c355e4f6a0e8c7f67e5764cd0322883ecdef ?

    This would enable bumping the Homebrew formula to the latest version! Thank you!

    opened by ms-ati 3
  • Try Xor filter instead of Cuckoo filter

    For some use-cases, Xor filters are supposed to be faster and smaller than Bloom and Cuckoo filters. Recently, support for strings was added to the xorfilter Rust crate: https://github.com/bnclabs/xorfilter/blob/master/tests/xorfilter.rs#L56 We should test that to see if they make our wasm binary smaller.

    If someone wants to give this a shot, please write a short comment here. I'd be happy to provide mentoring if needed.

    enhancement help wanted good first issue 
    opened by mre 3
  • For Zola sites, the tinysearch json index gets included in the sitemap.xml file.

    https://endler.dev/sitemap.xml

    2022-10-21_11-59-02

    I found a report that suggests not rendering the section will resolve the issue, but that's not the case for me: https://github.com/getzola/zola/issues/604

    There is currently a pull request to allow more formats, maybe then we could output/use json instead of html? https://github.com/getzola/zola/pull/1998

    opened by Jieiku 0
  • Failing to find Cargo.toml in temp directory?

    After struggling mightily with tinysearch installed via Homebrew and direct downloads, this one was installed via cargo install tinysearch, cargo install wasm-pack

    Help please, I am still stuck :(

    Why is tinysearch's invocation of wasm-pack looking for a Cargo.toml file in the temp directory it created?

    tinysearch --optimize --path static public/data_tinysearch/index.html
    Unpacking tinysearch WASM engine into temporary directory "/var/folders/nx/7ys_xr014yg41_mw1zn_wvb00000gn/T/.tmpMxhsOO"
    Starting unpack
    Copying index into crate
    Compiling WASM module using wasm-pack
    Error: crate directory is missing a `Cargo.toml` file; is `/var/folders/nx/7ys_xr014yg41_mw1zn_wvb00000gn/T/.tmpMxhsOO/engine` the wrong directory?
    Error: failed to execute "wasm-pack" "build" "/var/folders/nx/7ys_xr014yg41_mw1zn_wvb00000gn/T/.tmpMxhsOO/engine" "--target" "web" "--release" "--out-dir" "/Users/name/blog/static"
    status: exit status: 1
    
    opened by ms-ati 3
  • Does tinysearch support stemming, stopwords, and CJK?

    Does tinysearch support CJK languages? (Chinese, Japanese, and Korean)

    Also what about stemmers and stopwords?

    I am interested in using tinysearch for Zola. I proposed it here: https://github.com/getzola/zola/issues/1849

    It was mentioned that there may not be specific stemmers/stopword lists for languages other than English?

    I was reading over https://endler.dev/2019/tinysearch/ and saw that removing stopwords is mentioned, but it was not clear whether this was something you did manually or something that tinysearch supports when it builds the index.

    opened by Jieiku 1
  • Update Cargo build for tinysearch

    The tinysearch fixtures/index.json command in the "Usage" section of the readme isn't working. I think this is because the Cargo build for tinysearch hasn't been changed since the fix in this PR: https://github.com/tinysearch/tinysearch/pull/154.

    When I ran the local project using cargo run fixtures/index.json, the generation worked as expected.

    opened by lspdrz 0
  • Error: failed to execute "wasm-pack" "build"

    When trying to build nightly with docker build --build-arg RUST_IMAGE=rustlang/rust:nightly-alpine -t tinysearch/cli:nightly . I'm getting the following error:

    thread 'main' panicked at 'crate directory should exist', src/readme.rs:11:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    Well, this is embarrassing.
    
    wasm-pack had a problem and crashed. To help us diagnose the problem you can send us a crash report.
    
    
    We have generated a report file at "/tmp/report-2ec64b30-c702-4581-a790-5d71fb2f66fe.toml". Submit an issue or email with the subject of "wasm-pack Crash Report" and include the report as an attachment.
    
    - Authors: Ashley Williams <[email protected]>, Jesper Håkansson <[email protected]>
    
    We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.
    
    Thank you kindly!
    Error: failed to execute "wasm-pack" "build" "/tmp/.tmpzklwBA/engine" "--target" "web" "--release" "--out-dir" "/tmp"
    status: exit status: 101
    

    In spite of the message, no report file was generated at tmp/report-2ec64b30-c702-4581-a790-5d71fb2f66fe.toml

    opened by expilo 7
  • Too many false positives

    I guess I have pushed the capabilities of tinysearch to the limit. The site I'm indexing is by no means tiny: it has over 1700 pages, albeit with very little text. The compiled WASM binary is 343k. Amazingly, it still works very fast, though for some queries I'm getting a lot of false positives, over 50%. I was wondering if it might be possible to make it more precise at the cost of a larger binary. I could live with a binary of about 1 MB, as this site definitely does not have to load in under 3s. I was using the Docker image to compile the binary.

    docker run -v $PWD:/tmp tinysearch/cli index.json

    opened by expilo 4
Releases(v0.7.0)
Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

null 294 Dec 23, 2022
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Tantivy is a full text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

tantivy 7.4k Dec 28, 2022
A full-text search engine in rust

Toshi A Full-Text Search Engine in Rust Please note that this is far from production ready, also Toshi is still under active development, I'm just slo

Toshi Search 3.8k Jan 7, 2023
Tantivy is a full text search engine library written in Rust.

Tantivy is a full text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

Quickwit OSS 7.4k Dec 30, 2022
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Tantivy is a full-text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

Quickwit OSS 7.5k Jan 9, 2023
A full-text search and indexing server written in Rust.

Bayard Bayard is a full-text search and indexing server written in Rust built on top of Tantivy that implements Raft Consensus Algorithm and gRPC. Ach

Bayard Search 1.8k Dec 26, 2022
Shogun search - Learning the principle of search engine. This is the first time I've written Rust.

shogun_search Learning the principle of search engine. This is the first time I've written Rust. A search engine written in Rust. Current Features: Bu

Yuxiang Liu 5 Mar 9, 2022
🔎 Impossibly fast web search, made for static sites.

Stork Impossibly fast web search, made for static sites. Stork is two things. First, it's an indexer: it indexes your loosely-structured content and c

James Little 2.5k Dec 27, 2022
A simple and lightweight fuzzy search engine that works in memory, searching for similar strings (a pun here).

simsearch A simple and lightweight fuzzy search engine that works in memory, searching for similar strings (a pun here). Documentation Usage Add the f

Andy Lok 116 Dec 10, 2022
weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.

weggli Introduction weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify int

Google Project Zero 2k Jan 5, 2023
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

Valerian Saliou 17.4k Jan 2, 2023
Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine

MeiliSearch Website | Roadmap | Blog | LinkedIn | Twitter | Documentation | FAQ ⚡ Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine ?? M

MeiliSearch 31.6k Dec 31, 2022
⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable deployment of the tantivy search engine you never knew you wanted. Standing on the shoulders of giants.

✨ Feature Rich | ⚡ Insanely Fast An ultra-fast, adaptable deployment of the tantivy search engine via REST. Standing On The Shoulders of Giants lnx

lnx 679 Jan 1, 2023
⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable deployment of the tantivy search engine you never knew you wanted. Standing on the shoulders of giants.

✨ Feature Rich | ⚡ Insanely Fast An ultra-fast, adaptable deployment of the tantivy search engine via REST. Standing On The Shoulders of Giants lnx

lnx 0 Apr 25, 2022
High-performance log search engine.

NOTE: This project is under development, please do not depend on it yet as things may break. MinSQL MinSQL is a log search engine designed with simpli

High Performance, Kubernetes Native Object Storage 359 Nov 27, 2022
Perlin: An Efficient and Ergonomic Document Search-Engine

Table of Contents 1. Perlin Perlin Perlin is a free and open-source document search engine library build on top of perlin-core. Since the first releas

CurrySoftware GmbH 70 Dec 9, 2022
AI-powered search engine for Rust

txtai: AI-powered search engine for Rust txtai executes machine-learning workflows to transform data and build AI-powered text indices to perform simi

NeuML 69 Jan 2, 2023
Cross-platform, cross-browser, cross-search-engine duckduckgo-like bangs

localbang Cross-platform, cross-browser, cross-search-engine duckduckgo-like bangs What are "bangs"?? Bangs are a way to define where to search inside

Jakob Kruse 7 Nov 23, 2022
A Rust API search engine

Roogle Roogle is a Rust API search engine, which allows you to search functions by names and type signatures. Progress Available Queries Function quer

Roogle 342 Dec 26, 2022