Static low-bandwidth search at scale

Overview

Pagefind

Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users' bandwidth as possible.

Pagefind runs after any static site generator and automatically indexes the built static files. Pagefind then outputs a static search bundle to your website, and exposes a JavaScript search API that can be used anywhere on your site.
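As a rough illustration of the JavaScript search API mentioned above, the sketch below runs a query against an already-loaded Pagefind module and collects the first few hits. The helper name `searchSite` and its `limit` parameter are made up for this example, and the bundle path in the trailing comment assumes the default `_pagefind` output directory.

```javascript
// Sketch only: `searchSite` and `limit` are hypothetical names.
// `pagefind` is the module object obtained from the generated bundle.
async function searchSite(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  // Each result is a lightweight handle; data() fetches its content on demand,
  // so only the hits that are actually shown cost any bandwidth.
  const top = search.results.slice(0, limit);
  const loaded = await Promise.all(top.map((result) => result.data()));
  return loaded.map((page) => ({ url: page.url, excerpt: page.excerpt }));
}

// In a browser, assuming the default bundle location:
//   const pagefind = await import("/_pagefind/pagefind.js");
//   const hits = await searchSite(pagefind, "static");
```

Loading result data lazily is what keeps the bandwidth cost low: the index lookup happens first, and page fragments are only fetched for the hits being displayed.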

Pagefind Documentation

Comments
  • HTML entities in search results

    I have a prototype search feature for my website using pagefind, https://dotat.at/search.html. It's nice and whizzy, and it fits in well with my Rust static site generator. Thanks for making pagefind!

    The only significant problem is that HTML entities in page titles are escaped, so my results page displays them like

     2022-04-20 – really divisionless random numbers 
    

    Entities in page bodies are not escaped, so if you search (for example) for nbsp, you get a lot of highlighted spaces in the results. This is probably a bug but it isn't a showstopper for me.

    opened by fanf2 7
  • ReferenceError: url is not defined with Content-Security-Policy enabled

    I'm trying to use pagefind with a Content-Security-Policy enabled and ran into the following error:

    pagefind.js:1 
      Uncaught (in promise) ReferenceError: url is not defined
        at Pagefind.loadWasm (pagefind.js:1:12922)
        at async Promise.all (/blog/index 0)
        at async Pagefind.init (pagefind.js:1:12207)
    loadWasm @ pagefind.js:1
    await in loadWasm (async)
    Pagefind @ pagefind.js:1
    (anonymous) @ pagefind.js:1
    

    Hugo config:

    server:
      headers:
      - for: /**
        values:
          X-Frame-Options: DENY
          X-Content-Type-Options: nosniff
          Referrer-Policy: strict-origin-when-cross-origin
          Permissions-Policy: document-domain=()
          Content-Security-Policy: default-src 'none'; img-src 'self' data:; form-action 'self'; base-uri 'self'; 
              block-all-mixed-content;
              style-src 'unsafe-inline' 'self' https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css https://fonts.googleapis.com/css;
              font-src https://fonts.gstatic.com/s/roboto/;
              script-src 'self' https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js;
              frame-src https://www.youtube-nocookie.com/ https://player.vimeo.com/;
              media-src https://i.ytimg.com https://www.rovid.nl/def/dco/2016/def-dco-20160823-idoa9bivg-web-hd.mp4;
              connect-src ws://localhost:1313/livereload 'self';
    

    The connect-src option of the Content-Security-Policy is set to 'self', which permits the script to connect (without this option you would get a policy error).

    Note: I copied the /public/_pagefind directory into my static Hugo folder for testing.

    This error is also live on: https://pkic.org/blog/

    When running without Content-Security-Policy using the built-in --serve option, the search runs fine:

    hugo; ../pagefind --source ./public/  --serve 
    
    opened by vanbroup 7
  • Support for dark and light mode?

    Love this search so far, really easy to integrate! <3

    But I didn't find anything for the support of more than one color mode?
    Right now it works well for light mode, but the text color is barely readable in dark mode.

    So it would be awesome to have support for dark and light mode :)

    Pagefind UI 
    opened by tohn 6
  • Ignore images within `data-pagefind-ignore`

    Currently there is no way to exclude images from being selected for the image metadata. This is mainly a problem when no image relates to the content, in which case an image from the site design is selected instead.

    I tried to exclude these images by putting them in a data-pagefind-ignore container, expecting them to be ignored, but without success.

    As an alternative, I'm overriding the image metadata with the automatically generated Open Graph image, but this prevents any other image within the content from being selected.

    <meta property="og:image" content="{{- $img.Permalink -}}" data-pagefind-meta="image[content]"/>
    
    opened by vanbroup 6
  • Multiple Site Indexing and Cross-site search.

    We have 3 different domains built with different SSG tools. We would like to be able to search on any of the sites and return the best results from any of the others. It would be nice if there was a way for the client-side JS library to consult more than one index file when finding hits.
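One way to picture the request is a client-side merge over several search modules, one per site. This is a sketch and not an existing Pagefind feature: `searchAllSites` and the `sites` array are hypothetical, and it assumes each result handle exposes a numeric `score` alongside `data()`. Scores from separately built indexes may not be directly comparable, so the ordering here is only approximate.

```javascript
// Sketch only: `searchAllSites` and the shape of `sites` are hypothetical.
// Each entry pairs a label with an already-loaded search module.
async function searchAllSites(sites, query, limit = 10) {
  const perSite = await Promise.all(
    sites.map(async ({ name, pagefind }) => {
      const search = await pagefind.search(query);
      return search.results.map((result) => ({
        site: name,
        score: result.score, // assumed to be a number; higher is better
        load: () => result.data(), // defer fetching until the hit is shown
      }));
    })
  );
  // Flatten the per-site lists and keep the best-scoring hits overall.
  return perSite
    .flat()
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```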

    enhancement Pagefind Search 
    opened by bwklein 5
  • Hyphenated phrases

    I tried searching for hyphenated words and phrases both with and without surrounding quotes — e.g., Go-based and "Go-based" — but this doesn’t work with 0.5.3. The result shows up with zero hits. I don't know whether the ignoring of hyphens is on purpose but, if not, just FYI.

    Pagefind Search Pagefind CLI 
    opened by brycewray 4
  • Wrong output path when using `--bundle-dir` with a relative path

    If one uses pagefind with --bundle-dir ./bundle --source /.../some-site, then ./bundle path is interpreted as relative to the --source, not relative to the current working directory where pagefind was started.

    (This makes me believe that pagefind uses chdir to switch to the --source path, and then all other paths from the CLI that are relative are misinterpreted as relative to the --source.)

    Perhaps a quick solution is to resolve all CLI path arguments into absolute paths before doing that chdir.

    (Fortunately this is not a major blocking point because one can just pass absolute paths as arguments.)
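The suggested fix can be sketched as follows. This only illustrates the idea in JavaScript and is not Pagefind's actual code: the function name `resolveCliPaths` and the option keys `source` and `bundleDir` are hypothetical stand-ins for the --source and --bundle-dir arguments.

```javascript
const path = require("node:path");

// Sketch only: resolve every path-valued option against the directory the
// tool was started from, so a later chdir cannot change what it points at.
function resolveCliPaths(options, startDir = process.cwd()) {
  const resolved = { ...options };
  for (const key of ["source", "bundleDir"]) {
    if (typeof resolved[key] === "string") {
      // path.resolve leaves absolute paths alone and anchors relative ones.
      resolved[key] = path.resolve(startDir, resolved[key]);
    }
  }
  return resolved;
}
```

With this in place, --bundle-dir ./bundle would refer to a directory under the starting directory regardless of where --source points.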

    opened by cipriancraciun 3
  • Words with accented letters

    It appears that words with accented letters aren’t indexed. I searched for “Régis” as both “Régis” and “Regis” — neither returned “Régis.” Same for “Bjørn” — neither “Bjørn” nor “Bjorn” would return “Bjørn.” Just in case it was simply ignoring the accented letters, I also tried “Rgis” and “Bjrn” (respectively), and neither worked.
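For reference, the behaviour being hoped for (both "Régis" and "Regis" finding "Régis") is usually achieved by folding terms to an accent-free form at both index and query time. The sketch below shows one common approach and says nothing about how Pagefind handles this internally; `foldTerm` and the `EXTRA_FOLDS` table are hypothetical. Letters such as "ø" have no Unicode decomposition, so they need an explicit mapping.

```javascript
// Sketch only: letters that NFD normalization does not decompose.
const EXTRA_FOLDS = { "ø": "o", "æ": "ae", "ß": "ss", "ł": "l", "đ": "d" };

function foldTerm(term) {
  return term
    .normalize("NFD") // split base letters from combining marks
    .replace(/\p{M}/gu, "") // drop the combining marks
    .toLowerCase()
    .replace(/[øæßłđ]/g, (ch) => EXTRA_FOLDS[ch]);
}
```

Applying the same function to indexed words and to the query means "Régis" and "Regis" both become "regis", and "Bjørn" and "Bjorn" both become "bjorn".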

    opened by brycewray 3
  • Hide empty filters

    Currently the filter in the search results shows many values with no results, such as Value (0). It would be more user-friendly if those items were hidden automatically:

    image
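The requested behaviour amounts to a small filtering step before the filter list is rendered. The sketch below assumes filter counts arrive as an object of the form { filterName: { value: count } }; that shape and the name `dropEmptyFilterValues` are assumptions made for illustration.

```javascript
// Sketch only: remove filter values whose count is zero, and remove a
// filter entirely when none of its values have results.
function dropEmptyFilterValues(filters) {
  const visible = {};
  for (const [name, values] of Object.entries(filters)) {
    const nonEmpty = Object.entries(values).filter(([, count]) => count > 0);
    if (nonEmpty.length > 0) {
      visible[name] = Object.fromEntries(nonEmpty);
    }
  }
  return visible;
}
```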

    opened by vanbroup 3
  • Internationalization

    Pagefind is currently using the English stemmer — this should be made configurable and ideally auto-detected.

    If an internationalized website in folders is detected, Pagefind should run itself once for each language directory, using that language as the configuration.

    enhancement Pagefind Search Pagefind CLI 
    opened by bglw 3
  • Situations where HTML isn’t well-formed

    This is more of an FYI than a true issue report.

    In testing Pagefind with the Astro SSG, I’ve seen that it can’t index pages with malformed HTML even though browsers can manage to display the HTML normally. For example, Astro sometimes generates pages lacking the wrapping <html></html> (and Pagefind fails to index them, in my testing) even though the HTML does manage to have <body></body>. (Their team is aware of the issue, and I’ve seen similar such problems with Astro-generated HTML over the last few months. They seem to come and go.)

    I’ve had no such problem with Hugo, where the HTML is always correct.

    Haven’t tried Pagefind with Eleventy yet but, in two-and-a-half years of off-and-on use of Eleventy, I never saw it produce malformed HTML, so am guessing Pagefind would be fine with it, too.

    Pagefind CLI 
    opened by brycewray 2
  • Incrementally updating static search bundles?

    I’m really impressed by Pagefind. I’ve been wishing/looking for this kind of functionality for years.

    I’d like to integrate Pagefind into a static site generator (SSG) that I’m working on (it’s not yet public):

    • That SSG has an incremental mode where it detects which input files have changed since the last generation and only re-generates output files affected by those input files.
    • Is it conceivable for Pagefind to support a similar mode?
      • In my case, I’d love to have a Node.js API that I can call and provide with an Array of paths of files to re-index.
      • If a Node.js API is not in the cards, I can live with invoking the CLI via a child process.
    enhancement Pagefind CLI 
    opened by rauschma 2
  • Excluding certain terms from search

    Hi there. Thanks for releasing this amazing project! 🙇 I have a question about more advanced search queries:

    Is it possible to exclude certain terms from a search? E.g. I'd like cats ~"cute dogs" to return all documents that contain the word "cats" but not the phrase "cute dogs".

    If this functionality isn't implemented yet but you'd be happy to have it in the scope of the project, I'm happy to work on it and open a pull request if you could provide me with some guidance on how to best go about this.
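To make the proposal concrete, here is a sketch of how a query such as cats ~"cute dogs" could be split into required and excluded terms. The ~ prefix is the syntax proposed in this issue, not something Pagefind is known to support, and `parseQuery` and `matches` are hypothetical helpers. For brevity, matching uses a plain case-insensitive substring test instead of a real index lookup.

```javascript
// Sketch only: split a query into included and excluded terms.
function parseQuery(query) {
  const include = [];
  const exclude = [];
  // An optional "~", followed by either a quoted phrase or a bare word.
  const token = /(~)?(?:"([^"]*)"|(\S+))/g;
  for (const [, negated, phrase, word] of query.matchAll(token)) {
    const term = phrase !== undefined ? phrase : word;
    if (term === "") continue; // ignore empty quotes
    (negated ? exclude : include).push(term);
  }
  return { include, exclude };
}

// A document matches when it contains every included term and no excluded one.
function matches(text, { include, exclude }) {
  const haystack = text.toLowerCase();
  const has = (term) => haystack.includes(term.toLowerCase());
  return include.every(has) && !exclude.some(has);
}
```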

    opened by c-w 1
  • HTML parse encountered an error: ParsingAmbiguity

    I'm using Docusaurus, a React-based SSR for documentation sites. On my site, I'm getting the following:

    ❯ npx -y pagefind --source build --serve
    
    Running Pagefind v0.8.0 (Extended)
    Running from: "/Users/dprothero/Projects/internal-product-docs"
    Source:       "build"
    Bundle Directory:  "_pagefind"
    
    [Walking source directory]
    Found 337 files matching **/*.{html}
    
    [Parsing files]
    thread 'main' panicked at 'HTML parse encountered an error: ParsingAmbiguity(
        ParsingAmbiguityError {
            on_tag_name: "style",
        },
    )', /Users/runner/work/pagefind/pagefind/pagefind/src/fossick/mod.rs:62:17
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    When I try pagefind on the stock Docusaurus template, it works fine, so this is clearly something with my site, but the error does not provide enough information to diagnose the problem.

    opened by dprothero 4
  • Code block HTML detected and rendered as real HTML

    Did a search today for install Hugo; result #19 brought up a segment of text from https://www.brycewray.com/posts/2022/07/more-tips-using-giscus/ — which, although actually a code block showing HTML, was rendered by Pagefind as real HTML. Here's the original code block:

    <details class="comments">
    	<summary class="ctr pokey">
    		<strong>View/hide comments</strong>
    	</summary>
    	<div class="giscus-comments">
    		<script src="https://giscus.app/client.js"
    			{{/*
    			... and all your specific settings,
    			which we'll skip here to save space
    			*/}}
    		>
    		</script>
    		<p class="ctr pokey sansSerif">
    			Commenting by <strong><a href="https://giscus.app" rel="nofollow">giscus</a></strong>.
    		</p>
    	</div>
    </details>
    

    . . . and the attached image is how it appeared.

    [Image: HTML-from-code-block__2022-09-06-2107CDT]

    I have numerous code blocks within my site, including plenty with HTML, but this is the first time one has shown up in Pagefind results as rendered HTML. 😄

    Not really a huge deal, but just FYI.

    opened by brycewray 1
  • Cannot install package

    I'm building in my package.json like this:

      "scripts": {
        "all": "npm run build && npm run postbuild",
        "build": "npx @11ty/eleventy",
        "postbuild": "npx pagefind --source _site"
      },
    

    and then npm run all --serve

    I'm doing this locally, in dev mode, at the moment, and here's the output:

    > [email protected] all
    > npm run build && npm run postbuild
    
    
    > [email protected] build
    > npx @11ty/eleventy
    
    Writing _site/... etc etc
    Copied 841 files / Wrote 72 files in 25.28 seconds (351.1ms each, v0.12.1)
    
    > [email protected] postbuild
    > npx pagefind --source _site
    
    Need to install the following packages:
      pagefind
    Ok to proceed? (y) y
    
    > [email protected] postbuild
    > npx pagefind --source _site
    

    And it stops like this. I do not see any Pagefind output about files being indexed or a bundle being created, and there is no /_pagefind/ directory in my _site.

    I do have the Pagefind search UI in my search page:

              <div id="search"></div>
              <script>
                  window.addEventListener('DOMContentLoaded', (event) => {
                      new PagefindUI({
                          element: "#search",
                          showImages: false,
                      });
                  });
              </script>
    

    I deliberately omitted <link href="/_pagefind/pagefind-ui.css" rel="stylesheet">, while <script src="/_pagefind/pagefind-ui.js"></script> is in my 11ty _includes layout along with other JS that works fine.

    I've also tried running npx -y pagefind --source _site --serve separately in Windows PowerShell, but with no success.

    What am I missing? Thanks!

    opened by rawriddims 11
Releases(v0.8.1)
Owner
CloudCannon
The Cloud CMS for Jamstack sites