How do you index compressed HTML files?
I have a large static site with around 100k HTML files. My static site generator outputs gzip-compressed HTML files, which are then uploaded to S3. The compression saves a ton of money each month on hosting costs, and every modern browser can transparently decompress gzipped HTML as long as it's served with a Content-Encoding: gzip header, so it's a necessary publication format for me.
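For context, the deploy step looks roughly like the following sketch (the bucket name is a placeholder); the --content-encoding metadata is what lets browsers decompress the pages transparently:

# Sketch of the upload step; bucket name is hypothetical
aws s3 cp build/ s3://my-bucket/ --recursive \
  --exclude '*' --include '*.html' \
  --content-encoding gzip --content-type text/html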
However, I'm walking through your getting started docs, and npx pagefind fails when it tries to walk my build directory:
Running Pagefind v0.9.1 (Extended)
Running from: "/var/project"
Source: "build"
Bundle Directory: "_pagefind"
[Walking source directory]
Found 90544 files matching **/*.{html}
[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.
90544 pages found without an <html> element.
Pages without an outer <html> element will not be processed by default.
If adding this element is not possible, use the root selector config to target a different root element.
[Reading languages]
Discovered 0 languages:
[Building search indexes]
Total:
Indexed 0 languages
Indexed 0 pages
Indexed 0 words
Indexed 0 filters
Indexed 0 sorts
Error: Pagefind wasn't able to build an index.
Most likely, the directory passed to Pagefind was empty or did not contain any html files.
So it looks like it's seeing my files but not finding any <html> tags, even though each page definitely contains an <html> tag. Why is this?
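To rule out corrupt output, I spot-checked a page by decompressing a copy on the command line, e.g.:

# Decompress one page to stdout and count <html> occurrences
# (a nonzero count confirms the tag is present)
gzip -dc build/category1/subcategory1/index.html | grep -c '<html'

so the markup is definitely there once the file is decompressed.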
My build directory isn't flat and is structured like:
build
  category1
    subcategory1
      index.html
    subcategory2
      index.html
    subcategory3
      index.html
    ...
  ...
Is Pagefind failing to parse the pages because of my directory structure, or because of the gzip compression?
Either way, how would I resolve this?
For practical reasons, I can't change the folder structure or remove the gzip compression.
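The only workaround I can think of is decompressing copies of the files purely for indexing, something like this sketch (paths are hypothetical, bash):

# Mirror the build tree with decompressed copies, then index that instead
mkdir -p /tmp/pagefind-src
(cd build && find . -type f -name '*.html' -print0) |
while IFS= read -r -d '' f; do
  mkdir -p "/tmp/pagefind-src/$(dirname "$f")"
  gzip -dc "build/$f" > "/tmp/pagefind-src/$f"
done
npx pagefind --source /tmp/pagefind-src

But duplicating ~100k files on every build isn't great, so I'd much prefer a way for Pagefind to read the gzipped files directly.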