Previously, hashing only ran on one thread, and the access patterns were sometimes near-worst-case.
The code was also (and still is) far more complicated than necessary.
I haven't verified whether the ignore code is correct, but it looks right. Please review.
Please test on a single non-SSD, non-ZFS disk before merging.
Testing cold caches is difficult on this machine.
This PR is complete when:
- [x] `BufReader` to improve buffering
- [x] parallelizing with automatic `num_cpus` actually uses all cores
- [x] progress bar replaced with `indicatif`, as it supports `Send`
- [x] sort files by inode before reading to improve spinning-rust performance (inode order roughly approximates file creation order)
- [x] option parsing works
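A minimal, std-only sketch of how the checklist items fit together: sort the file list by inode, read each file through a large `BufReader`, and hash on a worker count that defaults to all cores. This is not the PR's code. `std::thread::available_parallelism` stands in for the `num_cpus` crate, `DefaultHasher` stands in for SHA-1, and `hash_file`, `sort_by_inode`, `hash_all`, the 1 MiB buffer size, and the optional `jobs` argument are hypothetical choices for illustration. It is Unix-only because it reads inode numbers via `MetadataExt::ino`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, BufRead, BufReader};
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hash one file through a large BufReader.
/// NOTE: DefaultHasher is a std-only stand-in for SHA-1, not a cryptographic digest.
fn hash_file(path: &Path) -> io::Result<u64> {
    // Hypothetical 1 MiB buffer: each refill is one large, mostly sequential read.
    let mut reader = BufReader::with_capacity(1 << 20, File::open(path)?);
    let mut hasher = DefaultHasher::new();
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // EOF
        }
        hasher.write(chunk);
        let n = chunk.len();
        reader.consume(n);
    }
    Ok(hasher.finish())
}

/// Sort by inode number, a rough proxy for creation (and often on-disk) order.
/// Paths whose metadata cannot be read sort last.
fn sort_by_inode(paths: &mut [PathBuf]) {
    paths.sort_by_cached_key(|p| fs::metadata(p).map(|m| m.ino()).unwrap_or(u64::MAX));
}

/// Hash every path on `jobs` worker threads (None = all available cores).
/// Workers claim files from a shared counter, so reads are issued roughly in inode order.
/// Files that fail to open or read are skipped in this sketch.
fn hash_all(mut paths: Vec<PathBuf>, jobs: Option<usize>) -> Vec<(PathBuf, u64)> {
    sort_by_inode(&mut paths);
    let jobs = jobs
        .unwrap_or_else(|| thread::available_parallelism().map(|n| n.get()).unwrap_or(1))
        .max(1);
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::with_capacity(paths.len()));
    thread::scope(|s| {
        for _ in 0..jobs {
            s.spawn(|| loop {
                // Claim the next unhashed file in inode order.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(path) = paths.get(i) else { break };
                if let Ok(digest) = hash_file(path) {
                    results.lock().unwrap().push((path.clone(), digest));
                }
            });
        }
    });
    let mut out = results.into_inner().unwrap();
    out.sort(); // deterministic, path-ordered output regardless of thread timing
    out
}
```

`hash_all(paths, None)` uses every core, while `hash_all(paths, Some(4))` shows how a working `--jobs 4` could map onto it. Handing out files from a shared counter, rather than pre-splitting the list into one contiguous chunk per thread, should keep the in-flight reads clustered near the same point in inode order instead of scattering them across the disk.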
Some smaller gains are available on spinning rust if desired:
- `fiemap` with an inode-sort fallback (I don't have a filesystem that supports `fiemap` handy)
- Batched `posix_fadvise` calls (`POSIX_FADV_SEQUENTIAL`, `POSIX_FADV_WILLNEED`, then `POSIX_FADV_DONTNEED` afterward) to tell the OS to warm the caches before access.
- `MAP_POPULATE` for mmap (not sure whether this is faster on spinning rust)
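The batched `posix_fadvise` idea could look roughly like the following. This is a Linux-only sketch that assumes a 64-bit target (so `off_t` is `i64`) and the x86_64/aarch64 values of the `POSIX_FADV_*` constants; a real build would more likely take the declaration and constants from the `libc` crate than declare them by hand. `advise`, `prefetch_batch`, and `release_batch` are hypothetical helpers, not existing quickdash functions.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Assumed Linux x86_64/aarch64 values from <fcntl.h>.
const POSIX_FADV_SEQUENTIAL: c_int = 2;
const POSIX_FADV_WILLNEED: c_int = 3;
const POSIX_FADV_DONTNEED: c_int = 4;

unsafe extern "C" {
    // int posix_fadvise(int fd, off_t offset, off_t len, int advice);
    // Assumes a 64-bit target where off_t is i64.
    fn posix_fadvise(fd: c_int, offset: i64, len: i64, advice: c_int) -> c_int;
}

/// Apply `advice` to the whole file (offset 0, len 0 means "to end of file").
fn advise(file: &File, advice: c_int) -> io::Result<()> {
    let rc = unsafe { posix_fadvise(file.as_raw_fd(), 0, 0, advice) };
    // posix_fadvise returns the error number directly instead of setting errno.
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::from_raw_os_error(rc))
    }
}

/// Hypothetical: hint that the next batch will be read sequentially, and soon.
fn prefetch_batch(batch: &[File]) -> io::Result<()> {
    for f in batch {
        advise(f, POSIX_FADV_SEQUENTIAL)?;
        advise(f, POSIX_FADV_WILLNEED)?;
    }
    Ok(())
}

/// Hypothetical: once a batch is hashed, let the kernel drop its cached pages.
fn release_batch(batch: &[File]) -> io::Result<()> {
    for f in batch {
        advise(f, POSIX_FADV_DONTNEED)?;
    }
    Ok(())
}
```

The intent would be to call `prefetch_batch` on the next few inode-sorted files while the current batch is being hashed, then `release_batch` once they are done, so the page cache is not filled with data that will never be read again.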
There is an option-parsing bug where `--jobs` does not work. Rather than fix that bug, I'd prefer to replace all of the option parsing with the newer style of clap.
Benchmarks
System
- Ryzen 7 2700X
- 64GB 3200MT memory
- Samsung 970 EVO 1TB
- 9-disk SAS ZFS array in RAIDZ2 configuration
Dataset
3,045 files, 40GB total. ZFS compression disabled.
Size distribution (size bucket: number of files):
4k: 2
16k: 6
64k: 4
128k: 11
256k: 99
512k: 46
1M: 176
2M: 94
4M: 84
8M: 1280
16M: 840
32M: 382
64M: 21
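As a quick consistency check, the bucket counts add up to the stated 3,045 files, and about 83% of the files (2,523) fall in the 8M-and-larger buckets. The sketch below just encodes the list; the labels are copied verbatim, and whether each label is an upper or lower bound is not stated, so `files_from` is a hypothetical helper that only counts "this bucket and every larger one".

```rust
/// File counts per size bucket, copied from the dataset description.
const SIZE_BUCKETS: [(&str, u32); 13] = [
    ("4k", 2), ("16k", 6), ("64k", 4), ("128k", 11), ("256k", 99), ("512k", 46),
    ("1M", 176), ("2M", 94), ("4M", 84), ("8M", 1280), ("16M", 840), ("32M", 382),
    ("64M", 21),
];

/// Total number of files across all buckets.
fn total_files() -> u32 {
    SIZE_BUCKETS.iter().map(|&(_, n)| n).sum()
}

/// Number of files in the `label` bucket plus every larger bucket.
fn files_from(label: &str) -> u32 {
    SIZE_BUCKETS
        .iter()
        .skip_while(|&&(l, _)| l != label)
        .map(|&(_, n)| n)
        .sum()
}
```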
These results are from the third run, to make sure the disk cache is warm.
The disk cache on this machine uses about 30GB of RAM.
Warm cache
- SSD Before: 46 seconds, about 870MB/s because most of the data is in RAM.
quickdash -a sha1 --force --create ~/uncompressed/unpacked 22.68s user 22.42s system 96% cpu 46.566 total
- SSD After: 10.2 seconds, about 3.9GB/s because most of the data is cached.
cargo run --release -- -a sha1 --force --create ~/uncompressed/unpacked 25.16s user 38.71s system 626% cpu 10.189 total
- HDD Array Before: 4m50s, or about 138MB/s. That is only slightly more than 50% of one disk's read performance, even with a warm cache.
quickdash -a sha1 --force --create ~/spinning-uncompressed/unpacked 31.07s user 30.77s system 21% cpu 4:49.98 total
- HDD Array After: 23 seconds, about 1.74GB/s, which is fairly close to the roughly 2250MB/s these disks are capable of on sequential reads.
cargo run --release -- -a sha1 --force --create 33.93s user 35.05s system 300% cpu 22.965 total
Cold cache
- SSD Before: 50 seconds, about 800MB/s.
quickdash -a sha1 --force --create ~/uncompressed/unpacked 22.92s user 23.79s system 93% cpu 49.831 total
- SSD After: 14 seconds, or about 2.86GB/s. The SSD is rated for 3500MB/s sequential reads.
cargo run --release -- -a sha1 --force --create ~/uncompressed/unpacked 24.99s user 34.71s system 422% cpu 14.144 total
- HDD Array Before: 6m14s, or about 106MB/s
quickdash -a sha1 --force --create ~/spinning-uncompressed/unpacked 37.68s user 48.41s system 23% cpu 6:13.54 total
- HDD Array After: 1 minute 38 seconds, or about 408MB/s. There is still plenty of room for improvement, since each disk was only delivering about 60MB/s, and I suspect a single drive would have similar headroom left.
cargo run --release -- -a sha1 --force --create 38.40s user 44.48s system 84% cpu 1:38.27 total
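The MB/s figures above come from dividing the dataset size by each run's wall-clock ("total") time. This sketch redoes that arithmetic under the assumption that 40GB means 40,000 decimal MB, so its results differ slightly from the rounded numbers in the list (for example, about 859MB/s rather than 870MB/s for the warm SSD baseline). The per-disk helper assumes 7 data disks, since a 9-disk RAIDZ2 vdev spends two disks' worth of each stripe on parity; `mb_per_s`, `speedup`, `per_disk`, and `summary` are hypothetical names.

```rust
/// Dataset size, assuming "40GB" means 40,000 decimal megabytes.
const DATASET_MB: f64 = 40_000.0;

/// Wall-clock ("total") times from the runs above, in seconds: (scenario, before, after).
const RUNS: [(&str, f64, f64); 4] = [
    ("warm SSD", 46.566, 10.189),
    ("warm HDD array", 289.98, 22.965),
    ("cold SSD", 49.831, 14.144),
    ("cold HDD array", 373.54, 98.27),
];

/// Throughput for hashing the whole dataset in `seconds`.
fn mb_per_s(seconds: f64) -> f64 {
    DATASET_MB / seconds
}

/// How many times faster the "after" run was.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Rough per-disk share of an array-wide rate (disk count is an assumption).
fn per_disk(rate: f64, disks: f64) -> f64 {
    rate / disks
}

/// (scenario, MB/s before, MB/s after, speedup) for each row of RUNS.
fn summary() -> Vec<(&'static str, f64, f64, f64)> {
    RUNS.iter()
        .map(|&(name, b, a)| (name, mb_per_s(b), mb_per_s(a), speedup(b, a)))
        .collect()
}
```

Under these assumptions, the warm HDD array run comes out at about 1742MB/s with a 12.6x speedup, and the cold HDD array run at about 407MB/s, or roughly 58MB/s per data disk.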