memchr vs stringzilla - up to 7x throughput difference between two SIMD-accelerated substring search libraries in Rust

Overview

Rust Substring Search Benchmarks

Substring search is one of the most common operations in text processing, and one of the slowest. StringZilla was designed to supersede LibC and implement those core operations in CPU-friendly manner, using branchless operations, SWAR, and SIMD assembly instructions. Notably, Rust has a memchr crate that provides a similar functionality, and it's used in many popular libraries. This repository provides basic benchmarking scripts for comparing the throughput of stringzilla and memchr. For normal order and reverse order search, over ASCII and UTF8 input data, the following numbers can be expected.

ASCII ⏩ ASCII ⏪ UTF8 ⏩ UTF8 ⏪
Intel:
memchr 5.89 GB/s 1.08 GB/s 8.73 GB/s 3.35 GB/s
stringzilla 8.37 GB/s 8.21 GB/s 11.21 GB/s 11.20 GB/s
Arm:
memchr 6.38 GB/s 1.12 GB/s 13.20 GB/s 3.56 GB/s
stringzilla 6.56 GB/s 5.56 GB/s 9.41 GB/s 8.17 GB/s
Average 1.2x faster 6.2x faster - 2.8x faster

For Intel the benchmark was run on AWS r7iz instances with Sapphire Rapids cores. For Arm the benchmark was run on AWS r7g instances with Graviton 3 cores. The ⏩ signifies forward search, and ⏪ signifies reverse order search. At the time of writing, the latest versions of memchr and stringzilla were used - 2.7.1 and 3.3.0, respectively.

Replicating the Results

Before running benchmarks, you can test your Rust environment running:

cargo install cargo-criterion --locked
HAYSTACK_PATH=README.md cargo criterion --jobs 8

As part of the benchmark, the input "haystack" file is whitespace-tokenized into an array of strings. In every benchmark iteration, a new "needle" is taken from that array of tokens. All inclusions of that token in the haystack are counted, and the throughput is calculated. This generally results in very stable and predictable results. The benchmark also includes a warm-up, to ensure that the CPU caches are filled and the results are not affected by cold start or SIMD-related frequency scaling.

ASCII Corpus

For benchmarks on ASCII data I've used the English Leipzig Corpora Collection. It's 124 MB in size, 1'000'000 lines long, and contains 8'388'608 tokens of mean length 5.

wget --no-clobber -O leipzig1M.txt https://introcs.cs.princeton.edu/python/42sort/leipzig1m.txt 
HAYSTACK_PATH=leipzig1M.txt cargo criterion --jobs 8

UTF8 Corpus

For richer mixed UTF data, I've used the XL Sum dataset for multilingual extractive summarization. It's 4.7 GB in size (1.7 GB compressed), 1'004'598 lines long, and contains 268'435'456 tokens of mean length 8. To download, unpack, and run the benchmarks, execute the following bash script in your terminal:

wget --no-clobber -O xlsum.csv.gz https://github.com/ashvardanian/xl-sum/releases/download/v1.0.0/xlsum.csv.gz
gzip -d xlsum.csv.gz
HAYSTACK_PATH=xlsum.csv cargo criterion --jobs 8
You might also like...
Converts between country names, ISO 3166-1 codes and flag emojis.

country-emoji Converts between country names, ISO 3166-1 codes and flag emojis. Usage use country_emoji::{flag, code, name, countries}; flag("CL") /

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.
The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.

Fast Vector Similarity Library Introduction The Fast Vector Similarity Library is designed to provide efficient computation of various similarity meas

Rust wrappers for NGT approximate nearest neighbor search

ngt-rs   Rust wrappers for NGT, which provides high-speed approximate nearest neighbor searches against a large volume of data. Note that NGT will be

Search and read 'The Rust Book' from the terminal
Search and read 'The Rust Book' from the terminal

TheBook TheBook is a command line utility that allows you to SEARCH and READ The Rust Programming Language (popularly known as 'The Book' ) from the t

Static low-bandwidth search at scale

Pagefind Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users' bandwidth as possibl

Leetcode Solutions in Rust, Advent of Code Solutions in Rust and more

RUST GYM Rust Solutions Leetcode Solutions in Rust AdventOfCode Solutions in Rust This project demostrates how to create Data Structures and to implem

Simple autoclicker written in Rust, to learn the Rust language.

RClicker is an autoclicker written in Rust, written to learn more about the Rust programming language. RClicker was was written by me to learn more ab

Rust programs written entirely in Rust

mustang Programs written entirely in Rust Mustang is a system for building programs built entirely in Rust, meaning they do not depend on any part of

Rust 核心库和标准库的源码级中文翻译,可作为 IDE 工具的智能提示 (Rust core library and standard library translation. can be used as IntelliSense for IDE tools)

Rust 标准库中文版 这是翻译 Rust 库 的地方, 相关源代码来自于 https://github.com/rust-lang/rust。 如果您不会说英语,那么拥有使用中文的文档至关重要,即使您会说英语,使用母语也仍然能让您感到愉快。Rust 标准库是高质量的,不管是新手还是老手,都可以从中

Owner
Ash Vardanian
CS/AI Researcher & Investor, Founder @unum-cloud, co-organizer @cpp-armenia 🇦🇲
Ash Vardanian
A fast rendezvous in rust where data can optionally be swapped between the two threads.

rendezvous_swap A rendezvous is an execution barrier between a pair of threads, but this crate also provides the option of swapping data at the synchr

Erik 5 Mar 17, 2023
Super-simple, fully Rust powered "memory" (doc store + semantic search) for LLM projects, semantic search, etc.

memex Super simple "memory" for LLM projects, semantic search, etc. Running the service Note that if you're running on Apple silicon (M1/M2/etc.), it'

Spyglass Search 15 Jun 19, 2023
Support SIMD low-memory overhead and high-performance adaptive radix tree.

Artful Artful is an adaptive radix tree library for Rust. At a high-level, it's like a BTreeMap. It is based on the implementation of paper, see The A

future 3 Sep 7, 2022
Rust library for hardware accelerated drawing of 2D shapes, images, and text, with an easy to use API.

Speedy2D Hardware-accelerated drawing of shapes, images, and text, with an easy to use API. Speedy2D aims to be: The simplest Rust API for creating a

null 223 Dec 26, 2022
This is a lightweight audio-video player built in Rust using FFmpeg libraries. It demonstrates the usage of FFmpeg with Rust to play back video files.

FFmpeg Rust Video Player This is a lightweight audio-video player built in Rust using FFmpeg libraries. It demonstrates the usage of FFmpeg with Rust

Jenin Sutradhar 3 Apr 10, 2024
Rust libraries for Bluesky's AT Protocol services. NOT STABLE (yet)

ATrium ATrium is a collection of Rust libraries designed to work with the AT Protocol, providing a versatile and coherent ecosystem for developers. Th

Yoshihiro Sugi 43 Jun 25, 2023
Core libraries, services and CLIs for Monetæ

Core libraries, services and CLIs for Monetæ

monetæ 1 Nov 1, 2021
A library for transcoding between bytes in Astro Notation Format and Native Rust data types.

Rust Astro Notation A library for transcoding between hexadecimal strings in Astro Notation Format and Native Rust data types. Usage In your Cargo.tom

Stelar Software 1 Feb 4, 2022
List public items (public API) of Rust library crates. Enables diffing public API between releases.

cargo wrapper for this library You probably want the cargo wrapper to this library. See https://github.com/Enselic/cargo-public-items. public_items Li

Martin Nordholts 20 Dec 26, 2022
A simple programming language for something between C and Rust.

inuc inuc is a systems programming language that is something between C and Rust. Features : [] Strong , static typing (type inference not a priority

Sagnik Chatterjee 1 Feb 7, 2022