memchr vs stringzilla - up to 7x throughput difference between two SIMD-accelerated substring search libraries in Rust

Ash Vardanian

Last update: Mar 5, 2024

`memchr` vs `stringzilla`

Rust Substring Search Benchmarks

Substring search is one of the most common operations in text processing, and one of the slowest. StringZilla was designed to supersede LibC and implement those core operations in a CPU-friendly manner, using branchless operations, SWAR, and SIMD assembly instructions. Notably, Rust has a memchr crate that provides similar functionality and is used in many popular libraries. This repository provides basic benchmarking scripts for comparing the throughput of stringzilla and memchr. For forward and reverse search over ASCII and UTF8 input data, the following numbers can be expected.

| | ASCII ⏩ | ASCII ⏪ | UTF8 ⏩ | UTF8 ⏪ |
| :-- | --: | --: | --: | --: |
| Intel: | | | | |
| `memchr` | 5.89 GB/s | 1.08 GB/s | 8.73 GB/s | 3.35 GB/s |
| `stringzilla` | 8.37 GB/s | 8.21 GB/s | 11.21 GB/s | 11.20 GB/s |
| Arm: | | | | |
| `memchr` | 6.38 GB/s | 1.12 GB/s | 13.20 GB/s | 3.56 GB/s |
| `stringzilla` | 6.56 GB/s | 5.56 GB/s | 9.41 GB/s | 8.17 GB/s |
| Average | 1.2x faster | 6.2x faster | - | 2.8x faster |

For Intel, the benchmark was run on AWS r7iz instances with Sapphire Rapids cores. For Arm, the benchmark was run on AWS r7g instances with Graviton 3 cores. The ⏩ signifies forward search, and ⏪ signifies reverse order search. At the time of writing, the latest versions of both crates were used: memchr 2.7.1 and stringzilla 3.3.0.
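The "Average" row is not explained in the text. As a sanity check, the sketch below assumes it is the arithmetic mean of the per-platform speedups (stringzilla throughput divided by memchr throughput), using only the numbers from the table. The names `Pair`, `speedup`, `mean_speedup`, and `column_averages` are hypothetical and exist only for this illustration.

```rust
/// One pair of table cells: (memchr GB/s, stringzilla GB/s) on a single platform.
type Pair = (f64, f64);

/// How many times faster stringzilla is than memchr for one measurement.
fn speedup((memchr, stringzilla): Pair) -> f64 {
    stringzilla / memchr
}

/// Arithmetic mean of the per-platform speedups.
/// Assumption: the source does not say how its "Average" row was computed.
fn mean_speedup(pairs: &[Pair]) -> f64 {
    pairs.iter().copied().map(speedup).sum::<f64>() / pairs.len() as f64
}

/// Mean speedups for the four columns, using the Intel and Arm rows of the table.
fn column_averages() -> [f64; 4] {
    [
        mean_speedup(&[(5.89, 8.37), (6.38, 6.56)]),   // ASCII forward
        mean_speedup(&[(1.08, 8.21), (1.12, 5.56)]),   // ASCII reverse
        mean_speedup(&[(8.73, 11.21), (13.20, 9.41)]), // UTF8 forward
        mean_speedup(&[(3.35, 11.20), (3.56, 8.17)]),  // UTF8 reverse
    ]
}
```

Under that assumption the four columns come out at roughly 1.22, 6.28, 1.00, and 2.82. That is consistent with the 1.2x, 6.2x, and 2.8x entries if the figures were truncated rather than rounded, and with the dash for forward UTF8 search, where the two platforms cancel out.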

Replicating the Results

Before running benchmarks, you can test your Rust environment by running:

cargo install cargo-criterion --locked
HAYSTACK_PATH=README.md cargo criterion --jobs 8

As part of the benchmark, the input "haystack" file is whitespace-tokenized into an array of strings. In every benchmark iteration, a new "needle" is taken from that array of tokens. All occurrences of that token in the haystack are counted, and the throughput is calculated. This generally yields very stable and predictable results. The benchmark also includes a warm-up, to ensure that the CPU caches are filled and the results are not affected by cold start or SIMD-related frequency scaling.
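The procedure above can be sketched with the standard library alone. This is not the repository's benchmark code: `str::matches` and `str::rmatches` stand in for the memchr and stringzilla calls, matches are counted without overlap (the text does not say which convention the real benchmark uses), and the function names are hypothetical. There is also no warm-up or statistical analysis here, which Criterion handles in the real setup.

```rust
use std::time::Instant;

/// Split the haystack on whitespace, mirroring the tokenization step.
fn tokenize(haystack: &str) -> Vec<&str> {
    haystack.split_whitespace().collect()
}

/// Count non-overlapping occurrences, scanning from the start of the haystack.
fn count_forward(haystack: &str, needle: &str) -> usize {
    haystack.matches(needle).count()
}

/// Count non-overlapping occurrences, scanning from the end of the haystack.
fn count_reverse(haystack: &str, needle: &str) -> usize {
    haystack.rmatches(needle).count()
}

/// Scan the whole haystack once per needle.
/// Returns the total number of matches and the throughput in GB/s,
/// computed as bytes scanned divided by elapsed wall-clock time.
fn throughput_gbps(haystack: &str, needles: &[&str]) -> (usize, f64) {
    let start = Instant::now();
    let mut total = 0usize;
    for needle in needles {
        total += count_forward(haystack, needle);
    }
    let secs = start.elapsed().as_secs_f64();
    let bytes_scanned = haystack.len() as f64 * needles.len() as f64;
    (total, bytes_scanned / secs / 1e9)
}
```

To try it on a real file, read the path from `HAYSTACK_PATH` with `std::fs::read_to_string`, pass the result of `tokenize` as the needles, and swap `count_forward` for `count_reverse` to measure the reverse direction.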

ASCII Corpus

For benchmarks on ASCII data I've used the English Leipzig Corpora Collection. It's 124 MB in size, 1'000'000 lines long, and contains 8'388'608 tokens of mean length 5.

wget --no-clobber -O leipzig1M.txt https://introcs.cs.princeton.edu/python/42sort/leipzig1m.txt 
HAYSTACK_PATH=leipzig1M.txt cargo criterion --jobs 8
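If you want to check figures like the line count, token count, and mean token length for your own copy of a corpus, a small helper along these lines may be enough. It is a sketch under assumptions: tokens are split on whitespace as in the benchmark description, and token length is measured in bytes, which the text does not specify. `CorpusStats` and `corpus_stats` are hypothetical names.

```rust
/// Summary of a corpus: line count, token count, and mean token length in bytes.
#[derive(Debug, PartialEq)]
struct CorpusStats {
    lines: usize,
    tokens: usize,
    mean_token_len: f64,
}

/// Compute the summary in a single pass over the whitespace-separated tokens.
fn corpus_stats(text: &str) -> CorpusStats {
    let lines = text.lines().count();
    // Accumulate (number of tokens, total token bytes).
    let (tokens, total_len) = text
        .split_whitespace()
        .fold((0usize, 0usize), |(n, len), tok| (n + 1, len + tok.len()));
    // Avoid dividing by zero on an empty corpus.
    let mean_token_len = if tokens == 0 {
        0.0
    } else {
        total_len as f64 / tokens as f64
    };
    CorpusStats { lines, tokens, mean_token_len }
}
```

For a file on disk, load it with `std::fs::read_to_string` and pass the resulting string to `corpus_stats`.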

UTF8 Corpus

For richer mixed UTF8 data, I've used the XL Sum dataset for multilingual extractive summarization. It's 4.7 GB in size (1.7 GB compressed), 1'004'598 lines long, and contains 268'435'456 tokens of mean length 8. To download, unpack, and run the benchmarks, execute the following bash script in your terminal:

wget --no-clobber -O xlsum.csv.gz https://github.com/ashvardanian/xl-sum/releases/download/v1.0.0/xlsum.csv.gz
gzip -d xlsum.csv.gz
HAYSTACK_PATH=xlsum.csv cargo criterion --jobs 8
