singe's grep - a fast grep using single-file parallelism

Dominic White

Last update: Nov 18, 2022

Overview

singrep

singrep uses deterministic kernel file caching to read a file fast enough to make multi-threading worthwhile. It instructs the kernel to cache sections of the file in memory, then memory-maps them for fast reads. Chunks of the file are then sent to separate threads to do the matching. On a modern multi-core system, this is significantly faster than other fast grep utilities.
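The "chunks sent to separate threads" step can be illustrated with a minimal sketch. This is not singrep's actual code: it works on a byte slice already in memory (the kernel caching and memory mapping steps are left out), and the names split_on_newlines, count_matches and parallel_count are made up for the example. It assumes chunks should end on a newline so that no line straddles two threads.

```rust
use std::thread;

/// Split `data` into roughly `parts` chunks, each ending on a newline
/// (or at end of input) so that no line straddles two chunks.
fn split_on_newlines(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let target = (data.len() / parts.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk to the end of the current line.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Count the lines in `chunk` that contain `pattern` as a byte substring.
fn count_matches(chunk: &[u8], pattern: &[u8]) -> usize {
    chunk
        .split(|&b| b == b'\n')
        .filter(|line| {
            !pattern.is_empty() && line.windows(pattern.len()).any(|w| w == pattern)
        })
        .count()
}

/// Search every chunk on its own scoped thread and add up the counts.
fn parallel_count(data: &[u8], pattern: &[u8], threads: usize) -> usize {
    let chunks = split_on_newlines(data, threads);
    thread::scope(|s| {
        let mut handles = Vec::new();
        for &chunk in &chunks {
            handles.push(s.spawn(move || count_matches(chunk, pattern)));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Calling parallel_count(data, b"needle", 4) returns the number of lines containing "needle". Because chunks are stretched to the next newline, the number of chunks can differ slightly from the requested thread count.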

This only works on Linux and macOS.

Compiling

You'll need a Rust install; the easiest way is to use rustup.

In the cloned repository run:

cargo build --release

The resulting binary will be in target/release/singrep.

Usage

singrep <pattern> <file>

This searches for occurrences of pattern in the supplied file.

Advanced usage

Regex Match --regex, -r - will match using a regular expression
Exact Match --exact, -e - will only match lines that entirely match the pattern, incompatible with regex
First Match --first, -f - will exit after the first match is found, incompatible with regex
Byte Position --position, -p - will display the byte (not line) number where the pattern was found
Verbose --verbose, -v - will display some extra information
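To make the difference between the default substring match, --exact, --first and --position concrete, here is a small sketch of the semantics as described above. It is an illustration only, not singrep's implementation: the Options struct and search function are hypothetical, at most one hit is reported per line, and the byte position is assumed to be the offset of the start of the match from the beginning of the data.

```rust
/// Hypothetical option set mirroring the --exact and --first flags.
#[derive(Clone, Copy)]
struct Options {
    exact: bool, // the whole line must equal the pattern
    first: bool, // stop after the first hit
}

/// Return the byte offset of each hit (at most one per line).
fn search(data: &[u8], pattern: &[u8], opts: Options) -> Vec<usize> {
    let mut hits = Vec::new();
    if pattern.is_empty() {
        return hits;
    }
    let mut line_start = 0;
    for line in data.split(|&b| b == b'\n') {
        let hit = if opts.exact {
            // Exact mode: the line matches only if it is the pattern.
            (line == pattern).then_some(0)
        } else {
            // Default mode: the pattern may appear anywhere in the line.
            line.windows(pattern.len()).position(|w| w == pattern)
        };
        if let Some(offset) = hit {
            hits.push(line_start + offset);
            if opts.first {
                break;
            }
        }
        // Step past this line and its trailing newline.
        line_start += line.len() + 1;
    }
    hits
}
```

For the input "foo\nfoobar\nbar foo\n", a substring search for "foo" reports offsets 0, 4 and 15, while an exact search reports only offset 0.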

Performance Tuning

Block Size --block, -b

The block size controls how much of the file is read at a time. The best value depends on your drive. The default is 8M (8_388_608). One way to find a good value is to time reads of a large file at several block sizes:

for x in 1M 1M 2M 4M 8M 12M; do time dd if=somefile of=/dev/null bs=$x; done

Running in --verbose mode will give stats on how fast the file was read from disk, for optimisation.
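The same measurement can be sketched in Rust. This is a rough stand-in for the dd loop, not part of singrep: time_read and throughput_mib_per_s are hypothetical helpers, and it uses plain buffered reads rather than singrep's caching and memory mapping. As with dd, repeated runs over the same file will mostly measure the page cache rather than the drive.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Read `path` from start to end in `block_size`-byte reads and
/// return the number of bytes read together with the elapsed time.
fn time_read(path: &str, block_size: usize) -> io::Result<(u64, Duration)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; block_size];
    let mut total = 0u64;
    let start = Instant::now();
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file (or a zero-sized buffer)
        }
        total += n as u64;
    }
    Ok((total, start.elapsed()))
}

/// Convert a byte count and duration into MiB per second.
fn throughput_mib_per_s(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / (1024.0 * 1024.0)) / elapsed.as_secs_f64()
}
```

Calling time_read once for each candidate size (1_048_576, 2_097_152, 4_194_304, 8_388_608 and so on) and comparing the throughput gives a first estimate for --block.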

Cache Size --cache, -c

The cache size controls how large the sections of the file cached in the kernel's file pages are. On the systems I tested, the file cache could hold about 68% of total system memory. But if there's a ton of stuff running, your file cache can have less space available (MS Teams is a great way to test this). By default it is set to 2G (2_147_483_648).

You can find total memory with:

Linux: cat /proc/meminfo | head -n1

macOS: sysctl hw.memsize
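The arithmetic behind picking a cache size can be sketched as follows. The helpers parse_mem_total_bytes and cache_ceiling_bytes are hypothetical and not part of singrep. The sketch assumes the Linux format of the first /proc/meminfo line ("MemTotal: <n> kB", where kB means 1024 bytes) and treats the 68% figure above as a rough ceiling observed on the author's systems, not a guarantee.

```rust
/// Parse a line such as "MemTotal:       16384000 kB" into bytes.
/// Returns None if the line does not have that shape.
fn parse_mem_total_bytes(line: &str) -> Option<u64> {
    let rest = line.strip_prefix("MemTotal:")?;
    let kib: u64 = rest.split_whitespace().next()?.parse().ok()?;
    Some(kib * 1024)
}

/// Rough upper bound for the cache size: about 68% of total memory.
/// Dividing first avoids overflow at the cost of a few bytes of precision.
fn cache_ceiling_bytes(total_bytes: u64) -> u64 {
    total_bytes / 100 * 68
}
```

On a machine reporting "MemTotal: 16384000 kB", this gives a ceiling of 11_408_506_880 bytes, well above the 2G default. On a busy machine the usable figure will be lower.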

Shard Size --shard, -s

The shard size controls how big the blocks of data sent to the threads are. Running with --verbose and examining the thread waits can help to optimise this for your system. Fewer waits mean the threads spend less time waiting for a new chunk to arrive.
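How shard size and thread waits relate can be illustrated with a small producer/worker sketch. This is an assumption about the general shape of such a design, not singrep's actual code: run_sharded and count_lines_with are hypothetical, the shards are handed out through a standard library channel, and a "wait" is measured as the time a worker spends blocked before it receives its next shard.

```rust
use std::sync::{mpsc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Count the lines in `chunk` that contain `pattern`.
fn count_lines_with(chunk: &[u8], pattern: &[u8]) -> usize {
    chunk
        .split(|&b| b == b'\n')
        .filter(|l| !pattern.is_empty() && l.windows(pattern.len()).any(|w| w == pattern))
        .count()
}

/// Feed newline-aligned shards of about `shard_size` bytes to `workers`
/// threads. Returns the total match count and each worker's wait time.
fn run_sharded(
    data: &[u8],
    pattern: &[u8],
    shard_size: usize,
    workers: usize,
) -> (usize, Vec<Duration>) {
    let (tx, rx) = mpsc::channel::<&[u8]>();
    let rx = Mutex::new(rx); // workers take turns receiving
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(s.spawn(|| {
                let mut matches = 0usize;
                let mut waited = Duration::ZERO;
                loop {
                    let before = Instant::now();
                    // The lock is released at the end of this statement.
                    let shard = rx.lock().unwrap().recv();
                    waited += before.elapsed();
                    match shard {
                        Ok(bytes) => matches += count_lines_with(bytes, pattern),
                        Err(_) => break, // sender dropped: no more shards
                    }
                }
                (matches, waited)
            }));
        }
        // Producer: cut shards that end on a newline and send them out.
        let mut start = 0;
        while start < data.len() {
            let mut end = (start + shard_size.max(1)).min(data.len());
            while end < data.len() && data[end - 1] != b'\n' {
                end += 1;
            }
            tx.send(&data[start..end]).unwrap();
            start = end;
        }
        drop(tx); // lets the workers' recv() fail so they can exit
        let mut total = 0;
        let mut waits = Vec::new();
        for h in handles {
            let (m, w) = h.join().unwrap();
            total += m;
            waits.push(w);
        }
        (total, waits)
    })
}
```

Running this over the same data with different shard_size values and comparing the returned wait times shows the trade-off: very small shards make workers queue for the channel more often, while very large shards can leave some workers idle near the end of the data.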
