singe's grep - a fast grep using single-file parallelism

singrep

singrep uses deterministic kernel file caching to read the file fast enough to make multi-threading useful. It instructs the kernel to cache sections of the file in memory, then memory-maps them for fast reads. Chunks of the file are then sent to separate threads for matching. On a modern multi-core system, this is significantly faster than other fast grep utilities.
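
The chunk-and-match step described above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not singrep's actual code: it searches an in-memory byte slice and leaves out the kernel caching and memory mapping entirely. The names find_in_chunk and parallel_find are hypothetical, and spawning one thread per chunk is a simplification (a real tool would more likely hand chunks to a fixed pool of workers).

```rust
use std::thread;

/// Return the byte offsets of every occurrence of `needle` in `haystack`.
fn find_in_chunk(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() || haystack.len() < needle.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(i, _)| i)
        .collect()
}

/// Split `data` into `chunk_size`-byte chunks, search each chunk on its own
/// thread, and return all match offsets in ascending order.
/// (Hypothetical helper: one thread per chunk, for illustration only.)
fn parallel_find(data: &[u8], needle: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    if needle.is_empty() {
        return Vec::new();
    }
    // Each chunk is extended by needle.len() - 1 bytes, so a match that
    // straddles a chunk boundary is found by the chunk it starts in.
    let overlap = needle.len() - 1;
    let mut offsets = thread::scope(|s| {
        let handles: Vec<_> = (0..data.len())
            .step_by(chunk_size)
            .map(|start| {
                let end = (start + chunk_size + overlap).min(data.len());
                let chunk = &data[start..end];
                s.spawn(move || {
                    find_in_chunk(chunk, needle)
                        .into_iter()
                        .map(|i| start + i) // chunk-relative -> file offset
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<usize>>()
    });
    offsets.sort_unstable();
    offsets
}
```

For example, parallel_find(b"abc needle abc needle", b"needle", 4) returns the offsets 4 and 15, even though both matches cross a 4-byte chunk boundary.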

This only works on Linux and macOS.

Compiling

You'll need a Rust install; the easiest way is to use rustup.

In the cloned repository run:

cargo build --release

The resulting binary will be in target/release/singrep.

Usage

singrep <pattern> <file>

This searches for occurrences of pattern in the supplied file.

Advanced usage

  • Regex Match --regex, -r - matches using a regular expression
  • Exact Match --exact, -e - matches only lines that are identical to the pattern; incompatible with --regex
  • First Match --first, -f - exits after the first match is found; incompatible with --regex
  • Byte Position --position, -p - displays the byte offset (not the line number) at which the pattern was found
  • Verbose --verbose, -v - displays some extra information
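
To make the difference between the default, --exact and --first behaviours concrete, here is a small sketch of how they could be expressed. It is an assumption-laden illustration, not singrep's implementation: the Mode enum and the search function are hypothetical names, it works on an in-memory string, and it reports only the first match on each line.

```rust
/// How a line is compared against the pattern (hypothetical names).
#[derive(Clone, Copy)]
enum Mode {
    /// Default: the pattern occurs somewhere in the line.
    Contains,
    /// Like --exact: the whole line equals the pattern.
    Exact,
}

/// Return the byte offset of each match in `text`, as --position would
/// report it. With `first_only` set, stop after the first hit (like --first).
fn search(text: &str, pattern: &str, mode: Mode, first_only: bool) -> Vec<usize> {
    let mut positions = Vec::new();
    let mut line_start = 0; // byte offset of the current line within `text`
    for line in text.split('\n') {
        let hit = match mode {
            Mode::Contains => line.find(pattern).map(|i| line_start + i),
            Mode::Exact => (line == pattern).then_some(line_start),
        };
        if let Some(pos) = hit {
            positions.push(pos);
            if first_only {
                break;
            }
        }
        line_start += line.len() + 1; // +1 for the '\n' removed by split
    }
    positions
}
```

With the text "foo\nfoobar\nbar foo\nfoo" and the pattern "foo", Mode::Contains yields the offsets 0, 4, 15 and 19, while Mode::Exact yields only 0 and 19.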

Performance Tuning

Block Size --block, -b

The block size controls how much of the file is read at a time. The best value depends on the read speed of your drive. The default is 8M (8_388_608). One way to find a good value is to run the following on a large file:

for x in 1M 2M 4M 8M 12M; do time dd if=somefile of=/dev/null bs=$x; done

Running in --verbose mode will give stats on how fast the file was read from disk, for optimisation.
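
The same measurement as the dd loop can be sketched in Rust. This is only an approximation of the idea, assuming plain sequential reads through the standard library; time_block_reads and mib_per_sec are hypothetical helpers and say nothing about how singrep itself reads the file. As with dd, a second run over the same file is likely to be served from the kernel's file cache, so compare like with like.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use std::time::{Duration, Instant};

/// Read `path` sequentially using `block_size`-byte reads.
/// Returns the total number of bytes read and the elapsed time.
fn time_block_reads(path: &Path, block_size: usize) -> io::Result<(u64, Duration)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; block_size];
    let mut total = 0u64;
    let start = Instant::now();
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        total += n as u64;
    }
    Ok((total, start.elapsed()))
}

/// Throughput in MiB/s for a byte count and a duration.
fn mib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}
```

Calling time_block_reads for each candidate block size (for example 1 << 20 for 1M up to 12 << 20 for 12M) and passing the result to mib_per_sec gives figures comparable to the dd timings.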

Cache Size --cache, -c

The cache size controls how much of the file is cached into the kernel's file pages at a time. On the systems I tested, the usable limit was about 68% of total system memory. If a lot of other software is running, less file cache will be available (MS Teams is a great way to test this). The default is 2G (2_147_483_648).

You can find total memory with:

Linux: cat /proc/meminfo | head -n1

macOS: sysctl hw.memsize
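
The defaults quoted for the block and cache sizes (8M = 8_388_608 and 2G = 2_147_483_648) imply binary multiples, that is, 1M = 1024 * 1024 bytes. The arithmetic can be checked with a small helper. parse_size is a hypothetical function written for this illustration; whether singrep's own options accept suffixed values in exactly this form is an assumption.

```rust
/// Convert a size such as "8M" or "2G" to bytes using binary multiples
/// (1K = 1024 bytes). A plain number is taken as a byte count.
/// Returns None for malformed input or on overflow.
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1u64 << 10),
        'M' | 'm' => (&s[..s.len() - 1], 1u64 << 20),
        'G' | 'g' => (&s[..s.len() - 1], 1u64 << 30),
        _ => (s, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

Here parse_size("8M") gives Some(8388608) and parse_size("2G") gives Some(2147483648), matching the defaults listed above.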

Shard Size --shard, -s

The shard size controls how big the blocks of data to send to the threads should be. Running with --verbose and examining the thread waits can help to optimise this for your system. Fewer waits means the threads spend less time waiting for a new chunk to arrive.
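
The relationship between shard size and thread waits can be illustrated with a toy work queue. This is a sketch under several assumptions, not singrep's scheduler: sharded_count is a hypothetical function, shards are cut at line boundaries from an in-memory string, workers poll a shared standard-library channel, and a "wait" is counted each time a worker finds the queue empty.

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::sync::Mutex;
use std::thread;

/// Count lines containing `pattern` by cutting `text` into shards of at least
/// `shard_size` bytes (each ending on a line boundary) and feeding them to
/// `workers` threads through a shared queue.
/// Returns (matching lines, times a worker found the queue empty).
fn sharded_count(text: &str, pattern: &str, shard_size: usize, workers: usize) -> (usize, usize) {
    assert!(shard_size > 0 && workers > 0);
    let (tx, rx) = mpsc::channel::<&str>();
    let rx = Mutex::new(rx);
    let rx_ref = &rx;
    thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(move || {
                let (mut matches, mut waits) = (0usize, 0usize);
                loop {
                    // The lock is released at the end of this statement.
                    let msg = rx_ref.lock().unwrap().try_recv();
                    match msg {
                        Ok(shard) => {
                            matches += shard.lines().filter(|l| l.contains(pattern)).count();
                        }
                        Err(TryRecvError::Empty) => {
                            waits += 1; // nothing queued yet: count a wait
                            thread::yield_now();
                        }
                        Err(TryRecvError::Disconnected) => break, // all shards done
                    }
                }
                (matches, waits)
            }));
        }

        // Producer: cut shards that end just after a newline.
        let mut rest = text;
        while !rest.is_empty() {
            let cut = if rest.len() <= shard_size {
                rest.len()
            } else {
                match rest.as_bytes()[shard_size - 1..].iter().position(|&b| b == b'\n') {
                    Some(pos) => shard_size + pos,
                    None => rest.len(),
                }
            };
            let (shard, tail) = rest.split_at(cut);
            tx.send(shard).expect("all workers exited early");
            rest = tail;
        }
        drop(tx); // lets workers see Disconnected once the queue drains

        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold((0, 0), |a, b| (a.0 + b.0, a.1 + b.1))
    })
}
```

The match count is the same for any shard size; only the wait count changes from run to run and from setting to setting, which is the kind of figure the --verbose output is described as exposing.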

Owner
Dominic White