Fast line based iteration almost entirely lifted from ripgrep's grep_searcher.


🌊 ripline

This is not the greatest line reader in the world, this is just a tribute.


All credit to Andrew Gallant and the ripgrep contributors.

Why?

  • Doesn't rely on a closure like the bstr::for_line* methods (useful in some awkward lifetime scenarios).
  • No silently capped line lengths, unlike rust-linereader.
  • Brings along LineIter for working with memory-mapped files.
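
To illustrate the first point, here is a minimal, std-only sketch of closure-free line iteration. `SimpleLineIter` and `longest_line` are hypothetical names made up for this example; this is not ripline's implementation, only the general shape of an iterator that hands out borrowed lines (terminator included), so a line can be returned to the caller instead of being trapped inside a closure.

```rust
/// Hypothetical stand-in for a closure-free line iterator (not ripline's type).
struct SimpleLineIter<'a> {
    bytes: &'a [u8],
    terminator: u8,
    pos: usize,
}

impl<'a> SimpleLineIter<'a> {
    fn new(terminator: u8, bytes: &'a [u8]) -> Self {
        SimpleLineIter { bytes, terminator, pos: 0 }
    }
}

impl<'a> Iterator for SimpleLineIter<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        if self.pos >= self.bytes.len() {
            return None;
        }
        let rest = &self.bytes[self.pos..];
        // Keep the terminator as part of the line; the final line may lack one.
        let end = match rest.iter().position(|&b| b == self.terminator) {
            Some(i) => i + 1,
            None => rest.len(),
        };
        self.pos += end;
        Some(&rest[..end])
    }
}

/// Returns the longest line, still borrowed from `data`.
/// Handing a borrowed line back to the caller is awkward with a callback API.
fn longest_line(data: &[u8]) -> Option<&[u8]> {
    SimpleLineIter::new(b'\n', data).max_by_key(|line| line.len())
}

fn main() {
    let data = b"a\nbcd\nef";
    if let Some(line) = longest_line(data) {
        println!("longest line is {} bytes", line.len());
    }
}
```

Because the lines are plain `&[u8]` slices tied to the input's lifetime, ordinary iterator adapters, early `break`, and `?` all work without fighting the borrow checker.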

Not all of this functionality was exposed in the grep_searcher crate, and rightly so, as a lot of it had grep-specific configuration embedded in the logic (e.g. binary detection).

What have I changed?

Not much. I took out some of the ripgrep-specific logic, such as the binary detection and some search-related configs, and consolidated a few of the helper structs from the other grep_* crates.

Example

See examples for more.

use grep_cli::stdout;
use ripline::{
    line_buffer::{LineBufferBuilder, LineBufferReader},
    lines::LineIter,
    LineTerminator,
};
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};
use termcolor::ColorChoice;

fn main() -> Result<(), Box<dyn Error>> {
    let path = PathBuf::from(env::args().nth(1).expect("Failed to provide input file"));

    let mut out = stdout(ColorChoice::Never);

    let reader = File::open(&path)?;
    let terminator = LineTerminator::byte(b'\n');
    let mut line_buffer = LineBufferBuilder::new().build();
    let mut lb_reader = LineBufferReader::new(reader, &mut line_buffer);

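    // Keep filling the buffer until the reader is exhausted.
    while lb_reader.fill()? {
        // Iterate over the lines currently held in the buffer.
        let lines = LineIter::new(terminator.as_byte(), lb_reader.buffer());
        for line in lines {
            out.write_all(line)?;
        }
        // Mark everything in the buffer as processed before the next fill().
        lb_reader.consume_all();
    }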

    Ok(())
}
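
The fill/consume loop above can be pictured with a small std-only sketch. This is an assumption-laden illustration, not ripline's actual LineBuffer: `collect_lines` is a hypothetical helper that reads into a fixed buffer, hands out every complete line, rolls the partial tail to the front, and grows the buffer when a single line is longer than its capacity rather than silently truncating it.

```rust
use std::io::{self, Read};

/// Hypothetical sketch of a rolling line buffer (not ripline's LineBuffer).
/// Returns every line, terminator included, as an owned Vec for easy inspection.
fn collect_lines<R: Read>(mut reader: R, capacity: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut buf = vec![0u8; capacity.max(1)];
    let mut len = 0; // number of valid bytes at the front of `buf`
    let mut lines = Vec::new();

    loop {
        // A line longer than the buffer: grow instead of capping its length.
        if len == buf.len() {
            buf.resize(buf.len() * 2, 0);
        }
        let n = reader.read(&mut buf[len..])?;
        if n == 0 {
            // End of input: whatever is left is a final, unterminated line.
            if len > 0 {
                lines.push(buf[..len].to_vec());
            }
            return Ok(lines);
        }
        len += n;

        // Everything up to and including the last terminator is complete.
        let complete = match buf[..len].iter().rposition(|&b| b == b'\n') {
            Some(i) => i + 1,
            None => 0,
        };
        for line in buf[..complete].split_inclusive(|&b| b == b'\n') {
            lines.push(line.to_vec());
        }
        // "Consume" the complete lines by rolling the partial tail to the front.
        buf.copy_within(complete..len, 0);
        len -= complete;
    }
}

fn main() -> io::Result<()> {
    // A 4-byte buffer forces the 5-byte line "cdef\n" to trigger a resize.
    let lines = collect_lines(&b"ab\ncdef\ng"[..], 4)?;
    println!("{} lines", lines.len());
    Ok(())
}
```

Copying each line into a `Vec` is only for demonstration; the point of a real line buffer is to lend out slices of its internal storage and avoid those allocations.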

Crude and untrustworthy benchmarks

From examples/ripline_benchmarks.rs. The initial benchmark script was taken from rust-linereader, which is also included in the benchmarks as LR::*.

The input used was all_train.csv, unzipped and concatenated five times to create a ~25G file.

| Method                | Time  | Lines/sec  | Bandwidth     |
|-----------------------|-------|------------|---------------|
| read()                | 2.01s | 17439155/s | 12303.42 MB/s |
| LR::next_batch()      | 2.11s | 16576174/s | 11694.59 MB/s |
| LR::next_line()       | 2.65s | 13196734/s | 9310.37 MB/s  |
| ripline_line_buffer() | 2.64s | 13277194/s | 9367.14 MB/s  |
| ripline_mmap()        | 2.16s | 16183503/s | 11417.55 MB/s |
| bstr_for_line()       | 2.47s | 14174502/s | 10000.19 MB/s |
| read_until()          | 2.86s | 12230594/s | 8628.75 MB/s  |
| read_line()           | 4.16s | 8417415/s  | 5938.53 MB/s  |
| lines()               | 5.05s | 6930901/s  | 4889.79 MB/s  |

Note that read() and LR::next_batch() are not counting lines. read_until() doesn't seem to perform as well in real-life scenarios as it does on this benchmark, and I'm not sure why.
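
For readers who want to reproduce numbers of this kind, the following is a rough std-only sketch of how one row might be measured. It is not the code from examples/ripline_benchmarks.rs: `count_lines_read_until` and `throughput` are hypothetical helpers, and the MB figure assumes 1 MB = 10^6 bytes, which may differ from the unit used in the table above.

```rust
use std::io::{self, BufRead, Cursor};
use std::time::{Duration, Instant};

/// Counts lines with `BufRead::read_until`, reusing one buffer across lines.
/// Returns (line count, byte count).
fn count_lines_read_until<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = Vec::new();
    let (mut lines, mut bytes) = (0u64, 0u64);
    loop {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            return Ok((lines, bytes));
        }
        lines += 1;
        bytes += n as u64;
    }
}

/// Converts raw counts into (lines per second, MB per second).
/// Assumption: 1 MB = 1_000_000 bytes.
fn throughput(lines: u64, bytes: u64, elapsed: Duration) -> (f64, f64) {
    let secs = elapsed.as_secs_f64();
    (lines as f64 / secs, bytes as f64 / 1e6 / secs)
}

fn main() -> io::Result<()> {
    // Small in-memory input so the sketch runs anywhere;
    // a real benchmark would read a large file from disk.
    let data = b"a,b,c\n1,2,3\n".repeat(100_000);

    let start = Instant::now();
    let (lines, bytes) = count_lines_read_until(Cursor::new(data))?;
    let (lps, mbps) = throughput(lines, bytes, start.elapsed());

    println!("read_until() {} lines {:.0}/s {:.2} MB/s", lines, lps, mbps);
    Ok(())
}
```

An in-memory input like this mostly measures the line-splitting logic itself; results on a multi-gigabyte file also depend on the page cache and the storage device.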

Hardware: Ubuntu 20 AMD Ryzen 9 3950X 16-Core Processor w/ 64 GB DDR4 memory and 1TB NVMe Drive

Owner
Seth
Bioinformatics.