Read input lines as byte slices for high efficiency

Overview

bytelines

This library provides an easy way to read input lines as byte slices for high efficiency. It works like lines from the standard library, but yields each line as a byte slice (&[u8]). This is significantly faster than lines() when you don't particularly care about Unicode, and about as fast as writing the loops out by hand. Although the code itself is fairly trivial, I've had to roll it into at least 4 tools I've written recently, so I figured it was time to have a convenience crate for it.
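For context, here is a rough sketch of what "writing the loops out by hand" can look like using only the standard library. It is not taken from this crate's source; the helper name for_each_byte_line is made up for illustration. It reuses one buffer for every line and hands the callback a slice with the line terminator removed.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: calls `f` with each line of `reader` as a byte
/// slice, without the trailing `\n` or `\r\n`. One buffer is reused.
fn for_each_byte_line<R, F>(mut reader: R, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&[u8]),
{
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // `read_until` appends bytes up to and including the delimiter
        // and returns 0 only at end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            return Ok(());
        }
        let mut line: &[u8] = &buf;
        if let Some(rest) = line.strip_suffix(b"\n") {
            line = rest;
            // Only treat `\r` as part of the terminator when it precedes `\n`.
            if let Some(rest) = line.strip_suffix(b"\r") {
                line = rest;
            }
        }
        f(line);
    }
}

fn main() -> io::Result<()> {
    let input = Cursor::new(b"alpha\r\nbeta\n\ngamma".to_vec());
    let mut lengths = Vec::new();
    for_each_byte_line(input, |line| lengths.push(line.len()))?;
    println!("{:?}", lengths); // prints [5, 4, 0, 5]
    Ok(())
}
```

No UTF-8 validation happens here and no per-line String is allocated, which is where the savings over lines() come from.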

Installation

This crate is available via Crates.io, so you can add it as a dependency in your Cargo.toml:

[dependencies]
bytelines = "2.2"

Usage

It's quite simple; in the place you would typically call lines on a BufRead implementor, you can now call byte_lines to retrieve a structure used to walk over lines as &[u8] (and thus avoid allocations). There are two ways to use the API, and both are shown below:

use bytelines::*;
use std::fs::File;
use std::io::BufReader;

// our input file we're going to walk over lines of, and our reader
let file = File::open("./my-input.txt").expect("able to open file");
let reader = BufReader::new(file);
let mut lines = reader.byte_lines();

// Option 1: Walk using a `while` loop.
//
// This is the most performant option, as it avoids an allocation by
// simply referencing bytes inside the reading structure. This means
// that there's no copying at all, until the developer chooses to.
while let Some(line) = lines.next() {
    // do something with the line (a `Result` wrapping a `&[u8]`)
}

// Option 2: Use the `Iterator` trait.
//
// This is more idiomatic, but requires allocating each line into
// an owned `Vec` to avoid potential memory safety issues. Although
// there is an allocation here, the overhead should be negligible
// except in cases where performance is paramount.
//
// Pick one option or the other: once the loop above has finished,
// `lines` is exhausted and this loop would see no items.
for line in lines {
    // do something with the line (a `Result` wrapping a `Vec<u8>`)
}

This interface was introduced in the v2.x lineage of bytelines. The Iterator trait was previously implemented in v1.x, but required an unsafe contract in trying to be too idiomatic. This has since been fixed, and all unsafe code has been removed whilst providing IntoIterator implementations for those who prefer the cleaner syntax.
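To illustrate why the two styles differ, here is a minimal standard-library sketch. It is not the crate's actual implementation, and the names SketchLines and SketchIntoIter are hypothetical. A borrowing next can return a slice into an internal buffer, but the Iterator trait cannot express an item that borrows from the iterator itself, so the owned variant has to copy each line into a Vec<u8>.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical stand-in for a byte-line reader; not the crate's real type.
struct SketchLines<R> {
    reader: R,
    buffer: Vec<u8>,
}

impl<R: BufRead> SketchLines<R> {
    fn new(reader: R) -> Self {
        SketchLines { reader, buffer: Vec::new() }
    }

    /// Borrowing `next`: the returned slice points into `self.buffer`,
    /// so it must be released before `next` can be called again.
    fn next(&mut self) -> Option<io::Result<&[u8]>> {
        self.buffer.clear();
        match self.reader.read_until(b'\n', &mut self.buffer) {
            Ok(0) => None,
            Ok(_) => {
                // Drop a trailing `\n`, then a trailing `\r` if present.
                let line = self.buffer.strip_suffix(b"\n").unwrap_or(&self.buffer[..]);
                Some(Ok(line.strip_suffix(b"\r").unwrap_or(line)))
            }
            Err(e) => Some(Err(e)),
        }
    }
}

/// Owned iteration: each line is copied out of the shared buffer so that
/// the item no longer borrows from the reader.
struct SketchIntoIter<R>(SketchLines<R>);

impl<R: BufRead> Iterator for SketchIntoIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next().map(|res| res.map(|line| line.to_vec()))
    }
}

impl<R: BufRead> IntoIterator for SketchLines<R> {
    type Item = io::Result<Vec<u8>>;
    type IntoIter = SketchIntoIter<R>;

    fn into_iter(self) -> Self::IntoIter {
        SketchIntoIter(self)
    }
}

fn main() -> io::Result<()> {
    // Borrowing style: no per-line allocation.
    let mut lines = SketchLines::new(Cursor::new(b"one\ntwo\r\n".to_vec()));
    while let Some(line) = lines.next() {
        println!("{} bytes", line?.len());
    }

    // Owned style: one `Vec<u8>` per line, usable with iterator adapters.
    let owned = SketchLines::new(Cursor::new(b"one\ntwo\r\n".to_vec()))
        .into_iter()
        .collect::<io::Result<Vec<Vec<u8>>>>()?;
    println!("{} lines collected", owned.len());
    Ok(())
}
```

The borrowing form works with while let because each slice goes out of scope before the next call; collecting lines requires the owned form.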


Comments
  • Documentation contains incorrect types

    Apologies if I'm not understanding some Rust conventions, but it seems to me that the docs are misleading about the types yielded when iterating over lines. E.g.

    // do something with the line, which is &[u8]
    

    whereas really (AIUI) it's Result<&[u8], Error>. The same is true of several of the comments (and thus in the official docs on crates.io).

    opened by dandavison 4
  • empty line without \r causes panic

    use std::io;
    extern crate bytelines;
    use bytelines::*;
    
    fn main() {
        let stdin = io::stdin();
        for _ in stdin.lock().byte_lines() {}
    }
    
    $ echo | cargo run
    thread 'main' panicked at 'attempt to subtract with overflow', …/bytelines-2.2.1/src/lib.rs:129:36
    

    And the line that panics is:

    // also "pop" a leading \r
    if self.buffer[n - 1] == b'\r' {
    
    opened by vthriller 4
  • Add LICENSE file

    There are many MIT variants out there, and it would be very helpful if you would put the specific one into a LICENSE file and release a new version. This is basically a requirement for packaging this crate in Fedora.

    opened by ignatenkobrain 0
  • Add iterator that includes offset

    It would be great to have a function with identical usage that also includes the byte offset in the iterator item. I need this because I need to remove some characters from a specific line, but can't, because the byte offset I have (from .take(n).fold(0, |count, v| count + v.len())) does not take into account the bytes of the actual newline characters.

    opened by crowlKats 0
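As a possible workaround for the offset request above, here is a standard-library sketch that pairs each line with the byte offset at which it starts. It is not part of bytelines; the function name lines_with_offsets is hypothetical. Offsets are advanced by the number of bytes actually read, so the line terminators are counted.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: returns each line (terminator removed) together
/// with the byte offset at which it starts in the input.
fn lines_with_offsets<R: BufRead>(mut reader: R) -> io::Result<Vec<(u64, Vec<u8>)>> {
    let mut out = Vec::new();
    let mut buf = Vec::new();
    let mut offset: u64 = 0;
    loop {
        buf.clear();
        // `read` includes the terminator bytes, unlike the stripped line.
        let read = reader.read_until(b'\n', &mut buf)?;
        if read == 0 {
            return Ok(out);
        }
        let mut line: &[u8] = &buf;
        if let Some(rest) = line.strip_suffix(b"\n") {
            line = rest;
            if let Some(rest) = line.strip_suffix(b"\r") {
                line = rest;
            }
        }
        out.push((offset, line.to_vec()));
        offset += read as u64;
    }
}

fn main() -> io::Result<()> {
    let input = Cursor::new(b"ab\r\n\ncd\n".to_vec());
    for (offset, line) in lines_with_offsets(input)? {
        println!("{}: {}", offset, String::from_utf8_lossy(&line));
    }
    Ok(())
}
```

Because the terminator is removed with strip_suffix rather than by indexing at n - 1, an empty line cannot trigger an index underflow.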
Releases (v2.4.0)
Owner
Isaac Whitfield
Fan of all things automated. OSS when applicable. Author of Cachex for Elixir. Senior Software Engineer at Axway. Intelligence wanes without practice.