nombytes is a library that provides a wrapper for the bytes::Bytes byte container for use with nom.

Overview

NomBytes

Crates.io Crates.io Crates.io Docs.io Docs master Rust codecov Codacy

nombytes is a library that provides a wrapper for the bytes::Bytes byte container for use with nom.

I originally made this so that I could have a function take a file name path and return parsed values that still had references to the loaded file without running into the lifetime issues associated with &[u8] and &str that would prevent me from doing so. I decided to release it as a crate so that others can make use of my efforts too.

This library has been tested to work with bytes down to v5.3.0 and nom down to v6.0.0 and has been marked as such in its Cargo.toml.

Usage

Put this in your Cargo.toml:

[dependencies]
nombytes = "0.1.1"

Features

miette

With the miette feature enabled, the NomBytes implements its SourceCode trait so it can be used directly with miette's #[source_code] error attribute. This feature also enables the std feature.

This library has been tested to work with miette down to v3.0.0 and has been marked as such in its Cargo.toml.

serde

Adds serde::Serialize and serde::Deserialize implementations to the types in this library to allow for using them with serde.

std

Enabled by default; allows creating NomBytes directly from Strings through a From<String> impl. With this feature turned off, this crate is #![no_std] compatible.

Example

Borrowed from the nom crate, using NomBytes instead of &str. Had to be modified slightly because NomBytes acts as &[u8] rather than as &str.

use nom::{
  IResult,
  bytes::complete::{tag, take_while_m_n},
  combinator::map_res,
  sequence::tuple};
use nombytes::NomBytes;

#[derive(Debug,PartialEq)]
pub struct Color {
  pub red:     u8,
  pub green:   u8,
  pub blue:    u8,
}

fn from_hex(input: NomBytes) -> Result<u8, std::num::ParseIntError> {
  u8::from_str_radix(input.to_str(), 16)
}

fn is_hex_digit(c: u8) -> bool {
  (c as char).is_digit(16)
}

fn hex_primary(input: NomBytes) -> IResult<NomBytes, u8> {
  map_res(
    take_while_m_n(2, 2, is_hex_digit),
    from_hex
  )(input)
}

fn hex_color(input: NomBytes) -> IResult<NomBytes, Color> {
  let (input, output) = tag("#")(input)?;

  let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

  Ok((input, Color { red, green, blue }))
}

fn main() {
  assert!(matches!(hex_color(NomBytes::from("#2F14DF")),
    Ok((r, Color {
      red: 47,
      green: 20,
      blue: 223,
    })) if r.to_str() == ""));
}

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

You might also like...
A command-line tool and library for generating regular expressions from user-provided test cases
A command-line tool and library for generating regular expressions from user-provided test cases

Table of Contents What does this tool do? Do I still need to learn to write regexes then? Current features How to install? 4.1 The command-line tool 4

An efficient and powerful Rust library for word wrapping text.

Textwrap Textwrap is a library for wrapping and indenting text. It is most often used by command-line programs to format dynamic output nicely so it l

⏮ ⏯ ⏭ A Rust library to easily read forwards, backwards or randomly through the lines of huge files.

EasyReader The main goal of this library is to allow long navigations through the lines of large files, freely moving forwards and backwards or gettin

Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/
Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/

Whatlang Natural language detection for Rust with focus on simplicity and performance. Content Features Get started Documentation Supported languages

A Rust library for generically joining iterables with a separator

joinery A Rust library for generically joining iterables with a separator. Provides the tragically missing string join functionality to rust. extern c

👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike

Table of Contents What does this library do? Why does this library exist? Which languages are supported? How good is it? Why is it better than other l

A morphological analysis library.

Lindera A Japanese morphological analysis library in Rust. This project fork from fulmicoton's kuromoji-rs. Lindera aims to build a library which is e

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app

A Rust library containing an offline version of webster's dictionary.

webster-rs A Rust library containing an offline version of webster's dictionary. Add to Cargo.toml webster = 0.3.0 Simple example: fn main() { le

Releases(v0.1.1)
  • v0.1.1(Jul 24, 2022)

    https://crates.io/crates/nombytes/0.1.1

    Added ✨

    • Added serde support

    Changed 🔧

    • Updated README description
    • Removed unnecessary cloning where possible

    Fixed 🐛

    • Fixed minor typo in README
    Source code(tar.gz)
    Source code(zip)
  • v0.1(Jul 24, 2022)

Owner
Alexander Krivács Schrøder
Alexander Krivács Schrøder
Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models

rust-tokenizers Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigra

null 165 Jan 1, 2023
Rust wrapper for the BlingFire tokenization library

BlingFire in Rust blingfire is a thin Rust wrapper for the BlingFire tokenization library. Add the library to Cargo.toml to get started cargo add blin

Re:infer 14 Sep 5, 2022
Wrapper around Microsoft CNTK library

Bindings for CNTK library Simple low level bindings for CNTK library from Microsoft. API Documentation Status Currently exploring ways how to interact

Vlado Boza 21 Nov 30, 2021
A Rust wrapper for the Text synthesization service TextSynth API

A Rust wrapper for the Text synthesization service TextSynth API

ALinuxPerson 2 Mar 24, 2022
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

?? python-vaporetto ?? Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. Installation

null 17 Dec 22, 2022
Viterbi-based accelerated tokenizer (Python wrapper)

?? python-vibrato ?? Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wra

null 20 Dec 29, 2022
Rust-nlp is a library to use Natural Language Processing algorithm with RUST

nlp Rust-nlp Implemented algorithm Distance Levenshtein (Explanation) Jaro / Jaro-Winkler (Explanation) Phonetics Soundex (Explanation) Metaphone (Exp

Simon Paitrault 34 Dec 20, 2022
Text Expression Runner – Readable and easy to use text expressions

ter - Text Expression Runner ter is a cli to run text expressions and perform basic text operations such as filtering, ignoring and replacing on the c

Maximilian Schulke 72 Jul 31, 2022
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

rust-bert Rust native Transformer-based models implementation. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing

null 1.3k Jan 8, 2023
Subtitles-rs - Use SRT subtitle files to study foreign languages

Rust subtitle utilities Are you looking for substudy? Try here. (substudy has been merged into the subtitles-rs project.) This repository contains a n

Eric Kidd 268 Dec 29, 2022