A small Rust library for creating regex-based lexers

nph

Last update: Feb 5, 2022

Related tags

Text processing, reglex

Overview

reglex

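A small Rust library for creating regex-based lexers. Each rule pairs a regular expression with a closure that turns the matched text into a token, or returns None to discard the match (as the whitespace rule does in the example below). Lexing returns either the list of tokens or a usize error value.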

Example

use reglex::{RuleList, rule_list, lex};

#[derive(Debug, PartialEq)]
enum Token {
    Keyword,
    Number(u64),
    Left,
    Right,
}

fn lexer(input: &String) -> Result<Vec<Token>, usize> {
    // Each rule maps a regex to a closure that builds a token from the match.
    let regexes: RuleList<Token> = rule_list! [
        "kw" => |_| Some(Token::Keyword),
        r"\d+" => |s: &str| Some(Token::Number(s.parse().unwrap())),
        r"\{" => |_| Some(Token::Left),
        r"\}" => |_| Some(Token::Right),
        // Returning None drops the match, so whitespace produces no token.
        r"\s" => |_| None
    ];

    lex(&regexes, input)
}

fn main() {
    assert_eq!(
        lexer(&"kw  { 12 53 }".to_string()),
        Ok(vec![
            Token::Keyword,
            Token::Left,
            Token::Number(12),
            Token::Number(53),
            Token::Right
        ])
    );

    // Err(3) presumably reports the offset at which no rule matched
    // (the `E` of `ERROR!`).
    assert_eq!(lexer(&"kw ERROR! { 12 53 }".to_string()), Err(3));
}
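To make the behaviour in the example above more concrete, here is a minimal sketch of the kind of loop a rule-based lexer might run. It is not reglex's implementation: the names `Rule`, `sketch_lex`, `demo_rules` and the matcher functions are hypothetical, and hand-written matchers stand in for anchored regexes so that the sketch needs only the standard library. It also assumes that rules are tried in order and that the error value is the byte offset where no rule matched, which is consistent with the `Err(3)` above but not confirmed by the source.

```rust
// Illustrative sketch only; NOT reglex's actual code.
// Hand-written matchers replace regexes (an assumption made to avoid dependencies).

#[derive(Debug, PartialEq)]
enum Token {
    Keyword,
    Number(u64),
    Left,
    Right,
}

/// Hypothetical rule: `matcher` returns the byte length of a match at the
/// start of the remaining input; `action` builds a token or returns None to skip.
struct Rule {
    matcher: fn(&str) -> Option<usize>,
    action: fn(&str) -> Option<Token>,
}

fn literal_kw(rest: &str) -> Option<usize> {
    if rest.starts_with("kw") { Some(2) } else { None }
}

fn digits(rest: &str) -> Option<usize> {
    let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len > 0 { Some(len) } else { None }
}

fn left_brace(rest: &str) -> Option<usize> {
    if rest.starts_with('{') { Some(1) } else { None }
}

fn right_brace(rest: &str) -> Option<usize> {
    if rest.starts_with('}') { Some(1) } else { None }
}

fn whitespace(rest: &str) -> Option<usize> {
    rest.chars().next().filter(|c| c.is_whitespace()).map(|c| c.len_utf8())
}

/// Same five rules as the reglex example, in the same order.
fn demo_rules() -> [Rule; 5] {
    [
        Rule { matcher: literal_kw, action: |_| Some(Token::Keyword) },
        Rule { matcher: digits, action: |s: &str| Some(Token::Number(s.parse().unwrap())) },
        Rule { matcher: left_brace, action: |_| Some(Token::Left) },
        Rule { matcher: right_brace, action: |_| Some(Token::Right) },
        Rule { matcher: whitespace, action: |_| None },
    ]
}

/// Walks the input left to right; the first rule matching at the current
/// offset wins. Returns Err(offset) when no rule matches (assumed semantics).
fn sketch_lex(rules: &[Rule], input: &str) -> Result<Vec<Token>, usize> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let hit = rules
            .iter()
            .find_map(|rule| (rule.matcher)(rest).map(|len| (rule, len)));
        match hit {
            Some((rule, len)) => {
                // A None from the action means "consume but emit nothing".
                if let Some(token) = (rule.action)(&rest[..len]) {
                    tokens.push(token);
                }
                // Every matcher here returns a non-zero length, so this always advances.
                pos += len;
            }
            None => return Err(pos),
        }
    }
    Ok(tokens)
}
```

Under these assumptions, `sketch_lex(&demo_rules(), "kw ERROR! { 12 53 }")` stops at offset 3, after `kw` and one space have been consumed. A real regex-based lexer would replace each matcher with a regex anchored at the current position, and may pick the longest match rather than the first.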
