A simple lexer which creates over 75 various tokens based on the rust programming language.

Overview

Documentation.

This complete Lexer/Lexical Scanner produces tokens for a string or a file path entry. The output is a Vector for the user to handle according to their needs. All tokens are included (including whitespace) as that is left to the user to decide how to use the tokens.

To see an example of an output, check out the wiki page:

https://github.com/mjehrhart/lexical_scanner/wiki/Example

Setup

Add this depedency to TOML

[dependencies]
lexical_scanner = "0.1.14"

Basic Usage

The two ways to perform a lexical scan is to pass in file path or pass in a string. Passing in a string is mostly used for testing while passing in a file path is common for every day work. A lexical scanner can produces thousands and thousands of tokens very quickly. For this reason, it is best to use a file path.

use lexical_scanner;

fn main() {
    let path = "/Users/gues/my_file_to_read_into_tokens/temp.txt";
    let token_list = lexical_scanner::lexer(&path); 
}

That is all there is to do! The lexical scanner returns a Vec for the user to handle as needed.

To test with a string, all you need to do is call this

use lexical_scanner;

fn main() {
    let text = "Water is a good choice of drink!";
    let token_list = lexical_scanner::lexer_as_str(&text); 
}

Below is a simple way to view the tokens for unit testing:

for (i, token) in token_list.iter().enumerate(){
    println!("{}. {:?}", i, token);
}

Supported Tokens

And,
AndAnd,
AndEq,
At,
Backslash,
BitCharacterCode7(String),
BitCharacterCode8(String),
BlockCommentStart(String),
BlockCommentStop(String),
BracketLeft,
BracketRight,
Byte(String),
ByteString(String),
Caret,
CaretEq,
CarriageReturn,
Character(String),
Colon,
Comma,
CurlyBraceLeft,
CurlyBraceRight,
Dollar,
Dot,
DotDot,
DotDotDot,
DotDotEq,
DoubleQuote,
Eq,
EqEq,
Ge,
Gt,
FatArrow,
InnerBlockDoc(String),
InnerLineDoc(String),
Le,
LineComment(String),
Lt,
Minus,
MinusEq,
Or,
OrEq,
OrOr,
OuterBlockDoc(String),
OuterLineDoc(String),
Newline,
Not,
NotEq,
Null,
Floating(String),
Numeric(String),
ParenLeft,
ParenRight,
PathSep,
Percent,
PercentEq,
Plus,
PlusEq,
Pound,
Question,
RArrow,
RawString(String),
RawByteString(String),
Semi,
Shl,
ShlEq,
Shr,
ShrEq,
SingleQuote,
Slash,
SlashEq,
Star,
StarEq,
Stopped(String), //for debugging
String(String),
Tab,
Undefined,
Underscore,
WhiteSpace,
Word(String),
KW_As,
KW_Async,
KW_Await,
KW_Break,
KW_Const,
KW_Contine,
KW_Crate,
KW_Dyn,
KW_Else,
KW_Enum,
KW_Extern,
KW_False,
KW_Fn,
KW_For,
KW_If,
KW_Impl,
KW_In,
KW_Let,
KW_Loop,
KW_Match,
KW_Mod,
KW_Move,
KW_Mut,
KW_Pub,
KW_Ref,
KW_Return,
KW_SELF,
KW_Self,
KW_Static,
KW_Struct,
KW_Super,
KW_Trait,
KW_True,
KW_Type,
KW_Union,
KW_Unsafe,
KW_Use,
KW_Where,
KW_While,

crates.io => https://crates.io/crates/lexical_scanner github.com => https://github.com/mjehrhart/lexical_scanner

You might also like...
The Amp programming language: a language designed for building high performance systems.

A language designed for building high performance systems. Platform Support x86_64-pc-windows ✅ x86_64-unknown-linux ⚠️ untested x86_64-unknown-darwin

Nexa programming language. A language for game developers by a game developer
Nexa programming language. A language for game developers by a game developer

NexaLang Nexa programming language. A language for game developers by a game developer. Features High-Level: Nexa is an easy high level language Two M

Stream-based visual programming language for systems observability
Stream-based visual programming language for systems observability

Stream-based visual programming language for systems observability. Metalens allows to build observability programs in a reactive and visual way. On L

A visual node-based programming language

Stainless Script Stainless Script is a visual node-based programming language. The structure is as follows: program contains classes, objects (constan

A structure editor for a simple functional programming language, with Vim-like shortcuts and commands.

dilim A structure editor for a simple functional programming language, with Vim-like shortcuts and commands. Written in Rust, using the Yew framework,

Simple Programming Language for fun.

Chap Chap is an Easy to learn, dynamic, interpretive, isolated, and keywordless scripting language written in Rust. It is useful when you want your us

Command line tool to extract various data from Blender .blend files

blendtool Command line tool to extract various data from Blender .blend files. Currently supports dumping Eevee irradiance volumes to .dds, new featur

 apkeep - A command-line tool for downloading APK files from various sources
apkeep - A command-line tool for downloading APK files from various sources

apkeep - A command-line tool for downloading APK files from various sources Installation Precompiled binaries for apkeep on various platforms can be d

omekasy is a command line application that converts alphanumeric characters in your input to various styles defined in Unicode.

omekasy is a command line application that converts alphanumeric characters in your input to various styles defined in Unicode. omekasy means "dress up" in Japanese.

Owner
null
A C-like programming language that is similar to Rust's syntax. Toy programming language.

CRUST This is a hobby project to learn about compilers and language design. I've designed the language to be similar to C and Rust. Heavily inspired b

Mahmoud Almontasser 50 Jul 13, 2024
A library that creates a terminal-like window with feature-packed drawing of text and easy input handling. MIRROR.

BearLibTerminal provides a pseudoterminal window with a grid of character cells and a simple yet powerful API for flexible textual output and uncompli

Tommy Ettinger 43 Oct 31, 2022
A command-line utility that creates project structure.

petridish A command-line utility that creates project structure. If you have heard of the cookiecutter project, petridish is a rust implementation of

null 11 Dec 29, 2022
Tracing layer that automatically creates and manages progress bars for active spans.

tracing-indicatif A tracing layer that automatically creates and manages indicatif progress bars for active spans. Progress bars are a great way to ma

Emerson Ford 95 Feb 22, 2023
SKYULL is a command-line interface (CLI) in development that creates REST API project structure templates with the aim of making it easy and fast to start a new project.

SKYULL is a command-line interface (CLI) in development that creates REST API project structure templates with the aim of making it easy and fast to start a new project. With just a few primary configurations, such as project name, you can get started quickly.

Gabriel Michaliszen 4 May 9, 2023
tmplt is a command-line interface tool that allows you to quickly and easily set up project templates for various programming languages and frameworks

tmplt A User Friendly CLI Tool For Creating New Projects With Templates About tmplt is a command-line tool that lets users quickly create new projects

Humble Penguin 35 Apr 8, 2023
A diff-based data management language to implement unlimited undo, auto-save for games, and cloud-apps which needs to retain every change.

Docchi is a diff-based data management language to implement unlimited undo, auto-save for games, and cloud-apps which needs to save very often. User'

juzy 21 Sep 19, 2022
Programming language made by me to learn other people how to make programming languages :3

Spectra programming language Programming language made for my tutorial videos (my youtube channel): Syntax Declaring a variable: var a = 3; Function

Adi Salimgereyev 3 Jul 25, 2023
Count your code by tokens, types of syntax tree nodes, and patterns in the syntax tree. A tokei/scc/cloc alternative.

tcount (pronounced "tee-count") Count your code by tokens, types of syntax tree nodes, and patterns in the syntax tree. Quick Start Simply run tcount

Adam P. Regasz-Rethy 48 Dec 7, 2022
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens

Large language models (LLMs) can be used for many tasks, but often have a limited context size that can be smaller than documents you might want to use. To use documents of larger length, you often have to split your text into chunks to fit within this context size.

Ben Brandt 4 May 8, 2023