better tools for text parsing

Overview

nom-text

Goal: a library that extends nom to provide better tools for text formats (programming languages, configuration files).

current needs

Recognizing line and column in input data: nom-locate

aggregating more precise errors

Using nom-supreme

error recovery

parsers need to be fault tolerant: recovering from the current error allows parsing the rest of the file: https://eyalkalderon.com/blog/nom-error-recovery/

We should also allow for various error levels (like warnings, that would not stop compilation)

precedence

precedence rules can be annoying to write

bounded recursion

most language parsers end up calling themselves recursively, resulting in stack overflow issues

nicer error diagnostic

if we have error spans, we should be able to display more context, like rustc's errors

tokenization

should we support multiple phases, with &str -> Vec<Token> functions followed by &[Token] -> AST functions?

You might also like...
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app

fastest text uwuifier in the west

uwuify fastest text uwuifier in the west transforms Hey... I think I really love you. Do you want a headpat? into hey... i think i w-weawwy wuv you.

A crate using DeepSpeech bindings to convert mic audio from speech to text

DS-TRANSCRIBER Need an Offline Speech To Text converter? Records your mic, and returns a String containing what was said. Features Begins transcriptio

Sorta Text Format in UTF-8

STFU-8: Sorta Text Format in UTF-8 STFU-8 is a hacky text encoding/decoding protocol for data that might be not quite UTF-8 but is still mostly UTF-8.

The fastest way to identify any mysterious text or analyze strings from a file, just ask `lemmeknow` !
The fastest way to identify any mysterious text or analyze strings from a file, just ask `lemmeknow` !

The fastest way to identify anything lemmeknow ⚡ Identify any mysterious text or analyze strings from a file, just ask lemmeknow. lemmeknow can be use

Font independent text analysis support for shaping and layout.

lipi Lipi (Sanskrit for 'writing, letters, alphabet') is a pure Rust crate that provides font independent text analysis support for shaping and layout

lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike.

lingua-py lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike. Installation pip install l

Makdown-like text parser.

Makdown-like text parser.

A Rust wrapper for the Text synthesization service TextSynth API

A Rust wrapper for the Text synthesization service TextSynth API

Owner
projects related to nom, the Rust parser combinators library
null
Source text parsing, lexing, and AST related functionality for Deno

Source text parsing, lexing, and AST related functionality for Deno.

Deno Land 90 Jan 1, 2023
Text Expression Runner – Readable and easy to use text expressions

ter - Text Expression Runner ter is a cli to run text expressions and perform basic text operations such as filtering, ignoring and replacing on the c

Maximilian Schulke 72 Jul 31, 2022
An opinionated, better system for spelling words in English.

ingLix / ˈɪŋ glɪʃ / English done right. An opinionated, better system for spelling words in English. Preamble Click to expand. The English language is

Nicholas Omer Chiasson 6 Aug 8, 2022
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

SyntaxDot Introduction SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch

TensorDot 46 Dec 27, 2022
Text calculator with support for units and conversion

cpc calculation + conversion cpc parses and evaluates strings of math, with support for units and conversion. 128-bit decimal floating points are used

Kasper 82 Jan 4, 2023
Find and replace text in source files

Ruplacer Find and replace text in source files: $ ruplacer old new src/ Patching src/a_dir/sub/foo.txt -- old is everywhere, old is old ++ new is ever

Tanker 331 Dec 28, 2022
An efficient and powerful Rust library for word wrapping text.

Textwrap Textwrap is a library for wrapping and indenting text. It is most often used by command-line programs to format dynamic output nicely so it l

Martin Geisler 322 Dec 26, 2022
bottom encodes UTF-8 text into a sequence comprised of bottom emoji

bottom encodes UTF-8 text into a sequence comprised of bottom emoji (with , sprinkled in for good measure) followed by ????. It can encode any valid UTF-8 - being a bottom transcends language, after all - and decode back into UTF-8.

Bottom Software Foundation 345 Dec 30, 2022
👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike

Table of Contents What does this library do? Why does this library exist? Which languages are supported? How good is it? Why is it better than other l

Peter M. Stahl 569 Jan 3, 2023
Semantic text segmentation. For sentence boundary detection, compound splitting and more.

NNSplit A tool to split text using a neural network. The main application is sentence boundary detection, but e. g. compound splitting for German is a

Benjamin Minixhofer 273 Dec 29, 2022