This crate converts Rust compatible regex-syntax to Vim's NFA engine compatible regex.

Related tags

Miscellaneous vrex
Overview
Attention

This crate is still a proof of concept.

Updated: Thu 10 Feb 2022 08:22:33 PM -03

ToDo:

  • PoC - Rust's Regex to Vim's NFA Regex Engine. PS: Rust's regex is PCRE flavoured.
  • Package the prototype converter from PoC into a CLI app.
  • Test that CLI app.
  • Start on PoC - Rust's Regex to Vim's Old Regex Engine. PS: Rust's regex is PCRE flavoured.
  • Finishing implementing the printer for the Repetition match.
  • Reorder matches to print everything in the correct order.
  • Write tests.
  • Write Documentation.
  • Make a custom Neovim build.

Motivation

Vim is great, but its regexp engine looses in modularity, being a monolithic construct of difficult tweaking and putting a higher enough barrier of entry to using it when writing more complex expressions, see ms-jpq/nvim_rc and vim/src/regexp.c.

The initial objective is conversion of Vim's regexp engine into a PCRE compatible one, feasible in theory and already proven for the reverse process as this PoC suggests. You can install the packaged CLI app following its installation section.

Hopefully it should increase user experience and satisfy my own curiosity about the formal subject.

Futhermore, I hope to try making a build of the editor using this crate for testing and who knows what else?

PoC - Rust's Regex to Vim's Old Regex Engine

Implementation of a VimPrinter. Ideally it will take PCRE compliant regex and output Vim's Old Regex Engine compliant regexp, which is the most troublesome issue on using Rust's regex engine in Vim without breaking old plugins. See lib.rs

PoC - Rust's Regex to Vim's NFA Regex Engine

So, here me out. I was working on a way to parse PCRE syntax compatible regular expressions into Vim's own regular expression syntax. For context but cutting it short, I went for Rust's regex because, well, I'm a proud rustacean. After lots of brain damaging code digging and carving, asking other rustaceans for help and web stalking the original authors of Vim's Regex, it came to me that a simpler solution could be achieved mathematically - off with the failed parsers and lexers I wrote, be sure there were quite a few of those - through the following theoretical approach:

Obs: I've included a primer on the issue's bottom to get those interested but lacking formal basis to follow through and two simple code snippets for replication. Please forgive any typos as my ADHD makes it impossible for me to make sure of their absence.

  1. Theorem: If an NFA recognizes A then A is a regular language.

  2. Theorem: If R is a regular expression and A=L(R) then A is regular. Simply put, Every language describable with regular expressions is regular.

Proof: R being a regular expression, it can be so either by being [atomic] or [composite].

* Case: R is atomic ::= if R is a single symbol then an NFA will reject anything other than it.
                        if R = an empty string or empty language, those are also regular.

* Case R is composite ::= Closure construction through concatenation,
                          union and star, also result in regular languages.

Then, after hacking the source code of Neovim and Vim, I've studied the algorithm for their new NFA engine (see the code) which then occurred to me must be mathematically compatible through Nondeterministic Finite Automata as they're both formal regular languages. Therefore, after researching Kleene's algorithm and Glushkov's construction algorithm, it's implied that both are compatible at high-level intermediate representations Hir, aka NFA.

Concluding, transforming a Rust's (ripgrep's) regex, formalized on the regex_syntax crate (library) into an HIR and printing (thru the Printer struct) it back out as a different, actually equivalent - regular expression bridges the gaps between the two syntaxes and this output would ideally be accepted by both machines(engines). I then ran some tests and it is indeed so, more, Vim would automatically switch engines to accommodate the expression derived from the HIR, sometimes explicitly reporting an NFA engine syntax typo on my part.

I'm packaged the solution into an app, but it was the implication of this success which enticed me on creating this repo, that it's possible to actually replace - or add an opt out of - the old vimregex NFA engine with Rust's, through FFI, or even fully replace it, provided we implement some features like lookahead, etc or use Ripgrep library crate, shielding old plugins from breaking changes by using the old NFA engine as a bridge between incompatible patterns.

PS: I'll be working on that rewrite of the old engine, for my own leisure anyway though.

Installation

Its been tested on linux, built using Rust Nightly 1.60.0 but any not too old versions should work as well.

  • Requirements:
    • rustc or cargo (which implies rustc is installed)
  • Compilation Options: cargo install --help

If you're on Linux or macOS:

git clone https://github.com/kaiuri/vrex.git && cd ./vrex && cargo install --locked .

Primer

Quick primer to help out on those interested but lacking theoretical background.

  • Deterministic Finite Automata (DFA) is a 5-tuple (Q, Σ, &, q_0,F) where
    • Q is a finite set of states,
    • Σ is a finite set of alphabet symbols,
    • & is a transition function &: Q×Σ → Q,
    • q0 is the start state,
    • F is a set of accepted states.
  • Nondeterministic Finite Automata (NFA), think DFA but with infinite sets, that is, & is now defined as &: Q×Σε → P(Q) === (R| ⊆ Q }
  • String: finite set of symbols in Σ
  • Language: Set of strings.
  • Regular Language: Let M be a machine and L a language, L(M) is a language of that machine if it is accepted by it.
  • Regular expressions: A regular Expression R is a set of L languages that have the Union, Concatenation and Star operations defined within it.
  • Closure Properties for Regular Languages: Set of mathematical behaviour shared among all regular languages.

Regular language acceptance examples:

+--------------- DFA -----------------+-------------- NFA -------------------+
|   0                                 |   0---->---1---->----2--->-----3     |
| +---+                               |   a        |         |         |     |
| |   |      1                  0,1   | +->-+      |         |         |     |
| +-+-+  +--->----+           +-->--+ | |   |      a         |        a,ε    |
|   |    |        |    1      |     | | +-+-<  +--->----+    |      +-->--+  |
|   q1---+        +--->q2---> q3<---+ |   |    |        |    b      |     |  |
|        |        |                   |   q1---+        +--->q2---> q3    q4 |
|        +---<----+                   |        |        |                    |
|            0                        |        +---<----+                    |
|                                     |            b                         |
+--------------- DFA -----------------+-------------- NFA -------------------+

Further reading:

You might also like...
A rust library for interacting with multiple Discord.com-compatible APIs and Gateways at the same time.
A rust library for interacting with multiple Discord.com-compatible APIs and Gateways at the same time.

Chorus A rust library for interacting with (multiple) Spacebar-compatible APIs and Gateways (at the same time). Explore the docs » Report Bug · Reques

Emerald, the EVM compatible paratime

The Emerald ParaTime This is the Emerald ParaTime, an official EVM-compatible Oasis Protocol Foundation's ParaTime for the Oasis Network built using t

A Litecord compatible/inspired OSS implementation of Discord's backend for fun and profit.

A Litecord compatible/inspired OSS implementation of Discord's backend for fun and profit.

OpenAI compatible API for serving LLAMA-2 model
OpenAI compatible API for serving LLAMA-2 model

Cria - Local llama OpenAI-compatible API The objective is to serve a local llama-2 model by mimicking an OpenAI API service. The llama2 model runs on

Efficent platform for inference and serving local LLMs including an OpenAI compatible API server.

candle-vllm Efficient platform for inference and serving local LLMs including an OpenAI compatible API server. Features OpenAI compatible API server p

A scalable differentiable probabilistic Datalog engine, with Rust

Scallop A scalable probabilistic datalog engine with Rust. Usage The Scallop system is best integrated inside of the Rust context. With scallop! { ...

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine
MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box. For more information about features go to our documentation.

A collection of exponentially-smoothed camera controllers for the Bevy Engine.

smooth-bevy-cameras A collection of exponentially-smoothed camera controllers for the Bevy Engine. Look Transform All controllers are based on a LookT

A simple induction and BMC engine.

Mikino is a (relatively) simple induction and BMC engine. Its goal is to serve as a simple yet interesting tool for those interested in formal verification, especially SMT-based induction.

Owner
kaiuri
kaiuri
Payments Engine is a simple toy payments engine

Payments Engine is a simple toy payments engine that reads a series of transactions from a CSV, updates client accounts, handles disputes and chargebacks, and then outputs the state of clients accounts as a CSV.

Bogdan Arabadzhi 0 Feb 3, 2022
`Debug` in rust, but only supports valid rust syntax and outputs nicely formatted using pretty-please

dbg-pls A Debug-like trait for rust that outputs properly formatted code Showcase Take the following code: let code = r#" [ "Hello, World!

Conrad Ludgate 12 Dec 22, 2022
a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

null 4 Oct 3, 2022
Analogous, indented syntax for the Rust programming language.

Note: After experimenting with this in the wild, I have found representing keywords as symbols to be far less readable in large codebases. Additionall

null 42 Oct 2, 2021
A syntax exploration of eventually stable Rust Iterator items

Rust Iterator Items: a syntax exploration This crate is a thin wrapper around the unstable generator feature, allowing users to create new items that

Esteban Kuber 28 Sep 1, 2022
Rust macro to use a match-like syntax as a elegant alternative to nesting if-else statement

cond Rust macro to use a match-like syntax as an elegant alternative to many if-else statements. I got the idea from empty Go switch statements. I tho

CheckM4te 3 Nov 23, 2023
📦 Crate Protocol allows anyone to create, manage, and trade a tokenized basket of assets, which we refer to as a Crate.

?? Crate Protocol Crate Protocol allows anyone to create, manage, and trade a tokenized basket of assets, which we refer to as a Crate. A Crate is alw

Crate Protocol 63 Oct 31, 2022
tr-lang is a language that aims to bring programming language syntax closer to Turkish.

tr-lang Made with ❤️ in ???? tr-lang is a language that aims to bring programming language syntax closer to Turkish. tr-lang is a stack based language

Kerem Göksu 10 Apr 2, 2022
🚀 Fast and 100% API compatible postcss replacer, built in Rust

?? Fast and 100% API compatible postcss replacer, built in Rust

迷渡 472 Jan 7, 2023
Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs

Alternative implementation of the Bitwarden server API written in Rust and compatible with upstream Bitwarden clients*, perfect for self-hosted deploy

Daniel García 21.5k Jan 8, 2023