Attention
This crate is still a proof of concept.
Updated: Thu 10 Feb 2022 08:22:33 PM -03
ToDo:
- PoC - Rust's Regex to Vim's NFA Regex Engine. PS: Rust's regex is PCRE flavoured.
- Package the prototype converter from PoC into a CLI app.
- Test that CLI app.
- Start on PoC - Rust's Regex to Vim's Old Regex Engine. PS: Rust's regex is PCRE flavoured.
- Finishing implementing the printer for the Repetition match.
- Reorder matches to print everything in the correct order.
- Write tests.
- Write Documentation.
- Make a custom Neovim build.
Motivation
Vim is great, but its regexp engine looses in modularity, being a monolithic construct of difficult tweaking and putting a higher enough barrier of entry to using it when writing more complex expressions, see ms-jpq/nvim_rc and vim/src/regexp.c.
The initial objective is conversion of Vim's regexp engine into a PCRE compatible one, feasible in theory and already proven for the reverse process as this PoC suggests. You can install the packaged CLI app following its installation section.
Hopefully it should increase user experience and satisfy my own curiosity about the formal subject.
Futhermore, I hope to try making a build of the editor using this crate for testing and who knows what else?
PoC - Rust's Regex to Vim's Old Regex Engine
Implementation of a VimPrinter. Ideally it will take PCRE compliant regex and output Vim's Old Regex Engine compliant regexp, which is the most troublesome issue on using Rust's regex engine in Vim without breaking old plugins. See lib.rs
PoC - Rust's Regex to Vim's NFA Regex Engine
So, here me out. I was working on a way to parse PCRE syntax compatible regular expressions into Vim's own regular expression syntax. For context but cutting it short, I went for Rust's regex because, well, I'm a proud rustacean. After lots of brain damaging code digging and carving, asking other rustaceans for help and web stalking the original authors of Vim's Regex, it came to me that a simpler solution could be achieved mathematically - off with the failed parsers and lexers I wrote, be sure there were quite a few of those - through the following theoretical approach:
Obs: I've included a primer on the issue's bottom to get those interested but lacking formal basis to follow through and two simple code snippets for replication. Please forgive any typos as my ADHD makes it impossible for me to make sure of their absence.
-
Theorem: If an
NFA
recognizesA
thenA
is a regular language. -
Theorem: If
R
is a regular expression andA=L(R)
thenA
is regular. Simply put, Every language describable with regular expressions is regular.
Proof: R
being a regular expression, it can be so either by being [atomic] or [composite].
* Case: R is atomic ::= if R is a single symbol then an NFA will reject anything other than it.
if R = an empty string or empty language, those are also regular.
* Case R is composite ::= Closure construction through concatenation,
union and star, also result in regular languages.
Then, after hacking the source code of Neovim and Vim, I've studied the algorithm for their new NFA engine
(see the code) which then occurred to me must be mathematically compatible through Nondeterministic Finite Automata as they're both formal regular languages. Therefore, after researching Kleene's algorithm and Glushkov's construction algorithm, it's implied that both are compatible at high-level intermediate representations Hir
, aka NFA.
Concluding, transforming a Rust's (ripgrep
's) regex, formalized on the regex_syntax crate
(library) into an HIR and printing (thru the Printer struct
) it back out as a different, actually equivalent - regular expression bridges the gaps between the two syntaxes and this output would ideally be accepted by both machines(engines). I then ran some tests and it is indeed so, more, Vim would automatically switch engines to accommodate the expression derived from the HIR, sometimes explicitly reporting an NFA engine
syntax typo on my part.
I'm packaged the solution into an app, but it was the implication of this success which enticed me on creating this repo, that it's possible to actually replace - or add an opt out of - the old vimregex NFA engine
with Rust's, through FFI
, or even fully replace it, provided we implement some features like lookahead, etc or use Ripgrep library crate, shielding old plugins from breaking changes by using the old NFA engine
as a bridge between incompatible patterns.
PS: I'll be working on that rewrite of the old engine, for my own leisure anyway though.
Installation
Its been tested on linux
, built using Rust Nightly 1.60.0
but any not too old versions should work as well.
- Requirements:
rustc
orcargo
(which implies rustc is installed)
- Compilation Options:
cargo install --help
If you're on Linux
or macOS
:
git clone https://github.com/kaiuri/vrex.git && cd ./vrex && cargo install --locked .
Primer
Quick primer to help out on those interested but lacking theoretical background.
- Deterministic Finite Automata (DFA) is a 5-tuple
(Q, Σ, &, q_0,F)
whereQ
is a finite set of states,Σ
is a finite set of alphabet symbols,&
is a transition function&: Q×Σ → Q
,q0
is the start state,F
is a set of accepted states.
- Nondeterministic Finite Automata (NFA), think DFA but with infinite sets, that is,
&
is now defined as&: Q×Σε → P(Q) === (R| ⊆ Q }
- String: finite set of symbols in
Σ
- Language: Set of strings.
- Regular Language: Let
M
be a machine andL
a language,L(M)
is a language of that machine if it is accepted by it. - Regular expressions: A regular Expression
R
is a set ofL
languages that have theUnion
,Concatenation
andStar
operations defined within it. - Closure Properties for Regular Languages: Set of mathematical behaviour shared among all regular languages.
Regular language acceptance examples:
+--------------- DFA -----------------+-------------- NFA -------------------+
| 0 | 0---->---1---->----2--->-----3 |
| +---+ | a | | | |
| | | 1 0,1 | +->-+ | | | |
| +-+-+ +--->----+ +-->--+ | | | a | a,ε |
| | | | 1 | | | +-+-< +--->----+ | +-->--+ |
| q1---+ +--->q2---> q3<---+ | | | | b | | |
| | | | q1---+ +--->q2---> q3 q4 |
| +---<----+ | | | |
| 0 | +---<----+ |
| | b |
+--------------- DFA -----------------+-------------- NFA -------------------+