This is a proof-of-concept implementation of a lexer, parser, and evaluator that lets you assign custom values to the language's keywords.

Overview

Custom Configurable Lexer-Parser


Note

This is still very experimental: on any syntax error it simply panics with a very unhelpful error message. Error recovery and more helpful error messages are possible but have not been implemented yet. This is also more of a proof of concept, and the language itself supports only variable declaration, assignment, while and for loops, if-else statements, and a baked-in print statement.

If anyone is actually interested in this, I can add more examples, write better documentation, and so on; let me know in the issues. In theory this can also be used to convert a program from one keyword mapping to another, but I haven't gotten around to doing that yet.


Inspired by a tweet, I made this lexer-parser pair along with an evaluator. It lets you set custom keywords for the language through a config file and runs the program according to that config. In short, this is what it makes possible:

# English
let v1 = 1234;
if (v1>12){
    print(v1+(123 - (25*6)));
} else {
    print('Hello');
}
v1 = 1;
if(v1>12){
    print(v1+(123 - (25*6)));
} else{
    print('Hello');
}
let i = 0;
while (i<5){
    print('आता i ची value आहे '+i);
    i = i+1;
}
for k in [i,v1]{
    print(k);
}

And

# Marathi
नवीन v1 = १२३४;
जर (v1>12) तर{
    हे(v1+(123 - (25*6))) दाखवा ;
} नाहीतर {
    हे('Hello') दाखवा;
}
v1 = १;
जर (v1>12) तर{
    हे(v1+(123 - (25*6))) दाखवा;
} नाहीतर {
    हे('Hello') दाखवा;
}
नवीन i = 0;
जोपर्यंत (i<5) तोपर्यंत{
    हे('आता i ची value आहे '+i)दाखवा;
    i = i+1;
}
for k in [i,v1]{
    हे(k)दाखवा;
}

Both are valid programs and can be run by the same binary, each with its own config file (config-english and config-marathi). Moreover, you can create custom config files to write code in your own language.

How this works

This is made of three parts: a handwritten lexer, modeled closely on the one in the amazing Crafting Interpreters; a parser, generated using the LALRPOP crate; and an evaluator.

The command-line interface is:

USAGE:
    config_lex --config <CONFIG> --file <FILE>

FLAGS:
    -h, --help       Print help information
    -V, --version    Print version information

OPTIONS:
    -c, --config <CONFIG>    Keyword Configuration file
    -f, --file <FILE>        Source code file

The binary takes a keyword config file that specifies the keyword mappings, which the lexer uses to determine the tokens. The lexer emits the tokens, which the parser uses to generate an AST. This AST is evaluated by a recursive tree walker, again modeled on Crafting Interpreters.
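The keyword lookup step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Token` enum, `keyword_table`, and `classify` are hypothetical names, and the table is built from in-memory pairs rather than parsed from a real config file.

```rust
use std::collections::HashMap;

/// Token kinds for a small subset of the language (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Let,
    If,
    Else,
    While,
    Print,
    Ident(String),
}

/// Builds a keyword table from (configured word, token) pairs.
/// In the real tool these pairs would come from the config file.
fn keyword_table(pairs: &[(&str, Token)]) -> HashMap<String, Token> {
    pairs
        .iter()
        .map(|(word, tok)| (word.to_string(), tok.clone()))
        .collect()
}

/// Classifies one already-scanned word: a keyword if the table contains it,
/// otherwise an identifier.
fn classify(word: &str, keywords: &HashMap<String, Token>) -> Token {
    keywords
        .get(word)
        .cloned()
        .unwrap_or_else(|| Token::Ident(word.to_string()))
}
```

With a table built from `("नवीन", Token::Let)`, the word `नवीन` classifies as `Token::Let` while `let` becomes an ordinary identifier. The parser only ever sees `Token` values, which is why the rest of the pipeline does not need to know which keyword set was used.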

The point to note is that the major differences from other lexer-parsers are confined to the lexer and parser. The pipeline can still emit a convenient intermediate representation, such as an AST, which can then be transpiled to some other language.

Language Specification

Currently the following keywords are configurable; the values in parentheses are the default English fallbacks:

  • PrintStart ( print )
  • PrintEnd
  • ForStart ( for )
  • ForAux1
  • ForAux2
  • In ( in )
  • ForAux3
  • ForAux4
  • IfStart ( if )
  • IfAux1
  • IfAux2
  • ElseStart ( else )
  • ElseAux1
  • LetStart ( let )
  • Or ( || )
  • And ( && )
  • WhileStart ( while )
  • WhileAux1
  • WhileAux2

The *Aux keywords are optional; the others are required. These auxiliary keywords are provided so that constructs such as if and while can be made more "organic", since some languages use extra keywords to make the constructs more "coherent" or "natural".
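One way to model the split between keywords with an English fallback and optional auxiliary keywords is sketched below for the if construct. The struct, the helper functions, and the use of a plain key/value map as input are all assumptions made for illustration; the project's real config format and types may differ.

```rust
use std::collections::HashMap;

/// Keywords for the `if` construct. Field names mirror the list above;
/// the struct itself is a hypothetical illustration.
#[derive(Debug, PartialEq)]
struct IfKeywords {
    if_start: String,          // falls back to "if"
    if_aux1: Option<String>,   // optional, no fallback
    if_aux2: Option<String>,   // optional, no fallback
    else_start: String,        // falls back to "else"
    else_aux1: Option<String>, // optional, no fallback
}

/// Looks up a keyword that has an English default.
fn with_default(raw: &HashMap<&str, &str>, key: &str, default: &str) -> String {
    raw.get(key)
        .map_or_else(|| default.to_string(), |w| w.to_string())
}

/// Looks up an auxiliary keyword, which may simply be absent.
fn optional(raw: &HashMap<&str, &str>, key: &str) -> Option<String> {
    raw.get(key).map(|w| w.to_string())
}

/// Builds the `if` keyword set from raw key/value pairs (assumed input shape).
fn if_keywords(raw: &HashMap<&str, &str>) -> IfKeywords {
    IfKeywords {
        if_start: with_default(raw, "IfStart", "if"),
        if_aux1: optional(raw, "IfAux1"),
        if_aux2: optional(raw, "IfAux2"),
        else_start: with_default(raw, "ElseStart", "else"),
        else_aux1: optional(raw, "ElseAux1"),
    }
}
```

For the Marathi example, the map would hold `IfStart = जर`, `IfAux2 = तर`, and `ElseStart = नाहीतर`, leaving `IfAux1` and `ElseAux1` unset. An empty map yields the plain English `if` / `else` pair.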

Currently the structure of the language is as follows (it can also be seen in the parser.lalrpop file):

-> This supports only string and numerical datatypes, and is dynamically typed
-> expr is any expression containing +-/*() and literal values or variables.
-> condition is value/variables compared using <,>,<=,>=,==,!=
    and such conditions joined by And,Or tokens.
-> When comparing strings, only == and != do a char-by-char comparison;
    the others compare by string length
-> strings can only be added to other strings or numbers (as in Java);
    other arithmetic operations are invalid for strings.
-> block is statements inside { and }
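
The string and number rules in the notes above can be sketched as a small value type. This is an illustration under assumptions, not the evaluator's real code: the `Value` enum and function names are hypothetical, numbers are modeled as `f64`, string length is counted in characters, and ordering a string against a number is treated as undefined.

```rust
use std::cmp::Ordering;

/// The two runtime datatypes. Using f64 for numbers is an assumption.
/// The derived PartialEq compares strings by content, matching == and !=.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
}

/// Renders a value as text for concatenation.
fn text(v: &Value) -> String {
    match v {
        Value::Num(n) => n.to_string(),
        Value::Str(s) => s.clone(),
    }
}

/// `+`: two numbers add numerically; if either side is a string,
/// both sides are concatenated as text.
fn add(a: &Value, b: &Value) -> Value {
    match (a, b) {
        (Value::Num(x), Value::Num(y)) => Value::Num(x + y),
        _ => Value::Str(format!("{}{}", text(a), text(b))),
    }
}

/// Ordering used by <, >, <= and >=: numbers compare numerically,
/// strings compare by length (counted in chars here, an assumption).
/// Mixed operands return None (also an assumption).
fn order(a: &Value, b: &Value) -> Option<Ordering> {
    match (a, b) {
        (Value::Num(x), Value::Num(y)) => x.partial_cmp(y),
        (Value::Str(x), Value::Str(y)) => Some(x.chars().count().cmp(&y.chars().count())),
        _ => None,
    }
}
```

Under these rules `"zz" < "aaa"` holds because 2 is less than 3, even though a lexicographic comparison would say otherwise, while `"ab" == "ba"` is false because equality looks at the characters.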

print => PrintStart ( Expr ) PrintEnd
let => LetStart ID = Expr
if => IfStart IfAux1 (condition) IfAux2 block ElseStart ElseAux1 block # else section is optional
while => WhileStart WhileAux1 (condition) WhileAux2 block
for => ForStart ForAux1 ID ForAux2 ForAux3 In array ForAux4 block
array => [ comma-separated-expr ]
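
To show how a recursive tree walker can evaluate an AST shaped like the grammar above, here is a minimal sketch covering only let, assignment, print, and while over numbers. All type and function names are hypothetical, conditions are reduced to a single `<` comparison, and printed output is collected into a vector instead of going to stdout so it is easy to inspect.

```rust
use std::collections::HashMap;

/// Expressions: a numeric subset only (assumed AST shape).
enum Expr {
    Num(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Conditions, reduced to `<` for brevity.
enum Cond {
    Less(Expr, Expr),
}

/// Statements mirroring the let, assignment, print and while rules.
enum Stmt {
    Let(String, Expr),
    Assign(String, Expr),
    Print(Expr),
    While(Cond, Vec<Stmt>),
}

type Env = HashMap<String, f64>;

/// Evaluates an expression by walking its subtree.
/// Panics on an unknown variable, in the spirit of the proof of concept.
fn eval(e: &Expr, env: &Env) -> f64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Var(name) => env[name],
        Expr::Add(l, r) => eval(l, env) + eval(r, env),
    }
}

/// Evaluates a condition to a boolean.
fn holds(c: &Cond, env: &Env) -> bool {
    match c {
        Cond::Less(l, r) => eval(l, env) < eval(r, env),
    }
}

/// Runs a block statement by statement, recursing into nested blocks.
fn run(block: &[Stmt], env: &mut Env, out: &mut Vec<String>) {
    for stmt in block {
        match stmt {
            Stmt::Let(name, e) | Stmt::Assign(name, e) => {
                let v = eval(e, env);
                env.insert(name.clone(), v);
            }
            Stmt::Print(e) => out.push(eval(e, env).to_string()),
            Stmt::While(cond, body) => {
                while holds(cond, env) {
                    run(body, env, out);
                }
            }
        }
    }
}
```

Because the walker only sees `Stmt` and `Expr` values, the same function runs a program regardless of which keyword config was used to lex it.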

Licence

This code is released under GNU GPL V3, see License file for more info.
