nhotyp-lang

Nhotyp is a conceptual language designed for ease of implementation during my tutoring in an introductive algorithmic course at Harbin Institute of Technology, Weihai. The current repository holds the latest definition for Nhotyp, but the specification itself was initially written in Chinese.

Nhotyp is an "modern" interpretative language imitating a few features in Python and Rust, and used prefix expressions for ease of parsing. It was so designed to make the assignment easier to complete, if one chose to think the problem through, as it required few string operations and would never require the construction of an AST just to function properly.

The said repository introduces a standard implementation which would work on correct implementations, and should report common runtime errors if it was not written properly.

Usage

Build the compiler with Rust and execute your Nhotyp code with the compiled interpreter:

cargo build
cargo run your_code.nh

You may find some samples in the samples/ folder.

An alternative interactive console is available, if no parameters were given to the interpreter.

Specifications

1. Comments

By standard no line shall have code and comments mixed together (i.e. a single line could be either a statement or a comment but never both), it is however up to the interpreter to decide whether this requirement should be enforced.

All comments should start with the character #, and all subsequence characters MUST be ignored, hidden to the statement parser.

2. Data Types

To simplify the implementation, only 48-bit signed integers may appear as variables or constants. There will be no characters, strings, integers of other sizes or any other data type in any form. Implementers should take care of overflow cases, which would guarantee all values within the range [-2^47, 2^47-1]. Calculation between those integers are yet to be defined in the section Operators.

3. Variables

All variable names would consist only of ASCII lowercase letters (a-z) or underscores (_). Other characters should NEVER appear as part of a variable name. All variables or constants MUST NEVER be longer than 63 characters exclusive.

<let-uds> ::= <"a"-"z"> | "_"
<variable-name> ::= <let-uds> <variable-name> | <let-uds>

It is advised to use 64-bit signed integers (or better yet, 128-bit ones to implement storage of variables. The usage of snake_case is preferred for variable names.

Function names follow the same requirements as variable names do.

As for the variable scope (data visibility), all variables are visible to their current function instances (actions like recursive calls would yield multiple instances, hence creating multiple scopes) and nowhere outside. Expressions like if or while do not create scope visibility barriers (i.e. variables within loops are visible outside, as long as they're in the same function call instance). All function names are visible everywhere and SHOULD NOT be changed whatsoever.

4. Expressions

For ease of expression parsing, prefix expressions (like + a b) are preferred over traditional expressions (like a + b). Operators would always take precedence over it's parameters, so it's easier to determine the number of parameters as soon as you see the operator. A formal definition of an expression is like:

<constant> ::= any value from -2^47 ~ 2^47-1
<expression> ::= <constant>
               | <variable-name>
               | <operator> <expression> <expression> ... <expression>

Some examples include:

Prefix expression + 16 233 to infix expression 16 + 233
Prefix expression + * 3 4 - 7 2 to infix expression (3 * 4) + (7 - 2)
Prefix expression * * * 1 2 3 4 to infix expression ((1 * 2) * 3) * 4
Prefix expression func_three 12 13 14 to infix expression func_three(12, 13, 14), whereas func_three is a function with 3 parameters
Prefix expression func_four 1 + 5 6 - 9 3 4 to infix expression func_four(1, 5 + 6, 9 - 3, 4)

5. Operators

There are a few built-in operators which functions mostly like they do in C++ / Python, but certain care have been taken to avoid confusion (especially while in division or remainder calculations).

Addition +: Accepts 2 parameters, yields sum of the two. Handle overflow when needed.
Subtraction -: Accepts 2 parameters, yields the first subtracted by the second. Handle overflow when needed. Example: - 17 12 = 5
Multiplication *: Accepts 2 parameters, yields the product of the two. Proper handle of overflow is required. Example: * 16 -3 = -48
Remainder %: Accepts 2 parameters a and b, yields the smallest non-negative integer k where a = |b|p + k, such that p is an integer. Result is always 0 when b is 0. Examples:
- % 27 4 = 3, because 27 = 4 * 6 + 3
- % -18 5 = 2, because -18 = 5 * (-4) + 2
- % 36 -7 = 1, because 36 = |-7| * 5 + 1
- % 7 0 = 0, this is a defined behavior
Division /: Accepts 2 parameters a and b, yields (a - a % b) / |b|, where a subtracted by the remainder is guaranteed to be divisible by |b|. Division by 0 yields 0 anyway. Examples:
- / 9 4 = 2, which is the same as in C
- / -2 -7 = -1, because the remainder is 5
- / 0 0 = 0 and / 6 0 = 0, because division by 0 yields 0 always
Equality ==: Compares the 2 parameters, returns 1 if equal, 0 otherwise
Less <: Compares the 2 parameters, returns 1 if the latter is greater, 0 otherwise
Greater >: Compares the 2 parameters, returns 1 if the former is greater, 0 otherwise
Less than or equal <=: See < operator
Greater than or equal >=: See < operator
Inequality !=: See < operator
Logic and and: Returns 0 if any of the 2 parameters is 0, 1 otherwise
Logic or or: Returns 1 if any of the 2 parameters is not 0, 0 otherwise
Logic exclusive or xor: Returns 1 if one of the 2 parameters is 0 and the other is not, 0 otherwise
Logic not not: Returns 1 if the parameter is 0, 0 otherwise

6. Assignment Statements

With respect to Rust grammar (although all variables here are mutable and do not implement move semantics), the keyword let is chosen (and only) for assignment statements. For example, the following statement assigns the value 2333 to variable waifu:

let waifu = 2333

An assignment could contain any legal statement as its r-value. We also have a more formal definition:

<assignment-statement> ::= let <variable-name> = <expression>

It should be noted that <variable-name should under all circumstances be of no conflict with function names, built-in operators or built-in keywords.

7. Conditional Statements

There will be only an if expression and no else if, elif or else involved. It is up to Nhotyp users to keep track of the rest of the cases. An example of doubling a value twice if it's less than 10 could be written as follows:

if < value 10 then
    let value = * value 2
    let value = * value 2
end if

More generally, we have the formal definition:

<conditional-statement> ::= if <expression> then
                                <code-block>
                            end if

The exact definition for a code block is to be given beyond all statement introductions (section 11).

8. Loop Statements

To further simplify the implementation of Nhotyp, we eradicated the loop, for, foreach and other statements, leaving only while loops. Additionally, continue and break control statements are also removed. The user should take care of them using flags combined with conditional statements. For example, a program calculating the sum of 1 to 100 (in the silly way) can be written as:

let i = 1
let sum = 0
while <= i 100 do
    let sum = + sum i
end while

For a formal definition of the while loop, we have:

<loop-statement> ::= while <expression> do
                         <code-block>
                     end while

It ought to be kindly noted that, as a simple interpreter, Nhotyp implementations should never try to detect infinite loops or such (the halting problem is unsolvable at a large scale).

9. Functions

As a modern programming language (modern as in time), Nhotyp would have to retain a method of defining functions. Functions have certain limitations:

No two functions may have the same function name pairwise, nor may function names conflict with variable names. In the case of such conflicts, the variable name would trigger a runtime error.
Function names are regularized in the same way as variable names are (i.e. consists of lowercase ASCII letters and underscores, while not being longer than 63 characters).
Functions should never receive more than 16 parameters.
When invoked, all parameters are assigned values and appear as
All functions should have exactly 1 return value at the end of the function. That is, return statements should never appear at other places in the function, and the last statement of the function is a return statement.
The main function is the entry to the program, and all Nhotyp programs must have exactly 1 main function. The return value could be either used as the exit code of the interpreter or not, which is up to the implementation to decide.

We will define the function more formally as:

<parameter> ::= <variable-name>
<parameters> ::= <parameter> | <parameter> <variable-name>
<return-statement> ::= return <expression>
<function-name> ::= <variable-name>
<function-block> ::= function <function-name> <parameters> as
                         <code-block>
                         <return-statement>
                     end function

Now we could write a function that takes in 3 parameters and return the largest among them:

function max a b c as
    let res = a
    if > b res then
        let res = b
    end if
    if > c res then
        let res = c
    end if
    return res
end function

10. Input / Output

Nhotyp defined an input function (operator also) and an output statement. The input function could be seen as an operator with no parameters (0 parameters is allowed). It reads in exactly 1 integer from stdin, raising any errors if the input was not a valid integer, while also ensuring the read value is within the correct range.

The output statement, on the other side, prints a list of variables, separated by spaces. Each print statement produces exactly 1 line of output regardless of the number of variables to output. Additionally:

The print statement does not accept more than 16 variables as input.
It also does not accept expressions or constants as input. This means that if you wish to print a constant, you will have to first assign it to a variable and then print that variable.
Certain implementations could add prefixes to input operators or output statements as an eye candy, as long as it does not break the workflow.
Nhotyp does not currently support printing to stderr.

Thus we have a formal definition of the print statement:

<print-statement> ::= print <parameters>

A sample code of I/O with reference in other programming languages is as follows:

# python: var = int(input()) + 15
# c:      int var;
#         scanf("%d", &var);
#         var = var + 15;
# c++:    int var;
#         cin >> var;
#         var = var + 15
let var = + scan 15

# python: print('%d %d %d %d' % (ab, cd, xy, zw))
# c:      printf("%d %d %d %d\n", ab, cd, xy, zw);
# c++:    cout << ab << ' ' << cd << ' ' << xy << ' ' << zw << endl;
print ab cd xy zw

11. Misc

As we've introduced all definitions, operators and statements, we can finally produce a formal definition of statements, code blocks and the entire program:

<statement> ::= <assignment-statement>
              | <conditional-statement>
              | <loop-statement>
              | <print-statement>
<code-block> ::= <statement>
               | <code-block>
                 <statement>
<program> ::= <function-block>
            | <program>
              <function-block>
           ^ contains exactly 1 `main` function

There's some non-trivial notes that may help you implement Nhotyp interpreters faster:

Proficient Nhotyp users should follow the 4-space block indentation as they would in Python. Though an interpreter should function properly even without indentation. These indentation are purely for better maintainability and readability.
Tokens are strictly separated with spaces for ease of parsing. That means the expression + a b should never appear with the operator stuck to the adjacent variable like +a b.
Comment lines or purely empty lines could appear anywhere.

A deprecated Chinese version of the specification is available at README_zh.md. When the two have conflicts in definition, always respect this version for clarification.

Trivia

When you reverse the string Python, you get nohtyP. The letter o and h were swapped only to make it look better and looks more like an actual word (but it's not).
Nhotyp really has no types.
Ignoring the error handling part will shorten your code for at least 50%.
Nhotyp actually took its inspiration from Python, Rust and Pascal.

A small programming language created in an hour

Building a programming language in an hour This is the project I made while doing the Building a programming language in an hour video. You can run it

40 Nov 24, 2022

The Loop programming language

Loop Language Documentation | Website A dynamic type-safe general purpose programming language Note: currently Loop is being re-written into Rust. Mea

20 Oct 21, 2022

Scripting language focused on processing tabular data.

ogma Welcome to the ogma project! ogma is a scripting language focused on ergonomically and efficiently processing tabular data, with batteries includ

146 Dec 26, 2022

Stackbased programming language

Rack is a stackbased programming language inspired by Forth, every operation push or pop on the stack. Because the language is stackbased and for a ve

1 Oct 28, 2021

REPL for the Rust programming language

Rusti A REPL for the Rust programming language. The rusti project is deprecated. It is not recommended for regular use. Dependencies On Unix systems,

1.3k Dec 20, 2022

Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

294 Dec 23, 2022

[Proof of Concept] Embedded functional scripting language with YAML ¯\_(ツ)_/¯

[YAML, fun] Just an experimental project implementing embedded functional scripting language based on YAML syntax. API docs for the standard library:

12 Aug 15, 2022

A proof of concept file dropper utilizing PowerShell loosely based off

8 Jun 21, 2022

a nom parser combinator that matches a psql statement.

psql_splitter a nom parser combinator that matches a psql statement. Postgres has a dialect of SQL that I'm going to call pgsql. Postgres also has a c

1 Dec 6, 2021

Rust macro to use a match-like syntax as a elegant alternative to nesting if-else statement

cond Rust macro to use a match-like syntax as an elegant alternative to many if-else statements. I got the idea from empty Go switch statements. I tho

3 Nov 23, 2023

interactcli-rs is a command-line program framework used to solve the problem of the integration of command-line and interactive modes, including functions such as unification of command-line interactive modes and sub-command prompts. The framework integrates clap and shellwords.

interactcli-rs 简体中文 interactcli-rs is a command-line program framework used to solve the problem of the integration of command-line and interactive mo