A language that ports⚓: examining the limits of compilation⚙️.

Overview

harbor

A language that ports: examining the limits of compilation.

Donate Open Source

Demo | Crates | Contact Me

Written in Rust 🦀 💖

NOTE: Click the images above for an interactive demonstration!

About the Author

I'm a bored 19 year old sophomore in college working on projects to fill the time. If you enjoy my work, consider supporting me by buying me a coffee!

Buy Me A Coffee

What is this project?

Harbor is a high level programming language with type checking (supports unsigned integers, booleans, characters, pointers, tuples) and manual memory management. What does that mean? Harbor is basically a stripped down version of C. What makes Harbor special then? It compiles to a dialect of Brainf*** called Dynamic Brainf***.

Brainf*** programs are composed entirely of the following operators only:

MIR
Operator Description
< Move the pointer one cell to the left.
> Move the pointer one cell to the right.
+ Increment the current cell by 1.
- Decrement the current cell by 1.
[ Begin a loop while the cell at the pointer is not zero.
] Mark the ending of a loop body.
, Make the current cell equal to the next byte of input.
. Output the current cell as a byte.

Dynamic Brainf*** provides six additional operators: two for memory management, two for pointer manipulation, and two for better IO. With these new operators, it's possible to compile common abstractions like references, stack operations, and compound datatypes.

Operator Description
? Read the value of the current cell, and allocate that many cells at the end of the tape. Then, set the current cell's value equal to the index of first cell in that allocated block.
! Read the value of the current cell, and free the allocated cells starting at that index.
* Push the pointer to a stack, and set the pointer equal to the value of the current cell.
& Pop the old pointer off the dereference stack, and set the pointer equal to it.
# Make the current cell equal to the next integer in the input buffer (like scanf("%d", &tape[pointer])).
$ Output the current cell as an integer (like printf("%d", tape[pointer])).

To give you some perspective on just how little Dynamic Brainf*** adds, the code responsible for assembling Dynamic Brainf*** in this compiler is just 24 lines long! You can write a compiler for it just using string replacement!

How does it work?

Harbor source code goes through three stages before the output code: HIR, MIR, and LIR.

Flow

HIR provides a typesystem and performs typechecking, MIR provides a small untyped reverse-polish-notation assembly language, and LIR is an internal representation of Dynamic Brainf*** specially structured to optimize generated code.

The most interesting part of the compilation process is the transition from Harbor MIR to Dynamic Brainf***. Harbor MIR looks like this:

MIRFib DBF

Memory Layout

MIR provides 14 registers:

  • SP: the stack pointer.
  • FP: the frame pointer.
  • TMP0 through TMP5: 6 temporary registers for helping with arithmetic operations. These are compiler only.
  • R0 through R5: 6 general purpose registers for the user.

The registers are statically allocated by the compiler at the first 14 cells, with the stack beginning immediately after.

Registers

You might notice that FP strangely comes after TMP0 and TMP1, but before TMP2. There's a good reason for this: copying memory cells in Brainf*** dialects is a very expensive (and very frequent) operation. When memory is copied, it uses TMP0 as a buffer:

Copy Cell

So, TMP0 is placed before FP to increase locality (it takes fewer cycles to shift the pointer to TMP0), but I'm sure the effect is negligible. TMP1 is also placed before FP for similar reasons: it's used frequently in almost all alrithmetic operations. TMP2 through TMP5 are more specialized registers, mainly used for integer division, multiplication, and setting up stack frames and activation records for functions.

MIR Opcodes

Opcode Description
set 123 Pops an address and stores a value at that address
= (called Store internally) Pops an address and pops a value into that address. Also takes an optional size parameter for the number of cells store at the address like: %int, %(%int, %bool), or %char.
@ (called Load internally) Pops an address and loads a value from that address. Also takes an optional size parameter to load from the address like: %int, %(%int, %int), or %char.
get %int (called Stalloc internally) Pushes a block of memory on the stack with the given size. %int allocates one cell, %(%int, %int) allocates 2, etc.
dump %int (called Stfree internally) Deallocates a block of memory on the stack with the given size (%int is one cell).
123 Integer literals are pushed to the stack.
+ Pop two numbers off the stack and push their sum.
- Pop two numbers off the stack and push their difference.
* Pop two numbers off the stack and push their product.
/ Pop two numbers off the stack and push their quotient.
== Pop two numbers off the stack and push their equality.
!= Pop two numbers off the stack and push their inequality.
| Pop two numbers off the stack and push their logical or (anything not zero is true).
& Pop two numbers off the stack and push their logical and.
! Pop a number off the stack and push its logical complement.
alloc Pop a number off the stack, allocate that many cells at the end of the tape, and push the address of the allocated block.
free Pop an address off the stack and free the cells at that block.
dup Duplicate the top cell on the stack.
frame %int -> %(%int, %int) do ... end Create a stack frame for a code block that takes an argument and returns a value. The FP points at the first argument, and the return value is left on the stack when the code block ends after the frame is destructed.
if (2 4 *) do ... end Perform an if statement. Else clauses are not supported: it's complicated, but essentially nested if-else statements would walk over each other's saved conditions in the stack.
$R0, $R1, ..., $R5 Push a register's value onto the stack.
&R0, &R1, ..., &R5 Push a register's address onto the stack.

There are also 6 predefined macros for MIR. putnum and putchar both pop a cell off the stack and print it. getchar retrieves a byte of user input and pushes it onto the stack. getnum retrieves an integer from user input and pushes it as well. Finally, inc and dec increment or decrement the value pointed to by the top value on the stack.

MIR opcodes are composed of a sort of "microcode" that's really interesting and fun to write/optimize. The code generator for the addition opcode illustrates this pretty well:

Addition MemoryAlgorithm

Originally, I implemented addition by popping the two values into temporary registers (TMP1 and TMP2), performing the addition, and then pushing the result onto the stack. This solution is much more efficient, as everything is done in place instead of moving values around in memory!

It's also very satisfying to see the result of the optimizations on the output code as well: because everything implemented in Brainf*** seems to be on the order of O(n^2), any reduction in memory usage seems to have a dramatic effect.

These microcode blocks can also get absurdly long: division is upwards of 60 instructions!

Harbor Frontend

Harbor's frontend is significantly more cozy than its MIR; looking at it you wouldn't know it's a terrible, horrible language!

Frontend

Harbor supports method like syntax for function calls, let type inference, pointers with indexing [] and dereference * operators, tuples, heap allocated string literals, and a strict type system.

Its syntax is Rust inspired, but with several slight quirks. Because of the way MIR internally represents scopes and frames, it was much simpler to implement expressions in an explicitly chained manner:

let z = 11 in
  putnum((let x = 5,
      y = 6 in x + y * z))

With this syntax, scopes are explicitly created and destructed upon individual expressions: they're managed by simply creating a frame for each let expression, and destructing it at the end of the let body.

Method

Because method calls are just syntax sugar for function calls, the user needs an alternative way to pass the "self" parameter as a pointer. To do this, I increased the precedence of & to take place before the . and -> operators. So, in the example above, the expression &n.inc.square->putnumln expands to putnumln(*square(inc(&n))). I know this syntax looks confusing to anyone familiar with pointers, but it's impossible to misuse due to the strict typesystem.

The Recursion Problem

Basic Blocks

This seems to be the only area where Harbor really lacks in its domain. The way the compiler is constructed, it is basically impossible to implement recursion in a sane way.

In a language like Brainf***, where the only method of control flow is looping, implementing functions with scopes is difficult enough: it's exceedingly hard to simulate function calls and frames without any mechanism for jumping to an arbitrary instruction.

This can be accomplished, however using a technique where code blocks are divided into "basic blocks": code without any jumps inbetween the beginning and end. This simplifies the structure of the program such that jumps don't occur in the middle of execution: control flow only needs to be monitored and managed between basic blocks, which can easily be done with traditional if statements and while loops!

Unfortunately, though, I was not aware of this solution at the time I implemented most of the compiler, and implementing it probably would have taken far too long anyways. The important thing to note is that it is possible to compile recursive, functional code to Dynamic Brainf***.

Exercises for the reader

  • LLVM or x86 Dynamic Brainf*** Compiler: Harbor compiles its output Dynamic Brainf*** to C, but other compilers targeting LLVM or x86 would be a significant improvement.
  • Reverse-Engineering-Optimizing Dynamic Brainf*** Compiler: because of the way Harbor compiles code, optimizations can easily be applied by reverse engineering the output code. For example: each MIR arithmetic stack operation always compiles to the same result. To optimize the compiled Dynamic Brainf*** code, simply compile the code responsible for an opcode as the actual opcode operation instead of performing hundreds of small Brainf*** operations to achieve the same thing! With such a compiler, Harbor could be as efficient as unoptimized C (This sentence brings me great shame)!
  • Hardware Implementation: imagine running this terrible language natively! All of the fun debugging with a shell, but with an oscilloscope instead!
  • Minecraft Brainf*** Implementation: it would be entirely possible (and exceeding difficult) to implement a 5 or 6 bit implementation (the minimum possible address size is 5 bit, as 4 bit only leaves 2 cells for the stack) of a Dynamic Brainf*** machine, possibly with simplified IO, that could run this compiler's output code natively!

Usage

To install and use, you must download the Rust programming language.

Development Build

# Install directly from git with cargo
cargo install --git https://github.com/adam-mcdaniel/harbor

# Or, alternatively, the repo and install from source
git clone https://github.com/adam-mcdaniel/harbor
cd harbor
cargo install -f --path .

Releases

To get the current release build, install from crates.io.

# Also works for updating harbor
cargo install -f harborc

After Install

# Just run the harbor executable!
harbor
You might also like...
Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits. Ruxnasm strives to be an alternative to Uxnasm, featuring more user-friendly error reporting, warnings, and helpful hints, reminiscent of those seen in modern compilers for languages such as Rust or Elm.

A Simple, But amazing telegram bot, Made using the Rust language!

Telegram bot in Rust A fun Telegram bot made using Rust language.

a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

a simple compiled language i made in rust. it uses intermediate representation (IR) instead of an abstract syntax tree (AST).

A compiler for the esoteric language ℂ.

The ℂ Programming Language It's a language where the only types are "complex number" and "matrix of complex numbers". In particular, this means you ca

Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Frame Language Transpiler v0.5.1 Hi! So very glad you are interested in Frame. Frame system design markdown language for software architects and engin

beat saber is a strongly typed, self-documenting and highly performant programming language

beatsaber beat saber is a strongly typed, self-documenting and highly performant programming language. With beat saber we aimed to create a language t

Analogous, indented syntax for the Rust programming language.

Note: After experimenting with this in the wild, I have found representing keywords as symbols to be far less readable in large codebases. Additionall

A cell-based esoteric programming language

Tape A cell-based esoteric programming language Tape is a cell-based, brainfuck-like programming language that has a readable syntax and a non-wasted

 RusTiny -- A Rust implementation of Tiny+ language
RusTiny -- A Rust implementation of Tiny+ language

RusTiny -- A Rust implementation of Tiny+ language 编译器实践 基本要求: 参考《编译原理及实践》的TINY语言编译器(已上传到群中)完成TINY+ 语言(见附录 A)的解释器:即给定满足 TINY+语言的源代码输入,你的解 释器可以给出对其的解释执

Owner
adam mcdaniel
open sourcerer🔓🏗️, musician🎸🎶, and student🎓
adam mcdaniel
tr-lang is a language that aims to bring programming language syntax closer to Turkish.

tr-lang Made with ❤️ in ???? tr-lang is a language that aims to bring programming language syntax closer to Turkish. tr-lang is a stack based language

Kerem Göksu 10 Apr 2, 2022
A language server implementation for the WGSL shading language

wgsl-analyzer wgsl-analyzer is a language server plugin for the WGSL Shading language. It comes with a VS Code plugin located in ./editors/code, but d

null 155 Jan 2, 2023
The SATySFi Language Server

[WIP] SATySFi Language Server This repository is work-in-progress yet. Features Kind Function Done codeAction Add the definition of an undefined comma

monaqa 50 Dec 24, 2022
scraps of a potential language

seaslug small, beautiful, knowable, DOESN'T EXIST YET LOL non-turing complete, verified terminating code placed into well-defined interfaces similar t

Tyler Neely 34 Nov 1, 2022
A language server for lua written in rust

lua-analyzer lua-analyzer is a lsp server for lua. This is mostly for me to learn the lsp protocol and language analysis so suggestions are helpful. T

null 61 Dec 11, 2022
A scripting language that allows complex key remapping on Linux.

Map2 A scripting language that allows complex key remapping on Linux, written in Rust. All of the functionality related to interacting with graphical

Matt 99 Dec 6, 2022
Uindex is a data store, for data that can be parsed as sentences in some context-free language.

Uindex - Universal index Uindex is a data store, for data that can be parsed as sentences in some context-free language.

Enrique Pérez Arnaud 3 Jul 20, 2021
A simple programming language for everyone.

Slang A simple programming language for everyone, made with Rust. State In very early stages. Plan is to create a byte-code compiler and make that exe

Slang, Inc. 11 Jul 1, 2022
A programming language. Better mantra pending.

Dusk Dusk is a programming language I've been working on on and off for the past while now. It's still very much in its infancy (... a quick look thro

Kaylynn Morgan 14 Oct 24, 2022
very cool esoteric language pls use

okfrick has one memory pointer has less than 5 characters hopefully works well is turing complete (possibly) + - increase memeory pointer value ( - st

null 3 Jun 24, 2021