The hash programming language compiler

Overview

The Hash Programming language

Run

  • Using the command cargo run hash. This will compile, build and run the program in the current terminal/shell.

Submitting a PR

Before submitting a PR, make sure that:

@@TODO

  • Your code builds successfully, and all the tests run successfully.
Comments
  • source: Shrink `SourceId` from 12 bytes to 4 bytes

    source: Shrink `SourceId` from 12 bytes to 4 bytes

    This patch simplifies the logic of how sources are identified within the compiler, and significantly cuts down on the memory usage of the representation. Previously, the SourceId was defined as:

    enum SourceId {
       Module(ModuleId),
       Interactive(InteractiveId)
    }
    

    with each sub-id being 8 bytes (which is too big) in size. This design is somewhat wasteful because it forces SourceId to be aligned to 12 bytes. Now, SourceId is defined as the following:

    struct SourceId {
        _raw: u32
    }
    

    This _raw value uses the leftmost bit to represent the kind of source it is (either module or interactive), and the rest of the bits are used to describe the Id of the actual source. This is a reasonable size reduction because the compiler isn't expecting the total amount of sources to exceed $2^{31} - 1$.

    This is an effort to reduce the overall memory usage of SourceLocation from 20 bytes to 12 bytes.

    This patch also gets rid of a bunch of hashmaps and slotmaps that aren't really needed to represent the SourceMap or the NodeMap.

    performance 
    opened by feds01 7
  • TC: Implement enums as unions

    TC: Implement enums as unions

    This PR implements the basics of enums-as-unions. To do this, a couple of modifications were made:

    1. Accessing on unions now looks for the nominal in the union matching the accessed name. So Foo := Red | Green | Blue; Foo::Red works out of the box.
    2. For now, instantiating unit structs requires () after it, so true(), None() etc. This is hopefully temporary and is due to the difficulty in differentiating between the type of the unit and the (singular) value.

    Another thing that needs to be done is type enum access as the enum type rather than the variant type, under certain conditions (i.e. when the variable is mutable). This will be done in a future PR.

    typechecking 
    opened by kontheocharis 5
  • TC: Attach type information to nodes

    TC: Attach type information to nodes

    This PR adds a store NodeInfoStore which maps AstNodeIds to TermIds and PatIds. It is a necessary step for implementing mono/lowering to IR, as well as any post-TC semantic stages.

    Closes #240.

    typechecking ast 
    opened by kontheocharis 4
  • Add integer range patterns

    Add integer range patterns

    In an effort to consolidate the typechecking primitives::Pat and exhaustiveness::lower::Pat data structures, integer ranges need to be added to the language:

    Questions to answer:

    • What is the syntax? My proposition would be:
      • a..b for open range, as in a <= x < b
      • a..=b for open range, as in a <= x <= b
      • a.. for unbounded range, as in a <= x <= type_of(a)::MAX with the assumption that a is of an integral type
      • should we have some kind of stepping variant?

    Things that need to be done:

    • AST:
      • New pat variant
    • Support in parsing
    • Semantic Analysis (ensure range makes sense)
      • lower bound is smaller or equal to the upper bound
    • Support in tc traversal
    • Support in exhaustiveness checking (needs to be hooked up with existing logic)

    @kontheocharis thoughts?

    language pattern-matching 
    opened by feds01 4
  • TC: Properly ensure impls implement their traits, and infer original types while at it

    TC: Properly ensure impls implement their traits, and infer original types while at it

    1. Ensure all members in the impl appear in the trait, and have correct types (extension: if types are incomplete, infer from trait. Maybe this should be done in the future). (Unification needs to happen with src=impl and target=trait)
    2. Ensure all members in the trait appear in the impl.
    typechecking 
    opened by kontheocharis 4
  • reporting: Start span start line mis-calculation

    reporting: Start span start line mis-calculation

    In the following error report: image

    The highlighter mistakingly highlights the 9 line number. The span of the expression does not actually begin on the 9th line. This might be a mistake within the lexing implementation or a miscalculation within the report offset to (col, row) calculation. It's clear that the line of the span is for some reason being recorded as the end of line 9 where there is a \n character at the end.

    Further investigation yielded that the actual span of expression (as printed from the raw source is):

    loc=134:195, actual="IoError = struct(
        error: IoErrorType,
        message: str,
    );"
    

    Ideally, the report should not highlight the line number 9, like so:

    image

    This could be fixed by changing offset_col_row by not counting the \n in the case of the initial span offset, an implementation could be:

    pub(crate) fn offset_col_row(offset: usize, source: &str, non_inclusive: bool) -> (/* col: */ usize, /* row: */ usize) {
        let source_lines = source.split('\n');
    
        let mut bytes_skipped = 0;
        let mut total_lines: usize = 0;
        let mut last_line_len = 0;
    
        let mut line_index = None;
        for (line_idx, line) in source_lines.enumerate() {
            // One byte for the newline
            let skip_width = line.len() + 1;
    
    
            // Here, we don't want an inclusive range because we don't want to get the last byte because
            // that will always point 
            let range = if non_inclusive {
                bytes_skipped..bytes_skipped + skip_width
            } else {
                bytes_skipped..bytes_skipped + skip_width + 1
            };
    
            if range.contains(&offset) {
                line_index = Some((offset - bytes_skipped, line_idx));
                break;
            }
    
            bytes_skipped += skip_width;
            total_lines += 1;
            last_line_len = line.len();
        }
    
        line_index.unwrap_or((
            last_line_len.saturating_sub(1),
            total_lines.saturating_sub(1),
        ))
    }
    
    bug error-reporting 
    opened by feds01 3
  • Improve lexing by storing all tokens in a flat fashion

    Improve lexing by storing all tokens in a flat fashion

    Currently, in hash-parser, the parser stores trees of tokens (by nested braces/parens) as separate vectors. This is wasteful in terms of allocation. It would be better if they were all stored in the same vector, and each opening brace token had a position offset stored within it to the closing brace. This way, the parsing can still be parallelised while not having to resort to multiple vectors.

    enhancement performance 
    opened by kontheocharis 3
  • Parsing

    Parsing

    This patch establishes parsing within the compiler, including the ability to support future custom 'parsing backends'.

    • Grammar was finalised and tested using the language_manifest.hash file
    • Module discovery is implemented and working
    • Parallel and sequential parsing
    • Full AST generation from the pest grammar
    parser 
    opened by feds01 3
  • Add ability to automatically generate default visitor methods that just

    Add ability to automatically generate default visitor methods that just "walk" the AST and return void

    This adds the macros:

    ast_visitor*_default_impl!(Expr, Name);
    

    Providing the arguments Expr, Main, the macro will generate default implementations for those AST nodes. In subsequent PRs this can be used to replace all the manually written methods with this macro.

    It would also be nice to have a way to "hide" AST nodes, but macros make that a little bit more complicated. It is left as a @@Todo.

    ast 
    opened by kontheocharis 2
  • IR: initial work to hook up ir generation stage with the rest of the compiler

    IR: initial work to hook up ir generation stage with the rest of the compiler

    This patch is a progress update on the IR generation process of the compiler; adding the initial steps to lower the source into Hash IR.

    • a visitor pattern for the IR to "discover" items that need to be lowered
    • IR builder work, sketch out the initial process of lowering functions
    • use bitflags! within the pipeline to represent the "status" of a module, whether it has been typechecked, de-sugared, semantically checked, etc.
    • ast: rename Expr::LitExpr to Expr::Lit

    The next steps are to expose scopes from the typechecker, which will be done after hash-types crate is re-organised (split lib.rs) to hold relevant data structures to each file that is already within the crate.

    ast core lowering 
    opened by feds01 2
  • Naming: Use consistent naming between typechecker and parser

    Naming: Use consistent naming between typechecker and parser

    Currently the parser naming of AST elements differs from the typechecker's. For example, TyFnTy vs TypeFunction, and TyFn vs TypeFunctionDef, TypeFunctionCall vs AppTyFn. These should probably be brought closer together for consistency's sake.

    ast conventions 
    opened by kontheocharis 2
  • Constructors should record data type arguments

    Constructors should record data type arguments

    Constructors should be Ctor(DataDefId, DefArgsId, CtorDefId, DefArgsId) rather than Ctor(CtorDefId, DefArgsId), because the data type arguments should also be known.

    typechecking 
    opened by kontheocharis 0
  • Fix issue with definition parameters not working properly

    Fix issue with definition parameters not working properly

    Currently, writing:

    i32 := struct();
    
    Bar := mod {
      Bor := enum <T> (Bing, Bang(i32))
    };
    
    foo := () -> Bar::Bor => {
      x := Bar::Bor<()>::Bang(i32());
    };
    

    produces

    error[0035]: mismatch in parameter groups: expected 1 groups but got 0
    
    error[0035]: mismatch in parameter groups: expected 0 groups but got 1
    

    when it should typecheck instead.

    bug typechecking 
    opened by kontheocharis 0
  • semantics: decide on the rules of `as` i.e. casts

    semantics: decide on the rules of `as` i.e. casts

    The language currently does not define any strict rules about as casts. This should be dedicated so that it can be:

    • formally implemented in the IR and lowered
    • supported by whichever backend (if it involves runtime conversions)

    Cast kinds

    Some clear requirements from casts are:

    • Numeric casts:
      • integer type to another integer type cast (for example 8u8 as u64)
      • integer to float cast
      • float to integer cast
    • pointer casts
      • what are the valid pointer casts?
        • array to pointer?
        • decide on how different pointer kinds interact with one another
        • Ref<T> to &T?
        • function pointer casts?

    We should also consider the following

    • c-kind casts
      • enums to integer casts - so long as the enum is compatible with a c-like enum
      • bool or char to integer type cast
      • u8 to `char cast

    Numeric cast semantics

    We should define some clear semantics on numeric casting, what is valid, what isn't, if there is a runtime cost, etc.

    • casting between an integer kind of the same size is a no-op
    • downcasting between integers results in truncation
    • upcasting integers will result:
      • zero-extend if the source integer type is unsigned
      • sign-extend if the source integer type is signed
    • casting between a float to an integer will...?
    • casting between an integer to a float will...?
    • upcasting floats is a perfect and lossless operation.
    • downcasting floats tries to get the nearest approximation of float value (how?)
    • should we follow a formal specification?

    Enum cast

    Cast from an enum value to its discriminant value, and then uses a numeric cast.

    c-kind casts

    Boolean to integer cast adheres to the following rules (basically an enum cast):

    • false casts to 0
    • true casts to 1

    u8 to char cast converts the value into the corresponding char code point.

    pointer casts

    An area for discussion!

    What are the rules for pointer casts... can pointers be cast into an integral type, how do different reference types interact with one another when it comes to casting, etc?

    semantic-analysis 
    opened by feds01 0
Owner
Hash
The hash programming language.
Hash
A computer programming language interpreter written in Rust

Ella lang Welcome to Ella lang! Ella lang is a computer programming language implemented in Rust.

Luke Chu 64 May 27, 2022
Oxide Programming Language

Oxide Programming Language Interpreted C-like language with a Rust influenced syntax. Latest release Example programs /// recursive function calls to

Arthur Kurbidaev 113 Nov 21, 2022
🍖 ham, general purpose programming language

?? ham, a programming language made in rust status: alpha Goals Speed Security Comfort Example fn calc(value){ if value == 5 { return 0

Marc Espín 19 Nov 10, 2022
A small programming language created in an hour

Building a programming language in an hour This is the project I made while doing the Building a programming language in an hour video. You can run it

JT 40 Nov 24, 2022
The Loop programming language

Loop Language Documentation | Website A dynamic type-safe general purpose programming language Note: currently Loop is being re-written into Rust. Mea

LoopLanguage 20 Oct 21, 2022
Stackbased programming language

Rack is a stackbased programming language inspired by Forth, every operation push or pop on the stack. Because the language is stackbased and for a ve

Xavier Hamel 1 Oct 28, 2021
REPL for the Rust programming language

Rusti A REPL for the Rust programming language. The rusti project is deprecated. It is not recommended for regular use. Dependencies On Unix systems,

Murarth 1.3k Dec 20, 2022
A Python compiler targeting JS, implemented in Rust.

A Python compiler targeting JavaScript, implemented in Rust.

Gideon Grinberg 5 Jun 17, 2021
sublingual: toy versions of existing programming languages

sublingual: toy versions of existing programming languages This is a collection of toy languages created by taking much "larger" languages (e.g. Rust)

Eduard-Mihai Burtescu 20 Dec 28, 2022
A rusty dynamically typed scripting language

dyon A rusty dynamically typed scripting language Tutorial Dyon-Interactive Dyon Snippets /r/dyon Dyon script files end with .dyon. To run Dyon script

PistonDevelopers 1.5k Dec 27, 2022
A static, type inferred and embeddable language written in Rust.

gluon Gluon is a small, statically-typed, functional programming language designed for application embedding. Features Statically-typed - Static typin

null 2.7k Dec 29, 2022
Lisp dialect scripting and extension language for Rust programs

Ketos Ketos is a Lisp dialect functional programming language. The primary goal of Ketos is to serve as a scripting and extension language for program

Murarth 721 Dec 12, 2022
Source code for the Mun language and runtime.

Mun Mun is a programming language empowering creation through iteration. Features Ahead of time compilation - Mun is compiled ahead of time (AOT), as

The Mun Programming Language 1.5k Jan 9, 2023
Rhai - An embedded scripting language for Rust.

Rhai - Embedded Scripting for Rust Rhai is an embedded scripting language and evaluation engine for Rust that gives a safe and easy way to add scripti

Rhai - Embedded scripting language and engine for Rust 2.4k Dec 29, 2022
Interpreted language developed in Rust

Xelis VM Xelis is an interpreted language developed in Rust. It supports constants, functions, while/for loops, arrays and structures. The syntax is s

null 8 Jun 21, 2022
Interactive interpreter for a statement-based proof-of-concept language.

nhotyp-lang Nhotyp is a conceptual language designed for ease of implementation during my tutoring in an introductive algorithmic course at Harbin Ins

Geoffrey Tang 5 Jun 26, 2022
Scripting language focused on processing tabular data.

ogma Welcome to the ogma project! ogma is a scripting language focused on ergonomically and efficiently processing tabular data, with batteries includ

kdr-aus 146 Dec 26, 2022
Rust implementation of the PTHash perfect hash function for static compile-time generated hash tables

QuickPHF QuickPHF is a Rust implementation of the PTHash minimal perfect hash function algorithm. It consists of two crates: quickphf - runtime code f

Darko Trifunovski 11 Oct 20, 2023
C-like language compiler, the final project of ZJU Compiler Principle course

cc99 cc99 (not cc98.org) is a C-like language compiler, which is the final project of ZJU Compiler Principle course. It supports many of the C99 langu

Ralph 37 Oct 18, 2022