Build database expression type checker and vectorized runtime executor in type-safe Rust

Overview

Typed Type Exercise in Rust

Build database expression type checker and vectorized runtime executor in type-safe Rust.

This project is heavily inspired by @skyzh's type-exercise-in-rust. While adopting his ideas in Databend, I also implemented a few features that I think are useful:

  1. Type checking. The type checker catches all type errors in the SQL compilation phase using a set of carefully defined typing rules. It outputs a totally untyped expression that is ready for runtime execution, so the runtime needs no type information at all.

  2. Type-safe downcast. Function authors no longer have to worry about downcasting runtime inputs. Thanks to Rust's type system, as long as your function compiles, the downcast always succeeds.

  3. All-in-one generic trait. There is only one trait, Type. All other traits like Array, ArrayBuilder, ArrayRef and their sophisticated trait bounds are gone.

  4. Enum-dispatched columns. Enums exhaustively list all column types and scalar types. Compared to a dyn-dispatched strategy, this should further reduce both runtime overhead and mental effort.
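
To make points 2 to 4 concrete, here is a minimal sketch of how a single `Type` trait, an enum of columns, and a one-time downcast could fit together. All names in it (`Column`, `try_downcast`, `upcast`, `erase_1_arg`, `not_example`) are assumptions made for illustration and are not taken from the project's source.

```rust
// Illustrative sketch only: these definitions are hypothetical, not the project's API.

/// Every column kind is listed in one enum, so dispatch is a `match`, not a vtable call.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Boolean(Vec<bool>),
    Int16(Vec<i16>),
}

/// The single trait describing a data type: how to go from and to `Column`.
pub trait Type {
    type Scalar;
    fn try_downcast(col: &Column) -> Option<&[Self::Scalar]>;
    fn upcast(data: Vec<Self::Scalar>) -> Column;
}

pub struct BooleanType;
pub struct Int16Type;

impl Type for BooleanType {
    type Scalar = bool;
    fn try_downcast(col: &Column) -> Option<&[bool]> {
        match col {
            Column::Boolean(v) => Some(v.as_slice()),
            _ => None,
        }
    }
    fn upcast(data: Vec<bool>) -> Column {
        Column::Boolean(data)
    }
}

impl Type for Int16Type {
    type Scalar = i16;
    fn try_downcast(col: &Column) -> Option<&[i16]> {
        match col {
            Column::Int16(v) => Some(v.as_slice()),
            _ => None,
        }
    }
    fn upcast(data: Vec<i16>) -> Column {
        Column::Int16(data)
    }
}

/// Wrap a typed closure into an untyped one. The downcast happens here, once,
/// so the function author only ever sees `&I::Scalar`.
pub fn erase_1_arg<I: Type, O: Type, F>(f: F) -> impl Fn(&Column) -> Option<Column>
where
    F: Fn(&I::Scalar) -> O::Scalar,
{
    move |col| {
        let input = I::try_downcast(col)?;
        Some(O::upcast(input.iter().map(|x| f(x)).collect()))
    }
}

/// Usage: a boolean `not` that never touches `Column` directly.
pub fn not_example() -> Option<Column> {
    let not = erase_1_arg::<BooleanType, BooleanType, _>(|b: &bool| !*b);
    not(&Column::Boolean(vec![true, false]))
}
```

In this sketch the erased function returns `None` on a column of the wrong kind. In the design described above, the type checker would already have rejected such a call before execution.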

Snippet of code

Define a fast, type-safe, auto-downcasting and vectorized binary function in three lines of code:

let bool_and = Function::new_2_arg::<BooleanType, BooleanType, BooleanType, _>("and", |lhs, rhs| {
    vectorize_binary(lhs, rhs, |lhs: &bool, rhs: &bool| *lhs && *rhs)
});
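
For readers wondering what a helper like `vectorize_binary` might do, the following is a simplified sketch under stated assumptions: each argument is either a constant scalar or a full column, and the helper applies the closure row by row while broadcasting scalars. The `Value` enum and this signature are hypothetical and likely differ from the project's real definitions.

```rust
// Illustrative sketch only: `Value` and this `vectorize_binary` are hypothetical.

/// A function argument: one constant for all rows, or one value per row.
#[derive(Debug, Clone, PartialEq)]
pub enum Value<T> {
    Scalar(T),
    Column(Vec<T>),
}

/// Lift a scalar function over two arguments, broadcasting scalars across columns.
pub fn vectorize_binary<A, B, O>(
    lhs: &Value<A>,
    rhs: &Value<B>,
    f: impl Fn(&A, &B) -> O,
) -> Value<O> {
    match (lhs, rhs) {
        (Value::Scalar(a), Value::Scalar(b)) => Value::Scalar(f(a, b)),
        (Value::Scalar(a), Value::Column(bs)) => {
            Value::Column(bs.iter().map(|b| f(a, b)).collect())
        }
        (Value::Column(xs), Value::Scalar(b)) => {
            Value::Column(xs.iter().map(|a| f(a, b)).collect())
        }
        (Value::Column(xs), Value::Column(bs)) => {
            assert_eq!(xs.len(), bs.len(), "columns must have the same number of rows");
            Value::Column(xs.iter().zip(bs).map(|(a, b)| f(a, b)).collect())
        }
    }
}

/// The `and` function from the snippet above, expressed with the sketch.
pub fn bool_and(lhs: &Value<bool>, rhs: &Value<bool>) -> Value<bool> {
    vectorize_binary(lhs, rhs, |l: &bool, r: &bool| *l && *r)
}
```

The function author writes only the scalar closure. The four-way match over scalar and column inputs lives in one place and is reused by every binary function.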

Run

cargo run

Things to do

  • Automatically generate the nullable function.
  • Implement arrays.
  • Implement unlimited-length tuples.
  • Implement generic functions.
  • Check ambiguity between function overloads.
  • Read material for the project.

Reading material

Comments
  • Add create_array function

    TL;DR

    Given two columns:

    a b
    ---
    0 5
    1 6
    2 7
    3 8
    4 9
    

    create_array will combine these two columns into an array:

    create_array(a, b)
    ------------------
    [0, 5]
    [1, 6]
    [2, 7]
    [3, 8]
    [4, 9]
    

    And of course, you can put create_array in another create_array:

    create_array((create_array(a, b)), NULL, NULL)
    ----------------------------------------------
    [[0, 5], NULL, NULL]
    [[1, 6], NULL, NULL]
    [[2, 7], NULL, NULL]
    [[3, 8], NULL, NULL]
    [[4, 9], NULL, NULL]
    

    The memory layout of the last result is like this:

    Array { 
        array: Nullable { 
            column: Array { 
                array: Int16([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]), 
                offsets: [0..2, 2..2, 2..2, 2..4, 4..4, 4..4, 4..6, 6..6, 6..6, 6..8, 8..8, 8..8, 8..10, 10..10, 10..10]
            },
            nulls: [false, true, true, false, true, true, false, true, true, false, true, true, false, true, true]
        },
        offsets: [0..3, 3..6, 6..9, 9..12, 12..15]
    }
    

    The memory layout shown above is a three-layer tree of indirections over a single underlying data slice:

    [0..3             , 3..6             , 6..9             , 9..12            , 12..15               ] : which row? (the range points to the next layer)
    
    [0..2,  2..2, 2..2, 2..4,  4..4, 4..4, 4..6,  6..6, 6..6, 6..8,  8..8, 8..8, 8..10, 10..10, 10..10] : offsets into the flattened array items (the range points to the next layer)
    [false, true, true, false, true, true, false, true, true, false, true, true, false, true,   true  ] : is the item null?
    
    [0, 5,              1, 6,              2, 7,              3, 8,              4, 9                 ] : array data
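
As a rough illustration of how such a layout can be read back, the sketch below stores the three layers in one struct and rebuilds a row by following the ranges from the outer layer down to the data slice. The struct and its field names (`NestedArray`, `row_offsets`, `item_offsets`) are assumptions made for this example, not the project's actual types.

```rust
use std::ops::Range;

// Illustrative sketch only: a flattened `Array<Nullable<Array<Int16>>>` with hypothetical field names.
pub struct NestedArray {
    /// Outer layer: which items belong to each row.
    pub row_offsets: Vec<Range<usize>>,
    /// Middle layer: which data values belong to each item.
    pub item_offsets: Vec<Range<usize>>,
    /// `true` where the item is NULL.
    pub nulls: Vec<bool>,
    /// Innermost values, stored contiguously.
    pub data: Vec<i16>,
}

impl NestedArray {
    /// Rebuild one row by following the ranges layer by layer.
    pub fn row(&self, idx: usize) -> Vec<Option<&[i16]>> {
        self.row_offsets[idx]
            .clone()
            .map(|item| {
                if self.nulls[item] {
                    None
                } else {
                    Some(&self.data[self.item_offsets[item].clone()])
                }
            })
            .collect()
    }
}

/// The layout shown above, written out as data.
pub fn example() -> NestedArray {
    NestedArray {
        row_offsets: vec![0..3, 3..6, 6..9, 9..12, 12..15],
        item_offsets: vec![
            0..2, 2..2, 2..2, 2..4, 4..4, 4..4, 4..6, 6..6, 6..6,
            6..8, 8..8, 8..8, 8..10, 10..10, 10..10,
        ],
        nulls: vec![
            false, true, true, false, true, true, false, true, true,
            false, true, true, false, true, true,
        ],
        data: vec![0, 5, 1, 6, 2, 7, 3, 8, 4, 9],
    }
}
```

With this data, `example().row(0)` yields the slice `[0, 5]` followed by two NULL items, matching the first row of the nested `create_array` result.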
    
    opened by andylokandy 3
  • Implement generics in type system

    TL;DR

    Defining the function get(array: Array<T>, idx: i16) -> T for any T is as simple as:

    registry.register_2_arg::<ArrayType<GenericType<0>>, Int16Type, GenericType<0>, _>(
        "get",
        |array, idx| array.index(*idx as usize),
    );
    

    It then works for any T, including multi-dimensional arrays. Explained in pseudo-SQL:

    create table t (arr Array<Array<Int16>>, idx Int16);
    insert into t values ([[0, 1], [2, 3]], 0);
    insert into t values ([[4, 5], [6, 7]], 1);
    
    select get(arr, idx) from t;
    +---------------+
    | get(arr, idx) |
    +---------------+
    | [0, 1]        |
    | [6, 7]        |
    +---------------+
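
One way a type checker could resolve `GenericType<0>` for a call like this is by unifying the signature against the argument types and then substituting the binding into the return type. The sketch below shows that idea under stated assumptions. `DataType`, `unify`, `apply` and `check_get` are hypothetical names for illustration, not the project's implementation.

```rust
use std::collections::HashMap;

// Illustrative sketch only: a tiny type language with numbered generic placeholders.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int16,
    Boolean,
    Array(Box<DataType>),
    Generic(usize),
}

/// Match a signature type against a concrete argument type, recording generic bindings.
pub fn unify(sig: &DataType, arg: &DataType, subst: &mut HashMap<usize, DataType>) -> bool {
    match (sig, arg) {
        (DataType::Generic(idx), _) => {
            // A placeholder seen before must be bound to the same type again.
            if let Some(bound) = subst.get(idx) {
                return bound == arg;
            }
            subst.insert(*idx, arg.clone());
            true
        }
        (DataType::Array(s), DataType::Array(a)) => unify(s, a, subst),
        _ => sig == arg,
    }
}

/// Replace generic placeholders with their bindings; `None` if one is unbound.
pub fn apply(ty: &DataType, subst: &HashMap<usize, DataType>) -> Option<DataType> {
    match ty {
        DataType::Generic(idx) => subst.get(idx).cloned(),
        DataType::Array(inner) => Some(DataType::Array(Box::new(apply(inner, subst)?))),
        other => Some(other.clone()),
    }
}

/// Type-check a call to `get(Array<T>, Int16) -> T` and return its result type.
pub fn check_get(args: &[DataType]) -> Option<DataType> {
    let params = [DataType::Array(Box::new(DataType::Generic(0))), DataType::Int16];
    if args.len() != params.len() {
        return None;
    }
    let mut subst = HashMap::new();
    for (param, arg) in params.iter().zip(args) {
        if !unify(param, arg, &mut subst) {
            return None;
        }
    }
    apply(&DataType::Generic(0), &subst)
}
```

For the table above, the argument types are `Array<Array<Int16>>` and `Int16`, so T is bound to `Array<Int16>` and that becomes the result type of `get(arr, idx)`.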
    
    opened by andylokandy 1
Owner
Andy Lok