X-Engine: A SQL Engine built from scratch in Rust.

Overview

XNGIN (pronounced "X Engine")

This is a personal project to build a SQL engine from scratch.

The project name is inspired by Nginx, a very popular web server known for its high performance and ease of use.

Goal

  1. Fast.
  2. Easy to use.
  3. Distributed.

Non-Goal

Transaction management.

Development Plan

There is a lot to do. Some of the items are listed below.

| Functionality | Status |
| --- | --- |
| Frontend AST Definition | Done |
| Frontend AST Parse | Done |
| Frontend AST Format | Done |
| Logical IR Definition | In progress |
| Logical IR Rewrite | Todo |
| Catalog Definition | Demo |
| Catalog Maintain | Todo |
| Statistics Definition | Todo |
| Statistics Maintain | Todo |
| Optimizer Framework | Todo |
| Cost Model | Todo |
| Optimizer Implementation | Todo |
| Plan Cache | Todo |
| Physical Plan Definition | Todo |
| Execution Framework | Todo |
| In-Memory Data Format | Todo |
| Physical Operators | Todo |
| Client Protocol | Todo |
| Internal Network Protocol | Todo |
| Index Framework | Todo |
| Index Implementation | Todo |
| Backend Storage | Todo |
| Backend Operators | Todo |
| Backend Adaptor | Todo |
| Data Exporter and Importer | Todo |

The current focus is on the SQL interface and the optimizer framework.

License

This project is licensed under either of

at your option.

Comments
  • Add rule predicate propagation

    When two tables are joined on equality conditions and one side has a predicate on the join key, a new predicate can be derived and pushed to the other side. The core idea of this rule is to filter out unneeded data as early as possible. Predicates on both sides of an inner/semi join can be propagated. Predicates on the left side of a left/anti join can be propagated.

    -- before
    SELECT 1 FROM t1 JOIN t2 ON t1.c1 = t2.c2 AND t1.c1 > 0;
    
    -- after
    SELECT 1 FROM t1 JOIN t2 ON t1.c1 = t2.c2 AND t1.c1 > 0 AND t2.c2 > 0;
    

    Sometimes the predicate is not located in the join condition, and the push may cross multiple query blocks with many operators.

    -- before 
    SELECT * 
    FROM (SELECT c1, count(*) as s1 FROM t1 WHERE c1 > 0 GROUP BY c1) t1 
    JOIN (SELECT c2, count(*) as s2 FROM t2 GROUP BY c2) t2
    ON t1.c1 = t2.c2;
    
    -- after
    SELECT * 
    FROM (SELECT c1, count(*) as s1 FROM t1 WHERE c1 > 0 GROUP BY c1) t1 
    JOIN (SELECT c2, count(*) as s2 FROM t2 WHERE c2 > 0 GROUP BY c2) t2
    ON t1.c1 = t2.c2;
    

    And the subquery case is similar.

    -- before 1
    SELECT c1 FROM t1 WHERE EXISTS (SELECT 1 FROM t2 where c2 = t1.c1 and c2 > 0);
    
    -- before 2
    SELECT c1 FROM t1 WHERE c1 > 0 AND EXISTS (SELECT 1 FROM t2 where c2 = t1.c1);
    
    -- after
    SELECT c1 FROM t1 WHERE c1 > 0 AND EXISTS (SELECT 1 FROM t2 where c2 = t1.c1 and c2 > 0);
    

    The implementation should try to cover such scenarios as much as possible.
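
    The inner-join case described above can be sketched in Rust as follows. This is only an illustration under simplifying assumptions, not the project's implementation: `Col`, `CmpOp`, `Pred`, `col` and `propagate` are hypothetical names, predicates are limited to a single column compared against an integer constant, and the direction restrictions for left/anti joins are not modeled.

    ```rust
    use std::collections::BTreeSet;

    /// A column reference such as `t1.c1` (hypothetical representation).
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    pub struct Col {
        pub table: String,
        pub name: String,
    }

    /// Comparison operator of a simple predicate.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum CmpOp {
        Gt,
        Ge,
        Lt,
        Le,
        Equal,
    }

    /// A predicate of the form `col <op> constant`.
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    pub struct Pred {
        pub col: Col,
        pub op: CmpOp,
        pub value: i64,
    }

    /// Convenience constructor for a column reference.
    pub fn col(table: &str, name: &str) -> Col {
        Col { table: table.to_string(), name: name.to_string() }
    }

    /// Given the equality conditions of an inner join and the known predicates,
    /// return the new predicates that can be derived for equal columns.
    pub fn propagate(eqs: &[(Col, Col)], preds: &[Pred]) -> Vec<Pred> {
        let mut all: BTreeSet<Pred> = preds.iter().cloned().collect();
        // Repeat until no new predicate appears, so chains such as
        // a = b AND b = c are covered as well.
        loop {
            let mut derived = Vec::new();
            for p in &all {
                for (l, r) in eqs {
                    // Pick the column on the opposite side of the equality.
                    let other = if &p.col == l {
                        r
                    } else if &p.col == r {
                        l
                    } else {
                        continue;
                    };
                    let np = Pred { col: other.clone(), op: p.op, value: p.value };
                    if !all.contains(&np) {
                        derived.push(np);
                    }
                }
            }
            if derived.is_empty() {
                break;
            }
            all.extend(derived);
        }
        // Report only predicates that were not part of the input.
        all.into_iter().filter(|p| !preds.contains(p)).collect()
    }
    ```

    Calling `propagate` with the equality `t1.c1 = t2.c2` and the predicate `t1.c1 > 0` yields the single derived predicate `t2.c2 > 0`, matching the first before/after pair.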

    feature 
    opened by jiangzhe 2
  • Add rule pred pullup

    close #68 close #52

    The predicate pullup rule also covers some of the functionality of predicate propagation. Removing duplicates and folding predicates are left for future improvement.

    opened by jiangzhe 1
  • Add rule outerjoin reduce

    close #30

    The rule is implemented by collecting null-rejecting predicates and traversing child queries to change join types. The predicates are not actually pushed down in this optimization.
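
    A minimal Rust sketch of how null-rejecting predicates can drive the change of join type is shown here. It is not the code from this pull request: `JoinKind`, `Side`, `Pred`, `rejected_side` and `reduce_outer_join` are hypothetical names, and each predicate is reduced to the side of the join it references.

    ```rust
    /// Join types considered by this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum JoinKind {
        Inner,
        Left,
        Right,
        Full,
    }

    /// Which input of the join a predicate refers to.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Side {
        Left,
        Right,
    }

    /// A filter placed above the join, reduced to the side it references.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Pred {
        /// A comparison such as `col > 0`; it is never true for a NULL input.
        Compare(Side),
        /// `col IS NOT NULL`
        IsNotNull(Side),
        /// `col IS NULL`; it accepts NULL, so it is not null-rejecting.
        IsNull(Side),
    }

    /// Return the side whose NULL-padded rows are rejected by `p`, if any.
    pub fn rejected_side(p: Pred) -> Option<Side> {
        match p {
            Pred::Compare(s) | Pred::IsNotNull(s) => Some(s),
            Pred::IsNull(_) => None,
        }
    }

    /// Collect null-rejecting predicates and weaken the outer join accordingly.
    pub fn reduce_outer_join(kind: JoinKind, filters: &[Pred]) -> JoinKind {
        let sides: Vec<Side> = filters.iter().filter_map(|p| rejected_side(*p)).collect();
        let left = sides.contains(&Side::Left);
        let right = sides.contains(&Side::Right);
        match kind {
            // A left join pads the right side with NULLs for unmatched rows.
            // Rejecting those rows leaves only the matched ones.
            JoinKind::Left if right => JoinKind::Inner,
            JoinKind::Right if left => JoinKind::Inner,
            JoinKind::Full if left && right => JoinKind::Inner,
            // Rejecting NULLs on the left side removes the unmatched right rows.
            JoinKind::Full if left => JoinKind::Left,
            JoinKind::Full if right => JoinKind::Right,
            other => other,
        }
    }
    ```

    For example, `reduce_outer_join(JoinKind::Left, &[Pred::Compare(Side::Right)])` returns `JoinKind::Inner`, while an `IS NULL` filter on the right side leaves the left join unchanged.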

    opened by jiangzhe 1
  • Add join graph operator, initialization and refactor rule of predicate pushdown

    close #48

    Add a join graph operator together with an optimization rule to initialize it. Refactor predicate pushdown so that predicates are pushed down to and through the join graph.

    opened by jiangzhe 1
  • Add rule unfold derived table

    This rule unfolds flat derived tables that contain only simple projections. For example:

    SELECT * FROM (SELECT c0, c1 + 1 as cc FROM t1) x1 WHERE f(cc+1) > 0 
    

    could be unfolded and translated to equivalent SQL like the following:

    SELECT c0, c1 + 1 as cc FROM t1 WHERE f(c1 + 1 + 1) > 0
    

    The benefits include not only further optimizations such as expression simplification, but also more tables taking part in join reordering. In some scenarios, users specify a bushy join via a derived table:

    SELECT 1 FROM t1 JOIN t2 JOIN (SELECT * FROM t3 JOIN t4)
    

    With this rule applied, we can reorder 4 tables instead of 3, although SQL does not allow syntax such as:

    SELECT 1 FROM t1 JOIN t2 JOIN (t3 JOIN t4)
    

    More advanced unfolding is left for future improvements, e.g. certain patterns of aggregation plus join inside a derived table could also be unfolded.
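
    The substitution at the heart of the first example (replacing `cc` with `c1 + 1` in the outer predicate) can be sketched in Rust as below. The `Expr` type and the helpers `col`, `int`, `add` and `substitute` are hypothetical and only cover the expression forms used in that example. They are not the project's IR.

    ```rust
    use std::collections::HashMap;

    /// A tiny expression tree, just large enough for the example query.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Col(String),
        Int(i64),
        Add(Box<Expr>, Box<Expr>),
        Func(String, Vec<Expr>),
        Gt(Box<Expr>, Box<Expr>),
    }

    /// Build a column reference.
    pub fn col(name: &str) -> Expr {
        Expr::Col(name.to_string())
    }

    /// Build an integer literal.
    pub fn int(v: i64) -> Expr {
        Expr::Int(v)
    }

    /// Build `l + r`.
    pub fn add(l: Expr, r: Expr) -> Expr {
        Expr::Add(Box::new(l), Box::new(r))
    }

    /// Replace every reference to an output column of the derived table
    /// with the expression that defines it in the projection list.
    pub fn substitute(e: &Expr, proj: &HashMap<String, Expr>) -> Expr {
        match e {
            // Columns unknown to the projection are kept unchanged.
            Expr::Col(name) => proj.get(name).cloned().unwrap_or_else(|| e.clone()),
            Expr::Int(_) => e.clone(),
            Expr::Add(l, r) => add(substitute(l, proj), substitute(r, proj)),
            Expr::Func(name, args) => Expr::Func(
                name.clone(),
                args.iter().map(|a| substitute(a, proj)).collect(),
            ),
            Expr::Gt(l, r) => Expr::Gt(
                Box::new(substitute(l, proj)),
                Box::new(substitute(r, proj)),
            ),
        }
    }
    ```

    With a projection map of `c0 -> c0` and `cc -> c1 + 1`, substituting into `f(cc + 1) > 0` produces `f((c1 + 1) + 1) > 0`, which is the predicate of the unfolded query.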

    feature 
    opened by jiangzhe 1
  • Codec of DATE

    Usually we can encode a date as a u32: 2 bytes for the year, 1 byte for the month, and 1 byte for the day. A truncation codec may improve filter performance: we store the minimum date at the block level, then compact the offsets as u8 or u16 to speed up SIMD processing.
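
    The layout and the truncation idea can be sketched in Rust as follows. This is one possible reading of the proposal, not an existing implementation: `pack_date`, `unpack_date`, `truncate_block` and `count_before` are hypothetical names, the year is assumed to occupy the high 16 bits so that integer order matches date order, and only the u16 offset width is shown.

    ```rust
    use std::convert::TryFrom;

    /// Pack a date into a u32: year in the high 16 bits, then month, then day.
    /// With this layout, comparing the integers compares the dates.
    pub fn pack_date(year: u16, month: u8, day: u8) -> u32 {
        ((year as u32) << 16) | ((month as u32) << 8) | day as u32
    }

    /// Split a packed date back into (year, month, day).
    pub fn unpack_date(v: u32) -> (u16, u8, u8) {
        ((v >> 16) as u16, (v >> 8) as u8, v as u8)
    }

    /// Truncation codec for one block: keep the minimum packed date and store
    /// every value as a u16 offset from it. Returns None for an empty block or
    /// when an offset does not fit into 16 bits.
    pub fn truncate_block(dates: &[u32]) -> Option<(u32, Vec<u16>)> {
        let min = *dates.iter().min()?;
        let offsets = dates
            .iter()
            .map(|d| u16::try_from(d - min).ok())
            .collect::<Option<Vec<u16>>>()?;
        Some((min, offsets))
    }

    /// Count the dates in a truncated block that are strictly before `bound`.
    /// The bound is translated once, so the scan only compares u16 values.
    pub fn count_before(min: u32, offsets: &[u16], bound: u32) -> usize {
        if bound <= min {
            return 0;
        }
        match u16::try_from(bound - min) {
            Ok(limit) => offsets.iter().filter(|&&o| o < limit).count(),
            // The bound lies beyond every representable offset.
            Err(_) => offsets.len(),
        }
    }
    ```

    The offsets are differences between packed values, not day counts, so a u16 offset covers roughly one year of dates within a block. The filter loop in `count_before` works on a dense `u16` slice, which is the kind of loop a compiler or hand-written SIMD code can vectorize.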

    feature 
    opened by jiangzhe 0
  • Build end-to-end data flow

    Now that a draft version of the join reorder algorithm is done, the plan module should work with a fake cost model and fake cardinality estimations. Implementing the estimator and the cost model requires the underlying storage and execution layers. I decided to build an MVP (Minimum Viable Product) end-to-end data flow first. The initial version will include:

    1. A read-only block-based storage.
    2. An async executor.
    3. A command line tool to execute a single query.
    feature 
    opened by jiangzhe 0
  • Enhance optimization of rule to convert outerjoin to antijoin

    SELECT c1 FROM t1
    LEFT JOIN t2
    ON t1.c1 = t2.c2
    WHERE t2.c2 IS NULL
    

    The SQL above uses a left join to achieve anti-join semantics. We can remove the WHERE clause and convert the join to an anti join.
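
    The rewrite can be sketched in Rust as below. The types `JoinKind`, `Filter` and `JoinQuery` and the function `left_to_anti` are hypothetical, and the sketch assumes equality join keys. It also assumes, via the illustrative `right_cols_used` flag, that the caller already knows whether the select list or any remaining filter refers to the right table, since an anti join produces no right-side columns.

    ```rust
    /// Join types considered by this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum JoinKind {
        Inner,
        Left,
        Anti,
    }

    /// Filters in the WHERE clause above the join.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Filter {
        /// `<column> IS NULL`
        IsNull(String),
        /// Any other predicate, kept as text.
        Other(String),
    }

    /// A two-table join with equality keys and a list of WHERE filters.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct JoinQuery {
        pub kind: JoinKind,
        /// Equality join keys as (left column, right column).
        pub on: Vec<(String, String)>,
        pub filters: Vec<Filter>,
        /// True if the output or another filter references the right table.
        pub right_cols_used: bool,
    }

    /// Convert `LEFT JOIN ... WHERE right_key IS NULL` into an anti join.
    pub fn left_to_anti(mut q: JoinQuery) -> JoinQuery {
        if q.kind != JoinKind::Left || q.right_cols_used {
            return q;
        }
        // A matched row always has a non-NULL right join key, because the
        // equality condition is never true for NULL. `IS NULL` on that key
        // therefore keeps exactly the unmatched left rows.
        let pos = q.filters.iter().position(|f| {
            matches!(f, Filter::IsNull(c) if q.on.iter().any(|(_, r)| r == c))
        });
        if let Some(i) = pos {
            q.filters.remove(i);
            q.kind = JoinKind::Anti;
        }
        q
    }
    ```

    Applied to the query above (keys `t1.c1 = t2.c2`, filter `t2.c2 IS NULL`), the result has kind `Anti` and an empty filter list. An `IS NULL` filter on a right column that is not a join key is left untouched, because such a column may be NULL in matched rows too.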

    feature 
    opened by jiangzhe 0
Owner
Jiang Zhe