A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Related tags

Database entangleDB
Overview

Table of Contents

Overview

I'm working on creating entangleDB, a project that's all about really getting to know how databases work from the inside out. My aim is to deeply understand everything about databases, from the big picture down to the small details. It's a way for me to build a strong foundation in database.

The name "entangleDB" is special because it's in honor of a friend who loves databases just as much as I do.

The plan is to write the database in Rust. My main goal is to create something that's not only useful for me to learn from but also helpful for others who are interested in diving deep into how databases work. I'm hoping to make it postgresSQL compatible.

Usage

Pre-requisite is to have the Rust compiler; follow this doc to install the Rust compiler

entangledb cluster can be started on localhost ports 3201 to 3205:

(cd husky/cloud && ./build.sh)

Client can be used to connect with the node on localhost port 3205:

cargo run --release --bin entanglesql

Connected to EntangleDB node "5". Enter !help for instructions.
entangledb> SELECT * FROM dishes;
poha
breads
korma

TODO

  1. Make the isolation level configurable; currently, it is set to repeatable read (snapshot).
  2. Implement partitions, both hash and range types.
  3. Utilize generics throughout in Rust, thereby eliminating the need for std::fmt::Display + Send + Sync.
  4. Consider the use of runtime assertions instead of employing Error::Internal ubiquitously.
  5. Revisit the implementation of time-travel queries

MVCC in entangleDB

image

SQL Query Execution in entangleDB

image

entangleDB Raft Consensus Engine

image

What I am trying to build

1. Distributed Consensus Engine

The design for entangleDB centers around a custom-built consensus engine, intended for high availability in distributed settings. This engine will be crucial in maintaining consistent and reliable state management across various nodes.

A key focus will be on linearizable state machine replication, an essential feature for ensuring data consistency across all nodes, especially for applications that require strong consistency.

2. Transaction Engine

The proposed transaction engine for entangleDB is committed to adhering to ACID properties, ensuring reliability and integrity in every transaction.

The plan includes the implementation of Snapshot Isolation and Serializable Isolation, with the aim of optimizing transaction handling for enhanced concurrency and data integrity.

3. Storage Engine

The planned storage engine for entangleDB will explore a variety of storage formats to find and utilize the most efficient methods for data storage and retrieval.

The storage layer is being designed for flexibility, to support a range of backend technologies and meet diverse storage requirements.

4. Query Engine

The development of the query engine will focus on rapid and effective query processing, utilizing advanced optimization algorithms.

A distinctive feature of entangleDB will be its ability to handle time-travel queries, allowing users to access and analyze data from different historical states.

5. SQL Interface and PostgreSQL Compatibility

The SQL interface for entangleDB is intended to support a wide array of SQL functionalities, including complex queries, joins, aggregates, and window functions.

Compatibility with PostgreSQL’s wire protocol is a goal, to facilitate smooth integration with existing PostgreSQL setups and offer a solid alternative for database system upgrades or migrations.

Proposed Architecture

Screenshot 2023-12-02 at 1 26 15 PM

SQL Engine

The SQL Engine is responsible for the intake and processing of SQL queries. It consists of:

  • SQL Session: The processing pipeline within a session includes:
    • Parser: Interprets SQL queries and converts them into a machine-understandable format.
    • Planner: Devises an execution plan based on the parsed input.
    • Executor: Carries out the plan, accessing and modifying the database.

Adjacent to the session is the:

  • SQL Storage Raft Backend: This component integrates with the Raft consensus protocol to ensure distributed transactions are consistent and resilient.

Raft Engine

The Raft Engine is crucial for maintaining a consistent state across the distributed system:

  • Raft Node: This consensus node confirms that all database transactions are in sync across the network.
  • Raft Log: A record of all transactions agreed upon by the Raft consensus algorithm, which is crucial for data integrity and fault tolerance.

Storage Engine

The Storage Engine is where the actual data is stored and managed:

  • State Machine Driver: Comprising of:
    • State Machine Interface: An intermediary that conveys state changes from the Raft log to the storage layer.
    • Key Value Backend: The primary storage layer, consisting of:
      • Bitcask Engine: A simple, fast on-disk storage system for key-value data.
      • MVCC Storage: Handles multiple versions of data for read-write concurrency control.

entangleDB Peers

  • interaction between multiple database instances or "peers".

Example SQL Queries that you will be able to execute in entangleDB

-- Transaction example with a table creation, data insertion, and selection
BEGIN;

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR, department VARCHAR);
INSERT INTO employees VALUES (1, 'Alice', 'Engineering'), (2, 'Bob', 'HR');
SELECT * FROM employees;

COMMIT;

-- Aggregation query with JOIN
SELECT department, AVG(salary) FROM employees JOIN salaries ON employees.id = salaries.emp_id GROUP BY department;

-- Time-travel query
SELECT * FROM employees AS OF SYSTEM TIME '-5m';

Learning Resources I've been using for building the database

For a comprehensive list of resources that have been learning what to build in a distributed database, check out the Learning Resources page.

Comments
  • [chore]: added documentation for running it on local

    [chore]: added documentation for running it on local

    Summary by CodeRabbit

    • Documentation
      • Updated the README with a new table of contents.
      • Added "Usage" section with instructions for starting the cluster and using the command-line client.
      • Included a new section on "MVCC in entangleDB" with an accompanying image link.
    enhancement 
    opened by TypicalDefender 1
  • MVCC in entangleDB

    MVCC in entangleDB

    image

    Summary by CodeRabbit

    • New Features

      • Implemented a new client-server communication module for database interactions.
      • Introduced custom error handling for enhanced application reliability.
      • Developed a new server module supporting Raft consensus and SQL functionalities.
      • Added multiple storage engines, including BitCask and in-memory key/value stores.
      • Established a new storage engine trait for standardized CRUD operations and status checks.
    • Enhancements

      • Added wrappers for serialization and deserialization of data.
      • Created debug helpers for better storage insights during development.
    • Documentation

      • Provided extensive documentation for MVCC (Multi-Version Concurrency Control) system behaviors and anomalies.
    • Maintenance

      • Updated .gitignore to better manage project file exclusions.
      • Reorganized storage module code for improved maintainability.
    • Tests

      • Included comprehensive test cases for storage engine operations and MVCC scenarios.
    enhancement 
    opened by slice-sarthakd 1
  • feat: entangle-sql to interact with the nodes

    feat: entangle-sql to interact with the nodes

    Summary by CodeRabbit

    • New Features
      • Launched EntangleDB server with TCP support for SQL clients and Raft peers.
      • Introduced EntangleSQL client with REPL for database interaction and SQL command execution.
    • Enhancements
      • Implemented server status display and table management in EntangleSQL client.
      • Added multiline input validation for SQL statements in the client interface.
    enhancement 
    opened by slice-sarthakd 1
  • feat: state machine for election

    feat: state machine for election

    image

    Summary by CodeRabbit

    • New Features

      • Introduced a comprehensive Raft log management system.
      • Implemented message handling for Raft node communication.
      • Established core functionality for Raft consensus algorithm access.
      • Added structures and behaviors for Raft node roles: candidate, follower, and leader.
      • Created a Raft server structure for cluster management and event handling.
      • Developed a state management system for the Raft-managed state machine.
    • Documentation

      • Provided extensive inline documentation for new Raft consensus components and their usage.
    • Tests

      • Included tests for candidate, follower, and leader behaviors.
      • Added tests for Raft server operations and state machine management.
    enhancement 
    opened by slice-sarthakd 1
  • Restyle feat: entry point for cluster and husky configured

    Restyle feat: entry point for cluster and husky configured

    Automated style fixes for #2, created by Restyled.

    The following restylers made fixes:

    To incorporate these changes, merge this Pull Request into the original. We recommend using the Squash or Rebase strategies.

    NOTE: As work continues on the original Pull Request, this process will re-run and update (force-push) this Pull Request with updated style fixes as necessary. If the style is fixed manually at any point (i.e. this process finds no fixes to make), this Pull Request will be closed automatically.

    Sorry if this was unexpected. To disable it, see our documentation.

    opened by restyled-io[bot] 1
  • feat: entry point for cluster and husky configured

    feat: entry point for cluster and husky configured

    Summary by CodeRabbit

    • New Features
      • Automated build process for the application with multiple configuration setups.
    • Chores
      • Implemented a cleanup mechanism to maintain system integrity on script exit.
    enhancement 
    opened by slice-sarthakd 1
Releases(v1.0.0)
  • v1.0.0(Dec 25, 2023)

    What's Changed

    • feat: entry point for cluster and husky configured by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/2
    • feat: state machine for election by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/4
    • feat: entangle-sql to interact with the nodes by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/5
    • [feat]: SQL Query Execution in entangleDB by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/6
    • MVCC in entangleDB by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/7

    New Contributors

    • @slice-sarthakd made their first contribution in https://github.com/TypicalDefender/entangleDB/pull/2

    Full Changelog: https://github.com/TypicalDefender/entangleDB/commits/v1.0.0

    Source code(tar.gz)
    Source code(zip)
Owner
Sarthak Dalabehera
Just a funny guy!
Sarthak Dalabehera
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
open source training courses about distributed database and distributed systemes

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

PingCAP 8.3k Dec 30, 2022
🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx ?? The Rust SQL Toolkit Install | Usage | Docs Built with ❤️ by The LaunchBadge team SQLx is an async, pure Rust† SQL crate featuring compile-tim

launchbadge 7.6k Dec 31, 2022
A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Scooby An SQL query builder with a pleasant fluent API closely imitating actual SQL. Meant to comfortably build dynamic queries with a little bit of s

Aleksei Voronov 100 Nov 11, 2022
Gh-sql - Query GitHub Projects (beta) with SQL

gh-sql: Query GitHub Projects (beta) with SQL Installation gh extension install KOBA789/gh-sql Features SELECT items DELETE items UPDATE item fields

Hidekazu Kobayashi 108 Dec 7, 2022
SQL validator tool for BigQuery standard SQL.

bqvalid What bqvalid does bqvalid is the SQL validator tool for BigQuery standard SQL. bqvalid fails with error message if there's the expression that

null 10 Dec 25, 2022
Learn Rust black magics by implementing basic types in database systems

Type Exercise in Rust (In Chinese) 数据库表达式执行的黑魔法:用 Rust 做类型体操 This is a short lecture on how to use the Rust type system to build necessary components

Alex Chi 996 Jan 3, 2023
FeOphant - A SQL database server written in Rust and inspired by PostreSQL.

A PostgreSQL inspired SQL database written in Rust.

Christopher Hotchkiss 27 Dec 7, 2022
GlueSQL is a SQL database library written in Rust

GlueSQL is a SQL database library written in Rust. It provides a parser (sqlparser-rs), execution layer, and optional storage (sled) packaged into a single library.

GlueSQL 2.1k Jan 8, 2023
RisingWave is a cloud-native streaming database that uses SQL as the interface language.

RisingWave is a cloud-native streaming database that uses SQL as the interface language. It is designed to reduce the complexity and cost of building real-time applications. RisingWave consumes streaming data, performs continuous queries, and updates results dynamically. As a database system, RisingWave maintains results inside its own storage and allows users to access data efficiently.

Singularity Data 3.7k Jan 2, 2023
SQL database to read and write "discord"

GlueSQL Discord Storage After discussing how CI testing will be managed, we plan to move it upstream. Precautions for use discord ToS https://discord.

Jiseok CHOI 9 Feb 28, 2023
ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

ReefDB ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-

Sacha Arbonel 75 Jun 12, 2023
Distributed transactional key-value database, originally created to complement TiDB

Website | Documentation | Community Chat TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL sys

TiKV Project 12.4k Jan 3, 2023
small distributed database protocol

clepsydra Overview This is a work-in-progress implementation of a core protocol for a minimalist distributed database. It strives to be as small and s

Graydon Hoare 19 Dec 2, 2021
A scalable, distributed, collaborative, document-graph database, for the realtime web

is the ultimate cloud database for tomorrow's applications Develop easier. Build faster. Scale quicker. What is SurrealDB? SurrealDB is an end-to-end

SurrealDB 16.9k Jan 8, 2023
Embedded Distributed Encrypted Database (Research).

EDED Embedded Distributed Encrypted Database. Research projects to support ESSE. WIP Distributed design features Adapt to personal distributed usecase

Sun 2 Jan 6, 2022
A high-performance, distributed, schema-less, cloud native time-series database

CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 1.8k Dec 30, 2022
The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 12 Nov 18, 2022
A template project for building a database-driven microservice in Rust and run it in the WasmEdge sandbox.

Secure & lightweight microservice with a database backend In this repo, we demonstrate a microservice written in Rust, and connected to a MySQL databa

Second State 222 Feb 19, 2023