A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Sarthak Dalabehera

Last update: Jan 2, 2024

Related tags

Database entangleDB

Overview

Overview
Usage
TODO
MVCC in entangleDB
SQL Query Execution in entangleDB
entangleDB Raft Consensus Engine
What I am trying to build
Proposed Architecture
SQL Engine
Raft Engine
Storage Engine
entangleDB Peers
Example SQL Queries that you will be able to execute in entangleDB
Learning Resources I've been using for building the database

Overview

I'm working on creating entangleDB, a project that's all about really getting to know how databases work from the inside out. My aim is to deeply understand everything about databases, from the big picture down to the small details. It's a way for me to build a strong foundation in database.

The name "entangleDB" is special because it's in honor of a friend who loves databases just as much as I do.

The plan is to write the database in Rust. My main goal is to create something that's not only useful for me to learn from but also helpful for others who are interested in diving deep into how databases work. I'm hoping to make it postgresSQL compatible.

Usage

Pre-requisite is to have the Rust compiler; follow this doc to install the Rust compiler

entangledb cluster can be started on localhost ports 3201 to 3205:

(cd husky/cloud && ./build.sh)

Client can be used to connect with the node on localhost port 3205:

cargo run --release --bin entanglesql

Connected to EntangleDB node "5". Enter !help for instructions.
entangledb> SELECT * FROM dishes;
poha
breads
korma

TODO

Make the isolation level configurable; currently, it is set to repeatable read (snapshot).
Implement partitions, both hash and range types.
Utilize generics throughout in Rust, thereby eliminating the need for std::fmt::Display + Send + Sync.
Consider the use of runtime assertions instead of employing Error::Internal ubiquitously.
Revisit the implementation of time-travel queries

MVCC in entangleDB

SQL Query Execution in entangleDB

entangleDB Raft Consensus Engine

What I am trying to build

1. Distributed Consensus Engine

The design for entangleDB centers around a custom-built consensus engine, intended for high availability in distributed settings. This engine will be crucial in maintaining consistent and reliable state management across various nodes.

A key focus will be on linearizable state machine replication, an essential feature for ensuring data consistency across all nodes, especially for applications that require strong consistency.

2. Transaction Engine

The proposed transaction engine for entangleDB is committed to adhering to ACID properties, ensuring reliability and integrity in every transaction.

The plan includes the implementation of Snapshot Isolation and Serializable Isolation, with the aim of optimizing transaction handling for enhanced concurrency and data integrity.

3. Storage Engine

The planned storage engine for entangleDB will explore a variety of storage formats to find and utilize the most efficient methods for data storage and retrieval.

The storage layer is being designed for flexibility, to support a range of backend technologies and meet diverse storage requirements.

4. Query Engine

The development of the query engine will focus on rapid and effective query processing, utilizing advanced optimization algorithms.

A distinctive feature of entangleDB will be its ability to handle time-travel queries, allowing users to access and analyze data from different historical states.

5. SQL Interface and PostgreSQL Compatibility

The SQL interface for entangleDB is intended to support a wide array of SQL functionalities, including complex queries, joins, aggregates, and window functions.

Compatibility with PostgreSQL’s wire protocol is a goal, to facilitate smooth integration with existing PostgreSQL setups and offer a solid alternative for database system upgrades or migrations.

Proposed Architecture

SQL Engine

The SQL Engine is responsible for the intake and processing of SQL queries. It consists of:

SQL Session: The processing pipeline within a session includes:
- Parser: Interprets SQL queries and converts them into a machine-understandable format.
- Planner: Devises an execution plan based on the parsed input.
- Executor: Carries out the plan, accessing and modifying the database.

Adjacent to the session is the:

SQL Storage Raft Backend: This component integrates with the Raft consensus protocol to ensure distributed transactions are consistent and resilient.

Raft Engine

The Raft Engine is crucial for maintaining a consistent state across the distributed system:

Raft Node: This consensus node confirms that all database transactions are in sync across the network.
Raft Log: A record of all transactions agreed upon by the Raft consensus algorithm, which is crucial for data integrity and fault tolerance.

Storage Engine

The Storage Engine is where the actual data is stored and managed:

State Machine Driver: Comprising of:
- State Machine Interface: An intermediary that conveys state changes from the Raft log to the storage layer.
- Key Value Backend: The primary storage layer, consisting of:
  - Bitcask Engine: A simple, fast on-disk storage system for key-value data.
  - MVCC Storage: Handles multiple versions of data for read-write concurrency control.

entangleDB Peers

interaction between multiple database instances or "peers".

Example SQL Queries that you will be able to execute in entangleDB

-- Transaction example with a table creation, data insertion, and selection
BEGIN;

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR, department VARCHAR);
INSERT INTO employees VALUES (1, 'Alice', 'Engineering'), (2, 'Bob', 'HR');
SELECT * FROM employees;

COMMIT;

-- Aggregation query with JOIN
SELECT department, AVG(salary) FROM employees JOIN salaries ON employees.id = salaries.emp_id GROUP BY department;

-- Time-travel query
SELECT * FROM employees AS OF SYSTEM TIME '-5m';

Learning Resources I've been using for building the database

For a comprehensive list of resources that have been learning what to build in a distributed database, check out the Learning Resources page.

Comments

[chore]: added documentation for running it on local
Summary by CodeRabbit

Documentation

Updated the README with a new table of contents.

Added "Usage" section with instructions for starting the cluster and using the command-line client.

Included a new section on "MVCC in entangleDB" with an accompanying image link.

enhancement
opened by TypicalDefender 1
MVCC in entangleDB
Summary by CodeRabbit

New Features

Implemented a new client-server communication module for database interactions.

Introduced custom error handling for enhanced application reliability.

Developed a new server module supporting Raft consensus and SQL functionalities.

Added multiple storage engines, including BitCask and in-memory key/value stores.

Established a new storage engine trait for standardized CRUD operations and status checks.

Enhancements

Added wrappers for serialization and deserialization of data.

Created debug helpers for better storage insights during development.

Documentation

Provided extensive documentation for MVCC (Multi-Version Concurrency Control) system behaviors and anomalies.

Maintenance

Updated .gitignore to better manage project file exclusions.

Reorganized storage module code for improved maintainability.

Tests

Included comprehensive test cases for storage engine operations and MVCC scenarios.

enhancement
opened by slice-sarthakd 1
feat: entangle-sql to interact with the nodes
Summary by CodeRabbit

New Features

Launched EntangleDB server with TCP support for SQL clients and Raft peers.

Introduced EntangleSQL client with REPL for database interaction and SQL command execution.

Enhancements

Implemented server status display and table management in EntangleSQL client.

Added multiline input validation for SQL statements in the client interface.

enhancement
opened by slice-sarthakd 1
feat: state machine for election
Summary by CodeRabbit

New Features

Introduced a comprehensive Raft log management system.

Implemented message handling for Raft node communication.

Established core functionality for Raft consensus algorithm access.

Added structures and behaviors for Raft node roles: candidate, follower, and leader.

Created a Raft server structure for cluster management and event handling.

Developed a state management system for the Raft-managed state machine.

Documentation

Provided extensive inline documentation for new Raft consensus components and their usage.

Tests

Included tests for candidate, follower, and leader behaviors.

Added tests for Raft server operations and state machine management.

enhancement
opened by slice-sarthakd 1
Restyle feat: entry point for cluster and husky configured
Automated style fixes for #2, created by Restyled.

The following restylers made fixes:

prettier-yaml

shellharden

shfmt

To incorporate these changes, merge this Pull Request into the original. We recommend using the Squash or Rebase strategies.

NOTE: As work continues on the original Pull Request, this process will re-run and update (force-push) this Pull Request with updated style fixes as necessary. If the style is fixed manually at any point (i.e. this process finds no fixes to make), this Pull Request will be closed automatically.

Sorry if this was unexpected. To disable it, see our documentation.
opened by restyled-io[bot] 1
feat: entry point for cluster and husky configured
Summary by CodeRabbit

New Features

Automated build process for the application with multiple configuration setups.

Chores

Implemented a cleanup mechanism to maintain system integrity on script exit.

enhancement
opened by slice-sarthakd 1

Releases(v1.0.0)

v1.0.0(Dec 25, 2023)
What's Changed

feat: entry point for cluster and husky configured by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/2

feat: state machine for election by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/4

feat: entangle-sql to interact with the nodes by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/5

[feat]: SQL Query Execution in entangleDB by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/6

MVCC in entangleDB by @slice-sarthakd in https://github.com/TypicalDefender/entangleDB/pull/7

New Contributors

@slice-sarthakd made their first contribution in https://github.com/TypicalDefender/entangleDB/pull/2

Full Changelog: https://github.com/TypicalDefender/entangleDB/commits/v1.0.0
Source code(tar.gz)
Source code(zip)

Owner

Sarthak Dalabehera

Just a funny guy!

GitHub

Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

5 Apr 26, 2022

open source training courses about distributed database and distributed systemes

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

8.3k Dec 30, 2022

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx ?? The Rust SQL Toolkit Install | Usage | Docs Built with ❤️ by The LaunchBadge team SQLx is an async, pure Rust† SQL crate featuring compile-tim

7.6k Dec 31, 2022

A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Scooby An SQL query builder with a pleasant fluent API closely imitating actual SQL. Meant to comfortably build dynamic queries with a little bit of s

100 Nov 11, 2022

Gh-sql - Query GitHub Projects (beta) with SQL

gh-sql: Query GitHub Projects (beta) with SQL Installation gh extension install KOBA789/gh-sql Features SELECT items DELETE items UPDATE item fields

108 Dec 7, 2022

SQL validator tool for BigQuery standard SQL.

bqvalid What bqvalid does bqvalid is the SQL validator tool for BigQuery standard SQL. bqvalid fails with error message if there's the expression that

10 Dec 25, 2022

Learn Rust black magics by implementing basic types in database systems

Type Exercise in Rust (In Chinese) 数据库表达式执行的黑魔法：用 Rust 做类型体操 This is a short lecture on how to use the Rust type system to build necessary components

996 Jan 3, 2023

FeOphant - A SQL database server written in Rust and inspired by PostreSQL.

A PostgreSQL inspired SQL database written in Rust.

27 Dec 7, 2022

GlueSQL is a SQL database library written in Rust

GlueSQL is a SQL database library written in Rust. It provides a parser (sqlparser-rs), execution layer, and optional storage (sled) packaged into a single library.

2.1k Jan 8, 2023

RisingWave is a cloud-native streaming database that uses SQL as the interface language.

RisingWave is a cloud-native streaming database that uses SQL as the interface language. It is designed to reduce the complexity and cost of building real-time applications. RisingWave consumes streaming data, performs continuous queries, and updates results dynamically. As a database system, RisingWave maintains results inside its own storage and allows users to access data efficiently.

3.7k Jan 2, 2023

SQL database to read and write "discord"

GlueSQL Discord Storage After discussing how CI testing will be managed, we plan to move it upstream. Precautions for use discord ToS https://discord.

9 Feb 28, 2023

ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

ReefDB ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-

75 Jun 12, 2023

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

12 Nov 18, 2022

A template project for building a database-driven microservice in Rust and run it in the WasmEdge sandbox.

Secure & lightweight microservice with a database backend In this repo, we demonstrate a microservice written in Rust, and connected to a MySQL databa

222 Feb 19, 2023

A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Related tags

Overview

Table of Contents

Overview

Usage

TODO

MVCC in entangleDB

SQL Query Execution in entangleDB

entangleDB Raft Consensus Engine

What I am trying to build

1. Distributed Consensus Engine

2. Transaction Engine

3. Storage Engine

4. Query Engine

5. SQL Interface and PostgreSQL Compatibility

Proposed Architecture

SQL Engine

Raft Engine

Storage Engine

entangleDB Peers

Example SQL Queries that you will be able to execute in entangleDB

Learning Resources I've been using for building the database

Comments

[chore]: added documentation for running it on local

Summary by CodeRabbit

MVCC in entangleDB

Summary by CodeRabbit

feat: entangle-sql to interact with the nodes

Summary by CodeRabbit

feat: state machine for election

Summary by CodeRabbit

Restyle feat: entry point for cluster and husky configured

feat: entry point for cluster and husky configured

Summary by CodeRabbit

Releases(v1.0.0)

v1.0.0(Dec 25, 2023)

What's Changed

New Contributors

Owner

Sarthak Dalabehera

Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

open source training courses about distributed database and distributed systemes

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Gh-sql - Query GitHub Projects (beta) with SQL

SQL validator tool for BigQuery standard SQL.

Learn Rust black magics by implementing basic types in database systems

FeOphant - A SQL database server written in Rust and inspired by PostreSQL.

GlueSQL is a SQL database library written in Rust

RisingWave is a cloud-native streaming database that uses SQL as the interface language.

SQL database to read and write "discord"

ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

Distributed transactional key-value database, originally created to complement TiDB

small distributed database protocol

A scalable, distributed, collaborative, document-graph database, for the realtime web

Embedded Distributed Encrypted Database (Research).

A high-performance, distributed, schema-less, cloud native time-series database

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

A template project for building a database-driven microservice in Rust and run it in the WasmEdge sandbox.