Distributed SQL database in Rust, written as a learning project

Erik Grinaker

Last update: Jan 8, 2023

Related tags

Database rust sql database raft mvcc distributed

Overview

toyDB

Distributed SQL database in Rust, written as a learning project. Most components are built from scratch, including:

Raft-based distributed consensus engine for linearizable state machine replication.
ACID-compliant transaction engine with MVCC-based snapshot isolation.
Pluggable storage engine with B+tree and log-structured backends.
Iterator-based query engine with heuristic optimization and time-travel support.
SQL interface including projections, filters, joins, aggregates, and transactions.
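
As a rough illustration of the MVCC and time-travel items above, the sketch below keeps every version of a key and answers reads as of a given snapshot version. It is a minimal sketch under stated assumptions, not toyDB's implementation: MvccStore, write, and read are hypothetical names, and real snapshot isolation also has to track concurrent transactions and detect write conflicts, which is left out here.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory MVCC store (not toyDB's actual types).
/// Each (key, version) pair maps to the value written at that version.
/// `None` marks a deletion (tombstone).
#[derive(Default)]
pub struct MvccStore {
    versions: BTreeMap<(String, u64), Option<String>>,
}

impl MvccStore {
    /// Records `value` for `key` as of `version`.
    pub fn write(&mut self, key: &str, version: u64, value: Option<&str>) {
        self.versions
            .insert((key.to_string(), version), value.map(str::to_string));
    }

    /// Returns the newest value of `key` written at or before `snapshot`,
    /// or `None` if the key did not exist or was deleted at that point.
    pub fn read(&self, key: &str, snapshot: u64) -> Option<&str> {
        self.versions
            .range((key.to_string(), 0)..=(key.to_string(), snapshot))
            .next_back()
            .and_then(|(_, value)| value.as_deref())
    }
}
```

Reading at an older snapshot version returns the value as it was at that point, which is the basic idea behind time-travel queries.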

toyDB is not suitable for real-world use, but may be of interest to others learning about database internals.

Documentation

Architecture guide: a guide to toyDB's architecture and implementation.
SQL examples: comprehensive examples of toyDB's SQL features.
SQL reference: detailed reference documentation for toyDB's SQL dialect.
References: books and other research material used while building toyDB.

Usage

With a Rust compiler installed, a local five-node cluster can be started on localhost ports 9601 to 9605:

$ (cd clusters/local && ./run.sh)

A command-line client can be built and used with the node on localhost port 9605:

$ cargo run --release --bin toysql
Connected to toyDB node "toydb-e". Enter !help for instructions.
toydb> CREATE TABLE movies (id INTEGER PRIMARY KEY, title VARCHAR NOT NULL);
toydb> INSERT INTO movies VALUES (1, 'Sicario'), (2, 'Stalker'), (3, 'Her');
toydb> SELECT * FROM movies;
1|Sicario
2|Stalker
3|Her

toyDB supports most common SQL features, including joins, aggregates, and ACID transactions.

Architecture

toyDB's architecture is fairly typical for distributed SQL databases: a transactional key/value store managed by a Raft cluster with a SQL query engine on top. See the architecture guide for more details.

Tests

toyDB has decent test coverage, with about a thousand tests of core functionality. These consist of in-code unit tests for many low-level components, golden master integration tests of the SQL engine under tests/sql, and a basic set of end-to-end cluster tests under tests/. Jepsen tests, or similar system-wide correctness and reliability tests, are desirable but not yet implemented.

Execute cargo test to run all tests, or check out the latest CI run.

Performance

Performance is not a primary goal of toyDB, but it has a bank simulation as a basic gauge of throughput and correctness. This creates a set of customers and accounts, and spawns several concurrent workers that make random transfers between them, retrying serialization failures and verifying invariants:

$ cargo run --release --bin bank
Created 100 customers (1000 accounts) in 0.123s
Verified that total balance is 100000 with no negative balances

Thread 0 transferred   18 from  92 (0911) to 100 (0994) in 0.007s (1 attempts)
Thread 1 transferred   84 from  61 (0601) to  85 (0843) in 0.007s (1 attempts)
Thread 3 transferred   15 from  40 (0393) to  62 (0614) in 0.007s (1 attempts)
[...]
Thread 6 transferred   48 from  78 (0777) to  52 (0513) in 0.004s (1 attempts)
Thread 3 transferred   57 from  93 (0921) to  19 (0188) in 0.065s (2 attempts)
Thread 4 transferred   70 from  35 (0347) to  49 (0484) in 0.068s (2 attempts)

Ran 1000 transactions in 0.937s (1067.691/s)
Verified that total balance is 100000 with no negative balances
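
The retry and verification logic described above could look roughly like the following. This is a sketch only: TxnError, with_retry, and verify are hypothetical names introduced for illustration, and the actual bank binary may be structured differently.

```rust
/// Hypothetical error type; toyDB's real error enum may differ.
#[derive(Debug, PartialEq)]
pub enum TxnError {
    /// The transaction conflicted with a concurrent one and can be retried.
    Serialization,
    /// Any other failure, which is not retried.
    Other(String),
}

/// Runs `txn` until it succeeds, retrying only serialization failures.
/// The closure is called at least once and at most `max_attempts` times
/// (when `max_attempts` is at least 1). On success, returns the value
/// together with the number of attempts it took.
pub fn with_retry<T>(
    max_attempts: u32,
    mut txn: impl FnMut() -> Result<T, TxnError>,
) -> Result<(T, u32), TxnError> {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match txn() {
            Ok(value) => return Ok((value, attempts)),
            Err(TxnError::Serialization) if attempts < max_attempts => continue,
            Err(err) => return Err(err),
        }
    }
}

/// Checks the invariants the simulation reports: the total balance is
/// unchanged and no account balance is negative.
pub fn verify(balances: &[i64], expected_total: i64) -> bool {
    balances.iter().sum::<i64>() == expected_total && balances.iter().all(|b| *b >= 0)
}
```

The attempt count returned by with_retry corresponds to the "(2 attempts)" style of reporting in the output above.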

The informal target was 100 transactions per second, and these results exceed that by an order of magnitude. For an unoptimized implementation, this is certainly "good enough". However, this is with a single node and fsync disabled. The table below shows results for other configurations, revealing clear potential for improvement:

Cluster	`sync: false`	`sync: true`
1 node	1067 txn/s	38 txn/s
5 nodes	417 txn/s	19 txn/s
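
To put the table in perspective, the relative cost of fsync and of replication can be worked out directly from the four figures. The constant and function names below are made up for this illustration.

```rust
/// Throughput figures from the table above, in transactions per second.
pub const ONE_NODE_NOSYNC: f64 = 1067.0;
pub const ONE_NODE_SYNC: f64 = 38.0;
pub const FIVE_NODES_NOSYNC: f64 = 417.0;
pub const FIVE_NODES_SYNC: f64 = 19.0;

/// How many times higher throughput `a` is than throughput `b`.
pub fn ratio(a: f64, b: f64) -> f64 {
    a / b
}
```

With these numbers, enabling fsync reduces throughput by a factor of about 28 on one node (1067 / 38) and about 22 on five nodes (417 / 19). Going from one node to five costs a factor of about 2.6 without fsync (1067 / 417) and 2 with fsync (38 / 19).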

Note that each transaction consists of six statements, including joins, not just a single update:

BEGIN;

-- Find the sender account with the highest balance
SELECT a.id, a.balance
FROM account a JOIN customer c ON a.customer_id = c.id
WHERE c.id = {sender}
ORDER BY a.balance DESC
LIMIT 1;

-- Find the receiver account with the lowest balance
SELECT a.id, a.balance
FROM account a JOIN customer c ON a.customer_id = c.id
WHERE c.id = {receiver}
ORDER BY a.balance ASC
LIMIT 1;

-- Transfer a random amount within the sender's balance to the receiver
UPDATE account SET balance = balance - {amount} WHERE id = {source};
UPDATE account SET balance = balance + {amount} WHERE id = {destination};

COMMIT;

Credits

toyDB logo is courtesy of @jonasmerlin.

Comments

Add debugging instructions

This adds instructions on how to set up a debugging environment in VS Code for both the engine and the SQL client. Debugging specific tests works great as well.

Context: when I first started working my way through the toydb codebase, it was hard to understand what was going on without a debugger (even though the docs, the tests, and the code itself are beautifully written!), and I couldn't find instructions on how to debug the code in the repo itself. I ended up googling around and setting up debugging environments on two laptops. With this PR I want to make it easier for new engineers to onboard.

opened by zaaath 5
add logo

Hey! So to learn how to create vector graphics I set myself the goal of designing a logo for an OSS project every week. While searching for a fitting first candidate I stumbled upon your repo and decided to give it a shot. You can find the result in this PR. The thinking behind the design was to lean into the toy theme a bit. So I took a classic database visualisation and "built it out of toy blocks." The idea behind this would be to use it as a logo in the README and such.

It's not perfect, but showing you the result and "releasing" it that way is part of the goal. 🙂 If you don't like it or it doesn't fit your vision for the project, that's fine! But I would love to hear what you think. I feel like the logo works best at smaller sizes btw.

Oh and cool project btw. 😄

opened by jonasmerlin 3
fix for no more mod.rs and a compile warning

As of Rust 2018, it's no longer recommended to write a mod.rs file in each module, so I converted these mod.rs files to the newer style.

Also, when I build this repo with the toolchain rustc 1.52.0-nightly (d1206f950 2021-02-15), there is a compile warning that says "warning: unnecessary trailing semicolon", so I fixed that too.

opened by pkking 2
Improve SQL key encoding

All storage keys (even individual rows and index entries) currently use full identifiers for tables and columns, they should use integer identifiers instead. They must also escape separators.
performance correctness

opened by erikgrinaker 2
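
One way the integer identifiers and separator escaping proposed in this issue could be realized is sketched below. It is an assumption-laden illustration, not the encoding toyDB adopted: escape and row_key are hypothetical names, and the scheme (0x00 escaped as 0x00 0xff, with 0x00 0x00 as terminator) is a common order-preserving technique chosen here for the example.

```rust
/// Escapes a byte string so that 0x00 0x00 can serve as a terminator:
/// every 0x00 in the input becomes 0x00 0xff, and the terminator is
/// appended at the end. Byte-wise ordering of the inputs is preserved.
pub fn escape(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(bytes.len() + 2);
    for &b in bytes {
        out.push(b);
        if b == 0x00 {
            out.push(0xff);
        }
    }
    out.extend_from_slice(&[0x00, 0x00]);
    out
}

/// Builds a row key from a hypothetical integer table ID and the raw
/// primary key bytes. The big-endian ID keeps keys of the same table
/// adjacent and sorted by table ID.
pub fn row_key(table_id: u32, primary_key: &[u8]) -> Vec<u8> {
    let mut key = table_id.to_be_bytes().to_vec();
    key.extend(escape(primary_key));
    key
}
```

Compared with full table and column names, a fixed-width integer prefix keeps keys short, and the escaping means a primary key containing the separator byte cannot be confused with the end of the key.
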
test cluster::isolation::anomaly_dirty_read blocks indefinitely
It looks like:

Running target/debug/deps/tests-28925d9acf338694
running 1 test
test cluster::isolation::anomaly_dirty_read ...

It seems to be blocked within tokio, but I am not familiar with tokio.
opened by bud-adamas 1
I like this repo

Hi Erik, I'm new to Rust and databases, and I think this repo is an amazing project for learning both. I want to contribute some code, but I did not find a code of conduct or a CONTRIBUTING file. Is there anything I should be aware of?

opened by pkking 1
Can't connect to the database
I've cloned the toydb repository and ran cargo build --release successfully. I then went into the target/release directory, tried to run ./toysql, and got the following error message:

Error: Internal("No connection could be made because the target machine actively refused it. (os error 10061)")

I ran the same command with administrator privileges and got the same error message. I'm on Windows 10 x64 Version 2004 OS build 20201.1000.
opened by Tembocs 1
add logo as .svg

Hey! So I exported the logo as .svg. It took longer than expected because there were some things that would've needed to be rasterized, which I then had to take out. It should resize now without blurring though. 😄

Again, super glad you like it. If you ever want something changed etc. just hit me up on here. And congrats on your success with this project! Each time I look, the stars have made another jump.

opened by jonasmerlin 1
Client pipelining

The client should support pipelining: https://docs.rs/tokio-postgres/0.5.3/tokio_postgres/index.html#pipelining

This would e.g. be useful for the bank simulation setup.
performance

opened by erikgrinaker 1
Schema cache

The KV SQL engine should cache schema lookups.

This needs to be implemented at the MVCC level, so that versioning works - probably by specifying prefixes that should be cached. The downside of this is that it needs to go through serialization.
performance

opened by erikgrinaker 1
On-disk B+tree store

By default, toyDB uses an on-disk Raft log for persistence and an in-memory B+tree key-value store for SQL state data. It might be interesting to build an on-disk B+tree key-value store as well.

opened by erikgrinaker 0
Iterators should be O(1), streaming, and borrowless

Iterators are currently rather hacky and inefficient, and e.g. use O(log n) lookups per next() call (B+tree and MVCC) or buffer all results (Raft and SQL engines). Ideally, iterators should have O(1) complexity when calling next(), stream all results (with some amount of IO buffering), and not hold read borrows to the entire data structure for the lifetime of the iterator.
performance polish

opened by erikgrinaker 0

Owner

Erik Grinaker

Cache rules everything around me.

GitHub

Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

5 Apr 26, 2022

Open source training courses about distributed databases and distributed systems

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

8.3k Dec 30, 2022

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx 🧰 The Rust SQL Toolkit. Built with ❤️ by The LaunchBadge team. SQLx is an async, pure Rust† SQL crate featuring compile-tim

7.6k Dec 31, 2022

A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Scooby An SQL query builder with a pleasant fluent API closely imitating actual SQL. Meant to comfortably build dynamic queries with a little bit of s

100 Nov 11, 2022

Gh-sql - Query GitHub Projects (beta) with SQL

gh-sql: Query GitHub Projects (beta) with SQL Installation gh extension install KOBA789/gh-sql Features SELECT items DELETE items UPDATE item fields

108 Dec 7, 2022

SQL validator tool for BigQuery standard SQL.

bqvalid What bqvalid does bqvalid is the SQL validator tool for BigQuery standard SQL. bqvalid fails with error message if there's the expression that

10 Dec 25, 2022

FeOphant - A SQL database server written in Rust and inspired by PostgreSQL.

A PostgreSQL inspired SQL database written in Rust.

27 Dec 7, 2022

GlueSQL is a SQL database library written in Rust

GlueSQL is a SQL database library written in Rust. It provides a parser (sqlparser-rs), execution layer, and optional storage (sled) packaged into a single library.

2.1k Jan 8, 2023

ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

ReefDB ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-

75 Jun 12, 2023

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

12 Nov 18, 2022

RisingWave is a cloud-native streaming database that uses SQL as the interface language.

RisingWave is a cloud-native streaming database that uses SQL as the interface language. It is designed to reduce the complexity and cost of building real-time applications. RisingWave consumes streaming data, performs continuous queries, and updates results dynamically. As a database system, RisingWave maintains results inside its own storage and allows users to access data efficiently.

3.7k Jan 2, 2023

SQL database to read and write "discord"

Distributed transactional key-value database, originally created to complement TiDB

small distributed database protocol

A scalable, distributed, collaborative, document-graph database, for the realtime web

Embedded Distributed Encrypted Database (Research).

A high-performance, distributed, schema-less, cloud native time-series database

A simple embedded key-value store written in rust as a learning project

A template project for building a database-driven microservice in Rust and run it in the WasmEdge sandbox.