small distributed database protocol

Related tags

Database clepsydra
Overview

clepsydra

Overview

This is a work-in-progress implementation of a core protocol for a minimalist distributed database. It strives to be as small and simple as possible while attempting to provide relatively challenging features:

  • Strict Serializability

  • Online Reconfiguration

  • Fault tolerance

  • High throughput

The implementation is based on a simplified version of the "Ocean Vista" (OV) protocol, and uses its terminology wherever possible. OV combines replication, transaction commitment and concurrency control into a single protocol.

Summary

The short version of the protocol is:

  • Transactions are represented as deterministic thunks over snapshots.

  • Each transaction is assigned a globally-unique timestamp.

  • Transactions are separated into two phases: S-phase and E-phase.

  • S-phase (storage) consists of coordination-free "blind quorum-writes" replicating the thunks into their MVCC order on each replica.

  • A watermark tracking minimum transaction timestamps-being-written is gossiped between peers, increasing as quorum-writes complete.

  • A transaction only enters E-phase after the watermark advances past it.

  • E-phase (evaluation) quorum-reads and evaluates thunks from consistent snapshots below the watermark, lazily resolving any earlier thunks. Everything below the watermark is coordination-free and deterministic.

Caveats

Nothing's perfect, and this crate is anything but:

  • This crate is very incomplete and does not work yet. Don't use it for anything other than experiments and toys. Recovery, reconfiguration, timeouts and nontrivial fault tolerance paths definitely don't work.

  • It also (somewhat recklessly) attempts to combine OV's reconfiguration and gossip protocols into an instance of the [concorde] reconfigurable lattice agreement protocol. This might not even be theoretically safe.

  • It is much more minimal than the full OV protocol: there's no support for sharding, nor the two-level peer-vs-datacenter locality organization. This crate treats its whole peer group as a single symmetric shard.

  • As a result, performance won't be "webscale" or anything. It will scale vertically if you throw cores at it, but no better, and its latency will always have speed-of-light WAN RTT factors in it. It's distributed for fault tolerance, not horizontal scaling.

  • As with OV, this crate does require partial clock synchronization. It doesn't need to be very tight: clock drift only causes increased latency as the watermarks progress as the minimum of all times; it doesn't affect correctness. Normal weak-NTP-level sync should be ok.

  • As with OV, Calvin, and all deterministic databases: your txns have to be deterministic and must have deterministic read and write sets. If they cannot have their read and write sets statically computed (eg. if they rely on the data to decide read and write set) you have to build slightly awkward multi-phase txns. The term in the literature is "reconnaisance queries".

Reference

Hua Fan and Wojciech Golab. Ocean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transactions. PVLDB, 12(11): 1471-1484, 2019.

DOI: https://doi.org/10.14778/3342263.3342627

http://www.vldb.org/pvldb/vol12/p1471-fan.pdf

Name

Wikipedia:

A water clock or clepsydra (Greek κλεψύδρα from κλέπτειν kleptein, 'to steal'; ὕδωρ hydor, 'water') is any timepiece by which time is measured by the regulated flow of liquid into (inflow type) or out from (outflow type) a vessel, and where the amount is then measured.

License: MIT OR Apache-2.0

You might also like...
tectonicdb is a fast, highly compressed standalone database and streaming protocol for order book ticks.

tectonicdb crate docs.rs crate.io tectonicdb tdb-core tdb-server-core tdb-cli tectonicdb is a fast, highly compressed standalone database and streamin

Teach your PostgreSQL database how to speak MongoDB Wire Protocol
Teach your PostgreSQL database how to speak MongoDB Wire Protocol

“If it looks like MongoDB, swims like MongoDB, and quacks like MongoDB, then it probably is PostgreSQL.” 🙃 Discord | Online Demo | Intro Video | Quic

High performance and distributed KV store w/ REST API. 🦀
High performance and distributed KV store w/ REST API. 🦀

About Lucid KV High performance and distributed KV store w/ REST API. 🦀 Introduction Lucid is an high performance, secure and distributed key-value s

Aggregatable Distributed Key Generation

Aggregatable DKG and VUF WARNING: this code should not be used in production! Implementation of Aggregatable Distributed Key Generation, a distributed

Garage is a lightweight S3-compatible distributed object store

Garage [ Website and documentation | Binary releases | Git repository | Matrix channel ] Garage is a lightweight S3-compatible distributed object stor

ChiselStore is an embeddable, distributed SQLite for Rust, powered by Little Raft.

ChiselStore ChiselStore is an embeddable, distributed SQLite for Rust, powered by Little Raft. SQLite is a fast and compact relational database manage

Canary - Distributed systems library for making communications through the network easier, while keeping minimalism and flexibility.
Canary - Distributed systems library for making communications through the network easier, while keeping minimalism and flexibility.

Canary Canary is a distributed systems and communications framework, focusing on minimalism, ease of use and performance. Development of Canary utiliz

Raft distributed consensus algorithm implemented in Rust.
Raft distributed consensus algorithm implemented in Rust.

Raft Problem and Importance When building a distributed system one principal goal is often to build in fault-tolerance. That is, if one particular nod

Distributed, MVCC SQLite that runs on FoundationDB.

mvSQLite Distributed, MVCC SQLite that runs on top of FoundationDB. Documentation mvSQLite Features Releases Quick reference Try it Contributing Featu

Owner
Graydon Hoare
Graydon Hoare
open source training courses about distributed database and distributed systemes

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

PingCAP 8.3k Dec 30, 2022
Distributed transactional key-value database, originally created to complement TiDB

Website | Documentation | Community Chat TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL sys

TiKV Project 12.4k Jan 3, 2023
A scalable, distributed, collaborative, document-graph database, for the realtime web

is the ultimate cloud database for tomorrow's applications Develop easier. Build faster. Scale quicker. What is SurrealDB? SurrealDB is an end-to-end

SurrealDB 16.9k Jan 8, 2023
Distributed SQL database in Rust, written as a learning project

toyDB Distributed SQL database in Rust, written as a learning project. Most components are built from scratch, including: Raft-based distributed conse

Erik Grinaker 4.6k Jan 8, 2023
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
Embedded Distributed Encrypted Database (Research).

EDED Embedded Distributed Encrypted Database. Research projects to support ESSE. WIP Distributed design features Adapt to personal distributed usecase

Sun 2 Jan 6, 2022
A high-performance, distributed, schema-less, cloud native time-series database

CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 1.8k Dec 30, 2022
The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 12 Nov 18, 2022
This is a small demo of how to transform a simple single-server RocksDB service written in Rust into a distributed version using OmniPaxos.

OmniPaxos Demo This is a small demo of how to transform a simple single-server RocksDB service into a distributed version using OmniPaxos. Related res

Harald Ng 6 Jun 28, 2023
A small rust database that uses json in memory.

Rust Small Database (RSDB) RSDB is a small library for creating a query-able database that is encoded with json. The library is well tested (~96.30% c

Kace Cottam 2 Jan 4, 2022