FeOphant - A SQL database server written in Rust and inspired by PostreSQL.

Overview

FeOphant

A SQL database server written in Rust and inspired by PostreSQL.

We now have support for persistent storage! Not crash safe but I'm getting there!

Latest Build codecov

Website

Launch

Launch the server ./feophant

Lauch a postgres client application to test ./pgbench -h 127.0.0.1 -p 50000 ./psql -h 127.0.0.1 -p 50000

Benchmark to aid in profiling cargo instruments --bench feophant_benchmark -t time

What works user facing

  • Connecting unauthenticated using a postgres client/driver.
  • You can create tables, insert data and query single tables.
  • Data is persisted to disk, not crash safe and the on disk format is NOT stable.

Postgres Divergance

Its kinda pointless to blindly reproduce what has already been done so I'm making the following changes to the db server design vs Postgres.

  • Rust's memory safety and strong type system.

  • Multi-threaded async design based on Tokio instead of Postgres's multi-process design.

    • Perk of this is not needing to manage SYSV shared memory. (Postgres largely fixed this but I think its still worth noting).
  • Want to avoid vaccuum for transaction wrap around. Will try 64-bit transaction IDs but might go to 128-bit.

    • I can avoid the need to freeze Transaction IDs however the hint bits will need scanning to ensure that they are updated.
  • Replacing OIDs with UUIDv4s.

  • I think I've figured out what the core divergeance from Postgres that I'm interested in. I love Postgres's transactional DDLs but version controlling a schema is awful. What if I make the database server a library and your schema is code? You supply a new binary that runs as the database server and if you need to change it you just deploy the binary instead? Then the compiler can optimize out anything you don't need to run the system in your use case. The hardest part is dealing with schema changes that affect your on disk format.

Rust Notes

How to setup modules sanely: https://dev.to/stevepryde/intro-to-rust-modules-3g8k

Reasonable application error type creation: https://github.com/dtolnay/anyhow

Library Errors: https://github.com/dtolnay/thiserror

Rust's inability to treat enum variants as a type is a HUGE pain. I end up having an enum to hold data and another enum to validate and match the sub type. The RFC to fix this was postponed indefinately.

Legal Stuff (Note I'm not a lawyer!)

I am explicitly striving for SQL+Driver compatibility with PostgreSQL so things such as system tables and code that handles them will be named the same. I don't think this violates their trademark policy but if I am please just reach out to me! I have also gone with a pretty restrictive license but I'm not tied to it if that is causing an issue for others who are using the code.

Issues
  • Implement page level locks that are ordered to avoid deadlocking.

    Implement page level locks that are ordered to avoid deadlocking.

    Acceptance Criteria:

    • Should be able to update a row either inside a page or not without loosing commits.
    • This is independent of transaction control so I think this sits below/in row manager.
    opened by chotchki 1
  • Implement Free Space Maps so mutation of data doesn't need a linear scan/parse non stop.

    Implement Free Space Maps so mutation of data doesn't need a linear scan/parse non stop.

    Did more thinking, I should implement postgres's streams concept so that I don't need to do lookups to find associated metadata on an object. I thought I was going to get to use Uuid+page offset. I think its now going to be uuid+page offset+ type.

    struct PageId + enum PageType should do it (done).

    opened by chotchki 0
  • Implement Page Level Locking

    Implement Page Level Locking

    So I have a fully ready free space map but I can't avoid the locking issue anymore despite it also being the next item on the todo list.

    So I started trying to bake in locks and I've hit my normal do I go for a read or write lock approach? I think I may have screwed up the lock manager design, oops too much in silos!

    So what I really need in a lock manager is this:

    • multi-threaded access
    • ability to read page
    • ability to write a page without causing collisions

    I'm debating do I need to put a check-in / check-out layer on top of file manager. So I could expose an api for the following: Read - Easy ReadForUpdate - This is the hard part Ideally Append - Needs to be serialized //Ignoring Delete/shrink for now

    I should treat this as a lesson for not implementing something outside the greater system and realizing I can't integrate it.

    opened by chotchki 0
  • Extract the creation of new page offsets

    Extract the creation of new page offsets

    Currently page offsets are generated as part of the insertion into the file system. However this results in a bad assumption in a couple layers of the stack.

    My issue is that the code ends up assuming when a new page is inserted that it will result in the logical next page.

    If I can avoid the extraction I would prefer it, we'll see.

    opened by chotchki 0
  • Figure out how to test multithreaded code

    Figure out how to test multithreaded code

    A lot of my low level code is very heavy and sensitive to locks. The Tokio project has a library called loom that seems to offer a reasonable way to test this type of code for correctness.

    I suspect I'll have to move away from Github based runners once I do so.

    https://github.com/tokio-rs/loom

    opened by chotchki 0
  • The number of managers in io is excessive

    The number of managers in io is excessive

    I've been using a series of layered providers I've been calling managers and its gotten overly complex.

    I need a better naming convention and package organization structure.

    opened by chotchki 0
  • Need to figure out a better error method to bubble up errors

    Need to figure out a better error method to bubble up errors

    https://github.com/chotchki/feophant/blob/a67e8f736ef8f847276915fa9ea4a70185c2731a/src/codec/network_frame.rs#L96-L96

    opened by chotchki 0
  • Integrate indexes into the system.

    Integrate indexes into the system.

    Right now I have an index manager but its not part of the actual implementation.

    Time to integrate it.

    opened by chotchki 0
  • Fix the index implementation to use the locking layer

    Fix the index implementation to use the locking layer

    https://github.com/chotchki/feophant/blob/12f569038b2d04e8d27727b8472b7dea6a22d17f/src/engine/io/index_manager.rs#L1-L1

    opened by chotchki 0
  • Figure out the longer term license for FeOphant

    Figure out the longer term license for FeOphant

    At the moment I have the license set to AGPL3.0 but that IS NOT conducive to getting people interested in this as a library. Once I get to a MVP I need to consider strongly relicensing to BSD/Apache/MIT something.

    As I write this I might make the decision sooner than later.

    opened by chotchki 0
  • There is an extension that removes the need to lock tables to repack / vaccum. Figure out how it works!

    There is an extension that removes the need to lock tables to repack / vaccum. Figure out how it works!

    • https://github.com/reorg/pg_repack
    • Investigate if the zheap table format would be better to implement.
    • Until I get past a WAL implementation and planner costs I don't think its worth it.
      • Since I extended the size of transaction IDs, I probably have a larger issue on my hands than normal postgres.
        • Reading into the zheap approach I'm thinking that I might have some space saving options availible for me. In particular if a tuple is frozen so its always availible I could remove the xmin/xmax and pack more into the page. Need more thinking however my approach of questioning the storage efficency of each part of data seems to be worth it.
    opened by chotchki 0
  • Fix pg codec DDOS vulnerability

    Fix pg codec DDOS vulnerability

    • The codec that parses the network traffic is pretty naive. You could make the server allocate 2GB of data for a DDOS easily.
      • We should either add state to the codec or change how it parses to produce chunked requests. That means that when the 2GB offer is reached the server can react and terminate before we accept too much data. Its a little more nuanced than that, 2GB input might be okay but we should make decisions based on users and roles.
    bug 
    opened by chotchki 0
  • Figure out the server start should be layered

    Figure out the server start should be layered

    Right now the main function runs the server from primitives. The Tokio Tower layer will probably do it better.

    opened by chotchki 0
  • Implement SQL Updates.

    Implement SQL Updates.

    null

    opened by chotchki 0
Releases(v0.8.0)
  • v0.8.0(Aug 15, 2021)

    This release is a grab bag of support technologies to enable the underlying goal of enforced unique indexes.

    • Switched from IOManager to FileManager which means all storage is persisted now. This is not crash safe. File interactions are designed to scale to a fixed maximum file handle count.
    • Decided the primary key for storage is a Uuid, this will be the basic differentiator to route storage requests. At this time I'm not planning on supporting real tablespaces. If I change my mind, I'll have to plumb it in to this infrastructure.
    • Have real integration tests using a non-feophant Postgres driver.
    • Have the ability to profile the code and already found two places that were unneeded hot spots. "vec![]" is a code smell! You should be able to make a moderate guess on how big it will get.

    Implemented but unintegrated code (which means I'll change it like crazy as I integrate):

    • A lock manager based on weak refs
    • BTree indexes
    Source code(tar.gz)
    Source code(zip)
  • v0.7.1(Aug 1, 2021)

    This release marks the major enhancement of the internal type system.

    • The structs were moved into a better place that aligns with the rest of the code.
    • The serialization process now uses a unified buffer instead of concatenating many small buffers.
    • I can support complex AND array types.
    • There is now a split between table and type which will make transformations easier in queries.
    • I have a strong definition of what each layer should be doing.

    Known design issues.

    • I still have not addressed the lack of well done locking.
    • The state of the client is not tracked in regards to in progress transactions.
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Jul 18, 2021)

  • v0.6.0(Jul 14, 2021)

    FeOphant now supports the concept of nullable columns.

    I'll note this was mainly ensuring the parser could support it, the underlying infrastructure was built with null in mind from the beginning.

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Jul 13, 2021)

  • v0.4.0(Jul 10, 2021)

    FeOphant now supports a simple SQL insert of values.

    A basic pipeline for query execution has been built based on the Postgres model.

    • Query String -> sql_parser -> ParseTree
    • ParseTree -> analyzer -> QueryTree
    • QueryTree -> rewriter -> QueryTree
    • QueryTree -> planner -> PlannedStatement
    • PlannedStatement -> executer -> do stuff
    Source code(tar.gz)
    Source code(zip)
Owner
Christopher Hotchkiss
Christopher Hotchkiss
🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx ?? The Rust SQL Toolkit Install | Usage | Docs Built with ❤️ by The LaunchBadge team SQLx is an async, pure Rust† SQL crate featuring compile-tim

launchbadge 4.6k Sep 12, 2021
TDS 7.2+ (mssql / Microsoft SQL Server) async driver for rust

Tiberius A native Microsoft SQL Server (TDS) client for Rust. Supported SQL Server versions Version Support level Notes 2019 Tested on CI 2017 Tested

Prisma 115 Sep 8, 2021
https://crates.io/crates/transistor

Transistor A Rust Crux Client crate/lib. For now, this crate intends to support 2 ways to interact with Crux: Via Docker with a crux-standalone versio

Julia Naomi 27 Aug 31, 2021
A programmable document database inspired by CouchDB written in Rust

PliantDB PliantDB aims to be a Rust-written, ACID-compliant, document-database inspired by CouchDB. While it is inspired by CouchDB, this project will

Khonsu Labs 55 Sep 20, 2021
Skytable is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and TLS

Skytable is an effort to provide the best of key/value stores, document stores and columnar databases, that is, simplicity, flexibility and queryability at scale. The name 'Skytable' exemplifies our vision to create a database that has limitless possibilities. Skytable was previously known as TerrabaseDB (and then Skybase) and is also nicknamed "STable", "Sky" and "SDB" by the community.

Skytable 417 Sep 18, 2021
Immutable Ordered Key-Value Database Engine

PumpkinDB Build status (Linux) Build status (Windows) Project status Usable, between alpha and beta Production-readiness Depends on your risk toleranc

null 1.3k Sep 14, 2021
Skybase is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and SSL

Skybase The next-generation NoSQL database What is Skybase? Skybase (or SkybaseDB/SDB) is an effort to provide the best of key/value stores, document

Skybase 411 Sep 6, 2021
🐸Slippi DB ingests Slippi replays and puts the data into a SQLite database for easier parsing.

The primary goal of this project is to make it easier to analyze large amounts of Slippi data. Its end goal is to create something similar to Ballchasing.com but for Melee.

Max Timkovich 15 Jun 19, 2021
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem

Google 26.4k Sep 11, 2021
Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.

Materialize is a streaming database for real-time applications. Get started Check out our getting started guide. About Materialize lets you ask questi

Materialize, Inc. 3.1k Sep 17, 2021
The official MongoDB Rust Driver

MongoDB Rust Driver This repository contains the officially supported MongoDB Rust driver, a client side library that can be used to interact with Mon

mongodb 814 Sep 14, 2021
Query LDAP and AD with SQL

SQLDAP Ever wanted to query AD or LDAP with SQL like queries ? I'm going to answer this question myself: yes ! Why ? Because I never could remember al

null 5 Jun 30, 2021
Yet Another Kev-Value DataBase

Yet Another Kev-Value DataBase Extremely simple (simplest possible?) single-file BTree-based key-value database. Build for fun and learning: goal is t

Sergey Melnychuk 16 Aug 30, 2021
SQLite clone from scratch in Rust

Rust-SQLite (SQLRite) Rust-SQLite, aka SQLRite , is a simple embedded database modeled off SQLite, but developed with Rust. The goal is get a better u

João Henrique Machado Silva 722 Sep 17, 2021
UnQLite wrapper 1.0 is avaliable for Rust

unqlite A high-level UnQLite database engine wrapper. NOTE: Some of the documents is stolen from UnQLite Official Website. What is UnQLite? UnQLite is

Huo Linhe 83 Aug 19, 2021
A user crud written in Rust, designed to connect to a MySQL database with full integration test coverage.

SQLX User CRUD Purpose This application demonstrates the how to implement a common design for CRUDs in, potentially, a system of microservices. The de

null 38 Sep 10, 2021
WooriDB

WooriDB USER GUIDE WooriDB is a general purpose (EXPERIMENTAL) time serial database, which means it contains all entities registries indexed by DateTi

Julia Naomi 85 Aug 11, 2021
A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Scooby An SQL query builder with a pleasant fluent API closely imitating actual SQL. Meant to comfortably build dynamic queries with a little bit of s

Aleksei Voronov 79 Sep 10, 2021
Experimental blockchain database

A database for the blockchain. Design considerations API The database is a universal key-value storage that supports transactions. It does not support

Parity Technologies 80 Sep 11, 2021