Immutable Ordered Key-Value Database Engine

Last update: May 26, 2022

PumpkinDB

Gitter chat Code Triagers OpenCollective OpenCollective

Build status (Linux) Build Status
Build status (Windows) Windows Build status
Project status Usable, between alpha and beta
Production-readiness Depends on your risk tolerance

PumpkinDB is an immutable ordered key-value database engine, featuring:

  • ACID transactions
  • Persistent storage
  • An embedded programming language (PumpkinScript)
  • Binary keys and values (allows any encoding to be used: JSON, XML, Protobuf, Cap'n Proto, etc.)
  • Standalone and embedded scenarios

Why immutable?

Simply put, the data replaced is data deleted and is therefore, an unsafe way to manage data. Bugs, misunderstanding, changing scope and requirements and other factors might influence what data (and especially past data) means and how can it be used.

By guaranteeing the immutability of key's value once it is set, PumpkinDB forces its users to think of their data through a temporal perspective.

This approach is highly beneficial for implementing event sourcing and similar types of architectures.

What is PumpkinDB?

PumpkinDB is essentially a database programming environment, largely inspired by core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn't allow overriding values once they are set. Core motivation for immutability was that with the cost of storage declining, erasing data is effectively a strategical mistake.

While not intended for general purpose programming, its main objective is to facilitate building specialized application-specific and generic databases with a particular focus on immutability and processing data as close to storage as possible, incurring as little communication penalty as possible.

Applications communicate with PumpkinDB by sending small PumpkinScript programs over a network interface (or API when using PumpkinDB as an embedded solution).

PumpkinDB offers a wide array of primitives for concurrency, storage, journalling, indexing and other common building blocks.

Why is it a database engine?

The core ideas behind PumpkinDB stem from the so called lazy event sourcing approach which is based on storing and indexing events while delaying domain binding for as long as possible. That said, the intention of this database is to be a building block for different kinds of architectures, be it classic event sourcing (using it as an event store), lazy event sourcing (using indices) or anything else. It's also possible to implement different approaches within a single database for different parts of the domain.

Instead of devising custom protocols for talking to PumpkinDB, the protocol of communication has become a pipeline to a script executor. This offers us enormous extension and flexibility capabilities.

While an external application can talk to PumpkinDB over a network connection, PumpkinDB's engine itself is embeddable and can be used directly. Currenly, it is available for Rust applications only, but this may one day extend to all languages that can interface with C.

Client libraries

Language Library Status
Rust pumpkindb_client Early release (0.2.0)
Java pumpkindb-client Pre-release

Trying it out

You can download PumpkinDB releases from GitHub.

Docker

You can try out latest PumpkinDB HEAD revision by using a docker image:

$ docker pull pumpkindb/pumpkindb

Alternatively, you can build the image yourself:

$ docker build . -t pumpkindb/pumpkindb

Run the server:

$ docker run -p 9981:9981 -ti pumpkindb/pumpkindb
2017-04-12T02:52:47.440873517+00:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-12T02:52:47.440983318+00:00 INFO pumpkindb - Starting up
2017-04-12T02:52:47.441122740+00:00 INFO pumpkindb_engine::storage - Available disk space is approx. 56Gb, setting database map size to it
2017-04-12T02:52:47.441460231+00:00 INFO pumpkindb - Starting 4 schedulers
2017-04-12T02:52:47.442375937+00:00 INFO pumpkindb - Listening on 0.0.0.0:9981

Finally, connect to it using pumpkindb-term:

$ docker run -ti pumpkindb/pumpkindb pumpkindb-term 172.17.0.1:9981 # replace IP with the docker host IP

Building from the source code

You are also welcome to clone the repository and build it yourself. You will need Rust Nightly to do this. The easiest way to get it is to use rustup

$ rustup install nightly
$ rustup override set nightly # in PumpkinDB directory

After that, you can run PumpkinDB server this way:

$ cargo build --all
$ ./target/debug/pumpkindb
2017-04-03T10:43:49.667667-07:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-03T10:43:49.668660-07:00 INFO pumpkindb - Starting up
2017-04-03T10:43:49.674139-07:00 INFO pumpkindb_engine::storage - Available disk space is approx. 7Gb, setting database map size to it
2017-04-03T10:43:49.675759-07:00 INFO pumpkindb - Starting 8 schedulers
2017-04-03T10:43:49.676113-07:00 INFO pumpkindb - Listening on 0.0.0.0:9981

You can connect to it using pumpkindb-term:

$ ./target/debug/pumpkindb-term
Connected to PumpkinDB at 0.0.0.0:9981
To send an expression, end it with `.`
Type \h for help.
PumpkinDB> ["Name" HLC CONCAT "Jopn Doe" ASSOC COMMIT] WRITE.

PumpkinDB> ["Name" HLC CONCAT "John Doe" ASSOC COMMIT] WRITE.

PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP CURSOR/VAL] READ (Get last value).
"John Doe"
PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP DUP CURSOR/PREV DROP CURSOR/VAL] READ (Get previous value).
"Jopn Doe"

(The above example shows how one can query and navigate for values submitted at a different time, using low level primitives).

You can change some of the server's parameters by creating pumpkindb.toml:

[storage]
path = "path/to/db"
# By default, mapsize will equal to the size of
# available space on the disk, except on Windows,
# where default would be 1Gb.
# `mapsize` is a theoretical limit the database can
# grow to. However, on Windows, this also means that
# the database file will take that space.
# This parameter allows to specify the mapsize
# in megabytes.
# mapsize = 2048

[server]
port = 9981

Components

PumpkinDB project is split into a couple of separate components (crates):

  • pumpkinscript — PumpkinScript parser. Allows to convert text PumpkinScript form into binary one.
  • pumpkindb_engine — Core PumpkinDB library. Provides PumpkinScript scheduler, and a standard library of instructions
  • pumpkindb_mio_server — Async MIO-based PumpkinDB server library. Useful for building custom PumpkinProtocol-compatible servers.
  • pumpkindb_client — PumpkinProtocol client library.
  • pumpkindb_server — Stock PumpkinDB server. Built on top of pumpkindb_mio_server.
  • pumpkindb_term — console-based PumpkinDB server client.
  • doctests — a small utility to run instructions doctests.

Contributing

This project is in its very early days and we will always be welcoming contributors.

Our goal is to encourage frictionless contributions to the project. In order to achieve that, we use Unprotocols C4 process. Please read it, it will answer a lot of questions. Our goal is to merge pull requests as quickly as possible and make new stable releases regularly.

In a nutshell, this means:

  • We merge pull requests rapidly (try!)
  • We are open to diverse ideas
  • We prefer code now over consensus later

To learn more, read our contribution guidelines

We also maintain a list of issues that we think are good starters for new contributors.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

Sponsors

Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]

GitHub

https://github.com/PumpkinDB/PumpkinDB
Comments
  • 1. Problem: lexicographical sorting for signed integers doesn't always work

    Due to the way signed integers are encoded (two's complement), even if we flip MSB to be 0 for negative and 1 for positive, it would "still incorrectly compare negative INTs, for example: 0b00000001 (-1) 0b00000010 (-2) Using LT? and GT? with this encoding would mean that -2 > -1 (the same problem we have right now with INTs), and changing LT? and GT? to take the sign bit into account would break comparisons between unsigned INTs, because 0b01111111(unsigned 127) would be interpreted as -127." @Matt8898 (see #265)

    It is an important issue as order key navigation is fairly critical to overall usefulness of PumpkinDB as a database engine.

    Proposed solution (long term): explore the direction of maintaining multiple b-trees or collation sequencing for key ranges.

    Any other, less invasive solutions?

    Reviewed by yrashk at 2017-05-02 19:16
  • 2. Problem: wall clock going back in time between restarts will break HLC guarantee

    Basically, it means that after a restart values returned by HLC will be less than those generated before.

    Proposed solution: let timestamp module persist last generated HLC.

    Letting it store in the same database is rather intricate: there might be no write txn, the write txn might never succeed, etc. So I propose that PumpkinDB maintains a separate "meta" database within the environment with its own transactions.

    Any better solution (that would involve I/O penalty?)

    Reviewed by yrashk at 2017-02-13 12:57
  • 3. Problem: HLC invariants can be broken (#32) and too much use of static

    Solution: This PR introduces a memory mapped meta database that is used by modules to persist whatever state they might need to reconstruct their previous state after a restart. For now this is only done for HLCs (see 77b2506).

    Additionally, remove excessive usage of lazy_static (06e1156, 5200d65, 4c7065f).

    Reviewed by omarkj at 2017-03-12 06:32
  • 4. Problem: Compile times are slow

    Compiling PumpkinDB takes a while, sometimes making it pretty annoying to work on.

    There are a few possible reasons as to why this could be:

    1 Lots of macros being used; 1 Lots of manual in-lining; 1 Code size.

    Maybe compile time can be brought down by breaking the project into sub-crates? I'm not sure how cargo goes about compiling then.

    Reviewed by omarkj at 2017-02-26 23:38
  • 5. Problem: in multi-VM setup, multiple write transactions are possible

    It is not a problem right now, but soon enough multi-VM setups will be a thing (use all the cores!) and they need to synchronize their access to writing, since lmdb only permits one active writer

    Reviewed by yrashk at 2017-02-13 18:14
  • 6. Problem: implementing words without subwords is difficult

    Technically speaking, loaded words can define their own words but they will leak into the remainder of the program, which less than ideal.

    Using stack alone requires a significant amount of juggling that makes writing code incredibly frustrating. I personally believe that the value of stack based programming languages is in concatenative abilities (composition via stack), not just being able to do everything by juggling items in the stack.

    Solution: make all SETs and DEFs done within any closure local to that closure.

    This means that if eventually we'll need to do code injection that's not injecting what effectively amounts to closures, we'll need to have a separate pass result type that can indicate that.

    Reviewed by yrashk at 2017-02-17 10:07
  • 7. Problem: sized INTs with different signs don't compare well

    Wherever possible, typing conventions in PumpkinScript enforce lexicographical value sorting, i.e. if a value is considered to be smaller, it should appear earlier during cursor iteration.

    Recent changes introduced in #258 break this convention:

    PumpkinDB> -100i8 +1i8 LT?.
    0x00 (incorrect)
    PumpkinDB> -100i8 -1i8 LT?.
    0x01 (correct)
    PumpkinDB> -2i8 -0i8 LT?.
    0x00 (incorrect)
    

    Proposed solution:

    I am not sure if there is a good solution to this, given how signed integers are encoded, but I think at the very least it is worth to highlight this fact (if we don't find any reasonable solution, we should explicitly document this).

    It might also be valuable to add INTxx/LT? and GT? since INT/LT? and GT? don't work on sized ints and LT and GT? don't produce desirable solution.

    Any other thoughts?

    cc @Matt8898

    Reviewed by yrashk at 2017-04-30 16:00
  • 8. Problem: expressing numbers in text form is inconvenient

    One has to make sure that they are hexadecimal and little-endian.

    Proposed solution: prohibit words to start with - and 0..9 and allow these to be interpreted as bigint/biguints.

    Reviewed by yrashk at 2017-02-10 07:19
  • 9. Problem: HLC clock can't observe another timestamp

    Solution: Implement HLC/OBSERVE instruction

    Addresses #199

    NOTE This commit uses a fork of the hybrid-clocks repo. Changes to it are necessary for tests to pass.

    Reviewed by stuarth at 2017-04-03 20:37
  • 10. Problem: contributors in toml files are hard to maintain

    Every contributor is mentioned in a dedicated file anyway, so maintaining a list of contributors in each toml file is rather cumbersome and redundant.

    Solution: don't mention contributors in Cargo.toml's.

    Reviewed by theduke at 2017-03-29 19:32
  • 11. Problem: PumpkinDB is not a statically linked binary

    This means potential binary distribution challenges as some libraries might not be available.

    Proposed solution: use Rust capabilities where possible (= Linux at the moment)

    See https://blog.rust-lang.org/2016/05/13/rustup.html

    Reviewed by yrashk at 2017-03-05 21:38
  • 12. `Read` on uninitialized buffer may cause UB ( `PacketReader::read()` )

    Hello :crab:, we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.

    Issue Description

    https://github.com/PumpkinDB/PumpkinDB/blob/f37b0b586357cafd2fd0eba8bea9072665af0b00/pumpkindb_client/src/packet.rs#L50-L55

    PacketReader::read() method creates an uninitialized buffer and passes it to user-provided Read implementation. This is unsound, because it allows safe Rust code to exhibit an undefined behavior (read from uninitialized memory).

    This part from the Read trait documentation explains the issue:

    It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

    How to fix the issue?

    The Naive & safe way to fix the issue is to always zero-initialize a buffer before lending it to a user-provided Read implementation. Note that this approach will add runtime performance overhead of zero-initializing the buffer.

    As of Jan 2021, there is not yet an ideal fix that works in stable Rust with no performance overhead. Below are links to relevant discussions & suggestions for the fix.

    Reviewed by JOE1994 at 2021-01-31 15:47
  • 13. Problem: does not build on Linux anymore

    Hi, I worked on making PumpkinDB compile on Linux again. I also removed deprecation warnings along the way and adopted stabilized features regarding slice patterns in Rust. As I'm not very familiar with contributing to foreign code bases please check and provide me feedback in case this pull request is strange or wrong.

    I would like to see this pull request merge thus giving PumpkinDB kind of revival. What do you think?

    Thank you.

    Kind regards

    Tobias

    BTW I also checked build with Travis beforehand.

    Reviewed by tokcum at 2020-03-21 18:48
  • 14. Problem: Rust nightly after 2017-06-20 affects benchmarks negatively

    Running cargo bench acker in the pumpkindb_engine crate:

    nightly-2017-06-20: (last known "good" nightly)

    test script::tests::ackermann                                                  ... bench:  32,562,056 ns/iter (+/- 4,313,209)
    test script::tests::ackermann_stack                                            ... bench:  24,391,250 ns/iter (+/- 4,758,534)
    

    Most likely "first offender"

    nightly-2017-06-21: (https://github.com/rust-lang/rust/commit/445077963)

    test script::tests::ackermann                                                  ... bench:  99,398,843 ns/iter (+/- 5,556,874)
    test script::tests::ackermann_stack                                            ... bench:  51,625,752 ns/iter (+/- 3,971,108)
    

    Random builds:

    2017-07-24:

    test script::tests::ackermann                                                  ... bench:  49,617,433 ns/iter (+/- 6,028,955)
    test script::tests::ackermann_stack                                            ... bench:  36,797,438 ns/iter (+/- 4,442,438)
    

    2017-07-01:

    test script::tests::ackermann                                                  ... bench:  97,953,353 ns/iter (+/- 5,756,554)
    test script::tests::ackermann_stack                                            ... bench:  51,599,955 ns/iter (+/- 3,756,541)
    

    2017-07-12:

    test script::tests::ackermann                                                  ... bench:  50,183,660 ns/iter (+/- 7,364,549)
    test script::tests::ackermann_stack                                            ... bench:  39,244,258 ns/iter (+/- 6,607,862)
    

    2017-06-23:

    test script::tests::ackermann                                                  ... bench: 111,698,229 ns/iter (+/- 17,495,313)
    test script::tests::ackermann_stack                                            ... bench:  58,807,694 ns/iter (+/- 4,591,336)
    

    Last build:

    2017-08-02:

    test script::tests::ackermann                                                  ... bench:  49,253,683 ns/iter (+/- 9,255,754)
    test script::tests::ackermann_stack                                            ... bench:  35,636,600 ns/iter (+/- 5,089,613)
    

    Proposed solution: figure out what happened in the first place, why did it get so much slower?

    Reviewed by yrashk at 2017-08-03 12:25
  • 15. Problem: lazy builtins! macro defers syntax checking until run time

    Currently, builtins! uses lazy_static! to prepare BUILTINS map of instructions. This means that until handle_builtins is first called, nothing is going to get parsed. So if there's, for example, a parsing mistake, the whole engine will crash.

    Proposed solution: find a way to parse builtins during compile time

    Reviewed by yrashk at 2017-07-10 10:17
  • 16. Problem: PumpkinDB is too low-level

    For most end-user cases, PumpkinDB is too "close to the metal" to be easily plugged into a business application, especially if "event sourcing" type of scenarios are considered.

    Proposed solution: implement a higher level database on top of PumpkinDB

    Reviewed by yrashk at 2017-07-05 03:50
General basic key-value structs for Key-Value based storages

General basic key-value structs for Key-Value based storages

Jan 23, 2022
Distributed transactional key-value database, originally created to complement TiDB
Distributed transactional key-value database, originally created to complement TiDB

Website | Documentation | Community Chat TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL sys

Jun 21, 2022
AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust

AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust. It is designed as an experimental engine for the TiKV project, and will bring aggressive optimizations for TiKV specifically.

Jun 22, 2022
Pure rust embeddable key-value store database.

MHdb is a pure Rust database implementation, based on dbm. See crate documentation. Changelog v1.0.3 Update Cargo.toml v1.0.2 Update Cargo.toml v1.0.1

Jan 4, 2022
RefineDB - A strongly-typed document database that runs on any transactional key-value store.

RefineDB - A strongly-typed document database that runs on any transactional key-value store.

Jun 14, 2022
A simple key value database for storing simple structures.

Perdia-DB A simple key value database for storing simple structures. No nesting of structures is supported, but may be implemented in the future. Toke

May 24, 2022
ForestDB - A Fast Key-Value Storage Engine Based on Hierarchical B+-Tree Trie

ForestDB is a key-value storage engine developed by Couchbase Caching and Storage Team, and its main index structure is built from Hierarchic

Jun 15, 2022
PickleDB-rs is a lightweight and simple key-value store. It is a Rust version for Python's PickleDB

PickleDB PickleDB is a lightweight and simple key-value store written in Rust, heavily inspired by Python's PickleDB PickleDB is fun and easy to use u

Jun 18, 2022
CLI tool to work with Sled key-value databases.

sledtool CLI tool to work with Sled key-value databases. $ sledtool --help Usage: sledtool <dbpath> <command> [<args>] CLI tool to work with Sled da

May 29, 2022
A Key-Value data storage system. - dorea db

Dorea DB ?? Dorea is a key-value data storage system. It is based on the Bitcask storage model Documentation | Crates.io | API Doucment 简体中文 | English

Jun 25, 2022
RedisLess is a fast, lightweight, embedded and scalable in-memory Key/Value store library compatible with the Redis API.

RedisLess is a fast, lightweight, embedded and scalable in-memory Key/Value store library compatible with the Redis API.

May 16, 2022
Log structured append-only key-value store from Rust In Action with some enhancements.

riakv Log structured, append only, key value store implementation from Rust In Action with some enhancements. Features Persistent key value store with

Jan 13, 2022
A LSM-based Key-Value Store in Rust

CobbleDB A LSM-based Key-Value Store in Rust Motivation There is no open-source LSM-based key-value store in Rust natively. Some crates are either a w

Oct 25, 2021
A sessioned Merkle key/value store

storage A sessioned Merkle key/value store The crate was designed to be the blockchain state database. It provides persistent storage layer for key-va

Dec 20, 2021
A rust Key-Value store based on Redis.

Key-Value Store A Key-Value store that uses Redis to store data. Built using an async web framework in Rust with a full Command-Line interface and log

Jan 14, 2022
🔥🌲 High-performance Merkle key/value store

merk High-performance Merkle key/value store Merk is a crypto key/value store - more specifically, it's a Merkle AVL tree built on top of RocksDB (Fac

May 29, 2022
A simple embedded key-value store written in rust as a learning project

A simple embedded key-value store written in rust as a learning project

Feb 20, 2022
Yet Another Kev-Value DataBase

Yet Another Kev-Value DataBase Extremely simple (simplest possible?) single-file BTree-based key-value database. Build for fun and learning: goal is t

May 23, 2022
X-Engine: A SQL Engine built from scratch in Rust.

XNGIN (pronounced "X Engine") This is a personal project to build a SQL engine from scratch. The project name is inspired by Nginx, which is a very po

May 3, 2022