Immutable Ordered Key-Value Database Engine

Last update: Jan 2, 2023

Related tags

Database rust query database key-value storage event-sourcing indexing concatenative forth

Overview

PumpkinDB


Build status (Linux)
Build status (Windows)
Project status	Usable, between alpha and beta
Production-readiness	Depends on your risk tolerance

PumpkinDB is an immutable ordered key-value database engine, featuring:

ACID transactions
Persistent storage
An embedded programming language (PumpkinScript)
Binary keys and values (allows any encoding to be used: JSON, XML, Protobuf, Cap'n Proto, etc.)
Standalone and embedded scenarios

Why immutable?

Simply put, the data replaced is data deleted and is therefore, an unsafe way to manage data. Bugs, misunderstanding, changing scope and requirements and other factors might influence what data (and especially past data) means and how can it be used.

By guaranteeing the immutability of key's value once it is set, PumpkinDB forces its users to think of their data through a temporal perspective.

This approach is highly beneficial for implementing event sourcing and similar types of architectures.

What is PumpkinDB?

PumpkinDB is essentially a database programming environment, largely inspired by core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn't allow overriding values once they are set. Core motivation for immutability was that with the cost of storage declining, erasing data is effectively a strategical mistake.

While not intended for general purpose programming, its main objective is to facilitate building specialized application-specific and generic databases with a particular focus on immutability and processing data as close to storage as possible, incurring as little communication penalty as possible.

Applications communicate with PumpkinDB by sending small PumpkinScript programs over a network interface (or API when using PumpkinDB as an embedded solution).

PumpkinDB offers a wide array of primitives for concurrency, storage, journalling, indexing and other common building blocks.

Why is it a database engine?

The core ideas behind PumpkinDB stem from the so called lazy event sourcing approach which is based on storing and indexing events while delaying domain binding for as long as possible. That said, the intention of this database is to be a building block for different kinds of architectures, be it classic event sourcing (using it as an event store), lazy event sourcing (using indices) or anything else. It's also possible to implement different approaches within a single database for different parts of the domain.

Instead of devising custom protocols for talking to PumpkinDB, the protocol of communication has become a pipeline to a script executor. This offers us enormous extension and flexibility capabilities.

While an external application can talk to PumpkinDB over a network connection, PumpkinDB's engine itself is embeddable and can be used directly. Currenly, it is available for Rust applications only, but this may one day extend to all languages that can interface with C.

Client libraries

Language	Library	Status
Rust	pumpkindb_client	Early release (0.2.0)
Java	pumpkindb-client	Pre-release

Trying it out

You can download PumpkinDB releases from GitHub.

Docker

You can try out latest PumpkinDB HEAD revision by using a docker image:

$ docker pull pumpkindb/pumpkindb

Alternatively, you can build the image yourself:

$ docker build . -t pumpkindb/pumpkindb

Run the server:

$ docker run -p 9981:9981 -ti pumpkindb/pumpkindb
2017-04-12T02:52:47.440873517+00:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-12T02:52:47.440983318+00:00 INFO pumpkindb - Starting up
2017-04-12T02:52:47.441122740+00:00 INFO pumpkindb_engine::storage - Available disk space is approx. 56Gb, setting database map size to it
2017-04-12T02:52:47.441460231+00:00 INFO pumpkindb - Starting 4 schedulers
2017-04-12T02:52:47.442375937+00:00 INFO pumpkindb - Listening on 0.0.0.0:9981

Finally, connect to it using pumpkindb-term:

$ docker run -ti pumpkindb/pumpkindb pumpkindb-term 172.17.0.1:9981 # replace IP with the docker host IP

Building from the source code

You are also welcome to clone the repository and build it yourself. You will need Rust Nightly to do this. The easiest way to get it is to use rustup

$ rustup install nightly
$ rustup override set nightly # in PumpkinDB directory

After that, you can run PumpkinDB server this way:

$ cargo build --all
$ ./target/debug/pumpkindb
2017-04-03T10:43:49.667667-07:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-03T10:43:49.668660-07:00 INFO pumpkindb - Starting up
2017-04-03T10:43:49.674139-07:00 INFO pumpkindb_engine::storage - Available disk space is approx. 7Gb, setting database map size to it
2017-04-03T10:43:49.675759-07:00 INFO pumpkindb - Starting 8 schedulers
2017-04-03T10:43:49.676113-07:00 INFO pumpkindb - Listening on 0.0.0.0:9981

You can connect to it using pumpkindb-term:

$ ./target/debug/pumpkindb-term
Connected to PumpkinDB at 0.0.0.0:9981
To send an expression, end it with `.`
Type \h for help.
PumpkinDB> ["Name" HLC CONCAT "Jopn Doe" ASSOC COMMIT] WRITE.

PumpkinDB> ["Name" HLC CONCAT "John Doe" ASSOC COMMIT] WRITE.

PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP CURSOR/VAL] READ (Get last value).
"John Doe"
PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP DUP CURSOR/PREV DROP CURSOR/VAL] READ (Get previous value).
"Jopn Doe"

(The above example shows how one can query and navigate for values submitted at a different time, using low level primitives).

You can change some of the server's parameters by creating pumpkindb.toml:

[storage]
path = "path/to/db"
# By default, mapsize will equal to the size of
# available space on the disk, except on Windows,
# where default would be 1Gb.
# `mapsize` is a theoretical limit the database can
# grow to. However, on Windows, this also means that
# the database file will take that space.
# This parameter allows to specify the mapsize
# in megabytes.
# mapsize = 2048

[server]
port = 9981

Components

PumpkinDB project is split into a couple of separate components (crates):

pumpkinscript — PumpkinScript parser. Allows to convert text PumpkinScript form into binary one.
pumpkindb_engine — Core PumpkinDB library. Provides PumpkinScript scheduler, and a standard library of instructions
pumpkindb_mio_server — Async MIO-based PumpkinDB server library. Useful for building custom PumpkinProtocol-compatible servers.
pumpkindb_client — PumpkinProtocol client library.
pumpkindb_server — Stock PumpkinDB server. Built on top of pumpkindb_mio_server.
pumpkindb_term — console-based PumpkinDB server client.
doctests — a small utility to run instructions doctests.

Contributing

This project is in its very early days and we will always be welcoming contributors.

Our goal is to encourage frictionless contributions to the project. In order to achieve that, we use Unprotocols C4 process. Please read it, it will answer a lot of questions. Our goal is to merge pull requests as quickly as possible and make new stable releases regularly.

In a nutshell, this means:

We merge pull requests rapidly (try!)
We are open to diverse ideas
We prefer code now over consensus later

To learn more, read our contribution guidelines

We also maintain a list of issues that we think are good starters for new contributors.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

Sponsors

Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]

Comments

Problem: lexicographical sorting for signed integers doesn't always work

Due to the way signed integers are encoded (two's complement), even if we flip MSB to be 0 for negative and 1 for positive, it would "still incorrectly compare negative INTs, for example: 0b00000001 (-1) 0b00000010 (-2) Using LT? and GT? with this encoding would mean that -2 > -1 (the same problem we have right now with INTs), and changing LT? and GT? to take the sign bit into account would break comparisons between unsigned INTs, because 0b01111111(unsigned 127) would be interpreted as -127." @Matt8898 (see #265)

It is an important issue as order key navigation is fairly critical to overall usefulness of PumpkinDB as a database engine.

Proposed solution (long term): explore the direction of maintaining multiple b-trees or collation sequencing for key ranges.

Any other, less invasive solutions?
PumpkinScript

opened by yrashk 24
Problem: wall clock going back in time between restarts will break HLC guarantee

Basically, it means that after a restart values returned by HLC will be less than those generated before.

Proposed solution: let timestamp module persist last generated HLC.

Letting it store in the same database is rather intricate: there might be no write txn, the write txn might never succeed, etc. So I propose that PumpkinDB maintains a separate "meta" database within the environment with its own transactions.

Any better solution (that would involve I/O penalty?)
bug

opened by yrashk 22
Problem: HLC invariants can be broken (#32) and too much use of static

Solution: This PR introduces a memory mapped meta database that is used by modules to persist whatever state they might need to reconstruct their previous state after a restart. For now this is only done for HLCs (see 77b2506).

Additionally, remove excessive usage of lazy_static (06e1156, 5200d65, 4c7065f).

opened by omarkj 13
Problem: Compile times are slow

Compiling PumpkinDB takes a while, sometimes making it pretty annoying to work on.

There are a few possible reasons as to why this could be:

1 Lots of macros being used; 1 Lots of manual in-lining; 1 Code size.

Maybe compile time can be brought down by breaking the project into sub-crates? I'm not sure how cargo goes about compiling then.
build

opened by omarkj 11
Problem: in multi-VM setup, multiple write transactions are possible

It is not a problem right now, but soon enough multi-VM setups will be a thing (use all the cores!) and they need to synchronize their access to writing, since lmdb only permits one active writer
Storage

opened by yrashk 11
Problem: implementing words without subwords is difficult

Technically speaking, loaded words can define their own words but they will leak into the remainder of the program, which less than ideal.

Using stack alone requires a significant amount of juggling that makes writing code incredibly frustrating. I personally believe that the value of stack based programming languages is in concatenative abilities (composition via stack), not just being able to do everything by juggling items in the stack.

Solution: make all SETs and DEFs done within any closure local to that closure.

This means that if eventually we'll need to do code injection that's not injecting what effectively amounts to closures, we'll need to have a separate pass result type that can indicate that.
PumpkinScript

opened by yrashk 9
Problem: sized INTs with different signs don't compare well
Wherever possible, typing conventions in PumpkinScript enforce lexicographical value sorting, i.e. if a value is considered to be smaller, it should appear earlier during cursor iteration.

Recent changes introduced in #258 break this convention:

PumpkinDB> -100i8 +1i8 LT?. 0x00 (incorrect) PumpkinDB> -100i8 -1i8 LT?. 0x01 (correct) PumpkinDB> -2i8 -0i8 LT?. 0x00 (incorrect)

Proposed solution:

I am not sure if there is a good solution to this, given how signed integers are encoded, but I think at the very least it is worth to highlight this fact (if we don't find any reasonable solution, we should explicitly document this).

It might also be valuable to add INTxx/LT? and GT? since INT/LT? and GT? don't work on sized ints and LT and GT? don't produce desirable solution.

Any other thoughts?

cc @Matt8898
opened by yrashk 8
Problem: expressing numbers in text form is inconvenient

One has to make sure that they are hexadecimal and little-endian.

Proposed solution: prohibit words to start with - and 0..9 and allow these to be interpreted as bigint/biguints.
PumpkinScript

opened by yrashk 8
Problem: HLC clock can't observe another timestamp

Solution: Implement HLC/OBSERVE instruction

Addresses #199

NOTE This commit uses a fork of the hybrid-clocks repo. Changes to it are necessary for tests to pass.

opened by stuarth 6
Problem: contributors in toml files are hard to maintain

Every contributor is mentioned in a dedicated file anyway, so maintaining a list of contributors in each toml file is rather cumbersome and redundant.

Solution: don't mention contributors in Cargo.toml's.

opened by theduke 6
Problem: PumpkinDB is not a statically linked binary

This means potential binary distribution challenges as some libraries might not be available.

Proposed solution: use Rust capabilities where possible (= Linux at the moment)

See https://blog.rust-lang.org/2016/05/13/rustup.html

opened by yrashk 6
`Read` on uninitialized buffer may cause UB ( `PacketReader::read()` )
Hello :crab:, we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.

Issue Description

https://github.com/PumpkinDB/PumpkinDB/blob/f37b0b586357cafd2fd0eba8bea9072665af0b00/pumpkindb_client/src/packet.rs#L50-L55

PacketReader::read() method creates an uninitialized buffer and passes it to user-provided Read implementation. This is unsound, because it allows safe Rust code to exhibit an undefined behavior (read from uninitialized memory).

This part from the Read trait documentation explains the issue:

It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

How to fix the issue?

The Naive & safe way to fix the issue is to always zero-initialize a buffer before lending it to a user-provided Read implementation. Note that this approach will add runtime performance overhead of zero-initializing the buffer.

As of Jan 2021, there is not yet an ideal fix that works in stable Rust with no performance overhead. Below are links to relevant discussions & suggestions for the fix.

Well-written document regarding the issue

Rust RFC 2930

nightly feature std::io::Initializer

Discussion in Rust Internals Forum
opened by JOE1994 0
Problem: does not build on Linux anymore

Hi, I worked on making PumpkinDB compile on Linux again. I also removed deprecation warnings along the way and adopted stabilized features regarding slice patterns in Rust. As I'm not very familiar with contributing to foreign code bases please check and provide me feedback in case this pull request is strange or wrong.

I would like to see this pull request merge thus giving PumpkinDB kind of revival. What do you think?

Thank you.

Kind regards

Tobias

BTW I also checked build with Travis beforehand.

opened by tokcum 0

Problem: Rust nightly after 2017-06-20 affects benchmarks negatively

Running cargo bench acker in the pumpkindb_engine crate:

nightly-2017-06-20: (last known "good" nightly)

test script::tests::ackermann                                                  ... bench:  32,562,056 ns/iter (+/- 4,313,209)
test script::tests::ackermann_stack                                            ... bench:  24,391,250 ns/iter (+/- 4,758,534)

Most likely "first offender"

nightly-2017-06-21: (https://github.com/rust-lang/rust/commit/445077963)

test script::tests::ackermann                                                  ... bench:  99,398,843 ns/iter (+/- 5,556,874)
test script::tests::ackermann_stack                                            ... bench:  51,625,752 ns/iter (+/- 3,971,108)

Random builds:

2017-07-24:

test script::tests::ackermann                                                  ... bench:  49,617,433 ns/iter (+/- 6,028,955)
test script::tests::ackermann_stack                                            ... bench:  36,797,438 ns/iter (+/- 4,442,438)

2017-07-01:

test script::tests::ackermann                                                  ... bench:  97,953,353 ns/iter (+/- 5,756,554)
test script::tests::ackermann_stack                                            ... bench:  51,599,955 ns/iter (+/- 3,756,541)

2017-07-12:

test script::tests::ackermann                                                  ... bench:  50,183,660 ns/iter (+/- 7,364,549)
test script::tests::ackermann_stack                                            ... bench:  39,244,258 ns/iter (+/- 6,607,862)

2017-06-23:

test script::tests::ackermann                                                  ... bench: 111,698,229 ns/iter (+/- 17,495,313)
test script::tests::ackermann_stack                                            ... bench:  58,807,694 ns/iter (+/- 4,591,336)

Last build:

2017-08-02:

test script::tests::ackermann                                                  ... bench:  49,253,683 ns/iter (+/- 9,255,754)
test script::tests::ackermann_stack                                            ... bench:  35,636,600 ns/iter (+/- 5,089,613)

Proposed solution: figure out what happened in the first place, why did it get so much slower?

opened by yrashk 2

Problem: lazy builtins! macro defers syntax checking until run time

Currently, builtins! uses lazy_static! to prepare BUILTINS map of instructions. This means that until handle_builtins is first called, nothing is going to get parsed. So if there's, for example, a parsing mistake, the whole engine will crash.

Proposed solution: find a way to parse builtins during compile time
PumpkinScript WhatCanIStartWith?

opened by yrashk 0
Problem: PumpkinDB is too low-level

For most end-user cases, PumpkinDB is too "close to the metal" to be easily plugged into a business application, especially if "event sourcing" type of scenarios are considered.

Proposed solution: implement a higher level database on top of PumpkinDB
discussion

opened by yrashk 8

Releases(v0.2.0)

v0.2.0(Jun 7, 2017)

In this release, the most notable change was splitting of PumpkinDB into multiple crates. This enables PumpkinDB users to build custom systems (embedded and client-server ones) that use PumpkinDB as an engine. Also, pumpkinscript and pumpkindb_client crates allow building Rust applications that use PumpkinDB as remote clients. pumpkindb_mio_server crate simplifies developing custom PumpkinDB server builds.

A confusing "word" term has been changed to "instruction" to provide a clearer indication of the concept behind it.

We've also introduced more conventions around numbers (especially signed and unsigned sized integers, as well as floats) and added respective instructions to the library.

Initial UUID support has been added. Cursor API surface has been reduced and is much easier to comprehend now.

Internally, a lot has changed. A new messaging system (it is now available not only through the server component, but pumpkindb_engine as well). Dispatcher trait allows defining custom sets of instruction "modules" and is flexible enough to support both static and dynamic dispatching.

And, of course, a number of bugs has been spotted and some of them were even fixed. We've also optimized a number of things and found out that there's a lot more to optimize in the upcoming versions!

This is the first release with published library crates on crates.io

Also, this release includes statically built versions of PumpkinDB server and terminal for Linux (using musl). This is an experimental build as in the future we might need to depend on some dynamically libraries.

New contributors: alex, Christoph Herzog, Guillaume Gauvrit, Matteo Semenzato and Stuart Hinson.

Thank you!

SHA-256 checksums:

| File | Checksum | |------|----------| | pumpkindb-v0.2.0-x86_64-apple-darwin.tar.gz | f7e81dfa0bfc02de31b4bd70c5b6a13e95a44550d33d53a77179f5c23a9efde4 | | pumpkindb-v0.2.0-x86_64-pc-windows-msvc.zip | cf4b0fb7e893fdcd4fc7881b01e7c4bf045594e85bdd73cc88d2fec8ac2418a1 | | pumpkindb-v0.2.0-x86_64-unknown-linux-gnu.tar.gz | efcca81f4b482d6a69bfe08f935d2da61c8f7e2262798dcd93a656aeb8c58e15 | | pumpkindb-v0.2.0-x86_64-unknown-linux-musl.tar.gz | 4dc32dbf05dbe0df727128c7455c5482d3366ebf1a32f7308d22120a53c67388 |
Source code(tar.gz)
Source code(zip)
pumpkindb-v0.2.0-x86_64-apple-darwin.tar.gz(2.19 MB)
pumpkindb-v0.2.0-x86_64-pc-windows-msvc.zip(1.87 MB)
pumpkindb-v0.2.0-x86_64-unknown-linux-gnu.tar.gz(3.92 MB)
pumpkindb-v0.2.0-x86_64-unknown-linux-musl.tar.gz(3.96 MB)
v0.1.0(Feb 25, 2017)

This is PumpkinDB's first release (ever!), the intention of which is to finally provide some binaries to those who want to play with PumpkinDB but can't be bothered to install Rust and switch to nightly builds just yet!

Don't expect it to be very stable or feature-complete. Those were the non-goals this time around!

We'd really like you to try it out and let us know what you think about it!

SHA-256 checksums:

pumpkindb-v0.1.0-x86_64-apple-darwin.tar.gz 0c4bf2f56bee2139d3a0616be882df828fe881abd45a8176d076547fb9da7ea8
pumpkindb-v0.1.0-x86_64-pc-windows-msvc.zip f8d0f48486ef790ee36620823503aaefaa6a4eb9fef856b136e948791318d2d4
pumpkindb-v0.1.0-x86_64-unknown-linux-gnu.tar.gz 5d1355cf88a7b691c12ad8ec2bca3cbf783e42cde48a975b059890ed88496fad
Source code(tar.gz)
Source code(zip)
pumpkindb-v0.1.0-x86_64-apple-darwin.tar.gz(1.67 MB)
pumpkindb-v0.1.0-x86_64-pc-windows-msvc.zip(1.38 MB)
pumpkindb-v0.1.0-x86_64-unknown-linux-gnu.tar.gz(3.19 MB)

Owner

GitHub http://pumpkindb.org

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem