merk

🔥🌲 High-performance Merkle key/value store
Merk is a crypto key/value store - more specifically, it's a Merkle AVL tree built on top of RocksDB (Facebook's fork of LevelDB).

Its priorities are performance and reliability. While Merk was designed to be the state database for blockchains, it can also be used anywhere an auditable key/value store is needed.

Features

  • Fast reads/writes - Reads have no overhead compared to a normal RocksDB store, and writes are optimized for batch operations (e.g. blocks in a blockchain).
  • Fast proof generation - Since Merk implements an AVL tree rather than a trie, it is very efficient to create and verify proofs for ranges of keys.
  • Concurrency - Unlike most other Merkle stores, all operations utilize all available cores - giving huge performance gains and allowing nodes to scale along with Moore's Law.
  • Replication - The tree is optimized to efficiently build proofs of large chunks, allowing for nodes to download the entire state (e.g. "state syncing").
  • Checkpointing - Merk can create checkpoints on disk (an immutable view of the entire store at a certain point in time) without blocking, so there are no delays in availability or liveness.
  • Web-friendly - Being written in Rust means it is easy to run the proof-verification code in browsers with WebAssembly, allowing for light-clients that can verify data for themselves.
  • Fits any profile - Performant on RAM-constrained Raspberry Pis and beefy validator rigs alike.

Usage

Install:

cargo add merk

Example:

use merk::*;

// load or create a Merk store at the given path
let mut merk = Merk::open("./merk.db").unwrap();

// apply some operations (keys in a batch must be sorted and unique)
let batch = [
    (b"key".to_vec(), Op::Put(b"value".to_vec())),
    (b"key2".to_vec(), Op::Put(b"value2".to_vec())),
    (b"key3".to_vec(), Op::Put(b"value3".to_vec())),
    (b"key4".to_vec(), Op::Delete),
];
merk.apply(&batch).unwrap();

Status

Merk is being used in the Nomic Bitcoin Sidechain.

The codebase has not been audited, but it has been thoroughly tested and has proven stable.

Benchmarks

Benchmarks are measured on a 1M node tree, each node having a key length of 16 bytes and value length of 40 bytes. All tests are single-threaded (not counting RocksDB background threads).

You can test these yourself by running cargo bench.

2017 MacBook Pro

(Using 1 Merk thread and 4 RocksDB compaction threads)

Pruned (no state kept in memory)

RAM usage: ~20MB average, ~26MB max

Test                       Ops per second
Random inserts                     23,000
Random updates                     32,000
Random deletes                     26,000
Random reads                      210,000
Random proof generation           133,000

Cached (all state kept in memory)

RAM usage: ~400MB average, ~1.1GB max

Test                       Ops per second
Random inserts                     58,000
Random updates                     81,000
Random deletes                     72,000
Random reads                    1,565,000
Random proof generation           311,000

i9-9900K Desktop

(Using 1 Merk thread and 16 RocksDB compaction threads)

Pruned (no state kept in memory)

RAM usage: ~20MB average, ~26MB max

Test                       Ops per second
Random inserts                     40,000
Random updates                     55,000
Random deletes                     45,000
Random reads                      383,000
Random proof generation           249,000

Cached (all state kept in memory)

RAM usage: ~400MB average, ~1.1GB max

Test                       Ops per second
Random inserts                     93,000
Random updates                    123,000
Random deletes                    111,000
Random reads                    2,370,000
Random proof generation           497,000

Algorithm Details

The algorithms are based on AVL, but optimized for batches of operations and random fetches from the backing store. Read about the algorithms here: https://github.com/nomic-io/merk/blob/develop/docs/algorithms.md

Comments
  • Specify proof format and verification algorithm

    I was curious about how to validate proofs exported from merk.js and reading the code, I wasn't entirely sure. For the purpose of this issue, I just care about existence proofs for exactly one value (it seems you return general range proofs by default, maybe simpler to focus on one case).

    The code seems to be this: https://github.com/nomic-io/merk/blob/master/src/verify.js#L15-L47

    This is my understanding:

    A child node contains {key, value} (both strings). An inner node contains {left, right, kvHash}.

    • left and right are either null?, a reference to another node (recurse), or the hash of that subset of the tree.
    • kvHash is ??? (data stored in inner node also?)

    In the case of serialized proofs, I would assume for every inner node, either left or right is a base64 hash of the subtree, and the other one is a reference to the child node leading to the leaf of interest.

    So far, so good. The hash for a leaf node also seems quite simple:

    function getKvHash ({ key, value }) {
      let input = Buffer.concat([
        VarString.encode(key),
        VarString.encode(value)
      ])
      return ripemd160(sha256(input))
    }
    

    varint length prefix the key and the value, concatenate, then hash. Solid.

    But... looking at:

    function childHash (child) {
      if (child == null) {
        return nullHash
      }
      if (typeof child === 'string') {
        return Buffer.from(child, 'base64')
      }
      if (typeof child === 'object') {
        return getHash(child)
      }
      throw Error('Invalid child node value')
    }
    
    function getHash (node) {
      let kvHash = node.kvHash
        ? Buffer.from(node.kvHash, 'base64')
        : getKvHash(node)
    
      let input = Buffer.concat([
        childHash(node.left),
        childHash(node.right),
        kvHash
      ])
      return ripemd160(sha256(input))
    }
    

    It seems that the hash of a child node is not just getKvHash, but getHash, given that left and right are both null for the child node. This ends up like:

    kvhash = ripemd160(sha256(key.length || key || value.length || value))

    Then one step treats it like an inner node: childHash = ripemd160(sha256([0, 0, ... (40 bytes of 0)] || kvhash))

    The inner node would be:

    innerHash = ripemd160(sha256(child.left || child.right || inner.kvhash))

    And then what is inner.kvhash?

    I'd love to understand this algorithm well, and be able to validate it in another language (eg. go) as a step towards unifying merkle proofs on the road to ibc
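    For reference, the scheme derived in this comment can be expressed as a self-contained Rust sketch. Note that in Merk's AVL design every node, inner or leaf, stores its own key/value entry, which is what inner.kvHash covers; a leaf's hash then takes two steps exactly as worked out above. Helper names here are hypothetical, and a stdlib hasher stands in for ripemd160(sha256(...)) so the sketch runs without extra crates:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const HASH_LEN: usize = 8; // placeholder hash width (the real hash is 20 bytes)

// placeholder for ripemd160(sha256(input))
fn hash(input: &[u8]) -> Vec<u8> {
    let mut h = DefaultHasher::new();
    h.write(input);
    h.finish().to_be_bytes().to_vec()
}

// kvHash = H(len(key) || key || len(value) || value)
// (a single length byte here instead of a varint, for brevity)
fn kv_hash(key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut input = vec![key.len() as u8];
    input.extend_from_slice(key);
    input.push(value.len() as u8);
    input.extend_from_slice(value);
    hash(&input)
}

// nodeHash = H(childHash(left) || childHash(right) || kvHash),
// where a missing child contributes HASH_LEN zero bytes.
fn node_hash(left: Option<&[u8]>, right: Option<&[u8]>, kv: &[u8]) -> Vec<u8> {
    let null = vec![0u8; HASH_LEN];
    let mut input = left.map(|h| h.to_vec()).unwrap_or_else(|| null.clone());
    input.extend_from_slice(&right.map(|h| h.to_vec()).unwrap_or(null));
    input.extend_from_slice(kv);
    hash(&input)
}

fn main() {
    // a leaf takes TWO steps: kv_hash, then node_hash with two null children
    let leaf = node_hash(None, None, &kv_hash(b"key", b"value"));
    // an inner node mixes its children's hashes with its OWN kvHash,
    // since every node in the tree stores a key/value entry
    let root = node_hash(Some(leaf.as_slice()), None, &kv_hash(b"key2", b"value2"));
    assert_ne!(leaf, kv_hash(b"key", b"value")); // leaf hash != bare kvHash
    assert_eq!(root.len(), HASH_LEN);
}
```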

    opened by ethanfrey 8
  • Proof benchmark misleading

    I was surprised to read in https://github.com/nomic-io/merk/tree/develop#benchmarks that getting a proof is faster than a read!

    I then looked at the code, in particular read and prove, and realized that they create keys differently. Notably, read always queries a value that exists, while prove queries a random key (which is not in the store).

    I think you should update the benchmark to prove one value guaranteed to be in the store.

    I also note the setup uses different batch sizes for writing, although I don't think this will affect the performance. It just seems like something else nice to normalize.

    opened by ethanfrey 5
  • Range queries

    The proof format already supports range queries, but there isn't currently a way to use them. We will need the prover to include boundary keys to create a verifiable range proof, then verify the boundaries and ensure there are no missing nodes within the range on the client.

    prove and verify_proof take in lists of keys, but we'll replace that with a Query type which contains a combination of unique keys and ranges to be included in the proof, or checked on verify.

    One question will be whether verify_proof will still need to know about the list of queried keys (currently required to know which values to return and which keys to check for absence proofs). We could instead just verify the proof against the root hash, then have a type (e.g. Proof) which makes it easier to pull out data (by key or range), can check for arbitrary absence proofs, and can verify arbitrary ranges (either by keeping track of all contiguous KVNode ranges on the initial verify pass, or by going back into the proof data and checking them when a method is called).
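    A minimal sketch of what such a Query type could look like (names and shapes assumed here for illustration, not the final API):

```rust
use std::ops::Range;

// a query item is either a single key or a half-open range of keys
enum QueryItem {
    Key(Vec<u8>),
    Range(Range<Vec<u8>>),
}

// a query is a combination of unique keys and ranges
struct Query(Vec<QueryItem>);

impl Query {
    // whether a given key is covered by this query
    fn matches(&self, key: &[u8]) -> bool {
        self.0.iter().any(|item| match item {
            QueryItem::Key(k) => k.as_slice() == key,
            QueryItem::Range(r) => r.start.as_slice() <= key && key < r.end.as_slice(),
        })
    }
}

fn main() {
    let q = Query(vec![
        QueryItem::Key(b"exact".to_vec()),
        QueryItem::Range(b"a".to_vec()..b"m".to_vec()),
    ]);
    assert!(q.matches(b"exact"));
    assert!(q.matches(b"hello")); // lexicographically inside the range
    assert!(!q.matches(b"zzz"));
}
```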

    opened by mappum 4
  • Suggested change for TreeOp::Put

    Not a big deal, but one suggestion I have is to change TreeOp::Put(&'a [u8]) to TreeOp::Put(Vec<u8>). Why?

    1. The consumer Node.value is already a Vec<u8>
    2. It simplifies TreeOp a bit (lifetime is not needed)
    3. It's easier to use Put(Vec<u8>) from external calls - (again simplifies lifetime stuff)
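    The trade-off can be illustrated with a self-contained sketch (type names here are hypothetical, not the actual merk definitions):

```rust
// borrowed value: every holder of this type needs the 'a lifetime
enum OpBorrowed<'a> {
    Put(&'a [u8]),
    Delete,
}

// owned value: no lifetime parameter needed
enum OpOwned {
    Put(Vec<u8>),
    Delete,
}

struct Node {
    value: Vec<u8>,
}

fn apply_owned(op: OpOwned) -> Option<Node> {
    match op {
        // no copy needed: the Vec<u8> moves straight into Node.value
        OpOwned::Put(value) => Some(Node { value }),
        OpOwned::Delete => None,
    }
}

fn apply_borrowed(op: OpBorrowed) -> Option<Node> {
    match op {
        // a borrowed slice must be copied to produce an owned Node.value
        OpBorrowed::Put(value) => Some(Node { value: value.to_vec() }),
        OpBorrowed::Delete => None,
    }
}

fn main() {
    let a = apply_owned(OpOwned::Put(b"value".to_vec())).unwrap();
    let b = apply_borrowed(OpBorrowed::Put(b"value")).unwrap();
    assert_eq!(a.value, b.value);
}
```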
    opened by davebryson 4
  • Order of transactions in the tree

    With IAVL I believe it was (is) a best practice to sort transactions before committing them to the tree. Is this true for merk as well? Looking through the code it doesn't appear that it's doing so. Great work by the way...

    opened by davebryson 4
  • WebAssembly implementation

    This module can be easily optimized by porting performance-critical components (e.g. the tree operations) to a language that compiles to wasm (likely Rust). This would also probably make the module less prone to critical vulnerabilities, since it's hard to reduce the complexity of this code and it pushes JS to its limits.

    opened by mappum 4
  • Add function to return root of a proof that can later be verified via a signing scheme.

    Using verify query on light clients would require them to be supplied with the expected root hash, in addition to the proof, keys and signature. Since the proof already contains the root_hash, and any other root hash wouldn't match for the signature, it would be optimal not to send it.

    The change requested decomposes verify_query into a new function query_proof_root that returns the root hash as well as the result of querying the keys.

    opened by QuantumExplorer 3
  • Advanced queries

    Closes #40

    Description:

    This branch makes changes to querying so that we can also query for ranges of keys rather than just individual keys.

    Due to this change, we no longer return data from proofs just as a simple mapping of values for each queried key, but create a proofs::query::Map type which contains data for a verified proof, and verifies individual keys or ranges as they are pulled out by a consumer (so that we can, for instance, ensure the proof included all data for a desired range).

    We also restructure some internal modules - namely renaming proofs::verify to proofs::tree since it contains the core Tree type (which represents a partial Merk tree when verifying a proof), and move the query-specific logic (as opposed to chunk proof logic) into proofs::query.

    TODO:

    • [ ] Add test coverage for newly added code
    opened by mappum 3
  • Aux data API

    We'll need to store some auxiliary data in the store such as the most recent blockchain height or version information, which doesn't need to be part of the Merkle tree but will still need to be part of the same atomic commit. This should probably be stored in a separate column family.

    We can possibly create a method like apply_aux(tree_ops: Batch, aux_ops: Batch).

    opened by mappum 3
  • Big ol' rewrite

    This WIP PR contains the changes I have been working on for the past few months, which have over time replaced all parts of the existing codebase. While the tree algorithms have not changed and feature-wise it is in the same place as a few months ago, the internal types and interfaces have changed a lot and have resulted in much better code (the second time around I'm more experienced with Rust).

    Main improvements:

    • Caching/pruning - It is now possible to keep some or all tree nodes in memory across batches, meaning we don't have to fetch them from disk. This results in ~2.5x single-core performance increases if the entire tree can be held in memory, with the bottleneck then being disk writes.
    • Succinctness - Comparable functions have been reduced by 50% LOC throughout the codebase.
    • Maintainability - Saner abstractions in the internal interfaces, it will be much easier to change the code in the future.
    • Performance - Compared to the previous benchmark of 18k updates per second, we can now get ~32k on a single core with <30MB memory usage (or 76k with the tree fully held in RAM) due to various optimizations. (These figures are on my 2017 MBP, with a 1M node tree).

    I'll finalize this PR once the core Merk store API is finished, leaving it where the master branch currently is.

    opened by mappum 3
  • Seemingly incorrect proof

    I stored a bunch of data and when querying on one key, get results for a different key back.

    Please see https://github.com/confio/proofs-merk#issues-found

    Code to reproduce https://github.com/confio/proofs-merk/blob/master/index.js#L45-L82

    Am I mis-using the merkle tree?

    opened by ethanfrey 3
  • Add function to return root of a proof that can later be verified via a signing scheme.

    Using verify query on light clients would require them to be supplied with the expected root hash, in addition to the proof, keys and signature. Since the proof already contains the root_hash, and any other root hash wouldn't match for the signature, it would be optimal not to send it.

    The change requested adds a new function that executes the proof and returns the root hash and the elements. That root hash can then be verified by a signature scheme.

    opened by QuantumExplorer 0
  • Can v2.0.1 work well with rocksdb 0.15.0?

    Hi, I am sorry to bother you guys and put my question in here which might not be the correct place. I have been looking for a Merkleized KV store for my project and this Merk looks pretty nice to me. After looking into the latest release 'v2.0.1' for a while, I noticed that the Merk is still using rocksdb 0.14.0 instead of 0.15.0, which is now being used in develop branch of Merk.

    The question is that can I simply point underlying rocksdb to 0.15.0 on a fork of Merk "v2.0.1" and assume that everything will be working fine if "cargo test" pass? The reason I wanted to upgrade to rocksdb 0.15.0 was that it provides both upper_bound and lower_bound options for iterator. I appreciate it if any of you guys could help me on this. Thanks a lot.

    opened by ws4charlie 4
  • Node pages

    Nodes are currently stored in individual Boxes, but this means we make a heap allocation for each new node and also don't get any cache locality.

    Instead, we should store nodes in pages organized by the shape of the graph (e.g. the root of that subtree is node 0), then make a reference type which stores a reference to the containing page and the node index within the page (while also somehow respecting the borrow checker's rules).
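    A minimal sketch of the page/arena idea (hypothetical types, not merk code): nodes live in a Vec per page, and references become (page, index) pairs instead of Boxes, which keeps nodes in a page contiguous and sidesteps the usual borrow-checker issues with parent/child pointers:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct NodeRef {
    page: usize,
    index: usize,
}

struct Node {
    key: Vec<u8>,
    left: Option<NodeRef>,
    right: Option<NodeRef>,
}

struct Pages {
    pages: Vec<Vec<Node>>,
}

impl Pages {
    // push a node into a page and return its (page, index) handle
    fn alloc(&mut self, page: usize, node: Node) -> NodeRef {
        self.pages[page].push(node);
        NodeRef { page, index: self.pages[page].len() - 1 }
    }

    // resolve a handle back into a node reference
    fn get(&self, r: NodeRef) -> &Node {
        &self.pages[r.page][r.index]
    }
}

fn main() {
    let mut pages = Pages { pages: vec![Vec::new()] };
    let leaf = pages.alloc(0, Node { key: b"a".to_vec(), left: None, right: None });
    let root = pages.alloc(0, Node { key: b"b".to_vec(), left: Some(leaf), right: None });
    // traverse via indices instead of pointers
    let child = pages.get(root).left.unwrap();
    assert_eq!(pages.get(child).key, b"a".to_vec());
}
```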

    opened by mappum 0
  • Relative key encodings for child references

    Each node must contain the keys of its children in order to be able to traverse the tree. This adds significant overhead compared to the actual data itself, especially when keys are long.

    Since a node's children's keys will often be very similar to its own and only differ in the last few bytes, we can cheaply shave down that data by instead storing keyA XOR keyB, removing leading zeroes. When traversing, we can expand it into the full key by XORing once again with the parent node's key.

    We also have to handle keys which are a different length, and this can be done with a second vector of additional bytes which are not part of the XOR.
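    The encoding described above can be sketched as follows (hypothetical helper names; equal-length keys assumed for brevity, since differing lengths need the separate handling noted above):

```rust
// store child key as (parent XOR child) with leading zero bytes stripped
fn encode_relative(parent: &[u8], child: &[u8]) -> Vec<u8> {
    let xored: Vec<u8> = parent.iter().zip(child).map(|(a, b)| a ^ b).collect();
    let start = xored.iter().position(|&b| b != 0).unwrap_or(xored.len());
    xored[start..].to_vec()
}

// recover the full child key by XORing against the parent key again
fn decode_relative(parent: &[u8], encoded: &[u8]) -> Vec<u8> {
    let pad = parent.len() - encoded.len();
    parent
        .iter()
        .enumerate()
        .map(|(i, &p)| if i < pad { p } else { p ^ encoded[i - pad] })
        .collect()
}

fn main() {
    let parent = b"user:0001:name";
    let child = b"user:0001:mail";
    let enc = encode_relative(parent, child);
    assert!(enc.len() < child.len()); // the shared prefix is compressed away
    assert_eq!(decode_relative(parent, &enc), child.to_vec());
}
```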

    opened by mappum 1
Owner: Nomic