merk

🔥🌲 High-performance Merkle key/value store
Merk is a crypto key/value store - more specifically, it's a Merkle AVL tree built on top of RocksDB (Facebook's fork of LevelDB).

Its priorities are performance and reliability. While Merk was designed to be the state database for blockchains, it can also be used anywhere an auditable key/value store is needed.

Features

  • Fast reads/writes - Reads have no overhead compared to a normal RocksDB store, and writes are optimized for batch operations (e.g. blocks in a blockchain).
  • Fast proof generation - Since Merk implements an AVL tree rather than a trie, it is very efficient to create and verify proofs for ranges of keys.
  • Concurrency - Unlike most other Merkle stores, all operations utilize all available cores - giving huge performance gains and allowing nodes to scale along with Moore's Law.
  • Replication - The tree is optimized to efficiently build proofs of large chunks, allowing for nodes to download the entire state (e.g. "state syncing").
  • Checkpointing - Merk can create checkpoints on disk (an immutable view of the entire store at a certain point in time) without blocking, so there are no delays in availability or liveness.
  • Web-friendly - Being written in Rust means it is easy to run the proof-verification code in browsers with WebAssembly, allowing for light-clients that can verify data for themselves.
  • Fits any profile - Performant on RAM-constrained Raspberry Pis and beefy validator rigs alike.

Usage

Install:

cargo add merk

Example:

use merk::*;

// load or create a Merk store at the given path
let mut merk = Merk::open("./merk.db").unwrap();

// apply some operations (keys in a batch must be sorted and unique)
let batch = [
    (b"key".to_vec(), Op::Put(b"value".to_vec())),
    (b"key2".to_vec(), Op::Put(b"value2".to_vec())),
    (b"key3".to_vec(), Op::Put(b"value3".to_vec())),
    (b"key4".to_vec(), Op::Delete),
];
merk.apply(&batch).unwrap();

Status

Merk is being used in the Nomic Bitcoin Sidechain.

The codebase has not been audited, but it has been thoroughly tested and has proven stable.

Benchmarks

Benchmarks are measured on a 1M node tree, each node having a key length of 16 bytes and value length of 40 bytes. All tests are single-threaded (not counting RocksDB background threads).

You can test these yourself by running cargo bench.

2017 MacBook Pro

(Using 1 Merk thread and 4 RocksDB compaction threads)

Pruned (no state kept in memory)

RAM usage: ~20MB average, ~26MB max

Test                       Ops per second
Random inserts                     23,000
Random updates                     32,000
Random deletes                     26,000
Random reads                      210,000
Random proof generation           133,000

Cached (all state kept in memory)

RAM usage: ~400MB average, ~1.1GB max

Test                       Ops per second
Random inserts                     58,000
Random updates                     81,000
Random deletes                     72,000
Random reads                    1,565,000
Random proof generation           311,000

i9-9900K Desktop

(Using 1 Merk thread and 16 RocksDB compaction threads)

Pruned (no state kept in memory)

RAM usage: ~20MB average, ~26MB max

Test                       Ops per second
Random inserts                     40,000
Random updates                     55,000
Random deletes                     45,000
Random reads                      383,000
Random proof generation           249,000

Cached (all state kept in memory)

RAM usage: ~400MB average, ~1.1GB max

Test                       Ops per second
Random inserts                     93,000
Random updates                    123,000
Random deletes                    111,000
Random reads                    2,370,000
Random proof generation           497,000

Algorithm Details

The algorithms are based on AVL, but optimized for batches of operations and random fetches from the backing store. Read about the algorithms here: https://github.com/nomic-io/merk/blob/develop/docs/algorithms.md

Comments
  • Specify proof format and verification algorithm

    I was curious about how to validate proofs exported from merk.js and reading the code, I wasn't entirely sure. For the purpose of this issue, I just care about existence proofs for exactly one value (it seems you return general range proofs by default, maybe simpler to focus on one case).

    The code seems to be this: https://github.com/nomic-io/merk/blob/master/src/verify.js#L15-L47

    This is my understanding:

    A child node contains {key, value} (both strings). An inner node contains {left, right, kvHash}.

    • left and right are either null?, a reference to another node (recurse), or the hash of that subset of the tree.
    • kvHash is ??? (data stored in inner node also?)

    In the case of serialized proofs, I would assume for every inner node, either left or right is a base64 hash of the subtree, and the other one is a reference to the child node leading to the leaf of interest.

    So far, so good. The hash for a leaf node also seems quite simple:

    function getKvHash ({ key, value }) {
      let input = Buffer.concat([
        VarString.encode(key),
        VarString.encode(value)
      ])
      return ripemd160(sha256(input))
    }
    

    varint length prefix the key and the value, concatenate, then hash. Solid.

    But... looking at:

    function childHash (child) {
      if (child == null) {
        return nullHash
      }
      if (typeof child === 'string') {
        return Buffer.from(child, 'base64')
      }
      if (typeof child === 'object') {
        return getHash(child)
      }
      throw Error('Invalid child node value')
    }
    
    function getHash (node) {
      let kvHash = node.kvHash
        ? Buffer.from(node.kvHash, 'base64')
        : getKvHash(node)
    
      let input = Buffer.concat([
        childHash(node.left),
        childHash(node.right),
        kvHash
      ])
      return ripemd160(sha256(input))
    }
    

    It seems that the hash of a child node is not just getKvHash, but getHash, given that left and right are both null for the child node. This ends up like:

    kvhash = ripemd160(sha256(key.length || key || value.length || value))

    Then one step treats it like an inner node: childHash = ripemd160(sha256([0, 0, ... (40 bytes of 0)] || kvhash))

    The inner node would be:

    innerHash = ripemd160(sha256(child.left || child.right || inner.kvhash))

    And then what is inner.kvhash?

    I'd love to understand this algorithm well, and be able to validate it in another language (eg. go) as a step towards unifying merkle proofs on the road to ibc
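    For reference, the scheme derived in this comment can be expressed as a self-contained Rust sketch. Note that in Merk's AVL design every node, inner or leaf, stores its own key/value entry, which is what inner.kvHash covers; a leaf's hash then takes two steps exactly as worked out above. Helper names here are hypothetical, and a stdlib hasher stands in for ripemd160(sha256(...)) so the sketch runs without extra crates:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const HASH_LEN: usize = 8; // placeholder hash width (the real hash is 20 bytes)

// placeholder for ripemd160(sha256(input))
fn hash(input: &[u8]) -> Vec<u8> {
    let mut h = DefaultHasher::new();
    h.write(input);
    h.finish().to_be_bytes().to_vec()
}

// kvHash = H(len(key) || key || len(value) || value)
// (a single length byte here instead of a varint, for brevity)
fn kv_hash(key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut input = vec![key.len() as u8];
    input.extend_from_slice(key);
    input.push(value.len() as u8);
    input.extend_from_slice(value);
    hash(&input)
}

// nodeHash = H(childHash(left) || childHash(right) || kvHash),
// where a missing child contributes HASH_LEN zero bytes.
fn node_hash(left: Option<&[u8]>, right: Option<&[u8]>, kv: &[u8]) -> Vec<u8> {
    let null = vec![0u8; HASH_LEN];
    let mut input = left.map(|h| h.to_vec()).unwrap_or_else(|| null.clone());
    input.extend_from_slice(&right.map(|h| h.to_vec()).unwrap_or(null));
    input.extend_from_slice(kv);
    hash(&input)
}

fn main() {
    // a leaf takes TWO steps: kv_hash, then node_hash with two null children
    let leaf = node_hash(None, None, &kv_hash(b"key", b"value"));
    // an inner node mixes its children's hashes with its OWN kvHash,
    // since every node in the tree stores a key/value entry
    let root = node_hash(Some(leaf.as_slice()), None, &kv_hash(b"key2", b"value2"));
    assert_ne!(leaf, kv_hash(b"key", b"value")); // leaf hash != bare kvHash
    assert_eq!(root.len(), HASH_LEN);
}
```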

    opened by ethanfrey 8
  • Proof benchmark misleading

    I was surprised to read in https://github.com/nomic-io/merk/tree/develop#benchmarks that getting a proof is faster than a read!

    I then looked at the code, in particular read and prove, and realized that they create keys differently. Notably, read always queries a value that exists, while prove queries a random key (which is not in the store).

    I think you should update the benchmark to prove one value guaranteed to be in the store.

    I also note the setup uses different batch sizes for writing, although I don't think this will affect the performance. It just seems like something else nice to normalize.

    opened by ethanfrey 5
  • Range queries

    The proof format already supports range queries, but there isn't currently a way to use them. We will need the prover to include boundary keys to create a verifiable range proof, then verify the boundaries and ensure there are no missing nodes within the range on the client.

    prove and verify_proof take in lists of keys, but we'll replace that with a Query type which contains a combination of unique keys and ranges to be included in the proof, or checked on verify.

    One question will be whether verify_proof will still need to know about the list of queried keys (currently required to know which values to return and which keys to check for absence proofs). We could instead just verify the proof against the root hash, then have a type (e.g. Proof) which makes it easier to pull out data (by key or range), can check for arbitrary absence proofs, and can verify arbitrary ranges (either by keeping track of all contiguous KVNode ranges on the initial verify pass, or by going back into the proof data and checking them when a method is called).
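    A minimal sketch of what such a Query type could look like (names and shapes assumed here for illustration, not the final API):

```rust
use std::ops::Range;

// a query item is either a single key or a half-open range of keys
enum QueryItem {
    Key(Vec<u8>),
    Range(Range<Vec<u8>>),
}

// a query is a combination of unique keys and ranges
struct Query(Vec<QueryItem>);

impl Query {
    // whether a given key is covered by this query
    fn matches(&self, key: &[u8]) -> bool {
        self.0.iter().any(|item| match item {
            QueryItem::Key(k) => k.as_slice() == key,
            QueryItem::Range(r) => r.start.as_slice() <= key && key < r.end.as_slice(),
        })
    }
}

fn main() {
    let q = Query(vec![
        QueryItem::Key(b"exact".to_vec()),
        QueryItem::Range(b"a".to_vec()..b"m".to_vec()),
    ]);
    assert!(q.matches(b"exact"));
    assert!(q.matches(b"hello")); // lexicographically inside the range
    assert!(!q.matches(b"zzz"));
}
```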

    opened by mappum 4
  • Suggested change for TreeOp::Put

    Not a big deal, but one suggestion I have is to change TreeOp::Put(&'a [u8]) to TreeOp::Put(Vec<u8>). Why?

    1. The consumer Node.value is already a Vec<u8>
    2. It simplifies TreeOp a bit (lifetime is not needed)
    3. It's easier to use Put(Vec<u8>) from external calls - (again simplifies lifetime stuff)
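    The trade-off can be illustrated with a self-contained sketch (type names here are hypothetical, not the actual merk definitions):

```rust
// borrowed value: every holder of this type needs the 'a lifetime
enum OpBorrowed<'a> {
    Put(&'a [u8]),
    Delete,
}

// owned value: no lifetime parameter needed
enum OpOwned {
    Put(Vec<u8>),
    Delete,
}

struct Node {
    value: Vec<u8>,
}

fn apply_owned(op: OpOwned) -> Option<Node> {
    match op {
        // no copy needed: the Vec<u8> moves straight into Node.value
        OpOwned::Put(value) => Some(Node { value }),
        OpOwned::Delete => None,
    }
}

fn apply_borrowed(op: OpBorrowed) -> Option<Node> {
    match op {
        // a borrowed slice must be copied to produce an owned Node.value
        OpBorrowed::Put(value) => Some(Node { value: value.to_vec() }),
        OpBorrowed::Delete => None,
    }
}

fn main() {
    let a = apply_owned(OpOwned::Put(b"value".to_vec())).unwrap();
    let b = apply_borrowed(OpBorrowed::Put(b"value")).unwrap();
    assert_eq!(a.value, b.value);
}
```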
    opened by davebryson 4
  • Order of transactions in the tree

    With IAVL I believe it was (is) a best practice to sort transactions before committing them to the tree. Is this true for merk as well? Looking through the code it doesn't appear that it's doing so. Great work by the way...

    opened by davebryson 4
  • WebAssembly implementation

    This module can be easily optimized by porting performance-critical components (e.g. the tree operations) to a language that compiles to wasm (likely Rust). This would also probably make the module less prone to critical vulnerabilities, since it's hard to reduce the complexity of this code and it pushes JS to its limits.

    opened by mappum 4
  • Add function to return root of a proof that can later be verified via a signing scheme.

    Using verify query on light clients would require them to be supplied with the expected root hash, in addition to the proof, keys and signature. Since the proof already contains the root_hash, and any other root hash wouldn't match for the signature, it would be optimal not to send it.

    The change requested decomposes verify_query into a new function query_proof_root that returns the root hash as well as the result of querying the keys.

    opened by QuantumExplorer 3
  • Advanced queries

    Closes #40

    Description:

    This branch makes changes to querying so that we can also query for ranges of keys rather than just individual keys.

    Due to this change, we no longer return data from proofs just as a simple mapping of values for each queried key, but create a proofs::query::Map type which contains data for a verified proof, and verifies individual keys or ranges as they are pulled out by a consumer (so that we can, for instance, ensure the proof included all data for a desired range).

    We also restructure some internal modules - namely renaming proofs::verify to proofs::tree since it contains the core Tree type (which represents a partial Merk tree when verifying a proof), and move the query-specific logic (as opposed to chunk proof logic) into proofs::query.

    TODO:

    • [ ] Add test coverage for newly added code
    opened by mappum 3
  • Aux data API

    We'll need to store some auxiliary data in the store such as the most recent blockchain height or version information, which doesn't need to be part of the Merkle tree but will still need to be part of the same atomic commit. This should probably be stored in a separate column family.

    We can possibly create a method like apply_aux(tree_ops: Batch, aux_ops: Batch).

    opened by mappum 3
  • Big ol' rewrite

    This WIP PR contains the changes I have been working on for the past few months, which have over time replaced all parts of the existing codebase. While the tree algorithms have not changed and feature-wise it is in the same place as a few months ago, the internal types and interfaces have changed a lot and have resulted in much better code (the second time around I'm more experienced with Rust).

    Main improvements:

    • Caching/pruning - It is now possible to keep some or all tree nodes in memory across batches, meaning we don't have to fetch them from disk. This results in ~2.5x single-core performance increases if the entire tree can be held in memory, with the bottleneck then being disk writes.
    • Succinctness - Comparable functions have been reduced by 50% LOC throughout the codebase.
    • Maintainability - Saner abstractions in the internal interfaces, it will be much easier to change the code in the future.
    • Performance - Compared to the previous benchmark of 18k updates per second, we can now get ~32k on a single core with <30MB memory usage (or 76k with the tree fully held in RAM) due to various optimizations. (These figures are on my 2017 MBP, with a 1M node tree).

    I'll finalize this PR once the core Merk store API is finished, leaving it where the master branch currently is.

    opened by mappum 3
  • Seemingly incorrect proof

    I stored a bunch of data and when querying on one key, get results for a different key back.

    Please see https://github.com/confio/proofs-merk#issues-found

    Code to reproduce https://github.com/confio/proofs-merk/blob/master/index.js#L45-L82

    Am I mis-using the merkle tree?

    opened by ethanfrey 3
  • Add function to return root of a proof that can later be verified via a signing scheme.

    Using verify query on light clients would require them to be supplied with the expected root hash, in addition to the proof, keys and signature. Since the proof already contains the root_hash, and any other root hash wouldn't match for the signature, it would be optimal not to send it.

    The change requested adds a new function that executes the proof and returns the root hash and the elements. That root hash can then be verified by a signature scheme.

    opened by QuantumExplorer 0
  • Can v2.0.1 work well with rocksdb 0.15.0?

    Hi, I am sorry to bother you guys and put my question in here which might not be the correct place. I have been looking for a Merkleized KV store for my project and this Merk looks pretty nice to me. After looking into the latest release 'v2.0.1' for a while, I noticed that the Merk is still using rocksdb 0.14.0 instead of 0.15.0, which is now being used in develop branch of Merk.

    The question is that can I simply point underlying rocksdb to 0.15.0 on a fork of Merk "v2.0.1" and assume that everything will be working fine if "cargo test" pass? The reason I wanted to upgrade to rocksdb 0.15.0 was that it provides both upper_bound and lower_bound options for iterator. I appreciate it if any of you guys could help me on this. Thanks a lot.

    opened by ws4charlie 4
  • Node pages

    Nodes are currently stored in individual Boxes, but this means we make a heap allocation for each new node and also don't get any cache locality.

    Instead, we should store nodes in pages organized by the shape of the graph (e.g. the root of that subtree is node 0), then make a reference type which stores a reference to the containing page and the node index within the page (while also somehow respecting the borrow checker's rules).
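    A minimal sketch of the page/arena idea (hypothetical types, not merk code): nodes live in a Vec per page, and references become (page, index) pairs instead of Boxes, which keeps nodes in a page contiguous and sidesteps the usual borrow-checker issues with parent/child pointers:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct NodeRef {
    page: usize,
    index: usize,
}

struct Node {
    key: Vec<u8>,
    left: Option<NodeRef>,
    right: Option<NodeRef>,
}

struct Pages {
    pages: Vec<Vec<Node>>,
}

impl Pages {
    // push a node into a page and return its (page, index) handle
    fn alloc(&mut self, page: usize, node: Node) -> NodeRef {
        self.pages[page].push(node);
        NodeRef { page, index: self.pages[page].len() - 1 }
    }

    // resolve a handle back into a node reference
    fn get(&self, r: NodeRef) -> &Node {
        &self.pages[r.page][r.index]
    }
}

fn main() {
    let mut pages = Pages { pages: vec![Vec::new()] };
    let leaf = pages.alloc(0, Node { key: b"a".to_vec(), left: None, right: None });
    let root = pages.alloc(0, Node { key: b"b".to_vec(), left: Some(leaf), right: None });
    // traverse via indices instead of pointers
    let child = pages.get(root).left.unwrap();
    assert_eq!(pages.get(child).key, b"a".to_vec());
}
```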

    opened by mappum 0
  • Relative key encodings for child references

    Each node must contain the keys of its children in order to be able to traverse the tree. This adds significant overhead compared to the actual data itself, especially when keys are long.

    Since a node's children's keys will often be very similar to its own and only differ in the last few bytes, we can cheaply shave down that data by instead storing keyA XOR keyB, removing leading zeroes. When traversing, we can expand it into the full key by XORing once again with the parent node's key.

    We also have to handle keys which are a different length, and this can be done with a second vector of additional bytes which are not part of the XOR.
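    The encoding described above can be sketched as follows (hypothetical helper names; equal-length keys assumed for brevity, since differing lengths need the separate handling noted above):

```rust
// store child key as (parent XOR child) with leading zero bytes stripped
fn encode_relative(parent: &[u8], child: &[u8]) -> Vec<u8> {
    let xored: Vec<u8> = parent.iter().zip(child).map(|(a, b)| a ^ b).collect();
    let start = xored.iter().position(|&b| b != 0).unwrap_or(xored.len());
    xored[start..].to_vec()
}

// recover the full child key by XORing against the parent key again
fn decode_relative(parent: &[u8], encoded: &[u8]) -> Vec<u8> {
    let pad = parent.len() - encoded.len();
    parent
        .iter()
        .enumerate()
        .map(|(i, &p)| if i < pad { p } else { p ^ encoded[i - pad] })
        .collect()
}

fn main() {
    let parent = b"user:0001:name";
    let child = b"user:0001:mail";
    let enc = encode_relative(parent, child);
    assert!(enc.len() < child.len()); // the shared prefix is compressed away
    assert_eq!(decode_relative(parent, &enc), child.to_vec());
}
```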

    opened by mappum 1
Owner: Nomic