Scalable and encrypted embedded database with 3-tier caching

Overview

Infinitree

Crates.io docs.rs Build Status MIT licensed Apache2 licensed

Infinitree is a versioned, embedded database that uses uniform, encrypted blobs to store data.

It works best for use cases with independent writer processes, as multiple writer processes on a single tree are not supported.

In fact, calling Infinitree a database may be generous, as all persistence-related operations are explicit. Under the hood, it's using serde for flexibility and interoperability with the most libraries out of the box.

Features

  • Thread-safe by default
  • Transparently handle hot/warm/cold storage tiers; currently S3-compatible backends is supported
  • Versioned data structures that can be queried using the Iterator trait without loading in full
  • Encrypt all on-disk data, and only decrypt it on use
  • Focus on performance and flexible choice of performance/memory use tradeoffs
  • Extensible for custom data types and storage strategies
  • Easy to integrate with cloud workers & KMS for access control

Example use

, #[infinitree(name = "last_time2")] last_time: Serialized , // only store the keys in the index, not the values #[infinitree(strategy = "infinitree::fields::SparseField")] measurements: VersionedMap , // skip the next field when loading & serializing #[infinitree(skip)] current_time: usize, } fn main() -> anyhow::Result<()> { let mut tree = Infinitree:: ::empty( Directory::new("/storage")?, Key::from_credentials("username", "password")? ); tree.index().measurements.insert(1, PlantHealth { id: 0, air_humidity: 50, soil_humidity: 60, temperature: 23.3, }); *tree.index().last_time.write() = 1; tree.commit("first measurement! yay!"); Ok(()) } ">
use infinitree::{
    Infinitree,
    Index,
    Key,
    anyhow,
    backends::Directory,
    fields::{Serialized, VersionedMap, LocalField},
};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
pub struct PlantHealth {
    id: usize,
    air_humidity: usize,
    soil_humidity: usize,
    temperature: f32
}

#[derive(Index, Default, Clone)]
pub struct Measurements {
    // rename the field when serializing
    #[infinitree(name = "last_time")]
    _old_last_time: Serialized<String>,

    #[infinitree(name = "last_time2")]
    last_time: Serialized<usize>,

    // only store the keys in the index, not the values
    #[infinitree(strategy = "infinitree::fields::SparseField")]
    measurements: VersionedMap<usize, PlantHealth>,

    // skip the next field when loading & serializing
    #[infinitree(skip)]
    current_time: usize,
}

fn main() -> anyhow::Result<()> {
    let mut tree = Infinitree::<Measurements>::empty(
        Directory::new("/storage")?,
        Key::from_credentials("username", "password")?
    );

    tree.index().measurements.insert(1, PlantHealth {
        id: 0,
        air_humidity: 50,
        soil_humidity: 60,
        temperature: 23.3,
    });

    *tree.index().last_time.write() = 1;
    tree.commit("first measurement! yay!");
    Ok(())
}

Versioning

Infinitree supports versioning data sets, similarly to Git does with files.

While some index fields work as snapshots (eg. Serialized ), and serialize the entire content on each commit, it is possible to use eg. VersionedMap as an incremental HashMap.

Versioned types only store differences from the currently loaded state.

It also possible to restore state selectively, or create completely disparate branches of data for each commit, depending on the use case.

Caching

Data is always moved as part of objects.

This mechanism allows for indexing hundreds of terrabytes of data that span multiple disks and cloud storage platforms, while only synchronizing and loading into memory a small proportion of that.

Application developers can use fine-grained control of cache layers using simple strategies, eg. Least-Recently-Used, where recently queried objects can be stored in a local directory, while the rest is in an S3 bucket.

Object system

The core of Infinitree is an object system that stores all data in uniform 4MiB blobs, encrypted. Objects are named using 256 bit random identifiers, which have no correlation to the content. Indexing data and overlaying it on the physical objects is an interesting problem.

There are 2 types of objects in the Infinitree storage model, which are indistinguishable to the storage layer.

  • Indexes are encrypted as a 4MB unit, and support versioning of serializable data structures.
  • Storage Objects stores and encrypts chunks of data independently, located by a ChunkPointer.

In both cases, knowledge of the master, symmetric encryption key is necessary to access the stored data.

To establish a root of trust, a username/password combination is used to derive an passphrase using Argon 2. The Argon 2 output locates the so called root object, which is the root of the versioned index tree.

Since the system requires some objects to have a deterministic identifier, all objects IDs are uncorrelated with the data they store.

Ensuring integrity of data is done using an ChaCha20-Poly1305 AEAD. The ChunkPointer stores the tags for all data encrypted in storage objects, while the tags are appended to the end of all index objects.

Note that while the master key is necessary to access the root object, there are multiple subkeys used internally, which means layering other (e.g. public key) encryption methods onto data stored in indexes is safe.

For a more in-depth overview of the security and attacker model of the object system, please see the DESIGN.md document.

Warning

This is an unreviewed piece of experimental security software.

DO NOT USE FOR CRITICAL WORKLOADS OR APPLICATIONS.

License

Released under the MIT and Apache 2 licenses.

Support

If you are interested in using Infinitree in your application, and would like to work with Symmetree Research Labs on features or implementation, get in touch.

Comments
  • RUSTSEC-2021-0127: serde_cbor is unmaintained

    RUSTSEC-2021-0127: serde_cbor is unmaintained

    serde_cbor is unmaintained

    | Details | | | ------------------- | ---------------------------------------------- | | Status | unmaintained | | Package | serde_cbor | | Version | 0.11.2 | | URL | https://github.com/pyfisch/cbor | | Date | 2021-08-15 |

    The serde_cbor crate is unmaintained. The author has archived the github repository.

    Alternatives proposed by the author:

    *ciborium *minicbor

    See advisory page for additional details.

    opened by github-actions[bot] 1
  • Expand unit testing

    Expand unit testing

    Much of the testing that the library received has come from the benchmark set in the zerostash repo. To improve maintenance, the projects now live in separate repos, and this gives a good opportunity to add more unit-tests in addition to better designed CI-compatible integration tests.

    enhancement 
    opened by rsdy 1
  • RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

    RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

    Potential segfault in localtime_r invocations

    | Details | | | ------------------- | ---------------------------------------------- | | Package | chrono | | Version | 0.4.19 | | URL | https://github.com/chronotope/chrono/issues/499 | | Date | 2020-11-10 |

    Impact

    Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

    Workarounds

    No workarounds are known.

    References

    See advisory page for additional details.

    opened by github-actions[bot] 1
  • Expand documentation

    Expand documentation

    Make sure there's adequate documentation and examples for different scenarios.

    • [ ] Add examples to the repo
    • [ ] Document infinitree::object
    • [ ] Document infinitree::index
    • [ ] Document infinitree::backends

    Please feel free to suggest in this issue any additional resources and improvements that you would like to see.

    documentation enhancement good first issue help wanted 
    opened by rsdy 0
  • RUSTSEC-2020-0071: Potential segfault in the time crate

    RUSTSEC-2020-0071: Potential segfault in the time crate

    Potential segfault in the time crate

    | Details | | | ------------------- | ---------------------------------------------- | | Package | time | | Version | 0.1.43 | | URL | https://github.com/time-rs/time/issues/293 | | Date | 2020-11-18 | | Patched versions | >=0.2.23 | | Unaffected versions | =0.2.0,=0.2.1,=0.2.2,=0.2.3,=0.2.4,=0.2.5,=0.2.6 |

    Impact

    Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

    The affected functions from time 0.2.7 through 0.2.22 are:

    • time::UtcOffset::local_offset_at
    • time::UtcOffset::try_local_offset_at
    • time::UtcOffset::current_local_offset
    • time::UtcOffset::try_current_local_offset
    • time::OffsetDateTime::now_local
    • time::OffsetDateTime::try_now_local

    The affected functions in time 0.1 (all versions) are:

    • at
    • at_utc

    Non-Unix targets (including Windows and wasm) are unaffected.

    Patches

    Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.

    Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.

    Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater or the 0.3. series.

    Workarounds

    No workarounds are known.

    References

    time-rs/time#293

    See advisory page for additional details.

    opened by github-actions[bot] 0
  • RUSTSEC-2022-0048: xml-rs is Unmaintained

    RUSTSEC-2022-0048: xml-rs is Unmaintained

    xml-rs is Unmaintained

    | Details | | | ------------------- | ---------------------------------------------- | | Status | unmaintained | | Package | xml-rs | | Version | 0.8.4 | | URL | https://github.com/netvl/xml-rs/issues | | Date | 2022-01-26 |

    xml-rs is a XML parser has open issues around parsing including integer overflows / panics that may or may not be an issue with untrusted data.

    Together with these open issues with Unmaintained status xml-rs may or may not be suited to parse untrusted data.

    Alternatives

    See advisory page for additional details.

    opened by github-actions[bot] 0
Releases(v0.9.0)
  • v0.9.0(Jul 25, 2022)

    This release is a fairly large extension of the cryptographic systems in Infinitree. Support for multiple crypto schemes have been added, including Yubikeys, and split key encryption through libsodium. These are gated through the yubikey and cryptobox features, respectively. Check out the documentation for details.

    The default encryption mode has also changed, and will transparently upgrade existing trees to the newer construct. This construct provides more flexibility and a more robust crypto implementation, that potentially re-used nonces and may have been vulnerable to partitioning oracle attacks, as noted here.

    In addition, the new crate, infinitree-backends, has been added to track non-essential backends to be used by infinitree. The API should be easy to grasp even though documentation is mostly lacking.

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(May 31, 2022)

    Storing a local cache of remotes is important. This release pulls together a more coherent story around caching, including forcing fresh updates on open, and properly maintaining a cache size while preserving as much of the index as possible.

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(May 25, 2022)

  • v0.3.0(Jan 7, 2022)

    List of improvements:

    • Add versioned LinkedList
    • Improved documentation
    • Collection types can now be used as Index
    • Cleaned up public interfaces
    • Stabilized S3 and directory cache components
    • Fix memory management issue on Windows when using mmap
    • q tests and benchmarks
    • Update dependencies
    Source code(tar.gz)
    Source code(zip)
  • v0.2.2(Nov 18, 2021)

  • v0.1.0(Oct 15, 2021)

Owner
Symmetree Research Labs
Fast, secure dataplane
Symmetree Research Labs
A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Table of Contents Overview Usage TODO MVCC in entangleDB SQL Query Execution in entangleDB entangleDB Raft Consensus Engine What I am trying to build

Sarthak Dalabehera 38 Jan 2, 2024
RedisLess is a fast, lightweight, embedded and scalable in-memory Key/Value store library compatible with the Redis API.

RedisLess is a fast, lightweight, embedded and scalable in-memory Key/Value store library compatible with the Redis API.

Qovery 145 Nov 23, 2022
A scalable, distributed, collaborative, document-graph database, for the realtime web

is the ultimate cloud database for tomorrow's applications Develop easier. Build faster. Scale quicker. What is SurrealDB? SurrealDB is an end-to-end

SurrealDB 16.9k Jan 8, 2023
ReadySet is a lightweight SQL caching engine written in Rust that helps developers enhance the performance and scalability of existing applications.

ReadySet is a SQL caching engine designed to help developers enhance the performance and scalability of their existing database-backed applications. W

ReadySet 1.7k Jan 8, 2023
Embedded graph database

CQLite An embedded graph database implemented in Rust. This is currently a pre-release. It has not been extensively tested with 'real-world work-loads

Tilman Roeder 81 Dec 27, 2022
Embedded graph database

CQLite An embedded graph database implemented in Rust. This is currently a pre-release. It has not been extensively tested with 'real-world work-loads

Tilman Roeder 82 Dec 31, 2022
Using embedded database modeled off SQLite - in Rust

Rust-SQLite (SQLRite) Rust-SQLite, aka SQLRite , is a simple embedded database modeled off SQLite, but developed with Rust. The goal is get a better u

Hand of Midas 3 May 19, 2023
A tiny embedded database built in Rust.

TinyBase TinyBase is an in-memory database built with Rust, based on the sled embedded key-value store. It supports indexing and constraints, allowing

Josh Rudnik 8 May 27, 2023
Scalable and fast data store optimised for time series data such as financial data, events, metrics for real time analysis

OnTimeDB Scalable and fast data store optimised for time series data such as financial data, events, metrics for real time analysis OnTimeDB is a time

Stuart 2 Apr 5, 2022
The most efficient, scalable, and fast production-ready serverless REST API backend which provides CRUD operations for a MongoDB collection

Optimal CRUD Mongo Goals of This Project This is meant to be the most efficient, scalable, and fast production-ready serverless REST API backend which

Evaluates2 1 Feb 22, 2022
A highly scalable MySQL Proxy framework written in Rust

mysql-proxy-rs An implementation of a MySQL proxy server built on top of tokio-core. Overview This crate provides a MySQL proxy server that you can ex

AgilData 175 Dec 19, 2022
Skybase is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and SSL

Skybase The next-generation NoSQL database What is Skybase? Skybase (or SkybaseDB/SDB) is an effort to provide the best of key/value stores, document

Skybase 1.4k Dec 29, 2022
Skytable is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and TLS

Skytable is an effort to provide the best of key/value stores, document stores and columnar databases, that is, simplicity, flexibility and queryability at scale. The name 'Skytable' exemplifies our vision to create a database that has limitless possibilities. Skytable was previously known as TerrabaseDB (and then Skybase) and is also nicknamed "STable", "Sky" and "SDB" by the community.

Skytable 1.4k Dec 29, 2022
rust_arango enables you to connect with ArangoDB server, access to database, execute AQL query, manage ArangoDB in an easy and intuitive way, both async and plain synchronous code with any HTTP ecosystem you love.

rust_arango enables you to connect with ArangoDB server, access to database, execute AQL query, manage ArangoDB in an easy and intuitive way, both async and plain synchronous code with any HTTP ecosystem you love.

Foretag 3 Mar 24, 2022
ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

ReefDB ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-

Sacha Arbonel 75 Jun 12, 2023
Simple, async embedded Rust

Cntrlr - Simple, asynchronous embedded Cntrlr is an all-in-one embedded platform for writing simple asynchronous applications on top of common hobbyis

Branan Riley 11 Jun 3, 2021
Sled - the champagne of beta embedded databases

key value buy a coffee for us to convert into databases documentation chat about databases with us sled - it's all downhill from here!!! An embedded d

Tyler Neely 6.6k Jan 8, 2023
A simple embedded key-value store written in rust as a learning project

A simple embedded key-value store written in rust as a learning project

Blobcode 1 Feb 20, 2022
AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust

AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust. It is designed as an experimental engine for the TiKV project, and will bring aggressive optimizations for TiKV specifically.

TiKV Project 535 Jan 9, 2023