A small in-memory key-value database for Rust

Overview

SmollDB

This is a small in-memory key-value database that can easily be backed up to a file or stream and later loaded from it.


It ain't much but it's honest work

This database is nothing but a hashmap, but it already comes with functions to easily load and save the hashmap to a file, so you don't have to implement that yourself. It also compresses the data in a zlib-compatible format.

Inspired by Pickles

This crate was inspired by pickleDB and works in similar use cases.

Now with streams

Since 0.4.0

You can use the load_from_stream function to load from anything that implements std::io::Read, and backup_to_stream to back up to anything that implements std::io::Write.


Some examples

Basic use

let mut db = SmollDB::default();
db.set("Nome", "Mario".to_string());
db.set("Eta", 34_i16);
db.set("Stinky", true);
db.set("Height", 23.3_f32);
db.set("CF", vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
assert_eq!(DataType::STRING("Mario".to_string()), *(db.get(&"Nome").unwrap()));
assert_eq!(DataType::INT16(34_i16), *(db.get(&"Eta").unwrap()));
assert_eq!(DataType::BOOL(true), *(db.get(&"Stinky").unwrap()));
assert_eq!(DataType::FLOAT32(23.3_f32), *(db.get(&"Height").unwrap()));
assert_eq!(DataType::BYTES(vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), *(db.get(&"CF").unwrap()));

Loading from file

let mut db = SmollDB::default();
db.set("bool", false);
db.set("int8", 8_i8);
db.set("int16", 8_i16);
db.set("int32", 8_i32);
db.set("int64", 8_i64);
db.set("float32", 4_f32);
db.set("float64", 4_f64);
db.set("string", String::from("8_i8"));
db.set("bytes", vec![1, 2, 3, 4, 5, 6, 7, 8, 243, 123, 46, 11, 123, 65, 2, 3, 5, 7, 2]);
db.backup(&"database").unwrap();
let db_copy = SmollDB::load(&"database").unwrap();
assert_eq!(db, db_copy);

Load and backup from stream

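use std::fs::OpenOptions;
use std::io::Seek; // brings `seek` into scope for rewinding the file below

let mut database = SmollDB::default();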
let mut stream = OpenOptions::new().create(true).read(true).write(true).open("myfile.smoll").unwrap();
let data = String::from("data");
let key = String::from("example");
database.set(key.clone(), data.clone());
database.backup_to_stream(&mut stream).unwrap();
stream.seek(std::io::SeekFrom::Start(0)).unwrap();
let database = SmollDB::load_from_stream(&mut stream).unwrap();
let result = database.get(&key).unwrap();
assert_eq!(*result, DataType::STRING(data));

Comments
  • Uncalled for Code Review

    It looks like you're still pretty new to Rust, so here is the code review you didn't ask for :D

    These are mostly just nits with some more serious points sprinkled throughout (I'm too lazy to go back and try to organize points)


    You specify both license and license-file in your Cargo.toml when only one is needed. cargo warns you about this when it's run


    There is a myfile.smoll committed to the repo even though *.smoll is ignored in the .gitignore. I'm assuming the file shouldn't be there


    There is a license header included at the top of every source file. (Disclaimer: I'm not a lawyer) from my understanding this doesn't provide any benefit over just using a LICENSE file, but I know some licenses require that the header is kept if it was present at any time which means you can't always remove it when you already have it. You may want to stick to just using LICENSE files in the future to avoid the headache


    The way you expose your public API is a bit odd. You have some vital things that I wouldn't consider utils provided in a util module like the type returned from the db and the error type. There are so few types that I would probably just expose everything from the library root i.e. smolldb::{DataType, SmollDB, SmollError}. This of course doesn't mean that you need to have everything in one file. You would just keep them private and do a public reexport in lib.rs
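
A minimal sketch of that layout, assuming a lib.rs where inline modules stand in for the separate db.rs and util.rs files. The type bodies and the `set`/`get` signatures are simplified placeholders, not the crate's actual code:

```rust
// lib.rs (sketch): the modules stay private, their public types are re-exported.
mod util {
    /// Trimmed-down placeholder for the crate's value type.
    #[derive(Debug, PartialEq)]
    pub enum DataType {
        BOOL(bool),
        STRING(String),
    }
}

mod db {
    use super::util::DataType;
    use std::collections::HashMap;

    /// Placeholder for the database type; only `set` and `get` are sketched.
    #[derive(Debug, Default, PartialEq)]
    pub struct SmollDB {
        inner: HashMap<String, DataType>,
    }

    impl SmollDB {
        pub fn set(&mut self, key: &str, value: DataType) {
            self.inner.insert(key.to_string(), value);
        }

        pub fn get(&self, key: &str) -> Option<&DataType> {
            self.inner.get(key)
        }
    }
}

// Callers see the types at the crate root and never learn the module layout.
pub use self::db::SmollDB;
pub use self::util::DataType;
```

With this in place a user would import everything from the root, e.g. `use smolldb::{DataType, SmollDB};`, and the files can be reorganized later without breaking anyone.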


    You use a lot of non-standard style. Most of that can be handled automatically by running cargo fmt (not having to deal with manual formatting is a huge plus for me). The other bits are just from the way things are laid out and also naming. Something like

    #[derive(Debug,PartialEq)]
    
    ///Object to represent the in memory database
    pub struct SmollDB {
        inner: HashMap<String, DataType>,
    }
    

    looks very off to me. Normally you would see it as

    ///Object to represent the in memory database
    #[derive(Debug, PartialEq)]
    pub struct SmollDB {
        inner: HashMap<String, DataType>,
    }
    

    As for naming, the two things I see are SmollDB, which would be SmollDb by Rust convention (acronyms only have the first letter capitalized in PascalCase), and the variant names in DataType, which are all uppercase when the convention is PascalCase there too (like you did for SmollError)


    A couple of notes on how the errors are defined. The really small nit is that normally libraries will provide a Result type definition for their custom error type. You normally see this named either Error and Result or LibraryNameError and LibraryNameResult. Since you're already using the latter it would just be

    pub type SmollResult<T> = Result<T, SmollError>;
    

    This can help clean up some of the method signatures in db.rs.

    The other note is that your error types don't provide a lot of context which can make troubleshooting errors more tedious. To give a basic example calling SmollDB::save_file may return a SmollError::SaveFileError. I would have a hard time figuring out what failed from this error type alone. Maybe I don't have permission to write a file, maybe the folder in the path doesn't exist, or maybe the filesystem is full, I can't discern any of that from the returned error alone. Since both of the possible underlying errors are just std::io::Errors it would make sense to change that variant to SmollError::SaveFileError(std::io::Error) where the underlying error is captured and gets bubbled up.
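
A sketch of what that could look like, combined with the SmollResult alias from above. Only the `SaveFileError` variant is shown, and `save_bytes` is a hypothetical stand-in for the crate's save routine, not its real signature:

```rust
use std::fmt;
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

#[derive(Debug)]
pub enum SmollError {
    /// Saving failed; the wrapped I/O error says why.
    SaveFileError(io::Error),
}

impl fmt::Display for SmollError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SmollError::SaveFileError(err) => write!(f, "failed to save database: {}", err),
        }
    }
}

impl std::error::Error for SmollError {
    // Expose the underlying error so callers and error reporters can walk the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SmollError::SaveFileError(err) => Some(err),
        }
    }
}

pub type SmollResult<T> = Result<T, SmollError>;

/// Hypothetical save routine: both fallible steps bubble up their io::Error.
pub fn save_bytes(path: &Path, bytes: &[u8]) -> SmollResult<()> {
    let mut file = File::create(path).map_err(SmollError::SaveFileError)?;
    file.write_all(bytes).map_err(SmollError::SaveFileError)
}
```

A caller can now match on the inner error's `kind()` to tell a missing directory from a permission problem instead of guessing.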


    Run cargo clippy and fix the lints it points out. Some of the lints can be crucial things, but at the very least it will help to make your Rust code more idiomatic.


    There are some unnecessary manual trait implementations that can just be derives. The ones I noticed were Default for SmollDB and PartialEq, Debug for DataType. I would also just use the Debug derive for SmollError instead of using the more detailed description. You have a doc-comment on Default for SmollDB, but that doesn't actually get used as a doc-comment since doc-comments are rendered for trait definitions, not their implementations
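
For illustration, the derived versions might look roughly like this. The variant list is trimmed to a few of the variants used in the README, and the struct field is the one quoted earlier:

```rust
use std::collections::HashMap;

/// Value stored in the database (trimmed variant list).
#[derive(Debug, PartialEq)]
pub enum DataType {
    BOOL(bool),
    INT16(i16),
    FLOAT32(f32), // PartialEq can still be derived with floats; Eq cannot
    STRING(String),
}

/// Object to represent the in-memory database
#[derive(Debug, Default, PartialEq)]
pub struct SmollDB {
    // Default gives an empty map, which is what a hand-written impl would do too.
    inner: HashMap<String, DataType>,
}
```

The derived Debug output simply mirrors the source, so `DataType::INT16(34)` prints as `INT16(34)`.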


    Super-nit: The variable l0 looks a whole lot like 10. I would avoid using l as the only letter in variable names


    The docs could use some module level documentation. These are just made from doing

    //! This is a module doc comment
    

    at the top of the module


    Your tests all only use the public interface, so they could all be integration tests instead of unit tests aka they could live in a tests folder in the crate root instead of being within the src folder.


    Tests are run in parallel, but a lot of your tests write to the same database file, which can lead to conflicts when multiple tests are writing to this same file at the same time. You'll see this happen where there are random failures when you run

    cargo test backup_and_load
    

    which magically gets resolved when you force the use of one test thread (note: Don't rely on this. cargo test should generally "Just work")

    cargo test backup_and_load -- --test-threads 1
    

    A common way to handle this is to specify that specific tests should be run in series with something like serial_test
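
An alternative to serializing the tests, which needs no extra dependency, is to give every test its own file so parallel runs never share a path. A sketch of that different approach, where `unique_db_path` is a hypothetical helper and not part of SmollDB:

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counter so two calls never return the same path.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical test helper: builds a path in the system temp directory that
/// is unique per call (counter) and per test process (PID).
fn unique_db_path(test_name: &str) -> PathBuf {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    let file_name = format!("smolldb_{}_{}_{}.smoll", test_name, std::process::id(), id);
    std::env::temp_dir().join(file_name)
}
```

Each test would then back up to and load from its own `unique_db_path(...)` instead of the shared "database" name, assuming the backup and load functions accept such a path.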


    Another test note is that all of your doc-tests fail. I would set any doc-tests that save a database to not be run (by changing the opening code fence to ```no_run) since it's not easy to run them serially. This will still make sure that the code example compiles, it just won't run it as a test.

    Other failures seem to be from missing imports. If you want to avoid displaying the import in the rendered docs then you can hide lines by prefixing them with a # e.g.

    /// ```
    /// # use smolldb::{db::SmollDB, util::DataType};  // <- This bit here
    /// let mut database = SmollDB::default();
    /// let data = String::from("data");
    /// database.set("example",data.clone());
    /// match database.get("example"){
    ///     Some(result) => {
    ///         assert_eq!(*result, DataType::STRING(data))
    ///     },
    ///     None => todo!(),
    /// };
    /// ```
    

    The last bit of issues is just from broken code examples which are easy to fix


    rand is listed in the dependencies, but isn't used, so it can be removed. cargo-udeps can detect this automatically e.g.

    $ cargo +nightly udeps
    ...
    info: Loading depinfo from "/.../debug/deps/smolldb-82ba8e747bff79d0.d"
    unused dependencies:
    `smolldb v0.1.0 (/.../SmollDB)`
    └─── dependencies
         └─── "rand"
    

    Your encoding of the keys in the database is problematic in both how it is encoded and decoded. The encoding converts the string to bytes and then null-terminates it which doesn't handle all strings because a NULL byte is totally valid in UTF-8 strings e.g. std::str::from_utf8(&[0]).is_ok() returns true. This means that the "end" of the string can be earlier than it should be

    The issue with decoding is that encoding stores the string as bytes whereas decoding treats each byte as a char which only works for ASCII text. I would personally just length-prefix the strings instead of null-terminating and then read the bytes into a Vec<u8> that gets converted to a string with String::from_utf8(). Adding in some basic fuzz-testing would catch this class of issues really easily
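
A minimal sketch of the length-prefixed approach, assuming a 4-byte little-endian length. The function names and the prefix width are illustrative choices, not the crate's on-disk format:

```rust
use std::convert::TryFrom;
use std::io::{self, Read, Write};

/// Writes `key` as a 4-byte little-endian length followed by its UTF-8 bytes.
fn write_key<W: Write>(out: &mut W, key: &str) -> io::Result<()> {
    let len = u32::try_from(key.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "key longer than u32::MAX bytes"))?;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(key.as_bytes())
}

/// Reads one key written by `write_key`, validating it as UTF-8 as a whole.
fn read_key<R: Read>(input: &mut R) -> io::Result<String> {
    let mut len_bytes = [0u8; 4];
    input.read_exact(&mut len_bytes)?;
    let len = u32::from_le_bytes(len_bytes) as usize;

    // Note: a real decoder should cap `len` before allocating for untrusted input.
    let mut buf = vec![0u8; len];
    input.read_exact(&mut buf)?;

    // Decode all bytes together, so multi-byte characters and NUL bytes survive.
    String::from_utf8(buf).map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))
}
```

Because the length is stored explicitly, a key such as "a\0b" or "Età" round-trips unchanged, which null-termination plus byte-as-char decoding cannot guarantee.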


    There is this pattern used in a few places

    match foo {
        Some(bar) => ...,
        None => todo!(),
    }
    

    in several cases where I would expect to just see a .unwrap()


    There's some more small stuff, but this is already a wall of text at this point, so I'll stop

    opened by LovecraftianHorror 1