SmollDB

Small in-memory key value database for rust

This is a small in-memory key-value database, which can be easly backed up in a file or stream and later loaded from it

It ain't much but it's honest work

This database is nothing but an hashmap, it already comes with function to easly load and save the hashmap on file though, so you don't have to implement it. It also compress it in a Zlib compatible format

Inspired by Pickles

This crate was inspired by pickleDB and it works in similars use cases

Now with streams

since 0.4.0

You can use function load_from_stream to load from anything that implements std::io::Read, and you can use backup_to_stream to backup on antything that implements std::io::Write

Some examples

Basic use

let mut db = SmollDB::default();
db.set("Nome", "Mario".to_string());
db.set("Eta", 34_i16);
db.set("Stinky", true);
db.set("Height", 23.3_f32);
db.set("CF", vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
assert_eq!(DataType::STRING("Mario".to_string()),*(db.get(&"Nome").unwrap()));
assert_eq!(DataType::INT16(34_i16), *(db.get(&"Eta").unwrap()));
assert_eq!(DataType::BOOL(true), *(db.get(&"Stinky").unwrap()));
assert_eq!(DataType::FLOAT32(23.3_f32), *(db.get(&"Height").unwrap()));
assert_eq!(DataType::BYTES(vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),*(db.get(&"CF").unwrap()));

Loading from file

let mut db = SmollDB::default();
db.set("bool", false);
db.set("int8", 8_i8);
db.set("int16", 8_i16);
db.set("int32", 8_i32);
db.set("int64", 8_i64);
db.set("float32", 4_f32);
db.set("float64", 4_f64);
db.set("string", String::from("8_i8"));
db.set("bytes",vec![1, 2, 3, 4, 5, 6, 7, 8, 243, 123,46, 11, 123, 65, 2, 3, 5, 7, 2,],);
db.backup(&"database").unwrap();
let db_copy = SmollDB::load(&"database").unwrap();
assert_eq!(db, db_copy);

Load and backup from stream

let mut database = SmollDB::default();
let mut stream = OpenOptions::new().create(true).read(true).write(true).open("myfile.smoll").unwrap();
let data = String::from("data");
let key = String::from("example");
database.set(key.clone(), data.clone());
database.backup_to_stream(&mut stream).unwrap();
stream.seek(std::io::SeekFrom::Start(0)).unwrap();
let database = SmollDB::load_from_stream(&mut stream).unwrap();
let result = database.get(&key).unwrap();
assert_eq!(*result, DataType::STRING(data));

It looks like you're still pretty new to Rust, so here is the code review you didn't ask for :D

These are mostly just nits with some more serious points sprinkled throughout (I'm too lazy to go back and try to organize points)

You specify both license and license-file in your Cargo.toml when only one is needed. cargo warns you about this when it's run

There is a myfile.smoll comitted to the repo even though *.smoll is ignored in the .gitignore. I'm assuming the file shouldn't be there

There is a license header included at the top of every source file. (Disclaimer: I'm not a lawyer) from my understanding this doesn't provide any benefit over just using a LICENSE file, but I know some licenses require that the header is kept if it was present at any time which means you can't always remove it when you already have it. You may want to stick to just using LICENSE files in the future to avoid the headache

The way you expose your public API is a bit odd. You have some vital things that I wouldn't consider utils provided in a util module like the type returned from the db and the error type. There are so few types that I would probably just expose everything from the library root i.e. smolldb::{DataType, SmollDB, SmollError}. This of course doesn't mean that you need to have everything in one file. You would just keep them private and do a public reexport in lib.rs

You use a lot of non-standard style. Most of that can be handled automatically by running cargo fmt (not having to deal with manual formatting is a huge plus for me). The other bits are just from the way things are laid out and also naming. Something like

#[derive(Debug,PartialEq)]

///Object to represent the in memory database
pub struct SmollDB {
    inner: HashMap<String, DataType>,
}

looks very off to me. Normally you would see it as

///Object to represent the in memory database
#[derive(Debug,PartialEq)]
pub struct SmollDB {
    inner: HashMap<String, DataType>,
}

As for naming the two things I see are SmollDB which would be SmollDb by Rust convention (things like acronyms only have the first letter capitalized for PascalCase in Rust). Also the variant names in DataType are all capitalized when the convention is to use PascalCase there too (like you did for SmollError)

A couple of notes on how the errors are defined. The really small nit is that normally libraries will provide a Result type definition for their custom error type. You normally see this named either Error and Result or LibraryNameError and LibraryNameResul. Since you're already using the latter it would just be

pub type SmollResult<T> = Result<T, SmollError>;

This can help clean up some of the method signatures in db.rs.

The other note is that your error types don't provide a lot of context which can make troubleshooting errors more tedious. To give a basic example calling SmollDB::save_file may return a SmollError::SaveFileError. I would have a hard time figuring out what failed from this error type alone. Maybe I don't have permission to write a file, maybe the folder in the path doesn't exist, or maybe the filesystem is full, I can't discern any of that from the returned error alone. Since both of the possible underlying errors are just std::io::Errors it would make sense to change that variant to SmollError::SaveFileError(std::io::Error) where the underlying error is captured and gets bubbled up.

Run cargo clippy and fix the lints it points out. Some of the lints can be crucial things, but at the very least it will help to make your Rust code more idiomatic.

There are some unnecessary manual trait implementations that can just be derives. The ones I noticed were Default for SmollDB and PartialEq, Debug for DataType. I would also just use the Debug derive for SmollError instead of using the more detailed description. You have a doc-comment on Default for SmollDB, but that doesn't actually get used as a doc-comment since doc-comments are rendered for traits definition, not their implementation

Super-nit: The variable l0 looks a whole lot like 10. I would avoid using l as the only letter in variable names

The docs could use some module level documentation. These are just made from doing

//! This is a module doc comment

at the top of the module

Your tests all only use the public interface, so they could all be integration tests instead of unit tests aka they could live in a tests folder in the crate root instead of being within the src folder.

Tests are run in parallel, but a lot of your tests write to the same database file which can lead to conflicts when multiple tests are writing to this same file at the same time. You'll see this happen where there are random failues when you run

cargo test backup_and_load

which magically gets resolved when you force the use of one test thread (note: Don't rely on this. cargo test should generally "Just work")

cargo test backup_and_load -- --test-threads 1

A common way to handle this is to specify that specific tests should be run in series with something like serial_test

Another test note is that all of your doc-tests fail. I would set any doc-tests that save a database to not be run (by changing the opening code fence to ```no_run) since it's not easy to run them serially. This will still make sure that the code example compiles, it just won't run it as a test.

Other failures seem to be from missing imports. If you want to avoid displaying the import in the rendered docs then you can comment out lines with a # e.g.

/// ```
/// # use smolldb::{db::SmollDb, util::DataType};  // <- This bit here
/// let mut database = SmollDB::default();
/// let data = String::from("data");
/// database.set("example",data.clone());
/// match database.get("example"){
///     Some(result) => {
///         assert_eq!(*result, DataType::STRING(data))
///     },
///     None => todo!(),
/// };
/// ```

The last bit of issues is just from broken code examples which are easy to fix

rand is listed in the dependencies, but isn't used, so it can be removed. cargo-udeps can detect this automatically e.g.

$ cargo +nightly udeps
...
info: Loading depinfo from "/.../debug/deps/smolldb-82ba8e747bff79d0.d"
unused dependencies:
`smolldb v0.1.0 (/.../SmollDB)`
└─── dependencies
     └─── "rand"

Your encoding of the keys in the database is problematic in both how it is encoded and decoded. The encoding converts the string to bytes and then null-terminates it which doesn't handle all strings because a NULL byte is totally valid in UTF-8 strings e.g. std::str::from_utf8(&[0]).is_ok() returns true. This means that the "end" of the string can be earlier than it should be

The issue with decoding is that encoding stores the string as bytes whereas decoding treats each byte as a char which only works for ASCII text. I would personally just length-prefix the strings instead of null-terminating and then read the bytes into a Vec<u8> that gets converted to a string with String::from_utf8(). Adding in some basic fuzz-testing would catch this class of issues really easily

There is this pattern used in a few places

match foo {
    Some(bar) => ...,
    None => todo!(),
}

in several cases where I would expect to just see a .unwrap()

There's some more small stuff, but this is already a wall of text at this point, so I'll stop

A Rust implementation of HyperLogLog trying to be parsimonious with memory.

🧮 HyperLogLog-rs This is a Rust library that provides an implementation of the HyperLogLog (HLL) algorithm, trying to be parsimonious with memory. Wh

10 May 16, 2023

Super-simple, fully Rust powered "memory" (doc store + semantic search) for LLM projects, semantic search, etc.

memex Super simple "memory" for LLM projects, semantic search, etc. Running the service Note that if you're running on Apple silicon (M1/M2/etc.), it'

15 Jun 19, 2023

Rust library for concurrent data access, using memory-mapped files, zero-copy deserialization, and wait-free synchronization.

mmap-sync mmap-sync is a Rust crate designed to manage high-performance, concurrent data access between a single writer process and multiple reader pr

97 Jun 26, 2023

Uncalled for Code Review
It looks like you're still pretty new to Rust, so here is the code review you didn't ask for :D

These are mostly just nits with some more serious points sprinkled throughout (I'm too lazy to go back and try to organize points)

You specify both license and license-file in your Cargo.toml when only one is needed. cargo warns you about this when it's run

There is a myfile.smoll comitted to the repo even though *.smoll is ignored in the .gitignore. I'm assuming the file shouldn't be there

There is a license header included at the top of every source file. (Disclaimer: I'm not a lawyer) from my understanding this doesn't provide any benefit over just using a LICENSE file, but I know some licenses require that the header is kept if it was present at any time which means you can't always remove it when you already have it. You may want to stick to just using LICENSE files in the future to avoid the headache

The way you expose your public API is a bit odd. You have some vital things that I wouldn't consider utils provided in a util module like the type returned from the db and the error type. There are so few types that I would probably just expose everything from the library root i.e. smolldb::{DataType, SmollDB, SmollError}. This of course doesn't mean that you need to have everything in one file. You would just keep them private and do a public reexport in lib.rs

You use a lot of non-standard style. Most of that can be handled automatically by running cargo fmt (not having to deal with manual formatting is a huge plus for me). The other bits are just from the way things are laid out and also naming. Something like

#[derive(Debug,PartialEq)] ///Object to represent the in memory database pub struct SmollDB { inner: HashMap<String, DataType>, }

looks very off to me. Normally you would see it as

///Object to represent the in memory database #[derive(Debug,PartialEq)] pub struct SmollDB { inner: HashMap<String, DataType>, }

As for naming the two things I see are SmollDB which would be SmollDb by Rust convention (things like acronyms only have the first letter capitalized for PascalCase in Rust). Also the variant names in DataType are all capitalized when the convention is to use PascalCase there too (like you did for SmollError)

A couple of notes on how the errors are defined. The really small nit is that normally libraries will provide a Result type definition for their custom error type. You normally see this named either Error and Result or LibraryNameError and LibraryNameResul. Since you're already using the latter it would just be

pub type SmollResult<T> = Result<T, SmollError>;

This can help clean up some of the method signatures in db.rs.

The other note is that your error types don't provide a lot of context which can make troubleshooting errors more tedious. To give a basic example calling SmollDB::save_file may return a SmollError::SaveFileError. I would have a hard time figuring out what failed from this error type alone. Maybe I don't have permission to write a file, maybe the folder in the path doesn't exist, or maybe the filesystem is full, I can't discern any of that from the returned error alone. Since both of the possible underlying errors are just std::io::Errors it would make sense to change that variant to SmollError::SaveFileError(std::io::Error) where the underlying error is captured and gets bubbled up.

Run cargo clippy and fix the lints it points out. Some of the lints can be crucial things, but at the very least it will help to make your Rust code more idiomatic.

There are some unnecessary manual trait implementations that can just be derives. The ones I noticed were Default for SmollDB and PartialEq, Debug for DataType. I would also just use the Debug derive for SmollError instead of using the more detailed description. You have a doc-comment on Default for SmollDB, but that doesn't actually get used as a doc-comment since doc-comments are rendered for traits definition, not their implementation

Super-nit: The variable l0 looks a whole lot like 10. I would avoid using l as the only letter in variable names

The docs could use some module level documentation. These are just made from doing

//! This is a module doc comment

at the top of the module

Your tests all only use the public interface, so they could all be integration tests instead of unit tests aka they could live in a tests folder in the crate root instead of being within the src folder.

Tests are run in parallel, but a lot of your tests write to the same database file which can lead to conflicts when multiple tests are writing to this same file at the same time. You'll see this happen where there are random failues when you run

cargo test backup_and_load

which magically gets resolved when you force the use of one test thread (note: Don't rely on this. cargo test should generally "Just work")

cargo test backup_and_load -- --test-threads 1

A common way to handle this is to specify that specific tests should be run in series with something like serial_test

Another test note is that all of your doc-tests fail. I would set any doc-tests that save a database to not be run (by changing the opening code fence to ```no_run) since it's not easy to run them serially. This will still make sure that the code example compiles, it just won't run it as a test.

Other failures seem to be from missing imports. If you want to avoid displaying the import in the rendered docs then you can comment out lines with a # e.g.

/// ``` /// # use smolldb::{db::SmollDb, util::DataType}; // <- This bit here /// let mut database = SmollDB::default(); /// let data = String::from("data"); /// database.set("example",data.clone()); /// match database.get("example"){ /// Some(result) => { /// assert_eq!(*result, DataType::STRING(data)) /// }, /// None => todo!(), /// }; /// ```

The last bit of issues is just from broken code examples which are easy to fix

rand is listed in the dependencies, but isn't used, so it can be removed. cargo-udeps can detect this automatically e.g.

$ cargo +nightly udeps ... info: Loading depinfo from "/.../debug/deps/smolldb-82ba8e747bff79d0.d" unused dependencies: `smolldb v0.1.0 (/.../SmollDB)` └─── dependencies └─── "rand"

Your encoding of the keys in the database is problematic in both how it is encoded and decoded. The encoding converts the string to bytes and then null-terminates it which doesn't handle all strings because a NULL byte is totally valid in UTF-8 strings e.g. std::str::from_utf8(&[0]).is_ok() returns true. This means that the "end" of the string can be earlier than it should be

The issue with decoding is that encoding stores the string as bytes whereas decoding treats each byte as a char which only works for ASCII text. I would personally just length-prefix the strings instead of null-terminating and then read the bytes into a Vec<u8> that gets converted to a string with String::from_utf8(). Adding in some basic fuzz-testing would catch this class of issues really easily

There is this pattern used in a few places

match foo { Some(bar) => ..., None => todo!(), }

in several cases where I would expect to just see a .unwrap()

There's some more small stuff, but this is already a wall of text at this point, so I'll stop
opened by LovecraftianHorror 1