Non-volatile, distributed file cache backed by content-addressed storage

Overview

blobnet


A low-latency file server that responds to requests for chunks of file data.

This acts as a non-volatile, over-the-network content cache. Internal users can add binary blobs to the cache, and the data is indexed by its SHA-256 hash. Any blob can be retrieved by its hash and range of bytes to read.

Data stored in this server is locally cached, backed by NFS, and durable.
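
As a rough sketch of the content-addressing scheme, a blob's key is the SHA-256 hex digest of its bytes, which could be computed with the sha2 and hex crates (this is an illustration, not blobnet's own code):

    use sha2::{Digest, Sha256};

    /// Compute the 64-character hex digest used to address a blob.
    fn content_hash(data: &[u8]) -> String {
        let mut hasher = Sha256::new();
        hasher.update(data);
        hex::encode(hasher.finalize())
    }

    fn main() {
        let hash = content_hash(b"hello, blobnet");
        println!("{hash}"); // the blob can later be fetched by this hash plus a byte range
    }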

Usage

Run cargo install blobnet and see the options in the CLI. The server supports only two types of requests (GET / PUT), and an example client in Rust is located in the blobnet library crate.

Comments
  • Implement blobnet v0.2: flexible providers and faster caching

    Implement blobnet v0.2: flexible providers and faster caching

    Blobnet v0.2 — Design

    Resolves: MOD-306, MOD-542

    Blobnet: An embedded caching, content-addressed blob storage system with configurable sources and proxies.

    Issues with Blobnet v0.1

    • We have to download the entire file even if we only read a range, for caching reasons.
    • There is no way of implementing local caching on the worker.
    • The system isn’t flexible enough to allow for a gradual migration to S3 (or even just trying it out as another source of truth).

    Proposal for Blobnet v0.2

    The blobnet system is configured with an ordered set of sources, plus a cache.

    Having two sources allows us to gradually transition from NFS to S3 without downtime.

    let blobnet = Blobnet::builder()
        .source(provider::S3::new("modal-blobnet"))
        .source(provider::NFS::new("/efs/blobnet"))
        .cache("/var/tmp/.blobnet-cache", 1 << 21)
        .build();
    

    Each provider has the following API:

    #[async_trait]
    trait Provider {
        async fn read(&self, hash: &str, range: (u64, u64)) -> Result<Option<Bytes>>; // Returns the data in the range
        async fn write(&self, data: Bytes) -> Result<String>; // Returns the hash of the data
    }
    

    All functions are fallible. Both the blobnet server and the worker client hold a Blobnet struct, which allows them to share the SSD (local instance storage) caching logic.

    The difference is that the worker client just uses its Blobnet struct to handle imagefs requests, while the blobnet server uses its struct to serve requests over HTTP.

    S3 Provider

    Takes the S3 bucket as an argument, along with (probably) an instance of aws_sdk_s3's Client object to interact with it. Writes complete files to S3 objects keyed by SHA-256 hash, such as /aa/bb/cc/dddddddddddddddddddddddddddddddddddddddddddddddddddddddddd.
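
    As a sketch, mapping a 64-character hex digest onto that sharded key layout could look like the following (helper name hypothetical):

    fn sharded_key(hash: &str) -> String {
        assert_eq!(hash.len(), 64, "expected a SHA-256 hex digest");
        // The first three byte pairs become directories; the remaining 58 characters are the object name.
        format!("{}/{}/{}/{}", &hash[0..2], &hash[2..4], &hash[4..6], &hash[6..])
    }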

    NFS Provider

    Takes a local directory as input, checks that it is a network file system on creation, and writes complete files atomically to that directory. Similar to the S3 provider.
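
    A minimal sketch of the atomic-write idea (write to a temporary file in the same directory, then rename into place); names here are illustrative, not the actual provider code:

    use std::fs;
    use std::io::Write;
    use std::path::Path;

    fn write_atomic(dir: &Path, name: &str, data: &[u8]) -> std::io::Result<()> {
        let tmp = dir.join(format!(".{name}.tmp"));
        let mut file = fs::File::create(&tmp)?;
        file.write_all(data)?;
        file.sync_all()?; // flush to stable storage before publishing the file
        fs::rename(&tmp, dir.join(name)) // rename(2) is atomic within one file system
    }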

    “Client Proxy” Provider

    This acts as a client to a running blobnet server. It can be used on the worker itself to connect to the blobnet instance over HTTP, while also including a cache so the worker reuses the exact same caching logic on its local file system. Specifically:

    let blobnet = Blobnet::builder()
        .source(provider::ClientProxy::new("http://blobnet.modal.internal"))
        .cache("/var/tmp/.blobnet-cache", 1 << 21)
        .build();
    

    Cache

    Saves chunks of files, up to a certain page size, in the cache whenever something is read. All file reads go through the cache first and only hit the provider on a miss. The page size is configurable, since it only affects the cache; in the examples above it is set to 2 MiB (1 << 21).

    The cache is a local file system directory, but the files are saved in the format /aa/bb/cc/ddd...dd/<chunk-num>, since we don’t want to download the entire file when the goal is a small read.
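
    As an illustration of why chunked keys help, a read only needs to touch the pages overlapping its byte range (names hypothetical, not the actual internals):

    const PAGE_SIZE: u64 = 1 << 21; // 2 MiB, matching the example configuration above

    /// Chunk numbers that a half-open byte range [start, end) overlaps.
    fn chunks_for_range(start: u64, end: u64) -> std::ops::RangeInclusive<u64> {
        assert!(start < end, "range must be non-empty");
        (start / PAGE_SIZE)..=((end - 1) / PAGE_SIZE)
    }

    // Example: reading bytes 3 MiB..5 MiB of a large blob touches only chunks 1 and 2,
    // so a cache miss downloads at most two pages instead of the whole file.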

    Update (Oct. 26): I think I’ll first benchmark Sled as a cache instead of EBS. It has the potential to be a lot faster, especially given its built-in page cache and file defragmentation. Let’s see what happens. It doesn’t seem to be mature enough yet, though; maybe RocksDB is better?

    Update 2: Okay, I did some quick measurements, and with reasonable confidence I think using the file system directly is better. However, there is still quite some latency here, and we might want to include an in-memory LRU cache as well: this is the difference between 200-400 µs for a 2 MiB read with a warm page cache and 700-2800 ns for the same 2 MiB read from memory.

    opened by ekzhang 4
  • Add statistical benchmarks for a simulated image load

    Add statistical benchmarks for a simulated image load

    I organized the benchmarks into groups and added two new benchmarks:

    • image_delayed/cold — load an "image" by sequentially reading all files in a list in full, with a simulated random delay of mean 400 µs, where the files are:
      • 128 files of size 0-1KB
      • 10 files of size 1-10 MB
      • 1 file of size 1 GB
    • image_delayed/warm — same as above, but a local memory + disk cache (non-delayed) has already been populated

    400 µs is around the order of magnitude of (or maybe twice as large as) the delay we were getting from blobnet over the network. The hope is that this benchmark can replicate the difference between a cold and a warm cache, and that we can use it to measure and tune prefetching strategies.
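
    The delay can be simulated with something like the following sketch, assuming uniform jitter around 400 µs and a tokio runtime (the actual benchmark may differ, and the per-byte component mentioned in the note below is not included):

    use rand::Rng;
    use std::time::Duration;

    /// Read a byte range from an in-memory "file", sleeping first to imitate a
    /// network round trip with a mean latency of roughly 400 µs.
    async fn delayed_read(data: &[u8], start: usize, end: usize) -> Vec<u8> {
        let delay_us = rand::thread_rng().gen_range(200..=600); // assumed jitter
        tokio::time::sleep(Duration::from_micros(delay_us)).await;
        let end = end.min(data.len());
        data[start.min(end)..end].to_vec()
    }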

    Note: I should also add a variable component to the delay that is proportional to the number of bytes transferred.

    opened by ekzhang 3
  • Add statistical benchmark with criterion

    Add statistical benchmark with criterion

    I want to start taking a look at system calls like read_at(), and to know whether changes like this actually affect performance, we need a benchmarking library. This PR adds criterion and some basic benchmarks.
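
    For reference, a criterion benchmark has roughly this shape; the benchmarked body below is a stand-in, not the actual blobnet read path:

    use criterion::{criterion_group, criterion_main, Criterion};

    fn read_benchmark(c: &mut Criterion) {
        let data = vec![42u8; 1024];
        c.bench_function("read_1kib", |b| {
            // criterion runs this closure many times and reports statistics on it
            b.iter(|| data.iter().map(|&x| x as u64).sum::<u64>())
        });
    }

    criterion_group!(benches, read_benchmark);
    criterion_main!(benches);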

    Here's what the output looks like for removing a stat operation in #11, for example (running on my MacBook Pro; I need to run this on Linux machines to get actually relevant results, which I'll do after checking out this branch):

    [screenshot: criterion benchmark output]

    The improvement in the memory case is not important; just my computer not being a perfect deterministic environment.

    opened by ekzhang 3
  • Return total file length in blobnet header

    Return total file length in blobnet header

    Planning on storing mount sizes in the server. This returns the file length in the GET response header so we can add up the final mount size in the server's MountBuild method.

    [screenshot]

    opened by aksh-at 3
  • Optimize file reads: larger buffer, uninit memory, and pread(2)

    Optimize file reads: larger buffer, uninit memory, and pread(2)

    This change was motivated by noticing that the read buffer was implicitly configured to be 16 KiB by tokio::fs::File and could not be changed, which made large file operations very inefficient.

    I ended up doing a bit more though, since this is a pretty important part of the system. A quick mini-benchmark showed me that allocating 2 MiB of memory (Vec::with_capacity(1 << 21)) takes nanoseconds, while allocating and initializing 2 MiB of memory (vec![0; 1 << 21]) takes about 60 µs, which is really significant. It showed up in my measurements. So I took some time to remove that overhead as well.

    1 KiB File Mini-Benchmarks

    But even on small files, this change improves things a lot by avoiding memory initialization (with MaybeUninit and unsafe code) and eliminating seek(2) system calls by using the low-level libc::pread function instead.
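
    A simplified sketch of that approach (not the exact code in this PR): allocate without zeroing, issue a single pread(2) at the desired offset, and only then mark the filled bytes as initialized.

    use std::fs::File;
    use std::io;
    use std::os::unix::io::AsRawFd;

    fn read_at(file: &File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut buf: Vec<u8> = Vec::with_capacity(len); // allocated but not zeroed
        let n = unsafe {
            // one positioned read; no separate seek(2) call is needed
            libc::pread(
                file.as_raw_fd(),
                buf.as_mut_ptr() as *mut libc::c_void,
                len,
                offset as libc::off_t,
            )
        };
        if n < 0 {
            return Err(io::Error::last_os_error());
        }
        unsafe { buf.set_len(n as usize) }; // only the bytes pread wrote are initialized
        Ok(buf)
    }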

    [screenshot: 1 KiB file benchmark results]

    (I fixed a bug in our current benchmarks that made it not actually wait for the read response.) Now, on benchmarks that involve inserting then reading small 1 KiB files 1000 times, this shows a 3x speedup for reads from LocalDir, on an Ubuntu machine on AWS EBS.

    I expect that speedup to carry through to other operations like ones on NFS.

    64 MiB File Mini-Benchmarks

    The main improvement should be in large files, though, due to the buffer size increase from 16 KiB to 2 MiB. What if we changed the benchmark to insert a single 64 MiB file and read it 10 times?

    [screenshot: 64 MiB file benchmark results]

    Still twice as fast, great. It's on par with the cached-chunks implementation now, which makes sense, since both the source and the cache live on the local file system.

    Real-world impact

    After this is merged, I can update #15 and we'll be able to see simulated latencies on NFS better.

    opened by ekzhang 1
  • Remove stat call by truncating end-of-file ranges

    Remove stat call by truncating end-of-file ranges

    This changes the semantics of blobnet.get(hash, range) so that if the range starts after the end of the file, an empty result is returned instead of Error::BadRange. With these new semantics we can remove a stat() system call from the LocalDir provider and also remove the hacky secondary fallback request we had for the S3 provider.
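
    In effect, the provider can clamp the requested range against the file length it already knows while reading, along these lines (illustrative, not the exact code):

    /// Clamp a half-open byte range [start, end) to a file of `file_len` bytes.
    /// A range starting at or past the end clamps to an empty range, which is
    /// served as an empty body rather than Error::BadRange.
    fn clamp_range(file_len: u64, start: u64, end: u64) -> (u64, u64) {
        let start = start.min(file_len);
        (start, end.clamp(start, file_len))
    }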

    This also makes end-of-file responses cacheable, resolving a part of MOD-306 that I mentioned in a comment.

    opened by ekzhang 1
  • Add a lifetime to ReadStream for a more flexible API

    Add a lifetime to ReadStream for a more flexible API

    This changes the ReadStream type definition from

    pub type ReadStream = Pin<Box<dyn AsyncRead + Send>>;
    

    to

    pub type ReadStream<'a> = Pin<Box<dyn AsyncRead + Send + 'a>>;
    

    By adding this lifetime, the code does become a little more complicated since lifetimes need to be propagated, but we can simplify some uses of blobnet.put() as a result. Now, to put a data: &[u8], we can just do:

    blobnet.put(Box::pin(data)).await?;
    

    Instead of the previous:

    use std::io::Cursor;
    blobnet.put(Box::pin(Cursor::new(data.to_owned()))).await?;
    

    This avoids an unnecessary memory copy. The copy was necessary in the past because ReadStream implicitly always took a 'static lifetime, but for these arguments it's sometimes useful to have a stream that's borrowed rather than owned.

    cc @freider who pointed this out

    opened by ekzhang 0
  • Write data to EFS / LocalDir in 2 MiB chunks

    Write data to EFS / LocalDir in 2 MiB chunks

    In my measurements, adding a 2 MiB buffer to the std::io::copy call increased the write throughput of a 128 MiB file to EFS from:

    • Without buffering (before): 1 MiB / s
    • With buffering (after): 30 MiB / s

    This change adds the buffer. Note that std::io::copy has a specialization for BufWriter that optimizes out its second internal stack buffer, so this code is equivalent to, and as fast as, directly writing a for-loop over buffered chunks.
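
    A sketch of the buffered write path under those assumptions (a std::io::Read source and a destination file on EFS; names are illustrative):

    use std::fs::File;
    use std::io::{self, BufWriter, Read};

    fn copy_to_file(mut src: impl Read, dst_path: &str) -> io::Result<u64> {
        let file = File::create(dst_path)?;
        let mut dst = BufWriter::with_capacity(1 << 21, file); // 2 MiB write buffer
        let n = io::copy(&mut src, &mut dst)?; // copies in buffered chunks
        dst.into_inner()?.sync_all()?; // flush the buffer, then fsync the file
        Ok(n)
    }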

    Note: There are some messages on Slack. It's late, so I'm going to discuss this tomorrow with a fresh perspective.

    opened by ekzhang 0
  • Handle empty range and file-length starts better

    Handle empty range and file-length starts better

    This makes two changes to the API:

    • A request to an empty range such as 0-0 will check that the file exists and that the index is within the file's length.
      • Previously this just immediately returned OK.
    • If the file has n bytes, GET requests with a range that starts at byte n will succeed and return an empty string.
      • Previously they returned a BadRange response, mimicking S3.

    This makes blobnet compatible with the v0.1 client's way of checking for existence using the range 0-0.

    opened by ekzhang 0
  • Simplify LRU cache and fix memory leak

    Simplify LRU cache and fix memory leak

    I found a memory leak in the existing code and wrote a test that reproduced it.

    But at this point I'm just going to remove all of this cache-advisor / bimap code, because it's prone to memory leaks if not implemented perfectly, and replace it with our own, very simple LRU cache implementation. No two-phase stuff, just something extremely basic.
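
    A minimal sketch of that "extremely basic" direction: a map from key to value plus a last-used tick, evicting the stalest entry when over capacity. This is illustrative, not the actual replacement code:

    use std::collections::HashMap;
    use std::hash::Hash;

    struct SimpleLru<K, V> {
        map: HashMap<K, (V, u64)>, // value plus last-used tick
        tick: u64,
        capacity: usize,
    }

    impl<K: Eq + Hash + Clone, V> SimpleLru<K, V> {
        fn new(capacity: usize) -> Self {
            Self { map: HashMap::new(), tick: 0, capacity }
        }

        fn get(&mut self, key: &K) -> Option<&V> {
            self.tick += 1;
            match self.map.get_mut(key) {
                Some(entry) => {
                    entry.1 = self.tick; // mark as most recently used
                    Some(&entry.0)
                }
                None => None,
            }
        }

        fn put(&mut self, key: K, value: V) {
            self.tick += 1;
            self.map.insert(key, (value, self.tick));
            if self.map.len() > self.capacity {
                // evict the least recently used entry (an O(n) scan; fine for a sketch)
                let oldest = self
                    .map
                    .iter()
                    .min_by_key(|entry| (entry.1).1)
                    .map(|entry| entry.0.clone());
                if let Some(k) = oldest {
                    self.map.remove(&k);
                }
            }
        }
    }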

    opened by ekzhang 0
  • Make caching provider not require file length

    Make caching provider not require file length

    This is important because blobnet would otherwise require a second network round trip just to check the file length, which isn't good for our workloads, where we shouldn't have range issues anyway because the file length is stored in the image manifest.

    To do this, the blobnet API is slightly modified so that ranges dangling past the end of the file are now valid and are truncated automatically. This is consistent with S3's API. I also corrected some of the error handling logic.

    I intend to release 0.2 after merging this and then testing the S3 provider locally.

    opened by ekzhang 0
Owner
Modal Labs
Modal makes it easy to run code in the cloud.
Rust cache structures and easy function memoization

cached Caching structures and simplified function memoization cached provides implementations of several caching structures as well as a handy macro f

James Kominick 996 Jan 7, 2023
This is a Rust implementation for HashiCorp's golang-lru. This crate contains three LRU based cache, LRUCache, TwoQueueCache and AdaptiveCache.

This is a Rust implementation for HashiCorp's golang-lru. This crate contains three LRU based cache, LRUCache, TwoQueueCache and AdaptiveCache.

Al Liu 84 Jan 3, 2023
A native stateless cache implementation.

fBNC fBNC, Blockchain Native Cache. A native stateless storage library for block chain. Its value is to improve the stability and security of online s

Findora Foundation 1 Jan 12, 2022
Key-Value based in-memory cache library which supports Custom Expiration Policies

Endorphin Key-Value based in-memory cache library which supports Custom Expiration Policies with standard HashMap, HashSet interface. use endorphin::H

Jun Ryung Ju 15 Oct 1, 2022
Stretto is a Rust implementation for ristretto. A high performance memory-bound Rust cache.

Stretto is a Rust implementation for ristretto. A high performance memory-bound Rust cache.

Al Liu 310 Dec 29, 2022
A lightweight key-value cache system developed for experimental purposes

A lightweight key-value cache system developed for experimental purposes. It can also include distributed systems setup if I can.

Burak Selim Senyurt 8 Jul 23, 2022
ConstDB - an in-memory cache store which aims at master-master replications

A redis-like cache store that implements CRDTs and active-active replications.

null 27 Aug 15, 2022
A set of safe Least Recently Used (LRU) map/cache types for Rust

LruMap A set of safe Least-Recently-Used (LRU) cache types aimed at providing flexible map-like structures that automatically evict the least recently

Khonsu Labs 4 Sep 24, 2022
Read-optimized cache of Cardano on-chain entities

Read-optimized cache of Cardano on-chain entities Intro Scrolls is a tool for building and maintaining read-optimized collections of Cardano's on-chai

TxPipe 58 Dec 2, 2022
A read-only, memory-mapped cache.

mmap-cache A low-level API for a memory-mapped cache of a read-only key-value store. Design The [Cache] index is an [fst::Map], which maps from arbitr

Duncan 3 Jun 28, 2022
Turn your discord cache back to viewable images.

discache ?? Every time you view an Image in Discord, it gets saved in your cache folder as an unviewable file. Discache allows you to convert those fi

sam 2 Dec 14, 2022
Rust type wrapper to cache hash of potentially large structures.

CachedHash For a type T, CachedHash<T> wraps T and implements Hash in a way that caches T's hash value. This is useful when T is expensive to hash (fo

pali 3 Dec 8, 2022
Key-value cache RESP server with support for key expirations ⌛

BADER-DB (بادِر) Key-value cache RESP server with support for key expirations ⌛ Supported Features • Getting Started • Basic Usage • Cache Eviction •

Mahmoud Salem 7 Apr 21, 2023
A generational arena based LRU Cache implementation in 100% safe rust.

generational-lru Crate providing a 100% safe, generational arena based LRU cache implementation. use generational_lru::lrucache::{LRUCache, CacheError

Arindam Das 37 Dec 21, 2022
Hitbox is an asynchronous caching framework supporting multiple backends and suitable for distributed and for single-machine applications.

Hitbox is an asynchronous caching framework supporting multiple backends and suitable for distributed and for single-machine applications.

null 62 Dec 27, 2022
sccache is ccache with cloud storage

sccache - Shared Compilation Cache sccache is a ccache-like compiler caching tool. It is used as a compiler wrapper and avoids compilation when possib

Mozilla 3.6k Jan 2, 2023
Plugin for macro-, mini-quad (quads) to save data in simple local storage using Web Storage API in WASM and local file on a native platforms.

quad-storage This is the crate to save data in persistent local storage in miniquad/macroquad environment. In WASM the data persists even if tab or br

ilya sheprut 9 Jan 4, 2023
Appendable and iterable key/list storage, backed by S3, written in rust

klstore Appendable and iterable key/list storage, backed by S3. General Overview Per key, a single writer appends to underlying storage, enabling many

Eric Thill 3 Sep 29, 2022
A general-purpose distributed memory cache system compatible with Memcached

memcrsd memcached server implementation in Rust memcrsd is a key value store implementation in Rust. It is compatible with binary protocol of memcache

null 274 Dec 14, 2022
A pure Rust implementation of the Web Local Storage API, for use in non-browser contexts

Rust Web Local Storage API A Rust implementation of the Web LocalStorage API, for use in non-browser contexts About the Web Local Storage API MDN docs

RICHΛRD ΛNΛYΛ 10 Nov 28, 2022