Carbonado

An apocalypse-resistant data storage format for the truly paranoid.

Overview

Designed to keep encrypted, durable, compressed, provably replicated, consensus-critical data, without the need for a blockchain or powerful hardware. Decoding and encoding can be done in the browser through WebAssembly, built into remote nodes on P2P networks, kept on S3-compatible cloud storage, or locally on-disk as a single highly portable flat file container format.

Features

Carbonado has features to make it resistant against:

  • Drive failure and data loss
    • Uses bao encoding so it can be uploaded to a remote peer, and random 1KB slices of that data can be periodically checked against a local hash to verify data replication and integrity (see the verification sketch after this list). This way, copies can be distributed geographically; in case of a coronal mass ejection or solar flare, at most half the planet will be affected.
  • Surveillance
    • Files are encrypted at-rest by default using ecies authenticated encryption from secp256k1 keys, which can either be provided or derived from a mnemonic (an encryption sketch also follows below).
  • Theft
    • Decoding is done by the client with their own keys, so it doesn't matter if the devices storing the data are taken or lost, even if the storage media is unencrypted.
  • Digital obsolescence
    • All project code, dependencies, and programs will be vendored into a tarball and made available in Carbonado format with every release.
  • Bit rot and cosmic rays
    • As a final encoding step, forward error correction codes are added using zfec, to augment the ones already used in some filesystems and storage media.
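
A minimal sketch of the slice-verification idea above, assuming the bao crate: encode the data, keep the hash locally, hand the encoded copy to a peer, then challenge the peer for a 1KB slice and verify it against the local hash. The payload and slice offset here are arbitrary; a real challenge would pick the offset at random each round.

```rust
use std::io::{Cursor, Read};

use bao::{decode::SliceDecoder, encode::SliceExtractor};

fn main() -> std::io::Result<()> {
    let data = vec![42u8; 16 * 1024]; // example payload
    // Encode once; keep `hash` locally and hand `encoded` to a remote peer.
    let (encoded, hash) = bao::encode::encode(&data);

    // Challenge: ask the peer for one 1KB slice at a (normally random) offset.
    let (slice_start, slice_len) = (4096, 1024);
    let mut extractor = SliceExtractor::new(Cursor::new(&encoded), slice_start, slice_len);
    let mut slice = Vec::new();
    extractor.read_to_end(&mut slice)?;

    // Verify the returned slice against the local hash; a tampered or missing
    // slice fails to decode here.
    let mut decoder = SliceDecoder::new(Cursor::new(&slice), &hash, slice_start, slice_len);
    let mut verified = Vec::new();
    decoder.read_to_end(&mut verified)?;
    println!("verified {} bytes of the remote copy", verified.len());
    Ok(())
}
```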

All of this is achieved without needing a blockchain; however, blockchains can be useful for periodically checkpointing data in a durable place.
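
And a minimal sketch of the at-rest encryption layer, assuming the ecies crate (ECIES over secp256k1, as referenced above); deriving the key pair from a mnemonic is out of scope here.

```rust
use ecies::{decrypt, encrypt, utils::generate_keypair};

fn main() {
    // Hypothetical key pair; in practice the key is provided or mnemonic-derived.
    let (sk, pk) = generate_keypair();
    let plaintext = b"consensus-critical payload";

    // Encrypt to the secp256k1 public key; only the secret key holder can decode.
    let ciphertext = encrypt(&pk.serialize(), plaintext).expect("encryption failed");
    let recovered = decrypt(&sk.serialize(), &ciphertext).expect("decryption failed");
    assert_eq!(&recovered[..], &plaintext[..]);
}
```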

Checkpoints

Carbonado supports an optional Bitcoin-compatible HD wallet with a specific derivation path that can be used to secure timestamped Carbonado Checkpoints using an on-chain OP_RETURN.

Checkpoints are structured, human-readable YAML files that can be used to reference other Carbonado-encoded files. They can also include an index of all the places the file has been stored, so that multiple locations on the internet can be checked for the presence of Carbonado-encoded data for that hash.
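
As an illustration only, a checkpoint could be modeled as roughly the following serde-serializable shape; the field names are hypothetical, not the actual Carbonado schema.

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical checkpoint shape; field names are illustrative only.
#[derive(Serialize, Deserialize)]
struct Checkpoint {
    /// Bao/BLAKE3 hash of the referenced Carbonado-encoded file (hex).
    hash: String,
    /// Known locations where the encoded file can be retrieved.
    locations: Vec<String>,
    /// Txid of the on-chain OP_RETURN that timestamps this checkpoint, if any.
    anchor_txid: Option<String>,
}

fn main() -> Result<(), serde_yaml::Error> {
    let checkpoint = Checkpoint {
        hash: "replace-with-bao-hash-hex".into(),
        locations: vec!["https://example.com/carbonado/store".into()],
        anchor_txid: None,
    };
    println!("{}", serde_yaml::to_string(&checkpoint)?);
    Ok(())
}
```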

Applications

Contracts

RGB contract consignments must be encoded in a consensus-critical manner that is also resistant to data loss; otherwise, they cannot be imported or spent.

Content

Carbonado includes metadata for MIME type and preview content, which is useful for NFTs and UDAs, especially for taking full possession of data and self-hosting it, or for paying peers to keep it safe, remotely.

Code

Code, dependencies, and programs can be vendored and preserved wherever they are needed. This helps ensure data is accessible, even if there's no longer internet access, or package managers are offline.

Comparisons

Ethereum

On Ethereum, all contract code is replicated by nodes for all addresses at all times. This results in scalability problems, is prohibitively expensive for larger amounts of data, and exposes all data for all contract users, in addition to the possibility that it can be altered for all users, at any time, without their involvement.

Carbonado is specifically designed for encoding RGB contracts, which are to be kept off-chain, encrypted, and safe.

IPFS

IPFS stores data in a database called BadgerDS, encoded in IPLD formats; this is not the same as a simple, portable flat file format that can be transferred and stored out-of-band of any server, service, or node.

Filecoin

Carbonado uses Bao stream verification based on the performant Blake3 hash algorithm, to establish a statistical proof of replication (which can be proven repeatedly over time). Filecoin instead uses zk-SNARKs, which are notoriously computationally expensive, often recommending GPU acceleration. In addition, Filecoin requires a blockchain, whereas Carbonado does not.

Storm

Storm is great, but it has a file size limit of 16MB, and while files can be split into chunks, they're stored directly in an embedded database, and not in flat files. Ideally, Carbonado would be used in conjunction with Storm.

Error correction

Several decisions were made in how error correction is handled. The chunking forward error correction algorithm used is Zfec, the same algorithm used in Tahoe-LAFS. Similar to how RAID 5 and 6 stripe parity bits across a storage array, Zfec encodes data so that only k valid chunks out of m total are needed to reconstruct the original.

This is complicated by the fact that Zfec has no integrity checks built in. Bao is used to verify the integrity of the decoded input, but if the integrity check fails, we can't be sure which chunk failed. There are two ways to handle this: either create a hash for each chunk and persist it in a safe place out-of-band, or try each combination of chunks until one is found that works. The latter approach is used here, since the need for scrubbing should be relatively rare, especially if reliable storage media is used, a CoW filesystem is set to scrub for bit rot, or a complete good copy exists elsewhere. However, if you're down to your last copy and all you have is the hash (the name of the file) and some good chunks, the scrub method in this crate should help, even if it can be computationally intensive.
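
A rough sketch of that combination search, with the Zfec decode step abstracted behind a hypothetical callback; the only thing needed to recognize a good combination is the BLAKE3/Bao hash kept out-of-band (the file name).

```rust
use itertools::Itertools;

/// Try k-sized combinations of the surviving chunks until one decodes to bytes
/// that hash back to the expected value. `decode_k_of_m` stands in for the
/// Zfec decoder and is hypothetical here.
fn scrub_by_search<F>(
    chunks: &[(usize, Vec<u8>)], // (chunk index, chunk bytes) pairs
    k: usize,                    // chunks required (4 in Carbonado)
    expected_hash: blake3::Hash, // hash kept out-of-band
    decode_k_of_m: F,
) -> Option<Vec<u8>>
where
    F: Fn(&[(usize, Vec<u8>)]) -> Option<Vec<u8>>,
{
    for combo in chunks.iter().cloned().combinations(k) {
        if let Some(decoded) = decode_k_of_m(&combo) {
            // Only accept a combination whose decoded bytes verify.
            if blake3::hash(&decoded) == expected_hash {
                return Some(decoded);
            }
        }
    }
    None
}
```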

Running scrub on an input that has no errors actually returns an error; this prevents unnecessary writes of bytes that don't need to be scrubbed, which is useful in append-only datastores and metered cloud storage scenarios.

The values 4/8 were chosen for Zfec's k-of-m parameters, meaning only 4 valid chunks are needed out of the 8 provided, so up to half of the chunks can fail to decode. This doubles the size of the data, on top of the encryption and integrity-checking overhead, but such is the price of paranoia. Also, a non-prime k is needed to align chunk size with Bao slice size.

Bao only supports a fixed chunk size of 1KB, so the smallest a Carbonado file can be is 8KB.

Comments
  • Double-encryption

    Bytes can be encrypted with your own public key, but a second layer could also be added to encrypt for a storage provider. This will result in a separate hash and improve proof of replication, since it reduces the risk of de-duplication (multiple providers colluding to keep data in one place and then relaying slice challenges).

  • Web Storage Provider

    A web storage provider will have a private key in a configuration file, and will use it, along with the public key the file is signed with, to encrypt the file locally. All Carbonado files must be either signed or encrypted.

    It will also store chunks in 8 separate folders, which are recommended to be moved to separate storage volume arrays.

    This makes #11 obsolete because for private files, the key is simply not shared. If a storage provider is told to store a file that's not encrypted, it checks the signature and creates an ECDH key that encrypts the file using a shared secret. If the storage provider is paid to, it will provide the content.

    This will also need to support key blacklisting and whitelisting. Whitelisting will be useful for storage providers who only want to support specific users, and blacklisting is useful for if someone is trying to share bad files using the same key across different storage providers.
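
    A rough sketch of the ECDH step described above, assuming the rust-secp256k1 crate; how the resulting shared secret is fed into a symmetric cipher is left out.

    ```rust
    use rand::rngs::OsRng;
    use secp256k1::{ecdh::SharedSecret, Secp256k1};

    fn main() {
        let secp = Secp256k1::new();
        // Hypothetical keys: the provider's key pair and the uploader's key pair.
        let (provider_sk, provider_pk) = secp.generate_keypair(&mut OsRng);
        let (uploader_sk, uploader_pk) = secp.generate_keypair(&mut OsRng);

        // Both sides derive the same 32-byte secret without exchanging private
        // keys; that secret can then key a symmetric cipher for the stored file.
        let provider_side = SharedSecret::new(&uploader_pk, &provider_sk);
        let uploader_side = SharedSecret::new(&provider_pk, &uploader_sk);
        assert_eq!(provider_side.secret_bytes(), uploader_side.secret_bytes());
    }
    ```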

  • Content Addressability and File Segmenting

    Files over 16MB will be segmented in order to improve computational parallelization and to support streaming very large files.

    Segments are different from chunks in that there will always need to be 4/8 chunks, but there can be many segment increments of 16MB.

    In order to support parallelization, a content catalog is needed in order to refer to the original content that was encoded. This content catalog will be storage frontend-specific. For BitTorrent it'll be a SHA-2 hash, for IPFS it'll be a Blake2b Multihash, and for the HTTP frontend, it'll use a Blake3 hash. In all cases, the client is encouraged to hash the contents received once-over in order to verify it has indeed received the correct data. Content catalogs will be Carbonado-encoded on-disk, with optional encryption in order to preserve privacy at-rest.

    For each frontend supported, a YAML file is used to simplify inspection, and it will contain a list of segments indexed by the Bao hash used to encode them. Additional metadata can also be included such as offset and index within the file to align the contents with IPLD DAGs or BitTorrent chunks. For the rsync frontend, original file metadata can be stored, and the rsync frontend indexes files by a hash of their path. Blake3 hashes will be keyed using the file's public key in order to improve privacy by breaking authoritative content hash tables (such as a sort of Rainbow table used to index files known by state actors).
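
    For the keyed-hash idea, the blake3 crate's keyed mode could be used directly; the sketch below assumes the 32-byte key is derived by hashing the file's public key, which is an illustrative choice rather than the project's actual scheme.

    ```rust
    /// Key the content hash with (a digest of) the file's public key, so the
    /// same content yields different lookup hashes under different keys.
    fn keyed_content_hash(pubkey_bytes: &[u8], content: &[u8]) -> blake3::Hash {
        // Reduce the public key to the 32-byte key that keyed mode expects.
        let key = blake3::hash(pubkey_bytes);
        blake3::keyed_hash(key.as_bytes(), content)
    }
    ```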

  • Geographic Redundancy

    Octants are used to better ensure geographic distribution of data. This is volunteered data, and it won't have as much of an effect on geographic arbitrage if default storage modes are tolerant of some measure of adjacency. By using an octant system, it's easier to select a different storage region to avoid putting all replicas in the same region, without necessarily needing to resort to trigonometry.

    (Diagram: The Carbonado Octants System.) Although O4 is quite sparse, if there are no providers available within this region (or others), an adjacent octant will be chosen.

  • Storage markets

    A market node is used for routing; LN channels are opened to the market node by both storage clients and storage providers. Storage prices rise on a per-node basis; this encourages a more even distribution of load and incentivizes adding storage to the network in a competitive market as prices rise with demand. For instance, price should be a function of supply, increasing exponentially as a storage allocation approaches 100%.

    Storage allocation can be verified by the storage provider node. Speed tests can be used between the storage provider and the storage market. Storage markets can relay data between multiple storage providers, improving replication, with the replication factor managed by the storage market node itself; alternatively, replication can be performed peer-to-peer between storage clients and individual storage providers. The primary function of a market node is to set a market price. Otherwise, fully P2P operation can occur at a fixed price.
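
    For illustration only, an exponential price curve of that kind might look like the following; the constants and function name are hypothetical.

    ```rust
    /// Illustrative only: price per GB rises exponentially as a provider's
    /// allocation (fraction of capacity in use) approaches 100%.
    fn price_per_gb(base_price: f64, allocation: f64, steepness: f64) -> f64 {
        base_price * (steepness * allocation.clamp(0.0, 1.0)).exp()
    }
    ```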
