A diff-based data management language for implementing unlimited undo, auto-save for games, and cloud apps that need to retain every change.

Overview

crates.io link

Docchi is a diff-based data management language for implementing unlimited undo, auto-save for games, and cloud apps that need to save very often.

Docchi is a language, so the API documentation is not a good place to learn it. You may want to read the User's Manual instead.

  • Demonstration

Demo

Test Data

{
  "data0": "oiufsjdsj...", //1 MB of random string
  "data1": "abuisoehg...", //1 MB of random string
  //...
  "data9": "bhsiofdis...", //1 MB of random string
}

Pseudo-Code

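// Repeat 100 times: modify one of the ten strings, then save the whole JSON.
for i in 0..100 {
    modify_one_string(json.get_data_mut(rand.gen_range(0..10)));
    let s = serde_json::to_string(&json)?;
    let mut file = std::fs::File::create(format!("{}/d{}.json", json_dir, i))?;
    file.write_all(s.as_bytes())?; // needs std::io::Write in scope
}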

The JSON has ten random 1 MB strings, so the entire JSON file is about 10 MB.

Each iteration modified one string and saved the whole data as a JSON file. This was repeated 100 times, and the files totaled about 1 GB.

This means 10 % of the data is modified each time. Data tends to be modified partially, so I think 10 % is not an unusual setting.

Equivalent Docchi data was created, modified, and saved 100 times.

Docchi saves only a "diff". In the best case, only the 1 MB of modified data is saved.

The results are below.

JSON
sum of file sizes 1021435957
1906 milliseconds

Docchi
sum of file sizes 173005171
604 milliseconds

JSON saved about 1 GB of data and took 1906 milliseconds.

Docchi saved about 173 MB of data and took 604 milliseconds.

Docchi took 17 % of the storage space and about one-third of the time.
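Both ratios follow directly from the measured numbers. A quick check in Rust (the function names `ratio_percent` and `docchi_vs_json` are mine, used only for illustration):

```rust
/// Returns `part` as a percentage of `whole`.
fn ratio_percent(part: f64, whole: f64) -> f64 {
    part / whole * 100.0
}

/// Storage and time used by Docchi relative to JSON, in percent,
/// computed from the measured values reported above.
fn docchi_vs_json() -> (f64, f64) {
    let storage = ratio_percent(173_005_171.0, 1_021_435_957.0); // bytes
    let time = ratio_percent(604.0, 1_906.0); // milliseconds
    (storage, time)
}
```

The storage ratio comes out just under 17 % and the time ratio just under 32 %, which is where "17 %" and "about one-third" come from.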

Very nice?

Docchi saved only 17 % of the data, so of course it's faster.

For comparison, we shortened the JSON strings to 17 % of their original length and ran the demo again.

JSON(short)
sum of file sizes 173570901
338 milliseconds

The file size is about the same, and JSON was nearly twice as fast as Docchi.

Serde is very fast, so the result is understandable.

But I think Docchi's overhead is reasonable, and Docchi can save in a non-blocking manner, so the saving time may not hurt the user experience.
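To illustrate what "non-blocking" means here, below is a minimal sketch using only the standard library. It is not Docchi's API: the `Doc` type and `save_in_background` are hypothetical names, and the "file" is just a byte buffer. The idea is that the caller takes a cheap snapshot and a background thread does the slow work.

```rust
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical document type: several large strings, each behind its own Arc.
type Doc = Vec<Arc<String>>;

/// Takes a snapshot of the document (only reference counts are bumped) and
/// serializes it on a background thread, so the caller can keep editing.
fn save_in_background(doc: &Doc) -> JoinHandle<Vec<u8>> {
    let snapshot: Doc = doc.clone(); // clones the Arcs, not the strings
    thread::spawn(move || {
        let mut out = Vec::new();
        for s in &snapshot {
            out.extend_from_slice(s.as_bytes());
            out.push(b'\n');
        }
        out // a real implementation would write this to a file
    })
}
```

The caller can modify `doc` right after `save_in_background` returns. The snapshot keeps the old strings alive, so the background thread always sees the state at the moment of the save.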

*Load Demo

Loading is where Docchi pays the cost. Docchi creates a "Diff Hierarchy".

Diff Hierarchy Concept

Diff0(10 MB) - Diff00(1 MB) - Diff000(1 MB)
                                   │
                              Diff001(1 MB)
                                   │
                              Diff002(1 MB)
             - Diff01(5 MB) - Diff010(1 MB)
                                   │
                              Diff011(1 MB)
                                   │
                              Diff012(1 MB)
                                  ...
             - Diff02(10 MB)- Diff020(1 MB)
                                   │
                              Diff021(1 MB)
                                  ...
                  ...
 Diff1(10 MB)
   ... 

To load Docchi's diffs, we must load files hierarchically from top to bottom, and apply diffs repeatedly.

We used Docchi's default settings, with which loading one piece of data takes at most 13 files.

The total size of the files to load can be up to 4 times the size of the biggest diff file (10 MB in this case), so up to 40 MB.
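Applying diffs from top to bottom can be sketched as follows. This is only an illustration of the idea, not Docchi's file format: the `Diff` struct and `load_from_diffs` are hypothetical, and a diff is assumed to be a list of changed slots with their new values.

```rust
/// Hypothetical diff: the slots that changed, with their new values.
struct Diff {
    changes: Vec<(usize, String)>,
}

/// Rebuilds the data by applying diffs in order, from the top of the
/// hierarchy down to the deepest file. Later diffs overwrite earlier ones.
fn load_from_diffs(slot_count: usize, chain: &[Diff]) -> Vec<String> {
    let mut data = vec![String::new(); slot_count];
    for diff in chain {
        for (index, value) in &diff.changes {
            data[*index] = value.clone();
        }
    }
    data
}
```

The first diff in the chain plays the role of a full file such as Diff0, and every later diff only touches the slots it changed.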

We picked the deepest file in the hierarchy and loaded the data from it.

Docchi 
40 milliseconds

JSON
94 milliseconds

JSON(Short)
16 milliseconds

Docchi loads 4 times as much data in total as JSON, yet it was more than twice as fast.

Docchi is a binary data format and its loading is efficiently multi-threaded, which I think is why it was able to beat Serde.

*How can it be done?

Constructing a diff is inherently a very costly process, but Rust's Arc (atomically reference-counted pointer) makes it very easy.

Docchi's data is cloned on saving, which makes non-blocking saving possible. Docchi's data consists of Arcs, so the cloning is nearly instant.

With Arc::make_mut, the inner data is actually copied only when two different Arcs point to the same object and one of them is modified. After the modification, the two Arcs point to different objects, so comparing the two pointers is enough to tell whether the data was modified. Comparing the actual values is not necessary.

The copy also happens only for the part that is actually modified, so copying everything is unnecessary. Rust's Arc is really magical.

To construct a diff, we compare the current object with the object cloned at the last save, and we compare only pointers. It's a very fast process.
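Here is a minimal sketch of this technique using only `std::sync::Arc`. The `Doc` type and the two functions are hypothetical and much simpler than Docchi's real data structures. They only show how `Arc::make_mut` and `Arc::ptr_eq` work together.

```rust
use std::sync::Arc;

/// Hypothetical document: each field lives behind its own Arc.
type Doc = Vec<Arc<String>>;

/// Appends text to one slot. `Arc::make_mut` copies the string only if
/// another Arc (for example the saved snapshot) still shares it.
fn append_to_slot(doc: &mut Doc, index: usize, text: &str) {
    Arc::make_mut(&mut doc[index]).push_str(text);
}

/// Returns the indices whose Arc no longer points to the same allocation
/// as in the snapshot taken at the last save. No string contents are compared.
fn changed_slots(saved: &Doc, current: &Doc) -> Vec<usize> {
    let mut changed = Vec::new();
    for (index, (old, new)) in saved.iter().zip(current.iter()).enumerate() {
        if !Arc::ptr_eq(old, new) {
            changed.push(index);
        }
    }
    changed
}
```

Taking `let saved = doc.clone();` at save time and calling `changed_slots(&saved, &doc)` later yields exactly the slots that were passed to `append_to_slot` in between. The sketch assumes both documents have the same number of slots.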

On the other hand, we didn't do anything special for loading. If it's fast, the credit goes to Rayon.
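For readers unfamiliar with this style of parallelism, the sketch below shows the general idea with `std::thread::scope` instead of Rayon, so that it has no external dependencies. It is not Docchi's loading code: `decode_all` is a hypothetical function, and "decoding" is just a UTF-8 conversion standing in for real parsing.

```rust
use std::thread;

/// Decodes the bytes of every diff file on its own thread, then returns the
/// results in the original order so they can be applied from top to bottom.
fn decode_all(files: &[Vec<u8>]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = files
            .iter()
            .map(|bytes| scope.spawn(move || String::from_utf8_lossy(bytes).into_owned()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect()
    })
}
```

The decoding runs in parallel, but the results keep their order, which matters because diffs must be applied in sequence.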

License

Licensed under either of

at your discretion.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

I'm eagerly looking for someone who can

  • correct my English (I'm not an English speaker)
  • criticize my code/API

Owner
juzy
I'm an English learner. I love Rust. The pic is my cat. Any feedback is welcome!