Run if inputs have changed, otherwise use cache

Boost

Why Boost?

Our planet is burning, and everywhere I look I see CI pipelines repeating work that has already been done.

Tools such as TurboRepo, Nx and Bazel are amazing for working out what needs to be done and only doing each thing once, reusing outputs from previous runs where they can.

We should use these tools wherever possible because everyone wins — we get faster feedback and the planet thanks us because the energy required to recompute the outputs is not used.

These tools are big, though, and in some cases require an all-in commitment (Bazel, as brilliant as it is, takes over your native toolchain). They can also be quite ecosystem-specific (TurboRepo does work in a Rust monorepo, but you have to include package.json files everywhere, which doesn't spark joy). All of them seem to be designed to work in a monorepo (I love a monorepo, by the way).

It struck me that what we need is a "small, sharp tool" that "does one thing, and does it well", as per the Unix philosophy. Hence Boost. Boost runs a task only if any of the declared inputs have changed since a previous run; otherwise it restores the outputs from that previous run.

That's all it does. So hopefully you can use it easily, in more places, and without (yet) going all-in on a bigger tool.

Great for CI pipelines

Wrapping tasks in your CI/CD pipelines with boost will always remain simple, easy, and non-intrusive. It will only work, though, for "pure" tasks, i.e. deterministic tasks that will always produce the same outputs given the same inputs. In our opinion all pipelines should be like this anyway.

If there is anything non-deterministic in your pipeline (e.g. a potentially variable environment), you can capture it with invariants in the task's config, so that it becomes part of the cache key. The env_vars and input.files config sections are essentially convenient specializations of invariants: anything they express could also be written as an invariant, with the same result. For example, invariants = ["echo $TEST"] has the same effect as env_vars = ["TEST"].
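
As a sketch of that equivalence, the fragment below shows both ways of making a task depend on the TEST environment variable. The final uname -sm invariant is an illustrative assumption rather than something from Boost itself: it shows how a variable environment (here the OS and CPU architecture) might be captured.

```toml
[input]
# Declared directly as an environment variable...
env_vars = ["TEST"]

# ...or, with the same effect, as an invariant command (use one form, not both):
# invariants = ["echo $TEST"]

# Assumption for illustration: tie the cache to the OS and CPU architecture
# of the machine running the task.
# invariants = ["uname -sm"]
```

Either form should cause a fresh run when the value of TEST changes. Invariants are the more general mechanism, so they are the place to put anything that env_vars and input.files cannot describe.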

Installing Boost

For now, install straight from the Git repository with cargo:

cargo install --git https://github.com/StuartHarris/boost.git

[Screenshot: help]

Soon, we'll publish to crates.io and support Homebrew etc.

Configuring Boost

Each task needs a simple TOML config file. For example, create a file called build.toml and run it with boost build.

You can list the tasks that have configuration files in the current directory by running boost (with no arguments).

[Screenshot: list]

You can run multiple tasks, e.g. boost build test.

Here is a file to build boost itself (build.toml).

"We want boost to build boost by running the shell script build.sh, but only if we haven't already cached the specified outputs (in the dist folder) that were generated from the specified inputs. If we have a matching cache entry, we'll restore the outputs to the same place instead."

description = "Build boost"
run = "./build.sh"

[input]
invariants = ["rustc -vV"]
env_vars = ["TEST"]

# these are the defaults, so you could miss this out if you want to
[[input.files]]
root = "."
filters = ["*"]

[[output.files]]
root = "dist"
filters = ["dist/boost"]

When calculating the hash for the cache key, we visit every file under the root that matches any of the filters (globs) and hash its contents.

We also run the specified invariants (commands) and hash their outputs (currently stdout). So for instance if you want to make sure your cache is no older than a day, you could specify an invariant of date +%y-%m-%d.
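
As a minimal sketch of the daily-expiry idea, assuming a POSIX date command is available wherever the task runs, the [input] section might look like this. The rustc -vV entry is carried over from build.toml; the weekly variant in the trailing comment is an illustrative assumption.

```toml
[input]
invariants = [
  "rustc -vV",        # toolchain version, as in build.toml
  "date +%y-%m-%d",   # output changes once a day, so cached results expire daily
]

# A weekly variant (illustrative): "date +%Y-%U" changes once a week.
```

Note that the output of date changes at local midnight, so this gives a cache per calendar day rather than a rolling 24-hour window.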

We also hash the specified environment variables and the tasks' config files.

A change in any of these inputs will result in a new run.

In this screenshot, we can see that the compilation happened the first time, but not the second time.

[Screenshot: example]

For another example, see test.toml.

description = "Test boost"
run = "cargo test"

[input]
invariants = ["rustc -vV"]

[[input.files]]
root = "."
filters = ["./src/**", "./Cargo.*"]

This is for running unit tests, so we don't need an [output] section.

[Screenshot: example]

Coming soon

  • imports, e.g. dependsOn (include the hashes from dependent boosts in the current boost’s hash)
  • green credentials
    • aggregate time saved and other stats (e.g. at cache root)
    • ask to share these (anonymously and publicly)
    • link to e.g. "Principles of Green Software Engineering"
    • add a section to the readme with examples of potential carbon cost savings
  • remote cache
    • S3 compatible (can use e.g. Garage for on-prem)
    • output e.g. "found cache in eu-west-1 from 3 days ago"
    • atomic writes, optimistic concurrency
  • add ignore filter to input.files
  • cache cleanup, e.g. rolling removal of old cache items
  • configurable options
Comments
  • Command line updates

    • [x] boost on its own lists available tasks. It tries to parse all the TOML files in the current directory, returning the stems (of valid files) as task names.
    • [x] boost build runs the build task (from build.toml).
    • [x] boost build test runs the build task (from build.toml) and the test task (from test.toml).
    • [x] we no longer hash the command line arguments (as this causes cache misses for no reason).
    opened by StuartHarris
  • "depends_on" for task hierarchies (WIP)

    WIP: Introduces the concept of having boost be dependent on child boosts, with their hashes contributing to the parent's hash.

    NOTE: this PR may never be merged in full. I am using it as an experiment to see how best to orchestrate a schedule of async tasks. It currently uses Bevy ECS to represent the hierarchy and manage tasks — this may well be overkill, but a) it's fun and b) might actually be quite ergonomic, let's see :-)

    opened by StuartHarris
Owner
Stuart Harris