CoreOS Disk Image Rehydrator

Overview

Part of implementing https://github.com/coreos/fedora-coreos-tracker/issues/828, which is in turn part of https://github.com/openshift/enhancements/pull/201.

In CoreOS we ship a lot of disk images, one for each platform. These are almost entirely identical: each differs mainly in the ignition.platform.id stamped into the disk image, plus a hypervisor/platform-specific wrapper format.

For bare metal, we additionally ship both a "split out" PXE setup and an ISO, which are again largely the same content.

Today, users access this via stream metadata as documented here: https://docs.fedoraproject.org/en-US/fedora-coreos/stream-metadata/

The goal of this project is to make our build and delivery pipeline more "container native" by creating a container image that de-duplicates all of those images, and can generate them on-demand.

Specific goals:

  • Make offline mirroring much easier because an administrator can just mirror this container image along with the other containers they want
  • Orient our release engineering to be more "container native" in general
  • Ensure this container image is signed, which in turn means we have a single signature covering all of our disk images.

Try it now!

An image is uploaded to quay.io/cgwalters/fcos-images:v0.1.1 (also tagged stable). For example here we extract the OpenStack image:

$ podman run --rm -i quay.io/cgwalters/fcos-images:v0.1.1 rehydrate - --disk openstack > fedora-coreos-openstack.x86_64.qcow2

And now you can e.g. upload this image with glance.

There are more artifacts; for example, use --iso to get the bare-metal live ISO.

We're using - to output to stdout, because it's more convenient than dealing with podman bind mounts. You can also use e.g. podman run --rm -i -v .:/out:Z quay.io/cgwalters/fcos-images:v0.1.1 rehydrate /out --disk openstack to write to a directory that was bind mounted from the host.

When extracting multiple artifacts to stdout (e.g. --iso --disk qemu to get both the ISO and the qemu qcow2, or --pxe), the output stream will be a tarball, which you can extract by piping to tar xf -.

Use --help to see the other commands. Notice that in the invocation above we chose the output filename ourselves, and we don't know the version number. That information can currently be retrieved via the print-stream-json command, which outputs the stream metadata stored in the image.
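
As an illustration (assuming jq is available), the version can be pulled out of the stream metadata like this. The JSON below is a hand-trimmed sample following the published stream-metadata shape, with placeholder values; it stands in for the much larger real output of print-stream-json:

```shell
# Stand-in for: podman run --rm quay.io/cgwalters/fcos-images:v0.1.1 print-stream-json
cat > stream.json <<'EOF'
{
  "architectures": {
    "x86_64": {
      "artifacts": {
        "openstack": {
          "release": "34.20210427.3.0",
          "formats": {
            "qcow2.xz": {
              "disk": {
                "location": "https://example.invalid/fedora-coreos-34.20210427.3.0-openstack.x86_64.qcow2.xz",
                "uncompressed-sha256": "0000000000000000000000000000000000000000000000000000000000000000"
              }
            }
          }
        }
      }
    }
  }
}
EOF
# The release version for the openstack artifact:
jq -r '.architectures.x86_64.artifacts.openstack.release' stream.json
```

The same query against the real print-stream-json output also yields the canonical filename and the uncompressed-sha256 to verify against.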

As of right now, for Fedora CoreOS stable the original (compressed) images total 8.34GiB, while the container image is 1.77GiB: a savings of nearly 80%, which isn't bad. Most importantly, adding a new platform will only incur a space hit of ~20MiB rather than ~700MiB. But there's more we can do here; see the issues list for details on ideas.

We also aren't including all the images yet; e.g. VMware is doable but needs some OVA handling. The more images we include here, the better the overall compression ratio will look.

Goal: Bit-for-bit uncompressed SHA-256 match

Our CI tests the disk images. For example, we have extensive tests that exercise the live .iso, etc.

In order to ensure that we re-generate exactly what we tested, the uncompressed SHA-256 checksum of a rehydrated image should ideally match that of the original.

However, this is not currently achieved for all formats.

Image differences

The -openstack.qcow2 and the -qemu.qcow2 differ only in the ignition.platform.id in the boot partition, which is written by https://github.com/coreos/coreos-assembler/blob/master/src/gf-platformid

However, the way we replace this also causes e.g. filesystem metadata (extents, timestamps) to change.

Additionally, on s390x we need to rerun zipl, which changes yet another bit of data.

Compression

We need to get this out of the way: compression is very hard to reproduce bit-for-bit. See https://manpages.debian.org/unstable/pristine-tar/pristine-tar.1.en.html

Our initial goal will be to generate uncompressed images and verify that the uncompressed SHA-256 matches. That's all we need to be sure we've generated the same thing; we don't need to replicate the compression exactly, since we can trust our compression tools.

Hence, while we ship e.g. -qemu.qcow2.xz (or .gz for RHCOS currently), we will primarily generate e.g. -qemu.qcow2, and for callers that want compression, we may pass the output to gzip -1 or similar.
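
The principle is easy to demonstrate with plain gzip: different compression levels produce different compressed bytes, yet both decompress to content with the identical SHA-256. A minimal sketch:

```shell
set -eu
tmp=$(mktemp -d)
seq 1 200000 > "$tmp/disk.img"            # stand-in for an uncompressed disk image
orig=$(sha256sum "$tmp/disk.img" | cut -d' ' -f1)

gzip -1 -c "$tmp/disk.img" > "$tmp/a.gz"  # fast compression
gzip -9 -c "$tmp/disk.img" > "$tmp/b.gz"  # best compression

# The compressed streams are different bytes...
cmp -s "$tmp/a.gz" "$tmp/b.gz" || echo "compressed outputs differ"

# ...but both decompress to content with the same SHA-256:
[ "$(gzip -dc "$tmp/a.gz" | sha256sum | cut -d' ' -f1)" = "$orig" ] && echo "uncompressed SHA-256 matches"
rm -rf "$tmp"
```

This is why verifying the uncompressed checksum is sufficient, and why we don't need to reproduce any particular compressed stream.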

First approach: Add the -qemu.qcow2 image and use rsync to regenerate most images

The lowest common denominator for de-duplicating these is rsync-style rolling checksums. This approach is also used by ostree's "baseline" deltas, although ostree can also use bsdiff.
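
As a sketch of the idea (this tool's actual delta mechanism may differ), librsync's rdiff command-line tool implements exactly this kind of rolling-checksum delta. The files here are fabricated stand-ins for e.g. the qemu and openstack images:

```shell
# Two mostly-identical files, standing in for two platform disk images.
seq 1 100000 > basis.img
{ seq 1 99999; echo "platform=openstack"; } > target.img

rdiff signature basis.img basis.sig        # rolling checksums of the basis
rdiff delta basis.sig target.img t.delta   # small delta: target vs. basis
rdiff patch basis.img t.delta rebuilt.img  # regenerate the target

cmp rebuilt.img target.img && echo "bit-for-bit match"
```

The delta stays small because the rolling checksum finds the shared runs of data regardless of their offsets, which is exactly the property we need when images differ only in small localized stamps.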

Other approach: Reuse oscontainer content

In the RHCOS case, we need a container image separate from machine-os-content, because today any change to machine-os-content causes nodes on all platforms to update. We don't want to force every machine to reboot just because we needed to respin the vSphere OVA!

However, what may work well is to have our image do FROM quay.io/openshift/machine-os-content@sha256:...

Today the RHCOS oscontainer is an "archive" mode repo (each file individually compressed). In the future, with ostree-ext containers, we'll have an uncompressed repo, which will be easier to use as a deduplication source.

Either way, what may work is to generate a delta from the "oscontainer stream" (e.g. a tarball) to the metal/qcow2 image, and then deltas from that to the other images.

Related: osmet?

https://github.com/coreos/coreos-installer/blob/master/docs/osmet.md

We could be much smarter about our deltas with an osmet-style approach. We even ship the ISO, which contains the osmet files for the metal images.

So a simple approach could be to ship the ISO as the basis, launch it in qemu to have it generate the metal image, and then use the metal image as an rsync-style rolling basis for everything else.

See below for more on the ISO.

VMDK

AWS and vSphere use "VMDK" images, which have internal compression. cosa today uses an invocation like this:

$ qemu-img convert -O vmdk -f qcow2 -o adapter_type=lsilogic,subformat=streamOptimized,compat6 fcos-qemu.qcow2 fcos-aws.vmdk

The streamOptimized subformat is what turns on internal compression. Further, there's a bit of random data generated during this process: https://github.com/qemu/qemu/blob/266469947161aa10b1d36843580d369d5aa38589/block/vmdk.c#L2519 (which will be handled by the rsync-style delta).

For now, let's hardcode the qemu options for these two formats here again. Longer term, perhaps we fork off qemu-img info to try to gather this, or change coreos-assembler to record the qemu-img options used to generate each image.
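
The "fork off qemu-img info" idea can be sketched as follows, assuming qemu-img is installed (filenames are illustrative). Rather than the shipped -aws.vmdk, we create a small VMDK locally; the format-specific create-type field reports the subformat back:

```shell
# Create a small streamOptimized VMDK; in practice you'd inspect the
# shipped -aws.vmdk artifact instead.
qemu-img create -f vmdk -o subformat=streamOptimized disk.vmdk 1M

# The JSON output includes the subformat under format-specific data:
qemu-img info --output=json disk.vmdk | grep -o '"create-type": "[^"]*"'
```

Parsing this at rehydration time would remove the need to hardcode the convert options per format.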

ISO

The structure of the ISO is mostly a wrapper around images/pxeboot/rootfs.img, which is a CPIO blob containing a squashfs plus the osmet glue.

Reproducing the squashfs bit-for-bit may require using a fork of the squashfs tooling.
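
For instance, once rootfs.img has been pulled out of the ISO, its members can be listed with cpio. The sketch below builds a toy newc-format cpio archive as a stand-in (the real rootfs.img is much larger and may be wrapped differently):

```shell
# Toy stand-in for images/pxeboot/rootfs.img: a newc-format cpio archive
# containing a placeholder squashfs.
mkdir -p root
echo "squashfs placeholder" > root/root.squashfs
(cd root && find . | cpio -o -H newc) > rootfs.img 2>/dev/null

# Listing the members, as you would for the real rootfs.img:
cpio -it < rootfs.img 2>/dev/null
```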

What is "rehydration"?

A tip of the hat to https://en.wikipedia.org/wiki/The_Three-Body_Problem_(novel) where an alien species can "dehydrate" during times of crisis to a thinned-out version that can be stored, then "rehydrate" when it's over.

Comments
  • Parse Stream into "RiverDelta"

    We kept parsing the stream data over and over. Let's create an explicit structure that has the artifacts we want split up, plus an explicit list of unhandled ones.

  • Allow unhandled

    main: Remove unimplemented --skip-compress, for now. build: Error out if there are unhandled artifacts by default; now that we handle vmware, that's everything for FCOS, and we want to include everything by default.

  • Strengthen artifact parsing

    Dealing with metal involved so much nesting of Option<T>. A quick analysis shows that all platforms have iso and pxe, so make those non-optional. Further, this explicitly filters out the raw metal images, since osmet makes them mostly obsolete.

  • Implement webserver/interface

    It would make total sense to support being run as a webserver too, with a pretty interface. One uncertainty is whether the API should be a plain GET or require a POST to generate an image; generating some images is going to be sufficiently slow that clients may time out on a GET request.

  • root from an ostree-container

    Today we have two "roots": the squashfs (used to generate the ISO and PXE) and the qcow2. Once we productize https://github.com/ostreedev/ostree-rs-ext/#module-container-encapsulate-ostree-commits-in-ocidocker-images i.e. https://github.com/coreos/fedora-coreos-tracker/issues/812, our disk-image container could derive from it. This would offer a lot of advantages; we could use it instead of the qcow2 as one root. We can assume most people who want to mirror our disk images also want to mirror the OS updates, so we'd get deduplication via container image layering that way.