This repository is deprecated. We realized CoreOS is probably not a good fit for us. The repository will be kept up on the off chance that this is useful to someone else.
# strand
A Zincati (Fedora CoreOS) reboot lock backend that makes sure the software running on your nodes is actually healthy before releasing the lock. This is useful because it lets you run stateful workloads (like Ceph) on CoreOS while still taking advantage of auto-updates.
## Supported Strategies
All strategies consist of three parts.
- The "pre-reboot" conditions and actions. The conditions must be met, and the actions must be taken before the node is given the lock and allowed to reboot.
- The "post-reboot" conditions and actions. The conditions must be met before a certain timeout in order for the node to be considered healthy. The actions will be taken at some point before or after the conditions to do any neccesary cleanup.
- The "timeout" action. If the conditions in the "post-reboot" stage are not met in time, the timeout action will be taken, and the lock will be permanently locked until a human manually resolves the issue (by design).
Failures may cause actions in any given stage to get run multiple times. They are guaranteed to run at least once. As such, all actions should be idempotent.
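To make the split concrete, here is a minimal sketch of how a strategy could be modelled as a trait; the trait and method names are illustrative, not Strand's actual interface.

```rust
use std::time::Duration;

/// Hypothetical shape of a strategy, split into the three parts above.
trait Strategy {
    /// Check the pre-reboot conditions and take the pre-reboot actions.
    /// Only if this succeeds is the node given the lock and allowed to reboot.
    fn pre_reboot(&self) -> Result<(), String>;

    /// Wait (up to `timeout`) for the post-reboot conditions and take the
    /// cleanup actions. Success here means the node is considered healthy.
    fn post_reboot(&self, timeout: Duration) -> Result<(), String>;

    /// Run if the post-reboot conditions are not met in time; afterwards the
    /// lock stays held until a human resolves the issue.
    fn on_timeout(&self) -> Result<(), String>;
}
```

Because failed actions may be retried, an implementation of any of these methods should be safe to call more than once.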
The currently supported strategies, each split up into these three parts, are as follows.
- Kubernetes
  - Pre-reboot: drain + cordon the node
  - Post-reboot: wait for `node.status.conditions[type="Ready"].status == "True"` (see the sketch after this list)
  - Timeout: no action taken
- Ceph
  - Pre-reboot: wait for `cluster_status == Healthy`, set `noout` on OSDs that are about to be `down`
  - Post-reboot: wait for OSDs `up`, unset `noout`, wait for `cluster_status == Healthy`
  - Timeout: unset `noout` (causing data replication)
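As a rough illustration of the Kubernetes post-reboot check, the readiness condition could be read with the kube and k8s-openapi crates along these lines; the node name is a placeholder, and this is a sketch rather than Strand's actual code.

```rust
// Dependencies (assumed): kube, k8s-openapi, tokio, anyhow.
use k8s_openapi::api::core::v1::Node;
use kube::{Api, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let nodes: Api<Node> = Api::all(client);

    // "worker-0" is a placeholder for the node being checked.
    let node = nodes.get("worker-0").await?;

    // The post-reboot condition: node.status.conditions[type="Ready"].status == "True"
    let ready = node
        .status
        .and_then(|s| s.conditions)
        .unwrap_or_default()
        .iter()
        .any(|c| c.type_ == "Ready" && c.status == "True");

    println!("node ready: {ready}");
    Ok(())
}
```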
In the future, it might be a good idea to allow waiting for arbitrary Prometheus metrics as part of a strategy.
## Locking Design
Strand employs a Kubernetes-native locking design. It depends on the sequential consistency guarantees provided by etcd, and works as follows.
To obtain the lock...
1. Get the `Lease` object you've configured. If it doesn't exist, create it.
   - If some other node holds the lease, fail.
   - If you hold the lease, go to step 3.
2. If no node holds the lease, update the lease object with your Node ID.
   - If another node raced you and updated it first, the API server will reject the update, since the object version does not match. Fail.
3. Run the post-reboot actions, return success.
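A rough sketch of that flow with the kube crate might look like the following; the lease name, namespace, and node ID are assumed placeholders, error handling is simplified, and this is not Strand's actual code.

```rust
// Dependencies (assumed): kube, k8s-openapi, tokio, anyhow.
use k8s_openapi::api::coordination::v1::Lease;
use kube::{api::PostParams, Api, Client};

// Placeholder identifiers; Strand's real configuration differs.
const NODE_ID: &str = "worker-0";
const LEASE_NAME: &str = "strand-reboot-lock";
const NAMESPACE: &str = "default";

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let leases: Api<Lease> = Api::namespaced(client, NAMESPACE);

    // Step 1: get the Lease (creating it when missing is omitted for brevity).
    let mut lease = leases.get(LEASE_NAME).await?;
    let holder = lease.spec.as_ref().and_then(|s| s.holder_identity.clone());

    match holder.as_deref() {
        // Some other node holds the lease: fail.
        Some(other) if other != NODE_ID => anyhow::bail!("lock held by {other}"),
        // We already hold the lease: skip straight to step 3.
        Some(_) => {}
        // Step 2: nobody holds the lease, so claim it with our node ID.
        None => {
            let mut spec = lease.spec.take().unwrap_or_default();
            spec.holder_identity = Some(NODE_ID.to_string());
            lease.spec = Some(spec);
            // `replace` sends the resourceVersion we read, so if another node
            // raced us and updated the object first, the API server rejects
            // this request (409 Conflict) and we fail.
            leases.replace(LEASE_NAME, &PostParams::default(), &lease).await?;
        }
    }

    // Step 3: run the post-reboot actions (not shown), then return success.
    Ok(())
}
```

The race in step 2 is resolved by Kubernetes' optimistic concurrency: the update carries the resourceVersion read in step 1, so at most one of the competing updates can succeed.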
## Metrics and Logs
Strand exports Prometheus metrics on `/metrics`, as is customary. You definitely want to hook into these metrics, because they include an alarm for when a timeout has been hit and manual human intervention is required.
Logs might be interesting for debugging issues if the alarm fires. Strand makes use of the excellent `tracing` crate, which means it might be possible to hook it into opentelemetry in the future.
## Similar Software
Strand is inspired by poseidon/fleetlock. We really like typhoon, but it isn't really meant for running pet Kubernetes clusters (the author suggests blue/greening your entire Kubernetes cluster when you want to update, which is reasonable for some groups). We need to run Ceph, and we don't have enough hardware to blue/green an entire deployment, so a little more care with locking is necessary. If you aren't running Ceph, go check out that project!