A Zincati lock backend for stateful workloads.

This repository is deprecated. We realized CoreOS is probably not a good fit for us. The repository will be kept up on the off chance that this is useful to someone else.

strand

A Zincati (Fedora CoreOS) reboot lock backend that makes sure the software running on your nodes is actually healthy before releasing the lock. This lets you run stateful workloads (like Ceph) on CoreOS and still take advantage of auto-updates.

Supported Strategies

All strategies consist of three parts.

  1. The "pre-reboot" conditions and actions. The conditions must be met, and the actions must be taken before the node is given the lock and allowed to reboot.
  2. The "post-reboot" conditions and actions. The conditions must be met before a certain timeout in order for the node to be considered healthy. The actions will be taken at some point before or after the conditions to do any necessary cleanup.
  3. The "timeout" action. If the conditions in the "post-reboot" stage are not met in time, the timeout action will be taken, and the lock will stay held until a human manually resolves the issue (by design).

Failures may cause the actions in any given stage to run more than once; each action is only guaranteed to run at least once. All actions should therefore be idempotent.
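The three parts and the idempotency requirement can be sketched in Rust roughly as follows. This is an illustration only, not Strand's actual code: the `Strategy` trait, `FlagStrategy`, `Outcome`, and `await_post_reboot` are hypothetical names, and the "noout" flag is modelled as an entry in an in-memory set rather than a real Ceph call.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, Instant};

/// Hypothetical trait: one strategy, split into the three parts described above.
trait Strategy {
    /// Pre-reboot: returns true once conditions hold and actions have been taken.
    fn pre_reboot(&mut self) -> bool;
    /// Post-reboot: returns true once the node is considered healthy.
    fn post_reboot(&mut self) -> bool;
    /// Timeout: runs if the post-reboot conditions are not met in time.
    fn on_timeout(&mut self);
}

/// Toy strategy that keeps flags in a set, so repeating an action is harmless.
struct FlagStrategy {
    flags: BTreeSet<String>,
    healthy: bool,
}

impl Strategy for FlagStrategy {
    fn pre_reboot(&mut self) -> bool {
        // Idempotent: inserting an existing flag leaves the set unchanged.
        self.flags.insert("noout".to_string());
        true
    }

    fn post_reboot(&mut self) -> bool {
        if self.healthy {
            // Idempotent: removing a missing flag is a no-op.
            self.flags.remove("noout");
        }
        self.healthy
    }

    fn on_timeout(&mut self) {
        self.flags.remove("noout");
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Healthy,
    TimedOut,
}

/// Polls `post_reboot` until it succeeds or the deadline passes.
/// On a timeout, the timeout action runs before returning.
fn await_post_reboot(s: &mut dyn Strategy, timeout: Duration, poll: Duration) -> Outcome {
    let deadline = Instant::now() + timeout;
    loop {
        if s.post_reboot() {
            return Outcome::Healthy;
        }
        if Instant::now() >= deadline {
            s.on_timeout();
            return Outcome::TimedOut;
        }
        std::thread::sleep(poll);
    }
}

fn main() {
    let mut s = FlagStrategy { flags: BTreeSet::new(), healthy: false };
    // Running the action twice simulates a retry after a failure.
    s.pre_reboot();
    s.pre_reboot();
    assert_eq!(s.flags.len(), 1);
    let outcome = await_post_reboot(&mut s, Duration::from_millis(20), Duration::from_millis(5));
    println!("{:?}", outcome);
}
```

Because every action here is a set insert or remove, a retry after a partial failure leaves the same end state as a single successful run.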

The currently supported strategies, each split up into these three parts, are as follows.

  • Kubernetes
    • Pre-reboot: drain + cordon the node
    • Post-reboot: wait for node.status.conditions[type="Ready"].status == "True"
    • Timeout: no action taken
  • Ceph
    • Pre-reboot: wait for cluster_status == Healthy, set noout on OSDs that are about to be down
    • Post-reboot: wait for OSDs up, unset noout, wait for cluster_status == Healthy
    • Timeout: unset noout (causing data replication)
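As a rough illustration of the Kubernetes post-reboot condition, the check below scans a node's conditions for a `Ready` entry whose status is `"True"`. The `NodeCondition` struct is a simplified, hypothetical stand-in for the real Kubernetes object, and `node_is_ready` is not a function from Strand.

```rust
/// Simplified stand-in for one entry of `node.status.conditions`.
struct NodeCondition {
    /// The condition's `type` field (`type` is a reserved word in Rust).
    kind: String,
    /// One of "True", "False", or "Unknown".
    status: String,
}

/// Returns true only if a "Ready" condition exists and its status is "True".
fn node_is_ready(conditions: &[NodeCondition]) -> bool {
    conditions
        .iter()
        .any(|c| c.kind == "Ready" && c.status == "True")
}

fn main() {
    let conditions = vec![
        NodeCondition { kind: "MemoryPressure".into(), status: "False".into() },
        NodeCondition { kind: "Ready".into(), status: "True".into() },
    ];
    println!("ready: {}", node_is_ready(&conditions));
}
```

A missing `Ready` condition, or one with status `"Unknown"`, counts as not ready, which is the conservative choice while waiting for a rebooted node.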

In the future, it might be a good idea to allow waiting for arbitrary Prometheus metrics as part of a strategy.

Locking Design

Strand employs a Kubernetes-native locking design. It depends on the sequential consistency guarantees provided by etcd, and works as follows.

To obtain the lock...

  1. Get the Lease object you've configured. If it doesn't exist, create it.
    • If some other node holds the lease, fail.
    • If you hold the lease, go to step 3.
  2. If no node holds the lease, update the Lease object with your Node ID.
    • If another node raced you and updated it first, the API server will reject the update, since the object version does not match. Fail.
  3. Run the post-reboot actions, return success.
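The first two steps above amount to optimistic concurrency control. The sketch below simulates them with an in-memory store instead of a real Kubernetes API server; `Lease`, `LeaseStore`, `LockError`, and `try_obtain` are hypothetical names chosen for illustration, and the `version` counter stands in for the object version the API server compares on update.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Lease {
    holder: Option<String>,
    version: u64,
}

#[derive(Debug, PartialEq)]
enum LockError {
    /// Some other node already holds the lease.
    HeldByOther(String),
    /// The lease changed between our read and our write.
    Conflict,
}

/// In-memory stand-in for the API server.
struct LeaseStore {
    lease: Option<Lease>,
}

impl LeaseStore {
    /// Step 1: get the lease, creating an unheld one if it does not exist.
    fn get_or_create(&mut self) -> Lease {
        self.lease
            .get_or_insert(Lease { holder: None, version: 0 })
            .clone()
    }

    /// Step 2: accept the write only if `seen_version` matches the stored version.
    fn update(&mut self, seen_version: u64, holder: &str) -> Result<(), LockError> {
        let current = self.lease.as_mut().ok_or(LockError::Conflict)?;
        if current.version != seen_version {
            return Err(LockError::Conflict);
        }
        current.holder = Some(holder.to_string());
        current.version += 1;
        Ok(())
    }
}

/// Runs steps 1 and 2. Step 3 (running the strategy's actions) is left out.
fn try_obtain(store: &mut LeaseStore, node_id: &str) -> Result<(), LockError> {
    let lease = store.get_or_create();
    match lease.holder.as_deref() {
        Some(h) if h != node_id => Err(LockError::HeldByOther(h.to_string())),
        Some(_) => Ok(()), // we already hold the lease
        None => store.update(lease.version, node_id),
    }
}

fn main() {
    let mut store = LeaseStore { lease: None };
    println!("node-a: {:?}", try_obtain(&mut store, "node-a"));
    println!("node-b: {:?}", try_obtain(&mut store, "node-b"));
}
```

The version check is what makes the race safe: two nodes can both read an unheld lease, but only the first write succeeds, and the second sees a conflict and fails.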

Metrics and Logs

Strand exports Prometheus metrics on /metrics, as is customary. You definitely want to hook into these metrics, because they include an alarm that fires when a timeout has been hit and human intervention is required.

Logs might be interesting for debugging issues if the alarm fires. Strand makes use of the excellent tracing crate, which means it might be possible to hook it into OpenTelemetry in the future.
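For orientation, a timeout alarm exposed in the Prometheus text format could look like the output of the sketch below. Strand's real metric names are not listed in this README, so `strand_timeout_active` and the `render_gauge` helper are made-up names used only to show the shape of such a metric.

```rust
/// Renders a single 0/1 gauge in the Prometheus text exposition format.
/// The metric name passed in is hypothetical, not one Strand is known to export.
fn render_gauge(name: &str, help: &str, active: bool) -> String {
    let value = if active { 1 } else { 0 };
    format!("# HELP {name} {help}\n# TYPE {name} gauge\n{name} {value}\n")
}

fn main() {
    let body = render_gauge(
        "strand_timeout_active",
        "Whether a post-reboot timeout has been hit.",
        true,
    );
    print!("{body}");
}
```

An alerting rule would then fire whenever the gauge is non-zero, signalling that someone has to resolve the issue by hand.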

Similar Software

Strand is inspired by poseidon/fleetlock. We really like typhoon, but it isn't really meant for running pet Kubernetes clusters (the author suggests blue/greening your entire Kubernetes cluster if you want to update, which is reasonable for some groups). We need to run Ceph, and we don't have enough hardware to blue/green an entire deployment, so a little more care with locking is necessary. If you aren't running Ceph, go check out that project!

Owner
Open Computing Facility
The OCF is an all-volunteer, student-run program dedicated to free computing for all students, faculty, and staff at UC Berkeley.