Gossip-based cluster membership discovery (SWIM)

Overview

Foca: Cluster membership discovery on your terms

Foca is a building block for your gossip-based cluster discovery. It's a small no_std + alloc crate that implements the SWIM protocol along with its useful extensions (SWIM+Inf.+Susp.).

Project:

Introduction

The most notable thing about Foca is the fact that it does almost nothing. Out of the box, all it gives is a reliable and efficient implementation of the SWIM protocol that's transport and identity agnostic.

Knowledge of how SWIM works is helpful but not necessary to make use of this library. Reading the documentation for the Message enum should give you an idea of how the protocol works, but the paper is a very accessible read.

Foca is designed to fit into any sort of transport: If your network allows peers to talk to each other you can deploy Foca on it. Not only the general bandwidth requirements are low, but you also have full control of how members identify each other (see ./examples/identity_golf.rs) and how messages are encoded.

Usage

Please take a look at ./examples/foca_insecure_udp_agent.rs. It showcases how a simple tokio-based agent could look like and lets you actually run and see Foca swimming.

$ cargo run --features agent --example foca_insecure_udp_agent -- --help
foca_insecure_udp_agent 

USAGE:
    foca_insecure_udp_agent [OPTIONS] <BIND_ADDR>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -a, --announce <announce>    Address to another Foca instance to join with
    -f, --filename <filename>    Name of the file that will contain all active members
    -i, --identity <identity>    The address cluster members will use to talk to you.
                                 Defaults to BIND_ADDR

ARGS:
    <BIND_ADDR>    Socket address to bind to. Example: 127.0.0.1:8080

So you can start the agent in one terminal with ./foca_insecure_udp_agent 127.0.0.1:8000 and join it with as many others as you want with using a different BIND_ADDR and --announce to a running instance. Example: ./foca_insecure_udp_agent 127.0.0.1:8001 -a 127.0.0.1:8000.

The agent outputs some information to the console via tracing's subscriber. It defaults to the INFO log level and can be customized via the RUST_LOG environment variable using tracing_subscriber's EnvFilter directives.

Cargo Features

Every feature is optional. The default set will always be empty.

  • std: Adds std::error::Error support and implements foca::Identity for std::net::SocketAddr*.
  • tracing: Instruments Foca using the tracing crate.
  • serde: Derives Serialize and Deserialize for Foca's public types.
  • bincode-codec: Provides BincodeCodec, a serde-based codec type that uses bincode under the hood.
  • postcard-codec: Provides PostcardCodec a serde-based, no_std friendly codec that uses postcard under the hood.

Only for examples:

  • identity-golf: For ./examples/identity_golf.rs
  • agent: For ./examples/foca_insecure_udp_agent.rs

Notes

When writing this library, the main goal was having a simple and small core that's easy to test, simulate and reason about; It was mostly about getting a better understanding of the protocol after reading the paper.

Sticking to these goals naturally led to an implementation that doesn't rely on many operating system features like a hardware clock, atomics and threads, so becoming a no_std crate (albeit still requiring heap allocations) was kind of a nice accidental feature that I decided to commit to.

Comparison to memberlist

I avoided looking at memberlist until I was satisfied with my own implementation. Since then I did take a non-thorough look at it:

  • memberlist supports custom broadcasts, which is a very cool feature for complex service discovery scenarios, so now Foca has support for disseminating user data too (see BroadcastHandler documentation) :-)

  • It has a stream-based synchronization mechanism (push-pull) that's used for joining and periodic merging state between members: It's way beyond Foca's responsibilities, but it's a very interesting idea, so I've exposed the Foca::apply_many method which enables code using Foca to do a similar thing if desired.

  • Its configuration parameters change based on (current) cluster size. It's super useful for a more plug-and-play experience, so I want introduce something along those lines in the future, likely by pulling Config into Foca as a trait implementation.

Future

Foca is very focused on doing almost nothing, so it's likely that some things will end up on separate crates. But, in no particular order, I want to:

  • Provide a more plug-and-play experience, closer to what memberlist gives out of the box.

  • Make Foca run as a library for a higher level language. I'm not even sure I can take it that far, so sounds like fun!

  • Deliver a (re)usable simulator. Right now I've been yolo-coding one just to give me more confidence on what's implemented right now, but what I want is something that you can: Slap your own identity, codec and configuration; Set network parameters like TTL, loss rate, bandwidth; And then simulate production behavior (rolling restarts, partitions, etc) while watching convergence stats. This is a ridiculous amount of work.

  • Actually demonstrate a running Foca with no_std constraints; I don't have access to devices to play with at the moment, so it's been difficult to find motivation to pursue this.

References

License

Unless explicitly stated otherwise, all work is subject to the terms of the Mozilla Public License, version 2.0.

Files inside the ensure_no_std_alloc/ directory are under the MIT license, as their original.

Files inside the examples/ directory are dedicated to the Public Domain.

Comments
  • Member can lose all memberships and never recover

    Member can lose all memberships and never recover

    If there's a network event causing a node to lose all connectivity, all members will appear down from its point of view.

    It doesn't appear to recover from this condition when the network goes back up.

    Is this expected? If so, what would be a good way to remediate the situation? I assume detecting a large amount of down nodes -> re-announcing to the original bootstrap list would work?

    opened by jeromegn 32
  • `updates_backlog` grows to a certain number and stays there

    `updates_backlog` grows to a certain number and stays there

    I haven't been looking at it, but foca.updates_backlog() appears to be reporting 10832. Restarting my program just makes it grow back up to almost the same number of backlog items.

    Is this because we're trying to retransmit too much?

    image

    (Thats a single node, it restarted at the time of that dip)

    All nodes:

    image
    opened by jeromegn 23
  • Tweaking the `Config` for fast broadcast

    Tweaking the `Config` for fast broadcast

    Hey, it's me again!

    I've been reading the docs on the Config and trying to figure out how to tweak it. I'm just using the Config::simple() for now, it seemed sensible.

    I noticed it took a full ~4-5 minutes for all broadcasts to fully propagate on a cluster of 6 nodes geographically far apart. I'm sending foca messages over UDP.

    How would you tweak the config to make these broadcasts propagate faster? Perhaps increasing max_transmissions, as the documentation suggests?

    I can't help but feel like 4-5 minutes is a very long time even with the default max_transmissions setting. If I send a single update it takes about 1 second to reach everywhere. If I send ~20+ broadcasts from nodes randomly as a test, it takes 4-5 minutes to fully propagate everything once I stop my test.

    I'm testing this by updating values in a KV store and comparing the final state of each node. I'm diffing every state with every other state to make sure they're exactly the same. After the 4-5 minutes delay, I saw logs stopped printing my log line stating the node received a gossip broadcast item to process. This coincided with the state being the same everywhere.

    opened by jeromegn 18
  • Getting a lot of

    Getting a lot of "Broadcasts disabled" error logs when not using broadcasts

    I've turned off broadcasts through foca entirely a while ago and do my own thing now.

    I'm not using Foca::broadcast or Foca::add_broadcast anywhere in my code, yet I keep seeing "Broadcasts disabled" error logs.

    Can it happen any other way? Perhaps this is due to my corrupted messages being interpreted as custom broadcasts by foca?

    opened by jeromegn 10
  • Encountered:

    Encountered: "BUG! Probe cycle finished without running its full course"

    I figured I might bring this up since it was labeled "BUG".

    I'm not sure what can cause this, I've been restarting a few instances and encountered it. It seems to have recovered from it.

    From my logs, I see this sequence of things happened:

    • Notification::Rejoin
    • Notification::Active

    and then:

    ERROR foca: error handling timer: BUG! Probe cycle finished without running its full course
    
    opened by jeromegn 6
  • Membership discovery and (self) awareness improvements

    Membership discovery and (self) awareness improvements

    Going through features discussed on #15. I'll slowly be ticking the boxes as I complete each:

    • [x] A way for members to learn that their messages are being ignored, so they can react to the situation where they've been declared down but haven't received that info through normal means
    • [x] Periodic announce() to increase the speed to discover the whole cluster
    • [x] Periodic gossip(), to speed up cluster update propagation
    opened by caio 4
  • Broadcasting extensions

    Broadcasting extensions

    This will

    • [X] Allow implementations to instruct Foca when to include custom broadcasts on their messages based on their identity
    • [x] Expose functionality to guarantee broadcast dissemination

    Related: #3

    opened by caio 4
  • Adaptive max_transmissions

    Adaptive max_transmissions

    Determining the value for max_transmissions should be dynamic based on the number of members. Probably based on the memberlist implementation!

    Or it should be possible to update this value at runtime as more members are added to a cluster.

    Currently I'm passing a NODE_COUNT env var (for now I know this before deploying anything) and using the formula from memberlist.

    opened by jeromegn 3
  • Bump postcard dependency to 1.0.0

    Bump postcard dependency to 1.0.0

    Postcard 1.0.0 fixed an issue which caused chrono types and #[serde_as(as = "DisplayFromStr")] failed to be serialized. See https://github.com/jamesmunns/postcard/issues/32.

    In my use case, I put http::Uri type into my identity type, which doesn't implement serde traits. So I used DisplayFromStr to work it around. However, postcard codec failed to serialize it due to the issue stated above.

    opened by PhotonQuantum 2
  • Should I worry about

    Should I worry about "Member not found" warning log?

    We're using an fat identity with a bump.

    pub struct Actor {
        id: ActorId,
        name: ActorName,
        addr: SocketAddr,
        group: String,
    
        // An extra field to allow fast rejoin
        bump: u16,
    }
    

    and our Identity implementation:

    impl Identity for Actor {
        fn has_same_prefix(&self, other: &Self) -> bool {
            // sometimes the ID can be nil, when we connect to a member,
            // we don't know its ID, we only know its address
            // the ID should be updated later on, at least I hope
            if other.id.is_nil() || self.id.is_nil() {
                self.addr.eq(&other.addr)
            } else {
                self.id.eq(&other.id)
            }
        }
    
        fn renew(&self) -> Option<Self> {
            Some(Self {
                id: self.id,
                name: self.name.clone(),
                addr: self.addr,
                group: self.group.clone(),
                bump: self.bump.wrapping_add(1),
            })
        }
    }
    

    We're handling members like:

    match notification {
            Notification::MemberUp(actor) => {
                let added = { members.write().add_member(&actor) };
                info!("Member Up {actor:?} (added: {added})");
                if added {
                    // actually added a member
                    // notify of new cluster size
                    if let Ok(size) = (members.read().0.len() as u32).try_into() {
                        foca_tx.send(FocaInput::ClusterSize(size)).ok();
                    }
                }
                tokio::spawn(write_sql(write_sql_sender, move |pool| async move {
                    if let Err(e) = upsert_actor_name(&pool, actor.id(), actor.name().as_str()).await {
                        warn!("could not upsert actor name: {e}");
                    }
                }));
            }
            Notification::MemberDown(actor) => {
                let removed = { members.write().remove_member(&actor) };
                info!("Member Down {actor:?} (removed: {removed})");
                if removed {
                    // actually removed a member
                    // notify of new cluster size
                    if let Ok(size) = (members.read().0.len() as u32).try_into() {
                        foca_tx.send(FocaInput::ClusterSize(size)).ok();
                    }
                }
            }
    // ...
    }
    

    I started listing foca members via iter_members() to try and debug this.

    Some logs I see when restarting:

    2022-07-22T16:14:38.007685Z [INFO] Current Actor ID: 385107b4-3423-4b2e-b4e8-637cfac5779e
    2022-07-22T16:14:41.036757Z [INFO] Member Up Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951) (added: true)
    2022-07-22T16:14:41.036810Z [INFO] Current node is considered ACTIVE
    2022-07-22T16:14:41.036832Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
    2022-07-22T16:14:42.103568Z [INFO] Member Up Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452) (added: true)
    2022-07-22T16:14:42.103698Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
    2022-07-22T16:14:42.103736Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
    2022-07-22T16:14:43.011694Z [INFO] Member Up Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964) (added: true)
    2022-07-22T16:14:43.011765Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
    2022-07-22T16:14:43.011779Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
    2022-07-22T16:14:43.011787Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
    2022-07-22T16:14:47.527714Z [INFO] Member Up Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791) (added: true)
    2022-07-22T16:14:47.527816Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
    2022-07-22T16:14:47.527869Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
    2022-07-22T16:14:47.527879Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
    2022-07-22T16:14:47.527887Z [INFO] foca knows about: Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791)
    2022-07-22T16:14:50.424720Z [INFO] Member Up Id(ActorName("957f"), ActorId(9a925261-675a-4fa7-8478-ccf378b01d90), <ip>:7878, bump: 6158) (added: true)
    2022-07-22T16:14:50.424831Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
    2022-07-22T16:14:50.424902Z [INFO] foca knows about: Id(ActorName("957f"), ActorId(9a925261-675a-4fa7-8478-ccf378b01d90), <ip>:7878, bump: 6158)
    2022-07-22T16:14:50.424913Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
    2022-07-22T16:14:50.424922Z [INFO] foca knows about: Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791)
    2022-07-22T16:14:50.424930Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
    2022-07-22T16:14:53.231006Z [INFO] Member Up Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 57499) (added: false)
    2022-07-22T16:15:11.043867Z [INFO] Member Up Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 36078) (added: false)
    2022-07-22T16:15:31.041785Z [INFO] Member Down Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452) (removed: false)
    2022-07-22T16:15:41.110738Z [INFO] Member Down Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 36078) (removed: false)
    2022-07-22T16:16:06.048567Z [WARN] foca: Member not found
    

    Should we be worried about the Member not found log message or are these normal? They don't seem to appear repeatedly. This could be a race between 2 notifications?

    opened by jeromegn 1
  • probe validation and recovery improvements

    probe validation and recovery improvements

    this:

    • [x] ensures the probe state is clean if the instance goes idle in the middle of a probe cycle
    • [x] makes foca go back to a fully functional state in case it detects another case of incomplete probe cycle

    resolves #2

    opened by caio 0
  • Document nice-to-have things when using foca for serious business

    Document nice-to-have things when using foca for serious business

    There are some things foca doesn't do which would be nice to have them explicit.

    Foca doesn't:

    • Try to recover from zero-membership: if it reaches zero members implementors must decide how to handle it (announce to well known addresses, restart the application, w/e)
    • Version its protocol (yet, at least), so running distinct versions at once may lead to problems (inability to decode certain messages, for example)
    • Add any encryption to its payload, so it's trivial for anyone in the network to snoop in the topology and even join the cluster
    • ...More?

    (Ref #15)

    opened by caio 0
  • Get traces / errors in a better shape

    Get traces / errors in a better shape

    Traces are in a weird shape; when not running on debug the warns it emits don't contain any useful info.

    Issue opened to handle this with more care. Errors might include more data to make them more actionable (not sure this will be the case), but warn/error traces should definitely include useful metadata instead of requiring a rerun with the very noisy debug level.

    REF https://github.com/caio/foca/issues/2#issuecomment-1287697765

    opened by caio 0
  • Lots of CPU time spent sorting broadcast bytes

    Lots of CPU time spent sorting broadcast bytes

    I'm actually not sure where the time is being spent, but most of the CPU time in my project is currently being used sorting something related to broadcast messages.

    Flamegraph: corro.svg.zip

    Looks like it goes like this:

    • Foca::broadcast
    • Foca::send_message
    • Broadcasts::fill
    • core::slice::sort::recurse
    • core::slice::sort::partial_insertion_sort
    • PartialOrd::lt

    I have a loop that calls Foca::broadcast every 200ms to ensure maximum gossip propagation of broadcast messages.

    opened by jeromegn 6
  • How to use BroadcastHandler in a user-friendly way

    How to use BroadcastHandler in a user-friendly way

    I've stumbled upon this crate and it looks like it's of very high quality and has all the features I need for a little project.

    Everything has been pretty clear in the docs and examples, except I'm not sure how to correctly use BroadcastHandler.

    I'm assuming I should be able to use it to send / receive an enum that may represent multiple message types. However, BroadcastHandler::Broadcast requires AsRef<[u8]>. It seems like the only thing I can then really broadcast are serialized bytes.

    Now, that might be OK, but it means if my messages have any kind of complexity to them, I have to constantly deserialize them inside in my Invalidates implementation to determine if they should be invalidated.

    I get that this was built to run in embedded environment without too much resources. The serialization I'm doing is probably far heavier than intended for the use cases here. I could see how it would be entirely possible to read a u64 from the start of the message to know to invalidate it or not, but in my implementation there will be many different message variants and invalidating them might even require a database round-trip (TBD).

    Any insight on how I could use the BroadcastHandler to disseminate more complex types of messages?

    opened by jeromegn 5
Owner
null
Northstar is a horizontally scalable and multi-tenant Kubernetes cluster provisioner and orchestrator

Northstar Northstar is a horizontally scalable and multi-tenant Kubernetes cluster provisioner and orchestrator. Explore the docs » View Demo · Report

Lucas Clerisse 1 Jan 22, 2022
Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI

s3-utils Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI. This tool contains a small set of command line utilities for

Isaac Whitfield 47 Dec 15, 2022
A DIY, IMU-based skateboard activity tracker

tracksb A DIY, IMU-based skateboard activity tracker. The idea is to come up with algorithms to track activity during skateboarding sessions. A compan

null 21 May 5, 2022
A tiling window manager for Windows 10 based on binary space partitioning

yatta BSP Tiling Window Manager for Windows 10 Getting Started This project is still heavily under development and there are no prebuilt binaries avai

Jade 143 Nov 12, 2022
A high level diffing library for rust based on diffs

Similar: A Diffing Library Similar is a dependency free crate for Rust that implements different diffing algorithms and high level interfaces for it.

Armin Ronacher 617 Dec 30, 2022
A low-ish level tool for easily writing and hosting WASM based plugins.

A low-ish level tool for easily writing and hosting WASM based plugins. The goal of wasm_plugin is to make communicating across the host-plugin bounda

Alec Deason 62 Sep 20, 2022
wasm actor system based on lunatic

Wactor WASM actor system based on lunatic. Actors run on isolated green threads. They cannot share memory, and communicate only through input and outp

Noah Corona 25 Nov 8, 2022
a hobby OS for x86_64 based on MikanOS.

a hobby OS for x86_64 based on MikanOS.

algon 22 Dec 29, 2022
Nannou/Rust tutorial based on Schotter by Georg Nees

Schotter (German for gravel) is a piece by computer art pioneer Georg Nees. It consists of a grid of squares 12 across and 22 down with random rotation and displacement that increases towards the bottom.

null 101 Dec 27, 2022
Debug2 is a pretty printing crate based on std::fmt

debug2 is a pretty printing crate based on std::fmt Why not just use Debug The Debug trait is good, but the problem is it is not very good at n

Nixon Enraght-Moony 18 Jun 23, 2022
🦀 Rust-based implementation of a Snowflake Generator which communicates using gRPC

Clawflake Clawflake is a Rust application which implements Twitter Snowflakes and communicates using gRPC. Snowflake ID numbers are 63 bits integers s

n1c00o 5 Oct 31, 2022
simple epoch-based reclamation

ebr a simple epoch-based reclamation (EBR) library with low cacheline ping-pong. use ebr::Ebr; let mut ebr: Ebr<Box<u64>> = Ebr::default(); let mut

Tyler Neely 18 Nov 20, 2022
Another Async IO Framework based on io_uring

kbio, the Async IO Framework based on io_uring, is used in KuiBaDB to implement async io. Features Support multi-threading concurrent task submission.

KuiBaDB 59 Oct 31, 2022
Easy to use Rust i18n library based on code generation

rosetta-i18n rosetta-i18n is an easy-to-use and opinionated Rust internationalization (i18n) library powered by code generation. rosetta_i18n::include

null 38 Dec 18, 2022
Provides utility functions to perform a graceful shutdown on an tokio-rs based service

tokio-graceful-shutdown IMPORTANT: This crate is in an early stage and not ready for production. This crate provides utility functions to perform a gr

null 61 Jan 8, 2023
Generate an HTML page based on a Notion document

Notion Generator Generate an HTML page based on a Notion document! Still a bit of a work in progress, but I am about to actually use it for some actua

null 9 Dec 14, 2022
Stream-based FSEvents API bindings.

fsevent-stream Stream-based FSEvents API bindings. Features Support directory-granular and file-granular events. Retrieve related file inode with kFSE

LightQuantum 7 Dec 28, 2022
A Rust-based tool to analyze an application's heap.

Heap analysis tool for Rust Heap analysis is a pure-Rust implementation to track memory allocations on the heap. Usage Heap analysis provides a custom

Moritz Hoffmann 8 May 9, 2022
CBOR (binary JSON) for Rust with automatic type based decoding and encoding.

THIS PROJECT IS UNMAINTAINED. USE serde_cbor INSTEAD. This crate provides an implementation of RFC 7049, which specifies Concise Binary Object Represe

Andrew Gallant 121 Dec 27, 2022