Provides utility functions to perform a graceful shutdown on a tokio-rs based service

Overview

tokio-graceful-shutdown

IMPORTANT: This crate is in an early stage and not ready for production.

This crate provides utility functions to perform a graceful shutdown on tokio-rs based services.

Specifically, it provides:

  • Listening for shutdown requests from within subsystems
  • Manual shutdown initiation from within subsystems
  • Automatic shutdown on
    • SIGINT/SIGTERM/Ctrl+C
    • Subsystem failure
    • Subsystem panic
  • Clean shutdown procedure with timeout and error propagation
  • Subsystem nesting

Usage Example

struct Subsystem1 {}

#[async_trait]
impl AsyncSubsystem for Subsystem1 {
    async fn run(mut self, subsys: SubsystemHandle)
      -> Result<()>
    {
        log::info!("Subsystem1 started.");
        subsys.on_shutdown_requested().await;
        log::info!("Subsystem1 stopped.");
        Ok(())
    }
}

This is a minimal asynchronous subsystem: it starts, waits for the system shutdown to be triggered, and then stops.

This subsystem can now be executed like this:

#[tokio::main]
async fn main() -> Result<()> {
    Toplevel::new()
        .start("Subsys1", Subsystem1 {})
        .catch_signals()
        .wait_for_shutdown(Duration::from_millis(1000))
        .await
}

The Toplevel object is the root of the subsystem tree. Subsystems are started through its start() method.

The catch_signals() method tells the Toplevel object to listen for SIGINT/SIGTERM/Ctrl+C and to initiate a shutdown when one of them arrives.

wait_for_shutdown() is the final and most important method of Toplevel. It idles until the system enters shutdown mode. It then collects the return values of all subsystems, determines the global error state, and makes sure the shutdown completes within the given timeout. Finally, it returns a result that can be used directly as the return value of main().
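
To make the aggregation step more concrete, here is a minimal, dependency-free sketch of how per-subsystem results could be folded into one value that main() can return. It is not the crate's implementation: SubsystemResult, ShutdownError and global_result are hypothetical names, and letting a timeout take priority over subsystem errors is an assumption made purely for illustration.

```rust
/// Hypothetical outcome of a single subsystem.
type SubsystemResult = Result<(), String>;

/// Hypothetical global error state (not the crate's real error type).
#[derive(Debug, PartialEq)]
enum ShutdownError {
    /// One or more subsystems returned an error; holds their names.
    SubsystemsFailed(Vec<String>),
    /// At least one subsystem did not finish within the timeout.
    TimedOut,
}

/// Folds named results into one global result.
/// `None` means the subsystem did not finish before the timeout.
fn global_result(results: &[(&str, Option<SubsystemResult>)]) -> Result<(), ShutdownError> {
    // Assumption: a timeout is reported in preference to individual failures.
    if results.iter().any(|(_, r)| r.is_none()) {
        return Err(ShutdownError::TimedOut);
    }
    let failed: Vec<String> = results
        .iter()
        .filter(|(_, r)| matches!(r, Some(Err(_))))
        .map(|(name, _)| name.to_string())
        .collect();
    if failed.is_empty() {
        Ok(())
    } else {
        Err(ShutdownError::SubsystemsFailed(failed))
    }
}

fn main() -> Result<(), ShutdownError> {
    let results: [(&str, Option<SubsystemResult>); 2] =
        [("Subsys1", Some(Ok(()))), ("Subsys2", Some(Ok(())))];
    // The aggregated result doubles as the return value of main().
    global_result(&results)
}
```

Because ShutdownError implements Debug, the aggregated Result can be returned from main() as-is.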

Further examples can be seen in the examples folder.

Building

To use this library in your project, add the following to the [dependencies] section of Cargo.toml:

[dependencies]
tokio-graceful-shutdown = "0.2"

To run one of the examples (here 01_normal_shutdown.rs), simply enter the repository folder and execute:

cargo run --example 01_normal_shutdown

Motivation

Performing a graceful shutdown on an asynchronous system is a non-trivial problem. There are several solutions, but they all have their drawbacks:

  • Global cancellation by forking with the tokio::select! macro. This is a widespread solution, but it has the drawback that the canceled tasks cannot react to the cancellation, so it is impossible for them to shut down gracefully.

  • Forking with tokio::spawn and signalling running tasks to shut down through mechanisms like tokio_util::sync::CancellationToken. This allows tasks to shut down gracefully, but requires a lot of boilerplate code:

    • Passing the tokens to the tasks
    • Waiting for the tasks to finish
    • Implementing a timeout mechanism to prevent deadlocks

    If further functionality is required, such as listening for signals like SIGINT or SIGTERM, the boilerplate code quickly becomes messy.

And this is exactly what this crate aims to provide: clean abstractions for all of this boilerplate code.
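
To give a feel for the three boilerplate pieces listed above (passing a cancellation signal, waiting for completion, enforcing a timeout), here is a rough sketch. It deliberately uses plain std threads, an AtomicBool and a channel instead of tokio tasks and a cancellation token, so that it runs without any dependencies. All names (worker, shutdown_with_timeout, run_demo) are hypothetical, and it does not show how this crate works internally.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical worker: polls the shared flag and reports back once it stops.
fn worker(name: &'static str, shutdown: Arc<AtomicBool>, done: mpsc::Sender<&'static str>) {
    while !shutdown.load(Ordering::SeqCst) {
        thread::sleep(Duration::from_millis(5));
    }
    // Cleanup work would go here before reporting completion.
    let _ = done.send(name);
}

/// Requests shutdown, then waits at most `timeout` for `count` workers.
/// Returns the names of the workers that finished in time.
fn shutdown_with_timeout(
    shutdown: &AtomicBool,
    done: &mpsc::Receiver<&'static str>,
    count: usize,
    timeout: Duration,
) -> Vec<&'static str> {
    shutdown.store(true, Ordering::SeqCst);
    let deadline = Instant::now() + timeout;
    let mut finished = Vec::new();
    while finished.len() < count {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match done.recv_timeout(remaining) {
            Ok(name) => finished.push(name),
            Err(_) => break, // timed out, or every sender is gone
        }
    }
    finished
}

fn run_demo() -> Vec<&'static str> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    for name in ["a", "b"] {
        // Every worker needs its own copy of the flag and the sender.
        let (flag, done) = (Arc::clone(&shutdown), tx.clone());
        thread::spawn(move || worker(name, flag, done));
    }
    drop(tx);
    let mut finished = shutdown_with_timeout(&shutdown, &rx, 2, Duration::from_secs(1));
    finished.sort();
    finished
}

fn main() {
    println!("finished in time: {:?}", run_demo());
}
```

Even this small version has to thread two handles through every worker and track a deadline by hand, and it does not yet handle signals, errors, or nesting.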

Contributions

Contributions are welcome!

I primarily wrote this crate for my own convenience, so any ideas for improvements are greatly appreciated.

Comments
  • Wait on NestedSubsystem

    Thanks for writing this library! It seems helpful for my async code. I couldn't figure out how to handle this use case though: I have a nested subsystem, which will shut itself down on error. I want to retry from the top level in a loop. Something like

    async fn top_subsystem(subsys: SubsystemHandle) -> Result<()> {
      let init = one_time_expensive_setup();
      loop {
        let s = cheap_setup();
        let nested = subsys.start("nested", nested_subsystem);
        nested.wait();
      }
    }
    
    async fn nested_subsystem(subsys: SubsystemHandle) -> Result<()> {
      ...
      // On error:
      subsys.request_shutdown();
    }
    

    How would I do the nested.wait() part?

    opened by viveksjain 27
  • Question regarding usage in a non-trivial case

    Hi there! First off, thanks for this amazing project, it really helped simplify my previous setup of handling shutdowns manually!

    I tried to find out how people use this library in real-world applications, but I couldn't really find much. I have a relatively simple question, so maybe you can guide me in the right direction:

    Say I have 4 subsystems attached to the top level. Three of those need to send messages to the fourth one. Where would you create tx and how would you pass it down? At the moment I'm creating all of them in my main function and passing them into the new() function of every subsystem before calling run(subsys) on them. This looks a little bit like this, and I'm wondering if there's a cleaner solution (to be fair, I'm relatively new to Rust):

    #[tokio::main]
    async fn main() -> Result<()> {
        Builder::from_env(Env::default().default_filter_or("debug")).init();
    
        let (foo_for_alpha_tx, foo_rx) = mpsc::channel(8);
        let foo_for_beta_tx = foo_for_alpha_tx.clone();
        let foo_for_gamma_tx = foo_for_alpha_tx.clone();
    
        Toplevel::new()
            .start("Alpha", |s| async move {
                Alpha::new(foo_for_alpha_tx).run(s).await
            })
            .start("Beta", |s| async move {
                Beta::new(foo_for_beta_tx).run(s).await
            })
            .start("Gamma", |s| async move {
                Gamma::new(foo_for_gamma_tx).run(s).await
            })
            .start("Delta", |s| async move {
                Delta::new(foo_rx).run(s).await
            })
            .catch_signals()
            .handle_shutdown_requests(Duration::from_millis(1000))
            .await
    }
    

    In general, it'd be really nice to see a more complete example with subsystem intercommunication in the example directory I guess :)

    Another question which popped into my mind, which I couldn't find an answer for: When a subsystem has nested subsystems and a shutdown is requested, do the nested subsystems shut down first before the parent, or is the order not guaranteed?

    Thanks a lot for your time!

    opened by DASPRiD 19
  • Replace anyhow dependency

    In my project I am not using anyhow and don't plan to; snafu implements something similar with Whatever if needed.

    The problem is: I would love to use this project, but I would have to pull in this dependency just for the callbacks.

    IMO it would be nicer and easier to use a boxed error (Box<dyn Error + Send + Sync>). This introduces no extra dependency on anyhow and at the same time can contain all errors thrown at it.

    opened by cking 14
  • Provide helpers for common cancellation patterns

    I did get the chance to try out the beta version, and happy to say that it works great. Helped me clean up my code quite a bit. Feel free to close this issue!

    One observation I will mention is that in most cases, I would like a newly started subsystem to select! on the parent's on_shutdown_requested so that it shuts down immediately on error, and this was causing quite a bit of boilerplate. I was able to address it with the following macro though

    /// Start a nested subsystem that `select!`s on `subsys.on_shutdown_requested()` to stop automatically.
    /// `subsystem_to_start` must have type `Future<Output=anyhow::Result<()>>`.
    #[macro_export]
    macro_rules! start {
        ($subsys:expr, $name:expr, $subsystem_to_start:expr) => {
            let subsys_clone = $subsys.clone();
            $subsys.start($name, move |_h: SubsystemHandle| async move {
                tokio::select! {
                    r = $subsystem_to_start => r,
                    _ = subsys_clone.on_shutdown_requested() => Ok::<(), anyhow::Error>(())
                }
            });
        };
    }
    

    Will leave it up to you on whether such a feature makes sense to be in the lib or not (in which case it can probably be another function on SubsystemHandle, rather than a macro).

    Originally posted by @viveksjain in https://github.com/Finomnis/tokio-graceful-shutdown/issues/37#issuecomment-1147069424

    opened by Finomnis 10
  • Remove anyhow from signatures.

    This removes anyhow::Error from the future signatures. That way it's easier for people to integrate this crate without having to bring anyhow as a dependency.

    I've updated all the examples.

    Signed-off-by: David Calavera

    opened by calavera 9
  • Passing errors from subsystems to application code instead of simply printing them

    What would be involved in supporting such a feature? It would be cool if the parent subsystem was able to explicitly handle the child subsystem errors.

    opened by TheButlah 8
  • macOS support?

    I'm trying to run the code examples on macOS but the program just hangs without any error indefinitely.

    $ git clone git@github.com:Finomnis/tokio-graceful-shutdown.git
    $ cd tokio-graceful-shutdown
    $ cargo run --example 01_normal_shutdown
    # builds successfully, then launches and hangs
    
    $ rustc --version
    rustc 1.64.0 (a55dd71d5 2022-09-19)
    $ cargo --version
    cargo 1.64.0 (387270bc7 2022-09-16)
    

    macOS 12.5 (21G72)

    opened by beeb 6
  • Make errors types adjustable

    Currently everything is a Box<dyn Error>, but since Rust supports default type parameters on generic types, we could make the error type a generic parameter that defaults to Box<dyn Error>.

    Reasoning is that this simplifies passing errors from subsystems to this crate and back to the user again, without the user having to resort to weird runtime matching to get the original error type back.

    opened by Finomnis 4
  • Restart failed tasks

    Hello! Do you think that this crate could be extended to restart failed tasks? That way we could have the same unit of abstraction (a "subsystem", which is a wrapper around a task) for both handling graceful shutdowns and recovering from failure (in case of panic this maybe means wrapping the future in catch_unwind before calling tokio::spawn, but there could be other ways to signal failure).

    The reason I'm asking is that since tokio-graceful-shutdown is in charge of spawning the tasks, I can't have another crate (say, a hypothetical tokio-restart) fulfill this role. Any code that wanted to do restarts would need to hook into some internals of tokio-graceful-shutdown. Moreover, if a task has failed and the policy was not to restart it, then it shouldn't be terminated during the graceful shutdown.

    And well, if this is done, then Toplevel::start can't receive just a closure anymore; it must receive two (one for the spawned future, another that decides whether the task is restarted if it fails), or maybe define a trait with two methods and receive a trait object there; the default impl for the new method could be to just not restart anything.

    opened by dlight 4
  • Increase compatibility with other error handling libraries

    To be more specific, increase compatibility with:

    This is based on PR https://github.com/Finomnis/tokio-graceful-shutdown/pull/16, contributed by calavera.

    opened by Finomnis 3
  • Memory leak due to Arc cycles

    Hi, I've been reading through the codebase. It looks like SubsystemDescriptors hold Arc<SubsystemData>, and the SubsystemData holds the subsystem descriptors (without an Arc). Doesn't this mean that even if the parent subsystem is dropped, there will be a cyclic reference between a SubsystemDescriptor and a SubsystemData, so that neither gets dropped and the memory is leaked?

    opened by TheButlah 2
Releases(0.12.1)
  • 0.12.1(Dec 21, 2022)

  • 0.12.0(Dec 18, 2022)

  • 0.11.1(Aug 16, 2022)

  • 0.11.0(Aug 16, 2022)

  • 0.10.1(Jun 10, 2022)

    Non-Breaking Changes

    • Add .cancel_on_shutdown() method to all std::future::Future objects through the FutureExt trait.
      • Cancels the future when a shutdown happened.
  • 0.10.0(Jun 7, 2022)

    Breaking Changes

    • The error return type of Toplevel::handle_shutdown_requests is GracefulShutdownError instead of a generic parameter

    Non-Breaking Changes

    • Add Toplevel::nested() to allow toplevel objects that are nested inside of subsystems.
      • This allows for a clean isolation of program parts that require their own shutdown context.
    • Add SubsystemHandle::request_global_shutdown() to initiate a shutdown of the entire program.
      • SubsystemHandle::request_shutdown() will only shut down the next Toplevel object.
    • Subsystem names can be &str instead of &'static str
  • 0.10.0-beta.0(Jun 1, 2022)

    Breaking Changes

    • The error return type of Toplevel::handle_shutdown_requests is GracefulShutdownError instead of a generic parameter

    Non-Breaking Changes

    • Add Toplevel::nested() to allow toplevel objects that are nested inside of subsystems.
      • This allows for a clean isolation of program parts that require their own shutdown context.
    • Add SubsystemHandle::request_global_shutdown() to initiate a shutdown of the entire program.
      • SubsystemHandle::request_shutdown() will only shut down the next Toplevel object.
    • Subsystem names can be &str instead of &'static str
  • 0.9.0(May 21, 2022)

    Breaking Changes

    • Move error types to mod errors
    • Shutdown gets triggered automatically when all subsystems have finished
      • In most cases this should make no difference, but in a couple of corner cases this might be a breaking change

    Non-Breaking Changes

    • Add SubsystemHandle::is_shutdown_requested() as a querying alternative to the async on_shutdown_requested()
  • 0.9.0-beta.1(May 18, 2022)

  • 0.9.0-beta.0(May 16, 2022)

    Breaking Changes

    • Shutdown gets triggered automatically when all subsystems have finished
      • In most cases this should make no difference, but in a couple of corner cases this might be a breaking change

    Non-Breaking Changes

    • Add SubsystemHandle::is_shutdown_requested() as a querying alternative to the async on_shutdown_requested()
  • 0.8.0(May 15, 2022)

    Breaking Changes

    • Error type is now configurable via generic
      • Enables custom error types for projects that are then recoverable from GracefulShutdownError in their original type
      • For more information, see example 18_error_type_passthrough
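
    The idea behind a configurable error type can be sketched with a default type parameter. The following is only an illustration under assumed names: ShutdownFailure, AppError, describe and recover are hypothetical, and the crate's actual GracefulShutdownError may be shaped differently (example 18_error_type_passthrough is the authoritative reference).

    ```rust
    use std::error::Error;
    use std::fmt;

    type BoxedError = Box<dyn Error + Send + Sync>;

    /// Hypothetical application error.
    #[derive(Debug, PartialEq)]
    enum AppError {
        DiskFull,
    }

    impl fmt::Display for AppError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "disk full")
        }
    }

    impl Error for AppError {}

    /// Hypothetical shutdown error, generic over the subsystem error type.
    /// Writing just `ShutdownFailure` selects the boxed default.
    #[derive(Debug)]
    enum ShutdownFailure<E = BoxedError> {
        SubsystemFailed { name: String, source: E },
        TimedOut,
    }

    /// With a concrete `E`, the original error can be matched directly.
    fn describe(failure: &ShutdownFailure<AppError>) -> String {
        match failure {
            ShutdownFailure::SubsystemFailed { name, source: AppError::DiskFull } => {
                format!("{} ran out of disk space", name)
            }
            ShutdownFailure::TimedOut => "shutdown timed out".to_string(),
        }
    }

    /// With the boxed default, the original type needs a runtime downcast.
    fn recover(failure: &ShutdownFailure) -> Option<&AppError> {
        match failure {
            ShutdownFailure::SubsystemFailed { source, .. } => source.downcast_ref::<AppError>(),
            ShutdownFailure::TimedOut => None,
        }
    }

    fn main() {
        let typed = ShutdownFailure::SubsystemFailed {
            name: "db".to_string(),
            source: AppError::DiskFull,
        };
        println!("{}", describe(&typed));
        println!("{}", describe(&ShutdownFailure::TimedOut));

        let boxed: ShutdownFailure = ShutdownFailure::SubsystemFailed {
            name: "db".to_string(),
            source: Box::new(AppError::DiskFull),
        };
        println!("{:?}", recover(&boxed));
    }
    ```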
  • 0.8.0-beta.2(May 9, 2022)

  • 0.8.0-beta.1(May 9, 2022)

  • 0.8.0-beta.0(May 7, 2022)

    Breaking Changes

    • Error type is now configurable via generic
      • Enables custom error types for projects that are then recoverable from GracefulShutdownError in their original type
      • For more information, see example 18_error_type_passthrough
  • 0.7.0(May 6, 2022)

    Breaking Changes

    • Rewrite Errors
      • Errors can now be prettified with miette
      • Errors now carry the error sources including the actual errors from the failed subsystems
    • Display a warning if the Toplevel object isn't consumed (via handle_shutdown_requests)
    • Cancel all subsystems connected to the Toplevel object when the Toplevel object is dropped

    Non-breaking changes

    • Change recommended error wrapper library from anyhow to miette.
      • Compatibility with other error handling wrapper libraries stays unchanged.
    • Add IntoSubsystem trait to make writing struct-based subsystems more convenient
  • 0.6.0(Apr 20, 2022)

    Breaking Changes

    • Subsystems can return Err<Into<Box<dyn Error + Send + Sync>>> instead of Err<Into<anyhow::Error>>
      • Allows all error types that can be converted to Box<dyn Error + Send + Sync>
      • Should be compatible with existing code, but increases compatibility with new error handling crates
    • Toplevel::handle_shutdown_requests can now return all types that implement From<GracefulShutdownError>, which should integrate seamlessly with most return types from main(), like anyhow::Result, Box<dyn Error> or eyre::Result
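
    As a small illustration of why the first change above widens compatibility, the standard library alone already converts several kinds of values into Box<dyn Error + Send + Sync>. The function parse_port below is a hypothetical example and not part of this crate.

    ```rust
    use std::error::Error;

    type BoxedError = Box<dyn Error + Send + Sync>;

    /// Hypothetical helper that fails in three different ways, each with a
    /// different source type that converts into the same boxed error.
    fn parse_port(text: &str) -> Result<u16, BoxedError> {
        if text.is_empty() {
            // &str converts into a boxed error.
            return Err("empty port".into());
        }
        // ParseIntError implements Error, so `?` boxes it automatically.
        let port: u16 = text.parse()?;
        if port == 0 {
            // String converts into a boxed error as well.
            return Err(format!("port {} is reserved", port).into());
        }
        Ok(port)
    }

    fn main() {
        for input in ["8080", "", "abc", "0"] {
            match parse_port(input) {
                Ok(port) => println!("{:?} -> port {}", input, port),
                Err(err) => println!("{:?} -> error: {}", input, err),
            }
        }
    }
    ```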
  • 0.6.0-beta.0(Apr 20, 2022)

    Breaking Changes

    • Subsystems can return Err<Into<Box<dyn Error + Send + Sync>>> instead of Err<Into<anyhow::Error>>
      • Allows all error types that can be converted to Box<dyn Error + Send + Sync>
      • Should be compatible with existing code, but increases compatibility with new error handling crates
    • Toplevel::handle_shutdown_requests can now return all types that implement From<GracefulShutdownError>, which should integrate seamlessly with most return types from main(), like anyhow::Result, Box<dyn Error> or eyre::Result
  • 0.5.0(Mar 15, 2022)

    Breaking Changes

    • Subsystems can return Err<Into<anyhow::Error>> instead of Err<anyhow::Error>
      • No longer imposes the anyhow dependency on library users
      • Should not be a problem in most cases, but closures only returning Ok(()) will now fail to compile, as Rust is unable to deduce the Error type
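
    The inference problem from the last point can be reproduced without the crate. In the sketch below, run_subsystem is a hypothetical stand-in for a function that accepts a subsystem closure, and it uses a boxed std error in place of anyhow::Error to stay dependency-free. The mechanism is the same: once the error type is a generic parameter, a closure that only returns Ok(()) gives the compiler nothing to infer it from.

    ```rust
    use std::error::Error;
    use std::fmt;

    type BoxedError = Box<dyn Error + Send + Sync>;

    /// Hypothetical stand-in for a function that runs a subsystem closure.
    /// It is generic over the closure's error type `E`.
    fn run_subsystem<F, E>(subsystem: F) -> Result<(), BoxedError>
    where
        F: FnOnce() -> Result<(), E>,
        E: Into<BoxedError>,
    {
        subsystem().map_err(Into::into)
    }

    #[derive(Debug)]
    struct ConfigError;

    impl fmt::Display for ConfigError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "invalid configuration")
        }
    }

    impl Error for ConfigError {}

    fn main() {
        // run_subsystem(|| Ok(())); // rejected: the compiler cannot infer `E`

        // Naming the error type with a turbofish resolves the ambiguity.
        assert!(run_subsystem(|| Ok::<(), ConfigError>(())).is_ok());

        // A closure that can actually fail needs no annotation.
        let err = run_subsystem(|| Err(ConfigError)).unwrap_err();
        assert_eq!(err.to_string(), "invalid configuration");
    }
    ```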
  • 0.4.4(Mar 13, 2022)

    Changes

    • Make SubsystemHandle::start() a non-mut function (@Trivernis)
      • This removes the need for the handle argument to be mut, making it much easier to pass it around, to store it in structs and to use it as a shared reference.
  • 0.4.3(Dec 5, 2021)

    Changes

    • Fix: Errors during partial shutdown no longer cause global shutdown.
      • Instead, errors get properly delivered to the task that issued the partial shutdown.
  • 0.4.2(Dec 5, 2021)

  • 0.4.1(Nov 28, 2021)

    Breaking Changes

    • Rename Toplevel::wait_for_shutdown to Toplevel::handle_shutdown_requests to make it clearer what the purpose of this function is
    • Implement partial shutdown API
      • Change return value of SubsystemHandle::create() to NestedSubsystem
      • Add SubsystemHandle::perform_partial_shutdown(NestedSubsystem)
      • Add NestedSubsystem struct and PartialShutdownError enum
  • 0.3.2(Nov 28, 2021)

  • 0.3.1(Nov 22, 2021)

    Changes

    • Rewrite panic handling to no longer use panic hooks

    This should have no effect on the API itself. It is mainly an improvement in the interaction with other libraries that use panic hooks.

  • 0.3.0(Nov 21, 2021)

    Breaking changes

    • Rewrite the API to now use async lambdas or coroutines instead of async traits

    Using coroutines/lambdas simplifies most use cases a lot without any reduction in capability.

    More information about how to use subsystem structs as in the previous API can be seen in the examples.
