# Valve

_Redirects your plumbing for you._

`valve` creates multi-threaded Plumber APIs powered by Rust's tokio and axum web frameworks.
## Motivation
Plumber is an R package that creates RESTful APIs from R functions. It is limited in that each API is a single R process and thus a single thread: requests are executed sequentially, in the order they arrive. Scaling Plumber APIs is not easy. The intention of valve is to make scaling Plumber APIs, and thus R itself, easier. We can make R better by leveraging Rust's "fearless concurrency."
## Installation
Install the R package using {remotes}. Note that this will compile the package from source, which requires Rust to be installed. If you don't have Rust installed, follow the instructions here. Rust is the second easiest programming language to install after R.

I also recommend installing the development version of {rextendr} via

```r
pak::pak("extendr/rextendr")
```

which provides the function `rextendr::rust_sitrep()` to report whether you have a compatible Rust installation.

```r
remotes::install_github("josiahparry/valve")
```
When the R package is built, it also includes the binary executable at `inst/valve`. So if you ever find yourself needing the executable, `system.file("valve", package = "valve")` will point you right to it! This will always be the version of the executable that your R package is using.
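For example, you can locate the bundled executable from R (a minimal illustration; the resulting path will vary by machine):

```r
# path to the valve binary shipped inside the installed R package
valve_bin <- system.file("valve", package = "valve")
file.exists(valve_bin)
```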
To install the executable only, run

```sh
cargo install --git https://github.com/josiahparry/valve/ --no-default-features
```
## Creating the app
The R package exports only one function: `valve_run()`. Its most important arguments are `filepath`, which determines which Plumber API will be executed, and `host` and `port`, which determine where your app will run. Additional configuration can be done with the `n_max`, `workers`, `check_unused`, and `max_age` arguments to specify how your app will scale.
```r
library(valve)

# get included plumber API path
plumber_api_path <- system.file("plumber.R", package = "valve")

valve_run(plumber_api_path, n_max = 5)
#> Docs hosted at <http://127.0.0.1:3000/__docs__/>
```
`n_max` refers to the maximum number of background Plumber APIs that can be spawned, whereas `workers` specifies how many main worker threads are available to handle incoming requests. Generally, the number of `workers` should be equal to the number of Plumber APIs because Plumber is single threaded; this is the default. If `workers` is less than `n_max`, you'll never spawn the maximum number of APIs.
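For example, to spawn up to five background Plumber APIs with one worker thread each (a minimal sketch mirroring the default behaviour; the values are illustrative):

```r
library(valve)

plumber_api_path <- system.file("plumber.R", package = "valve")

# one worker thread per background Plumber API
valve_run(plumber_api_path, n_max = 5, workers = 5)
```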
Plumber connections are automatically spawned, pooled, and terminated using deadpool. App connections are automatically pooled by hyper.
Running this from your R session will block the session. If you are comfortable with the command line, it is recommended to install the CLI so you can run valve apps from your terminal and call the Plumber APIs from your R session.
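Alternatively, you can keep everything in R by launching the app in a background R process. This is a sketch assuming the {callr} package is installed; it is not part of valve itself:

```r
library(callr)

# start the valve app in a background R process so this session stays free
app <- r_bg(
  function(path) valve::valve_run(path, n_max = 5),
  args = list(path = system.file("plumber.R", package = "valve"))
)

# ... make requests against http://127.0.0.1:3000 ...

# stop the background app when finished
app$kill()
```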
## Calling valve with multiple workers
The way valve works is by accepting requests on a main port (3000 by default) and then distributing the requests round robin to the Plumber APIs that are spawned on random ports. Requests are captured by `axum` and proxied to a Plumber API process.
First, I'm going to define a function to call my `/sleep` endpoint. The function takes two parameters: the port and the duration of sleep. The port will be used to switch between the valve app and a single Plumber API.
```r
sleep <- function(port, secs) {
  httr2::request(
    paste0("127.0.0.1:", port, "/sleep?zzz=", secs)
  ) |>
    httr2::req_perform() |>
    httr2::resp_body_string()
}
```
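For reference, the `/sleep` endpoint being called is a Plumber endpoint that sleeps for `zzz` seconds. A minimal sketch of such an endpoint (the plumber.R bundled with valve may differ) looks like:

```r
#* Sleep for a given number of seconds, then respond
#* @param zzz number of seconds to sleep
#* @get /sleep
function(zzz) {
  Sys.sleep(as.numeric(zzz))
  paste("I slept for", zzz, "seconds")
}
```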
Using this function, we'll create 5 R sessions; each will make a request to sleep for 2 seconds.
```r
library(furrr)
plan(multisession, workers = 5)
```
First, we'll ping the main valve app which will distribute requests round robin.
```r
start <- Sys.time()
multi_sleep <- future_map(1:5, ~ sleep(3000, 2))
multi_total <- Sys.time() - start
```
Next, we select only one of the available plumber APIs and query it.
```r
start <- Sys.time()
single_sleep <- furrr::future_map(1:5, ~ sleep(35219, 2))
single_total <- Sys.time() - start
```
Notice the performance difference.
```r
print(paste0("Multiple Plumber APIs: ", round(multi_total, 2)))
#> [1] "Multiple Plumber APIs: 2.63"
print(paste0("One Plumber API: ", round(single_total, 2)))
#> [1] "One Plumber API: 10.08"
```
In the former, each worker's request completes in approximately the same amount of time. In the latter, each request has to wait for the previous one to finish before it can start. So we've effectively distributed the workload.
## Benchmarks with drill
Simple benchmarks using drill can be found in `inst/bench-sleep-plumber.yml` and `bench-sleep-valve.yml`.

The benchmark calls the `/sleep` endpoint, sleeping for 500ms, 100 times with 5 concurrent threads. This alone illustrates how much valve can speed up a single Plumber API's response time.
Plumber's benchmark:

```
Time taken for tests      50.7 seconds
Total requests            100
Successful requests       100
Failed requests           0
Requests per second       1.97 [#/sec]
Median time per request   2540ms
Average time per request  2482ms
Sample standard deviation 272ms
99.0'th percentile        2556ms
99.5'th percentile        2556ms
99.9'th percentile        2556ms
```
Valve's benchmark:

```
Time taken for tests      10.2 seconds
Total requests            100
Successful requests       100
Failed requests           0
Requests per second       9.78 [#/sec]
Median time per request   510ms
Average time per request  510ms
Sample standard deviation 2ms
99.0'th percentile        516ms
99.5'th percentile        518ms
99.9'th percentile        518ms
```
## With all that said....
valve is best suited for light to medium sized workloads. Each background Plumber API holds its own copy of its R objects. So if you are serving a machine learning model that is a gigabyte in size, that model will be copied into each background process, which can quickly bloat your RAM. So be smart! If you have massive objects in your R session, try to reduce the clutter and thin it out.