A Prometheus Aggregation Gateway for FAAS applications

Overview

Gravel Gateway

Gravel Gateway is a Prometheus Push Gateway for FAAS applications. In particular, it allows aggregation to be controlled by the incoming metrics, and thus provides much more flexibility in the semantics that your metrics can follow. In general, the Gravel Gateway functions as a standard aggregating push gateway: by default, everything except Gauges is summed, so e.g. if you push

# TYPE value_total counter
value_total 1
# TYPE value2 gauge
value2 1

three times, then Prometheus will scrape

# TYPE value_total counter
value_total 3
# TYPE value2 gauge
value2 1

Where the Gravel Gateway differs is that it allows you to specify a special clearmode label to dictate how metrics are aggregated.

We currently support three different values of clearmode: aggregate (the default for non-gauges), replace (the default for gauges), and family, which provides info-like semantics. As a practical example, if we push:

# TYPE value_total counter
value_total 1
# TYPE value2 gauge
value2{clearmode="aggregate"} 1
# TYPE version gauge
version{version="0.0.1",clearmode="family"} 1

and then

# TYPE value_total counter
value_total{clearmode="replace"} 3
# TYPE value2 gauge
value2{clearmode="aggregate"} 1
# TYPE version gauge
version{version="0.0.2",clearmode="family"} 1

(note the changed version label), Prometheus will scrape:

# TYPE version gauge
version{version="0.0.2"} 1
# TYPE value2 gauge
value2 2
# TYPE value_total counter
value_total 3

With the counter value being replaced, the gauge value being summed, and the version value completely replacing the old version. You'll also note that the clearmode label is removed by the gateway: it's not included in the metrics exposed to the Prometheus scrape. In that way, this aggregating process is completely transparent to Prometheus.
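The three clearmode behaviours can be modelled in a few lines of Python. This is only a toy in-memory sketch of the semantics described above, not the gateway's actual implementation; the ToyGateway class and its push/scrape methods are hypothetical names invented for illustration.

```python
from collections import defaultdict

CLEARMODE = "clearmode"


def default_mode(metric_type):
    # Per the text above: gauges default to replace, everything else to aggregate.
    return "replace" if metric_type == "gauge" else "aggregate"


class ToyGateway:
    """Hypothetical model of the clearmode semantics; not the real gateway code."""

    def __init__(self):
        # family name -> {frozenset of (label, value) pairs -> sample value}
        self.families = defaultdict(dict)

    def push(self, name, metric_type, labels, value):
        labels = dict(labels)
        # The clearmode label is consumed here, so it never reaches a scrape.
        mode = labels.pop(CLEARMODE, default_mode(metric_type))
        key = frozenset(labels.items())
        series = self.families[name]
        if mode == "aggregate":
            series[key] = series.get(key, 0) + value
        elif mode == "replace":
            series[key] = value
        elif mode == "family":
            # Info-like semantics: drop every older labelset in the family.
            series.clear()
            series[key] = value
        else:
            raise ValueError(f"unknown clearmode: {mode}")

    def scrape(self):
        return {
            name: {tuple(sorted(key)): value for key, value in series.items()}
            for name, series in self.families.items()
        }


gw = ToyGateway()
# First push
gw.push("value_total", "counter", {}, 1)
gw.push("value2", "gauge", {"clearmode": "aggregate"}, 1)
gw.push("version", "gauge", {"version": "0.0.1", "clearmode": "family"}, 1)
# Second push, with the counter explicitly marked as replace
gw.push("value_total", "counter", {"clearmode": "replace"}, 3)
gw.push("value2", "gauge", {"clearmode": "aggregate"}, 1)
gw.push("version", "gauge", {"version": "0.0.2", "clearmode": "family"}, 1)
print(gw.scrape())
```

In this model the counter ends at 3 (replaced), the gauge at 2 (summed), and only the version="0.0.2" labelset survives in the version family.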

Usage

Prometheus Gravel Gateway 

USAGE:
    gravel-gateway [OPTIONS]

FLAGS:
    -h, --help       
            Prints help information

    -V, --version    
            Prints version information


OPTIONS:
        --basic-auth-file <basic-auth-file>    
            The file to use for basic authentication validation.
                            This should be a path to a file of bcrypt hashes, one per line,
                            with each line being an allowed hash.
    -l <listen>                                
            The address/port to listen on [default: localhost:4278]

        --tls-cert <tls-cert>                  
            The certificate file to use with TLS

        --tls-key <tls-key>                    
            The private key file to use with TLS

To use, run the gateway:

gravel-gateway

You can then make POSTs to /metrics to push metrics:

echo '# TYPE value_total counter
value_total{clearmode="replace"} 3
# TYPE value2 gauge
value2{clearmode="aggregate"} 1
# TYPE version gauge
version{version="0.0.2",clearmode="family"} 1' | curl --data-binary @- localhost:4278/metrics

And point Prometheus at it to scrape:

global:
  scrape_interval: 15s
  evaluation_interval: 30s
scrape_configs:
  - job_name: prometheus
    honor_labels: true
    static_configs:
      - targets: ["127.0.0.1:4278"]

Authentication

Gravel Gateway supports (pseudo) Basic authentication (with the auth feature). To use it, populate a file with bcrypt hashes, one per line, e.g.

htpasswd -bnBC 10 "" supersecrets | tr -d ':\n' > passwords

and then start gravel-gateway pointing to that file:

gravel-gateway --basic-auth-file ./passwords

Requests to the POST /metrics endpoint will then be rejected unless they contain a valid Authorization header:

curl http://localhost:4278/metrics -vvv --data-binary @metrics.txt -H "Authorization: Basic supersecrets"

You'll note that we don't base64 the authorization header, so it's not technically Basic Auth, but I don't like Base64ing it because I believe that gives a false sense of security. Instead, you should enable TLS.
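For clients that are not curl, the same request can be assembled by hand. The following is a minimal sketch using only the Python standard library; build_push_request is a hypothetical helper name, and the sketch only builds the request rather than sending it, since sending requires a running gateway.

```python
import urllib.request


def build_push_request(base_url, body, secret=None):
    """Build (but do not send) a POST to the gateway's /metrics endpoint.

    Hypothetical helper for illustration; base_url and secret are supplied by the caller.
    """
    headers = {}
    if secret is not None:
        # As described above, the secret follows "Basic " as-is, with no base64 step.
        headers["Authorization"] = f"Basic {secret}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/metrics",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_push_request(
    "http://localhost:4278",
    "# TYPE value_total counter\nvalue_total 1\n",
    secret="supersecrets",
)
# urllib.request.urlopen(req) would send it to a running gateway.
```

Because the secret travels unencoded in the header, this only makes sense over an https:// URL once TLS is enabled.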

TLS

TLS is provided by the tls-key and tls-cert args. Both are required to start a TLS server, and represent the private key and the certificate that is presented, respectively.

Motivation

I recently wrote about my frustrations with trying to orchestrate Prometheus in an FAAS (Functions-As-A-Service) system that will remain nameless. My key frustration was that the number of semantics I was trying to extract from my Prometheus metrics was too much for the limited amount of data you can ship with them. In particular, there were three semantics I was trying to drive:

  1. Aggregated Counters - Things like request counts. FAAS applications only process one request (in general), so each sends a 1 to the gateway and I want to aggregate that into a total request count across all the invocations.
  2. Non-aggregated Gauges - It doesn't really make sense to aggregate Gauges in the general case, so I want to be able to send gauge values to the gateway and have them replace the old value (TODO: A rolling average would be nice).
  3. Info values - Things like the build information. When a new labelset comes along for these metrics, I want to be able to replace all the old labelsets, e.g. upgrading from {version="0.1"} to {version="0.2"} should replace the {version="0.1"} labelset.

Existing gateways, like the prom-aggregation-gateway or pushgateway, are all or nothing with regard to aggregation: the pushgateway does not aggregate at all, completely replacing values as they come in. The aggregation gateway is the opposite: it aggregates everything. What I wanted was something that allows more flexibility in how metrics are aggregated. To that end, I wrote the Gravel Gateway.

Comments
  • metric label overriding

    I have 2 lambda functions pointing to the same gateway. The first lambda sends:

    • requests_num_total{LAMBDA_NAME="test_function"}

    the second lambda sends:

    • requests_num_total{job="test"}

    when I curl the gateway endpoint I see only:

    requests_num_total{LAMBDA_NAME="test_function"} 1
    requests_num_total{LAMBDA_NAME="test"} 1
    

    where the label job has been overridden by LAMBDA_NAME.

    How to reproduce

    • run the gateway locally
    • send the first metric
    echo '# TYPE requests_num_total counter
    requests_num_total{LAMBDA_NAME="test_function"} 1' | curl --data-binary @- 127.0.0.1:4278/metrics
    
    • send the second metric
    echo '# TYPE requests_num_total counter
    requests_num_total{job="test"} 1' | curl --data-binary @- 127.0.0.1:4278/metrics
    

    curl the gateway endpoint with curl 127.0.0.1:4278/metrics:

    # TYPE requests_num_total counter
    requests_num_total{LAMBDA_NAME="test_function"} 1
    requests_num_total{LAMBDA_NAME="test"} 1
    

    As you can see, the label job has been renamed to LAMBDA_NAME: the first label sent always overrides any following labels.

    bug 
    opened by ltagliamonte 10
  • Error 500 when pushing same metric with different label

    Hi,

    thanks for developing Gravel Gateway!

    I have an issue in pushing the same metric with different label values. I'm not sure whether it's something I'm missing, but here's how to reproduce:

    • clone master@06e50df0210797b47a7ca13ccf8fb0f4b8bc159e
    • cargo run (rustc 1.57.0 (f1edd0429 2021-11-29))
    • The following curl returns correctly with 200 OK
    echo '# HELP mymetric My Metric
    # TYPE mymetric gauge
    mymetric{clearmode="aggregate",label="value1",job="my_metric"} 1.0' | curl -v -H 'Authorization: x' --data-binary @- localhost:4278/metrics
    
    • Multiple curl invocations as above yield the expected results (summed values for the metric).
    • But the following curl returns with 500 with message Unhandled rejection: AggregationError(ParseError(InvalidMetric("Cannot add a sample with 3 labels into a family with 2")))* (see label="value2")
    echo '# HELP mymetric My Metric
    # TYPE mymetric gauge
    mymetric{clearmode="aggregate",label="value2",job="my_metric"} 1.0' | curl -v -H 'Authorization: x' --data-binary @- localhost:4278/metrics
    

    Is this something to be expected?

    Please let me know if I can be of further help.

    Thanks!

    opened by sanjioh 3
  • Pebbles - mean not correct

    The result of using mean5m does not seem correct. To reproduce, run the gateway locally and send:

    echo '# TYPE test_value gauge
    test_value{clearmode="mean5m"} 22' | curl --data-binary @- localhost:4278/metrics/job/testjob
    

    expected 22, returns 0

    curl localhost:4278/metrics
    # TYPE test_value gauge
    test_value{job="testjob"} 0
    

    the result is wrong for all successive submissions.

    opened by ltagliamonte 2
  • [Questions] Clustering

    Hello @sinkingpoint, I'd like to ask a few questions about clustering:

    • is this feature using any membership algorithm (like serf for example) so that if a node of the cluster is down the metric gets re-assigned to another node?
    • is the metric proxy best effort? If a node is down or temporarily unavailable, is the data point just discarded?
    • can the cluster be scaled in and out? Is the data re-organized when this happens?

    From what I'm reading, the current clustering feature is just best-effort and, depending on the requirements, not production ready?

    question 
    opened by ltagliamonte-dd 1
  • [Pebbles] AggregationError

    Trying to use pebbles with {clearmode="mean5m"} and it doesn't seem to work; maybe I'm getting the doc wrong. To replicate, run the gateway locally and perform:

    echo '# TYPE value2 gauge
    value2{clearmode="mean5m"} 1' | curl --data-binary @- localhost:4278/metrics/job/lttest
    

    this POST will successfully complete, but on performing a GET /metrics we get:

    # TYPE value2 gauge
    value2{job="lttest"} 0
    

    let's now try to push another value for the same gauge metric:

    echo '# TYPE value2 gauge
    value2{clearmode="mean5m"} 10' | curl --data-binary @- localhost:4278/metrics/job/lttest
    

    will receive the following error:

    Unhandled rejection: AggregationError(Error("invalid push - new push has different label names than the existing family"))
    
    opened by ltagliamonte 1
  • Found some unsupported items

    Found some unsupported items. Example:

    # HELP jvm_memory_bytes_used Used bytes of a given JVM memory area
    # TYPE jvm_memory_bytes_used gauge
    jvm_memory_bytes_used{area="heap",} 9.4094392E12
    
    
    # Service=myapp, date=Fri May 27 11:34:23 UTC 2022
    
    1. a comma at the end of the label list is unsupported
    2. an uppercase exponent letter "E" is not supported, only lowercase "e"
    3. two or more blank lines after the last metric are not supported
    bug 
    opened by ixvick 1
  • Dockerized application

    Tested with:

    docker build -t prometheus-gravel-gateway .
    docker run -it --rm -e RUST_BACKTRACE=1 -p 4278:4278 prometheus-gravel-gateway
    

    The only problem with the dockerized app is that it doesn't handle the INT signal, so pressing CTRL-C does not interrupt the daemon; the service has to be stopped with a docker kill.

    opened by lorello 1
  • Bump openmetrics-parser to 0.3.1

    The new version of openmetrics-parser loosens the Prometheus grammar a bit and thus should make us more compliant with actual Prometheus text exposition in the wild.

    In theory this solves #5

    opened by sinkingpoint 0
  • Metrics that do not always have a recorded value cause 400 errors.

    Hello, when trialing this project I encountered a bug when a job ran that didn't have a value recorded for one of the metrics. Any metrics detailed in the PUT after the empty metric cause the server to throw back a 400 error, whereas the Prometheus pushgateway handles empty metrics without errors.

    Sample PUT:

    # HELP metric_without_values_total This metric does not always have values
    # TYPE metric_without_values_total counter
    # HELP metric_with_values_total This metric will always have values
    # TYPE metric_with_values_total counter
    metric_with_values_total{a_label="label_value",another_label="a_value"} 1.0
    # HELP metric_with_values_created This metric will always have values
    # TYPE metric_with_values_created gauge
    metric_with_values_created{a_label="label_value",another_label="a_value"} 1.665577650707084e+09
    

    Gives the following error back:

    Invalid metric name in family. Family name is metric_without_values_total, but got a metric called metric_with_values_total
    

    Code to reproduce (Python):

    from prometheus_client import CollectorRegistry, Counter, push_to_gateway
    
    registry = CollectorRegistry()
    
    no_values = Counter(
        "metric_without_values",
        "This metric does not always have values",
        ["label"],
        registry=registry
    )
    
    has_values = Counter(
        "metric_with_values",
        "This metric will always have values",
        ["a_label", "another_label"],
        registry=registry
    )
    
    has_values.labels(a_label="label_value", another_label="a_value").inc()
    push_to_gateway("localhost:4278", job="test_job", registry=registry)
    
    bug 
    opened by jsymons 0
  • Base64 encode the authorization header

    I understand the comment and intention here:

    You'll note that we don't base64 the authorization header, so it's not technically Basic Auth, but I don't like Base64ing it because I believe that gives a false sense of security. Instead, you should enable TLS

    However, the Basic auth spec itself mentions that base64 has nothing to do with security and should be used in conjunction with TLS: https://www.rfc-editor.org/rfc/rfc7617#section-1

    The downside of not following the spec is integration efforts with libraries. For example, I want to use prom-client to push to gravel gateway but now I can't use the methods meant for that because they will obviously base64 the authorization header for me.

    I would request a change to expect a base64-encoded authorization header to improve client integration, or maybe to accept both forms for backwards compatibility.

    enhancement awaiting-response 
    opened by rene84 1
  • Support K8s deployment via helm charts

    In order to deploy the gravel-gateway within our K8s cluster, a Helm chart would be helpful. This would make operations like HA scaling easier. We tried it as a drop-in replacement in the pushgateway chart but needed to disable the probes for that. If a Helm chart is not feasible, could you add a section to the Readme describing which probes (Liveness, Readiness) should be configured?

    enhancement 
    opened by mbeckDWRE 1
  • Support PUT for clearing all existing metrics

    All prometheus_client examples use the push_to_gateway function to push metrics to the Prometheus gateway. https://github.com/prometheus/client_python#exporting-to-a-pushgateway

    The push_to_gateway method uses PUT as http method: https://github.com/prometheus/client_python/blob/v0.14.1/prometheus_client/exposition.py#L448

    meanwhile pushadd_to_gateway uses POST as HTTP method: https://github.com/prometheus/client_python/blob/v0.14.1/prometheus_client/exposition.py#L479

    It would be nice to add support for the PUT method as well so that developers can switch gateways with no code change.

    enhancement 
    opened by ltagliamonte 2
  • Persistent state

    Hi

    I saw your talk at KubeCon EU and found this project quite interesting. One thing I was wondering is how you are tackling persistent state in your use case (if it is needed).

    As far as I can see in the readme or the code, there isn't anything written to disk and it is "more or less" stateless.

    I assume this would mean that if the gateway is restarted then it will lose the metrics that were already stored there and you "risk" having a scrape with absent metrics?

    How do you tackle this issue in your setup? And is persisting the state something that you would consider useful? The state could for example be written to blob/s3-like storage periodically, allowing the gateway to start up again with data from the last session.

    enhancement 
    opened by kradalby 2
Releases(v1.6.0)
Owner
Colin Douch