A filter proxy for StatsD

Overview

MIT licensed Continuous integration Docker Cloud Build Status

statsd-filter-proxy-rs

statsd-filter-proxy-rs is an efficient, lightweight StatsD proxy that filters out unwanted metrics before forwarding the rest to a StatsD server.

Why

"If you don't want metrics, why not just stop sending them?" you might ask. Sometimes disabling metrics isn't trivial because of scale, legacy code and time constraints. Sometimes the fastest way to disable a large number of metrics is to deploy a proxy to block them.

Getting started

To build the proxy, you need:

  • The Rust toolchain
    • Rust 1.51+
    • Cargo
  • Both can be installed through rustup

Compile and Run Locally

export PROXY_CONFIG_FILE=/path/to/your/proxy-config-file.json
export RUST_LOG=debug
cargo run --release

PROXY_CONFIG_FILE is a required variable that points to the configuration file path.

RUST_LOG is an optional variable that defines the log level. It can be error, warn, info, debug or trace.

Run Locally Through Docker

Make a JSON configuration file locally. The sample configuration below makes the filter proxy listen on UDP port 8125 and forward datagrams to port 8127 on 127.0.0.1.

{
    "listen_host": "0.0.0.0",
    "listen_port": 8125,
    "target_host": "127.0.0.1",
    "target_port": 8127,
    "metric_blocklist": [
        "foo1",
        "foo2"
    ]
}

Now run the proxy with the configuration file mounted as a Docker volume.

docker run -it \
    --volume $(pwd)/config.json:/app/config.json:Z \
    -e PROXY_CONFIG_FILE=/app/config.json \
    -e RUST_LOG=trace \
    -p 8125:8125/udp \
    askldjd/statsd-filter-proxy-rs:latest
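
One way to check the running container is to send it a datagram by hand. The sketch below is only an illustration built on the Rust standard library: `counter_line` and `send_metrics` are hypothetical helpers, not part of statsd-filter-proxy-rs, and it assumes the proxy is reachable on 127.0.0.1:8125 as in the sample configuration.

```rust
use std::io;
use std::net::UdpSocket;

/// Builds a StatsD counter line such as `foo1.requests:1|c`.
/// (Hypothetical helper, for illustration only.)
fn counter_line(name: &str, value: u64) -> String {
    format!("{}:{}|c", name, value)
}

/// Sends one UDP datagram containing the given metric lines, joined by `\n`.
/// Returns the number of bytes handed to the socket.
/// (Hypothetical helper, for illustration only.)
fn send_metrics(target: &str, lines: &[String]) -> io::Result<usize> {
    // Bind to an ephemeral local port; the OS picks the port number.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.send_to(lines.join("\n").as_bytes(), target)
}
```

Calling `send_metrics("127.0.0.1:8125", &[counter_line("foo1.requests", 1), counter_line("bar.requests", 1)])` should, with the sample blocklist and assuming entries are matched as prefixes, leave only `bar.requests` to be forwarded to the target port.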

Configuration

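statsd-filter-proxy-rs reads its configuration from a JSON file. The // comments in the listing below are annotations for the reader; standard JSON does not allow comments.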

{
    // The host to bind to
    "listen_host": "0.0.0.0",
    
    // The UDP port to listen on for datagrams
    "listen_port": 8125,

    // The target StatsD server address to forward to
    "target_host": "0.0.0.0",
    
    // The target StatsD server port to forward to
    "target_port": 8125,

    // The list of metric prefixes to block
    "metric_blocklist": [
        "prefix1",
        "prefix2",
        "prefix3"
    ],

    // Optional, defaults to false.
    // Set to true to delegate the send path to the Tokio thread pool.
    // If you turn this on, filtering and the sending of the datagram may
    // be performed in Tokio background threads.
    //
    // Pros:
    // - scalable to more than 1 CPU, especially useful if your filter list
    //   is large enough to become a bottleneck.
    // Cons:
    // - slightly more overhead per message
    //   - an extra deep copy of the send buffer
    //   - Arc increments for sharing objects among threads
    //
    // - messages might not be sent in the same order they are received,
    //   since the send path is concurrent
    "multi_thread": false
}
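
The trade-offs listed for `multi_thread` can be illustrated with a small sketch. It uses `std::thread` in place of the Tokio thread pool purely to stay dependency-free, and `filter_on_worker` is a hypothetical name, so this shows the shape of the cost (one buffer copy and one `Arc` clone per datagram) and is not the proxy's actual code.

```rust
use std::sync::Arc;
use std::thread;

/// Filters one datagram on a worker thread (illustrative sketch only).
/// For brevity the whole datagram is treated as a single metric.
/// Returns `None` if it is blocked or not valid UTF-8.
fn filter_on_worker(
    block_list: Arc<Vec<String>>, // shared, reference-counted blocklist
    recv_buf: &[u8],
) -> thread::JoinHandle<Option<Vec<u8>>> {
    // Deep copy: the worker must own its bytes, because the caller
    // reuses `recv_buf` for the next datagram.
    let owned = recv_buf.to_vec();
    thread::spawn(move || {
        let text = std::str::from_utf8(&owned).ok()?;
        let blocked = block_list
            .iter()
            .any(|prefix| text.starts_with(prefix.as_str()));
        if blocked {
            None
        } else {
            Some(owned)
        }
    })
}
```

The caller would pass `Arc::clone(&block_list)` for every datagram. Because the workers run concurrently, nothing in this sketch guarantees that they finish in the order the datagrams arrived, which is the reordering caveat noted in the configuration comments.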

Tests

Unit Test

cargo test

Integration Test

python test/integration_test.py

Benchmark

statsd-filter-proxy was originally written in Node.js, so the benchmark uses that original version as the baseline.

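| Packet latency | JS  | Rust (single-threaded) | Rust (multi-threaded) |
|----------------|-----|------------------------|-----------------------|
| Median (us)    | 639 | 399                    | 499                   |
| P95 (us)       | 853 | 434                    | 547                   |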

The latency numbers should not be taken as absolute values because they do not account for benchmark overhead (the benchmark harness is written in Python).

CPU = Intel i7-8700K (12) @ 4.700GHz

Limitations / Known Issues

  • StatsD datagrams are capped at 8192 bytes. At the moment, this limit can only be adjusted in code.
Comments
  • Added support for multiple metrics per StatsD datagram

    The StatsD protocol allows multiple metrics in a single datagram, as long as they are separated by \n. This is known as "buffering" in Datadog dogstatsd. The filtering logic now supports this by tokenizing the string up front and filtering each metric independently.

    opened by askldjd 0
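
The per-metric filtering described in this change can be sketched as follows. This is a minimal illustration, assuming blocklist entries are matched as prefixes; `is_blocked` and `filter_datagram` are hypothetical names, not the functions in the repository.

```rust
/// Returns true when `metric` starts with any prefix in `block_list`.
/// (Hypothetical helper, for illustration only.)
fn is_blocked(block_list: &[String], metric: &str) -> bool {
    block_list
        .iter()
        .any(|prefix| metric.starts_with(prefix.as_str()))
}

/// Splits a datagram on `\n`, drops empty and blocked lines, and joins
/// the remaining metrics back into a single datagram.
fn filter_datagram(block_list: &[String], datagram: &str) -> String {
    datagram
        .split('\n')
        .filter(|line| !line.is_empty() && !is_blocked(block_list, line))
        .collect::<Vec<&str>>()
        .join("\n")
}
```

An empty result would mean every metric in the datagram was blocked, in which case a caller would presumably skip the send altogether.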
  • Calling safe function can lead to undefined behaviour

    https://github.com/askldjd/statsd-filter-proxy-rs/blob/6ba64dabc280b3e483a93f29df93688945b7f8fc/src/filter.rs#L4

    If &buf isn't valid UTF-8, this results in undefined behaviour. Since filter is not marked as unsafe, it is currently possible to cause undefined behaviour by calling a safe function.

    As far as I can tell, this buffer comes from a socket from outside the application, so it is in my opinion a better idea to use the safe str::from_utf8 with proper error handling instead.

    opened by gpluscb 0
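
The suggestion in this issue can be sketched with the standard library's checked conversion. The code at the linked line is not quoted here, so this only shows the safe alternative the issue proposes; `decode_datagram` is a hypothetical name, and dropping the datagram on error is one possible policy among several.

```rust
use std::str;

/// Interprets a received datagram as UTF-8 text without any `unsafe`.
/// Returns `None` (after logging) when the bytes are not valid UTF-8.
/// (Hypothetical helper, for illustration only.)
fn decode_datagram(buf: &[u8]) -> Option<&str> {
    match str::from_utf8(buf) {
        Ok(text) => Some(text),
        Err(err) => {
            // `valid_up_to` reports how many leading bytes were valid.
            eprintln!(
                "dropping datagram: invalid UTF-8 after byte {}",
                err.valid_up_to()
            );
            None
        }
    }
}
```

Because the input arrives from a socket, the error branch is reachable with ordinary traffic, so the check has to happen on every datagram.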
  • Reduce allocations

    This depends on #11 I guess (the PR branch is based on the PR branch from #11)

    Please review carefully and also see the commit message of the last commit for a rationale.

    opened by matthiasbeyer 0
  • Some bench improvements and greetings from the Tremor team :)

    Hi Alan,

    I came across your blog post on optimizing away 700 CPUs with Rust. It's a really good read and was fascinating to see, since our team at Wayfair ran into exactly the same issue and came up with nearly the same solution :D.

    I went through the benchmarks today and noticed you're selling yourself short: the Node.js version did only half the logic that the Rust version did (it ignored newlines). I twiddled something together to make them more alike on the logical front.

    I also changed the benchmark script to emit some filtered metrics (to see the impact of that). I hope you approve of that change.

    With the updated results I ran your benchmark through tremor and noticed something interesting: your code is quicker for single-core use but not quite as fast in multi-core setups. (My local benchmark results are below, run with taskset -c 1 and taskset -c 1,2,3 for single-core and 3c respectively.)

    We donated tremor to the CNCF last year, so I'm curious if you're open to chatting, sharing some experiences, and possibly expanding the use case. Neither your employer nor ours is in the game of writing this kind of software as part of its core business, so we're probably both very product focused, which seems to align well :). If you're interested I'll leave you a link to our community server.


    |            | nodejs | nodejs (3c) | rust_pxy | rust_pxy (3c) | tremor | tremor (3c) |
    |------------|--------|-------------|----------|---------------|--------|-------------|
    | Median(us) | 110    | 153         | 78       | 137           | 97     | 94          |
    | P95(us)    | 125    | 183         | 103      | 150           | 112    | 137         |

    PS: we found a bug in our statsd decoder when trying this :D thank you!

    opened by Licenser 0
  • Few optimization for NodeJS version

    I came here after reading the post on Medium: https://medium.com/tenable-techblog/optimizing-700-cpus-away-with-rust-dc7a000dbdb2

    Although I don't think JS is going to beat Rust, I have a few suggestions to improve the performance characteristics of the code.

    1. Use the latest version of Node.js.
    2. Extract the error-handling function that is registered with the "client.send" method, so that it is defined outside the call.
    3. Run >> node --expose-gc --inspect statsd-filter-proxy.js
    4. The most important step is to experiment a little with socket.setRecvBufferSize(size) and socket.setSendBufferSize(size) to find the optimum buffer size, which can help reduce GC pressure on the heap.
    5. Consider receiving, storing and sending packets in batches rather than one by one.
    6. Explicitly set the heap size >> node --max-old-space-size=10240 statsd-filter-proxy.js. Here 10240 is just an example, not a concrete suggestion; the first 5 steps will help you arrive at the actual number. Similarly, set the values of min_semi_space_size and max_semi_space_size. More info here: https://deepu.tech/memory-management-in-v8/
    7. Use the "taskset" Linux command to pin the Node.js process to a particular CPU core instead of all the cores, which is the default; this ensures maximum yield.
    opened by corporatepiyush 0
  • Potential performance wins

    There are a few potential performance wins you can explore here I think.

    1. Use a trie rather than a linear scan for the prefix matching.
    2. You could allocate less in filter. For example, by directly building a Vec<u8> instead of a String. Currently there are two allocations, a Vec<&str> and String. With that approach you'd only have one and you could potentially allocate with too much capacity to prevent resizing the Vec.

    EDIT: With the signature pub fn filter(block_list: &Vec<String>, buf: &mut [u8]) -> &mut [u8] and some index fiddling you could probably entirely avoid allocations in filter.

    opened by k0nserv 0
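
The allocation-free idea from the EDIT above can be sketched as an in-place compaction of the receive buffer. This is an illustration of the suggestion, not the repository's implementation: it takes `&[String]` instead of `&Vec<String>`, adds the explicit lifetime the compiler requires, keeps the linear prefix scan (the trie is not shown), and compares raw bytes so no UTF-8 conversion is needed.

```rust
/// Removes blocked metrics from `buf` in place and returns the kept prefix
/// of the buffer. Metrics are separated by `\n`; nothing is allocated.
pub fn filter<'a>(block_list: &[String], buf: &'a mut [u8]) -> &'a mut [u8] {
    let len = buf.len();
    let mut read = 0; // start of the line being examined
    let mut write = 0; // end of the compacted output so far
    while read < len {
        // Exclusive end of the current line: the next '\n' or the buffer end.
        let end = buf[read..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(len, |offset| read + offset);
        let blocked = block_list
            .iter()
            .any(|prefix| buf[read..end].starts_with(prefix.as_bytes()));
        if end > read && !blocked {
            if write > 0 {
                // Separator between kept lines; `write` is always behind
                // `read` here, so no unread byte is overwritten.
                buf[write] = b'\n';
                write += 1;
            }
            // Overlap-safe move of the kept line towards the front.
            buf.copy_within(read..end, write);
            write += end - read;
        }
        read = end + 1;
    }
    &mut buf[..write]
}
```

The returned slice borrows from `buf`, so it can be handed straight to the send call; an empty slice means every metric was blocked.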
Releases(0.2.1)
Owner
Alan Ning
SRE @tenable