Kafka Producer Benchmark

A simple benchmark to compare the performance of different Kafka clients with a similar configuration.

The project is relatively low tech: it simply dumps the same metrics for each client into files.

Which metrics are captured?

The following metrics are collected (average across all topic partitions):

  • Send rate = number of messages sent per second
  • Duration spent in the local queue = average time (ms) messages stayed in the queue before being sent
  • Batch size = average batch size
  • Request rate = number of Produce Requests per second
  • Request latency = average latency of Produce Requests
  • Records per Produce Request = average number of records per Produce Request

At the end of the test, all messages are flushed, then the benchmark displays the number of messages sent, the duration, and the number of Produce Requests made.

How to run it?

Running all scenarios

To run all scenarios, use the following command:

./run-scenarios.sh

The script will execute all scenario files named scenario-<description>.env at the root of the project against each Kafka Producer client registered in the project.

Execution logs will be dumped in ./target/scenario-<description>.env/<client-name>.txt.

Running a single scenario

To run a single scenario, use the following command:

./run-scenario.sh <myscenario>.env

Execution logs will be dumped in ./target/scenario-<description>.env/<client-name>.txt.

Running with a custom docker-compose file

Just set the docker_compose_file=... variable in the shell you are running run-scenario.sh from. For instance:

docker_compose_file=docker-compose-kraft-3-brokers.yml;./run-scenario.sh <myscenario>.env

How to run the scenarios on a Confluent Cloud cluster?

The repository now contains scripts to run the producer benchmark on Kubernetes connected to a Confluent Cloud cluster.

Pre-requisites

The pre-requisites are:

  • Terraform installed
  • A Kubernetes environment to run the producer benchmark
  • A current kubectl context (check with kubectl config get-contexts)

Configuring the parameters for Confluent Cloud

Create a file called secrets.auto.tfvars in the cloud/setup folder with the following content:

confluent_org_id           = "<YOUR_CCLOUD_ORG_ID>"
confluent_environment_id   = "<YOUR_CCLOUD_ENV_ID>"
confluent_cloud_api_key    = "<YOUR_CCLOUD_API_KEY>"
confluent_cloud_api_secret = "<YOUR_CCLOUD_API_SECRET>"

To see all the variables you can tweak, please read the variables file.

Running all scenarios

To run all scenarios, use the following command:

./run-scenarios-cloud.sh

The script will execute all scenario files named scenario-<description>.env at the root of the project against each Kafka Producer client registered in the project.

Running a single scenario

To run a single scenario, use the following command:

./run-scenario-cloud.sh <myscenario>.env

How to contribute?

Have an idea to make this benchmark better? Found a bug? Do not hesitate to report it via GitHub issues and/or create a pull request.

Adding a new scenario?

The ./run-scenarios.sh script looks for all files matching the pattern scenario-<description>.env. Existing scenarios have been named with the following naming convention: scenario-<nbtopics>t<nbpartitions>p-<description>.env.

The easiest way to create a new scenario is to duplicate an existing scenario file and play with the values. You can override any producer configuration available in the clients by using the following naming conventions (see the example after this list):

  • Prefix with KAFKA_.
  • Convert to upper-case.
  • Replace a period (.) with a single underscore (_).
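
For instance, a hypothetical scenario file overriding batch.size and linger.ms could look like the sketch below (the file name and values are purely illustrative, not taken from the repository):

# scenario-1t6p-bigbatch.env (hypothetical example)
NB_TOPICS=1
NUMBER_OF_PARTITIONS=6
MESSAGE_SIZE=200
NB_MESSAGES=1000000
KAFKA_BATCH_SIZE=100000
KAFKA_LINGER_MS=50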

Adding a new client?

We strongly recommend running your tests against localhost:9092; you can leverage the default docker-compose.yml to get a development environment. If you are interested in experimenting with KIP-500, you can run your implementation locally against docker compose -f docker-compose-kraft.yml (a single node acting as both controller and broker).

Make everything configurable via environment variables.

The default variables are:

  • KAFKA_BOOTSTRAP_SERVERS=localhost
  • NB_TOPICS=1
  • REPLICATION_FACTOR=1
  • NUMBER_OF_PARTITIONS=6
  • MESSAGE_SIZE=200
  • NB_MESSAGES=1000000
  • REPORTING_INTERVAL=1000
  • USE_RANDOM_KEYS=true
  • AGG_PER_TOPIC_NB_MESSAGES=1

These variables should be used by all clients to make things easier to configure, but each client implementation can have its own set of custom configuration variables.

Convert each KAFKA_XXX environment variable into a lowercase producer property by replacing "_" with dots. This will let you play with batch.size, linger.ms, etc.
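
As a minimal sketch (assuming a bash 4+ wrapper; none of this code is part of the project), the conversion could look like:

# Illustrative sketch: map every KAFKA_* environment variable to a producer property.
env | grep '^KAFKA_' | while IFS='=' read -r name value; do
  prop="${name#KAFKA_}"   # strip the KAFKA_ prefix
  prop="${prop//_/.}"     # replace underscores with dots
  prop="${prop,,}"        # lowercase (bash 4+)
  echo "$prop=$value"     # e.g. KAFKA_BATCH_SIZE=100000 -> batch.size=100000
done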

Specs

By default, each client implementation will need to capture metrics at a regular interval (defined via REPORTING_INTERVAL). They should be logged using the following format to make things easier to compare:

logger.info("Sent rate = {}/sec, duration spent in queue = {}ms, batch size = {}, request rate = {}/sec, request latency avg = {}ms, records per ProduceRequest = {}", avgSendRate, queueTimeAvg, batchSizeAvg, requestRate, requestLatencyAvg, recordsPerRequestAvg);

At the end of the run, make sure all messages are delivered (e.g. by calling producer.flush()).

At the end of the run, make sure you produce a log line starting with the "REPORT" keyword; this will be displayed when executing scenarios. Example:

logger.info("REPORT: Produced %s with %s ProduceRequests in %s ms", lastTotalMsgsMetric, lastRequestCount, str(round(delta)))

My client is ready, how can I plug it into the test suite?

Create a new folder at the root of the project and make sure you have a Dockerfile inside it.

Update the PRODUCER_IMAGES variable in utils.sh to reference your new client. This will be taken into account both to build the image and to start the scenarios.
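
For example, assuming PRODUCER_IMAGES is a bash array in utils.sh (the existing entry names below are purely illustrative), registering a new client could look like:

# utils.sh -- hypothetical sketch, not the actual content of the file
PRODUCER_IMAGES=(
  "java-producer"    # existing client (illustrative name)
  "rust-producer"    # existing client (illustrative name)
  "my-new-client"    # the new folder you just created, containing a Dockerfile
)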

Comments
  • Jupyter report

    This PR basically adds some Python scripts to produce CSV files for all producer implementations, then uses Jupyter to generate a markdown report in results/${SCENARIO}/report.md.

    In addition, there is a new script called run-jupiter.sh that starts the Jupyter web UI on http://localhost:8888, so you can hack and play with the notebooks.

    opened by jeanlouisboudart 4
  • Benchmark with Kraft/3 brokers

    • [ ] ~Change utils.sh to make the docker-compose-file a parameter~ <= not needed
    • [x] Same parameters as non-kraft implementations (broker address etc.) for consistency (i.e. change from 29092 to 9092)
    • [ ] ~Create the appropriate .env files~ <= not needed, same parameters
    • [x] docker-compose-kraft.yml (for local use, faster to start, easier to test) (i.e.: same as the current docker-compose.yml: no network config, etc.)
    opened by aesteve 1
  • Benchmark multiple config options against each other

    Now that we have main methods in every language to run a benchmark, we could benefit from that to run different benchmarks consecutively with a different set of parameters. Would ease the task of checking the impact of, say, batch.size for that workload.

    The easy path codewise would be to keep everything as-is, run different scenarios consecutively, and build a new Jupyter report. The alternative would be to refactor the code we have now to accept either arrays (or range / step) for some env variables, or have a set of new ones saying which config options are fixed, which ones are not.

    💡 Idea 
    opened by aesteve 0
  • Allow to configure payload

    The user could provide a sample txt file with one payload per line (so that it is language agnostic), and we would build the payload array by shuffling (and potentially repeating) records from this file.

    💡 Idea 
    opened by aesteve 0
  • WIP: Traffic throttling / generation

    • [ ] Design: how do we read this from env. variables, what's the default?
    • [x] Rust implementations
    • [ ] Python implementations
    • [ ] Java implementations
    • [ ] .NET implementations

    Design-wise, the following env. variables could be used:

    • RATE_LIMIT_PER_SEC, defaults to -1, meaning "no throttling", best-effort
    • RATE_LIMIT_STRATEGY=throttle or RATE_LIMIT_STRATEGY=poisson to choose the traffic generation strategy, defaulting to "throttle" (and crashing if there is no RATE_LIMIT_PER_SEC)

    @jeanlouisboudart OK with this?

    opened by aesteve 1