Concurrent and multi-stage data ingestion and data processing with Rust+Tokio

Overview

TokioSky

Build concurrent and multi-stage data ingestion and data processing pipelines with Rust+Tokio. TokioSky allows developers to consume data efficiently from different sources, known as producers, such as Apache Kafka, Apache Pulsar and others. inspired by elixir broadway

Features

TokioSky takes the burden of defining concurrent GenStage topologies and provide a simple configuration API that automatically defines concurrent producers, concurrent processing, leading to both time and cost efficient ingestion and processing of data. It features:

  • Producer - source of data piplines

  • Processor - process message also can dispath to next stage by dispatcher

  • BatchProcessor process group of message, that is used for last stage, have not next stage

  • Dispatcher - dispatch message with three mode (RoundRobin, BroadCast, Partition)

  • Customizable - can use built-in Producer, Processor, BatchProcessor like Apache Kafka, Apache Pulsar or write your custom Producer, Processor, BatchProcessor

  • Batching - TokioSky provides built-in batching, allowing you to group messages either by size and/or by time. This is important in systems such as Amazon SQS, where batching is the most efficient way to consume messages, both in terms of time and cost. Good Example imagine processor has to check out a database connection to insert a record for every single insert operation, That’s pretty inefficient, especially if we’re processing lots of inserts.Fortunately, with TokioSky we can use this technique, is grouping operations into batches, otherwise known as Partitioning. See Example

  • Dynamic batching - TokioSky allows developers to batch messages based on custom criteria. For example, if your pipeline needs to build batches based on the user_id, email address, etc, See Example

  • Ordering and Partitioning - TokioSky allows developers to partition messages across workers, guaranteeing messages within the same partition are processed in order. For example, if you want to guarantee all events tied to a given user_id are processed in order and not concurrently, you can use Dispatcher with Partition mode option. See Example.

  • Data Collector - when source Producer of your app is web server and need absorb data from client request can use 'Collector' as Producer, that asynchronous absorb data, then feeds to pipelines See Example

  • Graceful shutdown - first terminate Producers, wait until all processors job done, then shutdown

  • Topology - create and syncing components

Examples

The complete Examples on Link.

Explain

  • factory - instance factory

  • concurrency - creates multiple instance (For parallelism)

  • router - used by dispatcher for routing message (RoundRobin || BroadCast || Partition)

  • producer_buffer_pool - producer internally used buffer for increase throughout

  • run_topology - TokioSky always have one Producer Layer and at-least have 1 processor layer and at-max 5 processor layer and 1 optional layer batcher for creating and syncing components must use run_topology_X or run_topology_X_with_batcher

Attention

  • Producer.dispatcher cannot be Partition mode

  • Processor if have not next stage channel must return ProcResult::Continue unless processor (skip) that message

  • All Built-in processor if have next stage, must dispatcher not be partition mode

Crates.io

tokio_sky = 1.0.0

Author

  • DanyalMh

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

You might also like...
Allows processing of iterators of Result types

try-continue try-continue provides one method, try_continue, which allows you to work with iterators of type ResultT, _, as if they were simply iter

convert images to ansi or irc, with a bunch of post-processing filters
convert images to ansi or irc, with a bunch of post-processing filters

img2irc (0.2.0) img2irc is a utility which converts images to halfblock irc/ansi art, with a lot of post-processing filters halfblock means that each

Russh - Async (tokio) SSH2 client and server rimplementation

Russh Async (tokio) SSH2 client and server rimplementation. This is a fork of Thrussh by Pierre-Étienne Meunier which adds: More safety guarantees AES

Services Info Register/KeepAlive/Publish/Subscribe. Based on etcd-rs, tokio

Services Info Register/KeepAlive/Publish/Subscribe. Based on etcd-rs, tokio

CarLI is a framework for creating single-command and multi-command CLI applications in Rust

CarLI is a framework for creating single-command and multi-command CLI applications in Rust. The framework provides error and IO types better suited for the command line environment, especially in cases where unit testing is needed.

Ideas = Creations, a multi-language CMS(Content Management System) based on Rust Web stacks, with long-term upgrade and maintenance.

Ideas = Creations 中文 RustHub: Rust ideas yesterday, shining creations today! This repository holds source code used to run https://rusthub.org, it's

Show unused code from multi-crate Rust projects

Warnalyzer Remove unused code from multi-crate Rust projects. The dead_code lint family of rustc is limited to one crate only and thus can't tell whet

A multi-page fuzzy launcher for your terminal, written in Rust.
A multi-page fuzzy launcher for your terminal, written in Rust.

fr33zmenu A multi-page fuzzy launcher for your terminal, written in Rust. Supports theming and multiple keybind schemes, including basic vim keybinds.

Pure Rust multi-line text handling
Pure Rust multi-line text handling

COSMIC Text Pure Rust multi-line text handling. COSMIC Text provides advanced text shaping, layout, and rendering wrapped up into a simple abstraction

Owner
DanyalMh
I love to make Rust to be great and productivity choice for every situation
DanyalMh
🚀 JavaScript driver for ScyllaDB, harnessing Rust's power through napi-rs for top performance. Pre-release stage. 🧪🔧

?? JavaScript driver for ScyllaDB. Pre-release stage. ???? ⚠️ Disclaimer ⚠️ This repository and the associated npm package are currently in a ?? pre-r

Daniel Boll 16 Oct 21, 2023
Shaping, Processing, and Transforming Data with the Power of Sulfur with Rust

Sulfur WIP https://www.youtube.com/watch?v=PAAvNmoqDq0 "Shaping, Processing, and Transforming Data with the Power of Sulfur" Welcome to the Sulfur pro

Emre 6 Aug 22, 2023
A Rust synchronisation primitive for "Multiplexed Concurrent Single-Threaded Read" access

exit-left verb; 1. To exit or disappear in a quiet, non-dramatic fashion, making way for more interesting events. 2. (imperative) Leave the scene, and

Jonathan de Jong 0 Dec 5, 2021
A lightweight async Web crawler in Rust, optimized for concurrent scraping while respecting `robots.txt` rules.

??️ crawly A lightweight and efficient web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules. ?? Features Concurren

CrystalSoft 5 Aug 29, 2023
Spawn multiple concurrent unix terminals in Discord

Using this bot can be exceedingly dangerous since you're basically granting people direct access to your shell.

Simon Larsson 11 Jun 1, 2021
A concurrent, append-only vector

The vector provided by this crate suports concurrent get and push operations. Reads are always lock-free, as are writes except when resizing is required.

Ibraheem Ahmed 18 Nov 27, 2022
Implementation of CSP for concurrent programming.

CSPLib Communicating Sequential Processes (CSP) Background Communicating Sequential Processes (CSP) is a way of writing a concurrent application using

Akira Hayakawa 4 Nov 9, 2022
Rust Imaging Library's Python binding: A performant and high-level image processing library for Python written in Rust

ril-py Rust Imaging Library for Python: Python bindings for ril, a performant and high-level image processing library written in Rust. What's this? Th

Cryptex 13 Dec 6, 2022
Simple, extensible multithreaded background job processing library for Rust.

Apalis Apalis is a simple, extensible multithreaded background job processing library for Rust. Features Simple and predictable job handling model. Jo

null 50 Jan 2, 2023
Cornucopia is a small CLI utility resting on tokio-postgres and designed to facilitate PostgreSQL workflows in Rust

Cornucopia Generate type checked Rust from your SQL Install | Example Cornucopia is a small CLI utility resting on tokio-postgres and designed to faci

Louis Gariépy 1 Dec 20, 2022