Shaping, Processing, and Transforming Data with the Power of Sulfur with Rust

Sulfur

WIP

https://www.youtube.com/watch?v=PAAvNmoqDq0

"Shaping, Processing, and Transforming Data with the Power of Sulfur"

Welcome to the Sulfur project, where we harness the elemental power of data transformation. Just like sulfur can reshape its form, our platform reshapes, processes, and transforms data, turning it into valuable insights.

Join us on this journey of alchemy where data turns into gold through customization and innovation. Unleash the potential of Sulfur and turn raw data into refined intelligence.

Usage

  1. Clone the repository:

    git clone https://github.com/emreyalvac/sulfur.git

  2. Navigate to the project directory:

    cd sulfur

  3. Build the project (Cargo fetches the dependencies as part of the build):

    cargo build

  4. Configure config.yml for your pipelines.

  5. Run the pipelines from the terminal:

    cargo run -- --config config.yml

Configuration (config.yml)

Here's an example of how to configure your pipelines using the config.yml file:

sulfur:
  - name: "Mongo to Redis"
    cron: "0 0 * * *"
    source:
      type: "MongoDB"
      host: "example.com"
      port: "27017"
      user: "user"
      password: "password"
      database: "db_name"
      collection: "collection_name"
    destination:
      type: "Redis"
      host: "redis.example.com"
      port: "6379"
      password: "redis_password"
      key: "data_key"      

Configuration Details

  • Each pipeline is defined under the sulfur key.
  • name: A descriptive name for your pipeline.
  • cron (optional): The cron expression to schedule pipeline runs (e.g., "0 0 * * *" for daily runs).
  • source: Specifies the data source configuration.
  • destination: Specifies the data destination configuration.
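
As a reading aid for the cron bullet above, here is a minimal Python sketch that labels the fields of the example expression. It assumes the conventional five-field layout (minute, hour, day of month, month, day of week); `describe_cron` is a hypothetical helper for illustration and is not part of Sulfur.

```python
# Field names of a conventional five-field cron expression, in order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 0 * * *" means minute 0 of hour 0, on every day: a daily run at midnight.
print(describe_cron("0 0 * * *"))
```

Whether Sulfur's scheduler accepts other layouts (for example a leading seconds field) is not stated here, so treat the five-field check as an assumption.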

For the source and destination configurations, the available parameters depend on the type of engine you're using (e.g., "MongoDB", "Redis", "BigQuery"). See Supported Engines for the parameters each engine takes.

Remember to customize the configuration according to your project's specific requirements.

Transform Data, Shape Intelligence

Sulfur is more than a project; it's a catalyst for data alchemy. Whether you're merging, filtering, or aggregating, Sulfur empowers you to sculpt raw data into refined insights, making your data truly valuable.

Supported Engines

Sulfur currently supports the following data storage engines:

| Engine | Type | Description | Parameters |
| --- | --- | --- | --- |
| ElasticSearch | Database | Use Elasticsearch as a data source or destination. | host, port, user, password, index |
| MongoDB | Database | Use MongoDB as a data source or destination. | host, port, user, password, database, collection |
| Redis | In-Memory Data Store | Use Redis as a data source or destination. | host, port, password, key |
| BigQuery | Data Warehouse | Use Google BigQuery as a data destination. | project_id, dataset_id, table_id, credentials (service_key.json) |
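
The parameter lists above can be turned into a quick pre-flight check before running a pipeline. The following is a minimal Python sketch, not Sulfur's own validation: `ENGINE_PARAMS` and `check_endpoint` are hypothetical names, the engine keys are assumed to match the strings used for `type` in config.yml, and every listed parameter is treated as required, which may be stricter than Sulfur itself.

```python
# Parameters per engine, copied from the Supported Engines list. The keys are
# assumed to match the `type` strings in config.yml; adjust if they differ.
ENGINE_PARAMS = {
    "ElasticSearch": {"host", "port", "user", "password", "index"},
    "MongoDB": {"host", "port", "user", "password", "database", "collection"},
    "Redis": {"host", "port", "password", "key"},
    "BigQuery": {"project_id", "dataset_id", "table_id", "credentials"},
}

# BigQuery is listed as a destination only.
DESTINATION_ONLY = {"BigQuery"}


def check_endpoint(block, role):
    """Return a list of problems found in a source or destination block."""
    engine = block.get("type")
    if engine not in ENGINE_PARAMS:
        return [f"{role}: unknown engine type {engine!r}"]
    problems = []
    if role == "source" and engine in DESTINATION_ONLY:
        problems.append(f"{role}: {engine} is listed as a destination only")
    # Iterating a dict yields its keys, so this is "expected minus present".
    missing = sorted(ENGINE_PARAMS[engine].difference(block))
    if missing:
        problems.append(f"{role}: {engine} is missing {', '.join(missing)}")
    return problems


# A Redis destination that forgot its `key` parameter.
redis_destination = {
    "type": "Redis",
    "host": "redis.example.com",
    "port": "6379",
    "password": "redis_password",
}
print(check_endpoint(redis_destination, "destination"))
# prints ['destination: Redis is missing key']
```

An empty list means the block names a known engine and carries every parameter listed for it.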

We're committed to expanding the list of supported engines to give you even more flexibility. Adding new platforms is a straightforward process, allowing you to tailor Sulfur to your evolving data needs.

Upcoming Storage Possibilities

Here's a sneak peek at some storage engines that might be added in the future:

  1. Amazon S3
  2. Microsoft Azure Blob Storage
  3. PostgreSQL
  4. MySQL
  5. SQLite
  6. Cassandra
  7. Apache Hadoop HDFS
  8. Amazon Redshift
  9. Snowflake
  10. Apache Kafka
  11. Oracle Database
  12. IBM Db2
  13. Microsoft SQL Server
  14. Apache Hive
  15. MongoDB Atlas
  16. Elasticsearch Service
  17. Redis Cloud
  18. Memcached
  19. InfluxDB
  20. RabbitMQ

Stay tuned as we continue to explore and add more storage engine options to the Sulfur platform. We're excited to provide you with a broader range of choices for your data storage needs!

Advanced Data Transformation

At Sulfur, we're dedicated to evolving our platform to meet your needs. We're excited to introduce an upcoming feature: Advanced Data Transformation.

How It Works

The Advanced Data Transformation feature will provide a powerful toolkit for crafting precise data transformations. From mathematical operations to conditional logic, this feature grants you unparalleled control over your data.

Stay tuned as we work diligently to unveil this enhancement. Your data transformation possibilities are about to expand like never before!

Advanced Data Transformation using Python

Sulfur enables you to harness the power of custom Python scripts for advanced data transformation during the pipeline process. By integrating Python scripts, you can perform complex data manipulations, calculations, and enrichments before the data is forwarded to its destination.

Implementing Advanced Transformation

To demonstrate the power of custom Python scripts for data transformation, we've provided an example advanced_transform function that showcases a basic transformation:

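import json

def advanced_transform(*args):
    # Unpack the arguments tuple: the first argument is the data as a JSON string
    data_string = args[0]

    # Load the JSON data
    data = json.loads(data_string)

    # Perform your advanced transformation here
    transformed_data = {
        "name": "TRANSFORMED",
        "original_data": data
    }

    # Convert the transformed data back to a JSON string
    transformed_json = json.dumps(transformed_data)

    return transformed_json

Save the function in a file (transform.py in this example), then reference the file and the function name under the pipeline's transform key in config.yml:

sulfur: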
  - name: "Pipeline1"
    cron: "0 0 * * *"
    transform:
      file: './transform.py'
      fn: 'advanced_transform'
    source:
      type: "MongoDB"
      host: "mongodb.example.com"
      port: "27017"
      user: "user"
      password: "password"
      database: "db_name"
      collection: "collection_name"
    destination:
      type: "Redis"
      host: "redis.example.com"
      port: "6379"
      password: "redis_password"
      key: "data_key"
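
A transform that goes beyond wrapping the input could follow the same calling convention as advanced_transform: take a JSON string as the first positional argument and return a JSON string. The sketch below is a hypothetical example under that assumption. The function name `enrich_order`, the fields `quantity` and `unit_price`, and the idea that each call receives a single JSON object are all illustrative guesses, not documented Sulfur behavior.

```python
import json


def enrich_order(*args):
    """Hypothetical transform: add a computed total and a size label."""
    # Same convention as advanced_transform: args[0] is a JSON string.
    record = json.loads(args[0])

    # Mathematical operation: compute a total from two assumed fields.
    quantity = record.get("quantity", 0)
    unit_price = record.get("unit_price", 0.0)
    total = round(quantity * unit_price, 2)

    # Conditional logic: label the record by the size of its total.
    record["total"] = total
    record["size"] = "large" if total >= 100 else "small"

    # Hand the result back as a JSON string.
    return json.dumps(record)


# Call the function directly to try it out before wiring it into a pipeline.
sample = json.dumps({"sku": "A-1", "quantity": 4, "unit_price": 30})
print(enrich_order(sample))
```

To use a function like this, point `file` and `fn` in the transform block at its file and name, as in the configuration above. Calling it locally with a sample payload first is a cheap way to catch JSON errors early.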

Contributing

See Contributing Guidelines for details on how to contribute to this project.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner

Emre (developer @hepsiburada)