The ultimate Data Engineering Chadstack. Apache Airflow running Rust. Bring it.

Overview

RustOnApacheAirflow

This is part of a larger blog post that tries something never done before: running Rust on Airflow. Is it even possible?

Using the Astro CLI, we get a local Airflow instance up and running. Next, we write a Rust program that converts fixed-width files to tab-delimited files, reading from and writing to S3 buckets. Then we build a Linux binary.
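As a rough illustration of the fixed-width-to-tab step, here is a minimal sketch. It is not the code from the repository or the blog post: the function names and the column widths are hypothetical, and the S3 reading and writing is left out so the example stays self-contained.

```rust
/// Split one fixed-width line into trimmed fields and join them with tabs.
/// `widths` holds the character width of each column; the layout used in the
/// real program is not given in this README, so callers supply an assumed one.
fn fixed_width_to_tab(line: &str, widths: &[usize]) -> String {
    // Work on chars rather than bytes so multi-byte characters are not split.
    let chars: Vec<char> = line.chars().collect();
    let mut fields = Vec::with_capacity(widths.len());
    let mut start = 0;
    for &width in widths {
        // Clamp both ends so short lines yield empty trailing fields
        // instead of panicking on an out-of-range slice.
        let begin = start.min(chars.len());
        let end = (start + width).min(chars.len());
        let field: String = chars[begin..end].iter().collect();
        fields.push(field.trim().to_string());
        start += width;
    }
    fields.join("\t")
}

/// Convert a whole fixed-width document, one output line per input line.
fn convert(input: &str, widths: &[usize]) -> String {
    input
        .lines()
        .map(|line| fixed_width_to_tab(line, widths))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `convert(contents, &[10, 5, 8])` on the text of a downloaded file would produce the tab-delimited text to upload. The widths `[10, 5, 8]` are only an example layout.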

We then write an Airflow DAG that downloads the Rust binary onto the local Airflow instance, then triggers that binary, passing it an S3 file URI for processing.
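On the Rust side, the binary has to pick up the S3 URI that the DAG passes on the command line. The sketch below shows one way that could look. It assumes the URI arrives as the first positional argument in `s3://bucket/key` form; the helper names `uri_from_args` and `parse_s3_uri` are hypothetical and not taken from the repository.

```rust
/// Return the first positional argument (index 0 is the program name).
/// Assumption: the DAG invokes the binary as `<binary> s3://bucket/key`.
fn uri_from_args(mut args: impl Iterator<Item = String>) -> Option<String> {
    args.nth(1)
}

/// Split an `s3://bucket/key` URI into its bucket and object key.
fn parse_s3_uri(uri: &str) -> Result<(String, String), String> {
    let rest = uri
        .strip_prefix("s3://")
        .ok_or_else(|| format!("not an s3:// URI: {}", uri))?;
    // The first '/' separates the bucket from the key; the key may
    // itself contain further '/' characters.
    let (bucket, key) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing object key: {}", uri))?;
    if bucket.is_empty() || key.is_empty() {
        return Err(format!("empty bucket or key: {}", uri));
    }
    Ok((bucket.to_string(), key.to_string()))
}
```

In a `main` function this would be wired up as `uri_from_args(std::env::args())`, followed by `parse_s3_uri` on the result. Returning an error for a malformed URI lets the process exit with a non-zero status, which is how Airflow would see the task as failed.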

https://www.confessionsofadataguy.com/the-ultimate-data-engineering-chadstack-running-rust-inside-apache-airflow/

Comments
  • Env vars missing - root cause of 101 issue

    Hello @danielbeach, I think there are separate temporary environments for each task run. I've found that with the Bash operator, each run happens in a different temporary folder. Maybe the Python operator is similar?

    UPDATE: I noticed you were using a specific temporary directory for storage, so this shouldn't be an issue.

    And when you run on a set of Kubernetes workers, shouldn't the download, chmod, and execution be in the same task in the DAG, so they get executed inside the same worker?

    opened by NewtonChutney 2
Owner
Daniel B
Data Engineer. Data lover. Data warehouse expert. Python, Rust, SQL, Databricks, Delta Lake is all I need in life.