Easy Hadoop Streaming and MapReduce interfaces in Rust

Isaac Whitfield

Last update: Nov 22, 2022

Overview

Efflux

Efflux is a set of Rust interfaces for MapReduce and Hadoop Streaming. It enables Rust developers to run batch jobs on Hadoop infrastructure while keeping the efficiency and safety they're used to.

Initially written to scratch a personal itch, this crate offers simple traits that hide the internals of Hadoop Streaming and lend themselves well to writing jobs quickly. Functionality is handed off to macros where possible to provide compile-time guarantees, and everything else is kept simple to avoid overhead.
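To make those internals concrete, here is a minimal sketch of what a Hadoop Streaming mapper has to do by hand, written against the standard library only. It does not use Efflux's traits. It assumes the default streaming convention: input lines arrive on stdin, and each output line is a key and a value separated by a tab. The names `map_line` and `stream_words` are hypothetical and chosen purely for illustration.

```rust
use std::io::{self, BufRead, Write};

/// Splits one input line into lowercase words and pairs each with a count of 1.
fn map_line(line: &str) -> Vec<(String, u64)> {
    line.split_whitespace()
        .map(|word| (word.to_lowercase(), 1))
        .collect()
}

/// Reads lines from `input` and writes one `key<TAB>value` record per word.
fn stream_words<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        for (word, count) in map_line(&line) {
            writeln!(output, "{}\t{}", word, count)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hadoop Streaming feeds the mapper on stdin and collects records from stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    stream_words(stdin.lock(), stdout.lock())
}
```

Keeping the I/O generic over `BufRead` and `Write` means the same function can be driven from an in-memory buffer in a unit test, without a Hadoop cluster.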

Installation

Efflux is available on crates.io as a library crate, so you only need to add it as a dependency:

[dependencies]
efflux = "2.0"

You can then import everything you need through Efflux's prelude module:

use efflux::prelude::*;

Usage

Efflux comes with a handy template to help generate new projects, using the kickstart tool. You can simply use the commands below and follow the prompt to generate a new project skeleton:

# install kickstart
$ cargo install kickstart

# create a project from the template
$ kickstart -s examples/template https://github.com/whitfin/efflux

If you'd rather not use the templating tool, you can always work from the examples found in this repository. A good place to start is the traditional wordcount example.

Testing

Testing your binaries is fairly simple, because you can simulate the Hadoop phases using a basic UNIX pipeline. The pipeline below replicates the Hadoop job flow and generates output that matches a job executed with Hadoop itself:

# example Hadoop task invocation
$ hadoop jar hadoop-streaming-2.8.2.jar \
    -input <INPUT> \
    -output <OUTPUT> \
    -mapper <MAPPER> \
    -reducer <REDUCER>

# example simulation run via UNIX utilities
$ cat <INPUT> | <MAPPER> | sort -k1,1 | <REDUCER> > <OUTPUT>

You can run both against the wordcount example to confirm that the outputs are indeed the same. The output may differ in some cases, but the simulation should be sufficient for many of them.
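The same map, sort, reduce flow can also be sketched in-process, which is handy for unit tests that should not depend on a shell. The example below is a word count written with the standard library only; it does not use Efflux, and the function names (`map_phase`, `sort_phase`, `reduce_phase`, `simulate`) are hypothetical. It assumes keys are compared byte-wise, whereas `sort -k1,1` follows the current locale, so ordering may differ for non-ASCII keys.

```rust
/// Map phase: emit one (word, 1) pair for every whitespace-separated word.
fn map_phase(input: &str) -> Vec<(String, u64)> {
    input
        .lines()
        .flat_map(|line| line.split_whitespace())
        .map(|word| (word.to_string(), 1))
        .collect()
}

/// Sort phase: order records by key, standing in for `sort -k1,1`.
fn sort_phase(mut records: Vec<(String, u64)>) -> Vec<(String, u64)> {
    records.sort_by(|a, b| a.0.cmp(&b.0));
    records
}

/// Reduce phase: sum the values of consecutive records that share a key.
/// This relies on the input already being sorted by key.
fn reduce_phase(records: &[(String, u64)]) -> Vec<(String, u64)> {
    let mut totals: Vec<(String, u64)> = Vec::new();
    for (key, value) in records {
        if let Some((last_key, total)) = totals.last_mut() {
            if *last_key == *key {
                *total += *value;
                continue;
            }
        }
        totals.push((key.clone(), *value));
    }
    totals
}

/// Chains the three phases and renders tab-separated output lines.
fn simulate(input: &str) -> String {
    let sorted = sort_phase(map_phase(input));
    reduce_phase(&sorted)
        .into_iter()
        .map(|(word, count)| format!("{}\t{}\n", word, count))
        .collect()
}

fn main() {
    print!("{}", simulate("the quick fox\nthe lazy dog\n"));
}
```

The reducer only compares each record with the previous one, which mirrors how a streaming reducer sees its input: sorted by key, one record at a time.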

Comments

Move from accepting Strings to Vec

Value inputs are currently String, but a user may well want raw bytes when the data uses a different encoding. The traits should be changed accordingly, although this technically requires a major version bump, so the timing is undecided.

enhancement, opened by whitfin

Carry types across the MapReduce layers

Once #1 is resolved, we can also provide type consistency across layers. If a user emits u64 as a value from a mapper, it's technically possible to carry this type directly into Vec<u64> in the Reducer.

I'm not sure yet whether this is possible to provide as a compile time guarantee - in theory it should be, I think. This would provide an extra layer of convenience for the user and protect against things like deserialization issues.

enhancement, opened by whitfin

Releases

v2.0.1 (Jan 15, 2019)

v2.0.0 (Jan 12, 2019)

v1.2.0 (Jan 10, 2019)

v1.1.0 (Nov 29, 2018)

v1.0.2 (Nov 28, 2018)

v1.0.1 (Sep 9, 2018)

v1.0.0 (Sep 7, 2018)
