Easy Hadoop Streaming and MapReduce interfaces in Rust

Overview

Efflux

Crates.io Build Status

Efflux is a set of Rust interfaces for MapReduce and Hadoop Streaming. It enables Rust developers to run batch jobs on Hadoop infrastructure whilst staying with the efficiency and safety they're used to.

Initially written to scratch a personal itch, this crate offers simple traits to mask the internals of working with Hadoop Streaming which lend themselves well to writing jobs quickly. Functionality is handed off to macros where possible to provide compile time guarantees, and any other functionality is kept simple to avoid overhead wherever possible.

Installation

Efflux is available on crates.io as a library crate, so you only need to add it as a dependency:

[dependencies]
efflux = "2.0"

You can then gain access to everything relevant using the prelude module of Efflux:

use efflux::prelude::*;

Usage

Efflux comes with a handy template to help generate new projects, using the kickstart tool. You can simply use the commands below and follow the prompt to generate a new project skeleton:

# install kickstart
$ cargo install kickstart

# create a project from the template
$ kickstart -s examples/template https://github.com/whitfin/efflux

If you'd rather not use the templating tool, you can always work from the examples found in this repository. A good place to start is the traditional wordcount example.

Testing

Testing your binaries is actually fairly simple, as you can simulate the Hadoop phases using a basic UNIX pipeline. The following example replicates the Hadoop job flow and generates output that matches a job executed with Hadoop itself:

# example Hadoop task invocation
$ hadoop jar hadoop-streaming-2.8.2.jar \
    -input <INPUT> \
    -output <OUTPUT> \
    -mapper <MAPPER> \
    -reducer <REDUCER>

# example simulation run via UNIX utilities
$ cat <INPUT> | <MAPPER> | sort -k1,1 | <REDUCER> > <OUTPUT>

This can be tested using the wordcount example to confirm that the outputs are indeed the same. There may be some cases where output differs, but it should be sufficient for many cases.

You might also like...
Paxakos is a pure Rust implementation of a distributed consensus algorithm

Paxakos is a pure Rust implementation of a distributed consensus algorithm based on Leslie Lamport's Paxos. It enables distributed systems to consistently modify shared state across their network, even in the presence of failures.

asynchronous and synchronous interfaces and persistence implementations for your OOD architecture

OOD Persistence Asynchronous and synchronous interfaces and persistence implementations for your OOD architecture Installation Add ood_persistence = {

Rust API Server: A versatile template for building RESTful interfaces, designed for simplicity in setup and configuration using the Rust programming language.
Rust API Server: A versatile template for building RESTful interfaces, designed for simplicity in setup and configuration using the Rust programming language.

RUST API SERVER Introduction Welcome to the Rust API Server! This server provides a simple REST interface for your applications. This README will guid

Build terminal user interfaces and dashboards using Rust
Build terminal user interfaces and dashboards using Rust

tui-rs tui-rs is a Rust library to build rich terminal user interfaces and dashboards. It is heavily inspired by the Javascript library blessed-contri

SixtyFPS is a toolkit to efficiently develop fluid graphical user interfaces for any display: embedded devices and desktop applications. We support multiple programming languages, such as Rust, C++ or JavaScript.
SixtyFPS is a toolkit to efficiently develop fluid graphical user interfaces for any display: embedded devices and desktop applications. We support multiple programming languages, such as Rust, C++ or JavaScript.

SixtyFPS is a toolkit to efficiently develop fluid graphical user interfaces for any display: embedded devices and desktop applications. We support multiple programming languages, such as Rust, C++ or JavaScript.

Rust crate providing a variety of automotive related libraries, such as communicating with CAN interfaces and diagnostic APIs

The Automotive Crate Welcome to the automotive crate documentation. The purpose of this crate is to help you with all things automotive related. Most

Making composability with the Zeta DEX a breeze, FuZe provides CPI interfaces and sample implementations for on-chain program integration.
Making composability with the Zeta DEX a breeze, FuZe provides CPI interfaces and sample implementations for on-chain program integration.

Zeta FuZe 🧬 Zeta FuZe FuZe is Zeta's cross-program integration ecosystem. This repository contains the Zeta Cross Program Invocation (CPI) interface

Interfaces for Relations and SNARKs for these relations

SNARK and Relation Traits The arkworks ecosystem consists of Rust libraries for designing and working with zero knowledge succinct non-interactive arg

Define safe interfaces to MMIO and CPU registers with ease

regi regi lets you define safe interfaces to MMIO and CPU registers with ease. License Licensed under either of Apache License, Version 2.0 or MIT lic

🖼 A Rust library for building user interfaces on the web with WebGL.
🖼 A Rust library for building user interfaces on the web with WebGL.

wasm-ui Hey! Welcome to my little experiment. It's a Rust library for building user interfaces on the web. You write the interface in Rust, and wasm-u

Featured Dioxus projects on how to build clean user interfaces in Rust
Featured Dioxus projects on how to build clean user interfaces in Rust

Example projects with Dioxus This repository holds the code for a variety of example projects built with Dioxus. Each project has information on how t

A developer-friendly framework for building user interfaces in Rust
A developer-friendly framework for building user interfaces in Rust

Reading: "Fru" as in "fruit" and "i" as in "I" (I am). What is Frui? Frui is a developer-friendly UI framework that makes building user interfaces eas

A Rust library for drawing grid-based user interfaces using ASCII characters.

grux A library for drawing grid-based user interfaces using ASCII characters. // Provides a uniform interface for drawing to a 2D grid. use grux::Grid

📡 Rust mDNS library designed with user interfaces in mind

📡 Searchlight Searchlight is an mDNS server & client library designed to be simple, lightweight and easy to use, even if you just have basic knowledg

Termbox is a library that provides minimalistic API which allows the programmer to write text-based user interfaces.

Termbox is a library that provides minimalistic API which allows the programmer to write text-based user interfaces.

Tool to create web interfaces to command-line tools

webgate This command line utility allows you to: serve files and directories listed in a config file remotely run shell commands listed in a config fi

Batteries included command line interfaces.

CLI Batteries Opinionated batteries-included command line interface runtime utilities. To use it, add it to your Cargo.toml [dependencies] cli-batteri

A library for building declarative text-based user interfaces
A library for building declarative text-based user interfaces

Intuitive docs.rs Documentation Intuitive is a component-based library for creating text-based user interfaces (TUIs) easily. It is heavily inspired b

Rust client for Timeplus Proton, a fast and lightweight streaming SQL engine

Rust Client for Timeplus Proton Rust client for Timeplus Proton. Proton is a streaming SQL engine, a fast and lightweight alternative to Apache Flink,

Comments
  • Move from accepting Strings to Vec<u8>

    Move from accepting Strings to Vec

    Currently the value inputs are String, but it's definitely possible that the user wants bytes due to different encoding. This should be tweaked in the traits, although it technically requires a major version so I'm not sure when.

    enhancement 
    opened by whitfin 1
  • Carry types across the MapReduce layers

    Carry types across the MapReduce layers

    Once #1 is resolved, we can also provide type consistency across layers. If a user emits u64 as a value from a mapper; it's technically possible to carry this type directly into Vec<u64> in the Reducer.

    I'm not sure yet whether this is possible to provide as a compile time guarantee - in theory it should be, I think. This would provide an extra layer of convenience for the user and protect against things like deserialization issues.

    enhancement 
    opened by whitfin 0
Releases(v2.0.1)
Owner
Isaac Whitfield
Fan of all things automated. OSS when applicable. Author of Cachex for Elixir. Senior Software Engineer at Axway. Intelligence wanes without practice.
Isaac Whitfield
A highly efficient daemon for streaming data from Kafka into Delta Lake

kafka-delta-ingest The kafka-delta-ingest project aims to build a highly efficient daemon for streaming data through Apache Kafka into Delta Lake. Thi

Delta Lake 173 Dec 28, 2022
Easy-to-use beanstalkd client for Rust (IronMQ compatible)

rust-beanstalkd Easy-to-use beanstalkd client for Rust (IronMQ compatible) Install Add this dependency to your Cargo.toml beanstalkd = "*" Documentati

Johannes Schickling 44 Oct 4, 2022
A crate to convert bytes to something more useable and the other way around in a way Compatible with the Confluent Schema Registry. Supporting Avro, Protobuf, Json schema, and both async and blocking.

#schema_registry_converter This library provides a way of using the Confluent Schema Registry in a way that is compliant with the Java client. The rel

Gerard Klijs 69 Dec 13, 2022
libhdfs binding and wrapper APIs for Rust

hdfs-rs libhdfs binding library and rust APIs which safely wraps libhdfs binding APIs Current Status Alpha Status (Rust wrapping APIs can be changed)

Hyunsik Choi 32 Dec 1, 2022
Twitch data consumer and broadcaster

NeoTwitch Arch Network broadcaster Chat (message buffer) If the message buffer is full then shut down Channel point events If the message buffer is fu

Togglebit 3 Dec 3, 2021
A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka

rust-rdkafka A fully asynchronous, futures-enabled Apache Kafka client library for Rust based on librdkafka. The library rust-rdkafka provides a safe

Federico Giraud 1.1k Jan 8, 2023
Rust client for Apache Kafka

Kafka Rust Client Project Status This project is starting to be maintained by John Ward, the current status is that I am bringing the project up to da

Yousuf Fauzan 902 Jan 2, 2023
Magical Automatic Deterministic Simulator for distributed systems in Rust.

MadSim Magical Automatic Deterministic Simulator for distributed systems. Deterministic simulation MadSim is a Rust async runtime similar to tokio, bu

MadSys Research Group 249 Dec 28, 2022
The Raft algorithm implement by Rust.

Raft The Raft algorithm implement by Rust. This project refers to Eli Bendersky's website, the link as follows: https://eli.thegreenplace.net/2020/imp

Qiang Zhao 1 Oct 23, 2021
Raft distributed consensus for WebAssembly in Rust

WRaft: Raft in WebAssembly What is this? A toy implementation of the Raft Consensus Algorithm for WebAssembly, written in Rust. Basically, it synchron

Emanuel Evans 60 Oct 22, 2022