๐Ÿฆ– Evolve your fixed length data files into Apache Arrow tables, fully parallelized!

Overview


License: MIT Crates.io (latest) codecov CI Tests

๐Ÿฆ– Evolve your fixed length data files into Apache Arrow tables, fully parallelized!

๐Ÿ”Ž Overview

...

๐Ÿ“ฆ Installation

The easiest way to install evolution on your system is by using the Cargo package manager.

$ cargo install evolution

Alternatively, you can build from source by cloning this repo and compiling using Cargo.

$ git clone https://github.com/firelink-data/evolution.git
$ cd evolution
$ cargo build --release

๐Ÿ“‹ License

All code is to be held under a general MIT license, please see LICENSE for specific information.

You might also like...
A new arguably faster implementation of Apache Spark from scratch in Rust

vega Previously known as native_spark. Documentation A new, arguably faster, implementation of Apache Spark from scratch in Rust. WIP Framework tested

A highly efficient daemon for streaming data from Kafka into Delta Lake

A highly efficient daemon for streaming data from Kafka into Delta Lake

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. ๐Ÿš€

flaco Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. ๐Ÿš€ Have a gander at the initial benchmarks

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

New generation decentralized data warehouse and streaming data pipeline
New generation decentralized data warehouse and streaming data pipeline

World's first decentralized real-time data warehouse, on your laptop Docs | Demo | Tutorials | Examples | FAQ | Chat Get Started Watch this introducto

This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.

Dig into ClickHouse with TUI interface. PRE ALPHA version, everything will be changed.
Dig into ClickHouse with TUI interface. PRE ALPHA version, everything will be changed.

chdig Dig into ClickHouse with TUI interface. Motivation The idea is came from everyday digging into various ClickHouse issues. ClickHouse has a appro

A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

Comments
  • Update padder requirement from 0.1.0 to 1.0.0

    Update padder requirement from 0.1.0 to 1.0.0

    Updates the requirements on padder to permit the latest version.

    Release notes

    Sourced from padder's releases.

    v1.0.0

    First major release ๐Ÿ”ฅโšก๏ธ

    This is the first major release of padder. At face value, it offers any Rust developer an alternative to using the standard library macro format! which has known performance issues due to multiple allocations.

    Padder allocates memory once, and then uses that memory for your formatting and padding needs. Check out the docs for examples and more information. The padding trait is currently implemented for string slices &str and generic vectors Vec<T>, with some trait bounds necessary.

    Sadly no new contributors for this major release. However, feel free to contribute in any way if you see something that can be added or improved.

    ๐Ÿ“‹ Changelog:

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies rust 
    opened by dependabot[bot] 0
  • Feature/cleanup slicer

    Feature/cleanup slicer

    SlicerProcessor is now a trait. An SampleSliceProcessor implementation is included that simply adds slices to an output file expected NOT to diff.

    pub trait SlicerProcessor { fn set_line_break_handler(&mut self, fn_line_break:FnLineBreak); fn get_line_break_handler(&self) -> FnLineBreak;

    fn process(&mut self,slices: Vec<&[u8]>) -> usize;
    

    }

    opened by Ignalina 0
  • Feature/arrow aggrigator

    Feature/arrow aggrigator

    Would be good to have this in main so we dont diverge to much.

    improved: made CLAP use 3 subcomands mock,slice,parse Partial impl of parse.

    bad news: No unit test

    opened by Ignalina 1
Owner
Firelink Data
๐Ÿงก๐Ÿ’›โค๏ธ #opensource
Firelink Data
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Apache Arrow Powering In-Memory Analytics Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enabl

The Apache Software Foundation 10.9k Jan 6, 2023
A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Wakahisa 287 Nov 11, 2022
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Apache Arrow DataFusion and Ballista query engines

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The Apache Software Foundation 2.9k Jan 2, 2023
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is

null 11.8k Jan 8, 2023
An AWS Lambda for automatically loading JSON files as they're created into Delta tables

Delta S3 Loader This AWS Lambda serves a singular purpose: bring JSON files from an S3 bucket into Delta Lake. This can be highly useful for legacy or

R. Tyler Croy 4 Jan 12, 2022
PostQuet: Stream PostgreSQL tables/queries to Parquet files seamlessly with this high-performance, Rust-based command-line tool.

STATUS: IN DEVELOPMENT PostQuet: Streaming PostgreSQL to Parquet Exporter PostQuet is a powerful and efficient command-line tool written in Rust that

Per Arneng 4 Apr 11, 2023
Integration between arrow-rs and extendr

arrow_extendr arrow-extendr is a crate that facilitates the transfer of Apache Arrow memory between R and Rust. It utilizes extendr, the {nanoarrow} R

Josiah Parry 8 Nov 24, 2023
Arrow User-Defined Functions Framework on WebAssembly.

Arrow User-Defined Functions Framework on WebAssembly Example Build the WebAssembly module: cargo build --release -p arrow-udf-wasm-example --target w

RisingWave Labs 3 Dec 14, 2023
Apache TinkerPop from Rust via Rucaja (JNI)

Apache TinkerPop from Rust An example showing how to call Apache TinkerPop from Rust via Rucaja (JNI). This repository contains two directories: java

null 8 Sep 27, 2022