Home / Rust Data processing
69 Repositories
Sortby
🦖 Evolve your fixed length data files into Apache Arrow tables, fully parallelized! 🔎 Overview ... 📦 Installation The easiest way to install evolut
Arrow User-Defined Functions Framework on WebAssembly Example Build the WebAssembly module: cargo build --release -p arrow-udf-wasm-example --target w
arrow_extendr arrow-extendr is a crate that facilitates the transfer of Apache Arrow memory between R and Rust. It utilizes extendr, the {nanoarrow} R
fisher-rs fisher-rs is a Rust library that brings powerful data manipulation and analysis capabilities to Rust developers, inspired by the popular pan
oxbow Read specialized bioinformatic file formats as data frames in R, Python, and more. File formats create a lot of friction for computational biolo
warc-parquet 🗄️ A utility for converting WARC to Parquet. 📦 Install The binary may be installed via cargo: $ cargo install warc-parquet To use the c
dply is a command line tool for viewing, querying, and writing csv and parquet files, inspired by dplyr and powered by polars. Usage overview A dply p
STATUS: IN DEVELOPMENT PostQuet: Streaming PostgreSQL to Parquet Exporter PostQuet is a powerful and efficient command-line tool written in Rust that
chdig Dig into ClickHouse with TUI interface. Motivation The idea is came from everyday digging into various ClickHouse issues. ClickHouse has a appro
Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy
Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro
📊 Cube.js — Open-Source Analytics API for Building Data Apps
Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel
Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is
ndarray The ndarray crate provides an n-dimensional container for general elements and for numerics. Please read the API documentation on docs.rs or t
Quickwit This repository will host Quickwit, the big data search engine developed by Quickwit Inc. We will progressively polish and opensource our cod
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.
Bytewax Bytewax is an open source Python framework for building highly scalable dataflows. Bytewax uses PyO3 to provide Python bindings to the Timely
Apache Arrow Powering In-Memory Analytics Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enabl
ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.
vega Previously known as native_spark. Documentation A new, arguably faster, implementation of Apache Spark from scratch in Rust. WIP Framework tested
TensorBase is a new big data warehousing with modern efforts.
Datafuse Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture Datafuse is a Real-Time Data Processing & Analytics DBMS wit
Blackjack Your Rusty 🦀 procedural 3d modeler Blackjack is a procedural modelling application, following the steps of great tools like Houdini or Blen
rust-numpy Rust bindings for the NumPy C-API API documentation Latest release (possibly broken) Current Master Requirements Rust = 1.41.1 Basically,
Quickstart • Docs • Guides • Integrations • Chat • Download What is Vector? Vector is a high-performance, end-to-end (agent & aggregator) observabilit
DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
Rustic is a backup tool that provides fast, encrypted, deduplicated backups. It can read the restic repo format desribed in the design document and writes a compatible repo format which can also be read by restic.
Sonos' Neural Network inference engine. This project used to be called tfdeploy, or Tensorflow-deploy-rust. What ? tract is a Neural Network inference
Live Demo | Website | API Workspace on Postman Parseable is an open source, cloud native, log storage and management platform. Parseable helps you ing
Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso
ROAPI ROAPI automatically spins up read-only APIs for static datasets without requiring you to write a single line of code. It builds on top of Apache
async-datachannel This repository holds two crates, both of which share (mostly) the same API. async-datachannel Async wrapper for the datachannel cra
YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t
Weld Documentation Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and func
arrow-odbc Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the dat
A highly efficient daemon for streaming data from Kafka into Delta Lake
xaynet Xaynet: Train on the Edge with Federated Learning Want a framework that supports federated learning on the edge, in desktop browsers, integrate
World's first decentralized real-time data warehouse, on your laptop Docs | Demo | Tutorials | Examples | FAQ | Chat Get Started Watch this introducto
Orkhon: ML Inference Framework and Server Runtime Latest Release License Build Status Downloads Gitter What is it? Orkhon is Rust framework for Machin
bitsvec A bit vector with the Rust standard library's portable SIMD API Usage Add bitsvec to Cargo.toml: bitsvec = "x.y.z" Write some code like this:
An example repository on how to start building graph applications on streaming data. Just clone and start building 💻 💪
Dataflow Dataflow is a data processing library, primarily for machine learning. It provides efficient pipeline primitives to build a directed acyclic
Hastic Hastic needs Prometheus or InfluxDB instance for getting metrics. Build from source (Linux) Prerequirements Install cargo Install x86_64-unknow
sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro
crates.io database dumps Library for scripting analyses against crates.io's database dumps. These database dumps contain all information exposed by th
Fabrix Fabrix is a lib crate, who uses Polars Series and DataFrame as fundamental data structures, and is capable to communicate among different data
black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want
TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i
brassfibre Provides multiple-dtype columner storage, known as DataFrame in pandas/R. Series Single-dtype 1-dimentional vector with label (index). Crea
bspipe A Rust implementation of Bidirectional Secure Pipe
A toolkit designed to be a foundation for applications to monitor their performance.