Dataframe structure and operations in Rust

Suchin

Last update: Sep 26, 2022

Related tags

Data processing utah

Overview

Utah

Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface.

Note: This crate works on stable Rust. If your dataframes hold f64 data, use nightly instead to get the performance benefits of specialization.

The API is currently in development and subject to change.

For an in-depth introduction to the mechanics of this crate, as well as future goals, read this blog post.

Install

Add the following to your Cargo.toml:

[dependencies]
utah = "0.1.2"

And add the following to your lib.rs or main.rs:

#[macro_use]
extern crate utah;

Documentation

Check out docs.rs for the latest documentation.

Examples

Create dataframes on the fly

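use utah::prelude::*;

// Build a two-column dataframe with the dataframe! and col! macros.
let df: DataFrame<f64> = dataframe!(
    {
        "a" =>  col!([2., 3., 2.]),
        "b" =>  col!([2., NAN, 2.])
    });

// Build a dataframe from a 2-D ndarray and label its rows.
let a = arr2(&[[2.0, 7.0], [3.0, 4.0]]);
let df: Result<DataFrame<f64>> = DataFrame::new(a).index(&["1", "2"]);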
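
The second snippet yields a Result rather than a bare DataFrame. One plausible reason, and this is an assumption rather than something the README states, is that the supplied labels may not match the number of rows. The std-only sketch below illustrates that idea with a hypothetical `Frame` type; it is not utah's implementation and none of its names come from the crate.

```rust
/// Hypothetical row-major table with one label per row (not a utah type).
#[derive(Debug, PartialEq)]
struct Frame {
    index: Vec<String>,
    rows: Vec<Vec<f64>>,
}

impl Frame {
    /// Build a frame with default labels "0", "1", ...
    fn new(rows: Vec<Vec<f64>>) -> Frame {
        let index = (0..rows.len()).map(|i| i.to_string()).collect();
        Frame { index, rows }
    }

    /// Replace the row labels. Fails when the number of labels differs
    /// from the number of rows (assumed failure mode, for illustration).
    fn index(mut self, labels: &[&str]) -> Result<Frame, String> {
        if labels.len() != self.rows.len() {
            return Err(format!(
                "expected {} labels, got {}",
                self.rows.len(),
                labels.len()
            ));
        }
        self.index = labels.iter().map(|s| s.to_string()).collect();
        Ok(self)
    }
}
```

Returning the frame inside a Result keeps the builder style of the original call while forcing the caller to handle a label mismatch.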

Transform the dataframe

use utah::prelude::*;
let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
// Drop columns "a" and "c", then materialize the result as a new dataframe.
let res: DataFrame<f64> = df.remove(&["a", "c"], UtahAxis::Column).as_df()?;

Chain operations

use utah::prelude::*;
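let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
// new_data is not defined in this snippet; it is assumed to be an
// ndarray array holding the values of the row to append.
let res: DataFrame<f64> = df.df_iter(UtahAxis::Row)
                            .remove(&["1"])
                            .select(&["2"])
                            .append("8", new_data.view())
                            .sumdf()
                            .as_df()?;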
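
The chain above reads as a pipeline of adaptors over rows that ends in a reduction. As a rough illustration of that style using only the standard library, the sketch below composes iterator adaptors over labelled rows. It is not how utah is implemented: `Row`, `remove_rows`, `select_rows` and `sum_rows` are hypothetical names, and summing each row is an assumed stand-in for the reduction step.

```rust
/// A labelled row: (index label, values). Hypothetical stand-in for a dataframe row.
type Row = (String, Vec<f64>);

/// Lazily drop rows whose label appears in `labels`.
fn remove_rows<'a>(
    rows: impl Iterator<Item = Row> + 'a,
    labels: &'a [&'a str],
) -> impl Iterator<Item = Row> + 'a {
    rows.filter(move |(label, _)| !labels.iter().any(|l| *l == label.as_str()))
}

/// Lazily keep only rows whose label appears in `labels`.
fn select_rows<'a>(
    rows: impl Iterator<Item = Row> + 'a,
    labels: &'a [&'a str],
) -> impl Iterator<Item = Row> + 'a {
    rows.filter(move |(label, _)| labels.iter().any(|l| *l == label.as_str()))
}

/// Consume the pipeline and sum the values of each remaining row.
fn sum_rows(rows: impl Iterator<Item = Row>) -> Vec<(String, f64)> {
    rows.map(|(label, values)| {
        let total: f64 = values.iter().sum();
        (label, total)
    })
    .collect()
}
```

Because each adaptor only wraps the previous iterator, no intermediate table is built until `sum_rows` collects the result, which is the general appeal of this chained style.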

Support mixed types

use utah::prelude::*;
let a: DataFrame<InnerType> = dataframe!(
    {
        "name" =>  col!([InnerType::Str("Alice"),
                            InnerType::Str("Bob"),
                            InnerType::Str("Jane")]),
        "data" =>  col!([InnerType::Float(2.0),
                            InnerType::Empty(),
                            InnerType::Float(3.0)])
    });
let b: DataFrame<InnerType> = DataFrame::read_csv("test.csv").unwrap();
let res : DataFrame<InnerType> = a.concat(&b).as_df()?;
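
Mixed-type frames like the one above rely on an enum whose variants wrap the different cell types. The following std-only sketch shows the general pattern with a hypothetical `Cell` enum; it is not utah's `InnerType`, and the helper names are made up for illustration.

```rust
/// Hypothetical cell type mixing strings, floats, and missing values.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Str(String),
    Float(f64),
    Empty,
}

/// Sum the float cells of a column, skipping strings and missing values.
fn sum_floats(column: &[Cell]) -> f64 {
    column
        .iter()
        .filter_map(|cell| match cell {
            Cell::Float(x) => Some(*x),
            _ => None,
        })
        .sum()
}

/// Count the missing cells in a column.
fn count_empty(column: &[Cell]) -> usize {
    column.iter().filter(|cell| matches!(cell, Cell::Empty)).count()
}
```

Every operation has to match on the variant, so each one must decide explicitly how to treat strings and missing values.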

You might also like...

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

237 Jan 1, 2023

ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

939 Jan 5, 2023

Apache Arrow DataFusion and Ballista query engines

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

2.9k Jan 2, 2023

Orkhon: ML Inference Framework and Server Runtime

Orkhon: ML Inference Framework and Server Runtime Latest Release License Build Status Downloads Gitter What is it? Orkhon is Rust framework for Machin

129 Dec 21, 2022

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference

Sonos' Neural Network inference engine. This project used to be called tfdeploy, or Tensorflow-deploy-rust. What ? tract is a Neural Network inference

1.5k Jan 2, 2023

Provides a way to use enums to describe and execute ordered data pipelines. 🦀🐾

enum_pipline Provides a way to use enums to describe and execute ordered data pipelines. 🦀 🐾 I needed a succinct way to describe 2d pixel map operat

0 Oct 29, 2021

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.

30.7k Jan 7, 2023

New generation decentralized data warehouse and streaming data pipeline

184 Dec 22, 2022

An example repository on how to start building graph applications on streaming data. Just clone and start building 💻 💪

40 Dec 20, 2022

Comments

Any plans to replace rustc-serialize with serde?

Hey, very interesting crate you have here! :-)

Let me ask something: is this crate still being maintained? And, if so, are there any plans to migrate from depending on rustc-serialize to serde instead - since rustc-serialize itself is already deprecated in favor of serde?

Thanks! Diogo

opened by diogobaeder 5

Owner

Suchin

Rust DataFrame library

A Rust DataFrame implementation, built on Apache Arrow

DataFrame / Series data processing in Rust

DataFrame & its adaptors

Provides multiple-dtype columner storage, known as DataFrame in pandas/R

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

A Rust crate that reads and writes tfrecord files