Dataframe structure and operations in Rust

Utah

Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface.

Note: This crate works on stable Rust. However, if you are working with dataframes that hold f64 data, use nightly to get the performance benefits of specialization.

The API is currently in development and subject to change.

For an in-depth introduction to the mechanics of this crate, as well as future goals, read this blog post.

Install

Add the following to your Cargo.toml:

[dependencies]
utah = "0.1.2"

And add the following to your lib.rs or main.rs:

#[macro_use]
extern crate utah;

Documentation

Check out docs.rs for the latest documentation.

Examples

Create dataframes on the fly

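use std::f64::NAN;
use utah::prelude::*;

// Build a dataframe from named columns with the dataframe! and col! macros.
let df: DataFrame<f64> = dataframe!(
    {
        "a" =>  col!([2., 3., 2.]),
        "b" =>  col!([2., NAN, 2.])
    });

// Build a dataframe from a 2-D ndarray and label its rows.
let a = arr2(&[[2.0, 7.0], [3.0, 4.0]]);
let df: Result<DataFrame<f64>> = DataFrame::new(a).index(&["1", "2"]);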

Transform the dataframe

use utah::prelude::*;
let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
let res: DataFrame<f64> = df.remove(&["a", "c"], UtahAxis::Column).as_df()?;

Chain operations

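use utah::prelude::*;
let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
// `new_data` is not defined in this snippet; it is assumed to be an ndarray
// array holding the values of the row to append.
let res: DataFrame<f64> = df.df_iter(UtahAxis::Row)
                            .remove(&["1"])
                            .select(&["2"])
                            .append("8", new_data.view())
                            .sumdf()
                            .as_df()?;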
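
The chaining style above can be illustrated without the crate. The following is a minimal, std-only sketch of the same remove / select / append / sum sequence built from ordinary iterator adapters. The `Row` alias and the helpers `remove_labels`, `select_labels`, `sum_rows`, and `chained_sums` are hypothetical names invented for this illustration; they are not utah's API, and utah's own adapters may behave differently.

```rust
// Illustrative only: every name below is hypothetical and not part of utah.
type Row = (String, Vec<f64>);

/// Drop every row whose label appears in `labels`.
fn remove_labels<'a>(
    rows: impl Iterator<Item = Row> + 'a,
    labels: &'a [&'a str],
) -> impl Iterator<Item = Row> + 'a {
    rows.filter(move |(label, _)| !labels.iter().any(|l| *l == label.as_str()))
}

/// Keep only the rows whose label appears in `labels`.
fn select_labels<'a>(
    rows: impl Iterator<Item = Row> + 'a,
    labels: &'a [&'a str],
) -> impl Iterator<Item = Row> + 'a {
    rows.filter(move |(label, _)| labels.iter().any(|l| *l == label.as_str()))
}

/// Reduce each row to the sum of its values.
fn sum_rows(rows: impl Iterator<Item = Row>) -> impl Iterator<Item = (String, f64)> {
    rows.map(|(label, values)| (label, values.iter().sum::<f64>()))
}

/// Remove row "1", select row "2", append a row labelled "8", then sum each row.
fn chained_sums() -> Vec<(String, f64)> {
    let rows: Vec<Row> = vec![
        ("1".to_string(), vec![1.0, 2.0]),
        ("2".to_string(), vec![3.0, 4.0]),
    ];
    let new_data = vec![5.0, 6.0];

    let kept = select_labels(remove_labels(rows.into_iter(), &["1"]), &["2"]);
    let with_new = kept.chain(std::iter::once(("8".to_string(), new_data)));
    // Nothing is evaluated until `collect` drives the chain.
    sum_rows(with_new).collect()
}

fn main() {
    for (label, total) in chained_sums() {
        println!("{}: {}", label, total);
    }
}
```

Because each step only wraps the previous iterator, no intermediate table is built; the work happens once, when the final `collect` runs.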

Support mixed types

use utah::prelude::*;
let a: DataFrame<InnerType> = dataframe!(
    {
        "name" =>  col!([InnerType::Str("Alice"),
                            InnerType::Str("Bob"),
                            InnerType::Str("Jane")]),
        "data" =>  col!([InnerType::Float(2.0),
                            InnerType::Empty(),
                            InnerType::Float(3.0)])
    });
let b: DataFrame<InnerType> = DataFrame::read_csv("test.csv").unwrap();
let res : DataFrame<InnerType> = a.concat(&b).as_df()?;
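
For readers unfamiliar with the approach, here is a minimal, std-only sketch of how a single enum lets one column hold strings, floats, and empty cells. The `Cell` enum and the helpers `sum_floats`, `count_empty`, and `sample_data_column` are hypothetical stand-ins written for this illustration; utah's own `InnerType` may be defined differently.

```rust
// Hypothetical stand-in for a mixed-type cell; not utah's `InnerType`.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Str(String),
    Float(f64),
    Empty,
}

/// Sum the numeric cells of a column, skipping strings and empty cells.
fn sum_floats(column: &[Cell]) -> f64 {
    column
        .iter()
        .filter_map(|cell| match cell {
            Cell::Float(x) => Some(*x),
            _ => None,
        })
        .sum()
}

/// Count the empty cells in a column.
fn count_empty(column: &[Cell]) -> usize {
    column
        .iter()
        .filter(|cell| matches!(**cell, Cell::Empty))
        .count()
}

/// A column shaped like the "data" column in the example above.
fn sample_data_column() -> Vec<Cell> {
    vec![Cell::Float(2.0), Cell::Empty, Cell::Float(3.0)]
}

fn main() {
    let names = vec![
        Cell::Str("Alice".to_string()),
        Cell::Str("Bob".to_string()),
        Cell::Str("Jane".to_string()),
    ];
    let data = sample_data_column();

    println!("names: {:?}", names);
    println!("sum of floats: {}", sum_floats(&data));
    println!("empty cells: {}", count_empty(&data));
}
```

The trade-off of this design is that every operation must match on the variant at run time, which is one reason a homogeneous `f64` dataframe can be processed faster than a mixed-type one.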

Comments
  • Any plans to replace rustc-serialize with serde?

    Hey, very interesting crate you have here! :-)

    Let me ask something: is this crate still being maintained? And, if so, are there any plans to migrate from depending on rustc-serialize to serde instead - since rustc-serialize itself is already deprecated in favor of serde?

    Thanks! Diogo

    opened by diogobaeder 5
Owner
Suchin
Allen Institute for AI / Facebook AI