Dataframe structure and operations in Rust

Utah

Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface.

Note: This crate works on stable Rust. If your dataframes hold f64 data, however, use nightly to get the performance benefits of specialization.

The API is currently in development and subject to change.

For an in-depth introduction to the mechanics of this crate, as well as future goals, read this blog post.

Install

Add the following to your Cargo.toml:

[dependencies]
utah = "0.1.2"

And add the following to your lib.rs or main.rs:

#[macro_use]
extern crate utah;

Documentation

Check out docs.rs for the latest documentation.

Examples

Create dataframes on the fly

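use utah::prelude::*;
// Build a dataframe from named columns with the dataframe! and col! macros.
let df: DataFrame<f64> = dataframe!(
    {
        "a" =>  col!([2., 3., 2.]),
        "b" =>  col!([2., NAN, 2.])
    });

// Or wrap an existing 2-D ndarray array and set its index.
let a = arr2(&[[2.0, 7.0], [3.0, 4.0]]);
let df: Result<DataFrame<f64>> = DataFrame::new(a).index(&["1", "2"]);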
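
The first example above puts NAN in column "b". The README does not say how utah treats such values, so the following is only a std-only sketch of the general pitfall: NaN never compares equal to itself, so gaps in f64 data have to be detected with is_nan() rather than ==. The helper name count_missing is hypothetical and is not part of utah.

```rust
/// Count the NaN entries in a column of f64 values.
/// (Hypothetical helper for illustration; not part of utah.)
fn count_missing(column: &[f64]) -> usize {
    // `x == f64::NAN` is always false, so use `is_nan()` instead.
    column.iter().filter(|x| x.is_nan()).count()
}

fn main() {
    let b = [2.0, f64::NAN, 2.0];
    println!("missing values in b: {}", count_missing(&b));
}
```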

Transform the dataframe

use utah::prelude::*;
let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
// Drop columns "a" and "c", then materialize the result as a new dataframe.
let res: DataFrame<f64> = df.remove(&["a", "c"], UtahAxis::Column).as_df()?;

Chain operations

use utah::prelude::*;
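let df: DataFrame<f64> = DataFrame::read_csv("test.csv").unwrap();
// `new_data` is not defined in this snippet; it is assumed to be an ndarray array created elsewhere.
let res: DataFrame<f64> = df.df_iter(UtahAxis::Row)
                            .remove(&["1"])
                            .select(&["2"])
                            .append("8", new_data.view())
                            .sumdf()
                            .as_df()?;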
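
The chain above reads much like a pipeline of Rust iterator adaptors. As a rough analogy only, the sketch below mimics the remove, append, and sum steps with standard-library iterators over labelled rows. It assumes nothing about utah's internals, and the function name filtered_column_sums is hypothetical.

```rust
/// Drop the row labelled `drop_label`, append one extra row, and sum each column.
/// (Std-only analogy for illustration; not utah's API.)
fn filtered_column_sums(
    rows: &[(&str, Vec<f64>)],
    drop_label: &str,
    extra: &[f64],
) -> Vec<f64> {
    rows.iter()
        // "remove": skip the row whose label matches.
        .filter(|(label, _)| *label != drop_label)
        .map(|(_, values)| values.as_slice())
        // "append": add one more row at the end of the stream.
        .chain(std::iter::once(extra))
        // "sum": accumulate column-wise totals.
        .fold(vec![0.0; extra.len()], |mut acc, row| {
            for (total, value) in acc.iter_mut().zip(row) {
                *total += value;
            }
            acc
        })
}

fn main() {
    let rows = [("1", vec![1.0, 2.0]), ("2", vec![3.0, 4.0])];
    let sums = filtered_column_sums(&rows, "1", &[10.0, 20.0]);
    println!("{:?}", sums);
}
```

Nothing is computed until the final fold consumes the chain, which is the property that makes this style of chaining cheap to compose.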

Support mixed types

use utah::prelude::*;
// Columns can hold mixed values by using InnerType as the element type.
let a: DataFrame<InnerType> = dataframe!(
    {
        "name" =>  col!([InnerType::Str("Alice"),
                            InnerType::Str("Bob"),
                            InnerType::Str("Jane")]),
        "data" =>  col!([InnerType::Float(2.0),
                            InnerType::Empty(),
                            InnerType::Float(3.0)])
    });
let b: DataFrame<InnerType> = DataFrame::read_csv("test.csv").unwrap();
let res: DataFrame<InnerType> = a.concat(&b).as_df()?;
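
To illustrate the idea behind an enum-typed element, here is a minimal, std-only sketch. The Cell enum is a hypothetical stand-in and is not utah's actual InnerType definition. It shows how a single element type can carry floats, strings, and empty cells, and how a numeric reduction can skip the non-numeric ones.

```rust
/// Hypothetical stand-in for a mixed-type cell; not utah's `InnerType`.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Float(f64),
    Str(String),
    Empty,
}

/// Sum the numeric cells of a column, skipping strings and empty cells.
fn sum_floats(column: &[Cell]) -> f64 {
    column
        .iter()
        .filter_map(|cell| match cell {
            Cell::Float(x) => Some(*x),
            _ => None,
        })
        .sum()
}

fn main() {
    let names = vec![
        Cell::Str("Alice".to_string()),
        Cell::Str("Bob".to_string()),
        Cell::Str("Jane".to_string()),
    ];
    let data = vec![Cell::Float(2.0), Cell::Empty, Cell::Float(3.0)];
    println!("{} names, data sum = {}", names.len(), sum_floats(&data));
}
```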
Owner
Suchin
Allen Institute for AI / Facebook AI