Benchmarks to read parquet to arrow

Overview

Parquet benchmarks

This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

read uncompressed i64

read uncompressed bool

read uncompressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

read uncompressed dict utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Read compressed (snappy)

read compressed i64

read compressed bool

read compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Write uncompressed

write uncompressed i64

write uncompressed bool

write uncompressed utf8

Write compressed (snappy)

write compressed i64

write compressed bool

write compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Run benchmarks

To reproduce, use

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file pre-loaded into memory, decompresses and deserializes the column to an arrow array.

The benchmark includes different configurations:

  • dictionary-encoded vs plain encoding
  • single page vs multiple pages
  • compressed vs uncompressed
  • different types:
    • i64
    • bool
    • utf8
You might also like...
Benchmarks of algorithms for convex min-plus convolutions

Convex min-plus convolution を実現するアルゴリズムの実行時間を比較します。 やり方 長さ N の convex な Veci32 を2本生成して、それらの min-plus convolution を3通りの方法で計算します。 brute: 定義どおり計算します。(Θ

A small utility to compare Rust micro-benchmarks.
A small utility to compare Rust micro-benchmarks.

cargo benchcmp A small utility for comparing micro-benchmarks produced by cargo bench. The utility takes as input two sets of micro-benchmarks (one "o

Fastmurmur3 - Fast non-cryptographic hash, with the benchmarks to prove it.

Fastmurmur3 Murmur3 is a fast, non-cryptographic hash function. fastmurmur3 is, in my testing, the fastest implementation of Murmur3. Usage let bytes:

rust channel benchmarks to keep stat of performance of Kanal library in comparison with other competitors.
rust channel benchmarks to keep stat of performance of Kanal library in comparison with other competitors.

Rust Channel Benchmarks This is a highly modified fork of the crossbeam-channel benchmarks. to keep track of Kanal library stats in comparison with ot

Benchmarks of most widely used web frameworks built in rust.
Benchmarks of most widely used web frameworks built in rust.

Rust framework benchmarks Benchmarking utility to test the performance of all the rust web frameworks. Built with rust 🚀 . Demo (Last updated: Thu Ju

cargo-generate template for Criterion benchmarks

Criterion Benchmark Template This is a cargo-generate template for quickly creating benchmarks using the Criterion benchmarking framework. Usage $ car

A collection of comparison-benchmarks for Nova & related Proving systems

Nova benchmarks Here's a list of some of the benchmarks we've been taking to better understand how Nova performs vs other proof systems. Live version:

A suite of benchmarks to test e-graph extraction algorithms

Extraction Gym A suite of benchmarks to test e-graph extraction algorithms. Add your algorithm in src/extract and then add a line in src/main.rs. To r

specs & benchmarks for the ZPrize 3 - High Throughput Signature Verification

Zprize: High Throughput Signature Verification How fast do you think you can verify our ECDSA signatures on Aleo? This year's Zprize is winner-take-al

Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Ballista: Distributed Compute Platform Overview Ballista is a distributed compute platform primarily implemented in Rust, powered by Apache Arrow. It

Apache Arrow in WebAssembly

WASM Arrow This package compiles the Rust library of Apache Arrow to WebAssembly. This might be a viable alternative to the pure JavaScript library. R

transmute-free Rust library to work with the Arrow format

Arrow2: Transmute-free Arrow This repository contains a Rust library to work with the Arrow format. It is a re-write of the official Arrow crate using

Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Ballista: Distributed Compute Platform Overview Ballista is a distributed compute platform primarily implemented in Rust, powered by Apache Arrow. It

A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

Apache Arrow DataFusion and Ballista query engines
Apache Arrow DataFusion and Ballista query engines

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

Fill Apache Arrow record batches from an ODBC data source in Rust.

arrow-odbc Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the dat

This crate allows writing a struct in Rust and have it derive a struct of arrays layed out in memory according to the arrow format.

Arrow2-derive - derive for Arrow2 This crate allows writing a struct in Rust and have it derive a struct of arrays layed out in memory according to th

Generated Ryst of Apache Arrow spec

Arrow generated IPC format The generated flatbuffers code for Rust. Note that these files suffered modifications because flatbuffers is unable to comp

Owner
null
Work out how to read Parquet files in a browser using web assembly (via the Rust toolchain)

wasm-pack-template A template for kick starting a Rust and WebAssembly project using wasm-pack. Tutorial | Chat Built with ?? ?? by The Rust and WebAs

null 5 Oct 11, 2022
Rust-based WebAssembly bindings to read and write Apache Parquet files

parquet-wasm WebAssembly bindings to read and write the Parquet format to Apache Arrow. This is designed to be used alongside a JavaScript Arrow imple

Kyle Barron 103 Dec 25, 2022
Command line tool for inspecting Parquet files

pqrs pqrs is a command line tool for inspecting Parquet files This is a replacement for the parquet-tools utility written in Rust Built using the Rust

Manoj Karthick 127 Dec 23, 2022
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023
PostQuet: Stream PostgreSQL tables/queries to Parquet files seamlessly with this high-performance, Rust-based command-line tool.

STATUS: IN DEVELOPMENT PostQuet: Streaming PostgreSQL to Parquet Exporter PostQuet is a powerful and efficient command-line tool written in Rust that

Per Arneng 4 Apr 11, 2023
cryo is the easiest way to extract blockchain data to parquet, csv, or json

❄️ ?? cryo ?? ❄️ cryo is the easiest way to extract blockchain data to parquet, csv, or json cryo is also extremely flexible, with many different opti

Paradigm 287 Jul 12, 2023
🗄️ A simple CLI for converting WARC to Parquet.

warc-parquet ??️ A utility for converting WARC to Parquet. ?? Install The binary may be installed via cargo: $ cargo install warc-parquet To use the c

Max Countryman 89 Jun 5, 2023
The Computer Language Benchmarks Game: Rust implementations

The Computer Language Benchmarks Game: Rust implementations This is the version I propose to the The Computer Language Benchmarks Game. For regex-dna,

Guillaume P. 69 Jul 11, 2022
A more modern http framework benchmarker supporting HTTP/1 and HTTP/2 benchmarks.

rewrk A more modern http framework benchmark utility.

Harrison Burt 273 Dec 27, 2022
Benchmarks for rust serialization frameworks

Rust serialization benchmark The goal of these benchmarks is to provide thorough and complete benchmarks for various rust serialization frameworks. Th

David Koloski 187 Jan 4, 2023