Arrowdantic is a small Python library backed by a mature Rust implementation of Apache Arrow

Overview

Welcome to arrowdantic

Arrowdantic is a small Python library backed by a mature Rust implementation of Apache Arrow that can interoperate with

For simple (but data-heavy) data engineering tasks, this package essentially replaces pyarrow: it supports reading from and writing to Parquet and Arrow at the same or higher performance and with higher safety (e.g. no segfaults).

Furthermore, it supports reading from and writing to ODBC-compliant databases at the same or higher performance than turbodbc.

This package is also suitable for size-constrained environments such as AWS Lambda functions. It takes 13 MB of disk space, compared to 82 MB for pyarrow.
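
To check the footprint in your own environment, the files recorded for an installed distribution can be summed. This is a minimal sketch using only the standard library. The helper names `total_size_bytes` and `installed_size_bytes` are illustrative and not part of arrowdantic, and the result will vary with platform and version.

```python
from importlib import metadata
from pathlib import Path
from typing import Iterable


def total_size_bytes(paths: Iterable[Path]) -> int:
    """Sum the sizes of the given paths, skipping anything that is not a file."""
    return sum(p.stat().st_size for p in paths if p.is_file())


def installed_size_bytes(distribution: str) -> int:
    """Size of all files that the installer recorded for a distribution.

    Raises metadata.PackageNotFoundError if the distribution is not installed.
    """
    files = metadata.files(distribution) or []
    return total_size_bytes(Path(f.locate()) for f in files)


if __name__ == "__main__":
    for name in ("arrowdantic", "pyarrow"):
        try:
            size_mb = installed_size_bytes(name) / 1e6
            print(f"{name}: {size_mb:.1f} MB")
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
```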

Features

  • declare and access Arrow-backed arrays (integers, floats, boolean, string, binary)
  • read from and write to Apache Arrow IPC file
  • read from and write to Apache Parquet
  • read from and write to ODBC-compliant databases (e.g. PostgreSQL, MongoDB)

Examples

Use parquet

import io
import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ParquetFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ParquetFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays

Use Arrow files

import io
import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ArrowFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ArrowFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays

Use ODBC

import arrowdantic as ad


arrays = [ad.Int32Array([1, None]), ad.StringArray(["aa", None])]

with ad.ODBCConnector(r"Driver={SQLite3};Database=sqlite-test.db") as con:
    # create an empty table with a schema
    con.execute("DROP TABLE IF EXISTS example;")
    con.execute("CREATE TABLE example (c1 INT, c2 TEXT);")

    # insert the arrays
    con.write("INSERT INTO example (c1, c2) VALUES (?, ?)", ad.Chunk(arrays))

    # read the arrays
    with con.execute("SELECT c1, c2 FROM example", 1024) as chunks:
        assert chunks.fields() == [
            ad.Field("c1", ad.DataType.int32(), True),
            ad.Field("c2", ad.DataType.string(), True),
        ]
        chunk = next(chunks)
assert chunk.arrays() == arrays

Comments
  • Added float32 and float64 to datatype

    Hello,

    Thanks for this great lib, it is very useful.

    I added float32 and float64, which were missing (because I need them). It works and I will deploy it in production.

    In the tests, float32 turns 1.2 into 1.2000000476837158, and I don't understand why.

    Do you have any idea why?

    opened by blackrez 3
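
The value reported in this issue is what IEEE 754 single precision produces on its own, independent of arrowdantic: 1.2 has no exact binary representation, and the nearest 32-bit float prints as 1.2000000476837158 once widened to a 64-bit Python float. A minimal sketch using only the standard library (the helper name `roundtrip_float32` is illustrative):

```python
import math
import struct


def roundtrip_float32(value: float) -> float:
    """Store a value as a 32-bit IEEE 754 float, then read it back as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = roundtrip_float32(1.2)
print(stored)  # 1.2000000476837158

# Exact equality with the 64-bit literal fails, so compare with a tolerance
# that matches float32 precision (about 7 significant decimal digits).
print(stored == 1.2)                            # False
print(math.isclose(stored, 1.2, rel_tol=1e-6))  # True
```

Tests that round-trip float32 data are therefore usually written either with values that are exactly representable (such as 0.5 or 1.25) or with a tolerance-based comparison.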
  • Using the main version of arrow2

    Hello,

    I tried to use the latest version of arrow2, but my build failed.

    https://github.com/blackrez/arrowdantic/actions/runs/3563855994/jobs/5987131244

    I think this is due to an odbc function.

    But the odbc_fix patch is not merged into master. What is the blocking point, and how can I help?

    Thanks in advance (I'm starting to use it in production and it works great).

    opened by blackrez 2
  • Fixed release build

    Hello,

    I saw that the release build is broken. I tried to fix it, but I couldn't build because maturin 2010 doesn't support aarch64 (I only have an aarch64 environment). So I had to migrate to maturin 2014, and it works.

    bug 
    opened by blackrez 1
  • Cannot install on Macbook M1

    Hello,

    I can't install arrowdantic on a MacBook M1; there is a compilation error.

            = note: ld: warning: directory not found for option '-L/Users/nabil/lib'
                    ld: library not found for -lodbc
                    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    

    unixodbc is installed with brew.

    opened by blackrez 1
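
The linker error above means that no `libodbc` was found in the directories being searched. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that Homebrew on Apple Silicon installs under `/opt/homebrew`, which is not a default linker search directory. A small sketch to see where the library actually lives (the helper name `find_library_files` and the candidate directories are illustrative):

```python
from pathlib import Path
from typing import Iterable, List


def find_library_files(name: str, directories: Iterable[str]) -> List[Path]:
    """Return files named lib<name>.* found directly inside the given directories."""
    matches: List[Path] = []
    for directory in directories:
        root = Path(directory)
        if root.is_dir():  # silently skip directories that do not exist
            matches.extend(sorted(root.glob(f"lib{name}.*")))
    return matches


if __name__ == "__main__":
    # Assumed candidate locations: the two usual Homebrew prefixes plus /usr/lib.
    candidates = ["/opt/homebrew/lib", "/usr/local/lib", "/usr/lib"]
    for path in find_library_files("odbc", candidates):
        print(path)
```

If the library turns up in a directory the linker is not searching, adding that directory to the linker search path before building (for example through the `LIBRARY_PATH` environment variable, which clang honors) is one thing to try.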
  • Add wheels for aarch64 and python 3.8, 3.9, 3.10 for linux

    Currently, the manylinux build only pushes wheels for Python 3.7 on amd64.

    Maturin can build for multiple architectures and multiple Python versions.

    For example: https://github.com/ijl/orjson/blob/master/.github/workflows/linux-cross.yaml

    opened by blackrez 0
  • Exporting to numpy & cloud file systems

    Hi @jorgecarleitao,

    Thanks for putting this library together -- it looks awesome! I had a few quick questions.

    1. What is the easiest way to convert to / from numpy using arrowdantic?
    2. Do you have any recommendations for reading Arrow files from cloud storage (e.g., S3 or GCS) with tools that are backed by Rust and have Python bindings?
    opened by benjaminrwilson 1
Owner
Jorge Leitao
Open source contributor; PMC member of Apache Arrow