PyO3-based Rust bindings for the NumPy C-API




Rust bindings for the NumPy C-API

API documentation


  • Rust >= 1.41.1
    • Our minimum supported Rust version (MSRV) follows that of PyO3.
  • Python >= 3.6
    • Python 3.5 support was dropped in 0.13.
  • Some Rust libraries, e.g. ndarray for Rust-side matrices
  • numpy installed in your Python environment (e.g., via pip install numpy)
    • We recommend numpy >= 1.16.0, though older versions may work.

Note: Starting from 0.3, rust-numpy migrated from rust-cpython to pyo3. If you want to use rust-cpython, use version 0.2.1.

Python2 Support

Version 0.5.0 is the last version that supports Python2.

If you want to compile this library with Python2, please use version 0.5.0.

In addition, you have to add a feature flag in Cargo.toml like:

```toml
[dependencies.numpy]
version = "0.5.0"
features = ["python2"]
```


You can also specify the Python version automatically using setuptools-rust.


Execute a Python program from Rust and get results

```toml
[package]
name = "numpy-test"

[dependencies]
pyo3 = "0.14"
numpy = "0.14"
```

```rust
use numpy::PyArray1;
use pyo3::prelude::{PyResult, Python};
use pyo3::types::IntoPyDict;

fn main() -> PyResult<()> {
    Python::with_gil(|py| {
        let np = py.import("numpy")?;
        let locals = [("np", np)].into_py_dict(py);
        let pyarray: &PyArray1<i32> = py
            .eval("np.absolute(np.array([-1, -2, -3], dtype='int32'))", Some(locals), None)?
            .extract()?;
        let readonly = pyarray.readonly();
        let slice = readonly.as_slice()?;
        assert_eq!(slice, &[1, 2, 3]);
        Ok(())
    })
}
```

Write a Python module in Rust

Please see the simple-extension directory for the complete example. We also have an example project using ndarray-linalg.

```toml
[lib]
name = "rust_ext"
crate-type = ["cdylib"]

[dependencies]
numpy = "0.14"
ndarray = "0.14"

[dependencies.pyo3]
version = "0.14"
features = ["extension-module"]
```
```rust
use ndarray::{ArrayD, ArrayViewD, ArrayViewMutD};
use numpy::{IntoPyArray, PyArrayDyn, PyReadonlyArrayDyn};
use pyo3::prelude::{pymodule, PyModule, PyResult, Python};

#[pymodule]
fn rust_ext(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    // immutable example
    fn axpy(a: f64, x: ArrayViewD<'_, f64>, y: ArrayViewD<'_, f64>) -> ArrayD<f64> {
        a * &x + &y
    }

    // mutable example (no return)
    fn mult(a: f64, mut x: ArrayViewMutD<'_, f64>) {
        x *= a;
    }

    // wrapper of `axpy`
    #[pyfn(m, "axpy")]
    fn axpy_py<'py>(
        py: Python<'py>,
        a: f64,
        x: PyReadonlyArrayDyn<f64>,
        y: PyReadonlyArrayDyn<f64>,
    ) -> &'py PyArrayDyn<f64> {
        let x = x.as_array();
        let y = y.as_array();
        axpy(a, x, y).into_pyarray(py)
    }

    // wrapper of `mult`
    #[pyfn(m, "mult")]
    fn mult_py(_py: Python<'_>, a: f64, x: &PyArrayDyn<f64>) -> PyResult<()> {
        let x = unsafe { x.as_array_mut() };
        mult(a, x);
        Ok(())
    }

    Ok(())
}
```


We welcome issues and pull requests.

PyO3's guide is a good starting point. We also have a Gitter channel for communication.

  • Discussion: dtype system and integrating record types


    I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.

    Let's look at Element:

    ```rust
    pub unsafe trait Element: Clone + Send {
        const DATA_TYPE: DataType;
        fn is_same_type(dtype: &PyArrayDescr) -> bool;
        fn npy_type() -> NPY_TYPES { ... }
        fn get_dtype(py: Python) -> &PyArrayDescr { ... }
    }
    ```
    • npy_type() is used in PyArray::new() and the like. Instead, one should use PyArray_NewFromDescr() to make use of the custom descriptor. Should all places where npy_type() is used split between "simple type, use New" and "user type, use NewFromDescr"? Or, alternatively, should arrays always be constructed from descriptor? (in which case, npy_type() becomes redundant and should be removed)
    • Why is same_type() needed at all? It is only used in FromPyObject::extract where one could simply use PyArray_EquivTypes (like it's done in pybind11). Isn't it largely redundant? (or does it exist for optimization purposes? In which case, is it even noticeable performance-wise?)
    • DATA_TYPE constant is really only used to check if it's an object or not in 2 places, like this:
      if T::DATA_TYPE != DataType::Object

      Isn't this redundant as well? Given that one can always do

      T::get_dtype().get_datatype() != Some(DataType::Object)
      // or, can add something like: T::get_dtype().is_object()
    • With all the notes above, Element essentially is just
      ```rust
      pub unsafe trait Element: Clone + Send {
          fn get_dtype(py: Python) -> &PyArrayDescr;
      }
      ```
    • For structured types, do we want to stick the type descriptor into DataType? E.g.:
      enum DataType { ..., Record(RecordType) }

      Or, alternatively, just keep it as DataType::Void? In which case, how does one recover record type descriptor? (it can always be done through numpy C API of course, via PyArrayDescr).

    • In order to enable user-defined record dtypes, having to return &PyArrayDescr would probably require:
      • Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)
      • Initializing this registry somewhere
      • Any other options?
    • Element should probably be implemented for tuples and fixed-size arrays.
    • In order to implement structured dtypes, we'll inevitably have to resort to proc-macros. A few random thoughts and examples of how it can be done (any suggestions?):
      • #[numpy(record)]
        #[derive(Clone, Copy)]
        struct Foo { x: i32, u: Bar } // where Bar is a registered numpy dtype as well
        // dtype = [('x', '<i4'), ('u', ...)]
      • We probably have to require either of #[repr(C)], #[repr(packed)] or #[repr(transparent)]
      • If repr is required, it can be an argument of the macro, e.g. #[numpy(record, repr = "C")]. (or not)
      • We also have to require Copy? (or not? technically, you could have object-type fields inside)
      • For wrapper types, we can allow something like this:
      • #[numpy(transparent)]
        struct Wrapper(pub i32);
        // dtype = '<i4'
      • For object types, the current suggestion in the docs is to implement a wrapper type and then impl Element for it manually. This seems largely redundant, given that the DATA_TYPE will always be Object. It would be nice if any #[pyclass]-wrapped type could automatically implement Element, but that is impossible due to the orphan rule. An alternative would be something like this:
        #[numpy] // i.e., #[numpy(object)]
        struct Foo {}
      • How does one register dtypes for foreign (remote) types? I.e., OrderedFloat<f32> or Wrapping<u64> or some PyClassFromOtherCrate? We can try doing something like what serde does for remote types.
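A registry of the kind floated above can be sketched with the standard library alone. All names here are hypothetical; the real registry would store (properly owned) `*mut PyArray_Descr` pointers obtained from NumPy rather than the string stand-ins used for illustration:

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Stand-in for a registered NumPy descriptor; in rust-numpy this would
// be a pointer to an actual `PyArray_Descr` object.
type DescrHandle = &'static str;

// Global, thread-safe registry mapping Rust types to their dtype handles.
static DTYPE_REGISTRY: OnceLock<Mutex<HashMap<TypeId, DescrHandle>>> = OnceLock::new();

fn registry() -> &'static Mutex<HashMap<TypeId, DescrHandle>> {
    DTYPE_REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Called once per type, e.g. from a proc-macro-generated impl.
fn register_dtype<T: 'static>(descr: DescrHandle) {
    registry().lock().unwrap().insert(TypeId::of::<T>(), descr);
}

// What `Element::get_dtype` could consult for user-defined types.
fn dtype_of<T: 'static>() -> Option<DescrHandle> {
    registry().lock().unwrap().get(&TypeId::of::<T>()).copied()
}
```

Lazy initialization on first registration would sidestep the question of where to initialize the registry.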
    opened by aldanor 58
  • Dependency Hell


    Today I read this post on Dependency Hell.

    I don't really know anything about how cargo resolves versions, so I figured I'd raise it here.

    Is there something that can be done about this? One of the commenters suggested that rust-numpy could reexport ndarray - is that likely to make things easier?

    opened by mejrs 22
  • Safe NpyIter Interface.


    This is a highly incomplete first attempt at creating a safe wrapper for NdIter. If anyone has any feedback on code style etc. I'll happily take it as I keep working.

    Continues #129

    opened by PTNobel 22
  • Breaking PyReadonlyArray


    I am sorry for repeatedly opening soundness issues, but I fear that using NumPy's writeable is insufficient to protect safe code from mutable aliasing.

    At least it seems incompatible with a safe as_cell_slice as demonstrated by the following test failing

    ```rust
    fn alias_readonly_cell_slice() {
        Python::with_gil(|py| {
            let array = PyArray::<i32, _>::zeros(py, (2, 3), false);
            let slice = array.as_cell_slice().unwrap();
            let array = array.readonly();
            let value = array.get((0, 0)).unwrap();
            assert_eq!(*value, 0);
            slice[0].set(1); // aliasing write through the Cell slice
            assert_eq!(*value, 0);
        });
    }
    ```
    which implies that as_cell_slice cannot be safe. (I am not sure if it adds anything over as_array(_mut) at all when viewed this way. I think a safe method that yields an ndarray::RawArrayView might be preferable for handling situations involving aliasing.)

    Another issue I see is that Python does not need to leave the writeable flag alone, i.e. the test

    ```rust
    fn alias_readonly_python() {
        use pyo3::py_run;
        Python::with_gil(|py| {
            let array = PyArray::<i32, _>::zeros(py, (2, 3), false);
            let array = array.readonly();
            let value = array.get((0, 0)).unwrap();
            assert_eq!(*value, 0);
            py_run!(py, array, "array.flags.writeable = True\narray[(0,0)] = 1");
            assert_eq!(*value, 0);
        });
    }
    ```
    also fails, but I have to admit that this might be a case of "you're holding it wrong", as I am not sure what guarantees we can expect from the Python code. (After all, PyArray<A, D>: !Send, which Python code does not need to respect either: even just accessing an array, and thereby checking its flags, from another thread might race with us modifying the flags without atomics.)

    opened by adamreichold 21
  • RFC: Add dynamic borrow checking for dereferencing NumPy arrays.


    This PR tries to follow up on the position taken by the cxx crate: C++ or Python code is unsafe and therefore needs to ensure that relevant invariants are upheld. Safe Rust code can then rely on this for e.g. aliasing discipline as long as it does not introduce any memory safety violations itself.

    Hence, this PR adds dynamic borrow checking which ensures that safe Rust code will uphold the aliasing discipline as long as the Python code does so as well. This does come with the cost of two accesses to a global hash table per dereference, which is surely not negligible, but also constant, i.e. it does not increase with the number of array elements. It can also be avoided by calling the unsafe/unchecked variants of the accessors when it really is a performance issue.

    While I think this PR solves #258 in a manner that is as safe as possible when interacting with an unchecked language like Python, I would open another issue immediately after merging this into main to track refining the over-approximation of treating all views into the same base object as conflicting, in order to detect non-overlap and interleaving. But I prefer to do this as a separate PR to keep this reviewable.
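The bookkeeping behind such dynamic borrow checking can be sketched with the standard library alone. The names below are hypothetical and the real PR integrates with the readonly/readwrite array wrappers; the sketch only shows the global-hash-table mechanism and its acquire/release cost:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Global table of borrow flags, keyed by an array's data address.
// A positive count records shared (read) borrows; -1 records one
// exclusive (write) borrow. Each dereference pays one table access to
// acquire and one to release, independent of the number of elements.
static BORROW_FLAGS: OnceLock<Mutex<HashMap<usize, isize>>> = OnceLock::new();

fn flags() -> &'static Mutex<HashMap<usize, isize>> {
    BORROW_FLAGS.get_or_init(|| Mutex::new(HashMap::new()))
}

// Try to acquire a shared borrow; fails if a write borrow is active.
fn acquire_read(addr: usize) -> bool {
    let mut map = flags().lock().unwrap();
    let entry = map.entry(addr).or_insert(0);
    if *entry < 0 {
        return false;
    }
    *entry += 1;
    true
}

// Try to acquire an exclusive borrow; fails if any borrow is active.
fn acquire_write(addr: usize) -> bool {
    let mut map = flags().lock().unwrap();
    let entry = map.entry(addr).or_insert(0);
    if *entry != 0 {
        return false;
    }
    *entry = -1;
    true
}

fn release_read(addr: usize) {
    if let Some(e) = flags().lock().unwrap().get_mut(&addr) {
        *e -= 1;
    }
}

fn release_write(addr: usize) {
    flags().lock().unwrap().insert(addr, 0);
}
```

Keying by the base object, as the PR does, over-approximates conflicts between non-overlapping views; refining that is exactly the follow-up work mentioned above.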

    Closes #258

    opened by adamreichold 19
  • Slim down `Element`, various dtype-related improvements and changes


    @adamreichold Here comes, as promised 😄

    This is the preliminary PR to enable further work on #254.

    Although there are some breaking changes in here, it's probably better to do it all in one batch because many of these changes are inter-related. I believe all changes are improvements, in one way or another; and there are also some fixes.

    An unsorted list of changes:

    • Replace Element::DATA_TYPE with Element::IS_POD
      • Otherwise, we would run into serious complications when trying to implement record types. Also, this is much simpler since all we need really is a quick way to check whether a type is pod or not.
    • Remove Element::same_type()
      • This was based on typenums and wouldn't work with more complex types. Now it uses PyArray_EquivTypes.
    • Use PyArray_NewFromDescr instead of PyArray_New to create arrays.
      • This detaches it from typenums and will allow using custom descriptors later on.
    • Added a FIXME note in FromPyObject::extract() - it currently doesn't check the instance type and only verifies that the dtype is 'O', which can and will lead to unsound behaviour, so it has to be fixed.
    • Split a weird ShapeError into DimensionalityError and TypeError. The latter is no longer typenum-based either and would work with any dtypes; formatting is left for numpy to handle.
    • Add methods to PyArrayDescr:
      • into_dtype_ptr() - an alternative to as_dtype_ptr() that increfs it; useful since numpy API often steals descriptor references
      • of<T>() - an equivalent to pybind11::dtype::of<T>()
      • is_equiv_to() - to check if descriptor types are equivalent (PyArray_EquivTypes)
      • get_typenum() is made public since DataType::from_typenum() is public; doesn't make much sense to hide it
      • object() - a shortcut for creating 'O' dtype (useful in user implementations of Element)
    • Add get_typenum() to DataType
    • Fix how integer types are mapped to npy types and added a test for it (which was previously failing):
      • As an example, DataType::Uint64 could be previously mapped to np.c_ulonglong instead of np.uint64 which is not the same thing. Now, u64 always maps to np.uint64.
      • Now it uses the same logic as numpy itself (and same as in pybind11)
      • Reverse conversions (npy -> DataType) have also been reimplemented and cleaned up
    • Implemented Element for isize
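The size-based integer mapping can be illustrated in plain Rust; `Npy` and `npy_int_for_size` are hypothetical stand-ins, but the principle matches what numpy and pybind11 do: pick the fixed-width NumPy type by byte width rather than by C type name:

```rust
use std::mem::size_of;

// Hypothetical mirror of NumPy's fixed-width signed integer dtypes.
#[derive(Debug, PartialEq)]
enum Npy {
    Int8,
    Int16,
    Int32,
    Int64,
}

// Select the NumPy dtype by byte width, so that e.g. `i64` always maps
// to np.int64 regardless of whether `long` or `long long` happens to be
// 8 bytes on the current platform.
fn npy_int_for_size(bytes: usize) -> Npy {
    match bytes {
        1 => Npy::Int8,
        2 => Npy::Int16,
        4 => Npy::Int32,
        8 => Npy::Int64,
        _ => unreachable!("unsupported integer width"),
    }
}
```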

    Open question: one thing that I find extremely confusing is that DataType::Complex32 maps to np.complex64 (and Complex64 maps to np.complex128). Wouldn't it make more sense if they were named consistently? (i.e. same as in numpy, 64/128)

    (If this is accepted, I can also work on updating the changelog if needed.)

    opened by aldanor 16
  • Give PyArray<PyObject> another try.


    I am not sure what changed since then, or whether the segmentation fault was only triggered by a more involved test, but this seems to work using Python 3.8.2 on Linux. (Similarly to how worked in this environment.)

    Fixes #175

    opened by adamreichold 15
  • Fix signature of PyArray::new or better yet replace it by PyArray::uninit


    As discussed previously, PyArray::new creates an uninitialized array and should therefore be marked unsafe, as it is the caller's responsibility not to access (and, for non-Copy element types, not even drop) the array until all elements are initialized.

    However, we also do not seem to offer a good API for doing that element-wise initialization post facto. I think instead of just marking the method unsafe, we should rename it PyArray::uninit and keep it safe, but change its return type to PyArray<MaybeUninit<A>, D> and add an additional assume_init method for turning this into PyArray<A, D>. This would be similar to the approach taken by ndarray.
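In plain Rust, the proposed shape of that API looks roughly like the following (hypothetical helper names; the real version would wrap PyArray rather than Vec):

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

// Create uninitialized storage; safe to hand out because MaybeUninit<T>
// is allowed to be uninitialized.
fn uninit_vec<T>(len: usize) -> Vec<MaybeUninit<T>> {
    let mut v = Vec::with_capacity(len);
    // Safety: the elements are MaybeUninit, so they need no initialization.
    unsafe { v.set_len(len) };
    v
}

// Counterpart of the proposed `assume_init`.
// Safety: the caller must have written every element.
unsafe fn assume_init_vec<T>(v: Vec<MaybeUninit<T>>) -> Vec<T> {
    let mut v = ManuallyDrop::new(v);
    // MaybeUninit<T> is guaranteed to have the same layout as T.
    Vec::from_raw_parts(v.as_mut_ptr() as *mut T, v.len(), v.capacity())
}
```

The caller initializes element-wise via `MaybeUninit::write` and only then calls the unsafe `assume_init` step, mirroring ndarray's `Array::uninit`/`assume_init` pair.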

    opened by adamreichold 14
  • Refine dynamic borrow checking to track data ranges.


    Follow-up to #274, as discussed there. Should show that this can work, but still needs many unit tests to drive the range data structures into various stages...

    opened by adamreichold 13
  • Enum Variants to Numpy Dtype


    I know there is the Element trait, but it isn't easy to implement, and it is almost impossible to implement for enums.

    I wonder if it would be possible to introduce a new trait that would allow for conversion.

    My use case was to convert Avro records to numpy arrays. The exact schema for the Avro file might not be known ahead of time. Unfortunately, I don't think this is possible with the library's current infrastructure.

    Side note: I'm new to Rust, so there may be some uninformed things said above.

    opened by aaronclong 13
  • Safe wrapper for NPyIter


    I'm evaluating using rust-numpy for a project that will require extensive use of NPyIter as I have to re-implement a number of ufuncs and need access to the broadcasting implementation.

    I'm currently using np.broadcast in Python code to implement this.
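For reference, the shape rule that np.broadcast applies can be written down in a few lines of plain Rust (a sketch of the rule only, ignoring strides):

```rust
// NumPy's broadcasting rule: align shapes from the trailing end;
// two dimensions are compatible when they are equal or when one of
// them is 1, in which case it stretches to match the other.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Missing leading dimensions are treated as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // incompatible shapes
        };
    }
    Some(out)
}
```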

    Would there be any interest in making a safe wrapper for NPyIter if I built one?

    Does anyone have any advice or similar wrappers I could reference? This is my first experience with writing unsafe rust code.

    opened by PTNobel 12
  • Support for multi type arrays


    It would be useful to support arrays with multiple data types, like structured arrays.

    This would be useful when dealing with for example timestamped data where you have a u64 timestamp in one column and x,y,z floats in 3 other columns.
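Assuming record dtypes land, such data could map onto a #[repr(C)] struct. The field names below are hypothetical; note the trailing padding that the dtype's itemsize would have to account for on typical 64-bit targets:

```rust
// Matches a structured dtype along the lines of
// [('t', '<u8'), ('x', '<f4'), ('y', '<f4'), ('z', '<f4')] with align=True.
#[repr(C)]
struct Sample {
    t: u64, // timestamp column
    x: f32, // position columns
    y: f32,
    z: f32,
    // 4 bytes of trailing padding (u64 alignment) on 64-bit targets,
    // so the itemsize is 24, not 20.
}
```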

    opened by saty9 1
A bit vector with the Rust standard library's portable SIMD API

bitsvec A bit vector with the Rust standard library's portable SIMD API Usage Add bitsvec to Cargo.toml: bitsvec = "x.y.z" Write some code like this:

Chojan Shang 31 Dec 21, 2022
📊 Cube.js — Open-Source Analytics API for Building Data Apps

📊 Cube.js — Open-Source Analytics API for Building Data Apps

Cube.js 14.4k Jan 8, 2023
A rust library built to support building time-series based projection models

TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i

James MacAdie 12 Dec 7, 2022
PostQuet: Stream PostgreSQL tables/queries to Parquet files seamlessly with this high-performance, Rust-based command-line tool.

STATUS: IN DEVELOPMENT PostQuet: Streaming PostgreSQL to Parquet Exporter PostQuet is a powerful and efficient command-line tool written in Rust that

Per Arneng 4 Apr 11, 2023
Yet Another Technical Analysis library [for Rust]

YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t

Dmitry 197 Dec 29, 2022
Dataframe structure and operations in Rust

Utah Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface. Note: This crate w

Suchin 139 Sep 26, 2022
Rust DataFrame library

Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro

Ritchie Vink 11.9k Jan 8, 2023
A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Wakahisa 287 Nov 11, 2022
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

null 22 Nov 3, 2022
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023
Apache TinkerPop from Rust via Rucaja (JNI)

Apache TinkerPop from Rust An example showing how to call Apache TinkerPop from Rust via Rucaja (JNI). This repository contains two directories: java

null 8 Sep 27, 2022
sparse linear algebra library for rust

sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro

Vincent Barrielle 311 Dec 18, 2022
DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

Miles Granger 30 Dec 10, 2022
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

Datafuse Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture Datafuse is a Real-Time Data Processing & Analytics DBMS wit

Datafuse Labs 5k Jan 4, 2023
(MERGED) Rust bindings for TVM runtime

DEPRECATED The RFC is closed and this has been merge into TVM. TVM Runtime Frontend Support This crate provides an idiomatic Rust API for TVM runtime

Ehsan M. Kermani 26 Sep 29, 2022
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

SFU Database Group 939 Jan 5, 2023
Fill Apache Arrow record batches from an ODBC data source in Rust.

arrow-odbc Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the dat

Markus Klein 21 Dec 27, 2022