ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

Overview

The ndarray crate provides an n-dimensional container for general elements and for numerics.

Please read the API documentation on docs.rs or take a look at the quickstart tutorial.

Highlights

  • Generic 1, 2, ..., n-dimensional arrays
  • Owned arrays and array views
  • Slicing, with arbitrary step sizes and negative indices that count from the end of the axis.
  • Views and subviews of arrays; iterators that yield subviews.

Status and Outlook

  • Still iterating on and evolving the crate
    • The crate is under continuous development, and breaking changes are expected between versions. We adopt the newest stable Rust features when we need them.
  • Performance:
    • Prefer higher-order methods and arithmetic operations on arrays first, then iteration, and only as a last resort algorithms based on indexing.
    • Efficient floating point matrix multiplication even for very large matrices; can optionally use BLAS to improve it further.

Crate Feature Flags

The following crate feature flags are available. They are configured in your Cargo.toml.

  • std

    • Rust standard library (enabled by default)

    • This crate can be used without the standard library by disabling the default std feature. To do so, use this in your Cargo.toml:

      [dependencies]
      ndarray = { version = "0.x.y", default-features = false }

    • The geomspace, linspace, logspace, range, std, var, var_axis, and std_axis methods are only available when std is enabled.

  • serde

    • Enables serialization support for serde 1.x
  • rayon

    • Enables parallel iterators, parallelized methods and par_azip!.
    • Implies std
  • blas

    • Enables transparent BLAS support for matrix multiplication. Uses blas-src for a pluggable backend, which needs to be configured separately (see below).

How to use with cargo

[dependencies]
ndarray = "0.14.0"

How to enable BLAS integration: depend on blas-src directly to pick a BLAS provider, and depend on the same blas-src version as ndarray does so that the selection takes effect. An example configuration using system OpenBLAS is shown below. Note that only end-user projects (not libraries) should select a provider:

[dependencies]
ndarray = { version = "0.14.0", features = ["blas"] }
blas-src = { version = "0.7.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.9", default-features = false, features = ["cblas", "system"] }

For official releases of ndarray, the versions are:

ndarray   blas-src   openblas-src
0.15      0.7.0      0.9.0
0.14      0.6.1      0.9.0
0.13      0.2.0      0.6.0
0.12      0.2.0      0.6.0
0.11      0.1.2      0.5.0

Recent Changes

See RELEASES.md.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Comments
  • Use axpy for scaled_add

    This adds a method to use saxpy and daxpy when doing scaled addition. It gives a nice performance boost:

    test scaled_add_2d_f32_axpy           ... bench:         312 ns/iter (+/- 13)
    test scaled_add_2d_f32_regular        ... bench:         571 ns/iter (+/- 23)
    

    The main addition in this code is Dimension::equispaced_stride, which is a generalization of the code formerly inside Dimension::is_contiguous: if all elements of an array are one ptr-width away from each other, we call the array contiguous. If all elements of an array are exactly n ptr-widths away from each other, we call the array equispaced (not sure if that's the right terminology). If n = 1, then an equispaced array is contiguous. As long as an array is equispaced, it is admissible to BLAS via the incx and incy variables (in the case of *axpy).

    I think we might be able to use this for the gemm implementations, too.

    EDIT: I notice that there are inlined versions of the Dimensions trait functions for 1 to 3 dimensional arrays. I guess I should extend those too?

    EDIT 2: I am also not sure I like the extra allocations in that continuity/equidistance check. But I guess it doesn't matter much for large arrays.

    opened by SuperFluffy 32
  • ndarray::Array macro array![]

    This allows for Array and RcArray creation without specifying the dimensionality, using macros. As a side note, this also allows for compile-time flattening of arrays. I've implemented a version of this here; I don't know whether something similar is more efficient than what is currently done in rust-ndarray, but maybe you will find it interesting.

    opened by Pireax 32
  • no_std for ndarray

    Using std gives us the following features:

    • std::arch and target feature detection for seamless simd
    • .

    Having optional no_std support would give us these tangible benefits:

    • .

    I have created this issue so that the pros and cons of no_std can be explained. As of this writing, using no_std does not seem to be attractive.

    As usual, all new features are to be used because of what gain they give us, not to score points. Formulate goal & gain before implementing.

    enhancement 
    opened by bluss 30
  • Better doc for beginners?

    Hi there,

    As a beginner with this nice-looking tool, it seems that I cannot find a good beginner-level, step-by-step tutorial for Rust ndarray in the repo?

    To draw some users from python numpy to start using rust numpy, I feel that it could be very helpful to have some readme doc or a site presenting some information like this? https://numpy.org/devdocs/user/quickstart.html

    A quickstart doc would be very helpful, wouldn't it? Not sure how busy you are, but maybe I can try to start making some simple docs, basically porting some or all of the example operations from https://numpy.org/devdocs/user/quickstart.html into a readme here.

    Also, I wonder how a site/book like this is generated: https://doc.rust-lang.org/rust-by-example/hello.html I guess it would be nice to eventually have a doc site for rust-ndarray (or perhaps there is already one?)

    Thank you :)

    opened by liufuyang 24
  • append, append_row, append_column: methods for appending an array or single rows and columns

    For owned arrays specifically (Array), allow appending rows and/or columns or whole arrays (along a given axis).

    New methods:

    • .append_row(ArrayView1<A>) -> Result
    • .append_column(ArrayView1<A>) -> Result
    • .append(axis, ArrayView<A, D>) -> Result
    • .move_into(impl Into<ArrayViewMut>)

    New features:

    • stack, concatenate and .select(...) now support Clone elements (previously only Copy).

    The axis we append along should be a "growing axis", i.e. the axis with the greatest, and positive, stride. However, if that axis is of length 0 or 1, we can always convert it to the new growing axis.

    These methods automatically move the whole array to a compatible memory layout for appending, if needed.

    The examples show that empty arrays are valid both ways (both memory layouts) - and that might be the easiest way to use these methods for the users.

    There were quite a few corner cases in the implementation in this PR, but hopefully they should all be dealt with in a moderately clean way and with little duplication.

    Fixes #269

    opened by bluss 23
  • Integrate ndarray-parallel and make rayon an optional feature

    This moves all the ndarray-parallel functionality directly into ndarray itself.

    This means the parallelization features are more visible (in docs by default) and more accessible. The user still has to opt-in to use it, and use the rayon traits.

    We also get rid of extra shims in the integration and use the rayon IntoParallel* traits directly, and can use inherent methods for par_apply and so on.

    Fixes #551

    opened by bluss 23
  • Add support for slicing with subviews

    Please don't merge this PR immediately. It needs some refinement, including updated documentation and more tests.

    This is an implementation of #215, adding support for taking subviews while slicing. A few of the changes are:

    • The s! macro can take individual indices (to indicate subviews) in addition to ranges.
    • The s! macro returns an owned SliceInfo instance instead of just a reference, so the caller can easily pass it around between methods.
    • The various *slice*() methods now take a SliceInfo instance (or reference).
    • There's now a slice_into() method. Note that ideally we'd come up with a better name than slice_into because there's already an into_slice() method.
    • ArrayBase now has three additional methods *slice_axis*() that slice along a single axis.

    The primary disadvantage of this change is that it's not easy to create a SliceInfo instance without the s! macro. If that's an important feature, it would be possible to do any (or all) of the following:

    • Add impl<'a> From<&'a [SliceOrIndex]> for SliceInfo<Vec<SliceOrIndex>, IxDyn>.
    • Add a macro to convert [SliceOrIndex; n] into SliceInfo<[SliceOrIndex; n], D>, where D is determined at compile time.
    • Add a function to convert [SliceOrIndex; n] into SliceInfo<[SliceOrIndex; n], D>, where the caller would have to specify D, and D would be checked at runtime.

    Note that this PR is a breaking change.

    Will you please let me know what you think?

    opened by jturner314 23
  • Implement co-broadcasting in operator overloading

    This PR did the following to implement co_broadcasting in operator overloading

    1. Implement DimMax trait, which uses the same broadcast mechanism as Numpy, for all Dimension so as to get the returned array type in operator overloading.
    2. Use map_collect() and map_collect_owned() in operator overloading to avoid redundant array traversal.
    opened by SparrowLii 22
  • Meta Issue: Support for parallelized/blocked algorithms

    What are your thoughts on implementing something similar to http://dask.pydata.org/en/latest/ on top of ndarrays? I suspect parallelized computations on submatrices should be pretty natural to do in the Rust framework, and it seems you've already created sub-array view functions. Do you agree?

    (Community Edits below)


    Actionable sub issues:

    • [x] Send/Sync splittable array views are already present
    • [x] Implement Rayon parallel iterator traits for Axis Iter #248 #252
    • [x] Implement Rayon parallel iterator traits for element iterators / for Array itself #252
    • [x] alternative methods of collecting an axis_iter to ndarray matrix #249
    • [x] Parallel Iter for AxisChunksIter
    • [x] Parallel support for Array::map_inplace #288
    • [x] Parallel support for Array::map -> Array
    • [x] Parallel lock step function application (Zip) #288
    opened by kernelmachine 22
  • Status of project?

    Just wondering what's happening with ndarray (and the future 0.14.0 release). It looks like there hasn't been much activity in recent months. Just curious btw (and willing to assist in any PRs if help is wanted)

    opened by xd009642 21
  • Iterate by chunks over specified axis

    This relies on iterating using OuterIterCore while monitoring the location of the current index. When the iterator detects the last chunk, a different shape is used.

    I'm not sure this is the proper design yet, it works for forward iteration but probably not for reversed iteration.

    opened by vbarrielle 19
  • Inconsistent behaviour between `arr0` and `arr1`

    To create a 0-d array with arr0, we can actually pass an array literal as follows

    let a = arr0([0]);
    

    This is ignored because arr0 does not check what is passed or whether it has the correct shape. First of all, I think this is bad. See the related issue: https://github.com/rust-ndarray/ndarray/issues/1253

    As you can see, I don't need to pass a reference.

    Now, if we attempt to create a 1d, 2d or 3d array with the similar functions, we need to pass the literal by reference, i.e. the following does not compile

    let a = arr1([[0], [2]]);
    

    The same happens with arr2 and arr3.

    Whether we decide to solve the other issue https://github.com/rust-ndarray/ndarray/issues/1253 or not, in my view, this inconsistency should not exist.

    By the way, we can also do

    let a = arr0(&[0]);
    

    This inconsistency is due to the fact that arr0 accepts A while the other functions accept &[A], and I suppose we can pass references when we accept A

    opened by nbro 0
  • Should the examples inside the `examples` folder be moved to the repo `ndarray-examples`?

    It seems that the examples here https://github.com/rust-ndarray/ndarray/tree/master/examples can be moved to https://github.com/rust-ndarray/ndarray-examples, which, in my view, doesn't just have to contain machine learning examples, but could contain even simple examples to help newbies use ndarray.

    opened by nbro 0
  • `arr0` is able to create arrays with multiple dimensions

    Similarly to https://github.com/rust-ndarray/ndarray/issues/1252, it seems that we can use the function arr0 to create arrays that actually have more dimensions. See the example below.

    use ndarray::arr0;
    
    fn main() {
        let a = arr0([0, 2]);
        println!("0-d array with shape {:?} is {:?}", a.shape(), a);
    
        let a = arr0([[0], [2]]);
        println!("0-d array with shape {:?} is {:?}", a.shape(), a);
    }
    

    The shape is [], but the array is displayed as a higher-dimensional array, so I suppose that, under the hood, it's not actually a 0-d array (a number). In fact, looking at the implementation of arr0, it seems it just calls vec![], which of course may create any array. So, basically, no checks are performed on the dimensions and type of x. Looking at the documentation of from_shape_vec_unchecked (which is called from arr0), it states

    Unsafe because dimension and strides are unchecked.

    This is ok. However, if arr0 is supposed to be unsafe or the code above is supposed to be "fine" (which, in my view, should not), this should be documented at least in the arr0 page. I am happy to contribute to the documentation. It's possible that this issue also happens with all other similar functions arr1, arr2 and arr3 - I didn't check that yet.

    opened by nbro 1
  • The `array!` macro's documentation should clarify that if we attempt to create arrays with more than 3 dimensions they are ignored

    The documentation says

    Create an Array with one, two or three dimensions.

    This is fine.

    However, I wonder if the following code should be able to run silently (not even a warning).

    use ndarray::array;
    
    fn main() {
        let a = array![
            [
                    [[[2, 2]]], 
                    [[[3, 1]]], 
                    [[[5, 3]]], 
                    [[[2, 2]]]
            ]
            ];
        println!("An int array with shape {:?}: {:?}", a.shape(), a.ndim()); // Shape is [1, 4, 1]
    }
    
    

    Basically, what's happening is that some dimensions or parentheses are automatically ignored. I think this should at least be documented or maybe the macro should fail in those cases? I am happy to contribute to the documentation, if you think that's the way to go in this case. I am also opening this issue because, if we attempt to create a similar thing in numpy with e.g. the following code

    import numpy as np
    
    a = np.array(
            [
                    [[[2, 2]]], 
                    [[[3, 1]]], 
                    [[[5, 3]]], 
                    [[[2, 2]]]
            ]
            )
    
    print(a.shape)
    

    We get a different shape: (4, 1, 1, 2).

    Btw, I suppose there's no macro that allows us to create arrays from literals with more than 3 dimensions. Is that correct? Is there any reason why the macro only supports up to 3 dimensions? In machine learning, for example, it's not rare to have multi-dimensional arrays with more than 3 dimensions, so I think that supporting the creation of arrays from literals with more than 3 dimensions would not be a bad idea.

    opened by nbro 0
  • What would be the equivalent of numpy's `dtype` in `ndarray`?

    Rust is, of course, statically and strongly typed. However, I wonder if it would make sense to have something like dtype for arrays in ndarray. I am not a Rust expert either, but it seems that there's a way to query the type of a variable. See this SO post. This could be useful because Rust has multiple integer and float types, and one may need to know at runtime which specific type the elements of an array have. This would just be a nice-to-have feature. Have you ever thought about this?

    opened by nbro 6
  • Should `ndarray-rand` be moved to its own repo?

    It seems that ndarray-rand could be an independent crate in its own separate repo, like ndarray-linalg or ndarray-stats. In that way, people could focus on the specific crates and could use ndarray-rand independently.

    Is there any specific/particular reason why ndarray-rand is not in its own repo? Currently, it's specified as a member here, so I suppose you would like to use the same dependencies in ndarray-rand as in ndarray, but I don't see why this constraint couldn't be kept if we move ndarray-rand to a separate repo, but I am not very familiar yet with neither ndarray nor ndarray-rand.

    opened by nbro 0