ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

Overview

The ndarray crate provides an n-dimensional container for general elements and for numerics.

Please read the API documentation on docs.rs or take a look at the quickstart tutorial.

Highlights

  • Generic 1, 2, ..., n-dimensional arrays
  • Owned arrays and array views
  • Slicing, with arbitrary step sizes and negative indices that count from the end of the axis.
  • Views and subviews of arrays; iterators that yield subviews.

Status and Outlook

  • Still iterating on and evolving the crate
    • The crate is under continuous development, and breaking changes are expected between versions. We adopt the newest stable Rust features when we need them.
  • Performance:
    • Prefer higher-order methods and arithmetic operations on arrays first, then iteration, and only as a last resort algorithms based on indexing.
    • Efficient floating point matrix multiplication even for very large matrices; can optionally use BLAS to improve it further.

Crate Feature Flags

The following crate feature flags are available. They are configured in your Cargo.toml.

  • std

    • Rust standard library (enabled by default)

    • This crate can be used without the standard library by disabling the default std feature. To do so, use this in your Cargo.toml:

      [dependencies]
      ndarray = { version = "0.x.y", default-features = false }

    • The geomspace, linspace, logspace, range, std, var, var_axis, and std_axis methods are only available when std is enabled.

  • serde

    • Enables serialization support for serde 1.x
  • rayon

    • Enables parallel iterators, parallelized methods and par_azip!.
    • Implies std
  • blas

    • Enables transparent BLAS support for matrix multiplication. Uses blas-src for a pluggable backend, which needs to be configured separately (see below).

How to use with cargo

[dependencies]
ndarray = "0.14.0"

How to enable BLAS integration: depend on blas-src directly to pick a BLAS provider, and depend on the same blas-src version as ndarray does so that the selection takes effect. An example configuration using system OpenBLAS is shown below. Note that only end-user projects (not libraries) should select a provider:

[dependencies]
ndarray = { version = "0.14.0", features = ["blas"] }
blas-src = { version = "0.7.0", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.9", default-features = false, features = ["cblas", "system"] }

For official releases of ndarray, the versions are:

ndarray   blas-src   openblas-src
0.15      0.7.0      0.9.0
0.14      0.6.1      0.9.0
0.13      0.2.0      0.6.0
0.12      0.2.0      0.6.0
0.11      0.1.2      0.5.0

Recent Changes

See RELEASES.md.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Comments
  • Use axpy for scaled_add

    This adds a method to use saxpy and daxpy when doing scaled addition. It gives a nice performance boost:

    test scaled_add_2d_f32_axpy           ... bench:         312 ns/iter (+/- 13)
    test scaled_add_2d_f32_regular        ... bench:         571 ns/iter (+/- 23)
    

    The main addition in this code is Dimension::equispaced_stride, which is a generalization of the code formerly inside Dimension::is_contiguous: if all elements of an array are one ptr-width away from each other, we call the array contiguous. If all elements of an array are exactly n ptr-widths away from each other, we call the array equispaced (not sure if that's the right terminology). If n = 1, then an equispaced array is contiguous. As long as an array is equispaced, it is admissible to BLAS via the incx and incy variables (in the case of *axpy).

    I think we might be able to use this for the gemm implementations, too.

    EDIT: I notice that there are inlined versions of the Dimensions trait functions for 1 to 3 dimensional arrays. I guess I should extend those too?

    EDIT 2: I am also not sure I like the extra allocations in that continuity/equidistance check. But I guess it doesn't matter much for large arrays.

    opened by SuperFluffy 32
  • ndarray::Array macro array![]

    This allows for Array and RcArray creation without specifying the dimensionality, using macros. As a side note, this also allows for compile-time flattening of arrays. I've implemented a version of this here; I don't know whether something similar is more efficient than what is currently done in rust-ndarray, but maybe you will find it interesting.

    opened by Pireax 32
  • no_std for ndarray

    Using std gives us the following features:

    • std::arch and target feature detection for seamless simd
    • .

    Having optional no_std support would give us these tangible benefits:

    • .

    I have created this issue so that the pros and cons of no_std can be explained. As of this writing, using no_std does not seem to be attractive.

    As usual, all new features are to be used because of what gain they give us, not to score points. Formulate goal & gain before implementing.

    enhancement 
    opened by bluss 30
  • Better doc for beginners?

    Hi there,

    As a beginner with this nice-looking tool, it seems that I cannot find a good beginner-level, step-by-step tutorial for Rust ndarray in the repo?

    To draw some users from python numpy to start using rust numpy, I feel that it could be very helpful to have some readme doc or a site presenting some information like this? https://numpy.org/devdocs/user/quickstart.html

    A quickstart doc would be very helpful, wouldn't it? Not sure how busy you are, but maybe I can try to start making some simple docs, basically porting some or all of the example operations from https://numpy.org/devdocs/user/quickstart.html into a readme here.

    Also, I wonder how a site/book like this is generated: https://doc.rust-lang.org/rust-by-example/hello.html I guess it would be nice to eventually have a doc site for rust-ndarray (or perhaps there is already one?)

    Thank you :)

    opened by liufuyang 24
  • append, append_row, append_column: methods for appending an array or single rows and columns

    For owned arrays specifically (Array), allow appending rows and/or columns or whole arrays (along a given axis).

    New methods:

    • .append_row(ArrayView1<A>) -> Result
    • .append_column(ArrayView1<A>) -> Result
    • .append(axis, ArrayView<A, D>) -> Result
    • .move_into(impl Into<ArrayViewMut>)

    New features:

    • stack, concatenate and .select(...) now support Clone elements (previously only Copy).

    The axis we append along should be a "growing axis", i.e. the axis with the greatest, and positive, stride. However, if that axis is of length 0 or 1, we can always convert it to the new growing axis.

    These methods automatically move the whole array to a compatible memory layout for appending, if needed.

    The examples show that empty arrays are valid both ways (both memory layouts) - and that might be the easiest way to use these methods for the users.

    There were quite a few corner cases in the implementation in this PR, but hopefully they should all be dealt with in a moderately clean way and with little duplication.

    Fixes #269

    opened by bluss 23
  • Integrate ndarray-parallel and make rayon an optional feature

    This moves all the ndarray-parallel functionality directly into ndarray itself.

    This means the parallelization features are more visible (in docs by default) and more accessible. The user still has to opt-in to use it, and use the rayon traits.

    We also get rid of extra shims in the integration and use the rayon IntoParallel* traits directly, and can use inherent methods for par_apply and so on.

    Fixes #551

    opened by bluss 23
  • Add support for slicing with subviews

    Please don't merge this PR immediately. It needs some refinement, including updated documentation and more tests.

    This is an implementation of #215, adding support for taking subviews while slicing. A few of the changes are:

    • The s! macro can take individual indices (to indicate subviews) in addition to ranges.
    • The s! macro returns an owned SliceInfo instance instead of just a reference, so the caller can easily pass it around between methods.
    • The various *slice*() methods now take a SliceInfo instance (or reference).
    • There's now a slice_into() method. Note that ideally we'd come up with a better name than slice_into because there's already an into_slice() method.
    • ArrayBase now has three additional methods *slice_axis*() that slice along a single axis.

    The primary disadvantage of this change is that it's not easy to create a SliceInfo instance without the s! macro. If that's an important feature, it would be possible to do any (or all) of the following:

    • Add impl<'a> From<&'a [SliceOrIndex]> for SliceInfo<Vec<SliceOrIndex>, IxDyn>.
    • Add a macro to convert [SliceOrIndex; n] into SliceInfo<[SliceOrIndex; n], D>, where D is determined at compile time.
    • Add a function to convert [SliceOrIndex; n] into SliceInfo<[SliceOrIndex; n], D>, where the caller would have to specify D, and D would be checked at runtime.

    Note that this PR is a breaking change.

    Will you please let me know what you think?

    opened by jturner314 23
  • Implement co-broadcasting in operator overloading

    This PR did the following to implement co_broadcasting in operator overloading

    1. Implement DimMax trait, which uses the same broadcast mechanism as Numpy, for all Dimension so as to get the returned array type in operator overloading.
    2. Use map_collect() and map_collect_owned() in operator overloading to avoid redundant array traversal.
    opened by SparrowLii 22
  • Meta Issue: Support for parallelized/blocked algorithms

    What are your thoughts on implementing something similar to http://dask.pydata.org/en/latest/ on top of ndarrays? I suspect parallelized computations on submatrices should be pretty natural to do in the Rust framework, and it seems you've already created sub-array view functions. Do you agree?

    (Community Edits below)


    Actionable sub issues:

    • [x] Send/Sync splittable array views are already present
    • [x] Implement Rayon parallel iterator traits for Axis Iter #248 #252
    • [x] Implement Rayon parallel iterator traits for element iterators / for Array itself #252
    • [x] alternative methods of collecting an axis_iter to ndarray matrix #249
    • [x] Parallel Iter for AxisChunksIter
    • [x] Parallel support for Array::map_inplace #288
    • [x] Parallel support for Array::map -> Array
    • [x] Parallel lock step function application (Zip) #288
    opened by kernelmachine 22
  • Status of project?

    Just wondering what's happening with ndarray (and the future 0.14.0 release). It looks like there hasn't been much activity in recent months. Just curious btw (and willing to assist in any PRs if help is wanted)

    opened by xd009642 21
  • Iterate by chunks over specified axis

    This relies on iterating using OuterIterCore while monitoring the location of the current index. When the iterator detects the last chunk, a different shape is used.

    I'm not sure this is the proper design yet, it works for forward iteration but probably not for reversed iteration.

    opened by vbarrielle 19
  • Inconsistent behaviour between `arr0` and `arr1`

    To create a 0-d array with arr0, we can actually pass an array literal as follows

    let a = arr0([0]);
    

    This is ignored because arr0 does not check what is passed or whether it has the correct shape. First of all, I think this is bad. See the related issue: https://github.com/rust-ndarray/ndarray/issues/1253

    As you can see, I don't need to pass a reference.

    Now, if we attempt to create a 1d, 2d or 3d array with the similar functions, we need to pass the literal by reference, i.e. the following does not compile

    let a = arr1([[0], [2]]);
    

    The same happens with arr2 and arr3.

    Whether we decide to solve the other issue https://github.com/rust-ndarray/ndarray/issues/1253 or not, in my view, this inconsistency should not exist.

    By the way, we can also do

    let a = arr0(&[0]);
    

    This inconsistency is due to the fact that arr0 accepts A while the other functions accept &[A], and I suppose we can pass references when we accept A

    opened by nbro 0
  • Should the examples inside the `examples` folder be moved to the repo `ndarray-examples`?

    It seems that the examples here https://github.com/rust-ndarray/ndarray/tree/master/examples can be moved to https://github.com/rust-ndarray/ndarray-examples, which, in my view, doesn't just have to contain machine learning examples, but could contain even simple examples to help newbies use ndarray.

    opened by nbro 0
  • `arr0` is able to create arrays with multiple dimensions

    Similarly to https://github.com/rust-ndarray/ndarray/issues/1252, it seems that we can use the function arr0 to create arrays that actually have more dimensions. See the example below.

    use ndarray::arr0;
    
    fn main() {
        let a = arr0([0, 2]);
        println!("0-d array with shape {:?} is {:?}", a.shape(), a);
    
        let a = arr0([[0], [2]]);
        println!("0-d array with shape {:?} is {:?}", a.shape(), a);
    }
    

    The shape is [], but the array is displayed as a higher-dimensional array, so I suppose that, under the hood, it's not actually a 0-d array (a number). In fact, looking at the implementation of arr0, it seems it just calls vec![], which of course may create any array. So, basically, no checks are performed on the dimensions and type of x. Looking at the documentation of from_shape_vec_unchecked (which is called from arr0), it states

    Unsafe because dimension and strides are unchecked.

    This is ok. However, if arr0 is supposed to be unsafe or the code above is supposed to be "fine" (which, in my view, should not), this should be documented at least in the arr0 page. I am happy to contribute to the documentation. It's possible that this issue also happens with all other similar functions arr1, arr2 and arr3 - I didn't check that yet.

    opened by nbro 1
  • The `array!` macro's documentation should clarify that if we attempt to create arrays with more than 3 dimensions they are ignored

    The documentation says

    Create an Array with one, two or three dimensions.

    This is fine.

    However, I wonder if the following code should be able to run silently (not even a warning).

    use ndarray::array;
    
    fn main() {
        let a = array![
            [
                    [[[2, 2]]], 
                    [[[3, 1]]], 
                    [[[5, 3]]], 
                    [[[2, 2]]]
            ]
            ];
        println!("An int array with shape {:?}: {:?}", a.shape(), a.ndim()); // Shape is [1, 4, 1]
    }
    
    

    Basically, what's happening is that some dimensions or parentheses are automatically ignored. I think this should at least be documented or maybe the macro should fail in those cases? I am happy to contribute to the documentation, if you think that's the way to go in this case. I am also opening this issue because, if we attempt to create a similar thing in numpy with e.g. the following code

    import numpy as np
    
    a = np.array(
            [
                    [[[2, 2]]], 
                    [[[3, 1]]], 
                    [[[5, 3]]], 
                    [[[2, 2]]]
            ]
            )
    
    print(a.shape)
    

    We get a different shape: (4, 1, 1, 2).

    Btw, I suppose there's no macro that allows us to create arrays from literals with more than 3 dimensions. Is that correct? Is there any reason why the macro only supports up to 3 dimensions? In machine learning, for example, it's not rare to have multi-dimensional arrays with more than 3 dimensions, so I think that supporting the creation of arrays from literals with more than 3 dimensions would not be a bad idea.

    opened by nbro 0
  • What would be the equivalent of numpy's `dtype` in `ndarray`?

    Rust is, of course, statically and strongly typed. However, I wonder if it would make sense to have something like dtype for arrays in ndarray. I am not a Rust expert either, but it seems that there's a way to query the type of a variable. See this SO post. This could be useful because Rust has multiple integer and float types, and one may need to know at runtime which specific type the elements of an array have. This would just be a nice-to-have feature. Have you ever thought about this?

    opened by nbro 6
  • Should `ndarray-rand` be moved to its own repo?

    It seems that ndarray-rand could be an independent crate in its own separate repo, like ndarray-linalg or ndarray-stats. In that way, people could focus on the specific crates and could use ndarray-rand independently.

    Is there any specific/particular reason why ndarray-rand is not in its own repo? Currently, it's specified as a member here, so I suppose you would like to use the same dependencies in ndarray-rand as in ndarray, but I don't see why this constraint couldn't be kept if we move ndarray-rand to a separate repo, but I am not very familiar yet with neither ndarray nor ndarray-rand.

    opened by nbro 0