Statistical routines for ndarray

Overview

ndarray-stats


This crate provides statistical methods for ndarray's ArrayBase type.

Currently available routines include:

  • order statistics (minimum, maximum, median, quantiles, etc.);
  • summary statistics (mean, skewness, kurtosis, central moments, etc.);
  • partitioning;
  • correlation analysis (covariance, Pearson correlation);
  • measures from information theory (entropy, KL divergence, etc.);
  • deviation functions (distances, counts, errors, etc.);
  • histogram computation.

See the documentation for more information.

Please feel free to contribute new functionality! A roadmap can be found here.

Using with Cargo

[dependencies]
ndarray = "0.15"
ndarray-stats = "0.5"

Releases

  • 0.5.0

    • Breaking changes
      • Minimum supported Rust version: 1.49.0
      • Updated to ndarray:v0.15.0

    Contributors: @Armavica, @cassiersg

  • 0.4.0

    • Breaking changes
      • Minimum supported Rust version: 1.42.0
    • New functionality:
      • Summary statistics:
        • Weighted variance
        • Weighted standard deviation
    • Improvements / breaking changes:
      • Documentation improvements for Histograms
      • Updated to ndarray:v0.14.0

    Contributors: @munckymagik, @nilgoyette, @LukeMathWalker, @lebensterben, @xd009642

  • 0.3.0

    • Breaking changes
      • Minimum supported Rust version: 1.37
    • New functionality:
      • Deviation functions:
        • Counts equal/unequal
        • l1, l2, linf distances
        • (Root) mean squared error
        • Peak signal-to-noise ratio
      • Summary statistics:
        • Weighted sum
        • Weighted mean
    • Improvements / breaking changes:
      • Updated to ndarray:v0.13.0

    Contributors: @munckymagik, @nilgoyette, @jturner314, @LukeMathWalker

  • 0.2.0

    • Breaking changes
      • Users of the library can no longer implement ndarray-stats' extension traits (see #34)
      • Redesigned error handling across the whole crate, standardising on Result
    • New functionality:
      • Summary statistics:
        • Harmonic mean
        • Geometric mean
        • Central moments
        • Kurtosis
        • Skewness
      • Information theory:
        • Entropy
        • Cross-entropy
        • Kullback-Leibler divergence
      • Quantiles and order statistics:
        • argmin / argmin_skipnan
        • argmax / argmax_skipnan
        • Optimized bulk quantile computation (quantiles_mut, quantiles_axis_mut)
    • Fixes:
      • Reduced occurrences of overflow for interpolate::midpoint

    Contributors: @jturner314, @LukeMathWalker, @phungleson, @munckymagik

  • 0.1.0

    • Initial release by @LukeMathWalker and @jturner314.

Contributing

Please feel free to create issues and submit PRs.

License

Copyright 2018 ndarray-stats developers

Licensed under the Apache License, Version 2.0, or the MIT license, at your option. You may not use this project except in compliance with those terms.

Comments
  • Add deviation functions

    Add deviation functions

    Port of deviation functions from StatsBase.jl.

    References:

    TODO

    Basic port:

    Design:

    • [x] Design error handling
    • [x] ~Consider using sum, max etc~

    Testing:

    • [x] Test with integer inputs
    • [x] Test with multi-dimensional inputs
    • [x] Test with empty inputs
    • [x] Test with inconsistent lengths
    • [x] ~Test with no differences~
    • [x] Test with NaNs / NoisyFloats
    • [x] ~Test with mut input~
    • [x] Test with Clone types (BigInt, Complex maybe?)
    • [x] ~Consider using quickcheck~

    Finishing up:

    • [x] Try to simplify trait bounds
    • [x] Review function names
    • [x] Review variable names, r, a, and b etc.
    • [x] private_impl! marker
    • [x] Documentation & examples
    • [x] Squash commits down a bit
    opened by munckymagik 10
  • Bulk quantiles

    Bulk quantiles

    Using the ordering guarantees we have on the output of quantile_mut/sorted_get_mut, this PR provides a method optimized to compute multiple quantiles at once: each quantile is found by scanning an increasingly smaller subset of the original array, thanks to the computation of the previous quantile.
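
    The shrinking-subset idea can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the name many_order_stats is hypothetical, it works on plain slices rather than ndarray arrays, and it returns exact order statistics without any interpolation.

```rust
/// Hypothetical helper (not the crate's API): returns the elements that would
/// sit at `indexes` if `data` were fully sorted, reordering `data` in place.
///
/// Assumption of this sketch: `indexes` is strictly increasing and every index
/// is in bounds; otherwise the function panics.
fn many_order_stats<T: Ord + Clone>(data: &mut [T], indexes: &[usize]) -> Vec<T> {
    let mut out = Vec::with_capacity(indexes.len());
    let mut rest: &mut [T] = data;
    // Position in the original slice at which `rest` begins.
    let mut offset = 0;
    for &i in indexes {
        // Take `rest` out so the sub-slice produced below can be stored back.
        let current = std::mem::take(&mut rest);
        // Puts the (i - offset)-th smallest element of `current` at its sorted
        // position; everything after it is greater than or equal to it.
        let (_, value, greater) = current.select_nth_unstable(i - offset);
        out.push(value.clone());
        // The next (larger) index only has to search the `greater` part.
        offset = i + 1;
        rest = greater;
    }
    out
}
```

    Each call to select_nth_unstable only looks at the part of the slice to the right of the previously found element, which is the "increasingly smaller subset" described above.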

    Breaking changes:

    • I have changed the quantile parameter from f64 to N64 - floats are not hashable, hence they cannot be used as keys in IndexMap and the function panics anyway if that argument is NaN. This change propagates to the Interpolate types;
    • I have renamed sorted_get_mut to get_from_sorted. It plays better with the bulk version, get_many_from_sorted, and I think it's clearer;
    • quantile_axis_mut and quantile_axis_skipnan_mut now return an Option instead of panicking if the axis length is 0.
    Breaking changes 
    opened by LukeMathWalker 10
  • Histogram (revisited)

    Histogram (revisited)

    Based on our discussion in #8, I have revised the implementation.

    It needs a test suite and a couple more helper methods on HistogramCounts, but the skeleton is there. Let me know your thoughts @jturner314

    opened by LukeMathWalker 9
  • Weighted var

    Weighted var

    Here's the var/std version.

    • I moved all "summary statistic" tests. Sadly, GitHub shows them as a deleted file and a new file. Sorry, I should have used git mv. Ctrl-F weighted_var and weighted_std to see what I actually changed.
    • I'm not too sure about ddof. I know that 0.0 and 1.0 work, but I don't actually know what other ddof values do. ndarray's definition of ddof is different, so I'm probably wrong :)
    • How can A::from_usize(0) fail? An unwrap would be safe, no?
    • I picked the first 1-pass+weights algorithm that I found. I don't know if there are better or faster algorithms around, or if it accepts ddof values other than 0.0 and 1.0. All I know is that it gives the same results as the 2-pass algorithm.
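
    For readers unfamiliar with one-pass weighted variance, here is a std-only sketch. The PR does not say which algorithm it uses, so the incremental (Welford-style) update below is an assumption chosen for illustration, as is the convention of subtracting ddof from the sum of weights. Both function names are hypothetical.

```rust
/// One-pass weighted variance (illustrative sketch, not the crate's code).
/// Assumes strictly positive weights; `ddof` is subtracted from the weight sum.
fn weighted_var_one_pass(xs: &[f64], ws: &[f64], ddof: f64) -> Option<f64> {
    if xs.is_empty() || xs.len() != ws.len() {
        return None;
    }
    let mut weight_sum = 0.0;
    let mut mean = 0.0;
    let mut s = 0.0; // running weighted sum of squared deviations
    for (&x, &w) in xs.iter().zip(ws) {
        weight_sum += w;
        let delta = x - mean;
        // Move the running mean towards `x` in proportion to its weight.
        mean += (w / weight_sum) * delta;
        // Uses the deviation from both the old and the updated mean.
        s += w * delta * (x - mean);
    }
    Some(s / (weight_sum - ddof))
}

/// Two-pass reference: compute the weighted mean first, then the deviations.
fn weighted_var_two_pass(xs: &[f64], ws: &[f64], ddof: f64) -> Option<f64> {
    if xs.is_empty() || xs.len() != ws.len() {
        return None;
    }
    let weight_sum: f64 = ws.iter().sum();
    let mean = xs.iter().zip(ws).map(|(x, w)| x * w).sum::<f64>() / weight_sum;
    let s: f64 = xs.iter().zip(ws).map(|(x, w)| w * (x - mean).powi(2)).sum();
    Some(s / (weight_sum - ddof))
}
```

    With unit weights and ddof = 1.0 both functions reduce to the usual sample variance, which is a convenient sanity check when comparing the two approaches.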
    opened by nilgoyette 7
  • Implement argmin argmax

    Implement argmin argmax

    Hey mates, this PR currently has a rough impl for argmin for discussions.

    Please leave some comments. Once everyone is happy with the implementation, I will add the necessary docs, tests, and other methods.

    opened by phungleson 7
  • Weighted mean

    Weighted mean

    Here's a first version of weighted_mean and weighted_mean_axis.

    Disclaimers:

    1. I don't really know where weighted_mean and friends should go. Is summary_statistics ok?
    2. I had to move return_err_if_empty and return_err_unless_same_shape because they are useful elsewhere.
    3. There's a little code-copy in weighted_mean_axis, to avoid (2 conditions + 1 unwrap) x nb_lanes. Maybe I could create an inner function called inner_weighted_mean or something, then call it in both functions?

    Questions:

    1. Why are the summary_statistics tests (and others) not in /tests/*? I thought that the public API was supposed to be tested outside the crate. Is this not a "standard"?
    opened by nilgoyette 6
  • Central moments

    Central moments

    Computation of central moments of arbitrary order.

    Would it be worth adding another method that takes the mean as a parameter, to avoid re-computing it if the user already has its value, @jturner314?
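
    The question above can be made concrete with a small std-only sketch. Both function names are hypothetical and operate on plain slices; they only illustrate how a variant accepting a precomputed mean could sit next to the convenience version.

```rust
/// Central moment of the given order around a caller-supplied mean.
/// Hypothetical signature, for illustration only.
fn central_moment_with_mean(xs: &[f64], mean: f64, order: i32) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let sum: f64 = xs.iter().map(|&x| (x - mean).powi(order)).sum();
    Some(sum / xs.len() as f64)
}

/// Convenience wrapper that computes the mean itself and then delegates.
fn central_moment(xs: &[f64], order: i32) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let mean = xs.iter().sum::<f64>() / xs.len() as f64;
    central_moment_with_mean(xs, mean, order)
}
```

    A caller who needs several moments of the same data can compute the mean once and pass it to central_moment_with_mean for each order.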

    Enhancement 
    opened by LukeMathWalker 6
  • Consider adding examples

    Consider adding examples

    I suggest we consider adding an examples folder to demonstrate more real world usage.

    The benefits I think this would bring are:

    • By seeing the library used in a more realistic situation we may learn things about the design that the tests/doc-tests didn't reveal.
    • We help users get started more quickly for typical use-cases.
    • We get to set up some good usage patterns for others to follow.

    Can we brain-storm a list of the kinds of examples we would want?

    Does anybody have any toy examples we could use to seed the folder?

    opened by munckymagik 5
  • Means

    Means

    A basic implementation of arithmetic, harmonic and geometric mean.

    Even though ndarray exposes mean_axis it does not provide mean, hence I have added it to the PR. Should I contribute it back to ndarray @jturner314?
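
    As a reminder of what the three means compute, here is a std-only sketch on plain slices. It is not the code from this PR; the function names are hypothetical, and returning None for empty input is an assumption of the sketch.

```rust
/// Arithmetic mean: sum of the values divided by their count.
fn arithmetic_mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
fn harmonic_mean(xs: &[f64]) -> Option<f64> {
    let reciprocals: Vec<f64> = xs.iter().map(|x| x.recip()).collect();
    arithmetic_mean(&reciprocals).map(|m| m.recip())
}

/// Geometric mean: exponential of the arithmetic mean of the logarithms.
/// Only meaningful for strictly positive values.
fn geometric_mean(xs: &[f64]) -> Option<f64> {
    let logs: Vec<f64> = xs.iter().map(|x| x.ln()).collect();
    arithmetic_mean(&logs).map(|m| m.exp())
}
```

    Expressing the harmonic and geometric means through the arithmetic mean keeps the empty-input handling in one place.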

    opened by LukeMathWalker 5
  • Non-deterministic test?

    Non-deterministic test?

        #[test]
        fn test_zero_observations() {
            let a = Array2::<f32>::zeros((2, 0));
            let pearson = a.pearson_correlation();
            assert_eq!(pearson.shape(), &[2, 2]);
            let all_nan_flag = pearson.iter().map(|x| x.is_nan()).fold(true, |acc, flag| acc & flag);
            assert_eq!(all_nan_flag, true);
        }
    

    This test fails on Travis, for #5, while it succeeds on my local machine. It's weird - any idea? @jturner314 I am trying to reproduce the error, but I am failing.

    Bug Help wanted 
    opened by LukeMathWalker 5
  • Update changelog and version for release

    Update changelog and version for release

    New ndarray version, new release! Only the dependencies have changed for this one, but it is still a breaking release because of the new ndarray version and because the minimum supported Rust version moves up to match ndarray 0.15.

    opened by xd009642 4
  • Reexport `noisy_float::types::N64`

    Reexport `noisy_float::types::N64`

    Description So that crates using this crate don't have to add noisy_float explicitly to their Cargo.toml, it would be convenient to have this crate pub use things like noisy_float::types::N64, since they are required directly by the API, such as with QuantileExt::quantile_axis_mut:

    https://github.com/rust-ndarray/ndarray-stats/blob/b6628c6a1c143532673213c56d46da5fda80cbe8/src/quantile/mod.rs#L208-L213

    Version Information

    • ndarray: 0.15.4
    • ndarray-stats: 0.5.0
    • Rust: 1.61.0

    To Reproduce N/A

    Expected behavior Not have to add noisy_float to my Cargo.toml when using ndarray-stats::quantile::QuantileExt.

    opened by metasim 0
  • No unweighted standard deviation

    No unweighted standard deviation

    There exists a method for weighted standard deviation, weighted_std, but I can't find one for regular unweighted standard deviation. Is there a reason for this?

    opened by albertsgarde 1
  • quantile_mut: fatal runtime error: stack overflow

    quantile_mut: fatal runtime error: stack overflow

    Description quantile_mut can fail with the error message:

    thread 'main' has overflowed its stack
    fatal runtime error: stack overflow
    

    Version Information

    • ndarray: 0.15.4
    • ndarray-stats: 0.5.0
    • Rust: 1.58.1

    To Reproduce

    use ndarray::Array1;
    use ndarray_stats::{interpolate::Linear, Quantile1dExt};
    use noisy_float::types::{n64, N64};
    
    fn main() {
        {
            let mut array: Array1<N64> = Array1::ones(15300);
            println!("One {}", array.quantile_mut(n64(0.5), &Linear).unwrap());
        }
    
        {
            let mut array: Array1<N64> = Array1::ones(15600);
            println!("Two {}", array.quantile_mut(n64(0.5), &Linear).unwrap());
        }
    
        {
            let mut array: Array1<N64> = Array1::ones(100000);
            println!("Three {}", array.quantile_mut(n64(0.5), &Linear).unwrap());
        }
    }
    

    Observed behavior

    $ cargo run --profile=dev
    One 1
    
    thread 'main' has overflowed its stack
    fatal runtime error: stack overflow
    $ cargo run --profile=release
    One 1
    Two 1
    
    thread 'main' has overflowed its stack
    fatal runtime error: stack overflow
    

    Expected behavior

    One 1
    Two 1
    Three 1
    

    Additional context

    • I'm able to reproduce this issue on both Linux and macOS with the default stack limit of 8 MiB. (ulimit -s reports 8192)
    • The result is non-deterministic. Re-running the executable can succeed sometimes and fail sometimes. The larger the vector the more likely it is to fail.
    • The result depends on whether optimization is enabled.
    opened by sjackman 5
  • Implement a `cov_to_corr()` method

    Implement a `cov_to_corr()` method

    I just had a situation where I wanted to convert from a covariance matrix to a correlation matrix. It would be neat if we could do that with one function call.

    This is as much a note to myself to implement this (when I get time) as it is a feature request.
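
    The conversion itself divides each covariance by the product of the two corresponding standard deviations. A std-only sketch follows; the nested Vec representation and the exact signature are assumptions made for illustration, and this is not an existing ndarray-stats function.

```rust
/// Converts a square covariance matrix into a correlation matrix:
/// corr[i][j] = cov[i][j] / (sqrt(cov[i][i]) * sqrt(cov[j][j])).
/// Sketch only: assumes `cov` is square with positive diagonal entries.
fn cov_to_corr(cov: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // Standard deviations are the square roots of the diagonal entries.
    let std: Vec<f64> = (0..cov.len()).map(|i| cov[i][i].sqrt()).collect();
    cov.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &c)| c / (std[i] * std[j]))
                .collect()
        })
        .collect()
}
```

    For the covariance matrix [[4, 2], [2, 9]] the standard deviations are 2 and 3, so the off-diagonal correlation is 2 / (2 * 3) = 1/3.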

    opened by multimeric 0
  • The assumed matrix layout for correlation is unintuitive

    The assumed matrix layout for correlation is unintuitive

    The docs for the correlation methods say:

    Let (r, o) be the shape of M:

    • r is the number of random variables;
    • o is the number of observations we have collected for each random variable.

    What this implicitly says is that "M should be a matrix with r rows, corresponding to random variables, and o columns, corresponding to observations". We know this because ndarray has an explicit definition for rows and columns, whereby the first axis refers to the rows and the second axis is called the column axis. For example refer to nrows and ncols functions.

    However, I find this assumption counter-intuitive. The convention in my experience is to use the "tidy" layout, in which each row corresponds to an observation and each column corresponds to a variable. I refer here to Hadley Wickham's work, and this figure (e.g. here).

    Also this is how R works:

    > mat
         [,1] [,2]
    [1,]    1    5
    [2,]    2    6
    [3,]    3    7
    [4,]    4    8
    > nrow(mat)
    [1] 4
    > ncol(mat)
    [1] 2
    > cov(mat)
             [,1]     [,2]
    [1,] 1.666667 1.666667
    [2,] 1.666667 1.666667
    

    Thirdly, in terms of the Rust data science ecosystem, note that polars (as far as I know, the best supported data frame library in Rust) outputs matrices with the same assumptions. If you create a DataFrame with 2 series (which correspond to variables) and 3 rows, and run .to_ndarray(), you will get a (3, 2) ndarray. Then when you call .cov() on it, you will get something that is not the covariance matrix that you are after.

    One argument in the defence of the current method is numpy.cov, which makes the same assumption, as it takes:

    A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables.

    My suggestion is therefore to consider reversing the assumed dimensions for these methods in the next major (breaking) release. I realise that using .t() is not a difficult thing to do, but unfortunately forgetting to do this in your code will result in a valid matrix that may continue into downstream code without the user realising that it is not the correct covariance matrix. This happened to me and I'd like to spare other users from this issue.
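
    The pitfall can be reproduced without any dependencies. The sketch below uses nested Vecs and two hypothetical helpers rather than the crate's methods: the covariance routine treats each row as a variable (the convention described in the docs quoted above), and the data is the matrix from the R session.

```rust
/// Sample covariance (ddof = 1) where each ROW is a variable and each COLUMN
/// is an observation. Illustrative helper, not the crate's API.
fn cov_rows_as_variables(m: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_obs = m.first().map_or(0, |row| row.len()) as f64;
    // Subtract each variable's mean from its observations.
    let centered: Vec<Vec<f64>> = m
        .iter()
        .map(|row| {
            let mean = row.iter().sum::<f64>() / n_obs;
            row.iter().map(|x| x - mean).collect()
        })
        .collect();
    // Entry (i, j) is the scaled dot product of centered variables i and j.
    centered
        .iter()
        .map(|a| {
            centered
                .iter()
                .map(|b| a.iter().zip(b).map(|(x, y)| x * y).sum::<f64>() / (n_obs - 1.0))
                .collect()
        })
        .collect()
}

/// Swaps rows and columns of a rectangular matrix.
fn transpose(m: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_cols = m.first().map_or(0, |row| row.len());
    (0..n_cols)
        .map(|j| m.iter().map(|row| row[j]).collect())
        .collect()
}
```

    Passing the 4 x 2 "tidy" matrix directly yields a valid-looking 4 x 4 result, while transposing it first yields the 2 x 2 matrix with entries 1.666667 that R reports. Nothing fails in the first case, which is why the mistake is easy to miss.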

    opened by multimeric 2
Releases(0.2.0)
  • 0.2.0(Apr 13, 2019)

    • New functionality:
      • Summary statistics:
        • Harmonic mean
        • Geometric mean
        • Central moments
        • Kurtosis
        • Skewness
      • Information theory:
        • Entropy
        • Cross-entropy
        • Kullback-Leibler divergence
      • Quantiles and order statistics:
        • argmin / argmin_skipnan
        • argmax / argmax_skipnan
        • Optimized bulk quantile computation (quantiles_mut, quantiles_axis_mut)
    • Fixes:
      • Reduced occurrences of overflow for interpolate::midpoint
    • Improvements / breaking changes:
      • Redesigned error handling across the whole crate, standardising on Result
      • Users of the library can no longer implement ndarray-stats' extension traits (see [#34])

    Contributors: @jturner314, @LukeMathWalker, @phungleson, @munckymagik
