Statistical computation library for Rust

Overview

statrs

MIT licensed

Current Version: v0.15.0

Should work for both nightly and stable Rust.

NOTE: While I will try to maintain backwards compatibility as much as possible, this is still a 0.x.x project, so the API is not considered stable and is subject to breaking changes until v1.0.0.

Description

Statrs provides a host of statistical utilities for Rust scientific computing. Included are a number of common distributions that can be sampled (e.g. Normal, Exponential, Student's T, Gamma, Uniform, etc.) plus common statistical functions like the gamma function, beta function, and error function.

This library is a work-in-progress port of the statistical capabilities in the C# Math.NET library. All unit tests in the library were borrowed from Math.NET when possible and filled in when not.

The library is not yet complete. Planned for future releases are further distribution implementations as well as ports of more statistical utilities.

Please check out the documentation here

Usage

Add the most recent release to your Cargo.toml

[dependencies]
statrs = "0.15"

Examples

Statrs comes with a number of commonly used distributions including Normal, Gamma, Student's T, Exponential, Weibull, etc. The common use case is to set up a distribution and sample from it, which depends on the Rand crate for random number generation.

use statrs::distribution::Exp;
use rand::distributions::Distribution;

let mut r = rand::rngs::OsRng;
let n = Exp::new(0.5).unwrap();
print!("{}", n.sample(&mut r));

Statrs also comes with a number of useful utility traits for more detailed introspection of distributions

use statrs::distribution::{Exp, Continuous, ContinuousCDF};
use statrs::statistics::Distribution;

let n = Exp::new(1.0).unwrap();
assert_eq!(n.mean(), Some(1.0));
assert_eq!(n.variance(), Some(1.0));
assert_eq!(n.entropy(), Some(1.0));
assert_eq!(n.skewness(), Some(2.0));
assert_eq!(n.cdf(1.0), 0.6321205588285576784045);
assert_eq!(n.pdf(1.0), 0.3678794411714423215955);
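
The values asserted above match the closed-form expressions for an exponential distribution with rate λ: mean 1/λ, variance 1/λ², entropy 1 − ln λ, CDF 1 − e^(−λx), and PDF λe^(−λx). The following is a minimal sketch of those formulas using only the standard library. The `ExpFormulas` type and its methods are hypothetical names chosen for illustration and are not part of statrs.

```rust
/// Hypothetical helper, not part of statrs: closed-form properties of an
/// exponential distribution with rate parameter `rate` (assumed > 0).
struct ExpFormulas {
    rate: f64,
}

impl ExpFormulas {
    /// Mean: 1 / rate.
    fn mean(&self) -> f64 {
        1.0 / self.rate
    }

    /// Variance: 1 / rate^2.
    fn variance(&self) -> f64 {
        1.0 / (self.rate * self.rate)
    }

    /// Differential entropy: 1 - ln(rate).
    fn entropy(&self) -> f64 {
        1.0 - self.rate.ln()
    }

    /// CDF: 1 - exp(-rate * x) for x >= 0, and 0 otherwise.
    fn cdf(&self, x: f64) -> f64 {
        if x < 0.0 {
            0.0
        } else {
            1.0 - (-self.rate * x).exp()
        }
    }

    /// PDF: rate * exp(-rate * x) for x >= 0, and 0 otherwise.
    fn pdf(&self, x: f64) -> f64 {
        if x < 0.0 {
            0.0
        } else {
            self.rate * (-self.rate * x).exp()
        }
    }
}

fn main() {
    let e = ExpFormulas { rate: 1.0 };
    println!("mean     = {}", e.mean());
    println!("variance = {}", e.variance());
    println!("entropy  = {}", e.entropy());
    println!("cdf(1)   = {}", e.cdf(1.0));
    println!("pdf(1)   = {}", e.pdf(1.0));
}
```

With rate 1.0 the mean, variance, and entropy all come out to 1, which is why the statrs example above can compare them exactly.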

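There are also utility functions including erf, gamma, ln_gamma, beta, etc. Statistics that are undefined for a given parameterization, such as the variance below, are returned as None: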

use statrs::statistics::Distribution;
use statrs::distribution::FisherSnedecor;

let n = FisherSnedecor::new(1.0, 1.0).unwrap();
assert!(n.variance().is_none());
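
The variance of an F distribution with d1 and d2 degrees of freedom is finite only when d2 > 4, which is consistent with the call above returning None for d2 = 1. A small sketch of that rule, using only the standard library, might look like this. The function name `f_variance` is hypothetical, and this is not the statrs implementation.

```rust
/// Hypothetical helper, not part of statrs: variance of an F distribution
/// with `d1` and `d2` degrees of freedom, or `None` when it is undefined.
fn f_variance(d1: f64, d2: f64) -> Option<f64> {
    // The variance exists only for d2 > 4.
    if d2 <= 4.0 {
        return None;
    }
    // Var = 2 * d2^2 * (d1 + d2 - 2) / (d1 * (d2 - 2)^2 * (d2 - 4))
    let numerator = 2.0 * d2 * d2 * (d1 + d2 - 2.0);
    let denominator = d1 * (d2 - 2.0).powi(2) * (d2 - 4.0);
    Some(numerator / denominator)
}

fn main() {
    println!("{:?}", f_variance(1.0, 1.0)); // undefined, so None
    println!("{:?}", f_variance(2.0, 6.0)); // defined, so Some(..)
}
```

Returning an Option makes the undefined case explicit at the call site instead of yielding NaN or infinity.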

Contributing

Want to contribute? Check out some of the issues marked help wanted

How to contribute

Clone the repo:

git clone https://github.com/statrs-dev/statrs

Create a feature branch:

git checkout -b <feature_branch> master

After committing your code:

git push -u origin <feature_branch>

Then submit a PR, preferably referencing the relevant issue.

Style

This repo makes use of rustfmt with the configuration specified in rustfmt.toml. See https://github.com/rust-lang-nursery/rustfmt for installation and usage instructions. Before committing, run the formatter in the src directory using rustfmt --write-mode overwrite *.rs.

Commit messages

Please be explicit and purposeful with commit messages.

Bad

Modify test code

Good

test: Update statrs::distribution::Normal test_cdf
Comments
  • Review of iterator statistics trait

    Currently the iterator statistics trait is treated as a special case since to act over the iterator the methods need to take a mutable reference, so all the traits from statrs::statistics are (going to be) combined in the IterStatistics trait that is implemented for all Iterators. I haven't come up with a better solution but for some reason this implementation doesn't sit too well with me and I'd love to have someone review it and provide feedback.

    help wanted 
    opened by boxtown 20
  • License compliance issue

    I would like to bring to your attention the fact that your licensing is incompatible with your dependencies.

    Your crate is MIT licensed but depends upon nalgebra which is Apache-2.0 licensed only. While Apache-2.0 projects can use MIT licensed components, the reverse is not so.

    Consider for instance that MIT is GPL v3 compatible, while Apache-2.0 is not; Your crate being MIT licensed is thus misleading to any potential GPL-v3 projects that consider using your crate - they'd end up constructing a non-compliant product.

    opened by jnqnfe 9
  • Update to nalgebra 0.27.1 to avoid RUSTSEC-2021-0070

    statrs's latest version is 0.14.0. (Even though the website says it is 0.13.0, in the README.md and the link to the docs in the "About" section at the top right.)

    statrs 0.14.0 depends on nalgebra 0.26. nalgebra has a RUSTSEC-2021-0070 advisory against it. Among other things, this causes cargo deny to fail.

    Version 0.27.1 of nalgebra fixes the advisory.

    It would be very helpful if statrs could have its dependency on nalgebra updated to 0.27.1, and then a new version of statrs (0.14.1 or 0.15.0) be released. Thank you.

    opened by nnethercote 8
  • Fix a bug in the uniform continuous distribution

    Hi and thanks for the library!

    Problem

    I think there's a bug in the implementation of the continuous uniform distribution. According to the documentation, it should return values in the range [min, max], i.e. min ≤ random value ≤ max.

    Unfortunately, the current implementation generates values in the range [min, max + 1), i.e. min ≤ random value < max + 1.

    I think it might be a copy-paste bug 'inherited' from the discrete uniform distribution, where you really need to add 1 to the upper bound.

    Solution

    This PR is a partial fix for this bug: It changes the range of the continuous uniform distribution to [min, max), i.e. min ≤ random value < max. I don't have a quick fix to include the upper bound, but I think it's important to at least fix the + 1.0 issue. As of 0.7.0, you cannot really sample values between 0 and 1 without resorting to wild workarounds.
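
The difference between the two ranges can be sketched as follows, assuming a unit sample u in [0, 1) is already available from some random number generator. Both function names are hypothetical, and this is an illustration of the scaling step rather than the code from the PR.

```rust
/// Hypothetical illustration of the reported bug: the extra `+ 1.0` widens
/// the output range to [min, max + 1).
fn scale_unit_sample_buggy(u: f64, min: f64, max: f64) -> f64 {
    min + u * (max + 1.0 - min)
}

/// Hypothetical illustration of the partial fix: maps a unit sample `u`
/// in [0, 1) onto [min, max).
fn scale_unit_sample(u: f64, min: f64, max: f64) -> f64 {
    min + u * (max - min)
}

fn main() {
    let (min, max) = (0.0, 1.0);
    let u = 0.75; // stand-in for a value drawn uniformly from [0, 1)
    println!("buggy: {}", scale_unit_sample_buggy(u, min, max)); // exceeds max
    println!("fixed: {}", scale_unit_sample(u, min, max)); // stays below max
}
```

With min = 0 and max = 1, the first variant can return values above 1, which matches the behavior described in the problem statement.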

    References

    opened by mp4096 8
  • Release request

    Hello! This is a (hopefully) polite request to check if statrs is in a state ready to have a release cut? It's been 10mo according to crates.io and I for one am eager to get off rand 0.7. Thanks again for all your hard work!

    opened by nlhepler 7
  • [RFC] Student's T inverse CDF

    This branch is not fit for merging (see below), but I wanted to gauge interest in having this functionality in statrs.

    Issues with the implementation in this branch:

    • [ ] Ignores location and scale parameters (they're assumed to be 0 and 1 respectively)
    • [x] Pulls in "special" and "approx" as deps
    • [x] No CheckedInverseCDF impl
    • [ ] 400 lines of unit tests is a bit much
    • [ ] Docs don't describe the formula
    opened by asayers 7
  • Error handling: Panics vs Result

    Currently the responsibility for guarding against exceptional cases (e.g. input not in valid domain, mathematically invalid operations etc) is passed to the user. We panic when an operation does not make mathematical sense (e.g. calculating the cumulative distribution function for discrete distributions at a negative input) which forces users to double check to make sure their inputs are valid. While this results in technically correct and predictable behavior from the API, I'm not sure if it's ergonomic or idiomatic and have been mulling over possibly introducing a Result based API either replacing or in addition to the stricter panic based API. This however warrants some discussion and I would love feedback from the community
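
One way to picture the two options side by side is a checked method that returns a Result, with the panicking method kept as a thin wrapper around it. The sketch below uses only the standard library. The `DiscreteUniform` type, the `DomainError` enum, and the method names are all hypothetical and are not the statrs API.

```rust
/// Hypothetical error type for inputs outside the valid domain.
#[derive(Debug, PartialEq)]
enum DomainError {
    NegativeInput,
}

/// Hypothetical discrete uniform distribution over the integers
/// `min..=max` (assumes 0 <= min <= max).
struct DiscreteUniform {
    min: i64,
    max: i64,
}

impl DiscreteUniform {
    /// Result-based variant: invalid input is reported to the caller.
    fn checked_cdf(&self, x: i64) -> Result<f64, DomainError> {
        if x < 0 {
            return Err(DomainError::NegativeInput);
        }
        if x < self.min {
            Ok(0.0)
        } else if x >= self.max {
            Ok(1.0)
        } else {
            let count = (x - self.min + 1) as f64;
            let total = (self.max - self.min + 1) as f64;
            Ok(count / total)
        }
    }

    /// Panic-based variant: same computation, but invalid input aborts.
    fn cdf(&self, x: i64) -> f64 {
        self.checked_cdf(x).expect("x must be non-negative")
    }
}

fn main() {
    let d = DiscreteUniform { min: 0, max: 3 };
    println!("{}", d.cdf(1));
    match d.checked_cdf(-1) {
        Ok(p) => println!("cdf = {}", p),
        Err(e) => println!("invalid input: {:?}", e),
    }
}
```

Keeping both variants lets callers who have already validated their inputs avoid unwrapping, while others can handle the error explicitly.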

    help wanted discussion 
    opened by boxtown 7
  • Update to rand >= 0.8

    Currently, statrs relies on rand 0.7 and nalgebra 0.23 (which itself relies on rand 0.7). The newer releases of nalgebra update their dependency to the latest rand, which is currently 0.8.3. One of the minor, but frustrating, differences between 0.7 and 0.8 is a change in the syntax of the gen_range function, from taking 2 arguments to taking a single range argument.

    It would be great if statrs could be updated to rely on a newer nalgebra and a newer rand. Currently, if one is depending on rand >= 0.8 (or is depending on any package that depends on this), then multiple different versions of rand are pulled down. Not only does this bloat the build, but it can run the risk of confusing the compiler about definitions that appear in both versions of the package.

    The actual changes to conform with the new gen_range interface are small, but I think there are some other changes that would need to be made, since the version of nalgebra should be bumped (but perhaps not to the very latest (0.26.1), unless other changes are made, because they have deprecated some interfaces that are currently used in statrs).

    opened by rob-p 6
  • Use dev-dependencies for random number generation

    I just saw this in the code:

        #[ignore]
        #[test]
        fn test_mean_variance_stability() {
            // TODO: Implement tests. Depends on Mersenne Twister RNG implementation.
            // Currently hesistant to bring extra dependency just for test
        }
    

    You can add dependencies to the Cargo.toml that are only used when running the tests, but not when using the library as a dependency: http://doc.crates.io/specifying-dependencies.html#development-dependencies

    enhancement help wanted 
    opened by vks 6
  • Allow vector of floats for mean in statrs::distribution::Normal (as per numpy.random.normal)

    The code below understandably runs into a type mismatch, as the mean argument of statrs::distribution::Normal requires an f64.

    Similar to np.random.normal in Python's numpy package, it would be good to add support for vector arguments for the mean in statrs.

    Rust code (doesn't work due to type mismatch in let n...)

    let x0: Vec<f64> = thread_rng().sample_iter(Standard).take(200).collect();
    
    let endpoint_mean: Vec<_> = x0
      .iter()
      .map(|&x| x * (-0.5).exp())
      .collect();
    
    let endpoint_variance: f64 =
      (SIGMA.pow(2) as f64 /  (1.0 - (-1.0).exp())).sqrt();
    
    // this should output a vector n of some arbitrary length (here, 200). the argument endpoint_mean could be a Vec<f64> (as per numpy), but currently must be f64.
    let n = Normal::new(endpoint_mean, endpoint_variance).unwrap();
    

    Python code (works with an array of floats)

    x0 = np.random.normal(loc=0, scale=1, size=10000)  # initial points, a vector of 10,000 values
    mean = x0 * np.exp(-0.5)  # vector of 10,000 values
    variance = np.sqrt(4 / (1 - np.exp(-1)))  # scalar
    xt = np.random.normal(mean, variance)  # vector of 10,000 values
    
    opened by 0jg 5
  • Removes gamma special cases

    Also removed the tests which instantiated gamma with infinity as a parameter. Tests are still failing, but I don't know what the motivation is behind those numbers. If it doesn't matter too much, I'll just update the test values.

    do not merge 
    opened by ghost 5
  • Mutable/Movable parameters for multivariate normal

    Hello, I'm studying the possibility of using this crate for Markov Chain Monte Carlo (MCMC) based inference. In this use case, the log-density of a distribution is evaluated repeatedly at different parameter values. To do that currently, the crate requires re-creating the distributions at each iteration. This isn't much of a problem for scalar distributions, but for the multivariate normal, I have to re-allocate the mean vector and covariance matrix at each iteration (since distributions are immutable), which impacts performance.

    Allowing the user to re-set the parameters separately would work:

    pub fn set_mean(&mut self, mu : &[f64]);
    pub fn set_variance(&mut self, var : &[f64]);
    

    But a solution that moves the parameters out of the struct would also work (therefore preserving the intended immutable API):

    pub fn take_parameters(self) -> (DVector<f64>, DMatrix<f64>);
    

    Are there any plans to offer something like that?

    opened by limads 0
  • Add CDF for multivariate normal

    Multivariate Normal CDF

    Implements the multivariate normal CDF

    Algorithm

    Uses the algorithm as explained in Section 4.2.2 in Computation of Multivariate Normal and t Probabilities by Alan Genz and Frank Bretz, together with the cholesky decomposition with dynamic changing of rows explained in Section 4.1.3. Specifically we use a Quasi Monte Carlo method.

    Additions

    • Trait ContinuousMultivariateCDF in mod.rs
    • Module MultivariateUniform in multivariate_uniform.rs (mainly for me wanting an in-house way to get uniform distribution in $[0,1]^n$). Implements mean, mode, pdf, cdf, min, max, ln_pdf.
    • Function chol_chrows for computing the Cholesky decomposition dynamically whilst changing rows for better integration limits
    • Function integrate_pdf to integrate a multivariate pdf between limits a and b
    • Implement ContinuousMultivariateCDF for MultivariateNormal (and MultivariateUniform), where cdf uses integrate_pdf with left limit a=[f64::NEG_INFINITY; dim] and right limit x=b
    • Tests cdf against scipy.stats.multivariate_normal.cdf in python, as well as MvNormalCDF in Julia
    • Import crate primes for generating first $n$ primes as Richtmyer generators in the Quasi MC algorithm
    opened by henryjac 0
  • Multivariate students t distribution

    Implementation of the multivariate Student's t distribution in a very similar way to the multivariate normal. Includes sampling, mean, covariance, mode, pdf and log pdf functions.

    Tested against exact values from Python's scipy.stats.multivariate_t functions. Large degrees of freedom do not work yet.

    opened by henryjac 0
  • Calculate the coefficient of variation without calculating the mean twice?

    opened by Boscop 0
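
One possible approach to the question above is a single pass that tracks the running mean and the sum of squared deviations together (Welford's method), so the mean is computed once and reused for the final ratio. The sketch below uses only the standard library. The function name `coefficient_of_variation` is hypothetical, it is not the statrs implementation, and it assumes the sample (n − 1) variance is wanted.

```rust
/// Hypothetical helper, not part of statrs: sample standard deviation
/// divided by the mean, computed in one pass over `data`.
/// Returns `None` for fewer than two values or a zero mean.
fn coefficient_of_variation(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let mut mean = 0.0;
    let mut m2 = 0.0; // running sum of squared deviations from the mean
    for (i, &x) in data.iter().enumerate() {
        let n = (i + 1) as f64;
        let delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }
    if mean == 0.0 {
        return None;
    }
    let variance = m2 / (data.len() as f64 - 1.0);
    Some(variance.sqrt() / mean)
}

fn main() {
    let data = [1.0, 2.0, 3.0];
    println!("{:?}", coefficient_of_variation(&data));
}
```

Because the mean and the squared deviations are updated together, the data only needs to be iterated once, which also suits iterator-based inputs.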
  • Mention quantile function in docstring

    First of all, thanks for making and maintaining this project. I'm trying to make a wasm project and I would probably have given up without statrs.

    Anyway, I'm always confused by the many synonyms used in statistics. This PR suggests adding one synonym for inverse_cdf to the docstring of the normal distribution. This should at least make it visible when searching the docs.

    Adding the note to all inverse_cdf implementations seems a bit redundant so that's why I kept it to the Normal for now.

    opened by rikhuijzer 0