Statistical computation library for Rust

statrs

Build Status Codecov MIT licensed Crates.io

Current Version: v0.13.0

Should work for both nightly and stable Rust.

NOTE: While I will try to maintain backwards compatibility as much as possible, this is still a 0.x.x project, so the API is not considered stable and may be subject to breaking changes up until v1.0.0.

Description

Statrs provides a host of statistical utilities for Rust scientific computing. Included are a number of common distributions that can be sampled (e.g. Normal, Exponential, Student's T, Gamma, Uniform) plus common statistical functions like the gamma function, beta function, and error function.

This library is a work-in-progress port of the statistical capabilities in the C# Math.NET library. All unit tests in the library are borrowed from Math.NET when possible and filled in when not.

The port is not yet complete: planned for future releases are further distribution implementations and ports of more statistical utilities.

Please check out the documentation.

Usage

Add the following to your Cargo.toml:

[dependencies]
statrs = "0.13.0"

and, if your crate uses the 2015 edition, this to your crate root (Rust 2018 and later do not need it):

extern crate statrs;

Examples

Statrs v0.13.0 comes with a number of commonly used distributions, including Normal, Gamma, Student's T, Exponential, and Weibull. The common use case is to set up a distribution and sample from it, which relies on the rand crate for random number generation:

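use rand::distributions::Distribution;
use statrs::distribution::Exponential;

// Any rand RNG works; thread_rng() is a convenient, automatically seeded one.
let mut r = rand::thread_rng();
let n = Exponential::new(0.5).unwrap();
// sample() comes from rand's Distribution trait, which statrs distributions implement.
print!("{}", n.sample(&mut r));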

Statrs also comes with a number of useful utility traits for more detailed introspection of distributions:

use statrs::distribution::{Exponential, Univariate, Continuous};
use statrs::statistics::{Mean, Variance, Entropy, Skewness};

let n = Exponential::new(1.0).unwrap();
assert_eq!(n.mean(), 1.0);
assert_eq!(n.variance(), 1.0);
assert_eq!(n.entropy(), 1.0);
assert_eq!(n.skewness(), 2.0);
assert_eq!(n.cdf(1.0), 0.6321205588285576784045);
assert_eq!(n.pdf(1.0), 0.3678794411714423215955);

Statrs also provides utility functions such as erf, gamma, ln_gamma, and beta.
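As a cross-check on the values asserted in the Exponential example above, the closed-form expressions for an exponential distribution with rate λ can be written out directly: mean 1/λ, variance 1/λ², entropy 1 - ln λ, skewness 2, cdf 1 - e^(-λx), and pdf λe^(-λx). This std-only sketch uses a hypothetical `ExpRate` struct, not statrs's `Exponential` type.

```rust
/// Hypothetical stand-in for an exponential distribution with rate `lambda`.
struct ExpRate {
    lambda: f64,
}

impl ExpRate {
    fn mean(&self) -> f64 {
        1.0 / self.lambda
    }
    fn variance(&self) -> f64 {
        1.0 / (self.lambda * self.lambda)
    }
    /// Differential entropy in nats: 1 - ln(lambda).
    fn entropy(&self) -> f64 {
        1.0 - self.lambda.ln()
    }
    /// Every exponential distribution has skewness 2, whatever the rate.
    fn skewness(&self) -> f64 {
        2.0
    }
    /// CDF: 1 - exp(-lambda * x) for x >= 0, and 0 below the support.
    fn cdf(&self, x: f64) -> f64 {
        if x < 0.0 { 0.0 } else { 1.0 - (-self.lambda * x).exp() }
    }
    /// PDF: lambda * exp(-lambda * x) for x >= 0, and 0 below the support.
    fn pdf(&self, x: f64) -> f64 {
        if x < 0.0 { 0.0 } else { self.lambda * (-self.lambda * x).exp() }
    }
}

fn main() {
    let n = ExpRate { lambda: 1.0 };
    println!("mean={} variance={} entropy={} skewness={}",
             n.mean(), n.variance(), n.entropy(), n.skewness());
    println!("cdf(1)={} pdf(1)={}", n.cdf(1.0), n.pdf(1.0));
}
```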

For functions or methods with failure modes, Statrs provides a checked and unchecked interface. The unchecked interface will panic on an error while the checked interface returns a Result.

use statrs::statistics::CheckedVariance;
use statrs::distribution::FisherSnedecor;

let n = FisherSnedecor::new(1.0, 1.0).unwrap();
assert!(n.checked_variance().is_err());
// n.variance(); // uncomment this line to see it panic
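The checked/unchecked split can be illustrated with a std-only sketch. The `FDist` struct and `StatsError` enum below are hypothetical stand-ins, not statrs's types; they use the standard F-distribution variance, 2·d2²·(d1 + d2 - 2) / (d1·(d2 - 2)²·(d2 - 4)), which is only defined for d2 > 4. The unchecked method simply delegates to the checked one and panics on the error.

```rust
/// Hypothetical stand-in for an F (Fisher-Snedecor) distribution, used only to
/// illustrate the checked/unchecked pattern.
struct FDist {
    d1: f64,
    d2: f64,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum StatsError {
    /// The variance of F(d1, d2) is only defined for d2 > 4.
    VarianceUndefined,
}

impl FDist {
    /// Checked interface: reports an undefined variance as an Err.
    fn checked_variance(&self) -> Result<f64, StatsError> {
        if self.d2 <= 4.0 {
            return Err(StatsError::VarianceUndefined);
        }
        let (d1, d2) = (self.d1, self.d2);
        Ok(2.0 * d2 * d2 * (d1 + d2 - 2.0) / (d1 * (d2 - 2.0).powi(2) * (d2 - 4.0)))
    }

    /// Unchecked interface: same computation, but panics instead of returning Err.
    fn variance(&self) -> f64 {
        self.checked_variance()
            .expect("variance is undefined for d2 <= 4")
    }
}

fn main() {
    let n = FDist { d1: 1.0, d2: 1.0 };
    assert!(n.checked_variance().is_err());
    // n.variance(); // would panic, mirroring the statrs example above

    let m = FDist { d1: 5.0, d2: 10.0 };
    println!("variance = {}", m.variance());
}
```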

Contributing

Want to contribute? Check out some of the issues marked "help wanted".

How to contribute

Clone the repo:

git clone https://github.com/boxtown/statrs

Create a feature branch:

git checkout -b <feature_branch> master

After committing your code:

git push -u origin <feature_branch>

Then submit a PR, preferably referencing the relevant issue.

Style

This repo makes use of rustfmt with the configuration specified in rustfmt.toml. See https://github.com/rust-lang-nursery/rustfmt for instructions on installation and usage, and run the formatter before committing. With older rustfmt releases this was rustfmt --write-mode overwrite *.rs in the src directory; current releases no longer accept --write-mode, so run cargo fmt from the repository root instead.

Commit messages

Please be explicit and purposeful with commit messages.

Bad

Modify test code

Good

test: Update statrs::distribution::Normal test_cdf

Comments
  • Review of iterator statistics trait

    Currently the iterator statistics trait is treated as a special case: to act over an iterator, the methods need to take a mutable reference, so all the traits from statrs::statistics are (going to be) combined into the IterStatistics trait, which is implemented for all iterators. I haven't come up with a better solution, but this implementation doesn't sit well with me, and I'd love to have someone review it and provide feedback.

    help wanted 
    opened by boxtown (20 comments)
  • License compliance issue

    I would like to bring to your attention the fact that your licensing is incompatible with your dependencies.

    Your crate is MIT licensed but depends upon nalgebra which is Apache-2.0 licensed only. While Apache-2.0 projects can use MIT licensed components, the reverse is not so.

    Consider, for instance, that MIT is GPL v2 compatible while Apache-2.0 is not; your crate being MIT licensed is thus misleading to any potential GPL v2 projects that consider using it, since they'd end up constructing a non-compliant product.

    opened by jnqnfe (9 comments)
  • Update to nalgebra 0.27.1 to avoid RUSTSEC-2021-0070

    statrs's latest version is 0.14.0, even though the website says 0.13.0 (in the README.md and in the docs link in the "About" section at the top right).

    statrs 0.14.0 depends on nalgebra 0.26. nalgebra has a RUSTSEC-2021-0070 advisory against it. Among other things, this causes cargo deny to fail.

    Version 0.27.1 of nalgebra fixes the advisory.

    It would be very helpful if statrs could have its dependency on nalgebra updated to 0.27.1, and then a new version of statrs (0.14.1 or 0.15.0) be released. Thank you.

    opened by nnethercote (8 comments)
  • Fix a bug in the uniform continuous distribution

    Hi and thanks for the library!

    Problem

    I think there's a bug in the implementation of the continuous uniform distribution. According to the documentation, it should return values in the range [min, max], i.e. min ≤ random value ≤ max.

    Unfortunately, the current implementation generates values in the range [min, max + 1), i.e. min ≤ random value < max + 1.

    I think it might be a copy-paste bug 'inherited' from the discrete uniform distribution, where you really need to add 1 to the upper bound.

    Solution

    This PR is a partial fix for this bug: It changes the range of the continuous uniform distribution to [min, max), i.e. min ≤ random value < max. I don't have a quick fix to include the upper bound, but I think it's important to at least fix the + 1.0 issue. As of 0.7.0, you cannot really sample values between 0 and 1 without resorting to wild workarounds.

    opened by mp4096 (8 comments)
  • Release request

    Hello! This is a (hopefully) polite request to check whether statrs is ready to have a release cut. It's been 10 months since the last release according to crates.io, and I for one am eager to get off rand 0.7. Thanks again for all your hard work!

    opened by nlhepler (7 comments)
  • [RFC] Student's T inverse CDF

    This branch is not fit for merging (see below), but I wanted to gauge interest in having this functionality in statrs.

    Issues with the implementation in this branch:

    • [ ] Ignores location and scale parameters (they're assumed to be 0 and 1 respectively)
    • [x] Pulls in "special" and "approx" as deps
    • [x] No CheckedInverseCDF impl
    • [ ] 400 lines of unit tests is a bit much
    • [ ] Docs don't describe the formula
    opened by asayers (7 comments)
  • Error handling: Panics vs Result

    Currently the responsibility for guarding against exceptional cases (e.g. inputs outside the valid domain, mathematically invalid operations, etc.) is passed to the user. We panic when an operation does not make mathematical sense (e.g. calculating the cumulative distribution function of a discrete distribution at a negative input), which forces users to double-check that their inputs are valid. While this results in technically correct and predictable behavior from the API, I'm not sure it's ergonomic or idiomatic, and I have been mulling over introducing a Result-based API, either replacing or in addition to the stricter panic-based API. This warrants some discussion, and I would love feedback from the community.

    help wanted discussion 
    opened by boxtown (7 comments)
  • Update to rand >= 0.8

    Currently, statrs relies on rand 0.7 and nalgebra 0.23 (which itself relies on rand 0.7). The newer releases of nalgebra update their dependency to the latest rand, which is currently 0.8.3. One of the minor, but frustrating, differences between 0.7 and 0.8 is a change in the signature of the gen_range function, from taking 2 arguments to taking a single range argument.

    It would be great if statrs could be updated to rely on a newer nalgebra and a newer rand. Currently, if one depends on rand >= 0.8 (or on any package that depends on it), then multiple different versions of rand are pulled down. Not only does this bloat the build, but it also risks confusing the compiler about definitions that appear in both versions of the package.

    The actual changes needed to conform to the new gen_range API are small, but I think some other changes would be needed too, since the version of nalgebra should also be bumped (though perhaps not to the very latest, 0.26.1, unless further changes are made, because it deprecates some interfaces that statrs currently uses).

    opened by rob-p (6 comments)
  • Use dev-dependencies for random number generation

    I just saw this in the code:

        #[ignore]
        #[test]
        fn test_mean_variance_stability() {
            // TODO: Implement tests. Depends on Mersenne Twister RNG implementation.
            // Currently hesitant to bring extra dependency just for test
        }
    

    You can add dependencies to the Cargo.toml that are only used when running the tests, but not when using the library as a dependency: http://doc.crates.io/specifying-dependencies.html#development-dependencies

    enhancement help wanted 
    opened by vks (6 comments)
  • Allow vector of floats for mean in statrs::distribution::Normal (as per numpy.random.normal)

    The code below understandably runs into a type mismatch, since the mean argument of statrs::distribution::Normal requires an f64.

    Similar to np.random.normal in Python's numpy package, it would be good to add support for vector arguments for the mean in statrs.

    Rust code (doesn't work due to type mismatch in let n...)

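    use rand::distributions::Standard;
    use rand::{thread_rng, Rng};
    use statrs::distribution::Normal;
    
    const SIGMA: i32 = 2; // assumed value: SIGMA^2 = 4 matches the Python example below
    
    let x0: Vec<f64> = thread_rng().sample_iter(Standard).take(200).collect();
    
    let endpoint_mean: Vec<f64> = x0
      .iter()
      .map(|&x| x * (-0.5_f64).exp())
      .collect();
    
    // Despite its name, this is a standard deviation, which is what Normal::new expects.
    let endpoint_variance: f64 =
      (SIGMA.pow(2) as f64 / (1.0 - (-1.0_f64).exp())).sqrt();
    
    // This should output a vector n of some arbitrary length (here, 200). The argument endpoint_mean could be a Vec<f64> (as per numpy), but currently must be f64.
    let n = Normal::new(endpoint_mean, endpoint_variance).unwrap();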
    

    Python code (works with an array of floats)

    import numpy as np
    
    x0 = np.random.normal(loc=0, scale=1, size=10000)  # initial points, a 10_000 vector
    mean = x0 * np.exp(-0.5)  # 10_000 vector
    variance = np.sqrt(4 / (1 - np.exp(-1)))  # scalar, used as the scale (standard deviation)
    xt = np.random.normal(mean, variance)  # 10_000 vector
    
    opened by 0jg (5 comments)
  • Removes gamma special cases

    Also removed the tests that instantiated gamma with infinity as a parameter. Tests are still failing, but I don't know the motivation behind those numbers. If it doesn't matter too much, I'll just update the test values.

    do not merge 
    opened by ghost (5 comments)
  • Mutable/Movable parameters for multivariate normal

    Hello, I'm studying the possibility of using this crate for Markov Chain Monte Carlo (MCMC) based inference. In this use case, the log-density of a distribution is evaluated repeatedly at different parameter values. To do that currently, the crate requires re-creating the distributions at each iteration. This isn't much of a problem for scalar distributions, but for the multivariate normal, I have to re-allocate the mean vector and covariance matrix at each iteration (since distributions are immutable), which impacts performance.

    Allowing the user to re-set the parameters separately would work:

    pub fn set_mean(&mut self, mu : &[f64]);
    pub fn set_variance(&mut self, var : &[f64]);
    

    But a solution that moves the parameters out of the struct would also work (therefore preserving the intended immutable API):

    pub fn take_parameters(self) -> (DVector<f64>, DMatrix<f64>);
    

    Are there any plans to offer something like that?

    opened by limads (0 comments)
  • Add CDF for multivariate normal

    Multivariate Normal CDF

    Implements the multivariate normal CDF

    Algorithm

    Uses the algorithm explained in Section 4.2.2 of Computation of Multivariate Normal and t Probabilities by Alan Genz and Frank Bretz, together with the Cholesky decomposition with dynamic row changes explained in Section 4.1.3. Specifically, we use a quasi-Monte Carlo method.

    Additions

    • Trait ContinuousMultivariateCDF in mod.rs
    • Module MultivariateUniform in multivariate_uniform.rs (mainly because I wanted an in-house way to get a uniform distribution on $[0,1]^n$). Implements mean, mode, pdf, cdf, min, max, and ln_pdf.
    • Function chol_chrows for computing the Cholesky decomposition dynamically whilst changing rows for better integration limits
    • Function integrate_pdf to integrate a multivariate pdf between limits a and b
    • Implement ContinuousMultivariateCDF for MultivariateNormal (and MultivariateUniform), where cdf uses integrate_pdf with left limit a=[f64::NEG_INFINITY; dim] and right limit x=b
    • Tests cdf against scipy.stats.multivariate_normal.cdf in python, as well as MvNormalCDF in Julia
    • Import crate primes for generating first $n$ primes as Richtmyer generators in the Quasi MC algorithm
    opened by henryjac (0 comments)
  • Multivariate Student's t distribution

    Implements the multivariate Student's t distribution in a very similar way to the multivariate normal. Includes sampling, mean, covariance, mode, pdf, and log pdf functions.

    Tested against exact values from Python's scipy.stats.multivariate_t functions. Large degrees of freedom do not work yet.

    opened by henryjac (0 comments)
  • Calculate the coefficient of variation without calculating the mean twice?

    opened by Boscop (0 comments)
  • Mention quantile function in docstring

    First of all, thanks for making and maintaining this project. I'm working on a wasm project and would probably have given up without statrs.

    Anyway, I'm always confused by the many synonyms used in statistics. This PR suggests adding one synonym, "quantile function", to the docstring of inverse_cdf for the normal distribution. That should at least make it visible when searching the docs.

    Adding the note to every inverse_cdf implementation seemed a bit redundant, so I kept it to Normal for now.

    opened by rikhuijzer (0 comments)
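For the "Review of iterator statistics trait" issue above, the design it describes (one trait implemented for every iterator, whose methods take a mutable reference because computing a statistic advances the iterator) can be sketched with a blanket impl. The trait name `IterStats` and its single `mean` method are hypothetical and std-only; statrs's actual IterStatistics trait has more methods and may be structured differently.

```rust
/// Hypothetical extension trait (not statrs's IterStatistics).
trait IterStats {
    /// Consumes the remaining items and returns their arithmetic mean, or NaN if there are none.
    fn mean(&mut self) -> f64;
}

// Blanket impl: every iterator over f64 gets `mean`.
impl<I: Iterator<Item = f64>> IterStats for I {
    fn mean(&mut self) -> f64 {
        let mut count = 0u64;
        let mut sum = 0.0;
        // `self` is `&mut I`, and `&mut I` is itself an iterator, so this loop
        // advances (and exhausts) the caller's iterator.
        for x in self {
            count += 1;
            sum += x;
        }
        if count == 0 {
            f64::NAN
        } else {
            sum / count as f64
        }
    }
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0];
    let mut it = data.iter().copied();
    println!("mean = {}", it.mean());
    // The iterator was consumed through the mutable reference.
    assert!(it.next().is_none());
}
```

The mutable receiver is what makes this work for arbitrary iterators, but it also means a second statistic on the same iterator sees no data, which is one reason the special-casing in the issue feels awkward.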
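For the "Fix a bug in the uniform continuous distribution" PR above, the difference between the buggy and the fixed ranges comes down to the factor that scales a unit uniform draw. In this std-only sketch, the parameter `u` stands in for a value in [0, 1) from whatever RNG is in use, and both function names are hypothetical; statrs's actual sampling code may be written differently.

```rust
/// Buggy mapping described in the PR: the extra +1 stretches the range to [min, max + 1).
fn uniform_buggy(min: f64, max: f64, u: f64) -> f64 {
    min + (max - min + 1.0) * u
}

/// Corrected mapping: for u in [0, 1) this yields a value in [min, max),
/// up to floating-point rounding.
fn uniform_fixed(min: f64, max: f64, u: f64) -> f64 {
    min + (max - min) * u
}

fn main() {
    let u = 0.999_999; // a draw close to the top of [0, 1)
    println!("buggy: {}", uniform_buggy(0.0, 1.0, u)); // close to 2, outside [0, 1]
    println!("fixed: {}", uniform_fixed(0.0, 1.0, u)); // just below 1
}
```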
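For the "[RFC] Student's T inverse CDF" item above, the unchecked first box (location and scale) is usually handled by computing the quantile of the standard distribution and then mapping it as location + scale · t. The sketch below shows only that mapping. It uses the closed-form standard quantiles that exist for ν = 1 (the Cauchy case, tan(π(p - 1/2))) and ν = 2 ((2p - 1) / √(2p(1 - p))), and returns None for other degrees of freedom rather than implementing the general algorithm. The function name is hypothetical.

```rust
use std::f64::consts::PI;

/// Hypothetical quantile (inverse CDF) of a Student's t distribution with the given
/// location, scale and degrees of freedom `nu`. Only the closed-form cases nu = 1
/// and nu = 2 are implemented; every other `nu` returns None.
fn students_t_inverse_cdf(p: f64, location: f64, scale: f64, nu: f64) -> Option<f64> {
    if !(p > 0.0 && p < 1.0) || !(scale > 0.0) {
        return None;
    }
    // Quantile of the standard (location 0, scale 1) distribution.
    let standard = if nu == 1.0 {
        // nu = 1 is the standard Cauchy distribution.
        (PI * (p - 0.5)).tan()
    } else if nu == 2.0 {
        (2.0 * p - 1.0) / (2.0 * p * (1.0 - p)).sqrt()
    } else {
        // The general case typically goes through an inverse regularized incomplete beta function.
        return None;
    };
    // Map the standard quantile onto the requested location and scale.
    Some(location + scale * standard)
}

fn main() {
    // Upper 2.5% point for nu = 2, roughly 4.303.
    println!("{:?}", students_t_inverse_cdf(0.975, 0.0, 1.0, 2.0));
}
```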
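For the "Mutable/Movable parameters for multivariate normal" request above, both proposed API shapes (in-place setters and a consuming take_parameters) can be sketched with plain Vec<f64> buffers standing in for nalgebra's DVector and DMatrix. The struct `MvNormalParams` and its fields are hypothetical; a real distribution would also have to recompute derived data such as a Cholesky factor after the covariance changes, which this sketch only records with a flag.

```rust
/// Hypothetical parameter holder standing in for a multivariate normal.
struct MvNormalParams {
    mean: Vec<f64>,
    /// Row-major dim x dim covariance matrix.
    cov: Vec<f64>,
    /// Set when `cov` changes so derived data (e.g. a Cholesky factor) can be recomputed.
    needs_refactor: bool,
}

impl MvNormalParams {
    fn new(mean: Vec<f64>, cov: Vec<f64>) -> Self {
        assert_eq!(cov.len(), mean.len() * mean.len(), "covariance must be dim x dim");
        MvNormalParams { mean, cov, needs_refactor: true }
    }

    /// Overwrites the mean in place, reusing the existing allocation.
    fn set_mean(&mut self, mu: &[f64]) {
        // copy_from_slice panics if the lengths differ, which catches dimension mismatches.
        self.mean.copy_from_slice(mu);
    }

    /// Overwrites the covariance in place and marks derived data as stale.
    fn set_variance(&mut self, var: &[f64]) {
        self.cov.copy_from_slice(var);
        self.needs_refactor = true;
    }

    /// Moves the parameters out so the caller can reuse their buffers.
    fn take_parameters(self) -> (Vec<f64>, Vec<f64>) {
        (self.mean, self.cov)
    }
}

fn main() {
    let mut params = MvNormalParams::new(vec![0.0, 0.0], vec![1.0, 0.0, 0.0, 1.0]);
    // Inside an MCMC loop, only the values change; no new allocation is made.
    params.set_mean(&[0.5, -0.5]);
    params.set_variance(&[2.0, 0.3, 0.3, 1.0]);
    let (mean, cov) = params.take_parameters();
    println!("mean = {:?}, cov = {:?}", mean, cov);
}
```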
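For the "Add CDF for multivariate normal" PR above, the Richtmyer generators it mentions are the fractional parts of the square roots of the first n primes: the k-th quasi-random point in [0,1]^n has coordinates frac(k·√p_j). The std-only sketch below builds such points and averages a function over them, the same kind of quasi-Monte Carlo estimate that the Genz and Bretz method applies to its transformed integrand. The function names are hypothetical and this is not the PR's code; it integrates over the unit cube only, with no variable transformation.

```rust
/// Returns the first `n` primes by trial division (fine for the small n used here).
fn first_primes(n: usize) -> Vec<u64> {
    let mut primes = Vec::with_capacity(n);
    let mut candidate = 2u64;
    while primes.len() < n {
        if primes.iter().take_while(|&&p| p * p <= candidate).all(|&p| candidate % p != 0) {
            primes.push(candidate);
        }
        candidate += 1;
    }
    primes
}

/// Richtmyer point k: coordinate j is frac(k * alpha_j), with alpha_j = frac(sqrt(p_j)).
fn richtmyer_point(k: u64, alphas: &[f64], out: &mut [f64]) {
    for (x, &a) in out.iter_mut().zip(alphas) {
        *x = (k as f64 * a).fract();
    }
}

/// Quasi-Monte Carlo estimate of the integral of `f` over the unit cube [0,1]^dim.
fn qmc_integrate<F: Fn(&[f64]) -> f64>(f: F, dim: usize, n_points: u64) -> f64 {
    // Using frac(sqrt(p)) instead of sqrt(p) gives the same fractional parts with less rounding.
    let alphas: Vec<f64> = first_primes(dim)
        .iter()
        .map(|&p| (p as f64).sqrt().fract())
        .collect();
    let mut point = vec![0.0; dim];
    let mut sum = 0.0;
    for k in 1..=n_points {
        richtmyer_point(k, &alphas, &mut point);
        sum += f(&point);
    }
    sum / n_points as f64
}

fn main() {
    // The exact value of the integral of x * y over [0,1]^2 is 1/4.
    let estimate = qmc_integrate(|p| p[0] * p[1], 2, 10_000);
    println!("estimate = {}", estimate);
}
```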
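For the "Calculate the coefficient of variation without calculating the mean twice?" question above, one option is Welford's single-pass algorithm, which updates the running mean and the sum of squared deviations together; the coefficient of variation is then the standard deviation divided by the mean. This std-only sketch uses a hypothetical function name and the sample (n - 1) standard deviation; whether statrs would use the sample or population form is not stated in the issue.

```rust
/// Returns the sample coefficient of variation (std dev / mean) in one pass,
/// or None if there are fewer than two values or the mean is exactly zero.
fn coefficient_of_variation<I: IntoIterator<Item = f64>>(data: I) -> Option<f64> {
    let mut n = 0u64;
    let mut mean = 0.0;
    let mut m2 = 0.0; // running sum of squared deviations from the current mean
    for x in data {
        n += 1;
        let delta = x - mean;
        mean += delta / n as f64;
        // Uses the updated mean, as in Welford's update.
        m2 += delta * (x - mean);
    }
    if n < 2 || mean == 0.0 {
        return None;
    }
    let std_dev = (m2 / (n - 1) as f64).sqrt();
    Some(std_dev / mean)
}

fn main() {
    let cv = coefficient_of_variation(vec![2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]);
    println!("{:?}", cv);
}
```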
Releases(v0.15.0)