sparse linear algebra library for rust

Overview

sprs, sparse matrices for Rust

Build status crates.io

sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust.

The API is a work in progress, and feedback on its rough edges is highly appreciated :)

Features

Structures

  • CSR/CSC matrix
  • triplet matrix
  • Sparse vector

Operations

  • sparse matrix / sparse vector product
  • sparse matrix / sparse matrix product
  • sparse matrix / sparse matrix addition, subtraction
  • sparse vector / sparse vector addition, subtraction, dot product
  • sparse/dense matrix operations

Algorithms

  • Outer iterator on compressed sparse matrices
  • sparse vector iteration
  • sparse vectors joint non zero iterations
  • simple sparse Cholesky decomposition (requires opting into an LGPL license)
  • sparse triangular solves with dense right-hand side

Examples

Matrix construction

  use sprs::{CsMat, CsMatOwned, CsVec};
  let eye : CsMatOwned<f64> = CsMat::eye(3);
  let a = CsMat::new_csc((3, 3),
                         vec![0, 2, 4, 5],
                         vec![0, 1, 0, 2, 2],
                         vec![1., 2., 3., 4., 5.]);

Matrix vector multiplication

  use sprs::{CsMat, CsVec};
  let eye = CsMat::eye(5);
  let x = CsVec::new(5, vec![0, 2, 4], vec![1., 2., 3.]);
  let y = &eye * &x;
  assert_eq!(x, y);

Matrix matrix multiplication, addition

  use sprs::{CsMat, CsVec};
  let eye = CsMat::eye(3);
  let a = CsMat::new_csc((3, 3),
                         vec![0, 2, 4, 5],
                         vec![0, 1, 0, 2, 2],
                         vec![1., 2., 3., 4., 5.]);
  let b = &eye * &a;
  assert_eq!(a, b.to_csr());

For a more complete example, be sure to check out the heat diffusion example.

Documentation

Documentation is available at docs.rs.

Changelog

See the changelog.

Minimum Supported Rust Version

The minimum supported Rust version currently is 1.49. Prior to a 1.0 version, bumping the MSRV will not be considered a breaking change, but breakage will be avoided on a best effort basis.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Please see the contribution guidelines for additional information about contributing.

Comments
  • Investigate possible issues with partial views

    Investigate possible issues with partial views

    As evidenced by #242 and #243, middle_outer_views is not tested enough, and edge cases may happen in numerous algorithms. It's necessary to write tests for many functions involving these partial views, to reveal the presence of possible bugs.

    opened by vbarrielle 20
  • Writing a Complex64-valued matrix in MatrixMarket format seems incompatible with scipy

    Writing a Complex64-valued matrix in MatrixMarket format seems incompatible with scipy

    I wrote a small matrix out in MatrixMarket format using sprs::io::... and the format I got had values of the format "1+2i", whereas with scipy, I get two columns, one with real, the other with imag. I attach sample out from each writer.

    %%MatrixMarket matrix coordinate complex general
    % written by sprs
    2 2 4
    1 1 1+0i
    1 2 0-2i
    2 1 0+2i
    2 2 1+0i
    

    and from scipy:

    %%MatrixMarket matrix coordinate complex hermitian
    %
    2 2 3
    1 1 1.000000000000000e+00 0.000000000000000e+00
    2 1 0.000000000000000e+00 2.000000000000000e+00
    2 2 1.000000000000000e+00 0.000000000000000e+00
    

    I am completely unfamiliar with this format, so maybe both should work? But I thought I should report it. I was able to copy your writer and modify it to produce the format scipy expects, but .... well, maybe there's a bug here?

    bug 
    opened by chetmurthy 17
  • added MulAcc<A,B> support for matrix csc & csr X dense vec formats

    added MulAcc support for matrix csc & csr X dense vec formats

    2 functions that were easy to do. 1 minor change in each - flip the mul_acc operands to be matrix elem, vec elem (the order you see it in the expression Matrix X Vector) - and the tests ended up needing a couple of type annotations. besides that the callgraph is unaffected, everywhere else those fn's are used.. nothing changed.

    opened by experiment9123 13
  • Quadratic complexity of sparse matrices multiplication

    Quadratic complexity of sparse matrices multiplication

    Hello!

    I try to use sprs package in my experimental code. I have encountered the issue of very slow multiplication of large sparse matrices.

    My code uses large diagonal matrices in CSR format. The matrices might contain from hundreds/thousands to millions NNZ elements. Multiplication of such matrices in sprs has almost quadratic complexity O(nnz^2). So we cannot use sprs in production code currently.

    I wrote some tests in Rust+sprs and Python+scipy. SciPy uses SMMP algorithm (Sparse Matrix Multiplication Package) and perform multiplication such matrices in tens/hundreds of milliseconds.

    The code of my benchmarks can be found on gist: https://gist.github.com/espdev/d278962828ce7360d653b8679d7bb511

    The plot for mat * mat.T: sprs_scipy_matmul

    Could you consider using SMMP algorithm in sprs? This algorithm has complexity O(n_row*K^2 + max(n_row,n_col)) where K is the maximum nnz in a row of A and column of B.

    opened by espdev 12
  • Implementation reverse Cuthill-McKee

    Implementation reverse Cuthill-McKee

    See also #185 . This changes the Cuthill-McKee algorithm implemented here to the reverse Cuthill-McKee algorithm. Starting vertices are choosen by a George-Liu pseudoperipheral vertex finder. I have added two more test cases and verified the correctness of the exisiting one w.r.t. the new algorithm.

    I did try to write in the style of the library, however I might not always have got it right when to use .index(), when to call .unwrap() vs .expect(), and so on. Feel free to change these details, as well as remove unneccesary comments or line breaks. I just can't get enough of those.

    As benchmark possibilities are scarce, I did not optimize anything to aggressively. I really believe this crate should come with some sparse matrix collection as well as random matrices to benchmark against, automatically. That would make tweaking the details worthwile. That's an issue for another time though.

    Concerning the minimum degree ordering: As it is significantly more complex, It will take some more time to implement it, as in a few days to weeks. I will do it eventually, but I don't want to block somebody else on it, if there is interest.

    opened by lksriemer 11
  • Refactor to allow partial views

    Refactor to allow partial views

    This is a refactoring that should fix #244. This is very much WIP, but the basic test suite is passing. @mulimoen I'd be very interested in your feedback.

    This introduces lots of breaking changes, for instance lots of iterators are now using an impl Iterator pattern instead of an explicit type.

    There are still a few rough edges, which may not be addressed in this PR but should be addressed before 0.10 is released:

    • middle_outer_views is cumbersome to use, it probably should take a start and end parameter (or a range) to be more convenient and less error prone.
    • the IndPtr type is not used to build owned indptr types, but it could be interesting to use it to enforce good patterns. As it would not be practical to have runtime checks, this would be a crate-only API
    • smmp was quite hard to port, mostly because it does not take a CsMatViewI in its API. I probably should change that before 0.10.
    breaking v0.10.0 
    opened by vbarrielle 10
  • Fix serde deserialisation

    Fix serde deserialisation

    Fixes #231 by first copying from a Shadow of the same structure. This fails on invalid structures, and ensures only valid structures can be deserialized.

    This unfortunately breaks backwards compatability as CsVecBase does not have a Deref restriction, unlike CsMatBase. We should consider merging this when we are preparing for 0.10.

    breaking v0.10.0 
    opened by mulimoen 10
  • Relax Copy bound to Clone

    Relax Copy bound to Clone

    This issue was mentioned in #118. Thus I felt it would be good to make a separate issue for it, as I do not think it makes sense to handle it as part of that. Big integers or rationals would be an interesting use case, for example.

    opened by milibopp 9
  • Fix matrixmarket read/write complex

    Fix matrixmarket read/write complex

    This PR adds support for read/write of MatrixMarket files in complex format compatible with scipy.

    Also, fixes symmetric and skew-symmetric read/write, and a bug in a test.

    Added tests to cover new behaviour.

    opened by chetmurthy 8
  • Draft: Support generic dense vectors in more APIs

    Draft: Support generic dense vectors in more APIs

    Lots of APIs were using slices as their input, particularly when using output buffers. However, this was not a good fit for linear algebra, since consumers of the APIs are more likely to be using eg an array from ndarray.

    Here we extend the DenseVector trait with a mutable version to be able to use it on output parameters. Since these are sealed traits we should be able to add unsafe indexing if necessary without a breaking change. It should also be possible to support eg nalgebra in the future without too much trouble.

    This should improve the situation discussed in #93, though it's probably not done yet.

    opened by vbarrielle 7
  • Serde tests failure

    Serde tests failure

    I've recently noticed that the test tests/bincode_ser.rs was not being executed. I fixed that to make it run in the fix_serde_test_branch, but it reveals there's an error at serialization time for CsMatView: https://github.com/vbarrielle/sprs/runs/1581558055

    I've tried looking into it, but so far I don't get what's wrong, the serialization code is the same than for CsVec, where it works... @mulimoen you know that serialization code much better than I do, do you have an idea?

    opened by vbarrielle 7
  • relax trait requirements for binary operations.

    relax trait requirements for binary operations.

    Following #319, I have relaxed the trait requirements for Add, Sub etc for CsVec, so that in particular we can add and subtract vectors with scalar types that does not necessarily implement the num_traits::Num trait.

    opened by taketo1024 1
  • python interface for sprs

    python interface for sprs

    Hi, a while back I mentioned that I had written some Python code to wrapper sprs. I got permission to open-source that code (Apache license), and thought I should mention it here. I don't know exactly which bits might be useful to sprs, but .... well, figured we could start the conversation.

    Link to the file containing the python bits: https://github.com/chetmurthy/qrusty/blob/master/pyqrusty/src/lib.rs

    I'm happy to do more work on this, with suggestions as to how it should evolve. Ditto move bits of this into sprs.

    opened by chetmurthy 0
  • more-perspicuous error-messages for reading matrixmarket files?

    more-perspicuous error-messages for reading matrixmarket files?

    Recently I was writing tests to read matrix market files written by Scipy, and I made the mistake of writing a diagonal matrix with scipy as "skew-symmetric" and then reading it with sprs. Of course, scipy dutifully wrote it out and marked it as skew-symmetric, and then sprs blew up with BadMatrixMarketFile. All well and correct.

    But it's a little opaque, as error-messages go, and I had to puzzle for a while, reading the sprs read_XXXX source code, before I realized what was going wrong (and hence, my mistake). Looking at the source code of read_matrix_market_from_bufread, there are twelve places where the error is raised, and they all have different meanings. I thought: "gee, maybe this error should carry a string with it, so we can write a description of the erroneous input (including maybe line-number?)

    Just a thought. It's obviously a low-priority change, but I wonder if maybe it might be useful for users.

    opened by chetmurthy 0
  • sparse by dense parallel products #298

    sparse by dense parallel products #298

    Starts on #298. Using ndarray parallel iterators this is the naive conversion of the sparse by dense products to a multithreaded implementation. Ideally, there is a rayon into_par_iter implemenation for the existing CsMat::outer_iterator for some of these to be more optimal. In the current implementation, some sparse matrix by dense vector products are still going to be single threaded.

    opened by aujxn 2
  • Parallel Matrix Operations

    Parallel Matrix Operations

    Currently only sparse by sparse products are parallel in the smmp module. Converting the current sparse by dense products using ndarray::parallel should be straight forward. Here is an implementation for par_csr_mulacc_dense_colmaj that gives a significant speedup on my machine:

    pub fn par_csr_mulacc_dense_colmaj<'a, N, A, B, I, Iptr>(
        lhs: CsMatViewI<A, I, Iptr>,
        rhs: ArrayView<B, Ix2>,
        mut out: ArrayViewMut<'a, N, Ix2>,
    ) where
        A: Send + Sync,
        B: Send + Sync,
        N: 'a + crate::MulAcc<A, B>  + Send + Sync,
        I: 'a + SpIndex,
        Iptr: 'a + SpIndex,
    {
        assert_eq!(lhs.cols(), rhs.shape()[0], "Dimension mismatch");
        assert_eq!(lhs.rows(), out.shape()[0], "Dimension mismatch");
        assert_eq!(rhs.shape()[1], out.shape()[1], "Dimension mismatch");
        assert!(lhs.is_csr(), "Storage mismatch");
    
        let axis1 = Axis(1);
        ndarray::Zip::from(out.axis_iter_mut(axis1))
            .and(rhs.axis_iter(axis1))
            .par_for_each(|mut ocol, rcol| {
                for (orow, lrow) in lhs.outer_iterator().enumerate() {
                    let oval = &mut ocol[[orow]];
                    for (rrow, lval) in lrow.iter() {
                        let rval = &rcol[[rrow]];
                        oval.mul_acc(lval, rval);
                    }
                }
            });
    }
    

    The only changes here are the parallel iterator, adding the rayon feature for ndarray, and adding the Sync and Send trait bounds to the data types inside the matrices. My concern is that adding Send + Sync will result in these trait requirements to be unnecessarily added in many places.

    Looking at the impl Mul for CsMatBase and CsMatBase I see that Sync + Send is required no matter if multi_thread is enabled or not. Is it okay to propagate these trait requirements all the way up to many of the trait impls for CsMatBase and then use the conditional feature compilation on the lowest level functions found in the prod module? Conditionally compiling at all the higher level implementations sounds like it would get nasty very quickly, especially as more parallel features get added.

    opened by aujxn 3
  • `MulAcc` causes significant speed regression

    `MulAcc` causes significant speed regression

    I just found out that my implementation of an ML model has been training up to several times slower than before. After some debugging, turns out that the cause was vbarrielle/sprs#276, where the MulAcc trait was introduced in d56bd72e4f883c9f12f7e38c47c2141784127f21, and specifically where the Copy bound was removed in d2f0da76df51be6e81839b04ed64c5121a2ec640. For details please see https://github.com/tomtung/omikuji/pull/32.

    My intuitive guess is that the trait MulAcc got rid of the Copy bound at the cost of forcing the values to be passed as references (pointers). This might have made it hard for the compiler to do inlining and/or other optimizations. since the += operation is typically in the innermost part of the loop, any slow-down in such hot spots would result in significant speed regression.

    I don't have context on the introduction of MulAcc, so I'm not sure what's the right way to fix this. Let me know if there's something I could help with.

    opened by tomtung 9
Owner
Vincent Barrielle
Research engineer, interested in computer graphics, computer vision, facial animation and physical simulation.
Vincent Barrielle
Rust DataFrame library

Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro

Ritchie Vink 11.9k Jan 8, 2023
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

SFU Database Group 939 Jan 5, 2023
A bit vector with the Rust standard library's portable SIMD API

bitsvec A bit vector with the Rust standard library's portable SIMD API Usage Add bitsvec to Cargo.toml: bitsvec = "x.y.z" Write some code like this:

Chojan Shang 31 Dec 21, 2022
A rust library built to support building time-series based projection models

TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i

James MacAdie 12 Dec 7, 2022
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is

null 11.8k Jan 8, 2023
Library for scripting analyses against crates.io's database dumps

crates.io database dumps Library for scripting analyses against crates.io's database dumps. These database dumps contain all information exposed by th

David Tolnay 52 Dec 14, 2022
A cross-platform library to retrieve performance statistics data.

A toolkit designed to be a foundation for applications to monitor their performance.

Lark Technologies Pte. Ltd. 155 Nov 12, 2022
This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

null 2 Nov 2, 2022
Dataflow is a data processing library, primarily for machine learning

Dataflow Dataflow is a data processing library, primarily for machine learning. It provides efficient pipeline primitives to build a directed acyclic

Sidekick AI 9 Dec 19, 2022
Dataframe structure and operations in Rust

Utah Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface. Note: This crate w

Suchin 139 Sep 26, 2022
A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Wakahisa 287 Nov 11, 2022
A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

null 22 Nov 3, 2022
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023
Apache TinkerPop from Rust via Rucaja (JNI)

Apache TinkerPop from Rust An example showing how to call Apache TinkerPop from Rust via Rucaja (JNI). This repository contains two directories: java

null 8 Sep 27, 2022
PyO3-based Rust binding of NumPy C-API

rust-numpy Rust bindings for the NumPy C-API API documentation Latest release (possibly broken) Current Master Requirements Rust >= 1.41.1 Basically,

PyO3 759 Jan 3, 2023
DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

Miles Granger 30 Dec 10, 2022
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

Datafuse Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture Datafuse is a Real-Time Data Processing & Analytics DBMS wit

Datafuse Labs 5k Jan 4, 2023