Command line tool for computing gigantic correlation matrices

Overview

turbocor

This is a little program designed to compute (sparse) Pearson correlation matrices as efficiently as possible.

The main ideas are thus:

  1. Pairwise correlations computed using AVX2 and FMA instructions.
  2. Multithreading, of course.
  3. Save space by discarding correlations with absolute value below some threshold. (Hence, "sparse" correlation matrix.)
  4. Use constant space, and avoid thread contention and reallocation overhead, by using thread-local buffers that are dumped to a temporary file when full.
  5. Find the top-k correlations using partial sort taking O(n^2 + k log(k)) time, rather than full sort taking O(n^2 log(n)) time.
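
As a rough illustration of ideas 1 and 3, here is a minimal scalar sketch in Rust. It is not turbocor's implementation: it uses no AVX2/FMA intrinsics, no threads, and no spill-to-disk buffers. The names `pearson` and `sparse_correlations` are hypothetical, as is the choice to keep entries whose absolute value is greater than or equal to the bound.

```rust
/// Pearson correlation of two equal-length slices (plain scalar code, no SIMD).
/// Returns None if the lengths differ, the input is empty, or either
/// slice has zero variance.
fn pearson(x: &[f32], y: &[f32]) -> Option<f32> {
    if x.len() != y.len() || x.is_empty() {
        return None;
    }
    let n = x.len() as f32;
    let mean_x = x.iter().sum::<f32>() / n;
    let mean_y = y.iter().sum::<f32>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0f32, 0.0f32, 0.0f32);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mean_x, b - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    let denom = (sxx * syy).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(sxy / denom)
    }
}

/// Hypothetical helper: correlate every pair of rows and keep only the
/// entries with |r| >= lower_bound, stored as (i, j, r) triples.
fn sparse_correlations(rows: &[Vec<f32>], lower_bound: f32) -> Vec<(usize, usize, f32)> {
    let mut kept = Vec::new();
    for i in 0..rows.len() {
        for j in (i + 1)..rows.len() {
            if let Some(r) = pearson(&rows[i], &rows[j]) {
                if r.abs() >= lower_bound {
                    kept.push((i, j, r));
                }
            }
        }
    }
    kept
}
```

Storing only the surviving (i, j, r) triples is what makes the result sparse: memory grows with the number of strong correlations rather than with the square of the number of features.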

Combined, this lets you compute a correlation network in ~10 minutes and 10GB of memory that would ordinarily take a day and hundreds of GBs.

Usage

There are two subcommands: compute computes the sparse correlation matrix, and topk does a partial sort of that matrix to find the top-k correlations by absolute value.

Compute a sparse correlation matrix (rounding entries with absolute value below 0.7 to zero). Features are expected to be in HDF5 format, in a dataset determined by the --dataset argument.

turbocor compute --lower-bound 0.7 --dataset "some_interesting_data" features.h5 correlations.h5

Write the top-k correlations in comma delimited format to standard output:

turbocor topk 1000 correlations.h5 > topk-correlations.csv
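
The partial sort behind topk can be sketched with the Rust standard library as a selection step followed by a sort of only the k survivors. This is an illustration under assumptions, not the tool's actual code: `top_k_by_abs` and `by_abs_desc` are hypothetical names, entries are assumed to be (row, column, correlation) triples already in memory, and NaN values are assumed to be absent.

```rust
use std::cmp::Ordering;

/// Order entries by absolute correlation, largest first.
fn by_abs_desc(a: &(usize, usize, f32), b: &(usize, usize, f32)) -> Ordering {
    b.2.abs().total_cmp(&a.2.abs())
}

/// Return the k entries with the largest absolute correlation, strongest first.
fn top_k_by_abs(mut entries: Vec<(usize, usize, f32)>, k: usize) -> Vec<(usize, usize, f32)> {
    let k = k.min(entries.len());
    if k == 0 {
        return Vec::new();
    }
    // Selection step: linear time on average. Afterwards the first k slots
    // hold the k largest entries, in no particular order.
    if k < entries.len() {
        entries.select_nth_unstable_by(k - 1, by_abs_desc);
    }
    entries.truncate(k);
    // Sort only the k survivors: O(k log k).
    entries.sort_unstable_by(by_abs_desc);
    entries
}
```

For example, calling `top_k_by_abs(entries, 1000)` on the stored entries would return the 1000 strongest correlations without ever sorting the rest.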

Possible improvements

The most obvious way to make this even faster is to compute pairwise correlations on a GPU. A less obvious way would be to tweak the order in which matrix entries are computed to improve data locality.
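
One possible reading of the data-locality idea is to visit the upper triangle of the matrix tile by tile, so that a small block of feature rows is reused many times before moving on. The sketch below only generates such an order. The function `tiled_pairs` and the `tile` parameter are hypothetical, and whether this ordering actually helps would have to be measured.

```rust
/// Enumerate all pairs (i, j) with i < j over `n` rows, tile by tile.
/// `tile` is a hypothetical block size; within one tile at most
/// 2 * tile distinct rows are touched.
fn tiled_pairs(n: usize, tile: usize) -> Vec<(usize, usize)> {
    assert!(tile > 0, "tile size must be positive");
    let mut order = Vec::with_capacity(n * n.saturating_sub(1) / 2);
    for bi in (0..n).step_by(tile) {
        for bj in (bi..n).step_by(tile) {
            for i in bi..(bi + tile).min(n) {
                // On diagonal tiles, start past i so each pair appears once.
                let j_start = if bi == bj { i + 1 } else { bj };
                for j in j_start..(bj + tile).min(n) {
                    order.push((i, j));
                }
            }
        }
    }
    order
}
```

With a tile size chosen so that two blocks of rows fit in cache, each row loaded from memory would be used for up to `tile` correlations before being evicted.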

Releases(v0.1.1)
Owner
Daniel C. Jones