# Probabilistic Principal Component Analysis (PPCA) model
This project implements a PPCA model in Rust for Python, using `PyO3` and `maturin`.
## Installing
This package is available on PyPI!
```bash
pip install ppca-rs
```
## Why use PPCA?
Glad you asked!
- PPCA is a simple extension of PCA (principal component analysis), but it can be overall more robust to train.
- PPCA is a proper statistical model. It doesn't spit out only the mean: you get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics (a sketch of the generative model follows this list).
- The PPCA model can handle missing values. If values are missing from your dataset, it can extrapolate them with reasonable estimates and even give you a confidence interval.
- Training converges quickly and always tends towards a global maximum. There are no hyperparameters to fiddle with and no local maxima to get stuck in.
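For the curious, here is the standard PPCA generative model (not part of the API, just context): each observed sample $x$ is a noisy linear image of a lower-dimensional latent state $z$, whose dimension is the `state_size` you pick at training time:

$$
x = Wz + \mu + \epsilon, \qquad z \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I)
$$

Marginally, $x \sim \mathcal{N}(\mu, WW^\top + \sigma^2 I)$, which is exactly where the standard deviations, covariances, and confidence intervals above come from.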
## Why use `ppca-rs`?
That's an easy one!
- It's written in Rust, with only a thin layer of Python glue on top. You can expect performance in the same league as C code.
- It uses `rayon` to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.
## Quick example
```python
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCAModel

samples: np.ndarray

# Create your dataset from a rank 2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`s) to signal masked values.
dataset = Dataset(samples)

# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)

# And now, here is a free sample of what you can do:

# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)

# Smooths (removes noise from) samples and fills in missing values:
filtered: Dataset = model.filter_extrapolate(dataset)

# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
```
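If you want something you can actually run, here is a minimal end-to-end sketch that sticks to the calls shown above; the synthetic data (and the `rng`, `latent`, `mixing` names) are made up for illustration:

```python
import numpy as np
from ppca_rs import Dataset, PPCATrainer

# Build a synthetic dataset: 1_000 samples of 20 correlated, noisy features.
rng = np.random.default_rng(42)
latent = rng.normal(size=(1_000, 5))
mixing = rng.normal(size=(5, 20))
samples = latent @ mixing + 0.1 * rng.normal(size=(1_000, 20))

# Mask roughly 10% of the entries with `nan` to simulate missing data.
samples[rng.random(samples.shape) < 0.1] = np.nan

dataset = Dataset(samples)
model = PPCATrainer(dataset).train(state_size=5, n_iters=10)

# Missing entries get filled in with their most probable values:
extrapolated_np = model.extrapolate(dataset).numpy()
print("masked entries before:", int(np.isnan(samples).sum()))
print("masked entries after:", int(np.isnan(extrapolated_np).sum()))  # expect 0
```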
## Juicy extras!
- Tired of the linear? Support for PPCA mixture models is coming soon. Clustering and dimensionality reduction in a single tool.
- Support for adaptation of DataFrames using either `pandas` or `polars`. Never juggle those `df`s in your code again (see the sketch after this list for getting a DataFrame in via numpy).
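The DataFrame adapter API itself isn't shown here, so as a simple route you can always go through numpy using only the calls from the quick example; `df_samples` below is a hypothetical pandas DataFrame of floats:

```python
import numpy as np
import pandas as pd
from ppca_rs import Dataset

# Hypothetical DataFrame of float features; missing entries are already NaN.
df_samples = pd.DataFrame({
    "price": [10.0, 12.5, np.nan, 11.0],
    "score": [0.3, np.nan, 0.9, 0.7],
})

# pandas NaNs map straight onto the non-finite masking convention above.
dataset = Dataset(df_samples.to_numpy(dtype=float))
```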
## Building from source
### Prerequisites
You will need Rust, which can be installed locally (i.e., without `sudo`), and you will also need `maturin`, which can be installed with
```bash
pip install maturin
```
`pipenv` is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, `maturin` will complain at you.
### Installing it locally
Check the `Makefile` for the available commands (or just type `make`). To install it locally, do
```bash
make install # optional: i=python.version (e.g., `i=3.9`)
```
### Messing around and testing
To mess around, inside a virtual environment (a `Pipfile` is provided for the `pipenv` lovers), do
```bash
maturin develop # use the flag --release to unlock superspeed!
```
This will install the package locally as-is from source.
## How do I use this stuff?
See the examples in the `examples` folder. Also, all functions are type-hinted and commented. If you are using `pylance` or `mypy`, it should be easy to navigate.
## Is it faster than the pure Python implementation you made?
You bet!