Supervised discretization in Rust

Overview

Discrust


The discrust package provides a supervised discretization algorithm. Under the hood it implements a decision tree, using information value to find the optimal splits, and provides several different methods to constrain the final discretization scheme.

The package draws heavily from the ivpy package, both in the algorithm and the parameter controls.
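To make the splitting criterion concrete, here is a minimal sketch of how the information value of one candidate split could be scored, and how the best candidate could be picked. It is plain Python written for illustration, not the package's Rust implementation. The names `split_iv` and `best_split`, the use of midpoints as candidates, and the choice to score degenerate splits as zero are all assumptions. Sample weights are ignored.

```python
import math


def split_iv(x, y, threshold):
    """Information value of splitting x into x < threshold and x >= threshold."""
    total_ones = sum(y)
    total_zeros = len(y) - total_ones
    iv = 0.0
    for is_left in (True, False):
        labels = [yi for xi, yi in zip(x, y) if (xi < threshold) == is_left]
        ones = sum(labels)
        zeros = len(labels) - ones
        if ones == 0 or zeros == 0:
            # Assumption: a side with no ones or no zeros makes the split unusable.
            return 0.0
        p = ones / total_ones    # share of all ones that fall on this side
        q = zeros / total_zeros  # share of all zeros that fall on this side
        iv += (p - q) * math.log(p / q)
    return iv


def best_split(x, y):
    """Pick the midpoint between consecutive distinct values with the highest IV."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: split_iv(x, y, t))


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
threshold = best_split(x, y)
print(threshold, split_iv(x, y, threshold))
```

A tree-based discretizer would apply a search like this recursively to each side of the chosen threshold, stopping when a constraint such as a minimum information value or a maximum number of bins is reached.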

Usage

The package has a single user-facing class, Discretizer, that can be instantiated with the following arguments.

  • min_obs (Optional[float], optional): Minimum number of observations required in a bin. Defaults to 5.
  • max_bins (Optional[int], optional): Maximum number of bins to split the variable into. Defaults to 10.
  • min_iv (Optional[float], optional): Minimum information value required to make a split. Defaults to 0.001.
  • min_pos (Optional[float], optional): Minimum number of records with a value of one that should be present in a split. Defaults to 5.
  • mono (Optional[int], optional): The monotonicity required between the binned variable and the binary performance outcome. A value of -1 will result in a negative correlation between the binned x variable and the y variable, while a value of 1 will result in a positive correlation between them. A value of 0 will bin x with no monotonicity constraint. If None is specified, the monotonicity will be determined by the monotonicity of the first split. Defaults to None.
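As a rough illustration of what the mono setting means, the sketch below checks whether the share of ones in each bin, ordered from the lowest bin to the highest, moves in one direction. The helper names `mono_direction` and `satisfies_mono` are hypothetical and are not part of the discrust API. The None case, where the first split sets the direction, is not covered.

```python
def mono_direction(event_rates):
    """Return 1 for strictly rising rates, -1 for strictly falling rates, else 0."""
    pairs = list(zip(event_rates, event_rates[1:]))
    if pairs and all(a < b for a, b in pairs):
        return 1
    if pairs and all(a > b for a, b in pairs):
        return -1
    return 0


def satisfies_mono(event_rates, mono):
    """Check per-bin event rates against a mono setting of -1, 0 or 1."""
    if mono == 0:
        return True  # no monotonicity constraint
    return mono_direction(event_rates) == mono


# Hypothetical event rates for three bins, ordered by increasing x.
print(satisfies_mono([0.10, 0.30, 0.60], mono=1))
print(satisfies_mono([0.10, 0.60, 0.30], mono=1))
```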

The fit method can be called on data and accepts the following parameters.

  • x (ArrayLike): An arraylike numeric field that will be discretized based on the values of y, and the constraints the Discretizer was initialized with.
  • y (ArrayLike): An arraylike binary field.
  • sample_weight (Optional[ArrayLike], optional): Optional sample weight array to be used when calculating the optimal breaks. Defaults to None.

This method returns a list of the optimal split values for the feature, given the constraints. After being fit, the discretizer will have a splits_ attribute containing this list.

import seaborn as sns

df = sns.load_dataset("titanic")

from discrust import Discretizer

ds = Discretizer(min_obs=5, max_bins=10, min_iv=0.001, min_pos=1.0, mono=None)
ds.fit(df["fare"], df["survived"])
# [-inf, 6.95, 7.125, 7.7292, 10.4625, 15.1, 50.4958, 52.0, 73.5, 79.65, inf]
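The returned list can be read as a sequence of bin edges. The following sketch shows one way to map raw values onto bin positions using only the standard library. The helper `bin_index` is hypothetical, and treating each bin as closed on the left, [lower, upper), is an assumption, since the boundary convention used by the package is not stated here.

```python
from bisect import bisect_right

# Split values copied from the fit example above.
splits = [float("-inf"), 6.95, 7.125, 7.7292, 10.4625, 15.1,
          50.4958, 52.0, 73.5, 79.65, float("inf")]


def bin_index(value, splits):
    """Zero-based position of the bin [splits[i], splits[i + 1]) holding value."""
    # Assumption: a value equal to a split point belongs to the bin above it.
    return bisect_right(splits, value) - 1


# A few illustrative fare values.
for fare in (7.25, 71.2833, 7.925):
    i = bin_index(fare, splits)
    print(fare, "->", i, (splits[i], splits[i + 1]))
```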

The predict method can be called and will discretize the feature, and then perform weight of evidence substitution on each binned level. This method takes the following arguments.

  • x (ArrayLike): An arraylike numeric field.

ds.predict(df["fare"])[0:5]
# array([-0.84846814, 0.78344263, -0.787529, 0.78344263, -0.787529])
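Weight of evidence substitution replaces each value with a single number describing its bin. The sketch below shows the idea in plain Python. The helpers `woe_table` and `woe_substitute` are hypothetical, the sign convention (log of the share of ones over the share of zeros) is an assumption, and every bin is assumed to contain at least one record of each class.

```python
import math
from bisect import bisect_right


def woe_table(x, y, splits):
    """Weight of evidence for each bin defined by consecutive split values."""
    n_bins = len(splits) - 1
    ones = [0] * n_bins
    zeros = [0] * n_bins
    for xi, yi in zip(x, y):
        i = bisect_right(splits, xi) - 1  # assumption: bins are [lower, upper)
        if yi == 1:
            ones[i] += 1
        else:
            zeros[i] += 1
    total_ones, total_zeros = sum(ones), sum(zeros)
    # Assumption: WoE = ln(share of ones in bin / share of zeros in bin).
    return [math.log((o / total_ones) / (z / total_zeros))
            for o, z in zip(ones, zeros)]


def woe_substitute(x, splits, table):
    """Replace each value with the weight of evidence of the bin it falls in."""
    return [table[bisect_right(splits, xi) - 1] for xi in x]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
splits = [float("-inf"), 4.5, float("inf")]
table = woe_table(x, y, splits)
print(woe_substitute([2, 7], splits, table))
```

Under this convention, bins where ones are over-represented get a positive value and bins where zeros dominate get a negative one.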

Installation

From PyPI

For Windows users, the package can be installed directly from PyPI with the following command.

python -m pip install discrust

Building from Source

The package can be built from source. It uses the maturin tool as a build backend, which requires Python and a working Rust compiler to be installed; see here for details. If these two requirements are met, you can clone this repository and run the following command in the repository's root directory.

python -m pip install . -v

This should invoke the maturin tool, which will handle building the Rust code and installing the package. Alternatively, if you simply want to build a wheel, you can run the following command after installing maturin.

maturin build --release

I have had some problems building packages with maturin directly in a conda environment. This is actually a bug on Anaconda's side that will hopefully be resolved. If this gives you any problems, it is usually easiest to build a wheel inside a venv and then install the wheel.

Additional TODOs

  • Support for exception values
  • Support for missing values in both the dependent and independent variables
Releases
  • 0.1.6 (May 8, 2022)

    This release adds a number of optimizations to the package. Aggregations only happen once, and the pre-computed values are always used, instead of recalculating statistics for each split. A special sort method that allows for missing values is only used if exception values are provided. A split index is determined and used throughout the crate, instead of it ever needing to be recalculated.

Owner
James Inlow