Adjustment Identification Distance: A gadjid for Causal Structure Learning

Overview



This is an early release of πšπšŠπšπš“πš’πš πŸ₯ and feedback is very welcome!

If you publish research using πšπšŠπšπš“πš’πš, please cite our article

@article{henckel2024adjustment,
    title = {{Adjustment Identification Distance: A gadjid for Causal Structure Learning}},
    author = {Leonard Henckel and Theo WΓΌrtzen and Sebastian Weichwald},
    journal = {{arXiv preprint arXiv:2402.08616}},
    year = {2024},
    doi = {10.48550/arXiv.2402.08616},
}

Get Started Real Quick πŸš€

Installation – Python

Just pip install gadjid to install the latest release of πšπšŠπšπš“πš’πš from PyPI
and run python -c "import gadjid; help(gadjid)" to get started.

Install Alternatives

Pip tries to find a matching wheel and install that. Since we offer precompiled wheels for the most common operating systems, Python versions, and CPU architectures, the installation will usually be quick. If there is no matching wheel (or when explicitly installing from source via pip install gadjid --no-binary gadjid), pip will download the source distribution and compile a wheel for the current platform, which requires the Rust toolchain to be installed.

The current development version can be compiled and installed via
pip install "git+https://github.com/CausalDisco/gadjid.git"
or by cloning this repository and calling either
maturin develop --manifest-path ./gadjid_python/Cargo.toml (unoptimized dev compile) or
maturin develop --manifest-path ./gadjid_python/Cargo.toml --release (optimized release compile).

Introductory Example – Python

import gadjid
from gadjid import example, ancestor_aid, oset_aid, parent_aid, shd
import numpy as np

help(gadjid)

example.run_parent_aid()

# true DAG: a 1 in row s, column t codes the edge Xβ‚› β†’ Xβ‚œ
Gtrue = np.array([
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)
# Gguess reverses the edge X₀ β†’ X₁ relative to Gtrue
Gguess = np.array([
    [0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)

print(ancestor_aid(Gtrue, Gguess))  # (normalised_distance, mistake_count)
print(shd(Gtrue, Gguess))

πšπšŠπšπš“πš’πš is implemented in Rust and can conveniently be called from Python via our Python wrapper (implemented using maturin and PyO3).

Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop polynomial-time reachability algorithms to compute the distances efficiently. In our package πšπšŠπšπš“πš’πš, we provide implementations of our distances; they are orders of magnitude faster than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.

This is an Early Release πŸ₯

  • Feedback is very welcome! Just open an issue here.
  • We are working on making πšπšŠπšπš“πš’πš available also for R.
  • πšπšŠπšπš“πš’πš is extensively tested (tests at bottom of each /gadjid/src/**.rs file) and validated against SID for DAG inputs. We are working on further extending and future-proofing the test suite.
  • The code is well documented. We plan on making a user and developer documentation available. πŸ“ƒ

Implemented Distances

  • ancestor_aid(Gtrue, Gguess)
  • oset_aid(Gtrue, Gguess)
  • parent_aid(Gtrue, Gguess)
  • for convenience, the following distances are implemented, too
    • shd(Gtrue, Gguess)
    • sid(Gtrue, Gguess) – only for DAGs!

where Gtrue and Gguess are adjacency matrices of a DAG or CPDAG. The functions are not symmetric in their inputs: to calculate a distance, identifying formulas for causal effects are inferred in the graph Gguess and verified against the graph Gtrue. Each distance returns a tuple (normalised_distance, mistake_count): mistake_count is the number of wrongly inferred causal effects, and normalised_distance is the fraction of causal effects inferred in Gguess that are wrong relative to Gtrue. There are $p(p-1)$ pairwise causal effects to infer in a graph with $p$ nodes, and we define normalisation as normalised_distance = mistake_count / $(p(p-1))$.
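The normalisation convention above can be checked with plain arithmetic; the mistake count below is hypothetical, chosen only to illustrate the relationship between the two entries of the returned tuple:

```python
# Illustration of the normalisation convention (mistake_count is hypothetical).
p = 5                  # number of nodes
pairs = p * (p - 1)    # ordered node pairs, i.e. causal effects to infer
mistake_count = 12     # hypothetical number of wrongly inferred effects

normalised_distance = mistake_count / pairs
print((normalised_distance, mistake_count))  # -> (0.6, 12)
```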

All graphs are assumed simple, that is, at most one edge is allowed between any two nodes. An adjacency matrix for a DAG may only contain 0s and 1s; a 1 in row s and column t codes a directed edge Xβ‚› β†’ Xβ‚œ; DAG inputs are validated for acyclicity. An adjacency matrix for a CPDAG may only contain 0s, 1s, and 2s; a 2 in row s and column t codes an undirected edge Xβ‚› β€” Xβ‚œ (an additional 2 in row t and column s is ignored; one of the two entries suffices to code an undirected edge). CPDAG inputs are not validated, and the user needs to ensure the adjacency matrix indeed codes a valid CPDAG (rather than just a PDAG). You may also calculate the SID between DAGs via parent_aid(DAGtrue, DAGguess), but we recommend ancestor_aid and oset_aid; note that for CPDAG inputs our parent_aid does not coincide with the SID (see also our accompanying article).
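As a sketch of the 0/1/2 encoding above, the helper below (not part of the gadjid API; its name and return values are illustrative) decodes the edge between two nodes of such an adjacency matrix:

```python
def edge_between(A, s, t):
    """Decode the edge between nodes s and t of a 0/1/2 adjacency matrix.

    Illustrative helper, not part of the gadjid API: a 2 in either entry
    codes an undirected edge, a 1 in row s, column t codes Xs -> Xt.
    """
    if A[s][t] == 2 or A[t][s] == 2:
        return "s -- t"
    if A[s][t] == 1:
        return "s -> t"
    if A[t][s] == 1:
        return "t -> s"
    return None  # no edge between s and t

# CPDAG on three nodes: X0 -> X1 directed, X1 -- X2 undirected
cpdag = [
    [0, 1, 0],
    [0, 0, 2],
    [0, 0, 0],
]
print(edge_between(cpdag, 0, 1))  # -> s -> t
print(edge_between(cpdag, 1, 2))  # -> s -- t
```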

Empirical Runtime Analysis

Experiments were run on a laptop with 8 GB RAM and a 4-core i5-8365U processor. Here, for a graph with $p$ nodes, x-sparse graphs have $0.75p$ edges in expectation, sparse graphs have $10p$ edges in expectation, and dense graphs have $0.3p(p-1)/2$ edges in expectation.

Maximum graph size feasible within 1 minute

Method         sparse   dense
Parent-AID      13005     960
Ancestor-AID     8200     932
Oset-AID          546     250
SID in R          255     239

Average runtime

Method         x-sparse ($p=1000$)   sparse ($p=256$)   dense ($p=239$)
Parent-AID     6.3 ms                 22.8 ms            189 ms
Ancestor-AID   2.7 ms                 38.7 ms            226 ms
Oset-AID       3.2 ms                 4.69 s             47.3 s
SID in R       ~1–2 h                 ~60 s              ~60 s

Project Structure Overview

  • .github/workflows/ – GitHub Actions for linting/testing/packaging
  • gadjid/ – Rust core package, which implements a graph memory layout purposefully designed for fast memory access in reachability algorithms, a new reachability algorithm to check the validity of an adjustment set, and all DAG/CPDAG distances discussed in the accompanying article
  • gadjid_python/ – Python wrapper that accepts numpy and scipy int8 matrices as graph adjacency matrices
    • gadjid_python/tests/ – runs tests of and via the python πšπšŠπšπš“πš’πš wrapper:
      1. tests the loading of numpy arrays and views as well as scipy sparse csr/csc matrices
      2. tests parent_aid against the R implementation of the SID on pairs of testgraphs; since the Parent-AID coincides with the SID in the special case of DAG inputs, this is an end-to-end test of the adjustment-set validity check implemented via a new reachability algorithm
  • gadjid_r/ – placeholder for the R wrapper to come!
  • testgraphs/ – testgraphs in .mtx files (Matrix Market Exchange Format), a csv file with the SHD/SID between the testgraphs to test against, checksums

LICENSE

πšπšŠπšπš“πš’πš is available in source code form at https://github.com/CausalDisco/gadjid.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.

See also the MPL-2.0 FAQ.
