Opendp - The core library of differential privacy algorithms powering the OpenDP Project.



Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

The OpenDP Library is a modular collection of statistical algorithms that adhere to the definition of differential privacy. It can be used to build applications of privacy-preserving computations, using a number of different models of privacy. OpenDP is implemented in Rust, with bindings for easy use from Python.

The architecture of the OpenDP Library is based on a conceptual framework for expressing privacy-aware computations. This framework is described in the paper A Programming Framework for OpenDP.

The OpenDP Library is part of the larger OpenDP Project, a community effort to build trustworthy, open source software tools for analysis of private data. (For simplicity in these docs, when we refer to “OpenDP,” we mean just the library, not the entire project.)


OpenDP is under development, and we expect to release new versions frequently, incorporating feedback and code contributions from the OpenDP Community. It's a work in progress, but it can already be used to build some applications and to prototype contributions that will expand its functionality. We welcome you to try it and look forward to feedback on the library! However, please be aware of the following limitations:

OpenDP, like all real-world software, has both known and unknown issues. If you intend to use OpenDP for a privacy-critical application, you should evaluate the impact of these issues on your use case.

More details can be found in the Limitations section of the User Guide.


The easiest way to install OpenDP is using pip (the package installer for Python):

$ pip install opendp

More information can be found in the Getting Started section of the User Guide.


The full documentation for OpenDP is located at Here are some helpful entry points:

Getting Help

If you're having problems using OpenDP, or want to submit feedback, please reach out! Here are some ways to contact us:


OpenDP is a community effort, and we welcome your contributions to its development! If you'd like to participate, please see the Contributing section of the Developer Guide.

  • Floating-point issue in noise samplers


    Dear OpenDP team,

    As suggested by @Shoeboxam, I took a look at the approach OpenDP uses to sample noise. My understanding is that it implements three main mitigations against floating-point issues:

    1. using MPFR to generate a hole-free noise distribution centered on zero,
    2. computing (noise + shift/scale)*scale instead of noise*scale + shift,
    3. and rounding in conservative directions at every step.

    None of these mitigations address the problem that when summing two floating-point numbers, the result has the precision of the least-precise summand, which creates distinguishing events. In particular, this problem occurs regardless of whether noise is added before or after scaling, so the second mitigation does not work.
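A minimal sketch of this precision effect, using only plain Python floats and no OpenDP calls. The value of `noise` is an arbitrary stand-in chosen for illustration, not an actual sample from the library, and `on_grid` is a hypothetical helper name:

```python
# Arbitrary stand-in for a sampled noise value that needs more fractional
# bits than a double close to 1.0 can represent.
noise = 2.0 ** -60

out_from_0 = 0.0 + noise  # exact: the low-order bits of the noise survive
out_from_1 = 1.0 + noise  # rounded to the coarser float spacing near 1.0


def on_grid(x: float) -> bool:
    """Return True if x is an integer multiple of 2^-53."""
    # Multiplying by a power of two is exact here, so this inspects the
    # low-order bits of x without introducing further rounding.
    return (x * 2.0 ** 53).is_integer()


print(out_from_0 == noise, on_grid(out_from_0))  # True False
print(out_from_1 == 1.0, on_grid(out_from_1))    # True True
```

An output that is not a multiple of 2^-53 can therefore be produced from an input of 0 but not from an input of 1, which is the kind of distinguishing event described above.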

    One easy way to see this is to take scale=1, which makes Mitigation 2 a no-op. The following proof of concept adds Laplace noise of scale 1 to 0 and to 1, and checks the precision level of the output.

    from opendp.trans import *
    from opendp.meas import *
    from opendp.comb import *
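    from opendp.mod import enable_features
    # The float-based constructors used below are feature-gated.
    # "floating-point" is the feature discussed in this issue; "contrib" is
    # assumed to be required by the other constructors as well.
    enable_features("floating-point", "contrib")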
    samples = 1000
    data = "1,0"
    parse = make_split_dataframe(separator=",", col_names=["A", "B"])
    noisy_sum = (
        make_cast(TIA=str, TOA=float)
        >> make_impute_constant(0.)
        >> make_clamp(bounds=(0., 1.))
        >> make_bounded_sum(bounds=(0., 1.))
        >> make_base_laplace(scale=1.0)
    )
    # Noisy sum, col A
    sum_a = parse >> make_select_column(key="A", TOA=str) >> noisy_sum
    # Noisy sum, col B
    sum_b = parse >> make_select_column(key="B", TOA=str) >> noisy_sum
    out_a = [sum_a(data) for _ in range(0, samples)]
    out_b = [sum_b(data) for _ in range(0, samples)]
    mul_a = sum([(o*(2.**53)).is_integer() for o in out_a])
    mul_b = sum([(o*(2.**53)).is_integer() for o in out_b])
    print(f"{mul_a}/{samples} outputs from 1 are multiples of 2^-53...")
    print(f"... but only {mul_b}/{samples} from 0 are.")

    This prints:

    1000/1000 outputs from 1 are multiples of 2^-53...
    ... but only 722/1000 from 0 are.

    One solution is to determine in advance, based on the parameters of the aggregation, which level of precision you need, and to round all outputs to this precision. This is what Google's DP libraries do. Another solution is to use MPFR to generate a hole-free noise distribution centered on shift, with the right noise scale, which avoids the floating-point operations entirely.
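The first option can be sketched as follows. This is only an illustration of rounding outputs to a fixed grid, not Google's actual implementation or an OpenDP API: the function name `round_to_granularity` and the granularity `2^-40` are hypothetical, and a real library would derive the granularity from the aggregation parameters and account for the rounding in its privacy analysis.

```python
def round_to_granularity(x: float, granularity: float) -> float:
    """Round x to the nearest integer multiple of `granularity`.

    Assumes `granularity` is a power of two, so the division is exact
    (barring overflow or underflow) and the result lies exactly on the grid.
    """
    return round(x / granularity) * granularity


# Hypothetical granularity, fixed before looking at any data.
GRANULARITY = 2.0 ** -40

# Arbitrary stand-ins for sampled noise values.
noises = [2.0 ** -60, 0.123456789, -2.5e-7]

# Release rounded outputs for true values 0 and 1.
released_from_0 = [round_to_granularity(0.0 + n, GRANULARITY) for n in noises]
released_from_1 = [round_to_granularity(1.0 + n, GRANULARITY) for n in noises]

# Every released value now sits on the same grid, whichever input it came from.
print(all((y / GRANULARITY).is_integer() for y in released_from_0))
print(all((y / GRANULARITY).is_integer() for y in released_from_1))
```

With this rounding in place, the precision of a released value no longer depends on the magnitude of the true sum, so the precision check from the proof of concept cannot tell the two inputs apart.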

    This issue only happens if you enable the floating-point feature, which the documentation warns about. So I'm assuming you do not consider this as a vulnerability, and posting this on GitHub as a regular issue.

    Best regards,


    opened by TedTed 8
