A Rust implementation of HyperLogLog trying to be parsimonious with memory.

Overview

🧮 HyperLogLog-rs

Build status Crates.io Documentation

This is a Rust library that provides an implementation of the HyperLogLog (HLL) algorithm, trying to be parsimonious with memory.

What is HyperLogLog?

HLL is a probabilistic algorithm that is used for estimating the cardinality of a set with very high accuracy while using a very small amount of memory. The algorithm was invented by Philippe Flajolet and Éric Fusy in 2007, and since then, it has been widely used in many fields, including database systems, search engines, and social networks.

The HLL algorithm is based on the observation that the number of unique elements in a set can be estimated by counting the number of leading zeros in the binary representation of the hash values of the elements in the set. This idea is simple but powerful, and it allows us to estimate the cardinality of a set with very high accuracy (within a few percentage points) using only a small amount of memory (a few kilobytes).

Our Rust implementation of the HLL algorithm is highly efficient, and it provides very accurate estimates of the cardinality of large sets. It is designed to be easy to use and integrate into your existing Rust projects. The crate provides a simple API that allows you to create HLL counters, add elements to them, and estimate their cardinality.

The focus of this particular implementation of the HyperLogLog algorithm is to be as memory efficient as possible. We achieve this by avoiding struct attributes that would store parameters such as the precision or the number of bits, as these would take up unnecessary memory space. Instead, we define these parameters as constants associated to the class, which allows for a more efficient and compact implementation.

However, this approach requires the use of several nightly features of the Rust language, including the const generics feature and the associated type constructors. By using these features, we are able to define the size of the internal data structures used by the algorithm at compile time, based on the precision of the HyperLogLog counter. This results in a more memory-efficient implementation that can be used in a wide range of applications.

We hope that this library will be useful for you in your projects, and we welcome your feedback and contributions. Please feel free to open an issue or submit a pull request if you have any questions or suggestions. Thank you for your interest in our HLL crate, and happy counting!

Usage

Add this to your Cargo.toml:

[dependencies]
hyperloglog = "0.1"

and this to your crate root:

use hyperloglog_rs::HyperLogLog;

Examples

use hyperloglog_rs::HyperLogLog;

let mut hll = HyperLogLog::<14, 5>::new();
hll.insert(&1);
hll.insert(&2);

let mut hll2 = HyperLogLog::<14, 5>::new();
hll2.insert(&2);
hll2.insert(&3);

let union = hll | hll2;

let estimated_cardinality = union.estimate_cardinality();
assert!(estimated_cardinality >= 3.0_f32 * 0.9 &&
        estimated_cardinality <= 3.0_f32 * 1.1);

No STD

This crate is designed to be as lightweight as possible and does not require any dependencies from the Rust standard library (std). As a result, it can be used in a bare metal or embedded context, where std may not be available.

All functionality of this crate can be used without std, and the prelude module provides easy access to all the relevant types and traits. If you encounter any issues using this crate in a no_std environment, please don't hesitate to open an issue or submit a pull request on GitHub.

Required features

Note that some of the features used by this crate, such as #![feature(generic_const_exprs)], require a nightly Rust compiler. If you are using a stable or beta version of Rust, you may need to switch to nightly or disable certain features to use this crate.

The generic_const_exprs feature allows for using constant expressions as generic type parameters, which is necessary in this implementation to pass compile-time constant values as template arguments.

The const_float_bits_conv feature provides the ability to convert floating-point values to their corresponding bit patterns at compile time, which is useful in computing hash values.

Finally, the const_fn_floating_point_arithmetic feature provides the ability to perform floating-point arithmetic in const functions, which is necessary for computing hash values using the f32::to_bits() method.

Fuzzing

Fuzzing is a technique for finding security vulnerabilities and bugs in software by providing random input to the code. It can be an effective way of uncovering issues that might not be discovered through other testing methods. In our library, we take fuzzing seriously, and we use the cargo fuzz tool to ensure our code is robust and secure. cargo fuzz automates the process of generating and running randomized test inputs, and it can help identify obscure bugs that would be difficult to detect through traditional testing methods. We make sure that our fuzz targets are continuously updated and run against the latest versions of the library to ensure that any vulnerabilities or bugs are quickly identified and addressed.

Learn more about how we fuzz here

Contributing to this project

Contributions from the community are highly appreciated and can help improve this project. If you have any suggestions, feature requests, or bugs to report, please open an issue on GitHub. Additionally, if you want to contribute to the project, you can open a pull request with your proposed changes. Before making any substantial changes, please discuss them with the project maintainer in the issue tracker or on the 🍇 GRAPE 🍇 Discord server.

If you appreciate this project and would like to support its development, you can star the repository on GitHub or consider making a financial contribution. The project maintainer has set up a GitHub Sponsors page where you can make a recurring financial contribution to support the project's development. Any financial contribution, no matter how small, is greatly appreciated and helps ensure the continued development and maintenance of this project.

Thanks

We would like to thank the GitHub user Tabac for their implementation of HyperLogLog, which was very useful for learning and benchmarking this implementation. The goals of the two implementations are different, but Tabac's implementation already supports the HLL++ algorithm, go check it out if you need it. Their implementation can be found at here.

Citations

Some relevant citations to learn more:

You might also like...
High concurrency, RealTime, In-memory storage inspired by erlang mnesia
High concurrency, RealTime, In-memory storage inspired by erlang mnesia

DarkBird is a Document oriented, high concurrency in-memory Storage, also persist data to disk to avoid loss any data The darkbird provides the follow

Support SIMD low-memory overhead and high-performance adaptive radix tree.

Artful Artful is an adaptive radix tree library for Rust. At a high-level, it's like a BTreeMap. It is based on the implementation of paper, see The A

🧪 The versatile and intuitive memory hacking framework.

🧪 hax 🤔 About hax is a Rust crate designed to make memory hacking, game hacking, cheat development, and any other low level memory based development

Linked Atomic Random Insert Vector: a thread-safe, self-memory-managed vector with no guaranteed sequential insert.
Linked Atomic Random Insert Vector: a thread-safe, self-memory-managed vector with no guaranteed sequential insert.

Linked Atomic Random Insert Vector Lariv is a thread-safe, self-memory-managed vector with no guaranteed sequential insert. It internally uses a linke

 A process memory reader and debugger for Windows (x86_64)
A process memory reader and debugger for Windows (x86_64)

Winreader Winreader is a process memory reader and debugger for Windows, implemented and developed in the Rust language, using the official Microsoft

Vemcache is an in-memory vector database.

Vemcache Vemcache is an in-memory vector database. Vemcache can be thought of as the Redis equivalent for vector databases. Getting Started Prerequisi

KVM memory R/W cheat for CSGO

CSGO KVM DMA Main Feature Triggerbot with random press/release time *TODO list: add method to detect key event in VM. add No recoil. add wallhack(

Rust implementation of Andrej Karpathy's micrograd for purposes of learning both ML and Rust.

micrograd_rs Rust implementation of Andrej Karpathy's micrograd for purposes of learning both ML and Rust. Main takeaways Basically the same takeaways

Ray Tracing: The Next Week implementation in Rust
Ray Tracing: The Next Week implementation in Rust

rttnw Ray Tracing: The Next Week implementation in Rust How to run Install Rust: Link here. Run project git clone https://github.com/luliic2/rttnw cd

Owner
Luca Cappelletti
I am but an egg 🥚 - "Stranger in a Strange Land"
Luca Cappelletti
This Repo Contains my Week Long Journey Trying to Learn Rust Programming Language 🦀.

the-rust-way This Repo Contains my Week Long Journey Trying to Learn Rust Programming Language ?? . ?? Thanks to all Wonderful Contributors Thanks a l

Kanishk Pachauri 7 Oct 20, 2022
A additional Rust compiler pass to detect memory safe bugs of Rust programs.

SafeDrop A additional Rust compiler pass to detect memory safe bugs of Rust programs. SafeDrop performs path-sensitive and field-sensitive inter-proce

Artisan-Lab  (Fn*) 5 Nov 25, 2022
A small in-memory key value database for rust

SmollDB Small in-memory key value database for rust This is a small in-memory key-value database, which can be easly backed up in a file or stream and

null 13 Dec 15, 2022
Library and proc macro to analyze memory usage of data structures in rust.

Allocative: memory profiler for Rust This crate implements a lightweight memory profiler which allows object traversal and memory size introspection.

Meta Experimental 19 Jan 6, 2023
RcLite: small, fast, and memory-friendly reference counting for Rust

RcLite: small, fast, and memory-friendly reference counting RcLite is a lightweight reference-counting solution for Rust that serves as an alternative

Khashayar Fereidani 147 Apr 14, 2023
Super-simple, fully Rust powered "memory" (doc store + semantic search) for LLM projects, semantic search, etc.

memex Super simple "memory" for LLM projects, semantic search, etc. Running the service Note that if you're running on Apple silicon (M1/M2/etc.), it'

Spyglass Search 15 Jun 19, 2023
Rust library for concurrent data access, using memory-mapped files, zero-copy deserialization, and wait-free synchronization.

mmap-sync mmap-sync is a Rust crate designed to manage high-performance, concurrent data access between a single writer process and multiple reader pr

Cloudflare 97 Jun 26, 2023
Blazing fast, memory safe & modern Linux package manager written in Rust.

paket Blazing fast, memory safe & modern Linux package manager written in Rust. Roadmap Version: 0.1 Paket.toml file parsing. (#1, #2) CLI handling (p

null 4 Oct 19, 2023
Proof-of-concept for a memory-efficient data structure for zooming billion-event traces

Proof-of-concept for a gigabyte-scale trace viewer This repo includes: A memory-efficient representation for event traces An unusually simple and memo

Tristan Hume 59 Sep 5, 2022
This crate allows to generate a flat binary with the memory representation of an ELF.

flatelf Library This crate allows to generate a flat binary with the memory representation of an ELF. It also allows to generate a FLATELF with the fo

Roi Martin 3 Sep 29, 2022