The fastest bloom filter in Rust. No accuracy compromises. Use any hasher.

Overview

b100m-filter

Crates.io docs.rs License: MIT License: APACHE

The fastest bloom filter in Rust. No accuracy compromises. Use any hasher.

Usage

# Cargo.toml
[dependencies]
b100m-filter = "0.3.0"
use b100m_filter::BloomFilter;

let num_blocks = 4; // by default, each block is 512 bits
let values = vec!["42", "🦀"];

let filter = BloomFilter::builder(num_blocks).items(values.iter());
assert!(filter.contains("42"));
assert!(filter.contains("🦀"));
use b100m_filter::BloomFilter;
use ahash::RandomState;

let num_blocks = 4; // by default, each block is 512 bits

let filter = BloomFilter::builder(num_blocks)
    .hasher(RandomState::default())
    .items(["42", "🦀"].iter());

Background

Bloom filters are a space efficient approximate membership set data structure. False positives from contains are possible, but false negatives are not, i.e. contains for all items in the set is guaranteed to return true, while contains for all items not in the set probably return false.

Blocked bloom filters are supported by an underlying bit vector, chunked into 512, 256, 128, or 64 bit "blocks", to track item membership. To insert, a number of bits, based on the item's hash, are set in the underlying bit vector. To check membership, a number of bits, based on the item's hash, are checked in the underlying bit vector.

Once constructed, neither the bloom filter's underlying memory usage nor number of bits per item change.

Implementation

b100m-filter is blazingly fast because it uses L1 cache friendly blocks and efficiently derives many index bits from only one hash per value. Compared to traditional implementations, b100m-filter is 2-5 times faster for small sets of items, and hundreds of times faster for larger item sets. In all cases, b100m-filter maintains competitive false positive rates.

Runtime Performance

Runtime comparison to other bloom filter crates:

  • Bloom memory size = 16Kb
  • 1000 contained items
  • 364 hashes per item
Check Non-Existing (ns) Check Existing (ns)
b100m-filter 16.900 139.62
*fastbloom-rs 35.358 485.81
bloom 66.136 749.27
bloomfilter 68.912 996.56
probabilistic-collections 83.385 974.67

*fastbloom-rs uses XXHash, which is faster than SipHash.

False Positive Performance

b100m-filter does not compromise accuracy. Below is a comparison false positive rate with other bloom filter crates:

bloom-crate-fp

Scaling

b100m-filter scales very well.

As the number of bits and set size increase, traditional bloom filters need to perform more hashes per item to keep false positive rates low. However, b100m-filter's optimal number of hashes is bounded while keeping near zero rates even for many items:

bloom_perf

Bloom filter speed is directly proportional to number of hashes.

References

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

You might also like...
ISG lets you use YouTube as cloud storage for ANY files, not just video
ISG lets you use YouTube as cloud storage for ANY files, not just video

I was working on this instead of my finals, hope you appreciate it. I'll add all relevant executables when I can Infinite-Storage-Glitch AKA ISG (writ

The fastest memoizing and caching Python library written in Rust.

Cachebox Cachebox is a Python library (written in Rust) that provides memoizations and cache implementions with different cache replecement policies.

Not the fastest terminal colors library. Don't even ask about size.
Not the fastest terminal colors library. Don't even ask about size.

TROLOLORS Not the fastest terminal colors library. Don't even ask about size. Why? Don't even try to use it. But maybe you need to say to your boss th

This tool will profile official instances of OpenSUSE mirrorcache to determine the fastest repositories for your system

Mirror Magic tool to Magically make OpenSUSE Mirrors Magic-er This tool will profile official instances of OpenSUSE mirrorcache to determine the faste

Fastest GTF/GFF-to-BED converter chilling around
Fastest GTF/GFF-to-BED converter chilling around

gxf2bed The fastest G{F,T}F-to-BED converter around the block! translates chr27 gxf2bed gene 17266470 17285418 . + . gene_id "ENSG00000151743"; chr27

A Rust-based shell script to create a folder structure to use for a single class every semester. Mostly an excuse to use Rust.

A Rust Course Folder Shell Script PROJECT IN PROGRESS (Spring 2022) When completed, script will create a folder structure of the following schema: [ro

A Python package written in Rust for email verification without sending any emails.

PyRustify PyRustify is a Python package written in Rust that verifies the email addresses. Features Feature Description Syntax validation Checks if th

RBTC is cli to convert BTC to any currency and vice-versa.
RBTC is cli to convert BTC to any currency and vice-versa.

RBTC RBTC is cli to convert BTC to any currency and vice-versa. Building for source For build the binary just: $ cargo build To run as debug, just run

Traversal of tree-sitter Trees and any arbitrary tree with a TreeCursor-like interface

tree-sitter-traversal Traversal of tree-sitter Trees and any arbitrary tree with a TreeCursor-like interface. Using cursors, iteration over the tree c

Owner
null
This CLI will help you improve your typing accuracy and speed

This CLI will help you improve your typing accuracy and speed! Improve your personal bests and look back on your previous records in a graph. All in the convenience of your own terminal!

Mitchel Wijt 8 May 25, 2023
Typing accuracy practice app.

booktyping booktyping is a simple commandline tool for practicing typing accuracy while reading a book. installation booktyping has only been tested o

null 3 Dec 10, 2023
Standard Graphics is a command-line tool for printing 2D graphics from any language to any screen.

2D graphics in any programming language with just print statements!

Caleb Winston 123 Nov 20, 2022
Rust crate for interacting with the Windows Packet Filter driver.

NDISAPI-RS NDISAPI-RS is a Rust crate for interacting with the Windows Packet Filter driver. It provides an easy-to-use, safe, and efficient interface

Vadim Smirnov 6 Jun 15, 2023
Alexander Mongus is a state-of-the-art filter to sneak amogus characters in pictures

A. Mongus Go to: http://www.lortex.org/amogu/ ??? This is a client-side, Webassembly-based filter to hide amongus characters in your images. Example:

Lucas Pluvinage 3 Apr 16, 2022
Yet Another Kalman Filter Implementation. As well as Lie Theory (Lie group and algebra) on SE(3). [no_std] is supported by default.

yakf - Yet Another Kalman Filter Yet Another Kalman Filter Implementation, as well as, Lie Theory (Lie group, algebra, vector) on SO(3), SE(3), SO(2),

null 7 Dec 1, 2022
Filter, Sort & Delete Duplicate Files Recursively

Deduplicator Find, Sort, Filter & Delete duplicate files Usage Usage: deduplicator [OPTIONS] [scan_dir_path] Arguments: [scan_dir_path] Run Dedupl

Sreedev Kodichath 108 Jan 27, 2023
A tool to filter sites in a FASTA-format whole-genome pseudo-alignment

Core-SNP-filter This is a tool to filter sites (i.e. columns) in a FASTA-format whole-genome pseudo-alignment based on: Whether the site contains vari

Ryan Wick 15 Apr 2, 2023
Use any async Rust library from PHP!

php-tokio - Use any async Rust library from PHP! Created by Daniil Gentili (@danog). This library allows you to use any async rust library from PHP, a

Daniil Gentili 242 Sep 7, 2023
That program use on platform windows. And if you write any text on uncorrect keyboard layout, that program for that.

?? This program is designed to translate text into the correct layout when typing is incorrect. ?? Example ghbdtn -> привет Just (by default) pressing

Gest Se 5 Jan 26, 2023