Custom memory allocator that helps discover reads from uninitialized memory

Overview

libdiffuzz: security-oriented alternative to Memory Sanitizer

This is a drop-in replacement for the OS memory allocator that can be used to detect uses of uninitialized memory. It is designed for cases where Memory Sanitizer (MSAN) is not applicable for some reason, such as:

  • Your code contains inline assembly or links to proprietary libraries that cannot be instrumented by MSAN
  • You want to find vulnerabilities in black-box binaries that you do not have the source code for (not always straightforward, see below)
  • You want to check if the bug MSAN found is actually exploitable, i.e. if the uninitialized memory contents actually show up in the output
  • You're debugging code that is specific to an exotic CPU architecture or operating system where MSAN is not available, such as macOS. If you're on a really obscure platform that doesn't have a Rust compiler, a less robust C99 implementation is available.

This is not a drop-in replacement for Memory Sanitizer! It will likely require changes to your code or your testing setup, see below.

How it works

When injected into a process, this library fills every subsequently allocated region of memory with a different value. You can then detect uses of uninitialized memory simply by running an operation twice in the same process and comparing the outputs; if they differ, the code uses uninitialized memory somewhere.

Combine this with a fuzzer (e.g. AFL, honggfuzz) to automatically discover cases when this happens. This is called "differential fuzzing", hence the name.

Naturally, this only works if running the same operation twice normally returns the same result. If that is not the case in your program and you cannot make it deterministic, you're out of luck.
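The idea can be sketched in a few lines of Rust. This is only an illustration of the technique, not libdiffuzz's actual source: the `ClobberingAlloc` type, its counter, and the `fill_byte_of_fresh_allocation` helper are hypothetical names. Each allocation is filled with a byte taken from a counter, so two allocations made at different times contain different "garbage".

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper: fills every new allocation with a byte
/// that changes from one allocation to the next.
pub struct ClobberingAlloc {
    counter: AtomicUsize,
}

impl ClobberingAlloc {
    pub const fn new() -> Self {
        // Start at 1 so the first allocation is not filled with zeroes.
        ClobberingAlloc { counter: AtomicUsize::new(1) }
    }
}

unsafe impl GlobalAlloc for ClobberingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // fetch_add returns the previous counter value; truncating to u8
            // makes the fill byte wrap around after 256 allocations.
            let fill = self.counter.fetch_add(1, Ordering::Relaxed) as u8;
            unsafe { std::ptr::write_bytes(ptr, fill, layout.size()) };
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
    }
}

/// Allocates `size` bytes, reads back the first byte, and frees the buffer.
/// The byte was written by the allocator, so reading it is well-defined here.
pub fn fill_byte_of_fresh_allocation(alloc: &ClobberingAlloc, size: usize) -> u8 {
    assert!(size > 0, "GlobalAlloc does not permit zero-sized allocations");
    let layout = Layout::from_size_align(size, 1).expect("valid layout");
    unsafe {
        let ptr = alloc.alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        let byte = *ptr;
        alloc.dealloc(ptr, layout);
        byte
    }
}
```

A program that copies such a buffer to its output without writing to it first would emit a different byte on each run of the operation, which is exactly the difference the comparison step looks for. Inside a single Rust program the same wrapper could be installed with `#[global_allocator]`; libdiffuzz instead replaces the allocator from outside the process via preloading.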

TL;DR: usage

  1. Clone this repository, run cargo build --release; this will build libdiffuzz.so and put it in target/release
  2. Make your code run the same operation twice in the same process and compare outputs.
  3. Run your code like this:
    • On Linux/BSD/etc: LD_PRELOAD=/path/to/libdiffuzz.so /path/to/your/binary
    • On macOS: DYLD_INSERT_LIBRARIES=/path/to/libdiffuzz.so DYLD_FORCE_FLAT_NAMESPACE=1 /path/to/your/binary
    • If you're fuzzing with AFL: AFL_PRELOAD=/path/to/libdiffuzz.so afl-fuzz ... regardless of platform. If you're not fuzzing with AFL - you should!
  4. Wait for it to crash
  5. Brag that you've used differential fuzzing to find vulnerabilities in real code

Quick start for Rust code

Note: Memory Sanitizer now works with Rust. You should probably use it instead of libdiffuzz!

If your code and its dependencies do not contain unsafe blocks, you don't need to do a thing! Safe Rust cannot read uninitialized memory.

However, if you have read from the black book and invoked the Old Ones...

  1. Clone this repository, run cargo build --release; this will build libdiffuzz.so and put it in target/release
  2. Make sure this code doesn't reliably crash when run on its own, but does crash when you run it like this: LD_PRELOAD=/path/to/libdiffuzz.so target/release/membleed
  3. If you haven't done regular fuzzing yet - do set up fuzzing with AFL. It's not that hard.
  4. In your fuzz target run the same operation twice and assert! that they produce the same result. See example fuzz target for lodepng-rust for reference. A more complicated example is also available.
  5. Add the following to your fuzz harness:
// Use the system allocator so we can substitute it with a custom one via LD_PRELOAD
use std::alloc::System;
#[global_allocator]
static GLOBAL: System = System;
  6. Run the fuzz target like this: AFL_PRELOAD=/path/to/libdiffuzz.so cargo afl fuzz ...
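As a rough sketch of step 4, the body of a fuzz target might look like the code below. `decode` is a hypothetical stand-in for the operation under test, and `run_twice` and `fuzz_one` are illustrative names rather than part of libdiffuzz or AFL; in a real harness `fuzz_one` would be called from the fuzzer's entry point.

```rust
/// Runs `op` twice on the same input. Returns the output if both runs agree,
/// or both outputs if they differ.
pub fn run_twice<F>(op: F, input: &[u8]) -> Result<Vec<u8>, (Vec<u8>, Vec<u8>)>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let first = op(input);
    let second = op(input);
    if first == second {
        Ok(first)
    } else {
        Err((first, second))
    }
}

/// Hypothetical stand-in for the code under test (e.g. a decoder).
pub fn decode(data: &[u8]) -> Vec<u8> {
    data.iter().rev().copied().collect()
}

/// Body of a fuzz target: panics (which the fuzzer records as a crash)
/// when two runs on the same input disagree.
pub fn fuzz_one(data: &[u8]) {
    let result = run_twice(decode, data);
    assert!(
        result.is_ok(),
        "outputs differ between two runs on the same input"
    );
}
```

Under libdiffuzz, a mismatch here suggests that uninitialized heap memory reached the output, provided the operation is otherwise deterministic.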

Auditing black-box binaries

Simply preload libdiffuzz into a binary (see "TL;DR: usage" above), feed it the same input twice and compare the outputs. If they differ, the binary exposes uninitialized memory in the output.

If your binary only accepts one input and then terminates, set the environment variable LIBDIFFUZZ_NONDETERMINISTIC; this will make output differ between runs. Without that variable set libdiffuzz tries to be as deterministic as possible to make its results reproducible.

If the output varies between runs under normal conditions, try forcing the binary to use just one thread and overriding any sources of randomness it has.
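For a binary that processes one input and exits, the comparison can be automated with a small driver. The sketch below assumes Linux-style preloading via LD_PRELOAD, passes the input as command-line arguments, and sets LIBDIFFUZZ_NONDETERMINISTIC to "1" (the value is an assumption; the text above only says the variable must be set). The function names are illustrative.

```rust
use std::io;
use std::process::Command;

/// Runs `binary` with `args` plus the given extra environment variables
/// and returns whatever it wrote to stdout.
pub fn capture_stdout(
    binary: &str,
    args: &[&str],
    env: &[(&str, &str)],
) -> io::Result<Vec<u8>> {
    let mut cmd = Command::new(binary);
    cmd.args(args);
    for (key, value) in env {
        cmd.env(*key, *value);
    }
    Ok(cmd.output()?.stdout)
}

/// Runs the same command twice and reports whether stdout differed.
pub fn outputs_differ(
    binary: &str,
    args: &[&str],
    env: &[(&str, &str)],
) -> io::Result<bool> {
    let first = capture_stdout(binary, args, env)?;
    let second = capture_stdout(binary, args, env)?;
    Ok(first != second)
}

/// Audits `binary` under libdiffuzz in non-deterministic mode.
/// `libdiffuzz_path` is the path to the built libdiffuzz.so.
pub fn audit(binary: &str, args: &[&str], libdiffuzz_path: &str) -> io::Result<bool> {
    let env = [
        ("LD_PRELOAD", libdiffuzz_path),
        // Assumed value; any value may do.
        ("LIBDIFFUZZ_NONDETERMINISTIC", "1"),
    ];
    outputs_differ(binary, args, &env)
}
```

It is worth calling `outputs_differ` without the preload first: if the output already varies between plain runs, a difference under libdiffuzz proves nothing.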

Limitations and future work

Stack-based uninitialized reads are not detected.

Unlike memory sanitizer, this thing will not make your program crash as soon as a read from uninitialized memory occurs. Instead, it lets you detect that it has occurred after the fact and only if the contents of uninitialized memory leak into the output. I.e. this will help you notice security vulnerabilities, but will not really aid in debugging.

Trophy case

List of previously unknown (i.e. zero-day) vulnerabilities found using this tool, to show that this whole idea is not completely bonkers:

  1. Memory disclosure in Claxon

If you find bugs using libdiffuzz, please open a PR to add them here.

See also

Valgrind, a perfectly serviceable tool to detect reads from uninitialized memory if you're willing to tolerate a 20x slowdown and occasional false positives.

Dr. Memory, which claims to be an improvement over Valgrind.

MIRI, an interpreter for Rust code that detects violations of Rust's safety rules. Great for debugging but unsuitable for guided fuzzing.

libdislocator, a substitute for Address Sanitizer that also works with black-box binaries.

For background on how this project came about, see How I've found vulnerability in a popular Rust crate (and you can too).

Comments
  • Rewrite in Rust

    I haven't been able to test this with uninitialized memory yet, because malloc keeps giving me zeroed buffers (even without this LD_PRELOADed). Perhaps gcc is messing with it?

    This may be better suited for a separate repo, but I'm not really interested in maintaining it long term because I'm not planning on using it personally.

    This also fixes all outstanding issues at the time of writing (#1 #2 #3).

    opened by PlasmaPower 12
  • Thread safety

    The global counter inside malloc() sometimes might not be incremented in multi-threaded programs due to data races. This may result in uninitialized memory accesses in multi-threaded programs not being reliably detected.

    It can be fixed by using atomic types from C11 standard. This will require including <stdatomic.h>, adding -std=c11 to compiler flags in Makefile, and probably replacing all alloc_clobber_counter++ with atomic_fetch_add(&alloc_clobber_counter, 1) but I'm not really sure about the details.

    enhancement help wanted good first issue 
    opened by Shnatsel 1
  • Drop remaining libdislocator checks

    There are at least two superfluous checks this project has inherited from libdislocator:

    1. Additional protected page is allocated at the end of each region. This crashes the binary on buffer overflows, but uses more memory than normal.
    2. There is a global counter for the total amount of memory allocated, and at some point the library refuses to allocate any more. This is useful to detect a memory leak.

    The primary reason I want to get rid of these is that I want to be sure that when a binary crashes under libdiffuzz and doesn't crash without it, it's because uninitialized memory is leaked into the output.

    There are other excellent tools to isolate buffer overflows or memory leaks (asan, libdislocator), which should be used if those are the issues you're looking for.

    good first issue 
    opened by Shnatsel 1
  • Segfault with `hashbrown`

    Hey @Shnatsel,

    I was trying to work out whether msan was giving me false positives when I happened upon libdiffuzz. It segfaulted immediately, but in a completely different part of the code.

    I've isolated a small reproducible test case here that uses toml and hashbrown to trigger the segfault: https://github.com/michaelsproul/hashbrown-crash

    Have you seen segfaults like this before when using libdiffuzz? Is this a type of false positive, or is hashbrown really doing something sketchy with uninitialized memory? The fault seems to happen in an unsafe drop_in_place call, so I'm wondering whether hashbrown does contain some optimisation that assumes uninitialized memory to be 0, or something.

    The full backtrace is here for reference:

    #0  hashbrown::raw::RawIterRange<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>)>::new<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>)> (ctrl=0x7ffff7e2c0c8, data=..., len=<optimised out>) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:1862
    #1  hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>::iter<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:945
    #2  hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>::drop_elements<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:603
    #3  0x0000555555562ca5 in hashbrown::raw::{impl#17}::drop<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8)
        at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:1801
    #4  core::ptr::drop_in_place<hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>> ()
        at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
    #5  core::ptr::drop_in_place<hashbrown::map::HashMap<alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>, std::collections::hash::map::RandomState, alloc::alloc::Global>>
        () at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
    #6  core::ptr::drop_in_place<std::collections::hash::map::HashMap<alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>, std::collections::hash::map::RandomState>> ()
        at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
    #7  toml::de::{impl#0}::deserialize_any<hashbrown_crash::_::{impl#0}::deserialize::__Visitor> (self=0x7fffffffdc68, visitor=...) at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:244
    #8  toml::de::{impl#0}::deserialize_struct<hashbrown_crash::_::{impl#0}::deserialize::__Visitor> (self=0x7fffffffdc68, name=..., fields=..., visitor=...)
        at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:315
    #9  0x000055555555fa3e in hashbrown_crash::_::{impl#0}::deserialize<&mut toml::de::Deserializer> (__deserializer=0x7fffffffdc68) at src/main.rs:3
    #10 toml::de::from_str<hashbrown_crash::Input> (s=...) at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:80
    #11 0x000055555555e4cd in hashbrown_crash::main () at src/main.rs:11
    
    opened by michaelsproul 4
  • Passing options by environment variables may set them too late and is not portable

    Currently libdiffuzz switches to non-deterministic mode after reading an environment variable from a function called from link-time "constructors" section:

    https://github.com/Shnatsel/libdiffuzz/blob/f0c7a8f3b27df24d389d9e003ebda01ad89eb1cf/src/lib.rs#L31-L33

    This is not a great idea for two reasons:

    1. This is not portable. This is already taking different codepaths depending on whether it's on Linux/BSD or macOS. Windows is currently not supported. What's worse, there is no way to tell if this actually works on your platform or not!
    2. This may kick in too late and miss initializing some heap-allocated memory in other libraries with similar hooks, so libdiffuzz will fail to expose some errors.
    bug help wanted 
    opened by Shnatsel 1
  • Support #![no_std]

    libdiffuzz doesn't make much use of the standard library. It can probably be switched to the corresponding libcore primitives and compiled in #![no_std] mode.

    Among other things, this will reduce the size of the generated binary and may allow cross-compilation to the more obscure architectures.

    enhancement 
    opened by Shnatsel 1
  • Detect out-of-bounds reads

    It would be nice to be able to detect out-of-bounds reads as well. This is actually pretty easy to implement - just allocate more memory than was requested and clobber it with the same variable value as the rest of the buffer. If any of the clobbered values show up in the output, then the program is definitely exploitable - either via reads from uninitialized memory or via out-of-bounds reads.

    Use case: I needed this functionality to determine whether https://github.com/sile/libflate/issues/16 is exploitable or not.

    I have already implemented checks for out-of-bounds reads to the right of the buffer in branch detect-oob-reads, but the ones to the left are still TODO - there's just a static canary there that's inherited from libdislocator.
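The over-allocation idea from this issue can be sketched as follows. This is not the code from the detect-oob-reads branch: `PaddedClobberAlloc`, `PADDING` and `byte_past_end` are hypothetical names, the padding is added only to the right of the buffer, and the fill byte is fixed for brevity where the issue proposes a value that varies per allocation.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;

/// Hypothetical number of extra bytes placed after every allocation.
pub const PADDING: usize = 32;

/// Hypothetical allocator that over-allocates and clobbers the padding
/// with the same byte as the buffer itself.
pub struct PaddedClobberAlloc {
    pub fill: u8,
}

impl PaddedClobberAlloc {
    /// Layout actually requested from the system: the original plus padding.
    fn padded(layout: Layout) -> Option<Layout> {
        let size = layout.size().checked_add(PADDING)?;
        Layout::from_size_align(size, layout.align()).ok()
    }
}

unsafe impl GlobalAlloc for PaddedClobberAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let padded = match Self::padded(layout) {
            Some(p) => p,
            None => return ptr::null_mut(),
        };
        let p = unsafe { System.alloc(padded) };
        if !p.is_null() {
            // Clobber the requested region and the padding alike.
            unsafe { ptr::write_bytes(p, self.fill, padded.size()) };
        }
        p
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        // Free with the same padded layout that was used to allocate.
        if let Some(padded) = Self::padded(layout) {
            unsafe { System.dealloc(p, padded) };
        }
    }
}

/// Reads the first byte past the requested size. With this allocator that
/// byte lies inside the padding, so the read stays within the real allocation.
pub fn byte_past_end(alloc: &PaddedClobberAlloc, size: usize) -> u8 {
    assert!(size > 0, "GlobalAlloc does not permit zero-sized allocations");
    let layout = Layout::from_size_align(size, 1).expect("valid layout");
    unsafe {
        let p = alloc.alloc(layout);
        assert!(!p.is_null(), "allocation failed");
        let byte = *p.add(size);
        alloc.dealloc(p, layout);
        byte
    }
}
```

An out-of-bounds read of up to `PADDING` bytes past the buffer then returns the clobber value instead of crashing, so it shows up in the output the same way an uninitialized read does.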

    enhancement 
    opened by Shnatsel 4
Owner
Sergey "Shnatsel" Davidoff
Feel free to contact me about Rust jobs.