fuzzcheck
Fuzzcheck is a structure-aware, in-process, coverage-guided, evolutionary fuzzing engine for Rust functions.
Given a function test: (T) -> bool
, you can use fuzzcheck to find a value of type T
that fails the test or leads to a crash.
Fuzzcheck works by maintaining a pool of test inputs and ranking them using the uniqueness of the code coverage caused by running test(input)
. From that pool, it selects a high-ranking input, mutates it, and runs the test function again. If the new mutated input has an interesting code coverage then it is added to the pool, otherwise, fuzzcheck tries again with a different input and mutation.
In pseudocode:
loop {
let input = pool.select();
mutate(&mut input);
let analysis = analyze(test, &input);
match analysis {
Failed => reportFailure(input),
Interesting(score) => pool.add(input, score),
NotInteresting => continue
}
}
Fuzzcheck is unique because, unlike other coverage-guided fuzzing engines, it doesn't work with bitstrings but instead works with values of any type T
directly. The complexity of the inputs and the way to mutate them is given by functions defined by the user.
Usage
The first step is to install the cargo-fuzzcheck
executable using cargo nightly.
cargo +nightly install cargo-fuzzcheck
Then, somewhere else, create a new cargo crate. It will contain the library code that you want to fuzz-test. Also do not forget to set the rust version to nightly.
cargo new --lib my_library
cd my_library
rustup override set nightly
Then, run cargo fuzzcheck init
to initialize a fuzz
folder that will contain all future fuzz tests.
cargo fuzzcheck init
A sample test function was created at fuzz/instrumented/src/lib.rs
. The generated file contains an example of a custom data type being fuzzed. The derive proc_macro DefaultMutator
automatically generates a mutator that you can give to fuzzcheck.
use decent_serde_json_alternative::{FromJson, ToJson};
use fuzzcheck_mutators::DefaultMutator;
#[derive(Clone, FromJson, ToJson, DefaultMutator)]
pub enum SampleEnum<T> {
A(u8),
B { x: bool , y: Option<T> },
C,
D
}
#[derive(Clone, FromJson, ToJson, DefaultMutator)]
pub struct SampleStruct<A, B, C> {
a: A,
b: Vec<B>,
c: C
}
The file also contains a sample test function:
pub fn test(input: &[SampleStruct<u8, Option<u8>, SampleEnum<u8>>]) -> bool {
if
input.len() > 5 &&
input[0].a == 0 &&
input[0].b == vec![Some(2), None, Some(187)] &&
matches!(input[0].c, SampleEnum::C) &&
input[1].a == 189 &&
input[1].b.len() > 5 &&
// etc.
{
false
}
else {
true
}
}
And an executable script was created at fuzz/non_instrumented/fuzz_targets/target1.rs
. It launches the fuzzing engine on the above test function using the default mutator.
/* Various import statements not included in this example */
fn main() {
// a vector mutator that uses the default mutator generated by the derive proc_macro
let mutator = Vec::<SampleStruct<u8, Option<u8>, SampleEnum<u8>>>::default_mutator();
// a lightweight serializer is used to save the interesting inputs to the file system,
// you can also use a serde-based serializer if you prefer
let serializer = JsonSerializer::default();
// launch fuzzcheck on the test function
let _ = fuzzcheck::launch(test, mutator, serializer);
}
You can already try launching this test:
cargo fuzzcheck run target1 fuzz
This starts a loop that will stop when a failing test has been found.
A line will be printed whenever a newsworthy event happened, along with some statistics. For example:
NEW 1004619 score: 34.30 pool: 29 exec/s: 1832012 cplx: 110.55
NEW
means that a new input was added to the pool of interesting inputs1,004,619
is the number of iterations that were performed so farscore: 34.30
is a measure of the total code coverage caused by all inputs in the poolpool: 29
is the number of inputs in the poolexec/s: 1,832,012
is the average number of iterations performed every secondcplx: 128.94
is the average complexity of the inputs in the pool
When a failing test has been found, the following is printed:
================ TEST FAILED ================
2828995 score: 57.80 pool: 50 exec/s: 1745441 cplx: 141.34
Saving at "fuzz/artifacts/target1/2a71aa786f3e552b.json"
Here, the path to the artifact file is fuzz/artifacts/target1/2a71aa786f3e552b.json
. It contains the JSON-encoded input that failed the test.
[
{
"a": 0,
"b": [
2,
null,
187
],
"c": {
"kind": "C"
}
},
{
"a": 189,
"b": [
89,
null,
213,
189,
null,
32
],
"c": {
"kind": "A",
"payload": {
"0": 7
}
}
},
...
]
Moreover, the fuzzer can maintain a copy of its input pool in the file system, which is located by default at fuzz/corpora/<target>
. Fuzzing corpora are useful to kick-start a fuzzing process by providing a list of known interesting inputs. If you try to run the fuzzer again, you will see that it finds the problematic input instantly. This is because it first read the values written inside the fuzz corpus and used them as starting points.
Structure of the fuzz folder
The fuzz folder is a bit difficult to understand, because fuzzcheck needs to compile your crate and the fuzz test in two different ways. This is why it contains an instrumented
and a non-instrumented
folder.
The instrumented
folder contains all the test functions and their helper functions. It can use your library as a dependency but not fuzzcheck
or non_instrumented
. Every piece of code written there will be instrumented such that its code coverage can be recorded.
The non-instrumented
folder contains the code that launches the fuzzer (called the fuzz_targets
). It uses your library, fuzzcheck
, and instrumented
as dependencies. The code there is not instrumented.
.
├── Cargo.toml
├── fuzz # everything inside `fuzz` is to be used by fuzzcheck
│ ├── fuzzcheck.toml # fuzzcheck configuration file
│ ├── instrumented # a crate that contains the test functions
│ │ ├── Cargo.toml
│ │ └── src
│ │ └── lib.rs
│ └── non_instrumented # a crate to compile fuzzcheck and launch the fuzz tests
│ ├── build.rs
│ ├── Cargo.toml
│ └── fuzz_targets
│ ├── target1.rs # a fuzz-test
│ └── target2.rs # another fuzz-test
└── src
└── lib.rs # your library code
Note that if instrumented
and non_instrumented
both depend on a common crate A
, then that crate will be compiled twice and the two versions of it will live in the resulting binary. These two versions will have different, incompatible versions of the types and traits defined by A
.
Minifying failing test inputs
Fuzzcheck can also be used to minify a large input that fails a test.
Let's say you have a file crash.json
containing an input that you would like to minify:
[
{
"a": 0,
"b": [2, null, 187],
"c": 9
},
{
"a": 189,
"b": [89, null, 213, 189, null, 32],
"c": 19
},
...
]
Launch cargo-fuzzcheck run
on your target with the tmin
command and an --input-file
option.
cargo fuzzcheck run target1 tmin --input-file "artifacts/crash.json"
This will repeatedly launch the fuzzer in “minify” mode and save the artifacts in the folder artifacts/crash.minified
. The name of each artifact will be prefixed with the complexity of its input. For example, crash.minified/800--fe958d4f003bd4f5.json
has a complexity of 8.00
.
You can stop the minifying fuzzer at any point and look for the least complex input in the crash.minified
folder.
Creating a Mutator
If you would like to fuzz-test your own custom type T
without the DefaultMutator
derive attribute, you will have to create a type that conforms to the Mutator<T>
trait.
This trait can be a bit difficult to implement, but it is very powerful and it is possible to write efficient and composable mutators with it. For example, fuzzcheck implements U8Mutator
(u8), OptionMutator
(Option), and VecMutator
(Vec). They compose such that it possible to use a VecMutator<VecMutator<OptionMutator<U8Mutator>>>
to fuzz values of type Vec<Vec<Option<u8>>>
. A number of other mutators are already defined: TupleMutator
for all types that have a tuple-like structure as well as EnumNPayloadMutator
for all types that have an enum-like structure containing N
items that have associated data.
Furthermore, the fuzzcheck_mutators_derive
crate offers a proc_macro that can choose a default mutator for user-defined types automatically.
I would like to write a guide to fuzzcheck to explain the trait and how to work with it. But in the meantime, if you have questions, please send me an email or create an issue on GitHub.
My goal is to write more mutators for common types and building blocks for composability such that a custom implementation of Mutator<T>
is rarely needed.
Configuration and Instrumentation
The file located at fuzz/fuzzcheck.toml
can be used to configure fuzzcheck. It can be used to:
- change the type of coverage instrumentation that is used to guide the fuzzer
- choose default arguments to the
cargo fuzzcheck
command - change how the instrumented library is compiled
All settings can be configured globally or per fuzz target.
For example, you can disable the trace-compares
coverage instrumentation globally by setting:
[default]
trace-compares = false
This will significantly speed up the fuzzing iteration speed and reduce the size of the internal pool of inputs, but wil miss code coverage information, which may negatively affect the overall performance of the fuzzer.
If you would like to enable stack depth tracing for a specific fuzz target, which can be used to find stack overflows, you can set:
[targets.my_target]
stack_depth = true
Note that stack depth tracing is currently experimental.
You can change the default command line arguments of cargo-fuzzcheck
as follows:
[default]
# by default, do not generate inputs of complexity more than 4096
max_cplx = 4096
[targets.target1]
# but for target1, reduce the default maximum complexity to 256
max_cplx = 256
# and use a custom folder to save the artifacts
artifacts = "fuzz/different_artifacts_folder"
This will allow you to simply run:
cargo fuzzcheck run target1 fuzz
instead of:
cargo fuzzcheck run target1 fuzz -max-cplx 256 -artifacts "fuzz/different_artifacts_folder"
Finally, you can specify which features of the (non-)instrumented library you would like to enable via the instrumented_default_features
(bool) and instrumented_features
(vector of strings) settings.
Previous work on fuzzing engines
As far as I know, evolutionary, coverage-guided fuzzing engines were popularized by American Fuzzy Lop (AFL).
Fuzzcheck is also evolutionary and coverage-guided.
Later on, LLVM released its own fuzzing engine, libFuzzer, which is based on the same ideas as AFL, but it uses Clang’s SanitizerCoverage and is in-process (it lives in the same process as the program being fuzz-tested.
Fuzzcheck is also in-process and also uses SanitizerCoverage.
Both AFL and libFuzzer work by manipulating bitstrings (e.g. 1011101011
). However, many programs work on structured data, and mutations at the bitstring level may not map to meaningful mutations at the level of the structured data. This problem can be partially addressed by using a compact binary encoding such as protobuf and providing custom mutation functions to libFuzzer that work on the structured data itself. This is a way to perform “structure-aware fuzzing” (talk, tutorial).
An alternative way to deal with structured data is to use generators just like QuickCheck’s Arbitrary
trait. And then to “treat the raw byte buffer input provided by the coverage-guided fuzzer as a sequence of random values and implement a “random” number generator around it.” (cited blog post by @fitzgen). The tool cargo-fuzz
has recently implemented that approach.
Fuzzcheck is also structure-aware, but unlike previous attempts at structure-aware fuzzing, it doesn't use an intermediary binary encoding such as protobuf nor does it use Quickcheck-like generators. Instead, it directly mutates the typed values in-process. This is better many ways. First, it is faster because there is no need to encode and decode inputs at each iteration. Second, the complexity of the input is given by a user-defined function, which will be more accurate than counting the bytes of the protobuf encoding. Finally, and most importantly, the mutations are faster and more meaningful than those done on protobuf or Arbitrary
’s byte buffer-based RNG. A detail that I particularly like about fuzzcheck, and that is possible only because it mutates typed values, is that every mutation is done in-place and is reversable. That means that generating a new test case is super fast, and can often even be done with zero allocations.
As I was developing Fuzzcheck for Swift, a few researchers developed Fuzzchick for Coq (paper). It is a coverage-guided property-based testing tool implemented as an extension to Quickchick. As far as I know, it is the only other tool with the same philosophy as fuzzcheck. The similarity between the names fuzzcheck
and Fuzzchick
is a coincidence.