The trait for generating structured data from arbitrary, unstructured input.

Arbitrary

About

The Arbitrary crate lets you construct arbitrary instances of a type.

This crate is primarily intended to be combined with a fuzzer such as libFuzzer (via cargo-fuzz) or AFL, helping you turn the raw, untyped byte buffers that they produce into well-typed, valid, structured values. This allows you to combine structure-aware test case generation with coverage-guided, mutation-based fuzzers.

Documentation

Read the API documentation on docs.rs!

Example

Say you're writing a color conversion library, and you have an Rgb struct to represent RGB colors. You might want to implement Arbitrary for Rgb so that you could take arbitrary Rgb instances in a test function that asserts some property (for example, asserting that RGB converted to HSL and converted back to RGB always ends up exactly where we started).

Automatically Deriving Arbitrary

Automatically deriving the Arbitrary trait is the recommended way to implement Arbitrary for your types.

Automatically deriving Arbitrary requires you to enable the "derive" cargo feature:

# Cargo.toml

[dependencies]
arbitrary = { version = "1", features = ["derive"] }

And then you can simply add #[derive(Arbitrary)] annotations to your types:

// rgb.rs

use arbitrary::Arbitrary;

#[derive(Arbitrary)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

Implementing Arbitrary By Hand

Alternatively, you can write an Arbitrary implementation by hand:

// rgb.rs

use arbitrary::{Arbitrary, Result, Unstructured};

#[derive(Copy, Clone, Debug)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl<'a> Arbitrary<'a> for Rgb {
    fn arbitrary(u: &mut Unstructured<'a>) -> Result<Self> {
        let r = u8::arbitrary(u)?;
        let g = u8::arbitrary(u)?;
        let b = u8::arbitrary(u)?;
        Ok(Rgb { r, g, b })
    }
}

License

Dual-licensed under MIT or Apache-2.0, at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Arbitrary impl for char is slow

    The current implementation, when it hits an invalid character, loops downward by one bit each time. This is slow: all of the invalid char values are either between 0xD800 and 0xDFFF or greater than 0x10FFFF (and the latter are already handled by the mask).

    This means that we spend a lot of time just looping here, especially when creating strings. We should instead just replace these with null or decrement the fourth nibble.
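As a rough illustration of the non-looping approach suggested here, the sketch below maps any u32 to a valid char in constant time. The function name `char_from_raw` and the 21-bit mask are assumptions made for this example, not the crate's actual implementation. Surrogates are handled by decrementing the fourth nibble, and anything still out of range falls back to null.

```rust
/// Hypothetical helper (not part of `arbitrary`): maps any `u32` to a valid
/// `char` without looping.
pub fn char_from_raw(raw: u32) -> char {
    // Assumed mask: keep the low 21 bits, the smallest width covering 0x10FFFF.
    let masked = raw & 0x1F_FFFF;

    // Surrogates (0xD800..=0xDFFF) are not valid chars. Decrementing the
    // fourth nibble moves them to 0xC800..=0xCFFF, which is valid.
    let candidate = if (0xD800..=0xDFFF).contains(&masked) {
        masked - 0x1000
    } else {
        masked
    };

    // Values above 0x10FFFF can survive the 21-bit mask; map them to null.
    char::from_u32(candidate).unwrap_or('\0')
}
```

One trade-off of this sketch is that the null fallback makes '\0' more likely than other characters, which may or may not matter for a given fuzz target.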

    opened by Manishearth 16
  • Add lifetime parameter to Arbitrary trait. Remove shrinking functionality. Implement Arbitrary for &str.

    Fixes https://github.com/rust-fuzz/arbitrary/issues/43

    Note: This merges into a new branch staging-1.0 where we can put our changes prepping for a 1.0 release

    To-do:

    • [x] Remove Shrinkable
    • [x] Fix Cow lifetime
    • [x] Remove shrinking from README
    • [x] Save shrinking code to a gist
    opened by frewsxcv 14
  • Tracking issue for 0.3.0 release

    We have some nice improvements to this crate. I think everything so far has technically been compatible with a 0.2.X release, but this seems like a good time to revisit some of the public API and trait designs and make a 0.3.0 release. I figured it would be good to open an issue to talk about what we might want in it.

    TODO

    • [x] Fill out public API doc comments, and add examples to each of them
    • [x] Better Arbitrary for String implementation (#17)
    • [x] Should Arbitrary have Debug as a super trait? (#7)
    • [x] Should Arbitrary have Clone as a super trait? This would allow easier/better shrinking implementations (e.g. for various collections)
    • [x] Refactor Unstructured to look more like FuzzedDataProvider? In particular, maybe we should get lengths from the end of the buffer like that does.
    • [x] Add Arbitrary::arbitrary_take_rest methods to consume the rest of an Unstructured. (#18)
    • [x] Add Arbitrary::size_hint and refactor Unstructured::container_size into Unstructured::collection_len, as discussed in #18.
    • [x] Add a fn arbitrary<A: Arbitrary>(&mut self) -> Result<A, Self::Error> helper method to Unstructured to shorten invoking nested-arbitrary calls, so you can just do let foo = u.arbitrary()?; and let type inference figure things out for you?

    As we surface more things to be done for 0.3.0 in the comments, I'll add them to this list, so that we have a single place for everything.


    +cc @Manishearth: what do you think? Anything we should add to or remove from this list?

    opened by fitzgen 13
  • SIGSEGV on certain values

    I think this is an issue with arbitrary, but let me know where I should move it if not.

    I'm observing an issue where derived arbitrary types trigger a SIGSEGV when running cargo-fuzz. Here's a minimal example:

    #![no_main]
    
    use libfuzzer_sys::fuzz_target;
    use arbitrary::Arbitrary;
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum Op {
        A(A),
        B(B),
    }
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum A {
        Leaf,
        B(Box<B>),
    }
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum B {
        Leaf,
        A(Box<A>),
    }
    
    fuzz_target!(|ops: Vec<Op>| {
    });
    

    Running this on the latest nightly (both mac and linux) triggers a SIGSEGV:

    INFO: Seed: 3703294504
    INFO: Loaded 1 modules   (17415 inline 8-bit counters): 17415 [0x10f5ef708, 0x10f5f3b0f), 
    INFO: Loaded 1 PC tables (17415 PCs): 17415 [0x10f5f3b10,0x10f637b80), 
    INFO:       59 files found in /Users/yusuf/Desktop/sandbox/fuzz-bug/fuzz/corpus/fuzz_target_1
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    ────────────────────────────────────────────────────────────────────────────────
    
    Error: Fuzz target exited with signal: 11
    

    Any of these changes will remove the segfault:

    1. Changing the fuzz target's input from Vec<Op> to just Op.
    2. Removing the B enum variant from A
    3. Removing the A enum variant from B
    opened by ysimonson 12
  • Allow custom arbitrary methods for fields in the custom derive

    Sometimes a field of a struct doesn't implement Arbitrary, and implementing it is either impossible (because the type is from another crate, for example) or undesired.

    We should support some kind of attribute to either use the Default::default implementation for the type, when we "don't care" about that field, or supply an arbitrary function that has the same signature as Arbitrary::arbitrary to be used instead.

    This is all possible to do when implementing Arbitrary by hand, in the sense that you can always stuff whatever code inside the impl. But this seems common enough and simple enough that we may want to support it in the custom derive.

    Straw person syntax:

    #[derive(Arbitrary)]
    struct MyThing {
        // This doesn't actually matter, just a diagnostic message, so don't
        // waste input bytes on it.
        #[arbitrary(default)]
        diagnostic: String,
    
        // This is a type defined by a foreign crate and doesn't implement
        // Arbitrary. Because of orphan rules, we can't implement Arbitrary
        // for it, so instead we have a one-off function.
        #[arbitrary(arbitrary = arbitrary_foreign_type)]
        foreign: foreign_crate::ForeignType,
    }
    
    fn arbitrary_foreign_type(u: &mut Unstructured)
        -> Result<foreign_crate::ForeignType>
    {
        // ...
    }
    

    The open question in my mind is how to deal with the other, optional Arbitrary methods, e.g. arbitrary_take_rest, shrink, and size_hint. Should we assume the default implementations in this case? Should we also allow providing custom implementations for them?

    Any thoughts @Manishearth?

    +cc @bnjbvr

    enhancement 
    opened by fitzgen 10
  • Support customization of fields on derive

    This PR addresses https://github.com/rust-fuzz/arbitrary/issues/33 and enables the following syntax for derive:

    #[derive(Arbitrary)]
    pub struct Rgb {
        // set `r` to Default::default()
        #[arbitrary(default)]
        pub r: u8,
    
        // set `g` to 255
        #[arbitrary(value = 255)]
        pub g: u8,
    
        // generate `b` with a custom function
        #[arbitrary(with = arbitrary_b)]
        pub b: u8,
    }
    
    fn arbitrary_b(u: &mut Unstructured) -> arbitrary::Result<u8> {
        u.int_in_range(64..=128)
    }
    

    Note

    When a custom function is used, the lower bound for size_hint is computed from core::mem::size_of::<T>(), which is correct for stack-allocated types but not for heap-allocated ones.
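To see why a size-based bound is a poor fit for heap-allocated types, the following sketch compares `size_of` with the number of bytes a `String` actually holds. The helper names are hypothetical and exist only for this illustration; the sketch says nothing about how the derive macro itself is implemented.

```rust
use std::mem::size_of;

/// Hypothetical helper mirroring the size-based bound described in the note.
pub fn size_based_lower_bound<T>() -> usize {
    size_of::<T>()
}

/// Returns `(bound, payload_len)` for a string: the size-based bound for
/// `String` next to the number of bytes the value actually contains.
pub fn compare_for_string(s: &str) -> (usize, usize) {
    (size_based_lower_bound::<String>(), s.len())
}
```

For `u8` or `[u8; 3]` the bound matches the data exactly. For `String` it only measures the fixed-size handle, so it is too high for an empty string and too low for a long one.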

    opened by greyblake 8
  • Use fewer input bytes for arbitrary_loop

    Asking for an arbitrary bool to decide whether the loop should keep going consumes a byte per loop iteration, while calling int_in_range instead consumes at most four bytes for any combination of arguments, and often less.

    I'm trying to develop an intuition for what helps or hinders libFuzzer when driving a fuzz target that uses arbitrary, but I don't know enough. Do you suppose this is likely to work better? Do you have any advice on how to reason about questions like this?

    I've been thinking about a bounded_arbitrary_len (or arbitrary_bounded_len? arbitrary_len_in_range?) that takes optional min/max bounds like arbitrary_loop does, but takes bytes from the end like arbitrary_len does. That similarly could consume fewer bytes than just calling arbitrary_len and clamping the result. Is that likely to help a fuzzer explore the state space more effectively?
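The byte accounting behind this proposal can be sketched with two small std-only helpers. Both are simplified models written for this example, not the crate's code: they assume a bool costs one byte, that the loop has no min/max bounds, and that a ranged integer reads one byte per 8 bits of the range width.

```rust
/// Bytes consumed when one `bool` (one byte) is read before every iteration,
/// including the final `false` that stops the loop. Simplified model.
pub fn bytes_for_bool_loop(iterations: usize) -> usize {
    iterations + 1
}

/// Bytes needed to pick a value from a range of width `delta = end - start`,
/// assuming one byte is read per 8 bits of `delta` (at most four for `u32`).
pub fn bytes_for_range(delta: u32) -> usize {
    let mut bytes = 0;
    while bytes < 4 && (delta >> (bytes * 8)) > 0 {
        bytes += 1;
    }
    bytes
}
```

Under these assumptions a ten-iteration loop costs 11 bytes with per-iteration bools, while choosing the iteration count up front from a range of up to 255 costs a single byte.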

    opened by jameysharp 7
  • Questions about `Arbitrary`

    Hi! A few notes to preface this issue:

    • I asked @Manishearth this question a few weeks back, and they said that I should open an issue.
    • This is not meant to be a judgement of the decisions made in this crate, especially since I am not aware of the constraints that y'all might have.

    The question: why did y'all opt for a QuickCheck-style, trait-based approach over a Hypothesis/Proptest-style, explicit shrinking object/struct approach? For context, a Hypothesis/Proptest-style checker would have a shrinking strategy that returns a tree. This tree would contain values and shrinking/expansion strategies. I've noticed that the biggest win of Hypothesis/Proptest-style approaches is that shrinkers/generators can be implemented for completely arbitrary (no pun intended!) types, so that orphan rules don't apply to generated code.

    That being said, I suspect that fuzzers might have different constraints than property testing, so this entire question could be taken with a grain of salt.

    If I recall correctly, @fitzgen mentioned to me at Strange Loop that they preferred QuickCheck-style to Proptest-style systems.

    opened by davidbarsky 7
  • fix(Unstructured)!: don't produce meaningless data if exhausted

    If the Unstructured is completely exhausted, returning a buffer completely filled with zeroes instead of indicating failure does not make sense. It also causes issues for any users of Arbitrary implementations for integers (which are based on this method) who rely on eventually getting a nonzero result to e.g. break recursion.

    Fixes #107.

    opened by Xiretza 6
  • Add support for arbitrary arrays

    A T: Default bound would remove all the unsafe code, but it isn't currently possible for [T; N], and such a bound isn't desirable for types that don't implement Default.

    If https://github.com/rust-lang/rust/pull/75644 is going to be accepted, then the auxiliary functions won't be necessary in the near future.

    opened by c410-f3r 6
  • "Fill all available space" solution for container size

    Broadly, there are two kinds of things Arbitrary generates. Fixed-size objects that map to a fixed set of integers, and variable size objects, which can use something like container_size().

    Thing is, unlike, say, quickcheck, we already have a finite number of bytes as input. It would be nice if Arbitrary could be told "hey, I only have X bytes of data and you're my last consumer" as opposed to "slurp as much data as you want, I'll split the difference amongst the next consumers!"

    This seems pretty tricky to design in the general case, but we could initially settle for types like String and Vec just filling directly.

    opened by Manishearth 6
  • `arbitrary` v1.2.0 + `derive` doesn't build with `generate-lockfile -Z minimal-versions`

    It pulls in derive_arbitrary v1.1.6 which has the following build failure:

    error[E0599]: no method named `len` found for reference `&Fields` in the current scope
       --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/derive_arbitrary-1.1.6/src/lib.rs:219:30
        |
    219 |         if idx + 1 == fields.len() {
        |                              ^^^ method not found in `&Fields`
    

    Perhaps you could bump the derive_arbitrary requirement in arbitrary to 1.2.0 (it's presently 1.1.6) or yank the affected derive_arbitrary v1.1.6 release?

    opened by tarcieri 0
  • Is there any way to convert an arbitrary struct into unstructured raw bytes

    I am new to arbitrary and cargo-fuzz. I have a seed corpus in JSON and I need to express it as raw bytes, so that my fuzzer can convert them into a struct that derives Arbitrary.

    opened by last-las 0
  • Arbitrary derive fails with lifetimes in some cases

    Hello! Bear with me as I am new to Arbitrary.

    I have this type from juniper:

    pub struct Arguments<'a, S> {
        pub items: Vec<(Spanning<&'a str>, Spanning<InputValue<S>>)>,
    }
    

    Spanning and InputValue both successfully derive(arbitrary::Arbitrary). When I try to derive on Arguments, I get the following error:

    error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
       |
    note: first, the lifetime cannot outlive the lifetime `'a` as defined here...
      --> juniper/src/ast.rs:60:22
       |
    60 | pub struct Arguments<'a, S> {
       |                      ^^
    note: ...so that the types are compatible
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
       = note: expected `<&'a str as Arbitrary<'_>>`
                  found `<&str as Arbitrary<'_>>`
    note: but, the lifetime must be valid for the lifetime `'arbitrary` as defined here...
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
    note: ...so that the types are compatible
    

    Manually implementing it works:

    impl<'a, S> arbitrary::Arbitrary<'a> for Arguments<'a, S>
    where
        S: arbitrary::Arbitrary<'a>,
    {
        fn arbitrary(u: &mut arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
            let items: Vec<(Spanning<&'a str>, Spanning<InputValue<S>>)> = u.arbitrary()?;
            Ok(Self { items })
        }
    }
    

    Even though I am unblocked, I figured I'd file this, as one would expect the derive to work in this case.

    opened by LegNeato 3
  • Allow `Unstructured` to be backed by an iterator of bytes instead of a byte slice?

    In our use case, we are not using Arbitrary for fuzzing, but simply for creating arbitrary fixture values in tests. Currently we create a 10MB static Vec<u8> of random noise and use that as our Unstructured data. However this is annoying since it requires 10MB of memory overhead, and sometimes even then we run out of bytes.

    I am wondering if it would be valid to have two flavors of Unstructured, one backed by a byte slice, presumably for fuzzing, and one backed by an infinite iterator of bytes which can never be exhausted. I experimented with a PR for doing this and got some basic tests passing, but don't know how valid it is in the grand scheme. I did find that some functionality is indeed dependent on the fixed byte slice, so at the very least some extra UX effort would have to be made to provide slightly different interfaces for different Unstructured flavors.

    I understand if this wouldn't be worth the effort but I guess I am primarily wondering from a motivational standpoint why Unstructured is backed by a byte array instead of an iterator, and only secondarily asking for some feedback on the feasibility of using infinite iterators. Note: I know next to nothing about fuzzing.
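The idea of an iterator-backed byte source can be sketched without the crate at all. Everything below is hypothetical: `IterBytes` and `xorshift_bytes` are names invented for this example, and the sketch does not plug into `Unstructured`. It only shows how an infinite, deterministic stream avoids pre-allocating a large buffer.

```rust
/// Hypothetical byte source backed by any iterator of bytes.
pub struct IterBytes<I: Iterator<Item = u8>> {
    inner: I,
}

impl<I: Iterator<Item = u8>> IterBytes<I> {
    pub fn new(inner: I) -> Self {
        IterBytes { inner }
    }

    /// Pulls up to `n` bytes; returns fewer only if the iterator has ended.
    pub fn take_vec(&mut self, n: usize) -> Vec<u8> {
        self.inner.by_ref().take(n).collect()
    }
}

/// Infinite, deterministic byte stream from a xorshift32 generator.
pub fn xorshift_bytes(seed: u32) -> impl Iterator<Item = u8> {
    // xorshift must not start from an all-zero state.
    let mut state = if seed == 0 { 1 } else { seed };
    std::iter::repeat_with(move || {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        // Emit the low byte of the current state.
        (state & 0xFF) as u8
    })
}
```

A finite iterator such as `vec![1u8, 2, 3].into_iter()` behaves like a byte slice and can run out, whereas `xorshift_bytes(seed)` never does and needs only four bytes of state.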

    opened by maackle 4
  • WIP: Add `dearbitrary` functionality to turn an instance into its arbitrary byte sequence

    This PR solves issue #44 by implementing a dearbitrary function that creates a byte buffer, which can then be passed to arbitrary to recreate the struct. It is not a one-to-one mapping, as e.g. bools only use the last bit of a byte value.
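The bool case mentioned here makes a compact example of why the mapping is not one-to-one. The two functions below are a sketch under the assumption that decoding keeps only the lowest bit of a byte; the names are hypothetical and are not taken from the PR.

```rust
/// Decoding direction: only the lowest bit of the byte is used.
pub fn bool_from_byte(byte: u8) -> bool {
    byte & 1 == 1
}

/// Hypothetical encoding direction: picks one canonical byte per `bool`.
pub fn byte_from_bool(value: bool) -> u8 {
    value as u8
}
```

Encoding and then decoding always returns the original bool, but decoding and then re-encoding can change the byte: 0xFF decodes to true, which encodes back to 0x01.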

    What works:

    Many parts are missing, as I do not need them at the moment and have no time to implement them...

    Remark: commits are prone to change/squashing/reordering!!

    opened by bitwave 4
A library for generating fake data in Rust.

Fake A Rust library for generating fake data. Installation Default (rand is required): [dependencies] fake = "2.4" rand = "0.8" If you want to use #[d

cksac 552 Dec 25, 2022
A minimalist property-based testing library based on the arbitrary crate.

A minimalist property-based testing library based on the arbitrary crate.

Aleksey Kladov 61 Dec 21, 2022
Hopper is a tool for generating fuzzing test cases for libraries automatically using interpretative fuzzing.

Hopper Hopper is a tool for generating fuzzing test cases for libraries automatically using interpretative fuzzing. It transforms the problem of libr

FuzzAnything 118 Nov 15, 2023
QuickCheck bug hunting in Rust standard library data structures

BugHunt, Rust This project is aiming to provide "stateful" QuickCheck models for Rust's standard library. That is, we build up a random list of operat

Brian L. Troutwine 161 Dec 15, 2022
Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

null 294 Dec 23, 2022
Lane-Associated Vector (LAV): Portable SIMD vector trait as GAT of SIMD lane trait.

lav Lane-Associated Vector (LAV): Portable SIMD vector trait as GAT of SIMD lane trait. NOTE: This crate requires nightly Rust. Provides SIMD lane tra

Qu1x 2 Sep 23, 2022
A lending iterator trait based on generic associated types and higher-rank trait bounds

A lending iterator trait based on higher-rank trait bounds (HRTBs) A lending iterator is an iterator which lends mutable borrows to the items it retur

Sebastiano Vigna 6 Oct 23, 2023
The Arbitrary trait

Arbitrary The trait for generating structured data from arbitrary, unstructured input. About The Arbitrary crate lets you construct arbitrary instance

Rust Fuzzing Authority 407 Dec 24, 2022
Extract patterns from unstructured log messages

logu logu is for extracting patterns from (streaming) unstructured log messages. For parsing unstructured logs, it uses the parser from Drain. In simp

null 78 Oct 21, 2024
the file filesystem: mount semi-structured data (like JSON) as a Unix filesystem

ffs: the file filesystem ffs, the file filesystem, lets you mount semi-structured data as a filesystem---a tree structure you already know how to work

Michael Greenberg 176 Dec 31, 2022
Valq - macros for querying and extracting value from structured data by JavaScript-like syntax

valq valq provides a macro for querying and extracting value from structured data in a very concise manner, like the JavaScript syntax. Look & Feel: u

Takumi Fujiwara 24 Dec 21, 2022
Parse RISC-V opcodes to provide more detailed structured data

riscv-opcodes-parser Parse RISC-V opcodes to provide more detailed structured data. License Licensed under either of Apache License, Version 2.0 (LICE

Sprite 2 Jul 30, 2022
A simple key-value store with a log-structured, append-only storage architecture where data is encrypted with AES GCM.

akvdb A simple key-value store with a log-structured, append-only storage architecture where data is encrypted with AES GCM. Modified from the actionk

Olle W 3 Oct 10, 2022
(early experiments toward) a version-control system for structured data

chit: (early experiments toward) a version-control system for structured data please note, very little is actually implemented here. this is not usefu

davidad (David A. Dalrymple) 3 Jul 24, 2023
A rust library for creating and managing logs of arbitrary binary data

A rust library for creating and managing logs of arbitrary binary data. Presently it's used to collect sensor data. But it should generally be helpful in cases where you need to store timeseries data, in a nearly (but not strictly) append-only fashion.

Yusuf Simonson 1 May 9, 2022
TestDrive automatically scrapes input/output data from BOJ(Baekjoon Online Judge) and runs tests for your executable binary file!

TestDrive What does it do? TestDrive automatically scrapes input/output data from BOJ(Baekjoon Online Judge) and runs tests for your executable bin

Hyeonseok Jung 3 Mar 5, 2022
A tool to deserialize data from an input encoding, transform it and serialize it back into an output encoding.

dts A simple tool to deserialize data from an input encoding, transform it and serialize it back into an output encoding. Requires rust >= 1.56.0. Ins

null 11 Dec 14, 2022
Structured, contextual, extensible, composable logging for Rust

Getting started Introduction FAQ Crate list slog-rs - The Logging for Rust Introduction (please read) slog is an ecosystem of reusable components for

slog-rs 1.4k Jan 3, 2023
WIP: Parse archived parler pages into structured html

parler-parse Parler HTML goes in (stdin), structured JSON comes out (stdout) Might be useful for feeding into elasticsearch or cross-referencing with

Christopher Tarquini 15 Feb 16, 2021