The Arbitrary trait

Overview

Arbitrary

The trait for generating structured data from arbitrary, unstructured input.

About

The Arbitrary crate lets you construct arbitrary instances of a type.

This crate is primarily intended to be combined with a fuzzer such as libFuzzer (with cargo-fuzz) or AFL, to help you turn the raw, untyped byte buffers they produce into well-typed, valid, structured values. This lets you combine structure-aware test case generation with coverage-guided, mutation-based fuzzers.

Documentation

Read the API documentation on docs.rs!

Example

Say you're writing a color conversion library, and you have an Rgb struct to represent RGB colors. You might want to implement Arbitrary for Rgb so that a test function can take arbitrary Rgb instances and assert some property (for example, that converting RGB to HSL and back to RGB always ends up exactly where it started).

Automatically Deriving Arbitrary

Automatically deriving the Arbitrary trait is the recommended way to implement Arbitrary for your types.

Automatically deriving Arbitrary requires you to enable the "derive" cargo feature:

# Cargo.toml

[dependencies]
arbitrary = { version = "1", features = ["derive"] }

And then you can simply add #[derive(Arbitrary)] annotations to your types:

// rgb.rs

use arbitrary::Arbitrary;

#[derive(Arbitrary)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

Implementing Arbitrary By Hand

Alternatively, you can write an Arbitrary implementation by hand:

// rgb.rs

use arbitrary::{Arbitrary, Result, Unstructured};

#[derive(Copy, Clone, Debug)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl<'a> Arbitrary<'a> for Rgb {
    fn arbitrary(u: &mut Unstructured<'a>) -> Result<Self> {
        let r = u8::arbitrary(u)?;
        let g = u8::arbitrary(u)?;
        let b = u8::arbitrary(u)?;
        Ok(Rgb { r, g, b })
    }
}
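To make the data flow of the hand-written implementation concrete, here is a standalone sketch that does not depend on the crate. The `rgb_from_bytes` function is a hypothetical stand-in for `Rgb::arbitrary`, not part of the `arbitrary` API: it consumes three bytes in field order and hands back the unused input. Returning `None` on short input is an assumption of this sketch; it does not model how `Unstructured` itself behaves when the data runs out.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Hypothetical stand-in for `Rgb::arbitrary` (not the crate's API).
/// Consumes one byte per field, in declaration order, and returns the
/// value together with whatever input is left over.
pub fn rgb_from_bytes(input: &[u8]) -> Option<(Rgb, &[u8])> {
    match input {
        [r, g, b, rest @ ..] => Some((Rgb { r: *r, g: *g, b: *b }, rest)),
        // Assumption of this sketch: fewer than three bytes is an error.
        _ => None,
    }
}
```

Calling `rgb_from_bytes(&[10, 20, 30, 40])` yields `Rgb { r: 10, g: 20, b: 30 }` together with the leftover slice `[40]`, which a later field or value could consume.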

License

Licensed under dual MIT or Apache-2.0 at your choice.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Arbitrary impl for char is slow

    The current implementation, when it hits an invalid character, loops downward by one bit each time. This is slow. All of the invalid char values are either between 0xD800 and 0xDFFF or greater than 0x10FFFF (the latter is already dealt with by the mask).

    This means that we spend a lot of time just looping here, especially when creating strings. We should instead just replace these with null or decrement the fourth nibble.

    opened by Manishearth 16
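The "decrement the fourth nibble" idea from this issue can be sketched in a few lines of standard-library Rust. `char_from_bits` is a hypothetical helper, not the crate's implementation, and folding the input with a modulo is this sketch's own choice rather than something the issue proposes.

```rust
/// Hypothetical helper (not the crate's code): map any `u32` to a valid
/// `char` in constant time, with no retry loop.
pub fn char_from_bits(bits: u32) -> char {
    // Fold into 0..=0x10FFFF. The modulo is an assumption of this sketch
    // and introduces a slight bias toward low code points.
    let mut code = bits % 0x11_0000;

    // Surrogates (0xD800..=0xDFFF) are the only remaining invalid values.
    // Decrementing the fourth nibble (0xD -> 0xC) moves them into
    // 0xC800..=0xCFFF, which are valid Unicode scalar values.
    if (0xD800..=0xDFFF).contains(&code) {
        code -= 0x1000;
    }

    char::from_u32(code).expect("code is a valid Unicode scalar value")
}
```

The trade-off is that code points in 0xC800..=0xCFFF become slightly more likely than their neighbours, which may or may not matter for a given fuzz target.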
  • Add lifetime parameter to Arbitrary trait. Remove shrinking functionality. Implement Arbitrary for &str.

    Fixes https://github.com/rust-fuzz/arbitrary/issues/43

    Note: This merges into a new branch staging-1.0 where we can put our changes prepping for a 1.0 release

    To-do:

    • [x] Remove Shrinkable
    • [x] Fix Cow lifetime
    • [x] Remove shrinking from README
    • [x] Save shrinking code to a gist
    opened by frewsxcv 14
  • Tracking issue for 0.3.0 release

    We have some nice improvements to this crate, and I think everything has technically been compatible with a 0.2.X release. Still, I think it is a good time to revisit some of the public API and trait designs and make a 0.3.0 release. I figured it would be good to open an issue to talk about what we might want in it.

    TODO

    • [x] Fill out public API doc comments, and add examples to each of them
    • [x] Better Arbitrary for String implementation (#17)
    • [x] Should Arbitrary have Debug as a super trait? (#7)
    • [x] Should Arbitrary have Clone as a super trait? This would allow easier/better shrinking implementations (e.g. for various collections)
    • [x] Refactor Unstructured to look more like FuzzedDataProvider? In particular, maybe we should get lengths from the end of the buffer like that does.
    • [x] Add Arbitrary::arbitrary_take_rest methods to consume the rest of an Unstructured. (#18)
    • [x] Add Arbitrary::size_hint and refactor Unstructured::container_size into Unstructured::collection_len, as discussed in #18.
    • [x] Add a fn arbitrary<A: Arbitrary>(&mut self) -> Result<A, Self::Error> helper method to Unstructured to shorten invoking nested-arbitrary calls, so you can just do let foo = u.arbitrary()?; and let type inference figure things out for you?

    As we surface more things to be done for 0.3.0 in the comments, I'll add them to this list, so that we have a single place for everything.


    +cc @Manishearth: what do you think? Anything we should add to or remove from this list?

    opened by fitzgen 13
  • SIGSEGV on certain values

    I think this is an issue with arbitrary, but let me know where I should move it if not.

    I'm observing an issue where derived arbitrary types trigger a SIGSEGV when running cargo-fuzz. Here's a minimal example:

    #![no_main]
    
    use libfuzzer_sys::fuzz_target;
    use arbitrary::Arbitrary;
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum Op {
        A(A),
        B(B),
    }
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum A {
        Leaf,
        B(Box<B>),
    }
    
    #[derive(Arbitrary, Clone, Debug, PartialEq)]
    pub enum B {
        Leaf,
        A(Box<A>),
    }
    
    fuzz_target!(|ops: Vec<Op>| {
    });
    

    Running this on the latest nightly (both mac and linux) triggers a SIGSEGV:

    INFO: Seed: 3703294504
    INFO: Loaded 1 modules   (17415 inline 8-bit counters): 17415 [0x10f5ef708, 0x10f5f3b0f), 
    INFO: Loaded 1 PC tables (17415 PCs): 17415 [0x10f5f3b10,0x10f637b80), 
    INFO:       59 files found in /Users/yusuf/Desktop/sandbox/fuzz-bug/fuzz/corpus/fuzz_target_1
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    ────────────────────────────────────────────────────────────────────────────────
    
    Error: Fuzz target exited with signal: 11
    

    Any of these changes will remove the segfault:

    1. Changing the fuzz target's input from Vec<Op> to just Op.
    2. Removing the B enum variant from A
    3. Removing the A enum variant from B
    opened by ysimonson 12
  • Allow custom arbitrary methods for fields in the custom derive

    Sometimes a field of a struct doesn't implement Arbitrary, and implementing it is either impossible (because the type is from another crate, for example) or undesired.

    We should support some kind of attribute to either use the Default::default implementation for the type, when we "don't care" about that field, or supply an arbitrary function that has the same signature as Arbitrary::arbitrary to be used instead.

    This is all possible to do when implementing Arbitrary by hand, in the sense that you can always stuff whatever code inside the impl. But this seems common enough and simple enough that we may want to support it in the custom derive.

    Straw person syntax:

    #[derive(Arbitrary)]
    struct MyThing {
        // This doesn't actually matter, just a diagnostic message, so don't
        // waste input bytes on it.
        #[arbitrary(default)]
        diagnostic: String,
    
        // This is a type defined by a foreign crate and doesn't implement
        // Arbitrary. Because of orphan rules, we can't implement Arbitrary
        // for it, so instead we have a one-off function.
        #[arbitrary(arbitrary = arbitrary_foreign_type)]
        foreign: foreign_crate::ForeignType,
    }
    
    fn arbitrary_foreign_type(u: &mut Unstructured)
        -> Result<foreign_crate::ForeignType>
    {
        // ...
    }
    

    The open question in my mind is how to deal with the other, optional Arbitrary methods, e.g. arbitrary_take_rest, shrink, and size_hint. Should we assume the default implementations in this case? Should we also allow providing custom implementations for them?

    Any thoughts @Manishearth?

    +cc @bnjbvr

    enhancement 
    opened by fitzgen 10
  • Support customization of fields on derive

    This PR addresses https://github.com/rust-fuzz/arbitrary/issues/33 and enables the following syntax for derive:

    #[derive(Arbitrary)]
    pub struct Rgb {
        // set `r` to Default::default()
        #[arbitrary(default)]
        pub r: u8,
    
        // set `g` to 255
        #[arbitrary(value = 255)]
        pub g: u8,
    
        // generate `b` with a custom function
        #[arbitrary(with = arbitrary_b)]
        pub b: u8,
    }
    
    fn arbitrary_b(u: &mut Unstructured) -> arbitrary::Result<u8> {
        u.int_in_range(64..=128)
    }
    

    Note

    When a custom function is used, the lower bound for size_hint is defined through core::mem::size_of::<T>(), which is correct for types allocated on the stack but not for heap-allocated types.

    opened by greyblake 8
  • Use fewer input bytes for arbitrary_loop

    Asking for an arbitrary bool to decide whether the loop should keep going consumes a byte per loop iteration, while calling int_in_range instead consumes at most four bytes for any combination of arguments, and often less.

    I'm trying to develop an intuition for what helps or hinders libFuzzer when driving a fuzz target that uses arbitrary, but I don't know enough. Do you suppose this is likely to work better? Do you have any advice on how to reason about questions like this?

    I've been thinking about a bounded_arbitrary_len (or arbitrary_bounded_len? arbitrary_len_in_range?) that takes optional min/max bounds like arbitrary_loop does, but takes bytes from the end like arbitrary_len does. That similarly could consume fewer bytes than just calling arbitrary_len and clamping the result. Is that likely to help a fuzzer explore the state space more effectively?

    opened by jameysharp 7
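The byte-count argument in this PR can be illustrated with a simplified model. Everything below is hypothetical: `Cursor`, `count_with_bools`, and `count_with_range` are stand-ins written for this sketch, not the crate's `Unstructured`, `arbitrary_loop`, or `int_in_range`, and the zero-on-exhaustion behaviour is an assumption of the model.

```rust
/// Simplified stand-in for a byte cursor (hypothetical; not the
/// `arbitrary` API). Tracks how many input bytes were actually used.
pub struct Cursor<'a> {
    data: &'a [u8],
    pub consumed: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Cursor { data, consumed: 0 }
    }

    /// Take one byte; assumption of this model: yields 0 once exhausted.
    fn next_byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((&byte, rest)) => {
                self.data = rest;
                self.consumed += 1;
                byte
            }
            None => 0,
        }
    }

    /// Strategy 1: one byte per iteration decides whether to keep going.
    pub fn count_with_bools(&mut self, max: u32) -> u32 {
        let mut n = 0;
        while n < max && (self.next_byte() & 1) == 1 {
            n += 1;
        }
        n
    }

    /// Strategy 2: read the count once, using only as many bytes as are
    /// needed to cover `max` (never more than four for a `u32`).
    pub fn count_with_range(&mut self, max: u32) -> u32 {
        let mut value: u32 = 0;
        let mut bytes: u32 = 0;
        while bytes < 4 && (max >> (bytes * 8)) > 0 {
            value = (value << 8) | u32::from(self.next_byte());
            bytes += 1;
        }
        if max == u32::MAX { value } else { value % (max + 1) }
    }
}
```

In this model, counting to 3 with an upper bound of 10 costs four bytes with the boolean strategy (three "continue" bytes and one "stop" byte) but a single byte with the ranged strategy. Whether that helps a coverage-guided fuzzer in practice is exactly the open question the PR raises.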
  • Questions about `Arbitrary`

    Hi! A few notes to preface this issue:

    • I asked @Manishearth this question a few weeks back and was told that I should open an issue.
    • This is not meant to be a judgement of the decisions made in this crate, especially since I am not aware of the constraints that y'all might have.

    The question: why did y'all opt for a QuickCheck-style, trait-based approach over a Hypothesis/Proptest-style, explicit shrinking object/struct approach? For context, a Hypothesis/Proptest-style checker would have a shrinking strategy that returns a tree. This tree would contain values and shrinking/expansion strategies. I've noticed that the biggest win of Hypothesis/Proptest-style approaches is that shrinkers/generators can be implemented for completely arbitrary (no pun intended!) types, so that orphan rules don't apply to generated code.

    That being said, I suspect that fuzzers might have different constraints than property testing, so this entire question could be taken with a grain of salt.

    If I recall correctly, @fitzgen mentioned to me at Strange Loop that they preferred QuickCheck-style to Proptest-style systems.

    opened by davidbarsky 7
  • fix(Unstructured)!: don't produce meaningless data if exhausted

    If the Unstructured is completely exhausted, returning a buffer completely filled with zeroes instead of indicating failure does not make sense. It also causes issues for any users of Arbitrary implementations for integers (which are based on this method) who rely on eventually getting a nonzero result to e.g. break recursion.

    Fixes #107.

    opened by Xiretza 6
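One reading of the behaviour this PR describes is sketched below using only the standard library. `fill_buffer` and `NotEnoughData` here are hypothetical names written for the sketch, not the crate's definitions, and zero-padding a partially filled buffer is an assumption: the description only says that a completely exhausted input should fail.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct NotEnoughData;

/// Hypothetical model of the proposed behaviour: fail when the input is
/// completely exhausted, instead of silently handing back zeroes.
pub fn fill_buffer(data: &mut &[u8], buf: &mut [u8]) -> Result<(), NotEnoughData> {
    if data.is_empty() && !buf.is_empty() {
        return Err(NotEnoughData);
    }
    let input: &[u8] = *data;
    let n = buf.len().min(input.len());
    let (head, tail) = input.split_at(n);
    buf[..n].copy_from_slice(head);
    // Assumption: a partially available input is still zero-padded.
    buf[n..].fill(0);
    *data = tail;
    Ok(())
}
```

With this shape, a recursive generator that waits for a nonzero byte to stop would see an error once the input is gone, rather than an endless stream of zeroes.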
  • Add support for arbitrary arrays

    A T: Default bound would remove all of the unsafe code, but it isn't currently possible for [T; N], and such a bound isn't desirable for types that don't implement Default.

    If https://github.com/rust-lang/rust/pull/75644 is going to be accepted, then the auxiliary functions won't be necessary in the near future.

    opened by c410-f3r 6
  • "Fill all available space" solution for container size

    Broadly, there are two kinds of things Arbitrary generates. Fixed-size objects that map to a fixed set of integers, and variable size objects, which can use something like container_size().

    Thing is, unlike, say, quickcheck, we already have a finite number of bytes as input. It would be nice if Arbitrary could be told "hey, I only have X bytes of data and you're my last consumer" as opposed to "slurp as much data as you want, I'll split the difference amongst the next consumers!"

    This seems pretty tricky to design in the general case, but we could initially settle for types like String and Vec just filling directly.

    opened by Manishearth 6
  • `arbitrary` v1.2.0 + `derive` doesn't build with `generate-lockfile -Z minimal-versions`

    It pulls in derive_arbitrary v1.1.6 which has the following build failure:

    error[E0599]: no method named `len` found for reference `&Fields` in the current scope
       --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/derive_arbitrary-1.1.6/src/lib.rs:219:30
        |
    219 |         if idx + 1 == fields.len() {
        |                              ^^^ method not found in `&Fields`
    

    Perhaps you could bump the derive_arbitrary requirement in arbitrary to 1.2.0 (it's presently 1.1.6) or yank the affected derive_arbitrary v1.1.6 release?

    opened by tarcieri 0
  • Is there any way to convert an arbitrary struct into unstructured raw bytes?

    I am new to arbitrary and cargo-fuzz. I have a seed corpus in JSON and I need to express it as raw bytes, so that my fuzzer can convert those bytes into a struct that derives Arbitrary.

    opened by last-las 0
  • Arbitrary derive fails with lifetimes in some cases

    Hello! Bear with me as I am new to Arbitrary.

    I have this type from juniper:

    pub struct Arguments<'a, S> {
        pub items: Vec<(Spanning<&'a str>, Spanning<InputValue<S>>)>,
    }
    

    Spanning and InputValue successfully derive(arbitrary::Arbitrary). When I try to derive on Arguments, I get the following error:

    error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
       |
    note: first, the lifetime cannot outlive the lifetime `'a` as defined here...
      --> juniper/src/ast.rs:60:22
       |
    60 | pub struct Arguments<'a, S> {
       |                      ^^
    note: ...so that the types are compatible
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
       = note: expected `<&'a str as Arbitrary<'_>>`
                  found `<&str as Arbitrary<'_>>`
    note: but, the lifetime must be valid for the lifetime `'arbitrary` as defined here...
      --> juniper/src/ast.rs:59:42
       |
    59 | #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
       |                                          ^^^^^^^^^^^^^^^^^^^^
    note: ...so that the types are compatible
    

    Manually implementing it works:

    impl<'a, S> arbitrary::Arbitrary<'a> for Arguments<'a, S>
    where
        S: arbitrary::Arbitrary<'a>,
    {
        fn arbitrary(u: &mut arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
            let items: Vec<(Spanning<&'a str>, Spanning<InputValue<S>>)> = u.arbitrary()?;
            Ok(Self { items })
        }
    }
    

    Even though I am unblocked, I figured I'd file as one would expect the derive to work in this case.

    opened by LegNeato 3
  • Allow `Unstructured` to be backed by an iterator of bytes instead of a byte slice?

    In our use case, we are not using Arbitrary for fuzzing, but simply for creating arbitrary fixture values in tests. Currently we create a 10MB static Vec<u8> of random noise and use that as our Unstructured data. However this is annoying since it requires 10MB of memory overhead, and sometimes even then we run out of bytes.

    I am wondering if it would be valid to have two flavors of Unstructured, one backed by a byte slice, presumably for fuzzing, and one backed by an infinite iterator of bytes which can never be exhausted. I experimented with a PR for doing this and got some basic tests passing, but don't know how valid it is in the grand scheme. I did find that some functionality is indeed dependent on the fixed byte slice, so at the very least some extra UX effort would have to be made to provide slightly different interfaces for different Unstructured flavors.

    I understand if this wouldn't be worth the effort but I guess I am primarily wondering from a motivational standpoint why Unstructured is backed by a byte array instead of an iterator, and only secondarily asking for some feedback on the feasibility of using infinite iterators. Note: I know next to nothing about fuzzing.

    opened by maackle 4
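The "infinite iterator of bytes" idea from this issue might look roughly like the following. This is a standalone sketch under stated assumptions: `NoiseBytes` and `rgb_fixture` are hypothetical names, the xorshift32 generator is just one convenient source of noise, and none of this is part of the `arbitrary` API.

```rust
/// Hypothetical endless byte source built on xorshift32.
pub struct NoiseBytes {
    state: u32,
}

impl NoiseBytes {
    /// A zero seed would make xorshift emit zeroes forever, so it is
    /// replaced with an arbitrary nonzero constant.
    pub fn new(seed: u32) -> Self {
        NoiseBytes { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
    }
}

impl Iterator for NoiseBytes {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        // Always `Some`: the source can never be exhausted.
        Some((x >> 24) as u8)
    }
}

/// Hypothetical fixture builder: pulls three bytes from any byte iterator.
pub fn rgb_fixture(bytes: &mut impl Iterator<Item = u8>) -> Option<(u8, u8, u8)> {
    Some((bytes.next()?, bytes.next()?, bytes.next()?))
}
```

Such a source needs no large preallocated buffer and is reproducible from its seed, but it gives up anything that depends on knowing how much input remains, which matches the issue's observation that some functionality relies on the fixed byte slice.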
  • WIP: Add `dearbitrary` functionality to turn an instance into its arbitrary byte sequence

    This PR solves issue #44 by implementing a dearbitrary function that creates a byte buffer, which can be fed back into arbitrary to recreate the struct. It is not a one-to-one mapping, as e.g. bools only use the last bit of a byte value.

    What works:

    Many parts are missing, as I do not need them at the moment and have no time to implement them...

    Remark: commits are prone to change/squashing/reordering!!

    opened by bitwave 4
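Why the mapping is not one-to-one can be seen in a small standalone sketch. `Pixel`, `decode`, and `encode` are hypothetical stand-ins for a derived type, `arbitrary`, and `dearbitrary` respectively; the sketch assumes that "last bit" means the least significant bit of the byte.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pixel {
    pub visible: bool,
    pub level: u8,
}

/// Stand-in for `arbitrary`: read the fields in declaration order.
/// Assumption: a bool is decoded from the least significant bit only.
pub fn decode(bytes: &[u8]) -> Option<Pixel> {
    match bytes {
        [flag, level, ..] => Some(Pixel {
            visible: (*flag & 1) == 1,
            level: *level,
        }),
        _ => None,
    }
}

/// Stand-in for `dearbitrary`: emit bytes that `decode` maps back to `p`.
pub fn encode(p: Pixel) -> Vec<u8> {
    vec![u8::from(p.visible), p.level]
}
```

Going from value to bytes and back always returns the same value, but the reverse does not hold: both `[0x01, 200]` and `[0xFF, 200]` decode to the same `Pixel`, and re-encoding yields only `[0x01, 200]`.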