Lightweight parsing for Rust proc macros

Overview

Venial is a WIP parser for Rust proc macros.

When writing proc macros that need to parse Rust code (such as attribute and derive macros), the most common solution is to use the syn crate. Syn can parse arbitrary valid Rust code, and even Rust-based DSLs, and returns versatile data structures that can be inspected and mutated in powerful ways.

It's also extremely heavy. In one analysis of lqd's early 2022 benchmark collection, the author estimates that syn is responsible for 8% of the compile time of the benchmark, which covers Rust's most popular crates. There are subtleties (e.g. this isn't necessarily critical-path time, but syn is often on the critical path anyway), but the overall takeaway is clear: syn is expensive.

And yet, a lot of the power of syn is often unneeded. If we look at the crates that depend on syn, we can see that the 5 most downloaded are:

  • serde_derive
  • proc-macro-hack
  • pin-project-internal
  • anyhow
  • thiserror-impl

Of these, proc-macro-hack is deprecated, and the other four only need to parse basic information on a type.

Other popular reverse-dependencies of syn (such as futures-macro, tokio-macros, async-trait, etc.) do use syn's more advanced features, but there's still room for a lightweight parser in proc macros.

Venial is that parser.

Design

Venial is extremely simple. Most of its implementation is in the parse.rs file, which is about 350 lines at the time I'm writing this README. This is because the Rust language has a very clean syntax, especially for type declarations.

Venial has no dependencies besides proc-macro2 and quote.

To achieve this simplicity, venial makes several trade-offs:

  • It can only parse declarations (e.g. struct MyStruct {}). It can't parse expressions or statements. For now, only types and functions are supported.
  • It doesn't try to parse inside type expressions. For instance, if your struct includes a field like foo_bar: &mut Foo, venial will dutifully give you this type as a sequence of tokens and let you interpret it.
  • It doesn't attempt to recover gracefully from errors. Venial assumes you're running inside a derive macro, and thus that your input is statically guaranteed to be a valid type declaration. If it isn't, venial will summarily panic.

Note though that venial will accept any syntactically valid declaration, even if it isn't semantically valid. The rule of thumb is "if it compiles under a #[cfg(FALSE)], venial will parse it without panicking".

(Note: The above sentence is a lie; venial currently panics on unsupported declarations, e.g. traits, aliases, etc.)

Example

use venial::{parse_declaration, Declaration};
use quote::quote;

let enum_type = parse_declaration(quote!(
    enum Shape {
        Square(Square),
        Circle(Circle),
        Triangle(Triangle),
    }
));

let enum_type = match enum_type {
    Declaration::Enum(enum_type) => enum_type,
    _ => unreachable!(),
};

assert_eq!(enum_type.variants[0].0.name, "Square");
assert_eq!(enum_type.variants[1].0.name, "Circle");
assert_eq!(enum_type.variants[2].0.name, "Triangle");

Performance

I haven't performed any kind of formal benchmark yet. That said, I compared this fork of miniserde using venial to the equivalent miniserde commit, and got the following results:

$ cargo check -j1 # miniserde-venial, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 6.30s
$ cargo check -j1 # miniserde, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 9.52s

$ cargo check -j4 # miniserde-venial, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 3.17s
$ cargo check -j4 # miniserde, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 4.79s

My machine is a desktop computer with an AMD Ryzen 7 1800X (8 cores, 16 threads), 32GB of RAM, and a 2.5TB SSD.

As we can see, using venial instead of syn shaves about 3.2s off total build times in single-threaded builds, and 1.6s in 4-threaded builds.

Most of the difference comes from syn and venial themselves: cargo check --timings shows that syn takes 2.11s to compile and venial takes 0.58s in 4-threaded builds.

I'm not showing codegen builds, release mode builds, 16-thread builds and the like, but the trend stays roughly the same: for the miniserde project, switching to venial removes ~30% of the build time.

So... Is it worth it?

That's a fairly complicated question to answer. At the time I'm writing this section, my answer is "Probably, but I'm less enthusiastic than when I started the project".

If you take the most optimistic interpretation, this is great! On a single-threaded machine, switching shaves three seconds off, a whole third of the build time!

In reality, there are a lot of complicating factors:

  • Venial never improves incremental build times at all (since dependencies are cached, even when incremental compilation is off).
  • The gap between syn and venial narrows with any amount of multithreading.
  • I have a fairly powerful computer. Laptops might get more of a benefit from venial.
  • In projects bigger than miniserde, syn is usually one of many libraries being compiled at the same time. In some cases that means the build time of syn doesn't matter that much since it's compiled in parallel with other libraries. In other cases syn is on the critical path.
  • In practice, most clean builds are run by CI servers. To measure the usefulness of venial, you'd need to analyze the specs of the servers used in GitHub Actions / GitLab CI / whatever crater uses.

All in all, it's questionable whether the benefits are worth porting your derive crate from syn to venial (though my experience so far has been that porting isn't that hard).

Another thing to keep in mind is that this is a very young library. There has been very little effort to optimize or profile it so far, and future versions may bring bigger build-time reductions.

tl;dr: You can probably shave a few seconds off your clean builds with venial. Incremental builds see no benefit.

Contributions

Pull requests are welcome.

My current roadmap is:

  • Find any bugs there might be and fix them.
  • Port other projects from syn to venial and compare compile times.
  • Add GitHub Actions.

In the long term, I'd also like to add parsing for more use cases, while keeping the crate lightweight:

  • Parsing traits.
  • Parsing comma-separated expression lists.

With those, I believe venial would cover virtually all use cases that popular libraries use syn for.

Comments
  • Parse `impl` blocks, `type` and `const` declarations


    This pull request takes #27 as a base.

    Supports parsing of two impl block schemes:

    • Inherent: impl Type { ... }
    • Trait: impl Trait for Type { ... }

    Each block can contain members of three different categories:

    • functions
    • constants
    • associated types

    Examples:

    // Inherent impl
    impl<'a, T: Clone, const N: i8> MyStruct<'a, T, N> {}
    
    // Trait impl
    impl MyTrait for MyStruct {
        pub type MyType = std::string::String;
    
        fn new(i: i32, b: bool) -> Self {
            Self {}
        }
    
        #[attr]
        const fn set_value(&mut self, s: String) {}
    
        const CONSTANT: i8 = 24 + 7;
    }
    

    There are a few open questions on my side:

    1. I plan to also add support for traits. Regarding some of the new API types I added, there's the option to reuse them (forcing users to unwrap options) or to make dedicated, type-safe separate ones for trait and impl contexts. What is your preference? This would also affect the naming (e.g. I'm not happy with TypeDefinition). Examples:

      • Constants: trait has const CONSTANT: i8; impl has const CONSTANT: i8 = value;. Either we have 1 type representing both, with optional = and value, or 2 separate types.
      • Associated types: trait has type AssocTy: Clone; (can have bounds, but at the moment no default type initializers); impl has type AssocTy = MyType; (must have a type initializer, no bounds).
    2. I was thinking to add GenericArg/GenericArgList to represent <'a, T, U, N> -- in addition to GenericParam/GenericParamList for <'a: 'static, T, U: Clone, const N: usize>. Would that make sense? If yes, we could probably remove InlineGenericArgs and directly allow converting from parameters (unless you say we can't afford that copy).

    3. Is there a reason why the Debug impl for GenericParam does not include the tk_prefix? Might be good to check if it's a lifetime, a const or a type. https://github.com/PoignardAzur/venial/blob/b490dd70226cb2b46d63849154bd06a35ee566d3/src/types.rs#L486-L493

    4. There might be some room to make the naming more consistent (e.g. FunctionParameter vs. GenericParam) as well as other smaller refactorings, but it probably makes more sense to work out a proposal to parse trait first 🙂

    opened by Bromeon 16
  • Add fuzz test to ensure venial parses everything that syn does


    Hello :)

    This is the PR adding fuzzing to venial. It is not perfect, but on my machine it works well enough to find some differences. I will continue to improve it if you are happy with the general idea/implementation.

    Please let me know if anything fails to compile on your machine. I have had a few problems recently due to new rust toolchains and different OSes handling code coverage instrumentation differently. Usually it can be fixed somewhat easily though.

    How to use it

    The fuzzer uses an unreleased version of fuzzcheck. It is recommended to first install the cargo-fuzzcheck command line tool, as follows:

    cargo install --git "https://github.com/loiclec/fuzzcheck-rs"
    

    And it is also necessary to use the nightly version of the Rust compiler.

     # install the nightly toolchain, then:
    rustup override set nightly
    

    Then you can run:

    cargo fuzzcheck fuzz::fuzz_parse --profile fuzzing
    

    Updates about the progress are printed in the terminal, for example:

    5972ms  242234 simplest_cov(609 cov: 2720/4456 cplx: 63.19) failures(1) iter/s 40438
    

    Some files will be printed inside the fuzz/ folder. If a test failure is found, it will be located inside fuzz/artifacts/. You can stop the fuzzer at any point by pressing Ctrl+C.

    If you find a test failure but you think it is too large to really understand what is going on, you can run the following to minify the test failure:

    cargo fuzzcheck fuzz::fuzz_parse --command minify --input-file "path/to/artifact.rs"
    

    That will run the fuzzer repeatedly, each time with a reduced upper bound on the complexity of the generated token streams. The minified artifacts will be located in path/to/artifact.rs.minified/{cplx}-{hash}.rs. When it seems like no progress is being made anymore, press Ctrl+C and look for the simplest artifact in the folder.

    About the code

    There is only one property being tested so far, which checks that venial can parse everything that syn does. This code is located in fn test_parse_declaration(..). It also only parses DeriveInput and not ItemFn, because as far as I understand function parsing isn't as polished yet in venial.

    The code that actually launches the fuzz test is inside #[test] fn fuzz_parse(). I have commented it lightly, but let me know if you have any questions about it. You can also read about how fuzzcheck works in general at fuzzcheck.neocities.org . The main difficulty was to observe the code coverage of both venial and syn. So instead of using the convenience functions offered by fuzzcheck::builder, I have created a “sensor” and a “pool” manually. In the future, I will improve fuzzcheck’s APIs so that it isn't necessary.

    Limitations of the fuzzer

    There are a few limitations of the fuzzer, some of which can be fixed easily, and some of which are just the results of a compromise between “diversity of generated token streams” and performance. The limitations are:

    • Just a few hardcoded identifiers are generated. These include all Rust keywords as well as some simple ones like a, b, c. The complexity of all identifiers is the same. That is, fuzzcheck considers auto and a to be equally complex, so it won't minify e.g. struct auto {} into struct a {}.

    • No raw identifiers are generated.

    • Similarly, just a few hardcoded literals are generated as well. That is because I don't think venial cares about the content of the literals. If that is not true, let me know.

    • No groups with Delimiter::None are generated. That is because I don't yet have a grasp on what the purpose of Delimiter::None is. Furthermore, allowing Delimiter::None to be generated means that we have to use a more verbose serialisation format for the artifacts. That is because there is no visible difference between e.g. an identifier: ident and an identifier in a group without delimiters: ident.

    • Almost no awareness of the Rust grammar is used (yet). The fuzzer doesn't try to generate valid Rust syntax. That is partly by design, and partly because it is additional work. It would be nice to prioritise combination of punctuations that we know have a meaning in Rust (e.g. ->, ..=, ..., ::, etc.). I will work on that at some point later.

    • Some of the more advanced functionality of fuzzcheck isn't used. It is possible to optimise for different goals, such as finding a set of N test cases that cover the most code, or finding the test case that allocates the most, or has the highest time complexity, etc.

    That's it, I think?

    Let me know if it works for you. If you try it now, you should quickly (i.e. in 2 minutes at most) find a failing test case:

    enum a { a = () > () }
    
    opened by loiclec 16
  • venial v0.5 release


    Would it be possible to release a crates.io version some time in September?

    There have been quite a few features going into venial since v0.4, and it would be nice if other published crates could make use of them. Personally I plan to release a first version of godot-rust (GDExtension) in the near future, and this is not possible on crates.io with Git dependencies.

    Once everything becomes more stable and forward-compatibility is a topic, #[non_exhaustive] could be applied to structs/enums as well. This could also be done in a v0.6 after getting some feedback from v0.5 (which will likely lead to some breaking changes). In that light, I'd also argue it's better to get a version out soon, advertise it and collect user input, rather than trying to find the perfect API in the darkness 😉

    One feature I still plan on implementing is traits. Should be relatively quick to do as a lot of the foundation is now built, but it might make sense to have mod support merged first (https://github.com/PoignardAzur/venial/pull/31).

    opened by Bromeon 8
  • Support function receivers `self`, `mut self`, `&self`, `&mut self`


    Adds support for receiver parameters self, mut self, &self, &mut self.

    Also captures several "non-content" tokens: the ( ) parameter list group, the : name-type separator, and the -> return type separator.

    This PR contains breaking changes, as it splits FunctionParameter up:

    #[derive(Clone, Debug)]
    pub enum FunctionParameter {
        Receiver(FunctionReceiverParameter),
        Typed(FunctionTypedParameter),
    }
    
    // self, mut self, &self, &mut self
    #[derive(Clone, Debug)]
    pub struct FunctionReceiverParameter {
        pub attributes: Vec<Attribute>,
        pub tk_ref: Option<Punct>,
        pub tk_mut: Option<Ident>,
        pub tk_self: Ident,
    }
    
    // name: type
    #[derive(Clone, Debug)]
    pub struct FunctionTypedParameter {
        pub attributes: Vec<Attribute>,
        pub tk_mut: Option<Ident>,
        pub name: Ident,
        pub tk_colon: Punct,
        pub ty: TyExpr,
    }
    
    opened by Bromeon 8
  • Reject trailing tokens after declaration


    Currently, venial parses struct Good {} bad and silently ignores bad or any other trailing junk tokens. I'm not sure what the best way to deal with that is, but I think it should be an error.

    (Origin issue: https://github.com/jcaesar/structstruck/issues/4)

    bug 
    opened by jcaesar 5
  • Modifying a parsed declaration does not modify its token stream


    Apologies if the following is expected behaviour, but I find it highly unintuitive.

    I have some code that is trying to modify the type of a struct field, it boils down to:

    use proc_macro2::{Ident, TokenTree}; // v1.0.37
    use quote::{quote, ToTokens}; // v1.0.17
    use venial::{parse_declaration, Declaration, NamedField, Punctuated}; // v0.2.1
    
    fn main() {
        let mut dec = parse_declaration(quote!(
            struct Foo {
                a: i32,
            }
        ));
        match &mut dec {
            Declaration::Struct(dec) => match &mut dec.fields {
                venial::StructFields::Named(nf) => {
                    let (mut field, punct) = nf.fields[0].clone();
                    field.ty.tokens = match &field.ty.tokens[0] {
                        TokenTree::Ident(ident) => {
                            vec![TokenTree::Ident(Ident::new("i64", ident.span()))]
                        }
                        _ => todo!(),
                    };
                    let mut new_fields: Punctuated<NamedField> = Default::default();
                    new_fields.push(field, Some(punct));
                    nf.fields = new_fields;
                    // nf.fields doesn't implement DerefMut, so we can't modify internally...
                    //nf.fields[0].0.ty.tokens = vec![TokenTree::Ident(retype)];
                }
                _ => todo!(),
            },
            _ => todo!(),
        }
    
        println!("{dec:#?}");
        println!("{:#?}", dec.to_token_stream());
    }
    

    This prints:

    Struct(
        Struct {
            attributes: [],
            vis_marker: None,
            _struct: Ident(
                struct,
            ),
            name: Ident(
                Foo,
            ),
            generic_params: None,
            where_clause: None,
            fields: Named(
                [
                    NamedField {
                        attributes: [],
                        vis_marker: None,
                        name: Ident(
                            a,
                        ),
                        _colon: Punct {
                            char: ':',
                            spacing: Alone,
                        },
                        ty: [
                            i64, ← ✔️ Expected
                        ],
                    },
                ],
            ),
            _semicolon: None,
        },
    )
    TokenStream [
        Ident {
            sym: struct,
        },
        Ident {
            sym: Foo,
        },
        Group {
            delimiter: Brace,
            stream: TokenStream [
                Ident {
                    sym: a,
                },
                Punct {
                    char: ':',
                    spacing: Alone,
                },
                Ident {
                    sym: i32, ← ❌ Unexpected
                },
                Punct {
                    char: ',',
                    spacing: Alone,
                },
            ],
        },
    ]
    

    Is this a bug?

    opened by jcaesar 5
  • Re-export missing type names


    Previously forgotten. Does it make sense to have

    pub use types::*;
    

    ? This would have prevented that problem (and likely similar future ones). I would argue that if a type is public inside the types module, it's intended to be exported, otherwise we could use pub(crate). But maybe I'm missing something.

    bug 
    opened by Bromeon 4
  • Unballanced brackets in type expression, when not ending with comma.


    I decided to give venial a try for my proc macros as it seems easier than syn for my purposes. However, I ran into a bug where the proc macro crashes with the error Unballanced brackets in type expression. despite the Rust code in question being valid.

    The rust code in question:

    #[derive(Clone, RluaUserData)]
    struct Example<A: Clone> {
        _test: A,
    }
    

    Curiously, adding a , after the Clone will make this error go away, as in this code works:

    #[derive(Clone, RluaUserData)]
    struct Example<A: Clone,> {
        _test: A,
    }
    

    repo that this happened in: https://github.com/lenscas/tealr/tree/chore/improve_proc_macros relevant files are https://github.com/lenscas/tealr/blob/chore/improve_proc_macros/tealr_derive/src/lib.rs#L48 (function that declares the derive macro)

    https://github.com/lenscas/tealr/blob/388e7133a1b66cc9ee915e36ddac428bf1979329/tealr_derive/src/user_data.rs#L21 (function that creates the trait impl. I commented everything out to make sure the crash was not in my own code)

    https://github.com/lenscas/tealr/blob/388e7133a1b66cc9ee915e36ddac428bf1979329/tealr/examples/rlua/derive.rs#L20 (example I use for testing, which contains the changes to get venial to panic)

    opened by lenscas 4
  • Added the ability to parse any functions


    This pull request adds functions that enable the parsing of any function, as demonstrated in the tests.rs file. This will allow this crate to be used for serious projects which use function attributes (such as various web frameworks) in the future.

    The following is just copy-pasted from the tests.rs file and showcases what this new feature can parse:

    fn hello(a: i32, b: f32) -> String {}
    fn test_me() {}
    fn generic<T, B>(a: T) -> B {}
    fn where_clause<T>() -> T
    where
        T: Debug
    {}
    #[my_attr]
    fn my_attr_fn(a: i32) {}
    pub fn visibility(b: f32) {}
    
    opened by clemmsch 4
  • Implement `trait`


    Parses traits, largely reusing the impl implementation. I added code to parse_impl.rs, didn't have a good new name for the file, so I left it for now...

    Unfortunately this introduces a breaking change, because existing TyDefinition did not account for : Bounds and optional = initializer. In general, we should probably mark everything #[non_exhaustive] to at least have the option to add fields in non-breaking ways.

    But no urgency from my side for any release, just thought I'd implement this while I'm in the flow 😉

    enhancement 
    opened by Bromeon 2
  • Implement `unsafe impl`


    Builds on top of #31, which should be merged first

    Relevant diff: https://github.com/PoignardAzur/venial/pull/33/commits/8f3644da954bebee5405bda1713b945299795b7f ~~Relevant diff: https://github.com/PoignardAzur/venial/pull/33/commits/c4499e98e7b683e451b3a14e945ffd5591a9826e~~

    enhancement 
    opened by Bromeon 2
  • Rollback iterator to avoid cloning tokens


    We now have 4 occurrences of this:

    // TODO consider multiple-lookahead instead of potentially cloning many tokens
    

    Over the last few days I was thinking about how to approach that best. First I had the idea of a multi-peek iterator (which would allow looking ahead several tokens instead of just 1). While thinking about how to apply this in venial, another idea came up, which seemed more ergonomic to me: a rollback iterator, meaning that you would normally call next() but could roll back at any time to a previously set backup point.

    Rough sketch:

    fn consume_stuff(tokens: &mut TokenIter) {
        let tokens = tokens.start_attempt(); // new iterator type, or runtime bool flag
    
        ... // call next() many times
    
        if not_what_i_expected {
            tokens.rollback(); // revert to state at beginning of the function
        }
    
    } // drop would commit the attempt
    

    (it's also possible to have explicit commit and rollback-on-drop instead)

    This can be implemented via VecDeque, which keeps the tokens since the backup point in the buffer. When rolling back, the iterator would pop_front() the buffer first, before calling next() on the underlying iterator. Otherwise it would advance as usual.
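As a std-only sketch of that design (illustrative; `RollbackIter` and its methods are hypothetical names, not venial's actual `TokenIter`):

```rust
use std::collections::VecDeque;

/// Iterator wrapper that can revert to a previously set backup point.
struct RollbackIter<I: Iterator> {
    inner: I,
    /// Items to replay after a rollback, before touching `inner` again.
    replay: VecDeque<I::Item>,
    /// Items consumed since the last backup point (None = no attempt active).
    attempt: Option<Vec<I::Item>>,
}

impl<I: Iterator> RollbackIter<I>
where
    I::Item: Clone,
{
    fn new(inner: I) -> Self {
        Self { inner, replay: VecDeque::new(), attempt: None }
    }

    /// Set a backup point; items consumed from here on are recorded.
    fn start_attempt(&mut self) {
        self.attempt = Some(Vec::new());
    }

    /// Forget the backup point, keeping the iterator where it is.
    fn commit(&mut self) {
        self.attempt = None;
    }

    /// Revert to the last backup point: recorded items are yielded again.
    fn rollback(&mut self) {
        if let Some(recorded) = self.attempt.take() {
            // Recorded items must replay before anything already queued.
            for item in recorded.into_iter().rev() {
                self.replay.push_front(item);
            }
        }
    }
}

impl<I: Iterator> Iterator for RollbackIter<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Drain the replay buffer first, then fall back to the inner iterator.
        let item = self.replay.pop_front().or_else(|| self.inner.next());
        if let (Some(recorded), Some(item)) = (self.attempt.as_mut(), item.as_ref()) {
            recorded.push(item.clone());
        }
        item
    }
}
```

Here `start_attempt()` records consumed items in a side buffer; `rollback()` pushes them onto a `VecDeque` that is drained before the underlying iterator is touched again, matching the pop_front() behaviour described above.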

    One thing to note: for rollbacks to work, the basic TokenIter type which is currently used everywhere would have to be a different iterator type that supports this buffer. It's not possible to only selectively use a rollback iterator in the places where rollback is used, because the more general TokenIter cannot rewind. So we would need to think which of the Iterator methods are worth overriding. Probably size_hint(), but I'm a bit reluctant to touch all the others for possible optimization, at the big risk of introducing bugs.

    What do you think about this?

    enhancement 
    opened by Bromeon 3
  • Split functions behind feature-gate.


    The main design goal of Venial is to provide a proc-macro parser with minimal build time, primarily for derive macros.

    Derive macros only need to consider three syntax elements: structs, enums, and unions. There should be a feature flag so that users who only need these don't pay the build time price of all the other features.

    (Or maybe not. The first thing we should do is measure how much time these features actually add.)

    good first issue 
    opened by PoignardAzur 0
  • Failure to parse macro-generated struct


    (This issue was pointed out to me by @dtolnay on https://github.com/jcaesar/structstruck/issues/1)

    When parsing a struct generated by a macro_rules! macro (e.g. one where the visibility comes from a macro variable), some tokens may be in delimiter-less groups, e.g.

    TokenStream [
        Group {
            delimiter: None,
            stream: TokenStream [
                Ident {
                    ident: "pub",
                    span: #0 bytes(196..199),
                },
            ],
            span: #4 bytes(85..96),
        },
        Ident {
            ident: "struct",
            span: #4 bytes(97..103),
        },
        Ident {
            ident: "Struct",
            span: #4 bytes(104..110),
        },
        Group {…}
      }
    ]
    

    This fails to parse with panic message cannot parse declaration: expected keyword struct/enum/union/fn, found token Group { delimiter: None, stream: TokenStream [Ident { ident: "pub", span: #0 bytes(196..199) }], span: #4 bytes(85..96) }.

    I'm not sure what the correct way of handling this is. Delimiter-less groups can be important in some situations, but I don't think there are any in struct parsing. If that holds true, you could just normalize such groups away. (It may be very weird to have something like foo $blubb Bar with $blubb substituted by : Hax, hax2 : — but I don't see why it should be forbidden.)
    It might be worth checking how syn handles this kind of input; it definitely doesn't panic on the example in my gist.

    opened by jcaesar 2
  • Replacement for split_for_impl() on Generics?


    Syn offers https://docs.rs/syn/latest/syn/struct.Generics.html#method.split_for_impl for their generics to easily add implementations. I couldn't find an obvious way how to do this with venial. Is there a way, and if so, what would be the correct way?

    opened by MTRNord 2
  • Listing the differences between venial and syn using a fuzzer


    Hi!

    About a year ago I wrote a crate called decent-synquote-alternative, which has basically the same goal as venial. It is not as well written though: I didn't have a clear scope for it and I wrote it in a very naive way. It is also not well tested at all. But it offers an alternative to quote! that I thought was more ergonomic and could maybe make compile times better. I used that crate to write the fuzzcheck proc macros.

    Anyway, now I'm considering replacing decent-synquote-alternative with venial. It has almost every feature I want, and it seems to be better written. So I thought I'd contribute :)

    I am in the process of releasing a crate called fuzzcheck_proc_macro which makes it possible to fuzz-test functions that take a TokenStream as input. I have tested the first prototype on venial. The first fuzz-test I tried is, basically:

    fn test_parse_declaration(token_stream: proc_macro2::TokenStream) {
        let result = std::panic::catch_unwind(|| {
            crate::parse_declaration(token_stream.clone())
        });
        let syn_result: Result<syn::DeriveInput, _> = syn::parse2(token_stream);
        match syn_result {
            Ok(derive_input) => {
                if let Ok(_) = &result {
                    // OK! syn and venial can both parse the input
                } else {
                    panic!("syn can parse but venial can't");
                }
            }
            Err(_) => {}
        }
        // + same but with syn parsing `ItemFn` instead
    }
    

    So it tries to parse an arbitrary token stream and ensures that venial accepts everything that syn accepts.

    I tested it on commit b09a976d44849e0c8573f5d8da4f5a8e50b35aab, and within a few minutes, I found a few examples that syn is happy to parse but venial can't:

    // an inner attribute within a struct/enum
    enum A {
        #![some_inner_attribute]
    }
    
    // closure expressions can be the discriminant of an enum variant, and they can contain a comma inside matching `|`. 
    // venial advances to the next comma that isn't inside matching angle brackets only. 
    // So it tries to parse the next variant starting at `c |`, which fails.
    enum A {
         A = |b, c| { b + c }  
    }
    
    // macro in the where clause: venial thinks the content of the macro is the content of the enum
    // so it tries to parse `:` as a variant
    enum A where b !{ : } : { }
    
    // (almost?) any pattern as function argument will make venial fail but not syn
    fn x([x, ..]: a)
    
    // c variadics not supported by venial
    fn x(...) {}
    
    // function parameter with no pattern, see: https://github.com/dtolnay/syn/issues/159 
    fn x(u8) {}
    
    // function with a self parameter
    fn x(mut self) { }
    

    There might be more, but I think that's all I found so far. I visualised the code coverage of the fuzz-test (using fuzzcheck-view) within venial and it is 100%. The code coverage within syn is also good, but it takes much longer to reach all relevant code. There, running the fuzzer for about an hour (at least for the first run) might be necessary.

    Most of those differences are completely acceptable I think. In that case, we can write a function fn accept_input(x: &syn::DeriveInput) -> bool that returns true whenever we're okay with venial not parsing it. Then we can call this function before panicking within the fuzz-test, which ensures it keeps making progress.

    A more interesting fuzz-test would try to compare the results between syn and venial and ensure that they match. That shouldn't be too complicated to write but will take a bit more time.

    Let me know if you're interested in using a fuzzer for venial, then I can work a bit more on it and publish what I have on GitHub. Otherwise, I hope the list I have provided is already useful :)

    Have a good day!

    opened by loiclec 6
Owner: Olivier FAURE