A minimal `syn` syntax tree pretty-printer

prettyplease::unparse


Overview

This is a pretty-printer to turn a syn syntax tree into a String of well-formatted source code. In contrast to rustfmt, this library is intended to be suitable for arbitrary generated code.

Rustfmt prioritizes high-quality output that is impeccable enough that you'd be comfortable spending your career staring at its output — but that means some heavyweight algorithms, and it has a tendency to bail out on code that is hard to format (for example rustfmt#3697, and there are dozens more issues like it). That's not necessarily a big deal for human-generated code because when code gets highly nested, the human will naturally be inclined to refactor into more easily formattable code. But for generated code, having the formatter just give up leaves it totally unreadable.

This library is designed using the simplest possible algorithm and data structures that can deliver about 95% of the quality of rustfmt-formatted output. In my experience testing real-world code, approximately 97-98% of output lines come out identical between rustfmt's formatting and this crate's. The rest have slightly different linebreak decisions, but still clearly follow the dominant modern Rust style.

The tradeoffs made by this crate are a good fit for generated code that you will not spend your career staring at. For example, the output of bindgen, or the output of cargo-expand. In those cases it's more important that the whole thing be formattable without the formatter giving up, than that it be flawless.


Feature matrix

Here are a few superficial comparisons of this crate against the AST pretty-printer built into rustc, and rustfmt. The sections below go into more detail comparing the output of each of these libraries.

| | prettyplease | rustc | rustfmt |
|---|---|---|---|
| non-pathological behavior on big or generated code | ✔️ | ❌ | ❌ |
| idiomatic modern formatting ("locally indistinguishable from rustfmt") | ✔️ | ❌ | ✔️ |
| throughput | 60 MB/s | 39 MB/s | 2.8 MB/s |
| number of dependencies | 3 | 72 | 66 |
| compile time including dependencies | 2.4 sec | 23.1 sec | 29.8 sec |
| buildable using a stable Rust compiler | ✔️ | ❌ | ❌ |
| published to crates.io | ✔️ | ❌ | ❌ |
| extensively configurable output | ❌ | ❌ | ✔️ |
| intended to accommodate hand-maintained source code | ❌ | ❌ | ✔️ |

Comparison to rustfmt

If you weren't told which output file is which, it would be practically impossible to tell — except for line 435 in the rustfmt output, which is more than 1000 characters long because rustfmt just gave up formatting that part of the file:

            match segments[5] {
                0 => write!(f, "::{}", ipv4),
                0xffff => write!(f, "::ffff:{}", ipv4),
                _ => unreachable!(),
            }
        } else { # [derive (Copy , Clone , Default)] struct Span { start : usize , len : usize , } let zeroes = { let mut longest = Span :: default () ; let mut current = Span :: default () ; for (i , & segment) in segments . iter () . enumerate () { if segment == 0 { if current . len == 0 { current . start = i ; } current . len += 1 ; if current . len > longest . len { longest = current ; } } else { current = Span :: default () ; } } longest } ; # [doc = " Write a colon-separated part of the address"] # [inline] fn fmt_subslice (f : & mut fmt :: Formatter < '_ > , chunk : & [u16]) -> fmt :: Result { if let Some ((first , tail)) = chunk . split_first () { write ! (f , "{:x}" , first) ? ; for segment in tail { f . write_char (':') ? ; write ! (f , "{:x}" , segment) ? ; } } Ok (()) } if zeroes . len > 1 { fmt_subslice (f , & segments [.. zeroes . start]) ? ; f . write_str ("::") ? ; fmt_subslice (f , & segments [zeroes . start + zeroes . len ..]) } else { fmt_subslice (f , & segments) } }
    } else {
        const IPV6_BUF_LEN: usize = (4 * 8) + 7;
        let mut buf = [0u8; IPV6_BUF_LEN];
        let mut buf_slice = &mut buf[..];

This is a pretty typical manifestation of rustfmt bailing out in generated code — a chunk of the input ends up on one line. The other manifestation is that you're working on some code, running rustfmt on save like a conscientious developer, but after a while notice it isn't doing anything. You introduce an intentional formatting issue, like a stray indent or semicolon, and run rustfmt to check your suspicion. Nope, it doesn't get cleaned up — rustfmt is just not formatting the part of the file you are working on.

The prettyplease library is designed to have no pathological cases that force a bail out; the entire input you give it will get formatted in some "good enough" form.

Separately, rustfmt can be problematic to integrate into projects. It's written using rustc's internal syntax tree, so it can't be built by a stable compiler. Its releases are not regularly published to crates.io, so in Cargo builds you'd need to depend on it as a git dependency, which precludes publishing your crate to crates.io also. You can shell out to a rustfmt binary, but that'll be whatever rustfmt version is installed on each developer's system (if any), which can lead to spurious diffs in checked-in generated code formatted by different versions. In contrast prettyplease is designed to be easy to pull in as a library, and compiles fast.


Comparison to rustc_ast_pretty

This is the pretty-printer that gets used when rustc prints source code, such as rustc -Zunpretty=expanded. It's used also by the standard library's stringify! when stringifying an interpolated macro_rules AST fragment, like an $:expr, and transitively by dbg! and many macros in the ecosystem.

Rustc's formatting is mostly okay, but does not hew closely to the dominant contemporary style of Rust formatting. Some things wouldn't ever be written on one line, like this match expression, and certainly not with a comma in front of the closing brace:

fn eq(&self, other: &IpAddr) -> bool {
    match other { IpAddr::V4(v4) => self == v4, IpAddr::V6(_) => false, }
}

Some places use non-multiple-of-4 indentation, which is definitely not the norm:

pub const fn to_ipv6_mapped(&self) -> Ipv6Addr {
    let [a, b, c, d] = self.octets();
    Ipv6Addr{inner:
                 c::in6_addr{s6_addr:
                                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF,
                                  0xFF, a, b, c, d],},}
}

And although there isn't an egregious example of it in the link because the input code is pretty tame, in general rustc_ast_pretty has pathological behavior on generated code. It has a tendency to use excessive horizontal indentation and rapidly run out of width:

::std::io::_print(::core::fmt::Arguments::new_v1(&[""],
                                                 &match (&msg,) {
                                                      _args =>
                                                      [::core::fmt::ArgumentV1::new(_args.0,
                                                                                    ::core::fmt::Display::fmt)],
                                                  }));

The snippets above are clearly different from modern rustfmt style. In contrast, prettyplease is designed to have output that is practically indistinguishable from rustfmt-formatted code.


Example

// [dependencies]
// prettyplease = "0.1"
// syn = { version = "1", default-features = false, features = ["full", "parsing"] }

const INPUT: &str = stringify! {
    use crate::{
          lazy::{Lazy, SyncLazy, SyncOnceCell}, panic,
        sync::{ atomic::{AtomicUsize, Ordering::SeqCst},
            mpsc::channel, Mutex, },
      thread,
    };
    impl<T, U> Into<U> for T where U: From<T> {
        fn into(self) -> U { U::from(self) }
    }
};

fn main() {
    let syntax_tree = syn::parse_file(INPUT).unwrap();
    let formatted = prettyplease::unparse(&syntax_tree);
    print!("{}", formatted);
}

Algorithm notes

The approach and terminology used in the implementation are derived from Derek C. Oppen, "Pretty Printing" (1979), on which rustc_ast_pretty is also based, and from rustc_ast_pretty's implementation written by Graydon Hoare in 2011 (and modernized over the years by dozens of volunteer maintainers).

The paper describes two language-agnostic interacting procedures Scan() and Print(). Language-specific code decomposes an input data structure into a stream of string and break tokens, and begin and end tokens for grouping. Each begin-end range may be identified as either "consistent breaking" or "inconsistent breaking". If a group is consistently breaking, then if the whole contents do not fit on the line, every break token in the group will receive a linebreak. This is appropriate, for example, for Rust struct literals, or arguments of a function call. If a group is inconsistently breaking, then the string tokens in the group are greedily placed on the line until out of space, and linebroken only at those break tokens for which the next string would not fit. For example, this is appropriate for the contents of a braced use statement in Rust.
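
As a rough illustration of the two breaking modes, here is a minimal sketch using only the standard library. It assumes a single unnested group whose breaks are each one space wide, and it ignores indentation entirely. The names `Breaks` and `layout` are hypothetical and are not part of prettyplease's API.

```rust
/// How the breaks of a group behave once the group does not fit on one line.
#[derive(Clone, Copy)]
enum Breaks {
    /// All-or-nothing: every break becomes a newline.
    Consistent,
    /// Greedy: a break becomes a newline only if the next string would not fit.
    Inconsistent,
}

/// Lay out `items`, separated by one-space breaks, within `width` columns.
/// Simplified on purpose: one flat group, no nesting, no indentation.
fn layout(items: &[&str], width: usize, mode: Breaks) -> String {
    let flat = items.join(" ");
    if flat.len() <= width {
        // The whole group fits, so no break turns into a newline.
        return flat;
    }
    match mode {
        Breaks::Consistent => items.join("\n"),
        Breaks::Inconsistent => {
            let mut out = String::new();
            let mut col = 0;
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    if col + 1 + item.len() > width {
                        // The next string would overflow: break here.
                        out.push('\n');
                        col = 0;
                    } else {
                        out.push(' ');
                        col += 1;
                    }
                }
                out.push_str(item);
                col += item.len();
            }
            out
        }
    }
}

fn main() {
    let items = ["Lazy,", "SyncLazy,", "SyncOnceCell,", "Mutex,"];
    println!("{}\n", layout(&items, 20, Breaks::Consistent));
    println!("{}", layout(&items, 20, Breaks::Inconsistent));
}
```

With a width of 20, the consistent layout puts each item on its own line, while the inconsistent layout packs two items per line.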

Scan's job is to efficiently accumulate sizing information about groups and breaks. For every begin token we compute the distance to the matched end token, and for every break we compute the distance to the next break. The algorithm uses a ringbuffer to hold tokens whose size is not yet ascertained. The maximum size of the ringbuffer is bounded by the target line length and does not grow indefinitely, regardless of deep nesting in the input stream. That's because once a group is sufficiently big, the precise size can no longer make a difference to linebreak decisions and we can effectively treat it as "infinity".
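
The sizing pass can be sketched in a simplified, non-streaming form. This version is only an approximation of the idea: it stores sizes in a plain `Vec` instead of a bounded ringbuffer, so it does not reproduce the memory bound or the "infinity" cutoff described above, and it assumes every break is one column wide when not broken. `Token`, `scan`, and `sample` are hypothetical names for this example.

```rust
enum Token {
    Text(&'static str),
    Break, // one space when flat
    Begin,
    End,
}

/// Returns one size per token:
/// - Text: its length
/// - Begin: flat width of everything up to the matching End
/// - Break: its own space plus the text up to the next break or end at its level
/// - End: 0
fn scan(tokens: &[Token]) -> Vec<usize> {
    let mut sizes = vec![0; tokens.len()];
    let mut start = vec![0; tokens.len()]; // running total when the token was seen
    let mut stack: Vec<usize> = Vec::new(); // indices of unresolved Begin/Break tokens
    let mut total = 0; // flat width of everything seen so far

    for (i, token) in tokens.iter().enumerate() {
        // A pending break is settled by the next break or end at the same level.
        if matches!(token, Token::Break | Token::End) {
            if let Some(&top) = stack.last() {
                if matches!(tokens[top], Token::Break) {
                    stack.pop();
                    sizes[top] = total - start[top];
                }
            }
        }
        match token {
            Token::Text(s) => {
                sizes[i] = s.len();
                total += s.len();
            }
            Token::Begin => {
                start[i] = total;
                stack.push(i);
            }
            Token::Break => {
                start[i] = total;
                stack.push(i);
                total += 1;
            }
            Token::End => {
                // The matching Begin is now on top of the stack.
                if let Some(begin) = stack.pop() {
                    sizes[begin] = total - start[begin];
                }
            }
        }
    }
    // Anything still open at end of input extends to the end.
    while let Some(open) = stack.pop() {
        sizes[open] = total - start[open];
    }
    sizes
}

/// Token stream for the flat text "a, bb, c d" with "c d" as a nested group.
fn sample() -> Vec<Token> {
    vec![
        Token::Begin,
        Token::Text("a,"),
        Token::Break,
        Token::Text("bb,"),
        Token::Break,
        Token::Begin,
        Token::Text("c"),
        Token::Break,
        Token::Text("d"),
        Token::End,
        Token::End,
    ]
}

fn main() {
    println!("{:?}", scan(&sample()));
}
```

In this sample the outer begin token receives size 10, the width of the flat text, and the second break receives size 4 because the nested group after it counts as a whole.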

Print's job is to use the sizing information to efficiently assign a "broken" or "not broken" status to every begin token. At that point the output is easily constructed by concatenating string tokens and breaking at break tokens contained within a broken group.
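
A minimal sketch of that decision step follows, assuming the sizes have already been computed and are carried on the tokens (here they are filled in by hand). It also assumes a fixed 4-column indent per broken group, which is a simplification chosen for this example rather than a statement about the crate's internals. `Token`, `Frame`, `print_tokens`, and `sample` are hypothetical names.

```rust
#[derive(Clone, Copy)]
enum Token {
    Text(&'static str),
    /// `size`: one space plus the text up to the next break at this level.
    Break { size: usize },
    /// `size`: flat width of the whole group.
    Begin { size: usize, consistent: bool },
    End,
}

/// Per-group state decided once, when the Begin token is printed.
struct Frame {
    broken: bool,
    consistent: bool,
    indent: usize,
}

fn print_tokens(tokens: &[Token], width: usize) -> String {
    let mut out = String::new();
    let mut col = 0; // current column
    let mut stack: Vec<Frame> = Vec::new();
    for token in tokens {
        match *token {
            Token::Text(s) => {
                out.push_str(s);
                col += s.len();
            }
            Token::Begin { size, consistent } => {
                let outer = stack.last().map_or(0, |f| f.indent);
                // The group is broken exactly when it cannot fit on the rest of the line.
                let broken = col + size > width;
                let indent = if broken { outer + 4 } else { outer };
                stack.push(Frame { broken, consistent, indent });
            }
            Token::End => {
                stack.pop();
            }
            Token::Break { size } => match stack.last() {
                // Consistent groups break everywhere; inconsistent ones only on overflow.
                Some(f) if f.broken && (f.consistent || col + size > width) => {
                    out.push('\n');
                    out.push_str(&" ".repeat(f.indent));
                    col = f.indent;
                }
                _ => {
                    out.push(' ');
                    col += 1;
                }
            },
        }
    }
    out
}

/// Tokens for the flat text "aaaa, bbbb, [cc, dd], eeee]" (27 columns),
/// where "[cc, dd]," is a nested inconsistent group of 9 columns.
fn sample(consistent: bool) -> Vec<Token> {
    vec![
        Token::Begin { size: 27, consistent },
        Token::Text("aaaa,"),
        Token::Break { size: 6 },
        Token::Text("bbbb,"),
        Token::Break { size: 10 },
        Token::Begin { size: 9, consistent: false },
        Token::Text("[cc,"),
        Token::Break { size: 5 },
        Token::Text("dd],"),
        Token::End,
        Token::Break { size: 6 },
        Token::Text("eeee]"),
        Token::End,
    ]
}

fn main() {
    println!("{}\n", print_tokens(&sample(true), 16));
    println!("{}", print_tokens(&sample(false), 16));
}
```

At width 16 the outer group is broken while the nested group still fits on its line, so only the outer breaks turn into newlines.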

Leveraging these primitives (i.e. cleverly placing the all-or-nothing consistent breaks and greedy inconsistent breaks) to yield rustfmt-compatible formatting for all of Rust's syntax tree nodes is a fun challenge.

Here is a visualization of some Rust tokens fed into the pretty printing algorithm. Consistently breaking begin-end pairs are represented by «», inconsistently breaking by ‹›, break by ·, and the rest of the non-whitespace are string.

use crate::«{·
‹    lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,·
    panic,·
    sync::«{·
‹        atomic::«{·‹AtomicUsize,· Ordering::SeqCst›·}»,·
        mpsc::channel,· Mutex›,·
    }»,·
    thread›,·
}»;·
«‹«impl<«·T‹›,· U‹›·»>» Into<«·U·»>· for T›·
where·
    U:‹ From<«·T·»>›,·
{·
«    fn into(·«·self·») -> U {·
‹        U::from(«·self·»)›·
»    }·
»}·

The algorithm described in the paper is not quite sufficient for producing well-formatted Rust code that is locally indistinguishable from rustfmt's style. The reason is that in the paper, the complete non-whitespace contents are assumed to be independent of linebreak decisions, with Scan and Print being only in control of the whitespace (spaces and line breaks). In Rust as idiomatically formatted by rustfmt, that is not the case. Trailing commas are one example; the punctuation is only known after the broken vs non-broken status of the surrounding group is known:

let _ = Struct { x: 0, y: true };

let _ = Struct {
    x: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    y: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy,   //<- trailing comma if the expression wrapped
};

The formatting of match expressions is another case; we want small arms on the same line as the pattern, and big arms wrapped in a brace. The presence of the brace punctuation, comma, and semicolon are all dependent on whether the arm fits on the line:

match total_nanos.checked_add(entry.nanos as u64) {
    Some(n) => tmp = n,   //<- small arm, inline with comma
    None => {
        total_secs = total_secs
            .checked_add(total_nanos / NANOS_PER_SEC as u64)
            .expect("overflow in iter::sum over durations");
    }   //<- big arm, needs brace added, and also semicolon^
}

The printing algorithm implementation in this crate accommodates all of these situations with conditional punctuation tokens whose selection can be deferred and populated after it's known that the group is or is not broken.
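
One way to picture a conditional punctuation token is the sketch below. It is not the crate's actual data structure: it assumes a single group, decides the broken status up front from the flat width instead of deferring it during scanning, and uses the hypothetical names `Piece`, `render`, and `struct_literal`.

```rust
#[derive(Clone, Copy)]
enum Piece {
    Text(&'static str),
    /// A space when the group is flat, a newline plus `indent` spaces when broken.
    Break { indent: usize },
    /// Punctuation whose text depends on whether the group ended up broken.
    Conditional { if_broken: &'static str, if_flat: &'static str },
}

fn render(pieces: &[Piece], width: usize) -> String {
    // Width of the group if it were printed on one line.
    let flat_width: usize = pieces
        .iter()
        .map(|piece| match piece {
            Piece::Text(s) => s.len(),
            Piece::Break { .. } => 1,
            Piece::Conditional { if_flat, .. } => if_flat.len(),
        })
        .sum();
    let broken = flat_width > width;

    let mut out = String::new();
    for piece in pieces {
        match *piece {
            Piece::Text(s) => out.push_str(s),
            Piece::Break { indent } => {
                if broken {
                    out.push('\n');
                    out.push_str(&" ".repeat(indent));
                } else {
                    out.push(' ');
                }
            }
            // Only now, with the broken status known, is the punctuation chosen.
            Piece::Conditional { if_broken, if_flat } => {
                out.push_str(if broken { if_broken } else { if_flat });
            }
        }
    }
    out
}

/// A struct literal whose last field gets a trailing comma only when wrapped.
fn struct_literal() -> Vec<Piece> {
    vec![
        Piece::Text("let _ = Struct {"),
        Piece::Break { indent: 4 },
        Piece::Text("x: 0,"),
        Piece::Break { indent: 4 },
        Piece::Text("y: true"),
        Piece::Conditional { if_broken: ",", if_flat: "" },
        Piece::Break { indent: 0 },
        Piece::Text("};"),
    ]
}

fn main() {
    println!("{}\n", render(&struct_literal(), 80));
    println!("{}", render(&struct_literal(), 20));
}
```

The same shape could express the brace and semicolon of a match arm: one piece that renders as nothing when the arm fits inline and as the extra punctuation when it wraps.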


License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments

  • "not implemented: Item::Macro2" panic when trying to print libcore

    I expect not yet supported features (e.g. macro by example v2) to be printed unprettily instead of panicking.

    thread 'main' panicked at 'not implemented: Item::Macro2 `macro assert_matches { ($ left : expr , $ (|) ? $ ($ pattern : pat_param) |+ $ (if $ guard : expr) ? $ (,) ?) => { match $ left { $ ($ pattern) |+ $ (if $ guard) ? => { } ref left_val => { $ crate :: panicking :: assert_matches_failed (left_val , $ crate :: stringify ! ($ ($ pattern) |+ $ (if $ guard) ?) , $ crate :: option :: Option :: None) ; } } } , ($ left : expr , $ (|) ? $ ($ pattern : pat_param) |+ $ (if $ guard : expr) ?, $ ($ arg : tt) +) => { match $ left { $ ($ pattern) |+ $ (if $ guard) ? => { } ref left_val => { $ crate :: panicking :: assert_matches_failed (left_val , $ crate :: stringify ! ($ ($ pattern) |+ $ (if $ guard) ?) , $ crate :: option :: Option :: Some ($ crate :: format_args ! ($ ($ arg) +))) ; } } } , }`', /home/vi/.cargo/registry/src/-3d9d141e372ea94e/prettyplease-0.1.10/src/item.rs:169:9
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/c0672870491e84362f76ddecd50fa229f9b06dff/library/std/src/panicking.rs:584:5
       1: core::panicking::panic_fmt
                 at /rustc/c0672870491e84362f76ddecd50fa229f9b06dff/library/core/src/panicking.rs:142:14
       2: prettyplease::item::<impl prettyplease::algorithm::Printer>::item
       3: prettyplease::item::<impl prettyplease::algorithm::Printer>::item_mod
                 at /home/vi/.cargo/registry/src/-3d9d141e372ea94e/prettyplease-0.1.10/src/item.rs:183:17
       4: prettyplease::item::<impl prettyplease::algorithm::Printer>::item
                 at /home/vi/.cargo/registry/src/-3d9d141e372ea94e/prettyplease-0.1.10/src/item.rs:25:32
       5: prettyplease::file::<impl prettyplease::algorithm::Printer>::file
                 at /home/vi/.cargo/registry/src/-3d9d141e372ea94e/prettyplease-0.1.10/src/file.rs:13:13
       6: prettyplease::unparse
                 at /home/vi/.cargo/registry/src/-3d9d141e372ea94e/prettyplease-0.1.10/src/lib.rs:373:5
       7: syn_file_expand_cli::main
                 at ./crates/syn-file-expand-cli/src/main.rs:140:28
       8: core::ops::function::FnOnce::call_once
                 at /rustc/c0672870491e84362f76ddecd50fa229f9b06dff/library/core/src/ops/function.rs:248:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    Aborted
    
    opened by vi 3
  • `Item::Verbatim` support

    Would you be open to accepting a PR to add support for Item::Verbatim by using quote::ToTokens::to_tokens().to_string() or thereabouts?

    If so, would you prefer this to be added under an optional feature?

    opened by adetaylor 2
  • Trailing comma in tuple removed

    Hi,

    I am generating routes for warp, with function return types that look like this:

    warp::filters::BoxedFilter<(impl Reply,)>
    

    You can see the same in a warp example.

    prettyplease removes the trailing comma inside the tuple and the code does not compile.

    Is this inside or outside the 95% that you wish to cover?

    opened by 6xzo 2
  • Fails to format multiline doc comments containing comment end characters

    Hi,

    I tried using this for generating code based on some schema and the comments in that schema contain file glob patterns that contain the comment end characters */. It seems like this library fails to handle them although it works when just using the code inside a proc macro. See https://github.com/mohe2015/prettyplease-bug/blob/main/src/main.rs for a reproducible example.

    Kind regards Moritz

    opened by mohe2015 1
  • Broken formatting of varargs

    prettyplease formats this code

    extern "C" {
        fn printf(format: u8, ...);
    }
    

    into this code

    extern "C" {
        fn printf(format: u8...);
    }
    

    The comma before the vararg was lost.

    opened by Nilstrieb 1
  • Divergence from rustfmt around ranges and closures.

    I'm not sure if this is really a valid issue, but I noticed a divergence from rustfmt on a simple case.

    I would expect calling map on a range to produce a flattened 3 line expression. Instead, it produces a 4 line expression where map is dropped to the next line. It seems like prettyplease is eagerly expanding to next line.

    This code is formatted by rustfmt:

    fn main() {
        let b = (0..10).map(|f| async {
            let b = 10;
        });
    }
    

    This is the result from prettyplease:

    fn main() {
        let b = (0..10)
            .map(|f| async {
                let b = 10;
            });
    }
    

    I'm not sure if you care per se, but it would be convenient if these two cases aligned.

    opened by jkelleyrtp 1
  • Implicit requirement of valid `Item`s for formatting

    Currently the implementation requires a valid syn::File rather than an arbitrary tokenstream (well, an almost correct one), already stated in #5 and #6 .

    To bridge this gap the knee jerk reaction is:

    let f = syn::parse2::<syn::File>(tokenstream)?;
    prettyplease::unparse(&f);
    

    but it has a practical issue. Unlike rustfmt, it doesn't try its best to fix the content, which is a hindrance when developing a proc-macro. But when would one be interested in the generated code beyond development or debugging? I personally would see it as a viable usecase for prettyplease :)

    Context: https://github.com/drahnr/expander uses the host's rustfmt

    Note that I am sorry to open the can of #5 and #6 again, I think this adds some more insight why a Tokenstream based API might be useful.

    Thank you for your continued, awesome work! :heart:

    opened by drahnr 1
  • Support for `syn::Expr`?

    I'm building an autoformatter for a macro and would like to support autoformatting arbitrary rust expressions inside said macro.

    Prettyplease is amazing for this :)

    Unfortunately, the only API accessible for printing is through the syn::File type - which means I have to manually wrap each expression with fn main() { #tokens } and then trim off the beginning/end.

    I saw the Printer type - but it's not exposed to just print the Expr type.

    Is this something you'd be interested in supporting? Any guidance on how to add it?

    opened by jkelleyrtp 1
  • Open brace of multi-line `else if` is excessively indented

    Prettyplease formats the following tokens as:

    fn repro() {
        if input.peek(Token![+]) || input.peek(Token![-]) || input.peek(Token![*])
            || input.peek(Token![/])
        {
            return;
        } else if input.peek(Token![+]) || input.peek(Token![-]) || input.peek(Token![*])
                || input.peek(Token![/])
            {
            return;
        }
    }
    

    The last open brace, and probably the line above it, should be less indented.

    opened by dtolnay 0
  • Expose prettyplease version number for downstream build.rs to inspect

    What version of the formatter a generated file has been formatted with can be relevant.

    In cargo-expand I'd like to include this number in the output of --version --verbose.

    opened by dtolnay 0
  • Fix indentation of multiline return types

    Before:

    pub async fn stream(
        self,
    ) -> Result<
            impl futures::Stream<Item = Result<T, tokio_postgres::Error>>,
            tokio_postgres::Error,
        > {
        let stmt = self.stmt().await?;
    }
    

    After: this matches rustfmt's formatting.

    pub async fn stream(
        self,
    ) -> Result<
        impl futures::Stream<Item = Result<T, tokio_postgres::Error>>,
        tokio_postgres::Error,
    > {
        let stmt = self.stmt().await?;
    }
    
    opened by dtolnay 0
Releases(0.1.22)
Owner
David Tolnay