Showcase for pathological compile times when using knuffel / chumsky / VERY LARGE types

Overview

netherquote

How to reproduce

The Rust toolchain version is already pinned to 1.59.0 stable. .cargo/config.toml defaults to lld, but that probably doesn't make a big difference here.

cargo run finishes in reasonable time, but cargo run --release spends a long time in "netherquote (bin)".

cargo timings don't show much. rustc self-profile shows a bunch of time spent in thin-LTO, but I don't know how to go much deeper.

$ summarize summarize netherquote-867123.mm_profdata | less
+-------------------------------------------------+-----------+-----------------+----------+------------+
| Item                                            | Self time | % of total time | Time     | Item count |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_passes                                     | 12.66s    | 24.018          | 12.69s   | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| finish_ongoing_codegen                          | 11.80s    | 22.376          | 11.80s   | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_module_optimize                            | 8.90s     | 16.879          | 8.90s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_thin_lto_import                            | 4.89s     | 9.274           | 4.89s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_module_codegen_emit_obj                    | 4.71s     | 8.942           | 4.71s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_lto_optimize                               | 4.55s     | 8.629           | 4.55s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module_perform_lto                      | 1.40s     | 2.658           | 15.67s   | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module                                  | 1.12s     | 2.125           | 1.38s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module_optimize                         | 1.01s     | 1.921           | 9.91s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| run_linker                                      | 232.54ms  | 0.441           | 232.54ms | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_fulfill_obligation                      | 209.15ms  | 0.397           | 345.49ms | 2308       |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| normalize_projection_ty                         | 201.78ms  | 0.383           | 206.45ms | 823        |
+-------------------------------------------------+-----------+-----------------+----------+------------+

cargo llvm-lines shows some types that are multiple pages long: these are generated by one of knuffel's derive macros, which uses chumsky under the hood.

These generate a frightening amount of LLVM IR lines, considering config.rs is barely over a hundred lines:

$ cargo llvm-lines | less
  Lines          Copies       Function name
  -----          ------       -------------
  322815 (100%)  7117 (100%)  (TOTAL)
   38985 (12.1%)  107 (1.5%)  <chumsky::combinator::Then<A,B> as chumsky::Parser<I,(O,U)>>::parse_inner
   21584 (6.7%)   331 (4.7%)  core::result::Result<T,E>::map
   21132 (6.5%)    36 (0.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner
   14220 (4.4%)   158 (2.2%)  <chumsky::combinator::Map<A,F,O> as chumsky::Parser<I,U>>::parse_inner
   14148 (4.4%)   108 (1.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner::zip_with
   12336 (3.8%)    48 (0.7%)  <chumsky::combinator::Repeated<A> as chumsky::Parser<I,alloc::vec::Vec<O>>>::parse_inner::{{closure}}
   10154 (3.1%)   158 (2.2%)  <chumsky::combinator::Map<A,F,O> as chumsky::Parser<I,U>>::parse_inner::{{closure}}
    6944 (2.2%)    14 (0.2%)  <chumsky::primitive::Choice<(X_,Y_,Z_),E> as chumsky::Parser<I,O>>::parse_inner
    6664 (2.1%)   180 (2.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner::{{closure}}
    5634 (1.7%)   112 (1.6%)  chumsky::stream::Stream<I,S>::attempt
    4680 (1.4%)    78 (1.1%)  chumsky::stream::Stream<I,S>::try_parse::{{closure}}
    4440 (1.4%)    24 (0.3%)  <chumsky::combinator::Repeated<A> as chumsky::Parser<I,alloc::vec::Vec<O>>>::parse_inner
    3807 (1.2%)   253 (3.6%)  <chumsky::debug::Silent as chumsky::debug::Debugger>::invoke
    3777 (1.2%)   251 (3.5%)  <chumsky::debug::Verbose as chumsky::debug::Debugger>::invoke
    3272 (1.0%)    14 (0.2%)  <chumsky::primitive::Filter<F,E> as chumsky::Parser<I,I>>::parse_inner
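
To illustrate why combinator-style parsers produce types this large, here is a minimal sketch using a toy `Parser` trait. The trait, `Char`, `Then`, and the helper functions are hypothetical and are not chumsky's actual API. Each `.then(...)` call wraps the previous parser type in another `Then<A, B>`, so the type name grows with every step, and each distinct type gets its own monomorphized copy of `parse`.

```rust
use std::any::type_name;

// Hypothetical toy parser trait; NOT chumsky's real API.
trait Parser {
    type Output;

    fn parse<'a>(&self, input: &'a str) -> Option<(Self::Output, &'a str)>;

    // Chaining wraps `Self` in a new, larger type.
    fn then<B: Parser>(self, other: B) -> Then<Self, B>
    where
        Self: Sized,
    {
        Then(self, other)
    }
}

// Matches a single expected character.
struct Char(char);

impl Parser for Char {
    type Output = char;

    fn parse<'a>(&self, input: &'a str) -> Option<(char, &'a str)> {
        let mut chars = input.chars();
        match chars.next() {
            Some(c) if c == self.0 => Some((c, chars.as_str())),
            _ => None,
        }
    }
}

// Runs `A`, then `B` on the remaining input.
struct Then<A, B>(A, B);

impl<A: Parser, B: Parser> Parser for Then<A, B> {
    type Output = (A::Output, B::Output);

    fn parse<'a>(&self, input: &'a str) -> Option<(Self::Output, &'a str)> {
        let (a, rest) = self.0.parse(input)?;
        let (b, rest) = self.1.parse(rest)?;
        Some(((a, b), rest))
    }
}

// Returns the full type name of a value.
fn type_name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// Returns the type-name lengths of a two-step and a four-step chain.
fn chain_type_name_lengths() -> (usize, usize) {
    let short = Char('a').then(Char('b'));
    let long = Char('a').then(Char('b')).then(Char('c')).then(Char('d'));
    (type_name_of(&short).len(), type_name_of(&long).len())
}

fn main() {
    let (short, long) = chain_type_name_lengths();
    println!("two-step chain:  {short} chars in type name");
    println!("four-step chain: {long} chars in type name");
}
```

A derive macro that builds one such chain per field and per enum variant ends up with many distinct, deeply nested types, which would be consistent with the copy counts above.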

This might have more to do with codegen + LLVM and less with "making rustc data structures / algorithms more efficient"; you be the judge!

Comments
  • Profiling results

    As the README for this repo notes, the llvm-lines output shows there is a lot of LLVM IR being generated. Multiple generic functions have 100+ instantiations.

    Times for non-incremental builds:

    • Check: 0.32s
    • Debug: 5.59s
    • Opt: 20.74s

    That's an unusually large spread across the three, which aligns with the large amount of LLVM IR.

    Cachegrind results corroborate this, here are the hottest functions:

    --------------------------------------------------------------------------------
    Ir                       file:function
    --------------------------------------------------------------------------------
    10,639,420,867 ( 4.74%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/Support/DJB.h:llvm::StringMapImpl::LookupBucketFor(llvm::StringRef)
     4,818,733,803 ( 2.14%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*, unsigned int)::{lambda(llvm::Value const*)#1}::operator()(llvm::Value const*) const [clone .isra.0]
     4,186,197,300 ( 1.86%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/SHA1.cpp:llvm::SHA1::hashBlock()
     3,302,861,263 ( 1.47%)  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
     3,297,098,498 ( 1.47%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/Support/DJB.h:llvm::StringMapImpl::FindKey(llvm::StringRef) const
     3,133,496,963 ( 1.39%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/MD5.cpp:llvm::MD5::body(llvm::ArrayRef<unsigned char>)
     2,858,892,768 ( 1.27%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/AttributeImpl.h:llvm::AttributeList::hasFnAttr(llvm::Attribute::AttrKind) const
     2,388,990,570 ( 1.06%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/SmallPtrSet.cpp:llvm::SmallPtrSetImplBase::FindBucketFor(void const*) const
     2,373,725,224 ( 1.06%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Metadata.cpp:llvm::Value::getMetadata(unsigned int) const
     2,285,195,990 ( 1.02%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::hasFnAttr(llvm::Attribute::AttrKind) const
     2,158,168,557 ( 0.96%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Instructions.cpp:llvm::CallBase::hasFnAttrOnCalledFunction(llvm::Attribute::AttrKind) const
     1,798,854,531 ( 0.80%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::hasAttributeAtIndex(unsigned int, llvm::Attribute::AttrKind) const
    

    ... and the results go on and on like that.

    cargo expand gives interesting results. There are a number of very large expressions. A couple of them result in a line that is over 5,500 chars long, long enough that rustfmt skips over it, making them hard to read :( None of them involve parse_inner or invoke; I guess that's a further layer down.

    chumsky is quite aggressive about inlining all its parse_inner functions; I wonder if that's relevant.

    The perf-book has some suggestions about reducing LLVM IR sizes. Perhaps these could be applied to chumsky.
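
    One technique commonly suggested for reducing LLVM IR from generic code is to keep the generic function a thin shim and move the real work into a non-generic inner function. The sketch below is only an illustration with hypothetical function names; it is not taken from chumsky, and whether the same shape fits its parse_inner functions is an open question.

```rust
use std::path::{Path, PathBuf};

// Fully generic version: the whole body is monomorphized once for
// every distinct `P` it is called with.
fn count_components_generic<P: AsRef<Path>>(path: P) -> usize {
    path.as_ref().components().count()
}

// Thin generic shim: only the `as_ref()` call is duplicated per `P`;
// the real work lives in a single non-generic `inner` function.
fn count_components<P: AsRef<Path>>(path: P) -> usize {
    fn inner(path: &Path) -> usize {
        path.components().count()
    }
    inner(path.as_ref())
}

fn main() {
    // Three different argument types, but `inner` is compiled only once.
    println!("{}", count_components("a/b/c"));
    println!("{}", count_components(String::from("a/b")));
    println!("{}", count_components(PathBuf::from("a")));
    println!("{}", count_components_generic("a/b/c"));
}
```

    Both versions return the same results; the difference is only in how much code is duplicated per instantiation.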

    opened by nnethercote
Owner
Amos Wenger
These days: writing Rust code, then writing articles about said Rust code. I love to teach! Into systems programming, networks, trying to get into hardware.