Showcase for pathological compile times when using knuffel / chumsky / VERY LARGE types

Amos Wenger

Last update: Jan 1, 2023

netherquote

Showcase for pathological compile times when using knuffel / chumsky / VERY LARGE types.

How to reproduce

The Rust toolchain is already pinned to 1.59.0 stable, and .cargo/config.toml defaults to lld as the linker, though that probably doesn't make a big difference here.

cargo run finishes in a reasonable time, but cargo run --release spends a long time on "netherquote (bin)".

The cargo timings report doesn't show much, and rustc's self-profile shows a lot of time spent in ThinLTO; I don't know how to go much deeper.

$ summarize summarize netherquote-867123.mm_profdata | less
+-------------------------------------------------+-----------+-----------------+----------+------------+
| Item                                            | Self time | % of total time | Time     | Item count |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_passes                                     | 12.66s    | 24.018          | 12.69s   | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| finish_ongoing_codegen                          | 11.80s    | 22.376          | 11.80s   | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_module_optimize                            | 8.90s     | 16.879          | 8.90s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_thin_lto_import                            | 4.89s     | 9.274           | 4.89s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_module_codegen_emit_obj                    | 4.71s     | 8.942           | 4.71s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_lto_optimize                               | 4.55s     | 8.629           | 4.55s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module_perform_lto                      | 1.40s     | 2.658           | 15.67s   | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module                                  | 1.12s     | 2.125           | 1.38s    | 16         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module_optimize                         | 1.01s     | 1.921           | 9.91s    | 17         |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| run_linker                                      | 232.54ms  | 0.441           | 232.54ms | 1          |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_fulfill_obligation                      | 209.15ms  | 0.397           | 345.49ms | 2308       |
+-------------------------------------------------+-----------+-----------------+----------+------------+
| normalize_projection_ty                         | 201.78ms  | 0.383           | 206.45ms | 823        |
+-------------------------------------------------+-----------+-----------------+----------+------------+

cargo llvm-lines shows some types that span multiple pages. They are generated by one of knuffel's derive macros, which uses chumsky under the hood.

These generate a frightening amount of LLVM IR lines, considering config.rs is barely over a hundred lines:

$ cargo llvm-lines | less
  Lines          Copies       Function name
  -----          ------       -------------
  322815 (100%)  7117 (100%)  (TOTAL)
   38985 (12.1%)  107 (1.5%)  <chumsky::combinator::Then<A,B> as chumsky::Parser<I,(O,U)>>::parse_inner
   21584 (6.7%)   331 (4.7%)  core::result::Result<T,E>::map
   21132 (6.5%)    36 (0.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner
   14220 (4.4%)   158 (2.2%)  <chumsky::combinator::Map<A,F,O> as chumsky::Parser<I,U>>::parse_inner
   14148 (4.4%)   108 (1.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner::zip_with
   12336 (3.8%)    48 (0.7%)  <chumsky::combinator::Repeated<A> as chumsky::Parser<I,alloc::vec::Vec<O>>>::parse_inner::{{closure}}
   10154 (3.1%)   158 (2.2%)  <chumsky::combinator::Map<A,F,O> as chumsky::Parser<I,U>>::parse_inner::{{closure}}
    6944 (2.2%)    14 (0.2%)  <chumsky::primitive::Choice<(X_,Y_,Z_),E> as chumsky::Parser<I,O>>::parse_inner
    6664 (2.1%)   180 (2.5%)  <chumsky::combinator::Or<A,B> as chumsky::Parser<I,O>>::parse_inner::{{closure}}
    5634 (1.7%)   112 (1.6%)  chumsky::stream::Stream<I,S>::attempt
    4680 (1.4%)    78 (1.1%)  chumsky::stream::Stream<I,S>::try_parse::{{closure}}
    4440 (1.4%)    24 (0.3%)  <chumsky::combinator::Repeated<A> as chumsky::Parser<I,alloc::vec::Vec<O>>>::parse_inner
    3807 (1.2%)   253 (3.6%)  <chumsky::debug::Silent as chumsky::debug::Debugger>::invoke
    3777 (1.2%)   251 (3.5%)  <chumsky::debug::Verbose as chumsky::debug::Debugger>::invoke
    3272 (1.0%)    14 (0.2%)  <chumsky::primitive::Filter<F,E> as chumsky::Parser<I,I>>::parse_inner
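
To see why combinator-built parsers end up with such long types, here is a minimal sketch. `Leaf` and `Then` are hypothetical stand-ins, not chumsky's actual types, and the aliases only mimic how a sequencing combinator embeds the types of its operands in its own type. The exact strings returned by `std::any::type_name` are not guaranteed, so the sketch compares lengths only.

```rust
use std::any::type_name;
use std::marker::PhantomData;

// Hypothetical stand-ins for parser combinators. These are NOT chumsky's types.
pub struct Leaf;
pub struct Then<A, B>(PhantomData<(A, B)>);

// Each level names the previous level twice, as a sequencing combinator would.
pub type Depth1 = Then<Leaf, Leaf>;
pub type Depth2 = Then<Depth1, Depth1>;
pub type Depth3 = Then<Depth2, Depth2>;
pub type Depth4 = Then<Depth3, Depth3>;

/// Length in bytes of the compiler's spelled-out name for `T`.
pub fn name_len<T>() -> usize {
    type_name::<T>().len()
}

/// Name lengths for nesting depths 1 through 4.
pub fn growth() -> [usize; 4] {
    [
        name_len::<Depth1>(),
        name_len::<Depth2>(),
        name_len::<Depth3>(),
        name_len::<Depth4>(),
    ]
}
```

Printing `growth()` should show the name length roughly doubling with each level of nesting. Each distinct instantiation of a generic combinator also gets its own monomorphized copy of its methods, which is consistent with the Copies column above.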

This might have more to do with codegen + LLVM and less with "making rustc data structures / algorithms more efficient". You be the judge!

Comments

Profiling results

As the README for this repo notes, the llvm-lines output shows there is a lot of LLVM IR being generated. Multiple generic functions have 100+ instantiations.

Times for non-incremental builds:

Check: 0.32s
Debug: 5.59s
Opt: 20.74s

That's an unusually large spread across the three, which aligns with the large amount of LLVM IR.
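
To put numbers on that spread, here is a quick sketch of the ratios, assuming all three times are in the same unit (seconds). The constant and function names are made up for illustration.

```rust
// Non-incremental build times from the list above, assumed to all be in seconds.
pub const CHECK_SECS: f64 = 0.32;
pub const DEBUG_SECS: f64 = 5.59;
pub const OPT_SECS: f64 = 20.74;

/// How many times longer `slower` takes than `faster`.
pub fn slowdown(slower: f64, faster: f64) -> f64 {
    slower / faster
}
```

With these numbers, the debug build takes roughly 17x as long as check, and the optimized build takes roughly 65x as long as check and 3.7x as long as debug.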

Cachegrind results corroborate this. Here are the hottest functions:

--------------------------------------------------------------------------------
Ir                       file:function
--------------------------------------------------------------------------------
10,639,420,867 ( 4.74%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/Support/DJB.h:llvm::StringMapImpl::LookupBucketFor(llvm::StringRef)
 4,818,733,803 ( 2.14%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*, unsigned int)::{lambda(llvm::Value const*)#1}::operator()(llvm::Value const*) const [clone .isra.0]
 4,186,197,300 ( 1.86%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/SHA1.cpp:llvm::SHA1::hashBlock()
 3,302,861,263 ( 1.47%)  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 3,297,098,498 ( 1.47%)  /home/njn/dev/rust0/src/llvm-project/llvm/include/llvm/Support/DJB.h:llvm::StringMapImpl::FindKey(llvm::StringRef) const
 3,133,496,963 ( 1.39%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/MD5.cpp:llvm::MD5::body(llvm::ArrayRef<unsigned char>)
 2,858,892,768 ( 1.27%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/AttributeImpl.h:llvm::AttributeList::hasFnAttr(llvm::Attribute::AttrKind) const
 2,388,990,570 ( 1.06%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/Support/SmallPtrSet.cpp:llvm::SmallPtrSetImplBase::FindBucketFor(void const*) const
 2,373,725,224 ( 1.06%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Metadata.cpp:llvm::Value::getMetadata(unsigned int) const
 2,285,195,990 ( 1.02%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::hasFnAttr(llvm::Attribute::AttrKind) const
 2,158,168,557 ( 0.96%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Instructions.cpp:llvm::CallBase::hasFnAttrOnCalledFunction(llvm::Attribute::AttrKind) const
 1,798,854,531 ( 0.80%)  /home/njn/dev/rust0/src/llvm-project/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::hasAttributeAtIndex(unsigned int, llvm::Attribute::AttrKind) const

... and the results go on and on like that.

cargo expand gives interesting results. There are a number of very large expressions. A couple of them result in a line that is over 5,500 chars long, long enough that rustfmt skips over them, making them hard to read :( None of them involve parse_inner or invoke; I guess those are a further layer down.

chumsky is quite aggressive about inlining all its parse_inner functions; I wonder if that's relevant.

The perf-book has some suggestions about reducing LLVM IR sizes. Perhaps these could be applied to chumsky.
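
One such suggestion for generic functions is to keep the generic part as a thin shim and move the real work into a non-generic inner function, so the body is turned into LLVM IR once rather than once per instantiation. A minimal sketch of that pattern follows. `count_words` is a hypothetical example and not chumsky code, and whether the pattern fits chumsky's combinators has not been checked here.

```rust
/// Generic shim: only the cheap `as_ref` conversion is monomorphized per caller type.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Non-generic body: compiled once, no matter how many `S` types call the shim.
    fn inner(text: &str) -> usize {
        text.split_whitespace().count()
    }
    inner(text.as_ref())
}
```

Callers can pass a `&str` or a `String` alike, and both instantiations share the single copy of `inner`.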

opened by nnethercote
