0.3.0 was released today, and I was surprised by the zip file size:
|File name|0.2.0
|0.3.0
|
|----|----|----|
|i686-pc-windows-msvc
|1.74 MB|3.36 MB|
|x86_64-apple-darwin
|1.7 MB|3.41 MB|
|x86_64-pc-windows-msvc
|1.74 MB|3.36 MB|
Clearly something has changed!
Note that these are compressed sizes reported on the GitHub releases page. The rest of this analysis will focus on uncompressed executable sizes. For reference, the size of the uncompressed x86_64
Windows exe is 4,410,880 bytes for 0.2.0, and 7,805,952 bytes for 0.3.0.
My first suspect was ureq
, added in #11. This crate handles HTTP requests. I ran some high-level analysis with cargo-bloat
and was able to verify that #11 increased the executable size by about 1.3 MB and added about 800 functions (after optimizations including inlining and dead code elimination). None of these new functions show up in the top-20 list provided by cargo-bloat
, meaning they are all less than 23 KB.
The dependency tree for ureq
points to some things that can likely be removed or replaced with a smaller alternative:
ureq
v2.3.0
├── base64 v0.13.0
├── chunked_transfer v1.4.0
├── log v0.4.14
│ └── cfg-if v1.0.0
├── once_cell v1.8.0
├── rustls v0.20.0
│ ├── log v0.4.14 (*)
│ ├── ring v0.16.20
│ │ ├── spin v0.5.2
│ │ ├── untrusted v0.7.1
│ │ └── winapi v0.3.9
│ ├── sct v0.7.0
│ │ ├── ring v0.16.20 (*)
│ │ └── untrusted v0.7.1
│ └── webpki v0.22.0
│ ├── ring v0.16.20 (*)
│ └── untrusted v0.7.1
├── serde v1.0.130
│ └── serde_derive v1.0.130 (proc-macro)
│ ├── proc-macro2 v1.0.32
│ │ └── unicode-xid v0.2.2
│ ├── quote v1.0.10
│ │ └── proc-macro2 v1.0.32 (*)
│ └── syn v1.0.81
│ ├── proc-macro2 v1.0.32 (*)
│ ├── quote v1.0.10 (*)
│ └── unicode-xid v0.2.2
├── serde_json v1.0.69
│ ├── itoa v0.4.8
│ ├── ryu v1.0.5
│ └── serde v1.0.130 (*)
├── url v2.2.2
│ ├── form_urlencoded v1.0.1
│ │ ├── matches v0.1.9
│ │ └── percent-encoding v2.1.0
│ ├── idna v0.2.3
│ │ ├── matches v0.1.9
│ │ ├── unicode-bidi v0.3.7
│ │ └── unicode-normalization v0.1.19
│ │ └── tinyvec v1.5.0
│ │ └── tinyvec_macros v0.1.0
│ ├── matches v0.1.9
│ └── percent-encoding v2.1.0
├── webpki v0.22.0 (*)
└── webpki-roots v0.22.1
└── webpki v0.22.0 (*)
Number of dependencies is not a great proxy for contribution to executable size, but rustls
, serde
, and url
looks like the heaviest transitive dependencies. We need TLS, but ureq
can use either rustls
or native-tls
under the hood, so we might get a smaller exe just by switching? That's one theory to test.
On the other hand, serde_json
is not necessary at all! I use it for conveniently parsing the JSON response from the GitHub API. But JSON parsing can be done with other crates like the aptly-named json
, which has zero additional dependencies (beyond std
): https://crates.io/crates/json/0.12.4/dependencies Switching JSON parsing to the json
crate may or may not reduce overall exe size. That's another theory to test. (Aside: serde
is used by other crates, so getting rid of all of it is highly unlikely.)
We need to handle URLs, so I guess the url
crate is strictly necessary. For my particular use-case, I don't need to care about IDNA at all. Maybe there is something I could do to trick ureq
into not calling into the url
crate, and LTO will trim all the useless IDNA stuff. That's theory three.
But that doesn't account for all 3,395,072 bytes added between the two releases.
My second suspicion was something added when updating dependencies in #28. cargo-bloat
pointed out an additional 1.8 MB from about 550 new functions added by this specific PR. Combined, that's roughly 3.1 MB added to the exe size from two PRs. That falls into the ballpark.
This time, there are two differences of note in the dependency tree. The first is rfd
(which handles dialog windows).
Used in 0.2.0:
rfd
v0.5.1
├── raw-window-handle v0.3.3
│ └── libc v0.2.107
└── winapi v0.3.9
Used in 0.3.0:
rfd
v0.6.4
├── raw-window-handle v0.4.2
│ └── cty v0.2.2
└── windows v0.30.0
└── windows_x86_64_msvc v0.30.0
Could windows-rs
be responsible? It doesn't seem like it, since that crate is almost entirely a flat hierarchy of leaf functions. In my experience, these kinds of bloat issues tend to be caused by deeply nested functions with code paths that will never run: https://github.com/rust-console/cargo-n64/commit/8cd9123fd176a96e68b0c0edcaf72788775137c2
The next suspect was ... ureq
again! A minor update added support for gzip, which includes a small tree of its own with `flate2:
ureq
v2.4.0
ureq v2.4.0
├── base64 v0.13.0
├── chunked_transfer v1.4.0
├── flate2 v1.0.22
│ ├── cfg-if v1.0.0
│ ├── crc32fast v1.3.1
│ │ └── cfg-if v1.0.0
│ ├── libc v0.2.113
│ └── miniz_oxide v0.4.4
│ └── adler v1.0.2
├── log v0.4.14
│ └── cfg-if v1.0.0
├── once_cell v1.9.0
├── rustls v0.20.2
│ ├── log v0.4.14 (*)
│ ├── ring v0.16.20
│ │ ├── spin v0.5.2
│ │ ├── untrusted v0.7.1
│ │ └── winapi v0.3.9
│ ├── sct v0.7.0
│ │ ├── ring v0.16.20 (*)
│ │ └── untrusted v0.7.1
│ └── webpki v0.22.0
│ ├── ring v0.16.20 (*)
│ └── untrusted v0.7.1
├── serde v1.0.135
│ └── serde_derive v1.0.135 (proc-macro)
│ ├── proc-macro2 v1.0.36
│ │ └── unicode-xid v0.2.2
│ ├── quote v1.0.15
│ │ └── proc-macro2 v1.0.36 (*)
│ └── syn v1.0.86
│ ├── proc-macro2 v1.0.36 (*)
│ ├── quote v1.0.15 (*)
│ └── unicode-xid v0.2.2
├── serde_json v1.0.78
│ ├── itoa v1.0.1
│ ├── ryu v1.0.9
│ └── serde v1.0.135 (*)
├── url v2.2.2
│ ├── form_urlencoded v1.0.1
│ │ ├── matches v0.1.9
│ │ └── percent-encoding v2.1.0
│ ├── idna v0.2.3
│ │ ├── matches v0.1.9
│ │ ├── unicode-bidi v0.3.7
│ │ └── unicode-normalization v0.1.19
│ │ └── tinyvec v1.5.1
│ │ └── tinyvec_macros v0.1.0
│ ├── matches v0.1.9
│ └── percent-encoding v2.1.0
├── webpki v0.22.0 (*)
└── webpki-roots v0.22.2
└── webpki v0.22.0 (*)
gzip
is totally optional (both in the HTTML spec, and as a feature flag supported by ureq
) so we can just disable this and see how much it saves. That's the fourth (and final?) theory for where the bloat comes from.
Based on the above analysis, I have four things to check. I'm not planning to test these combinatorically, just sequentially. (For instance, step 3 is additive with steps 1 and 2; I don't intend to test what steps 1 and 3 look like while skipping step 2.) For each step, I keep the result that produces the smaller exe. Keep in mind, that may end up being exactly what we have in 0.3.0 if all four steps show that the status quo is optimal.
The basic plan is:
- Replace
rustls
with native-tls
.
- Replace
serde_json
with json
.
- Find a way to trim IDNA from URL parsing.
- Disable gzip support.
I'll start with a baseline, building the 0.3.0
tag as-is.
$ cargo build --release && ls -l target/release/cartunes.exe
[...snip...]
-rwxr-xr-x 2 jay 197609 7769088 Jan 24 20:19 target/release/cartunes.exe*
The file size is a little different than what's in the zip because I'm locally building with a recent nightly compiler (our CI uses the latest stable compiler) and I'm also using a different linker and build flags between my local machine and CI.
From this baseline, replace rustls
with native-tls
:
-rwxr-xr-x 2 jay 197609 7010304 Jan 24 21:36 target/release/cartunes.exe*
That satisfies my curiosity for whether native-tls
produces smaller code than rustls
. Yes, it does. About 750 KB worth of savings in my case.
Next, replace serde_json
with json
:
-rwxr-xr-x 2 jay 197609 7001088 Jan 24 21:40 target/release/cartunes.exe*
This theory only gave us a marginal win. About 9 KB. Well within the noise threshold, really. That result doesn't justify the additional code to maintain:
Abandoned patch
diff --git a/Cargo.toml b/Cargo.toml
index 50e062f..812ea78 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -25,6 +25,7 @@ epaint = { version = "0.16", default-features = false, features = ["single_threa
font-loader = "0.11"
human-sort = "0.2"
hotwatch = "0.4"
+json = "0.12"
kuchiki = "0.8"
log = "0.4"
native-tls = "0.2"
@@ -34,11 +35,10 @@ pollster = "0.2"
raw-window-handle = "0.4"
rfd = "0.6"
semver = "1.0"
-serde = { version = "1.0", features = ["derive"] }
thiserror = "1.0"
toml_edit = "0.13"
unicode-segmentation = "1.7"
-ureq = { version = "2.3", default-features = false, features = ["gzip", "json", "native-tls"] }
+ureq = { version = "2.3", default-features = false, features = ["gzip", "native-tls"] }
walkdir = "2.3"
webbrowser = "0.5"
wgpu = "0.12"
diff --git a/src/updates.rs b/src/updates.rs
index 0c7fde2..65ed2a6 100644
--- a/src/updates.rs
+++ b/src/updates.rs
@@ -8,7 +8,6 @@ use crate::framework::UserEvent;
use crate::timer::Timer;
use log::error;
use semver::Version;
-use serde::Deserialize;
use std::any::Any;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;
@@ -39,6 +38,22 @@ pub(crate) enum Error {
Persist(#[from] PersistError),
}
+/// All the ways in which parsing the JSON response body can fail.
+#[derive(Debug, Error)]
+pub(crate) enum JsonParseError {
+ /// Invalid name.
+ #[error("Invalid release name")]
+ Name,
+
+ /// Invalid body.
+ #[error("Invalid release body")]
+ Body,
+
+ /// Invalid html_url.
+ #[error("Invalid release URL")]
+ Url,
+}
+
/// How often to check for updates.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub(crate) enum UpdateFrequency {
@@ -75,7 +90,7 @@ struct UpdateCheckerThread {
}
/// Parsed API response body.
-#[derive(Debug, Deserialize)]
+#[derive(Debug)]
pub(crate) struct ReleaseBody {
name: String,
body: String,
@@ -264,7 +279,7 @@ impl UpdateCheckerThread {
};
// Parse the response
- let body: ReleaseBody = match res.into_json() {
+ let body_str = match res.into_string() {
Ok(body) => body,
Err(error) => {
error!("HTTP response error: {:?}", error);
@@ -272,6 +287,22 @@ impl UpdateCheckerThread {
}
};
+ let body_json = match json::parse(&body_str) {
+ Ok(body) => body,
+ Err(error) => {
+ error!("JSON parse error: {:?}", error);
+ return self.duration;
+ }
+ };
+
+ let body: ReleaseBody = match body_json.try_into() {
+ Ok(body) => body,
+ Err(error) => {
+ error!("JSON parse error: {:?}", error);
+ return self.duration;
+ }
+ };
+
// Parse the version in the response
let version = match Version::parse(&body.name) {
Ok(version) => version,
@@ -317,3 +348,31 @@ impl UpdateCheckerThread {
}
}
}
+
+impl TryFrom<json::JsonValue> for ReleaseBody {
+ type Error = JsonParseError;
+
+ fn try_from(mut value: json::JsonValue) -> Result<Self, Self::Error> {
+ let name = value
+ .remove("name")
+ .as_str()
+ .ok_or(Self::Error::Name)?
+ .to_string();
+ let body = value
+ .remove("body")
+ .as_str()
+ .ok_or(Self::Error::Body)?
+ .to_string();
+ let html_url = value
+ .remove("html_url")
+ .as_str()
+ .ok_or(Self::Error::Url)?
+ .to_string();
+
+ Ok(Self {
+ name,
+ body,
+ html_url,
+ })
+ }
+}
For this reason, I will continue using serde_json
.
Third, find a way to trim IDNA from URL parsing. There is a merged but as-yet unreleased PR for url
that makes idna
optional: https://github.com/servo/rust-url/pull/728
Patching ureq
locally to disable this, we get:
-rwxr-xr-x 2 jay 197609 6682112 Jan 24 21:45 target/release/cartunes.exe*
Saving 328,192 bytes over step 1. This is roughly in line with observations from https://github.com/servo/rust-url/issues/727. Unfortunately, until url
is released with this feature flag, I have to eat the >300 KB bloat.
The final theory I had was that gzip was unnecessarily bloating the executable. Since I'm not going to be using results from steps 2 or 3, this is the "last chance" for a big win. I still have about 2.3 MB completely unaccounted for in these savings figures. Let's disable gzip and hope for the best:
-rwxr-xr-x 2 jay 197609 6953984 Jan 24 21:55 target/release/cartunes.exe*
That saves 56,320 bytes over step 1. It's a modest win, but not nearly what I was looking for.
Now I need to continue digging for the cause of those extra megabytes. Using cargo-bloat
's JSON output and a little bash-fu with jql
points out two other potential crates to look at:
std
added 340 functions since the 0.2.0 release
inplace_it
added 322 functions since the 0.2.0 release
That's odd, why did std
usage grow so much? Unfortunately, build-std
didn't seem to help, plus I ran into issues with panic = "abort"
when I tried: https://github.com/rust-lang/cargo/issues/7359 The additional std
usage is responsible for about 124 KB of executable size. This is most likely from sockets and threads, but that's just a guess.
I have no idea what inplace_it
is:
$ cargo tree -i inplace_it
inplace_it v0.3.3
└── wgpu-hal v0.12.4
├── wgpu v0.12.0
│ ├── cartunes v0.3.0 (C:\Users\jay\projects\cartunes)
│ └── egui_wgpu_backend v0.16.0
│ └── cartunes v0.3.0 (C:\Users\jay\projects\cartunes)
└── wgpu-core v0.12.2
└── wgpu v0.12.0 (*)
It's used by wgpu
. 🤷 And it appears to contribute directly to about 32 KB of executable size.
At this point, I still don't know what the last 2 MB is from. That's a crazy number to not stick out like a sore thumb.
bug