replaces fixed-sized string prefixes & whole sections in binaries for fast, debuggable, reproducible builds

Overview

Replacing fixed-sized string prefixes in binaries to refix them to their build context

Here's the long story about what refix does and why you'd want to do this.

The short story is, refix replaces fixed-size string prefixes inside binary files (executables, shared libraries, static libraries, and object files). It can also replace the contents of whole sections with equally sized data read from a file, if --section name file is passed one or more times.

Why would you want this? Let's say you have reproducible builds, which have relative paths to source code, and no build-specific information (like the absolute path of the final executable, the build time etc.) Then you might want - I'd say you very much do want - to put the absolute source code paths, and all that build-specific information, into the executable once it leaves the build system cache (where only reproducible artifacts belong) and is delivered to your filesystem. You want to refix the binary back to the original source path & other build context, as it were, after the build system detached it from this context.

Putting this info back helps debugging a lot! You don't want finding the source code in debuggers and other tools to be a puzzle; you want it to just work.

Putting absolute source file paths into your binaries

  • Run with the equivalent of the following gcc flags:
    -fdebug-prefix-map==MAGIC # for DWARF
    -ffile-prefix-map==MAGIC  # for __FILE__
    
  • Make MAGIC long enough for any source path prefix you're willing to support.
  • Why the == in the flag? This invocation assumes that file paths are relative, so it remaps the empty string to MAGIC, meaning, dir/file.c becomes MAGICdir/file.c. You can also pass =/prefix/to/remap=MAGIC, if your build system uses absolute paths.
  • After the linked binary is delivered, use refix to put the actual source path back in (this works much like a sed replacement would, but takes tens of milliseconds instead of seconds for large binaries):
    refix binary MAGIC actual-source-prefix
  • If the source path is shorter than the length of MAGIC, pad it with forward slashes: /////home/user/src/. If the source path is too long, the post-link step should truncate it, warn, and eventually be changed to outright fail.

That's it, now all debugging tools will find the source code effortlessly!
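The padding and same-size replacement steps above can be sketched in C roughly as follows. This is only an illustration of the idea, not how refix itself is implemented: pad_prefix and replace_same_size are hypothetical helper names, the replacement is a naive linear scan over a buffer already loaded in memory, and a too-long source path is rejected outright rather than truncated with a warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: left-pad `src` with '/' so the result is exactly
 * `magic_len` bytes long, e.g. "/home/user/src/" -> "/////home/user/src/".
 * Returns 0 on success, -1 if `src` is longer than MAGIC or `out` is too small. */
int pad_prefix(const char *src, size_t magic_len, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL);
    size_t src_len = strlen(src);
    if (src_len > magic_len || out_size < magic_len + 1)
        return -1;
    size_t pad = magic_len - src_len;
    memset(out, '/', pad);            /* leading slashes */
    memcpy(out + pad, src, src_len);  /* the real prefix */
    out[magic_len] = '\0';
    return 0;
}

/* Hypothetical helper: overwrite every occurrence of `magic` in `buf` with
 * `repl`, which must have the same length, so no bytes ever move and no
 * offsets inside the binary change. Returns the number of replacements. */
size_t replace_same_size(unsigned char *buf, size_t buf_len,
                         const char *magic, const char *repl)
{
    assert(buf != NULL && magic != NULL && repl != NULL);
    size_t n = strlen(magic);
    size_t count = 0;
    if (n == 0 || strlen(repl) != n || buf_len < n)
        return 0;
    for (size_t i = 0; i + n <= buf_len; ) {
        if (memcmp(buf + i, magic, n) == 0) {
            memcpy(buf + i, repl, n);
            count++;
            i += n;  /* skip past the bytes just written */
        } else {
            i++;
        }
    }
    return count;
}
```

Because the old and new strings have the same length, the file size and every offset in it stay unchanged, which is what makes an in-place overwrite safe.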

Putting other build-specific info into your binaries

You can put all the "build context" info (full path to the executable, build date etc.) into a separate section, reserved at build time and filled after link time. You make the section with:

char ver[SIZE] __attribute__((section(".ver"))) = {1};

This reserves SIZE bytes in a section called .ver. It's non-const deliberately, since if it's const, the OS will exclude it from core dumps (why save data to disk when it's guaranteed to be exactly the same as the contents of the section in the binary?) But you might actually very much want to look at the content of this section in a core dump, perhaps before looking at anything else. For instance, looking at this section is how you can find the path of the executable that dumped this core!

How do you find the section in the core dump without having an executable which the debugger could use to tell you the address of ver? Like so: strings core | grep MagicOnlyFoundInVer. What if this section, being non-const, gets overwritten? If you're really worried about this, you can align its base address and size to the OS page size, and mprotect it at init time, though I personally never bothered and have not suffered any consequences to date.

Additionally, our ver variable is deliberately initialized with one 1 followed by zeros, since if it's all zeros, then .ver will be a "bss" section, the kind zeroed by the loader and without space reserved for it in the binary. So you'd have nowhere to write your actual, "non-reproducible" version info at a post-link step.
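If you do want the page-aligned, mprotect-ed variant mentioned above, a minimal sketch could look like this. It assumes Linux/ELF with GCC or Clang and a 4096-byte page size; VER_PAGE and protect_ver are hypothetical names, and you'd call protect_ver early in main (or from a constructor) once nothing needs to write to ver anymore.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define VER_PAGE 4096  /* assumed page size; checked at run time below */

/* Size and alignment are both one (assumed) page, so .ver does not share
 * its page with any other data. Still non-const and still initialized with
 * a leading 1, for the reasons given above. */
char ver[VER_PAGE] __attribute__((section(".ver"), aligned(VER_PAGE))) = {1};

/* Make ver read-only. Returns 0 on success, -1 if the real page size does
 * not match the assumption or mprotect fails. */
int protect_ver(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || VER_PAGE % page != 0)
        return -1;
    return mprotect(ver, sizeof ver, PROT_READ);
}
```

After protect_ver succeeds, a stray write into ver faults instead of silently corrupting the version info.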

After the linker is done, and you're running refix to fix the source path, you can pass it more arguments to replace the section:

refix binary MAGIC actual-source-prefix --section .ver file

refix will put the content of file into .ver, or fail if the file's length differs from the section's. You could do this with objcopy, same as you could replace the prefixes with sed; the point of refix is doing it faster, by optimizing for the case where the old and new strings have the same size, and by knowing to skip most of the input file, where nothing ever needs to change.
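Since the file has to be exactly as long as the section, the post-link step needs to produce a zero-padded blob of SIZE bytes. Here's one way that could be sketched; make_ver_blob and write_ver_file are hypothetical helpers, and the "exe=" / "built=" text layout is just an assumption for illustration (refix doesn't care what the bytes mean).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fill `buf` (exactly `size` bytes, the section size)
 * with a magic marker plus build context, zero-padded to the end.
 * Returns 0 on success, -1 if the text does not fit. */
int make_ver_blob(char *buf, size_t size, const char *magic,
                  const char *exe_path, const char *build_time)
{
    assert(buf != NULL && magic != NULL && exe_path != NULL && build_time != NULL);
    memset(buf, 0, size);  /* padding bytes are zeros */
    int n = snprintf(buf, size, "%s\nexe=%s\nbuilt=%s\n",
                     magic, exe_path, build_time);
    if (n < 0 || (size_t)n >= size)
        return -1;         /* would have been truncated */
    return 0;
}

/* Hypothetical helper: write the blob to `path`, to be passed to
 * refix via --section .ver path. Returns 0 on success, -1 on error. */
int write_ver_file(const char *path, const char *buf, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(buf, 1, size, f);
    int closed = fclose(f);
    return (written == size && closed == 0) ? 0 : -1;
}
```

Starting the blob with a marker such as MagicOnlyFoundInVer is what lets strings core | grep find it later.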

Of course, you needn't have this "version info section" to use refix for putting the source path into your binaries. It's another, separate thing you can do that helps with debugging.

Owner
Yossi Kreinin