Rebuilderd debian buildinfo crawler

Overview

rebuilderd-debian-buildinfo-crawler

This program parses the Packages.xz Debian package index, tries to locate the matching buildinfo file for each package on https://buildinfos.debian.net, and prints the resulting build groups in a format that rebuilderd understands:

{
  "name": "0ad-data",
  "version": "0.0.25b-1",
  "distro": "debian",
  "suite": "main",
  "architecture": "all",
  "input_url": "https://buildinfos.debian.net/buildinfo-pool/0/0ad-data/0ad-data_0.0.25b-1_all.buildinfo",
  "artifacts": [
    {
      "name": "0ad-data",
      "version": "0.0.25b-1",
      "url": "https://deb.debian.org/debian/pool/main/0/0ad-data/0ad-data_0.0.25b-1_all.deb"
    },
    {
      "name": "0ad-data-common",
      "version": "0.0.25b-1",
      "url": "https://deb.debian.org/debian/pool/main/0/0ad-data/0ad-data-common_0.0.25b-1_all.deb"
    }
  ]
},

Rebuilderd can then pass the buildinfo files to debrebuild.py and compare the build outputs to the files distributed by Debian.

This implementation explicitly attempts to work around the build-version/binnmu-version problem described in my blog post Reproducible Builds: Debian and the case of the missing version string. This project should be considered a workaround.
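The URLs in the example output above appear to follow the usual Debian pool layout. The sketch below reconstructs them from a source package name, version and architecture. It is not the crawler's actual code: the function names are made up for illustration, the `lib*` prefix rule and the epoch stripping are assumptions based on the general Debian pool convention, and only the `0ad-data` case is confirmed by the example above.

```rust
/// Pool prefix: the first character of the source package name, or the
/// first four characters for `lib*` packages.
/// Assumption: buildinfos.debian.net uses the same convention as the pool.
fn pool_prefix(source: &str) -> String {
    if source.starts_with("lib") && source.chars().count() > 3 {
        source.chars().take(4).collect()
    } else {
        source.chars().take(1).collect()
    }
}

/// Drop a leading `epoch:` from a version, since file names in the pool
/// do not carry the epoch. Assumption: the same holds for buildinfo files.
fn file_version(version: &str) -> &str {
    match version.split_once(':') {
        Some((_epoch, rest)) => rest,
        None => version,
    }
}

/// Naive guess for the buildinfo URL of a source package.
fn buildinfo_url(source: &str, version: &str, arch: &str) -> String {
    format!(
        "https://buildinfos.debian.net/buildinfo-pool/{}/{}/{}_{}_{}.buildinfo",
        pool_prefix(source),
        source,
        source,
        file_version(version),
        arch
    )
}

/// URL of a binary package (.deb) that belongs to the given source package.
fn artifact_url(
    mirror: &str,
    component: &str,
    source: &str,
    name: &str,
    version: &str,
    arch: &str,
) -> String {
    format!(
        "{}/pool/{}/{}/{}/{}_{}_{}.deb",
        mirror.trim_end_matches('/'),
        component,
        pool_prefix(source),
        source,
        name,
        file_version(version),
        arch
    )
}
```

Calling `buildinfo_url("0ad-data", "0.0.25b-1", "all")` reproduces the `input_url` from the example. A guess like this is not always right: when the build version and the binNMU version differ, the file name cannot be derived from the package index alone, which is the reason this crawler exists.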

Usage

# Generate the json
cargo run --release -- --db foo.db --packages-db http://deb.debian.org/debian/dists/sid/main/binary-amd64/Packages.xz --source http://deb.debian.org/debian --distro debian --suite main --release sid --arch amd64 > import.json
# Import the json into rebuilderd (requires rebuilderd/rebuildctl to be set up and configured)
rebuildctl pkgs sync-stdin debian main < import.json

FAQ

The initial import takes very long

Yes, that's a limitation of this workaround. The second run is faster. 🤞

What's https://buildinfos.debian.net/missing-buildinfo/?

If Debian distributes a binary package (.deb) for which we couldn't locate a buildinfo file, we still output its build group but use a dummy link. Rebuilderd will fail to download this buildinfo file and mark the corresponding .deb files as unreproducible.

To show the list of packages in Debian unstable with missing buildinfo files, use this command:

# See the Usage section for how import.json is generated
grep missing-buildinfo import.json
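The grep above prints only the matching URL lines. If the names of the affected build groups are wanted instead, a small line-based scan can pair each dummy link with the group name printed just before it. The sketch below assumes the pretty-printed layout shown in the overview (one field per line, `"name"` preceding `"input_url"` within a group) and is not a general JSON parser. The helper names are hypothetical.

```rust
/// Extract the string value from a line of the form `"key": "value",`.
/// Returns None if the line does not start with that key.
fn string_value(line: &str, key: &str) -> Option<String> {
    let prefix = format!("\"{}\": \"", key);
    let rest = line.trim().strip_prefix(prefix.as_str())?;
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

/// Names of all build groups whose `input_url` is a missing-buildinfo
/// dummy link. Relies on the field order of the example output.
fn missing_buildinfo_groups(json: &str) -> Vec<String> {
    let mut last_name: Option<String> = None;
    let mut missing = Vec::new();
    for line in json.lines() {
        if let Some(name) = string_value(line, "name") {
            // Remember the most recent name; the group-level name comes
            // directly before the group's input_url.
            last_name = Some(name);
        } else if let Some(url) = string_value(line, "input_url") {
            if url.contains("/missing-buildinfo/") {
                if let Some(name) = last_name.clone() {
                    missing.push(name);
                }
            }
        }
    }
    missing
}
```

Reading `import.json` with `std::fs::read_to_string` and passing the contents to `missing_buildinfo_groups` yields one entry per build group without a buildinfo file. For anything beyond a quick check, a real JSON parser would be the safer choice.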

I think this is cool work, how can I get more of this?

Follow me on Twitter and consider contributing to my next sponsorship goal on GitHub Sponsors, thanks!

License

GPLv3+
