Tools for managing GitHub block lists

Overview

GitHub block list management

Octocrabby is a small set of command-line tools and Octocrab extensions for managing block lists on GitHub. This project was inspired by an open letter supporting Richard Stallman, which has been signed by several thousand GitHub users to whom I don't want to accidentally donate free open source support.

This project may eventually get merged into cancel-culture, which is currently focused on archiving and block list management for Twitter.

Usage

This project is written in Rust, and you currently need Rust and Cargo installed to use it. Once you've followed the Rust installation instructions and cloned this repo locally, you can build the CLI by running the following command from the project directory:

$ cargo build --release
   Compiling bytes v1.0.1
   ...
   Compiling octocrabby v0.1.0 (/home/travis/projects/octocrabby)
    Finished release [optimized] target(s) in 1m 35s

Most operations require a GitHub personal access token, which you currently have to provide as a command-line option (-t). If you want to use the mass-blocking functionality, you'll need to select the user scope when creating your token. If you only want to generate reports or export your follower or block lists, that shouldn't be necessary. The following examples assume that your token has been exported to the environment variable GH_TOKEN.

Contributor reports

One operation that doesn't require a personal access token is list-pr-contributors:

$ target/release/crabby -vvvv list-pr-contributors -r rms-support-letter/rms-support-letter.github.io > data/rms-support-letter-contributors.csv

If no token is provided, this command will output a CSV document with a row for each GitHub user who contributed a pull request to the given repository. Each row will have three columns:

  1. GitHub username
  2. GitHub user ID
  3. Number of PRs for this repository

For example:

0312birdzhang,1762041,1
0hueliSJWpidorasi,81465353,1
0kalekale,31927746,1
0ver3inker,53104897,1
0x0000ff,1977210,1

If you provide a personal access token to this command (via -t), the output will include several additional columns:

  1. GitHub username
  2. GitHub user ID
  3. Number of PRs for this repo
  4. Number of days between account creation and the first PR to this repo
  5. The user's name (if available)
  6. The Twitter handle provided by the user (if available)
  7. A boolean indicating whether you follow this user
  8. A boolean indicating whether this user follows you

For example:

01012,14347178,2,2019,,,false,false
0312birdzhang,1762041,1,3229,BirdZhang,,false,false
0MazaHacka0,11509345,1,2204,Dmitry Abakumov,,false,false
0hueliSJWpidorasi,81465353,1,0,,,false,false
0kalekale,31927746,1,1288,kalekale,,false,false
0mid,288476,1,3958,,,false,false
0rhan,33350605,2,1241,Orhan Gurbanov,,false,false
0ver3inker,53104897,1,617,0ver3inker,0ver3inker,false,false
0x0000-dot-ru,1397843,2,3343,Dmitriy Balakin,,false,false
0x0000ff,1977210,1,3176,,,false,false

Please note that GitHub does not verify that the Twitter handle provided by a GitHub user in their GitHub profile is owned by that user (or that it exists, etc.), so that field should not be used for automated blocking on Twitter. You can omit that column from the output by providing --omit-twitter.

You can find copies of the output of this command in this project's data directory.

This output lets us see, for example, how many of the signatories were using single-purpose throwaway accounts. As of this morning, only 82 of the 3,000+ accounts were created on the same day they opened their PR:

$ awk -F, '$4 == 0' data/rms-support-letter-contributors.csv | wc
     82     102    3282
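
The same check can be sketched in Rust for anyone who wants to post-process the report programmatically. This is a minimal illustration, not code from this project: the struct and function names are hypothetical, only the column order comes from the list above, and it assumes that no field contains a comma or quotes (a real CSV parser would be needed if display names can contain commas).

```rust
/// One row of the authenticated `list-pr-contributors` output.
/// Field names are illustrative; only the column order comes from the README.
#[derive(Debug, PartialEq)]
pub struct ContributorRow {
    pub username: String,
    pub user_id: u64,
    pub pr_count: u32,
    pub account_age_days: i64,
    pub name: Option<String>,
    pub twitter: Option<String>,
    pub you_follow: bool,
    pub follows_you: bool,
}

/// Parse one eight-column row, returning None for anything malformed.
/// Assumption: fields never contain commas or quotes.
pub fn parse_row(line: &str) -> Option<ContributorRow> {
    let cols: Vec<&str> = line.trim_end().split(',').collect();
    if cols.len() != 8 {
        return None;
    }
    // Empty optional columns become None.
    let non_empty = |s: &str| if s.is_empty() { None } else { Some(s.to_string()) };
    Some(ContributorRow {
        username: cols[0].to_string(),
        user_id: cols[1].parse().ok()?,
        pr_count: cols[2].parse().ok()?,
        account_age_days: cols[3].parse().ok()?,
        name: non_empty(cols[4]),
        twitter: non_empty(cols[5]),
        you_follow: cols[6].parse().ok()?,
        follows_you: cols[7].parse().ok()?,
    })
}

/// Count accounts created on the day of their first PR (fourth column is 0).
pub fn count_same_day_accounts(csv: &str) -> usize {
    csv.lines()
        .filter_map(parse_row)
        .filter(|row| row.account_age_days == 0)
        .count()
}
```

Applied to the ten example rows shown earlier, count_same_day_accounts returns 1. Rows produced without a token, or with --omit-twitter, do not have eight columns and are skipped by this sketch.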

You can also check how many of the signers you follow on GitHub:

$ egrep -r "true,(true|false)$" data/rms-support-letter-contributors.csv | wc
      0       0       0

And how many follow you:

$ egrep -r "true$" data/rms-support-letter-contributors.csv | wc
      0       0       0

Good.

Follow and block list export

The CLI also allows you to export lists of users you follow, are followed by, and block:

$ target/release/crabby -vvvv -t $GH_TOKEN list-following | wc
     24      24     408

$ target/release/crabby -vvvv -t $GH_TOKEN list-followers | wc
    575     575   10416

$ target/release/crabby -vvvv -t $GH_TOKEN list-blocks | head
alexy,27491
soc,42493
jdegoes,156745
vmarquez,427578
gvolpe,443978
neko-kai,450507
hmemcpy,601206
kubukoz,894884
propensive,1024588
phderome,11035032

The format is a two-column CSV with username and user ID.

It's also possible to export the block list of an organization you administer by adding --org $MY_ORG to the list-blocks command (note that this requires your token to have the read:org scope enabled).

In general it's probably a good idea to save the output of the list-blocks command before using the mass-blocking functionality in the next section.

Mass blocking

The CLI also includes a block-users command that accepts CSV rows from standard input. It ignores all columns except the first, which it expects to be a GitHub username. This is designed to make it convenient to save the output of list-pr-contributors, manually remove accounts if needed, and then block the rest.

$ target/release/crabby -vvv -t $GH_TOKEN block-users < data/rms-support-letter-contributors.csv
15:17:36 [WARN] Skipping 3936 known blocked users
15:17:36 [INFO] Successfully blocked Aliaksei-Tatarynchyk
...

If you've set the logging level to at least INFO (via the -vvv or -vvvv options), it will show you a message for each user who is blocked. Note that if you've blocked thousands of accounts, or are running the command on a repository for the first time, it may be faster to include the --force option, which doesn't download your current block list but simply requests a block for each user.
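
The difference between the default behaviour and --force can be illustrated with a short Rust sketch. This is not the project's implementation: plan_blocks is a hypothetical helper that takes the first CSV column of each line as a username and skips any name found in a previously downloaded block list. It assumes exact, case-sensitive matching.

```rust
use std::collections::HashSet;

/// Split CSV input into the usernames that still need a block request and
/// a count of rows skipped because the user is already in `known_blocked`.
/// Only the first column of each line is read; blank lines are ignored.
/// Hypothetical helper: names are compared exactly as written.
pub fn plan_blocks(input: &str, known_blocked: &HashSet<String>) -> (Vec<String>, usize) {
    let mut to_block = Vec::new();
    let mut seen = HashSet::new();
    let mut skipped = 0;
    for line in input.lines() {
        let username = line.split(',').next().unwrap_or("").trim();
        if username.is_empty() {
            continue;
        }
        if known_blocked.contains(username) {
            skipped += 1;
        } else if seen.insert(username.to_string()) {
            // First time this name appears in the input: queue it once.
            to_block.push(username.to_string());
        }
    }
    (to_block, skipped)
}
```

Passing an empty set as known_blocked corresponds to the --force behaviour described above: nothing is skipped, and every distinct username gets a block request.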

It's also possible to block a list of users on behalf of an organization that you administer by adding --org $MY_ORG to the block-users command (assuming your token has write:org enabled).

Other tools

You can view all currently supported commands with -h:

crabby 0.1.0
Travis Brown <[email protected]>

USAGE:
    crabby [FLAGS] [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -v, --verbose    Logging verbosity
    -V, --version    Prints version information

OPTIONS:
    -t, --token <token>    A GitHub personal access token (not needed for all operations)

SUBCOMMANDS:
    block-users             Block a list of users provided in CSV format to stdin
    check-follow            Check whether one user follows another
    help                    Prints this message or the help of the given subcommand(s)
    list-blocks             List accounts the authenticated user blocks in CSV format to stdout
    list-followers          List the authenticated user's followers in CSV format to stdout
    list-following          List accounts the authenticated user follows in CSV format to stdout
    list-pr-contributors    List PR contributors for the given repository

Caveats and future work

I wrote this thing yesterday afternoon. It's completely untested. It might not work. For your own safety please don't use it with a personal access token that has permissions it doesn't need (i.e. anything beyond user, plus read:org or write:org if you use the organization features).

It's probably possible to include the account age in the contributor report even when unauthenticated. I just wasn't able to find a way to get information about multiple users via a single request except through the GraphQL endpoint, which is only available to authenticated users. If you request each user individually, you'll run into GitHub's rate limits for projects like the Stallman support letter.

License

This project is licensed under the Mozilla Public License, version 2.0. See the LICENSE file for details.

Comments
  • Remove accidental user from list

    This person signed the wrong one accidentally, they in no way support Richard Stallman nor his despicable behaviour and I can attest to the quality of their character. They tried to remove their signature from the list, but the repo is archived.

    Naturally they're blocked from making a PR themselves, hence the request coming from me. I hope that's ok.

    opened by SegFault-Verm 2
  • Break GraphQL queries into chunks

    Still seeing 502s here. I've looked at the resource limit documentation but don't really understand how this query could be hitting them. In any case splitting up the query seems to resolve the issue.
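
As a rough illustration of the chunking idea (not the code in this PR), a list of usernames can be split into fixed-size batches with the standard library's slice chunks method, so that each batch becomes its own query. The function name and the batch size are hypothetical.

```rust
/// Split usernames into batches of at most `chunk_size` so that each batch
/// can be sent as a separate query. A `chunk_size` of 0 is treated as 1,
/// because `chunks` panics on a zero size.
pub fn into_batches(usernames: &[String], chunk_size: usize) -> Vec<&[String]> {
    usernames.chunks(chunk_size.max(1)).collect()
}
```

Five usernames with a batch size of two produce three batches of two, two, and one names.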

    opened by travisbrown 1
  • crash if attempts to block no-longer-existing user, e.g. elonmusksama

    Apparently sometimes github users go away and thusly can't be blocked. I ran target/release/crabby -vvv -t [redacted] block-users < jerks.csv

    ...where jerks.csv was generated via list-pr-contributors -r rms-support-letter/rms-support-letter.github.io . It crashed

    23:41:13 [WARN] eligiobz was already blocked
    23:41:13 [WARN] eliphatfs was already blocked
    Error: GitHub { source: GitHubError { documentation_url: "https://docs.github.com/rest/reference/users#block-a-user", errors: None, message: "Not Found" }, backtrace: Backtrace(   0: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    

    I can work around by editing jerks.csv , no problem. But it'd be nice if this crash were just a warning, yep.

    opened by lahosken 1
  • make README examples consist with repo layout

    This PR tweaks the readme so that the examples can be copied from directly. It adds a suggestion to set the token as an environment variable with the name already used in the example and points at the CSV included in the repo.

    opened by thumperward 1
  • add vscode devcontainer

    This PR adds a very basic vscode devcontainer configuration which removes the need for a local Rust install if using the recommended vscode container-based workflow.

    The code is unmodified from vscode's suggestion for Rust development, so the container isn't minimal, but does allow for further extension.

    opened by thumperward 1
  • Change list-pr-contributors to include their Twitter handle

    Hi! I'm not sure if I should have opened an issue to ask if you wanted this before implementing it, but I figured you can always reject the PR.

    One thing I wanted when I ran the command was to also get the user's Twitter handle, which is sometimes available. That way I can use other tools to block or mute them on Twitter as well. One caveat is that GitHub does not verify whether or not you own the Twitter account you declare as your own, so it could be wrong or targeting someone else.

    Also, this is my first ever attempt at Rust code, so I'm not sure if there's something I maybe should have done differently!

    opened by iamricard 1
  • Provide more accurate log messages

    @lahosken noted in #17 that block-users previously crashed when given accounts that no longer existed as input. This was fixed in #6 and #7, but while the command no longer crashed on those inputs, it did still give an inaccurate log message ("was already blocked"). This PR includes several improvements to the log messages, including passing through unknown responses, since I'm not sure we've covered the full range of what the API does here.

    opened by travisbrown 0
  • Support blocking for organizations

    You can now enable the appropriate read:org or write:org scope when creating your personal access token and then add --org $MY_ORG to the list-blocks or block-users commands.

    opened by travisbrown 0
  • Add (user-configurable) mechanism for exclusions

    This morning I got an email from someone who said they had signed the Richard Stallman support letter but then read this article and decided to remove their signature.

    Their GitHub account will still appear in the reports that this tool generates for that repository at the moment, since it simply lists all contributors who have opened PRs against the repo. It seems reasonable to me that somebody should be able to let users of this tool know that they've changed their mind, though, so I've added a (fairly simple) exclusion system in this PR.

    By default the tool reads the data/exclusions.csv file, which should be a CSV file where each line includes a repo path (e.g. travisbrown/octocrabby) and a GitHub account that should not be associated with that repo. Specifically, anyone who no longer wishes to be associated with the Stallman letter can add an exclusion to this file, and their account will not show up either in the data/rms-support-letter-contributors.csv file, or in the results for anyone who is using this tool with the default settings.

    Any user of this tool can point it to their own exclusions file with --exclusions-file, or can disable the default exclusions with --ignore-exclusions. There are currently two hard-coded excluded users for all repos: ghost and dependabot.
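
The exclusion check described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: the function names are hypothetical, and the column order (repo path first, then username) is a guess, since the description does not fix it. The two always-excluded accounts are the ones named above.

```rust
use std::collections::HashSet;

/// Accounts excluded for every repository, as listed in the PR description.
const ALWAYS_EXCLUDED: [&str; 2] = ["ghost", "dependabot"];

/// Parse exclusions text into (repo, username) pairs.
/// Assumption: each line is `owner/repo,username`; lines without a comma are ignored.
pub fn parse_exclusions(text: &str) -> HashSet<(String, String)> {
    text.lines()
        .filter_map(|line| {
            let (repo, user) = line.trim().split_once(',')?;
            Some((repo.trim().to_string(), user.trim().to_string()))
        })
        .collect()
}

/// True if `username` should be left out of the report for `repo`.
pub fn is_excluded(repo: &str, username: &str, exclusions: &HashSet<(String, String)>) -> bool {
    ALWAYS_EXCLUDED.iter().any(|&name| name == username)
        || exclusions.contains(&(repo.to_string(), username.to_string()))
}
```

An exclusion applies only to the repository it names, so the same account would still appear in reports for other repositories.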

    I'll only merge PRs adding rms-support-letter exclusions that link to a rms-support-letter PR removing a signature. Any PRs that demand that I remove "personal information" or whatever will just be closed immediately. I encourage anyone who's tempted to open a PR like that to send me a cease-and-desist email instead, and please use your best fake legalese.

    Fixes #8 and #2.

    opened by travisbrown 0
  • Add a process for exceptions

    If someone decides that they no longer wish to be associated with the open letter supporting Stallman, it should be possible for them to make that decision known to users of this tool (note that simply removing their signature isn't sufficient, since this tool reports all contributors to a repository).

    The implementation is likely to be a data/exclusions.csv file where each line indicates a GitHub username and a repository. This exclusions file would be respected by default when running the list-pr-contributors command, although it will be possible for users of the tool to provide their own exclusions file, or simply to ignore the default exclusions.

    opened by travisbrown 0
  • Handle case where blocked account is also flagged

    The message can include other information about the blocked user ("Blocked user has already been blocked, Blocked user has been flagged as spam"), so we can't just check equality. We could probably also just check that errors is empty. I wish Octocrab would provide the status code in the case of error so we could check for 304 (I don't think it does), but since this is just for a log message it's not a big deal.
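
A minimal sketch of the substring check described here, using the message text quoted above. The function name is hypothetical and this is not the code in the PR.

```rust
/// True if a GitHub error message reports that the user was already blocked.
/// The message may list several reasons, so test for a substring instead of
/// comparing the whole string.
pub fn is_already_blocked(message: &str) -> bool {
    message.contains("Blocked user has already been blocked")
}
```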

    opened by travisbrown 0
  • Add support for Wayback Machine archiving

    There are some pretty fucked-up conversations happening over on the rms-support-letter issue tracker (e.g. some pretty extreme bigotry here, "We know the name of harassing person, it is… as per … . We may want to send abuses against it.", etc.).

    It'd be fairly straightforward to add a command that would check for new issue threads or comments on a repo and request snapshots for all updated pages in the Wayback Machine, as a way of tracking and documenting this abuse. My cancel-culture project already has a number of tools for doing this kind of thing for Twitter accounts, and it wouldn't be hard to repurpose some of that code.

    The basic approach would be the following:

    1. Make a request against the Wayback CDX index to get a list of all archived pages.
    2. Get the time of the most recent snapshot for each.
    3. Make requests against the GitHub API to identify new or updated issues, PRs, and comments.
    4. Hit the Wayback save page endpoint for all pages that have been updated since the last snapshot.
    5. Report all new Wayback URLs.
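
The comparison behind step 4 can be sketched without any network code. This is a hypothetical illustration, not part of the project. It assumes GitHub API timestamps of the form 2021-03-30T12:34:56Z and Wayback snapshot timestamps in the 14-digit YYYYMMDDhhmmss form, both in UTC.

```rust
/// Convert a GitHub API timestamp such as `2021-03-30T12:34:56Z` into the
/// 14-digit `YYYYMMDDhhmmss` form. Returns None for any other shape.
pub fn to_wayback_timestamp(iso: &str) -> Option<String> {
    if iso.len() != 20 || !iso.ends_with('Z') {
        return None;
    }
    let digits: String = iso.chars().filter(|c| c.is_ascii_digit()).collect();
    if digits.len() == 14 {
        Some(digits)
    } else {
        None
    }
}

/// A page needs a new snapshot if it has never been archived or was updated
/// after its most recent snapshot. Both timestamps are fixed-width, so
/// string comparison matches chronological order.
pub fn needs_snapshot(last_snapshot: Option<&str>, updated_at: &str) -> Option<bool> {
    let updated = to_wayback_timestamp(updated_at)?;
    Some(match last_snapshot {
        None => true,
        Some(snapshot) => updated.as_str() > snapshot,
    })
}
```

needs_snapshot returns None when the update time cannot be parsed, which leaves the caller to decide whether to archive the page anyway.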

    At some point we might also want to add Wayback search and bulk downloading tools.

    opened by travisbrown 0
Owner
Travis Brown