Gathering some metrics about github projects

Overview

rust-metrics

This is an experimental project to start gathering metrics about github organizations and repositories.

The goal is to get an idea of various "open source health" style metrics, such as:

  • which repositories are the most active?
  • who is contributing to them?
  • how long does it take to get reviews?

Ultimately, I'd like to use this data to help projects and contributors be more successful. For example, it'd be nice to know if someone has stopped opening PRs in the last month or two -- maybe there is something blocking them that can be fixed?

Want to help?

Great! This is a fun project to hack on because it's relatively simple. It's a good way to get acquainted with async Rust and Rust in general. I'm eager to grow a set of maintainers who can help the project grow and be useful, so if you think it sounds appealing, get in touch with nikomatsakis on the github-metrics zulip instance below.

Chat

We chat on the github-metrics zulip.

Comments
  • Parallel loading of data

    Right now, listing the number of PRs from each repo runs one by one and is very slow. We should parallelize the requests across the various repositories.
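
    A minimal sketch of the idea, using only the standard library. `count_pull_requests` here is a hypothetical stand-in that does no network work, and scoped threads are used purely for illustration; in the project's async code the same shape would more likely be expressed with concurrent futures.

    ```rust
    use std::thread;

    /// Hypothetical stand-in for the per-repository request; the real project
    /// would query GitHub here.
    fn count_pull_requests(repo: &str) -> usize {
        repo.len()
    }

    /// Runs `count_pull_requests` for every repository on its own scoped thread
    /// and returns the results in the same order as the input.
    fn count_all(repos: &[&str]) -> Vec<(String, usize)> {
        thread::scope(|scope| {
            // Spawn all requests first so they run concurrently.
            let handles: Vec<_> = repos
                .iter()
                .map(|&repo| scope.spawn(move || (repo.to_string(), count_pull_requests(repo))))
                .collect();

            // Joining in spawn order keeps the output deterministic.
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker thread panicked"))
                .collect()
        })
    }
    ```

    One thread per repository is fine for a sketch; a real implementation would probably cap the number of in-flight requests to stay within GitHub's rate limits.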

    opened by nikomatsakis 6
  • feat(consumer): use anyhow::Error and csv writer for Print

    Address the remaining tasks of: https://github.com/optopodi/optopodi/issues/30

    • [x] escape , in entry content
    • [x] make a test producer that supplies dummy data to test it
    • [x] Use anyhow::Error instead of String for the Consumer return type: https://github.com/optopodi/optopodi/issues/28

    I used the csv crate to write/escape the data.

    As suggested in the review of #32, I added a Write instance variable to the Print struct. I added it to the struct's state rather than as a parameter of the consume function to avoid (or at least postpone) refactoring ExportToSheets, which is also a Consumer but cannot take a Write parameter in any straightforward way.

    To test Print, I had to make consume take a reference (&self instead of self).

    Since the resulting code is a bit over-complex, feedback is super appreciated, thanks!
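
    For readers who have not seen the PR, here is a rough sketch of the shape described above: a `Print` that owns a writer, so a test can hand it an in-memory buffer. The field name `out`, the method `write_row`, and the hand-rolled `escape_field` are all assumptions made for illustration; the PR itself relies on the csv crate for escaping.

    ```rust
    use std::io::{self, Write};

    /// Quotes a field when it contains a comma, quote, or line break,
    /// doubling any embedded quotes (the usual CSV convention).
    fn escape_field(field: &str) -> String {
        if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
            format!("\"{}\"", field.replace('"', "\"\""))
        } else {
            field.to_string()
        }
    }

    /// Hypothetical version of `Print` that owns its output sink.
    struct Print<W: Write> {
        out: W,
    }

    impl<W: Write> Print<W> {
        /// Writes one escaped, comma-separated line per call.
        fn write_row(&mut self, row: &[&str]) -> io::Result<()> {
            let line: Vec<String> = row.iter().map(|f| escape_field(f)).collect();
            writeln!(self.out, "{}", line.join(","))
        }
    }
    ```

    In a test, `Print { out: Vec::new() }` collects the bytes in memory; in the CLI the same struct could wrap standard output.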

    opened by angelocatalani 4
  • Refactor the command-line options

    The command line configuration is currently

    #[derive(Clap, Debug)]
    #[clap(setting = AppSettings::ColoredHelp)]
    #[clap(name = "gh-metrics")]
    enum Opt {
        /// list all repositories in the given organization and the number of
        /// Pull Requests created in the last 30 days
        List {
            /// name of GitHub organization to analyze
            #[clap(short, long)]
            org: String,
    
            /// Verbose mode (-v, -vv, -vvv, etc.)
            #[clap(short, long, parse(from_occurrences))]
            verbose: u8,
    
            #[clap(short, long)]
            google_sheet: Option<String>,
        },
    }
    

    but I think that the various options (e.g., org, verbose, google_sheet) should really be independent of the command in use. I imagine the command being the way we specify the producer, and the other options being the way we specify the consumer (or options that apply to many producers).

    So you'd be able to do things like:

    github-metrics list --csv xxx.csv
    github-metrics count-contributors --csv xxx.csv
    github-metrics count-contributors --google-sheet xxx
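
    One way to picture that split, as a sketch only: the command selects a producer and the remaining options select a consumer. The enum names, the `parse_args` helper, and the default of printing when no option is given are assumptions for illustration; the project itself parses arguments with clap rather than by hand.

    ```rust
    /// Which producer to run (names taken from the example commands above).
    #[derive(Debug, PartialEq)]
    enum Producer {
        List,
        CountContributors,
    }

    /// Where the rows go; independent of the producer.
    #[derive(Debug, PartialEq)]
    enum Consumer {
        Print,
        Csv(String),
        GoogleSheet(String),
    }

    /// Parses `<command> [--csv PATH | --google-sheet ID]`.
    fn parse_args(args: &[&str]) -> Result<(Producer, Consumer), String> {
        let (command, rest) = args.split_first().ok_or("missing command")?;
        let producer = match *command {
            "list" => Producer::List,
            "count-contributors" => Producer::CountContributors,
            other => return Err(format!("unknown command `{other}`")),
        };
        let consumer = match rest {
            [] => Consumer::Print,
            ["--csv", path] => Consumer::Csv(path.to_string()),
            ["--google-sheet", id] => Consumer::GoogleSheet(id.to_string()),
            _ => return Err("expected `--csv PATH` or `--google-sheet ID`".to_string()),
        };
        Ok((producer, consumer))
    }
    ```

    Because the two enums are independent, adding a new producer does not require touching any consumer option, and vice versa.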
    
    opened by nikomatsakis 4
  • feat: export and live update data to Google Sheets

    Description

    • [x] close #5
    • [x] implements tokio::sync::mpsc to separate the concerns of producing data vs consuming data as suggested by @nikomatsakis
    • [x] users have the ability to attach a google sheets ID by way of the --google_sheets <ID> flag or -g <ID>. This will convey that they'd like to export data to that Sheet!
    • [x] users are re-directed to sign in to Google (via a link printed to terminal) where they can then allow us to take control
    • [x] data is exporting to google sheets :smile:
      • [x] when exporting to a sheet, the entirety of the sheet's values (not formatting) will be cleared; then we will insert our data starting with cell A1
    • [x] a message is conveyed to the user after processing
    • ARCHIVED ~~Introduces some first-draft structs, traits, and implementations to manage our various data types and decouple gathering data from outputting data.~~
      • ~~I am 100% confident that somewhere in this PR my lack of experience in this language will be exposed. I expect it will greatly benefit from a magical refactor from @nikomatsakis.~~
      • ~~gathering and outputting data for rust-lang takes even longer now... (or feels like it at least since it currently gathers all data before printing everything — which we should change eventually)~~
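
    The producer/consumer split mentioned in the checklist can be sketched with a channel. This version uses `std::sync::mpsc` and a thread as a stand-in for `tokio::sync::mpsc` and an async task, and the row contents are placeholders rather than real GitHub data; the function names are hypothetical.

    ```rust
    use std::sync::mpsc;
    use std::thread;

    /// Produces one row per repository and sends it down the channel.
    fn produce(repos: Vec<String>, tx: mpsc::Sender<Vec<String>>) {
        for repo in repos {
            let len = repo.len().to_string();
            // A send error only means the consumer went away; stop quietly.
            if tx.send(vec![repo, len]).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel.
    }

    /// Collects rows until the channel closes.
    fn consume(rx: mpsc::Receiver<Vec<String>>) -> Vec<Vec<String>> {
        rx.iter().collect()
    }

    /// Wires a producer thread to a consumer on the calling thread.
    fn run(repos: Vec<String>) -> Vec<Vec<String>> {
        let (tx, rx) = mpsc::channel();
        let producer = thread::spawn(move || produce(repos, tx));
        let rows = consume(rx);
        producer.join().expect("producer thread panicked");
        rows
    }
    ```

    The consumer only sees rows, so printing, CSV output, and a Sheets export can all sit behind the same receiving end.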

    Comments

    • the code found in google_sheets.rs was initially inspired by the sheets crate, because I found its initial design rather simple and clean, albeit a bit lacking in what we needed.
    • Though there is naturally a little overhead in building our google-sheets communications from scratch, I personally think the control is really convenient, especially as we may someday want to provide some Inversion of Control to our users regarding sheet formatting options (just for example), in which case the granular control we get by interacting directly with Google's API is a pretty powerful and convenient thing.
      Plus, in all honesty, I was not overly impressed with what the two Rust crates I looked at had to offer.

    Thoughts / Ideas / Conversations

    • the google sheets export depends on an OAuth 2.0 Client ID for Desktop (Server) from a Google Cloud Application. Our Sheets client looks for that in a file called client_secret.json in the root folder.
      • This is the Client ID for a GCloud application called gh-metrics-rst, owned by a freshly created GMail account, that we can use as a default so that users can use the CLI out of the box.
      • Later on, we can support the ability for users to specify their own Client ID or Cloud App
    • We should add a "progress bar" so our users don't need to wonder what is happening while we analyze the entirety of rust-lang
      • on this note, the code introduced in this PR doesn't log each repo/line one by one, so you end up waiting quite some time for our CLI to load all the data.
    opened by chazkiker2 4
  • fix: merge generation of input-data for `issue_closures` and `list_repos`

    Description

    • [x] fix #64
    • [x] combine the Producer functionality for issue_closures and list_repos into one producer
      • [x] fix associated consumers
    • [x] add some clarifying comments & doc-comments throughout

    r? @rylev

    opened by chazkiker2 3
  • add context to errors

    We should change from anyhow to eyre, since it generates prettier reports. We also should add context to our errors. See this Zulip thread for more information.

    opened by nikomatsakis 3
  • generate reports about the contributors

    Big overhaul of the setup. We still use the same metrics, but we now collect them into digested form. This code is a bit of a mess because I glued together some stuff I wrote, but it seems to work.

    Some notes for future:

    • I don't think the intermediate "inputs" directory adds any value; we should probably remove it and instead have producers just produce typed structs instead of vectors of strings.
    • The "top contributors" and "top crates" code in the reports should really be a producer itself, but I was too lazy to do that rewrite. It should probably also not be written synchronously?
    opened by nikomatsakis 3
  • print consumer should not hardcode the labels

    The print consumer currently hardcodes that there are two columns:

    https://github.com/nikomatsakis/github-metrics/blob/5a3667f890b4668056c1083905e8ca8fd04bf959/src/metrics/print.rs#L14-L17

    It should be rewritten to work for any number of columns
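
    A possible shape for that rewrite, sketched under the assumption that the producer hands over its own header: `format_table` is a hypothetical helper, not the code in print.rs, and it leaves field escaping out to keep the example short.

    ```rust
    /// Formats a header and rows of any width as comma-separated lines.
    /// Returns an error if a row's width differs from the header's.
    fn format_table(header: &[&str], rows: &[Vec<String>]) -> Result<String, String> {
        let mut out = header.join(",");
        out.push('\n');
        for (index, row) in rows.iter().enumerate() {
            if row.len() != header.len() {
                return Err(format!(
                    "row {index} has {} columns, expected {}",
                    row.len(),
                    header.len()
                ));
            }
            out.push_str(&row.join(","));
            out.push('\n');
        }
        Ok(out)
    }
    ```

    Checking each row against the header width catches a producer and consumer that disagree about the column count, which a hardcoded two-column header cannot do.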

    good first issue 
    opened by nikomatsakis 3
  • feat: refactor `count_pull_requests` to use GraphQL

    Description

    This PR greatly improves the speed of the src/metrics/list_repos producer by simply switching our count_pull_requests function to a GraphQL call.

    • [x] refactor 'count_pull_requests' function to utilize graphql query
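
    The PR's actual query is not shown here, so the following is only a guess at its general shape: a single GraphQL document that asks GitHub's search for a count instead of paging through pull requests over REST. The helper name and the `since` parameter are hypothetical, and the query has not been checked against the live API. Interpolating values into the document is done for brevity; real code should pass them as GraphQL variables.

    ```rust
    /// Builds a GraphQL document that asks GitHub's search for the number of
    /// pull requests created in `org/repo` on or after `since` (YYYY-MM-DD).
    fn pull_request_count_query(org: &str, repo: &str, since: &str) -> String {
        // Search qualifiers: restrict to one repository, to pull requests,
        // and to items created on or after the given date.
        let search = format!("repo:{org}/{repo} is:pr created:>={since}");
        format!("query {{ search(query: \"{search}\", type: ISSUE) {{ issueCount }} }}")
    }
    ```

    The speed-up comes from asking the server for one number per repository rather than fetching and counting every pull request on the client.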
    opened by chazkiker2 3
  • Bump deps

    These are mostly uninteresting, but something in the first commit fixes a crash:

        Finished release [optimized] target(s) in 0.05s
         Running `target/release/optopodi report data/2021-10-13`
        Updating crates.io index
    thread 'tokio-runtime-worker' panicked at 'Unable to update registry: failed to fetch `https://github.com/rust-lang/crates.io-index`
    
    Caused by:
        invalid version 0 on git_proxy_options; class=Invalid (3)', ~/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-playground-top-crates-0.1.0/src/lib.rs:228:21
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    Error: Failed to generate new report from directory data/2021-10-13
    
    Caused by:
       0: Failed to parse Top Crates
       1: Failed to spawn blocking task
       2: panic
    
    opened by lnicola 2
  • Add issue closures report

    This is based on functionality in rylev/triage-tracker, but rewritten to be more efficient and simpler.

    This adds an additional report on the number of issue openings and closings for repos in a given time period. The data is not currently super useful, but hopefully this can be extended in the future to display this data over time to see if repos close more issues than are opened.

    Here's what a report looks like:

    Organization,Repo,Opened,Closed,Delta,Time Period
    rust-lang,rust,210,203,7,2021-07-01<>2021-08-01
    rust-lang,rustup,9,5,4,2021-07-01<>2021-08-01
    rust-lang,rustfmt,20,15,5,2021-07-01<>2021-08-01
    rust-lang,rustfix,0,0,0,2021-07-01<>2021-08-01
    rust-lang,cargo,35,30,5,2021-07-01<>2021-08-01
    rust-lang,crates.io,7,9,-2,2021-07-01<>2021-08-01
    rust-lang,docs.rs,7,5,2,2021-07-01<>2021-08-01
    rust-lang,mdBook,11,14,-3,2021-07-01<>2021-08-0
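
    Reading the sample rows, Delta is Opened minus Closed (for example 7 - 9 = -2 for crates.io). A small sketch of how one such row could be built follows; the struct and method names are hypothetical and not taken from the PR.

    ```rust
    /// One line of the report: opened and closed issue counts for a repository.
    struct IssueClosures<'a> {
        org: &'a str,
        repo: &'a str,
        opened: u32,
        closed: u32,
    }

    impl IssueClosures<'_> {
        /// Opened minus closed; negative when more issues were closed than opened.
        fn delta(&self) -> i64 {
            i64::from(self.opened) - i64::from(self.closed)
        }

        /// Formats the row in the column order of the sample report.
        fn to_csv_row(&self, start: &str, end: &str) -> String {
            format!(
                "{},{},{},{},{},{start}<>{end}",
                self.org,
                self.repo,
                self.opened,
                self.closed,
                self.delta()
            )
        }
    }
    ```

    The delta is computed in a signed type so that a repository closing more issues than it opened produces a negative number instead of an underflow.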
    
    opened by rylev 2
  • latest crates.io release does not build with nightly

    Hi, coming from @estebank's Twitter post https://twitter.com/ekuber/status/1453078603114029059 I tried to cargo install optopodi, only to find that the latest release build is broken :disappointed: (rustc 1.58.0-nightly (e269e6bf4 2021-10-26))

    
       Compiling optopodi v0.1.0
    error[E0432]: unresolved import `clap::Clap`
     --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:2:25
      |
    2 | use clap::{AppSettings, Clap};
      |                         ^^^^ no `Clap` in the root
    
    error: cannot determine resolution for the derive macro `Clap`
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:26:10
       |
    26 | #[derive(Clap, Debug, PartialEq)]
       |          ^^^^
       |
       = note: import resolution is stuck, try simplifying macro imports
    
    error: cannot determine resolution for the derive macro `Clap`
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:13:10
       |
    13 | #[derive(Clap, Debug, PartialEq)]
       |          ^^^^
       |
       = note: import resolution is stuck, try simplifying macro imports
    
    error: cannot find attribute `clap` in this scope
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:14:3
       |
    14 | #[clap(setting = AppSettings::ColoredHelp)]
       |   ^^^^
       |
       = note: `clap` is in scope, but it is a crate, not an attribute
    
    error: cannot find attribute `clap` in this scope
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:15:3
       |
    15 | #[clap(name = "gh-metrics")]
       |   ^^^^
       |
       = note: `clap` is in scope, but it is a crate, not an attribute
    
    error: cannot find attribute `clap` in this scope
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:18:7
       |
    18 |     #[clap(long)]
       |       ^^^^
       |
       = note: `clap` is in scope, but it is a crate, not an attribute
    
    error: cannot find attribute `clap` in this scope
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:22:7
       |
    22 |     #[clap(subcommand)]
       |       ^^^^
       |
       = note: `clap` is in scope, but it is a crate, not an attribute
    
    error[E0599]: no function or associated item named `parse` found for struct `OctoCli` in the current scope
      --> /home/matthias/.cargo/registry/src/github.com-1ecc6299db9ec823/optopodi-0.1.0/src/main.rs:41:24
       |
    16 | struct OctoCli {
       | -------------- function or associated item `parse` not found for this
    ...
    41 |     let cli = OctoCli::parse();
       |                        ^^^^^ function or associated item not found in `OctoCli`
       |
       = help: items from traits can only be used if the trait is implemented and in scope
       = note: the following traits define an item `parse`, perhaps you need to implement one of them:
               candidate #1: `Parser`
               candidate #2: `object::read::elf::file::FileHeader`
               candidate #3: `object::read::macho::file::MachHeader`
               candidate #4: `object::read::pe::file::ImageNtHeaders`
    
    Some errors have detailed explanations: E0432, E0599.
    For more information about an error, try `rustc --explain E0432`.
    error: failed to compile `optopodi v0.1.0`, intermediate artifacts can be found at `/tmp/cargo-installzSqF0p`
    
    opened by matthiaskrgr 0
  • consider arguments in graphql caching with `--replay-graphql`

    --replay-graphql bug

    • the changes in this branch work great without the --replay-graphql option, but as soon as we run it WITH that flag, the GQL query loads response data from the same file for both metrics::util::count_pull_requests and metrics::util::count_issues
    • running with RUST_LOG=debug demonstrates this issue well
    ❯ RUST_LOG=debug cargo run -- report data/optopodi/2021-07-31                 
       Compiling optopodi v0.1.0 (/Users/chazadmin/code/opensource/github-metrics)
        Finished dev [unoptimized + debuginfo] target(s) in 7.84s
         Running `target/debug/optopodi report data/optopodi/2021-07-31`
    [2021-07-31T18:13:32Z DEBUG reqwest::connect] starting new connection: https://api.github.com/
    [2021-07-31T18:13:32Z DEBUG reqwest::async_impl::client] response '200 OK' for https://api.github.com/graphql
    [2021-07-31T18:13:33Z DEBUG reqwest::async_impl::client] response '200 OK' for https://api.github.com/graphql
    [2021-07-31T18:13:33Z DEBUG reqwest::async_impl::client] response '200 OK' for https://api.github.com/graphql
    [2021-07-31T18:13:33Z DEBUG optopodi::metrics::util] Fetching issue closure info for optopodi/optopodi
    [2021-07-31T18:13:33Z DEBUG reqwest::async_impl::client] response '200 OK' for https://api.github.com/graphql
    [2021-07-31T18:13:33Z DEBUG optopodi::metrics::util] Fetching issue closure info for optopodi/optopodi
    [2021-07-31T18:13:33Z DEBUG reqwest::async_impl::client] response '200 OK' for https://api.github.com/graphql
    [2021-07-31T18:13:33Z DEBUG optopodi::metrics::list_repos] Retried issue closure info for optopodi/optopodi: IssueClosuresCount { opened: 2, closed: 5 }
    
    
    ❯ RUST_LOG=debug cargo run -- --replay-graphql report data/optopodi/2021-07-31
        Finished dev [unoptimized + debuginfo] target(s) in 0.15s
         Running `target/debug/optopodi --replay-graphql report data/optopodi/2021-07-31`
    [2021-07-31T18:13:35Z INFO  optopodi::metrics::gql] loading response data from `data/optopodi/2021-07-31/graphql/all-repos/0.json` rather than github
    [2021-07-31T18:13:35Z INFO  optopodi::metrics::gql] loading response data from `data/optopodi/2021-07-31/graphql/repo-participants/0.json` rather than github
    [2021-07-31T18:13:35Z INFO  optopodi::metrics::gql] loading response data from `data/optopodi/2021-07-31/graphql/repo-infos/0.json` rather than github
    [2021-07-31T18:13:35Z DEBUG optopodi::metrics::util] Fetching issue closure info for optopodi/optopodi
    [2021-07-31T18:13:35Z INFO  optopodi::metrics::gql] loading response data from `data/optopodi/2021-07-31/graphql/repo-infos/0.json` rather than github
    [2021-07-31T18:13:35Z DEBUG optopodi::metrics::util] Fetching issue closure info for optopodi/optopodi
    [2021-07-31T18:13:35Z INFO  optopodi::metrics::gql] loading response data from `data/optopodi/2021-07-31/graphql/repo-infos/0.json` rather than github
    [2021-07-31T18:13:35Z DEBUG optopodi::metrics::list_repos] Retried issue closure info for optopodi/optopodi: IssueClosuresCount { opened: 5, closed: 5 }
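
    One possible fix, sketched under assumptions: derive the cached file name from the query's variables rather than from a running index, so that two calls with different arguments can never share a file. The `cache_path` helper and the idea of passing the variables as a serialized string are hypothetical; the directory layout mirrors the paths in the log above.

    ```rust
    use std::path::{Path, PathBuf};

    /// 64-bit FNV-1a, used here because its output is stable across runs and
    /// Rust versions (unlike the standard library's default hasher).
    fn fnv1a(bytes: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for &byte in bytes {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        hash
    }

    /// Hypothetical cache path: `<dir>/graphql/<query>/<hash of variables>.json`.
    fn cache_path(dir: &Path, query_name: &str, variables: &str) -> PathBuf {
        let key = fnv1a(variables.as_bytes());
        dir.join("graphql")
            .join(query_name)
            .join(format!("{key:016x}.json"))
    }
    ```

    With this layout, replaying a run looks up the file that matches the arguments of the request being replayed, so the pull-request count and the issue count would no longer read the same response.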
    
    opened by chazkiker2 1
  • add a "high level project architecture" document in the book

    We should add a high-level project architecture guide in our book for incoming contributors who are hoping to familiarize themselves with how this codebase functions.

    • this should include a description of what each file/folder does
    • how GQL queries are implemented
    • Answer this question: "if I want to add a metric, how do I go about that?"
    documentation 
    opened by chazkiker2 5
  • analyze the top crates too

    As an addition to the report, I would like to run some analysis across the repositories of each of the top crates to get a general measure of their "active maintainership".

    opened by nikomatsakis 0
  • convert report data to producers

    Once #45 lands, and we complete #46, we should convert the "reports" into producers, and then just have the final output be done with consumers. We can probably simplify the report file structure too (just put graphql into its own directory, for example).

    opened by nikomatsakis 0
Ointers is a library for representing pointers where some bits have been stolen so that they may be used by the programmer for something else

Ointers is a library for representing pointers where some bits have been stolen so that they may be used by the programmer for something else. In effect, it's a small amount of free storage

Irrustible 8 Jun 4, 2022
Some tools for streaming frames to rpi-rgb-led-matrix using ZeroMQ, written in Rust.

led_matrix_zmq Some tools for streaming frames to rpi-rgb-led-matrix using ZeroMQ, written in Rust. This repository includes: Rust client and server l

Dan 2 Sep 6, 2022
Solutions of Advent of Code 2021 in Rust, and some other languages.

advent-of-rust Solutions of Advent of Code 2021 in Rust, and some other languages. Puzzles Puzzle Stars Languages Day 1: Sonar Sweep ⭐ ⭐ Rust Python D

rene-d 6 Jan 7, 2023
A rust program to try and detect some types of Hardware Keyloggers.

Hardware Keylogger Detection Warning: Certain Types of Hardware keyloggers can not be detected by this program, Passive Hardware Keyloggers are imposs

null 4 Dec 5, 2022
Wally is a modern package manager for Roblox projects inspired by Cargo

Wally is a package manager for Roblox inspired by Cargo (Rust) and npm (JavaScript). It brings the familiar, community-oriented world of sharing code from other communities into the Roblox ecosystem.

Uplift Games 194 Jan 3, 2023
A repository containing dozens of projects requiring vastly different skillsets.

The 100 Project Challenge A repository containing dozens of projects requiring vastly different skillsets. All the projects that I might add to this r

null 4 Jun 21, 2022
Modrinth API is a simple library for using, you guessed it, the Modrinth API in Rust projects

Modrinth API is a simple library for using, you guessed it, the Modrinth API in Rust projects. It uses reqwest as its HTTP(S) client and deserialises responses to typed structs using serde.

null 21 Jan 1, 2023
Ocular seeks to be the preferred cosmos client library UX for Rust projects

Ocular seeks to be the preferred cosmos client library UX for Rust projects. It is strongly based on lens, a go client library for blockchains built with the Cosmos SDK.

Peggy JV, Inc 34 Dec 26, 2022
A dead simple boilerplate for Rust projects.

boilerplate-rs • A dead simple boilerplate for Rust projects. Project Structure ├── assets │ └── logo.png ├── bin │ ├── Cargo.toml │ └── src │

null 6 Mar 27, 2023
A Github Actions based CI release template for Rust binaries

Rust CI Release Template A Github Actions based CI release template. This repo serves as a live template, and reference for building your own CI power

null 60 Dec 9, 2022
A GitHub Action to automatically build and deploy your mdbook project.

deploy-mdbook The deploy-mdbook action allows you to easily build and deploy your mdBook project to GitHub Pages. See action.yml for configuration

null 27 Oct 24, 2022
🛡️ Automatically protect the default branch of new repositories in a GitHub organization

The Branch Autoprotector watches a GitHub organization and automatically protects the default branch in new repositories. This service notifies the creator of the default branch of this automatic branch protection setup by filing an issue in the repository.

Branch Autoprotector 2 Jan 31, 2022
Help project managers and project owners with easy-to-understand views of github issue dependencies.

Help project managers and project owners with easy-to-understand views of github issue dependencies.

nasa 56 Dec 15, 2022
A Github webhook server to help with CI/CD written in Rust.

This application automatically updates local GitHub repositories and triggers a command once the update is complete. This can be extremely useful

Luca 9 Apr 4, 2023
⏱ Cross-platform Prometheus style process metrics collector of metrics crate

⏱ metrics-process This crate provides Prometheus style process metrics collector of metrics crate for Linux, macOS, and Windows. Collector code is man

Alisue 12 Dec 16, 2022
A simple cli to clone projects and fetch all projects in a GitHub org.

stupid-git A simple cli to clone projects and update all projects. get all repository from GitHub clone all pull all with git stash Usage create sgit.

Fengda Huang 5 Sep 15, 2022
A lib crate for gathering system info such as cpu, distro, environment, kernel, etc in Rust.

nixinfo A lib crate for gathering system info such as cpu, distro, environment, kernel, etc in Rust. To use: nixinfo = "0.2.8" in your Cargo.toml. Cur

ValleyKnight 37 Nov 26, 2022
RustRedOps is a repository dedicated to gathering and sharing advanced techniques and malware for Red Team, with a specific focus on the Rust programming language. (In Construction)

RustRedOps In Construction.... The project is still under development Overview RustRedOps is a repository that houses various tools and projects relat

João Victor 17 Dec 14, 2023
Rust port of https://github.com/hunar4321/life_code with some fun features.

Smarticles A Rust port of Brainxyz's Artificial Life simulator with some fun features. A simple program to simulate primitive Artificial Life using si

Chevy Ray Johnston 15 Dec 24, 2022
GitHub CLI extension to search some repos interactively.

gh activity GitHub CLI extension to search some repos interactively. It's wrapper to build gh command provided by GitHub CLI, it could search more eas

taka naoga 3 Jul 28, 2023