Library for scripting analyses against crates.io's database dumps

Overview

crates.io database dumps

github crates.io docs.rs build status

Library for scripting analyses against crates.io's database dumps.

These database dumps contain all information exposed by the crates.io API packaged into a single download. An updated dump is published every 24 hours. The latest dump is available at https://static.crates.io/db-dump.tar.gz.

[dependencies]
db-dump = "0.2"

Examples

The examples/ directory of this repo contains several runnable example analyses.

total‑downloads Computes time series of total downloads by day across all crates on crates.io
crate‑downloads Computes time series of downloads of one specific crate
top‑crates Computes the top few most directly depended upon crates
user‑dependencies Computes the percentage of crates.io which depends directly on at least one crate by the specified user
user‑downloads Computes time series of the fraction of crates.io downloads attributed to a single given user's crates

Each of these examples can be run using Cargo once you've downloaded a recent database dump:

$ wget https://static.crates.io/db-dump.tar.gz
$ cargo run --release --example total-downloads

Here is the implementation of the most basic example, total-downloads, and graph of the resulting table. It shows crates.io download rate doubling every 9 months, or equivalently 10× every 2.5 years!

use chrono::NaiveDate;
use std::collections::BTreeMap as Map;

fn main() -> db_dump::Result<()> {
    let mut downloads = Map::<NaiveDate, u64>::new();
    db_dump::Loader::new()
        .version_downloads(|row| {
            *downloads.entry(row.date).or_default() += row.downloads;
        })
        .load("./db-dump.tar.gz")?;

    for (date, count) in downloads {
        println!("{},{}", date, count);
    }

    Ok(())
}

Crates.io downloads per day (log scale)


Here is a graph from the user-downloads example:

Fraction of crates.io downloads that are dtolnay's crates


License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
You might also like...
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is

mach-dump can parse Mach-O core dumps taken with lldb from macOS and iOS devices.

mach-dump mach-dump can parse Mach-O core dumps taken with lldb from macOS and iOS devices. It has no external dependencies. Example use std::path::Pa

Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

Security advisory database for Rust crates published through crates.io

RustSec Advisory Database The RustSec Advisory Database is a repository of security advisories filed against Rust crates published via https://crates.

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.

Welcome! This is the official repository for the Soufflé language project. The Soufflé language is similar to Datalog (but has terms known as records)

Bundle Cargo crates for use with macOS/iOS in Xcode

cargo-cocoapods - Build Rust code for Xcode integration Installing cargo install cargo-cocoapods You'll also need to install all the toolchains you i

"Crates for Cheese" is a Rust collection library of those crates I consider a useful "extended standard".

cfc The purpose of this library is to provide a minimal list of currated crates which enhance the std library. In addition, most or all crates in this

A Distributed SQL Database - Building the Database in the Public to Learn Database Internals
A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Table of Contents Overview Usage TODO MVCC in entangleDB SQL Query Execution in entangleDB entangleDB Raft Consensus Engine What I am trying to build

A lightweight, embedded key-value database for mobile clients (i.e., iOS, Android), written in Rust.

A lightweight, embedded key-value database for mobile clients (i.e., iOS, Android), written in Rust. ⚠️ Still in testing, not yet ready for production

A web service that generates images of dependency graphs for crates hosted on crates.io

crate-deps A web service that generates images of dependency graphs for crates hosted on crates.io This project is built entirely in Rust using these

crates is an extension aims to help people to manage their dependencies for rust (crates.io & TOML).
crates is an extension aims to help people to manage their dependencies for rust (crates.io & TOML).

crates Hello Rust & VSCode lovers, This is crates, an extension for crates.io dependencies. Aims helping developers to manage dependencies while using

https://crates.io/crates/transistor

Transistor A Rust Crux Client crate/lib. For now, this crate intends to support 2 ways to interact with Crux: Via Docker with a crux-standalone versio

Crates - A collection of open source Rust crates from iqlusion

iqlusion crates 📦 This repository contains a set of Apache 2.0-licensed packages (a.k.a. "crates") for the Rust programming language, contributed to

Download .crate files of all versions of all crates from crates.io

get-all-crates Download .crate files of all versions of all crates from crates.io. Useful for things like noisy-clippy which need to analyze the sourc

A gui tool written in Dioxus to make it easy to release a workspace of crates to crates.io
A gui tool written in Dioxus to make it easy to release a workspace of crates to crates.io

Easy-Release: a visual tool for releasing workspaces of libraries A work-in-progress GUI for releasing a large workspace of crates manually, but easil

Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks.
Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks.

Crates Registry Description Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks. (like Ver

JSON-RPC endpoint proxy that dumps requests/responses for debugging
JSON-RPC endpoint proxy that dumps requests/responses for debugging

json_rpc_snoop How to build Ensure you have cargo installed and in your PATH (the easiest way is to visit https://rustup.rs/) make This will create t

A Rust application that inserts Discogs data dumps into Postgres

Discogs-load A Rust application that inserts Discogs data dumps into Postgres. Discogs-load uses a simple state machine with the quick-xml Rust librar

Scans a given directory for software of unknown provinence (SOUP) and dumps them in a json-file

Scans a given directory for software of unknown provinence (SOUP) and writes them to a json-file. The json-file contains name, version and a meta property for each SOUP.

Comments
  • Expose timestamps as DateTime<Utc>

    Expose timestamps as DateTime

    The created_at and updated_at timestamps in the db dump CSV do not mention a time zone, but it's not correct to represent them as NaiveDateTime because they refer to instants in real time, just stated in UTC.

    opened by dtolnay 0
  • Drop dependency on serde's derive feature

    Drop dependency on serde's derive feature

    This allows everything downstream of serde in our dependency graph (bstr, csv, semver, serde_json) to begin building before serde_derive has finished building.

    Before:

    After:

    opened by dtolnay 0
  • Serialize Id structs

    Serialize Id structs

    It would be beneficial if other structs are also serializable (Rows).

    E.g. I want to serialize crates and categories data to be used as json in a pipeline.

    opened by elpiel 0
  • use Day<Tz>

    use Day

    As per discussion in https://github.com/chronotope/chrono/issues/820 - I've added the Day<Tz> type to a branch in chrono and tested it out over here. If this is desired to be merged, I'll update the chrono dependency to point to a released version once it is released

    opened by esheppa 0
Releases(0.4.1)
Owner
David Tolnay
David Tolnay
Yet Another Technical Analysis library [for Rust]

YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t

Dmitry 197 Dec 29, 2022
Rust DataFrame library

Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro

Ritchie Vink 11.9k Jan 8, 2023
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
sparse linear algebra library for rust

sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro

Vincent Barrielle 311 Dec 18, 2022
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

SFU Database Group 939 Jan 5, 2023
A cross-platform library to retrieve performance statistics data.

A toolkit designed to be a foundation for applications to monitor their performance.

Lark Technologies Pte. Ltd. 155 Nov 12, 2022
This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

null 2 Nov 2, 2022
Dataflow is a data processing library, primarily for machine learning

Dataflow Dataflow is a data processing library, primarily for machine learning. It provides efficient pipeline primitives to build a directed acyclic

Sidekick AI 9 Dec 19, 2022
A bit vector with the Rust standard library's portable SIMD API

bitsvec A bit vector with the Rust standard library's portable SIMD API Usage Add bitsvec to Cargo.toml: bitsvec = "x.y.z" Write some code like this:

Chojan Shang 31 Dec 21, 2022
A rust library built to support building time-series based projection models

TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i

James MacAdie 12 Dec 7, 2022