Discogs-load

A Rust application that inserts Discogs data dumps into Postgres.

Discogs-load uses a simple state machine built on the quick-xml Rust library to parse the monthly Discogs data dump and load it into Postgres. At the time of writing, the largest file of the monthly dump is ~10 GB compressed and takes ~20 minutes to parse and load on an M1 MacBook Air.
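
To illustrate the state-machine idea, here is a minimal sketch that uses only the standard library. It is not the project's actual parser: the `XmlEvent` enum is a hypothetical stand-in for the start/text/end events a streaming reader such as quick-xml produces, and the `Release` fields (`id`, `title`, `country`) are assumed for illustration. The current state decides what each event means, and nested subtrees the sketch does not care about are skipped by counting depth.

```rust
/// Hypothetical stand-in for the events a streaming XML reader emits.
/// This is NOT quick-xml's API; it only mirrors the start/text/end shape.
#[derive(Debug, Clone, PartialEq)]
pub enum XmlEvent {
    Start { name: String, id: Option<i32> },
    Text(String),
    End(String),
}

/// Where the parser currently is relative to a <release> element.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Outside,
    InRelease,
    InTitle,
    InCountry,
    /// Inside a child subtree we ignore; the number is the nesting depth.
    Skip(u32),
}

/// One parsed row (assumed columns, for illustration only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Release {
    pub id: i32,
    pub title: String,
    pub country: String,
}

/// Consume events one by one and collect every completed release.
pub fn parse_releases<I>(events: I) -> Vec<Release>
where
    I: IntoIterator<Item = XmlEvent>,
{
    let mut state = State::Outside;
    let mut current = Release::default();
    let mut done = Vec::new();

    for event in events {
        state = match (state, event) {
            // Entering a release: start a fresh row.
            (State::Outside, XmlEvent::Start { name, id }) if name == "release" => {
                current = Release { id: id.unwrap_or_default(), ..Release::default() };
                State::InRelease
            }
            // Direct children we want to capture.
            (State::InRelease, XmlEvent::Start { name, .. }) if name == "title" => State::InTitle,
            (State::InRelease, XmlEvent::Start { name, .. }) if name == "country" => State::InCountry,
            // Any other child: skip its whole subtree.
            (State::InRelease, XmlEvent::Start { .. }) => State::Skip(1),
            (State::Skip(n), XmlEvent::Start { .. }) => State::Skip(n + 1),
            (State::Skip(1), XmlEvent::End(_)) => State::InRelease,
            (State::Skip(n), XmlEvent::End(_)) => State::Skip(n - 1),
            // Text is routed to a field based on the current state.
            (State::InTitle, XmlEvent::Text(t)) => {
                current.title.push_str(&t);
                State::InTitle
            }
            (State::InCountry, XmlEvent::Text(t)) => {
                current.country.push_str(&t);
                State::InCountry
            }
            (State::InTitle, XmlEvent::End(name)) if name == "title" => State::InRelease,
            (State::InCountry, XmlEvent::End(name)) if name == "country" => State::InRelease,
            // Leaving a release: the row is complete.
            (State::InRelease, XmlEvent::End(name)) if name == "release" => {
                done.push(std::mem::take(&mut current));
                State::Outside
            }
            // Everything else leaves the state unchanged.
            (s, _) => s,
        };
    }
    done
}
```

Because only the current state and one in-progress row are held in memory, this style of parser can stream through a very large file without loading it whole.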

Inspired by discogs-xml2db and discogs2pg.

Installation

Build a release binary:

cargo build --release

Usage

Download the releases data dump from Discogs, then run the binary with the path to the gzip-compressed file as its only argument.

docker-compose up -d postgres
./target/release/discogs-load discogs_20211201_releases.xml.gz

Tests

If you don't want to load the huge releases file, you can run against a smaller example file instead:

docker-compose up -d postgres
cargo run tests/data/discogs_test_releases.xml.gz

Then check the result manually:

docker exec -it discogs-load_postgres_1 /bin/bash
psql -U dev discogs
select * from release;

Contributing/Remaining todo

  • Implement COPY_IN
    • Postgres COPY is faster than the current multi-row insertion.
    • This will also be a chance to refactor the current ugly write_table functions.
  • Support the other (smaller) files from the monthly Discogs data dump
    • labels
    • artists
    • masters
  • Implement the database settings as command-line arguments and environment variables
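
As a starting point for the COPY item in the list above, the sketch below renders rows in PostgreSQL's COPY text format: tab-separated fields, one row per line, `\N` for NULL, and backslash escapes for tabs, newlines, carriage returns, and backslashes. It is an assumption-laden illustration, not the project's code: the function names and the row representation (`Vec<Option<String>>`) are hypothetical, and sending the buffer to the server (for example through a driver's `COPY ... FROM STDIN` support) is left out.

```rust
/// Escape one value for PostgreSQL's COPY text format.
/// (Hypothetical helper, not part of discogs-load.)
fn escape_copy_field(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"), // backslash becomes \\
            '\t' => out.push_str("\\t"),  // tab would otherwise end the field
            '\n' => out.push_str("\\n"),  // newline would otherwise end the row
            '\r' => out.push_str("\\r"),
            other => out.push(other),
        }
    }
    out
}

/// Render rows as COPY text: fields joined by tabs, rows ended by '\n',
/// and `None` written as the NULL marker \N.
pub fn to_copy_text(rows: &[Vec<Option<String>>]) -> String {
    let mut buf = String::new();
    for row in rows {
        let fields: Vec<String> = row
            .iter()
            .map(|field| match field {
                Some(value) => escape_copy_field(value),
                None => "\\N".to_string(),
            })
            .collect();
        buf.push_str(&fields.join("\t"));
        buf.push('\n');
    }
    buf
}
```

A buffer produced this way could be written in batches while parsing, so the server receives one continuous stream instead of many separate INSERT statements.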

Releases: v0.1.2
Owner: Dylan