Seed Your Development Database With Real Data ⚡️
RepliByte is a powerful tool to seed your databases with real data, along with other cool features.
Features
- Support data backup and restore for PostgreSQL, MySQL and MongoDB
- Replace sensitive data with fake data
- Works on large databases (> 10GB) (read Design)
- Database Subsetting: Scale down a production database to a more reasonable size 🔥
- Start a local database with the prod data in a single command 🔥
- On-the-fly data (de)compression (Zlib)
- On-the-fly data de/encryption (AES-256)
- Fully stateless (no server, no daemon) and lightweight binary 🍃
- Use custom transformers
Here are the features we plan to support:
- Auto-detect and version database schema change
- Auto-detect sensitive fields
- Auto-clean backed up data
Install
Install on macOS
brew tap Qovery/replibyte
brew install replibyte
Or manually.
Install on Linux
# download latest replibyte archive for Linux
curl -s https://api.github.com/repos/Qovery/replibyte/releases/latest | \
jq -r '.assets[].browser_download_url' | \
grep -i 'linux-musl.tar.gz$' | wget -qi - && \
# unarchive
tar zxf *.tar.gz
# make replibyte executable
chmod +x replibyte
# make it accessible from everywhere
mv replibyte /usr/local/bin/
Install on Windows
Download the latest Windows release and install it.
Install from source
git clone https://github.com/Qovery/replibyte.git && cd replibyte
# Install cargo
# visit: https://doc.rust-lang.org/cargo/getting-started/installation.html
# Build with cargo
cargo build --release
# Run RepliByte
./target/release/replibyte -h
Run replibyte with Docker
git clone https://github.com/Qovery/replibyte.git && cd replibyte
# Build image with Docker
docker build -t replibyte -f Dockerfile .
# Run RepliByte
docker run -v $(pwd)/examples:/examples/ replibyte -c /examples/replibyte.yaml transformer list
Feel free to edit ./examples/replibyte.yaml with your configuration.
Usage
Example with PostgreSQL as the source and destination database, and S3 as the bridge (cf. configuration file)
Create a dev database dataset from your production database
replibyte -c prod-conf.yaml backup run
The backup is compressed and stored on your S3 bucket (cf configuration).
Create a dev database dataset from a dump file
cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i
The backup is compressed and stored on your S3 bucket (cf configuration).
Seed my local database (Docker required)
List all your backups to choose one:
replibyte -c prod-conf.yaml backup list
type          name                    size    when                     compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am    true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am   true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am   true        true
Restore the latest one into a Postgres container bound to port 5433 (default: 5432):
replibyte -c prod-conf.yaml restore local -v latest --image postgres --port 5433
To connect to your Postgres database, use the following connection string:
> postgres://postgres:password@localhost:5433/postgres
Waiting for Ctrl-C to stop the container
OR restore a specific one:
replibyte -c prod-conf.yaml restore local -v backup-1647706359405 --image postgres --port 5433
The seed comes from your S3 bucket (cf configuration)
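Once the container is up, you will usually want to hand the printed connection string to a client or an application. Here is a minimal sketch of a helper that rebuilds it from the port you passed to --port. The local_pg_uri function is hypothetical (it is not part of RepliByte), and it assumes the user, password and database name shown in the sample output above.

```shell
# Illustrative helper (not part of RepliByte): build the connection string for
# a database restored with `restore local --image postgres`.
# Assumption: user, password and database name match the sample output above
# (postgres / password / postgres); adjust them if yours differ.
local_pg_uri() {
  port="${1:-5432}" # 5432 is the default port mentioned above
  printf 'postgres://postgres:password@localhost:%s/postgres\n' "$port"
}

# Example usage (requires the psql client and a running container):
# psql "$(local_pg_uri 5433)" -c 'SELECT 1;'
```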
Seed a remote database
Show your backups:
replibyte -c prod-conf.yaml backup list
type          name                    size    when                     compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am    true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am   true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am   true        true
Restore the latest one:
replibyte -c prod-conf.yaml restore remote -v latest
OR restore a specific one:
replibyte -c prod-conf.yaml restore remote -v backup-1647706359405
The seed comes from your S3 bucket (cf configuration)
Configuration
Create your prod-conf.yaml configuration file to source your production database.
encryption_key: $MY_PRIVATE_ENC_KEY # optional - encrypt data on bridge
source:
  connection_uri: $DATABASE_URL
  database_subset: # optional - downscale database while keeping it consistent
    database: public
    table: orders
    strategy_name: random
    strategy_options:
      percent: 50
    passthrough_tables:
      - us_states
  transformers: # optional - hide sensitive data
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
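The configuration above references several environment variables ($DATABASE_URL, $BUCKET_NAME, and so on). Here is a minimal sketch of a pre-flight check you could run before a backup, assuming RepliByte reads these values from the environment when it loads the file. The require_env function is a hypothetical helper, not a RepliByte command.

```shell
# Illustrative pre-flight check (not part of RepliByte): verify that every
# environment variable referenced in prod-conf.yaml is set and non-empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage: only start the backup when the check passes.
# require_env DATABASE_URL BUCKET_NAME S3_REGION ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#   && replibyte -c prod-conf.yaml backup run
```

Add MY_PRIVATE_ENC_KEY to the list if you set encryption_key. The same check applies to staging-conf.yaml.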
Run the backup for the source
replibyte -c prod-conf.yaml backup run
Destination
Create your staging-conf.yaml configuration file to sync your production database with your staging database.
bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
encryption_key: $MY_PRIVATE_ENC_KEY # optional - needed to decrypt data on bridge if there was an encryption_key defined when running the source backup
Run the restore for the destination
replibyte -c staging-conf.yaml restore remote -v latest
How RepliByte works
Check out our Design page
Connectors
Supported Source connectors
- PostgreSQL
- MongoDB
- Local dump file
- MySQL
Supported Transformers
A transformer changes or hides the value of a column. RepliByte provides pre-made transformers.
Check out the list of our available Transformers
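To give an idea of what a transformer does to a column value, here is a tiny shell illustration of the keep-first-char transformer used in the configuration above. It is only a sketch of the behavior the name suggests (keep the first character, drop the rest), not RepliByte's implementation, and the keep_first_char function is hypothetical.

```shell
# Illustration only (not RepliByte code): mimic what a transformer named
# keep-first-char presumably does to a single column value.
keep_first_char() {
  # cut -c1 keeps the first character of the line and discards the rest
  printf '%s\n' "$1" | cut -c1
}

# Example usage:
# keep_first_char "jdoe"
```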
RepliByte Bridge
The S3 wire protocol, used by RepliByte bridge, is supported by most cloud providers. Here is a non-exhaustive list of S3 compatible services.
Cloud Service Provider | S3 service name | S3 compatible |
---|---|---|
Amazon Web Services | S3 | Yes (Original) |
Google Cloud Platform | Cloud Storage | Yes |
Microsoft Azure | Blob Storage | Yes |
DigitalOcean | Spaces | Yes |
Scaleway | Object Storage | Yes |
MinIO | Object Storage | Yes |
Feel free to drop a PR to include another S3 compatible solution.
Supported Destination connectors
- PostgreSQL
- MongoDB
- Local dump file
- MySQL
Motivation
At Qovery (the company behind RepliByte), developers can clone their applications and databases with just one click. However, the cloning process can be tedious and time-consuming, and we end up copying the information multiple times. With RepliByte, the Qovery team wants to provide a comprehensive way to seed cloud databases from one place to another.
The long-term motivation behind RepliByte is to provide a way to clone any database in real-time. This project starts small, but has big ambitions!
FAQ
Q: Is RepliByte an ETL?
RepliByte is not an ETL like Airbyte, Airflow or Talend, and it will never be. If you need to synchronize versatile data sources, you are better off choosing a classic ETL. RepliByte is a tool that helps software engineers synchronize data between databases of the same type; you cannot replicate data across different database types. As mentioned above, the primary purpose of RepliByte is to duplicate data into different environments. You can see RepliByte as a specific use case of an ETL, where an ETL is more generic.
Q: Do you support backup from a dump file?
Absolutely. Either pipe the dump through stdin:
cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i
or pass the file directly:
replibyte -c prod-conf.yaml backup run -s postgres -f dump.sql
Q: How can RepliByte list the backups? Is there an API?
There is no API. RepliByte is fully stateless and stores the backup list in the bridge (e.g. S3) via an index_file.
Contributing
Local development
For local development, you will need to install Docker and run docker compose -f ./docker-compose-dev.yml up to start the local databases. At the moment, docker-compose includes 2 PostgreSQL instances, 2 MySQL instances, 2 MongoDB instances and a MinIO bridge: one source and one destination per database, plus one bridge. In the future, we will provide more options.
The MinIO console is accessible at http://localhost:9001.
Once your Docker instances are running, you can run the RepliByte tests to check that everything is configured correctly:
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo test
How to contribute
RepliByte is in its early stage of development and needs some time before it is usable in production. We need some help, and you are welcome to contribute. To better coordinate, consider joining our #replibyte channel on our Discord. Otherwise, you can pick any open issue and contribute.
Where should I start?
Check the open issues and their priority.
How can I contact you?
3 options:
- Open an issue.
- Join our #replibyte channel on our Discord.
- Drop us an email at github+replibyte {at} qovery {dot} com.
Telemetry
RepliByte collects anonymized data from users in order to improve our product. Feel free to inspect the code here. This can be deactivated at any time, and any data that has already been collected can be deleted on request (hello+replibyte {at} qovery {dot} com).
Collected data
- Command line parameters
- Options used (subset, transformer, compression) in the configuration file.
Thanks
Thanks to everyone sharing their ideas to make RepliByte better. We do appreciate it. We would also like to thank Airbyte, a great product and a trustworthy source of inspiration for this project.