Replibyte - a powerful tool to seed your databases

Qovery

Last update: Jan 9, 2023


Overview

Seed Your Development Database With Real Data ⚡️

Replibyte is a powerful tool to seed your databases
with real data and other cool features 🔥

Features

Supports data backup and restore for PostgreSQL, MySQL and MongoDB
Replace sensitive data with fake data
Works on large databases (> 10GB) (read Design)
Database Subsetting: Scale down a production database to a more reasonable size 🔥
Start a local database with the prod data in a single command 🔥
On-the-fly data (de)compression (Zlib)
On-the-fly data de/encryption (AES-256)
Fully stateless (no server, no daemon) and lightweight binary 🍃
Use custom transformers

Here are the features we plan to support

Auto-detect and version database schema change
Auto-detect sensitive fields
Auto-clean backed up data

Install

Install on MacOSX

⚠️ RepliByte homebrew auto release is in maintenance. Consider using Docker or building from source in the meantime ⚠️

brew tap Qovery/replibyte
brew install replibyte

Or manually.

Install on Linux

# download latest replibyte archive for Linux
curl -s https://api.github.com/repos/Qovery/replibyte/releases/latest | \
    jq -r '.assets[].browser_download_url' | \
    grep -i 'linux-musl.tar.gz$' | wget -qi - && \

# unarchive
tar zxf *.tar.gz

# make replibyte executable
chmod +x replibyte

# make it accessible from everywhere
mv replibyte /usr/local/bin/

Install on Windows

Download the latest Windows release and install it.

Install from source

git clone https://github.com/Qovery/replibyte.git && cd replibyte 

# Install cargo
# visit: https://doc.rust-lang.org/cargo/getting-started/installation.html

# Build with cargo
cargo build --release

# Run RepliByte
./target/release/replibyte -h

Run replibyte with Docker

git clone https://github.com/Qovery/replibyte.git && cd replibyte

# Build image with Docker
docker build -t replibyte -f Dockerfile .

# Run RepliByte
docker run -v $(pwd)/examples:/examples/ replibyte -c /examples/replibyte.yaml transformer list

Feel free to edit ./examples/replibyte.yaml with your configuration.

Usage

Example with PostgreSQL as a Source and Destination database AND S3 as a Bridge (cf configuration file)

Create a dev database dataset from your production database

replibyte -c prod-conf.yaml backup run

The backup is compressed and stored on your S3 bucket (cf configuration).

Create a dev database dataset from a dump file

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

The backup is compressed and stored on your S3 bucket (cf configuration).

Seed my local database (Docker required)

List all your backups to choose one:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one into a Postgres container bound to port 5433 (default: 5432):

replibyte -c prod-conf.yaml restore local -v latest --image postgres --port 5433

To connect to your Postgres database, use the following connection string:
> postgres://postgres:password@localhost:5433/postgres
Waiting for Ctrl-C to stop the container

OR restore a specific one:

replibyte -c prod-conf.yaml restore local -v backup-1647706359405 --image postgres --port 5433

The seed comes from your S3 bucket (cf configuration)

Seed a remote database

Show your backups:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one:

replibyte -c prod-conf.yaml restore remote -v latest

OR restore a specific one:

replibyte -c prod-conf.yaml restore remote -v backup-1647706359405

The seed comes from your S3 bucket (cf configuration)

Configuration

Create your prod-conf.yaml configuration file to source your production database.

encryption_key: $MY_PRIVATE_ENC_KEY # optional - encrypt data on bridge
source:
  connection_uri: $DATABASE_URL
  database_subset: # optional - downscale database while keeping it consistent
    database: public
    table: orders
    strategy_name: random
    strategy_options:
      percent: 50
    passthrough_tables:
      - us_states
  transformers: # optional - hide sensitive data
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
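
To make the `database_subset` options above more concrete, here is a minimal Rust sketch of a percent-based row filter with passthrough tables. It is an illustration only, not RepliByte's implementation: `SubsetConfig` and `keep_row` are hypothetical names, a hash of the row key stands in for a real random draw, and the consistency handling mentioned in the config comment (keeping related tables coherent) is not modeled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical mirror of the `database_subset` options from the config above.
struct SubsetConfig<'a> {
    table: &'a str,
    percent: u64,
    passthrough_tables: Vec<&'a str>,
}

/// Decide whether a row is kept.
/// Rows of passthrough tables are always kept. Rows of the subset table are
/// kept when the hash of their key, modulo 100, falls below `percent`.
fn keep_row(cfg: &SubsetConfig, table: &str, row_key: &str) -> bool {
    if cfg.passthrough_tables.contains(&table) {
        return true;
    }
    if table != cfg.table {
        // Assumption: other tables are left alone in this sketch.
        return true;
    }
    let mut hasher = DefaultHasher::new();
    row_key.hash(&mut hasher);
    hasher.finish() % 100 < cfg.percent
}

fn main() {
    let cfg = SubsetConfig {
        table: "orders",
        percent: 50,
        passthrough_tables: vec!["us_states"],
    };
    let kept = (0..1000)
        .filter(|id| keep_row(&cfg, "orders", &id.to_string()))
        .count();
    println!("kept {} of 1000 orders", kept);
    println!("us_states row kept: {}", keep_row(&cfg, "us_states", "CA"));
}
```

With `percent: 50` roughly half of the `orders` rows survive, while every `us_states` row is kept regardless of the percentage.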

Run the app for the source

replibyte -c prod-conf.yaml

Destination

Create your staging-conf.yaml configuration file to sync your production database with your staging database.

bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
encryption_key: $MY_PRIVATE_ENC_KEY # optional - needed to decrypt data on bridge if there was an encryption_key defined when running the source backup

Run the app for the destination

replibyte -c staging-conf.yaml

How RepliByte works

Check out our Design page

Connectors

Supported Source connectors

PostgreSQL
MongoDB
Local dump file
MySQL

Supported Transformers

A transformer is useful to change / hide the value of a column. RepliByte provides pre-made transformers.

Check out the list of our available Transformers
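
As an illustration of the idea, the sketch below shows what a column transformer could look like in Rust. The `Transformer` trait here is hypothetical and written only for this example (RepliByte's real trait may differ), `KeepFirstChar` imitates the spirit of the `keep-first-char` transformer named in the configuration, and `MaskEmail` is an invented stand-in rather than the real `email` transformer.

```rust
/// Hypothetical trait for this sketch; not RepliByte's actual API.
trait Transformer {
    fn transform(&self, value: &str) -> String;
}

/// Keeps only the first character of the value.
struct KeepFirstChar;

impl Transformer for KeepFirstChar {
    fn transform(&self, value: &str) -> String {
        value.chars().take(1).collect()
    }
}

/// Replaces the local part of an email address with a fixed placeholder.
struct MaskEmail;

impl Transformer for MaskEmail {
    fn transform(&self, value: &str) -> String {
        match value.split_once('@') {
            Some((_, domain)) => format!("user@{}", domain),
            None => "user@example.com".to_string(),
        }
    }
}

fn main() {
    // Map column names to transformers, as the `transformers` config section does.
    let transformers: Vec<(&str, Box<dyn Transformer>)> = vec![
        ("username", Box::new(KeepFirstChar)),
        ("email", Box::new(MaskEmail)),
    ];
    let row = [("username", "jdoe"), ("email", "jdoe@corp.test"), ("city", "Paris")];
    for &(column, value) in row.iter() {
        // Columns without a transformer pass through unchanged.
        let out = transformers
            .iter()
            .find(|(name, _)| *name == column)
            .map(|(_, t)| t.transform(value))
            .unwrap_or_else(|| value.to_string());
        println!("{}: {}", column, out);
    }
}
```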

RepliByte Bridge

The S3 wire protocol, used by RepliByte bridge, is supported by most cloud providers. Here is a non-exhaustive list of S3 compatible services.

Cloud Service Provider    S3 service name    S3 compatible
Amazon Web Services       S3                 Yes (Original)
Google Cloud Platform     Cloud Storage      Yes
Microsoft Azure           Blob Storage       Yes
Digital Ocean             Spaces             Yes
Scaleway                  Object Storage     Yes
Minio                     Object Storage     Yes

Feel free to drop a PR to include another S3 compatible solution.

Supported Destination connectors

PostgreSQL
MongoDB
Local dump file
MySQL

Motivation

At Qovery (the company behind RepliByte), developers can clone their applications and databases just with one click. However, the cloning process can be tedious and time-consuming, and we end up copying the information multiple times. With RepliByte, the Qovery team wants to provide a comprehensive way to seed cloud databases from one place to another.

The long-term motivation behind RepliByte is to provide a way to clone any database in real-time. This project starts small, but has big ambition!

FAQ

Q: Is RepliByte an ETL?

RepliByte is not an ETL like Airbyte, Airflow or Talend, and it never will be. If you need to synchronize versatile data sources, you are better off choosing a classic ETL. RepliByte is a tool that helps software engineers synchronize data between databases of the same type; it cannot replicate data across different database types. As mentioned above, the primary purpose of RepliByte is to duplicate data into different environments. You can see RepliByte as a specific use case of an ETL, where an ETL is more generic.

Q: Do you support backup from a dump file?

Absolutely:

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

and

replibyte -c prod-conf.yaml backup run -s postgres -f dump.sql

Q: How can RepliByte list the backups? Is there an API?

There is no API. RepliByte is fully stateless and stores the backup list in the bridge (e.g. S3) via an index_file.
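
A minimal sketch of how a version such as `latest` or a backup name could be resolved from such an index. The field names follow the `Backup` and `IndexFile` structs quoted in the project's issues; the `resolve` function, the sample names and the sample values are assumptions made for this example, not RepliByte's code.

```rust
/// Field names follow the `Backup` struct quoted in the project's issues.
#[derive(Debug)]
struct Backup {
    directory_name: String,
    size: usize,
    created_at: u128,
    compressed: bool,
    encrypted: bool,
}

struct IndexFile {
    backups: Vec<Backup>,
}

/// Hypothetical helper: "latest" picks the entry with the highest
/// `created_at`; any other value is matched against `directory_name`.
fn resolve<'a>(index: &'a IndexFile, version: &str) -> Option<&'a Backup> {
    if version == "latest" {
        index.backups.iter().max_by_key(|b| b.created_at)
    } else {
        index.backups.iter().find(|b| b.directory_name == version)
    }
}

fn main() {
    // Sample entries invented for the example.
    let index = IndexFile {
        backups: vec![
            Backup { directory_name: "backup-100".into(), size: 10, created_at: 100, compressed: true, encrypted: true },
            Backup { directory_name: "backup-200".into(), size: 12, created_at: 200, compressed: true, encrypted: true },
        ],
    };
    if let Some(b) = resolve(&index, "latest") {
        println!(
            "latest -> {} ({} bytes, compressed: {}, encrypted: {})",
            b.directory_name, b.size, b.compressed, b.encrypted
        );
    }
}
```

Because the whole list lives in one file on the bridge, listing and selecting backups needs no server: the client reads the index and picks an entry.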

⬆️ Open an issue if you have any question - I'll pick the most common questions and put them here with the answer

Contributing

Local development

For local development, you will need to install Docker and run docker compose -f ./docker-compose-dev.yml up to start the local databases. At the moment, the compose file includes 2 PostgreSQL instances, 2 MySQL instances, 2 MongoDB instances and a MinIO bridge: one source and one destination per database type, plus one bridge. In the future, we will provide more options.

The Minio console is accessible at http://localhost:9001.

Once your Docker instances are running, you can run the RepliByte tests, to check if everything is configured correctly:

AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo test

How to contribute

RepliByte is in its early stage of development and needs some time to be usable in production. We need some help, and you are welcome to contribute. To better synchronize, consider joining our #replibyte channel on our Discord. Otherwise, you can pick any open issue and contribute.

Where should I start?

Check the open issues and their priority.

How can I contact you?

3 options:

Open an issue.
Join our #replibyte channel on our discord.
Drop us an email to github+replibyte {at} qovery {dot} com.

Telemetry

RepliByte collects anonymized data from users in order to improve our product. Feel free to inspect the code here. This can be deactivated at any time, and any data that has already been collected can be deleted on request (hello+replibyte {at} qovery {dot} com).

Collected data

Command line parameters
Options used (subset, transformer, compression) in the configuration file.

Thanks

Thanks to all the people sharing their ideas to make RepliByte better. We do appreciate it. I would also like to thank Airbyte, a great product and a trustworthy source of inspiration for this project.

Comments

panic: Unterminated string literal in SQL instruction

Hello, at Flexhire we are trying out this tool to seed our staging DB with production data.

We are using replibyte v0.6 for Linux x86 and the dump is of a PostgreSQL 11 DB hosted on AWS RDS.

When generating a dump from our production DB we are encountering the following crash. I included a stack backtrace generated with RUST_BACKTRACE=full

We have text columns containing user entered text in the markdown format that might have characters such as '. The application uses rails so these characters should be escaped. Perhaps there is some issue with the escaping done by Replibyte? It looks like INSERT instructions are the ones causing the problem.

thread 'main' panicked at 'TokenizerError { message: "Unterminated string literal", line: 1, col: 788 }', dump-parser/src/postgres/mod.rs:747:13
stack backtrace:
   0:     0x7f147c81bc5d - std::backtrace_rs::backtrace::libunwind::trace::h081201764674ef17
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7f147c81bc5d - std::backtrace_rs::backtrace::trace_unsynchronized::hebab37398c391bd7
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f147c81bc5d - std::sys_common::backtrace::_print_fmt::h301516df68ed24f9
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x7f147c81bc5d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8f5170f4f03a12c0
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x7f147c867fac - core::fmt::write::h5dc5601e8d9f6367
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/fmt/mod.rs:1190:17
   5:     0x7f147c8138c8 - std::io::Write::write_fmt::h5b19302eb99d9acf
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/io/mod.rs:1657:15
   6:     0x7f147c81e2e7 - std::sys_common::backtrace::_print::hd81cf53a75c8ae6a
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x7f147c81e2e7 - std::sys_common::backtrace::print::hb5aa882e87c2a0dc
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x7f147c81e2e7 - std::panicking::default_hook::{{closure}}::had913369af61b326
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:295:22
   9:     0x7f147c81dfb0 - std::panicking::default_hook::h37b06af9ee965447
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:314:9
  10:     0x7f147c81ea39 - std::panicking::rust_panic_with_hook::hf2019958d21362cc
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:698:17
  11:     0x7f147c81e727 - std::panicking::begin_panic_handler::{{closure}}::he9c06fdd592f8785
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:588:13
  12:     0x7f147c81c124 - std::sys_common::backtrace::__rust_end_short_backtrace::ha521b96560789310
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x7f147c81e439 - rust_begin_unwind
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
  14:     0x7f147b94ba23 - core::panicking::panic_fmt::h28f1697d4e9394b4
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
  15:     0x7f147c267afe - dump_parser::postgres::get_tokens_from_query_str::h5c129d0e7d926086
  16:     0x7f147b982287 - replibyte::source::postgres::read_and_transform::{{closure}}::h38d9592830b06c69
  17:     0x7f147b97b9d6 - dump_parser::utils::list_sql_queries_from_dump_reader::h540f884dd1afeb0d
  18:     0x7f147b9dad0f - <replibyte::tasks::full_dump::FullDumpTask<S> as replibyte::tasks::Task>::run::ha2213c1574b9739f
  19:     0x7f147ba97a55 - replibyte::commands::dump::run::h86af3c9d6a23f4c9
  20:     0x7f147ba59b30 - replibyte::main::h880123c1e7bd6b08
  21:     0x7f147ba8b9b3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9dd32b852f85f8a3
  22:     0x7f147b9f1e89 - std::rt::lang_start::{{closure}}::h434a6ad0c3bb7757
  23:     0x7f147c81b3b4 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hd127f27863548251
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:259:13
  24:     0x7f147c81b3b4 - std::panicking::try::do_call::h926290883a1d024e
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
  25:     0x7f147c81b3b4 - std::panicking::try::hc74a3d1f4a4b6e5f
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
  26:     0x7f147c81b3b4 - std::panic::catch_unwind::h5eb7ded2df1a4d5f
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
  27:     0x7f147c81b3b4 - std::rt::lang_start_internal::{{closure}}::h0736f9682f7c55ea
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:48
  28:     0x7f147c81b3b4 - std::panicking::try::do_call::h2772c479b1c89ef7
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
  29:     0x7f147c81b3b4 - std::panicking::try::h967ebbc371287391
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
  30:     0x7f147c81b3b4 - std::panic::catch_unwind::h41bcc02b28316856
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
  31:     0x7f147c81b3b4 - std::rt::lang_start_internal::haf46799f55774d07
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:20
  32:     0x7f147ba5db02 - main
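
The escaping question raised in this report can be illustrated with a small check. In standard SQL a single quote inside a string literal is written as two single quotes, so a tokenizer has to treat `''` as an escaped quote rather than as the end of the literal. The function below is a sketch written for this page under that assumption only; it is not RepliByte's tokenizer, and it ignores PostgreSQL's `E'...'` backslash escapes and dollar-quoted strings.

```rust
/// Returns true if `sql` ends while still inside a single-quoted literal.
/// Inside a literal, a doubled quote (`''`) counts as one escaped quote.
fn has_unterminated_string_literal(sql: &str) -> bool {
    let mut chars = sql.chars().peekable();
    let mut in_literal = false;
    while let Some(c) = chars.next() {
        if c != '\'' {
            continue;
        }
        if in_literal && chars.peek() == Some(&'\'') {
            // Escaped quote: consume the second quote and stay in the literal.
            chars.next();
        } else {
            // Opening or closing quote.
            in_literal = !in_literal;
        }
    }
    in_literal
}

fn main() {
    let escaped = "INSERT INTO notes VALUES ('it''s fine');";
    let broken = "INSERT INTO notes VALUES ('it's broken');";
    println!("escaped -> {}", has_unterminated_string_literal(escaped));
    println!("broken  -> {}", has_unterminated_string_literal(broken));
}
```

A properly escaped row passes the check, while a row whose quote was not doubled leaves the scanner inside a literal at the end of the statement, which is the kind of condition the panic message describes.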

opened by fazo96 13

How to use on macos?

hello! I was wondering the best way to use this on macos. From the docs, I used brew to install:

brew install replibyte

but when I restart my terminal and run replibyte, I get:

zsh: command not found: replibyte

Brew info:

✗ brew info replibyte
qovery/replibyte/replibyte: stable 0.6.3
Seed Your Development Database With Real Data ⚡️
https://github.com/Qovery/replibyte
/usr/local/Cellar/replibyte/0.6.3 (49 files, 275.3KB) *
  Built from source on 2022-05-16 at 10:56:08
From: https://github.com/Qovery/homebrew-replibyte/blob/HEAD/Formula/replibyte.rb
License: MIT

brew --cellar replibyte

✗ brew --cellar replibyte
/usr/local/Cellar/replibyte

Is there a step that I'm missing with brew?

bug question

opened by brett-anderson 10

feat: restore backup in local container

This PR adds support for restoring a backup inside a Docker container, as discussed in #32.

This is currently a WIP; it needs some tests and improvements.

Closes #32

opened by fabriceclementz 10

mongo dump not working, terminal without output

Hi, and thank you very much for your efforts. I'm trying to make an anonymized dump using replibyte and a mongo database, but I think it is not working as expected. My conf.yaml is really simple, to avoid interactions, and I'm doing everything locally to avoid mistakes.

conf.yaml

source:
  connection_uri: mongodb://mongo:mongo@localhost:27017/test
  transformers: # optional - hide sensitive data
    - database: test
      table: employees
      columns:
        - name: surname
          transformer_name: random
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: phone_ext
          transformer_name: phone-number
destination:
  connection_uri: mongodb://mongo:mongo@localhost:27017/test # you can use $DATABASE_URL
datastore:
  local_disk:
    dir: .

But every time I run replibyte -c conf.yaml dump create I get no output in the terminal; sometimes I see the message "No such file or directory (os error 2)". I'm using mongo 4.4; is it not supported?

I just want to run the transformer but I'm not sure if I can. Thank you very much for your help.

Kind regards

opened by victorgomezg93 8

Nomenclature / naming

I have the feeling that "bridge" does not mean anything and it might be more accurate to rename it into something like "store". The idea is a place where to store the created dataset from the source database. A "bridge" does not really reflect this concept. What do you think?

question

opened by evoxmusic 8

feat: custom transformer with wasm

This is a draft of implementing #26

I have written some boilerplate code which will eventually lead to the full implementation of custom transformer with Web Assembly (wasm) and I also included one simple test.

Before I continue with the implementation, there are a few points that should be addressed:

where should we receive the input for this transformer (the wasm bytes)? as a special new argument flag? or from replibyte.yaml file?

wasm only supports 32/64 bits numbers, so maybe we should add a few more values to Column:

pub enum Column {
    // ...
    NumberValuei32(String, i32),
    NumberValuei64(String, i64),
    NumberValuei128(String, i128), // not compatible with wasm, unless we truncate
    // ...
}

the first compilation attempt yielded the following error: *mut Ctx cannot be shared between threads safely within wasmer_runtime_core::instance::InstanceInner, the trait std::marker::Sync is not implemented for *mut Ctx

This is due to the fact that Transformer is Sync. As a temporary solution, I removed Sync from Transformer. Of course I realize that this is not a long-term solution, but I wanted to make the compilation and the example test work. We need to think about how we are going to address this issue.
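
On the i128 point above, the difference between a checked narrowing and a plain truncation can be shown in a few lines of Rust. Both helper names are hypothetical and exist only for this illustration; neither is part of the PR.

```rust
use std::convert::TryFrom;

/// Checked narrowing: returns None when the value does not fit in an i64.
fn to_wasm_i64(value: i128) -> Option<i64> {
    i64::try_from(value).ok()
}

/// Lossy narrowing: an `as` cast keeps only the low 64 bits.
fn truncate_to_i64(value: i128) -> i64 {
    value as i64
}

fn main() {
    let too_big = i64::MAX as i128 + 1;
    println!("checked:   {:?}", to_wasm_i64(too_big));
    println!("truncated: {}", truncate_to_i64(too_big));
}
```

The checked variant makes an out-of-range value visible to the caller, whereas truncation silently wraps it, which matters if the transformed value is written back to the dump.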

There will probably be more points to think about in the future, but for now these are the main things that bothered me. @evoxmusic what do you think?

opened by benny-n 8

Enhancement: make RepliByte installable from Homebrew

Everything is in the title :) Today, it's tedious to install RepliByte on MacOSX. It would be great to make it installable from Homebrew for MacOSX Intel and M1 processors. Can anyone help here? :)

enhancement

opened by evoxmusic 8

setup lib.rs and bin coexistence

The main thing I'm trying to do in this pull-request is to add the file replibyte/tests/mysql.rs. In addition to allowing importing modules from tests folder, it also revealed that the current structure allows for 'clever' use statements such as

// replibyte/src/config.rs
use crate::{RandomTransformer, Transformer};

// or replibyte/src/tasks.rs
use crate::Source;

which is saying "use RandomTransformer, Transformer that are imported from crate (main.rs)". If main.rs in the future doesn't use them anymore, replibyte/src/config.rs (and its dependent) would break.

Please let me know what you think 🤓

opened by tbmreza 8

Transformers don't work with tables containing uppercase letters

Transformers are not working correctly with tables containing uppercase characters (in my case a table named User). When I renamed the table to user, it worked perfectly. Is there a solution to make uppercase table names work on transformers ?

The conf.yaml file I used is:

source:
  connection_uri: $DATABASE_SOURCE_URL
  transformers:
    - database: public
      table: User
      columns:
        - name: name
          transformer_name: random
        - name: email
          transformer_name: email
datastore:
  aws:
    bucket: $BUCKET_NAME
    region: $S3_REGION
    credentials:
      access_key_id: $ACCESS_KEY_ID
      secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_DESTINATION_URL
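
One plausible cause, offered as an assumption and not confirmed in this report: pg_dump writes mixed-case identifiers in double quotes (for example public."User"), so a plain string comparison against the configured table name User would fail. The sketch below shows a quote-aware comparison. Both function names are hypothetical, and a quoted identifier that itself contains a dot is not handled.

```rust
/// Strip one pair of surrounding double quotes and unescape `""` to `"`.
fn unquote_identifier(ident: &str) -> String {
    if ident.len() >= 2 && ident.starts_with('"') && ident.ends_with('"') {
        ident[1..ident.len() - 1].replace("\"\"", "\"")
    } else {
        ident.to_string()
    }
}

/// Compare a table name from the config with a name found in a dump,
/// which may be schema-qualified and quoted.
fn table_matches(config_table: &str, dump_name: &str) -> bool {
    // Take the part after the last dot as the table name.
    let last = dump_name.rsplit('.').next().unwrap_or(dump_name);
    unquote_identifier(last) == config_table
}

fn main() {
    println!("{}", table_matches("User", "public.\"User\""));
    println!("{}", table_matches("user", "public.user"));
}
```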

bug

opened by mkgharbi 7

Docker image not working

Hi, I'm trying to run the Docker image as a test, just following the guide:

git clone https://github.com/Qovery/replibyte.git

# Build image with Docker
docker build -t replibyte -f Dockerfile .

# Run RepliByte
docker run -v $(pwd)/examples:/examples/ replibyte -c /examples/replibyte.yaml transformer list

I'm on Ubuntu 20.04, and when I execute the docker run command I get this error:

No such file or directory (os error 2)

Maybe some package is missing? If I install from source or from the Linux release, the same message appears.

Thank you very much for your help

opened by victorgomezg93 7

Rename backup to dump?

As we rename the cli command from backup to dump, I think we could rename it to Dump in the datastore directory. WDYT?

Example:

// datastore/mod.rs
pub struct Backup {
    pub directory_name: String,
    pub size: usize,
    pub created_at: u128,
    pub compressed: bool,
    pub encrypted: bool,
}

pub struct Dump {
    pub directory_name: String,
    pub size: usize,
    pub created_at: u128,
    pub compressed: bool,
    pub encrypted: bool,
}

// datastore/mod.rs
pub struct IndexFile {
    pub backups: Vec<Backup>,
}

pub struct IndexFile {
    pub dumps: Vec<Dump>,
}

This update will lead to a renaming of the key backups to dumps inside the metadata.json file.

enhancement

opened by fabriceclementz 7

subsetting strategy

The documentation on subsets is a bit sparse. Is there a way of creating more complex strategies for subsetting?

i.e. interactions often occur between tables: products, orders, customers, a percentage of products might leave a lot of empty orders, or a lot of products not bought.

What if I wanted:

a subset of users based on a random sample

a subset of those users orders, maybe based on a specific time period (the last three months)

a subset of products: maybe all the ones in the orders above, plus a few random extra ones

opened by janrito 1

multiple database name locations in configuration

The connection URI includes the database name, but transformers and subsets also ask for a database parameter. Does one override the other?

connection_uri: postgres://user:password@host:port/db
transformers:
  - database: public
    table: customers
    ....
database_subset:
  database: public
  table: customers

Is the database public? db? Can we leave one out?

opened by janrito 0

Interpolation of variables in connection_uri

Is there a way to interpolate environment variables in the connection_uri? It would be useful to keep the other bits in version control.

source:
  connection_uri: postgres://user:$DB_PASSWORD@host:port/db
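
For illustration, this is roughly what such interpolation could look like. It is a sketch of the requested behavior, not a description of what RepliByte does: the `interpolate` function is hypothetical, it recognizes only the `$NAME` form (letters, digits and underscores), and it leaves unknown variables untouched.

```rust
/// Replace `$NAME` occurrences using `lookup`.
/// Variables that `lookup` does not know are left as written.
fn interpolate<F>(input: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::new();
    let mut rest = input;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the dollar sign.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // The variable name runs until the first non-identifier character.
        let end = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..end];
        match lookup(name) {
            Some(value) if !name.is_empty() => out.push_str(&value),
            _ => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[end..];
    }
    out.push_str(rest);
    out
}

fn main() {
    // Resolve variables from the process environment.
    let uri = interpolate("postgres://user:$DB_PASSWORD@host:port/db", |name| {
        std::env::var(name).ok()
    });
    println!("{}", uri);
}
```

Taking the lookup as a closure keeps the function independent of the real environment, so the secret never has to appear in the versioned file and the logic stays easy to test.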
opened by janrito 0

Not working --file option?

Hi there. I tried to create a dump using MongoDB's manual dump file with reference to this doc: https://www.replibyte.com/docs/guides/create-a-dump#option-2-make-a-dump-manually

this command is working.

cat sample_dump | replibyte -c config.yaml dump create -s mongodb -i

but this command spins a spinner forever and doesn't complete.

replibyte -c config.yaml dump create -s mongodb -i --file sample_dump

I checked the code and it seems to stop at this line. https://github.com/Qovery/Replibyte/blob/d6b35a7455b7b9d6bc1456b3fe1f7644377e7153/replibyte/src/commands/dump.rs#L200

I guess it also occurs with sources other than MongoDB, and #209 is related to this issue. I want to try fixing this bug but I don't know how to fix it.

Environment

OS: Mac OS Monterey (12.5)
System Model Name: Macbook Pro
CPU: Apple M1 Max
replibyte version: 0.10.0
config:

source:
  connection_uri: mongodb://root:password@localhost:27017/
destination:
  connection_uri: mongodb://root:password@localhost:27017/
datastore:
  local_disk:
    dir: /datastore

opened by ishikawa-pro 0

Running replibyte results in command being killed

I'm currently running replibyte trying to subset my database.

My config is something like:

source:
  connection_uri: $DB_URL
  database_subset:
    database: public
    table: cart
    strategy_name: random
    strategy_options:
      percent: 20
    passthrough_tables:
      - country
      .... <30 more tables>
  transformers:
    - database: public
      table: customer
      columns:
        - name: email
          transformer_name: email
    ... <23 columns and 8 tables>

This results in replibyte running out of memory (I'm guessing, given the screenshot of the process from my activity monitor below, taken right before the process crashed)

Is there any way to reduce the memory consumption? The database is around 250mb in total (148mb if asking postgres with: SELECT pg_size_pretty( pg_database_size('dbname') )), and 51gb of memory taken up seems excessive.

opened by pKorsholm 0