Replibyte - a powerful tool to seed your databases

Overview


Seed Your Development Database With Real Data ⚡️

Replibyte is a powerful tool to seed your databases
with real data and other cool features 🔥


Features

  • Support data backup and restore for PostgreSQL, MySQL and MongoDB
  • Replace sensitive data with fake data
  • Works on large databases (> 10GB) (read Design)
  • Database Subsetting: Scale down a production database to a more reasonable size 🔥
  • Start a local database with the prod data in a single command 🔥
  • On-the-fly data (de)compression (Zlib)
  • On-the-fly data de/encryption (AES-256)
  • Fully stateless (no server, no daemon) and lightweight binary 🍃
  • Use custom transformers

Here are the features we plan to support:

  • Auto-detect and version database schema change
  • Auto-detect sensitive fields
  • Auto-clean backed up data

Install

Install on macOS

⚠️ RepliByte Homebrew auto release is in maintenance. Consider using Docker or building from source in the meantime ⚠️

brew tap Qovery/replibyte
brew install replibyte

Or manually.

Install on Linux

# download the latest replibyte archive for Linux (requires curl, jq and wget)
curl -s https://api.github.com/repos/Qovery/replibyte/releases/latest | \
    jq -r '.assets[].browser_download_url' | \
    grep -i 'linux-musl.tar.gz$' | wget -qi -

# unarchive
tar zxf *.tar.gz

# make replibyte executable
chmod +x replibyte

# make it accessible from everywhere (/usr/local/bin usually requires root)
sudo mv replibyte /usr/local/bin/

Install on Windows

Download the latest Windows release and install it.

Install from source

git clone https://github.com/Qovery/replibyte.git && cd replibyte

# Install cargo
# visit: https://doc.rust-lang.org/cargo/getting-started/installation.html

# Build with cargo
cargo build --release

# Run RepliByte
./target/release/replibyte -h

Run replibyte with Docker

git clone https://github.com/Qovery/replibyte.git && cd replibyte

# Build image with Docker
docker build -t replibyte -f Dockerfile .

# Run RepliByte
docker run -v $(pwd)/examples:/examples/ replibyte -c /examples/replibyte.yaml transformer list

Feel free to edit ./examples/replibyte.yaml with your configuration.

Usage

What is RepliByte

Example with PostgreSQL as the source and destination database AND S3 as the bridge (see the configuration file)

Create a dev database dataset from your production database

replibyte -c prod-conf.yaml backup run

The backup is compressed and stored in your S3 bucket (see the configuration).

Create a dev database dataset from a dump file

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

The backup is compressed and stored in your S3 bucket (see the configuration).

Seed my local database (Docker required)


List all your backups to choose one:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one into a Postgres container bound to port 5433 (default: 5432):

replibyte -c prod-conf.yaml restore local -v latest --image postgres --port 5433

To connect to your Postgres database, use the following connection string:
> postgres://postgres:password@localhost:5433/postgres
Waiting for Ctrl-C to stop the container

OR restore a specific one:

replibyte -c prod-conf.yaml restore local -v backup-1647706359405 --image postgres --port 5433

The seed comes from your S3 bucket (see the configuration).

Seed a remote database


Show your backups:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one:

replibyte -c prod-conf.yaml restore remote -v latest

OR restore a specific one:

replibyte -c prod-conf.yaml restore remote -v backup-1647706359405

The seed comes from your S3 bucket (see the configuration).

Configuration

Create your prod-conf.yaml configuration file to source your production database.

encryption_key: $MY_PRIVATE_ENC_KEY # optional - encrypt data on bridge
source:
  connection_uri: $DATABASE_URL
  database_subset: # optional - downscale database while keeping it consistent
    database: public
    table: orders
    strategy_name: random
    strategy_options:
      percent: 50
    passthrough_tables:
      - us_states
  transformers: # optional - hide sensitive data
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
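The `random` subsetting strategy above keeps a share of the rows of `orders`, set by `strategy_options.percent`. Tables listed in `passthrough_tables` are copied in full. Below is a minimal, hypothetical sketch of the row-sampling step only. The helper names `keep_row` and `is_passthrough` are invented for illustration, and this is not RepliByte's implementation. The sketch hashes each row's primary key, so the same key and seed always give the same decision. It leaves out the part that keeps the subset consistent, for example also copying the rows that the kept orders reference through foreign keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: should this row of the subsetted table (e.g. `orders`)
/// be kept, given `strategy_options.percent`?
fn keep_row(primary_key: &str, percent: u8, seed: u64) -> bool {
    // DefaultHasher is deterministic within a given Rust release, so the same
    // key and seed always give the same answer (no RNG state to carry around).
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    primary_key.hash(&mut hasher);
    // Map the 64-bit hash onto 0..100 and keep the row if it lands below `percent`.
    hasher.finish() % 100 < u64::from(percent)
}

/// Hypothetical helper: tables listed under `passthrough_tables`
/// (e.g. `us_states`) are copied in full, without sampling.
fn is_passthrough(table: &str, passthrough_tables: &[&str]) -> bool {
    passthrough_tables.contains(&table)
}
```

With `percent: 50`, roughly half of the keys are kept. Because the choice depends only on the key and the seed, rerunning the sketch produces the same subset.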

Run the app for the source

replibyte -c prod-conf.yaml

Destination

Create your staging-conf.yaml configuration file to sync your production database with your staging database.

bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
encryption_key: $MY_PRIVATE_ENC_KEY # optional - needed to decrypt data on bridge if there was an encryption_key defined when running the source backup
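Values such as `$DATABASE_URL` or `$MY_PRIVATE_ENC_KEY` in both configuration files refer to environment variables. Below is a minimal sketch of resolving such a value. It assumes that only values consisting entirely of a `$NAME` reference are substituted; interpolation inside a longer string such as a URI is not covered. `resolve_env_value` is a hypothetical name, not part of RepliByte.

```rust
use std::env;

/// Hypothetical helper: resolves a configuration value such as `$DATABASE_URL`.
/// Assumption: only values that are entirely a `$NAME` reference are resolved;
/// anything else (e.g. a literal URI) is returned unchanged.
fn resolve_env_value(value: &str) -> Result<String, String> {
    match value.strip_prefix('$') {
        Some(name) if !name.is_empty() => env::var(name)
            .map_err(|_| format!("environment variable `{}` is not set or not valid Unicode", name)),
        _ => Ok(value.to_string()),
    }
}
```

Failing loudly on a missing variable is a deliberate choice in this sketch. It avoids silently connecting with an empty URI or uploading unencrypted data because a key was not exported.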

Run the app for the destination

replibyte -c staging-conf.yaml

How RepliByte works


Check out our Design page

Connectors

Supported Source connectors

  • PostgreSQL
  • MongoDB
  • Local dump file
  • MySQL

Supported Transformers

A transformer changes or hides the value of a column. RepliByte provides pre-made transformers.

Check out the list of our available Transformers
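As an illustration, here is a hypothetical sketch of a transformer in the spirit of `keep-first-char` from the configuration example. It assumes that the transformer keeps only the first character of a string value and leaves other column types untouched. The `Column` enum is a simplified stand-in, not RepliByte's own type.

```rust
/// Simplified, hypothetical stand-in for a column value read from a dump.
#[derive(Debug, PartialEq)]
enum Column {
    StringValue(String),
    NumberValue(i64),
}

/// Sketch of a `keep-first-char`-style transformer (assumed behaviour:
/// keep only the first character of strings, pass other values through).
fn keep_first_char(column: Column) -> Column {
    match column {
        // `chars()` iterates over Unicode scalar values, so a multi-byte
        // first letter (e.g. "Ø") is kept whole instead of being cut mid-byte.
        Column::StringValue(s) => Column::StringValue(s.chars().take(1).collect()),
        other => other,
    }
}
```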

RepliByte Bridge

The S3 wire protocol, used by the RepliByte bridge, is supported by most cloud providers. Here is a non-exhaustive list of S3-compatible services.

Cloud Service Provider   S3 service name   S3 compatible
Amazon Web Services      S3                Yes (Original)
Google Cloud Platform    Cloud Storage     Yes
Microsoft Azure          Blob Storage      Not natively (requires an S3-compatible gateway)
DigitalOcean             Spaces            Yes
Scaleway                 Object Storage    Yes
MinIO                    Object Storage    Yes

Feel free to drop a PR to include another S3 compatible solution.

Supported Destination connectors

  • PostgreSQL
  • MongoDB
  • Local dump file
  • MySQL

Motivation

At Qovery (the company behind RepliByte), developers can clone their applications and databases with just one click. However, the cloning process can be tedious and time-consuming, and we end up copying the information multiple times. With RepliByte, the Qovery team wants to provide a comprehensive way to seed cloud databases from one place to another.

The long-term motivation behind RepliByte is to provide a way to clone any database in real-time. This project starts small, but has big ambition!

FAQ

Q: Is RepliByte an ETL?


RepliByte is not an ETL like Airbyte, Airflow or Talend, and it never will be. If you need to synchronize diverse data sources, you are better off choosing a classic ETL. RepliByte is a tool that helps software engineers synchronize data between databases of the same type: you can only replicate data between databases of the same kind. As mentioned above, the primary purpose of RepliByte is to duplicate data into different environments. You can see RepliByte as a specific use case of an ETL, where an ETL is more generic.

Q: Do you support backup from a dump file?


Absolutely:

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

or

replibyte -c prod-conf.yaml backup run -s postgres -f dump.sql
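In the two commands above, the dump comes either from stdin (`-i`) or from a file (`-f`). Below is a minimal sketch of that pattern. The hypothetical `open_dump_input` helper turns either source into a `BufRead`, so the rest of a pipeline can stream the dump line by line instead of loading it into memory. Both helper names are illustrative, not RepliByte's code.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: `Some(path)` mirrors `-f <path>`, `None` mirrors `-i` (stdin).
fn open_dump_input(file: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match file {
        Some(path) => Ok(Box::new(BufReader::new(File::open(path)?))),
        None => Ok(Box::new(BufReader::new(io::stdin()))),
    }
}

/// Consumes the reader line by line and counts the lines, propagating I/O errors.
fn count_lines(reader: Box<dyn BufRead>) -> io::Result<usize> {
    let mut n = 0;
    for line in reader.lines() {
        line?;
        n += 1;
    }
    Ok(n)
}
```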

Q: How can RepliByte list the backups? Is there an API?


There is no API. RepliByte is fully stateless and stores the backup list in the bridge (e.g. S3) via an index_file.
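Below is a minimal sketch of what such an index could look like, and how `-v latest` or `-v backup-1647706359405` (from the restore commands above) might be resolved against it. The field names follow the `Backup` and `IndexFile` structs quoted in the "Rename backup to dump?" issue on this page. Treating `created_at` as Unix milliseconds and the `find` helper are assumptions for illustration, not RepliByte's actual code.

```rust
/// One entry of the index file (field names as quoted in the issue).
#[derive(Debug, Clone, PartialEq)]
struct Backup {
    directory_name: String,
    size: usize,
    created_at: u128, // assumption: Unix time in milliseconds
    compressed: bool,
    encrypted: bool,
}

/// The whole index, stored next to the backups on the bridge.
struct IndexFile {
    backups: Vec<Backup>,
}

impl IndexFile {
    /// Hypothetical lookup: "latest" picks the newest entry by `created_at`;
    /// any other value must match a backup name exactly.
    fn find(&self, version: &str) -> Option<&Backup> {
        if version == "latest" {
            self.backups.iter().max_by_key(|b| b.created_at)
        } else {
            self.backups.iter().find(|b| b.directory_name == version)
        }
    }
}
```

Keeping this list in a file on the bridge itself is what lets the tool stay stateless. Any machine with the bridge credentials can run `backup list` without a server.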


⬆️ Open an issue if you have any questions. I'll pick the most common questions and add them here with their answers.

Contributing


Local development

For local development, you will need to install Docker and run docker compose -f ./docker-compose-dev.yml up to start the local databases. At the moment, the Compose file includes 2 PostgreSQL instances, 2 MySQL instances, 2 MongoDB instances and a MinIO bridge: one source and one destination per database, plus one bridge. In the future, we will provide more options.

The MinIO console is accessible at http://localhost:9001.

Once your Docker instances are running, you can run the RepliByte tests to check that everything is configured correctly:

AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo test

How to contribute

RepliByte is in an early stage of development and needs some time to be usable in production. We need some help, and you are welcome to contribute. To coordinate, consider joining our #replibyte channel on our Discord. Otherwise, you can pick any open issue and contribute.

Where should I start?

Check the open issues and their priority.

How can I contact you?

3 options:

  1. Open an issue.
  2. Join our #replibyte channel on our Discord.
  3. Drop us an email to github+replibyte {at} qovery {dot} com.

Telemetry


RepliByte collects anonymized data from users in order to improve our product. Feel free to inspect the code here. This can be deactivated at any time, and any data that has already been collected can be deleted on request (hello+replibyte {at} qovery {dot} com).

Collected data

  • Command line parameters
  • Options used (subset, transformer, compression) in the configuration file.

Thanks

Thanks to everyone who shared ideas to make RepliByte better. We do appreciate it. I would also like to thank Airbyte, a great product and a trustworthy source of inspiration for this project.

Additional resources

Comments
  • panic: Unterminated string literal in SQL instruction


    Hello, at Flexhire we are trying out this tool to seed our staging DB with production data.

    We are using replibyte v0.6 for Linux x86 and the dump is of a PostgreSQL 11 DB hosted on AWS RDS.

    When generating a dump from our production DB we are encountering the following crash. I included a stack backtrace generated with RUST_BACKTRACE=full

    We have text columns containing user entered text in the markdown format that might have characters such as '. The application uses rails so these characters should be escaped. Perhaps there is some issue with the escaping done by Replibyte? It looks like INSERT instructions are the ones causing the problem.

    thread 'main' panicked at 'TokenizerError { message: "Unterminated string literal", line: 1, col: 788 }', dump-parser/src/postgres/mod.rs:747:13
    stack backtrace:
       0:     0x7f147c81bc5d - std::backtrace_rs::backtrace::libunwind::trace::h081201764674ef17
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x7f147c81bc5d - std::backtrace_rs::backtrace::trace_unsynchronized::hebab37398c391bd7
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x7f147c81bc5d - std::sys_common::backtrace::_print_fmt::h301516df68ed24f9
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:66:5
       3:     0x7f147c81bc5d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8f5170f4f03a12c0
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:45:22
       4:     0x7f147c867fac - core::fmt::write::h5dc5601e8d9f6367
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/fmt/mod.rs:1190:17
       5:     0x7f147c8138c8 - std::io::Write::write_fmt::h5b19302eb99d9acf
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/io/mod.rs:1657:15
       6:     0x7f147c81e2e7 - std::sys_common::backtrace::_print::hd81cf53a75c8ae6a
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:48:5
       7:     0x7f147c81e2e7 - std::sys_common::backtrace::print::hb5aa882e87c2a0dc
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:35:9
       8:     0x7f147c81e2e7 - std::panicking::default_hook::{{closure}}::had913369af61b326
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:295:22
       9:     0x7f147c81dfb0 - std::panicking::default_hook::h37b06af9ee965447
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:314:9
      10:     0x7f147c81ea39 - std::panicking::rust_panic_with_hook::hf2019958d21362cc
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:698:17
      11:     0x7f147c81e727 - std::panicking::begin_panic_handler::{{closure}}::he9c06fdd592f8785
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:588:13
      12:     0x7f147c81c124 - std::sys_common::backtrace::__rust_end_short_backtrace::ha521b96560789310
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:138:18
      13:     0x7f147c81e439 - rust_begin_unwind
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
      14:     0x7f147b94ba23 - core::panicking::panic_fmt::h28f1697d4e9394b4
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
      15:     0x7f147c267afe - dump_parser::postgres::get_tokens_from_query_str::h5c129d0e7d926086
      16:     0x7f147b982287 - replibyte::source::postgres::read_and_transform::{{closure}}::h38d9592830b06c69
      17:     0x7f147b97b9d6 - dump_parser::utils::list_sql_queries_from_dump_reader::h540f884dd1afeb0d
      18:     0x7f147b9dad0f - <replibyte::tasks::full_dump::FullDumpTask<S> as replibyte::tasks::Task>::run::ha2213c1574b9739f
      19:     0x7f147ba97a55 - replibyte::commands::dump::run::h86af3c9d6a23f4c9
      20:     0x7f147ba59b30 - replibyte::main::h880123c1e7bd6b08
      21:     0x7f147ba8b9b3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9dd32b852f85f8a3
      22:     0x7f147b9f1e89 - std::rt::lang_start::{{closure}}::h434a6ad0c3bb7757
      23:     0x7f147c81b3b4 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hd127f27863548251
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:259:13
      24:     0x7f147c81b3b4 - std::panicking::try::do_call::h926290883a1d024e
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
      25:     0x7f147c81b3b4 - std::panicking::try::hc74a3d1f4a4b6e5f
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
      26:     0x7f147c81b3b4 - std::panic::catch_unwind::h5eb7ded2df1a4d5f
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
      27:     0x7f147c81b3b4 - std::rt::lang_start_internal::{{closure}}::h0736f9682f7c55ea
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:48
      28:     0x7f147c81b3b4 - std::panicking::try::do_call::h2772c479b1c89ef7
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
      29:     0x7f147c81b3b4 - std::panicking::try::h967ebbc371287391
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
      30:     0x7f147c81b3b4 - std::panic::catch_unwind::h41bcc02b28316856
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
      31:     0x7f147c81b3b4 - std::rt::lang_start_internal::haf46799f55774d07
                                   at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:20
      32:     0x7f147ba5db02 - main
    
    opened by fazo96 13
  • How to use on macos?


    Hello! I was wondering about the best way to use this on macOS. Following the docs, I used brew to install it:

    brew install replibyte
    

    but when I restart my terminal and run replibyte, I get:

    zsh: command not found: replibyte
    

    Brew info:

    ✗ brew info replibyte
    qovery/replibyte/replibyte: stable 0.6.3
    Seed Your Development Database With Real Data ⚡️
    https://github.com/Qovery/replibyte
    /usr/local/Cellar/replibyte/0.6.3 (49 files, 275.3KB) *
      Built from source on 2022-05-16 at 10:56:08
    From: https://github.com/Qovery/homebrew-replibyte/blob/HEAD/Formula/replibyte.rb
    License: MIT
    

    brew --cellar replibyte

    ✗ brew --cellar replibyte
    /usr/local/Cellar/replibyte
    

    Is there a step that I'm missing with brew?

    bug question 
    opened by brett-anderson 10
  • feat: restore backup in local container


    This PR adds support for restoring a backup inside a Docker container, as discussed in #32.

    This is currently a WIP and needs some tests and improvements.

    Closes #32

    opened by fabriceclementz 10
  • mongo dump not working, terminal without output


    Hi, and thank you very much for your efforts. I'm trying to make an anonymized dump using replibyte with a Mongo database, but I think it is not working as expected. My conf.yaml is really simple, to avoid interactions, and I'm doing everything locally to avoid mistakes.

    conf.yaml

    source:
      connection_uri: mongodb://mongo:mongo@localhost:27017/test
      transformers: # optional - hide sensitive data
          - database: test
            table: employees
            columns:
              - name: surname
                transformer_name: random
              - name: first_name
                transformer_name: first-name
              - name: email
                transformer_name: email
              - name: phone_ext
                transformer_name: phone-number
    destination:
      connection_uri: mongodb://mongo:mongo@localhost:27017/test # you can use $DATABASE_URL
    datastore:
      local_disk:
        dir: .
    

    But every time I run replibyte -c conf.yaml dump create I get no output in the terminal; sometimes I see the message "No such file or directory (os error 2)". I'm using Mongo 4.4. Is it not supported?


    I just want to run the transformer but I'm not sure if I can. Thank you very much for your help.

    Kind regards

    opened by victorgomezg93 8
  • Nomenclature / naming


    I have the feeling that "bridge" does not mean anything, and it might be more accurate to rename it to something like "store". The idea is a place to store the dataset created from the source database. A "bridge" does not really reflect this concept. What do you think?

    question 
    opened by evoxmusic 8
  • feat: custom transformer with wasm


    This is a draft of implementing #26

    I have written some boilerplate code which will eventually lead to the full implementation of custom transformer with Web Assembly (wasm) and I also included one simple test.

    Before I continue with the implementation, there are a few points that should be addressed:

    1. where should we receive the input for this transformer (the wasm bytes)? as a special new argument flag? or from replibyte.yaml file?

    2. wasm only supports 32/64 bits numbers, so maybe we should add a few more values to Column:

      pub enum Column {
          // ...
          NumberValuei32(String, i32),
          NumberValuei64(String, i64),
          NumberValuei128(String, i128), // not compatible with wasm, unless we truncate
          // ...
      }
      
    3. the first try of compilation yielded the following error: *mut Ctx cannot be shared between threads safely within wasmer_runtime_core::instance::InstanceInner, the trait std::marker::Sync is not implemented for *mut Ctx

      This is due to the fact that Transformer is Sync. As a temporary solution, I removed Sync from Transformer. Of course, I realize that this is not a long-term solution, but I wanted to make the compilation and the example test work. We need to think about how we are going to address this issue.

    There will probably be more points to think about in the future, but for now these are the main things that bothered me. @evoxmusic what do you think?

    opened by benny-n 8
  • Enhancement: make RepliByte installable from Homebrew


    Everything is in the title :) Today, it's tedious to install RepliByte on macOS. It would be great to make it installable from Homebrew for macOS on Intel and M1 processors. Can anyone help here? :)

    enhancement 
    opened by evoxmusic 8
  • setup lib.rs and bin coexistence


    The main thing I'm trying to do in this pull-request is to add the file replibyte/tests/mysql.rs. In addition to allowing importing modules from tests folder, it also revealed that the current structure allows for 'clever' use statements such as

    // replibyte/src/config.rs
    use crate::{RandomTransformer, Transformer};
    
    // or replibyte/src/tasks.rs
    use crate::Source;
    

    which is saying "use RandomTransformer, Transformer that are imported from crate (main.rs)". If main.rs in the future doesn't use them anymore, replibyte/src/config.rs (and its dependent) would break.

    Please let me know what you think 🤓

    opened by tbmreza 8
  • Transformers don't work with tables containing uppercase letters


    Hi

    Transformers are not working correctly with tables containing uppercase characters (in my case a table named User). When I renamed the table to user, it worked perfectly. Is there a solution to make uppercase table names work with transformers?

    The conf.yaml file I used is:

    source:
      connection_uri: $DATABASE_SOURCE_URL
      transformers:
        - database: public
          table: User
          columns:
            - name: name
              transformer_name: random
            - name: email
              transformer_name: email
    datastore:
      aws:
        bucket: $BUCKET_NAME
        region: $S3_REGION
        credentials:
          access_key_id: $ACCESS_KEY_ID
          secret_access_key: $AWS_SECRET_ACCESS_KEY
    destination:
      connection_uri: $DATABASE_DESTINATION_URL
    
    bug 
    opened by mkgharbi 7
  • Docker image not working


    Hi, I'm trying to run the docker image to test just following the guide:

    git clone https://github.com/Qovery/replibyte.git
    
    # Build image with Docker
    docker build -t replibyte -f Dockerfile .
    
    # Run RepliByte
    docker run -v $(pwd)/examples:/examples/ replibyte -c /examples/replibyte.yaml transformer list
    

    I'm doing it on Ubuntu 20.04, and when I run docker run I get this error:

    No such file or directory (os error 2)
    

    Maybe some package is missing? If I install from source or use the Linux release, the same message appears.

    Thank you very much for your help

    opened by victorgomezg93 7
  • Rename backup to dump?


    Since we renamed the CLI command from backup to dump, I think we could also rename Backup to Dump in the datastore directory. WDYT?

    Example:

    // datastore/mod.rs
    pub struct Backup {
        pub directory_name: String,
        pub size: usize,
        pub created_at: u128,
        pub compressed: bool,
        pub encrypted: bool,
    }
    

    to

    pub struct Dump {
        pub directory_name: String,
        pub size: usize,
        pub created_at: u128,
        pub compressed: bool,
        pub encrypted: bool,
    }
    
    // datastore/mod.rs
    pub struct IndexFile {
        pub backups: Vec<Backup>,
    }
    

    to

    pub struct IndexFile {
        pub dumps: Vec<Dump>,
    }
    

    This update will lead to a renaming of the key backups to dumps inside the metadata.json file.

    enhancement 
    opened by fabriceclementz 7
  • subsetting strategy


    The documentation on subsets is a bit sparse. Is there a way of creating more complex strategies for subsetting?

    For example, interactions often occur between tables (products, orders, customers): a percentage of products might leave a lot of empty orders, or a lot of products that were never bought.

    What if I wanted:

    • a subset of users based on a random sample
    • a subset of those users orders, maybe based on a specific time period (the last three months)
    • a subset of products—maybe all the ones in the orders above, plus a few random extra ones
    opened by janrito 1
  • multiple database name locations in configuration


    The connection URI includes the database name—but transformers and subsets also ask for a database parameter? does one override the other?

    connection_uri: postgres://user:password@host:port/db
      transformers:
        - database: public
          table: customers
          ....
      database_subset:
        database: public
        table: customers
    

    is the database public? db? can we leave one out?

    opened by janrito 0
  • Interpolation of variables in connection_uri


    Is there a way to interpolate environment variables in the connection_uri? It would be useful to keep the other bits in version control.

    source:
      connection_uri: postgres://user:$DB_PASSWORD@host:port/db
    
    opened by janrito 0
  • Not working --file option?


    Hi there. I tried to create a dump using MongoDB's manual dump file with reference to this doc. https://www.replibyte.com/docs/guides/create-a-dump#option-2-make-a-dump-manually

    This command works:

    cat sample_dump | replibyte -c config.yaml dump create -s mongodb -i
    

    But this command spins the spinner forever and never completes:

    replibyte -c config.yaml dump create -s mongodb -i --file sample_dump
    

    I checked the code and it seems to stop at this line. https://github.com/Qovery/Replibyte/blob/d6b35a7455b7b9d6bc1456b3fe1f7644377e7153/replibyte/src/commands/dump.rs#L200

    I guess it also occurs with sources other than MongoDB, and #209 is related to this issue. I want to try fixing this bug, but I don't know how to fix it.

    Environment

    OS: macOS Monterey (12.5)
    System Model Name: MacBook Pro
    CPU: Apple M1 Max
    replibyte version: 0.10.0
    config:

    source:
      connection_uri:
        mongodb://root:password@localhost:27017/
    destination:
      connection_uri:
        mongodb://root:password@localhost:27017/
    datastore:
      local_disk:
        dir: /datastore
    
    opened by ishikawa-pro 0
  • Running replibyte results in command being killed


    I'm currently running replibyte trying to subset my database.

    My config is something like:

    source:
      connection_uri: $DB_URL
      database_subset:
        database: public
        table: cart
        strategy_name: random
        strategy_options:
          percent: 20
        passthrough_tables: 
          - country
          .... 
          <30 more tables> 
      transformers: 
        - database: public
          table: customer
          columns:
            - name: email
              transformer_name: email
        ... 
        <23 columns and 8 tables>    
    

    This results in replibyte running out of memory (I'm guessing, given the screenshot of the process from my Activity Monitor below, taken right before the process crashed).


    Is there any way to reduce the memory consumption? The database is around 250MB in total (148MB according to Postgres with SELECT pg_size_pretty( pg_database_size('dbname') )), and 51GB of memory seems excessive.

    opened by pKorsholm 0
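The panic in the "Unterminated string literal" issue above is raised while the dump parser tokenizes an INSERT statement (`dump_parser::postgres::get_tokens_from_query_str` in the backtrace). In standard SQL, and in PostgreSQL when standard_conforming_strings is on (the default since 9.1), a single quote inside a string literal is written as two quotes (`''`). Below is a hypothetical, minimal reader for such a literal. It is not the dump parser's actual code. It returns `None` in the situation where a tokenizer would report "Unterminated string literal". It does not handle PostgreSQL's `E'...'` escape strings, where a backslash can also escape a quote.

```rust
/// Hypothetical sketch: `start` is the byte index of an opening `'`.
/// Returns the unescaped contents and the index just past the closing quote,
/// or `None` if the input ends before the literal is closed.
fn read_string_literal(sql: &str, start: usize) -> Option<(String, usize)> {
    let bytes = sql.as_bytes();
    if bytes.get(start) != Some(&b'\'') {
        return None;
    }
    let mut value = String::new();
    let mut i = start + 1;
    while i < bytes.len() {
        if bytes[i] == b'\'' {
            if bytes.get(i + 1) == Some(&b'\'') {
                // `''` is an escaped quote, not the end of the literal.
                value.push('\'');
                i += 2;
                continue;
            }
            return Some((value, i + 1)); // closing quote found
        }
        // Copy the next UTF-8 character whole so multi-byte text is not split.
        let ch = sql[i..].chars().next()?;
        value.push(ch);
        i += ch.len_utf8();
    }
    None // end of input reached: unterminated string literal
}
```

For example, `'it''s fine'` is read as `it's fine`. A tokenizer that treated the first `'` of `''` as the closing quote would lose track of where the literal ends.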
Latest release: v0.10.0
Owner: Qovery - The simplest way to deploy your apps in the Cloud