Lightweight Google Cloud Storage sync Rust Client with better performance than gsutil rsync

Overview

gcs-rsync


Lightweight and efficient Rust gcs rsync for Google Cloud Storage.

gcs-sync is faster than gsutil rsync when many files have changed, and its performance is similar to gsutil rsync when there are no changes.

How to install

cargo install --example gcs-rsync gcs-rsync

~/.cargo/bin/gcs-rsync

Benchmark

Important note about gsutil: by default, the gsutil ls command does not list every object; it lists prefixes instead, and adding the -r flag slows gsutil down. The performance of the ls command is therefore very different from that of the rsync implementation.
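To illustrate the difference between a prefix listing and a recursive listing, here is a minimal sketch over an in-memory list of object names. The function names (list_with_delimiter, list_recursive) are hypothetical and are not part of gcs-rsync or gsutil; the sketch only assumes "/" as the delimiter.

```rust
use std::collections::BTreeSet;

/// Emulates a delimiter-based listing (hypothetical helper): objects directly
/// under `prefix` are returned as items, deeper objects are collapsed into
/// their next-level prefix.
fn list_with_delimiter<'a>(
    names: &[&'a str],
    prefix: &str,
) -> (Vec<&'a str>, BTreeSet<String>) {
    let mut items = Vec::new();
    let mut prefixes = BTreeSet::new();
    for &name in names {
        if let Some(rest) = name.strip_prefix(prefix) {
            match rest.find('/') {
                // A deeper object: keep only the prefix up to the next "/".
                Some(idx) => {
                    prefixes.insert(format!("{}{}", prefix, &rest[..=idx]));
                }
                // An object directly under the prefix.
                None => items.push(name),
            }
        }
    }
    (items, prefixes)
}

/// Emulates a recursive (flat) listing (hypothetical helper): every object
/// under `prefix` is returned.
fn list_recursive<'a>(names: &[&'a str], prefix: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| name.starts_with(prefix))
        .collect()
}
```

A sync has to see every object, so it is closer to the recursive case than to the default prefix listing.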

new files only (first time sync)

  • gcs-sync: 2.2s/7MB
  • gsutil: 9.93s/47MB

winner: gcs-sync

gcs-sync sync bench

rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         2.20
user         0.13
sys          0.21
             7606272  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1915  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 394  messages sent
                1255  messages received
                   0  signals received
                  54  voluntary context switches
                5814  involuntary context switches
           636241324  instructions retired
           989595729  cycles elapsed
             3895296  peak memory footprint

gsutil sync bench

rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
Operation completed over 215 objects/50.3 KiB.
real         9.93
user         8.12
sys          2.35
            47108096  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              196391  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
               36089  messages sent
               87309  messages received
                   5  signals received
               38401  voluntary context switches
               51924  involuntary context switches
            12986389  instructions retired
            12032672  cycles elapsed
              593920  peak memory footprint

no change (second time sync)

  • gcs-sync: 1.79s/8MB
  • gsutil: 2.18s/47MB

winner: no clear winner, but gcs-sync performance is at least similar to gsutil rsync when nothing has been modified (which is quite rare).

gcs-sync sync bench

cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         1.79
user         0.13
sys          0.12
             7864320  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1980  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 397  messages sent
                1247  messages received
                   0  signals received
                  42  voluntary context switches
                4948  involuntary context switches
           435013936  instructions retired
           704782682  cycles elapsed
             4141056  peak memory footprint

gsutil sync bench

/usr/bin/time -lp --  gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/
real         2.18
user         1.37
sys          0.66
            46899200  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              100108  page reclaims
                1732  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                6311  messages sent
               12752  messages received
                   4  signals received
                6145  voluntary context switches
               14219  involuntary context switches
            13133297  instructions retired
            13313536  cycles elapsed
              602112  peak memory footprint

gsutil rsync config

gsutil -m -q rsync -r -d ./your-dir gs://your-bucket
/usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

About authentication

All default functions related to authentication use the GOOGLE_APPLICATION_CREDENTIALS environment variable as their default configuration, as the official Google libraries do in other languages (Go, .NET).

Other functions (from and from_file) provide the custom integration mode.

For more info about OAuth2, see the related README in the oauth2 mod.
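The lookup order described above (an explicit file first, otherwise the environment variable) can be sketched as follows. This is only an illustration using the standard library: CredentialsSource, resolve_credentials and resolve_from_process_env are hypothetical names and are not the gcs-rsync API.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical type (not part of gcs-rsync): where the credentials come from.
#[derive(Debug, PartialEq)]
enum CredentialsSource {
    /// Explicit path, analogous to a `from_file`-style function.
    File(PathBuf),
    /// Path read from GOOGLE_APPLICATION_CREDENTIALS, analogous to a
    /// `default`-style function.
    Env(PathBuf),
}

/// Picks the explicit path when given, otherwise falls back to the value of
/// the environment variable passed in as `env_value`.
fn resolve_credentials(
    explicit: Option<&str>,
    env_value: Option<String>,
) -> Result<CredentialsSource, String> {
    if let Some(path) = explicit {
        return Ok(CredentialsSource::File(PathBuf::from(path)));
    }
    match env_value {
        Some(value) if !value.is_empty() => Ok(CredentialsSource::Env(PathBuf::from(value))),
        _ => Err("GOOGLE_APPLICATION_CREDENTIALS is not set and no file was given".to_string()),
    }
}

/// Same as above, reading the variable from the current process environment.
fn resolve_from_process_env(explicit: Option<&str>) -> Result<CredentialsSource, String> {
    resolve_credentials(explicit, env::var("GOOGLE_APPLICATION_CREDENTIALS").ok())
}
```

Passing the environment value as a parameter keeps the decision logic testable without mutating the process environment.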

How to run tests

Unit tests

cargo test --lib

Integration tests + Unit tests

TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast

Examples

Upload object

cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"

Download object

cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

Delete object

cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"

List objects

cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"

List objects with default service account

GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"

List objects (large prefix)

List a prefix holding more than 60K objects and count them:

time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l
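A listing of this size cannot come back in a single response: the GCS JSON API returns results in pages, each carrying a nextPageToken until the last one. The sketch below shows such a paging loop under stated assumptions: Page and list_all are hypothetical names, and the fetch_page closure stands in for the real HTTP call, which is not shown. It is not the gcs-rsync implementation.

```rust
/// One page of a listing (hypothetical type): object names plus an optional
/// token pointing at the next page.
struct Page {
    names: Vec<String>,
    next_page_token: Option<String>,
}

/// Follows next-page tokens until the listing is exhausted.
/// `fetch_page` receives the token of the page to fetch (None for the first).
fn list_all<F>(mut fetch_page: F) -> Result<Vec<String>, String>
where
    F: FnMut(Option<&str>) -> Result<Page, String>,
{
    let mut all = Vec::new();
    let mut token: Option<String> = None;
    loop {
        let page = fetch_page(token.as_deref())?;
        all.extend(page.names);
        match page.next_page_token {
            // More results: remember the token and request the next page.
            Some(next) => token = Some(next),
            // No token means this was the last page.
            None => return Ok(all),
        }
    }
}
```

Collecting everything into one Vec keeps the sketch short; a streaming variant would avoid holding all names in memory at once.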

Profiling

Humans are terrible at guessing-about-performance

export CARGO_PROFILE_RELEASE_DEBUG=true
sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

Native bin build (static shared lib)

docker run -it rust:alpine3.14
apk add --no-cache musl-dev pkgconfig openssl-dev

LDFLAGS="-static -L/usr/local/musl/lib" LD_LIBRARY_PATH=/usr/local/musl/lib:$LD_LIBRARY_PATH CFLAGS="-I/usr/local/musl/include" PKG_CONFIG_PATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x86_64-unknown-linux-musl --example bucket_to_folder_sync

TODO

  • OAuth2 service account (default, from and from_file)
  • OAuth2 dev (default, from and from_file)
  • OAuth2 Integration tests + examples
  • Useful diagnostic on error (raw json response)
  • List objects with better performance than gsutil by supporting GCS Partial Response
  • Upload/Download/Get/Delete objects + Integrations tests and examples
  • Sync local folder (one way sync without delete remote files with crc32c support)
  • Mirror local folder (sync + delete remotes files)
  • Benchmarks
  • Sync/Mirror integration tests
  • Doc crate
  • CI/CD
  • Publish crate
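The difference between the sync and mirror items above can be sketched as a planning step over two name-to-checksum maps. This is a simplified illustration, not the gcs-rsync implementation: Action and plan are hypothetical names, the bitwise crc32c function is a slow reference version, and comparing checksums alone is an assumption (a real tool may also look at size or modification time).

```rust
use std::collections::BTreeMap;

/// Bitwise CRC32C (Castagnoli), using the reflected polynomial 0x82F63B78.
/// Slow reference version, for illustration only.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the lowest bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical action type for a one-way local -> remote pass.
#[derive(Debug, PartialEq)]
enum Action {
    Upload(String),
    Delete(String),
}

/// Plans a one-way pass. Both maps go from object name to crc32c.
/// With `mirror == false` nothing is ever deleted on the remote side.
fn plan(
    local: &BTreeMap<String, u32>,
    remote: &BTreeMap<String, u32>,
    mirror: bool,
) -> Vec<Action> {
    let mut actions = Vec::new();
    for (name, crc) in local {
        // Upload when the object is missing remotely or its checksum differs.
        if remote.get(name) != Some(crc) {
            actions.push(Action::Upload(name.clone()));
        }
    }
    if mirror {
        for name in remote.keys() {
            // Mirror mode also removes remote objects that no longer exist locally.
            if !local.contains_key(name) {
                actions.push(Action::Delete(name.clone()));
            }
        }
    }
    actions
}
```

When nothing has changed, plan returns an empty list, which corresponds to the "no change" benchmark case where only the listings are compared.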
Comments
  • [Reworked] GCP metadata server credentials support

    This is a cherry-picked version of #16 by @bryancymo that adds support for the Google metadata token API, integrated into the single gcs-rsync example binary.

    This version uses a dyn trait over TokenGenerator to avoid static dispatch issues when using the API.

    In the end, only one gcs-rsync binary will be available, and the feature can be activated by adding a flag: -u, --use-metadata-token-api

    ⚠️ This version breaks compatibility with the TokenGenerator dyn trait

    • [x] Use TokenGenerator dyn trait to switch between AuthorizedUser/Metadata in gcs-rsync example
    • [x] Fix new clippy errors
    • [x] Upgrade to Rust 2021
    • [x] Provide a new conf to make integration tests working
    opened by cboudereau 4
  • GCP metadata server credentials support

    Added credentials functions to acquire tokens from the GCP Metadata servers when running on GCP infrastructure. This allows the application to utilise the VM's native service account or in case of GKE the Workload Identity service account, in doing so eliminating the need to embed credentials as files.

    opened by bryancymo 3
  • feat: publish as docker image

    publish as docker image

    • reqwest rustls-tls for alpine based image
    • workflow publish/push to push image to docker hub
    • readme docker commands
    • dockerfile and ignore
    opened by cboudereau 1
  • rsync from a list of files

    Hi,

    I have a huge dataset and I only want to sync some of the files. The traditional rsync allows for that (using --files-from), but gsutil rsync does not have this functionality.

    Does your implementation allow such a thing? If not, can this be implemented?

    opened by bernardohenz 2
Releases(v0.2.1)