s3-utils

Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI.

This tool contains a small set of command line utilities for working with Amazon S3, focused on features which are not readily available in the S3 API. It evolved from various scripts and use cases at work, packaged into something a little more useful. It's likely that more tools will be added over time as they become useful and/or required.

All S3 interaction is handled by the rusoto_s3 crate.

Installation

You can install s3-utils from either this repository or crates.io (once it's published):

# install from Cargo
$ cargo install s3-utils

# install the latest from GitHub
$ cargo install --git https://github.com/whitfin/s3-utils.git

Commands

Credentials can be configured by following the instructions in the AWS documentation. Almost every command you might use will take this shape:

$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
    AWS_DEFAULT_REGION=MY_AWS_REGION \
    s3-utils <subcommand> <arguments>

There are several switches available on almost all commands (such as -d to dry run an operation), but please check the command documentation before assuming a given switch exists. Each command exposes a -h switch to show a help menu, as standard. The examples below omit the AWS_ environment variables for brevity.

concat

This command focuses on concatenating files in S3. You can concatenate files in a basic manner just by providing a source pattern and a target file path:

$ s3-utils concat my.bucket.name 'archives/*.gz' 'archive.gz'

If you're working with long paths, you can add a prefix to the bucket name to avoid having to type it out multiple times. In the following case, *.gz and archive.gz are relative to the my/annoyingly/nested/path/ prefix:

$ s3-utils concat my.bucket.name/my/annoyingly/nested/path/ '*.gz' 'archive.gz'

You can also use pattern matching (driven by the official regex crate) to use segments of the source paths in your target paths. Here is an example of mapping a date hierarchy (YYYY/MM/DD) to a flat structure (YYYY-MM-DD):

$ s3-utils concat my.bucket.name 'date-hierarchy/(\d{4})/(\d{2})/(\d{2})/*.gz' 'flat-hierarchy/$1-$2-$3.gz'

In this case, all files in 2018/01/01/* would be mapped to 2018-01-01.gz. Don't forget to add single quotes around your expressions to avoid any pesky shell expansions!
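To make the capture-group mapping concrete, here is a minimal sketch in Python that emulates it locally. It is not the tool's implementation: `expand` and `plan_concat` are hypothetical helpers, the key names are made up, Python's `re` module stands in for the Rust regex crate, and the file part of the pattern is spelled `.*\.gz` as a plain-regex choice for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical helper: expand `$1`, `$2`, ... in a target template
# using the groups captured by a match.
def expand(template, match):
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)

# Hypothetical helper: group source keys by the target key they map to.
def plan_concat(keys, pattern, target):
    compiled = re.compile(pattern)
    plan = defaultdict(list)
    for key in keys:
        match = compiled.search(key)
        if match:
            plan[expand(target, match)].append(key)
    return dict(plan)

# Made-up object keys, used purely for illustration.
keys = [
    "date-hierarchy/2018/01/01/a.gz",
    "date-hierarchy/2018/01/01/b.gz",
    "date-hierarchy/2018/01/02/c.gz",
    "other/readme.txt",
]
plan = plan_concat(
    keys,
    r"date-hierarchy/(\d{4})/(\d{2})/(\d{2})/.*\.gz",
    "flat-hierarchy/$1-$2-$3.gz",
)
for target, sources in sorted(plan.items()):
    print(target, "<-", sources)
```

In this sketch both files under 2018/01/01 are grouped under flat-hierarchy/2018-01-01.gz, and the key that does not match the pattern is ignored.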

In order to concatenate files remotely (i.e. without pulling them to your machine), this tool uses the Multipart Upload API of S3. This means that all limitations of that API are inherited by this tool. Usually this isn't an issue, but one of the more noticeable problems is that files smaller than 5MB cannot be concatenated. To avoid wasted AWS calls, this is currently caught in the client layer and results in a client-side error. Because working around this is complex, joining files smaller than 5MB is currently unsupported.
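The size restriction can be sketched as a pre-flight check like the one below. This is an illustration only, under stated assumptions: `too_small` is a hypothetical helper, the file names and sizes are invented, and the threshold is taken as 5 MiB (5 * 1024 * 1024 bytes), the minimum S3 sets for multipart parts. S3 itself exempts the final part from that minimum, but since the text above says small files are rejected outright, the sketch rejects any file under the limit.

```python
# Assumed threshold: S3's minimum multipart part size of 5 MiB.
MIN_PART_BYTES = 5 * 1024 * 1024

# Hypothetical helper: return the names of sources that are too small
# to be used as a multipart part.
def too_small(sources):
    return [name for name, size in sources if size < MIN_PART_BYTES]

# Invented (name, size in bytes) pairs for illustration.
sources = [("a.gz", 37_949_529), ("b.gz", 54), ("c.gz", 5 * 1024 * 1024)]
rejected = too_small(sources)
if rejected:
    print("cannot concatenate, files under 5MB:", rejected)
```

Checking sizes from a listing before starting the upload is what avoids the wasted calls: nothing is sent to S3 if any source would be refused.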

rename

The rename command offers dynamic file renaming using patterns, without having to download files. Its main utility is being able to rename large numbers of files in a single command.

You can rename files in a basic manner, such as simply changing their prefix:

$ s3-utils rename my.bucket.name 'my-directory/(.*)' 'my-new-directory/$1'

Although basic, this shows how you can use captured patterns in your renaming operations. This allows you to do much more complicated mappings, such as transforming an existing tree hierarchy into flat files:

$ s3-utils rename my.bucket.name '(.*)/(.*)/(.*)' '$1-$2-$3'

This is a very simple model, but it provides a pretty flexible way to rename a lot of files quickly.
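Before running a pattern like the one above against a real bucket, it can help to preview the mapping locally. The following Python sketch does that with a hypothetical `new_key` helper and invented keys. It assumes the pattern must match the whole key (the tool may anchor differently) and uses Python's `re` in place of the Rust regex crate.

```python
import re

# Hypothetical helper: compute the new key for one object,
# or None if the pattern does not match the whole key (an assumption).
def new_key(key, pattern, template):
    match = re.fullmatch(pattern, key)
    if match is None:
        return None
    # Expand `$1`, `$2`, ... with the captured groups.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)

# Invented keys for illustration.
for key in ["2018/01/01", "logs/2018/01/01", "flat.txt"]:
    print(key, "->", new_key(key, r"(.*)/(.*)/(.*)", "$1-$2-$3"))
```

Note that `(.*)` is greedy, so in this sketch a deeper key such as logs/2018/01/01 becomes logs/2018-01-01: the first group swallows everything up to the last two slashes. A dry run is a sensible way to confirm what the tool itself does with such keys.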

Due to limitations in the current AWS S3 API, this command is unable to work with files larger than 5GB in size. At some point we may add a workaround for this, but for now this is likely to throw an error.
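For context, S3 copies objects above the single-request limit in parts, with each part naming a byte range of the source object. The sketch below only shows how such ranges could be computed. `copy_ranges` is a hypothetical helper, not something this tool provides, and the sizes in the example are arbitrary.

```python
# Hypothetical helper: split an object of `size` bytes into inclusive
# byte ranges of at most `part_size` bytes, formatted as "bytes=first-last".
def copy_ranges(size, part_size):
    return [
        f"bytes={start}-{min(start + part_size, size) - 1}"
        for start in range(0, size, part_size)
    ]

# Arbitrary example: a 6 GiB object split into 1 GiB parts.
GIB = 1024 ** 3
ranges = copy_ranges(6 * GIB, GIB)
print(len(ranges), ranges[0], ranges[-1])
```

Each range would then be copied as one part of a multipart upload, which is presumably the shape any future workaround would take.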

report

Reports generate metadata about an S3 bucket or subdirectory thereof. They can be used to inspect things like file sizes, modification dates, etc. This command is extremely simple as it's fairly un-customizable:

$ s3-utils report my.bucket.name
$ s3-utils report my.bucket.name/my/directory/path

This generates shell output which follows a relatively simple format, meant to be easily extensible and (hopefully) convenient in shell pipelines. The general format is pretty stable, but certain formatting may change over time (spacing, number formatting, etc).

Below is an example based on a real S3 bucket (although with fake names):

[general]
total_time=7s
total_space=1.94TB
total_files=51,152

[file_size]
average_file_size=37.95MB
average_file_bytes=37949529
largest_file_size=1.82GB
largest_file_bytes=1818900684
largest_file_name=path/to/my_largest_file.txt.gz
smallest_file_size=54B
smallest_file_bytes=54
smallest_file_name=path/to/my_smallest_file.txt.gz
smallest_file_others=12

[extensions]
unique_extensions=1
most_frequent_extension=gz

[modification]
earliest_file_date=2016-06-11T17:36:57.000Z
earliest_file_name=path/to/my_earliest_file.txt.gz
earliest_file_others=3
latest_file_date=2017-01-01T00:03:19.000Z
latest_file_name=path/to/my_latest_file.txt.gz

This sample report is based on the initial builds of this subcommand, so depending on when you visit this tool there may be more (or less) included in the generated report.
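Because the report uses bracketed section headers and key=value lines, it can be read by an INI-style parser. Below is a small sketch using Python's standard configparser on a shortened copy of the sample. `parse_report` is a hypothetical helper, and since the exact formatting may change over time, treat this as a starting point rather than a guaranteed contract.

```python
import configparser

# A shortened copy of the sample report.
REPORT = """\
[general]
total_time=7s
total_space=1.94TB
total_files=51,152

[file_size]
largest_file_size=1.82GB
largest_file_bytes=1818900684
largest_file_name=path/to/my_largest_file.txt.gz
"""

# Hypothetical helper: parse report text into {section: {key: value}}.
def parse_report(text):
    # Only "=" separates keys from values; interpolation is disabled so
    # that values are returned exactly as printed.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

report = parse_report(REPORT)
# Human-friendly numbers contain thousands separators; strip them first.
total_files = int(report["general"]["total_files"].replace(",", ""))
largest_bytes = int(report["file_size"]["largest_file_bytes"])
print(total_files, largest_bytes)
```

The *_bytes fields are the easier ones to consume in scripts, since they avoid parsing units such as MB or GB.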

Releases(v1.1.0)
Owner
Isaac Whitfield