Official Rust implementation of Apache Arrow

The Apache Software Foundation

Last update: Jan 9, 2023

Overview

Native Rust implementation of Apache Arrow

Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust.

This part of the Arrow project is divided into 5 main components:

Crate	Description	Documentation
Arrow	Core functionality (memory layout, arrays, low level computations)	(README)
Parquet	Parquet support	(README)
Arrow-flight	Arrow data between processes	(README)
DataFusion	In-memory query engine with SQL support	(README)
Ballista	Distributed query execution	(README)

Independently, they support a vast array of functionality for in-memory computations.

Together, they allow users to write an SQL query or a DataFrame (using the datafusion crate), run it against a parquet file (using the parquet crate), evaluate it in-memory using Arrow's columnar format (using the arrow crate), and send the result to another process (using the arrow-flight crate).

Generally speaking, the arrow crate offers functionality to develop code that uses Arrow arrays, and datafusion offers most operations typically found in SQL, with the notable exceptions of:

join
window functions

There are too many features to enumerate here, but some notable mentions:

Arrow implements all formats in the specification except certain dictionaries
Arrow supports SIMD for some of its vertical operations
DataFusion supports async execution
DataFusion supports user-defined functions, aggregates, and whole execution nodes

You can find more details about each crate in their respective READMEs.
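To make "in-memory columnar format" a little more concrete, here is a minimal sketch of a nullable 32-bit integer column stored as one contiguous values buffer plus a validity bitmap. It uses only the standard library; the type `Int32Column` and its methods are hypothetical names chosen for illustration and are not the arrow crate's API.

```rust
/// Hypothetical nullable i32 column: a contiguous values buffer plus a
/// validity bitmap in which bit `i` (least-significant bit first) is set
/// when slot `i` holds a value.
pub struct Int32Column {
    values: Vec<i32>,
    validity: Vec<u8>,
}

impl Int32Column {
    /// Build the two buffers from a slice of optional values.
    pub fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1 << (i % 8);
                }
                // Null slots still occupy space in the values buffer.
                None => values.push(0),
            }
        }
        Self { values, validity }
    }

    /// True when slot `i` is non-null. Panics if `i` is out of range.
    pub fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }

    /// Value at slot `i`, or `None` for a null or out-of-range slot.
    pub fn get(&self, i: usize) -> Option<i32> {
        if i < self.values.len() && self.is_valid(i) {
            Some(self.values[i])
        } else {
            None
        }
    }

    /// Sum of all non-null values, widened to i64.
    pub fn sum(&self) -> i64 {
        (0..self.values.len())
            .filter(|&i| self.is_valid(i))
            .map(|i| self.values[i] as i64)
            .sum()
    }
}
```

For example, `Int32Column::from_options(&[Some(1), None, Some(3)]).sum()` adds only the two non-null slots.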

Arrow Rust Community

We use the official ASF Slack for informal discussions and coordination. This is a great place to meet other contributors and get guidance on where to contribute. Join us in the arrow-rust channel.

We use ASF JIRA as the system of record for new features and bug fixes and this plays a critical role in the release process.

For design discussions we generally collaborate on Google documents and file a JIRA linking to the document.

There is also a bi-weekly Rust-specific sync call for the Arrow Rust community. This is hosted on Google Meet at https://meet.google.com/ctp-yujs-aee on alternate Wednesdays at 09:00 US/Pacific, 12:00 US/Eastern. During US daylight saving time this corresponds to 16:00 UTC, and at other times to 17:00 UTC.

Developer's guide to Arrow Rust

How to compile

This is a standard cargo project with workspaces. To build it, you need to have rust and cargo:

cd /rust && cargo build

You can also use rust's official docker image:

docker run --rm -v $(pwd)/rust:/rust -it rust /bin/bash -c "cd /rust && cargo build"

The command above assumes that you are in the root directory of the project, not in the same directory as this README.md.

You can also compile specific workspaces:

cd /rust/arrow && cargo build

Git Submodules

Before running tests and examples, it is necessary to set up the local development environment.

The tests rely on test data that is contained in git submodules.

To pull down this data run the following:

git submodule update --init

This populates data in two git submodules:

../parquet-testing/data (sourced from https://github.com/apache/parquet-testing.git)
../testing (sourced from https://github.com/apache/arrow-testing)

By default, cargo test will look for these directories at their standard location. The following environment variables can be used to override the location:

# Optionally specify a different location for test data
export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
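The override described above can be sketched in Rust as follows. This is only an illustration of "use the environment variable if set, otherwise the standard location"; the helper names are hypothetical, and treating an empty value as unset is an assumption of this sketch rather than documented behaviour.

```rust
use std::env;
use std::path::PathBuf;

/// Return the override when it is present and non-empty, otherwise the
/// default directory. Treating an empty value as "unset" is an assumption.
pub fn resolve_data_dir(override_value: Option<String>, default_dir: &str) -> PathBuf {
    match override_value {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(default_dir),
    }
}

/// Hypothetical helper: read ARROW_TEST_DATA and fall back to the
/// submodule location mentioned in the text.
pub fn arrow_test_data_dir() -> PathBuf {
    resolve_data_dir(env::var("ARROW_TEST_DATA").ok(), "../testing/data")
}
```

Keeping the decision in `resolve_data_dir` makes it testable without touching the process environment.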

From here on, this is a pure Rust project and cargo can be used to run tests, benchmarks, docs and examples as usual.

Running the tests

Run tests using the Rust standard cargo test command:

# run all tests.
cargo test


# run only tests for the arrow crate
cargo test -p arrow

Code Formatting

Our CI uses rustfmt to check code formatting. Before submitting a PR be sure to run the following and check for lint issues:

cargo +stable fmt --all -- --check

Clippy Lints

We recommend using clippy for checking lints during development. While we do not yet enforce clippy checks, we recommend not introducing new clippy errors or warnings.

Run the following to check for clippy lints.

cargo clippy

If you use Visual Studio Code with the rust-analyzer plugin, you can enable clippy to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

One of the concerns with clippy is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a clippy lint, you may disable the lint and briefly justify it.

Search for allow(clippy:: in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
If you have several lints on a function or module, you may disable the lint on the function or module.
If a lint is pervasive across multiple modules, you may disable it at the crate level.
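The three scopes above might look like this in practice. The functions are made up for illustration; the lint names `clippy::identity_op` and `clippy::needless_range_loop` are real clippy lints, but whether they are the ones ignored in this codebase is not implied.

```rust
/// Function-level allow: the index loop is kept on purpose because it
/// walks two slices in lockstep, which the author finds clearer here.
#[allow(clippy::needless_range_loop)]
pub fn add_in_place(dst: &mut [i32], src: &[i32]) {
    let n = dst.len().min(src.len());
    for i in 0..n {
        dst[i] += src[i];
    }
}

pub fn buffer_sizes() -> (usize, usize) {
    const KIB: usize = 1024;
    // Line-level allow: the `1 *` is kept so both sizes read the same way.
    #[allow(clippy::identity_op)]
    let small = 1 * KIB;
    let large = 64 * KIB;
    (small, large)
}

// A crate-level allow would instead be an inner attribute at the very top
// of lib.rs or main.rs, for example:
//     #![allow(clippy::too_many_arguments)]
```

In each case the short comment next to the attribute is the "brief justification" the guideline asks for.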

Git Pre-Commit Hook

We can use a git pre-commit hook to automate various kinds of pre-commit checking and formatting.

Suppose you are in the root directory of the project.

First check if the file already exists:

ls -l .git/hooks/pre-commit

If the file already exists, check its link target or contents before replacing it so that you do not overwrite it by mistake. If it does not exist, symlink pre-commit.sh to .git/hooks/pre-commit:

ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

If you occasionally want to commit without running the checks, run git commit with --no-verify:

git commit --no-verify -m "... commit message ..."

Comments

Define eq_dyn_scalar API

Which issue does this PR close?

Working on this in relation to #984 and #1068 with the end goal being to finalize how we want eq_dyn_scalar to work.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?
parquet arrow arrow-flight

opened by matthewmturner 26
Decimal Precision Validation
Which part is this question about

Generally the approach taken by this crate is that a given ArrayData and by extension Array only contains valid data. For example, a StringArray is valid UTF-8 with each index at a codepoint boundary, a dictionary array only has valid indexes, etc... This allows eliding bound checks on access within kernels.

However, in order for this to be sound, it must be impossible to create invalid ArrayData using safe APIs. This means that safe APIs must either:

Generate valid data by construction - e.g. the builder APIs

Validate data - e.g. ArrayData::try_new
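The two options can be sketched with a simplified string column. `StringColumn` below is a hypothetical stand-in written with the standard library only, not the crate's `ArrayData` or `StringArray`; it checks the same invariants mentioned above (valid UTF-8, offsets on codepoint boundaries) once at construction so that readers can rely on them afterwards.

```rust
/// Hypothetical stand-in for a UTF-8 string array: value `i` is
/// `data[offsets[i]..offsets[i + 1]]`.
#[derive(Debug)]
pub struct StringColumn {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

impl StringColumn {
    /// Option 2: validate untrusted buffers before accepting them.
    pub fn try_new(offsets: Vec<usize>, data: Vec<u8>) -> Result<Self, String> {
        let text = std::str::from_utf8(&data).map_err(|e| e.to_string())?;
        if offsets.is_empty() {
            return Err("offsets must contain at least one entry".to_string());
        }
        if offsets.windows(2).any(|pair| pair[0] > pair[1]) {
            return Err("offsets must be non-decreasing".to_string());
        }
        // is_char_boundary also returns false for out-of-range offsets.
        if let Some(bad) = offsets.iter().find(|&&o| !text.is_char_boundary(o)) {
            return Err(format!("offset {} is not on a codepoint boundary", bad));
        }
        Ok(Self { offsets, data })
    }

    /// Option 1: valid by construction, because every offset is taken
    /// from the end of a complete `&str`.
    pub fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for s in items {
            data.extend_from_slice(s.as_bytes());
            offsets.push(data.len());
        }
        Self { offsets, data }
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Read value `i`. This sketch re-checks UTF-8 to stay in safe code;
    /// the invariant established above is what would justify skipping it.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.data[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("validated at construction")
    }
}
```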

For the examples above incorrect validation can very clearly lead to UB. The situation for decimal values is a bit more confused, in particular I'm not really clear on what the implications of a value that exceeds the precision actually are. However, some notes:

As far as I can tell we don't protect against overflow of normal integer types

We don't have any decimal arithmetic kernels (yet)

The decimal types are fixed bit width and so the precision isn't used to impact their representation

Describe your question

My question boils down to:

What is the purpose of the precision argument? Is it just for interoperability with other non-arrow representations?

Is there a requirement to saturate/error at the bounds of the precision, or can we simply overflow/saturate at the bounds of the underlying representation?

Does validating the precision on ingest to ArrayData actually elide any validation when performing computation?

The answers to this will dictate if we can just take a relaxed attitude to precision, and let users opt into validation if they care, and otherwise simply ignore it.
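As a point of reference for the "opt into validation" idea, a precision check on a 128-bit decimal value could look like the sketch below. It assumes the common reading that a value fits precision p when its magnitude is below 10^p; the function names are hypothetical and this is not the crate's validation code.

```rust
/// Largest precision handled here: 10^38 still fits in an i128.
pub const MAX_PRECISION: u32 = 38;

/// Ok(true) when `value` has at most `precision` decimal digits,
/// i.e. -10^precision < value < 10^precision.
pub fn fits_precision(value: i128, precision: u32) -> Result<bool, String> {
    if precision == 0 || precision > MAX_PRECISION {
        return Err(format!("precision {} is out of range", precision));
    }
    let bound = 10_i128.pow(precision);
    Ok(value > -bound && value < bound)
}

/// Opt-in strict addition: fails on i128 overflow or when the result no
/// longer fits the precision. A relaxed kernel would skip the second check.
pub fn checked_add_with_precision(a: i128, b: i128, precision: u32) -> Option<i128> {
    let sum = a.checked_add(b)?;
    match fits_precision(sum, precision) {
        Ok(true) => Some(sum),
        _ => None,
    }
}
```

The check is independent of the bit width of the representation, which matches the observation above that precision does not affect how the value is stored.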

I tried to understand what the C++ implementation is doing, but I honestly got lost. It almost looks like it is performing floating point operations and then rounding them back, which seems surprising...

Additional context
question
opened by tustvold 24
Add `async` into doc features

Signed-off-by: remzi

Which issue does this PR close?

Closes #1307. Closes https://github.com/apache/arrow-rs/issues/1617

Rationale for this change

What changes are included in this PR?

Add async to default enabled features, so that the link arrow::async_reader is active.

Are there any user-facing changes?
parquet

opened by HaoYang670 23
Replace azure sdk with custom implementation

Which issue does this PR close?

closes #2176

Rationale for this change

See https://github.com/apache/arrow-rs/issues/2176

What changes are included in this PR?

Replaces azure sdk with a custom implementation based on reqwest. So far this is a rough draft, and surely needs cleanup and some more work on the auth part. I tried to make the aws and azure implementations look as comparable as can be. ~~I also pulled in a new dependency on oauth2 crate. Will evaluate a bit more when cleaning up auth, but my feeling was that implementing the oauth flows manually could be another significant piece of work.~~

Any feedback is highly welcome.

cc @tustvold @alamb

Are there any user-facing changes?

Not that I'm aware of, but there is a possibility
api-change object-store

opened by roeap 22
Speed up `Decimal256` validation based on bytes comparison and add benchmark test

Which issue does this PR close?

Closes https://github.com/apache/arrow-rs/issues/2320

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?
parquet arrow

opened by liukun4515 20
support compression for IPC

Which issue does this PR close?

Closes #1709 Closes #70

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?
arrow

opened by liukun4515 20
Split up Arrow Crate
TLDR: rather than fighting entropy, let's just brute-force compilation

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The arrow crate is getting rather large, and is starting to show up as a non-trivial bottleneck when compiling code, see #2170. There have been some efforts to reduce the amount of generated code, see #1858, but this is going to be a perpetual losing battle against new feature additions.

I think there are a couple of problems currently:

Limited build parallelism, especially if codegen-units is set low

Upstream crates have to "depend" on functionality they don't need, e.g. parquet depending on compute kernels

Minor changes force large amounts of recompilation, with incremental compilation only helping marginally

Codegen is rarely linear in complexity, consequently larger codegen units take longer than the same amount of code in smaller units

All these conspire to often result in an arrow shaped hole in compilation, where CPUs are left idle.

Some numbers from my local machine

Release with default features: 232 seconds

Release with default features without comparison kernels: 150 seconds

Release with default features without compute kernels: 70 seconds

Release without default features without compute kernels: 60 seconds

The vast majority of the time all bar a single core is idle.

Describe the solution you'd like

I would like to propose we split up the arrow crate, into a number of sub-crates that are then re-exported by the top-level arrow crate. Users can then choose to depend on the batteries included arrow crate, or more granular crates.

Initially I would propose the following split:

arrow-csv: CSV reader support

arrow-ipc: IPC support

arrow-json: JSON support (related to #2300)

arrow-compute: contents of compute module

arrow-test: arrow test_utils (not published)

arrow-core: everything else

There is definitely scope for splitting up the crates further after this, in particular the comparison kernels might be a good candidate to live on their own, but I think we should start small and go from there. I suspect there is a fair amount of disentangling that will be necessary to achieve this.
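The re-export arrangement proposed above can be illustrated in a single file, with modules standing in for crates. Everything here is a hypothetical miniature: the module names mirror the proposed crate names, and the functions are placeholders rather than real arrow APIs.

```rust
/// Stand-in for a hypothetical `arrow-csv` sub-crate.
pub mod arrow_csv {
    /// Placeholder functionality: split one CSV line into trimmed fields.
    pub fn split_line(line: &str) -> Vec<&str> {
        line.split(',').map(str::trim).collect()
    }
}

/// Stand-in for a hypothetical `arrow-compute` sub-crate.
pub mod arrow_compute {
    /// Placeholder functionality: sum a slice of integers.
    pub fn sum(values: &[i64]) -> i64 {
        values.iter().sum()
    }
}

/// Stand-in for the batteries-included top-level crate, which only
/// re-exports the granular pieces under their familiar paths.
pub mod arrow {
    pub use super::arrow_compute as compute;
    pub use super::arrow_csv as csv;
}
```

A user who wants everything writes `arrow::compute::sum(..)`, while a crate that needs only one piece can depend on `arrow_compute` directly and avoid compiling the rest.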

Describe alternatives you've considered

Feature flags are another way this can be handled, however, they have a couple of limitations:

It is impractical to test the full combinatorial explosion of combinations, which allows for bugs to sneak through

They are unified for a target which limits build parallelism, just because say DataFusion depends on arrow with CSV support, shouldn't force the parquet crate to wait for this to compile before it can start compiling

Poor UX:

Discoverability is limited, it can be hard to determine what features gate what functionality

Hard to determine if the feature flag set is minimal, no equivalent of cargo-udeps

It can be a non-trivial detective exercise to determine why a given feature is being enabled

Necessitate counter-intuitive hacks to play nicely in multi-crate workspaces - see workspace hack

Additional context

@Jimexist recently drove an initiative to do something similar to DataFusion which has worked very well - https://github.com/apache/arrow-datafusion/issues/1750

FYI @alamb @jhorstmann @nevi-me
enhancement
opened by tustvold 19
FFI: ArrowArray::try_from_raw shouldn't clone
I'm not sure my understanding is right, but I think this commit breaks the design and creates a memory leak.

If we clone the FFI struct, we need to free the pointer ourselves. But if we free FFI_ArrowArray, won't the data in this Array be freed as well? That would mean we can't free the pointer until the data has been used and is ready to be freed, and in practice a big project can't hold on to this otherwise useless pointer for that long, which creates a memory leak.

As to the question @viirya raised in #1333: when managing memory, whoever allocates it should free it. In our case that means we need to allocate the struct in Rust, pass the pointer to Java, and then also free the memory in Rust.

You can check my code here: https://github.com/wangfenjin/duckdb-rs/blob/5083d39a4147f8017613304ae5f217a88ac42c2e/src/raw_statement.rs#L58

When I tried to upgrade to version 10, a memory leak was detected, and there is no easy way to fix it.

I suggest we revert this commit. cc @alamb @sunchao
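The "whoever allocates it frees it" rule from the comment above can be sketched with a plain boxed struct. `FfiArray` and the three functions are hypothetical and deliberately much simpler than the real FFI_ArrowArray; the sketch only shows ownership leaving Rust as a raw pointer and coming back to Rust to be freed.

```rust
/// Hypothetical C-compatible struct handed across the FFI boundary.
#[repr(C)]
pub struct FfiArray {
    pub length: i64,
    pub null_count: i64,
}

/// Allocate in Rust and give up ownership as a raw pointer. The foreign
/// side may read through the pointer but must not free it.
pub fn export_array(length: i64, null_count: i64) -> *mut FfiArray {
    Box::into_raw(Box::new(FfiArray { length, null_count }))
}

/// Read a field through the raw pointer.
///
/// # Safety
/// `ptr` must come from `export_array` and not have been released yet.
pub unsafe fn array_length(ptr: *const FfiArray) -> i64 {
    unsafe { (*ptr).length }
}

/// Take ownership back and free the allocation on the Rust side.
///
/// # Safety
/// `ptr` must be null or come from `export_array`, and must be released
/// at most once.
pub unsafe fn release_array(ptr: *mut FfiArray) {
    if !ptr.is_null() {
        unsafe { drop(Box::from_raw(ptr)) };
    }
}
```

Cloning the struct on import, as discussed in this issue, changes who holds the original pointer and therefore who is able to call the release step.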

Originally posted by @wangfenjin in https://github.com/apache/arrow-rs/issues/1334#issuecomment-1064828113
arrow bug
opened by wangfenjin 19
Add FFI for Arrow C Stream Interface

Which issue does this PR close?

Closes #1348.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?
arrow

opened by viirya 19
Change Field::metadata to HashMap

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently Schema::metadata is HashMap<String, String>, whereas Field::metadata is Option<BTreeMap<String, String>>. This is not only inconsistent, but it is unclear why there is an additional Option

Describe the solution you'd like

I would like to change Field::metadata to a HashMap for consistency with Schema
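One way to see why the extra Option adds little: both `None` and an empty map convert to the same empty HashMap. The helper below is a hypothetical migration sketch using only the standard library, not code from the crate.

```rust
use std::collections::{BTreeMap, HashMap};

/// Convert the old-style `Option<BTreeMap<..>>` metadata into a plain
/// HashMap. `None` and `Some(empty map)` both become an empty HashMap.
pub fn to_hash_map(metadata: Option<BTreeMap<String, String>>) -> HashMap<String, String> {
    metadata
        .map(|m| m.into_iter().collect::<HashMap<_, _>>())
        .unwrap_or_default()
}
```

Note that a HashMap does not iterate in key order the way a BTreeMap does, so any code relying on ordered iteration would need to sort explicitly.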

Describe alternatives you've considered

Additional context
parquet arrow enhancement

opened by tustvold 18

`SchemaResult` in IPC deviates from other implementations

Describe the bug

The SchemaResult produced by SchemaAsIpc can be converted to a Schema by the Rust implementation of Apache Arrow Flight but not other implementations of Apache Arrow Flight (tested the Go, Java, and Python implementations).

To Reproduce

For the Rust server, implement FlightService.get_schema() as:

async fn get_schema(
    &self,
    _request: Request<FlightDescriptor>,
) -> Result<Response<SchemaResult>, Status> {
    let tid = Field::new("tid", DataType::Int32, false);
    let timestamp = Field::new("timestamp", DataType::Timestamp(TimeUnit::Millisecond, None), false);
    let value = Field::new("value", DataType::Float32, false);
    let schema = Schema::new(vec![tid, timestamp, value]);

    let options = IpcWriteOptions::default();
    let schema_as_ipc = SchemaAsIpc::new(&schema, &options);
    let schema_result: SchemaResult = schema_as_ipc.into();

    Ok(Response::new(schema_result))
}

Attempt to retrieve and print the Schema using the following Rust code:

use arrow::ipc::convert;
use arrow_flight::flight_service_client::FlightServiceClient;
use arrow_flight::FlightDescriptor;
use tokio::runtime::Runtime;
use tonic::Request;

fn main() {
    let tokio = Runtime::new().unwrap();

    tokio.block_on(async {
        let mut flight_service_client = FlightServiceClient::connect("grpc://127.0.0.1:9999").await.unwrap();
        let flight_descriptor = FlightDescriptor::new_path(vec!["".to_owned()]);
        let request = Request::new(flight_descriptor);
        let schema_result = flight_service_client.get_schema(request).await.unwrap().into_inner();
        let schema = convert::schema_from_bytes(&schema_result.schema).unwrap();
        dbg!(schema);
    });
}

Attempt to retrieve and print the Schema using the following Python code:

from pyarrow import flight
client = flight.FlightClient('grpc://127.0.0.1:9999')
descriptor = flight.FlightDescriptor.for_path("")
schema_result = client.get_schema(descriptor)
print(schema_result.schema)

Expected behavior

The Rust code should successfully retrieve and print the Schema while the Python code should fail due to the following OSError being raised:

Traceback (most recent call last):
  File "get_schema.py", line 5, in <module>
    print(schema_result.schema)
  File "pyarrow/_flight.pyx", line 720, in pyarrow._flight.SchemaResult.schema.__get__
  File "pyarrow/_flight.pyx", line 80, in pyarrow._flight.check_flight_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Invalid flatbuffers message.

Additional context

As the issue was first discovered using a client written in Go, we first raised apache/arrow#13853 as we believed the problem was in the Go implementation of Apache Arrow Flight. But from the comments provided by @zeroshade on that issue, it seems that the Rust implementation of Apache Arrow Flight deviates from the other implementations in how a Schema is serialized. For example, both the C++ and Go implementations include the continuation indicator (0xFFFFFFFF) followed by the message length as a 32-bit integer before the Schema:

use arrow_flight::{SchemaAsIpc, SchemaResult};
use arrow::datatypes::{Schema, TimeUnit, Field, DataType};
use arrow::ipc::writer::IpcWriteOptions;

fn main() {
    let tid = Field::new("tid", DataType::Int32, false);
    let timestamp = Field::new("timestamp", DataType::Timestamp(TimeUnit::Millisecond, None), false);
    let value = Field::new("value", DataType::Float32, false);
    let schema = Schema::new(vec![tid, timestamp, value]);

    let options = IpcWriteOptions::default();
    let schema_as_ipc = SchemaAsIpc::new(&schema, &options);
    let schema_result: SchemaResult = schema_as_ipc.into();
    dbg!(schema_result);
}

16 0 0 0 0 0 10 0 14 0 12 0 11 0 4 0 10 0 0 0 20 0 0 0 0 0 0 1 4 0 10 0 12 0 0 0 8 0 4 0 10 0 0 0 8 0 0 0 8 0 0 0 0 0 0 0 3 0 0 0 136 0 0 0 52 0 0 0 4 0 0 0 148 255 255 255 16 0 0 0 20 0 0 0 0 0 0 3 16 0 0 0 206 255 255 255 0 0 1 0 0 0 0 0 5 0 0 0 118 97 108 117 101 0 0 0 192 255 255 255 28 0 0 0 12 0 0 0 0 0 0 10 32 0 0 0 0 0 0 0 0 0 6 0 8 0 6 0 6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 9 0 0 0 116 105 109 101 115 116 97 109 112 0 0 0 16 0 20 0 16 0 0 0 15 0 4 0 0 0 8 0 16 0 0 0 24 0 0 0 32 0 0 0 0 0 0 2 28 0 0 0 8 0 12 0 4 0 11 0 8 0 0 0 32 0 0 0 0 0 0 1 0 0 0 0 3 0 0 0 116 105 100 0

#include <iostream>

#include "arrow/type.h"
#include "arrow/buffer.h"
#include "arrow/ipc/writer.h"

int main() {
  std::shared_ptr<arrow::Field> tid = arrow::field("tid", arrow::int32());
  std::shared_ptr<arrow::Field> timestamp = arrow::field("timestamp", arrow::timestamp(arrow::TimeUnit::MILLI));
  std::shared_ptr<arrow::Field> value = arrow::field("value", arrow::float32());

  std::shared_ptr<arrow::Schema> schema_ptr = arrow::schema({tid, timestamp, value});
  arrow::Schema schema = *schema_ptr.get();
  std::shared_ptr<arrow::Buffer> serialized_schema = arrow::ipc::SerializeSchema(schema).ValueOrDie();

  size_t serialized_schema_size = serialized_schema->size();
  for (size_t index = 0; index < serialized_schema_size; index++) {
    std::cout << unsigned((*serialized_schema)[index]) << ' ';
  }
  std::cout << std::endl;
}

255 255 255 255 224 0 0 0 16 0 0 0 0 0 10 0 12 0 6 0 5 0 8 0 10 0 0 0 0 1 4 0 12 0 0 0 8 0 8 0 0 0 4 0 8 0 0 0 4 0 0 0 3 0 0 0 124 0 0 0 52 0 0 0 4 0 0 0 160 255 255 255 0 0 1 3 16 0 0 0 24 0 0 0 4 0 0 0 0 0 0 0 5 0 0 0 118 97 108 117 101 0 0 0 210 255 255 255 0 0 1 0 204 255 255 255 0 0 1 10 16 0 0 0 32 0 0 0 4 0 0 0 0 0 0 0 9 0 0 0 116 105 109 101 115 116 97 109 112 0 6 0 8 0 6 0 6 0 0 0 0 0 1 0 16 0 20 0 8 0 6 0 7 0 12 0 0 0 16 0 16 0 0 0 0 0 1 2 16 0 0 0 28 0 0 0 4 0 0 0 0 0 0 0 3 0 0 0 116 105 100 0 8 0 12 0 8 0 7 0 8 0 0 0 0 0 0 1 32 0 0 0

package main

import (
	"fmt"

	"github.com/apache/arrow/go/arrow"
	"github.com/apache/arrow/go/arrow/flight"
	"github.com/apache/arrow/go/arrow/memory"
)

func main() {
	schema := arrow.NewSchema(
		[]arrow.Field{
			{Name: "tid", Type: arrow.PrimitiveTypes.Int32},
			{Name: "timestamp", Type: arrow.FixedWidthTypes.Timestamp_ms},
			{Name: "value", Type: arrow.PrimitiveTypes.Float32},
		},
		nil,
	)
	serializedSchema := flight.SerializeSchema(schema, memory.DefaultAllocator)
	fmt.Println(serializedSchema)
}

255 255 255 255 248 0 0 0 16 0 0 0 0 0 10 0 12 0 10 0 9 0 4 0 10 0 0 0 16 0 0 0 0 1 4 0 8 0 8 0 0 0 4 0 8 0 0 0 4 0 0 0 3 0 0 0 148 0 0 0 60 0 0 0 4 0 0 0 136 255 255 255 16 0 0 0 24 0 0 0 0 0 0 3 24 0 0 0 0 0 0 0 0 0 6 0 8 0 6 0 6 0 0 0 0 0 1 0 5 0 0 0 118 97 108 117 101 0 0 0 188 255 255 255 16 0 0 0 24 0 0 0 0 0 0 10 36 0 0 0 0 0 0 0 8 0 12 0 10 0 4 0 8 0 0 0 8 0 0 0 0 0 1 0 3 0 0 0 85 84 67 0 9 0 0 0 116 105 109 101 115 116 97 109 112 0 0 0 16 0 20 0 16 0 0 0 15 0 8 0 0 0 4 0 16 0 0 0 16 0 0 0 24 0 0 0 0 0 0 2 28 0 0 0 0 0 0 0 8 0 12 0 8 0 7 0 8 0 0 0 0 0 0 1 32 0 0 0 3 0 0 0 116 105 100 0 255 255 255 255 0 0 0 0
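The difference between the three dumps is visible in their first eight bytes. The sketch below checks for the continuation indicator followed by a little-endian 32-bit length, and shows how such a prefix could be prepended. The function names are hypothetical, padding and alignment are ignored, and this is not the fix that was applied to the crate.

```rust
/// The continuation indicator 0xFFFFFFFF as raw bytes.
pub const CONTINUATION_MARKER: [u8; 4] = [0xFF; 4];

/// If `bytes` starts with the continuation indicator, return the
/// little-endian i32 length that follows it; otherwise return None.
pub fn encapsulated_length(bytes: &[u8]) -> Option<i32> {
    if bytes.len() < 8 || !bytes.starts_with(&CONTINUATION_MARKER) {
        return None;
    }
    let mut len = [0u8; 4];
    len.copy_from_slice(&bytes[4..8]);
    Some(i32::from_le_bytes(len))
}

/// Prepend the indicator and the message length to a serialized message.
/// This sketch does not pad the message to an 8-byte boundary.
pub fn add_prefix(message: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + message.len());
    out.extend_from_slice(&CONTINUATION_MARKER);
    out.extend_from_slice(&(message.len() as i32).to_le_bytes());
    out.extend_from_slice(message);
    out
}
```

Applied to the leading bytes shown above, the C++ output yields a length of 224 and the Go output 248, while the Rust output has no prefix at all.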

arrow arrow-flight bug help wanted

opened by skejserjensen 18

Use concurrency groups instead of the cancel workflow

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The cancel.yml workflow is no longer necessary, as GitHub has integrated this feature into GitHub Actions: https://docs.github.com/en/actions/using-jobs/using-concurrency#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow

Describe the solution you'd like

Add concurrency groups to the separate workflows, so that the cancel action used in cancel.yml as an external dependency can be removed. Example: https://github.com/apache/arrow/blob/master/.github/workflows/cpp.yml#L44-L46
enhancement

opened by assignUser 0
Fix: Added support to cast string without time

Which issue does this PR close?

Closes #3492

Rationale for this change

Support cast string like 2022-01-08

What changes are included in this PR?

arrow-rs/arrow-cast/src/parse.rs

Are there any user-facing changes?

No
arrow

opened by csphile 0
Support casting strings like `'2001-01-01'` to timestamp
Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are trying to use '2001-01-01' as a timestamp (as an argument to the date_bin function in DataFusion).

However, we get the following error

Arrow error: Cast error: Error parsing '2001-01-01' as timestamp

As a workaround we can add 00:00:00 to the end and that works:

'2001-01-01 00:00:00'

Describe the solution you'd like

I would like '2001-01-01' to be parsed the same as '2001-01-01 00:00:00'

Describe alternatives you've considered

I can special case this downstream in datafusion

Additional context

I believe this can be achieved by adding the appropriate support to string_to_timestamp_nanos and adding a few tests

https://github.com/apache/arrow-rs/blob/c74665808439cb7020fb1cfb74b376a136c73259/arrow-cast/src/parse.rs#L23-L71
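To illustrate the requested behaviour, here is a standalone sketch that turns a date-only string into nanoseconds since the Unix epoch at midnight UTC. It is not the implementation of string_to_timestamp_nanos; the function names are hypothetical, the day-count formula is the well-known civil-calendar algorithm, and the validation is deliberately loose (it does not check the number of days per month).

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift the year so that it starts in March; leap days then fall
    // at the end of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Parse "YYYY-MM-DD" as midnight UTC and return nanoseconds since the
/// epoch. Returns None for anything that is not a plain date.
pub fn date_to_timestamp_nanos(s: &str) -> Option<i64> {
    let mut parts = s.trim().splitn(3, '-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    days_from_civil(year, month, day)
        .checked_mul(86_400)?
        .checked_mul(1_000_000_000)
}
```

A real parser would try this path only after the full date-and-time formats have failed, so that '2001-01-01' and '2001-01-01 00:00:00' end up with the same value.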
good first issue enhancement
opened by alamb 4
Fix negative interval prettyprint

Which issue does this PR close?

Related to https://github.com/apache/arrow-datafusion/issues/4220

Rationale for this change

Currently the nanoseconds/milliseconds part can print its own minus sign when negative. That sign can double up with the one on the seconds, and it is also misplaced (after the decimal point). This change prints a single sign before the seconds when the value is negative.
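The idea of emitting a single leading sign can be sketched as follows: take the sign once, then format the magnitude. `format_seconds` is a hypothetical helper and its output format is chosen for illustration only; it is not the crate's interval pretty-printer.

```rust
/// Format a signed nanosecond count as seconds with nine fractional
/// digits, printing at most one minus sign, before the seconds.
pub fn format_seconds(total_nanos: i64) -> String {
    let sign = if total_nanos < 0 { "-" } else { "" };
    // unsigned_abs avoids overflow for i64::MIN.
    let magnitude = total_nanos.unsigned_abs();
    let secs = magnitude / 1_000_000_000;
    let frac = magnitude % 1_000_000_000;
    format!("{}{}.{:09}", sign, secs, frac)
}
```

Splitting the sign off first means a value such as minus half a second still shows its sign even though the whole-seconds part is zero.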

What changes are included in this PR?

Are there any user-facing changes?
arrow

opened by Jefffrey 0
Support GCP Workload Identity

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently the object_store crate only supports obtaining credentials using a provided service account, it would be beneficial if it could also optionally obtain credentials from its environment. This would be consistent with the behaviour of the aws and azure implementations, and avoids requiring users to handle sensitive long-term service account credentials.

Describe the solution you'd like

If no service account is specified, it should fall back to trying to get credentials from a metadata endpoint.

This is documented here

Describe alternatives you've considered

Additional context
good first issue enhancement help wanted object-store

opened by tustvold 2
feat: Allow providing a service account key directly for GCS

Which issue does this PR close?

Closes https://github.com/apache/arrow-rs/issues/3488

Rationale for this change

Use case:

We're storing service account keys external to where the object store client is being created. We do not want to have to write the key to a file before creating the object store client. This change allows for providing the key directly.

What changes are included in this PR?

Adds an appropriate method to the GCS object store builder for supplying the service account key directly. Only one of service account path or service account key may be provided, otherwise build will return an appropriate error.
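The "only one of path or key" rule can be sketched with a small builder. `GcsBuilderSketch`, `Credentials`, and the method names are hypothetical and do not reproduce the object_store API; returning an error when neither option is set is an assumption of this sketch.

```rust
/// Hypothetical outcome of resolving the builder options.
#[derive(Debug, PartialEq)]
pub enum Credentials {
    /// Read the service account key from this file path.
    KeyFile(String),
    /// Use the service account key supplied directly.
    InlineKey(String),
}

/// Hypothetical builder holding the two mutually exclusive options.
#[derive(Debug, Default)]
pub struct GcsBuilderSketch {
    service_account_path: Option<String>,
    service_account_key: Option<String>,
}

impl GcsBuilderSketch {
    pub fn with_service_account_path(mut self, path: &str) -> Self {
        self.service_account_path = Some(path.to_string());
        self
    }

    pub fn with_service_account_key(mut self, key: &str) -> Self {
        self.service_account_key = Some(key.to_string());
        self
    }

    /// Reject ambiguous configuration instead of silently picking one.
    pub fn build(self) -> Result<Credentials, String> {
        match (self.service_account_path, self.service_account_key) {
            (Some(_), Some(_)) => {
                Err("provide either a service account path or a key, not both".to_string())
            }
            (Some(path), None) => Ok(Credentials::KeyFile(path)),
            (None, Some(key)) => Ok(Credentials::InlineKey(key)),
            // Assumption of this sketch: no credentials is also an error.
            (None, None) => Err("no service account credentials provided".to_string()),
        }
    }
}
```

Deferring the check to `build` keeps the setter methods infallible, so the order in which they are called does not matter.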

Are there any user-facing changes?

An additional method on GCS object store builder.

There are currently no breaking changes, however I believe the ServiceAccount variant for the GoogleConfigKey should be renamed to ServiceAccountPath to better represent what that option is for. I held off on making that change because I saw that the changelog was already generated for 0.5.3 which includes the new GoogleConfigKey stuff, making that a breaking change. If that's an acceptable breaking change, I'm down to go ahead and do that in this PR as well.
object-store

opened by scsmithr 2