An experimental Athena extension for DuckDB ๐Ÿค

Overview

DuckDB Athena Extension

WARNING This is a work in progress - things may or may not work as expected ๐Ÿง™โ€โ™‚๏ธ

Limitations

  • Only the default database is supported
  • Not all data types are implemented yet
  • 10,000 results are returned by default (use maxrows=-1 to return everything)
  • Pushdown predicates are not supported

Getting started

The Athena extension is supported in DuckDB v0.7.0 and up. To install the extension, start duckdb with the unsigned parameter.

> duckdb -unsigned
v0.7.1 b00b93f0b1
D 

The first time you use the extension, you need to install it from a custom repository.

SET custom_extension_repository='d2j9pg7mqm9we6.cloudfront.net/athena/latest';
INSTALL athena;

Then LOAD the extension. You only need to run the INSTALL command once.

LOAD athena;

You can now extract data from tables in your default data catalog.

select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix");

Warning To prevent runaway queries, the extension only returns 10,000 rows by default. If you'd like to return everything, you can add maxrows=-1 as a parameter inside the function.

select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix", maxrows=-1);

Filter pushdown is not yet supported so the extension will scan the entire table.

Note The extension uses your environment variables to figure out region and credentials. Make sure to have your access key/secret set.

Development

  • Clone the repo with submodules
git clone https://github.com/dacort/duckdb-athena-extension.git --recurse-submodules
  • Build
cd duckdb-athena-extension
make release
  • Start up duckdb with the -unsigned parameter and your desired AWS_REGION
AWS_REGION=us-east-1 build/debug/duckdb -unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D 
  • Load the extension
load 'build/debug/extension/duckdb-athena-extension/athena.duckdb_extension';
  • Query a single table, also providing where S3 results are written to
select * from athena_scan('table_name', 's3://<bucket>/athena-results/);

Warning: 10,000 results will be returned by default! Use maxrows=-1 to return the entire table.

D select * from athena_scan("amazon_reviews_parquet");
Running Athena query, execution id: 152a20c7-ff32-4a19-bb71-ae0135373ca6
State: Queued, sleep 5 secs ...
Total execution time: 1307 millis
100% โ–•โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ– 
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ marketplace โ”‚ customer_id โ”‚   review_id    โ”‚ product_id โ”‚ product_parent โ”‚ โ€ฆ โ”‚  vine   โ”‚ verified_purchase โ”‚   review_headline    โ”‚     review_body      โ”‚   review_date   โ”‚ year  โ”‚
โ”‚   varchar   โ”‚   varchar   โ”‚    varchar     โ”‚  varchar   โ”‚    varchar     โ”‚   โ”‚ varchar โ”‚      varchar      โ”‚       varchar        โ”‚       varchar        โ”‚      int64      โ”‚ int32 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ US          โ”‚ 37441986    โ”‚ R2H287L0BUP89U โ”‚ B00CT780C2 โ”‚ 473048287      โ”‚ โ€ฆ โ”‚ N       โ”‚ Y                 โ”‚ Perfect Gift         โ”‚ I love giving my sโ€ฆ  โ”‚ 140454171422720 โ”‚     0 โ”‚
โ”‚ US          โ”‚ 20676035    โ”‚ R1222MJHP5QWXE โ”‚ B004LLILFA โ”‚ 361255549      โ”‚ โ€ฆ โ”‚ N       โ”‚ Y                 โ”‚ Five Stars           โ”‚ Great gift for outโ€ฆ  โ”‚           16170 โ”‚  2014 โ”‚
โ”‚ US          โ”‚ 45090731    โ”‚ R32ECJRNTB61K8 โ”‚ B004LLIL4G โ”‚ 307223063      โ”‚ โ€ฆ โ”‚ N       โ”‚ Y                 โ”‚ happy birthday card  โ”‚ gift cards from Amโ€ฆ  โ”‚ 140454171423232 โ”‚     0 โ”‚
โ”‚ US          โ”‚ 2207141     โ”‚ RLTEU3JZ1IJAA  โ”‚ B004LLILDM โ”‚ 87389551       โ”‚ โ€ฆ โ”‚ N       โ”‚ Y                 โ”‚ Five Stars           โ”‚ gracias.             โ”‚           16391 โ”‚  2014 โ”‚
โ”‚ US          โ”‚ 15258       โ”‚ R1ZAX1TN66QOU6 โ”‚ B004LLIKVU โ”‚ 473048287      โ”‚ โ€ฆ โ”‚ N       โ”‚ Y                 โ”‚ easy breezy          โ”‚ gift card was sentโ€ฆ  โ”‚ 140454171424000 โ”‚     0 โ”‚
โ”‚ ยท           โ”‚    ยท        โ”‚       ยท        โ”‚     ยท      โ”‚    ยท           โ”‚ ยท โ”‚ ยท       โ”‚ ยท                 โ”‚     ยท                โ”‚    ยท                 โ”‚             ยท   โ”‚    ยท  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 999 rows (40 shown)                                                                                                                                          15 columns (11 shown) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Credits

Comments
  • Add pagination

    Add pagination

    The initial implementation does not include pagination. ๐Ÿ˜ข

    lance has a good example of how this could be implemented.

    let batch = match crate::RUNTIME.block_on(async { (*(*init_data).stream).next().await }) {
    

    Essentially we can create a stream using GetQueryResultsPaginator and have .next() called until no results are left.

    opened by dacort 0
  • Add column selection

    Add column selection

    By default, we do a SELECT * to query Athena.

    DuckDB can send the projected columns to the extension, which is what the postgres scanner does. We should do similar.

    opened by dacort 1
  • [design] Decide on other functions to implement

    [design] Decide on other functions to implement

    athena_scan is the most basic thing to implement, but scans an entire table. Unfortunately, the way Athena works, it will be difficult to optimize that for large tables. And in most cases, I'm assuming folks are going to want a small slice of a table so at the very least, we'll need a pushdown function. It could be interesting to utilize UNLOAD, though, and then let DuckDB load the parquet files from S3.

    • athena_scan - just returns all the data from a single table
    • athena_scan_pushdown - similar to the postgres scanner, returns all the data filtered by certain predicates/partitions
    • athena_unload - Utilizes an UNLOAD query in Athena to write results to parquet in S3, then duckdb can just load the parquet files.
    • athena_query - Runs an athena query and returns the results
    opened by dacort 1
  • Add support for other databases

    Add support for other databases

    Currently only the default database is supported. We should add support for selecting data from different databases. Maybe simply as part of the first athena_scan parameter? So instead of athena_scan("table_name") it could be athena_scan("database.table_name").

    opened by dacort 0
  • Add pushdown support

    Add pushdown support

    Currently this extension scans an entire Athena table, which is extremely inefficient. ๐Ÿ˜

    DuckDB can send the table filters to the extension, which is what the postgres scanner does. We should do similar.

    opened by dacort 1
Owner
Damon P. Cortesi
Damon P. Cortesi
Annotation to easily define ad-hoc / one-shot extension traits

Annotation to easily define ad-hoc / one-shot extension traits

Daniel Henry-Mantilla 2 Apr 19, 2022
Tons of extension utility functions for Rust

LazyExt Tons of extension utility functions for Rust. English | ็ฎ€ไฝ“ไธญๆ–‡ Status Name Status Crate Documents Introduction lazyext-slice Alpha Thousands of

Al Liu 2 Dec 5, 2022
A lambda extension to hot reload parameters from SSM Parameter Store, Secrets Manager, DynamoDB, AppConfig

A lambda extension to hot reload parameters from SSM Parameter Store, Secrets Manager, DynamoDB, AppConfig

Jake Scott 7 Jun 12, 2022
Searchbuddy is a browser extension that lets you chat with people that are searching for what you're searching for.

searchbuddy Make friends while searching! Searchbuddy is a browser extension that lets you chat with people that are searching for what you're searchi

Joseph Gerber 14 May 23, 2022
Rust Lambda Extension for any Runtime to preload SSM Parameters as ๐Ÿ” Secure Environment Variables!

?? Crypteia Rust Lambda Extension for any Runtime to preload SSM Parameters as Secure Environment Variables! Super fast and only performaned once duri

Custom Ink 34 Jan 7, 2023
Extension registry for Lapce Registry

Lapce Registry This is the software running the lapce plugin registry, this manages and hosts plugins that the community uploads. Run the registry loc

Lapce 15 Dec 4, 2022
The lambda-chaos-extension allows you to inject faults into Lambda functions without modifying the function code.

Chaos Extension - Seamless, Universal & Lightning-Fast The lambda-chaos-extension allows you to inject faults into Lambda functions without modifying

AWS CLI Tools 5 Aug 2, 2023
Cargo extension to recycle outdated build artifacts

cargo gc Cargo extension to recycle outdated build artifacts. And try the best to avoid recompilation. Usage Install it with cargo: cargo install carg

Ruihang Xia 23 Aug 30, 2023
Experimental playground for wiktionary data

wikt Experimental playground for wiktionary data. This document might not update as often as the code does. Set up You'll want a minimum of 10 GB free

Fรฉlix Saparelli 8 Jul 9, 2022
An experimental programming language for exploring first class iterators.

An experimental programming language for exploring first class iterators.

Miccah 4 Nov 23, 2021
Experimental syntax for Rust

Osy.rs Experimental syntax for Rust Hey everyone, this readme needs work! The spec has been roughed out in Osy.rs_spec.alpha, but the file could be be

null 3 Dec 17, 2021
An experimental Rust crate for sigstore

Continuous integration Docs License This is an experimental crate to interact with sigstore. This is under high development, many features and checks

sigstore 89 Dec 29, 2022
Experimental Valve Index camera passthrough for Linux

Index camera passthrough Warning: This is still a work in progress, you could get motion sickness if you try it now The problem that the Index camera

yshui 22 Dec 1, 2022
Experimental Rust tool for generating FFI definitions allowing many other languages to call Rust code

Diplomat is an experimental Rust tool for generating FFI definitions allowing many other languages to call Rust code. With Diplomat, you can simply define Rust APIs to be exposed over FFI and get high-level C, C++, and JavaScript bindings automatically!

null 255 Dec 30, 2022
An experimental implementation of Arc against Apache Datafusion

box This is an experimental repository to perform a proof of concept replacement of the Apache Spark executor for Arc with Apache DataFusion. This is

tripl.ai 1 Nov 26, 2021
An experimental transpiler to bring tailwind macros to SWC ๐Ÿš€

stailwc (speedy tailwind compiler) This is an experimental SWC transpiler to bring compile time tailwind macros to SWC (and nextjs) a-la twin macro. T

Alexander Lyon 139 Dec 20, 2022
An experimental RISC-V recompiler

WARNING: All of this code is highly experimental and is a direct result of a two day hacking binge fueled by a truckload of tea. It's definitely not s

Koute 13 Apr 2, 2023
duckdb-rs is an ergonomic wrapper for using duckdb from Rust.

duckdb-rs duckdb-rs is an ergonomic wrapper for using duckdb from Rust. It attempts to expose an interface similar to rusqlite. Acctually the initial

Wang Fenjin 126 Dec 30, 2022
Managing schema for AWS Athena in GitOps-style

athena-rs Managing AWS Athena Schemas Installation $ cargo install --git https://github.com/duyet/athena-rs $ athena --help athena 0.1.0 Duyet <me@du

Duyet Le 3 Sep 25, 2022