cog3pio: Cloud-optimized GeoTIFF ... Parallel I/O πŸ¦€

Overview

Yet another attempt at creating a GeoTIFF reader, in Rust, with Python bindings.

Installation

Rust

cargo add --git https://github.com/weiji14/cog3pio.git

Python

pip install git+https://github.com/weiji14/cog3pio.git

Tip

The API for this crate/library is still unstable and subject to change, so you may want to pin to a specific git commit using either:

  • cargo add --git https://github.com/weiji14/cog3pio.git --rev <sha>
  • pip install git+https://github.com/weiji14/cog3pio.git@<sha>

where <sha> is a commit hash obtained from https://github.com/weiji14/cog3pio/commits/main.

Usage

Rust

use std::io::Cursor;

use bytes::Bytes;
use cog3pio::io::geotiff::read_geotiff;
use ndarray::Array2;
use object_store::path::Path;
use object_store::{parse_url, GetResult, ObjectStore};
use url::Url;

#[tokio::main]
async fn main() {
    let cog_url: &str =
        "https://github.com/cogeotiff/rio-tiler/raw/6.4.0/tests/fixtures/cog_nodata_nan.tif";
    let tif_url: Url = Url::parse(cog_url).unwrap();
    let (store, location): (Box<dyn ObjectStore>, Path) = parse_url(&tif_url).unwrap();

    let stream: Cursor<Bytes> = {
        let result: GetResult = store.get(&location).await.unwrap();
        let bytes: Bytes = result.bytes().await.unwrap();
        Cursor::new(bytes)
    };

    let arr: Array2<f32> = read_geotiff(stream).unwrap();
    assert_eq!(arr.dim(), (549, 549));
    assert_eq!(arr[[500, 500]], 0.13482364);
}

Python

import numpy as np
from cog3pio import read_geotiff

array: np.ndarray = read_geotiff(
    path="https://github.com/cogeotiff/rio-tiler/raw/6.4.0/tests/fixtures/cog_nodata_nan.tif"
)
assert array.shape == (549, 549)
assert array.dtype == "float32"

Note

Currently, this crate/library only supports reading single-band float32 GeoTIFF files, i.e. multi-band files and other dtypes (e.g. uint16) don't work yet. See the roadmap below for future plans.
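Until more dtypes are supported by the reader itself, any casting has to happen on the NumPy side after reading. A minimal sketch, using a stand-in array in place of actual read_geotiff output:

```python
import numpy as np

# stand-in for the single-band float32 array returned by read_geotiff
array = np.full(shape=(549, 549), fill_value=0.5, dtype="float32")

# cast on the NumPy side, e.g. rescale float32 values into the uint16 range
scaled = (array * np.iinfo(np.uint16).max).astype("uint16")
assert scaled.dtype == np.uint16
assert scaled.shape == (549, 549)
```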

Roadmap

Short term (Q1 2024):

Medium term (Q2 2024):

  • Integration with xarray as a BackendEntrypoint
  • Parallel reader (TBD on multi-threaded or asynchronous)
  • Direct-to-GPU loading

Related crates

Comments
  • :construction_worker: Setup benchmark workflow with pytest-codspeed


    Measuring the execution speed of unit tests to track performance of cog3pio Python functions over time.

    Using pytest-codspeed to do the benchmarking, with the help of https://github.com/CodSpeedHQ/action. Decorated test_read_geotiff with @pytest.mark.benchmark to see if the benchmarking works.

    Note: Running the benchmarks on Python 3.12 to enable flame graph generation, available with pytest-codspeed>=2.0.0 - see https://docs.codspeed.io/features/trace-generation.
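The marker usage described above can be sketched as follows, with a stand-in workload in place of the real test (the actual suite benchmarks test_read_geotiff against a GeoTIFF fixture):

```python
import pytest

# pytest-codspeed picks up tests carrying this marker and measures their runtime
@pytest.mark.benchmark
def test_sum_of_squares():
    # stand-in workload; the real test would call read_geotiff on a fixture file
    total = sum(i * i for i in range(1_000))
    assert total == 332_833_500
```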

    References:

    • https://codspeed.io/blog/introducing-codspeed
    • https://docs.codspeed.io/benchmarks/python#running-the-benchmarks-in-your-ci
opened by weiji14 · 2 comments
  • :sparkles: Read GeoTIFF files from remote urls via object_store


Enable GeoTIFFs stored at remote URLs (e.g. HTTP, S3, Azure, GCP, etc.) to be read into numpy arrays.

    Uses the object_store crate, which pulls data via an async API. Note that only the network read is async, the TIFF decoding (using the tiff crate) is still synchronous.

    Usage (in Python):

    import numpy as np
    from cog3pio import read_geotiff
    
    array: np.ndarray = read_geotiff(
        path="https://github.com/pka/georaster/raw/v0.1.0/data/tiff/float32.tif"
    )
    assert array.shape == (20, 20)
    assert array.dtype == "float32"
    

    TODO:

    • [x] Add failing unit test reading a GeoTIFF via HTTP
    • [x] Use object_store's parse_url function to handle remote URL reads
    • [x] Ensure that local relative filepaths (e.g. data/float32.tif) work alongside remote URLs (e.g. https://somewhere.com/float32.tif)
    • [x] Refactor to remove .unwrap() statements
    • [x] Document in main README.md on new usage

    Other things done in this PR out of necessity:

    • Move to compiling on manylinux_2_28, therefore dropping compilation on i686 (x86) targets on Linux

    References:

    • Similar PR to read FlatGeobuf via object_store at https://github.com/geoarrow/geoarrow-rs/pull/494
feature · opened by weiji14 · 1 comment
  • :sparkles: A read_geotiff function for reading GeoTIFF into ndarray


    Rust-based function for reading GeoTIFF files!

    Uses the tiff crate for the I/O, ndarray crate to store the 2D array in Rust, and numpy crate to convert to numpy.ndarray in Python.

    Note: This only works on single-band GeoTIFF files with float32 dtype for now

    Usage:

    In Rust:

    use ndarray::Array2;
    use std::fs::File;
    use cog3pio::io::geotiff::read_geotiff;
    
    let path: &str = "path/to/file.tif";
    let file: File = File::open(path).expect("Cannot find GeoTIFF file");
    let arr: Array2<f32> = read_geotiff(file).unwrap();
    assert_eq!(arr.dim(), (20, 20));
    assert_eq!(arr.mean(), Some(19.0));
    

    In Python:

    import numpy as np
    from cog3pio import read_geotiff
    
    array: np.ndarray = read_geotiff(path="georaster/data/tiff/float32.tif")
    assert array.shape == (20, 20)
    assert array.dtype == "float32"
    

    TODO:

    • [x] Initial implementation to allow reading single-band Float32 dtype GeoTIFF files
    • [x] Improve documentation
    • [x] Add unit tests in Python and Rust

    TODO in future:

    • [ ] Refactor to handle other dtypes
    • [ ] Reconsider project layout?

    References:

    • https://stackoverflow.com/questions/76145217/how-to-transform-a-tiff-file-to-ndarray-in-rust
    • https://en.anyquestion.info/q/how-to-transform-a-tiff-file-to-ndarray-in-rust
    • https://github.com/image-rs/image-tiff/pull/216/files
feature · opened by weiji14 · 0 comments
  • :construction_worker: Setup GitHub Actions Continuous Integration tests


Minimal configuration to run unit tests with pytest on GitHub Actions CI.

    Created using maturin generate-ci --pytest github, with additional modifications. Also set pyo3's abi3-py310 feature.

    Matrix builds are on:

    • Linux (ubuntu-22.04) - targets: [x86_64, x86, aarch64, armv7, s390x, ppc64le]
    • Windows (windows-2022) - targets: [x64, x86]
    • macOS (macos-14) - targets: [x86_64, aarch64]

    References:

    • https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
    • https://pyo3.rs/v0.20.3/features.html#the-abi3-pyxy-features
maintenance · opened by weiji14 · 0 comments
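The abi3-py310 setting mentioned above corresponds to a pyo3 feature flag in Cargo.toml; a minimal sketch (the version number is illustrative):

```toml
[dependencies]
# abi3-py310 builds a single abi3 wheel compatible with CPython 3.10 and newer
pyo3 = { version = "0.20", features = ["abi3-py310"] }
```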
  • :seedling: Initialize Cargo.toml and pyproject.toml with maturin


    Created using maturin init --mixed following https://github.com/PyO3/maturin/blob/v1.4.0/guide/src/project_layout.md#mixed-rustpython-project, with some custom modifications.

    TODO:

    • [x] Include pyo3 crate as a dependency, set minimum Python version to 3.10 as per SPEC 0.
    • [x] Add trove classifiers
    • [x] Add .gitignore file
    • [x] Repo description, roadmap and related crates
maintenance · opened by weiji14 · 0 comments
Owner
Wei Ji
Geospatial Data Scientist/ML Practitioner @developmentseed. Towards GPU-native and cloud-native geospatial machine learning!