Read specialized NGS formats as data frames in R, Python, and more.

Overview

oxbow

Read specialized bioinformatic file formats as data frames in R, Python, and more.

File formats create a lot of friction for computational biologists. Oxbow is a data unification layer that aims to make bioinformatic data more accessible and high-performance analytics easier.

Data I/O is handled in Rust with features exposed to Python and R via Apache Arrow.

Learn more in our recent blog post.

Contributing

Want to contribute? Join us!

Development

The oxbow project is split into separate Rust, Python, and R packages. You can download sample data by following these instructions.

You might also like...
An example repository on how to start building graph applications on streaming data. Just clone and start building 💻 💪

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Apache Arrow Powering In-Memory Analytics Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enabl

High-performance runtime for data analytics applications

Weld Documentation Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and func

A high-performance, high-reliability observability data pipeline.

Quickstart • Docs • Guides • Integrations • Chat • Download What is Vector? Vector is a high-performance, end-to-end (agent & aggregator) observabilit

Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

Quickwit is a big data search engine.

Quickwit This repository will host Quickwit, the big data search engine developed by Quickwit Inc. We will progressively polish and opensource our cod

DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

Datafuse Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture Datafuse is a Real-Time Data Processing & Analytics DBMS wit

A highly efficient daemon for streaming data from Kafka into Delta Lake

Comments
  • r-oxbow installation fails

    R version: 4.3.0; OS: Pop!_OS (Ubuntu variant); Rust version: rustc 1.69.0 (84c898d65 2023-04-16)

    I've cloned the repo to make sure I can build the local r-oxbow.

    cd r-oxbow/src/rust
    cargo build
    

    This works fine: Cargo detects the oxbow crate in the directory above and builds against it.

    remotes::install_local("r-oxbow")
    

    Generates this output:

    ── R CMD build ─────────────────────────────────────────────────────────────────
    ✔  checking for file ‘/tmp/RtmpjHZ4Kt/file34ed022cec682/r-oxbow/DESCRIPTION’ ...
    ─  preparing ‘oxbow’: (2.1s)
    ✔  checking DESCRIPTION meta-information ...
    ─  cleaning src
    ─  checking for LF line-endings in source and make files and shell scripts (425ms)
    ─  checking for empty or unneeded directories
    ─  building ‘oxbow_0.0.0.9000.tar.gz’
       
    * installing *source* package ‘oxbow’ ...
    ** using staged installation
    ** libs
    using C compiler: ‘gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0’
    rm -Rf oxbow.so ./rust/target/release/liboxbow.a entrypoint.o
    gcc -I"/rmflight_stuff/software/R-4.3.0/include" -DNDEBUG   -I/usr/local/include    -fpic  -g -O2  -c entrypoint.c -o entrypoint.o
    # In some environments, ~/.cargo/bin might not be included in PATH, so we need
    # to set it here to ensure cargo can be invoked. It is appended to PATH and
    # therefore is only used if cargo is absent from the user's PATH.
    if [ "" != "true" ]; then \
    	export CARGO_HOME=/tmp/RtmpxvOvG8/R.INSTALL34f23e7aef79/oxbow/src/.cargo; \
    fi && \
    	export PATH="/opt/TinyTeX/bin/x86_64-linux/:/opt/TinyTeX/bin/x86_64-linux/:/home/rmflight/.cargo/bin:/home/rmflight/anaconda3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/home/rmflight/.local/bin:/home/rmflight/bin:/bin/java/:/software/julia-1.0.5/bin:/home/rmflight/.cargo/bin:/opt/TinyTeX/bin/x86_64-linux/:/home/rmflight/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/rmflight/.cargo/bin" && \
    	cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
    error: failed to get `oxbow` as a dependency of package `r-oxbow v0.1.0 (/tmp/RtmpxvOvG8/R.INSTALL34f23e7aef79/oxbow/src/rust)`
    
    Caused by:
      failed to load source for dependency `oxbow`
    
    Caused by:
      Unable to update /tmp/RtmpxvOvG8/R.INSTALL34f23e7aef79/oxbow
    
    Caused by:
      failed to read `/tmp/RtmpxvOvG8/R.INSTALL34f23e7aef79/oxbow/Cargo.toml`
    
    Caused by:
      No such file or directory (os error 2)
    make: *** [Makevars:16: rust/target/release/liboxbow.a] Error 101
    ERROR: compilation failed for package ‘oxbow’
    * removing ‘/rmflight_stuff/software/R-4.3.0/library/oxbow’
    Warning message:
    In i.p(...) :
      installation of package ‘/tmp/RtmpjHZ4Kt/file34ed0a9f21f9/oxbow_0.0.0.9000.tar.gz’ had non-zero exit status
    

    Same error if I use:

    remotes::install_github("abdenlab/oxbow", subdir="r-oxbow")
    
    opened by rmflight 5
Work out how to read Parquet files in a browser using web assembly (via the Rust toolchain)

wasm-pack-template A template for kick starting a Rust and WebAssembly project using wasm-pack. Tutorial | Chat Built with 🦀🕸 by The Rust and WebAs

null 5 Oct 11, 2022
New generation decentralized data warehouse and streaming data pipeline

World's first decentralized real-time data warehouse, on your laptop Docs | Demo | Tutorials | Examples | FAQ | Chat Get Started Watch this introducto

kamu 184 Dec 22, 2022
This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

null 2 Nov 2, 2022
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Datafuse Labs 5k Jan 9, 2023
Bytewax is an open source Python framework for building highly scalable dataflows.

Bytewax Bytewax is an open source Python framework for building highly scalable dataflows. Bytewax uses PyO3 to provide Python bindings to the Timely

Bytewax 289 Jan 6, 2023
Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

flaco Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀 Have a gander at the initial benchmarks

Miles Granger 14 Oct 31, 2022
A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

fisher-rs fisher-rs is a Rust library that brings powerful data manipulation and analysis capabilities to Rust developers, inspired by the popular pan

Syed Vilayat Ali Rizvi 5 Aug 31, 2023
A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

fisher-rs fisher-rs is a Rust library that brings powerful data manipulation and analysis capabilities to Rust developers, inspired by the popular pan

null 5 Sep 6, 2023
Provides a way to use enums to describe and execute ordered data pipelines. 🦀🍾

enum_pipline Provides a way to use enums to describe and execute ordered data pipelines. 🦀🍾 I needed a succinct way to describe 2d pixel map operat

Ben Greenier 0 Oct 29, 2021
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.

null 30.7k Jan 7, 2023