Extract subsets of ONT (Nanopore) reads based on time

Overview

ONTime

Motivation

Some collaborators wanted to know how long they needed to sequence on the Nanopore device before they had "sufficient" data (what counts as sufficient is obviously application-dependent).

Their plan was to do multiple runs of different lengths. Instead, I created ontime to easily grab the reads from the first hour, the first two hours, the first three hours, etc. and run those subsets through the analysis pipeline for the intended application. This way they only needed to do one (longer) run.

Install

tl;dr: precompiled binary

curl -sSL ontime.mbh.sh | sh
# or with wget
wget -nv -O - ontime.mbh.sh | sh

You can also pass options to the script like so

$ curl -sSL ontime.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of ontime, if ontime is already
installed it will be updated to the latest version.

Options
        -V, --verbose
                Enable verbose output for the installer

        -f, -y, --force, --yes
                Skip the confirmation prompt during installation

        -p, --platform
                Override the platform identified by the installer [default: apple-darwin]

        -b, --bin-dir
                Override the bin installation directory [default: /usr/local/bin]

        -a, --arch
                Override the architecture identified by the installer [default: x86_64]

        -B, --base-url
                Override the base URL used for downloading releases [default: https://github.com/mbhall88/ssubmit/releases]

        -h, --help
                Display this help message

Conda

$ conda install -c bioconda ontime

Cargo

$ cargo install ontime

Container

Docker images are hosted at quay.io.

singularity

Prerequisite: singularity

$ URI="docker://quay.io/mbhall88/ontime"
$ singularity exec "$URI" ontime --help

The above will use the latest version. If you want to specify a version then use a tag (or commit) like so.

$ VERSION="0.1.0"
$ URI="docker://quay.io/mbhall88/ontime:${VERSION}"
$ singularity exec "$URI" ontime --help

docker

Prerequisite: docker

$ docker pull quay.io/mbhall88/ontime
$ docker run quay.io/mbhall88/ontime ontime --help

You can find all the available tags on the quay.io repository.

Build from source

$ git clone https://github.com/mbhall88/ontime.git
$ cd ontime
$ cargo build --release
$ target/release/ontime -h

Examples

I want the reads that were sequenced in the first hour

$ ontime --to 1h in.fq

I want the reads that were sequenced after the first hour

$ ontime --from 1h in.fq

I want all reads except those sequenced in the last hour

$ ontime --to -1h in.fq

I want reads sequenced between the third and fourth hours

$ ontime --from 3h --to 4h in.fq

Check what the earliest and latest start times in the fastq are

$ ontime --show in.fq
Earliest: 2022-12-12T15:17:01.0Z
Latest  : 2022-12-13T01:16:27.0Z

I like to be specific, give me the reads that were sequenced while I was eating dinner (see the note on time formats)

$ ontime --from 2022-12-12T20:45:00Z --to 2022-12-12T21:17:01.5Z in.fq

I want to save the output to a Gzip-compressed file

$ ontime --to 2h -o out.fq.gz in.fq

Usage

Usage: ontime [OPTIONS] <FILE>

Arguments:
  <FILE>  Input fastq file

Options:
  -o, --output <FILE>          Output file name [default: stdout]
  -O, --output-type <u|b|g|l>  u: uncompressed; b: Bzip2; g: Gzip; l: Lzma
  -L, --compress-level <1-9>   Compression level to use if compressing output [default: 6]
  -f, --from <DATE/DURATION>   Earliest start time; otherwise the earliest time is used
  -t, --to <DATE/DURATION>     Latest start time; otherwise the latest time is used
  -s, --show                   Show the earliest and latest start times in the input and exit
  -h, --help                   Print help information (use `--help` for more detail)
  -V, --version                Print version information

Specifying a time range

The --from and --to options are used to restrict the timeframe you want reads from. These options accept two different formats: duration and timestamp.

Duration: The most human-friendly way to provide a range is with a duration. For example, 1h means 1 hour. Passing --from 1h says "I want reads that were generated 1 hour or more after sequencing started" - i.e. the earliest start time in the file plus 1 hour. Likewise, passing --to 2h says "I only want reads that were generated in the first two hours of sequencing". Using --from and --to in combination gives you a range.

We support a range of time/duration units and they can be combined. For example, 3h45m indicates 3 hours and 45 minutes. See the duration-str docs for the full list of supported duration units.

Negative durations are also allowed. A negative duration subtracts that duration from the latest start time in the file. So --to -1h will exclude reads that were sequenced in the last hour of the run. Negative ranges are also valid - i.e. --from -2h --to -1h will give you the reads sequenced in the penultimate hour of the run.
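
The rule described above can be sketched in Python. This is an illustration only, not ontime's actual implementation (ontime is written in Rust and relies on duration-str for parsing). The function names parse_duration and resolve are hypothetical, and the simplified parser is assumed to handle only the h, m and s units.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified duration parser: supports only h, m and s units.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '1h', '3h45m' or '-1h' into a timedelta."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    parts = re.findall(r"(\d+)([hms])", body)
    # Reject input that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != body:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return -total if negative else total


def resolve(text: str, earliest: datetime, latest: datetime) -> datetime:
    """Positive durations count forward from the earliest start time;
    negative durations count back from the latest start time."""
    delta = parse_duration(text)
    return latest + delta if delta < timedelta() else earliest + delta


# Start times taken from the --show example in this README.
earliest = datetime(2022, 12, 12, 15, 17, 1, tzinfo=timezone.utc)
latest = datetime(2022, 12, 13, 1, 16, 27, tzinfo=timezone.utc)

print(resolve("1h", earliest, latest).isoformat())   # 2022-12-12T16:17:01+00:00
print(resolve("-1h", earliest, latest).isoformat())  # 2022-12-13T00:16:27+00:00
```

With these two boundaries, --from 1h --to -1h would keep reads whose start time falls between the two printed timestamps.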

Timestamp: --from and --to also accept a date and time. See the Time format section for more information.

To make using timestamps a little easier, you can first run ontime --show <in.fq> to get the earliest and latest timestamps in the file.

Time format

The times that ontime extracts are the start_time=<time> section contained in the description of each fastq read. The format of this time has changed a few times, so if you come across a file which ontime cannot parse, please raise an issue so I can make it work.
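
To show which part of a read description is meant, here is a minimal Python sketch that pulls the start_time value out of a header line. It is not how ontime itself does it; the function name extract_start_time and the example header are made up for illustration, and real headers vary between basecaller versions.

```python
import re
from typing import Optional

# Matches the start_time=<time> field in a FASTQ header line.
START_TIME = re.compile(r"start_time=(\S+)")


def extract_start_time(header: str) -> Optional[str]:
    """Return the raw start_time value from a FASTQ header, or None if absent."""
    match = START_TIME.search(header)
    return match.group(1) if match else None


# A made-up header for illustration only.
header = "@read1 runid=abc123 ch=42 start_time=2022-12-12T15:17:01Z"
print(extract_start_time(header))  # 2022-12-12T15:17:01Z
```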

All times printed by ontime and accepted by the --from/--to options are UTC time. More recent versions of Guppy also have UTC offsets in their start_time; for simplicity's sake, these offsets are ignored by ontime. So, if you want to provide a timestamp to --from/--to based on a timeframe in your local time, please first convert it to UTC time.

In general, ontime accepts any timestamp that is RFC 3339-compliant.

The basic (recommended) format is <YEAR>-<MONTH>-<DAY>T<HOUR>:<MINUTE>:<SECONDS>Z - e.g. 2022-12-12T18:39:09Z. Feel free to get precise with subseconds though if you like...
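
If your timeframe is in local time, one way to get a UTC timestamp in the basic format is a small Python snippet like the one below. The UTC+11 offset and the example time are assumptions chosen for illustration; substitute your own.

```python
from datetime import datetime, timedelta, timezone

# Assumed example: a local wall-clock time in a UTC+11 timezone.
local_tz = timezone(timedelta(hours=11))
local_time = datetime(2022, 12, 13, 7, 45, 0, tzinfo=local_tz)

# Convert to UTC and format with a trailing Z, as in the basic format.
utc_time = local_time.astimezone(timezone.utc)
timestamp = utc_time.strftime("%Y-%m-%dT%H:%M:%SZ")
print(timestamp)  # 2022-12-12T20:45:00Z
```

The printed value can then be passed directly to --from or --to.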

Full usage

Extract subsets of ONT (Nanopore) reads based on time

Usage: ontime [OPTIONS] <FILE>

Arguments:
  <FILE>
          Input fastq file

Options:
  -o, --output <FILE>
          Output file name [default: stdout]

  -O, --output-type <u|b|g|l>
          u: uncompressed; b: Bzip2; g: Gzip; l: Lzma

          ontime will attempt to infer the output compression format automatically from the output extension. If writing to stdout, the default is uncompressed (u)

  -L, --compress-level <1-9>
          Compression level to use if compressing output

          [default: 6]

  -f, --from <DATE/DURATION>
          Earliest start time; otherwise the earliest time is used

          This can be a timestamp - e.g. 2022-11-20T18:00:00 - or a duration from the start - e.g. 2h30m (2 hours and 30 minutes from the start). See the docs for more examples

  -t, --to <DATE/DURATION>
          Latest start time; otherwise the latest time is used

          See --from (and docs) for examples

  -s, --show
          Show the earliest and latest start times in the input and exit

  -h, --help
          Print help information (use `-h` for a summary)

  -V, --version
          Print version information
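
As a rough illustration of the extension-based inference mentioned under --output-type, the Python sketch below maps an output path to one of the u, b, g or l codes. The extension table is an assumption made for this example, not a statement of which extensions ontime actually recognises, and infer_output_type is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-type mapping, for illustration only.
EXTENSION_TO_TYPE = {".gz": "g", ".bz2": "b", ".lzma": "l"}


def infer_output_type(output: Optional[str]) -> str:
    """Return u, b, g or l for an output path; None stands for stdout."""
    if output is None:
        return "u"  # stdout defaults to uncompressed
    return EXTENSION_TO_TYPE.get(Path(output).suffix.lower(), "u")


print(infer_output_type("out.fq.gz"))  # g
print(infer_output_type(None))         # u
```

Passing -O explicitly overrides whatever the extension would suggest.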

Cite

ontime is archived at Zenodo.

@software{ontime,
  author       = {Michael Hall},
  title        = {mbhall88/ontime: 0.1.3},
  month        = jan,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {0.1.3},
  doi          = {10.5281/zenodo.7533053},
  url          = {https://doi.org/10.5281/zenodo.7533053}
}
Owner
Michael Hall
Postdoc with @lachlancoin Bioinformatics | TB | Nanopore | Microbial Genomics | Software Dev.