Bioinformatics tool for counting guides in CRISPR-screen studies.

Fulcrum Genomics

Last update: Oct 17, 2022

guide-counter

A better, faster way to count guides in CRISPR screens.

Overview

guide-counter is a tool for processing FASTQ files from CRISPR screen experiments to generate a matrix of per-sample guide counts. It can be used as a faster, more accurate, drop-in replacement for mageck count. By default, guide-counter looks for guide sequences in the reads with 0 or 1 mismatches relative to the expected guides, but it can also be run in exact-matching mode.
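To make 0-or-1-mismatch matching concrete, here is a minimal Python sketch of one common way to implement it. Every single-substitution variant of each guide is precomputed into a dictionary, so each read window needs only one lookup. For a 20 bp guide that is 1 + 20 × 3 = 61 keys per guide.

This is an illustration under our own assumptions, not guide-counter's Rust implementation. The names `build_lookup` and `match_window` are hypothetical. The sketch also drops variants that could come from two different guides, because the text above does not say how such collisions are handled.

```python
from typing import Dict, Optional

BASES = "ACGT"


def build_lookup(guides: Dict[str, str], exact_match: bool = False) -> Dict[str, str]:
    """Hypothetical sketch: map each guide sequence (plus, unless exact_match is set,
    every 1-mismatch variant) to its guide ID. Not guide-counter's actual code."""
    lookup: Dict[str, Optional[str]] = {seq: guide_id for guide_id, seq in guides.items()}
    exact = set(lookup)
    if not exact_match:
        for guide_id, seq in guides.items():
            for i, ref_base in enumerate(seq):
                for base in BASES:
                    variant = seq[:i] + base + seq[i + 1:]
                    if base == ref_base or variant in exact:
                        continue  # skip the guide itself; never override an exact guide sequence
                    if variant in lookup and lookup[variant] != guide_id:
                        lookup[variant] = None  # reachable from two guides: ambiguous (our assumption)
                    else:
                        lookup[variant] = guide_id
    # Drop the ambiguous variants.
    return {seq: gid for seq, gid in lookup.items() if gid is not None}


def match_window(lookup: Dict[str, str], read: str, offset: int, guide_len: int) -> Optional[str]:
    """Return the guide ID matching read[offset:offset + guide_len], or None."""
    window = read[offset:offset + guide_len]
    if len(window) < guide_len:
        return None  # the read is too short to hold a full guide at this offset
    return lookup.get(window)


guides = {"sg_1": "CATCTTCTTTCACCTGAACG", "sg_2": "CTCCGGGGAGAACTCCGGCG"}
lookup = build_lookup(guides)
print(len(lookup))  # 122 (2 guides x 61 keys)
print(match_window(lookup, "TTCATCTTCTTTCACCTGAACG", 2, 20))  # sg_1 (exact match)
print(match_window(lookup, "TTCATCTTCTTTCACCTGAACT", 2, 20))  # sg_1 (one mismatch in the last base)
```

The sketch trades memory for speed: the table is built once, and every read window afterwards costs a single hash lookup. Passing `exact_match=True` keeps only the exact sequences, analogous to `--exact-match`.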

Why `guide-counter`?

If you have any experience analyzing CRISPR screens, you've almost certainly tried mageck. It's widely used, highly cited and generally works well. Surprisingly, though, mageck count is rather slow and fails to count a non-trivial fraction of the data.

As an example, we ran data from the Sanson et al. 2018 paper through both tools. The dataset consists of:

Sample	Reads	Gzipped FASTQ Size
Plasmid	9,821,128	377M
RepA	76,471,324	2.3G
RepB	85,301,059	2.5G
RepC	75,356,900	2.2G

The following plot shows the amount of data recovered per sample by each of three different analyses:

And the following plot shows the runtime for each of the three analyses, performed using a single CPU core/thread on an Intel Core i9-powered MacBook Pro laptop:

Installation

Installation can be done using conda:

conda install -c bioconda guide-counter

or, if cargo is installed, with:

cargo install guide-counter

Example Workflow

The following shows an example of running guide-counter followed by mageck test on data from the Sanson et al. 2018 paper:

guide-counter count \
  --input plasmid.fq.gz RepA.fq.gz RepB.fq.gz RepC.fq.gz \
  --control-pattern control \
  --essential-genes metadata/training_essentials.txt \
  --nonessential-genes metadata/training_nonessential.txt \
  --library metadata/broadgpp-brunello-library-corrected.txt.gz  \
  --output sanson
  
mageck test \
  --count-table sanson.counts.txt \
  --control-id plasmid \
  --treatment-id RepA,RepB,RepC \
  --norm-method median \
  --output-prefix sanson.test
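The usage text below says `--library` expects a header line followed by rows whose first three tab- or comma-separated fields are the guide ID, the guide sequence and the target gene. Given that, it can be worth sanity-checking a library file before a long run.

Below is a minimal, hypothetical Python sketch of such a check. The names `parse_library` and `load_library` are ours. Choosing the delimiter by looking for a tab in the header is also our assumption, not necessarily how guide-counter decides. Because the first line is always treated as the header, a file without one would lose its first guide in this sketch.

```python
import csv
import gzip
from typing import Iterable, List, Tuple


def parse_library(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Hypothetical sketch: parse (guide_id, sequence, gene) rows from library text.

    The first line is treated as the header and skipped; only the first three fields are kept.
    """
    it = iter(lines)
    header = next(it, "")
    delimiter = "\t" if "\t" in header else ","  # assumption: sniff the delimiter from the header
    rows = []
    for fields in csv.reader(it, delimiter=delimiter):
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 fields, got {len(fields)}: {fields}")
        rows.append((fields[0], fields[1], fields[2]))
    return rows


def load_library(path: str) -> List[Tuple[str, str, str]]:
    """Open a plain or gzipped library file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return parse_library(handle)


example = [
    "guide_id\tsequence\tgene\n",
    "sg_1\tCATCTTCTTTCACCTGAACG\tA1BG\n",
    "sg_5\tACTGCATCTGTGCAAACGGG\tA2M\n",
]
rows = parse_library(example)
print(len(rows), sorted({len(seq) for _, seq, _ in rows}))  # 2 [20]
```

For example, calling `load_library("metadata/broadgpp-brunello-library-corrected.txt.gz")` and printing the set of guide lengths is a quick way to confirm that every guide has the length you expect.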

Inputs

The full usage for guide-counter count is reproduced below; this section describes a few of the key inputs in more detail:

Input Option	Required	Description
`--input`	Yes	FASTQ files, one per sample. Files may be gzipped or uncompressed.
`--samples`	No	Names for the samples, matched positionally to the FASTQs. If not provided, the input file names minus any fastq/fq/gz suffixes are used.
`--essential-genes`	No	An optional file of known essential genes. May be gzipped or uncompressed. May be either just gene names, one per line, or tab-delimited with the gene in the first column. If given, guides will be labeled as essential for matching genes, and mean coverage of guides for essential genes computed.
`--nonessential-genes`	No	An optional file of known nonessential genes. May be gzipped or uncompressed. May be either just gene names, one per line, or tab-delimited with the gene in the first column. If given, guides will be labeled as nonessential for matching genes, and mean coverage of guides for nonessential genes computed.
`--control-guides`	No	An optional file of guide IDs for control guides. May be gzipped or uncompressed. May be either just guide IDs, one per line, or tab-delimited data with the guide ID in the first column. If given, matching guides will be labeled as controls, and mean coverage of control guides computed. May be used alone or in conjunction with `--control-pattern`.
`--control-pattern`	No	An optional regular expression that is matched, case-insensitively, against both guide IDs and gene names; guides that match are labeled as controls. For example, `--control-pattern control` works well for many human libraries.
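The labeling driven by these options can be sketched in Python as follows. This is a hypothetical illustration, and the function name `label_guide` is ours.

The order of precedence when a guide qualifies for more than one label is our assumption, not documented behavior: controls win, then essential, then nonessential. So is treating the pattern as a search anywhere in the string. The sketch follows the usage text below in comparing gene names and control guide IDs case-sensitively, while matching the pattern case-insensitively.

```python
import re
from typing import Optional, Set


def label_guide(
    guide_id: str,
    gene: str,
    essential: Set[str],
    nonessential: Set[str],
    control_ids: Set[str],
    control_pattern: Optional[str] = None,
) -> str:
    """Hypothetical sketch: return Essential, Nonessential, Control or Other for one guide.

    The precedence (Control > Essential > Nonessential > Other) is an assumption.
    """
    # Control guide IDs from --control-guides are compared exactly (case sensitive).
    if guide_id in control_ids:
        return "Control"
    # The control pattern is searched case-insensitively in both the guide ID and the gene name.
    # (re caches compiled patterns, so compiling here on every call is cheap.)
    if control_pattern is not None:
        regex = re.compile(control_pattern, re.IGNORECASE)
        if regex.search(guide_id) or regex.search(gene):
            return "Control"
    # Gene lists are compared exactly (case sensitive).
    if gene in essential:
        return "Essential"
    if gene in nonessential:
        return "Nonessential"
    return "Other"


# Hypothetical guide IDs, gene names and gene lists, for illustration only.
print(label_guide("sg_9", "CONTROL", set(), set(), set(), "control"))  # Control
print(label_guide("sg_5", "A2M", {"A2M"}, set(), set(), "control"))    # Essential
print(label_guide("sg_1", "A1BG", set(), set(), set(), "control"))     # Other
```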

Outputs

The following output files are generated:

{output}.counts.txt - a standard count matrix with columns for the guide ID and gene, then one column per sample with raw/unnormalized guide counts.
{output}.extended-counts.txt - an extended version of the count matrix that adds a guide_type column containing one of [Essential, Nonessential, Control, Other] for each guide, as determined from the gene lists and control information provided.
{output}.stats.txt - a file of computed statistics, one row per input sample/FASTQ.

The columns in the stats file are:

Column	Description
file	The path to the input FASTQ file used to generate the stats.
label	The label or sample name given to the sample.
total_guides	The total number of guides in the guide library (not sample dependent).
total_reads	The total number of reads in the input FASTQ file.
mapped_reads	The number of reads that could be mapped to a guide.
frac_mapped	The fraction of reads (0-1) that could be mapped to a guide.
mean_reads_per_guide	The mean number of reads mapped to each guide in the library.
mean_reads_essential	The mean number of reads mapped to guides for essential genes.
mean_reads_nonessential	The mean number of reads mapped to guides for nonessential genes.
mean_reads_control	The mean number of reads mapped to control guides.
mean_reads_other	The mean number of reads mapped to other guides (guides not flagged as essential, nonessential or control).
zero_read_guides
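To show roughly how several of these columns relate to one another, the Python sketch below derives them for one sample from a guide-to-count mapping and the guide types. The function name `sample_stats` is hypothetical. It also assumes something the table does not state: that each mean is taken over all guides of that type, including guides with no reads.

```python
from typing import Dict


def sample_stats(counts: Dict[str, int], guide_types: Dict[str, str], total_reads: int) -> Dict[str, float]:
    """Hypothetical sketch: compute a subset of the stats-file columns for one sample.

    counts maps guide ID -> reads mapped to that guide; guide_types maps every library
    guide ID -> one of Essential, Nonessential, Control or Other.
    """
    total_guides = len(guide_types)
    mapped_reads = sum(counts.get(g, 0) for g in guide_types)
    stats: Dict[str, float] = {
        "total_guides": total_guides,
        "total_reads": total_reads,
        "mapped_reads": mapped_reads,
        "frac_mapped": mapped_reads / total_reads if total_reads else 0.0,
        # Assumption: guides with zero reads are included in the denominator.
        "mean_reads_per_guide": mapped_reads / total_guides if total_guides else 0.0,
    }
    # Mean reads per guide within each type; a type with no guides is reported as 0.0 here.
    for guide_type in ("Essential", "Nonessential", "Control", "Other"):
        ids = [g for g, t in guide_types.items() if t == guide_type]
        total = sum(counts.get(g, 0) for g in ids)
        stats[f"mean_reads_{guide_type.lower()}"] = total / len(ids) if ids else 0.0
    return stats


types = {"sg_1": "Other", "sg_2": "Other", "sg_3": "Essential", "sg_4": "Control"}
counts = {"sg_1": 10, "sg_2": 30, "sg_3": 2}
s = sample_stats(counts, types, total_reads=50)
print(s["frac_mapped"], s["mean_reads_per_guide"], s["mean_reads_other"])  # 0.84 10.5 20.0
```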

Usage

Usage for guide-counter count:

guide-counter-count

Counts the guides observed in a CRISPR screen, starting from one or more FASTQs.  FASTQs are one per
sample and currently only single-end FASTQ inputs are supported.

A set of sample IDs may be provided using `--samples id1 id2 ..`.  If provided it must have the same
number of values as input FASTQs.  If not provided the FASTQ names are used minus any fastq/fq/gz
suffixes.

Automatically determines the range of valid offsets within the sequencing reads where the guide
sequences are located, independently for each FASTQ input.  The first `offset-sample-size` reads
from each FASTQ are examined to determine the offsets at which guides are found. When processing the
full FASTQ, checks only those offsets that accounted for at least `offset-min-fraction` of the first
`offset-sample-size` reads.

Matching by default allows for one mismatch (and no indels) between the read sub-sequence and the
expected guide sequences.  Exact matching may be enabled by specifying the `--exact-match` option.

Two output files are generated.  The first is named `{output}.counts.txt` and contains columns for
the guide id, the gene targeted by the guide and one count column per input FASTQ with raw/un-
normalized counts.  The second is named `{output}.stats.txt` and contains basic QC statistics per
input FASTQ on the matching process.

USAGE:
    guide-counter count [OPTIONS] --input <INPUT>... --library <LIBRARY> --output <OUTPUT>

OPTIONS:
    -c, --control-guides <CONTROL_GUIDES>
            Optional path to file with list control guide IDs.  IDs should appear one per line and
            are case sensitive

    -C, --control-pattern <CONTROL_PATTERN>
            Optional regular expression pattern used to ID control guides. Pattern is matched, case
            insensitive, to guide IDs and Gene names

    -e, --essential-genes <ESSENTIAL_GENES>
            Optional path to file with list of essential genes.  Gene names should appear one per
            line and are case sensitive

    -f, --offset-min-fraction <OFFSET_MIN_FRACTION>
            After sampling the first `offset_sample_size` reads, use offsets that

            [default: 0.005]

    -h, --help
            Print help information

    -i, --input <INPUT>...
            Input fastq file(s)

    -l, --library <LIBRARY>
            Path to the guide library metadata.  May be a tab- or comma-separated file.  Must have a
            header line, and the first three fields must be (in order): i) the ID of the guide, ii)
            the base sequence of the guide, iii) the gene the guide targets

    -n, --nonessential-genes <NONESSENTIAL_GENES>
            Optional path to file with list of nonessential genes.  Gene names should appear one per
            line and are case sensitive

    -N, --offset-sample-size <OFFSET_SAMPLE_SIZE>
            The number of reads to be examined when determining the offsets at which guides may be
            found in the input reads

            [default: 100000]

    -o, --output <OUTPUT>
            Path prefix to use for all output files

    -s, --samples <SAMPLES>...
            Sample names corresponding to the input fastqs. If provided must be the same length as
            input.  Otherwise will be inferred from input file names

    -x, --exact-match
            Perform exact matching only, don't allow mismatches between reads and guides
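The offset-detection step described in the usage text can be sketched roughly as below. The defaults 100000 and 0.005 are the documented `--offset-sample-size` and `--offset-min-fraction` values.

This is a simplified, hypothetical Python illustration, not guide-counter's implementation. The names `find_offsets` and `count_guides` are ours. The sketch makes several further assumptions:

- every guide has the same length;
- the lookup is a plain sequence-to-ID dictionary;
- each read is credited to the first offset where a guide matches.

In line with the 0.1.3 release notes, reads shorter than a guide are skipped rather than causing an error.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, List


def find_offsets(
    reads: Iterable[str],
    lookup: Dict[str, str],
    guide_len: int,
    sample_size: int = 100_000,
    min_fraction: float = 0.005,
) -> List[int]:
    """Hypothetical sketch: offsets where guides were found in >= min_fraction of sampled reads."""
    hits: Counter = Counter()
    n_sampled = 0
    for read in islice(reads, sample_size):
        n_sampled += 1
        # A read shorter than guide_len gives an empty range, so it is ignored.
        for offset in range(len(read) - guide_len + 1):
            if read[offset:offset + guide_len] in lookup:
                hits[offset] += 1
                break  # credit each read to at most one offset (assumption)
    if n_sampled == 0:
        return []
    return sorted(off for off, n in hits.items() if n / n_sampled >= min_fraction)


def count_guides(reads: Iterable[str], lookup: Dict[str, str], guide_len: int, offsets: List[int]) -> Counter:
    """Count guides over all reads, checking only the selected offsets in each read."""
    counts: Counter = Counter()
    for read in reads:
        for offset in offsets:
            window = read[offset:offset + guide_len]
            if len(window) == guide_len and window in lookup:
                counts[lookup[window]] += 1
                break
    return counts


lookup = {"CATCTTCTTTCACCTGAACG": "sg_1", "CTCCGGGGAGAACTCCGGCG": "sg_2"}
reads = [
    "TTCATCTTCTTTCACCTGAACGAA",  # sg_1 at offset 2
    "TTCTCCGGGGAGAACTCCGGCGAA",  # sg_2 at offset 2
    "GCATCTTCTTTCACCTGAACGA",    # sg_1 at offset 1
    "ACGTACGTACGTAC",            # 14 bp, shorter than a guide: ignored
]
offsets = find_offsets(reads, lookup, guide_len=20, min_fraction=0.3)
print(offsets)  # [2]
print(count_guides(reads, lookup, 20, offsets))
```

Restricting the full pass to the few offsets seen in the sample avoids sliding every guide-length window across every read. The cost is that reads with a guide at a rarely used offset are not counted, as happens here to the third read.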

Comments

range end index 20 out of range for slice of length 14

Hi!

I'm having an issue while trying to run guide-counter count for the first time.

Here's the command I'm trying to run:

guide-counter count --input \
sample1.fastq \
sample2.fastq \
sample3.fastq \
sample4.fastq \
sample5.fastq \
sample6.fastq \
sample7.fastq \
sample8.fastq \
sample9.fastq \
sample10.fastq \
sample11.fastq \
--control-pattern control \
--library /lib/library.txt \
--output guide_counter

And here's the answer:

[2022-03-18T10:42:51Z INFO  guide_counter::guide] Loaded library with 77440 guides for 19115 genes; 0=essential, 0=nonessential, 1000=control, 76440=other.
[2022-03-18T10:42:51Z INFO  guide_counter::commands::count] Building lookup.
[2022-03-18T10:42:53Z INFO  guide_counter::commands::count] Lookup built with 4719224 entries.
thread 'main' panicked at 'range end index 20 out of range for slice of length 14', src/commands/count.rs:258:38
stack backtrace:
   0:     0x55b229a3b1fd - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x55b229a3b1fd - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55b229a3b1fd - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55b229a3b1fd - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x55b2299bf41c - core::fmt::write::h513b07ca38f4fb1b
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
   5:     0x55b229a39cf4 - std::io::Write::write_fmt::haf8c932b52111354
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
   6:     0x55b229a3a3d0 - std::sys_common::backtrace::_print::h195c38364780a303
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x55b229a3a3d0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x55b229a3a3d0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
   9:     0x55b229a397ca - std::panicking::default_hook::h60284635b0ad54a8
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
  10:     0x55b229a397ca - std::panicking::rust_panic_with_hook::ha677a669fb275654
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
  11:     0x55b229a5e298 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
  12:     0x55b229a5e216 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
  13:     0x55b229a5e1d2 - rust_begin_unwind
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
  14:     0x55b2299315f0 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
  15:     0x55b2299316f1 - core::slice::index::slice_end_index_len_fail::h735e748f7023a8c4
                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/slice/index.rs:41:5
  16:     0x55b229955428 - guide_counter::commands::count::Count::determine_prefixes::{{closure}}::h3e1e1356877818da
  17:     0x55b22994ff13 - <guide_counter::commands::count::Count as guide_counter::commands::command::Command>::execute::h32ae3439f4cdfe2c
  18:     0x55b22995ea0c - guide_counter::main::h56f2ffd0731cd338
  19:     0x55b229937f73 - std::sys_common::backtrace::__rust_begin_short_backtrace::h231446ae02769ddb
  20:     0x55b229963e07 - main
  21:     0x7f1f3df6ed0a - __libc_start_main
  22:     0x55b2299360e9 - <unknown>

The library is Brunello human and it's formatted like:

sg_1	CATCTTCTTTCACCTGAACG	A1BG
sg_2	CTCCGGGGAGAACTCCGGCG	A1BG
sg_3	TCTCCATGGTGCATCAGCAC	A1BG
sg_4	TGGAAGTCCACTCCACTCAG	A1BG
sg_5	ACTGCATCTGTGCAAACGGG	A2M
sg_6	ATGTCTCATGAACTACCCTG	A2M
sg_7	TGAAATGAAACTTCACACTG	A2M
sg_8	TTACTCATATAGGATCCCAA	A2M

Do you know what could be the problem?

Thanks! Pierre

opened by p-levy 4

Update clap to 3.0 rc9

Somewhere between “beta5” and “rc9” clap 3.0.0 hid the derive macros behind a feature flag. Given that cargo eagerly updates dependencies that are expected to be compatible, this breaks cargo install. This updates to rc9 and adds the feature flag.

opened by tfenne 0

Add a new output with correlations of guide counts between samples

One other nice QC to have is a set of correlations between samples. This might look something like this:

sample1	sample2	r2	control_r2	essential_r2	nonessential_r2	other_r2
plasmid	day7	0.76	0.94	0.62	0.92	0.87
...

opened by tfenne 0
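A per-pair correlation table of the kind proposed in this issue could be computed along the lines of the hypothetical Python sketch below. The names `pearson_r2` and `pair_r2` are ours. We assume r2 means the squared Pearson correlation of raw counts across guides; the issue does not say whether raw, normalized or log-transformed counts would be used. Each per-type column is computed the same way over only the guides of that type.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def pearson_r2(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation of two equal-length vectors, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant vector has no defined correlation
    r = cov / sqrt(vx * vy)
    return r * r


def pair_r2(a: Dict[str, int], b: Dict[str, int], guide_types: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Hypothetical sketch: r2 over all guides plus one r2 per guide type for two samples."""
    result = {"r2": pearson_r2([a.get(g, 0) for g in guide_types], [b.get(g, 0) for g in guide_types])}
    for t in ("Control", "Essential", "Nonessential", "Other"):
        ids = [g for g, gt in guide_types.items() if gt == t]
        result[f"{t.lower()}_r2"] = pearson_r2([a.get(g, 0) for g in ids], [b.get(g, 0) for g in ids])
    return result


types = {"sg_1": "Other", "sg_2": "Other", "sg_3": "Other"}
plasmid = {"sg_1": 10, "sg_2": 20, "sg_3": 30}
day7 = {"sg_1": 1, "sg_2": 2, "sg_3": 3}
print(pair_r2(plasmid, day7, types)["r2"])  # 1.0
```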

Releases(v0.1.3)

v0.1.3(Mar 22, 2022)

Release 0.1.3 is a minor release that fixes a bug in the handling of input reads shorter than the guides in the guide library. Prior to this release, if one of the input FASTQs contained reads shorter than the guides in use, guide-counter would fail and exit with an error. Starting with 0.1.3, reads shorter than the guide length are ignored.
v0.1.2(Dec 29, 2021)

Minor bugfix to re-enable multi-value parameters (e.g. --input 1.fq 2.fq) instead of having to specify the option many times (e.g. --input 1.fq --input 2.fq). No other changes.
v0.1.1(Dec 29, 2021)

Bugfix release to fix broken cargo install due to derive macros in clap-3.0.0 RCs moving behind a feature flag. No functional differences.
v0.1.0(Dec 28, 2021)

First public release of guide-counter.