A utility for detailed analysis of gzip files.

Overview

GZInspector

A robust command-line tool for inspecting and analyzing GZIP/ZLIB compressed files. GZInspector provides detailed information about compression chunks, headers, and content previews with support for both human-readable and JSON output formats.

Motivation

Most GZIP implementations discard chunk boundaries during decompression since they're typically irrelevant for the decompressed output. However, certain file formats leverage GZIP chunks as a core feature, allowing selective decompression of individual chunks when their byte offsets and lengths are known.
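
To see why those boundaries are useful, note that a single member of a chunked GZIP file can be decompressed in isolation once its byte offset and compressed length are known. Here is a minimal sketch using standard Unix tools (the file name, OFFSET, and LENGTH below are placeholders, not output of GZInspector):

# Decompress one GZIP member given its byte offset and compressed length
OFFSET=1234      # byte offset of the member within the file (placeholder)
LENGTH=5678      # compressed length of the member in bytes (placeholder)
tail -c +$((OFFSET + 1)) archive.warc.gz | head -c "$LENGTH" | gunzip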

This chunked compression approach is particularly prevalent in web archiving formats, including:

  • WARC (Web ARChive) files, where each record is stored as an independent GZIP member so it can be fetched and decompressed on its own
  • CommonCrawl's derived WAT and WET files, which follow the same per-record compression scheme

These formats are actively used by major web archiving initiatives like CommonCrawl and the Internet Archive to manage and provide access to petabyte-scale web archives.

Features

  • πŸ“¦ Chunk-by-chunk analysis of GZIP files
  • πŸ“Š Detailed compression statistics and ratios
  • πŸ” Content preview capabilities
  • 🎯 Support for concatenated GZIP files
  • πŸ’Ύ Multiple output formats (human-readable and JSON)
  • πŸ“ Comprehensive header information including timestamps and flags
  • πŸ”„ Automatic encoding detection and handling

Installation

Using Rust Cargo

cargo install gzinspector

Pre-built Binary (Linux)

To install the pre-built binary for Linux:

# Download the latest release tarball from:
# https://github.com/jt55401/gzinspector/releases/latest
wget $(curl -s https://api.github.com/repos/jt55401/gzinspector/releases/latest | grep "browser_download_url.*tar\.gz" | cut -d '"' -f 4)

# Or browse all releases at:
# https://github.com/jt55401/gzinspector/releases

# Extract the binary (the archive name may differ depending on the release asset)
tar -xzf gzinspector-linux-x86_64.tar.gz

# Move the binary to a directory in your PATH
sudo mv gzinspector /usr/local/bin/

From Source

To install GZInspector from source, you'll need Rust and Cargo installed on your system. Then:

# Clone the repository
git clone https://github.com/jt55401/gzinspector.git

# Build the project
cd gzinspector
cargo build --release

# The binary will be available at target/release/gzinspector
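
Optionally, Cargo can also install the built binary into ~/.cargo/bin (which is normally on your PATH) when run from the repository root:

# Install the local crate into ~/.cargo/bin
cargo install --path .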

Usage

gzinspector [OPTIONS] <FILE>

Options

  • -o, --output-format <FORMAT>: Output format (human or json) [default: human]
  • -p, --preview <PREVIEW>: Preview content (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3 lines)
  • -c, --chunks <CHUNKS>: Only show the first and last chunks of the file (format: HEAD:TAIL, e.g. '5:3' shows the first 5 and last 3 chunks)
  • -e, --encoding <ENCODING>: Encoding for preview [default: utf-8]
  • -h, --help: Display help information
  • -V, --version: Display version information

Examples

Basic file inspection:

gzinspector example.gz
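
If you don't have a chunked file at hand, a multi-member GZIP file can be created for testing by concatenating independently compressed streams (standard gzip behavior; the file names here are only illustrative):

# Each gzip invocation writes a complete, independent GZIP member
gzip -c part1.txt > example.gz
gzip -c part2.txt >> example.gz
gzinspector example.gz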

Show JSON output:

gzinspector -o json example.gz

Preview content (first 5 lines and last 3 lines):

gzinspector -p 5:3 example.gz
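
Show only the first 2 and last 2 chunks (useful for very large files):

gzinspector -c 2:2 example.gz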

Output Format

Human-readable Output

The human-readable output includes:

πŸ“¦ #1    β”‚ πŸ“ 0         β”‚ πŸ”“ 2.5x β”‚ πŸ“₯ 1.2KB   β”‚ πŸ“€ 3.0KB   β”‚ ℹ️  deflate|EXTRA|NAME|example.txt

Where:

  • πŸ“¦ #N: Chunk number
  • πŸ“: Offset in file
  • πŸ”“/πŸ”’: Compression ratio (with direction indicator)
  • πŸ“₯: Compressed size
  • πŸ“€: Uncompressed size
  • ℹ️: Header information

JSON Output

JSON output provides detailed information in a machine-readable format:

{
  "chunk_number": 1,
  "offset": 0,
  "compressed_size": 1234,
  "uncompressed_size": 3000,
  "compression_ratio": 2.43,
  "header_info": "deflate|EXTRA|NAME|example.txt"
}
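
In the record above the compression ratio is the uncompressed size divided by the compressed size (3000 / 1234 β‰ˆ 2.43). Assuming one JSON object is emitted per chunk, as shown, the records can be post-processed with standard tools such as jq; this is a sketch rather than a guaranteed output contract:

# Print offset and compression ratio for each chunk
gzinspector -o json example.gz | jq -r '[.offset, .compression_ratio] | @tsv'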

File Summary

Both output formats include a summary showing:

  • Total number of chunks
  • Total compressed size
  • Total uncompressed size
  • Average compression ratio

Dependencies

  • flate2: GZIP/ZLIB compression library
  • serde: Serialization framework
  • clap: Command line argument parsing
  • chrono: Date and time functionality
  • crc32fast: CRC32 checksum calculation

Building from Source

  1. Ensure you have Rust installed (1.56.0 or later)
  2. Clone the repository
  3. Run cargo build --release

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Jason Grey ([email protected])

Version History

  • 0.1.0: Initial release

    • Basic GZIP file inspection
    • Human-readable and JSON output formats
    • Content preview functionality
  • 0.2.0: Chunks release

    • Ability to show only the first N and last N chunks of a file
    • Progress bar shown during the tail scan of large files