A robust command-line tool for inspecting and analyzing GZIP/ZLIB compressed files. GZInspector provides detailed information about compression chunks, headers, and content previews with support for both human-readable and JSON output formats.
A GZIP file can consist of several independently compressed chunks (called "members" in RFC 1952) concatenated together. Most GZIP implementations discard the chunk boundaries during decompression because they do not affect the decompressed output. However, certain file formats rely on these chunks as a core feature: when a chunk's byte offset and length are known, it can be decompressed on its own without reading the rest of the file.
This chunked compression approach is particularly prevalent in web archiving formats, including:
- WARC, WET, WAT files used by web archives to store crawled content
- CDX/J and ZipNum encoded CDX files that enable efficient index lookups
These formats are actively used by major web archiving initiatives like CommonCrawl and the Internet Archive to manage and provide access to petabyte-scale web archives.
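As a rough illustration of why chunk offsets matter, the following Python sketch (not part of GZInspector) builds a small multi-chunk GZIP blob and then decompresses a single chunk from its offset and length. The helper names `build_multi_member` and `read_member` are made up for this example; only the standard `gzip` and `zlib` modules are used.

```python
import gzip
import zlib


def build_multi_member(records):
    """Compress each record as its own GZIP chunk and concatenate them.

    Returns the combined bytes and an index of (offset, length) pairs.
    Hypothetical helper, for illustration only.
    """
    blob = bytearray()
    index = []
    for record in records:
        member = gzip.compress(record)
        index.append((len(blob), len(member)))
        blob += member
    return bytes(blob), index


def read_member(blob, offset, length):
    """Decompress only the chunk stored at the given offset and length."""
    # wbits=31 tells zlib to expect a GZIP header and trailer.
    return zlib.decompress(blob[offset:offset + length], wbits=31)


records = [b"first record\n", b"second record\n", b"third record\n"]
blob, index = build_multi_member(records)

# Fetch the second record without touching the other chunks.
offset, length = index[1]
second = read_member(blob, offset, length)

# A regular decompressor ignores the boundaries and returns everything.
everything = gzip.decompress(blob)
```

In a real archive the offset and length would come from an external index, such as a CDX file, rather than being computed while writing.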
- Chunk-by-chunk analysis of GZIP files
- Detailed compression statistics and ratios
- Content preview capabilities
- Support for concatenated GZIP files
- Multiple output formats (human-readable and JSON)
- Comprehensive header information including timestamps and flags
- Automatic encoding detection and handling
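The header fields listed above follow the fixed 10-byte layout defined in RFC 1952. As a minimal sketch of what reading them involves (written in Python, and not GZInspector's actual code), the example below unpacks the compression method, flags, and timestamp from the start of a GZIP chunk. The function name `parse_gzip_header` and the returned dictionary keys are assumptions made for this illustration.

```python
import gzip
import io
import struct

# Flag bits of the FLG byte as defined in RFC 1952.
FLAG_NAMES = {
    0x01: "FTEXT",
    0x02: "FHCRC",
    0x04: "FEXTRA",
    0x08: "FNAME",
    0x10: "FCOMMENT",
}


def parse_gzip_header(data):
    """Decode the fixed 10-byte GZIP header (hypothetical helper)."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes for a GZIP header")
    # ID1, ID2, CM, FLG, MTIME (little-endian uint32), XFL, OS
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack(
        "<BBBBIBB", data[:10]
    )
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a GZIP header")
    return {
        "method": "deflate" if method == 8 else f"unknown({method})",
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "mtime": mtime,
        "xfl": xfl,
        "os": os_code,
    }


# Build a small GZIP chunk with a known file name and timestamp.
buffer = io.BytesIO()
with gzip.GzipFile(
    filename="example.txt", mode="wb", fileobj=buffer, mtime=1700000000
) as handle:
    handle.write(b"hello")

header = parse_gzip_header(buffer.getvalue())
```

Optional fields such as the original file name follow the fixed header when their flag bit is set; decoding them is left out to keep the sketch short.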
cargo install gzinspector
To install the pre-built binary for Linux:
# Download the binary from the latest release:
# https://github.com/jt55401/gzinspector/releases/latest
wget $(curl -s https://api.github.com/repos/jt55401/gzinspector/releases/latest | grep "browser_download_url.*tar\.gz" | cut -d '"' -f 4)
# Or browse all releases at:
# https://github.com/jt55401/gzinspector/releases
# Extract the binary
tar -xzf gzinspector-linux-x86_64.tar.gz
# Move the binary to a directory in your PATH
sudo mv gzinspector /usr/local/bin/
To install GZInspector from source, you'll need Rust and Cargo installed on your system. Then:
# Clone the repository
git clone https://github.com/jt55401/gzinspector.git
# Build the project
cd gzinspector
cargo build --release
# The binary will be available at target/release/gzinspector
gzinspector [OPTIONS] <FILE>
-o, --output-format <FORMAT>
: Output format (human or json) [default: human]
-p, --preview <PREVIEW>
: Preview content (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3 lines)
-c, --chunks <CHUNKS>
: Only show first and last N chunks (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3)
-e, --encoding <ENCODING>
: Encoding for preview [default: utf-8]
-h, --help
: Display help information
-V, --version
: Display version information
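To make the HEAD:TAIL format used by the preview and chunk options concrete, here is a small Python sketch of how such a value could be interpreted. It is an illustration only, not GZInspector's implementation; the function names `parse_head_tail` and `select_head_tail` are hypothetical, and how the tool itself handles edge cases (for example a value without a colon, or overlapping head and tail ranges) is not documented here.

```python
def parse_head_tail(spec):
    """Split a 'HEAD:TAIL' string such as '5:3' into two integers."""
    head_text, separator, tail_text = spec.partition(":")
    if not separator:
        raise ValueError("expected HEAD:TAIL, e.g. '5:3'")
    head, tail = int(head_text), int(tail_text)
    if head < 0 or tail < 0:
        raise ValueError("HEAD and TAIL must not be negative")
    return head, tail


def select_head_tail(items, head, tail):
    """Return the first `head` and last `tail` items without duplicates."""
    if head + tail >= len(items):
        # The two ranges cover everything, so return the whole list.
        return list(items)
    return list(items[:head]) + list(items[len(items) - tail:])


head, tail = parse_head_tail("5:3")
lines = [f"line {n}" for n in range(1, 11)]
preview = select_head_tail(lines, head, tail)
```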
Basic file inspection:
gzinspector example.gz
Show JSON output:
gzinspector -o json example.gz
Preview content (first 5 lines and last 3 lines):
gzinspector -p 5:3 example.gz
The human-readable output includes:
📦 #1 │ 0 │ 2.5x │ 📥 1.2KB │ 📤 3.0KB │ ℹ️ deflate|EXTRA|NAME|example.txt
Where the fields are, from left to right:
- 📦 #N: Chunk number
- Offset in file
- Compression ratio (with direction indicator)
- 📥: Compressed size
- 📤: Uncompressed size
- ℹ️: Header information
JSON output provides detailed information in a machine-readable format:
{
"chunk_number": 1,
"offset": 0,
"compressed_size": 1234,
"uncompressed_size": 3000,
"compression_ratio": 2.43,
"header_info": "deflate|EXTRA|NAME|example.txt"
}
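A record like the one above can be consumed with any JSON parser. The Python sketch below loads the sample record and derives two values from it. It assumes that the ratio is uncompressed size divided by compressed size (which matches the sample numbers) and that `header_info` is a `|`-separated string, as the sample suggests; how multiple chunk records are framed in the tool's full output is not covered here.

```python
import json

# The sample record from this README, embedded as a string.
record_text = """
{
  "chunk_number": 1,
  "offset": 0,
  "compressed_size": 1234,
  "uncompressed_size": 3000,
  "compression_ratio": 2.43,
  "header_info": "deflate|EXTRA|NAME|example.txt"
}
"""

record = json.loads(record_text)

# Assumption: ratio = uncompressed / compressed, rounded to two decimals.
ratio = round(record["uncompressed_size"] / record["compressed_size"], 2)

# Assumption: header fields are separated by '|'.
header_fields = record["header_info"].split("|")
```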
Both output formats include a summary showing:
- Total number of chunks
- Total compressed size
- Total uncompressed size
- Average compression ratio
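The summary values can also be recomputed from per-chunk records. The Python sketch below does so for two hypothetical records. How GZInspector defines the average ratio is not stated here; this example uses total uncompressed size divided by total compressed size, which is one plausible reading and may differ from the tool's own figure.

```python
def summarize(chunks):
    """Aggregate per-chunk sizes into summary totals (hypothetical helper)."""
    total_compressed = sum(chunk["compressed_size"] for chunk in chunks)
    total_uncompressed = sum(chunk["uncompressed_size"] for chunk in chunks)
    # Assumption: overall ratio = total uncompressed / total compressed.
    if total_compressed:
        overall_ratio = total_uncompressed / total_compressed
    else:
        overall_ratio = 0.0
    return {
        "chunks": len(chunks),
        "total_compressed": total_compressed,
        "total_uncompressed": total_uncompressed,
        "overall_ratio": overall_ratio,
    }


# Two made-up chunk records using the field names from the JSON output.
chunks = [
    {"compressed_size": 1234, "uncompressed_size": 3000},
    {"compressed_size": 766, "uncompressed_size": 1000},
]
summary = summarize(chunks)
```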
flate2
: GZIP/ZLIB compression library
serde
: Serialization framework
clap
: Command line argument parsing
chrono
: Date and time functionality
crc32fast
: CRC32 checksum calculation
- Ensure you have Rust installed (1.56.0 or later)
- Clone the repository
- Run cargo build --release
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Jason Grey
- 0.1.0: Initial release
  - Basic GZIP file inspection
  - Human-readable and JSON output formats
  - Content preview functionality
- 0.2.0: Chunks release
  - Ability to show first N and last N chunks of the file
  - Shows progress bar during tail scan of large files