Pure Rust bzip2 decoder

Overview

bzip2-rs


Pure Rust 100% safe bzip2 decompressor.

Features

  • Default features: Rust >= 1.34.2 is supported
  • rustc_1_37: bump MSRV to 1.37, enable more optimizations
  • nightly: require Rust Nightly, enable more optimizations

Usage

use std::fs::File;
use std::io;

use bzip2_rs::DecoderReader;

fn main() -> io::Result<()> {
    let compressed_file = File::open("input.bz2")?;
    let mut decompressed_output = File::create("output")?;

    // DecoderReader wraps any Read source and yields the decompressed bytes
    let mut reader = DecoderReader::new(compressed_file);
    io::copy(&mut reader, &mut decompressed_output)?;
    Ok(())
}

Upcoming features

  • parallel decoding support (similar to pbzip2)
  • bzip2 encoding support
  • no_std support (is anybody interested in this?)

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.

Comments
  • PositionalMTFEncoder

    Description

    • Takes a buffer, bwt pointers into that buffer, and a boolean array of used bytes
    • Converts the (used) bytes to their respective positions (0..count_bytes)
    • Then applies MTF on these bytes
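The three steps above can be sketched as follows. This is a hypothetical reconstruction, not the code from the PR. The name positional_mtf is invented, and each BWT pointer is assumed to index directly the byte to encode (the reference bzip2 encoder instead takes the byte preceding each sorted rotation). The sketch also omits the zero-run (RUNA/RUNB) coding that bzip2 applies to the MTF output.

```rust
/// Hypothetical sketch: map each used byte value to its rank among the used
/// values (0..count_bytes), then apply move-to-front over those ranks.
fn positional_mtf(buf: &[u8], bwt_ptrs: &[u32], used: &[bool; 256]) -> Vec<u8> {
    // Step 1: assign consecutive ranks to the byte values that actually occur.
    let mut rank = [0u8; 256];
    let mut count = 0usize;
    for b in 0..256 {
        if used[b] {
            rank[b] = count as u8;
            count += 1;
        }
    }

    // Step 2: the MTF list starts as the identity ordering of the ranks.
    let mut list: Vec<u8> = (0..count).map(|i| i as u8).collect();
    let mut out = Vec::with_capacity(bwt_ptrs.len());

    for &p in bwt_ptrs {
        // Assumption: the pointer indexes the byte to encode directly.
        let sym = rank[buf[p as usize] as usize];
        let pos = list
            .iter()
            .position(|&x| x == sym)
            .expect("every encoded byte must be marked as used");
        out.push(pos as u8);
        // Move the symbol to the front of the list.
        list.remove(pos);
        list.insert(0, sym);
    }
    out
}
```

For example, encoding "banana" with a, b and n marked as used produces the ranks 1, 0, 2, 0, 2, 0, which MTF turns into 1, 1, 2, 1, 1, 1.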

    Tests

    • [x] distinct bytes small
    • [x] repeating bytes small
    • [x] repeating bytes large
    opened by GaurangTandon 2
  • Fuzzing: Attempt to add with overflow and/or invalid ArrayVec::set_len

    I'm attaching the full original reproducer, which crashes at:

    thread '<unnamed>' panicked at 'attempt to add with overflow', /home/ben/bzip2-rs/src/block/mod.rs:255:17
    

    It appears to minimize into a different crash, which is very interesting. You should be able to tar xf this file in the crate root and get a file in fuzz/artifacts/decompress: crash-14f4cbb67ac1b1daa7493646937fe87be63f72e1.tar.gz
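In debug builds Rust panics with "attempt to add with overflow" whenever integer arithmetic overflows, so a crash like this one usually means a length or counter derived from untrusted input is summed without a check. A common way to turn such panics into recoverable errors is to use checked_add and compare against the buffer's capacity before growing it (an ArrayVec's set_len must never exceed its capacity). The sketch below is hypothetical and is not the crate's code: DecodeError and push_run are invented names, and a Vec with an explicit cap stands in for the fixed-capacity buffer.

```rust
/// Hypothetical error type for a decoder that rejects malformed input.
#[derive(Debug, PartialEq)]
enum DecodeError {
    Overflow,
    CapacityExceeded,
}

/// Append `run` copies of `byte` to `out`, never growing past `cap`.
/// Both the addition and the capacity check are validated before any write.
fn push_run(out: &mut Vec<u8>, cap: usize, byte: u8, run: usize) -> Result<(), DecodeError> {
    // checked_add returns None instead of panicking (debug) or wrapping (release).
    let new_len = out.len().checked_add(run).ok_or(DecodeError::Overflow)?;
    if new_len > cap {
        return Err(DecodeError::CapacityExceeded);
    }
    out.resize(new_len, byte);
    Ok(())
}
```

On error the buffer is left untouched, so the caller can report the block as corrupt instead of crashing.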

    opened by saethlin 1
  • BlockError: `huffman bitstream truncated`

    This error occurs when I'm decoding an Android update payload. Dump file (not actually zipped, change extension to .bz2): dump.zip

    let mut decoder = bzip2_rs::DecoderReader::new(input?);
    // The next statement returns BlockError { reason: "huffman bitstream truncated" }
    let copied = std::io::copy(&mut decoder, &mut output);
    

    bzip2 = "0.4" decoded this file successfully.

    opened by KiruyaMomochi 0
  • internal error when reading into empty buffer

    Code:

    use std::io::Read;
    
    fn main() {
        let data: &[u8] = &[
            66, 90, 104, 49, 49, 65, 89, 38, 83, 89, 86, 93, 15, 22, 0, 194, 220, 127, 255, 255, 239,
            235, 55, 189, 28, 217, 67, 255, 13, 24, 139, 49, 59, 25, 166, 202, 27, 124, 186, 57, 240,
            188, 234, 152, 81, 186, 48, 211, 249, 46, 205, 48, 1, 108, 16, 21, 53, 51, 72, 210, 30,
            166, 152, 154, 104, 211, 65, 163, 35, 17, 234, 26, 15, 80, 200, 26, 13, 0, 211, 65, 160, 6,
            71, 162, 100, 26, 49, 54, 160, 194, 122, 154, 50, 102, 153, 34, 12, 38, 38, 76, 4, 192,
            152, 154, 52, 196, 96, 0, 152, 4, 105, 145, 163, 1, 52, 201, 128, 76, 140, 0, 0, 4, 96,
            154, 48, 16, 36, 84, 72, 4, 209, 166, 38, 153, 30, 147, 4, 102, 144, 200, 193, 24, 19, 0,
            153, 25, 25, 61, 17, 166, 35, 9, 137, 163, 17, 128, 70, 134, 35, 70, 38, 77, 53, 210, 128,
            0, 11, 218, 9, 198, 72, 139, 87, 8, 55, 171, 29, 60, 156, 202, 38, 115, 65, 207, 110, 253,
            156, 54, 32, 24, 16, 92, 24, 58, 107, 148, 0, 221, 115, 114, 136, 5, 240, 69, 196, 208,
            162, 128, 218, 64, 136, 173, 9, 32, 74, 6, 36, 134, 196, 8, 77, 164, 129, 3, 96, 144, 8,
            27, 4, 146, 64, 132, 216, 129, 36, 36, 132, 12, 96, 144, 2, 27, 16, 128, 171, 105, 20, 45,
            5, 43, 18, 159, 96, 16, 180, 218, 237, 48, 41, 36, 130, 33, 8, 54, 64, 3, 35, 65, 32, 219,
            133, 149, 50, 112, 19, 110, 35, 191, 106, 173, 226, 177, 86, 70, 138, 90, 56, 148, 153, 48,
            24, 172, 16, 2, 16, 2, 143, 101, 63, 15, 3, 2, 62, 173, 70, 206, 244, 128, 122, 218, 234,
            51, 184, 201, 13, 216, 70, 246, 8, 6, 167, 41, 46, 130, 6, 42, 19, 238, 210, 133, 226, 16,
            76, 10, 153, 74, 157, 183, 142, 72, 34, 224, 82, 28, 30, 65, 62, 96, 14, 234, 35, 204, 39,
            6, 60, 252, 129, 215, 48, 81, 205, 59, 74, 25, 47, 160, 234, 65, 40, 2, 247, 161, 1, 174,
            241, 61, 209, 99, 78, 66, 154, 222, 203, 130, 152, 208, 173, 92, 137, 167, 200, 143, 170,
            39, 225, 51, 245, 121, 135, 0, 221, 133, 42, 160, 22, 173, 49, 122, 70, 62, 75, 86, 210,
            89, 207, 43, 150, 138, 231, 54, 194, 88, 10, 75, 109, 244, 69, 248, 90, 78, 165, 207, 109,
            54, 181, 69, 123, 52, 81, 132, 83, 9, 12, 104, 26, 173, 36, 31, 164, 139, 151, 175, 16, 43,
            227, 233, 9, 9, 253, 147, 252, 125, 57, 82, 221, 162, 77, 162, 163, 170, 68, 77, 232, 170,
            214, 171, 172, 98, 72, 203, 73, 123, 52, 220, 90, 150, 157, 122, 236, 191, 204, 80, 86, 73,
            148, 214, 104, 151, 21, 200, 0, 40, 38, 112, 0, 24, 0, 8, 0, 28, 8, 2, 145, 166, 153, 32,
            165, 180, 5, 27, 10, 170, 137, 178, 33, 33, 154, 162, 83, 9, 84, 170, 218, 41, 36, 57, 138,
            10, 201, 50, 154, 205, 23, 67, 2, 56, 1, 112, 46, 0, 3, 200, 1, 128, 5, 177, 12, 208, 9,
            169, 64, 20, 148, 24, 173, 75, 84, 77, 160, 174, 225, 83, 152, 21, 181, 6, 200, 164, 216,
            146, 218, 132, 109, 72, 13, 164, 85, 180, 133, 181, 33, 178, 1, 109, 42, 13, 165, 32, 238,
            160, 35, 152, 170, 132, 218, 42, 13, 136, 136, 107, 115, 47, 110, 230, 138, 170, 133, 181,
            241, 119, 36, 83, 133, 9, 11, 242, 79, 37, 240,
        ];
        let mut decoder = bzip2_rs::DecoderReader::new(data);
        decoder.read(&mut []).unwrap();
    }
    

    This gives the error:

    thread 'main' panicked at 'internal error: entered unreachable code', C:\Users\bruno\.cargo\registry\src\github.com-1ecc6299db9ec823\bzip2-rs-0.1.2\src\decoder\reader.rs:67:50
    stack backtrace:
       0: std::panicking::begin_panic_handler
                 at /rustc/ca122c7ebb3ab50149c9d3d24ddb59c252b32272/library\std\src\panicking.rs:584
       1: core::panicking::panic_fmt
                 at /rustc/ca122c7ebb3ab50149c9d3d24ddb59c252b32272/library\core\src\panicking.rs:142
       2: core::panicking::panic
                 at /rustc/ca122c7ebb3ab50149c9d3d24ddb59c252b32272/library\core\src\panicking.rs:48
       3: bzip2_rs::decoder::reader::impl$1::read<slice$<u8> >
                 at C:\Users\bruno\.cargo\registry\src\github.com-1ecc6299db9ec823\bzip2-rs-0.1.2\src\decoder\reader.rs:67
       4: my_module::main
                 at .\src\main.rs:38
       5: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
                 at /rustc/ca122c7ebb3ab50149c9d3d24ddb59c252b32272\library\core\src\ops\function.rs:248
    

    I also tried this against the main branch, where it does not panic but instead enters an infinite loop.
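The std::io::Read documentation lists a zero-length destination buffer as one of the cases in which read may return Ok(0), so a reader can handle it up front without touching its internal state. The sketch below shows that pattern on a hypothetical in-memory reader (Chunked is an invented type, not part of bzip2-rs); it is not the fix applied in the crate.

```rust
use std::io::{self, Read};

/// Hypothetical reader that hands out its data in chunks of at most `chunk` bytes.
struct Chunked<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl<'a> Read for Chunked<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A zero-length buffer must not panic, loop, or advance any state.
        if buf.is_empty() {
            return Ok(0);
        }
        let n = buf.len().min(self.chunk).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```

With this guard, decoder.read(&mut []) returns Ok(0) and subsequent reads still see all of the data.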

    opened by mejrs 0
  • Possible memory leak

    Hi,

    I used this library to process a bzip2-encoded Wikidata dump (70GB+) and observed that memory was not being released.

    I initially suspected the default BufReader lines iterator, but the issue persisted even with the simpler approach of reading lines into a reused mutable buffer.

    Simply switching to this alternative, https://github.com/alexcrichton/bzip2-rs, solved the problem (memory peaks at a few MB), so I suspect the memory leak is in this library.
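For reference, the reused-buffer approach described above looks roughly like the sketch below; count_lines is a hypothetical helper, not part of either crate. Because the String is cleared and reused on every iteration, the loop itself holds at most one line of text, which is why steady memory growth under this pattern points at the reader underneath rather than at the caller.

```rust
use std::io::{self, BufRead};

/// Count lines while reusing a single String allocation for every line.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    let mut count = 0;
    loop {
        line.clear(); // drop the previous contents but keep the capacity
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        count += 1;
    }
    Ok(count)
}
```

To use it with this crate, wrap the decoder in a buffered reader, e.g. count_lines(BufReader::new(DecoderReader::new(file))).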

    Cheers,

    Stefano

    opened by dipstef 3
  • Feature Request: CLI Support

    Hey!

    Saw this library and was intrigued by the possibility of making it available for Julia language users to decompress bzip2 files in a consistent way across all platforms supported by the language. The packaging infrastructure, etc. is quite well established and it shouldn't be much trouble for us to generate binaries via the community's Yggdrasil packaging infrastructure, but the one thing that seems to be missing is CLI support. Is this something that might be on your roadmap?

    Cheers, Jeremiah

    opened by jeremiahpslewis 1
  • Add support for seek-bzip style indexing and decompressing individual blocks

    Bzip2 files can be randomly accessed if you can index which compressed bit offsets map to which uncompressed byte offsets.
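To make the idea concrete, here is a minimal sketch of the lookup side of such an index, assuming the index has already been built as a list of blocks sorted by uncompressed offset. BlockEntry and block_for are hypothetical names, not part of bzip2-rs. A reader would seek to the returned bit offset, decode that one block, and skip pos minus the block's uncompressed start.

```rust
/// Hypothetical index entry: where a block starts in the compressed stream
/// (in bits) and in the decompressed output (in bytes).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockEntry {
    compressed_bit_offset: u64,
    uncompressed_byte_offset: u64,
}

/// Find the block that contains uncompressed byte `pos`.
/// `index` must be sorted by `uncompressed_byte_offset`; positions past the end
/// of the last block are not detected here and map to the last block.
fn block_for(index: &[BlockEntry], pos: u64) -> Option<&BlockEntry> {
    // Number of blocks starting at or before `pos`; the last of them holds `pos`.
    let i = index.partition_point(|e| e.uncompressed_byte_offset <= pos);
    if i == 0 {
        None
    } else {
        Some(&index[i - 1])
    }
}
```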

    It seems that the original seek-bzip was on BitBucket and was taken down upon the author's untimely death last year. But there is a fork on GitHub: https://github.com/cscott/seek-bzip and also a JavaScript implementation: https://github.com/galaxyproject/seek-bzip2

    I've tried looking at the code, but I'm too new to Rust. I guess it can already decompress a single block or a range of blocks pretty trivially. I was looking at the test code in block/mod.rs.

    To support bzip-table, we need to do most of the decompression of each block and then output a mapping from the bit offset where each block begins in the compressed file to either the byte offset or the length in bytes of that block in the raw file. Since we throw away the decompressed blocks and are only interested in their lengths, there's probably a shortcut where we don't have to fully decompress each block.
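Finding where each block begins is the first step of building such an index. bzip2 starts every block with the 48-bit magic number 0x314159265359 (the BCD digits of pi), and blocks are not byte-aligned, so candidates have to be found by scanning bit by bit, much as the bzip2recover tool does. The sketch below (find_block_starts is a hypothetical name) reports every MSB-first bit offset at which the magic appears. The pattern can in principle also occur by chance inside compressed data, so a real indexer would still need to validate each candidate, for example by decoding that block.

```rust
/// bzip2 block header magic (48 bits).
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;

/// Return every bit offset (MSB-first within each byte) where BLOCK_MAGIC starts.
fn find_block_starts(data: &[u8]) -> Vec<u64> {
    const MASK: u64 = (1u64 << 48) - 1;
    let mut window: u64 = 0; // the most recent 48 bits
    let mut bits_seen: u64 = 0;
    let mut starts = Vec::new();
    for &byte in data {
        for shift in (0..8).rev() {
            let bit = u64::from((byte >> shift) & 1);
            window = ((window << 1) | bit) & MASK;
            bits_seen += 1;
            // Only compare once the window holds 48 real bits.
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                starts.push(bits_seen - 48);
            }
        }
    }
    starts
}
```

In a complete .bz2 file the first block follows the 4-byte "BZh1".."BZh9" stream header, so it starts at bit offset 32.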

    opened by hippietrail 3
Owner
Paolo Barbolini
Student and freelancer | core @lettre