Hello!
Version
stable-x86_64-unknown-linux-gnu (default)
rustc 1.66.0 (69f9c33d7 2022-12-12)
Problem
I ran into an infinite-loop bug when trying to unzip an ASCII file larger than 4 GB.
The following code is an example of my usage:
use std::fs;
use std::io::Read;

pub fn from_zip_file(file_path: String) -> Trace {
    let zip_file =
        fs::File::open(&file_path).expect(&format!("Could not open file {}", &file_path));
    let mut zip_archive = zip::ZipArchive::new(zip_file)
        .expect(&format!("Could not open archive {}", &file_path));
    println!("Trace::from_zip_file: unzipping {}", &file_path);
    let mut trace_file = zip_archive.by_index(0).unwrap();
    let trace_file_path = trace_file.sanitized_name().to_str().unwrap().to_string();
    println!("Trace::from_zip_file: unzipped {}", &file_path);
    let mut trace_content = String::new();
    trace_file
        .read_to_string(&mut trace_content)
        .expect(&format!("Could not read unzipped file {}", trace_file_path));
    println!("Trace::from_zip_file: read_to_string {}", &file_path);
    // ... (construction of the returned Trace elided)
}
My program got stuck at trace_file.read_to_string().
To investigate the problem, I analyzed the code carefully and found the root cause in the bzip2 Rust library.
Root Cause
The function read_to_string() indirectly invokes std::io::default_read_to_end(). The following is the source of std::io::default_read_to_end():
// This uses an adaptive system to extend the vector when it fills. We want to
// avoid paying to allocate and zero a huge chunk of memory if the reader only
// has 4 bytes while still making large reads if the reader does have a ton
// of data to return. Simply tacking on an extra DEFAULT_BUF_SIZE space every
// time is 4,500 times (!) slower than a default reservation size of 32 if the
// reader has a very small amount of data to return.
pub(crate) fn default_read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> {
    let start_len = buf.len();
    let start_cap = buf.capacity();
    let mut initialized = 0; // Extra initialized bytes from previous loop iteration
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // buf is full, need more space
        }
        let mut read_buf: BorrowedBuf<'_> = buf.spare_capacity_mut().into();
        // SAFETY: These bytes were initialized but not filled in the previous loop
        unsafe {
            read_buf.set_init(initialized);
        }
        let mut cursor = read_buf.unfilled();
        match r.read_buf(cursor.reborrow()) {
            Ok(()) => {}
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
        if cursor.written() == 0 {
            return Ok(buf.len() - start_len);
        }
        // store how much was initialized but not filled
        initialized = cursor.init_ref().len();
        // SAFETY: BorrowedBuf's invariants mean this much memory is initialized.
        unsafe {
            let new_len = read_buf.filled().len() + buf.len();
            buf.set_len(new_len);
        }
        if buf.len() == buf.capacity() && buf.capacity() == start_cap {
            // The buffer might be an exact fit. Let's read into a probe buffer
            // and see if it returns `Ok(0)`. If so, we've avoided an
            // unnecessary doubling of the capacity. But if not, append the
            // probe buffer to the primary buffer and let its capacity grow.
            let mut probe = [0u8; 32];
            loop {
                match r.read(&mut probe) {
                    Ok(0) => return Ok(buf.len() - start_len),
                    Ok(n) => {
                        buf.extend_from_slice(&probe[..n]);
                        break;
                    }
                    Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }
            }
        }
    }
}
The stdlib function std::io::default_read_to_end() doubles the buffer's capacity every time it fills (buf.reserve(32) on a full Vec grows it geometrically). Since the buffer's initial capacity is 32, the capacity of the Vec buf is always a power of two.
My ASCII file is slightly larger than 4 GB (0x1_0000_0000, i.e. 4294967296 bytes), so the buffer's capacity is eventually extended to 8 GB (0x2_0000_0000, i.e. 8589934592 bytes).
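The power-of-two growth can be observed directly. A minimal sketch that mimics default_read_to_end()'s pattern (the exact capacity values depend on Vec's allocation strategy, which is an implementation detail of the standard library):

```rust
// Start with capacity 32 and call reserve(32) whenever the buffer fills,
// exactly as default_read_to_end() does. Vec grows its capacity
// geometrically, so each round doubles it.
fn capacity_growth(rounds: usize) -> Vec<usize> {
    let mut buf: Vec<u8> = Vec::with_capacity(32);
    let mut caps = Vec::new();
    for _ in 0..rounds {
        while buf.len() < buf.capacity() {
            buf.push(0); // fill the buffer completely
        }
        caps.push(buf.capacity());
        buf.reserve(32); // "buf is full, need more space"
    }
    caps
}

fn main() {
    // On current rustc this prints [32, 64, 128, 256, 512]
    println!("{:?}", capacity_growth(5));
}
```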
In default_read_to_end(), read_buf is a BorrowedBuf over the spare capacity of buf, i.e. it starts at the first unfilled byte of buf.
At some point read_buf's length is exactly 0x1_0000_0000 while buf.capacity() is 0x2_0000_0000: the first half of buf holds the already-unzipped data, and the second half is spare space waiting to be filled with the rest of the unzipped data.
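In numbers (assuming a 64-bit target and the capacities described above):

```rust
fn main() {
    let len: usize = 0x1_0000_0000; // bytes of unzipped data already in buf
    let cap: usize = 0x2_0000_0000; // capacity after the final doubling
    let spare = cap - len;          // length of spare_capacity_mut(), i.e. of read_buf
    assert_eq!(spare, 0x1_0000_0000); // exactly 4 GiB of spare space to fill
    println!("spare = {:#x}", spare);
}
```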
The function call r.read_buf(cursor.reborrow()) indirectly calls Decompress::decompress(), whose source is listed below:
/// Decompress a block of input into a block of output.
pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
    self.inner.raw.next_in = input.as_ptr() as *mut _;
    self.inner.raw.avail_in = input.len() as c_uint;
    self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
    self.inner.raw.avail_out = output.len() as c_uint;
    unsafe {
        match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
            ffi::BZ_OK => Ok(Status::Ok),
            ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
            ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
            ffi::BZ_PARAM_ERROR => Err(Error::Param),
            ffi::BZ_DATA_ERROR => Err(Error::Data),
            ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
            ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
            c => panic!("wut: {}", c),
        }
    }
}
At this point, the length of output is 0x1_0000_0000.
When output.len() is cast to c_uint (a 32-bit unsigned integer), its high 32 bits are lost, so avail_out is zero when the C decompression routine is invoked.
In that case, no data is extracted and the function returns immediately. However, the caller keeps asking it to decompress the remaining data, so an endless loop results.
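The truncation itself is easy to reproduce in isolation. A minimal sketch (assumes a 64-bit target, where usize is wider than c_uint; u32 stands in for c_uint):

```rust
fn main() {
    let output_len: usize = 0x1_0000_0000; // 4 GiB spare buffer, as above
    let avail_out = output_len as u32;     // same cast as `output.len() as c_uint`
    // The high 32 bits are discarded, so bzip2 is told it has no room at all.
    assert_eq!(avail_out, 0);
    println!("avail_out = {}", avail_out);
}
```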
Patch
The following is my patch. The assignment of self.inner.raw.avail_out is modified: when output.len() exceeds the range of c_uint, avail_out is clamped to the maximum value of c_uint.
Other functions may contain the same problem.
/// Decompress a block of input into a block of output.
pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
    self.inner.raw.next_in = input.as_ptr() as *mut _;
    self.inner.raw.avail_in = input.len() as c_uint;
    self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
    self.inner.raw.avail_out = {
        let avail_out = output.len();
        if avail_out > c_uint::MAX as usize {
            // Clamp instead of truncating; the caller hands the rest of
            // the output buffer to the library on subsequent calls.
            c_uint::MAX
        } else {
            avail_out as c_uint
        }
    };
    unsafe {
        match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
            ffi::BZ_OK => Ok(Status::Ok),
            ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
            ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
            ffi::BZ_PARAM_ERROR => Err(Error::Param),
            ffi::BZ_DATA_ERROR => Err(Error::Data),
            ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
            ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
            c => panic!("wut: {}", c),
        }
    }
}
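The clamping logic can be factored out and exercised on its own. A sketch of the idea (clamp_to_c_uint is a hypothetical helper, not part of the bzip2 crate; u32 stands in for c_uint):

```rust
/// Clamp a usize length to what fits in a 32-bit c_uint instead of
/// silently truncating it; the remainder of the buffer is offered to
/// the C library on a later call.
fn clamp_to_c_uint(len: usize) -> u32 {
    if len > u32::MAX as usize {
        u32::MAX
    } else {
        len as u32
    }
}

fn main() {
    assert_eq!(clamp_to_c_uint(1024), 1024);
    assert_eq!(clamp_to_c_uint(u32::MAX as usize), u32::MAX);
    // The problematic 4 GiB case no longer becomes zero:
    assert_eq!(clamp_to_c_uint(0x1_0000_0000), u32::MAX);
    println!("ok");
}
```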