libbz2 (bzip2 compression) bindings for Rust

Overview

bzip2

Documentation

A streaming compression/decompression library for Rust with bindings to libbz2.

# Cargo.toml
[dependencies]
bzip2 = "0.4"

License

This project is licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this repository by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Stuck busy looping

    Stuck busy looping

First, please take this with a grain of salt and assume I might be doing something wrong, or that the data is corrupted.

I am doing long-running rdedup stress tests. I caught it hung, using 100% of one of the CPUs. I was able to attach gdb, and it turns out it is looping on a drop/finish in this code of mine:

      8     fn decompress(&self, buf: SGData) -> io::Result<SGData> {                                                       
      7         let mut decompressor =                                                                                      
      6             bzip2::write::BzDecoder::new(Vec::with_capacity(buf.len()));                                            
      5                                                                                                                     
      4         for sg_part in buf.as_parts() {                                                                             
      3             decompressor.write_all(&sg_part)?;                                                                      
      2         }                                                                                                           
      1         Ok(SGData::from_single(decompressor.finish()?))                                                             
    114     }
    

Note that 114 might be off, as I might have added/removed a line or two. I'm guessing it is the finish() call.

    #0  bzip2::write::{{impl}}::write<collections::vec::Vec<u8>> (self=0x7fff68365488, data=...)                            
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/write.rs:254                           
    #1  0x00005567a4127d20 in core::ptr::drop_in_place<bzip2::write::BzDecoder<collections::vec::Vec<u8>>> ()               
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/write.rs:219                           
    #2  0x00005567a4127952 in rdedup_lib::compression::{{impl}}::decompress (self=<optimized out>, buf=...)                 
        at lib/src/compression.rs:114                           
    #3  0x00005567a4153913 in rdedup_lib::{{impl}}::read_chunk_into (self=0x7fff6837fd38, digest=...,                       
        chunk_type=<optimized out>, data_type=rdedup_lib::DataType::Data, writer=...) at lib/src/lib.rs:352                 
    #4  0x00005567a4151807 in rdedup_lib::ReadContext::read_recursively (self=0x7fff68367aa0, accessor=..., digest=...)     
        at lib/src/lib.rs:262    
    

    If I keep doing next in my gdb, I get:

    core::ptr::drop_in_place<bzip2::write::BzDecoder<collections::vec::Vec<u8>>> ()                                         
        at /home/dpc/lab/rust/rdedup/lib/<try macros>:3         
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb)                         
    218             while !self.done {                          
    (gdb)                         
    219                 try!(self.write(&[]));                  
    (gdb)                         
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb)                         
    218             while !self.done {                          
    (gdb)                         
    219                 try!(self.write(&[]));                  
    (gdb) n                       
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb)                         
    218             while !self.done {                          
    (gdb)                         
    219                 try!(self.write(&[]));                  
    (gdb)                         
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb)                         
    218             while !self.done {                          
    (gdb)                         
    219                 try!(self.write(&[]));                  
    (gdb)                         
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.    
    

    If I do step, I get:

    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb) s                       
    218             while !self.done {                          
    (gdb)                         
    219                 try!(self.write(&[]));                  
    (gdb)                         
    bzip2::write::{{impl}}::write<collections::vec::Vec<u8>> (self=0x7fff68365488, data=...)                                
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/write.rs:254                           
    254             if self.done {
    (gdb)                         
    258                 try!(self.dump());                      
    (gdb)                         
    258                 try!(self.dump());                      
    (gdb)                         
    bzip2::write::BzDecoder<collections::vec::Vec<u8>>::dump<collections::vec::Vec<u8>> (self=0x7fff68365488)               
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/write.rs:200                           
    200             while self.buf.len() > 0 {                  
    (gdb)                         
    204             Ok(())        
    (gdb)                         
    205         }                 
    (gdb)                         
    bzip2::write::{{impl}}::write<collections::vec::Vec<u8>> (self=0x7fff68365488, data=...)                                
        at /home/dpc/lab/rust/rdedup/lib/<try macros>:3                                                                     
    3       /home/dpc/lab/rust/rdedup/lib/<try macros>: No such file or directory.                                          
    (gdb)                                                                                                                   
    267             (self.raw.total_in_lo32 as u64) |                                                                       
    (gdb)                                                                                                                   
    268             ((self.raw.total_in_hi32 as u64) << 32)                                                                 
    (gdb)                                                                                                                   
    261                 let res = self.data.decompress_vec(data, &mut self.buf);                                            
    (gdb)                                                                                                                   
    bzip2::mem::Decompress::decompress_vec (self=0x7fff68365488, input=..., output=0x7fff683654a8)                          
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/mem.rs:243                             
    243                 let before = self.total_out();                                                                      
    (gdb)                                                                                                                   
    bzip2::mem::Decompress::total_out (self=<optimized out>)                                                                
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/mem.rs:243                             
    243                 let before = self.total_out();                                                                      
    (gdb)                         
    bzip2::mem::Stream<bzip2::mem::DirDecompress>::total_out<bzip2::mem::DirDecompress> (self=<optimized out>)              
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/mem.rs:272                             
    272             (self.raw.total_out_lo32 as u64) |          
    (gdb)                         
    273             ((self.raw.total_out_hi32 as u64) << 32)    
    (gdb)                         
    bzip2::mem::Decompress::decompress_vec (self=0x7fff68365488, input=..., output=0x7fff683654a8)                          
        at /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/bzip2-0.3.2/src/mem.rs:245                             
    245                     let ptr = output.as_mut_ptr().offset(len as isize);                                             
    (gdb)                         
    core::ptr::{{impl}}::offset<u8> (count=<optimized out>, self=<optimized out>) at /checkout/src/libcore/ptr.rs:617       
    617     /checkout/src/libcore/ptr.rs: No such file or directory.                                                        
    (gdb)         
    (...)
    
    opened by dpc 11
  • Error trying to install crate bzip2-sys

    Error trying to install crate bzip2-sys

I'm trying to install the zip crate, which depends on bzip2, but I'm getting the following error.

    Using Rust nightly 1.31.0 ("2018 edition")

    Compiling bzip2-sys v0.1.6
    error: failed to run custom build command for `bzip2-sys v0.1.6`
    process didn't exit successfully: `C:\Dev\[project]\target\debug\build\bzip2-sys-33f8cb5c7dfcf9ea\build-script-build` (exit code: 101)
    --- stdout
    TARGET = Some("x86_64-pc-windows-msvc")
    OPT_LEVEL = Some("0")
    HOST = Some("x86_64-pc-windows-msvc")
    CC_x86_64-pc-windows-msvc = None
    CC_x86_64_pc_windows_msvc = None
    HOST_CC = None
    CC = Some("C:\\mingw-w64\\x86_64-7.3.0-posix-seh-rt_v5-rev0\\mingw64\\bin\\gcc.exe")
    CFLAGS_x86_64-pc-windows-msvc = None
    CFLAGS_x86_64_pc_windows_msvc = None
    HOST_CFLAGS = None
    CFLAGS = None
    DEBUG = Some("true")
    running: "C:\\mingw-w64\\x86_64-7.3.0-posix-seh-rt_v5-rev0\\mingw64\\bin\\gcc.exe" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "bzip2-1.0.6" "-D_WIN32" "-DBZ_EXPORT" "-DBZ_NO_STDIO" "/FoC:\\Dev\\[project]\\target\\debug\\build\\bzip2-sys-351245291296f0d0\\out\\bzip2-1.0.6\\blocksort.o" "/c" "bzip2-1.0.6/blocksort.c"
    cargo:warning=gcc.exe: error: /FoC:\Dev\[project]\target\debug\build\bzip2-sys-351245291296f0d0\out\bzip2-1.0.6\blocksort.o: Invalid argument
    cargo:warning=gcc.exe: error: /c: No such file or directory
    exit code: 1
    
    --- stderr
    thread 'main' panicked at '
    
    Internal error occurred: Command "C:\\mingw-w64\\x86_64-7.3.0-posix-seh-rt_v5-rev0\\mingw64\\bin\\gcc.exe" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "bzip2-1.0.6" "-D_WIN32" "-DBZ_EXPORT" "-DBZ_NO_STDIO" "/FoC:\\Dev\\[project]\\target\\debug\\build\\bzip2-sys-351245291296f0d0\\out\\bzip2-1.0.6\\blocksort.o" "/c" "bzip2-1.0.6/blocksort.c" with args "gcc.exe" did not execute successfully (status code exit code: 1).
    
    ', C:\Users\[user]\.cargo\registry\src\github.com-1ecc6299db9ec823\cc-1.0.25\src\lib.rs:2260:5
    note: Run with `RUST_BACKTRACE=1` for a backtrace.
    
    opened by woubuc 8
  • Filesize mismatch when decompressing multi-stream with sizes greater than 2GB (2^31)

    Filesize mismatch when decompressing multi-stream with sizes greater than 2GB (2^31)

Here is a self-contained reproducer of the problem: bzip_bug.zip

    To reproduce:

    unzip bzip_bug.zip
    cd bzip_bug
    cargo run --release --bin bzip_bug
    

    You will see the following output:

    $ cargo run --release --bin bzip_bug
       Compiling pkg-config v0.3.18
       Compiling libc v0.2.72
       Compiling cc v1.0.58
       Compiling bzip2-sys v0.1.9+1.0.8
       Compiling bzip2 v0.4.1
       Compiling bzip_bug v0.1.0 (/private/tmp/foo/bzip_bug)
        Finished release [optimized] target(s) in 9.08s
         Running `target/release/bzip_bug`
    Generating expected results
    Decompressing stream made with bzip2
    Decompressing stream made with pbzip2
    thread 'main' panicked at 'assertion failed: `(left == right)`
      left: `405900000`,
     right: `3000000000`: decompressed length mismatch', src/main.rs:37:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

The expected output is that the program completes successfully.

    The test has two files:

    1. raw.dat.bz2 is a 3GB generated file produced using the bzip2 tool on a mac
    2. raw.dat.pbz2 is the same 3GB generated file produced using the pbzip2 tool on a mac

    The data was generated using the generate.rs tool in the package, via the following commands:

    $ cargo build --release --all
    $ ./target/release/generate | nice bzip2 > raw.dat.bz2&
    $ ./target/release/generate | nice pbzip2 > raw.dat.pbz2&
    

    You can check the output byte counts using bzcat:

    $ bzcat raw.dat.bz2 | wc -c
      3000000000
    
    $ bzcat raw.dat.pbz2 | wc -c
     3000000000
    

The issue seems to affect files larger than 2^31 bytes (which smells like a u32 overflow somewhere).

    opened by alamb 6
  • Build failure at gcc level: blocksort.c not found

    Build failure at gcc level: blocksort.c not found

    bzip2-sys seems to refuse to build due to missing blocksort.c (it is indeed not present in the related build directory).

       Compiling bzip2-sys v0.1.6
    error: failed to run custom build command for `bzip2-sys v0.1.6`
    process didn't exit successfully: `c:\Local\poligon\extract_blobs\target\release\build\bzip2-sys-9937a6f77c12f052\build-script-build` (exit code: 101)
    --- stdout
    TARGET = Some("x86_64-pc-windows-gnu")
    OPT_LEVEL = Some("3")
    TARGET = Some("x86_64-pc-windows-gnu")
    HOST = Some("x86_64-pc-windows-gnu")
    TARGET = Some("x86_64-pc-windows-gnu")
    TARGET = Some("x86_64-pc-windows-gnu")
    HOST = Some("x86_64-pc-windows-gnu")
    CC_x86_64-pc-windows-gnu = Some("C:\\Local\\poligon\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\bin\\gcc.exe")
    TARGET = Some("x86_64-pc-windows-gnu")
    HOST = Some("x86_64-pc-windows-gnu")
    CFLAGS_x86_64-pc-windows-gnu = None
    CFLAGS_x86_64_pc_windows_gnu = None
    HOST_CFLAGS = None
    CFLAGS = None
    DEBUG = Some("false")
    running: "C:\\Local\\poligon\\.rustup\\toolchains\\nightly-x86_64-pc-windows-gnu\\lib\\rustlib\\x86_64-pc-windows-gnu\\bin\\gcc.exe" "-O3" "-ffunction-sections" "-fdata-sections" "-m64" "-I" "bzip2-1.0.6" "-D_WIN32" "-DBZ_EXPORT" "-DBZ_NO_STDIO" "-o" "c:\\Local\\poligon\\extract_blobs\\target\\release\\build\\bzip2-sys-f19fa19517d295e2\\out\\bzip2-1.0.6\\blocksort.o" "-c" "bzip2-1.0.6/blocksort.c"
    cargo:warning=gcc.exe: error: CreateProcess: No such file or directory
    exit code: 1
    

    I'm using rustc 1.28.0-nightly-gnu-x64 with Win 7.

    opened by ljedrz 6
  • BzCompressor fails on empty buffer?

    BzCompressor fails on empty buffer?

The Read implementation of BzCompressor can return an error when the underlying buffer returns zero bytes: https://github.com/alexcrichton/bzip2-rs/blob/master/src/reader.rs#L67.

    I believe this is now incorrect behaviour, as EOF is now signalled through a zero-byte read instead of an error. That's what I get from the docs, anyway: https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read
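The documented contract is easy to check with a plain slice reader (std only): a reader signals EOF by returning Ok(0), not an error.

```rust
use std::io::Read;

fn main() {
    // A &[u8] is a Read impl; EOF is reported as Ok(0), never as an error.
    let mut reader: &[u8] = b"abc";
    let mut buf = [0u8; 8];

    assert_eq!(reader.read(&mut buf).unwrap(), 3); // data available
    assert_eq!(reader.read(&mut buf).unwrap(), 0); // EOF: zero-byte read, Ok
}
```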

    opened by marcusklaas 5
  • Decompression stops silently on byte #900000

    Decompression stops silently on byte #900000

I got this using this file: https://database.lichess.org/atomic/lichess_db_atomic_rated_2015-01.pgn.bz2 (I checked at least one other file, and it also stopped at the 900000th byte). The following code:

    use std::fs::File;
    use std::io;
    
    fn by_bzip2() -> std::io::Result<()> {
        let compressed_file = File::open("lichess_db_atomic_rated_2015-01.pgn.bz2")?;
        let mut decompressed_output = File::create("decoded_by_bzip2.pgn")?;
    
        let mut reader = bzip2::read::BzDecoder::new(compressed_file);
        io::copy(&mut reader, &mut decompressed_output)?;
        Ok(())
    }
    
    fn by_bzip2_rs() -> std::io::Result<()> {
        let compressed_file = File::open("lichess_db_atomic_rated_2015-01.pgn.bz2")?;
        let mut decompressed_output = File::create("decoded_by_bzip2-rs.pgn")?;
    
        let mut reader = bzip2_rs::DecoderReader::new(compressed_file);
        io::copy(&mut reader, &mut decompressed_output)?;
        Ok(())
    }
    
    fn main() {
        by_bzip2().expect("bzip2");
        by_bzip2_rs().expect("bzip2-rs");
    }
    

returns 0 on exit and produces identical files decoded_by_bzip2.pgn and decoded_by_bzip2-rs.pgn, each exactly 900000 bytes long; they are a prefix of the much larger file that my system decoder (/usr/bin/bzip2 -dk lichess_db_atomic_rated_2015-01.pgn.bz2) produces. So decoding in both the bzip2 and bzip2-rs crates abruptly stops at this round number, 900000, without reporting any error. This looks like a bug in the crates rather than in the file, since the Linux utilities handle the file correctly.

    opened by ygyzys 4
  • correct multistream decompression EOF handling (fix #61)

    correct multistream decompression EOF handling (fix #61)

    Still missing a good test case for this. I don't think the reproducer in #61 is a good candidate as it takes a long time to run. I should be able to reduce it to a smaller test in the next few days.

    opened by afflux 4
  • Fix compilation on Wasm

    Fix compilation on Wasm

    Fix compilation on WebAssembly, as well as any other non-Windows/Unix/Redox targets.

    Previously, the functions would be undefined on those targets and the crate would fail to compile with errors like:

    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzCompressInit` in crate `ffi`
       --> src/mem.rs:124:22
        |
    124 |                 ffi::BZ2_bzCompressInit(&mut *raw, lvl.level() as c_int, 0, work_factor as c_int),
        |                      ^^^^^^^^^^^^^^^^^^ not found in `ffi`
    
    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzCompress` in crate `ffi`
       --> src/mem.rs:157:24
        |
    157 |             match ffi::BZ2_bzCompress(&mut *self.inner.raw, action as c_int) {
        |                        ^^^^^^^^^^^^^^ not found in `ffi`
    
    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzDecompressInit` in crate `ffi`
       --> src/mem.rs:215:29
        |
    215 |             assert_eq!(ffi::BZ2_bzDecompressInit(&mut *raw, 0, small as c_int), 0);
        |                             ^^^^^^^^^^^^^^^^^^^^ not found in `ffi`
    
    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzDecompress` in crate `ffi`
       --> src/mem.rs:232:24
        |
    232 |             match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
        |                        ^^^^^^^^^^^^^^^^ not found in `ffi`
    
    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzCompressEnd` in crate `ffi`
       --> src/mem.rs:312:14
        |
    312 |         ffi::BZ2_bzCompressEnd(stream)
        |              ^^^^^^^^^^^^^^^^^ not found in `ffi`
    
    error[E0425]: cannot find function, tuple struct or tuple variant `BZ2_bzDecompressEnd` in crate `ffi`
       --> src/mem.rs:317:14
        |
    317 |         ffi::BZ2_bzDecompressEnd(stream)
        |              ^^^^^^^^^^^^^^^^^^^ not found in `ffi`
    
    opened by RReverser 4
  • [bzip2-sys] use the system bzip2 if found via pkg-config

    [bzip2-sys] use the system bzip2 if found via pkg-config

    Similar approach to libz-sys: https://github.com/rust-lang/libz-sys/blob/master/build.rs

    Signed-off-by: Michel Alexandre Salim [email protected]



    opened by michel-slm 4
The bzip2-sys crate on crates.io seems inconsistent with the git version; please fix

The bzip2-sys crate on crates.io seems inconsistent with the git version; please fix

This is the build.rs published on crates.io as version 0.1.6:

    extern crate cc;
    
    use std::env;
    
    fn main() {
        let mut cfg = cc::Build::new();
        cfg.warnings(false);
    
        if env::var("TARGET").unwrap().contains("windows") {
            cfg.define("_WIN32", None);
            cfg.define("BZ_EXPORT", None);
        }
    
        cfg.include("bzip2-1.0.6")
           .define("BZ_NO_STDIO", None)
           .file("bzip2-1.0.6/blocksort.c")
           .file("bzip2-1.0.6/huffman.c")
           .file("bzip2-1.0.6/crctable.c")
           .file("bzip2-1.0.6/randtable.c")
           .file("bzip2-1.0.6/compress.c")
           .file("bzip2-1.0.6/decompress.c")
           .file("bzip2-1.0.6/bzlib.c")
           .compile("libbz2.a");
    }
    

Because of this, the header bzlib.h does not exist in the output directory.

    link rust-bio/rust-htslib#115 here.

    opened by zitsen 4
  • Update partial-io requirement to ^0.3.0

    Update partial-io requirement to ^0.3.0

    Updates the requirements on partial-io to permit the latest version.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

    Dependabot will not automatically merge this PR because this dependency is pre-1.0.0.



    dependencies 
    opened by dependabot-preview[bot] 4
  • Patched an infinite loop bug in src/mem.rs, impl Decompress::decompress()

    Patched an infinite loop bug in src/mem.rs, impl Decompress::decompress()

    Hello!

    Version

    stable-x86_64-unknown-linux-gnu (default)
    rustc 1.66.0 (69f9c33d7 2022-12-12)
    

    Problem

I came across an infinite loop bug when trying to unzip an ASCII file larger than 4 GB.

    The following code is an example of my usage:

        pub fn from_zip_file(file_path: String) -> Trace {
            let zip_file =
                fs::File::open(&file_path).expect(&format!("Could not open file {}", &file_path));
            let mut zip_archive = zip::ZipArchive::new(zip_file)
                .expect(&format!("Could not open archive {}", &file_path));
    
            println!("Trace::from_zip_file: unzipping {}", &file_path);
            let mut trace_file = zip_archive.by_index(0).unwrap();
            let trace_file_path = trace_file.sanitized_name().to_str().unwrap().to_string();
            println!("Trace::from_zip_file: unzipped {}", &file_path);
    
            let mut trace_content = String::new();
            trace_file
                .read_to_string(&mut trace_content)
                .expect(&format!("Could not read unzipped file {}", trace_file_path));
            println!("Trace::from_zip_file: read_to_string {}", &file_path);
        }
    

My program got stuck at trace_file.read_to_string(). To investigate, I analyzed the code carefully and found the root cause in the bzip2 Rust library.

    Root Cause

The function read_to_string() indirectly invokes std::io::default_read_to_end(). The following is the source of std::io::default_read_to_end():

    // This uses an adaptive system to extend the vector when it fills. We want to
    // avoid paying to allocate and zero a huge chunk of memory if the reader only
    // has 4 bytes while still making large reads if the reader does have a ton
    // of data to return. Simply tacking on an extra DEFAULT_BUF_SIZE space every
    // time is 4,500 times (!) slower than a default reservation size of 32 if the
    // reader has a very small amount of data to return.
    pub(crate) fn default_read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> {
        let start_len = buf.len();
        let start_cap = buf.capacity();
    
        let mut initialized = 0; // Extra initialized bytes from previous loop iteration
        loop {
            if buf.len() == buf.capacity() {
                buf.reserve(32); // buf is full, need more space
            }
    
            let mut read_buf: BorrowedBuf<'_> = buf.spare_capacity_mut().into();
    
            // SAFETY: These bytes were initialized but not filled in the previous loop
            unsafe {
                read_buf.set_init(initialized);
            }
    
            let mut cursor = read_buf.unfilled();
            match r.read_buf(cursor.reborrow()) {
                Ok(()) => {}
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
    
            if cursor.written() == 0 {
                return Ok(buf.len() - start_len);
            }
    
            // store how much was initialized but not filled
            initialized = cursor.init_ref().len();
    
            // SAFETY: BorrowedBuf's invariants mean this much memory is initialized.
            unsafe {
                let new_len = read_buf.filled().len() + buf.len();
                buf.set_len(new_len);
            }
    
            if buf.len() == buf.capacity() && buf.capacity() == start_cap {
                // The buffer might be an exact fit. Let's read into a probe buffer
                // and see if it returns `Ok(0)`. If so, we've avoided an
                // unnecessary doubling of the capacity. But if not, append the
                // probe buffer to the primary buffer and let its capacity grow.
                let mut probe = [0u8; 32];
    
                loop {
                    match r.read(&mut probe) {
                        Ok(0) => return Ok(buf.len() - start_len),
                        Ok(n) => {
                            buf.extend_from_slice(&probe[..n]);
                            break;
                        }
                        Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                        Err(e) => return Err(e),
                    }
                }
            }
        }
    }
    

The stdlib function std::io::default_read_to_end() doubles the buffer's capacity every time it fills up. As the buffer's initial capacity is 32, the capacity of the Vec buf will always be a power of two.
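The power-of-two growth can be observed directly; a std-only sketch (this depends on the current standard library's Vec growth strategy, so treat it as an observation rather than a guarantee):

```rust
fn main() {
    let mut buf: Vec<u8> = Vec::new();
    let mut caps: Vec<usize> = Vec::new();
    for _ in 0..200 {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // the same growth call default_read_to_end makes
            caps.push(buf.capacity());
        }
        buf.push(0);
    }
    // With amortized doubling and an initial reservation of 32, every
    // observed capacity is a power of two (32, 64, 128, ...).
    assert!(!caps.is_empty());
    assert!(caps.iter().all(|c| c.is_power_of_two()));
}
```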

My ASCII file is a little larger than 4 GB (0x1_0000_0000, i.e. 4294967296 bytes), so the capacity of the buffer gets extended to 8 GB (0x2_0000_0000, 8589934592).

In default_read_to_end(), read_buf (a BorrowedBuf) is borrowed from the original buffer and covers the spare space in buf.

There comes a point at which read_buf.len() is exactly 0x1_0000_0000 while buf's capacity is 0x2_0000_0000: the first half of buf is filled with decompressed data and the second half is spare space waiting to be filled.

    The function call r.read_buf(cursor.reborrow()) indirectly calls Decompress::decompress(), whose source is listed below:

        /// Decompress a block of input into a block of output.
        pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
            self.inner.raw.next_in = input.as_ptr() as *mut _;
            self.inner.raw.avail_in = input.len() as c_uint;
            self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
            self.inner.raw.avail_out = output.len() as c_uint;
            unsafe {
                match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                    ffi::BZ_OK => Ok(Status::Ok),
                    ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                    ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                    ffi::BZ_PARAM_ERROR => Err(Error::Param),
                    ffi::BZ_DATA_ERROR => Err(Error::Data),
                    ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                    ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                    c => panic!("wut: {}", c),
                }
            }
        }
    

    At this point, the length of output is 0x1_0000_0000.

When output.len() is cast to c_uint (a 32-bit unsigned integer), its high 32 bits are lost, so avail_out is zero when the C decompression function is invoked.

In that case, no data is extracted and the function returns immediately. However, the caller keeps asking it to decompress the remaining data, so an endless loop occurs.
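The truncation is easy to reproduce with std alone; a sketch assuming a 64-bit target (where usize is wider than c_uint):

```rust
fn main() {
    // On a 64-bit target, the spare capacity handed to decompress() can
    // legally be 4 GiB, whose low 32 bits are all zero.
    let output_len: usize = 0x1_0000_0000;

    // A plain `as` cast keeps only the low 32 bits...
    assert_eq!(output_len as u32, 0);

    // ...so avail_out becomes 0 and BZ2_bzDecompress can make no progress.
    // Saturating at u32::MAX instead always leaves the C side a valid size.
    let avail_out = u32::try_from(output_len).unwrap_or(u32::MAX);
    assert_eq!(avail_out, u32::MAX);
}
```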

    Patch

The following is my patch. The assignment of self.inner.raw.avail_out is modified: when output.len() exceeds the range of c_uint, avail_out is set to the maximum value of c_uint.

Other functions may contain the same problem.

        /// Decompress a block of input into a block of output.
        pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
            self.inner.raw.next_in = input.as_ptr() as *mut _;
            self.inner.raw.avail_in = input.len() as c_uint;
            self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
            self.inner.raw.avail_out = {
                let avail_out = output.len();
                if (avail_out > 0) && (avail_out & c_uint::MAX as usize == 0) {
                    c_uint::MAX
                } else {
                    avail_out as c_uint
                }
            };
            unsafe {
                match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                    ffi::BZ_OK => Ok(Status::Ok),
                    ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                    ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                    ffi::BZ_PARAM_ERROR => Err(Error::Param),
                    ffi::BZ_DATA_ERROR => Err(Error::Data),
                    ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                    ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                    c => panic!("wut: {}", c),
                }
            }
        }
    
    opened by bjrjk 0
  • Way to know how many compressed bytes were read when decompressing

    This is more for user feedback than anything: I was looking for a way to know how many bytes of the compressed source have been read, rather than the number of decompressed bytes, so that I can compute accurate ETAs.

    For example:

    let input = File::open(input.expect("Could not get path"))?;
    let size = input.metadata()?.len();
    let reader = BufReader::new(input);
    let mut md = MultiBzDecoder::new(reader);
    let mut buffer = [0; BUFFER_LENGTH];
    let mut n = md.read(&mut buffer)?;
    

    n appears to be the number of uncompressed bytes read. How do I know how far along in the file I am, and how that compares to size?
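One workaround, assuming nothing beyond std::io::Read, is to wrap the underlying reader in a byte-counting adapter before handing it to MultiBzDecoder (CountingReader is a hypothetical name, not part of bzip2-rs):

```rust
use std::io::{self, Read};

/// Hypothetical adapter that counts compressed bytes as the
/// decoder pulls them from the underlying reader.
struct CountingReader<R> {
    inner: R,
    bytes_read: u64,
}

impl<R> CountingReader<R> {
    fn new(inner: R) -> Self {
        CountingReader { inner, bytes_read: 0 }
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n as u64;
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    // Demonstrate with an in-memory source; in the real program this
    // would wrap the File and be passed to MultiBzDecoder::new.
    let data = vec![0u8; 1024];
    let mut reader = CountingReader::new(io::Cursor::new(data));
    let mut buf = [0u8; 300];
    reader.read(&mut buf)?;
    assert_eq!(reader.bytes_read, 300);
    Ok(())
}
```

Progress is then reader.bytes_read compared against size from the file metadata. read::BzDecoder also exposes total_in() for compressed bytes consumed, though I haven't checked whether MultiBzDecoder offers the same.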

    opened by alexgagnon 0
  • Bzip2-sys does not support the value 0 for blockSize100k

    We are directly passing Compression::level to BZ2_bzCompressInit

    https://github.com/alexcrichton/bzip2-rs/blob/016e18155ef7c05983ea244cae1344c5b68defd8/src/mem.rs#L123-L126

    However, this API only supports values from 1..=9 as documented here in the sources

    Parameter blockSize100k specifies the block size to be used for compression. It should be a value between 1 and 9 inclusive, and the actual block size used is 100000 x this figure. 9 gives the best compression but takes most memory.

    This bug seems to have been introduced in https://github.com/alexcrichton/bzip2-rs/pull/37, where we tried to support an API aligned with flate2::Compression. However, our version of Compression::none will always panic on the above assertion with BZ_PARAM_ERROR

    https://github.com/alexcrichton/bzip2-rs/blob/016e18155ef7c05983ea244cae1344c5b68defd8/src/lib.rs#L100-L102

    References

    We had a panic in usage of zip over here :) https://github.com/zip-rs/zip/issues/326

    Suggestions

    As the library doesn't appear to support disabling compression, I would prefer none to be deprecated and a bounds check to be added to Compression::new. Hopefully there's a clean way to provide the equivalent of flate2's none, and we can specifically branch on 0?
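A minimal sketch of the suggested bounds check (hypothetical code; the real Compression type lives in bzip2-rs, this just mirrors its shape):

```rust
/// Sketch of a level wrapper mirroring bzip2-rs's Compression.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Compression(u32);

impl Compression {
    /// libbz2's blockSize100k only accepts 1..=9, so reject
    /// out-of-range levels up front instead of failing later in
    /// BZ2_bzCompressInit with BZ_PARAM_ERROR.
    pub fn new(level: u32) -> Compression {
        assert!(
            (1..=9).contains(&level),
            "bzip2 compression level must be in 1..=9, got {}",
            level
        );
        Compression(level)
    }

    pub fn level(self) -> u32 {
        self.0
    }
}

fn main() {
    assert_eq!(Compression::new(9).level(), 9);
    // Compression::new(0) would panic here with a clear message
    // rather than deep inside libbz2.
}
```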

    opened by Plecra 0
  • build android fail

    build commands:

        RUSTFLAGS='-C strip=symbols' cargo build -p 'lib_project' --release --target aarch64-linux-android
        RUSTFLAGS='-C strip=symbols' cargo build -p 'lib_project' --release --target armv7-linux-androideabi

    config file:

        [target.aarch64-linux-android]
        ar = "NDK/aarch64/bin/aarch64-linux-android-ar"
        linker = "NDK/aarch64/bin/aarch64-linux-android22-clang"

        [target.armv7-linux-androideabi]
        ar = "NDK/armv7/bin/arm-linux-androideabi-ar"
        linker = "NDK/armv7/bin/armv7a-linux-androideabi22-clang"

    code:

    /// bzip2 compress
    fn compress_zip(bytes: &Vec<u8>, level: u32) -> anyhow::Result<Vec<u8>> {
        use bzip2::write::BzEncoder;
        use bzip2::Compression;
        use std::io::Write;
        let mut compressor = BzEncoder::new(vec![0x1, 0, 0, 0, 0], Compression::new(level.min(9)));
        compressor.write_all(bytes)?;
        let result = compressor.finish()?;
        Ok(result)
    }
    
    /// bzip2 decompress a byte buffer
    fn depress_zip(bytes: &[u8], size: usize) -> anyhow::Result<Vec<u8>>{
        let mut decompressor = bzip2::read::BzDecoder::new(std::io::Cursor::new(bytes));
        let mut out_buffer = Vec::with_capacity(size);
        std::io::copy(&mut decompressor, &mut out_buffer)?;
        Ok(out_buffer)
    }
    

    error msg:

    error: failed to run custom build command for bzip2-sys v0.1.11+1.0.8

    Caused by: process didn't exit successfully: /home/banagame/BattleServer/server/Rust/target/release/build/bzip2-sys-5c05721eef23dbb9/build-script-build (exit status: 1) --- stdout cargo:rerun-if-env-changed=BZIP2_NO_PKG_CONFIG cargo:rerun-if-env-changed=PKG_CONFIG_ALLOW_CROSS_armv7-linux-androideabi cargo:rerun-if-env-changed=PKG_CONFIG_ALLOW_CROSS_armv7_linux_androideabi cargo:rerun-if-env-changed=TARGET_PKG_CONFIG_ALLOW_CROSS cargo:rerun-if-env-changed=PKG_CONFIG_ALLOW_CROSS cargo:rerun-if-env-changed=PKG_CONFIG_armv7-linux-androideabi cargo:rerun-if-env-changed=PKG_CONFIG_armv7_linux_androideabi cargo:rerun-if-env-changed=TARGET_PKG_CONFIG cargo:rerun-if-env-changed=PKG_CONFIG cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_armv7-linux-androideabi cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_armv7_linux_androideabi cargo:rerun-if-env-changed=TARGET_PKG_CONFIG_SYSROOT_DIR cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR TARGET = Some("armv7-linux-androideabi") OPT_LEVEL = Some("3") HOST = Some("x86_64-unknown-linux-gnu") CC_armv7-linux-androideabi = None CC_armv7_linux_androideabi = None TARGET_CC = None CC = None CFLAGS_armv7-linux-androideabi = None CFLAGS_armv7_linux_androideabi = None TARGET_CFLAGS = None CFLAGS = None CRATE_CC_NO_DEFAULTS = None DEBUG = Some("false") running: "arm-linux-androideabi-clang" "-O3" "-DANDROID" "-ffunction-sections" "-fdata-sections" "-fPIC" "-I" "bzip2-1.0.8" "-D_FILE_OFFSET_BITS=64" "-DBZ_NO_STDIO" "-o" "/home/banagame/BattleServer/server/Rust/target/armv7-linux-androideabi/release/build/bzip2-sys-16baeea8c12f7af8/out/lib/bzip2-1.0.8/blocksort.o" "-c" "bzip2-1.0.8/blocksort.c"

    --- stderr

    error occurred: Failed to find tool. Is arm-linux-androideabi-clang installed?

    opened by DrYaling 0
  • bz_internal_error exported

    I develop rpm-sequoia, which is a Rust crate that implements rpm's OpenPGP API in terms of Sequoia PGP. (More details, unrelated to this issue, are described in this rpm issue).

    rpm-sequoia should only export the rpm's PGP API. Unfortunately, a number of additional symbols are leaked by dependencies. Specifically, any symbols that have the #[no_mangle] attribute appear to be exported. bzip2-sys marks bz_internal_error like this, and thus it is exported by the library:

    $ nm --defined-only --extern-only /tmp/rpm-sequoia/debug/librpm_sequoia.so | grep bz
    0000000000361fa0 T bz_internal_error
    

    It turns out that rpm-sequoia doesn't actually need compression support, so I was able to work around this issue by disabling that feature. But this issue may trip up others.
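For reference, the declaration in bzip2-sys has roughly this shape (a sketch, not the exact source); anything marked #[no_mangle] keeps its unmangled name and lands in the cdylib's dynamic symbol table:

```rust
use std::os::raw::c_int;

// `#[no_mangle] pub extern "C"` prevents name mangling and keeps the
// symbol exported, which is why `nm` shows it in any cdylib that
// links bzip2-sys.
#[no_mangle]
pub extern "C" fn bz_internal_error(errcode: c_int) {
    panic!("bz internal error: {}", errcode);
}

fn main() {
    // Referencing the function is enough to keep it in the binary.
    let f: extern "C" fn(c_int) = bz_internal_error;
    assert!(f as usize != 0);
}
```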

    opened by nwalfield 1
  • Implement tokio::io::AsyncBufRead for bufread::* types

    Tokio not only provides the AsyncRead trait (mirroring std::io::Read but for async) but also AsyncBufRead (mirroring std::io::BufRead).

    It would be great if it was implemented for the types in the bufread module if the tokio feature flag is enabled. Right now it appears that only AsyncRead is implemented in this case.

    opened by d-e-s-o 1
Owner
Alex Crichton
Pure Rust bzip2 decoder

bzip2-rs Pure Rust 100% safe bzip2 decompressor. Features Default features: Rust >= 1.34.2 is supported rustc_1_37: bump MSRV to 1.37, enable more opt

Paolo Barbolini 36 Jan 6, 2023
Ribzip2 - A bzip2 implementation in pure Rust.

ribzip2 - a comprehensible bzip2 implementation ribzip2 is command line utility providing bzip2 compression and decompression written in pure Rust. It

null 16 Oct 24, 2022
banzai: pure rust bzip2 encoder

banzai banzai is a bzip2 encoder with linear-time complexity, written entirely in safe Rust. It is currently alpha software, which means that it is no

Jack Byrne 27 Oct 24, 2022
lzlib (lzip compression) bindings for Rust

lzip Documentation A streaming compression/decompression library for rust with bindings to lzlib. # Cargo.toml [dependencies] lzip = "0.1" License Lic

Firas Khalil Khana 8 Sep 20, 2022
A Rust implementation of the Zopfli compression algorithm.

Zopfli in Rust This is a reimplementation of the Zopfli compression tool in Rust. I have totally ignored zopflipng. More info about why and how I did

Carol (Nichols || Goulding) 76 Oct 20, 2022
Like pigz, but rust - a cross platform, fast, compression and decompression tool.

?? crabz Like pigz, but rust. A cross platform, fast, compression and decompression tool. Synopsis This is currently a proof of concept CLI tool using

Seth 232 Jan 2, 2023
A reimplementation of the Zopfli compression tool in Rust.

Zopfli in Rust This is a reimplementation of the Zopfli compression tool in Rust. Carol Nichols started the Rust implementation as an experiment in in

null 11 Dec 26, 2022
Basic (and naïve) LZW and Huffman compression algorithms in Rust.

Naive implementation of the LZW and Huffman compression algorithms. To run, install the Rust toolchain. Cargo may be used to compile the source. Examp

Luiz Felipe Gonçalves 9 May 22, 2023
(WIP) Taking the pain away from file (de)compression

Ouch! ouch loosely stands for Obvious Unified Compression files Helper and aims to be an easy and intuitive way of compressing and decompressing files

Vinícius Miguel 734 Dec 30, 2022
gzp - Multi-threaded Compression

gzp - Multi-threaded Compression

Seth 123 Dec 28, 2022
Fastest Snappy compression library in Node.js

snappy !!! For [email protected] and below, please go to node-snappy. More background about the 6-7 changes, please read this, Thanks @kesla . ?? Help me to

LongYinan 103 Jan 2, 2023
Michael's Compression Algorithm

mca This repository contains a compression algorithm written by me (Michael Grigoryan). The algorithm is only able to compress and decompress text fil

Michael Grigoryan 1 Dec 19, 2022
Obvious Unified Compression Helper is a CLI tool to help you compress and decompress files of several formats

Ouch! ouch stands for Obvious Unified Compression Helper and is a CLI tool to help you compress and decompress files of several formats. Features Usag

null 734 Dec 30, 2022
DEFLATE, gzip, and zlib bindings for Rust

flate2 A streaming compression/decompression library DEFLATE-based streams in Rust. This crate by default uses the miniz_oxide crate, a port of miniz.

The Rust Programming Language 619 Jan 8, 2023
Snappy bindings for Rust

Snappy [ Originally forked from https://github.com/thestinger/rust-snappy ] Documentation Usage Add this to your Cargo.toml: [dependencies] snappy = "

Jeff Belgum 14 Jan 21, 2022
A simple rust library to read and write Zip archives, which is also my pet project for learning Rust

rust-zip A simple rust library to read and write Zip archives, which is also my pet project for learning Rust. At the moment you can list the files in

Kang Seonghoon 2 Jan 5, 2022
A Brotli implementation in pure and safe Rust

Brotli-rs - Brotli decompression in pure, safe Rust Documentation Compression provides a <Read>-struct to wrap a Brotli-compressed stream. A consumer

Thomas Pickert 59 Oct 7, 2022
Brotli compressor and decompressor written in rust that optionally avoids the stdlib

rust-brotli What's new in 3.2 into_inner conversions for both Reader and Writer classes What's new in 3.0 A fully compatible FFI for drop-in compatibi

Dropbox 659 Dec 29, 2022
Tar file reading/writing for Rust

tar-rs Documentation A tar archive reading/writing library for Rust. # Cargo.toml [dependencies] tar = "0.4" Reading an archive extern crate tar; use

Alex Crichton 490 Dec 30, 2022