Like pigz, but rust - a cross platform, fast, compression and decompression tool.

Related tags

Compression crabz
Overview

🦀 crabz

Build Status license Version info
Like pigz, but rust.

A cross platform, fast, compression and decompression tool.

Synopsis

This is currently a proof of concept CLI tool using the gzp crate.

Supported formats:

  • Gzip
  • Zlib
  • Mgzip
  • BGZF
  • Raw Deflate
  • Snap

Install

  • Homebrew / Linuxbrew
brew tap sstadick/crabz
brew install crabz
  • Debian (Ubuntu)
curl -LO https://github.com/sstadick/crabz/releases/download/
   
    /crabz-linux-amd64.deb
sudo dpkg -i crabz-linux-amd64.deb

   
  • Cargo
cargo install crabz
  • Conda (coming soon!)
conda install -c conda-forge crabz

Usage

Input file to read from, empty or "-" to read from stdin ">
❯ crabz -h              
Seth Stadick
Compress and decompress files

USAGE:
    crabz [FLAGS] [OPTIONS] [FILE]

FLAGS:
    -d, --decompress    Flag to switch to decompressing inputs. Note: this flag may change in future releases
    -h, --help          Prints help information
    -V, --version       Prints version information

OPTIONS:
    -l, --compression-level 
   
            Compression level [default: 6]
    -p, --compression-threads 
    
     
            Number of compression threads to use, or if decompressing a format that allow for multi-threaded
            decompression, the number to use. Note that > 4 threads for decompression doesn't seem to help [default:
            32]
    -f, --format 
     
      
            The format to use [default: gzip]  [possible values: gzip, bgzf, mgzip,
            zlib, deflate, snap]
    -o, --output 
                                    Output path to write to, empty or "-" to write to stdout

ARGS:
    
       
         Input file to read from, empty or "-" to read from stdin 
       
     
    
   

Benchmarks

These benchmarks use the data in bench-data catted together 100 times. Run with bash ./benchmark.sh data.txt.

Benchmark system specs: Ubuntu 20 AMD Ryzen 9 3950X 16-Core Processor w/ 64 GB DDR4 memory and 1TB NVMe Drive

pigz v2.4 installed via apt on Ubuntu

Takeaways:

  • crabz with zlib backend is pretty much identical to pigz
  • crabz with zlib-ng backend is roughly 30-50% faster than pigz
  • crabz with rust backend is roughly 5-10% faster than pigz

It is already known that zlib-ng is faster than zlib, so none of this is groundbreaking. However, I think crabz gets an an edge due to the following:

  • crabz with deflate_rust backend is using all Rust only code, which is in theory more secure / safe.
  • crabz with zlib-ng is easier to install than pigz with a zlib-ng backend
  • crabz supports more formats than pigz
  • crabz is cross platform and can run on windows

With regards to block formats like Mgzip and BGZF, crabz is using libdeflater by default which excels at compressing and decompression known-sized blocks. This makes block compression formats very fast at a small loss to the compression ratio.

Comparing crabz against tools like bgzip, which also defaults to libdeflater as a backend shows them within a few percent of eachother.

As crabz is just a wrapper for the gzp library, the most exciting thing about these benchmarks is that gzp is on par with best in class CLI tools for multi-threaded compression and decompression as a library.

Flate2 zlib-ng backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 6.450 ± 0.069 6.328 6.540 16.86 ± 0.24
pigz -p 1 -3 < ./data.txt 11.404 ± 0.152 11.186 11.717 29.81 ± 0.49
crabz -p 2 -c 3 < ./data.txt 3.437 ± 0.017 3.418 3.461 8.98 ± 0.10
pigz -p 2 -3 < ./data.txt 5.868 ± 0.031 5.826 5.927 15.34 ± 0.17
crabz -p 4 -c 3 < ./data.txt 1.741 ± 0.008 1.729 1.752 4.55 ± 0.05
pigz -p 4 -3 < ./data.txt 2.952 ± 0.008 2.939 2.960 7.72 ± 0.08
crabz -p 8 -c 3 < ./data.txt 0.889 ± 0.004 0.882 0.895 2.32 ± 0.02
pigz -p 8 -3 < ./data.txt 1.505 ± 0.008 1.493 1.520 3.93 ± 0.04
crabz -p 16 -c 3 < ./data.txt 0.485 ± 0.014 0.457 0.502 1.27 ± 0.04
pigz -p 16 -3 < ./data.txt 0.775 ± 0.011 0.764 0.797 2.02 ± 0.04
crabz -p 32 -c 3 < ./data.txt 0.383 ± 0.004 0.375 0.388 1.00
pigz -p 32 -3 < ./data.txt 0.699 ± 0.029 0.668 0.770 1.83 ± 0.08
crabz -p 1 -c 6 < ./data.txt 10.367 ± 0.211 10.106 10.642 27.10 ± 0.61
pigz -p 1 -6 < ./data.txt 26.734 ± 0.345 26.234 27.135 69.89 ± 1.12
crabz -p 2 -c 6 < ./data.txt 5.366 ± 0.036 5.299 5.429 14.03 ± 0.16
pigz -p 2 -6 < ./data.txt 13.589 ± 0.083 13.428 13.679 35.52 ± 0.40
crabz -p 4 -c 6 < ./data.txt 2.719 ± 0.021 2.694 2.757 7.11 ± 0.09
pigz -p 4 -6 < ./data.txt 6.887 ± 0.013 6.871 6.916 18.00 ± 0.17
crabz -p 8 -c 6 < ./data.txt 1.381 ± 0.007 1.372 1.397 3.61 ± 0.04
pigz -p 8 -6 < ./data.txt 3.479 ± 0.008 3.463 3.488 9.09 ± 0.09
crabz -p 16 -c 6 < ./data.txt 0.745 ± 0.022 0.727 0.804 1.95 ± 0.06
pigz -p 16 -6 < ./data.txt 1.818 ± 0.036 1.765 1.874 4.75 ± 0.10
crabz -p 32 -c 6 < ./data.txt 0.549 ± 0.006 0.538 0.557 1.44 ± 0.02
pigz -p 32 -6 < ./data.txt 1.187 ± 0.011 1.172 1.210 3.10 ± 0.04
crabz -p 1 -c 9 < ./data.txt 30.114 ± 0.196 29.842 30.420 78.72 ± 0.90
pigz -p 1 -9 < ./data.txt 51.369 ± 0.164 51.246 51.698 134.29 ± 1.33
crabz -p 2 -c 9 < ./data.txt 15.371 ± 0.070 15.202 15.443 40.18 ± 0.42
pigz -p 2 -9 < ./data.txt 26.452 ± 0.085 26.253 26.576 69.15 ± 0.69
crabz -p 4 -c 9 < ./data.txt 7.729 ± 0.022 7.699 7.768 20.20 ± 0.20
pigz -p 4 -9 < ./data.txt 13.365 ± 0.047 13.271 13.449 34.94 ± 0.35
crabz -p 8 -c 9 < ./data.txt 3.901 ± 0.006 3.889 3.910 10.20 ± 0.10
pigz -p 8 -9 < ./data.txt 6.749 ± 0.014 6.737 6.781 17.64 ± 0.17
crabz -p 16 -c 9 < ./data.txt 2.039 ± 0.024 1.997 2.071 5.33 ± 0.08
pigz -p 16 -9 < ./data.txt 3.486 ± 0.054 3.426 3.574 9.11 ± 0.17
crabz -p 32 -c 9 < ./data.txt 1.337 ± 0.072 1.220 1.411 3.49 ± 0.19
pigz -p 32 -9 < ./data.txt 2.203 ± 0.114 2.082 2.378 5.76 ± 0.30

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -d < ./data.3.txt.gz 1.422 ± 0.010 1.411 1.437 1.03 ± 0.02
pigz -d < ./data.3.txt.gz 1.674 ± 0.031 1.621 1.705 1.21 ± 0.03
crabz -d < ./data.6.txt.gz 1.403 ± 0.016 1.389 1.427 1.01 ± 0.02
pigz -d < ./data.6.txt.gz 1.724 ± 0.026 1.697 1.766 1.24 ± 0.02
crabz -d < ./data.9.txt.gz 1.385 ± 0.018 1.359 1.416 1.00
pigz -d < ./data.9.txt.gz 1.745 ± 0.044 1.684 1.797 1.26 ± 0.04

Flate2 zlib backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 11.248 ± 0.247 11.085 11.532 20.23 ± 0.45
pigz -p 1 -3 < ./data.txt 11.296 ± 0.171 11.104 11.434 20.32 ± 0.31
crabz -p 2 -c 3 < ./data.txt 5.681 ± 0.040 5.645 5.725 10.22 ± 0.08
pigz -p 2 -3 < ./data.txt 5.926 ± 0.015 5.916 5.944 10.66 ± 0.04
crabz -p 4 -c 3 < ./data.txt 2.891 ± 0.007 2.883 2.895 5.20 ± 0.02
pigz -p 4 -3 < ./data.txt 2.966 ± 0.013 2.955 2.980 5.34 ± 0.03
crabz -p 8 -c 3 < ./data.txt 1.461 ± 0.003 1.459 1.465 2.63 ± 0.01
pigz -p 8 -3 < ./data.txt 1.509 ± 0.004 1.505 1.512 2.71 ± 0.01
crabz -p 16 -c 3 < ./data.txt 0.784 ± 0.010 0.775 0.795 1.41 ± 0.02
pigz -p 16 -3 < ./data.txt 0.772 ± 0.010 0.765 0.784 1.39 ± 0.02
crabz -p 32 -c 3 < ./data.txt 0.556 ± 0.002 0.554 0.557 1.00
pigz -p 32 -3 < ./data.txt 0.743 ± 0.047 0.694 0.786 1.34 ± 0.08
crabz -p 1 -c 6 < ./data.txt 26.366 ± 0.154 26.189 26.469 47.42 ± 0.31
pigz -p 1 -6 < ./data.txt 26.688 ± 0.103 26.579 26.783 48.00 ± 0.23
crabz -p 2 -c 6 < ./data.txt 13.443 ± 0.069 13.400 13.523 24.18 ± 0.14
pigz -p 2 -6 < ./data.txt 13.605 ± 0.059 13.567 13.673 24.47 ± 0.13
crabz -p 4 -c 6 < ./data.txt 6.833 ± 0.005 6.828 6.837 12.29 ± 0.03
pigz -p 4 -6 < ./data.txt 6.866 ± 0.028 6.834 6.884 12.35 ± 0.06
crabz -p 8 -c 6 < ./data.txt 3.446 ± 0.000 3.445 3.446 6.20 ± 0.02
pigz -p 8 -6 < ./data.txt 3.482 ± 0.002 3.480 3.483 6.26 ± 0.02
crabz -p 16 -c 6 < ./data.txt 1.822 ± 0.012 1.813 1.835 3.28 ± 0.02
pigz -p 16 -6 < ./data.txt 1.771 ± 0.004 1.767 1.776 3.19 ± 0.01
crabz -p 32 -c 6 < ./data.txt 1.178 ± 0.008 1.171 1.187 2.12 ± 0.02
pigz -p 32 -6 < ./data.txt 1.184 ± 0.001 1.184 1.185 2.13 ± 0.01
crabz -p 1 -c 9 < ./data.txt 52.122 ± 0.288 51.790 52.293 93.75 ± 0.58
pigz -p 1 -9 < ./data.txt 53.031 ± 0.071 52.951 53.085 95.39 ± 0.29
crabz -p 2 -c 9 < ./data.txt 26.287 ± 0.047 26.249 26.339 47.28 ± 0.15
pigz -p 2 -9 < ./data.txt 26.409 ± 0.238 26.190 26.662 47.50 ± 0.45
crabz -p 4 -c 9 < ./data.txt 13.373 ± 0.051 13.317 13.419 24.05 ± 0.11
pigz -p 4 -9 < ./data.txt 13.414 ± 0.035 13.383 13.451 24.13 ± 0.09
crabz -p 8 -c 9 < ./data.txt 6.733 ± 0.003 6.731 6.736 12.11 ± 0.03
pigz -p 8 -9 < ./data.txt 6.763 ± 0.004 6.761 6.767 12.16 ± 0.03
crabz -p 16 -c 9 < ./data.txt 3.487 ± 0.034 3.450 3.517 6.27 ± 0.06
pigz -p 16 -9 < ./data.txt 3.459 ± 0.021 3.434 3.473 6.22 ± 0.04
crabz -p 32 -c 9 < ./data.txt 2.088 ± 0.008 2.081 2.097 3.76 ± 0.02
pigz -p 32 -9 < ./data.txt 2.107 ± 0.023 2.090 2.133 3.79 ± 0.04

Decompression

Flate2 rust backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 10.167 ± 0.164 10.050 10.355 18.57 ± 0.33
pigz -p 1 -3 < ./data.txt 11.338 ± 0.071 11.292 11.420 20.71 ± 0.21
crabz -p 2 -c 3 < ./data.txt 4.912 ± 0.013 4.898 4.920 8.97 ± 0.08
pigz -p 2 -3 < ./data.txt 5.876 ± 0.047 5.826 5.919 10.73 ± 0.12
crabz -p 4 -c 3 < ./data.txt 2.463 ± 0.018 2.447 2.482 4.50 ± 0.05
pigz -p 4 -3 < ./data.txt 2.967 ± 0.008 2.958 2.972 5.42 ± 0.05
crabz -p 8 -c 3 < ./data.txt 1.255 ± 0.005 1.250 1.261 2.29 ± 0.02
pigz -p 8 -3 < ./data.txt 1.509 ± 0.002 1.507 1.511 2.76 ± 0.02
crabz -p 16 -c 3 < ./data.txt 0.705 ± 0.030 0.673 0.731 1.29 ± 0.05
pigz -p 16 -3 < ./data.txt 0.780 ± 0.015 0.768 0.797 1.42 ± 0.03
crabz -p 32 -c 3 < ./data.txt 0.547 ± 0.004 0.544 0.552 1.00
pigz -p 32 -3 < ./data.txt 0.755 ± 0.025 0.726 0.771 1.38 ± 0.05
crabz -p 1 -c 6 < ./data.txt 27.064 ± 0.288 26.863 27.394 49.44 ± 0.66
pigz -p 1 -6 < ./data.txt 27.034 ± 0.090 26.938 27.117 49.38 ± 0.43
crabz -p 2 -c 6 < ./data.txt 12.400 ± 0.083 12.321 12.487 22.65 ± 0.24
pigz -p 2 -6 < ./data.txt 13.619 ± 0.074 13.558 13.702 24.88 ± 0.24
crabz -p 4 -c 6 < ./data.txt 6.279 ± 0.023 6.263 6.305 11.47 ± 0.10
pigz -p 4 -6 < ./data.txt 6.879 ± 0.020 6.867 6.901 12.57 ± 0.11
crabz -p 8 -c 6 < ./data.txt 3.189 ± 0.010 3.178 3.198 5.83 ± 0.05
pigz -p 8 -6 < ./data.txt 3.477 ± 0.007 3.470 3.483 6.35 ± 0.05
crabz -p 16 -c 6 < ./data.txt 1.756 ± 0.015 1.740 1.771 3.21 ± 0.04
pigz -p 16 -6 < ./data.txt 1.799 ± 0.024 1.779 1.827 3.29 ± 0.05
crabz -p 32 -c 6 < ./data.txt 1.192 ± 0.011 1.183 1.205 2.18 ± 0.03
pigz -p 32 -6 < ./data.txt 1.196 ± 0.016 1.183 1.214 2.19 ± 0.03
crabz -p 1 -c 9 < ./data.txt 44.907 ± 0.283 44.585 45.116 82.03 ± 0.84
pigz -p 1 -9 < ./data.txt 53.109 ± 1.049 52.373 54.311 97.02 ± 2.07
crabz -p 2 -c 9 < ./data.txt 19.977 ± 0.159 19.819 20.136 36.49 ± 0.41
pigz -p 2 -9 < ./data.txt 26.562 ± 0.134 26.407 26.643 48.52 ± 0.46
crabz -p 4 -c 9 < ./data.txt 10.397 ± 0.484 10.070 10.953 18.99 ± 0.90
pigz -p 4 -9 < ./data.txt 13.346 ± 0.040 13.300 13.372 24.38 ± 0.21
crabz -p 8 -c 9 < ./data.txt 5.100 ± 0.021 5.076 5.114 9.32 ± 0.08
pigz -p 8 -9 < ./data.txt 6.754 ± 0.016 6.736 6.767 12.34 ± 0.10
crabz -p 16 -c 9 < ./data.txt 2.716 ± 0.014 2.708 2.732 4.96 ± 0.05
pigz -p 16 -9 < ./data.txt 3.444 ± 0.038 3.420 3.487 6.29 ± 0.09
crabz -p 32 -c 9 < ./data.txt 1.747 ± 0.009 1.740 1.758 3.19 ± 0.03
pigz -p 32 -9 < ./data.txt 2.086 ± 0.008 2.077 2.093 3.81 ± 0.03

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -d < ./data.3.txt.gz 1.599 ± 0.014 1.573 1.615 1.00
pigz -d < ./data.3.txt.gz 1.696 ± 0.020 1.654 1.725 1.06 ± 0.02
crabz -d < ./data.6.txt.gz 1.615 ± 0.012 1.586 1.626 1.01 ± 0.01
pigz -d < ./data.6.txt.gz 1.760 ± 0.030 1.687 1.797 1.10 ± 0.02
crabz -d < ./data.9.txt.gz 1.613 ± 0.014 1.596 1.641 1.01 ± 0.01
pigz -d < ./data.9.txt.gz 1.767 ± 0.012 1.748 1.787 1.11 ± 0.01

Block Formats with libdeflater

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -d -f mgzip ./bdata.3.txt.gz > data.txt 1.221 ± 0.164 1.073 1.397 2.32 ± 0.31
pigz -d -c ./bdata.3.txt.gz > data.txt 2.415 ± 0.063 2.347 2.472 4.58 ± 0.14
crabz -p 1 -d -f mgzip ./bdata.6.txt.gz > data.txt 1.256 ± 0.063 1.200 1.325 2.38 ± 0.13
pigz -d -c ./bdata.6.txt.gz > data.txt 2.513 ± 0.052 2.467 2.569 4.77 ± 0.13
crabz -p 1 -d -f mgzip ./bdata.9.txt.gz > data.txt 1.147 ± 0.065 1.094 1.219 2.18 ± 0.13
pigz -d -c ./bdata.9.txt.gz > data.txt 2.394 ± 0.118 2.262 2.488 4.54 ± 0.24
crabz -p 1 -d -f mgzip ./bdata.12.txt.gz > data.txt 1.165 ± 0.074 1.106 1.248 2.21 ± 0.15
pigz -d -c ./bdata.12.txt.gz > data.txt 2.457 ± 0.067 2.408 2.534 4.66 ± 0.15
crabz -p 2 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.634 ± 0.008 0.628 0.642 1.20 ± 0.03
pigz -d -c ./bdata.3.txt.gz > data.txt 2.379 ± 0.012 2.368 2.391 4.51 ± 0.08
crabz -p 2 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.645 ± 0.015 0.629 0.658 1.22 ± 0.03
pigz -d -c ./bdata.6.txt.gz > data.txt 2.438 ± 0.073 2.356 2.497 4.62 ± 0.16
crabz -p 2 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.659 ± 0.015 0.644 0.674 1.25 ± 0.04
pigz -d -c ./bdata.9.txt.gz > data.txt 2.451 ± 0.075 2.400 2.538 4.65 ± 0.16
crabz -p 2 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.656 ± 0.015 0.647 0.673 1.24 ± 0.04
pigz -d -c ./bdata.12.txt.gz > data.txt 2.450 ± 0.045 2.412 2.500 4.65 ± 0.12
crabz -p 4 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.577 ± 0.024 0.554 0.603 1.10 ± 0.05
pigz -d -c ./bdata.3.txt.gz > data.txt 2.459 ± 0.052 2.420 2.518 4.66 ± 0.13
crabz -p 4 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.559 ± 0.024 0.531 0.576 1.06 ± 0.05
pigz -d -c ./bdata.6.txt.gz > data.txt 2.538 ± 0.044 2.502 2.587 4.81 ± 0.12
crabz -p 4 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.552 ± 0.011 0.539 0.560 1.05 ± 0.03
pigz -d -c ./bdata.9.txt.gz > data.txt 2.402 ± 0.018 2.385 2.420 4.56 ± 0.08
crabz -p 4 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.592 ± 0.040 0.546 0.616 1.12 ± 0.08
pigz -d -c ./bdata.12.txt.gz > data.txt 2.525 ± 0.038 2.484 2.558 4.79 ± 0.11
crabz -p 8 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.563 ± 0.013 0.548 0.571 1.07 ± 0.03
pigz -d -c ./bdata.3.txt.gz > data.txt 2.490 ± 0.126 2.369 2.621 4.72 ± 0.25
crabz -p 8 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.552 ± 0.018 0.533 0.569 1.05 ± 0.04
pigz -d -c ./bdata.6.txt.gz > data.txt 2.531 ± 0.115 2.417 2.647 4.80 ± 0.23
crabz -p 8 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.603 ± 0.029 0.583 0.636 1.14 ± 0.06
pigz -d -c ./bdata.9.txt.gz > data.txt 2.483 ± 0.042 2.435 2.515 4.71 ± 0.11
crabz -p 8 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.527 ± 0.009 0.519 0.537 1.00
pigz -d -c ./bdata.12.txt.gz > data.txt 2.524 ± 0.093 2.417 2.583 4.79 ± 0.19
crabz -p 16 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.603 ± 0.058 0.551 0.665 1.14 ± 0.11
pigz -d -c ./bdata.3.txt.gz > data.txt 2.392 ± 0.007 2.384 2.397 4.54 ± 0.08
crabz -p 16 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.611 ± 0.065 0.565 0.686 1.16 ± 0.13
pigz -d -c ./bdata.6.txt.gz > data.txt 2.593 ± 0.148 2.427 2.712 4.92 ± 0.29
crabz -p 16 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.564 ± 0.027 0.541 0.594 1.07 ± 0.05
pigz -d -c ./bdata.9.txt.gz > data.txt 2.426 ± 0.023 2.404 2.450 4.60 ± 0.09
crabz -p 16 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.601 ± 0.020 0.582 0.623 1.14 ± 0.04
pigz -d -c ./bdata.12.txt.gz > data.txt 2.528 ± 0.022 2.507 2.550 4.80 ± 0.09
crabz -p 32 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.595 ± 0.019 0.577 0.614 1.13 ± 0.04
pigz -d -c ./bdata.3.txt.gz > data.txt 2.544 ± 0.107 2.422 2.621 4.83 ± 0.22
crabz -p 32 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.601 ± 0.021 0.586 0.626 1.14 ± 0.05
pigz -d -c ./bdata.6.txt.gz > data.txt 2.519 ± 0.114 2.435 2.649 4.78 ± 0.23
crabz -p 32 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.565 ± 0.023 0.539 0.579 1.07 ± 0.05
pigz -d -c ./bdata.9.txt.gz > data.txt 2.487 ± 0.064 2.415 2.540 4.72 ± 0.15
crabz -p 32 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.557 ± 0.013 0.548 0.571 1.06 ± 0.03
pigz -d -c ./bdata.12.txt.gz > data.txt 2.505 ± 0.105 2.442 2.626 4.75 ± 0.22

TODOs

  • Test with jemalloc
  • Add some form of auto format detection, even just by file extension
Comments
  • gzip should be handled by ParDecompress

    gzip should be handled by ParDecompress

    When doing parallel compression of a gzip file, crabz uses ZBuilder which instantiates a ParCompress when num_threads > 1.

    However, decompression always uses single-threaded MultiGzDecoder. Why is it not using ParDecompress when num_threads > 1?

    opened by derolf 13
  • compression level 1 error gzip stdin invalid compressed data format violated tar Unexpected EOF in archive

    compression level 1 error gzip stdin invalid compressed data format violated tar Unexpected EOF in archive

    I created a tgz using specific compression level and threads with crabz:

    git clone https://github.com/sstadick/crabz.git
    mv crabz blah
    time tar cf - blah/ | crabz --compression-level=1 --compression-threads=6 > ./blah.tgz
    mkdir testcrabz
    cd testcrabz/
    mv ../blah.tgz .
    tar zxf ./blah.tgz 
    

    I expected success when extracting a tgz created with crabz. Instead, I got the following error output:

    gzip: stdin: invalid compressed data--format violated
    tar: Unexpected EOF in archive
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now
    

    NOTE: This error only surfaces when extracting with tar when crabz uses compression level 1. tar extracts as expected when crabz uses compression levels 0, 2 to 9 BUT NOT 1.

    bug 
    opened by omac777 8
  • Does not compile with Rust-only DEFLATE

    Does not compile with Rust-only DEFLATE

    I have tried to install crabz with Rust-only compressor implementations using the following command:

    cargo install crabz --no-default-features --features=snap_default,deflate_rust
    

    It fails to compile with the following error:

    error[E0599]: no variant or associated item named `Zlib` found for enum `Format` in the current scope
       --> /home/shnatsel/.cargo/registry/src/github.com-1ecc6299db9ec823/crabz-0.7.2/src/main.rs:281:21
        |
    170 | enum Format {
        | ----------- variant or associated item `Zlib` not found here
    ...
    281 |             Format::Zlib => ("zz", string_set!["zz", "z", "gz"]),
        |                     ^^^^ variant or associated item not found in `Format`
    
    For more information about this error, try `rustc --explain E0599`.
    
    opened by Shnatsel 1
  • Regular gzipped file created instead of a bgzf one, when 1 thread is requested.

    Regular gzipped file created instead of a bgzf one, when 1 thread is requested.

    Regular gzipped file created instead of a bgzf one, when 1 thread is requested.

    crabz -p 1 -f bgzf -o test.csv.bgzf_threads_1.gz test.csv
    crabz -p 2 -f bgzf -o test.csv.bgzf_threads_2.gz test.csv
    
    $ file test.csv.bgzf_threads_1.gz test.csv.bgzf_threads_2.gz
    test.csv.bgzf_threads_1.gz:                                    gzip compressed data
    test.csv.bgzf_threads_2.gz:                                    gzip compressed data, extra field
    
    # Try to decompress the files
    $ crabz -p 1 -d -f bgzf test.csv.bgzf_threads_1.gz | wc -l
    [2021-09-14T12:22:32Z INFO  crabz] Decompressing (bgzf) with 1 threads available.
    Error: Invalid block header: Extra field flag not set
    0
    
    $ crabz -p 1 -d -f bgzf test.csv.bgzf_threads_2.gz | wc -l
    [2021-09-14T12:23:18Z INFO  crabz] Decompressing (bgzf) with 1 threads available.
    100000000
    
    
    opened by ghuls 1
  • Do not panick when piping result and closing pipe with -f bgzf.

    Do not panick when piping result and closing pipe with -f bgzf.

    Do not panick when piping result and closing pipe with -f bgzf.

    $ crabz -p 4 -d -f bgzf test.fastq.gz | head
    [2021-09-07T16:57:14Z INFO  crabz] Decompressing (bgzf) with 4 threads available.
    @NB501171:702:H7Y55BGXH:4:11401:19233:1053 2:N:0:CACCGCACCA+ANTGACAGTC
    NNNNNNTNAAAAATGCCCTAGCCCCCTTCAGAANACAAGGCAAA
    +
    ######/#EEAEEEE/EE//E/E////EA////#AEE/E/E/E<
    @NB501171:702:H7Y55BGXH:4:11401:23376:1053 2:N:0:CACCGCACCA+ANTGACAGTC
    NNNNNNTNTTATGTAACTAATGCATCTTGCCCTNATCTCTTTGC
    +
    ######E#EEEEEEEEEEEEEE<E/AEEAE/EA#EEEEEEEE/E
    @NB501171:702:H7Y55BGXH:4:11401:8502:1053 2:N:0:CACCGCACCA+AATGACAGTC
    NNNNNNTNGAAGGCAGACTGCATGGCTTAATTTNAAAAATCATT
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', ~/.cargo/registry/src/github.com-1ecc6299db9ec823/gzp-0.8.0/src/par/decompress.rs:158:35
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    
    
    opened by ghuls 1
  • Create a BGZIP index (BGZF) on the fly

    Create a BGZIP index (BGZF) on the fly

    See: http://www.htslib.org/doc/bgzip.html. This should not be too hard if you keep the uncompressed and compressed byte offset of each block you are processing. Right now, I have to use bgzip to create the index.

    enhancement 
    opened by nh13 2
  • crabz failed to build on centos 7

    crabz failed to build on centos 7

    I had a build error with centos 7 because the default cmake package is an older version. The steps below help to get crabz built and installed successfully on centos 7.

    1. Install cmake3
    dnf install cmake3
    alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake 10 --slave /usr/local/bin/ctest ctest /usr/bin/ctest --slave /usr/local/bin/cpack cpack /usr/bin/cpack --slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake --family cmake
    alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake3 20 --slave /usr/local/bin/ctest ctest /usr/bin/ctest3 --slave /usr/local/bin/cpack cpack /usr/bin/cpack3 --slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake3 --family cmake
    
    1. build crabz
    cargo install crabz --force
    
    1. REVERT BACK TO DEFAULT CENTOS 7 CMAKE 2
    alternatives --config cmake
    
    this_is_a_solution 
    opened by omac777 3
  • Performance Tracking

    Performance Tracking

    @Shnatsel I'm moving performance tracking for crabz related things to here.

    Run the same benchmarks as found here: https://github.com/zlib-ng/pigzbench with the different backends.

    The zlib-ng benchmarks pretty clearly indicate that zlib-ng is the way to go as a backend, which matches what I see in benchmarks. zlib and the rust backends for flate2 perform about the same.

    performance 
    opened by sstadick 13
Releases(v0.7.5)
Owner
Seth
Principal Bioinformatics Engineer @fulcrumgenomics
Seth
Obvious Unified Compression Helper is a CLI tool to help you compress and decompress files of several formats

Ouch! ouch stands for Obvious Unified Compression Helper and is a CLI tool to help you compress and decompress files of several formats. Features Usag

null 734 Dec 30, 2022
A reimplementation of the Zopfli compression tool in Rust.

Zopfli in Rust This is a reimplementation of the Zopfli compression tool in Rust. Carol Nichols started the Rust implementation as an experiment in in

null 11 Dec 26, 2022
Basic (and naïve) LZW and Huffman compression algorithms in Rust.

Naive implementation of the LZW and Huffman compression algorithms. To run, install the Rust toolchain. Cargo may be used to compile the source. Examp

Luiz Felipe Gonçalves 9 May 22, 2023
libbz2 (bzip2 compression) bindings for Rust

bzip2 Documentation A streaming compression/decompression library for rust with bindings to libbz2. # Cargo.toml [dependencies] bzip2 = "0.4" License

Alex Crichton 67 Dec 27, 2022
A Rust implementation of the Zopfli compression algorithm.

Zopfli in Rust This is a reimplementation of the Zopfli compression tool in Rust. I have totally ignored zopflipng. More info about why and how I did

Carol (Nichols || Goulding) 76 Oct 20, 2022
lzlib (lzip compression) bindings for Rust

lzip Documentation A streaming compression/decompression library for rust with bindings to lzlib. # Cargo.toml [dependencies] lzip = "0.1" License Lic

Firas Khalil Khana 8 Sep 20, 2022
(WIP) Taking the pain away from file (de)compression

Ouch! ouch loosely stands for Obvious Unified Compression files Helper and aims to be an easy and intuitive way of compressing and decompressing files

Vinícius Miguel 734 Dec 30, 2022
gzp - Multi-threaded Compression

gzp - Multi-threaded Compression

Seth 123 Dec 28, 2022
Fastest Snappy compression library in Node.js

snappy !!! For [email protected] and below, please go to node-snappy. More background about the 6-7 changes, please read this, Thanks @kesla . ?? Help me to

LongYinan 103 Jan 2, 2023
Michael's Compression Algorithm

mca This repository contains a compression algorithm written by me (Michael Grigoryan). The algorithm is only able to compress and decompress text fil

Michael Grigoryan 1 Dec 19, 2022
An extremely fast alternative to zip which is written in rust.

Zap Compress and/or encrypt folders fast. Like, really fast. or as some say, blazingly fast. Installation To install Zap, run the following command fr

null 39 Dec 23, 2022
An extremely fast alternative to zip which is written in rust.

Zap Compress and/or encrypt folders fast. Like, really fast. or as some say, blazingly fast. Installation To install Zap, run the following command fr

null 37 Nov 9, 2022
A utility that can download JavaScript and TypeScript module graphs and store them locally in a special zip file.

eszip A utility that can download JavaScript and TypeScript module graphs and store them locally in a special zip file. To create a new archive: > esz

Deno Land 162 Dec 24, 2022
A simple rust library to read and write Zip archives, which is also my pet project for learning Rust

rust-zip A simple rust library to read and write Zip archives, which is also my pet project for learning Rust. At the moment you can list the files in

Kang Seonghoon 2 Jan 5, 2022
A Brotli implementation in pure and safe Rust

Brotli-rs - Brotli decompression in pure, safe Rust Documentation Compression provides a <Read>-struct to wrap a Brotli-compressed stream. A consumer

Thomas Pickert 59 Oct 7, 2022
Brotli compressor and decompressor written in rust that optionally avoids the stdlib

rust-brotli What's new in 3.2 into_inner conversions for both Reader and Writer classes What's new in 3.0 A fully compatible FFI for drop-in compatibi

Dropbox 659 Dec 29, 2022
DEFLATE, gzip, and zlib bindings for Rust

flate2 A streaming compression/decompression library DEFLATE-based streams in Rust. This crate by default uses the miniz_oxide crate, a port of miniz.

The Rust Programming Language 619 Jan 8, 2023
A Rust application that compress files and folders

Quick Storer This is a Rust application that compress files and folders. Usage Download or build the binary and place it on your desktop, or any other

AL68 & co. 1 Feb 2, 2022
SIMD Floating point and integer compressed vector library

compressed_vec Floating point and integer compressed vector library, SIMD-enabled for fast processing/iteration over compressed representations. This

Evan Chan 56 Nov 24, 2022