banzai: pure Rust bzip2 encoder

banzai

banzai is a bzip2 encoder with linear-time complexity, written entirely in safe Rust. It is currently alpha software: it is not battle-hardened, and there is no guarantee that it will perform well or that it will not eat your data. That's not to say, however, that I don't care about performance or reliability. Bug reports are warmly appreciated! In the long term I would like to get this library to a state where it can be relied upon in production software.

To use banzai as a command-line tool with an interface similar to bzip2(1), install bnz through cargo.

banzai currently chooses Huffman trees with a method near-identical to that of the reference implementation, and therefore achieves very similar compression ratios. Compared to the reference implementation, banzai has worse average runtime but better worst-case runtime. The difference comes from the algorithm used to compute the Burrows-Wheeler Transform (BWT): banzai uses SA-IS, which computes a suffix array in linear time. Since bzip2 uses a 'wrap-around' version of the BWT, we are obliged to compute the suffix array of the input concatenated with itself. I intend to investigate ways in which the redundancy inherent to inputs of this form can be exploited to optimise suffix array construction.
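The relationship between the wrap-around BWT and the suffix array of the doubled input can be illustrated with a small sketch. This is not banzai's code: the function names are hypothetical, and the suffix array is built with a naive comparison sort rather than SA-IS, so it is nowhere near linear time. It only shows why sorting the suffixes of the input concatenated with itself yields the sorted rotations of the input.

```rust
/// Naive suffix array built with a comparison sort.
/// This stands in for SA-IS purely for illustration; it is NOT linear time.
fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Wrap-around BWT of `input` (hypothetical helper, not banzai's API).
/// Returns the last column of the sorted rotations together with the
/// row at which the unrotated input appears.
fn cyclic_bwt(input: &[u8]) -> (Vec<u8>, usize) {
    let n = input.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    // Suffixes of `input ++ input` that start in the first half begin with
    // a full rotation of `input`, so their sorted order is the rotation order.
    let doubled: Vec<u8> = input.iter().chain(input.iter()).copied().collect();
    let sa = naive_suffix_array(&doubled);

    let mut bwt = Vec::with_capacity(n);
    let mut origin = 0;
    for &start in sa.iter().filter(|&&s| s < n) {
        if start == 0 {
            origin = bwt.len();
        }
        // The byte that cyclically precedes the rotation starting at `start`.
        bwt.push(input[(start + n - 1) % n]);
    }
    (bwt, origin)
}
```

For the input `banana`, the sorted rotations are `abanan`, `anaban`, `ananab`, `banana`, `nabana`, `nanaba`, so `cyclic_bwt(b"banana")` yields the bytes `nnbaaa` with the original string at row 3.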

This library does not (currently) include a decompressor. Paolo Barbolini's bzip2-rs offers a pure Rust bzip2 decompressor, though I have not used it myself and cannot vouch for its quality.

Interface

fn encode<R, W>(reader: R, writer: io::BufWriter<W>, level: usize) -> io::Result<usize>
where
    R: io::BufRead,
    W: io::Write

Call encode with a reader that implements io::BufRead (for example, a reference to an input buffer) and a BufWriter. The final parameter, level, is a number between 1 and 9 inclusive that sets the block size (level * 100_000 bytes). The typical default is 9. On success, encode returns the number of input bytes encoded.
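The mapping from level to block size described above can be written as a small helper. This is a sketch only: `block_size` is a hypothetical name and is not part of banzai's API, and the sketch makes no claim about how encode itself treats an out-of-range level.

```rust
/// Hypothetical helper (not part of banzai): maps a compression level
/// in 1..=9 to the block size in bytes, as described in the text above.
/// Returns `None` for levels outside that range.
fn block_size(level: usize) -> Option<usize> {
    if (1..=9).contains(&level) {
        Some(level * 100_000)
    } else {
        None
    }
}
```

At the typical default level of 9 this gives a block size of 900_000 bytes. As for the reader argument, `&[u8]` implements `io::BufRead` in the standard library, so a byte slice is one way to satisfy the `R` bound.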

Safety

banzai is written entirely in safe Rust. This is a deliberate choice which will, in future, make banzai a good choice for applications where memory-safety is of paramount importance. However, this decision comes with some performance cost. Experiments suggest that banzai could be approximately 10% faster if the extremely hot Index impls on Data and Array in bwt.rs were changed to be unchecked. In the future such performance boosts may be made available to consumers of the library behind a feature gate.
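To make the trade-off concrete, here is a rough sketch of what a feature-gated Index impl could look like. Everything in it is an assumption for illustration: the `Data` wrapper below is a stand-in and not the type defined in bwt.rs, and the feature name `unchecked` is invented. With the feature off (the default), indexing is bounds-checked and the compiled code contains no unsafe.

```rust
use std::ops::Index;

/// Stand-in wrapper around a byte slice (hypothetical; banzai's real
/// `Data` type in bwt.rs may look quite different).
struct Data<'a>(&'a [u8]);

impl<'a> Index<usize> for Data<'a> {
    type Output = u8;

    /// Default build: ordinary bounds-checked indexing, fully safe.
    #[cfg(not(feature = "unchecked"))]
    fn index(&self, i: usize) -> &u8 {
        &self.0[i]
    }

    /// Opt-in build (invented feature name): skips the bounds check.
    /// The caller must guarantee `i < self.0.len()`.
    #[cfg(feature = "unchecked")]
    fn index(&self, i: usize) -> &u8 {
        unsafe { self.0.get_unchecked(i) }
    }
}
```

Keeping the unchecked path behind an opt-in feature would let the default build stay entirely safe while still offering the faster path to consumers who accept the risk.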

Acknowledgements

This is original libre software. However, implementation guidance was derived from several free-software sources.

The suffix array construction algorithm used in banzai is SA-IS, which was developed by Ge Nong, Sen Zhang, and Wai Hong Chan. Guidance for implementing SA-IS was derived from Yuta Mori's sais and burntsushi's suffix.

The implementation of Huffman coding used in banzai takes heavy inspiration from the reference implementation of bzip2, originally authored by Julian Seward, currently maintained by Micah Snyder.

Finally, the unofficial bzip2 Format Specification written by Joe Tsai was extremely helpful when it came to the specifics of the bzip2 binary format.

Owner
Jack Byrne
Computer Programmer