Rust library for reading/writing numbers in big-endian and little-endian.

Andrew Gallant

Last update: Jan 1, 2023

Overview

byteorder

This crate provides convenience methods for encoding and decoding numbers in either big-endian or little-endian order.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/byteorder

Installation

This crate works with Cargo and is on crates.io. Add it to your Cargo.toml like so:

[dependencies]
byteorder = "1"

If you want to augment existing Read and Write traits, then import the extension methods like so:

use byteorder::{ReadBytesExt, WriteBytesExt, BigEndian, LittleEndian};

For example:

use std::io::Cursor;
use byteorder::{BigEndian, ReadBytesExt};

let mut rdr = Cursor::new(vec![2, 5, 3, 0]);
// Note that we use type parameters to indicate which kind of byte order
// we want!
assert_eq!(517, rdr.read_u16::<BigEndian>().unwrap());
assert_eq!(768, rdr.read_u16::<BigEndian>().unwrap());

`no_std` crates

This crate has a feature, std, that is enabled by default. To use this crate in a no_std context, add the following to your Cargo.toml:

[dependencies]
byteorder = { version = "1", default-features = false }

Alternatives

Note that as of Rust 1.32, the standard numeric types provide built-in methods like to_le_bytes and from_le_bytes, which support some of the same use cases.
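
As a rough illustration of those standard-library methods, the sketch below decodes the same bytes used in the example above without byteorder. The helper names read_two_u16_be and encode_u32_le are made up for this illustration; only from_be_bytes and to_le_bytes come from std.

```rust
/// Decode two big-endian u16 values from a 4-byte buffer using only std.
/// (Helper name is hypothetical, chosen for this illustration.)
fn read_two_u16_be(buf: [u8; 4]) -> (u16, u16) {
    let first = u16::from_be_bytes([buf[0], buf[1]]);
    let second = u16::from_be_bytes([buf[2], buf[3]]);
    (first, second)
}

/// Encode a u32 as little-endian bytes using only std.
/// (Helper name is hypothetical, chosen for this illustration.)
fn encode_u32_le(n: u32) -> [u8; 4] {
    n.to_le_bytes()
}
```

These methods work on fixed-size arrays rather than on readers and writers, so they cover in-memory conversion but not the Read/Write extension methods.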

Comments

Add a method to read the first bytes of a float (and assume the rest are 0)

I'm not sure if this is broadly useful enough to be worth adding to the library, but I've found myself using a similar function to this several times in a recent project so I thought I'd submit a PR in case you wanted it.

The idea is to read the first n bytes of an f64 as a uint and then assume that the rest of the bytes are 0. This is very useful if you're parsing lots of compressed floats in little-endian format where the low bits (which are often zero) can be dropped (eg. due to a form of run length encoding, or other compression that drops sequences of zeros).
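
The PR's actual function is not shown here, so the following is only one possible reading of the idea: the stored bytes are taken to be the most significant bytes of a little-endian f64, and the dropped low bytes are filled with zeros. The name f64_from_truncated_le is hypothetical.

```rust
/// Hypothetical helper: rebuild an f64 from its most significant bytes,
/// given in little-endian order, treating the dropped low bytes as zero.
/// Returns None if more than 8 bytes are supplied.
fn f64_from_truncated_le(bytes: &[u8]) -> Option<f64> {
    if bytes.len() > 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    // The low-order bytes come first in little-endian order, so the
    // surviving bytes go at the end of the buffer.
    buf[8 - bytes.len()..].copy_from_slice(bytes);
    Some(f64::from_le_bytes(buf))
}
```

Under this reading, the two bytes 0xF8 0x3F are enough to recover 1.5, because the remaining six bytes of its little-endian encoding are all zero.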

opened by SamWhited 30
Improve WriteBytesExt API
At the moment the byteorder crate uses an enum to generalize over the endianness and the method name to select the type. Neither is very ergonomic to use. I propose the following interface:

trait WriteBytesExt<T> {
    fn write_le(&mut self, n: T) -> io::Result<()>;
    fn write_be(&mut self, n: T) -> io::Result<()>;
}

impl<W> WriteBytesExt<u8> for W where W: Write {
    fn write_le(&mut self, n: u8) -> io::Result<()> { .... }
    ....
}

First of all, it gets rid of the enum. Since the enum is purely a compile-time parameter, it cannot be used for dynamic dispatch. That makes it no better or worse than putting the endianness directly in the method name, so I do not see the point of having it. Secondly, it gets rid of the redundant type name in the signature.

This shortens the method call significantly

w.write_u16::<LittleEndian>(42u16)

becomes

w.write_le(42u16)

My two points are:

The type in the method signature carries redundant information.

The enum type parameter does not provide any benefit for the user.

Enums are most useful for runtime polymorphism. Since the enum variants are not types, these *BytesExt traits cannot be used to write generic code that abstracts over endianness. Again, no benefit for the user.
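
A compilable sketch of the proposed shape, using only std, might look like the following. The trait is renamed WriteEndian here to make clear it is not byteorder's WriteBytesExt, and the choice of u16 and u32 impls is arbitrary.

```rust
use std::io::{self, Write};

/// Hypothetical trait following the proposal: the value's type selects the
/// impl, and the method name selects the byte order.
trait WriteEndian<T> {
    fn write_le(&mut self, n: T) -> io::Result<()>;
    fn write_be(&mut self, n: T) -> io::Result<()>;
}

impl<W: Write> WriteEndian<u16> for W {
    fn write_le(&mut self, n: u16) -> io::Result<()> {
        self.write_all(&n.to_le_bytes())
    }
    fn write_be(&mut self, n: u16) -> io::Result<()> {
        self.write_all(&n.to_be_bytes())
    }
}

impl<W: Write> WriteEndian<u32> for W {
    fn write_le(&mut self, n: u32) -> io::Result<()> {
        self.write_all(&n.to_le_bytes())
    }
    fn write_be(&mut self, n: u32) -> io::Result<()> {
        self.write_all(&n.to_be_bytes())
    }
}
```

With this in scope, a Vec<u8> named out accepts out.write_le(42u16). The argument likely needs a type suffix or annotation, since a bare integer literal does not tell the compiler which impl to pick.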
opened by nwin 30
Add a version using new `io` module

As I understand it, the purpose of this crate is to prepare for a world where we no longer have the endian-writing/reading functions on Reader and Writer. As someone who uses those functions a lot, I would like to prepare my crates (namely bincode) for the new io crate. I'd be willing to help do the port if you are interested.

opened by TyOverby 25
approaching 1.0

byteorder is very heavily used, but its API has mostly remained the same since it was first released (which was inspired by both Go's encoding/binary package and the pre-existing methods in Rust's old standard library that fulfilled a similar role). There was however significant discussion on its API in #27, but I feel that no consensus has been reached and I don't think there's an obviously better API given Rust in its current form. Therefore, I'd like to propose that we cut a 1.0 release in the next few weeks.

I think the only outstanding issue that we should try to resolve before 1.0 is #52.

cc @nwin @TyOverby @sfackler @lambda
help wanted

opened by BurntSushi 16
Consider mechanisms to convert &[u32] to &[u8]
It makes me a little sad to see unsafe being used to convert a &[u32] into a &[u8] in octavo:

fn crypt(&mut self, input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len(), output.len());

    if self.index == STATE_BYTES { self.update() }

    let buffer = unsafe {
        slice::from_raw_parts(self.buffer.as_ptr() as *const u8, STATE_BYTES)
    };

    for i in self.index..input.len() {
        output[i] = input[i] ^ buffer[i];
    }

    self.index = input.len();
}

We really ought to have a place to centralize this functionality so that it's well tested and safe across our ecosystem. Would it make sense to have this functionality be in byteorder?

It'd also be interesting to support the inverse operation, where a &[u8] is converted into a (&[u8], &[u32], &[u8]), where the first and last slices are there to read a byte at a time until the slice is aligned. This style of operation could be useful to safely massage a slice into something that can use simd (or at least simd-ish operations over a usize value).
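
For illustration only, both directions can be sketched with std alone. The function names are hypothetical. The first function copies instead of reinterpreting, which avoids unsafe at the cost of an allocation; the second wraps the standard slice::align_to, which returns exactly the three-part shape described above.

```rust
/// Copy-based and fully safe: serialize each u32 as little-endian bytes.
/// (Hypothetical helper; it allocates rather than reinterpreting memory.)
fn u32s_to_le_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Split a byte slice into an unaligned prefix, an aligned run of u32
/// values, and a trailing remainder. (Hypothetical wrapper name.)
fn split_aligned(bytes: &[u8]) -> (&[u8], &[u32], &[u8]) {
    // SAFETY: every bit pattern is a valid u32, and align_to only places
    // correctly aligned elements in the middle slice.
    unsafe { bytes.align_to::<u32>() }
}
```

The u32 values in the middle slice are in native byte order, so code that needs a fixed endianness still has to convert them.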

cc @huonw, @bluss, @hauleth
opened by erickt 15

Speed up slice writes

Hi there,

I've been toying around with adding the faster crate to a few encoding libraries, and I noticed that I could get up to a 6x speed boost by using it in write_u16_into, write_u32_into, and write_u64_into. The compiler already does a pretty good job of vectorizing the read functions.

Would there be any interest in adding this behind a feature?

Benchmarks: (Ivy Bridge host; 128-bit integer vectors)

faster (No difference between target-cpu=native and target-cpu=x86-64)
test slice_u16::write_big_endian    ... bench:      23,344 ns/iter (+/- 122) = 8567 MB/s
test slice_u32::write_big_endian    ... bench:      46,681 ns/iter (+/- 160) = 8568 MB/s
test slice_u64::write_big_endian    ... bench:     105,206 ns/iter (+/- 369) = 7604 MB/s
master (-C target-cpu=native)
test slice_u16::write_big_endian    ... bench:     147,829 ns/iter (+/- 269) = 1352 MB/s
test slice_u32::write_big_endian    ... bench:     112,241 ns/iter (+/- 652) = 3563 MB/s
test slice_u64::write_big_endian    ... bench:     108,404 ns/iter (+/- 571) = 7379 MB/s

opened by AdamNiederer 12

Change as_ptr to as_mut_ptr to fix Miri error

Before, the example in the docs for ByteOrder::from_slice_i32 caused Miri to error with the message, "error: Undefined Behavior: trying to reborrow for Unique at alloc1389, but parent tag does not have an appropriate item in the borrow stack". Now it runs without errors (tested locally by creating an example and running it with cargo +nightly miri run --example the_example).

(This is the example in the Rust Playground. You can run it with Miri by selecting "Miri" from the "Tools" menu.)

Fwiw, I'm not sure if the original code really has undefined behavior or not, but this PR is a simple change, and the new code is a little clearer anyway.

opened by jturner314 11
Writing to uninitialized buffer

The write_* methods of the ByteOrder trait accept a buffer but don't guarantee that they won't read from it. The drawback is that, strictly speaking, the provided buffer must not be uninitialized.

I suggest providing some way of guaranteeing that the buffer won't be read from, so that it's fine to pass an uninitialized buffer.
wontfix

opened by Kixunil 10
Read Write for core

@Tobba and I have worked on https://github.com/QuiltOS/core_io, a copy of Read and Write but with an associated error type to make it just need core. Perhaps it would be nice to (optionally) extend these traits for no_std users?

opened by Ericson2314 8
Consider adding runtime-defined endianness

Sometimes it is impossible to determine the required endianness statically. For example, the TIFF image format declares its endianness in the first two bytes of the file, so an input may be either big-endian or little-endian, and which one is not known until runtime. It would be nice if I could use byteorder for this task too.
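
A minimal sketch of runtime-selected endianness, written against std rather than byteorder, could look like this. The Endian enum and its methods are hypothetical. TIFF marks its byte order with "II" (little-endian) or "MM" (big-endian) at the start of the file.

```rust
/// Hypothetical runtime byte-order selector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Endian {
    Big,
    Little,
}

impl Endian {
    /// Detect the byte order from the first two bytes of a TIFF header.
    fn from_tiff_header(header: &[u8]) -> Option<Endian> {
        match header {
            [b'I', b'I', ..] => Some(Endian::Little),
            [b'M', b'M', ..] => Some(Endian::Big),
            _ => None,
        }
    }

    /// Decode a u16 using the byte order chosen at runtime.
    fn read_u16(self, buf: [u8; 2]) -> u16 {
        match self {
            Endian::Big => u16::from_be_bytes(buf),
            Endian::Little => u16::from_le_bytes(buf),
        }
    }
}
```

The match in read_u16 is a runtime branch on every call, which is the trade-off against byteorder's compile-time type parameter.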
question

opened by netvl 8
Unlicense is flawed enough to scare me off

I'd like to depend on this for an experiment where I'm rewriting a Python script which examines GIF files in a performance-optimized manner.

However because of flaws in the Unlicense, it's on my blacklist to ensure proper safety for my users, regardless of the jurisdiction they're in.

Is there any chance you'd be willing to offer byteorder under something more carefully designed like the Creative Commons CC0 public domain dedication?

(CC0 is also what the FSF recommends if you want to release your code into the public domain.)

opened by ssokolow 8

Convert endianness while copying in read/write into methods

Rather than first copying data from the source to the destination buffer and then adjusting the endianness, do the conversion while copying. This means that each byte is accessed only once, which (according to benchmarks) speeds up most of the read_xxx_into and write_xxx_into methods:

| Benchmark                      | Before [ns/iter] | After [ns/iter] |
|--------------------------------+------------------+-----------------|
| slice_i64::read_big_endian     |  34,863  (±  30) | 23,656  (± 935) |
| slice_i64::read_little_endian  |  15,518  (±  19) | 13,362  (± 405) |
| slice_i64::write_big_endian    |  30,910  (± 109) | 23,123  (±  91) |
| slice_i64::write_little_endian |  14,924  (±  21) | 13,209  (± 180) |
|--------------------------------+------------------+-----------------|
| slice_u16::read_big_endian     |   7,492  (± 343) |  3,788  (±  16) |
| slice_u16::read_little_endian  |   3,366  (±   8) |  3,198  (±   3) |
| slice_u16::write_big_endian    |   4,066  (±   7) |  4,497  (±   8) |
| slice_u16::write_little_endian |   4,040  (± 946) |  3,193  (±   7) |
|--------------------------------+------------------+-----------------|
| slice_u64::read_big_endian     |  35,816  (± 251) | 23,259  (±  21) |
| slice_u64::read_little_endian  |  15,506  (±  86) | 13,365  (±  81) |
| slice_u64::write_big_endian    |  30,948  (±  63) | 23,102  (±  36) |
| slice_u64::write_little_endian |  14,938  (±  17) | 13,158  (±  18) |

The benchmarks were done on AMD Ryzen 9 5900X 12-Core Processor.

I’m somewhat confused as to why the little-endian benchmarks show improvements, but the results are reproducible. My best guess is that the compiler is failing to optimise out the for v in $dst.iter_mut() { nop(); } loops currently present.
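
The convert-while-copying idea can be sketched as follows. This is not the PR's code: write_u32_into_be is a hypothetical stand-in for a big-endian write_u32_into, written with std only.

```rust
/// Hypothetical stand-in for a big-endian write_u32_into: each value is
/// converted as it is copied, so every destination byte is written once.
fn write_u32_into_be(src: &[u32], dst: &mut [u8]) {
    assert_eq!(dst.len(), src.len() * 4);
    for (chunk, &n) in dst.chunks_exact_mut(4).zip(src) {
        chunk.copy_from_slice(&n.to_be_bytes());
    }
}
```

There is no second pass over dst to swap bytes, which is the access pattern the PR description credits for the speedup.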

opened by mina86 0

Changelog is not up to date

Current version of byteorder is 1.4.3, but the changelog ends with 1.3.4: https://github.com/BurntSushi/byteorder/blob/abffade8232229db557e0a30c395963071624b2b/CHANGELOG.md

It would be nice if someone could add the changes from the more recent versions. :)

opened by striezel 0
Implement write_uXX_from

This was discussed in #155, and makes my life much easier when serializing big vectors.

For now I only implemented write_u32_from, but if this looks OK I can go implement all the others too. I also created slice_to_u8 based on slice_to_u8_mut, but I don't think the comment about "modification of the binary representation of any Copy type" applies to it.

opened by luizirber 0
Add methods that take/return arrays

The recently-added standard library endian conversion functions like from_be_bytes and to_be_bytes operate on arrays by value rather than slices by reference, which can provide better type safety in some cases.

It would be great if the ByteOrder trait were to add similar methods so that code which still needs to be generic over byte order can benefit from the array approach. Concretely, I would make use of those methods in the zerocopy::byteorder module if they were available.
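
To make the request concrete, here is a sketch of what array-based methods on a byte-order trait might look like. None of these names exist in byteorder: ByteOrderArrays, Big, and Little are hypothetical, and only the u32 case is shown.

```rust
/// Hypothetical trait with array-by-value methods, generic over byte order.
trait ByteOrderArrays {
    fn u32_from_array(bytes: [u8; 4]) -> u32;
    fn u32_to_array(n: u32) -> [u8; 4];
}

/// Hypothetical marker types that select the byte order at compile time.
enum Big {}
enum Little {}

impl ByteOrderArrays for Big {
    fn u32_from_array(bytes: [u8; 4]) -> u32 {
        u32::from_be_bytes(bytes)
    }
    fn u32_to_array(n: u32) -> [u8; 4] {
        n.to_be_bytes()
    }
}

impl ByteOrderArrays for Little {
    fn u32_from_array(bytes: [u8; 4]) -> u32 {
        u32::from_le_bytes(bytes)
    }
    fn u32_to_array(n: u32) -> [u8; 4] {
        n.to_le_bytes()
    }
}

/// Code that stays generic over byte order while using fixed-size arrays.
fn roundtrip<O: ByteOrderArrays>(n: u32) -> u32 {
    O::u32_from_array(O::u32_to_array(n))
}
```

Because the array length is part of the type, passing a buffer of the wrong size becomes a compile error rather than a runtime panic.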

cc @tamird

opened by joshlf 1