Character encoding support for Rust

Kang Seonghoon

Last update: Dec 14, 2022

Overview

Encoding 0.3.0-dev

Character encoding support for Rust (also known as rust-encoding). It is based on the WHATWG Encoding Standard and also provides an advanced interface for error detection and recovery.

This documentation is for the development version (0.3). Please see the stable documentation for 0.2.x versions.

Complete Documentation (stable)

Usage

Put this in your Cargo.toml:

[dependencies]
encoding = "0.3"

Then put this in your crate root:

extern crate encoding;

Data Table

By default, Encoding comes with ~480 KB of data tables ("indices"). This allows Encoding to encode and decode legacy encodings efficiently, but this might not be desirable for some applications.

Encoding provides the no-optimized-legacy-encoding Cargo feature to reduce the size of encoding tables (to ~185 KB) at the expense of encoding performance (typically 5x to 20x slower). The decoding performance remains identical. This feature is intended strictly for end users. Do not try to enable this feature from library crates, ever.

For finer-tuned optimization, see src/index/gen_index.py for custom table generation.

Overview

To encode a string:

use encoding::{Encoding, EncoderTrap};
use encoding::all::ISO_8859_1;

assert_eq!(ISO_8859_1.encode("caf\u{e9}", EncoderTrap::Strict),
           Ok(vec![99,97,102,233]));

To encode a string with unrepresentable characters:

use encoding::{Encoding, EncoderTrap};
use encoding::all::ISO_8859_2;

assert!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Strict).is_err());
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Replace),
           Ok(vec![65,99,109,101,63]));
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Ignore),
           Ok(vec![65,99,109,101]));
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::NcrEscape),
           Ok(vec![65,99,109,101,38,35,49,54,57,59]));

To decode a byte sequence:

use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_1;

assert_eq!(ISO_8859_1.decode(&[99,97,102,233], DecoderTrap::Strict),
           Ok("caf\u{e9}".to_string()));

To decode a byte sequence with invalid sequences:

use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_6;

assert!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Strict).is_err());
assert_eq!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Replace),
           Ok("Acme\u{fffd}".to_string()));
assert_eq!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Ignore),
           Ok("Acme".to_string()));

To encode or decode the input into the already allocated buffer:

use encoding::{Encoding, EncoderTrap, DecoderTrap};
use encoding::all::{ISO_8859_2, ISO_8859_6};

let mut bytes = Vec::new();
let mut chars = String::new();

assert!(ISO_8859_2.encode_to("Acme\u{a9}", EncoderTrap::Ignore, &mut bytes).is_ok());
assert!(ISO_8859_6.decode_to(&[65,99,109,101,169], DecoderTrap::Replace, &mut chars).is_ok());

assert_eq!(bytes, [65,99,109,101]);
assert_eq!(chars, "Acme\u{fffd}");

A practical example of custom encoder traps:

use encoding::{Encoding, ByteWriter, EncoderTrap, DecoderTrap};
use encoding::types::RawEncoder;
use encoding::all::ASCII;

// hexadecimal numeric character reference replacement
fn hex_ncr_escape(_encoder: &mut RawEncoder, input: &str, output: &mut ByteWriter) -> bool {
    let escapes: Vec<String> =
        input.chars().map(|ch| format!("&#x{:x};", ch as isize)).collect();
    let escapes = escapes.concat();
    output.write_bytes(escapes.as_bytes());
    true
}
static HEX_NCR_ESCAPE: EncoderTrap = EncoderTrap::Call(hex_ncr_escape);

let orig = "Hello, 世界!".to_string();
let encoded = ASCII.encode(&orig, HEX_NCR_ESCAPE).unwrap();
assert_eq!(ASCII.decode(&encoded, DecoderTrap::Strict),
           Ok("Hello, &#x4e16;&#x754c;!".to_string()));

Getting the encoding from the string label, as specified in WHATWG Encoding standard:

use encoding::{Encoding, DecoderTrap};
use encoding::label::encoding_from_whatwg_label;
use encoding::all::WINDOWS_949;

let euckr = encoding_from_whatwg_label("euc-kr").unwrap();
assert_eq!(euckr.name(), "windows-949");
assert_eq!(euckr.whatwg_name(), Some("euc-kr")); // for the sake of compatibility
let broken = &[0xbf, 0xec, 0xbf, 0xcd, 0xff, 0xbe, 0xd3];
assert_eq!(euckr.decode(broken, DecoderTrap::Replace),
           Ok("\u{c6b0}\u{c640}\u{fffd}\u{c559}".to_string()));

// corresponding Encoding native API:
assert_eq!(WINDOWS_949.decode(broken, DecoderTrap::Replace),
           Ok("\u{c6b0}\u{c640}\u{fffd}\u{c559}".to_string()));

Types and Stuffs

There are three main entry points to Encoding.

Encoding is a single character encoding. It contains encode and decode methods for converting String to Vec<u8> and vice versa. For the error handling, they receive traps (EncoderTrap and DecoderTrap respectively) which replace any error with some string (e.g. U+FFFD) or sequence (e.g. ?). You can also use EncoderTrap::Strict and DecoderTrap::Strict traps to stop on an error.
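The effect of the different traps can be illustrated without the crate itself. The following is a minimal std-only sketch, assuming a toy ISO 8859-1 encoder; the `Trap` enum and `encode_latin1` function are hypothetical names made up for this illustration and are not the crate's `EncoderTrap` or `Encoding::encode`.

```rust
/// Hypothetical trap type for this sketch. It mirrors the idea of
/// `EncoderTrap` but is not the crate's definition.
#[derive(Clone, Copy)]
enum Trap {
    Strict,
    Replace,
    Ignore,
    NcrEscape,
}

/// Toy ISO 8859-1 encoder: code points up to U+00FF map to one byte each.
/// Anything else is an error that is handed to the trap.
fn encode_latin1(input: &str, trap: Trap) -> Result<Vec<u8>, char> {
    let mut out = Vec::with_capacity(input.len());
    for ch in input.chars() {
        let cp = ch as u32;
        if cp <= 0xFF {
            out.push(cp as u8);
            continue;
        }
        match trap {
            // Stop at the first unrepresentable character.
            Trap::Strict => return Err(ch),
            // Substitute a fixed byte sequence.
            Trap::Replace => out.push(b'?'),
            // Drop the character silently.
            Trap::Ignore => {}
            // Emit a decimal numeric character reference.
            Trap::NcrEscape => out.extend_from_slice(format!("&#{};", cp).as_bytes()),
        }
    }
    Ok(out)
}

fn main() {
    assert_eq!(encode_latin1("caf\u{e9}", Trap::Strict), Ok(vec![99, 97, 102, 233]));
    assert_eq!(encode_latin1("\u{4e16}!", Trap::Strict), Err('\u{4e16}'));
    assert_eq!(encode_latin1("\u{4e16}!", Trap::Replace), Ok(vec![63, 33]));
    assert_eq!(encode_latin1("\u{4e16}!", Trap::Ignore), Ok(vec![33]));
    assert_eq!(encode_latin1("\u{4e16}!", Trap::NcrEscape), Ok(b"&#19990;!".to_vec()));
}
```

In this sketch only the strict trap turns an unrepresentable character into an `Err`; the other three always let encoding continue.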

There are two ways to get Encoding:

encoding::all has static items for every supported encoding. You should use them when the encoding would not change or only a handful of them are required. Combined with link-time optimization, any unused encoding would be discarded from the binary.
encoding::label has functions to dynamically get an encoding from a given string ("label"). They will return a static reference to the encoding, whose type is also known as EncodingRef. It is useful when a list of required encodings is not available in advance, but it will result in a larger binary and missed optimization opportunities.
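Label-based lookup, as the WHATWG Encoding Standard describes it, strips leading and trailing ASCII whitespace from the label and then matches it ASCII case-insensitively against a fixed list. The following std-only sketch shows that rule on a heavily abridged table. The `LABELS` table and the `whatwg_name_from_label` function are hypothetical and return plain names rather than an EncodingRef; the names are lowercase to match the examples in this README.

```rust
/// Hypothetical, heavily abridged label table. The standard lists many
/// more labels per encoding than are shown here.
const LABELS: &[(&str, &str)] = &[
    ("utf8", "utf-8"),
    ("utf-8", "utf-8"),
    ("latin1", "windows-1252"),
    ("iso-8859-1", "windows-1252"),
    ("euc-kr", "euc-kr"),
    ("korean", "euc-kr"),
    ("windows-949", "euc-kr"),
];

/// Returns the encoding name for a label, or `None` for unknown labels.
fn whatwg_name_from_label(label: &str) -> Option<&'static str> {
    // Strip leading and trailing ASCII whitespace (tab, LF, FF, CR, space).
    let trimmed = label.trim_matches(|c: char| c.is_ascii_whitespace());
    LABELS
        .iter()
        // Labels are compared ASCII case-insensitively.
        .find(|(known, _)| known.eq_ignore_ascii_case(trimmed))
        .map(|&(_, name)| name)
}

fn main() {
    assert_eq!(whatwg_name_from_label(" EUC-KR\n"), Some("euc-kr"));
    assert_eq!(whatwg_name_from_label("Latin1"), Some("windows-1252"));
    assert_eq!(whatwg_name_from_label("no-such-label"), None);
}
```

Note that several labels map to one encoding, which is why a dynamic lookup has to keep every listed encoding linked into the binary.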

RawEncoder is an experimental incremental encoder. At each step of raw_feed, it receives a slice of string and emits any encoded bytes to a generic ByteWriter (normally Vec<u8>). It will stop at the first error if any, and would return a CodecError struct in that case. The caller is responsible for calling raw_finish at the end of encoding process.

RawDecoder is an experimental incremental decoder. At each step of raw_feed, it receives a slice of byte sequence and emits any decoded characters to a generic StringWriter (normally String). Otherwise it is identical to RawEncoder.
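The feed-then-finish protocol is needed because an incremental decoder must carry state across chunk boundaries. The following std-only sketch illustrates this for UTF-8. The `ChunkedUtf8Decoder` type and its `feed` and `finish` methods are hypothetical stand-ins built on `std::str::from_utf8`; they are not the crate's RawDecoder, raw_feed or raw_finish, and errors are reduced to `Err(())` instead of a CodecError.

```rust
use std::str;

/// Hypothetical incremental UTF-8 decoder for this sketch.
struct ChunkedUtf8Decoder {
    /// Bytes of a sequence that was still incomplete when the last chunk ended.
    pending: Vec<u8>,
}

impl ChunkedUtf8Decoder {
    fn new() -> Self {
        ChunkedUtf8Decoder { pending: Vec::new() }
    }

    /// Decodes one chunk and appends every complete character to `output`.
    /// Returns `Err(())` at the first invalid sequence.
    fn feed(&mut self, input: &[u8], output: &mut String) -> Result<(), ()> {
        // Prepend whatever was left over from the previous chunk.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(input);
        match str::from_utf8(&buf) {
            Ok(s) => {
                output.push_str(s);
                Ok(())
            }
            Err(e) => {
                let (valid, rest) = buf.split_at(e.valid_up_to());
                // The prefix was validated by `from_utf8`, so this cannot fail.
                output.push_str(str::from_utf8(valid).unwrap());
                match e.error_len() {
                    // The chunk ended mid-sequence: keep the tail for later.
                    None => {
                        self.pending = rest.to_vec();
                        Ok(())
                    }
                    // A genuinely invalid sequence: stop here.
                    Some(_) => Err(()),
                }
            }
        }
    }

    /// Must be called at the end of input; a dangling partial sequence is an error.
    fn finish(&mut self) -> Result<(), ()> {
        if self.pending.is_empty() {
            Ok(())
        } else {
            self.pending.clear();
            Err(())
        }
    }
}

fn main() {
    let mut decoder = ChunkedUtf8Decoder::new();
    let mut text = String::new();
    // "é" (C3 A9) is split across the two chunks.
    decoder.feed(&[b'c', b'a', b'f', 0xC3], &mut text).unwrap();
    decoder.feed(&[0xA9], &mut text).unwrap();
    decoder.finish().unwrap();
    assert_eq!(text, "caf\u{e9}");
}
```

The final `finish` call is what detects input that stops in the middle of a sequence, which a per-chunk check alone cannot see.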

One should prefer Encoding::{encode,decode} as a primary interface. RawEncoder and RawDecoder are experimental and can change substantially. See the additional documents on the encoding::types module for more information on them.

Supported Encodings

Encoding covers all encodings specified by WHATWG Encoding Standard and some more:

7-bit strict ASCII (ascii)
UTF-8 (utf-8)
UTF-16 in little endian (utf-16 or utf-16le) and big endian (utf-16be)
All single byte encodings in WHATWG Encoding Standard:
- IBM code page 866
- ISO 8859-{2,3,4,5,6,7,8,10,13,14,15,16}
- KOI8-R, KOI8-U
- MacRoman (macintosh), Macintosh Cyrillic encoding (x-mac-cyrillic)
- Windows code pages 874, 1250, 1251, 1252 (instead of ISO 8859-1), 1253, 1254 (instead of ISO 8859-9), 1255, 1256, 1257, 1258
All multi byte encodings in WHATWG Encoding Standard:
- Windows code page 949 (euc-kr, since the strict EUC-KR is hardly used)
- EUC-JP and Windows code page 932 (shift_jis, since it's the most widespread extension to Shift_JIS)
- ISO-2022-JP with asymmetric JIS X 0212 support (Note: this is not yet up to date to the current standard)
- GBK
- GB 18030
- Big5-2003 with HKSCS-2008 extensions
Encodings that were originally specified by WHATWG Encoding Standard:
- HZ
ISO 8859-1 (distinct from Windows code page 1252)

Parenthesized names refer to the encoding's primary name assigned by WHATWG Encoding Standard.

Many legacy character encodings lack a proper specification, and even those that have a specification are highly dependent on the actual implementation. Consequently one should be careful when picking a desired character encoding. The only standards reliable in this regard are WHATWG Encoding Standard and vendor-provided mappings from the Unicode consortium. Whenever in doubt, look at the source code and specifications for detailed explanations.

Comments

(fix) removed deprecated syntax for lifetime in traits

I don't really understand what's going on, but removing the 'static lifetime allows this library to compile and tests to pass. However 104 tests were ignored.

This fixes the error:

src/encoding/types.rs:105:25: 105:32 error: expected ident, found 'static
src/encoding/types.rs:105 pub trait StringWriter: 'static {

opened by brycefisher 6

"Replace" vs. WHATWG error handling

Hi,

Quoting from the README:

use encoding::whatwg;
let mut euckr = whatwg::TextDecoder::new(Some(~"euc-kr")).unwrap();
euckr.encoding(); // => ~"euc-kr"
let broken = &[0xbf, 0xec, 0xbf, 0xcd, 0xff, 0xbe, 0xd3];
euckr.decode_buffer(Some(broken)); // => Ok(~"\uc6b0\uc640\ufffd\uc559")

// this is different from rust-encoding's default behavior:
let decoded = all::WINDOWS_949.decode(broken, Replace); // => Ok(~"\uc6b0\uc640\ufffd\ufffd")

Is there a reason for this difference? Could the Replace built-in trap be aligned with the spec?

opened by SimonSapin 6

Fix hyphens on target names error
As of the latest nightly, the rules in https://github.com/rust-lang/rfcs/blob/master/text/0940-hyphens-considered-harmful.md are now fully implemented and in use. This patch fixes the errors on cargo build that arise when attempting to build projects that depend on this library:

Unable to get packages from source

Caused by: failed to parse manifest at `/home/filipe/.cargo/registry/src/github.com-1ecc6299db9ec823/encoding-index-simpchinese-1.20141219.1/Cargo.toml`

Caused by: target names cannot contain hyphens: encoding-index-simpchinese

opened by filipegoncalves 5

Charset request: ArmSCII-8

Would it be possible to add support for the ArmSCII-8 encoding? Ref: https://manned.org/armscii-8 and https://en.wikipedia.org/wiki/ArmSCII

I had a quick look to see if I could add this myself, as it's just a single-byte encoding, but seeing how all current codecs are autogenerated from the WHATWG specs, I'm a bit lost as to the best approach to implement a custom codec. I'd be happy to provide a PR if I have some guidance on the next steps to take.

opened by 17dec 4
`all::encodings()` returns an erroneous list (and should be sorted alphabetically).
The bug is in src/all.rs:

const ENCODINGS: &'static [EncodingRef] = &[ ....

This is the way I collect the list (please confirm that it should be done this way):

let list = all::encodings().iter().map(|&e|format!(" {}\n",e.name())).collect::<String>();

the following names do not work:

error mac-roman mac-roman mac-cyrillic hz big5-2003 pua-mapped-binary encoder-only-utf-8

These two do work but are not listed:

x-user-defined macintosh

PLEASE return them alphabetically sorted!
opened by getreu 4
hz-gb-2312 encoding and WHATWG compatibility

The WHATWG Encoding Spec lists hz-gb-2312 as mapping to the replacement encoding, which uses the UTF-8 encoder and throws a special replacement encoding error for its decoder. However, it looks like this crate implements the actual HZ encoding. For WHATWG compatibility, this would have to get folded in with the rest of the replacement encodings, but I don't know if that's acceptable considering other people may be using the current implementation.

Would you prefer to maintain strict WHATWG compatibility or keep the current implementation? If the current implementation is kept, this deviation needs to be well documented - it isn't too hard to work around, but is a bit annoying and could catch someone unaware because the rest of the crate is compatible.

opened by aneeshusa 4

Incrementally parsed invalid sequences spanning multiple chunks write data

    #[test]
    fn test_invalid_multibyte_span() {
        use std::mem;
        let mut d = UTF8Encoding.decoder();
        // "ef bf be" is an invalid sequence.
        assert_feed_ok!(d, [], [0xef, 0xbf], "");
        let input: [u8, ..1] = [ 0xbe ];
        let (_, _, buf) = unsafe { d.test_feed(mem::transmute(input.as_slice())) };
        // Make sure no data was written to the buffer.
        assert_eq!(buf, String::new());
        // task 'codec::utf_8::tests::test_invalid_multibyte_span' failed at 'assertion failed: `(left == right) && (right == left)` (left: ``, right: ``)', /Users/cgaebel/code/rust-encoding/src/codec/utf_8.rs:529
    }

This test successfully reports an error, but when it does it writes an invalid code sequence into the buffer.

(Side note: GitHub markup is eating the invalid UTF-8 char in left. Rest assured SOMETHING is in there.)

opened by cgaebel 4

Encoding.name() vs. WHATWG encoding name
whatwg::encoding_from_label returns a tuple of an Encoding object and the encoding name as a string, while the object also has a .name() that returns the same as a string. This seems redundant.

I would like to remove the former and only keep the latter, which should use names from the spec. The required changes are:

1. Rename shift-jis to shift_jis.

2. Add iso-8859-8-i, identical to iso-8859-8 but with a different name.

3. Rename windows-949 to euc-kr.

1 and 2 are harmless, but 3 seems to have been deliberate. Is there a difference between windows-949 and euc-kr, or a reason to prefer the first name?
opened by SimonSapin 4
Split encoding::types into encoding-types crate

This allows for creation of alternative non-WHATWG encodings that use the same interface as encodings defined in this crate without pulling in all the tables and encodings.

This commit does not introduce any breaking changes; all the types previously defined in encoding::types are reexported.

Fixes #81.

I tried to avoid breaking changes for now, but IMO fn decode being in encoding::types makes little sense; I’d move it to encoding at some point later.

opened by nagisa 3
Change hyphen to _

I'm getting errors when building crates which depend on "encoding-index-*" crates because of the hyphen.

cargo build Unable to get packages from source

Caused by: failed to parse manifest at .../.cargo/registry/src/github.com-1ecc6299db9ec823/encoding-index-tradchinese-1.20141219.2/Cargo.toml

Caused by: library target names cannot contain hyphens: encoding-index-tradchinese

opened by marvelm 3
Fix building with current rust

By now, #10683 has been fixed, so the temporary can be dropped, but with the DST changes, we now get &[char, ..5], which doesn't coerce to &[char]. And since only the latter implements CharEq, we have a problem. Using as_slice() instead of prefixing the vector with &, we get a &[char] and all is good.

opened by dotdash 3
Decoding as GBK and as UTF-8 does not work correctly

I have a GBK string, but GBK.decode(rst_raw, DecoderTrap::Strict).is_err() and UTF_8.decode(rst_raw, DecoderTrap::Strict).is_err() cannot tell me reliably which encoding it is in. I don't know why, so I wrote my own code to check whether the data is a UTF-8 string:

fn is_utf8(data: &[u8]) -> bool {
    let mut i = 0;
    while i < data.len() {
        let num = preNUm(data[i]);
        if data[i] & 0x80 == 0x00 {
            i += 1;
            continue;
        } else if num > 2 {
            i += 1;
            let mut j = 0;
            while j < num - 1 {
                if data[i] & 0xc0 != 0x80 {
                    return false;
                }
                j += 1;
                i += 1;
            }
        } else {
            return false;
        }
    }
    return true;
}

fn preNUm(data: u8) -> i32 {
    let rst = format!("{:b}", data);
    let mut i = 0;
    for j in rst.chars() {
        if j != '1' {
            break;
        }
        i += 1;
    }
    return i;
}
opened by whereisyou 0
Abandoned?

What is the status of the project? It seems to have seen no updates in the last 5 years, is it abandoned? And if so what is the "official" replacement?

If the project is to be considered abandoned, maybe that could be indicated in the readme and the project archived?

opened by masklinn 2

Performance: Consider replacing lookup tables with match statements or binary search in single byte index

The current technique for building the single byte "forward" and "backward" function is to generate lookup tables using gen_index.py

Here's an example generated file: https://github.com/lifthrasiir/rust-encoding/blob/master/src/index/singlebyte/windows_1252.rs

There are some benchmarks that are generated, but they're micro-benchmarks with synthetic data, and I'm not sure they adequately capture how the library would be used in the wild.

So I wrote a few tiny benchmarks that exercise the encoder and decoder at the level they're typically used.

/// Some Latin-1 text to test
//
// the first few sentences of the article "An Ghaeilge" from Irish Wikipedia.
// https://ga.wikipedia.org/wiki/An_Ghaeilge
pub static IRISH_TEXT: &'static str =
    "Is ceann de na teangacha Ceilteacha í an Ghaeilge (nó Gaeilge na hÉireann mar a thugtar \
     uirthi corruair), agus ceann den dtrí cinn de theangacha Ceilteacha ar a dtugtar na \
     teangacha Gaelacha (.i. an Ghaeilge, Gaeilge na hAlban agus Gaeilge Mhanann) go háirithe. \
     Labhraítear in Éirinn go príomha í, ach tá cainteoirí Gaeilge ina gcónaí in áiteanna eile ar \
     fud an domhain. Is í an teanga náisiúnta nó dhúchais agus an phríomhtheanga oifigiúil i \
     bPoblacht na hÉireann í an Ghaeilge. Tá an Béarla luaite sa Bhunreacht mar theanga oifigiúil \
     eile. Tá aitheantas oifigiúil aici chomh maith i dTuaisceart Éireann, atá mar chuid den \
     Ríocht Aontaithe. Ar an 13 Meitheamh 2005 d'aontaigh airí gnóthaí eachtracha an Aontais \
     Eorpaigh glacadh leis an nGaeilge mar theanga oifigiúil oibre san AE";

pub static RUSSIAN_TEXT: &'static str =
    "Ру?сский язы?к Информация о файле слушать)[~ 3] один из восточнославянских языков, \
     национальный язык русского народа. Является одним из наиболее распространённых языков мира \
     шестым среди всех языков мира по общей численности говорящих и восьмым по численности \
     владеющих им как родным[9]. Русский является также самым распространённым славянским \
     языком[10] и самым распространённым языком в Европе ? географически и по числу носителей \
     языка как родного[7]. Русский язык ? государственный язык Российской Федерации, один из \
     двух государственных языков Белоруссии, один из официальных языков Казахстана, Киргизии и \
     некоторых других стран, основной язык международного общения в Центральной Евразии, в \
     Восточной Европе, в странах бывшего Советского Союза, один из шести рабочих языков ООН, \
     ЮНЕСКО и других международных организаций[11][12][13].";


#[bench]
fn bench_encode_irish(bencher: &mut test::Bencher) {
    bencher.bytes = IRISH_TEXT.len() as u64;
    bencher.iter(|| {
        test::black_box(
            WINDOWS_1252.encode(IRISH_TEXT, EncoderTrap::Strict)
        )
    })
}

#[bench]
fn bench_decode_irish(bencher: &mut test::Bencher) {
    let bytes = WINDOWS_1252.encode(IRISH_TEXT, EncoderTrap::Strict).unwrap();
    
    bencher.bytes = bytes.len() as u64;
    bencher.iter(|| {
        test::black_box(
            WINDOWS_1252.decode(&bytes, DecoderTrap::Strict)
        )
    })
}

#[bench]
fn bench_encode_russian(bencher: &mut test::Bencher) {
    bencher.bytes = RUSSIAN_TEXT.len() as u64;
    bencher.iter(|| {
        test::black_box(
            ISO_8859_5.encode(&RUSSIAN_TEXT, EncoderTrap::Strict)
        )
    })
}

#[bench]
fn bench_decode_russian(bencher: &mut test::Bencher) {
    let bytes = ISO_8859_5.encode(RUSSIAN_TEXT, EncoderTrap::Strict).unwrap();
    
    bencher.bytes = bytes.len() as u64;
    bencher.iter(|| {
        test::black_box(
            ISO_8859_5.decode(&bytes, DecoderTrap::Strict)
        )
    })
}

I picked the windows-1252 encoding because it's similar to the old latin-1 standard and can encode the special characters in the Irish text I grabbed, and iso-8859-5 for similar reasons for the Russian test.

I rewrote gen_index.py to create match statements instead of building a lookup table. You get something like this:

// AUTOGENERATED FROM index-windows-1252.txt, ORIGINAL COMMENT FOLLOWS:
//
// For details on index index-windows-1252.txt see the Encoding Standard
// https://encoding.spec.whatwg.org/
//
// Identifier: e56d49d9176e9a412283cf29ac9bd613f5620462f2a080a84eceaf974cfa18b7
// Date: 2018-01-06
#[inline]
pub fn forward(code: u8) -> Option<u16> {
    match code {
        128 => Some(8364),
        129 => Some(129),
        130 => Some(8218),
        131 => Some(402),
        132 => Some(8222),
        133 => Some(8230),
        134 => Some(8224),
        135 => Some(8225),
        136 => Some(710),
        137 => Some(8240),
        //  a bunch more items
        250 => Some(250),
        251 => Some(251),
        252 => Some(252),
        253 => Some(253),
        254 => Some(254),
        255 => Some(255),
        _ => None
    }
}

#[inline]
pub fn backward(code: u32) -> Option<u8> {
    match code {
        8364 => Some(128),
        129 => Some(129),
        8218 => Some(130),
        402 => Some(131),
        8222 => Some(132),
        8230 => Some(133),
        8224 => Some(134),
        8225 => Some(135),
        710 => Some(136),
        8240 => Some(137),
        352 => Some(138),
        8249 => Some(139),
        338 => Some(140),
        141 => Some(141),
        381 => Some(142),
        //  a bunch more items
        251 => Some(251),
        252 => Some(252),
        253 => Some(253),
        254 => Some(254),
        255 => Some(255),
        _ => None
    }
}

Note that I changed the function signature to return an Option instead of a sentinel value. That wasn't strictly required, and didn't have a large effect on performance, but makes the code more idiomatic, I think.

I also generated a version that uses a binary search. It's pretty simple.

const BACKWARD_KEYS: &'static [u32] = &[
    128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
    147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 162, 163, 164, 165, 166,
    167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 187,
    188, 189, 190, 215, 247, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499,
    1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 8206,
    8207, 8215
];

const BACKWARD_VALUES: &'static [u8] = &[
    128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
    147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 162, 163, 164, 165, 166,
    167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 187,
    188, 189, 190, 170, 186, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237,
    238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 253, 254, 223
];

#[inline]
pub fn backward(code: u32) -> u8 {
    if let Ok(index) = BACKWARD_KEYS.binary_search(&code) {
        BACKWARD_VALUES[index]
    } else {
        0
    }
}
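The binary search and the Option return type can also be combined. The following is a minimal sketch, not the code that was benchmarked: the `BACKWARD` table is a hypothetical ten-entry excerpt whose pairs are taken from the windows-1252 match arms shown earlier, stored as one slice sorted by code point instead of two parallel arrays.

```rust
/// (code point, byte) pairs sorted by code point. This is only an excerpt
/// for illustration, not a complete windows-1252 index.
const BACKWARD: &[(u32, u8)] = &[
    (352, 138),
    (402, 131),
    (710, 136),
    (8218, 130),
    (8222, 132),
    (8224, 134),
    (8225, 135),
    (8230, 133),
    (8240, 137),
    (8364, 128),
];

/// Looks up the byte for a code point, or `None` if it is not in the table.
#[inline]
pub fn backward(code: u32) -> Option<u8> {
    BACKWARD
        // The table must stay sorted by its first field for this to work.
        .binary_search_by_key(&code, |&(key, _)| key)
        .ok()
        .map(|index| BACKWARD[index].1)
}

fn main() {
    assert_eq!(backward(8364), Some(128)); // U+20AC EURO SIGN
    assert_eq!(backward(402), Some(131));
    assert_eq!(backward(65), None); // not part of this excerpt
}
```

Keeping key and value in one tuple avoids the two arrays drifting out of sync, at the cost of some padding per entry.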

Here's a table comparing the three techniques:

| test | master (ns/iter) | master (MB/s) | match (ns/iter) | match (MB/s) | match (MB/s change vs. master) | binary search (ns/iter) | binary search (MB/s) | binary search (MB/s change vs. master) |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| codec::singlebyte::tests::bench_decode_irish | 3246 | 240 | 3171 | 245 | 2.08% | | | |
| codec::singlebyte::tests::bench_decode_russian | 8508 | 98 | 8890 | 94 | -4.08% | | | |
| codec::singlebyte::tests::bench_encode_irish | 2622 | 310 | 1688 | 482 | 55.48% | 2243 | 363 | 17.10% |
| codec::singlebyte::tests::bench_encode_russian | 6692 | 228 | 10578 | 144 | -36.84% | 10019 | 152 | -33.33% |

Obviously the Irish / Windows-1252 case is improved with both alternative techniques, but the Russian case is degraded.

It looks like the decode method isn't changed much, and that makes sense: because the match expressions are contiguous integers, I bet that LLVM is optimizing that down to a lookup table anyway.

I'll try running some more tests.

opened by john-parton 0

Bugfix/warnings

Fix for https://github.com/lifthrasiir/rust-encoding/issues/123

Most of the fixes were generated by running cargo fix and cargo fix --edition on the current nightly toolchain.

Unresolved

If you rebuild the .rs files using the gen_index.py script, it will generate code that produces warnings. I can resolve that in this PR or in another PR.

Let me know if you have any questions.

Thanks!

opened by john-parton 0
Warnings emitted when building
On master, cargo build emits 237 warnings.

Here are the different kinds of warnings:

[ ] warning: trait objects without an explicit dyn are deprecated

[ ] ... range patterns are deprecated

[ ] unreachable pattern (this one has to do with the fact that Rust wasn't able to tell when a match pattern was exhaustive if you used all of the scalar values for a type, but now it appears to handle that correctly)

[ ] use of deprecated item 'try': use the ? operator instead

You can run cargo build -v 2>&1 | grep warning | sort | uniq to get a summary.
opened by john-parton 0