In the Windows world UTF-16 strings are not only encountered when interfacing with APIs, but also in a few on-disk structures (e.g. NT registry hives or NTFS filesystems).
This complicates interoperability with Rust's UTF-8 world, especially in `no_std` environments.
My current approach when writing a parser for such an on-disk structure is as follows:
- I define my own `Utf16ByteString` type that just wraps a `&[u8]`.
- All parser functions that output a string just return the byte slice encompassing that string in a `Utf16ByteString`. This has zero cost.
- For users with `alloc` or `std`, my `Utf16ByteString` provides a `to_string` function that uses `char::decode_utf16(bytes.chunks_exact(2).map(|two_bytes| u16::from_le_bytes(two_bytes.try_into().unwrap())))` internally. Apart from the required allocations, this function also comes with decoding overhead (see the sketch after this list).
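To make this concrete, here is a minimal sketch of such a wrapper; the plain newtype layout, the fallible `Result` return type, and the handling of a trailing odd byte are my assumptions, not necessarily how the real type looks:

```rust
#![no_std]
extern crate alloc;

use alloc::string::String;

/// Zero-copy wrapper around the raw UTF-16LE bytes of a string,
/// exactly as they appear in the on-disk structure.
pub struct Utf16ByteString<'a>(pub &'a [u8]);

impl<'a> Utf16ByteString<'a> {
    /// Decodes the wrapped bytes into an owned `String`.
    /// Requires `alloc` and pays both the allocation and the
    /// UTF-16 decoding cost.
    pub fn to_string(&self) -> Result<String, core::char::DecodeUtf16Error> {
        char::decode_utf16(
            // A trailing odd byte, if any, is silently dropped by
            // `chunks_exact`.
            self.0
                .chunks_exact(2)
                .map(|two_bytes| u16::from_le_bytes(two_bytes.try_into().unwrap())),
        )
        .collect()
    }
}
```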
Of course, I like to avoid using `to_string`, and a frequent case where this should be possible is (case-sensitive) comparisons.
Currently, I have to create the comparison byte buffers by hand though, e.g. `let hello = &[b'H', 0, b'e', 0, b'l', 0, b'l', 0, b'o', 0]`.
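Put together, a case-sensitive check against such a hand-written buffer is nothing more than a slice comparison (the constant and function names here are just for illustration):

```rust
/// Hand-written UTF-16LE buffer for the ASCII string "Hello".
const HELLO_UTF16LE: &[u8] = &[b'H', 0, b'e', 0, b'l', 0, b'l', 0, b'o', 0];

/// Case-sensitive comparison against raw on-disk bytes: a plain
/// byte-wise slice comparison, no decoding, no allocation.
fn name_is_hello(raw_name_bytes: &[u8]) -> bool {
    raw_name_bytes == HELLO_UTF16LE
}
```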
Latest `const-utf16` is no help here, as its `encode!` only outputs a `&[u16]`. I could transmute my `&[u8]` to a `&[u16]`, but that would be an unsafe hack and prone to endian problems.
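Spelled out, that hack would look roughly like the following sketch; beyond the endianness issue, note that it is also only sound if the byte slice happens to be 2-byte aligned, which a parsed `&[u8]` does not guarantee:

```rust
/// The kind of unsafe hack I would like to avoid: reinterpreting the
/// raw on-disk bytes as a `&[u16]` so they can be compared against the
/// `&[u16]` produced by `const_utf16::encode!`.
unsafe fn as_native_u16s(bytes: &[u8]) -> &[u16] {
    // Undefined behaviour unless `bytes` is 2-byte aligned, and the
    // resulting `u16`s are in native byte order, so the comparison
    // silently breaks on big-endian targets.
    core::slice::from_raw_parts(bytes.as_ptr().cast::<u16>(), bytes.len() / 2)
}
```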
Could `const-utf16` therefore be extended to alternatively output a UTF-16LE `&[u8]` slice for such comparisons?
Or am I missing a zero-cost alternative here?
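For concreteness, this is the kind of compile-time usage I have in mind (the macro name is purely hypothetical and does not exist in `const-utf16` today):

```rust
// Purely hypothetical sibling of `encode!` that would emit UTF-16LE
// bytes instead of native-endian `u16`s.
const HELLO_UTF16LE: &[u8] = const_utf16::encode_utf16le_bytes!("Hello");

// The comparison against raw on-disk bytes would then stay a plain,
// zero-cost slice comparison: `raw_name_bytes == HELLO_UTF16LE`.
```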