Cardano doesn't use a deterministic mechanism for CBOR encoding. The same block / tx data can be encoded in practically infinite variations, each one resulting in a different hash.
The process of decoding CBOR into a Rust struct involves losing the information about these slight variations in encoding (eg: indefinite / definite arrays, int size, map order, etc). So far, Pallas has been dealing with this problem by enriching the structures with extra data that allows us to replicate the same variation when re-encoding a struct.
After dealing with several edge cases I thought I had won the battle. We actually managed to process the whole mainnet history without a mismatch. I felt a nice sense of accomplishment: I had fought against the odds and survived...
... but I was wrong, CBOR had more tricks up its sleeve. A testnet Tx was found that had a mismatch in the resulting hash: https://github.com/txpipe/oura/issues/307
This time, the problem was that a Tx input tuple (hash, index) was encoded as a CBOR indefinite array. Why on earth would someone encode a fixed tuple as an indefinite array? I don't know, but it was clear that this is not a war that can be won.
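To make the problem concrete, here is a minimal sketch of how the same two-element tuple ends up as different bytes, and therefore a different hash, depending on whether the array is definite or indefinite. The function names are made up for illustration, the values are restricted to 0..=23 so that each one fits in a single CBOR byte, and `DefaultHasher` is only a stand-in for the Blake2b hashing that Cardano actually uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest for illustration only (NOT the Blake2b hash used on-chain).
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: encodes (a, b) as a definite-length CBOR array.
/// 0x82 = major type 4 (array) with length 2.
fn definite_pair(a: u8, b: u8) -> Vec<u8> {
    assert!(a <= 23 && b <= 23, "sketch only handles single-byte uints");
    vec![0x82, a, b]
}

/// Hypothetical helper: encodes (a, b) as an indefinite-length CBOR array.
/// 0x9f opens the array, 0xff is the "break" marker that closes it.
fn indefinite_pair(a: u8, b: u8) -> Vec<u8> {
    assert!(a <= 23 && b <= 23, "sketch only handles single-byte uints");
    vec![0x9f, a, b, 0xff]
}

fn main() {
    let definite = definite_pair(1, 2);
    let indefinite = indefinite_pair(1, 2);

    // Same logical data, different bytes on the wire.
    println!("definite:   {:02x?}", definite);
    println!("indefinite: {:02x?}", indefinite);

    // Different bytes mean a different digest.
    println!("digests match: {}", digest(&definite) == digest(&indefinite));
}
```

A decoder that turns both byte strings into the same Rust tuple has no way of knowing which one to emit when re-encoding, unless it remembers the original bytes.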
It is now 100% clear to me that the only way to maintain consistency in the hash generation process is to retain the original CBOR data, something that @NicolasDP has been telling me for quite a while now.
The question is how to accomplish this without increasing memory usage.
In this PR I introduce a generic structure called KeepRaw<T> that wraps an inner CBOR-encodable structure while tracking the start / end positions that represent the segment of the original bytestream relevant to that particular inner struct:
use pallas_codec::utils::KeepRaw;
let a = (123u16, (456u16, 789u16), 123u16);
let data = minicbor::to_vec(a).unwrap();
let (_, keeper, _): (u16, KeepRaw<(u16, u16)>, u16) = minicbor::decode(&data).unwrap();
// this returns a &[u8] slice of the original bytestream covering only the inner (u16, u16) tuple
let raw: &[u8] = keeper.raw_cbor();
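For intuition, the position-tracking idea can be sketched without any dependencies. This is not the actual KeepRaw implementation: `Cursor`, `RawKept` and their methods are hypothetical names, and the toy decoder only understands single-byte uints (0..=23) and two-element arrays. What it shows is the trick itself: note the position before decoding, decode, then borrow the consumed span from the original buffer instead of copying it.

```rust
/// Hypothetical wrapper: the decoded value plus the exact bytes it came from.
struct RawKept<'b, T> {
    raw: &'b [u8],
    inner: T,
}

/// Toy cursor over a byte slice; a stand-in for a real CBOR decoder.
struct Cursor<'b> {
    bytes: &'b [u8],
    pos: usize,
}

impl<'b> Cursor<'b> {
    fn new(bytes: &'b [u8]) -> Self {
        Cursor { bytes, pos: 0 }
    }

    /// Reads one unsigned int in 0..=23 (encoded as a single CBOR byte).
    fn small_uint(&mut self) -> Option<u8> {
        let byte = *self.bytes.get(self.pos)?;
        if byte <= 0x17 {
            self.pos += 1;
            Some(byte)
        } else {
            None
        }
    }

    /// Reads a pair from either a definite (0x82) or an indefinite
    /// (0x9f ... 0xff) array. Both decode to the same Rust tuple.
    fn pair(&mut self) -> Option<(u8, u8)> {
        let head = *self.bytes.get(self.pos)?;
        if head != 0x82 && head != 0x9f {
            return None;
        }
        self.pos += 1;
        let a = self.small_uint()?;
        let b = self.small_uint()?;
        if head == 0x9f {
            // An indefinite array must be closed by the break marker.
            if *self.bytes.get(self.pos)? != 0xff {
                return None;
            }
            self.pos += 1;
        }
        Some((a, b))
    }

    /// Runs `decode` and records which span of the input it consumed.
    /// The span is borrowed from the original buffer, so nothing is copied.
    fn keep_raw<T>(
        &mut self,
        decode: impl FnOnce(&mut Self) -> Option<T>,
    ) -> Option<RawKept<'b, T>> {
        let bytes: &'b [u8] = self.bytes;
        let start = self.pos;
        let inner = decode(self)?;
        Some(RawKept { raw: &bytes[start..self.pos], inner })
    }
}

fn main() {
    // uint 1, then the pair (2, 3) as an indefinite array, then uint 4.
    let data: [u8; 6] = [0x01, 0x9f, 0x02, 0x03, 0xff, 0x04];
    let mut cursor = Cursor::new(&data);

    let first = cursor.small_uint().unwrap();
    let kept = cursor.keep_raw(Cursor::pair).unwrap();
    let last = cursor.small_uint().unwrap();

    assert_eq!((first, kept.inner, last), (1, (2, 3), 4));
    // The raw slice still carries the indefinite-array markers.
    assert_eq!(kept.raw, &data[1..5]);
}
```

Because the wrapper only holds a borrowed slice, the extra cost per wrapped value is a pointer and a length, at the price of tying the decoded struct to the lifetime of the input buffer.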
In this way, we can start wrapping ledger primitives, allowing us to retain the original CBOR and hash accordingly:
impl ToHash<32> for KeepRaw<'_, TransactionBody> {
    fn to_hash(&self) -> pallas_crypto::hash::Hash<32> {
        Hasher::<256>::hash(self.raw_cbor())
    }
}
I'm not ashamed to say that CBOR has won; it was an honourable fight.