This is a not-totally-sketch sketch of a fairly major (though not especially user-visible) change to RawVal. I'm opening it here for discussion, not as a definite proposal but because it's at the point where a decent number of tests are actually passing and it's worth discussing before either forging ahead or abandoning.
Background
RawVal is the "polymorphic" type used to carry values that can be one-of-many-types -- numbers, object handles, symbols, booleans, errors -- back and forth between the (native) host and the (WASM) guest. It's used for passing arguments to contracts, as well as storing values in our host-side polymorphic containers (maps and vecs). It's currently bit-packed into a 64-bit value, because .. WASM only really knows about values that are 32 or 64 bits. Everything else you have to tell it about fairly manually.
Users don't directly see RawVal very often; they usually see a wrapper type around it that imbues it with a bit more knowledge of its content. But RawVals are fairly ubiquitous under the covers of the system, and a lot of the WASM bytecode we generate is concerned with packing, unpacking, tag-testing and converting them.
Summary of changes:
- RawVal is changed to a 160-bit value: a pair of u32 + u128
  - The u32 is a control word, which holds the type-tag and any other metadata
  - The u128 is the payload word, which holds the non-metadata content of the value
- A pile of new code is added to the call and return plumbing to "explode" and "implode" RawVals to and from multiple-argument sets -- sequences of the u32 and u64 values that WASM supports -- and to do returns via caller-allocated return-pointers, plus similar messiness. This is needed because there's not yet a stable and widely-supported ABI in WASM that we can rely on for passing multi-word values between the host and guest. We're essentially hand-implementing an ABI.
- Some of this code is in the SDK, and there's a companion branch for it that's required for this to work.
- The "wrapper" types around RawVal -- used when we know the subtype of value carried in a RawVal, such as when we have a u32 or Object handle -- are instead turned into "subset" types that only carry the bits relevant to them and no longer carry a whole RawVal at all (though they can reconstruct one on demand if needed).
  - Object for example is Object(u64), Status is Status(u64), Symbol is Symbol(u128), and so on.
  - This allows passing and returning such "subset" types without engaging the big-expensive-RawVal ABI.
  - Actually quite a lot of host functions take Object and u32 args, not RawVal. Only polymorphic functions like vec_get, which can return "anything" (because vector contents are polymorphic), need to return RawVal.
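To make the shape of the change concrete, here is a minimal sketch of the control-word/payload split, the explode/implode step, and one "subset" type. The field names, the `TAG_OBJECT` value, the `(u32, u64, u64)` argument order and the `to_raw`/`from_raw` helpers are all assumptions made for illustration; they are not taken from the branch.

```rust
/// Hypothetical stand-in for the experimental 160-bit RawVal:
/// a 32-bit control word plus a 128-bit payload word.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawVal {
    pub control: u32,  // type-tag and any other metadata
    pub payload: u128, // non-metadata content of the value
}

/// Assumed tag value for object handles (the real tag assignment is not shown here).
pub const TAG_OBJECT: u32 = 1;

impl RawVal {
    /// "Explode" into values of the widths WASM supports natively:
    /// the control word, then the low and high halves of the payload.
    pub fn explode(self) -> (u32, u64, u64) {
        let lo = self.payload as u64; // truncating cast keeps the low 64 bits
        let hi = (self.payload >> 64) as u64;
        (self.control, lo, hi)
    }

    /// "Implode": rebuild a RawVal from the exploded argument set.
    pub fn implode(control: u32, lo: u64, hi: u64) -> Self {
        RawVal {
            control,
            payload: ((hi as u128) << 64) | (lo as u128),
        }
    }
}

/// "Subset" type: carries only the bits relevant to an object handle,
/// so it can cross the host/guest boundary as a single u64.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Object(pub u64);

impl Object {
    /// Reconstruct a full RawVal on demand.
    pub fn to_raw(self) -> RawVal {
        RawVal {
            control: TAG_OBJECT,
            payload: self.0 as u128,
        }
    }

    /// Narrow a RawVal back to an Object if the tag and payload range allow it.
    pub fn from_raw(v: RawVal) -> Option<Object> {
        if v.control == TAG_OBJECT && v.payload <= u64::MAX as u128 {
            Some(Object(v.payload as u64))
        } else {
            None
        }
    }
}
```

In this sketch a function taking an `Object` only needs one u64 argument, while a function returning a `RawVal` needs three words (or a return pointer), which is roughly where the extra code size would come from.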
Rationale
Why consider this? A few reasons:
- It lets us raise the size of Symbol from 10 characters to 21 characters. Users are chafing a bit at the 10-char limit.
- It eliminates the "weird" number situation in the existing encoding, where "u63" fits (but what's that?) and u64 and i64 have to be boxed as Objects. All normal Rust scalar types fit unboxed into this experiment's large RawVals, including i128 and u128.
- It supports standardizing on an unboxed i128 as a ubiquitous fixed-point arithmetic type for asset values. Currently we are somewhat on the fence about how people are likely to be representing asset-amounts. It's possible that u63 and/or boxed i64 values will be the norm (possibly using the Stellar-native scale factor of 7 digits -- it's quite a decent range) but it's also fairly likely that people coming from Ethereum or other ecosystems will be expecting bigger "standard" number types for asset-amounts, and will wind up using BigInt everywhere (or rolling their own on top of Bytes) if we don't provide something standard.
- Given the luxuriously-sized i128, it might support eliminating BigInt from the object repertoire entirely. BigInt is somewhat overkill for use-cases that would be ok with i128, and a bit tricky to instrument safely / correctly for gas-metering.
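The 10-to-21 character jump for Symbol is just bit arithmetic. Assuming a 6-bit character code (enough for `[a-zA-Z0-9_]` plus a reserved zero), 10 characters need 60 bits, which fits in a 64-bit word with a few tag bits left over, and 21 characters need 126 bits, which fits in a u128 payload. The sketch below illustrates this; the specific code assignment and the `pack_symbol`/`unpack_symbol` names are hypothetical, not the actual encoding.

```rust
pub const BITS_PER_CHAR: u32 = 6; // assumption: 6-bit codes
pub const MAX_CHARS_64: usize = 10; // 10 * 6 = 60 bits
pub const MAX_CHARS_128: usize = 21; // 21 * 6 = 126 bits

/// Assumed code assignment: 0 is reserved for "no character",
/// then `_`, digits, uppercase and lowercase take 1..=63.
fn encode_char(c: char) -> Option<u128> {
    match c {
        '_' => Some(1),
        '0'..='9' => Some(2 + (c as u128 - '0' as u128)),
        'A'..='Z' => Some(12 + (c as u128 - 'A' as u128)),
        'a'..='z' => Some(38 + (c as u128 - 'a' as u128)),
        _ => None,
    }
}

fn decode_char(code: u8) -> Option<char> {
    match code {
        1 => Some('_'),
        2..=11 => Some((b'0' + (code - 2)) as char),
        12..=37 => Some((b'A' + (code - 12)) as char),
        38..=63 => Some((b'a' + (code - 38)) as char),
        _ => None,
    }
}

/// Pack up to `max_chars` characters into an integer, 6 bits per character.
/// Returns None if the string is too long or has a character outside the alphabet.
pub fn pack_symbol(s: &str, max_chars: usize) -> Option<u128> {
    let limit = max_chars.min(MAX_CHARS_128); // never exceed what u128 can hold
    if s.chars().count() > limit {
        return None;
    }
    let mut acc: u128 = 0;
    for c in s.chars() {
        acc = (acc << BITS_PER_CHAR) | encode_char(c)?;
    }
    Some(acc)
}

/// Unpack a value produced by `pack_symbol`. Every valid code is non-zero,
/// so the loop stops exactly when all characters have been consumed.
pub fn unpack_symbol(mut packed: u128) -> Option<String> {
    let mut chars = Vec::new();
    while packed != 0 {
        chars.push(decode_char((packed & 0x3f) as u8)?);
        packed >>= BITS_PER_CHAR;
    }
    Some(chars.into_iter().rev().collect())
}
```

Under these assumptions, a 21-character name such as `liquidity_pool_router` is rejected with the 10-character limit but packs into the wider payload.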
Impacts
- Broadly speaking, it seems to work. There remain some bugs.
- Codesize goes up, but not horribly. Worst cases double; average cases that are heavy on host functions with mostly "subset"-typed arguments see more like 10-40% overhead.
- If you're curious where the difference comes from, here are two annotated examples of one of the smallest contracts, add_i32:
- If you just want the high level summary, here are some before-and-after sizes:
| before | after | contract |
| ------ | ----- | -------- |
| 6522 | 8840 | soroban_auth_advanced_contract.wasm |
| 996 | 1534 | soroban_auth_contract.wasm |
| 456 | 412 | soroban_cross_contract_a_contract.wasm |
| 903 | 665 | soroban_cross_contract_b_contract.wasm |
| 1015 | 1558 | soroban_custom_types_contract.wasm |
| 963 | 638 | soroban_deployer_contract.wasm |
| 424 | 539 | soroban_deployer_test_contract.wasm |
| 509 | 616 | soroban_errors_contract.wasm |
| 566 | 730 | soroban_events_contract.wasm |
| 409 | 461 | soroban_hello_world_contract.wasm |
| 425 | 510 | soroban_increment_contract.wasm |
| 7478 | 10614 | soroban_liquidity_pool_contract.wasm |
| 21638 | 31565 | soroban_liquidity_pool_router_contract.wasm |
| 283 | 262 | soroban_logging_contract.wasm |
| 11897 | 17271 | soroban_single_offer_contract.wasm |
| 11497 | 16377 | soroban_single_offer_contract_xfer_from.wasm |
| 21451 | 31040 | soroban_single_offer_router_contract.wasm |
| 4655 | 7872 | soroban_timelock_contract.wasm |
| 31481 | 31481 | soroban_token_contract.wasm |
| 11405 | 17020 | soroban_wallet_contract.wasm |
Discussion
I do not know exactly what to make of this. Knowing it's possible is interesting, but it's also fairly costly and the benefits might not justify it. I am interested in hearing input from others, especially around the question of number types.
The way I see it we have 3-and-a-half options:
1. Stick with the current encoding, encourage use of u63 for asset-amounts (which are almost always positive), and assume scale=7 is good enough. It seems to have been basically ok for the classic Stellar protocol, though we occasionally need support routines that minimize intermediate rounding, like a 3-arg A*B/C operation conducted in 128-bit precision. We can absolutely build that sort of thing into the SDK and/or host functions though.

   1.1. Possibly encourage using BigInt for asset-amounts, and possibly shift BigInt to a menu of fixed-size-but-big types, like u128, u256, u512 and u1024, as supported by the crypto_bigint library. This has a more predictable cost model and range (e.g. one can know that if you are working in u256 your type will convert back to an Ethereum value too), though it also lacks a few functions that our existing BigInt has, such as pow.
2. Move to this experiment and wire in (say) scale=18, which is the norm (and in the past the suggested mandatory value) in ERC-20, and certainly plenty for any real use-cases. We get more "breathing room" in our value repertoire, and a slightly simpler mental model for users, at the expense of code size (and thus performance).
3. Move to an even-larger version of this experiment, say with the RawVal payload being 256 bits. This is a bit of an appealing target as well, in some ways, since it's both "ridiculously huge for fixed-point math", and "interoperates exactly with Ethereum values", and also "is able to store SHA256 outputs and Ed25519 points as unboxed values". But since those latter two tend to be opaque constants rather than number-like values with lots of temporaries created and forgotten through arithmetic expressions, the value of keeping them unboxed is not totally clear to me.
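For a sense of what the numeric options above mean in practice, here is a small sketch of the 3-arg A*B/C support routine mentioned under the first option, with the intermediate product held in 128 bits, along with the whole-unit range available at scale=7 in 63 bits versus scale=18 in an i128. The `mul_div` name, its signature, and the choice to round toward zero are assumptions for illustration, not an existing SDK or host function.

```rust
/// Computes a * b / c with the intermediate product held in i128, so the
/// multiplication cannot overflow even when a * b does not fit in 64 bits.
/// Rounds toward zero. Returns None on division by zero or if the final
/// result does not fit back into an i64.
pub fn mul_div(a: i64, b: i64, c: i64) -> Option<i64> {
    if c == 0 {
        return None;
    }
    // |a * b| is at most 2^126, well inside the i128 range.
    let wide = (a as i128) * (b as i128) / (c as i128);
    if wide > i64::MAX as i128 || wide < i64::MIN as i128 {
        None
    } else {
        Some(wide as i64)
    }
}

/// Scale factors discussed in the text: 7 decimal digits (Stellar-native)
/// and 18 decimal digits (the ERC-20 norm).
pub const SCALE_7: i128 = 10_000_000;
pub const SCALE_18: i128 = 1_000_000_000_000_000_000;

/// Largest whole-unit amount representable in 63 bits at scale=7.
pub fn max_whole_units_63bit_scale7() -> i128 {
    i64::MAX as i128 / SCALE_7
}

/// Largest whole-unit amount representable in an i128 at scale=18.
pub fn max_whole_units_i128_scale18() -> i128 {
    i128::MAX / SCALE_18
}
```

With these definitions, `mul_div(i64::MAX, 2, 4)` succeeds even though `i64::MAX * 2` overflows a plain 64-bit multiply. The range helpers give roughly 9.2e11 whole units at scale=7 in 63 bits and roughly 1.7e20 whole units at scale=18 in an i128.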
One thing to recognize is that no matter what we put in as "standard" types (i.e. with pre-defined tags in the XDR, standard helper routines for printing and converting, standard operations as host functions), users can always "ship their own" in a contract. They can include fixed-point arithmetic or unboxed u256 values or whatever. It'll just be a bit janky -- slower than native-supported, non-interoperable, harder to debug, worse for their codesize, etc. -- but they can do it. So we don't need to cover all corner cases. We need to do something good-enough that most contracts have something familiar to reach for.
Personally I am .. somewhat disappointed in the cost and complexity of this experiment and am leaning towards options 1 or 1.1 above -- stay where we are and encourage either u63/i64 or a specific size of Object-handle based BigInt for normal asset values, with a menu of BigInt options for interop -- but I would love to hear others' opinions, especially from those working on "standard token contract interfaces": we probably want to smooth down those interfaces and make the type repertoire support their needs directly.