There have been a few updates to the moderate path float parsing algorithms in minimal-lexical, which can either provide performance benefits or reduce the amount of static storage required, depending on your use-case. I'll summarize a list of plausible options below, and if any seem beneficial to the maintainers of serde-json, I'll be happy to submit a PR.
Quick Background
Just for a quick summary: the float parsing algorithm is broken into 3 parts:
A fast path algorithm, where the significant digits can be exactly represented as a native float without truncation.
A moderate path algorithm, to process all floats except near-halfway cases through an extended representation of a float.
A slow path algorithm, that discerns the proper way to round near-halfway floats using arbitrary-precision arithmetic.
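To make the first step concrete, here is a hedged sketch of a typical fast path (not serde-json's actual code; `fast_path` and `POW10` are illustrative names, and the 53-bit and 10^±22 bounds are the standard exactness conditions for an f64):

```rust
// Powers of 10 that are exactly representable in an f64.
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// Convert `mantissa * 10^exp10` to an f64 when both parts are exact.
fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    // 2^53 bounds the integers exactly representable in an f64.
    if mantissa > (1 << 53) || exp10 < -22 || exp10 > 22 {
        return None; // not exact: delegate to the moderate path
    }
    let value = mantissa as f64; // exact conversion
    Some(if exp10 >= 0 {
        value * POW10[exp10 as usize] // one correctly rounded multiply
    } else {
        value / POW10[(-exp10) as usize] // one correctly rounded divide
    })
}

fn main() {
    assert_eq!(fast_path(3, 2), Some(300.0));
    assert_eq!(fast_path(12345, -2), Some(123.45));
    assert_eq!(fast_path(1, 23), None); // 10^23 is not exact in an f64
}
```

Since both operands are exact and IEEE arithmetic rounds a single operation correctly, the result needs no further adjustment.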
The moderate path is ~66-75% faster than the slow path, and therefore improvements to it either from a performance standpoint or correctness standpoint can lead to dramatic performance gains.
Interpolate the Cached Power Exponent
Serde uses pre-computed values for the cached float exponents in cached_float80. However, we can interpolate all these exponents, since each exponent is effectively just ⌈ log2(10) * exp10 ⌉. Using a pre-computed, integral approximation of log2(10), we can calculate the exponent exactly from the index to the cached power.
Pseudo-code can be used to generate this magic number and to verify that it produces the correct result over the entire range of valid exponents:
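The original pseudo-code is not reproduced here, but a hypothetical sketch of such a verification follows. It assumes the magic number 217706 = ⌈log2(10) · 2^16⌉ (a 16-bit fixed-point approximation), and the ±350 exponent range is illustrative; the real bound is the extent of the cached-powers table:

```rust
/// Compute ceil(exp10 * log2(10)) with a fixed-point multiply-and-shift.
/// 217706 = ceil(log2(10) * 2^16); adding 65535 before the shift turns
/// the floor of an arithmetic right-shift into a ceiling.
fn interpolated_exponent(exp10: i64) -> i64 {
    (217706 * exp10 + 65535) >> 16
}

fn main() {
    // Verify against a floating-point reference over an illustrative range.
    for exp10 in -350..=350 {
        let exact = (exp10 as f64 * 10f64.log2()).ceil() as i64;
        assert_eq!(interpolated_exponent(exp10), exact, "exp10 = {exp10}");
    }
}
```

If the loop completes without panicking, the interpolation matches the exact ceiling for every exponent in the range, and the stored exponents can be dropped.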
See the appendix for the full changes required to implement this change.
Pros:
Less storage required.
No discernible impact on runtime performance.
Cons:
N/A
Correctness Concerns:
N/A, can be proven the generated exponents are identical for all valid inputs.
Add the Eisel-Lemire Algorithm
A fast algorithm for creating correct representations of floats from an extended 128-bit (or 192-bit) representation was developed and is currently in use in major Google projects like Golang and Wuffs, as well as others. The Eisel-Lemire algorithm is ~15% faster for a uniform distribution of randomly-generated floats over the entire float range, and catches halfway cases differently than the existing extended-float algorithm.
A few examples of cases:
"9007199254740992.0" (or 1<<53): correctly classified by both.
"9007199254740992000.0e-3" (or 1<<53): correctly classified by extended-float only.
"9007199254740993.0" (or 1 + (1<<53)): cannot be classified by either.
"9007199254740994.0" (or 2 + (1<<53)): correctly classified by both.
"9007199254740995.0" (or 3 + (1<<53)): correctly classified by Eisel-Lemire only.
"9007199254740996.0" (or 4 + (1<<53)): correctly classified by both.
"2.470328229206232720e-324" (near-halfway subnormal float): correctly classified by extended-float only.
"8.988465674311580536e307" (large near-halfway float): correctly classified by Eisel-Lemire only.
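Most of these examples cluster around 2^53, where the gap between consecutive f64 values grows to 2, so the odd integers are exact halfway cases that must round to even. The expected results can be checked against Rust's standard library parser, which is correctly rounded:

```rust
fn main() {
    let b = 9007199254740992f64; // 1 << 53, where f64 spacing becomes 2
    // Odd integers above 2^53 are halfway cases and round to even:
    assert_eq!("9007199254740993.0".parse::<f64>().unwrap(), b); // down to ...992
    assert_eq!("9007199254740995.0".parse::<f64>().unwrap(), b + 4.0); // up to ...996
    // Even integers in this range are exactly representable:
    assert_eq!("9007199254740994.0".parse::<f64>().unwrap(), b + 2.0);
    // The same value as ...992, with digits shifted into the exponent:
    assert_eq!("9007199254740992000.0e-3".parse::<f64>().unwrap(), b);
}
```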
In short, the two algorithms have complementary coverage, and combined they can avoid delegating to the slow path algorithm, leading to major performance benefits. See minimal-lexical/lemire.rs for an example implementation of this algorithm. The general approach therefore is to run Eisel-Lemire first, and if the algorithm fails, delegate to the extended-float algorithm.
Pros:
Slightly faster performance than extended-float in some cases.
Can be combined with extended-float to minimize delegating to the slow path.
The pre-computed powers for Eisel-Lemire can also be used for extended-float, leading to minor performance improvements.
Cons:
Increased storage required (requires an additional 1226 u64s, or ~9.8 KB).
Correctness Concerns:
The change is substantial, but the algorithm is well-established and passes all correctness tests.
It passes the curated suite of halfway cases, a large, curated suite of cases used to validate Go's parser, and Rust's extensive randomly-generated test-cases.
Replace Extended-Float with Lemire
A third option is to entirely remove the extended-float algorithm and replace it with the Eisel-Lemire algorithm. In order to do so, we need to round down to b so the slow algorithm can correctly differentiate between b, b+h, and b+u. Extensive comments and code samples are included in lexical-core/lemire.rs for how to implement this.
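To make the b / b+h / b+u distinction concrete, here is a hedged sketch using the issue's notation (b is the candidate rounded down, b+u the next float up, and b+h the unrepresentable halfway point between them; `decompose` is an illustrative helper, not code from either crate):

```rust
/// Decompose a positive, normal f64 into (mantissa, exponent) with
/// x = mantissa * 2^exponent and the implicit leading bit made explicit.
fn decompose(x: f64) -> (u64, i32) {
    let bits = x.to_bits();
    let mantissa = (bits & ((1u64 << 52) - 1)) | (1u64 << 52); // implicit bit
    let exponent = ((bits >> 52) & 0x7ff) as i32 - 1023 - 52;
    (mantissa, exponent)
}

fn main() {
    let (m, e) = decompose(1.0); // b = 1.0 = m * 2^e
    assert_eq!((m, e), (1u64 << 52, -52));
    // At scale 2^(e - 1): b = 2m, b + h = 2m + 1, b + u = 2m + 2.
    let (b, b_h, b_u) = (2 * m, 2 * m + 1, 2 * m + 2);
    assert!(b < b_h && b_h < b_u);
    assert_eq!(b_h % 2, 1); // odd numerator: b + h is not representable
}
```

Because b+h has an odd scaled mantissa, it can never be an f64; the slow path compares the exact decimal input against this scaled integer to decide between b and b+u.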
Pros:
Slightly faster performance than extended-float in some cases.
Cons:
Increased storage required (requires an additional 1226 u64s, or ~9.8 KB).
Less correct than extended-float, and therefore delegates to the slow path algorithm more often.
Correctness Concerns:
The change is substantial, but the algorithm is well-established and passes all correctness tests.
It passes the curated suite of halfway cases, a large, curated suite of cases used to validate Go's parser, and Rust's extensive randomly-generated test-cases.
Appendix
Interpolation
The full changes to interpolate the exponent are the following:
Right now I don't have a great sense of what the people relying on this would prefer to optimize for (whether it's performance, binary size, etc). Absent feedback to the effect that the current implementation is not serving people's needs, I would prefer to mostly stick with the current code.
My own priorities for the float parser in order are:
Correctness
Reasonable performance envelope (say, within 2–5× of "optimal")
Source code simplicity / auditability
Smaller compile time / binary size (there is some complicated tradeoff frontier between these two things)
The exponent interpolation change sounds well aligned with this, whereas the Eisel–Lemire change is probably not impactful for us.
I think the right workflow might be to plan to pull in batches of changes from minimal-lexical on a cadence of something like 18 months or 24 months (other than correctness fixes, which we would take immediately).