Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previously we implicitly assumed that character ranges in a NFA/DFA state
cannot overlap, so for example, I cannot have rules
'0'-'9'
and'5'-'9'
inthe NFA/DFA state. This is obviously incorrect, as reported in #2.
This patch introduces a "range map" that maps
u32
ranges to values. A rangecan have multiple values (used in NFA-to-DFA conversion). When two ranges
overlap, the overlapping part has union of values of the overlapping ranges.
Using this type instead of
HashMap<(char, char), HashSet<NfaStateIdx>>
automatically fixes the issue, as the handling of overlapping ranges
effectively implements switching to states of all of the overlapping ranges in
the NFA. So most of the code is just implementing the
RangeMap
type. Rest aretests and replacing hash maps with
RangeMap
.I also benchmarked this patch. For the Lua 5.1 lexer, this PR reduces compile
times 9%. Generated code is identical except orders of alternatives in or
patterns for ranges.
Fixes #2