Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle overlapping ranges #3

Merged
merged 12 commits into from May 28, 2021
Merged

Handle overlapping ranges #3

merged 12 commits into from May 28, 2021

Conversation

osa1
Copy link
Owner

@osa1 osa1 commented May 28, 2021

Previously we implicitly assumed that character ranges in a NFA/DFA state
cannot overlap, so for example, I cannot have rules '0'-'9' and '5'-'9' in
the NFA/DFA state. This is obviously incorrect, as reported in #2.

This patch introduces a "range map" that maps u32 ranges to values. A range
can have multiple values (used in NFA-to-DFA conversion). When two ranges
overlap, the overlapping part has union of values of the overlapping ranges.

Using this type instead of HashMap<(char, char), HashSet<NfaStateIdx>>
automatically fixes the issue, as the handling of overlapping ranges
effectively implements switching to states of all of the overlapping ranges in
the NFA. So most of the code is just implementing the RangeMap type. Rest are
tests and replacing hash maps with RangeMap.

I also benchmarked this patch. For the Lua 5.1 lexer, this PR reduces compile
times 9%. Generated code is identical except orders of alternatives in or
patterns for ranges.

Fixes #2

@osa1 osa1 merged commit 9f0f30f into main May 28, 2021
@osa1 osa1 deleted the range_set branch May 28, 2021 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overlapping ranges break lexing
1 participant