Commit Graph

17 Commits

Author SHA1 Message Date
aplefull
3e391bdb2d LibRegex: Use token-state restoration in character class parsing
Previously, we restored state based on the character position in the parser.
This caused the lexer to re-tokenize from the middle of multi-character
tokens like escape sequences, and led to incorrect parse failures for
patterns like `[\[\]]`. We would backtrack to before the first `\[`
token, then re-lex the `[` as a separate token instead of part of the
`\[` escape.

Now we save and restore the actual token object along with the lexer
index, so we keep correct token state when backtracking.
2025-12-23 11:04:16 +01:00
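The token-state restoration described in 3e391bdb2d can be sketched as follows. This is a minimal illustration and not LibRegex's actual code: the `Token`, `Lexer`, `Parser`, and `Snapshot` types are hypothetical and only model the idea of saving the current token together with the lexer index, so that an escape such as `\[` is never re-lexed from its middle.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical, simplified types; these are not LibRegex's real classes.
enum class TokenKind { Char, Escape, LeftBracket, RightBracket, Eof };

struct Token {
    TokenKind kind;
    std::string_view text; // slice of the pattern covered by this token
};

class Lexer {
public:
    explicit Lexer(std::string_view pattern) : m_pattern(pattern) {}

    Token next()
    {
        if (m_index >= m_pattern.size())
            return { TokenKind::Eof, {} };
        char c = m_pattern[m_index];
        // A backslash plus the following character forms one two-character token.
        if (c == '\\' && m_index + 1 < m_pattern.size()) {
            Token token { TokenKind::Escape, m_pattern.substr(m_index, 2) };
            m_index += 2;
            return token;
        }
        TokenKind kind = TokenKind::Char;
        if (c == '[')
            kind = TokenKind::LeftBracket;
        else if (c == ']')
            kind = TokenKind::RightBracket;
        Token token { kind, m_pattern.substr(m_index, 1) };
        m_index += 1;
        return token;
    }

    std::size_t index() const { return m_index; }
    void set_index(std::size_t index)
    {
        assert(index <= m_pattern.size());
        m_index = index;
    }

private:
    std::string_view m_pattern;
    std::size_t m_index { 0 };
};

class Parser {
public:
    // Saved state: the token itself plus the lexer index just past it.
    struct Snapshot {
        Token token;
        std::size_t lexer_index;
    };

    explicit Parser(std::string_view pattern) : m_lexer(pattern), m_current(m_lexer.next()) {}

    Snapshot save() const { return { m_current, m_lexer.index() }; }
    void restore(Snapshot const& snapshot)
    {
        m_current = snapshot.token;
        m_lexer.set_index(snapshot.lexer_index);
    }

    Token const& current() const { return m_current; }
    void consume() { m_current = m_lexer.next(); }

private:
    Lexer m_lexer;
    Token m_current;
};
```

Because the snapshot carries the token object, restoring never asks the lexer to re-tokenize text that was already consumed as part of a multi-character token.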
aplefull
ff06a4a9e5 LibRegex: Fix negated class validation for nested string properties
We were incorrectly checking for a negated character class when string
properties appeared in nested classes. Now we track negation state in
the parser and correctly reject invalid string properties in negated
classes.
2025-12-23 11:04:16 +01:00
aplefull
1b570fcd61 LibRegex: Correct negated character class escapes behavior
Patterns like /[^\S]/ should match whitespace characters, but previously
would fail to match. The position would advance twice: once during the
character class comparison, and again at the end when temporary_inverse
was reset. This caused matches to be skipped incorrectly.

Now we advance at the end only if the position hasn't already changed during
the loop.
2025-12-23 11:04:16 +01:00
aplefull
52a3c19c0a LibRegex: Clamp large quantifier values instead of rejecting them
Fixes parsing of regex quantifiers with extremely large numeric values.
Previously, very large quantifiers would fail to parse, but Chrome and
Firefox both clamp such large values to 2^31-1 instead of rejecting
them. So now we do the same.
2025-12-23 11:04:16 +01:00
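The clamping behavior described in 52a3c19c0a can be illustrated with a small sketch. The helper name `parse_clamped_quantifier` is hypothetical and this is not the parser's actual code; it only shows saturating at 2^31-1 instead of failing on overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper, not LibRegex's actual parser code.
// Parses a run of decimal digits and saturates at 2^31 - 1 instead of
// failing on overflow. Returns std::nullopt only if the input is empty
// or contains a non-digit character.
std::optional<std::uint32_t> parse_clamped_quantifier(std::string_view digits)
{
    constexpr std::uint64_t max_value = std::numeric_limits<std::int32_t>::max(); // 2^31 - 1
    if (digits.empty())
        return std::nullopt;

    std::uint64_t value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9')
            return std::nullopt;
        // value is at most 2^31 - 1 here, so this cannot overflow 64 bits.
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
        if (value > max_value)
            value = max_value; // saturate, but keep validating the remaining digits
    }
    assert(value <= max_value);
    return static_cast<std::uint32_t>(value);
}
```

With this approach, a quantifier such as `{99999999999999999999}` yields 2147483647 rather than a parse error.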
Ali Mohammad Pur
57ef949b61 LibRegex: Account for nested 'or' compare ops
Closes #6647.
2025-11-01 17:49:57 +01:00
aplefull
8c9c2ee289 LibRegex: Track local compares in nested classes
2025-11-01 14:38:08 +01:00
aplefull
c4eef822de LibRegex: Fix backreferences to undefined capture groups
Fixes handling of backreferences when the referenced capture group is
undefined or hasn't participated in the match.
CharacterCompareType::NamedReference is added to distinguish numbered
(\1) from named (\k<name>) backreferences. Numbered backreferences use
exact group lookup. Named backreferences search for participating
groups among duplicates.
2025-10-16 16:37:54 +02:00
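The two lookup strategies described in c4eef822de can be sketched as below. The `Capture` struct and both function names are hypothetical, and the sketch assumes the ECMAScript rule that a backreference to a group which has not participated in the match resolves to the empty string.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical capture record; not LibRegex's real data structure.
struct Capture {
    std::string name;                 // empty for unnamed groups
    std::optional<std::string> value; // std::nullopt if the group did not participate
};

// Numbered backreference (\1, \2, ...): exact, 1-based lookup.
// A group that did not participate resolves to the empty string.
std::string resolve_numbered(std::vector<Capture> const& groups, std::size_t number)
{
    assert(number >= 1 && number <= groups.size());
    return groups[number - 1].value.value_or("");
}

// Named backreference (\k<name>): several groups may share a name, so
// search for the one that actually participated in the match.
std::string resolve_named(std::vector<Capture> const& groups, std::string_view name)
{
    for (auto const& group : groups) {
        if (std::string_view(group.name) == name && group.value.has_value())
            return *group.value;
    }
    return {}; // no participating group with that name
}
```

In either case, an empty result means the backreference matches the empty string rather than causing the match to fail.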
Callum Law
8ada4b7fdc LibRegex: Account for opcode size when calculating incoming jump edges
Not accounting for opcode size when calculating incoming jump edges
meant that we were merging nodes where we otherwise shouldn't have been,
for example /.*a|.*b/.
2025-07-28 17:06:58 +02:00
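To illustrate why opcode size matters for the edge calculation in 8ada4b7fdc, here is a minimal sketch. It assumes a bytecode layout in which a jump's offset is measured from the end of the jump instruction; the `Jump` struct and both function names are hypothetical and not taken from the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical jump record; the layout is an assumption for illustration.
struct Jump {
    std::size_t position;  // index of the jump instruction in the bytecode
    std::size_t size;      // number of bytecode slots the instruction occupies
    std::ptrdiff_t offset; // relative offset, measured from the end of the instruction
};

// The target is relative to the end of the instruction, so the opcode
// size must be added; omitting it yields a target that is off by `size`.
std::size_t jump_target(Jump const& jump)
{
    auto target = static_cast<std::ptrdiff_t>(jump.position + jump.size) + jump.offset;
    assert(target >= 0);
    return static_cast<std::size_t>(target);
}

// Counts how many jumps land on each bytecode position.
std::map<std::size_t, std::size_t> count_incoming_edges(std::vector<Jump> const& jumps)
{
    std::map<std::size_t, std::size_t> counts;
    for (auto const& jump : jumps)
        ++counts[jump_target(jump)];
    return counts;
}
```

If the size term is dropped, edges are attributed to the wrong positions, and an optimizer relying on those counts may treat a node as having no other incoming jumps and merge it incorrectly.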
Ali Mohammad Pur
c7ad6cd508 LibRegex: Use code unit length in more places that apply
Finishes what 7f6b70fafb started.
Having one part use length and another use code unit length led to crashes;
the added test ensures we don't mess that up again.
2025-07-24 23:09:01 +02:00
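The mismatch behind c7ad6cd508 can be illustrated with a short sketch. It assumes that "length" refers to a count that can differ from the number of UTF-16 code units, such as the number of code points; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts code points in a UTF-16 string; a valid surrogate pair counts once.
std::size_t code_point_length(std::u16string_view text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        bool is_high = text[i] >= 0xD800 && text[i] <= 0xDBFF;
        bool next_is_low = i + 1 < text.size() && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (is_high && next_is_low)
            ++i; // skip the low surrogate of the pair
        ++count;
    }
    return count;
}

// Indexing is done in code units, so bounds must use the code unit
// length (text.size()), never the code point count.
char16_t code_unit_at(std::u16string_view text, std::size_t index)
{
    assert(index < text.size());
    return text[index];
}
```

For a string such as `u"a\U0001F600b"`, the code unit length is 4 while the code point count is 3, so a bound computed with one measure and an index computed with the other disagree as soon as a surrogate pair appears.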
aplefull
e2f8f5a350 LibRegex: Fix capture groups in quantified alternations
This prevents empty matches from overwriting non-empty captures in
quantified alternations. Fixes patterns like (a|a?)+ where the optional
branch would incorrectly overwrite meaningful captures with empty
strings.
2025-07-24 10:40:16 +02:00
Timothy Flynn
2dfcc4c307 LibRegex: Compare code units (not code points) in non-Unicode char range
2025-07-21 23:44:18 +02:00
Timothy Flynn
9582895759 AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String
2025-07-18 12:45:38 -04:00
Ali Mohammad Pur
5b45223d5f LibRegex: Account for uppercase characters in insensitive patterns
2025-07-12 11:26:23 +02:00
Shannon Booth
bd6581fe22 LibRegex: Correctly use ClassSetReservedPunctuator in ClassSetCharacter
We had mistakenly used ClassSetReservedDoublePunctuator, which
resulted in a parse error for the regex:

([^\\:]+?)

With the 'v' flag set.

Co-Authored-By: Ali Mohammad Pur <mpfard@serenityos.org>
2025-07-10 11:41:02 +02:00
aplefull
486602e796 LibRegex: Fix handling of + quantifier with zero-width matches
Small change that allows quantifiers using Fork* forms (e.g., +) to
succeed after one match, even if that match has zero width.
2025-06-02 15:52:26 +02:00
Ali Mohammad Pur
cfc241f61d LibRegex: Make the trie rewrite optimisation maintain the alt order
This is required by the spec.
2025-05-21 14:28:45 +02:00
ayeteadoe
11bca38f91 CMake: Build LibRegex tests in Tests/LibRegex not Meta/Lagom
As LibRegex was not specified in TEST_DIRECTORIES, the existing
Tests/LibRegex subdirectory was not actually included during
configuration. Also, the RegexLibC test has not been needed
since the migration away from Serenity's LibC, so
that test has been fully removed. I also renamed the
Regex.cpp test to TestRegex.cpp to match the naming
convention of most test targets.
2025-05-14 02:05:12 -06:00