4 Commits

Author SHA1 Message Date
Andreas Kling
e0de4ef33e LibRegex: Reject negated /v classes that contain strings
Negated unicode-set classes are only valid when every member is a
single code point. We already rejected direct string-valued members
such as `\q{ab}` and `\p{RGI_Emoji_Flag_Sequence}` inside `[^...]`,
but nested class-set operands could still smuggle them through, so
patterns like `[^[[\p{Emoji_Keycap_Sequence}]]]` and the reported
fuzzed literal compiled instead of throwing.

Validate nested class-set expressions after parsing and reject only the
negated `/v` classes whose resulting set of multi-code-point strings is
still non-empty. Track the exact string members contributed by string
literals, string properties, and nested classes so intersections and
subtractions can eliminate them before the negated-class check runs.
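
A minimal C++ sketch of the bookkeeping described above, assuming a simplified model rather than LibRegex's real data structures: `ClassSetValue`, `add_string`, `class_union`, `class_intersection`, `class_subtraction`, and `negated_class_is_valid` are hypothetical names. Each operand keeps its single code points and its string members apart. In this model, subtracting `\q{ab}` from `\q{ab|c}` leaves no strings, so a negated class around the result passes the check, while a nested class that still contains `ab` fails it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical model of an evaluated /v class-set operand. Single code points
// and string members are kept apart so set operations can remove strings.
struct ClassSetValue {
    std::set<char32_t> code_points;
    std::set<std::u32string> strings; // members that are not exactly one code point

    // Records one \q{...} alternative (or one string from a string property).
    void add_string(std::u32string const& s) {
        if (s.size() == 1)
            code_points.insert(s[0]);
        else
            strings.insert(s);
    }
};

template<typename T>
std::set<T> merged(std::set<T> a, std::set<T> const& b) {
    a.insert(b.begin(), b.end());
    return a;
}

template<typename T>
std::set<T> intersected(std::set<T> const& a, std::set<T> const& b) {
    std::set<T> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out;
}

template<typename T>
std::set<T> subtracted(std::set<T> const& a, std::set<T> const& b) {
    std::set<T> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out;
}

// Nested class or implicit union [A B]: strings from either side survive.
ClassSetValue class_union(ClassSetValue const& a, ClassSetValue const& b) {
    return { merged(a.code_points, b.code_points), merged(a.strings, b.strings) };
}

// [A&&B]: a string survives only if both operands contain it.
ClassSetValue class_intersection(ClassSetValue const& a, ClassSetValue const& b) {
    return { intersected(a.code_points, b.code_points), intersected(a.strings, b.strings) };
}

// [A--B]: any string listed in B is removed from A.
ClassSetValue class_subtraction(ClassSetValue const& a, ClassSetValue const& b) {
    return { subtracted(a.code_points, b.code_points), subtracted(a.strings, b.strings) };
}

// [^...] under /v is accepted only once no string members remain.
bool negated_class_is_valid(ClassSetValue const& contents) {
    return contents.strings.empty();
}
```

The check runs on the final evaluated value, so it only sees strings that survived every nested operation.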

Add constructor and literal coverage for the reduced nested-string
cases, the original regression, and valid negated set operations that
remove every string member.
2026-03-31 15:59:04 +02:00
Andreas Kling
c12647fc37 LibRegex: Clamp braced quantifier bounds to 2^31 - 1
Browsers clamp braced quantifier bounds above 2^31 - 1 before
checking whether {min,max} is in order. The parser still kept values
up to u32::MAX, so patterns like {2147483648,2147483647} were
rejected even though both bounds should collapse to the same limit.

Clamp parsed braced quantifier bounds to 2^31 - 1 as they are read.
This keeps the existing acceptance of huge exact and open-ended
quantifiers and makes the constructor and regex literal paths agree
with other engines on the out-of-order edge cases.
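
A small sketch of clamping while the digits are read, assuming the braced body has already been validated as digits with an optional comma. `parse_clamped_bound`, `parse_braced_quantifier`, and `QuantifierBounds` are hypothetical names, not LibRegex's parser. Saturating at 2^31 - 1 after every digit makes both bounds of `{2147483648,2147483647}` collapse to 2147483647, so the order check passes, and arbitrarily long digit runs cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Largest bound a braced quantifier keeps: 2^31 - 1.
constexpr std::uint32_t max_quantifier_bound = 0x7FFFFFFFu;

// Parses a run of ASCII digits, saturating at max_quantifier_bound as each
// digit is read (the 64-bit accumulator cannot overflow before clamping).
std::uint32_t parse_clamped_bound(std::string_view digits) {
    std::uint64_t value = 0;
    for (char c : digits) {
        assert(c >= '0' && c <= '9');
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
        if (value > max_quantifier_bound)
            value = max_quantifier_bound;
    }
    return static_cast<std::uint32_t>(value);
}

struct QuantifierBounds {
    std::uint32_t min;
    std::optional<std::uint32_t> max; // nullopt for open-ended {min,}
};

// Parses the inside of {min}, {min,} or {min,max}. Returns nullopt when the
// clamped bounds are out of order, which a real parser would report as an error.
std::optional<QuantifierBounds> parse_braced_quantifier(std::string_view body) {
    auto comma = body.find(',');
    if (comma == std::string_view::npos) {
        auto exact = parse_clamped_bound(body);
        return QuantifierBounds { exact, exact };
    }
    auto min = parse_clamped_bound(body.substr(0, comma));
    auto max_digits = body.substr(comma + 1);
    if (max_digits.empty())
        return QuantifierBounds { min, std::nullopt };
    auto max = parse_clamped_bound(max_digits);
    if (min > max)
        return std::nullopt; // order is checked only after both bounds are clamped
    return QuantifierBounds { min, max };
}
```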

The RegExp runtime and syntax tests now cover accepted huge
quantifiers, clamped order validation, and huge literal forms. The
reported constructor and literal cases also match other engines.
2026-03-31 15:59:04 +02:00
Andreas Kling
50b137f527 LibJS: Reject mixed surrogate forms in RegExp names
Reject surrogate pairs in named group names unless both halves come
from the same raw form. A literal surrogate half was being
normalized into \uXXXX before LibRegex parsed the pattern, which let
mixed literal and escaped forms sneak through.
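
A sketch of checking surrogate forms on the raw UTF-16 group name before any normalization. It assumes only literal code units and fixed-width `\uXXXX` escapes appear (braced `\u{...}` escapes and identifier-character checks are left out), and `decode_group_name` and `has_mixed_surrogate_pair` are hypothetical helpers. Because each unit remembers how it was written, a literal lead surrogate followed by an escaped trail surrogate (or the reverse) is still detectable, which rewriting everything to `\uXXXX` first would hide.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

enum class UnitForm { Literal, Escaped };

struct NameUnit {
    char16_t unit;
    UnitForm form; // how this code unit appeared in the raw pattern text
};

bool is_lead_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool is_trail_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

std::optional<unsigned> hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return static_cast<unsigned>(c - u'0');
    if (c >= u'a' && c <= u'f') return static_cast<unsigned>(c - u'a' + 10);
    if (c >= u'A' && c <= u'F') return static_cast<unsigned>(c - u'A' + 10);
    return std::nullopt;
}

// Splits a raw group name into UTF-16 units, remembering whether each was
// written literally or as \uXXXX. Returns nullopt on any other escape form.
std::optional<std::vector<NameUnit>> decode_group_name(std::u16string_view raw) {
    std::vector<NameUnit> units;
    for (std::size_t i = 0; i < raw.size();) {
        if (raw[i] != u'\\') {
            units.push_back({ raw[i], UnitForm::Literal });
            ++i;
            continue;
        }
        if (i + 6 > raw.size() || raw[i + 1] != u'u')
            return std::nullopt;
        unsigned value = 0;
        for (std::size_t j = i + 2; j < i + 6; ++j) {
            auto digit = hex_value(raw[j]);
            if (!digit)
                return std::nullopt;
            value = value * 16 + *digit;
        }
        units.push_back({ static_cast<char16_t>(value), UnitForm::Escaped });
        i += 6;
    }
    return units;
}

// A surrogate pair is acceptable only when both halves share the same raw form.
bool has_mixed_surrogate_pair(std::vector<NameUnit> const& units) {
    for (std::size_t i = 0; i + 1 < units.size(); ++i) {
        if (is_lead_surrogate(units[i].unit) && is_trail_surrogate(units[i + 1].unit)
            && units[i].form != units[i + 1].form)
            return true;
    }
    return false;
}
```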

Validate surrogate handling on the UTF-16 pattern before
normalization, but only treat \k<...> as a named backreference when
the parser would do that too. Legacy regexes without named groups
still use \k as an identity escape, so their literal text must not be
rejected by the pre-scan.
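
A rough sketch of the `\k` decision, assuming the legacy rule described above: outside Unicode mode (the `u` or `v` flag), `\k<...>` only counts as a named backreference when the pattern declares at least one named group. `pattern_has_named_groups` and `k_escape_is_named_backreference` are hypothetical names. The scan skips escapes and flat character classes and ignores the lookbehinds `(?<=` and `(?<!`. It does not model `/v` nested classes, which is harmless here because `/v` patterns are always in Unicode mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Looks for a named capture group "(?<name>" while skipping escapes and
// character classes, so text like \(?<a> or [(?<a>] is not counted.
bool pattern_has_named_groups(std::u16string_view pattern) {
    bool in_class = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char16_t c = pattern[i];
        if (c == u'\\') {
            ++i; // skip the escaped character
            continue;
        }
        if (in_class) {
            if (c == u']')
                in_class = false;
            continue;
        }
        if (c == u'[') {
            in_class = true;
            continue;
        }
        // "(?<" starts a named group unless it is a lookbehind "(?<=" or "(?<!".
        if (c == u'(' && i + 3 < pattern.size() && pattern[i + 1] == u'?' && pattern[i + 2] == u'<'
            && pattern[i + 3] != u'=' && pattern[i + 3] != u'!')
            return true;
    }
    return false;
}

// Only when this returns true should the pre-scan validate the name after \k.
bool k_escape_is_named_backreference(std::u16string_view pattern, bool unicode_mode) {
    return unicode_mode || pattern_has_named_groups(pattern);
}
```

Keeping this predicate identical to the parser's own rule is what stops the pre-scan from rejecting legacy `\k` literals the parser would accept.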

Add runtime and syntax tests for the mixed forms, the valid literal,
fixed-width, and braced escape cases, and the legacy \k literals.
2026-03-31 15:59:04 +02:00
Andreas Kling
1f4700a0c8 LibJS: Add tests for regex literal parse-time error reporting
Cover the paths exercised by deferred regex compilation:
- Invalid patterns in literals, eval(), and new Function()
- Duplicate and invalid flags in literals
- Valid literals with various flag combinations
- Multiple literals in the same scope
- Literals inside regular and arrow functions
2026-03-06 13:06:05 +01:00