ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-26 01:35:08 +02:00

Author	SHA1	Message	Date
aplefull	6f1b7c8d50	LibRegex: Track optional capture groups for match_length_minimum Backreferences can match the empty string when the referenced group didn't participate in the match, so we shouldn't add their length to the match_length_minimum, as it makes us skip valid matches.	2026-02-26 13:50:11 +01:00
aplefull	53a98f26d4	LibRegex: Exclude lookahead assertions from match_length_minimum Lookaheads are zero-width assertions and should not affect the minimum match length.	2026-02-26 13:50:11 +01:00
aplefull	2ac99312b0	LibRegex: Restore parser state for incomplete \x and \u escapes In non-Unicode mode, incomplete escape sequences like `\x0` or `\u00` should be parsed as literal characters. `read_digits_as_string` consumed hex digits but did not restore the parser position when fewer digits than required were found, and `consume_escaped_code_point` did not update `current_token` after falling back to literal 'u'.	2026-02-26 13:50:11 +01:00
aplefull	96d6dba37c	LibRegex: Backtrack 2 characters for legacy octal escapes When `\0` is followed by digits, we backtrack to parse it as a legacy octal escape. We need to backtrack 2 characters, so `parse_legacy_octal_escape` sees the leading `0` and can parse sequences correctly.	2026-02-26 13:50:11 +01:00
aplefull	e4572aa9d7	LibRegex: Add support for regex modifiers This commit implements the regexp-modifiers proposal. It allows modifying the i, m, and s flags within groups using the `(?flags:subpattern)` and `(?flags-flags:subpattern)` syntax.	2026-01-16 15:00:00 +01:00
mikiubo	535d2476a7	LibRegex: Implement proper lookbehind via new StepBack opcodes This introduces a new mechanism for evaluating lookbehind assertions by adding four new bytecode opcodes: SetStepBack, IncStepBack, CheckStepBack, and CheckSavedPosition. These opcodes replace the previous GoBack-based approach and enable correct handling of variable-length lookbehind patterns, where the match length cannot be known statically. Track lookbehind greediness in the parser and propagate it to bytecode generation. Allow controlled backtracking in lookbehind bodies while avoiding incorrect captures during step-back execution. Partially fix issue: #3459	2026-01-11 23:24:49 +01:00
Jelle Raaijmakers	ae20ecf857	AK+Everywhere: Add Vector::contains(predicate) and use it No functional changes.	2026-01-08 15:27:30 +00:00
Ali Mohammad Pur	637d47ba30	LibRegex: Add an optimisation for replacing /.*x/ with a seek op This will avoid some catastrophic backtracking by just skipping to 'x'.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	e2c6918cdb	LibRegex: Fuse consecutive single-char Compares into a String Compare This avoids huge instruction decoding and dispatch overhead, ~40x performance improvement for /(^|x)ppp/.	2026-01-05 18:22:11 +01:00
aplefull	3e391bdb2d	LibRegex: Use token-state restoration in character class parsing Previously, we used restoration based on character position in parser. This caused the lexer to re-tokenize from the middle of multi-character tokens like escape sequences, and led to incorrect parse failures for patterns like `[\[\]]`. We would backtrack to before the first `\[` token, then re-lex the `[` as a separate token instead of part of the `\[` escape. Now we save and restore the actual token object along with the lexer index, so we keep correct token state when backtracking.	2025-12-23 11:04:16 +01:00
aplefull	ff06a4a9e5	LibRegex: Fix negated class validation for nested string properties We were incorrectly checking for negated character class when string properties appeared in nested classes. Now we track negation state in the parser and correctly reject invalid string properties in negated classes.	2025-12-23 11:04:16 +01:00
aplefull	52a3c19c0a	LibRegex: Clamp large quantifier values instead of rejecting them Fixes parsing of regex quantifiers with extremely large numeric values. Previously, very large quantifiers would fail to parse, but Chrome and Firefox both clamp such large values to 2^31-1 instead of rejecting them. So now we do the same.	2025-12-23 11:04:16 +01:00
aplefull	eed4dd3745	LibRegex: Add support for string literals in character classes	2025-11-26 11:34:38 +01:00
aplefull	a49c39de32	LibRegex: Support matching unicode multi-character sequences	2025-11-26 11:34:38 +01:00
aplefull	8c9c2ee289	LibRegex: Track local compares in nested classes	2025-11-01 14:38:08 +01:00
aplefull	7ce4abe330	LibRegex+LibUnicode: Add unicode string properties	2025-10-24 13:24:55 -04:00
aplefull	4b989b8efd	LibRegex: Add support for forward references to named capture groups This commit implements support for forward references to named capture groups. We now allow patterns like \k<name>(?<name>x) and self-references like (?<name>\k<name>x).	2025-10-16 16:37:54 +02:00
aplefull	25a47ceb1b	LibRegex+LibJS: Include all named capture groups in source order Previously, named capture groups in RegExp results did not always follow their source order, and unmatched groups were omitted. According to the spec, all named capture groups must appear in the result object in the order they are defined, even if they did not participate in the match. This commit makes sure we follow this requirement.	2025-10-16 16:37:54 +02:00
aplefull	c4eef822de	LibRegex: Fix backreferences to undefined capture groups Fixes handling of backreferences when the referenced capture group is undefined or hasn't participated in the match. CharacterCompareType::NamedReference is added to distinguish numbered (\1) from named (\k<name>) backreferences. Numbered backreferences use exact group lookup. Named backreferences search for participating groups among duplicates.	2025-10-16 16:37:54 +02:00
Jelle Raaijmakers	3db7d802db	LibRegex: Early return in `Parser::try_skip()` No functional changes.	2025-07-22 09:10:32 -04:00
Shannon Booth	bd6581fe22	LibRegex: Correctly use ClassSetReservedPunctuator in ClassSetCharacter We had typo'd using ClassSetReservedDoublePunctuator which was resulting in a parse error for the regex: ([^\\:]+?) With the 'v' flag set. Co-Authored-By: Ali Mohammad Pur <mpfard@serenityos.org>	2025-07-10 11:41:02 +02:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
Timothy Flynn	7280ed6312	Meta: Enforce newlines around namespaces This has come up several times during code review, so let's just enforce it using a new clang-format 20 option.	2025-05-14 02:01:59 -06:00
Ali Mohammad Pur	76f5dce3db	LibRegex: Flatten capture group list in MatchState This makes copying the capture group COWVector significantly cheaper, as we no longer have to run any constructors for it - just memcpy.	2025-04-18 17:09:27 +02:00
Ali Mohammad Pur	4136d8d13e	LibRegex: Use an interned string table for capture group names This avoids messing around with unsafe string pointers and removes the only non-FlyString-able user of DeprecatedFlyString.	2025-04-02 11:43:13 +02:00
Andreas Kling	e5db913b0d	Revert "LibRegex: Port remaining DeprecatedFlyString to ByteString" This reverts commit `aab3fbe254`. Greatly regressed JavaScript benchmark performance.	2025-04-01 15:40:38 +02:00
Kenneth Myhra	aab3fbe254	LibRegex: Port remaining DeprecatedFlyString to ByteString	2025-04-01 12:50:00 +02:00
Andreas Kling	46a5710238	LibJS: Use FlyString in PropertyKey instead of DeprecatedFlyString This required dealing with substantial fallout.	2025-03-24 22:27:17 +00:00
aplefull	389a63d6bf	LibRegex: Allow duplicate named capture groups in separate alternatives	2025-03-05 14:36:09 +01:00
Pavel Shliak	811d5a5c3e	LibRegex: Remove duplicated condition	2024-12-22 12:33:41 +01:00
Marc Jessome	efcaf991e6	LibRegex: Ensure nested capture groups have non-conflicting names Take record of the named capture group prior to parsing the group's body. This requires removal of the recorded minimum length of the named capture group directly, and now needs to be looked up via the group minimum lengths table.	2024-11-24 10:26:09 +01:00
Pavel Shliak	cdb54fe504	LibRegex: Clean up #include directives This change aims to improve the speed of incremental builds.	2024-11-21 14:08:33 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00
Luke	0f66589007	Everywhere: Fix more typos	2020-12-31 01:47:41 +01:00
AnotherTest	ee1d9217aa	LibRegex: Fail early if eof seen after '(?' Fixes #4583.	2020-12-28 13:09:22 +01:00
AnotherTest	765d2977bc	LibRegex: Add basic support for unicode escapes in ECMA262Parser This parses unicode escapes (and matches them only for utf8 strings).	2020-12-06 15:38:40 +01:00
AnotherTest	86811683b0	LibRegex: Remove Lexer::slice_back() and just use StringViews	2020-12-06 15:38:40 +01:00
AnotherTest	19bf7734a4	LibRegex: Store 'String' matches inside the bytecode Also removes an unnecessary 'length' argument (StringView has a length!)	2020-12-06 15:38:40 +01:00
AnotherTest	6394720c87	LibRegex: Don't try to consume the escaped character if at EOF Fixes assert on e.g. `new RegExp("\\")`	2020-11-30 17:45:05 +01:00
Linus Groh	eea7cabdbc	LibRegex: Use match_ordinary_characters() in ECMA262Parser::parse_atom() Otherwise we would only match TokenType::Char, making all of these invalid: - /foo,bar/ - /foo\/bar/ - /foo=bar/ - /foo-bar/ - /foo:bar/ Fixes #4243.	2020-11-29 20:35:52 +01:00
AnotherTest	158fe9d9ca	LibRegex: Allow syntax characters (except ']') without escapes in classes e.g. `[:]`	2020-11-29 20:32:10 +01:00
Linus Groh	cbe4595ec2	LibRegex: Fix clang build errors	2020-11-29 09:29:26 +01:00
AnotherTest	491e4a8a3b	LibRegex: Allow '-' as the last element of a charclass Fixes #4189.	2020-11-28 10:13:33 +01:00
AnotherTest	e2fa1b40c4	LibRegex: Allow unknown escapes in non-unicode mode (for ECMA262) This makes regexps like `/\x/` work as normal. Partially deals with #4189.	2020-11-28 10:13:33 +01:00
AnotherTest	801750b95a	LibRegex: Fix parsing identity escape sequences Also fixes the propagation of default options (the previous implementation reset them to zero before parsing...). Partially deals with #4189.	2020-11-28 10:13:33 +01:00
AnotherTest	e83e7a03c2	LibRegex: Stop trying to read a character class when no tokens remain e.g. in "[". Fixes #4186.	2020-11-28 10:13:33 +01:00
AnotherTest	dbef2b1ee9	LibRegex: Implement an ECMA262-compatible parser This also adds support for lookarounds and individually-negated comparisons. The only unimplemented part of the parser spec is the unicode stuff.	2020-11-27 21:32:41 +01:00
AnotherTest	3db8ced4c7	LibRegex: Change bytecode value type to a 64-bit value To allow storing unicode ranges compactly; this is not utilised at the moment, but changing this later would've been significantly more difficult. Also fixes a few debug logs.	2020-11-27 21:32:41 +01:00
AnotherTest	92ea9ed4a5	LibRegex: Fix greedy/reluctant modifiers in PosixExtendedParser Also fixes the issue with assertions causing early termination when they fail.	2020-11-27 21:32:41 +01:00
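
Commit 52a3c19c0a above describes clamping very large quantifier values to 2^31-1 instead of rejecting them. The following is a minimal, self-contained sketch of that idea only. It is not the LibRegex implementation: `parse_clamped_quantifier` and `kMaxQuantifier` are hypothetical names introduced for illustration, and the handling of empty or non-digit input is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical constant: the clamp limit named in the commit message (2^31-1).
constexpr std::uint32_t kMaxQuantifier = 2147483647u;

// Hypothetical helper (not a LibRegex API): parses a run of decimal digits
// into a quantifier count. Values above kMaxQuantifier saturate at the limit
// instead of causing a parse failure. Empty or non-digit input yields nullopt.
std::optional<std::uint32_t> parse_clamped_quantifier(std::string_view digits)
{
    if (digits.empty())
        return std::nullopt;

    std::uint64_t value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9')
            return std::nullopt;

        value = value * 10 + static_cast<std::uint64_t>(c - '0');

        // Saturate right away. The accumulator never exceeds the limit
        // between iterations, so the 64-bit arithmetic above cannot overflow
        // no matter how many digits follow.
        if (value > kMaxQuantifier)
            value = kMaxQuantifier;
    }
    return static_cast<std::uint32_t>(value);
}
```

With this sketch, the digit string `2147483648` and an arbitrarily long run of `9`s both yield 2147483647, while `3` yields 3.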


52 Commits