ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-26 01:35:08 +02:00

Author	SHA1	Message	Date
Ben Wiederhake	7fb7025d69	LibRegex: Remove unused header in Regex	2026-02-23 12:15:23 +01:00
Jelle Raaijmakers	1745926fc6	AK+Everywhere: Use MurmurHash3 for int/u64 hashing Rework our hash functions a bit for significantly better performance: * Rename int_hash to u32_hash to mirror u64_hash. * Make pair_int_hash call u64_hash instead of multiple u32_hash()es. * Implement MurmurHash3's fmix32 and fmix64 for u32_hash and u64_hash. On my machine, this speeds up u32_hash by 20%, u64_hash by ~290%, and pair_int_hash by ~260%. We lose the property that an input of 0 results in something that is not 0. I've experimented with an offset to both hash functions, but it resulted in a measurable performance degradation for u64_hash. If there's a good use case for 0 not to result in 0, we can always add in that offset as a countermeasure in the future.	2026-02-20 22:47:24 +01:00
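The fmix32 and fmix64 finalizers named in this commit are MurmurHash3's published mixing steps. The sketch below reproduces them in standalone C++. The `pair_hash` helper, which packs two 32-bit keys into one 64-bit word and mixes once, is a hypothetical illustration of the `pair_int_hash` change and not AK's actual code; AK's real function names, signatures, and return types may differ.

```cpp
#include <cassert>
#include <cstdint>

// MurmurHash3's 32-bit finalizer (fmix32): alternating xor-shifts and
// multiplications by odd constants spread input bits across the output.
constexpr uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

// MurmurHash3's 64-bit finalizer (fmix64).
constexpr uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical pair hash: pack both keys into one 64-bit word and mix once,
// instead of hashing each half separately. Truncating to 32 bits is an assumption.
constexpr uint32_t pair_hash(uint32_t a, uint32_t b)
{
    uint64_t const packed = (static_cast<uint64_t>(a) << 32) | b;
    return static_cast<uint32_t>(fmix64(packed));
}

// Every step is a bijection that maps 0 to 0, which is why 0 now hashes to 0.
static_assert(fmix32(0) == 0, "fmix32 maps 0 to 0");
static_assert(fmix64(0) == 0, "fmix64 maps 0 to 0");
```

Because xor-shifts and multiplication by an odd constant are both invertible modulo 2^n, the finalizers never map two distinct inputs to the same output; the cost is the fixed point at 0 that the commit message mentions.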
aplefull	aeec2c804c	LibRegex: Implement Unicode case-insensitive matching Previously, case-insensitive regex matching used ASCII-only case conversion (to_ascii_lowercase) even for Unicode characters. Now we implement the Canonicalize abstract operation, so we can case-fold Unicode characters properly during case-insensitive matching.	2026-02-16 07:51:00 -05:00
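ECMA-262's Canonicalize(rer, ch) depends on the mode. With the u or v flag, it applies the simple or common case folding mapping from CaseFolding.txt. Without them, it uppercases ch and keeps ch unchanged when the result is not a single code unit, or when a non-ASCII character would map onto an ASCII one. The sketch below encodes those rules. The `simple_case_fold` and `to_uppercase` helpers are hypothetical stand-ins for real Unicode data: their tiny tables cover only ASCII letters, U+00DF, U+017F LATIN SMALL LETTER LONG S, and U+212A KELVIN SIGN.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CaseFolding.txt simple/common (S/C) mappings.
char32_t simple_case_fold(char32_t cp)
{
    if (cp >= U'A' && cp <= U'Z')
        return cp + (U'a' - U'A');
    if (cp == 0x017F) // LATIN SMALL LETTER LONG S folds to 's'
        return U's';
    if (cp == 0x212A) // KELVIN SIGN folds to 'k'
        return U'k';
    return cp;
}

// Hypothetical stand-in for full toUppercase, which may yield several code points.
std::vector<char32_t> to_uppercase(char32_t cp)
{
    if (cp >= U'a' && cp <= U'z')
        return { static_cast<char32_t>(cp - (U'a' - U'A')) };
    if (cp == 0x017F) // long s uppercases to ASCII 'S'
        return { U'S' };
    if (cp == 0x00DF) // sharp s uppercases to "SS"
        return { U'S', U'S' };
    return { cp };
}

// Sketch of Canonicalize(rer, ch).
char32_t canonicalize(char32_t ch, bool ignore_case, bool unicode_mode)
{
    if (unicode_mode && ignore_case)
        return simple_case_fold(ch); // u/v flags: simple case folding
    if (!ignore_case)
        return ch;
    auto const upper = to_uppercase(ch);
    if (upper.size() != 1 || upper[0] > 0xFFFF)
        return ch; // result is not exactly one UTF-16 code unit
    char32_t const cu = upper[0];
    if (ch >= 128 && cu < 128)
        return ch; // never map non-ASCII onto ASCII
    return cu;
}
```

Under these rules, /\u212A/iu matches "k" because both sides fold to 'k', while /\u212A/i does not, because 'k' canonicalizes to 'K' and the Kelvin sign stays as it is.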
Ali Mohammad Pur	01be1ed583	LibRegex: Mark OpCode_classes with REGEX_API	2026-02-07 14:09:56 +01:00
Ali Mohammad Pur	6aba31ba13	LibRegex: Add some FileCheck-like tests to ensure opts don't break	2026-02-07 14:09:56 +01:00
Ali Mohammad Pur	fedf0f78ca	LibRegex: Reject RSeekTo crossing the current-to-EOL boundary	2026-02-07 14:09:56 +01:00
Ali Mohammad Pur	f4d4bd9ed1	LibRegex: Ignore 'FailIfEmpty' in dot-star loop detection	2026-02-07 14:09:56 +01:00
mikiubo	5aaf08c7cf	LibRegex: Make RegexDebug resilient to empty state vectors Avoid crashing in RegexDebug when saved_positions or step_backs are empty. These cases are already handled correctly by the bytecode execution, but the debug output assumed non-empty vectors. Print a placeholder instead when no entries are present. This fixes #7502.	2026-01-21 14:20:08 +01:00
aplefull	e4572aa9d7	LibRegex: Add support for regex modifiers This commit implements the regexp-modifiers proposal. It allows the i, m, and s flags to be enabled or disabled within a group using `(?flags:subpattern)` and `(?flags-flags:subpattern)` syntax.	2026-01-16 15:00:00 +01:00
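The modifier text between `(?` and `:` is subject to a few early errors in the proposal: only i, m, and s are allowed, no flag may repeat, a flag cannot be both added and removed, and `(?-:` with both sides empty is a syntax error. The sketch below validates that text and applies the result to the enclosing flags. The `parse_modifiers`/`apply_modifiers` names and the bitmask encoding are hypothetical, not LibRegex's parser API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

constexpr unsigned IgnoreCase = 1, Multiline = 2, DotAll = 4;

struct Modifiers {
    unsigned add = 0;    // flags enabled inside the group
    unsigned remove = 0; // flags disabled inside the group
};

static std::optional<unsigned> flag_bit(char c)
{
    switch (c) {
    case 'i': return IgnoreCase;
    case 'm': return Multiline;
    case 's': return DotAll;
    default: return std::nullopt;
    }
}

// Parses the text between "(?" and ":" in (?add:...) or (?add-remove:...).
std::optional<Modifiers> parse_modifiers(std::string_view text)
{
    Modifiers result;
    bool seen_dash = false;
    for (char c : text) {
        if (c == '-') {
            if (seen_dash)
                return std::nullopt; // only one '-' is allowed
            seen_dash = true;
            continue;
        }
        auto bit = flag_bit(c);
        if (!bit || ((result.add | result.remove) & *bit))
            return std::nullopt; // unknown flag, or flag repeated on either side
        (seen_dash ? result.remove : result.add) |= *bit;
    }
    if (seen_dash && result.add == 0 && result.remove == 0)
        return std::nullopt; // "(?-:" is a syntax error
    return result;
}

// Flags in effect inside the group body.
unsigned apply_modifiers(unsigned outer_flags, Modifiers m)
{
    return (outer_flags | m.add) & ~m.remove;
}
```

An empty modifier string yields no changes, so `(?:...)` still behaves as a plain non-capturing group.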
aplefull	6ce312e22f	LibRegex: Prevent empty matches in optional quantifiers Step 2.b of the RepeatMatcher states that once minimum repetitions are satisfied, empty matches should not be considered for further repetitions. This was not being enforced for optional quantifiers like `?`, so we produced extra capture group matches.	2026-01-16 01:11:24 +01:00
mikiubo	535d2476a7	LibRegex: Implement proper lookbehind via new StepBack opcodes This introduces a new mechanism for evaluating lookbehind assertions by adding four new bytecode opcodes: SetStepBack, IncStepBack, CheckStepBack, and CheckSavedPosition. These opcodes replace the previous GoBack-based approach and enable correct handling of variable-length lookbehind patterns, where the match length cannot be known statically. Track lookbehind greediness in the parser and propagate it to bytecode generation. Allow controlled backtracking in lookbehind bodies while avoiding incorrect captures during step-back execution. Partially fixes issue #3459.	2026-01-11 23:24:49 +01:00
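Conceptually, a variable-length lookbehind `(?<=body)` holds at position p if some start s ≤ p exists such that the body matches exactly the text from s to p. The sketch below expresses that search directly: it saves the current position, steps back one character at a time, and requires the body's match to end at the saved position. It only loosely mirrors the save/step-back/check structure described above. `lookbehind_holds` and `dollar_amount` are hypothetical names, and the sketch ignores captures and the backward matching order the spec uses for greediness, so it is only meaningful for a plain yes/no assertion.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>

// Does the lookbehind body match the whole of the given text?
// (Stands in for running the body from a stepped-back start position.)
using BodyMatcher = std::function<bool(std::string_view)>;

bool lookbehind_holds(std::string_view input, size_t position, BodyMatcher const& body)
{
    assert(position <= input.size());
    size_t const saved_position = position; // where the assertion is evaluated
    for (size_t step_back = 0; step_back <= saved_position; ++step_back) {
        size_t const start = saved_position - step_back;
        // The body must consume exactly the text up to the saved position.
        if (body(input.substr(start, saved_position - start)))
            return true;
    }
    return false;
}

// Example body for (?<=\$\d+): a dollar sign followed by one or more digits.
bool dollar_amount(std::string_view text)
{
    if (text.size() < 2 || text[0] != '$')
        return false;
    for (char c : text.substr(1))
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
    return true;
}
```

Since the body's length is not fixed, a single fixed-size step back (the old GoBack idea) cannot answer the question; every candidate start has to be considered.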
Jelle Raaijmakers	ae20ecf857	AK+Everywhere: Add Vector::contains(predicate) and use it No functional changes.	2026-01-08 15:27:30 +00:00
Ali Mohammad Pur	2677338f43	LibRegex: Process RSeekTo candidates in the correct order	2026-01-07 00:14:02 +01:00
Ali Mohammad Pur	9668927dfc	LibRegex: Don't generate duplicate results for /.*/ patterns Since the code pattern may span multiple blocks, this can generate duplicate results; keep the last one to avoid corrupting the bytecode.	2026-01-06 19:09:27 +01:00
Ali Mohammad Pur	363f1f6568	LibRegex: Correctly calculate ForkIf target offset in tree alternatives	2026-01-06 19:09:27 +01:00
Ali Mohammad Pur	41ce1023b8	LibRegex: Add default initialisers to ParserResult to make gcc happy	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	fbd898fb54	LibRegex: Use nicer rewrite APIs where possible Co-Authored-By: Hendiadyoin1 <leon.a@serenityos.org>	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	c1535ef65b	LibRegex: Skip multi-op compare overhead when not necessary	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	637d47ba30	LibRegex: Add an optimisation for replacing /.*x/ with a seek op This will avoid some catastrophic backtracking by just skipping to 'x'.	2026-01-05 18:22:11 +01:00
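Without the s flag, `.` does not match line terminators, so a greedy `.*` followed by a literal `x` can only end just after some `x` on the current line, and greediness means the rightmost candidate is preferred. The sketch below shows that idea: find the end of the line, then hand candidate `x` positions, right to left, to the rest of the pattern. The `seek_dot_star_then` name and the continuation callback are hypothetical. LibRegex's actual seek opcode (RSeekTo, per later commits in this log) is not modeled, and only `\n` is treated as a line terminator here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Continuation: given the position just after the literal, match the rest of
// the pattern and return the final match end, if any.
using Continuation = std::function<std::optional<size_t>(size_t)>;

std::optional<size_t> seek_dot_star_then(std::string_view input, size_t start, char literal, Continuation const& rest)
{
    // `.` cannot cross a line terminator, so candidates lie in [start, eol).
    size_t eol = input.find('\n', start);
    if (eol == std::string_view::npos)
        eol = input.size();
    // Greedy `.*` prefers the longest prefix: try the rightmost literal first.
    for (size_t i = eol; i > start; --i) {
        if (input[i - 1] != literal)
            continue;
        if (auto end = rest(i))
            return end;
    }
    return std::nullopt; // no candidate let the rest of the pattern match
}
```

Trying candidates from right to left is what keeps this sketch consistent with ordinary backtracking; scanning left to right would return the shortest `.*` match instead.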
Ali Mohammad Pur	77d982d6fe	LibRegex: Restore the pure substring search optimisation for u16view `ca2f0141f6` removed only the execution side of this, which made it skip some optimisations for pure string searches. This commit implements it properly for utf16 strings instead.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	e2c6918cdb	LibRegex: Fuse consecutive single-char Compares into a String Compare This avoids huge instruction decoding and dispatch overhead, giving a ~40x performance improvement for /(^|x)ppp/.	2026-01-05 18:22:11 +01:00
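A peephole pass of this kind walks the instruction list and folds each run of adjacent single-character compares into one string compare, so the matcher decodes and dispatches one instruction instead of several. The sketch below uses a hypothetical three-kind instruction type; LibRegex's real OpCode_Compare encoding is not modeled, and a real pass must also avoid fusing across a jump target.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction: a one-char compare, a string compare, or anything else.
struct Instruction {
    enum class Kind { Char, String, Other } kind;
    std::string text; // the single char for Char, the literal for String
};

std::vector<Instruction> fuse_char_compares(std::vector<Instruction> const& program)
{
    using Kind = Instruction::Kind;
    std::vector<Instruction> result;
    for (auto const& insn : program) {
        bool const previous_is_literal = !result.empty()
            && (result.back().kind == Kind::Char || result.back().kind == Kind::String);
        if (insn.kind == Kind::Char && previous_is_literal) {
            // Extend the previous literal compare instead of emitting a new one.
            result.back().kind = Kind::String;
            result.back().text += insn.text;
            continue;
        }
        result.push_back(insn); // non-literal ops break the run
    }
    return result;
}
```

For the `ppp` tail of /(^|x)ppp/, three Char compares become one String compare of "ppp".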
Ali Mohammad Pur	9d49fafdbf	LibRegex: Add an optimisation to skip forks that cannot produce a match ...and implement it for 'start of line' checks. This makes patterns like /(^|x)ppp/ fork-free at runtime, ~30% perf improvement for that pattern.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	0acac7f02b	LibRegex: Split basic blocks at jump targets too	2026-01-05 18:22:11 +01:00
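The usual "leader" rule for splitting bytecode into basic blocks starts a new block at the first instruction, at every jump target, and at every instruction following a jump; this commit adds the jump-target part. The sketch below computes leaders for a hypothetical variable-width instruction stream. It assumes a jump's offset is relative to the end of the jump instruction, so the opcode size must be included when computing the target.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical instruction: encoded size and, for jumps, an offset measured
// from the end of the instruction.
struct Insn {
    size_t size;
    bool is_jump;
    long offset; // ignored unless is_jump
};

// Returns the byte offsets at which basic blocks start.
std::set<size_t> find_block_leaders(std::vector<Insn> const& code)
{
    size_t total = 0;
    for (auto const& insn : code)
        total += insn.size;

    std::set<size_t> leaders { 0 };
    size_t ip = 0;
    for (auto const& insn : code) {
        size_t const next = ip + insn.size;
        if (insn.is_jump) {
            // Relative to the next instruction, so the jump's own size is included.
            long const target = static_cast<long>(next) + insn.offset;
            if (target >= 0 && static_cast<size_t>(target) < total)
                leaders.insert(static_cast<size_t>(target)); // jump target starts a block
            if (next < total)
                leaders.insert(next); // so does the instruction after the jump
        }
        ip = next;
    }
    return leaders;
}
```

With a 3-byte jump at offset 2 and a relative offset of 2, the target is 7; computing it as `ip + offset` would give 4, which is in the middle of an instruction.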
Ali Mohammad Pur	3f35d84785	LibRegex+LibJS: Flatten the bytecode buffer before regex execution This makes it so we don't have to unnecessarily check for having a flattened buffer; significant performance increase.	2026-01-05 18:22:11 +01:00
aplefull	3e391bdb2d	LibRegex: Use token-state restoration in character class parsing Previously, we used restoration based on character position in parser. This caused the lexer to re-tokenize from the middle of multi-character tokens like escape sequences, and led to incorrect parse failures for patterns like `[\[\]]`. We would backtrack to before the first `\[` token, then re-lex the `[` as a separate token instead of part of the `\[` escape. Now we save and restore the actual token object along with the lexer index, so we keep correct token state when backtracking.	2025-12-23 11:04:16 +01:00
aplefull	ff06a4a9e5	LibRegex: Fix negated class validation for nested string properties We were incorrectly checking for negated character class when string properties appeared in nested classes. Now we track negation state in the parser and correctly reject invalid string properties in negated classes.	2025-12-23 11:04:16 +01:00
aplefull	f3a32a0b1a	LibRegex: Use code unit offset for starting range checks	2025-12-23 11:04:16 +01:00
aplefull	1b570fcd61	LibRegex: Correct negated character class escapes behavior Patterns like /[^\S]/ should match whitespace characters, but previously would fail to match. The position would advance twice: once during the character class comparison, and again at the end when temporary_inverse was reset. This caused matches to be skipped incorrectly. Now we advance at the end only if position hasn't already changed during the loop.	2025-12-23 11:04:16 +01:00
aplefull	52a3c19c0a	LibRegex: Clamp large quantifier values instead of rejecting them Fixes parsing of regex quantifiers with extremely large numeric values. Previously, very large quantifiers would fail to parse, but Chrome and Firefox both clamp such large values to 2^31-1 instead of rejecting them. So now we do the same.	2025-12-23 11:04:16 +01:00
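Clamping can be done while accumulating digits: once the running value exceeds 2^31-1, it is pinned at that bound, and the remaining digits are still consumed. The sketch below shows this for the digits of a quantifier bound such as the `5` in `a{5}`. `parse_quantifier_bound` is a hypothetical name, not LibRegex's parser function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Parses a quantifier bound, saturating at 2^31 - 1 instead of failing.
std::optional<uint32_t> parse_quantifier_bound(std::string_view digits)
{
    constexpr uint64_t max_bound = 0x7FFFFFFF; // 2^31 - 1
    if (digits.empty())
        return std::nullopt;
    uint64_t value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9')
            return std::nullopt; // not a number at all
        value = value * 10 + static_cast<uint64_t>(c - '0');
        if (value > max_bound)
            value = max_bound; // clamp, but keep consuming digits
    }
    return static_cast<uint32_t>(value);
}
```

Because the value is clamped after every digit, it never exceeds about 2.1 × 10^10 before the next clamp, so the 64-bit accumulator cannot overflow however many digits follow.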
Andreas Kling	7d7886afea	LibJS: Don't assume flattened bytecode when dumping OpCode_Compare Fixes #7129	2025-12-13 16:40:19 -06:00
Andreas Kling	f7ea47145c	LibRegex: Only call OpCode::size() once per matcher iteration	2025-12-13 13:51:12 -06:00
Andreas Kling	67b20017dc	LibRegex: Cache pointer to flattened bytecode data in OpCode_Compare This avoids repeatedly checking if the bytecode has been flattened (which is always the case by the time we're executing). 1.05x speedup on Octane/regexp.js	2025-12-13 13:51:12 -06:00
Andreas Kling	82fe962d96	LibJS: Don't rerun regexp optimizer every time a regexp literal is used	2025-12-12 11:43:35 -06:00
aplefull	934817d45e	LibRegex: Add missing StringSet cases	2025-11-27 14:02:04 +01:00
Tim Ledbetter	1abc91ccc6	LibRegex: Put debug mode code block behind a flag This block should be optimized out anyway, but putting the whole thing behind a flag makes the intent clearer.	2025-11-26 14:33:59 +00:00
Tim Ledbetter	4c491b8920	LibRegex: Remove unused code from `RegexStringView`	2025-11-26 14:33:59 +00:00
Tim Ledbetter	061b457bac	LibRegex: Use `unchecked_empend()` where possible	2025-11-26 14:33:59 +00:00
aplefull	eed4dd3745	LibRegex: Add support for string literals in character classes	2025-11-26 11:34:38 +01:00
aplefull	a49c39de32	LibRegex: Support matching unicode multi-character sequences	2025-11-26 11:34:38 +01:00
Ali Mohammad Pur	d5d37abfa5	AK+LibRegex: Only set node metadata on Trie::ensure_child if missing `a290034a81` passed an empty vector to this, which caused nodes that appeared multiple times to reset the trie metadata...which broke the optimisation. This patchset makes the function take a 'provide missing metadata' function instead, and only invokes it when the node is missing rather than unconditionally setting the metadata on all nodes.	2025-11-21 02:46:33 +01:00
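The fix replaces "always set this metadata" with "provide metadata only if it is missing". The sketch below shows that pattern with a hypothetical trie node: the provider callback runs only when the child is created, so visiting an existing node again leaves its accumulated metadata intact. This is not AK::Trie's actual interface; the names and storage are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical trie node with per-node metadata M, children keyed by K.
template<typename K, typename M>
struct TrieNode {
    M metadata {};
    std::map<K, std::unique_ptr<TrieNode>> children;

    // Returns the child for `key`, creating it if needed. The provider is only
    // invoked on creation, so existing metadata is never reset.
    TrieNode& ensure_child(K const& key, std::function<M()> const& provide_missing_metadata)
    {
        auto it = children.find(key);
        if (it != children.end())
            return *it->second;
        auto child = std::make_unique<TrieNode>();
        child->metadata = provide_missing_metadata();
        TrieNode& ref = *child;
        children.emplace(key, std::move(child));
        return ref;
    }
};
```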
Ali Mohammad Pur	a290034a81	LibRegex: Start alternation opt nodes with an empty vector ...instead of checking every time whether there's a vector there. Fixes #6755.	2025-11-08 11:51:27 +01:00
Ali Mohammad Pur	57ef949b61	LibRegex: Account for nested 'or' compare ops Closes #6647.	2025-11-01 17:49:57 +01:00
aplefull	8c9c2ee289	LibRegex: Track local compares in nested classes	2025-11-01 14:38:08 +01:00
aplefull	5632a52531	LibRegex: Properly track code units in u-v modes Previously, both string_position and view_index used code unit offsets regardless of mode. Now in unicode mode, these variables track code point positions while string_position_in_code_units is properly updated to reflect code unit offsets.	2025-10-24 21:23:06 +02:00
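In UTF-16, a code point outside the BMP occupies two code units (a surrogate pair), so in u/v mode a position counted in code points drifts away from the matching code-unit offset after each such character. The sketch below advances both counters together over a `std::u16string_view`. The `Position` struct and `advance` helper are hypothetical and only illustrate why the two indices need to be tracked separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct Position {
    size_t code_points = 0; // logical position in u/v mode
    size_t code_units = 0;  // offset into the UTF-16 storage
};

static bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Advances past one code point; a well-formed surrogate pair counts as one.
Position advance(std::u16string_view text, Position pos)
{
    assert(pos.code_units < text.size());
    size_t width = 1;
    if (is_high_surrogate(text[pos.code_units]) && pos.code_units + 1 < text.size()
        && is_low_surrogate(text[pos.code_units + 1]))
        width = 2; // one code point, two code units
    pos.code_units += width;
    pos.code_points += 1;
    return pos;
}
```

A lone surrogate is treated as a single code point of width one, matching how unpaired surrogates are handled when a UTF-16 string is interpreted as code points.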
aplefull	7ce4abe330	LibRegex+LibUnicode: Add unicode string properties	2025-10-24 13:24:55 -04:00
aplefull	4b989b8efd	LibRegex: Add support for forward references to named capture groups This commit implements support for forward references to named capture groups. We now allow patterns like \k<name>(?<name>x) and self-references like (?<name>\k<name>x).	2025-10-16 16:37:54 +02:00
aplefull	25a47ceb1b	LibRegex+LibJS: Include all named capture groups in source order Previously, named capture groups in RegExp results did not always follow their source order, and unmatched groups were omitted. According to the spec, all named capture groups must appear in the result object in the order they are defined, even if they did not participate in the match. This commit makes sure we follow this requirement.	2025-10-16 16:37:54 +02:00
aplefull	c4eef822de	LibRegex: Fix backreferences to undefined capture groups Fixes handling of backreferences when the referenced capture group is undefined or hasn't participated in the match. CharacterCompareType::NamedReference is added to distinguish numbered (\1) from named (\k<name>) backreferences. Numbered backreferences use exact group lookup. Named backreferences search for participating groups among duplicates.	2025-10-16 16:37:54 +02:00
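The two lookup rules can be sketched over a hypothetical capture table. A numbered reference reads exactly one group. A named reference scans all groups with that name for the one that participated, since duplicate names are only allowed in different alternatives. In ECMA-262, a backreference to a group that has not participated matches the empty string. The `Group` type and both function names are assumptions, not LibRegex's CharacterCompareType handling.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical capture table entry: group name (empty if unnamed) and its
// captured text, or nullopt if the group has not participated.
struct Group {
    std::string name;
    std::optional<std::string> value;
};

// \N: exact lookup; a non-participating group behaves like an empty match.
std::string numbered_reference(std::vector<Group> const& groups, size_t n)
{
    assert(n >= 1 && n <= groups.size());
    return groups[n - 1].value.value_or("");
}

// \k<name>: use whichever group with this name actually participated.
std::string named_reference(std::vector<Group> const& groups, std::string const& name)
{
    for (auto const& group : groups)
        if (group.name == name && group.value)
            return *group.value;
    return ""; // no participating group: matches the empty string
}
```

For /(?:(?<y>a)|(?<y>b))\k<y>/ against "bb", the first `y` group is unset and the second captured "b", so `\k<y>` must resolve to "b" rather than stopping at the first group with that name.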
Rocco Corsi	3d1d055e27	LibRegex: Export OpCode/OpCode_Compare for REGEX_DEBUG builds When building with REGEX_DEBUG or ENABLE_ALL_THE_DEBUG_MACROS, linking bin/TestRegex fails with two errors: - Libraries/LibRegex/RegexDebug.h:76: undefined reference to regex::OpCode_Compare::variable_arguments_to_byte_string(AK::Optional<regex::MatchInput const&>) const - Libraries/LibRegex/RegexByteCode.h:672: undefined reference to regex::OpCode::name(regex::OpCodeId) Add REGEX_API to regex::OpCode and regex::OpCode_Compare so that bin/TestRegex can access these classes.	2025-09-18 11:02:13 +02:00
Callum Law	8ada4b7fdc	LibRegex: Account for opcode size when calculating incoming jump edges Not accounting for opcode size when calculating incoming jump edges meant that we were merging nodes where we otherwise shouldn't have been, for example /.a|.b/.	2025-07-28 17:06:58 +02:00


160 Commits