ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-05-11 09:27:00 +02:00

Author	SHA1	Message	Date
Andreas Kling	73812e12d2	LibJS: Fast path Array.prototype.indexOf on packed arrays Skip the generic HasProperty and Get loop when indexOf operates on a simple packed array. In that case every index below length is an own data property, so a direct scan of the packed indexed property storage gives the same strict-equality result without the per-element property lookup ceremony. Only use the fast path when the current packed storage size still matches the length captured before fromIndex coercion, since that coercion can run user code and mutate the receiver. Add coverage for length and storage mutations during fromIndex coercion.	2026-04-27 08:39:37 +02:00
Andreas Kling	e65e85cb8c	LibJS: Materialize arguments object for shorthand `{ arguments }` The parser only set `might_need_arguments_object` when an `arguments` or `eval` Identifier went through `consume()`, but shorthand object properties create the reference via `make_identifier()` directly. As a result `function f() { return { arguments } }` allocated an `arguments` local, never initialized it, and crashed at runtime when the property was read. Fall back to scope-driven detection: if scope analysis allocated a non-lexical `arguments` local for the function, treat it as a real arguments-object reference and emit `CreateArguments`. Skip the fallback when a function declaration named `arguments` claims the local, since that local belongs to the function, not the arguments object. Add a runtime test covering shorthand inside a free function and a method, plus a regression test for `({ eval } = ...)` to confirm destructuring assignment doesn't accidentally trigger arguments materialization.	2026-04-27 08:04:11 +02:00
Andreas Kling	c1bc0cdfa9	LibJS: Allocate local variable indices in source order The scope collector stored identifier_groups and variables in HashMaps and then sorted them alphabetically before assigning local register indices. The sorts existed only because HashMap iteration order is non-deterministic; alphabetical was a stable choice for comparing bytecode against the now-removed C++ port. Switch both maps to indexmap::IndexMap so iteration follows the order of first reference (= source order), and drop the alphabetical sorts. Local indices now reflect declaration order, which matches what shows up in bytecode dumps and is easier to read alongside the source. Add a focused bytecode test using zebra/yak/aardvark to pin the new allocation order; existing tests using let/var declarations have their local indices renumbered to match.	2026-04-27 08:04:11 +02:00
Andreas Kling	010deec578	LibJS: Build functions_to_initialize in source order ECMAScript hoisting keeps the LAST function declaration with a given name. The Rust scope_collector and script GDI extraction implemented this with a single reverse scan that pushed first-seen entries, which left the resulting list in REVERSE source order. The C++ side then iterated `m_functions_to_initialize.in_reverse()` to undo that. Switch the Rust side to a two-pass forward scan that records the last position per name and emits entries in source order, and drop the matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp. Same hoisting semantics; NewFunction emission and global property iteration order now follow the source. The HashMap that tracks last positions is keyed on `SharedUtf16String`, so each insert is a refcount bump on the AST's existing Rc instead of a deep `Vec<u16>` clone. Add bytecode tests at script and nested-function scope that exercise multiple declarations and a duplicate name to pin the new ordering.	2026-04-27 08:04:11 +02:00
Andreas Kling	30394ece8d	LibJS: Use natural source positions for parser-synthesized identifiers The Rust parser used to copy several "rule_start"-derived positions from the C++ implementation: every identifier inside a binding pattern inherited the pattern's `[`/`{` position, every property identifier after `.` inherited the period's position, every spread element inherited the surrounding `[`/`{` position, and identifier-name property keys inherited the object/class start position. This was useful while comparing bytecode against the C++ port; with the C++ side gone, those quirks just hide the actual source positions in source maps and devtools. Drop the dedicated `binding_pattern_start` parser field and the `ident_pos_override` parameter on `parse_property_key`, and capture each identifier's own start position at the consume site. Add an AST snapshot test that pins the new per-identifier positions for object, array, nested, and parameter binding patterns.	2026-04-27 08:04:11 +02:00
Andreas Kling	cec0be6f3d	LibJS: Replace in_property_key_context flag with explicit consume helper The parser used to suppress the arguments/eval reference check via a state flag that was set during the entire `parse_property_key` call. That was over-broad: identifiers inside a computed property key like `{ [arguments]: 1 }` are real references, but the flag silenced their check too, leaving the function unmarked as needing the arguments object. Reading the resulting property at runtime crashed. Replace the flag with a `consume_property_key_token()` method used at the specific consume sites for the property key token itself, so the suppression is narrow. Inner consumes inside computed keys now go through regular `consume()` and run the check normally. Add a focused AST snapshot test covering plain, shorthand, computed, binding-pattern, and method-name property-key cases.	2026-04-27 08:04:11 +02:00
Timothy Flynn	12d9aaebb3	LibJS: Remove `gc` from the global object No other engine defines this function, so it is an observable difference of our engine. This traces back to the earliest days of LibJS. We now define `gc` in just the test-js and test262 runners.	2026-04-24 18:36:23 +02:00
Aliaksandr Kalenik	bfbc3352b5	LibJS: Extend Array.prototype.shift() fast path to holey arrays indexed_take_first() already memmoves elements down for both Packed and Holey storage, but the caller at ArrayPrototype::shift() only entered the fast path for Packed arrays. Holey arrays fell through to the spec-literal per-element loop (has_property / get / set / delete_property_or_throw), which is substantially slower. Add a separate Holey predicate with the additional safety checks the spec semantics require: default_prototype_chain_intact() (so HasProperty on a hole doesn't escape to a poisoned prototype) and extensible() (so set() on a hole slot doesn't create a new own property on a non-extensible object). The existing Packed predicate is left unchanged -- packed arrays don't need these checks because every index in [0, size) is already an own data property. Allows us to fail at Cloudflare Turnstile way much faster!	2026-04-23 21:47:21 +02:00
Andreas Kling	eb9432fcb8	LibJS: Preserve source positions in bytecode source maps Carry full source positions through the Rust bytecode source map so stack traces and other bytecode-backed source lookups can use them directly. This keeps exception-heavy paths from reconstructing line and column information through SourceCode::range_from_offsets(), which can spend a lot of time building SourceCode's position cache on first use. We're trading some space for time here, but I believe it's worth it at this tag, as this saves ~250ms of main thread time while loading https://x.com/ on my Linux machine. :^) Reading the stored Position out of the source map directly also exposed two things masked by the old range_from_offsets() path: a latent off-by-one in Lexer::new_at_offset() (its consume() bumped line_column past the character at offset; only synthesize_binding_pattern() hit it), and a (1,1) fallback in range_from_offsets() that fired whenever the queried range reached EOF. Fix the lexer, then rebaseline both the bytecode dump tests (no more spurious "1:1") and the destructuring AST tests (binding-pattern identifiers now report their real columns).	2026-04-22 22:34:54 +02:00
Andreas Kling	51758f3022	LibJS: Make bytecode register allocator O(1) Generator::allocate_register used to scan the free pool to find the lowest-numbered register and then Vec::remove it, making every allocation O(n) in the size of the pool. When loading https://x.com/ on my Linux machine, we spent ~800ms in this function alone! This logic only existed to match the C++ register allocation ordering while transitioning from C++ to Rust in the LibJS compiler, so now we can simply get rid of it and make it instant. :^) So drop the "always hand out the lowest-numbered free register" policy and use the pool as a plain LIFO stack. Pushing and popping the back of the Vec are both O(1), and peak register usage is unchanged since the policy only affects which specific register gets reused, not how aggressively.	2026-04-21 13:59:55 +02:00
Andreas Kling	e5d4c5cce8	LibJS: Check TDZ state in asm environment calls GetCalleeAndThisFromEnvironment treated a binding as initialized when its value slot was not <empty>. Declarative bindings do not encode TDZ in that slot, though: uninitialized bindings keep a separate initialized flag and their value starts as undefined. That let the first slow-path TDZ failure populate the environment cache, then a second call at the same site reused the cached coordinate and turned the required ReferenceError into a TypeError from calling undefined. Check Binding.initialized in the asm fast path instead and cover the cached second-hit case with a regression test.	2026-04-20 11:23:34 +02:00
Timothy Flynn	10ce847931	LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript Now that LibUnicode exports its character type APIs in Rust, we can use them to lex identifiers and whitespace. Fixes #8870.	2026-04-19 10:39:26 +02:00
Andreas Kling	583fa475fb	LibJS: Call RawNativeFunction directly from asm Call The asm interpreter already inlines ECMAScript calls, but builtin calls still went through the generic C++ Call slow path even when the callee was a plain native function pointer. That added an avoidable boundary around hot builtin calls and kept asm from taking full advantage of the new RawNativeFunction representation. Teach the asm Call handler to recognize RawNativeFunction, allocate the callee frame on the interpreter stack, copy the call-site arguments, and jump straight to the stored C++ entry point. NativeJavaScriptBackedFunction and other non-raw callees keep falling through to the existing C++ slow path unchanged.	2026-04-15 15:57:48 +02:00
Andreas Kling	8a9d5ee1a1	LibJS: Separate raw and capturing native functions NativeFunction previously stored an AK::Function for every builtin, even when the callable was just a plain C++ entry point. That mixed together two different representations, made simple builtins carry capture storage they did not need, and forced the GC to treat every native function as if it might contain captured JS values. Introduce RawNativeFunction for plain NativeFunctionPointer callees and keep AK::Function-backed callables on a CapturingNativeFunction subclass. Update the straightforward native registrations in LibJS and LibWeb to use the raw representation, while leaving exported Wasm functions on the capturing path because they still capture state. Wrap UniversalGlobalScope's byte-length strategy lambda in Function<...> explicitly so it keeps selecting the capturing NativeFunction::create overload.	2026-04-15 15:57:48 +02:00
Timothy Flynn	4b1ecbc9df	LibJS+LibUnicode: Update icu4x's calendar module to 2.2.0 First: We now pin the icu4x version to an exact number. Minor version upgrades can result in noisy deprecation warnings and API changes which cause tests to fail. So let's pin the known-good version exactly. This patch updates our Rust calendar module to use the new APIs. This initially caused some test failures due to the new Date::try_new API (which is the recommended replacement for Date::try_new_from_codes) having quite a limited year range of +/-9999. So we must use other APIs (Date::try_from_fields and calendrical_calculations::gregorian) to avoid these limits. http://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md#icu4x-22	2026-04-14 18:12:31 -04:00
Andreas Kling	517812647a	LibJS: Pack asm Call shared-data metadata Pack the asm Call fast path metadata next to the executable pointer so the interpreter can fetch both values with one paired load. This removes several dependent shared-data loads from the hot path. Keep the executable pointer and packed metadata in separate registers through this binding so the fast path can still use the paired-load layout after any non-strict this adjustment. Lower the packed metadata flag checks correctly on x86_64 as well. Those bits now live above bit 31, so the generator uses bt for single- bit high masks and covers that path with a unit test. Add a runtime test that exercises both object and global this binding through the asm Call fast path.	2026-04-14 12:37:12 +02:00
Andreas Kling	8c7c46f8ec	LibJS: Inline asm interpreter JS Call fast path Handle inline-eligible JS-to-JS Call directly in asmint.asm instead of routing the whole operation through AsmInterpreter.cpp. The asm handler now validates the callee, binds `this` for the non-allocating cases, reserves the callee InterpreterStack frame, populates the ExecutionContext header and Value tail, and enters the callee bytecode at pc 0. Keep the cases that need NewFunctionEnvironment() or sloppy `this` boxing on a narrow helper that still builds an inline frame. This preserves the existing inline-call semantics for promise-job ordering, receiver binding, and sloppy global-this handling while keeping the common path in assembly. Add regression coverage for closure-capturing callees, sloppy primitive receivers, and sloppy undefined receivers.	2026-04-14 08:14:43 +02:00
Andreas Kling	12a916d14a	LibJS: Handle AsmInt returns without C++ helpers Handle Return and End entirely in AsmInt when leaving an inline frame. The handlers now restore the caller, update the interpreter stack bookkeeping directly, and bump the execution generation without bouncing through AsmInterpreter.cpp. Add WeakRef tests that exercise both inline Return and inline End so this path stays covered.	2026-04-14 08:14:43 +02:00
Andreas Kling	c301a21960	LibJS: Skip preserving zero-argument call callees The callee and this-value preservation copies only matter while later argument expressions are still being evaluated. For zero-argument calls there is nothing left to clobber them, so we can keep the original operand and let the interpreter load it directly. This removes the hot Mov arg0->reg pattern from zero-argument local calls and reduces register pressure.	2026-04-13 18:29:43 +02:00
Andreas Kling	3a08f7b95f	LibJS: Drop dead entry GetLexicalEnvironment loads Teach the Rust bytecode generator to treat the synthetic entry GetLexicalEnvironment as a removable prologue load. We still model reg4 as the saved entry lexical environment during codegen, but assemble() now deletes that load when no emitted instruction refers to the saved environment register. This keeps the semantics of unwinding and environment restoration intact while letting empty functions and other simple bodies start at their first real instruction.	2026-04-13 18:29:43 +02:00
Andreas Kling	9af5508aef	LibJS: Split inline frames from execution context stack Keep JS-to-JS inline calls out of m_execution_context_stack and walk the active stack from the running execution context instead. Base pushes now record the previous running context so duplicate TemporaryExecutionContext pushes and host re-entry still restore correctly. This keeps the fast JS-to-JS path off the vector without losing GC root collection, stack traces, or helpers that need to inspect the active execution context chain.	2026-04-13 18:29:43 +02:00
Andreas Kling	2ca7dfa649	LibJS: Move bytecode interpreter state to VM The bytecode interpreter only needed the running execution context, but still threaded a separate Interpreter object through both the C++ and asm entry points. Move that state and the bytecode execution helpers onto VM instead, and teach the asm generator and slow paths to use VM directly.	2026-04-13 18:29:43 +02:00
Andreas Kling	3e18136a8c	LibJS: Add a String.fromCharCode builtin opcode Specialize only the fixed unary case in the bytecode generator and let all other argument counts keep using the generic Call instruction. This keeps the builtin bytecode simple while still covering the common fast path. The asm interpreter handles int32 inputs directly, applies the ToUint16 mask in-place, and reuses the VM's cached ASCII single-character strings when the result is 7-bit representable. Non-ASCII single code unit results stay on the dedicated builtin path via a small helper, and the dedicated slow path still handles the generic cases.	2026-04-12 19:15:50 +02:00
Andreas Kling	ce8f92cf6a	LibJS: Reuse cached ASCII strings for substrings Teach the PrimitiveString substring creation path to return the VM's preallocated single-character ASCII strings instead of always allocating a deferred Substring. This keeps one-code-unit ASCII substrings on the same fast path as direct string creation, including callers like charAt and indexed string property access.	2026-04-12 19:15:50 +02:00
Andreas Kling	7bc40bd54a	LibJS: Add a charAt builtin bytecode fast path Tag String.prototype.charAt as a builtin and emit a dedicated bytecode instruction for non-computed calls. The asm interpreter can then stay on the fast path when the receiver is a primitive string with resident UTF-16 data and the selected code unit is ASCII. In that case we can return the VM's cached empty or single-character ASCII string directly.	2026-04-12 19:15:50 +02:00
RubenKelevra	a1ae402bb9	LibJS: Make folded non-decimal prefix parsing UTF-8-safe Folded StringToNumber() and StringToBigInt() detected non-decimal prefixes by slicing the string at byte offset 2. On UTF-8 input this could split at a non-character boundary and panic. To prevent this, we replace the byte-based split with ASCII prefix stripping and preserve rejection of empty suffixes such as "0x", "0o", and "0b" explicitly before parsing the remaining digits. This makes non-decimal prefix folding UTF-8-safe and preserves the expected invalid-result behavior for empty prefixed literals. Tests: Add regression coverage for folded StringToNumber() and StringToBigInt() non-decimal prefix handling to validate the UTF-8 safety fix as 'string-to-number-and-bigint-non-decimal-prefixes.js'. These tests ensure empty suffixes like "0x", "0o", and "0b" and other invalid prefixed forms stay invalid, while valid prefixed literals continue to be accepted. Since we removed a byte-index split in folded StringToNumber()/StringToBigInt() coercion that could panic when byte index 2 landed inside a multi-byte UTF-8 scalar, we add regression tests for representative panic-shape inputs to ensure these coercions now return invalid results instead of crashing as 'string-to-number-and-bigint-utf8-boundary.js'	2026-04-12 17:36:51 +02:00
Shannon Booth	ba59640ab2	LibRegex: Avoid hitting backtrack limit for bounded grouped repetitions Unrolling a bounded quantifier {min,max} into (max-min) optional Split chains lets the backtracker explore O(2^n) paths, which quickly exhausts the backtrack limit for large bounds. Fix this by compiling the optional tail via a RepeatStart/RepeatCheck counted loop when the atom is known to be non-zero-width. The loop is safe to use without a progress check precisely because the atom cannot match empty. This required making atom_can_be_zero_width recursive into group bodies: previously it conservatively returned true for all Group and NonCapturingGroup atoms, so the non-zero-width guard could never fire for grouped subexpressions. The old lowering triggered "Regular expression backtrack limit exceeded" for patterns like /'(?:\\(?:\r\n|[\s\S])|[^'\\\r\n]){0,32}'/, causing inputs that should match normally (or return null) to throw instead. Fixes syntax highlighting of the C++ API on https://blend2d.com	2026-04-11 18:43:48 +02:00
Andreas Kling	0969a5cd9a	LibJS: Use Substring for legacy regexp statics Keep the legacy regexp static properties backed by PrimitiveString values instead of eagerly copied Utf16Strings. lastMatch, leftContext, rightContext, and $1-$9 now materialize lazy Substrings from the original match input when accessed. Keep RegExp.input as a separate slot from the match source so manual writes do not rewrite the last match state. Add coverage for that behavior and for rope-backed UTF-16 inputs.	2026-04-11 00:35:36 +02:00
Andreas Kling	8b8136b480	LibJS: Use Substring in Intl.Segmenter Keep the primitive string that segment() creates alongside the UTF-16 buffer used by LibUnicode. Segment data objects can then return lazy Substring instances for "segment" and reuse the original PrimitiveString for "input" instead of copying both strings. Add a rope-backed UTF-16 segmenter test that exercises both containing() and iterator results.	2026-04-11 00:35:36 +02:00
Andreas Kling	a9bedc5a8d	LibJS: Use Substring for string slices Route the obvious substring-producing string operations through the new PrimitiveString substring factory. Direct indexing, at(), charAt(), slice(), substring(), substr(), and the plain-string split path can now return lazy JS::Substring values backed by the original string. Add runtime coverage for rope-backed string operations so these lazy string slices stay exercised across both ASCII and UTF-16 inputs.	2026-04-11 00:35:36 +02:00
Andreas Kling	f6f791969d	LibJS: Use Substring for regexp results Return JS::Substring objects from the builtin regexp exec and split paths instead of eagerly copying UTF-16 slices into new strings. Matches, captures, and split pieces can now point back at the original input until someone asks for the string contents. Add focused runtime coverage for UTF-16 captures and regex split captures so these lazy slices stay exercised.	2026-04-11 00:35:36 +02:00
Andreas Kling	1182250414	LibJS: Add deferred PrimitiveString substrings Introduce JS::Substring as a lazily materialized PrimitiveString variant that stores an originating string plus a UTF-16 offset and length. This makes substring creation cheap while still reifying to a normal string when character data is requested. Track which short strings actually live in the VM caches so lazily resolved ropes and substrings do not evict unrelated cached strings when they are finalized. Add focused unit tests for nested ranges, rope-backed substrings, surrogate boundaries, and cache behavior.	2026-04-11 00:35:36 +02:00
Andreas Kling	879ac36e45	LibJS: Cache stable for-in iteration at bytecode sites Cache the flattened enumerable key snapshot for each `for..in` site and reuse a `PropertyNameIterator` when the receiver shape, dictionary generation, indexed storage kind and length, prototype chain validity, and magical-length state still match. Handle packed indexed receivers as well as plain named-property objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return cached property values directly and to fall back to the slow iterator logic when any guard fails. Treat arrays' hidden non-enumerable `length` property as a visited name for for-in shadowing, and include the receiver's magical-length state in the cache key so arrays and plain objects do not share snapshots. Add `test-js` and `test-js-bytecode` coverage for mixed numeric and named keys, packed receiver transitions, re-entry, iterator reuse, GC retention, array length shadowing, and same-site cache reuse.	2026-04-10 15:12:53 +02:00
Andreas Kling	4c1e2222df	LibJS: Fast-path safe writes into holey array holes Teach the asm PutByValue path to materialize in-bounds holey array elements directly when the receiver is a normal extensible Array with the default prototype chain and no indexed interference. This avoids bouncing through generic property setting while preserving the lazy holey length model. Keep the fast path narrow so inherited setters, inherited non-writable properties, and non-extensible arrays still fall back to the generic semantics. Add regression coverage for those cases alongside the large holey array stress tests.	2026-04-09 20:06:42 +02:00
Andreas Kling	da1c943161	LibJS: Make holey array lengths lazy Treat setting a large array length as a logical length change instead of forcing dictionary indexed storage or materializing every hole up front. This keeps dense fills on Array(length) on the holey indexed path and only falls back to sparse storage when later writes actually create a large realized gap. The asm indexed get/put fast paths assumed holey arrays always had a materialized backing store. Guard those paths with a capacity check so lazy holey arrays fall back safely until an index has been realized. Add regression coverage for very large holey arrays and for densely filling a large holey array after pre-sizing it with Array(length).	2026-04-09 20:06:42 +02:00
mikiubo	afc0f8b495	LibRegex: Use Unicode ID_Start/ID_Continue for named group names Switch to LibUnicode’s ICU-backed functions. Keep the explicit checks for '$', '_', U+200C, and U+200D that ECMAScript requires on top of the Unicode properties. Add test coverage for both the newly accepted case and regression guards for cases that must continue to work.	2026-04-08 07:31:54 -04:00
Shannon Booth	f27bc38aa7	Everywhere: Remove ShadowRealm support The proposal has not seemed to progress for a while, and there is an open issue about module imports which breaks HTML integration. While we could probably make an AD-HOC change to fix that issue, it is deep enough in the JS engine that I am not particularly keen on making that change. Until other browsers begin to make positive signals about shipping ShadowRealms, let's remove our implementation for now. There is still some cleanup that can be done with regard to the HTML integration, but there are a few more items that need to be untangled there.	2026-04-05 13:57:58 +02:00
mikiubo	f84edd8173	LibRegex: Fix legacy backreference fallback digit 8 or 9 When a multi-digit decimal escape like \81 exceeds the total capture group count in non-Unicode mode, the parser falls back to legacy octal reinterpretation. However, digits '8' and '9' are not valid in octal (base 8), so passing them to parse_legacy_octal() caused an unwrap() panic on None from char::to_digit(8). Treat '8' and '9' as literal characters in the fallback path, matching the behavior already present for the non-backreference.	2026-04-04 12:12:00 +02:00
Shannon Booth	adabc5cedb	LibJS: Handle empty UTF-16 strings in Rust FFI Treat zero length UTF-16 slices from Rust as empty views at the FFI boundary instead of assuming a non null backing pointer. Add a regression test which crashed before these changes. Fixes a crash loading github.com/ladybirdbrowser/ladybird.	2026-03-31 22:33:36 +02:00
Andreas Kling	201e615aad	LibRegex: Preserve set-op direction in backward /v matches Unicode-set intersection and subtraction always lowered their post-consumption checks as lookbehinds. That is correct while the outer matcher runs forward, but inside lookbehind the consumed text sits to the right of the current position, so the checks must flip to lookahead instead. Because we always looked left, patterns like `(?<=[[^A-Z]--[A-Z]])\P{N}` and the reported fuzz case missed matches whenever the character before the consumed one changed the set-operation result. Preserve the surrounding match direction when compiling those checks, and add coverage for reduced subtraction and intersection cases plus the original regression.	2026-03-31 15:59:04 +02:00
Andreas Kling	e0de4ef33e	LibRegex: Reject negated /v classes that contain strings Negated unicode-set classes are only valid when every member is single-code-point. We already rejected direct string-valued members such as `\q{ab}` and `\p{RGI_Emoji_Flag_Sequence}` inside `[^...]`, but nested class-set operands could still smuggle them through, so patterns like `[^[[\p{Emoji_Keycap_Sequence}]]]` and the reported fuzzed literal compiled instead of throwing. Validate nested class-set expressions after parsing and reject only the negated `/v` classes whose resulting multi-code-point strings are still non-empty. Track the exact string members contributed by string literals, string properties, and nested classes so intersections and subtractions can eliminate them before the negated-class check runs. Add constructor and literal coverage for the reduced nested-string cases, the original regression, and valid negated set operations that remove every string member.	2026-03-31 15:59:04 +02:00
Andreas Kling	6347827eb8	LibJS: Retry Unicode low-surrogate lastIndex positions RegExpBuiltinExec used to snap any Unicode lastIndex that landed on a low surrogate back to the start of the pair. That matched `/😀/u`, but it skipped valid empty matches when the original low-surrogate position was itself matchable, such as `/\p{Script=Cyrillic}?(?<!\D)/v` on `"A😘"` and the longer fuzzed global case. Try the snapped position first, then retry the original lastIndex when the snapped match fails. Only keep that second result when it is empty at the original low-surrogate position, so consuming /u and /v matches still cannot split a surrogate pair. In the Rust VM, treat backward Unicode matches that start between surrogate halves as having no complete code point to their left, which matches V8's lookbehind behavior for those positions. Add reduced coverage for both low-surrogate exec cases, the original global match count regression, and the consuming-match retry regression.	2026-03-31 15:59:04 +02:00
Andreas Kling	33f9d464de	LibRegex: Preserve negated class direction in lookbehind Compile the synthetic assertion for negated classes in the same direction as the surrounding matcher. We were hardcoding a lookahead for `[^...]`, so lookbehind checked the wrong side of the current position and missed valid `/v` matches such as `(?<=[^\p{Emoji}])2`. Apply the same fix to unicode set classes, since they use the same negative-lookaround-plus-`AnyChar` lowering for complements. Add reduced `RegExp.js` coverage for both `[^\p{Emoji}]` and `[[^\p{Emoji}]]` in lookbehind, plus the original complex `/gv` regression.	2026-03-31 15:59:04 +02:00
Andreas Kling	c828c87408	LibRegex: Leave suffix minima for repeated simple loops Repeated simple loops like "a+".repeat(100) compile to a chain of greedy loop instructions. When one loop failed, the VM only knew how to give back one character at a time unless the next instruction was a literal char, so V8's regexp-fallback case ran into the backtrack limit instead of finding the obvious match. When a greedy simple loop is followed by more loops for the same matcher, sum their minimum counts and backtrack far enough to leave the missing suffix in one step. If that suffix is already available to the right, still give back one character so the VM makes progress instead of reusing the same greedy state forever. The RegExp runtime test now covers the Chromium regexp-fallback case through exec(), global exec(), and both replace() paths, plus bounded same-matcher chains where the suffix minimum is partly missing or already available.	2026-03-31 15:59:04 +02:00
Andreas Kling	f24bdb9f94	LibRegex: Honor wrapped start anchors in search hints The VM only marked patterns as anchored when the first real instruction was AssertStart. That missed anchors hidden behind capture setup or a leading positive lookahead, so patterns like /(^bar)/ and /(?=^bar)\w+/ fell back to whole-subject scanning. Teach the hint analysis to see through the non-consuming wrappers we emit ahead of a leading ^, but still run the literal prefilters before anchored and sticky VM attempts. Missing required literals should stay cheap no-matches instead of running the full backtracking VM and raising the step limit. The RegExp runtime test now covers the Chromium ascii-regexp-subject case on a long ASCII input and anchored, sticky, and global no-match cases where the required literal is absent.	2026-03-31 15:59:04 +02:00
Andreas Kling	c12647fc37	LibRegex: Clamp braced quantifier bounds to 2^31 - 1 Browsers clamp braced quantifier bounds above 2^31 - 1 before checking whether {min,max} is in order. The parser still kept values up to u32::MAX, so patterns like {2147483648,2147483647} were rejected even though both bounds should collapse to the same limit. Clamp parsed braced quantifier bounds to 2^31 - 1 as they are read. This keeps the existing acceptance of huge exact and open-ended quantifiers and makes the constructor and regex literal paths agree with other engines on the out-of-order edge cases. The RegExp runtime and syntax tests now cover accepted huge quantifiers, clamped order validation, and huge literal forms. The reported constructor and literal cases also match other engines.	2026-03-31 15:59:04 +02:00
Andreas Kling	87b22d0c04	LibRegex: Compare set operands by exact string length Unicode set intersection and subtraction were compiled by matching one operand and then checking the others with lookbehind. That let a longer string operand reject a shorter match whenever the longer string happened to end at the same position. Group unicode set operands by exact match length and compile each length class separately, longest first. This keeps longest-match semantics for unions while making intersection and subtraction compare only strings of the same length. The new RegExp runtime cases cover both the reported [a-z]--\q{abc} regression and the related intersection/subtraction mismatches, and they now agree with V8.	2026-03-31 15:59:04 +02:00
Andreas Kling	50b137f527	LibJS: Reject mixed surrogate forms in RegExp names Reject surrogate pairs in named group names unless both halves come from the same raw form. A literal surrogate half was being normalized into \uXXXX before LibRegex parsed the pattern, which let mixed literal and escaped forms sneak through. Validate surrogate handling on the UTF-16 pattern before normalization, but only treat \k<...> as a named backreference when the parser would do that too. Legacy regexes without named groups still use \k as an identity escape, so their literal text must not be rejected by the pre-scan. Add runtime and syntax tests for the mixed forms, the valid literal, fixed-width, and braced escape cases, and the legacy \k literals.	2026-03-31 15:59:04 +02:00
Undefine	b9e5c0e98c	Meta: Replace all use of LADYBIRD_PROJECT_ROOT with LADYBIRD_SOURCE_DIR Those two are equivalent so no need to have two different variables.	2026-03-29 13:59:11 -06:00
Andreas Kling	1f413da8e8	LibRegex: Anchor sticky matches at lastIndex Sticky regular expressions were still using the generic forward-search paths inside LibRegex and only enforcing the lastIndex check back in LibJS after a match had already been found. That made tokenizer-style sticky patterns spend most of their time scanning for later matches that would be thrown away. Route sticky exec() and test() through an anchored VM entry point that runs exactly once at the requested start position while keeping the existing literal-hint pruning. Add focused test-js coverage for sticky literals, alternations, classes, quantifiers, and WebIDL-style token patterns.	2026-03-29 16:06:57 +02:00
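
The register allocator change in 51758f3022 describes replacing a "lowest-numbered free register" scan with a plain LIFO stack. The following is a minimal sketch of that idea only, not the actual LibJS code: `RegisterPool`, `allocate`, `release`, and `peak` are hypothetical names chosen for illustration.

```rust
/// Hypothetical register pool (illustrative names, not LibJS's real types).
/// Freed registers are kept on a stack, so both operations are O(1).
#[derive(Debug, Default)]
pub struct RegisterPool {
    /// Registers that were released and may be reused.
    free: Vec<u32>,
    /// The next register number that has never been handed out.
    next_fresh: u32,
}

impl RegisterPool {
    /// Reuse the most recently released register, or mint a fresh one.
    pub fn allocate(&mut self) -> u32 {
        if let Some(reg) = self.free.pop() {
            return reg;
        }
        let reg = self.next_fresh;
        self.next_fresh += 1;
        reg
    }

    /// Return a register to the pool by pushing it on the back of the Vec.
    pub fn release(&mut self, reg: u32) {
        self.free.push(reg);
    }

    /// Number of distinct registers ever created (peak usage).
    pub fn peak(&self) -> u32 {
        self.next_fresh
    }
}
```

In this sketch a fresh register is only created when the pool is empty, so the reuse policy changes which register is handed out, not how many exist, which is the property the commit message relies on.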
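
Commit 010deec578 describes a two-pass forward scan that keeps the LAST function declaration per name while emitting entries in source order. A rough sketch of that technique follows, assuming declarations are represented as plain name strings; the function name `last_declarations_in_source_order` is hypothetical and the real code keys its map on `SharedUtf16String`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: given declaration names in source order, keep only
/// the last declaration of each name, preserving source order in the output.
pub fn last_declarations_in_source_order<'a>(names: &[&'a str]) -> Vec<&'a str> {
    // Pass 1: record the last position at which each name is declared.
    let mut last_index: HashMap<&'a str, usize> = HashMap::new();
    for (index, &name) in names.iter().enumerate() {
        last_index.insert(name, index);
    }

    // Pass 2: walk forward again and emit a declaration only when it is
    // the last one recorded for its name.
    let mut ordered = Vec::new();
    for (index, &name) in names.iter().enumerate() {
        if last_index.get(name) == Some(&index) {
            ordered.push(name);
        }
    }
    ordered
}
```

Because both passes run forward, the caller no longer needs to reverse the result, which mirrors the removal of the `.in_reverse()` calls mentioned in the commit.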
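
Commit a1ae402bb9 replaces a byte-offset split with ASCII prefix stripping so that non-decimal prefix detection cannot panic on multi-byte UTF-8 input. The sketch below illustrates the general approach with `str::strip_prefix`; `parse_non_decimal` is a hypothetical helper, and returning `Option<u64>` (with overflow treated as `None`) is a simplification, since the real folding produces JavaScript numbers or BigInts.

```rust
/// Hypothetical helper: parse "0x..", "0o..", or "0b.." literals.
/// Returns None for missing prefixes, empty suffixes, and invalid digits.
pub fn parse_non_decimal(text: &str) -> Option<u64> {
    const PREFIXES: [(&str, u32); 6] = [
        ("0x", 16),
        ("0X", 16),
        ("0o", 8),
        ("0O", 8),
        ("0b", 2),
        ("0B", 2),
    ];

    // strip_prefix compares whole prefixes, so it never slices the input
    // in the middle of a multi-byte UTF-8 sequence (unlike &text[..2]).
    let (radix, digits) = PREFIXES
        .iter()
        .find_map(|&(prefix, radix)| text.strip_prefix(prefix).map(|rest| (radix, rest)))?;

    // Reject empty suffixes such as "0x" explicitly, and anything that is
    // not a digit in the chosen radix (from_str_radix alone would accept '+').
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }

    u64::from_str_radix(digits, radix).ok()
}
```

An input like `"0é"` shows the difference: byte index 2 falls inside the two-byte `é`, so a byte-based split would panic, while the prefix-based version simply finds no match.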

410 Commits