The parser set `might_need_arguments_object` only when an `arguments`
or `eval` Identifier went through `consume()`, but shorthand object
properties create the reference via `make_identifier()` directly. As
a result, `function f() { return { arguments } }` allocated an
`arguments` local, never initialized it, and crashed at runtime when
the property was read.
Fall back to scope-driven detection: if scope analysis allocated a
non-lexical `arguments` local for the function, treat it as a real
arguments-object reference and emit `CreateArguments`. Skip the
fallback when a function declaration named `arguments` claims the
local, since that local belongs to the function, not the arguments
object.
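A minimal sketch of the fallback shape, with hypothetical names for the
scope-analysis output (the real fields live in the scope collector's
results):

    /// Hypothetical view of one allocated local.
    struct Local {
        name: &'static str,
        is_lexical: bool,
        claimed_by_function_declaration: bool,
    }

    /// Emit CreateArguments if the parser flagged the function, or if
    /// scope analysis allocated a non-lexical `arguments` local that no
    /// function declaration owns.
    fn needs_arguments_object(parser_flagged: bool, locals: &[Local]) -> bool {
        parser_flagged
            || locals.iter().any(|local| {
                local.name == "arguments"
                    && !local.is_lexical
                    && !local.claimed_by_function_declaration
            })
    }

    fn main() {
        // `function f() { return { arguments } }`: the shorthand property
        // bypassed consume(), but scope analysis still sees the local.
        let locals = [Local {
            name: "arguments",
            is_lexical: false,
            claimed_by_function_declaration: false,
        }];
        assert!(needs_arguments_object(false, &locals));
    }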
Add a runtime test covering shorthand inside a free function and a
method, plus a regression test for `({ eval } = ...)` to confirm
destructuring assignment doesn't accidentally trigger arguments
materialization.
The scope collector stored `identifier_groups` and `variables` in
HashMaps and then sorted them alphabetically before assigning local
register indices. The sorts existed only because HashMap iteration
order is non-deterministic; alphabetical was a stable choice for
comparing bytecode against the now-removed C++ port.
Switch both maps to `indexmap::IndexMap` so iteration follows the order
of first reference (i.e. source order), and drop the alphabetical sorts.
Local indices now reflect declaration order, which matches what shows
up in bytecode dumps and is easier to read alongside the source.
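A minimal illustration of the ordering difference, assuming the indexmap
crate (the real collector maps names to richer variable records):

    use indexmap::IndexMap;

    fn main() {
        // Insertion order is the iteration order, so local indices can
        // be assigned by first reference with no post-hoc sort.
        let mut variables: IndexMap<&str, usize> = IndexMap::new();
        for name in ["zebra", "yak", "aardvark", "zebra"] {
            let next_index = variables.len();
            variables.entry(name).or_insert(next_index);
        }
        let order: Vec<_> = variables.keys().copied().collect();
        // Source order, not alphabetical; the duplicate keeps index 0.
        assert_eq!(order, ["zebra", "yak", "aardvark"]);
    }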
Add a focused bytecode test using zebra/yak/aardvark to pin the new
allocation order; existing tests using let/var declarations have
their local indices renumbered to match.
ECMAScript hoisting keeps the LAST function declaration with a given
name. The Rust scope_collector and the script
GlobalDeclarationInstantiation (GDI) extraction implemented
this with a single reverse scan that pushed first-seen entries, which
left the resulting list in REVERSE source order. The C++ side then
iterated `m_functions_to_initialize.in_reverse()` to undo that.
Switch the Rust side to a two-pass forward scan that records the last
position per name and emits entries in source order, and drop the
matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp.
Same hoisting semantics; NewFunction emission and global property
iteration order now follow source order.
The HashMap that tracks last positions is keyed on `SharedUtf16String`,
so each insert is a refcount bump on the AST's existing Rc instead of
a deep `Vec<u16>` clone.
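A sketch of the two-pass shape over simplified declarations (the real
map is keyed on `SharedUtf16String`; names here are plain strs):

    use std::collections::HashMap;

    /// Keep only the LAST declaration per name, in source order.
    fn functions_to_initialize(declarations: &[&str]) -> Vec<usize> {
        // Pass 1: record the last position of each name.
        let mut last_position = HashMap::new();
        for (position, name) in declarations.iter().enumerate() {
            last_position.insert(*name, position);
        }
        // Pass 2: emit an entry only at that last position, scanning
        // forward so the output follows the source.
        declarations
            .iter()
            .enumerate()
            .filter(|(position, name)| last_position[*name] == *position)
            .map(|(position, _)| position)
            .collect()
    }

    fn main() {
        // `f` is declared twice; the second wins, and the list is in
        // source order, so the C++ side needs no .in_reverse().
        assert_eq!(functions_to_initialize(&["f", "g", "f"]), vec![1, 2]);
    }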
Add bytecode tests at script and nested-function scope that exercise
multiple declarations and a duplicate name to pin the new ordering.
The Rust parser used to copy several "rule_start"-derived positions
from the C++ implementation: every identifier inside a binding pattern
inherited the pattern's `[`/`{` position, every property identifier
after `.` inherited the period's position, every spread element
inherited the surrounding `[`/`{` position, and identifier-name
property keys inherited the object/class start position. This was
useful while comparing bytecode against the C++ port; with the C++
side gone, those quirks just hide the actual source positions in
source maps and devtools.
Drop the dedicated `binding_pattern_start` parser field and the
`ident_pos_override` parameter on `parse_property_key`, and capture
each identifier's own start position at the consume site.
Add an AST snapshot test that pins the new per-identifier positions
for object, array, nested, and parameter binding patterns.
The parser used to suppress the arguments/eval reference check via a
state flag that was set during the entire `parse_property_key` call.
That was over-broad: identifiers inside a computed property key like
`{ [arguments]: 1 }` are real references, but the flag silenced their
check too, leaving the function unmarked as needing the arguments
object. Reading the resulting property at runtime crashed.
Replace the flag with a `consume_property_key_token()` method used at
the specific consume sites for the property key token itself, so the
suppression is narrow. Inner consumes inside computed keys now go
through regular `consume()` and run the check normally.
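A sketch of the narrowed suppression, with hypothetical minimal parser
state (the real Parser carries much more):

    struct Parser {
        might_need_arguments_object: bool,
    }

    impl Parser {
        /// Regular consume(): references to `arguments` (and `eval`,
        /// elided here) mark the function.
        fn consume(&mut self, token: &str) {
            if token == "arguments" {
                self.might_need_arguments_object = true;
            }
        }

        /// Used only for the property-key token itself, so the check is
        /// suppressed for `{ arguments: 1 }` but not for the inner
        /// identifier of `{ [arguments]: 1 }`, which still goes through
        /// consume().
        fn consume_property_key_token(&mut self, _token: &str) {
            // Intentionally no arguments/eval reference check.
        }
    }

    fn main() {
        let mut parser = Parser { might_need_arguments_object: false };
        parser.consume_property_key_token("arguments"); // plain key
        assert!(!parser.might_need_arguments_object);
        parser.consume("arguments"); // inside a computed key
        assert!(parser.might_need_arguments_object);
    }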
Add a focused AST snapshot test covering plain, shorthand, computed,
binding-pattern, and method-name property-key cases.
Mark direct calls to function expressions while generating top-level
Rust bytecode, then compile those functions before returning the
off-thread compilation result to WebContent.
The main thread still performs all VM and GC-backed materialization. It
now receives an already assembled executable for each eager IIFE and
attaches it to the SharedFunctionInstanceData while creating the parent
Executable. Nested functions owned by the eager executable remain lazy.
This targets large wrapper IIFEs that are invoked as soon as top-level
code starts running. Their bytecode generation now runs on the existing
script compilation worker instead of blocking the main thread on first
call.
Split Rust program compilation so code generation and assembly finish
before the main thread materializes GC-backed executable objects. The
new CompiledProgram handle owns the parsed program, generator state, and
bytecode until C++ consumes it on the main thread.
Wire WebContent script fetching through that handle for classic scripts
and modules. Syntax-error paths still return ParsedProgram, so existing
error reporting stays in place. Successful fetches now do top-level
codegen on the thread pool before deferred_invoke hands control back to
the main thread.
Executable creation, SharedFunctionInstanceData materialization, module
metadata extraction, and declaration data extraction still run on the
main thread where VM and GC access is valid.
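A rough sketch of the ownership split, with placeholder types standing
in for the real parser and generator state:

    struct ParsedProgram;
    struct GeneratorState;

    /// Owns everything produced on the compilation worker until the
    /// main thread consumes it to build GC-backed executable objects.
    struct CompiledProgram {
        program: ParsedProgram,
        generator: GeneratorState,
        bytecode: Vec<u8>,
    }

    /// Worker-side step: parse, top-level codegen, and assembly, with
    /// no VM or GC access anywhere in this function.
    fn compile_off_thread(_source: &str) -> CompiledProgram {
        CompiledProgram {
            program: ParsedProgram,
            generator: GeneratorState,
            bytecode: Vec::new(),
        }
    }

    fn main() {
        // deferred_invoke would hand this handle back to the main
        // thread, which materializes the Executable plus module and
        // declaration metadata there.
        let _handle = compile_off_thread("f();");
    }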
Rust bytecode generation still reached into the VM to encode well-known
symbols and intrinsic abstract-operation functions as raw JS::Value
constants. That is not compatible with running top-level code generation
away from the main thread.
Keep those constants symbolic in the Rust constant pool instead. The C++
Executable materialization step now resolves them into real VM values
while it is already decoding the rest of the constant table on the main
thread.
This removes another VM dependency from Rust bytecode emission without
changing when the resulting constants become visible to the bytecode
interpreter.
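A sketch of keeping such constants symbolic in the pool (tag and
resolver names are hypothetical):

    /// Constant-pool entries that codegen can emit without a VM.
    enum ConstantEntry {
        Number(f64),
        /// Kept symbolic off-thread; becomes a real JS::Value during
        /// Executable materialization on the main thread.
        WellKnownSymbol(&'static str),
        IntrinsicFunction(&'static str),
    }

    /// Stands in for the C++ decoding step that already walks the
    /// constant table with VM access; stubbed as strings here.
    fn resolve(entry: &ConstantEntry) -> String {
        match entry {
            ConstantEntry::Number(n) => format!("Value({n})"),
            ConstantEntry::WellKnownSymbol(name) => format!("vm.well_known_symbol({name})"),
            ConstantEntry::IntrinsicFunction(name) => format!("vm.intrinsic({name})"),
        }
    }

    fn main() {
        let pool = [
            ConstantEntry::Number(1.0),
            ConstantEntry::WellKnownSymbol("iterator"),
        ];
        for entry in &pool {
            println!("{}", resolve(entry));
        }
    }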
Rust bytecode generation currently creates SharedFunctionInstanceData
and ClassBlueprint GC objects as soon as nested functions and classes
are encountered. That keeps the whole code generation phase tied to
the main-thread VM and heap.
Record pending descriptors on the Generator instead, then materialize
those descriptors while creating the C++ Executable. This keeps the GC
allocation boundary exactly where it already belongs, but removes the
last direct function-data allocations from the codegen walk.
This is a preparatory step for compiling top-level bytecode off-thread
and only doing C++ materialization after returning to the main thread.
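A sketch of the descriptor recording, with hypothetical field names:

    /// Recorded during the codegen walk instead of allocating a
    /// GC-backed SharedFunctionInstanceData immediately.
    struct PendingFunctionDescriptor {
        name: String,
        function_id: u32,
    }

    #[derive(Default)]
    struct Generator {
        pending_functions: Vec<PendingFunctionDescriptor>,
    }

    impl Generator {
        /// Codegen side: record only; no VM or heap access required.
        fn note_nested_function(&mut self, name: &str, function_id: u32) {
            self.pending_functions.push(PendingFunctionDescriptor {
                name: name.into(),
                function_id,
            });
        }
    }

    fn main() {
        let mut generator = Generator::default();
        generator.note_nested_function("inner", 0);
        // Executable creation on the main thread would later drain
        // pending_functions and allocate the real GC objects.
        assert_eq!(generator.pending_functions.len(), 1);
    }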
FunctionTable::extract_reachable() used to rediscover a function's
nested functions by walking the full body and parameter list during
bytecode generation. This is hot during page loading because creating
every lazy SharedFunctionInstanceData pays for an extra structural AST
traversal.
Record each parser-created function's direct child function ids while
parsing instead. Extraction can then recursively move that known
subtree without scanning the enclosing function again.
Keep the old structural scan for codegen-synthesized wrappers, such as
class field initializers, where no parser function context exists.
This preserves the sparse FunctionTable storage while making the common
extraction path proportional to the nested function count.
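A sketch of the id-driven extraction (simplified storage; the real
table is sparse and id-indexed):

    use std::collections::HashMap;

    struct Function {
        /// Direct child function ids, recorded while parsing, so
        /// extraction never re-scans the enclosing body.
        child_ids: Vec<u32>,
    }

    #[derive(Default)]
    struct FunctionTable {
        functions: HashMap<u32, Function>,
    }

    impl FunctionTable {
        /// Move a function and its known subtree into `out`; the work is
        /// proportional to the nested function count, not the body size.
        fn extract_reachable(&mut self, id: u32, out: &mut HashMap<u32, Function>) {
            if let Some(function) = self.functions.remove(&id) {
                for child in function.child_ids.iter().copied() {
                    self.extract_reachable(child, out);
                }
                out.insert(id, function);
            }
        }
    }

    fn main() {
        let mut table = FunctionTable::default();
        table.functions.insert(0, Function { child_ids: vec![1] });
        table.functions.insert(1, Function { child_ids: vec![] });
        let mut extracted = HashMap::new();
        table.extract_reachable(0, &mut extracted);
        assert_eq!(extracted.len(), 2);
    }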
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.
This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.
We're trading some space for time here, but I believe it's worth it at
this stage, as it saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)
Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!
This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)
Drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing and popping the back
of the Vec are both O(1), and peak register usage is unchanged since
the policy only affects which specific register gets reused, not how
aggressively.
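A minimal sketch of the policy change:

    /// The old allocator scanned for the lowest-numbered free register
    /// and Vec::remove()d it: O(n) per allocation. A plain LIFO stack
    /// makes allocate and release both O(1).
    #[derive(Default)]
    struct RegisterPool {
        free_list: Vec<u32>,
        next_register: u32,
    }

    impl RegisterPool {
        fn allocate(&mut self) -> u32 {
            // Pop the back instead of searching for the minimum.
            self.free_list.pop().unwrap_or_else(|| {
                let register = self.next_register;
                self.next_register += 1;
                register
            })
        }

        fn release(&mut self, register: u32) {
            self.free_list.push(register);
        }
    }

    fn main() {
        let mut pool = RegisterPool::default();
        let a = pool.allocate();
        let b = pool.allocate();
        pool.release(a);
        pool.release(b);
        // LIFO reuse: the most recently released register comes back
        // first; peak register usage is unchanged by the policy.
        assert_eq!(pool.allocate(), b);
    }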
The callee and this-value preservation copies only matter while later
argument expressions are still being evaluated. For zero-argument calls
there is nothing left to clobber them, so we can keep the original
operand and let the interpreter load it directly.
This removes the hot Mov arg0->reg pattern from zero-argument local
calls and reduces register pressure.
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.
We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.
The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.
The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.
The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
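A sketch of the fast-path decision with a simplified receiver model
(the real check inspects the VM value and its string representation):

    /// Simplified receiver model for illustration only.
    enum Receiver<'a> {
        /// Primitive string with resident UTF-16 data.
        ResidentString(&'a [u16]),
        /// Everything else: non-resident strings, replaced method, ...
        Other,
    }

    /// Some(result) stays in asm; None falls back to the generic
    /// builtin call path.
    fn char_code_at_fast_path(receiver: &Receiver, index: i32) -> Option<f64> {
        match receiver {
            Receiver::ResidentString(units) => {
                if index < 0 || index as usize >= units.len() {
                    Some(f64::NAN) // out-of-bounds Int32 index: NaN
                } else {
                    Some(units[index as usize] as f64)
                }
            }
            Receiver::Other => None,
        }
    }

    fn main() {
        let units: Vec<u16> = "hé".encode_utf16().collect();
        let receiver = Receiver::ResidentString(&units);
        assert_eq!(char_code_at_fast_path(&receiver, 1), Some(0xE9 as f64));
        assert!(char_code_at_fast_path(&receiver, 9).unwrap().is_nan());
        assert_eq!(char_code_at_fast_path(&Receiver::Other, 0), None);
    }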
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.
Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
Folded StringToNumber() and StringToBigInt() detected non-decimal
prefixes by slicing the string at byte offset 2. On UTF-8 input this
could split at a non-character boundary and panic.
To prevent this, we replace the byte-based split with ASCII prefix
stripping and keep explicitly rejecting empty suffixes (inputs such as
"0x", "0o", and "0b") before parsing the remaining digits.
This makes non-decimal prefix folding UTF-8-safe and preserves the
expected invalid-result behavior for empty prefixed literals.
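A minimal sketch of the safe shape (a stand-in for the actual folding
code):

    /// Strip an ASCII "0x"/"0o"/"0b" prefix instead of slicing at byte
    /// offset 2, which can land inside a multi-byte UTF-8 scalar (in
    /// "0é1", byte index 2 is the middle of 'é') and panic.
    fn parse_prefixed(input: &str) -> Option<u64> {
        for (prefix, radix) in [("0x", 16), ("0o", 8), ("0b", 2)] {
            if let Some(digits) = input.strip_prefix(prefix) {
                // Reject empty suffixes like "0x" before parsing.
                if digits.is_empty() {
                    return None;
                }
                return u64::from_str_radix(digits, radix).ok();
            }
        }
        input.parse().ok()
    }

    fn main() {
        assert_eq!(parse_prefixed("0x10"), Some(16));
        assert_eq!(parse_prefixed("0b"), None); // empty suffix stays invalid
        assert_eq!(parse_prefixed("0é1"), None); // no panic at a UTF-8 boundary
    }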
Tests:
Add regression coverage for folded StringToNumber() and StringToBigInt()
non-decimal prefix handling in
'string-to-number-and-bigint-non-decimal-prefixes.js' to validate the
UTF-8 safety fix.
These tests ensure empty suffixes like "0x", "0o", and "0b" and
other invalid prefixed forms stay invalid, while valid prefixed
literals continue to be accepted.
Since the removed byte-index split in folded
StringToNumber()/StringToBigInt() coercion could panic when byte index 2
landed inside a multi-byte UTF-8 scalar, add regression tests in
'string-to-number-and-bigint-utf8-boundary.js' for representative
panic-shape inputs, ensuring these coercions now return invalid results
instead of crashing.
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.
Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.
Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.
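A sketch of the guard set as a value-comparable key (field names are
hypothetical):

    /// Gates reuse of a cached PropertyNameIterator at a for..in site.
    #[derive(Clone, Copy, PartialEq)]
    struct ForInCacheKey {
        shape_id: u64,
        dictionary_generation: u64,
        indexed_storage_kind: u8,
        indexed_length: u32,
        prototype_chain_valid: bool,
        /// Arrays never share snapshots with plain objects.
        has_magical_length: bool,
    }

    fn can_reuse_snapshot(cached: ForInCacheKey, current: ForInCacheKey) -> bool {
        cached == current
    }

    fn main() {
        let cached = ForInCacheKey {
            shape_id: 1,
            dictionary_generation: 0,
            indexed_storage_kind: 0,
            indexed_length: 3,
            prototype_chain_valid: true,
            has_magical_length: true,
        };
        let mut current = cached;
        assert!(can_reuse_snapshot(cached, current));
        current.indexed_length = 4; // e.g. an element was appended
        assert!(!can_reuse_snapshot(cached, current)); // slow path instead
    }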
Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
The entry key is now an Rc clone instead of a freshly allocated
Utf16String per register_identifier call.
WebsitesParse: -3.4% RSS (-104 MB)
WebsitesRun: -3.0% RSS (-97 MB)
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.
This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().
Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.
Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
Teach import_rust_crate() to track RustFFI.h as a real build output,
and teach the relevant Rust build scripts to rerun when their FFI
inputs change.
Also keep a copy of RustFFI.h in Cargo's own OUT_DIR and restore the
configured FFI output from that cached copy after cargo rustc runs.
This fixes the case where Ninja knows the header is missing, reruns
the custom command, and Cargo exits without rerunning build.rs
because the crate itself is already up to date.
When Cargo leaves multiple hashed build-script outputs behind, pick
the newest root-output before restoring RustFFI.h so we do not copy a
stale header after Rust-side API changes.
Finally, track the remaining Rust-side inputs that could leave build
artifacts stale: LibUnicode and LibJS now rerun build.rs when src/
changes, and the asmintgen rule now depends on Cargo.lock, the
BytecodeDef path dependency, and newly added Rust source files.