Commit Graph

156 Commits

Author SHA1 Message Date
Andreas Kling
e65e85cb8c LibJS: Materialize arguments object for shorthand { arguments }
The parser only set `might_need_arguments_object` when an `arguments`
or `eval` Identifier went through `consume()`, but shorthand object
properties create the reference via `make_identifier()` directly. As
a result `function f() { return { arguments } }` allocated an
`arguments` local, never initialized it, and crashed at runtime when
the property was read.

Fall back to scope-driven detection: if scope analysis allocated a
non-lexical `arguments` local for the function, treat it as a real
arguments-object reference and emit `CreateArguments`. Skip the
fallback when a function declaration named `arguments` claims the
local, since that local belongs to the function, not the arguments
object.

Add a runtime test covering shorthand inside a free function and a
method, plus a regression test for `({ eval } = ...)` to confirm
destructuring assignment doesn't accidentally trigger arguments
materialization.
2026-04-27 08:04:11 +02:00
Andreas Kling
c1bc0cdfa9 LibJS: Allocate local variable indices in source order
The scope collector stored identifier_groups and variables in
HashMaps and then sorted them alphabetically before assigning local
register indices. The sorts existed only because HashMap iteration
order is non-deterministic; alphabetical was a stable choice for
comparing bytecode against the now-removed C++ port.

Switch both maps to indexmap::IndexMap so iteration follows the order
of first reference (= source order), and drop the alphabetical sorts.
Local indices now reflect declaration order, which matches what shows
up in bytecode dumps and is easier to read alongside the source.

Add a focused bytecode test using zebra/yak/aardvark to pin the new
allocation order; existing tests using let/var declarations have
their local indices renumbered to match.
2026-04-27 08:04:11 +02:00
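The allocation order described in c1bc0cdfa9 can be sketched with the standard library alone. This is a minimal illustration, not the scope collector's code: `LocalAllocator` and `alphabetical_indices` are hypothetical names, and the `Vec` plus `HashMap` pair stands in for `indexmap::IndexMap` so the sketch has no external dependency.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an insertion-ordered map such as `indexmap::IndexMap`:
/// the Vec keeps first-reference order, the HashMap gives O(1) index lookup.
#[derive(Default)]
pub struct LocalAllocator {
    order: Vec<String>,
    index_of: HashMap<String, u32>,
}

impl LocalAllocator {
    /// Returns the local index for `name`, allocating the next one on first reference.
    pub fn reference(&mut self, name: &str) -> u32 {
        if let Some(&index) = self.index_of.get(name) {
            return index;
        }
        let index = self.order.len() as u32;
        self.order.push(name.to_string());
        self.index_of.insert(name.to_string(), index);
        index
    }

    /// Names in allocation order, i.e. the order they first appeared.
    pub fn names(&self) -> Vec<String> {
        self.order.clone()
    }
}

/// For comparison: indices assigned only after an alphabetical sort (the old policy).
pub fn alphabetical_indices(names: &[&str]) -> Vec<(String, u32)> {
    let mut sorted: Vec<&str> = names.to_vec();
    sorted.sort();
    sorted.dedup();
    sorted
        .iter()
        .enumerate()
        .map(|(index, name)| (name.to_string(), index as u32))
        .collect()
}
```

Referencing `zebra`, `yak`, `aardvark` in that order yields indices 0, 1, 2 here, whereas the alphabetical policy would hand index 0 to `aardvark`.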
Andreas Kling
010deec578 LibJS: Build functions_to_initialize in source order
ECMAScript hoisting keeps the LAST function declaration with a given
name. The Rust scope_collector and script GDI extraction implemented
this with a single reverse scan that pushed first-seen entries, which
left the resulting list in REVERSE source order. The C++ side then
iterated `m_functions_to_initialize.in_reverse()` to undo that.

Switch the Rust side to a two-pass forward scan that records the last
position per name and emits entries in source order, and drop the
matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp.
Same hoisting semantics; NewFunction emission and global property
iteration order now follow the source.

The HashMap that tracks last positions is keyed on `SharedUtf16String`,
so each insert is a refcount bump on the AST's existing Rc instead of
a deep `Vec<u16>` clone.

Add bytecode tests at script and nested-function scope that exercise
multiple declarations and a duplicate name to pin the new ordering.
2026-04-27 08:04:11 +02:00
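The two-pass forward scan from 010deec578 might look roughly like the sketch below. The function name mirrors the commit title, but the signature is an assumption, and `Rc<str>` stands in for `SharedUtf16String` to show why inserting a key is only a refcount bump.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Returns the declarations that survive hoisting, in source order.
/// When a name is declared more than once, only its LAST declaration is kept.
pub fn functions_to_initialize(declarations: &[Rc<str>]) -> Vec<(usize, Rc<str>)> {
    // Pass 1: record the last position at which each name is declared.
    // Cloning an Rc bumps a reference count; the string data itself is shared.
    let mut last_position: HashMap<Rc<str>, usize> = HashMap::new();
    for (position, name) in declarations.iter().enumerate() {
        last_position.insert(Rc::clone(name), position);
    }

    // Pass 2: walk forward again and emit only the winning declarations,
    // which leaves the result in source order without any reversal.
    declarations
        .iter()
        .enumerate()
        .filter(|(position, name)| last_position.get(*name) == Some(position))
        .map(|(position, name)| (position, Rc::clone(name)))
        .collect()
}
```

For declarations `a, b, a, c`, the first `a` loses to the second one, so the result is `b` (position 1), `a` (position 2), `c` (position 3).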
Andreas Kling
30394ece8d LibJS: Use natural source positions for parser-synthesized identifiers
The Rust parser used to copy several "rule_start"-derived positions
from the C++ implementation: every identifier inside a binding pattern
inherited the pattern's `[`/`{` position, every property identifier
after `.` inherited the period's position, every spread element
inherited the surrounding `[`/`{` position, and identifier-name
property keys inherited the object/class start position. This was
useful while comparing bytecode against the C++ port; with the C++
side gone, those quirks just hide the actual source positions in
source maps and devtools.

Drop the dedicated `binding_pattern_start` parser field and the
`ident_pos_override` parameter on `parse_property_key`, and capture
each identifier's own start position at the consume site.

Add an AST snapshot test that pins the new per-identifier positions
for object, array, nested, and parameter binding patterns.
2026-04-27 08:04:11 +02:00
Andreas Kling
cec0be6f3d LibJS: Replace in_property_key_context flag with explicit consume helper
The parser used to suppress the arguments/eval reference check via a
state flag that was set during the entire `parse_property_key` call.
That was over-broad: identifiers inside a computed property key like
`{ [arguments]: 1 }` are real references, but the flag silenced their
check too, leaving the function unmarked as needing the arguments
object. Reading the resulting property at runtime crashed.

Replace the flag with a `consume_property_key_token()` method used at
the specific consume sites for the property key token itself, so the
suppression is narrow. Inner consumes inside computed keys now go
through regular `consume()` and run the check normally.

Add a focused AST snapshot test covering plain, shorthand, computed,
binding-pattern, and method-name property-key cases.
2026-04-27 08:04:11 +02:00
Andreas Kling
141063e91d LibJS: Precompile top-level IIFEs off-thread
Mark direct calls to function expressions while generating top-level
Rust bytecode, then compile those functions before returning the
off-thread compilation result to WebContent.

The main thread still performs all VM and GC-backed materialization. It
now receives an already assembled executable for each eager IIFE and
attaches it to the SharedFunctionInstanceData while creating the parent
Executable. Nested functions owned by the eager executable remain lazy.

This targets large wrapper IIFEs that are invoked as soon as top-level
code starts running. Their bytecode generation now runs on the existing
script compilation worker instead of blocking the main thread on first
call.
2026-04-26 21:51:52 +02:00
Andreas Kling
4a7dc45b3f LibWeb+LibJS: Compile fetched top-level JS off-thread
Split Rust program compilation so code generation and assembly finish
before the main thread materializes GC-backed executable objects. The
new CompiledProgram handle owns the parsed program, generator state, and
bytecode until C++ consumes it on the main thread.

Wire WebContent script fetching through that handle for classic scripts
and modules. Syntax-error paths still return ParsedProgram, so existing
error reporting stays in place. Successful fetches now do top-level
codegen on the thread pool before deferred_invoke hands control back to
the main thread.

Executable creation, SharedFunctionInstanceData materialization, module
metadata extraction, and declaration data extraction still run on the
main thread where VM and GC access is valid.
2026-04-26 21:51:52 +02:00
Andreas Kling
0d120019df LibJS: Resolve VM constants during executable creation
Rust bytecode generation still reached into the VM to encode well-known
symbols and intrinsic abstract-operation functions as raw JS::Value
constants. That is not compatible with running top-level code generation
away from the main thread.

Keep those constants symbolic in the Rust constant pool instead. The C++
Executable materialization step now resolves them into real VM values
while it is already decoding the rest of the constant table on the main
thread.

This removes another VM dependency from Rust bytecode emission without
changing when the resulting constants become visible to the bytecode
interpreter.
2026-04-26 21:51:52 +02:00
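One way to picture the symbolic constants from 0d120019df: codegen emits a plain enum that needs no VM access, and a later main-thread step turns each entry into a real value. All types below (`PoolConstant`, `RuntimeValue`, `Vm`, `materialize`) are hypothetical stand-ins, not LibJS types, and symbols are modelled as bare integers.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WellKnownSymbol {
    Iterator,
    AsyncIterator,
}

/// Constant-pool entry as emitted by codegen; building it needs no VM.
#[derive(Debug, Clone, PartialEq)]
pub enum PoolConstant {
    Number(f64),
    WellKnownSymbol(WellKnownSymbol),
}

/// Hypothetical stand-in for a VM-backed value.
#[derive(Debug, Clone, PartialEq)]
pub enum RuntimeValue {
    Number(f64),
    Symbol(u32),
}

/// Hypothetical stand-in for the VM; only the main thread would hold one.
pub struct Vm {
    pub iterator_symbol: u32,
    pub async_iterator_symbol: u32,
}

/// Main-thread step: resolve symbolic entries while decoding the constant table.
pub fn materialize(pool: &[PoolConstant], vm: &Vm) -> Vec<RuntimeValue> {
    pool.iter()
        .map(|constant| match constant {
            PoolConstant::Number(number) => RuntimeValue::Number(*number),
            PoolConstant::WellKnownSymbol(WellKnownSymbol::Iterator) => {
                RuntimeValue::Symbol(vm.iterator_symbol)
            }
            PoolConstant::WellKnownSymbol(WellKnownSymbol::AsyncIterator) => {
                RuntimeValue::Symbol(vm.async_iterator_symbol)
            }
        })
        .collect()
}
```

Because `PoolConstant` holds no VM handles, a pool like this could be built on a worker thread and handed over for materialization afterwards.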
Andreas Kling
56625c13d7 LibJS: Defer function data materialization
Rust bytecode generation currently creates SharedFunctionInstanceData
and ClassBlueprint GC objects as soon as nested functions and classes
are encountered. That keeps the whole code generation phase tied to
the main-thread VM and heap.

Record pending descriptors on the Generator instead, then materialize
those descriptors while creating the C++ Executable. This keeps the GC
allocation boundary exactly where it already belongs, but removes the
last direct function-data allocations from the codegen walk.

This is a preparatory step for compiling top-level bytecode off-thread
and only doing C++ materialization after returning to the main thread.
2026-04-26 21:51:52 +02:00
Andreas Kling
63d6ef1026 LibJS: Track nested function ids during Rust parsing
FunctionTable::extract_reachable() used to rediscover a function's
nested functions by walking the full body and parameter list during
bytecode generation. This is hot during page loading because creating
every lazy SFD pays for an extra structural AST traversal.

Record each parser-created function's direct child function ids while
parsing instead. Extraction can then recursively move that known
subtree without scanning the enclosing function again.

Keep the old structural scan for codegen-synthesized wrappers, such as
class field initializers, where no parser function context exists.
This preserves the sparse FunctionTable storage while making the common
extraction path proportional to the nested function count.
2026-04-26 21:51:52 +02:00
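The extraction path from 63d6ef1026 can be sketched as follows, assuming each function records the ids of its direct children at parse time. `FunctionData`, its fields, and the method signatures are hypothetical; only the names `FunctionTable` and `extract_reachable` come from the commit. The sketch uses an explicit stack rather than recursion, which visits the same subtree.

```rust
pub struct FunctionData {
    pub name: &'static str,
    /// Ids of directly nested functions, recorded while parsing.
    pub children: Vec<usize>,
}

/// Sparse table: a slot becomes None once its function has been moved out.
#[derive(Default)]
pub struct FunctionTable {
    slots: Vec<Option<FunctionData>>,
}

impl FunctionTable {
    pub fn insert(&mut self, name: &'static str, children: Vec<usize>) -> usize {
        self.slots.push(Some(FunctionData { name, children }));
        self.slots.len() - 1
    }

    pub fn contains(&self, id: usize) -> bool {
        matches!(self.slots.get(id), Some(Some(_)))
    }

    /// Moves `root` and everything nested under it out of the table.
    /// Work is proportional to the number of nested functions, not AST size.
    pub fn extract_reachable(&mut self, root: usize) -> Vec<(usize, FunctionData)> {
        let mut extracted = Vec::new();
        let mut pending = vec![root];
        while let Some(id) = pending.pop() {
            if let Some(function) = self.slots.get_mut(id).and_then(Option::take) {
                pending.extend(function.children.iter().copied());
                extracted.push((id, function));
            }
        }
        extracted
    }
}
```

Extracting an outer function takes its whole subtree with it and leaves unrelated siblings in the table.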
Timothy Flynn
5c34c7f554 Meta: Move python code generators to a subdirectory
Let's have a bit of organization here, rather than an ever-growing Meta
folder.
2026-04-23 07:31:19 -04:00
Andreas Kling
eb9432fcb8 LibJS: Preserve source positions in bytecode source maps
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.

This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.

We're trading some space for time here, but I believe it's worth it at
this stage, as this saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)

Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
2026-04-22 22:34:54 +02:00
Andreas Kling
51758f3022 LibJS: Make bytecode register allocator O(1)
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!

This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)

So drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing and popping the back
of the Vec are both O(1), and peak register usage is unchanged since
the policy only affects which specific register gets reused, not how
aggressively.
2026-04-21 13:59:55 +02:00
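A minimal sketch of the LIFO policy from 51758f3022, assuming a pool backed by a `Vec`. `RegisterPool` and its methods are hypothetical names; the commit only names `Generator::allocate_register`.

```rust
/// Hypothetical register pool: freed registers are reused in LIFO order.
#[derive(Default)]
pub struct RegisterPool {
    free: Vec<u32>,
    next_fresh: u32,
}

impl RegisterPool {
    /// O(1): pop the most recently freed register, or mint a fresh one.
    pub fn allocate(&mut self) -> u32 {
        if let Some(register) = self.free.pop() {
            return register;
        }
        let register = self.next_fresh;
        self.next_fresh += 1;
        register
    }

    /// O(1): push the register onto the back of the free list.
    pub fn release(&mut self, register: u32) {
        self.free.push(register);
    }

    /// Number of distinct registers ever handed out.
    pub fn peak(&self) -> u32 {
        self.next_fresh
    }
}
```

After allocating registers 0, 1, 2 and releasing 0 and then 2, the next allocation returns 2 rather than the lowest-numbered 0. The peak stays at 3 either way, since the policy only changes which free register is reused.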
Timothy Flynn
10ce847931 LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript
Now that LibUnicode exports its character type APIs in Rust, we can use
them to lex identifiers and whitespace.

Fixes #8870.
2026-04-19 10:39:26 +02:00
Andrew Kaster
f26cb24751 Rust: Add a config file for rustfmt
This sets max_width to 120, which causes a lot of reformatting.
2026-04-18 08:05:47 -04:00
Andreas Kling
530f6fb05c LibJS: Fold nested Rust match conditionals
Move several let/const checks and the `instanceof` keyword check into
match guards.
2026-04-16 22:44:41 +02:00
Andreas Kling
c301a21960 LibJS: Skip preserving zero-argument call callees
The callee and this-value preservation copies only matter while later
argument expressions are still being evaluated. For zero-argument calls
there is nothing left to clobber them, so we can keep the original
operand and let the interpreter load it directly.

This removes the hot Mov arg0->reg pattern from zero-argument local
calls and reduces register pressure.
2026-04-13 18:29:43 +02:00
Andreas Kling
3a08f7b95f LibJS: Drop dead entry GetLexicalEnvironment loads
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.

We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
2026-04-13 18:29:43 +02:00
Andreas Kling
3e18136a8c LibJS: Add a String.fromCharCode builtin opcode
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.

The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
2026-04-12 19:15:50 +02:00
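The int32 fast path described in 3e18136a8c boils down to a mask and an ASCII check. The sketch below is Rust rather than interpreter assembly, and `FromCharCode` and `from_char_code_int32` are hypothetical names; the "cached" variant only marks which results could be served from a single-character string cache.

```rust
#[derive(Debug, PartialEq)]
pub enum FromCharCode {
    /// 7-bit result: could be served from a cache of single-character strings.
    CachedAscii(u8),
    /// Any other single code unit: needs a freshly built string.
    CodeUnit(u16),
}

pub fn from_char_code_int32(value: i32) -> FromCharCode {
    // ToUint16 on an int32 input is "keep the low 16 bits".
    let code_unit = (value as u32 & 0xFFFF) as u16;
    if code_unit < 0x80 {
        FromCharCode::CachedAscii(code_unit as u8)
    } else {
        FromCharCode::CodeUnit(code_unit)
    }
}
```

Negative and oversized inputs wrap: `-1` becomes code unit `0xFFFF`, and `0x10041` wraps to `0x41` (`A`).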
Andreas Kling
7bc40bd54a LibJS: Add a charAt builtin bytecode fast path
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.

The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
2026-04-12 19:15:50 +02:00
Andreas Kling
d31750a43c LibJS: Add a charCodeAt builtin bytecode fast path
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.

The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
2026-04-12 19:15:50 +02:00
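The optimistic charCodeAt case from d31750a43c (resident UTF-16 data, Int32 index) has simple semantics, sketched here in Rust. `char_code_at` is a hypothetical helper, and the fallback to the generic call path for non-Int32 indices or replaced builtins is not modelled.

```rust
/// Returns the code unit at `index` as a number, or NaN when out of bounds.
pub fn char_code_at(code_units: &[u16], index: i32) -> f64 {
    if index < 0 {
        return f64::NAN;
    }
    match code_units.get(index as usize) {
        Some(&unit) => f64::from(unit),
        None => f64::NAN,
    }
}
```

Using `slice::get` keeps the bounds check and the load in one step, so there is no separate length comparison to get wrong.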
Andreas Kling
7ffe01cee3 LibJS: Split builtin call bytecode opcodes
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.

Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
2026-04-12 19:15:50 +02:00
RubenKelevra
a1ae402bb9 LibJS: Make folded non-decimal prefix parsing UTF-8-safe
Folded StringToNumber() and StringToBigInt() detected non-decimal
prefixes by slicing the string at byte offset 2. On UTF-8 input this
could split at a non-character boundary and panic.

To prevent this, we replace the byte-based split with ASCII prefix
stripping and preserve rejection of empty suffixes such as "0x", "0o",
and "0b" explicitly before parsing the remaining digits.

This makes non-decimal prefix folding UTF-8-safe and preserves the
expected invalid-result behavior for empty prefixed literals.

Tests:

Add regression coverage for non-decimal prefix handling in folded
StringToNumber() and StringToBigInt() as
'string-to-number-and-bigint-non-decimal-prefixes.js'.

These tests ensure empty suffixes like "0x", "0o", and "0b" and
other invalid prefixed forms stay invalid, while valid prefixed
literals continue to be accepted.

The fix removed a byte-index split in folded
StringToNumber()/StringToBigInt() coercion that could panic when byte
index 2 landed inside a multi-byte UTF-8 scalar. Add regression tests
for representative panic-shape inputs as
'string-to-number-and-bigint-utf8-boundary.js' to ensure these
coercions now return invalid results instead of crashing.
2026-04-12 17:36:51 +02:00
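The prefix handling fixed in a1ae402bb9 can be illustrated with `str::strip_prefix`, which never cuts through a multi-byte character the way slicing at a fixed byte offset can. This is a simplified sketch: `parse_non_decimal` is a hypothetical helper that returns a `u64` (so very long literals come back as `None` on overflow), not the folded StringToNumber()/StringToBigInt() code.

```rust
/// Parses a prefixed non-decimal literal such as "0x1F", "0o17" or "0b101".
/// Returns None for anything else, including bare prefixes like "0x".
pub fn parse_non_decimal(text: &str) -> Option<u64> {
    const PREFIXES: [(&str, u32); 6] = [
        ("0x", 16),
        ("0X", 16),
        ("0o", 8),
        ("0O", 8),
        ("0b", 2),
        ("0B", 2),
    ];
    for &(prefix, radix) in PREFIXES.iter() {
        // strip_prefix matches whole ASCII prefixes, so it cannot panic on a
        // non-character boundary the way `&text[..2]` does.
        if let Some(digits) = text.strip_prefix(prefix) {
            // Reject empty suffixes explicitly, plus anything that is not a
            // digit in this radix (from_str_radix alone accepts a leading '+').
            if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
                return None;
            }
            return u64::from_str_radix(digits, radix).ok();
        }
    }
    None
}
```

An input like `"0\u{00E9}"` shows the original hazard: byte index 2 falls inside the two-byte `é`, so `&text[..2]` would panic, while the prefix-stripping version simply returns `None`.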
Andreas Kling
879ac36e45 LibJS: Cache stable for-in iteration at bytecode sites
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.

Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.

Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.

Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
2026-04-10 15:12:53 +02:00
Johan Dahlin
c7969858d3 LibJS: Use SharedUtf16String in pattern_bound_names
Replace .to_vec() in parse_variable_declaration with
token_identifier_name(). Destructuring patterns use Rc clones instead
of string copies.

WebsitesParse: -0.1% RSS (-4 MB)
WebsitesRun:   -0.1% RSS (-5 MB)
2026-04-08 16:41:25 +02:00
Johan Dahlin
1fbf2023c8 LibJS: Expand lexer identifier cache to all identifiers
Remove <=8 char and ASCII-only restrictions, use first_code_unit % 128
for indexing.

WebsitesParse: 1.10x faster, -0.4% RSS (-12 MB)
WebsitesRun:   1.06x faster, -0.3% RSS (-11 MB)
2026-04-08 16:41:25 +02:00
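A direct-mapped identifier cache indexed by `first_code_unit % 128`, as described in 1fbf2023c8, might be sketched like this. `IdentifierCache` and `SharedName` are hypothetical names, and `Rc<[u16]>` stands in for `SharedUtf16String`; the eviction policy (newest entry wins the slot) is an assumption.

```rust
use std::rc::Rc;

/// Hypothetical shared identifier type standing in for `SharedUtf16String`.
pub type SharedName = Rc<[u16]>;

pub struct IdentifierCache {
    slots: Vec<Option<SharedName>>,
}

impl IdentifierCache {
    const SLOT_COUNT: usize = 128;

    pub fn new() -> Self {
        Self { slots: vec![None; Self::SLOT_COUNT] }
    }

    /// Returns a shared copy of `name`, reusing the cached Rc on a hit.
    pub fn intern(&mut self, name: &[u16]) -> SharedName {
        let slot_index = match name.first() {
            Some(&first) => first as usize % Self::SLOT_COUNT,
            None => return Rc::from(name),
        };
        if let Some(cached) = &self.slots[slot_index] {
            if &**cached == name {
                // Hit: a refcount bump instead of a fresh allocation.
                return Rc::clone(cached);
            }
        }
        // Miss or collision: allocate once and overwrite the slot.
        let shared: SharedName = Rc::from(name);
        self.slots[slot_index] = Some(Rc::clone(&shared));
        shared
    }
}
```

Interning the same spelling twice returns the same allocation. A different identifier with the same first code unit evicts the slot, which trades some hit rate for a lookup that needs no hashing.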
Johan Dahlin
36fe4d4af3 LibJS: Avoid alloc in ScopeRecord::variable() on cache hit
Calling contains_key() before entry() skips the Utf16String allocation
when the variable name already exists in the scope.
2026-04-08 16:41:25 +02:00
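The pattern from 36fe4d4af3 can be shown with a plain `HashMap<String, u32>`. `entry()` needs an owned key up front, so checking `contains_key()` with a borrowed key first avoids that allocation on a hit. The struct layout and the `key_allocations` counter below are hypothetical and exist only to make the effect observable.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct ScopeRecord {
    variables: HashMap<String, u32>,
    /// Counts how many owned keys were built (illustration only).
    pub key_allocations: usize,
}

impl ScopeRecord {
    /// Returns the slot for `name`, creating it on first use.
    pub fn variable(&mut self, name: &str) -> &mut u32 {
        // Hit: look up with the borrowed key, no owned String is created.
        if self.variables.contains_key(name) {
            return self.variables.get_mut(name).expect("key was just found");
        }
        // Miss: only now pay for the owned key that entry() requires.
        self.key_allocations += 1;
        self.variables.entry(name.to_owned()).or_insert(0)
    }
}
```

The cost is a second hash lookup on a hit, which is usually cheaper than allocating and then dropping a key that was never needed.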
Johan Dahlin
1d5e976fbb LibJS: Use SharedUtf16String keys in identifier_groups
Entry key is now an Rc clone instead of allocating a fresh Utf16String
per register_identifier call.

WebsitesParse: -3.4% RSS (-104 MB)
WebsitesRun:   -3.0% RSS (-97 MB)
2026-04-08 16:41:25 +02:00
Johan Dahlin
d566a81b5c LibJS: Remove redundant name param from register_identifier
Use id.name (SharedUtf16String) directly, eliminating callers'
.to_vec() allocations.

WebsitesParse: 1.04x faster
WebsitesRun:   1.05x faster
2026-04-08 16:41:25 +02:00
Johan Dahlin
6e33d36eb5 LibJS: Cache common identifier spellings in the lexer
Add SharedUtf16String (Rc<Utf16String>) for zero-copy sharing. Lexer
caches short ASCII identifiers in a direct-mapped table.

WebsitesParse: 1.03x faster, -5.1% RSS (-164 MB)
WebsitesRun:   1.05x faster, -4.7% RSS (-161 MB)
2026-04-08 16:41:25 +02:00
Andreas Kling
b23aa38546 AK: Adopt mimalloc v2 as main allocator
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.

This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().

Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.

Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
2026-04-08 09:57:53 +02:00
Andreas Kling
c167bfd50a Meta: Make Rust FFI headers reproducible
Teach import_rust_crate() to track RustFFI.h as a real build output,
and teach the relevant Rust build scripts to rerun when their FFI
inputs change.

Also keep a copy of RustFFI.h in Cargo's own OUT_DIR and restore the
configured FFI output from that cached copy after cargo rustc runs.
This fixes the case where Ninja knows the header is missing, reruns
the custom command, and Cargo exits without rerunning build.rs
because the crate itself is already up to date.

When Cargo leaves multiple hashed build-script outputs behind, pick
the newest root-output before restoring RustFFI.h so we do not copy a
stale header after Rust-side API changes.

Finally, track the remaining Rust-side inputs that could leave build
artifacts stale: LibUnicode and LibJS now rerun build.rs when src/
changes, and the asmintgen rule now depends on Cargo.lock, the
BytecodeDef path dependency, and newly added Rust source files.
2026-03-31 15:59:04 +02:00
Johan Dahlin
b542617e09 LibJS: Box StatementKind::ClassFieldInitializer variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
057725c731 LibJS: Box StatementKind::FunctionDeclaration variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
247b9a0a87 LibJS: Box StatementKind::UsingDeclaration variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
0167c83b35 LibJS: Box StatementKind::VariableDeclaration variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
938ec6ca18 LibJS: Box StatementKind::Labelled variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
345dbab49a LibJS: Box StatementKind::With variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
a36d7e5d88 LibJS: Box StatementKind::ForInOf variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
64ea365379 LibJS: Box StatementKind::For variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
31ae13eec6 LibJS: Box StatementKind::DoWhile variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
f29b7ab579 LibJS: Box StatementKind::While variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
3496379d09 LibJS: Box StatementKind::If variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
5ab51b173d LibJS: Box ExpressionKind::Yield variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
c62b7f2f87 LibJS: Box ExpressionKind::ImportCall variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
15097fa8e0 LibJS: Box ExpressionKind::TaggedTemplateLiteral variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
9d0d8129f4 LibJS: Box ExpressionKind::OptionalChain variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
7262ab5880 LibJS: Box ExpressionKind::Member variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
07e187ae6d LibJS: Box ExpressionKind::Conditional variant
2026-03-28 11:55:41 +01:00
Johan Dahlin
333ae7cc6d LibJS: Box ExpressionKind::Assignment variant
2026-03-28 11:55:41 +01:00
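The "Box ... variant" commits above share one idea: an enum is as large as its largest variant, so moving a big payload behind a `Box` shrinks every value of the enum. The sketch below uses made-up types (`ForLoop`, `InlineStatement`, `BoxedStatement`) rather than the real `StatementKind` and `ExpressionKind` definitions.

```rust
use std::mem::size_of;

/// Hypothetical large payload, standing in for a heavyweight AST node.
pub struct ForLoop {
    pub parts: [u64; 12],
}

/// Every value pays for the largest variant, even `Empty`.
pub enum InlineStatement {
    Empty,
    Break(u32),
    For(ForLoop),
}

/// The large payload lives on the heap; the enum holds only a pointer.
pub enum BoxedStatement {
    Empty,
    Break(u32),
    For(Box<ForLoop>),
}

/// Returns (inline size, boxed size) in bytes for the current target.
pub fn statement_sizes() -> (usize, usize) {
    (size_of::<InlineStatement>(), size_of::<BoxedStatement>())
}
```

The trade-off is one extra allocation and pointer hop for the boxed variant, in exchange for smaller nodes everywhere else in the tree.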