71 Commits

Author SHA1 Message Date
Andreas Kling
51758f3022 LibJS: Make bytecode register allocator O(1)
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!

This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)

So drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing and popping the back
of the Vec are both O(1), and peak register usage is unchanged since
the policy only affects which specific register gets reused, not how
aggressively.
2026-04-21 13:59:55 +02:00
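The LIFO pool described in 51758f3022 can be sketched roughly as follows. This is a minimal illustration, not the actual `Generator` code: the `RegisterPool` type and its field and method names are hypothetical.

```rust
/// Hypothetical stand-in for the generator's free-register pool.
#[derive(Default)]
struct RegisterPool {
    /// Released registers, used as a plain stack.
    free: Vec<u32>,
    /// Number of the next never-used register.
    next_fresh: u32,
}

impl RegisterPool {
    /// O(1): reuse the most recently released register, else mint a new one.
    fn allocate(&mut self) -> u32 {
        if let Some(reg) = self.free.pop() {
            return reg;
        }
        let reg = self.next_fresh;
        self.next_fresh += 1;
        reg
    }

    /// O(1): push the register onto the back of the stack.
    fn release(&mut self, reg: u32) {
        self.free.push(reg);
    }

    /// Peak register usage so far.
    fn peak(&self) -> u32 {
        self.next_fresh
    }
}
```

A fresh register is only minted when the pool is empty, so the peak count does not depend on which free register is handed out first.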
Andreas Kling
c301a21960 LibJS: Skip preserving zero-argument call callees
The callee and this-value preservation copies only matter while later
argument expressions are still being evaluated. For zero-argument calls
there is nothing left to clobber them, so we can keep the original
operand and let the interpreter load it directly.

This removes the hot Mov arg0->reg pattern from zero-argument local
calls and reduces register pressure.
2026-04-13 18:29:43 +02:00
Andreas Kling
3a08f7b95f LibJS: Drop dead entry GetLexicalEnvironment loads
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.

We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
2026-04-13 18:29:43 +02:00
Andreas Kling
3e18136a8c LibJS: Add a String.fromCharCode builtin opcode
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.

The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
2026-04-12 19:15:50 +02:00
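As a rough model of the int32 fast path in 3e18136a8c: for an int32 input, ToUint16 reduces to masking the low 16 bits, and results below 0x80 can be served from a cache of ASCII strings. The enum and function names below are hypothetical and only model the decision, not the asm interpreter itself.

```rust
/// Hypothetical result type: which path would produce the string.
#[derive(Debug, PartialEq)]
enum CharCodeResult {
    /// 7-bit result: a cached single-character ASCII string can be reused.
    CachedAscii(char),
    /// Any other single UTF-16 code unit: needs a helper to build the string.
    SingleCodeUnit(u16),
}

fn from_char_code_int32(value: i32) -> CharCodeResult {
    // For an int32 input, ToUint16 is just a mask of the low 16 bits.
    let code_unit = (value as u32 & 0xFFFF) as u16;
    if code_unit < 0x80 {
        CharCodeResult::CachedAscii(code_unit as u8 as char)
    } else {
        CharCodeResult::SingleCodeUnit(code_unit)
    }
}
```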
Andreas Kling
7bc40bd54a LibJS: Add a charAt builtin bytecode fast path
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.

The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
2026-04-12 19:15:50 +02:00
Andreas Kling
879ac36e45 LibJS: Cache stable for-in iteration at bytecode sites
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.

Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.

Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.

Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
2026-04-10 15:12:53 +02:00
InvalidUsernameException
61e6dbe4e7 LibJS: Copy object of member expression to preserve evaluation order
Noticed this pattern when reading some minified JS while debugging a
seemingly unrelated problem and immediately got suspicious because of my
earlier, similar fixes.
2026-03-22 15:40:38 +01:00
Andreas Kling
bb0acb54ae LibJS: Optimize x >> 0 to ToInt32 in bytecode codegen
x >> 0 is a common JS idiom equivalent to ToInt32(x). We already had
this optimization for x | 0, now do it for right shift by zero as well.

This allows the asmint handler for ToInt32 to run instead of the more
expensive RightShift handler, which wastes time loading and checking the
rhs operand and performing a shift by zero.
2026-03-20 00:51:23 -05:00
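The equivalence that bb0acb54ae relies on can be checked with a small model. The `Op` enum and `select_shift_op` are hypothetical stand-ins for the codegen decision; `to_int32` and `right_shift` follow the ECMAScript definitions of ToInt32 and the signed right shift operator.

```rust
#[derive(Debug, PartialEq)]
enum Op {
    ToInt32,
    RightShift,
}

/// Hypothetical codegen decision: a constant zero shift count
/// lets us emit ToInt32 instead of RightShift.
fn select_shift_op(rhs_constant: Option<f64>) -> Op {
    match rhs_constant {
        Some(c) if c == 0.0 => Op::ToInt32,
        _ => Op::RightShift,
    }
}

/// ECMAScript ToInt32: truncate, wrap modulo 2^32, reinterpret as signed.
fn to_int32(x: f64) -> i32 {
    if !x.is_finite() {
        return 0;
    }
    let wrapped = x.trunc().rem_euclid(4294967296.0);
    wrapped as u32 as i32
}

/// ECMAScript `lhs >> rhs`: the shift count is ToUint32(rhs) mod 32.
fn right_shift(lhs: f64, rhs: f64) -> i32 {
    let shift = (to_int32(rhs) as u32) & 31;
    to_int32(lhs) >> shift
}
```

With a shift count of zero, `right_shift` does nothing beyond the ToInt32 conversion of its left operand, which is why the cheaper handler suffices.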
Andreas Kling
02b0746676 LibJS: Deduplicate double constants in bytecode generator
Add a deduplication cache for double constants, matching the existing
approach for int32 and string constants. Multiple references to the
same floating-point value now share a single constant table entry.
2026-03-20 00:51:23 -05:00
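One way to build the deduplication cache from 02b0746676 is to key it on the bit pattern of the double, since `f64` is not hashable. This is a sketch under that assumption; `ConstantTable` and `add_double` are hypothetical names, and the real generator may key its cache differently.

```rust
use std::collections::HashMap;

/// Hypothetical constant table holding only doubles.
#[derive(Default)]
struct ConstantTable {
    values: Vec<f64>,
    /// Maps the raw bit pattern of a double to its table index.
    double_cache: HashMap<u64, usize>,
}

impl ConstantTable {
    /// Returns the index of `value`, adding a new entry only on first use.
    fn add_double(&mut self, value: f64) -> usize {
        let key = value.to_bits();
        if let Some(&index) = self.double_cache.get(&key) {
            return index;
        }
        let index = self.values.len();
        self.values.push(value);
        self.double_cache.insert(key, index);
        index
    }
}
```

Keying on bits keeps `0.0` and `-0.0` as separate entries, which matters because JS can tell them apart.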
Andreas Kling
144ab69715 LibJS: Remove C++ pipeline compatibility hacks from Rust codegen
Now that the C++ bytecode pipeline has been removed, we no longer
need to match its register allocation or block layout. This removes:

- All manual drop() calls that existed solely to match C++ register
  lifetimes, replaced with scope blocks to naturally limit register
  lifetimes without increasing register pressure.

- The unnecessary saved_property copy in update expressions. The
  property register is now used directly since emit_update_op
  doesn't evaluate user expressions that could mutate it. The copy
  is retained in compound/logical assignments where the RHS can
  mutate the property variable (e.g. a[i] |= a[++i]).

- All "matching C++", "Match C++", etc. comments throughout
  codegen.rs and generator.rs that referenced the removed pipeline.
2026-03-20 00:51:23 -05:00
Andreas Kling
bc4379983f LibJS: Improve bytecode executable dump format
Add a metadata header showing register count, block count, local
variable names, and the constants table. Resolve jump targets to
block labels (e.g. "block1") instead of raw hex addresses, and add
visual separation between basic blocks.

Make identifier and property key formatting more concise by using
backtick quoting and showing base_identifier as a trailing
parenthetical hint that joins the base and property names.

Generate a stable name for each executable by hashing the source
text it covers (stable across codegen changes). Named functions
show as "foo$9beb91ec", anonymous ones as "$43362f3f". Also show
the source filename, line, and column.
2026-03-20 00:51:23 -05:00
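The naming scheme in bc4379983f can be illustrated with a short sketch. The commit does not say which hash function is used, so 32-bit FNV-1a below is purely an assumption, and `stable_name` is a hypothetical helper.

```rust
/// 32-bit FNV-1a. An assumed stand-in; the real hash is not named in the commit.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Builds "name$hash" for named functions and "$hash" for anonymous ones.
fn stable_name(function_name: Option<&str>, source_text: &str) -> String {
    format!(
        "{}${:08x}",
        function_name.unwrap_or(""),
        fnv1a_32(source_text.as_bytes())
    )
}
```

Because the hash covers only the source text, the name stays the same when the generated bytecode changes.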
Andreas Kling
b4185f0ecd LibJS: Split packed and holey asm indexed fast paths
Use dedicated Packed branches in GetByValue and PutByValue so
in-bounds indexed accesses can skip hole checks and slot
reloads.

Keep Holey writes on the guarded arm, and keep append writes on
the C++ slow path so PutByValue still respects non-extensible
indexed objects and arrays with a non-writable length.

Add a bytecode regression that exercises both append failure
cases through the real js binary path.
2026-03-17 22:28:35 -05:00
Andreas Kling
31606fddd3 LibJS: Add Mov2/Mov3 instructions to reduce dispatch overhead
Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register
moves in a single dispatch. A peephole optimization pass during
bytecode assembly merges consecutive Mov instructions within each
basic block into these combined instructions.

When merging, identical Movs are deduplicated (e.g. two identical Movs
become a single Mov, not a Mov2). This optimization is implemented in
both the C++ and Rust codegen pipelines.

The goal is to reduce the per-instruction dispatch overhead, which is
significant compared to the actual cost of moving a value.

This isn't fancy or elegant, but provides a real speed-up on many
workloads. As an example, Kraken/imaging-desaturate.js improves by
~1.07x on my laptop.
2026-03-11 17:04:32 +01:00
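The peephole pass from 31606fddd3 might look roughly like this for a single basic block. The `Instr` enum is a hypothetical simplification, and this sketch only drops a Mov when it is identical to the one immediately before it; the real pass may deduplicate differently.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    /// (src, dst) register move.
    Mov(u32, u32),
    Mov2([(u32, u32); 2]),
    Mov3([(u32, u32); 3]),
    /// Any non-Mov instruction; it ends the current run of Movs.
    Other(&'static str),
}

/// Merges runs of consecutive Movs in one basic block.
fn merge_movs(block: &[Instr]) -> Vec<Instr> {
    let mut out = Vec::new();
    let mut run: Vec<(u32, u32)> = Vec::new();
    for instr in block {
        match instr {
            Instr::Mov(src, dst) => {
                // Skip a Mov identical to the previous one in the run.
                if run.last() != Some(&(*src, *dst)) {
                    run.push((*src, *dst));
                }
            }
            other => {
                flush(&mut run, &mut out);
                out.push(other.clone());
            }
        }
    }
    flush(&mut run, &mut out);
    out
}

/// Emits the pending run as Mov3/Mov2/Mov instructions, preserving order.
fn flush(run: &mut Vec<(u32, u32)>, out: &mut Vec<Instr>) {
    for chunk in run.chunks(3) {
        out.push(match chunk {
            [a] => Instr::Mov(a.0, a.1),
            [a, b] => Instr::Mov2([*a, *b]),
            [a, b, c] => Instr::Mov3([*a, *b, *c]),
            _ => unreachable!("chunks(3) yields 1 to 3 items"),
        });
    }
    run.clear();
}
```

The sketch assumes a combined instruction performs its moves in the listed order, so merging does not change behavior even when one move reads a register the previous one wrote.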
InvalidUsernameException
4cd1fc8019 LibJS: Copy LHS of compound assignment to preserve evaluation order
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
ced435987c LibJS: Copy keys in object expression to preserve evaluation order
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
bb762fb43b LibJS: Do not assume arguments cannot be clobbered
`copy_if_needed_to_preserve_evaluation_order` was introduced in
c372a084a2. At that point function
arguments still needed to be copied into registers with a special
`GetArgument` instruction. Later, in
3f04d18ef7 this was changed and arguments
were made their own operand type that can be accessed directly instead.

Similar to locals, arguments can also be overwritten due to evaluation
order in various scenarios. However, the function was never updated to
account for that. Rectify that here.

With this change, https://volkswagen.de no longer gets blanked shortly
after initial load and the unhandled JS exception spam on that site is
gone too.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
1d011f432b Tests/LibJS: Rebaseline bytecode tests with trailing newline diff
These tests pass when running them normally, but they produce a diff
when rebaselining. We should probably find out where this is coming
from, but for now just rebaseline all affected tests to make bytecode
diffs of upcoming commits clean.
2026-03-08 15:01:07 +01:00
Andreas Kling
54a1a66112 LibJS: Store cache pointers directly in bytecode instructions
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.

A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.

The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*) so the fixup switch is
auto-generated by the Python Op code generator, making it impossible
to forget updating the fixup when adding new cached instructions.

This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
2026-03-08 10:27:13 +01:00
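The fixup pass in 54a1a66112 can be modeled in miniature. In this sketch the instruction stream is reduced to a list of u64 cache operands, and `Executable` here is a hypothetical toy type, not the real one; only the name `fixup_cache_pointers` comes from the commit.

```rust
struct PropertyLookupCache {
    hits: u32,
}

/// Toy executable: caches live in a boxed slice so their addresses stay stable.
struct Executable {
    caches: Box<[PropertyLookupCache]>,
    /// One u64 operand per cached instruction: an index before fixup,
    /// the cache's address afterwards.
    cache_operands: Vec<u64>,
}

impl Executable {
    /// Replaces each cache index with the address of the corresponding cache.
    fn fixup_cache_pointers(&mut self) {
        for operand in &mut self.cache_operands {
            let index = *operand as usize;
            let pointer: *const PropertyLookupCache = &self.caches[index];
            *operand = pointer as usize as u64;
        }
    }
}
```

After the fixup, a handler can cast the operand back to a pointer and reach the cache with one load, instead of going through the executable and the vector first.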
Andreas Kling
ac35ef465b LibJS: Emit ThrowIfTDZ before simple assignment to let variables
The Rust bytecode codegen was missing a TDZ check before assigning to
local let/const variables in simple assignment expressions (a = expr).
The C++ pipeline correctly emits ThrowIfTDZ before the store to ensure
temporal dead zone semantics are enforced.

Add an emit_tdz_check_if_needed helper matching the C++ equivalent,
and call it in the simple assignment path.
2026-03-04 18:53:12 +01:00
Andreas Kling
56e09695e0 LibJS: Consolidate Put bytecode instructions and reduce code bloat
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.

This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
2026-03-04 18:53:12 +01:00
Andreas Kling
fb61294df7 LibJS: Add UsingDeclaration to needs_block_declaration_instantiation
Blocks containing non-local using declarations need a lexical
environment, just like let/const declarations. Add the missing
UsingDeclaration case to match C++ behavior.
2026-03-04 12:17:59 +01:00
Andreas Kling
bd7fc2b1b1 LibJS: Fix ResolveThisBinding/ResolveSuperBase emission order
Emit ResolveThisBinding before ResolveSuperBase in both
emit_evaluate_member_reference and emit_store_to_reference, matching
the C++ pipeline's evaluation order for super property references.

Also restructure emit_evaluate_member_reference to move non-super base
evaluation into the else branch, since the super path now handles
base evaluation differently (explicit ResolveSuperBase instead of
going through generate_expression on Super).
2026-03-04 12:17:59 +01:00
Andreas Kling
4120765497 LibJS: Keep arg_holders alive in generate_arguments_array
Keep the arg_holders vector alive through the spread arguments loop,
matching the C++ pipeline where the args Vector keeps registers held
through the loop. This ensures consistent register allocation.
2026-03-04 12:17:59 +01:00
Andreas Kling
7ceba6d2cb LibJS: Fix register order in private logical assignment
Move the destination register allocation after RHS evaluation in
private identifier logical assignment (&&=, ||=, ??=), matching the
C++ pipeline's register allocation order.
2026-03-04 12:17:59 +01:00
Andreas Kling
fa72fd9f95 LibJS: Optimize constant string computed properties to MemberId
When a computed member expression uses a constant string (e.g.
super["minutes"] or obj["key"]), optimize it to use the MemberId or
SuperMemberId reference form instead of the value-based form, matching
the C++ pipeline optimization.
2026-03-04 12:17:59 +01:00
Andreas Kling
33a6b90ccf LibJS: Clear pending_lhs_name for named class expressions
Named class expressions don't use pending_lhs_name, but we must still
clear it to prevent it from leaking through to nested anonymous
functions inside the class body.
2026-03-04 12:17:59 +01:00
Andreas Kling
ce4767f744 LibJS: Only set pending_lhs_name for non-empty class field names
For computed class fields, field_name is empty and the name is set at
runtime. Avoid setting pending_lhs_name in that case, which prevents
the name from leaking into computed field initializers.
2026-03-04 12:17:59 +01:00
Andreas Kling
722a897b28 LibJS: Remove redundant ThrowIfTDZ from Rust emit_set_variable
The caller is responsible for emitting ThrowIfTDZ before calling
emit_set_variable(), matching the C++ pipeline behavior. Remove the
redundant TDZ checks from both the const and non-const local paths.
2026-03-04 12:17:59 +01:00
Andreas Kling
d88374e119 Tests/LibJS: Add bytecode test for for-of with conditional in RHS
This adds a test case where the for-of iterable is a sequence
expression containing a conditional expression. The C++ pipeline
creates loop blocks before evaluating the iterable, giving them lower
block numbers, while the Rust pipeline evaluates the iterable first.
2026-03-01 21:20:54 +01:00
Andreas Kling
aadfe0f02a Tests/LibJS: Add test for compound assignment after destructuring
This adds a test case for compound assignment to a variable that was
initialized via a let destructuring pattern. The Rust pipeline emits a
redundant ThrowIfTDZ after the compound assignment because the variable
is not tracked as initialized after destructuring.
2026-03-01 21:20:54 +01:00
Andreas Kling
86191ce229 Tests/LibJS: Add bytecode test for destructuring assignment in &&
This adds a test case for array destructuring assignment inside a
logical AND expression, e.g. `t && ([a, b] = t(e))`. The C++ pipeline
allocates a separate register for the RHS and copies it to the result
register after destructuring, while the Rust pipeline evaluates the
RHS directly into the preferred destination, omitting the copy.
2026-03-01 21:20:54 +01:00
Andreas Kling
f0c34d54a1 Tests/LibJS: Add bytecode test for for-of continue with block scope
Test that continue inside a for-of loop body properly restores the
lexical environment when the for-of creates a per-iteration scope
for the loop variable.
2026-03-01 21:20:54 +01:00
Andreas Kling
fa9c1a6885 Tests/LibJS: Add bytecode test for return from switch with block scope
Test that returning from inside a switch statement that has a lexical
environment (for const/let declarations) properly emits
SetLexicalEnvironment to restore the parent environment before each
Return instruction.
2026-03-01 21:20:54 +01:00
Andreas Kling
6432754251 Tests/LibJS: Add bytecode test for postfix increment on private member
2026-03-01 21:20:54 +01:00
Andreas Kling
ffa380f15b Tests/LibJS: Add bytecode test for async await in try-catch with scope
Test that the await continuation's throw path properly unwinds the
lexical environment when inside a with statement within a try-catch.
2026-03-01 21:20:54 +01:00
Andreas Kling
dc464ba270 Tests/LibJS: Add bytecode test for var environment capacity
Add a test that exercises CreateVariableEnvironment capacity when a
function has parameter expressions and non-local var bindings.
2026-03-01 21:20:54 +01:00
Andreas Kling
17c8a80afc Tests/LibJS: Add bytecode test for postfix update in logical AND
Add a test that exercises postfix increment/decrement as the RHS of a
logical AND expression, verifying the register allocation matches C++.
2026-03-01 21:20:54 +01:00
Andreas Kling
95fec309cd Tests/LibJS: Add bytecode test for nested try-finally continue
Add a test that exercises break/continue trampolines through nested
try-finally blocks, ensuring exception handler ranges are correct.
2026-03-01 21:20:54 +01:00
Andreas Kling
d0b9905de1 LibJS/Rust: Use GetLengthWithThis for super.length property access
The C++ pipeline has an optimization that uses the GetLengthWithThis
instruction instead of GetByIdWithThis when accessing the "length"
property. Add the same optimization to the Rust pipeline by
introducing an emit_get_by_id_with_this helper that checks for the
"length" property name and emits the optimized instruction.

Also update emit_get_by_value_with_this to use GetLengthWithThis
when the computed property is a constant "length" string.
2026-03-01 21:20:54 +01:00
Andreas Kling
56603319b4 LibJS/Rust: Fix evaluation order in delete super[key]
Per spec, the property key expression should be evaluated before
calling ResolveSuperBase. Fix the Rust codegen to match the C++
pipeline's correct evaluation order.
2026-03-01 21:20:54 +01:00
Andreas Kling
176a618fce LibJS: Don't emit dead code after Throw for invalid LHS expressions
When the left-hand side of an assignment, update, or for-in loop is
invalid (e.g. `foo() = "bar"`), the bytecode generator emits a Throw
instruction. Previously, it would also create a dead basic block after
the Throw, resulting in unreachable instructions in the output.

Fix this by returning early from the relevant codegen paths after
emitting the Throw, and by guarding for-in/for-of body generation
with an is_current_block_terminated() check.
2026-03-01 21:20:54 +01:00
Andreas Kling
18c40a1328 LibJS/Rust: Fix has_parameter_expressions and TDZ checks for arguments
Fix two bugs in the Rust bytecode codegen:

1. has_parameter_expressions incorrectly treated any destructuring
   parameter as a "parameter expression", when it should only do so
   for patterns that contain expressions (defaults or computed keys).
   This caused an unnecessary CreateLexicalEnvironment for simple
   destructuring like `function f({a, b}) {}`. The same bug existed
   in both codegen.rs and lib.rs (SFD metadata computation).

2. emit_set_variable used is_local_lexically_declared(index) for
   argument locals, but that function indexes into the local_variables
   array using the argument's index, checking the wrong variable.
   This caused spurious ThrowIfTDZ instructions when assigning to
   function arguments that happened to share an index with an
   uninitialized let/const variable.
2026-03-01 21:20:54 +01:00
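The first fix in 18c40a1328 amounts to a recursive check over binding patterns. The AST types below are hypothetical simplifications of the real ones; only the name `has_parameter_expressions` comes from the commit.

```rust
/// Hypothetical, simplified binding pattern.
enum Pattern {
    Identifier,
    Object(Vec<Property>),
    Array(Vec<Element>),
}

struct Property {
    computed_key: bool,
    has_default: bool,
    value: Pattern,
}

struct Element {
    has_default: bool,
    target: Pattern,
}

struct Parameter {
    has_default: bool,
    binding: Pattern,
}

/// True only if the pattern contains a default value or a computed key.
fn pattern_contains_expression(pattern: &Pattern) -> bool {
    match pattern {
        Pattern::Identifier => false,
        Pattern::Object(properties) => properties.iter().any(|property| {
            property.computed_key
                || property.has_default
                || pattern_contains_expression(&property.value)
        }),
        Pattern::Array(elements) => elements
            .iter()
            .any(|element| element.has_default || pattern_contains_expression(&element.target)),
    }
}

/// Destructuring alone no longer counts as a parameter expression.
fn has_parameter_expressions(parameters: &[Parameter]) -> bool {
    parameters
        .iter()
        .any(|p| p.has_default || pattern_contains_expression(&p.binding))
}
```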
Andreas Kling
6cdfbd01a6 LibJS: Add alternative source-to-bytecode pipeline in Rust
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.

The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:

- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
  aborting on any difference in AST or bytecode generated.

The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
2026-02-24 09:39:42 +01:00
Andreas Kling
f3b675fb37 Tests/LibJS: Import various tests developed alongside Rust pipeline
2026-02-24 09:39:42 +01:00
Andreas Kling
234203ed9b LibJS: Ensure deterministic ordering in scope analysis and codegen
The scope collector uses HashMaps for identifier groups and variables,
which means their iteration order is non-deterministic. This causes
local variable indices and function declaration instantiation (FDI)
bytecode to vary between runs.

Fix this by sorting identifier group keys alphabetically before
assigning local variable indices, and sorting vars_to_initialize by
name before emitting FDI bytecode.

Also make register allocation deterministic by always picking the
lowest-numbered free register instead of whichever one happens to be
at the end of the free list.

This is preparation for bringing in a new source->bytecode pipeline
written in Rust. Checking for regressions is significantly easier
if we can expect identical output from both pipelines.
2026-02-24 09:39:42 +01:00
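The sorting fix in 234203ed9b boils down to never letting HashMap iteration order reach the output. A minimal sketch, with a hypothetical `assign_local_indices` helper whose result position is the local variable index:

```rust
use std::collections::HashMap;

/// Returns identifier names in index order: position i is local variable i.
/// Sorting the keys first makes the result independent of HashMap ordering.
fn assign_local_indices<V>(identifier_groups: &HashMap<String, V>) -> Vec<String> {
    let mut names: Vec<String> = identifier_groups.keys().cloned().collect();
    // Byte-wise lexicographic order; any total order would do for determinism.
    names.sort();
    names
}
```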
Andreas Kling
d4f222e442 LibJS: Don't reset switch case completion value for empty results
When a statement in a switch case body doesn't produce a result (e.g.
a variable declaration), we were incorrectly resetting the completion
value to undefined. This caused the completion value of preceding
expression statements to be lost.
2026-02-19 12:02:50 +01:00
Andreas Kling
b0b0275e9e LibJS: Add bytecode test for switch statement completion values
The completion value of a switch case is incorrectly reset to undefined
when a statement without a result (like a variable declaration) follows
an expression statement. This will be fixed in the next commit.
2026-02-19 12:02:50 +01:00
Andreas Kling
afae23e270 LibJS: Don't optimize body vars to locals when referenced in defaults
When a function has parameter expressions (default values), body var
declarations that shadow a name referenced in a default parameter
expression must not be optimized to local variables. The default
expression needs to resolve the name from the outer scope via the
environment chain, not read the uninitialized local.

We now mark identifiers referenced during formal parameter parsing
with an IsReferencedInFormalParameters flag, and skip local variable
optimization for body vars that carry both this flag and IsVar (but
not IsForbiddenLexical, which indicates parameter names themselves).
2026-02-19 02:45:37 +01:00
Andreas Kling
cd2576c031 LibJS: Mark block-scoped function declaration locals as initialized
When emitting block declaration instantiation, we were not calling
set_local_initialized() after writing block-scoped function
declarations to local variables via Mov. This caused unnecessary
ThrowIfTDZ checks to be emitted when those locals were later read.

Block-scoped function declarations are always initialized at block
entry (via NewFunction + Mov), so TDZ checks for them are redundant.
2026-02-19 02:45:37 +01:00
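The interaction between `set_local_initialized()` and TDZ check emission in cd2576c031 can be sketched as below. The `Generator` here is a hypothetical toy with just enough state to show the effect; the instruction names follow the commit message, and `emit_block_function_declaration` is an invented helper.

```rust
#[derive(Debug, PartialEq)]
enum Instr {
    /// Stands in for NewFunction + Mov into a local.
    NewFunctionIntoLocal(usize),
    ThrowIfTDZ(usize),
}

/// Toy generator tracking which locals are known to be initialized.
struct Generator {
    lexically_declared: Vec<bool>,
    initialized: Vec<bool>,
    emitted: Vec<Instr>,
}

impl Generator {
    fn new(lexically_declared: Vec<bool>) -> Self {
        let initialized = vec![false; lexically_declared.len()];
        Generator { lexically_declared, initialized, emitted: Vec::new() }
    }

    fn set_local_initialized(&mut self, local: usize) {
        self.initialized[local] = true;
    }

    /// Emits ThrowIfTDZ only when the local might still be uninitialized.
    fn emit_tdz_check_if_needed(&mut self, local: usize) {
        if self.lexically_declared[local] && !self.initialized[local] {
            self.emitted.push(Instr::ThrowIfTDZ(local));
        }
    }

    /// Block entry: write the function to the local, then record that fact.
    fn emit_block_function_declaration(&mut self, local: usize) {
        self.emitted.push(Instr::NewFunctionIntoLocal(local));
        self.set_local_initialized(local);
    }
}
```

Without the `set_local_initialized` call, a later read of local 0 would still get a ThrowIfTDZ even though the local was written at block entry.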
Andreas Kling
47e552e8fd LibJS: Consolidate TDZ check emission into Generator helper
Move the duplicated ThrowIfTDZ emission logic from three places in
ASTCodegen.cpp into a single Generator::emit_tdz_check_if_needed()
helper. This handles both argument TDZ (which requires a Mov to
empty first) and lexically-declared variable TDZ uniformly.

This avoids emitting some unnecessary ThrowIfTDZ instructions.
2026-02-17 20:44:57 +01:00