x >> 0 is a common JS idiom equivalent to ToInt32(x). We already had
this optimization for x | 0; now do it for right shift by zero as well.
This allows the asmint handler for ToInt32 to run instead of the more
expensive RightShift handler, which wastes time loading and checking the
rhs operand and performing a shift by zero.
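A minimal sketch of the recognition, with stand-in AST and opcode types (the real codegen structures differ):

```rust
enum Expr {
    NumericLiteral(f64),
    Identifier(String),
    // ... other expression kinds ...
}

enum Op {
    ToInt32,    // asmint fast path: just coerce the lhs
    RightShift, // general path: load and check rhs, then shift
}

fn lower_shift_right(rhs: &Expr) -> Op {
    match rhs {
        // `x >> 0` only truncates x to int32; the shift itself is a no-op.
        Expr::NumericLiteral(n) if *n == 0.0 => Op::ToInt32,
        _ => Op::RightShift,
    }
}
```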
Add a deduplication cache for double constants, matching the existing
approach for int32 and string constants. Multiple references to the
same floating-point value now share a single constant table entry.
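A minimal sketch of the cache, using std's HashMap for self-containment (the engine's actual map type and constant table layout may differ). f64 is neither Eq nor Hash, so the key is the raw bit pattern, which also keeps 0.0 and -0.0 as distinct constants:

```rust
use std::collections::HashMap;

#[derive(Default)]
struct ConstantTable {
    doubles: Vec<f64>,
    double_cache: HashMap<u64, u32>, // f64 bit pattern -> constant index
}

impl ConstantTable {
    fn add_double(&mut self, value: f64) -> u32 {
        let bits = value.to_bits();
        if let Some(&index) = self.double_cache.get(&bits) {
            return index; // reuse the existing constant table entry
        }
        let index = self.doubles.len() as u32;
        self.doubles.push(value);
        self.double_cache.insert(bits, index);
        index
    }
}
```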
Now that the C++ bytecode pipeline has been removed, we no longer
need to match its register allocation or block layout. This removes:
- All manual drop() calls that existed solely to match C++ register
  lifetimes, replaced with scope blocks that naturally limit register
  lifetimes without increasing register pressure (see the sketch after
  this list).
- The unnecessary saved_property copy in update expressions. The
property register is now used directly since emit_update_op
doesn't evaluate user expressions that could mutate it. The copy
is retained in compound/logical assignments where the RHS can
mutate the property variable (e.g. a[i] |= a[++i]).
- All "matching C++", "Match C++", etc. comments throughout
codegen.rs and generator.rs that referenced the removed pipeline.
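A hypothetical sketch of the scope-block approach: if a register handle frees itself on drop, a plain block bounds its lifetime with no explicit drop() call:

```rust
struct ScopedRegister(u32);

impl Drop for ScopedRegister {
    fn drop(&mut self) {
        // Return register self.0 to the allocator here.
    }
}

fn example(allocate: impl Fn() -> ScopedRegister) {
    {
        let temp = allocate(); // live only inside this block
        let _ = temp.0;        // ... emit instructions that use `temp` ...
    } // `temp` is freed here; no manual drop(temp) needed
    let reused = allocate();   // may receive the same register index
    let _ = reused.0;
}
```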
- Restrict the catch parameter conflict check to direct children of the
  catch body block, not nested scopes (see the sketch after this list)
- Set new_target_is_valid for dynamic function compilation (new
Function)
- Move check_parameters_post_body before flag restoration in
parse_method_definition so generator methods inside static init
blocks correctly allow 'await' as a parameter name
- Reject duplicate bindings in catch parameter patterns
- Reject redeclaration of catch parameter with let/const/function
- Reject binding patterns with initializers in for-in heads (Annex B
  only permits a simple BindingIdentifier with an initializer)
- Reject 'await' as binding identifier in class static init blocks
and module code
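A sketch of the narrowed catch-parameter check from the first item, with stand-in types: only names declared directly in the catch body conflict, so `try {} catch (e) { let e; }` is a SyntaxError while `try {} catch (e) { { let e; } }` stays legal:

```rust
struct Block {
    lexical_declarations: Vec<String>, // names declared directly in this block
    nested_blocks: Vec<Block>,         // inner scopes: deliberately not checked
}

fn catch_parameter_conflicts(parameter: &str, catch_body: &Block) -> bool {
    // Only direct children count; deeper blocks shadow the parameter legally.
    catch_body
        .lexical_declarations
        .iter()
        .any(|name| name.as_str() == parameter)
}
```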
Check identifier name validity for destructuring assignment pattern
bound names, and validate arrow function parameters after the arrow
is confirmed rather than during speculative parameter parsing.
This fixes the handling of arguments/eval as destructuring assignment
targets and as arrow function parameter names, both of which are now
correctly rejected in strict mode.
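A hedged sketch of the deferred check (hypothetical helper): while speculatively parsing `(arguments)` we cannot yet know whether it is an arrow parameter list or a parenthesized expression, so the name check runs only once the `=>` is seen:

```rust
fn validate_arrow_parameters(parameter_names: &[&str], strict_mode: bool) -> Result<(), String> {
    for name in parameter_names {
        // `(arguments) => {}` is a SyntaxError in strict mode, but the bare
        // parenthesized expression `(arguments)` is fine, which is why this
        // check cannot run during the speculative parse.
        if strict_mode && (*name == "arguments" || *name == "eval") {
            return Err(format!("'{name}' is not a valid parameter name in strict mode"));
        }
    }
    Ok(())
}
```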
Arrow functions don't have their own new.target binding -- they
inherit from the enclosing scope. At the global level, there is no
enclosing function, so new.target inside a global arrow is invalid.
Add a new_target_is_valid flag to ParserFlags that is set to true
when entering regular (non-arrow) function bodies, method
definitions, and class static init blocks. Arrow functions inherit
the flag from their enclosing scope rather than setting it.
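A minimal sketch of the flag plumbing; the flag name follows the commit, the surrounding structure is assumed:

```rust
#[derive(Clone, Copy, Default)]
struct ParserFlags {
    new_target_is_valid: bool,
}

// Regular function bodies, method definitions, and class static init blocks
// call this with is_arrow == false; arrows pass true and so keep whatever
// the enclosing scope had. At the global level the flag starts out false,
// which is what makes `() => new.target` at top level a SyntaxError.
fn flags_for_function_body(enclosing: ParserFlags, is_arrow: bool) -> ParserFlags {
    let mut flags = enclosing;
    if !is_arrow {
        flags.new_target_is_valid = true;
    }
    flags
}
```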
- Reject `true`, `false`, `null` as label identifiers
- Reject generator declarations in if-statement bodies (not covered
by Annex B)
- Reject `await` as label in class static init blocks and modules
- Reject `arguments` in class static initialization blocks
- Reject generator shorthand without method body in object literals
- Reject `get constructor()` / `set constructor()` in class bodies
- Reject `super.#private` member access
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
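A hedged sketch of the callback-per-token FFI shape (the real signature in token.rs may differ): C++ passes a function pointer plus a context pointer, and the tokenizer invokes it once per token:

```rust
#[repr(C)]
pub struct TokenInfo {
    pub token_type: u32, // mirrors the enum in Token.h
    pub start: usize,
    pub length: usize,
}

pub type TokenCallback = extern "C" fn(context: *mut core::ffi::c_void, token: *const TokenInfo);

#[no_mangle]
pub extern "C" fn rust_tokenize(
    source: *const u8,
    source_length: usize,
    callback: TokenCallback,
    context: *mut core::ffi::c_void,
) {
    let _source = unsafe { core::slice::from_raw_parts(source, source_length) };
    // ... run the lexer over `_source`, then for each token produced:
    let token = TokenInfo { token_type: 0, start: 0, length: 0 };
    callback(context, &token);
}
```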
Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
Replace the hand-written BytecodeFactory header with one generated by
cbindgen.
This will help ensure that types, enums, and constants are kept in
sync between the C++ and Rust code. It's also a step in exporting more
Rust enums directly rather than relying on magic constants for
switch statements.
The FFI functions are now all placed in the JS::FFI namespace, which
is the cause for all the churn in the scripting parts of LibJS and
LibWeb.
The automatic conversion of `Option<function pointer>` to a C function
pointer is only supported by cbindgen if the Option appears in the
typedef itself, rather than in an argument type. This no-op commit will
make a future cbindgen update happy.
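To illustrate the distinction the commit describes (names here are hypothetical):

```rust
// Supported: the Option lives in the typedef, so cbindgen emits a plain
// (nullable) C function pointer type.
pub type ValueCallback = Option<extern "C" fn(value: u64)>;

#[no_mangle]
pub extern "C" fn with_typedef(callback: ValueCallback) {
    if let Some(callback) = callback {
        callback(0);
    }
}

// Not supported: spelling the Option inline in the argument position, e.g.
// pub extern "C" fn inline_option(callback: Option<extern "C" fn(value: u64)>)
```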
Switch from a sparse Vec<Option<Box<FunctionData>>> keyed by numeric
FunctionId to FxHashMap<FunctionId, Box<FunctionData>>. The previous
representation allocated slots for every ID between 0 and max, wasting
memory when IDs are sparse (common in large bundles with many modules).
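The change in type terms, with stand-in definitions (std HashMap shown; FxHashMap only swaps the hasher):

```rust
use std::collections::HashMap;

type FunctionId = u32;
struct FunctionData;

// Before: indexed by FunctionId. A bundle whose highest ID is 1_000_000 but
// which defines only a handful of functions still allocates a slot for every
// ID in between.
type SparseTable = Vec<Option<Box<FunctionData>>>;

// After: only actually-present IDs occupy memory.
type DenseTable = HashMap<FunctionId, Box<FunctionData>>;
```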
Fix the static semantics for AssignmentTargetType in both the C++ and
Rust parsers:
- NewExpression is never a valid assignment target. Previously, the C++
parser's is<CallExpression> check matched NewExpression since it
inherits from CallExpression in our AST. Add explicit
!is<NewExpression> guards everywhere.
- CallExpression as assignment target is only valid in non-strict mode
(web-compat "runtime error for function call assignment targets").
Pass strict_mode to is_simple_assignment_target and reject call
expressions in strict mode for assignment, compound assignment,
prefix/postfix update, and for-in/of LHS positions.
- Parenthesized ObjectLiteral/ArrayLiteral (e.g. `({}) = 1`) must not
be treated as destructuring patterns. Track whether the primary
expression was parenthesized and skip binding pattern synthesis.
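A hedged Rust sketch of the fixed predicate; the real AST types differ:

```rust
enum Expression {
    Identifier(String),
    Member,                         // obj.prop / obj[key]
    Call,                           // f(...)
    New,                            // new F(...)
    Parenthesized(Box<Expression>), // ( expr )
}

fn is_simple_assignment_target(expression: &Expression, strict_mode: bool) -> bool {
    match expression {
        Expression::Identifier(_) | Expression::Member => true,
        // Web-compat: `f() = x` parses in sloppy mode (and throws at
        // runtime), but is an early SyntaxError in strict mode.
        Expression::Call => !strict_mode,
        // `new F() = x` is never valid, even though NewExpression used to
        // satisfy the is<CallExpression> check.
        Expression::New => false,
        // Parentheses are transparent for simple targets: `(a) = 1` is valid.
        Expression::Parenthesized(inner) => is_simple_assignment_target(inner, strict_mode),
    }
}
```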
Update existing tests that were testing incorrect behavior:
- `'use strict'; foo() = 'foo'` is now correctly a SyntaxError
- for-in/of with call expression LHS: use toEval() instead of
toEvalTo() (which runs eval inside a class method, i.e. strict mode)
Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register
moves in a single dispatch. A peephole optimization pass during
bytecode assembly merges consecutive Mov instructions within each
basic block into these combined instructions.
When merging, identical Movs are deduplicated (e.g. two identical Movs
become a single Mov, not a Mov2). This optimization is implemented in
both the C++ and Rust codegen pipelines.
The goal is to reduce the per-instruction dispatch overhead, which is
significant compared to the actual cost of moving a value.
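A minimal sketch of the merge pass over one run of consecutive Movs within a basic block, with stand-in instruction shapes:

```rust
#[derive(Clone, Copy, PartialEq)]
struct Mov { dst: u32, src: u32 }

enum Instruction {
    Mov(Mov),
    Mov2(Mov, Mov),
    Mov3(Mov, Mov, Mov),
}

fn merge_movs(run: &mut Vec<Mov>, out: &mut Vec<Instruction>) {
    run.dedup(); // adjacent identical Movs collapse to one Mov, not a Mov2
    let mut chunks = run.chunks_exact(3);
    for chunk in &mut chunks {
        out.push(Instruction::Mov3(chunk[0], chunk[1], chunk[2]));
    }
    match chunks.remainder() {
        &[a] => out.push(Instruction::Mov(a)),
        &[a, b] => out.push(Instruction::Mov2(a, b)),
        _ => {}
    }
    run.clear();
}
```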
This isn't fancy or elegant, but provides a real speed-up on many
workloads. As an example, Kraken/imaging-desaturate.js improves by
~1.07x on my laptop.
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
An unconditional call to `copy_if_needed_to_preserve_evaluation_order`
in this place was showing up quite significantly in the JS benchmarks.
To avoid the regression, there is now a small heuristic that avoids the
unnecessary Mov instruction in the vast majority of cases. This is
likely not the best way to deal with this, but the changes in the
current patch set are focused on correctness, not performance, so I
opted for a localized, minimal-impact solution to the performance
regression.
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
`copy_if_needed_to_preserve_evaluation_order` was introduced in
c372a084a2. At that point, function arguments still needed to be copied
into registers with a special `GetArgument` instruction. Later, in
3f04d18ef7, this was changed and arguments were made their own operand
type that can be accessed directly instead.
Similar to locals, arguments can also be overwritten due to evaluation
order in various scenarios. However, the function was never updated to
account for that. Rectify that here.
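A hedged sketch of the fixed function's shape. The hazard: in `function f(x) { return x + (x = 2); }`, the left operand must be read before the assignment runs, so operands living in mutable slots (locals and, now, arguments) get copied to a fresh register:

```rust
#[derive(Clone, Copy)]
enum Operand {
    Register(u32),
    Local(u32),
    Argument(u32), // directly addressable since 3f04d18ef7
    Constant(u32),
}

fn copy_if_needed_to_preserve_evaluation_order(
    operand: Operand,
    allocate_register: impl FnOnce() -> u32,
    emit_mov: impl FnOnce(u32, Operand),
) -> Operand {
    match operand {
        // Both locals *and* arguments can be mutated by later sub-expressions.
        Operand::Local(_) | Operand::Argument(_) => {
            let register = allocate_register();
            emit_mov(register, operand);
            Operand::Register(register)
        }
        Operand::Register(_) | Operand::Constant(_) => operand,
    }
}
```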
With this change, https://volkswagen.de no longer gets blanked shortly
after initial load and the unhandled JS exception spam on that site is
gone too.
When the last new operand type was added, its implications for the
function changed in this commit were seemingly not properly considered,
introducing a bug. To avoid such errors in the future, rewrite the code
to produce a compile-time error whenever a new operand type is added.
No functional changes yet; the actual bugfix will be in a follow-up
commit.
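The pattern, with stand-in types: match exhaustively with no wildcard arm, so adding an operand type fails to compile until every consumer is revisited:

```rust
enum OperandType { Register, Local, Argument }

// Before: a wildcard arm silently absorbed newly added variants.
fn must_copy_old(operand: &OperandType) -> bool {
    match operand {
        OperandType::Local => true,
        _ => false, // the new Argument variant fell in here unnoticed
    }
}

// After: every variant is spelled out and there is no wildcard, so the next
// new operand type is a compile-time error at this match. Behavior is
// unchanged in this commit; the Argument arm is corrected in the follow-up.
fn must_copy_new(operand: &OperandType) -> bool {
    match operand {
        OperandType::Local => true,
        OperandType::Register | OperandType::Argument => false,
    }
}
```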
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.
A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.
The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*) so the fixup switch is
auto-generated by the Python Op code generator, making it impossible
to forget updating the fixup when adding new cached instructions.
This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
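A sketch of the mechanism with stand-in layout (the real instruction encoding differs). Note this relies on the cache vector's addresses staying stable after construction:

```rust
#[derive(Default)]
struct PropertyLookupCache { /* shape pointer, property offset, ... */ }

struct Executable {
    caches: Vec<PropertyLookupCache>,
    // Byte offsets in the bytecode where a u64 cache slot lives, plus the
    // cache index originally stored there.
    cache_slots: Vec<(usize, u32)>,
    bytecode: Vec<u8>,
}

impl Executable {
    // Runs once after construction; `caches` must not reallocate afterwards.
    fn fixup_cache_pointers(&mut self) {
        for &(offset, index) in &self.cache_slots {
            let pointer = &self.caches[index as usize] as *const PropertyLookupCache as u64;
            self.bytecode[offset..offset + 8].copy_from_slice(&pointer.to_le_bytes());
        }
    }
}
```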
Replace three identical error structs (ParserError, ScopeError,
ParsedError) with a single shared ParseError type. Since all three
had the same fields (message, line, column), having separate types
only added verbose field-by-field copying at each boundary.
Now errors flow directly from parser/scope collector into
ParsedProgram without conversion.
Add rust_compile_parsed_module() which takes a ParsedProgram (from
rust_parse_program with type=Module) and compiles it with GC
interaction. This extracts import/export metadata, compiles the
module body to bytecode, and extracts declaration data.
Rewrite rust_compile_module() to delegate to rust_parse_program()
followed by rust_compile_parsed_module() internally, matching the
rust_compile_script() pattern.
Add a ParsedProgram struct that holds the parsed AST, function table,
scope data, and strictness flag without any GC references. This
enables future off-thread parsing since the parse step makes zero
GC allocations.
The type is called ParsedProgram (not ParsedScript) because it will
be used for both scripts and modules. It takes a program_type
parameter (0 = Script, 1 = Module) to handle both cases.
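A hedged sketch of the shape described above; field types are stand-ins. The key property is that nothing in it touches the GC heap:

```rust
#[repr(u32)]
pub enum ProgramType {
    Script = 0,
    Module = 1,
}

pub struct ParsedProgram {
    pub program_type: ProgramType,
    pub strict: bool,
    pub ast: Ast,                  // plain-data AST, no GC cells
    pub functions: Vec<FunctionData>,
    pub scopes: ScopeData,
    pub errors: Vec<ParseError>,
}

// Stand-in types for the sketch:
pub struct Ast;
pub struct FunctionData;
pub struct ScopeData;
pub struct ParseError { pub message: String, pub line: u32, pub column: u32 }
```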
New FFI functions:
- rust_parse_program(): lex, parse, scope analysis (no VM/GC needed)
- rust_compile_parsed_script(): codegen + GDI extraction (needs VM)
- rust_parsed_program_has_errors(): check for parse errors
- rust_parsed_program_take_errors(): report errors via callback
- rust_parsed_program_ast_dump(): lazily generate AST dump string
- rust_free_parsed_program(): free without compiling
Rewrite rust_compile_script() to call rust_parse_program() followed
by rust_compile_parsed_script() internally, preserving the existing
behavior and API.
Move regex compilation out of the parsing hot path. Both the C++ and
Rust parsers now collect raw regex pattern+flags strings during parsing
and batch-compile them after parsing completes.
This is a prerequisite for moving the Rust parser to a background
thread, since LibRegex is thread-unsafe and FFI calls during parsing
prevent parallelization.
Flag validation remains in the parser since it's trivial string
checking with no LibRegex dependency.
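A sketch of the deferral with hypothetical names: the parser records only the raw strings, and compilation happens in one batch after parsing, so no LibRegex FFI call can occur on a future background parse thread:

```rust
#[derive(Default)]
struct Parser {
    pending_regexes: Vec<(String, String)>, // (pattern, flags)
}

impl Parser {
    fn on_regex_literal(&mut self, pattern: &str, flags: &str) -> Result<(), String> {
        // Flag validation is trivial string checking, so it stays here.
        // (Duplicate-flag detection omitted in this sketch.)
        for flag in flags.chars() {
            if !"dgimsuvy".contains(flag) {
                return Err(format!("Invalid RegExp flag '{flag}'"));
            }
        }
        self.pending_regexes.push((pattern.to_string(), flags.to_string()));
        Ok(())
    }

    fn compile_pending_regexes(&mut self, compile: impl Fn(&str, &str)) {
        for (pattern, flags) in self.pending_regexes.drain(..) {
            compile(&pattern, &flags);
        }
    }
}
```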
The Rust bytecode codegen was missing a TDZ check before assigning to
local let/const variables in simple assignment expressions (a = expr).
The C++ pipeline correctly emits ThrowIfTDZ before the store to ensure
temporal dead zone semantics are enforced.
Add an emit_tdz_check_if_needed helper matching the C++ equivalent,
and call it in the simple assignment path.
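A hedged sketch of the helper's shape. An example of why the check matters: `let a = (a = 1);` must throw a ReferenceError, because the assignment targets `a` while it is still in its temporal dead zone:

```rust
struct Generator;

impl Generator {
    // Called from the simple-assignment path for `a = expr` when `a` is a
    // local let/const; the signature here is illustrative.
    fn emit_tdz_check_if_needed(&mut self, local_index: u32, known_initialized: bool) {
        if !known_initialized {
            self.emit_throw_if_tdz(local_index); // ThrowIfTDZ <local>
        }
    }

    fn emit_throw_if_tdz(&mut self, _local_index: u32) {
        // ... append a ThrowIfTDZ instruction to the current block ...
    }
}
```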
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.
This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
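In type terms (kind names here are illustrative, not necessarily the engine's actual PutKind variants):

```rust
// Before: 5 kinds x 4 forms = 20 opcodes (PutById, PutByIdGetter, ...).
// After: 4 opcodes, each carrying the kind as an operand.
#[derive(Clone, Copy)]
enum PutKind {
    KeyValue,  // ordinary `obj.x = value`
    Getter,    // accessor definitions...
    Setter,
    Spread,    // spread into object literals
    DirectOwn, // define own property without invoking setters
}

struct PutById {
    base: u32,     // register holding the object
    property: u32, // identifier table index
    value: u32,    // register holding the value
    kind: PutKind, // dispatched on inside the single handler
}
```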
Blocks containing non-local using declarations need a lexical
environment, just like let/const declarations. Add the missing
UsingDeclaration case to match C++ behavior.
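A minimal sketch of the added case, with stand-in types (the real check also distinguishes local slots, which need no environment):

```rust
enum Declaration {
    Let,
    Const,
    Using,
    Var,
}

fn block_needs_lexical_environment(declarations: &[Declaration]) -> bool {
    declarations.iter().any(|declaration| {
        matches!(
            declaration,
            Declaration::Let | Declaration::Const | Declaration::Using
        )
    })
}
```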
Emit ResolveThisBinding before ResolveSuperBase in both
emit_evaluate_member_reference and emit_store_to_reference, matching
the C++ pipeline's evaluation order for super property references.
Also restructure emit_evaluate_member_reference to move non-super base
evaluation into the else branch, since the super path now handles
base evaluation differently (explicit ResolveSuperBase instead of
going through generate_expression on Super).
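A hedged sketch of the resulting emission order for `super.x` (instruction names from the commit, surrounding structure assumed):

```rust
struct Generator;

impl Generator {
    fn emit_resolve_this_binding(&mut self) { /* ResolveThisBinding */ }
    fn emit_resolve_super_base(&mut self) { /* ResolveSuperBase */ }
}

fn emit_super_property_reference(generator: &mut Generator) {
    // `this` is resolved first; the order is observable because
    // ResolveThisBinding can throw (e.g. before super() has run in a
    // derived constructor), so it must precede ResolveSuperBase.
    generator.emit_resolve_this_binding();
    generator.emit_resolve_super_base();
    // Non-super bases instead evaluate the base expression at this point
    // (the else branch mentioned above).
}
```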
Keep the arg_holders vector alive through the spread arguments loop,
matching the C++ pipeline where the args Vector keeps registers held
through the loop. This ensures consistent register allocation.