Commit Graph

78 Commits

Author SHA1 Message Date
Andreas Kling
bb0acb54ae LibJS: Optimize x >> 0 to ToInt32 in bytecode codegen
x >> 0 is a common JS idiom equivalent to ToInt32(x). We already had
this optimization for x | 0, now do it for right shift by zero as well.

This allows the asmint handler for ToInt32 to run instead of the more
expensive RightShift handler, which wastes time loading and checking the
rhs operand and performing a shift by zero.
2026-03-20 00:51:23 -05:00
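For reference, here is a minimal Rust sketch of the ToInt32 conversion that both `x | 0` and `x >> 0` compute. It starts from an f64, so it skips the ToNumber step that precedes ToInt32. It also includes a check that chooses ToInt32 when the shift amount is a literal 0. The `Rhs` and `ShiftOp` types are hypothetical and are not the actual LibJS codegen types.

```rust
/// ECMAScript ToInt32 for a Number value: what `x | 0` and `x >> 0` produce.
fn to_int32(x: f64) -> i32 {
    // NaN and +/-Infinity map to 0.
    if !x.is_finite() {
        return 0;
    }
    // Truncate toward zero, then reduce modulo 2^32 into [0, 2^32).
    let modulo = x.trunc().rem_euclid(4_294_967_296.0);
    // The value fits in u32; reinterpreting as i32 applies two's complement.
    modulo as u32 as i32
}

/// Hypothetical, simplified view of a shift's right-hand operand.
#[derive(Debug, PartialEq)]
enum Rhs {
    IntLiteral(i32),
    Other,
}

/// Hypothetical opcode choice for `lhs >> rhs`.
#[derive(Debug, PartialEq)]
enum ShiftOp {
    RightShift,
    ToInt32,
}

/// A shift by a literal 0 only converts the lhs, so pick the cheaper op.
fn select_right_shift_op(rhs: &Rhs) -> ShiftOp {
    match rhs {
        Rhs::IntLiteral(0) => ShiftOp::ToInt32,
        _ => ShiftOp::RightShift,
    }
}
```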
Andreas Kling
02b0746676 LibJS: Deduplicate double constants in bytecode generator
Add a deduplication cache for double constants, matching the existing
approach for int32 and string constants. Multiple references to the
same floating-point value now share a single constant table entry.
2026-03-20 00:51:23 -05:00
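Rust's f64 implements neither `Hash` nor `Eq`, so one common way to deduplicate doubles is to key a map on `f64::to_bits()`. The sketch below assumes that approach. The commit does not say which key the generator uses, and `ConstantTable` is a hypothetical name. Keying on the bit pattern keeps 0.0 and -0.0 distinct, which matters in JS because `1 / -0 === -Infinity`.

```rust
use std::collections::HashMap;

/// Hypothetical constant table that deduplicates f64 constants by bit pattern.
#[derive(Default)]
struct ConstantTable {
    constants: Vec<f64>,
    double_indices: HashMap<u64, usize>,
}

impl ConstantTable {
    /// Returns the table index for `value`, adding it only if it is new.
    fn add_double(&mut self, value: f64) -> usize {
        let key = value.to_bits(); // distinguishes 0.0 from -0.0
        if let Some(&index) = self.double_indices.get(&key) {
            return index;
        }
        let index = self.constants.len();
        self.constants.push(value);
        self.double_indices.insert(key, index);
        index
    }
}
```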
Andreas Kling
144ab69715 LibJS: Remove C++ pipeline compatibility hacks from Rust codegen
Now that the C++ bytecode pipeline has been removed, we no longer
need to match its register allocation or block layout. This removes:

- All manual drop() calls that existed solely to match C++ register
  lifetimes, replaced with scope blocks to naturally limit register
  lifetimes without increasing register pressure.

- The unnecessary saved_property copy in update expressions. The
  property register is now used directly since emit_update_op
  doesn't evaluate user expressions that could mutate it. The copy
  is retained in compound/logical assignments where the RHS can
  mutate the property variable (e.g. a[i] |= a[++i]).

- All "matching C++", "Match C++", etc. comments throughout
  codegen.rs and generator.rs that referenced the removed pipeline.
2026-03-20 00:51:23 -05:00
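The scope-block technique in the first bullet works when registers are RAII handles that are freed on `Drop`. The sketch below assumes that model; `Allocator`, `Register`, and `allocate` are hypothetical and do not represent the real codegen's register type. A register allocated inside a `{ ... }` block is released when the block ends, with no explicit `drop()` call.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical register allocator with a free list of released indices.
#[derive(Default)]
struct Allocator {
    next: u32,
    free: Vec<u32>,
}

/// RAII register handle: dropping it returns the index to the free list.
struct Register {
    index: u32,
    allocator: Rc<RefCell<Allocator>>,
}

impl Drop for Register {
    fn drop(&mut self) {
        self.allocator.borrow_mut().free.push(self.index);
    }
}

/// Reuses a freed register if one exists, otherwise allocates a new index.
fn allocate(allocator: &Rc<RefCell<Allocator>>) -> Register {
    let mut state = allocator.borrow_mut();
    let index = match state.free.pop() {
        Some(index) => index,
        None => {
            state.next += 1;
            state.next - 1
        }
    };
    Register { index, allocator: Rc::clone(allocator) }
}
```

In the usage below, the temporary register is reused after its scope closes: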
Andreas Kling
f5eea4d232 LibJS: Fix catch parameter and new.target regressions
- Restrict catch parameter conflict check to only direct children
  of the catch body block, not nested scopes
- Set new_target_is_valid for dynamic function compilation (new
  Function)
- Move check_parameters_post_body before flag restoration in
  parse_method_definition so generator methods inside static init
  blocks correctly allow 'await' as a parameter name
2026-03-19 23:15:03 -05:00
Andreas Kling
5374f0a85c LibJS: Add more early errors in Rust parser
- Reject duplicate bindings in catch parameter patterns
- Reject redeclaration of catch parameter with let/const/function
- Reject binding patterns with initializers in for-in heads (Annex B
  only permits a simple BindingIdentifier with an initializer)
- Reject 'await' as binding identifier in class static init blocks
  and module code
2026-03-19 23:15:03 -05:00
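The two catch-clause rules above correspond to ECMA-262's early errors for `Catch`. The BoundNames of the catch parameter must not contain duplicates, and none of them may appear among the LexicallyDeclaredNames of the catch block. The sketch below works on plain name lists, and `check_catch_clause` is a hypothetical helper. Passing only the block's direct declarations matches the later fix in f5eea4d232, so `catch (e) { { let e; } }` stays valid.

```rust
use std::collections::HashSet;

/// `bound_names`: names bound by the catch parameter pattern.
/// `block_lexical_names`: let/const/function names declared directly in the
/// catch block (not in nested blocks).
fn check_catch_clause(bound_names: &[&str], block_lexical_names: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &name in bound_names {
        // e.g. `catch ([a, a]) {}`
        if !seen.insert(name) {
            return Err(format!("Duplicate binding '{name}' in catch parameter"));
        }
    }
    for &name in block_lexical_names {
        // e.g. `catch (e) { let e; }`
        if seen.contains(&name) {
            return Err(format!("Redeclaration of catch parameter '{name}'"));
        }
    }
    Ok(())
}
```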
Andreas Kling
49cc44a3eb LibJS: Reject arguments/eval in strict mode destructuring and arrows
Check identifier name validity for destructuring assignment pattern
bound names, and validate arrow function parameters after the arrow
is confirmed rather than during speculative parameter parsing.

This fixes arguments/eval as destructuring assignment targets and as
arrow function parameter names in strict mode.
2026-03-19 23:15:03 -05:00
Andreas Kling
66dbb355fe LibJS: Reject new.target in arrow functions at global scope
Arrow functions don't have their own new.target binding -- they
inherit from the enclosing scope. At the global level, there is no
enclosing function, so new.target inside a global arrow is invalid.

Add a new_target_is_valid flag to ParserFlags that is set to true
when entering regular (non-arrow) function bodies, method
definitions, and class static init blocks. Arrow functions inherit
the flag from their enclosing scope rather than setting it.
2026-03-19 23:15:03 -05:00
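The inheritance rule described above is sketched below. `ParserFlags` is reduced to the one field named in the commit, and `FunctionKind` and `flags_for_body` are hypothetical. Regular functions, methods, and static init blocks turn the flag on, while arrows copy whatever the enclosing scope had.

```rust
/// Hypothetical subset of the parser flags.
#[derive(Clone, Copy, Default)]
struct ParserFlags {
    new_target_is_valid: bool,
}

#[derive(Clone, Copy)]
enum FunctionKind {
    Regular,
    Method,
    ClassStaticInit,
    Arrow,
}

/// Flags for a body of `kind` nested in a scope whose flags are `outer`.
fn flags_for_body(outer: ParserFlags, kind: FunctionKind) -> ParserFlags {
    let mut flags = outer;
    match kind {
        FunctionKind::Regular | FunctionKind::Method | FunctionKind::ClassStaticInit => {
            flags.new_target_is_valid = true;
        }
        // Arrows have no new.target binding of their own, so they inherit.
        FunctionKind::Arrow => {}
    }
    flags
}
```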
Andreas Kling
6029a3d40e LibJS: Add missing early errors in Rust parser
- Reject `true`, `false`, `null` as label identifiers
- Reject generator declarations in if-statement bodies (not covered
  by Annex B)
- Reject `await` as label in class static init blocks and modules
- Reject `arguments` in class static initialization blocks
- Reject generator shorthand without method body in object literals
- Reject `get constructor()` / `set constructor()` in class bodies
- Reject `super.#private` member access
2026-03-19 23:15:03 -05:00
Andreas Kling
30f108ba36 LibJS: Remove C++ lexer, use Rust tokenizer for syntax highlighting
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.

Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.

Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
2026-03-19 21:55:10 -05:00
Andrew Kaster
f06bd0303f LibJS: Use enum for retrieving well-known symbols from C++ to Rust 2026-03-19 09:48:32 +01:00
Andrew Kaster
5d43707896 LibJS: Directly use LiteralValueKind enum across FFI boundary 2026-03-19 09:48:32 +01:00
RubenKelevra
fae2f8f3ba LibJS: Align new-expression paren flags with C++ parser 2026-03-18 17:41:36 -05:00
RubenKelevra
3cb636ca38 LibJS: Keep new call-paren optional chaining valid 2026-03-18 17:41:36 -05:00
RubenKelevra
ea8fa63e79 LibJS: Reject optional chaining on unparenthesized new 2026-03-18 17:41:36 -05:00
RubenKelevra
04b27429de LibJS: Isolate super validity in nested function scopes 2026-03-18 17:41:36 -05:00
RubenKelevra
d8469c384d LibJS: Reject invalid bare private identifier usage 2026-03-18 17:41:36 -05:00
RubenKelevra
d6229a1cc8 LibJS: Fix async arrow and for-of async parsing 2026-03-18 17:41:36 -05:00
RubenKelevra
af777b5d86 LibJS: Align duplicate parameter early errors 2026-03-18 17:41:36 -05:00
RubenKelevra
40984d0f39 LibJS: Enforce const initializers in declarations 2026-03-18 17:41:36 -05:00
Andrew Kaster
92e4c20ad5 LibJS: Generate FFI header using cbindgen instead of hand-rolling
Replace the BytecodeFactory header with cbindgen.

This will help keep types, enums, and constants in sync between the
C++ and Rust code. It's also a step toward exporting more Rust enums
directly rather than relying on magic constants for switch statements.

The FFI functions are now all placed in the JS::FFI namespace, which
is the cause for all the churn in the scripting parts of LibJS and
LibWeb.
2026-03-17 20:49:50 -05:00
Andrew Kaster
ba8a71152c LibJS: Wrap ParseErrorCallback typedef in Option
The automatic conversion of `Option<function pointer>` to a C function
pointer is only supported by cbindgen if the option is in the typedef
itself, rather than an argument type. This no-op commit will make a
future cbindgen update happy.
2026-03-17 20:49:50 -05:00
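The sketch below shows the typedef shape this commit describes, with the `Option` inside the type alias. Rust guarantees that `Option<extern "C" fn ...>` has the same size as a bare function pointer, and `None` is represented as NULL, so C sees a nullable function pointer. The parameter list (message, line, column) and `report_error` are hypothetical because the commit does not show the real signature.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical callback signature; the Option is part of the alias itself.
pub type ParseErrorCallback =
    Option<unsafe extern "C" fn(message: *const c_char, line: u32, column: u32)>;

/// Calls the callback if C passed a non-null pointer; returns whether it did.
pub fn report_error(callback: ParseErrorCallback, message: &CStr, line: u32, column: u32) -> bool {
    match callback {
        Some(callback) => {
            // SAFETY: the C caller promises a valid function pointer.
            unsafe { callback(message.as_ptr(), line, column) };
            true
        }
        None => false,
    }
}
```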
juhotuho10
2a42ef36db LibJS/Rust: Apply lint clippy::explicit_iter_loop 2026-03-16 13:55:16 -05:00
juhotuho10
012c7d077b LibJS/Rust: Apply lint clippy::borrow_as_ptr 2026-03-16 13:55:16 -05:00
juhotuho10
8deedd4e67 LibJS/Rust: Apply lint clippy::unnested_or_patterns 2026-03-16 13:55:16 -05:00
juhotuho10
21d1cfdd63 LibJS/Rust: Apply lint clippy::redundant_clone 2026-03-16 13:55:16 -05:00
juhotuho10
dfa7993780 LibJS/Rust: Apply lint clippy::elidable_lifetime_names 2026-03-16 13:55:16 -05:00
juhotuho10
277177484e LibJS/Rust: Apply lint clippy::ref_option 2026-03-16 13:55:16 -05:00
juhotuho10
59251cc371 LibJS/Rust: Apply lint clippy::useless_let_if_seq 2026-03-16 13:55:16 -05:00
juhotuho10
57c2e40f0e LibJS/Rust: Apply lint clippy::unnecessary_wraps 2026-03-16 13:55:16 -05:00
juhotuho10
9ed1700558 LibJS/Rust: Apply lint clippy::manual_let_else 2026-03-16 13:55:16 -05:00
juhotuho10
3342e2c125 LibJS/Rust: Apply lint clippy::semicolon_if_nothing_returned 2026-03-16 13:55:16 -05:00
juhotuho10
671306d260 LibJS/Rust: Apply lint clippy::uninlined_format_args 2026-03-16 13:55:16 -05:00
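To illustrate what three of these lints change, here are generic before/after pairs. They are not taken from the LibJS sources. In each pair the two functions are equivalent, and the `_after` form is the one clippy suggests.

```rust
// clippy::explicit_iter_loop: loop over the slice instead of calling .iter().
fn sum_before(values: &[i32]) -> i32 {
    let mut sum = 0;
    for value in values.iter() {
        sum += value;
    }
    sum
}

fn sum_after(values: &[i32]) -> i32 {
    let mut sum = 0;
    for value in values {
        sum += value;
    }
    sum
}

// clippy::manual_let_else: a match that only unwraps or returns becomes let-else.
fn first_even_before(values: &[i32]) -> i32 {
    let found = match values.iter().find(|v| **v % 2 == 0) {
        Some(v) => *v,
        None => return -1,
    };
    found
}

fn first_even_after(values: &[i32]) -> i32 {
    let Some(&found) = values.iter().find(|v| **v % 2 == 0) else {
        return -1;
    };
    found
}

// clippy::uninlined_format_args: name the variable inside the braces.
fn label_before(name: &str) -> String {
    format!("label: {}", name)
}

fn label_after(name: &str) -> String {
    format!("label: {name}")
}
```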
Johan Dahlin
9454aa3796 LibJS: Replace FunctionTable sparse Vec with HashMap
Switch from a sparse Vec<Option<Box<FunctionData>>> keyed by numeric
FunctionId to FxHashMap<FunctionId, Box<FunctionData>>. The previous
representation allocated slots for every ID between 0 and max, wasting
memory when IDs are sparse (common in large bundles with many modules).
2026-03-15 10:51:47 -05:00
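The sketch below shows the map-based representation. To stay dependency-free it uses std's `HashMap` instead of the `FxHashMap` the commit mentions, which comes from the `rustc-hash` crate. `FunctionId`, `FunctionData`, and the methods are stand-ins for types the log does not show. Memory now scales with the number of inserted functions rather than with the largest ID.

```rust
use std::collections::HashMap;

/// Stand-ins for the real types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct FunctionId(u32);

struct FunctionData {
    name: String,
}

/// Sparse table: only IDs that were actually inserted occupy memory.
#[derive(Default)]
struct FunctionTable {
    functions: HashMap<FunctionId, Box<FunctionData>>,
}

impl FunctionTable {
    fn insert(&mut self, id: FunctionId, data: FunctionData) {
        self.functions.insert(id, Box::new(data));
    }

    fn get(&self, id: FunctionId) -> Option<&FunctionData> {
        self.functions.get(&id).map(|boxed| &**boxed)
    }

    /// Removes and returns the entry, e.g. when codegen consumes it.
    fn take(&mut self, id: FunctionId) -> Option<Box<FunctionData>> {
        self.functions.remove(&id)
    }
}
```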
Andreas Kling
c55418bc60 LibJS: Fix AssignmentTargetType for NewExpression and strict mode
Fix the static semantics for AssignmentTargetType in both the C++ and
Rust parsers:

- NewExpression is never a valid assignment target. Previously, the C++
  parser's is<CallExpression> check matched NewExpression since it
  inherits from CallExpression in our AST. Add explicit
  !is<NewExpression> guards everywhere.

- CallExpression as assignment target is only valid in non-strict mode
  (web-compat "runtime error for function call assignment targets").
  Pass strict_mode to is_simple_assignment_target and reject call
  expressions in strict mode for assignment, compound assignment,
  prefix/postfix update, and for-in/of LHS positions.

- Parenthesized ObjectLiteral/ArrayLiteral (e.g. `({}) = 1`) must not
  be treated as destructuring patterns. Track whether the primary
  expression was parenthesized and skip binding pattern synthesis.

Update existing tests that were testing incorrect behavior:
- `'use strict'; foo() = 'foo'` is now correctly a SyntaxError
- for-in/of with call expression LHS: use toEval() instead of
  toEvalTo() (which runs eval inside a class method, i.e. strict mode)
2026-03-13 13:08:22 -05:00
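The three rules above can be condensed into a couple of predicates. The `Expr` enum below is a hypothetical, heavily simplified node type. The sketch also ignores other strict-mode rules, such as rejecting `eval` and `arguments` as targets.

```rust
/// Hypothetical, heavily simplified expression node.
enum Expr {
    Identifier,
    Member,
    Call,
    New,
    ObjectLiteral { parenthesized: bool },
}

fn is_simple_assignment_target(expr: &Expr, strict_mode: bool) -> bool {
    match expr {
        Expr::Identifier | Expr::Member => true,
        // Web-compat: `f() = 1` parses in sloppy mode and throws at runtime.
        Expr::Call => !strict_mode,
        // Never a target, even if the AST models `new` as a kind of call.
        Expr::New => false,
        // Literals are only valid as destructuring patterns (see below).
        Expr::ObjectLiteral { .. } => false,
    }
}

/// `({}) = 1` must not be reinterpreted as a destructuring pattern.
fn can_be_destructuring_pattern(expr: &Expr) -> bool {
    matches!(expr, Expr::ObjectLiteral { parenthesized: false })
}
```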
Andreas Kling
31606fddd3 LibJS: Add Mov2/Mov3 instructions to reduce dispatch overhead
Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register
moves in a single dispatch. A peephole optimization pass during
bytecode assembly merges consecutive Mov instructions within each
basic block into these combined instructions.

When merging, identical Movs are deduplicated (e.g. two identical Movs
become a single Mov, not a Mov2). This optimization is implemented in
both the C++ and Rust codegen pipelines.

The goal is to reduce the per-instruction dispatch overhead, which is
significant compared to the actual cost of moving a value.

This isn't fancy or elegant, but provides a real speed-up on many
workloads. As an example, Kraken/imaging-desaturate.js improves by
~1.07x on my laptop.
2026-03-11 17:04:32 +01:00
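The sketch below shows one way to implement the merge over a single basic block. It assumes Mov2 and Mov3 perform their moves in order, so grouping consecutive Movs preserves semantics. The `Insn` type is hypothetical. For safety, it only drops a Mov that is identical to the one directly before it in the current group.

```rust
/// Hypothetical instruction set; registers are plain u32 indices.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Insn {
    Mov { dst: u32, src: u32 },
    Mov2 { dst1: u32, src1: u32, dst2: u32, src2: u32 },
    Mov3 { dst1: u32, src1: u32, dst2: u32, src2: u32, dst3: u32, src3: u32 },
    Other,
}

/// Emits the pending moves as a single Mov, Mov2, or Mov3.
fn flush(pending: &mut Vec<(u32, u32)>, out: &mut Vec<Insn>) {
    match pending.as_slice() {
        [] => {}
        [(d, s)] => out.push(Insn::Mov { dst: *d, src: *s }),
        [(d1, s1), (d2, s2)] => out.push(Insn::Mov2 { dst1: *d1, src1: *s1, dst2: *d2, src2: *s2 }),
        [(d1, s1), (d2, s2), (d3, s3)] => out.push(Insn::Mov3 {
            dst1: *d1, src1: *s1, dst2: *d2, src2: *s2, dst3: *d3, src3: *s3,
        }),
        _ => unreachable!("groups never exceed three moves"),
    }
    pending.clear();
}

/// Merges runs of consecutive Movs in one basic block into Mov2/Mov3.
fn merge_movs(block: &[Insn]) -> Vec<Insn> {
    let mut out = Vec::new();
    let mut pending: Vec<(u32, u32)> = Vec::new();
    for insn in block {
        match *insn {
            Insn::Mov { dst, src } => {
                // A back-to-back identical Mov is redundant.
                if pending.last() == Some(&(dst, src)) {
                    continue;
                }
                pending.push((dst, src));
                if pending.len() == 3 {
                    flush(&mut pending, &mut out);
                }
            }
            other => {
                flush(&mut pending, &mut out);
                out.push(other);
            }
        }
    }
    flush(&mut pending, &mut out);
    out
}
```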
InvalidUsernameException
133bbeb4ec LibJS: Copy LHS of binary expression to preserve evaluation order
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.

An unconditional call to `copy_if_needed_to_preserve_evaluation_order`
in this place was showing up quite significantly in the JS benchmarks.
To avoid the regression, there is now a small heuristic that avoids the
unnecessary Mov instruction in the vast majority of cases. This is
likely not the best way to deal with this. But the changes in the
current patch set are focused on correctness, not performance. So I
opted for a localized, minimal-impact solution to the performance
regression.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
4cd1fc8019 LibJS: Copy LHS of compound assignment to preserve evaluation order
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
ced435987c LibJS: Copy keys in object expression to preserve evaluation order
This error was found by asking an LLM to generate additional, related
test cases for the bug affecting https://volkswagen.de fixed in an
earlier commit.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
bb762fb43b LibJS: Do not assume arguments cannot be clobbered
`copy_if_needed_to_preserve_evaluation_order` was introduced in
c372a084a2. At that point, function arguments still needed to be
copied into registers with a special `GetArgument` instruction. Later,
in 3f04d18ef7, this was changed and arguments were made their own
operand type that can be accessed directly instead.

Similar to locals, arguments can also be overwritten due to evaluation
order in various scenarios. However, the function was never updated to
account for that. Rectify that here.

With this change, https://volkswagen.de no longer gets blanked shortly
after initial load and the unhandled JS exception spam on that site is
gone too.
2026-03-08 15:01:07 +01:00
InvalidUsernameException
34b7cb6e55 LibJS: Explicitly handle all operand types when determining clobbering
When the last new operand type was added, its effect on the function
changed in this commit was apparently not considered, which introduced
a bug. To avoid such errors in the future, rewrite the code so that
adding a new operand type produces a compile-time error.

No functional changes yet; the actual bugfix will be in a follow-up
commit.
2026-03-08 15:01:07 +01:00
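In Rust, the compile-time guard described in 34b7cb6e55 falls out of an exhaustive `match` with no wildcard arm. This sketch also includes the Argument case that bb762fb43b adds. The `Operand` variants are hypothetical, since the log does not list the real operand types. An example of clobbering is `function f(a) { return a + (a = 1); }`: `f(5)` must return 6, so the original value of `a` has to be copied first.

```rust
/// Hypothetical operand kinds.
#[derive(Clone, Copy)]
enum Operand {
    Register(u32),
    Local(u32),
    Argument(u32),
    Constant(u32),
}

/// Whether a later subexpression could overwrite this operand, in which
/// case codegen must copy it first. There is deliberately no `_ =>` arm:
/// adding a variant makes this match non-exhaustive, a compile error.
fn may_be_clobbered(operand: Operand) -> bool {
    match operand {
        // Assumed: temporaries are not assignable by user code.
        Operand::Register(_) => false,
        // Named bindings can be assigned by a later subexpression.
        Operand::Local(_) => true,
        Operand::Argument(_) => true,
        Operand::Constant(_) => false,
    }
}
```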
Andreas Kling
54a1a66112 LibJS: Store cache pointers directly in bytecode instructions
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.

A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.

The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*) so the fixup switch is
auto-generated by the Python Op code generator, making it impossible
to forget updating the fixup when adding new cached instructions.

This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
2026-03-08 10:27:13 +01:00
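The sketch below illustrates the index-to-pointer fixup in Rust. It assumes each cached instruction has one 64-bit operand slot. Before the fixup, the slot holds an index into the cache array, and afterwards it holds that cache's address. `fixup_cache_pointers` here is a hypothetical free function, not `Executable::fixup_cache_pointers()` itself. The cache storage must not move afterwards, for example through Vec reallocation, or the stored pointers would dangle.

```rust
/// Hypothetical inline cache entry.
#[derive(Default, Debug)]
struct PropertyLookupCache {
    hits: u32,
}

/// Replaces each cache index in `cache_slots` with the cache's address.
fn fixup_cache_pointers(cache_slots: &mut [u64], caches: &mut [PropertyLookupCache]) {
    let base = caches.as_mut_ptr();
    for slot in cache_slots {
        let index = *slot as usize;
        assert!(index < caches.len(), "cache index out of range");
        // SAFETY: `index` was bounds-checked above.
        *slot = unsafe { base.add(index) } as u64;
    }
}
```

At runtime, one load of the slot yields the cache directly, as in this usage: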
Andreas Kling
c31c52b0a9 LibJS: Unify parser and scope collector error types
Replace three identical error structs (ParserError, ScopeError,
ParsedError) with a single shared ParseError type. Since all three
had the same fields (message, line, column), having separate types
only added verbose field-by-field copying at each boundary.

Now errors flow directly from parser/scope collector into
ParsedProgram without conversion.
2026-03-06 13:06:05 +01:00
Andreas Kling
7d45e897c4 LibJS: Add rust_compile_parsed_module() for pre-parsed modules
Add rust_compile_parsed_module() which takes a ParsedProgram (from
rust_parse_program with type=Module) and compiles it with GC
interaction. This extracts import/export metadata, compiles the
module body to bytecode, and extracts declaration data.

Rewrite rust_compile_module() to delegate to rust_parse_program()
followed by rust_compile_parsed_module() internally, matching the
rust_compile_script() pattern.
2026-03-06 13:06:05 +01:00
Andreas Kling
2082e063aa LibJS: Split rust_compile_script() into parse and compile steps
Add a ParsedProgram struct that holds the parsed AST, function table,
scope data, and strictness flag without any GC references. This
enables future off-thread parsing since the parse step makes zero
GC allocations.

The type is called ParsedProgram (not ParsedScript) because it will
be used for both scripts and modules. It takes a program_type
parameter (0 = Script, 1 = Module) to handle both cases.

New FFI functions:
- rust_parse_program(): lex, parse, scope analysis (no VM/GC needed)
- rust_compile_parsed_script(): codegen + GDI extraction (needs VM)
- rust_parsed_program_has_errors(): check for parse errors
- rust_parsed_program_take_errors(): report errors via callback
- rust_parsed_program_ast_dump(): lazily generate AST dump string
- rust_free_parsed_program(): free without compiling

Rewrite rust_compile_script() to call rust_parse_program() followed
by rust_compile_parsed_script() internally, preserving the existing
behavior and API.
2026-03-06 13:06:05 +01:00
Andreas Kling
b2b72a1884 LibJS: Defer regex literal compilation to post-parse step
Move regex compilation out of the parsing hot path. Both the C++ and
Rust parsers now collect raw regex pattern+flags strings during parsing
and batch-compile them after parsing completes.

This is a prerequisite for moving the Rust parser to a background
thread, since LibRegex is thread-unsafe and FFI calls during parsing
prevent parallelization.

Flag validation remains in the parser since it's trivial string
checking with no LibRegex dependency.
2026-03-06 13:06:05 +01:00
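The sketch below covers the two halves of this split. The parser keeps a cheap flag check, which follows the ECMA-262 rules: only d, g, i, m, s, u, v, y, each at most once, with u and v mutually exclusive. Compilation is deferred to a batch step afterwards. `PendingRegex` and `compile_all` are hypothetical, and the `compile` closure stands in for the thread-unsafe regex engine.

```rust
/// Flag validation that can run during parsing without the regex engine.
fn validate_regex_flags(flags: &str) -> Result<(), String> {
    let mut seen = [false; 8];
    for c in flags.chars() {
        let slot = match "dgimsuvy".find(c) {
            Some(slot) => slot,
            None => return Err(format!("Invalid regex flag '{c}'")),
        };
        if seen[slot] {
            return Err(format!("Repeated regex flag '{c}'"));
        }
        seen[slot] = true;
    }
    if flags.contains('u') && flags.contains('v') {
        return Err("Regex flags 'u' and 'v' cannot be combined".to_string());
    }
    Ok(())
}

/// Raw pattern and flags recorded during parsing.
struct PendingRegex {
    pattern: String,
    flags: String,
}

/// Post-parse step: compile every recorded literal, stopping at the first error.
fn compile_all<T, E>(
    pending: Vec<PendingRegex>,
    mut compile: impl FnMut(&str, &str) -> Result<T, E>,
) -> Result<Vec<T>, E> {
    pending.iter().map(|r| compile(&r.pattern, &r.flags)).collect()
}
```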
Andreas Kling
ac35ef465b LibJS: Emit ThrowIfTDZ before simple assignment to let variables
The Rust bytecode codegen was missing a TDZ check before assigning to
local let/const variables in simple assignment expressions (a = expr).
The C++ pipeline correctly emits ThrowIfTDZ before the store to ensure
temporal dead zone semantics are enforced.

Add an emit_tdz_check_if_needed helper matching the C++ equivalent,
and call it in the simple assignment path.
2026-03-04 18:53:12 +01:00
Andreas Kling
56e09695e0 LibJS: Consolidate Put bytecode instructions and reduce code bloat
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.

This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
2026-03-04 18:53:12 +01:00
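The sketch below shows the shape of the consolidation. Each form gets one opcode that carries a `PutKind` field, instead of one opcode per (form, kind) pair. The shared put logic is therefore compiled once and branches at runtime, rather than being instantiated five times. The five kind names are assumed to mirror LibJS's PutKind enum, and the per-kind strings are placeholders for the real behavior.

```rust
/// Assumed names for the five put kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PutKind {
    Normal,
    Getter,
    Setter,
    Prototype,
    Own,
}

/// 4 opcodes, each carrying a kind, instead of 5 x 4 = 20 opcodes.
#[derive(Clone, Copy, Debug)]
enum Instruction {
    PutById { kind: PutKind },
    PutByIdWithThis { kind: PutKind },
    PutByValue { kind: PutKind },
    PutByValueWithThis { kind: PutKind },
}

/// Shared slow path: one copy, branching on `kind` at runtime.
fn put_by_property_key(kind: PutKind, key: &str) -> String {
    match kind {
        PutKind::Normal => format!("set {key}"),
        PutKind::Getter => format!("define getter {key}"),
        PutKind::Setter => format!("define setter {key}"),
        PutKind::Prototype => "set prototype".to_string(),
        PutKind::Own => format!("define own property {key}"),
    }
}

/// In a real interpreter the ById and ByValue forms differ in how the key
/// is obtained; here the key is simply passed in.
fn execute(instruction: Instruction, key: &str) -> String {
    match instruction {
        Instruction::PutById { kind }
        | Instruction::PutByIdWithThis { kind }
        | Instruction::PutByValue { kind }
        | Instruction::PutByValueWithThis { kind } => put_by_property_key(kind, key),
    }
}
```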
Andreas Kling
fb61294df7 LibJS: Add UsingDeclaration to needs_block_declaration_instantiation
Blocks containing non-local using declarations need a lexical
environment, just like let/const declarations. Add the missing
UsingDeclaration case to match C++ behavior.
2026-03-04 12:17:59 +01:00
Andreas Kling
bd7fc2b1b1 LibJS: Fix ResolveThisBinding/ResolveSuperBase emission order
Emit ResolveThisBinding before ResolveSuperBase in both
emit_evaluate_member_reference and emit_store_to_reference, matching
the C++ pipeline's evaluation order for super property references.

Also restructure emit_evaluate_member_reference to move non-super base
evaluation into the else branch, since the super path now handles
base evaluation differently (explicit ResolveSuperBase instead of
going through generate_expression on Super).
2026-03-04 12:17:59 +01:00
Andreas Kling
4120765497 LibJS: Keep arg_holders alive in generate_arguments_array
Keep the arg_holders vector alive through the spread arguments loop,
matching the C++ pipeline where the args Vector keeps registers held
through the loop. This ensures consistent register allocation.
2026-03-04 12:17:59 +01:00