2346 Commits

Author SHA1 Message Date
Andreas Kling
34d954e2d7 LibRegex: Add ECMAScriptRegex and migrate callers
Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes.

The facade owns compilation, execution, captures, named groups, and
error translation for the Rust backend, which lets callers stop
depending on the legacy parser and matcher types directly. Use it in the
remaining non-LibJS callers: URLPattern, HTML input pattern handling,
and the places in LibHTTP that only needed token validation.

Where a full regex engine was unnecessary, replace those call sites with
direct character checks. Also update focused LibURL, LibHTTP, and WPT
coverage for the migrated callers and corrected surrogate handling.
2026-03-27 17:32:19 +01:00
Andreas Kling
66fb0a8394 LibRegex/Rust: Add the ECMA-262 regex engine
Add LibRegex's new Rust ECMAScript regular expression engine.

Replace the old parser's direct pattern-to-bytecode pipeline with a
split architecture: parse patterns into a lossless AST first, then
lower that AST into bytecode for a dedicated backtracking VM. Keep the
syntax tree as the place for validation, analysis, and optimization
instead of teaching every transformation to rewrite partially built
bytecode.

Specialize this backend for the job LibJS actually needs. The old C++
engine shared one generic parser and matcher stack across ECMA-262 and
POSIX modes and supported both byte-string and UTF-16 inputs. The new
engine focuses on ECMA-262 semantics on WTF-16 data, which lets it
model lone surrogates and other JavaScript-specific behavior directly
instead of carrying POSIX and multi-encoding constraints through the
whole implementation.

Fill in the ECMAScript features needed to replace the old engine for
real web workloads: Unicode properties and sets, lookahead and
lookbehind, named groups and backreferences, modifier groups, string
properties, large quantifiers, lone surrogates, and the parser and VM
corner cases those features exercise.

Reshape the runtime around compile-time pattern hints and a hotter VM
loop. Pre-resolve Unicode properties, derive first-character,
character-class, and simple-scan filters, extract safe trailing
literals for anchored patterns, add literal and literal-alternation
fast paths, and keep reusable scratch storage for registers,
backtracking state, and modifier stacks. Teach `find_all` to stay
inside one VM so global searches stop paying setup costs on every
match.

Make those shortcuts semantics-aware instead of merely fast. In Unicode
mode, do not use literal fast paths for lone surrogates, since
ECMA-262 must not let `/\ud83d/u` match inside a surrogate pair.
Likewise, only derive end-anchor suffix hints when the suffix lies on
every path to `Match`, so lookarounds and disjunctions cannot skip into
a shared tail and produce false negatives.

This commit lands the Rust crate, the C++ wrapper, the build
integration, and the initial LibJS-side plumbing needed to exercise
the new engine under real RegExp callers before removing the legacy
backend.
2026-03-27 17:32:19 +01:00
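The lone-surrogate rule described in 66fb0a8394 can be illustrated with a small sketch. This is not LibRegex code: `find_lone_unit` and the surrogate helpers are hypothetical names, and the function only models why a plain code-unit search is not enough for `/\ud83d/u`. A surrogate code unit counts as a match only when it is not half of a valid pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration; not part of LibRegex.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Returns the index of the first occurrence of `unit` that forms a code point
// on its own, or npos. Occurrences inside a surrogate pair are skipped.
inline size_t find_lone_unit(std::u16string_view input, char16_t unit)
{
    for (size_t i = 0; i < input.size(); ++i) {
        if (input[i] != unit)
            continue;
        bool starts_pair = is_high_surrogate(unit) && i + 1 < input.size() && is_low_surrogate(input[i + 1]);
        bool ends_pair = is_low_surrogate(unit) && i > 0 && is_high_surrogate(input[i - 1]);
        if (!starts_pair && !ends_pair)
            return i;
    }
    return std::u16string_view::npos;
}

inline void check_lone_surrogate_search()
{
    char16_t const pair[] = { 0xD83D, 0xDE00 };     // U+1F600 encoded as a surrogate pair
    char16_t const lone[] = { u'a', 0xD83D, u'b' }; // high surrogate standing alone
    assert(find_lone_unit(std::u16string_view(pair, 2), 0xD83D) == std::u16string_view::npos);
    assert(find_lone_unit(std::u16string_view(lone, 3), 0xD83D) == 1);
}
```

A naive literal scan would report index 0 for the first input, which is the false positive the commit message says the Unicode-mode fast path must avoid.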
kalenikaliaksandr
0ecb02363d LibJS: Re-enable AddressSanitizer for LibJS on Windows
The ASAN disable was added as a workaround for a stack-overflow crash
during test-js. The root cause was a Variant layout bug due to missing
empty base optimization on the MSVC ABI, which has now been fixed.
2026-03-25 00:57:49 +01:00
Andreas Kling
e49ab8a1f2 LibJS: Make prototype shape tracking per-Realm instead of global
The set of all prototype shapes was a process-global static, which
meant that Shape::invalidate_all_prototype_chains_leading_to_this()
had to iterate over every prototype shape from every Realm in the
process.

This was catastrophic for pages that load many SVG-as-img resources,
since each SVG image creates its own Realm with a full set of JS
intrinsics and web prototypes. With N SVG images, each adding ~100
properties to their ObjectPrototype, this became O(N * 100 * M)
where M is the total number of prototype shapes across all Realms.

Since prototype chains never cross Realm boundaries, we can scope
the tracking to each Realm, making the invalidation cost independent
of the number of Realms in the process.
2026-03-24 08:28:47 +01:00
Andreas Kling
72fdf10248 LibJS: Don't cache dictionary shapes in NewObject premade shape cache
Dictionary shapes are mutable (properties added/removed in-place via
add_property_without_transition), so sharing them between objects via
the NewObject premade shape cache is unsafe.

When a large object literal (>64 properties) is created repeatedly in
a loop, the first execution transitions to a dictionary shape, which
CacheObjectShape then caches. Subsequent iterations create new objects
all pointing to the same dictionary shape. If any of these objects adds
a new property, it mutates the shared shape in-place, increasing its
property_count, but only grows its own named property storage. Other
objects sharing the shape are left with undersized storage, leading to
a heap-buffer-overflow when the GC visits their edges.

Fix this by not caching dictionary shapes. This means object literals
with >64 properties won't get the premade-shape fast path, but such
literals are uncommon.
2026-03-22 11:10:01 -05:00
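The failure mode in 72fdf10248 can be sketched with heavily simplified stand-ins. `Shape`, `Object`, and `may_cache_shape` below are hypothetical and much smaller than the real LibJS types; the point is only that an in-place mutation of a shared shape leaves the other objects' storage too small.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for LibJS's Shape and Object.
struct Shape {
    bool is_dictionary { false };
    size_t property_count { 0 };
};

struct Object {
    std::shared_ptr<Shape> shape;
    std::vector<int> storage; // one slot per named property

    // Models the in-place mutation: the shared shape grows,
    // but only this object's storage grows with it.
    void add_property_in_place(int value)
    {
        ++shape->property_count;
        storage.push_back(value);
    }

    bool storage_matches_shape() const { return storage.size() == shape->property_count; }
};

// The fix in sketch form: a dictionary shape is never eligible for caching.
inline bool may_cache_shape(Shape const& shape) { return !shape.is_dictionary; }

inline void check_shared_dictionary_shape()
{
    auto shared = std::make_shared<Shape>();
    shared->is_dictionary = true;
    shared->property_count = 65;

    Object a { shared, std::vector<int>(65) };
    Object b { shared, std::vector<int>(65) };
    a.add_property_in_place(1);

    assert(a.storage_matches_shape());
    assert(!b.storage_matches_shape()); // b now has 65 slots for 66 properties
    assert(!may_cache_shape(*shared));
}
```

In the real engine the mismatch surfaces when the GC walks `b`'s properties; here it is only detected by comparing sizes.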
InvalidUsernameException
61e6dbe4e7 LibJS: Copy object of member expression to preserve evaluation order
Noticed this pattern when reading some minified JS while debugging a
seemingly unrelated problem and immediately got suspicious because of my
earlier, similar fixes.
2026-03-22 15:40:38 +01:00
InvalidUsernameException
a94d8a1f78 LibJS: Remove some dead code
Looks like the combined parse and compile code path wasn't used
(anymore).
2026-03-22 15:40:38 +01:00
Andreas Kling
cf7576728d LibJS: Collect hoisted var declarations when checking module exports
The check_undeclared_exports() function only looked at top-level
VariableDeclaration nodes when collecting declared names. This missed
var declarations nested inside for loops, if/else blocks, try/catch,
switch statements, etc. Since var declarations are hoisted to the
enclosing module scope, they are valid export targets.

This caused legitimate modules (like the claude.ai JS bundle) to fail
with "'name' in export is not declared" errors when a var was declared
inside a for loop and later exported.

Fix this by recursively walking nested statements to collect var
declarations, while correctly not crossing function boundaries (since
var does not hoist out of functions) and not collecting block-scoped
let/const declarations.
2026-03-22 09:24:44 -05:00
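The recursive walk described in cf7576728d might look roughly like the sketch below. The `Node` type and its `Kind` values are invented for illustration and do not correspond to the actual parser's AST; the sketch only shows the two rules from the message: descend into nested statements, but stop at function boundaries and ignore block-scoped declarations.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical miniature AST, not the real parser's node types.
enum class Kind { Block, For, Function, VarDecl, LetDecl };

struct Node {
    Kind kind;
    std::string name; // only meaningful for VarDecl and LetDecl
    std::vector<Node> children;
};

// Collects names of var declarations that hoist to the enclosing scope.
inline void collect_hoisted_vars(Node const& node, std::set<std::string>& names)
{
    switch (node.kind) {
    case Kind::Function:
        return; // var does not hoist out of functions
    case Kind::LetDecl:
        return; // block-scoped, not hoisted
    case Kind::VarDecl:
        names.insert(node.name);
        break;
    case Kind::Block:
    case Kind::For:
        break;
    }
    for (auto const& child : node.children)
        collect_hoisted_vars(child, names);
}

inline void check_hoisted_var_collection()
{
    Node module { Kind::Block, "", {
        Node { Kind::For, "", { Node { Kind::VarDecl, "i", {} }, Node { Kind::LetDecl, "tmp", {} } } },
        Node { Kind::Function, "", { Node { Kind::VarDecl, "inner", {} } } },
    } };
    std::set<std::string> names;
    collect_hoisted_vars(module, names);
    assert(names.size() == 1);
    assert(names.count("i") == 1);
}
```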
pwespi
3dc3bcb556 LibJS: Fix expected SyntaxErrors for private fields
2026-03-20 16:06:51 -05:00
Ollie Hensman-Crook
df8ead1f12 LibJS: Treat concise methods as non-constructors
2026-03-20 15:58:05 -05:00
Johan Dahlin
1179e40d3f LibJS: Eliminate GeneratorResult GC cell allocation on yield/await
Store yield_continuation and yield_is_await directly in
ExecutionContext instead of allocating a GeneratorResult GC cell.
This removes a heap allocation per yield/await and fixes a latent
bug where continuation addresses stored as doubles could lose
precision.
2026-03-20 15:57:23 -05:00
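The precision concern mentioned in 1179e40d3f follows from the 53-bit significand of an IEEE 754 double: not every 64-bit integer survives a round trip. A minimal sketch, using a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores a 64-bit value in a double and reads it back.
inline uint64_t round_trip_through_double(uint64_t value)
{
    return static_cast<uint64_t>(static_cast<double>(value));
}

inline void check_double_round_trip()
{
    uint64_t const small = uint64_t { 1 } << 40;
    uint64_t const large = (uint64_t { 1 } << 53) + 1; // needs 54 significant bits

    assert(round_trip_through_double(small) == small);
    assert(round_trip_through_double(large) != large);
}
```

Values below 2^53 are represented exactly, which is why such a bug can stay latent for a long time.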
Andreas Kling
943319453d LibJS: Fix syntax highlighter position starting at invalid sentinel
The RehighlightState designated initializer used `.position = {}`
which invokes TextPosition's default constructor, initializing line
and column to 0xFFFFFFFF (the "invalid" sentinel). This overrode
the struct's default member initializer of { 0, 0 }.

When advance_position() processed the first newline, it incremented
0xFFFFFFFF to 0x100000000, producing line numbers in the billions.
These bogus positions propagated into folding regions, causing an
out-of-bounds crash in Document::set_folding_regions() when viewing
page source on pages with <script> blocks.

Fix by explicitly initializing position to { 0, 0 }.

Fixes #8529.
2026-03-20 15:32:33 +01:00
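The initializer pitfall from 943319453d can be reproduced in isolation. The structs below are simplified assumptions (the real `TextPosition` and `RehighlightState` have more to them, and the 64-bit field width is inferred from the overflow described in the message); the sketch uses C++20 designated initializers.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins; field types and layout are assumptions.
struct TextPosition {
    uint64_t line { 0xFFFFFFFF };   // "invalid" sentinel by default
    uint64_t column { 0xFFFFFFFF };
};

struct RehighlightState {
    TextPosition position { 0, 0 }; // default member initializer
};

// `.position = {}` replaces the default member initializer with an
// empty-braced TextPosition, which falls back to the sentinel values.
inline RehighlightState make_buggy_state() { return RehighlightState { .position = {} }; }

// Spelling out the values keeps the intended starting position.
inline RehighlightState make_fixed_state() { return RehighlightState { .position = { 0, 0 } }; }

inline void check_position_initialization()
{
    assert(make_buggy_state().position.line == 0xFFFFFFFF);
    assert(make_buggy_state().position.line + 1 == 0x100000000); // first newline
    assert(make_fixed_state().position.line == 0);
    assert(RehighlightState {}.position.line == 0); // untouched default still works
}
```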
Timothy Flynn
b3795eb5bb LibJS: Handle time zone gaps in JS::utc_time
Commit 88365031f2 added support for time
zone gaps in general, but missed this method.
2026-03-20 14:46:46 +01:00
Jelle Raaijmakers
6ce327f715 LibJS: Reduce size of Optional<EnvironmentCoordinate>
Reduces the size of `Optional<EnvironmentCoordinate>` from 12 to 8
bytes. Reordering the fields in `Reference` also shrinks it from 64 to
56 bytes.
2026-03-20 12:03:36 +01:00
Jelle Raaijmakers
e123d48043 AK: Add SentinelOptional
We specialize `Optional<T>` for value types that inherently support some
kind of "empty" value, or whose value range allows for a sentinel value
that is unlikely to be useful and can mean "empty", instead of the boolean
flag a regular `Optional<T>` needs to store. Because of padding, this often
means saving 4 to 8 bytes per instance.

By extending the new `SentinelOptional<T, Traits>`, these
specializations are significantly simplified to just having to define
what the sentinel value is, and how to identify a sentinel value.
2026-03-20 12:03:36 +01:00
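The size saving behind e123d48043 can be sketched without AK. `SketchSentinelOptional` and `U32Traits` below are hypothetical and far simpler than the real `SentinelOptional<T, Traits>`; `std::optional` stands in for a flag-carrying optional.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical traits: UINT32_MAX is treated as "empty".
struct U32Traits {
    static constexpr uint32_t sentinel = UINT32_MAX;
    static constexpr bool is_sentinel(uint32_t value) { return value == sentinel; }
};

// Minimal sentinel-based optional: no separate bool flag is stored.
template<typename T, typename Traits>
class SketchSentinelOptional {
public:
    constexpr SketchSentinelOptional() = default;
    constexpr SketchSentinelOptional(T value)
        : m_value(value)
    {
    }

    constexpr bool has_value() const { return !Traits::is_sentinel(m_value); }
    constexpr T value() const { return m_value; }

private:
    T m_value { Traits::sentinel };
};

using OptionalU32 = SketchSentinelOptional<uint32_t, U32Traits>;

// The sentinel version is exactly as large as the payload;
// the flag-based one needs extra room for its bool plus padding.
static_assert(sizeof(OptionalU32) == sizeof(uint32_t));
static_assert(sizeof(std::optional<uint32_t>) > sizeof(uint32_t));

inline void check_sentinel_optional()
{
    OptionalU32 empty;
    OptionalU32 seven { 7 };
    assert(!empty.has_value());
    assert(seven.has_value());
    assert(seven.value() == 7);
}
```

The trade-off is that the sentinel itself can no longer be stored as a real value, which is why the message restricts this to sentinels that are unlikely to be useful.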
Andreas Kling
bb0acb54ae LibJS: Optimize x >> 0 to ToInt32 in bytecode codegen
x >> 0 is a common JS idiom equivalent to ToInt32(x). We already had
this optimization for x | 0, now do it for right shift by zero as well.

This allows the asmint handler for ToInt32 to run instead of the more
expensive RightShift handler, which wastes time loading and checking the
rhs operand and performing a shift by zero.
2026-03-20 00:51:23 -05:00
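The equivalence that bb0acb54ae relies on can be checked against a direct transcription of the ECMA-262 ToInt32 steps. The function names are hypothetical and this is not the asmint handler, only a model of the arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ToInt32 per ECMA-262: truncate, reduce modulo 2^32, reinterpret as signed.
inline int32_t to_int32(double number)
{
    if (!std::isfinite(number))
        return 0;
    double modulo = std::fmod(std::trunc(number), 4294967296.0);
    if (modulo < 0)
        modulo += 4294967296.0; // now in [0, 2^32)
    // The unsigned-to-signed conversion wraps modulo 2^32 (guaranteed since C++20).
    return static_cast<int32_t>(static_cast<uint32_t>(modulo));
}

// `x >> 0` converts the left operand with ToInt32 and then shifts by zero bits.
inline int32_t shift_right_by_zero(double number) { return to_int32(number) >> 0; }

inline void check_shift_by_zero_is_to_int32()
{
    assert(to_int32(4294967301.0) == 5); // 2^32 + 5 wraps around
    assert(to_int32(-1.5) == -1);
    assert(to_int32(2147483648.0) == INT32_MIN);
    assert(to_int32(NAN) == 0);
    assert(shift_right_by_zero(-1.5) == to_int32(-1.5));
}
```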
Andreas Kling
02b0746676 LibJS: Deduplicate double constants in bytecode generator
Add a deduplication cache for double constants, matching the existing
approach for int32 and string constants. Multiple references to the
same floating-point value now share a single constant table entry.
2026-03-20 00:51:23 -05:00
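A deduplication cache like the one in 02b0746676 could be sketched as follows. `ConstantTable` is a hypothetical class, and keying the cache on the raw bit pattern is an assumption of this sketch (it keeps `0.0` and `-0.0` apart and avoids comparing NaN with `==`); the commit message does not say how the real cache is keyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical constant table that hands out one index per distinct double.
class ConstantTable {
public:
    size_t add_double(double value)
    {
        // Key on the bit pattern so -0.0 and 0.0 stay distinct (sketch assumption).
        uint64_t bits;
        std::memcpy(&bits, &value, sizeof(bits));

        auto [it, inserted] = m_double_cache.try_emplace(bits, m_constants.size());
        if (inserted)
            m_constants.push_back(value);
        return it->second;
    }

    size_t size() const { return m_constants.size(); }

private:
    std::vector<double> m_constants;
    std::unordered_map<uint64_t, size_t> m_double_cache;
};

inline void check_double_deduplication()
{
    ConstantTable table;
    assert(table.add_double(1.5) == 0);
    assert(table.add_double(2.5) == 1);
    assert(table.add_double(1.5) == 0); // reuses the existing entry
    assert(table.size() == 2);
    assert(table.add_double(0.0) != table.add_double(-0.0));
}
```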
Andreas Kling
144ab69715 LibJS: Remove C++ pipeline compatibility hacks from Rust codegen
Now that the C++ bytecode pipeline has been removed, we no longer
need to match its register allocation or block layout. This removes:

- All manual drop() calls that existed solely to match C++ register
  lifetimes, replaced with scope blocks to naturally limit register
  lifetimes without increasing register pressure.

- The unnecessary saved_property copy in update expressions. The
  property register is now used directly since emit_update_op
  doesn't evaluate user expressions that could mutate it. The copy
  is retained in compound/logical assignments where the RHS can
  mutate the property variable (e.g. a[i] |= a[++i]).

- All "matching C++", "Match C++", etc. comments throughout
  codegen.rs and generator.rs that referenced the removed pipeline.
2026-03-20 00:51:23 -05:00
Andreas Kling
bc4379983f LibJS: Improve bytecode executable dump format
Add a metadata header showing register count, block count, local
variable names, and the constants table. Resolve jump targets to
block labels (e.g. "block1") instead of raw hex addresses, and add
visual separation between basic blocks.

Make identifier and property key formatting more concise by using
backtick quoting and showing base_identifier as a trailing
parenthetical hint that joins the base and property names.

Generate a stable name for each executable by hashing the source
text it covers (stable across codegen changes). Named functions
show as "foo$9beb91ec", anonymous ones as "$43362f3f". Also show
the source filename, line, and column.
2026-03-20 00:51:23 -05:00
Andreas Kling
f5eea4d232 LibJS: Fix catch parameter and new.target regressions
- Restrict catch parameter conflict check to only direct children
  of the catch body block, not nested scopes
- Set new_target_is_valid for dynamic function compilation (new
  Function)
- Move check_parameters_post_body before flag restoration in
  parse_method_definition so generator methods inside static init
  blocks correctly allow 'await' as a parameter name
2026-03-19 23:15:03 -05:00
Andreas Kling
5374f0a85c LibJS: Add more early errors in Rust parser
- Reject duplicate bindings in catch parameter patterns
- Reject redeclaration of catch parameter with let/const/function
- Reject binding patterns with initializers in for-in heads (AnnexB
  only permits simple BindingIdentifier with initializer)
- Reject 'await' as binding identifier in class static init blocks
  and module code
2026-03-19 23:15:03 -05:00
Andreas Kling
49cc44a3eb LibJS: Reject arguments/eval in strict mode destructuring and arrows
Check identifier name validity for destructuring assignment pattern
bound names, and validate arrow function parameters after the arrow
is confirmed rather than during speculative parameter parsing.

This fixes arguments/eval as destructuring assignment targets and as
arrow function parameter names in strict mode.
2026-03-19 23:15:03 -05:00
Andreas Kling
66dbb355fe LibJS: Reject new.target in arrow functions at global scope
Arrow functions don't have their own new.target binding -- they
inherit from the enclosing scope. At the global level, there is no
enclosing function, so new.target inside a global arrow is invalid.

Add a new_target_is_valid flag to ParserFlags that is set to true
when entering regular (non-arrow) function bodies, method
definitions, and class static init blocks. Arrow functions inherit
the flag from their enclosing scope rather than setting it.
2026-03-19 23:15:03 -05:00
Andreas Kling
6029a3d40e LibJS: Add missing early errors in Rust parser
- Reject `true`, `false`, `null` as label identifiers
- Reject generator declarations in if-statement bodies (not covered
  by Annex B)
- Reject `await` as label in class static init blocks and modules
- Reject `arguments` in class static initialization blocks
- Reject generator shorthand without method body in object literals
- Reject `get constructor()` / `set constructor()` in class bodies
- Reject `super.#private` member access
2026-03-19 23:15:03 -05:00
Andreas Kling
f491d44b3b LibJS: Replace ScopedOperand with Operand in bytecode ops
ScopedOperand was a ref-counted wrapper around Operand used by the
C++ bytecode Generator for register lifetime tracking. Now that the
Generator is gone, it's just a pointless indirection.

Update the bytecode def code generator to emit Operand directly
instead of ScopedOperand in variable-argument op constructors, and
delete ScopedOperand.h.
2026-03-19 21:55:10 -05:00
Andreas Kling
362207b45d LibJS: Remove remaining C++ pipeline artifacts
Clean up leftover references to the removed C++ pipeline:

- Remove stale forward declarations from Forward.h (ASTNode,
  Parser, Program, FunctionNode, ScopeNode, etc.)
- Delete unused FunctionParsingInsights.h
- Remove dead get_builtin(MemberExpression const&) declaration
  from Builtins.h
- Update stale comments referencing ASTCodegen.cpp and
  generate_bytecode()
2026-03-19 21:55:10 -05:00
Andreas Kling
30f108ba36 LibJS: Remove C++ lexer, use Rust tokenizer for syntax highlighting
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.

Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.

Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
2026-03-19 21:55:10 -05:00
Andreas Kling
8ec7e7c07c LibJS: Remove C++ AST
Delete AST.cpp, AST.h, ASTDump.cpp, ScopeRecord.h, and the dead
get_builtin(MemberExpression const&) from Builtins.cpp.

Extract ImportEntry and ExportEntry into a new ModuleEntry.h,
since they are data types used by the module system, not AST
node types.

Inline ModuleRequest's sorting constructor and
SourceRange::filename().

Remove the dead annex_b_function_declarations field from
EvalDeclarationData, which was only populated by the C++ parser.
2026-03-19 21:55:10 -05:00
Andreas Kling
169452f41b LibJS: Remove C++ parser
Delete Parser.cpp/h and ScopeCollector.cpp/h, now that all parsing
goes through the Rust pipeline.

Port test262-runner to use RustIntegration::parse_program() for its
fast parse-only check instead of the C++ Parser.

Add parsed_program_has_errors() and free_parsed_program() to the
RustIntegration public API for parse-only use cases.
2026-03-19 21:55:10 -05:00
Andreas Kling
1f6ca58e55 LibJS: Remove C++ AST constructor from SharedFunctionInstanceData
Remove the constructor that took C++ AST nodes (FunctionParameters,
Statement), along with create_for_function_node() and the
m_formal_parameters / m_ecmascript_code fields. These were only used
by the now-removed C++ compilation pipeline.

Also remove the dead EvalDeclarationData::create(VM&, Program&, bool)
and ECMAScriptFunctionObject::ecmascript_code() accessor.
2026-03-19 21:55:10 -05:00
Andreas Kling
c25227d324 LibJS: Remove C++ bytecode codegen
Delete the C++ bytecode code generator, now that all compilation goes
through the Rust pipeline:

- Bytecode/ASTCodegen.cpp (4417 lines)
- Bytecode/Generator.cpp (1961 lines)
- Bytecode/Generator.h (535 lines)
- Bytecode/ScopedOperand.cpp (23 lines)

Also remove all generate_bytecode() and generate_labelled_evaluation()
virtual method declarations from AST.h, and their associated Bytecode
includes.
2026-03-19 21:55:10 -05:00
Andreas Kling
272562ddc5 LibJS: Remove dead C++ bytecode compilation functions
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.

Also remove the if-constexpr Parse Node branch from
async_block_start, since the Statement template instantiation
was already removed.

Fix transitive include dependencies on Generator.h by adding
explicit includes for headers that were previously pulled in
transitively.
2026-03-19 21:55:10 -05:00
Andreas Kling
3518efd71c LibJS+LibWeb: Port remaining callers to Rust pipeline
Port all remaining users of the C++ Parser/Lexer/Generator to
use the Rust pipeline instead:

- Intrinsics: Remove C++ fallback in parse_builtin_file()
- ECMAScriptFunctionObject: Remove C++ compile() fallback
- NativeJavaScriptBackedFunction: Remove C++ compile() fallback
- EventTarget: Port to compile_dynamic_function
- WebDriver/ExecuteScript: Port to compile_dynamic_function
- LibTest/JavaScriptTestRunner.h: Remove Parser/Lexer includes
- FuzzilliJs: Remove unused Parser/Lexer includes

Also remove the dead Statement-based template instantiation of
async_block_start/async_function_start.
2026-03-19 21:55:10 -05:00
Andreas Kling
0c7d50b33d LibJS: Remove LIBJS_CPP env var and ENABLE_RUST guards
The Rust pipeline is now the only compilation path, so remove:

- The LIBJS_CPP environment variable check
- The rust_pipeline_enabled() helper
- The #ifdef ENABLE_RUST / #else stub section
- The test-js-cpp CTest target and LIBJS_TEST_PARSER_MODE env var
- The ParserMode enum and canParseSourceWithCpp/Rust test functions

rust_pipeline_available() now unconditionally returns true.
2026-03-19 21:55:10 -05:00
Andreas Kling
77cd434710 LibJS: Remove C++ compiler pipeline fallback paths
Now that the Rust pipeline is the sole compilation path, remove all
C++ parser/codegen fallback paths from the callers:

- Script::parse() no longer falls back to C++ Parser
- SourceTextModule::parse() no longer falls back to C++ Parser
- perform_eval() no longer falls back to C++ Parser + Generator
- create_dynamic_function() no longer falls back to C++ Parser
- ShadowRealm eval no longer falls back to C++ Parser + Generator
- Interpreter::run(Script&) no longer falls back to Generator

Also remove the now-dead old constructors that took C++ AST nodes,
the module_requests() helper, and AST dump code from js.cpp.
2026-03-19 21:55:10 -05:00
Andreas Kling
2c45472a11 LibJS: Remove pipeline comparison infrastructure
Remove PipelineComparison.cpp/h and all LIBJS_COMPARE_PIPELINES
support from RustIntegration.cpp. This includes:

- The compare_pipelines_enabled() function
- All comparison blocks in compile_script/eval/module/function
- The pair_shared_function_data() helper
- The m_cpp_comparison_sfd field on SharedFunctionInstanceData

The Rust pipeline has been validated extensively through comparison
testing and no longer needs the side-by-side verification harness.
2026-03-19 21:55:10 -05:00
Andrew Kaster
f06bd0303f LibJS: Use enum for retrieving well known symbols from C++ to Rust
2026-03-19 09:48:32 +01:00
Andrew Kaster
5d43707896 LibJS: Directly use LiteralValueKind enum across FFI boundary
2026-03-19 09:48:32 +01:00
Andreas Kling
3efd1a1bb5 LibJS: Reject duplicate params across destructuring patterns in C++
The C++ parser was not rejecting duplicate parameter names across
destructuring patterns in non-simple parameter lists. For example,
`function f({ bar, ...a }, { bar, ...b }) {}` was accepted despite
being a syntax error per spec.

The existing inline duplicate check only ran for identifier parameters,
missing the case where both parameters are binding patterns. Add a
post-parse pass that collects all bound names and checks for duplicates
when the parameter list is non-simple (or in strict mode/arrows).

Also fix existing tests that relied on the incorrect behavior and add
new test coverage for destructuring duplicate detection.
2026-03-19 09:43:11 +01:00
Andreas Kling
1ff61754a7 LibJS: Re-box double arithmetic results as Int32 when possible
When the asmint computes a double result for Add, Sub, Mul,
Math.floor, Math.ceil, or Math.sqrt, try to store it as Int32
if the value is a whole number in [INT32_MIN, INT32_MAX] and
not -0.0. This mirrors the JS::Value(double) constructor and
allows downstream int32 fast paths to fire.

Also add label uniquification to the DSL macro expander so the
same macro can be used multiple times in one handler without
label collisions.
2026-03-19 09:42:04 +01:00
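The re-boxing condition in 1ff61754a7 (whole number, within [INT32_MIN, INT32_MAX], not -0.0) can be written out as a small predicate. `try_rebox_as_int32` is a hypothetical name; the real check lives in the asmint DSL, not in C++ like this.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Returns the Int32 form of `value` if it can be stored losslessly, else nullopt.
inline std::optional<int32_t> try_rebox_as_int32(double value)
{
    // Range check; written so that NaN fails it as well.
    if (!(value >= -2147483648.0 && value <= 2147483647.0))
        return std::nullopt;
    // Must be a whole number.
    if (std::trunc(value) != value)
        return std::nullopt;
    // -0.0 has no Int32 representation and must stay a double.
    if (value == 0.0 && std::signbit(value))
        return std::nullopt;
    return static_cast<int32_t>(value);
}

inline void check_int32_reboxing()
{
    assert(try_rebox_as_int32(3.0) == 3);
    assert(try_rebox_as_int32(0.0) == 0);
    assert(!try_rebox_as_int32(3.5));
    assert(!try_rebox_as_int32(-0.0));
    assert(!try_rebox_as_int32(2147483648.0));
    assert(!try_rebox_as_int32(NAN));
}
```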
Andreas Kling
5e403af5be LibJS: Tighten asmint ToInt32 boxing
Teach js_to_int32 to leave a clean low 32-bit result on success, then
use box_int32_clean in the ToInt32 fast path and adjacent boolean
coercions. This removes one instruction from the AArch64 fjcvtzs path
and trims the boolean boxing path without changing behavior.
2026-03-19 09:42:04 +01:00
Andreas Kling
645f481825 LibJS: Fast-path Float32Array indexed access
Add the small AsmIntGen float32 load, store, and conversion operations
needed to handle Float32Array directly in the AsmInt typed-array
GetByValue and PutByValue paths.

This covers direct indexed reads plus both int32 and double stores,
and adds regression coverage for Math.fround rounding, negative zero,
and NaN.
2026-03-19 09:42:04 +01:00
Andreas Kling
6614971e6f LibJS: Fast-path Uint8ClampedArray indexed access
Teach the asm typed-array GetByValue and PutByValue paths to handle
Uint8ClampedArray directly. Reads can share the Uint8Array load path,
while int32 stores clamp in asm instead of bailing out to C++.

Add a direct indexed access regression test for clamped int32 stores.
2026-03-19 09:42:04 +01:00
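For int32 inputs, the clamping store mentioned in 6614971e6f reduces to saturating at 0 and 255. A minimal sketch with a hypothetical function name (the actual fast path is emitted by the asm interpreter generator, not written in C++):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Saturates an int32 to the Uint8ClampedArray element range [0, 255].
inline uint8_t clamp_int32_to_uint8(int32_t value)
{
    return static_cast<uint8_t>(std::clamp<int32_t>(value, 0, 255));
}

inline void check_uint8_clamping()
{
    assert(clamp_int32_to_uint8(-5) == 0);
    assert(clamp_int32_to_uint8(42) == 42);
    assert(clamp_int32_to_uint8(300) == 255);
}
```

A plain Uint8Array store would wrap instead (300 becomes 44), which is why reads can share the Uint8Array path but stores cannot.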
RubenKelevra
fae2f8f3ba LibJS: Align new-expression paren flags with C++ parser
2026-03-18 17:41:36 -05:00
RubenKelevra
3cb636ca38 LibJS: Keep new call-paren optional chaining valid
2026-03-18 17:41:36 -05:00
RubenKelevra
ea8fa63e79 LibJS: Reject optional chaining on unparenthesized new
2026-03-18 17:41:36 -05:00
RubenKelevra
04b27429de LibJS: Isolate super validity in nested function scopes
2026-03-18 17:41:36 -05:00
RubenKelevra
d8469c384d LibJS: Reject invalid bare private identifier usage
2026-03-18 17:41:36 -05:00
RubenKelevra
d6229a1cc8 LibJS: Fix async arrow and for-of async parsing
2026-03-18 17:41:36 -05:00
RubenKelevra
af777b5d86 LibJS: Align duplicate parameter early errors
2026-03-18 17:41:36 -05:00