Commit Graph

182 Commits

Andreas Kling
afa1f77252 LibJS: Materialize decoded bytecode cache blobs
Create parser-free script and module materializers for decoded cache
blobs. Cached functions create SFDs without Rust compile inputs and
attach their precompiled executable immediately, while declaration
metadata is populated from decoded records.

Treat cache blobs as external input from the HTTP disk cache. Run
bytecode validation unconditionally before fixing up cache pointers, and
reject decoded source ranges or metadata indices that would be
out-of-bounds during C++ materialization.

Report executable validation failures as parser errors so callers can
reject corrupt sidecars and fall back to source compilation. LibJS tests
cover corrupt top-level bytecode, declaration bytecode, and declaration
source spans.
2026-05-06 08:20:06 +02:00
Andreas Kling
b265694f0d LibJS: Match bytecode cache blobs to their source
Store a SHA-256 fingerprint of the decoded source text in each bytecode
cache blob, and require callers to provide the expected fingerprint when
validating or decoding a blob.

This rejects sidecars for stale HTTP cache entries whose URL and request
headers still match but whose source body has been replaced. Bytecode
cache tests cover the mismatched-source rejection path.
2026-05-06 08:20:06 +02:00
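The fingerprint check this commit describes can be sketched as follows. This is a hypothetical shape, not the LibJS implementation: the real code stores a SHA-256 digest, while a std `DefaultHasher` stands in here so the sketch stays dependency-free, and all names are illustrative.

```rust
// Hypothetical sketch: reject a cache blob whose stored source fingerprint
// does not match the source the caller is about to run. The real code uses
// SHA-256; std's DefaultHasher stands in here.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn fingerprint(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    SourceMismatch,
}

struct CacheBlob {
    stored_fingerprint: u64,
    // ... serialized records would follow
}

fn validate_blob(blob: &CacheBlob, expected_source: &str) -> Result<(), DecodeError> {
    // The caller supplies the source it actually fetched; a stale sidecar
    // whose body was replaced fails here even if URL and headers still match.
    if blob.stored_fingerprint != fingerprint(expected_source) {
        return Err(DecodeError::SourceMismatch);
    }
    Ok(())
}

fn main() {
    let blob = CacheBlob { stored_fingerprint: fingerprint("let x = 1;") };
    assert!(validate_blob(&blob, "let x = 1;").is_ok());
    assert_eq!(validate_blob(&blob, "let x = 2;"), Err(DecodeError::SourceMismatch));
    println!("fingerprint check ok");
}
```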
Andreas Kling
de9f2b8343 LibJS: Decode bytecode cache blobs over FFI
Expose an owned decoded bytecode cache handle through RustIntegration.
This lets C++ callers keep validated metadata and executable records in
Rust-owned cache structures without invoking the parser.

Extend the js bytecode cache validation mode to create and free the FFI
handle so blob generation exercises the ownership path.
2026-05-06 08:20:06 +02:00
Andreas Kling
fb11f81305 LibJS: Cache declaration function bytecode
Precompile top-level function declarations used during script and
module instantiation while producing a full bytecode cache entry. Keep
normal off-thread execution artifacts on the existing eager path, and
store declaration records separately from executable nested functions.

Place those records after declaration metadata in the cache blob, and
bump the blob version so older sidecars are rejected.
2026-05-06 08:20:06 +02:00
Andreas Kling
9b33cd1c21 LibJS: Return decoded bytecode cache blobs
Make bytecode cache validation return a decoded blob containing the
validated program record, declaration metadata, and executable records.

This keeps a single Rust-owned object alive for consumers that need to
materialize cached bytecode after validation succeeds.
2026-05-06 08:20:06 +02:00
Andreas Kling
16f4775a99 LibJS: Decode bytecode cache executable records
Decode bytecode cache executable records into owned Rust data instead
of only skipping over their serialized fields during validation.

Keep cached bytecode, constants, nested functions, and class blueprints
available for later materialization without rebuilding ASTs.
2026-05-06 08:20:06 +02:00
Andreas Kling
4f1bf52eb3 LibJS: Persist bytecode cache declaration metadata
Store and decode script declaration-instantiation metadata and module
import, export, request, and declaration metadata in bytecode cache
blobs.

Return decoded metadata as owned Rust records so warm-cache script and
module construction can recover parser-derived facts from the sidecar.
2026-05-06 08:20:06 +02:00
Andreas Kling
c5b6739c47 LibJS: Tag bytecode cache blobs with program type
Record whether each bytecode cache blob contains a classic script or a
module, and pass that type through the serializer call sites.

Require validation callers to provide the expected program type so
script and module sidecars cannot be reused for the wrong loader.
2026-05-06 08:20:06 +02:00
Andreas Kling
b327f61ab3 LibJS: Validate bytecode cache blob records
Add structural validation for bytecode cache blobs using the serialized
record layout. The decoder shares primitive helpers across cache
sections instead of duplicating flat parsing logic.

Reject bad magic values, unsupported versions, invalid enum tags, and
truncated sections before materialization. Bound sequence reads against
the remaining blob so malformed sidecars cannot force large allocations.
2026-05-06 08:20:06 +02:00
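The bounded-sequence-read idea above can be sketched like this (illustrative names, not the actual LibJS decoder): a length prefix is never trusted until it has been checked against the bytes actually remaining, so a forged count cannot force a large up-front allocation.

```rust
// Sketch: read a length-prefixed sequence, bounding the count by the
// remaining input before reserving any memory.
#[derive(Debug, PartialEq)]
enum ValidateError {
    Truncated,
    CountOutOfBounds,
}

struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn read_u32(&mut self) -> Result<u32, ValidateError> {
        let end = self.pos.checked_add(4).ok_or(ValidateError::Truncated)?;
        let chunk = self.bytes.get(self.pos..end).ok_or(ValidateError::Truncated)?;
        self.pos = end;
        Ok(u32::from_le_bytes(chunk.try_into().unwrap()))
    }

    // Read `count` fixed-size records; the count is checked against the
    // remaining buffer before any allocation happens.
    fn read_records(&mut self, record_size: usize) -> Result<Vec<&'a [u8]>, ValidateError> {
        let count = self.read_u32()? as usize;
        let remaining = self.bytes.len() - self.pos;
        if count.checked_mul(record_size).map_or(true, |total| total > remaining) {
            return Err(ValidateError::CountOutOfBounds);
        }
        let mut records = Vec::with_capacity(count); // safe: bounded above
        for _ in 0..count {
            let end = self.pos + record_size;
            records.push(&self.bytes[self.pos..end]);
            self.pos = end;
        }
        Ok(records)
    }
}

fn main() {
    // count = 2, then two 4-byte records
    let good = [2u8, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0];
    let mut reader = Reader { bytes: &good, pos: 0 };
    assert_eq!(reader.read_records(4).unwrap().len(), 2);

    // count claims u32::MAX records but the buffer is tiny
    let evil = [0xFFu8, 0xFF, 0xFF, 0xFF];
    let mut reader = Reader { bytes: &evil, pos: 0 };
    assert_eq!(reader.read_records(4), Err(ValidateError::CountOutOfBounds));
    println!("bounded reads ok");
}
```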
Andreas Kling
96a6782800 LibJS: Serialize compiled bytecode cache blobs
Add a versioned Rust bytecode cache writer for fully compiled programs.
The blob records executable bytecode, metadata tables, source maps,
exception handlers, nested function bytecode, and class blueprints
without materializing GC objects.

Expose the serialized blob through RustIntegration as an owned
ByteBuffer so Web-facing callers can store it as HTTP cache data.
2026-05-06 08:20:06 +02:00
Andreas Kling
f25f245b3c LibJS: Split full off-thread script compilation
Keep fetched script and module compilation on the latency-sensitive path
limited to top-level code and eager direct IIFE compilation before
returning bytecode to the main thread.

Add a separate full off-thread compile entry point for bytecode cache
generation. Cache jobs can use it to compile every nested function after
the execution path has already been unblocked.
2026-05-06 08:20:06 +02:00
Andreas Kling
3604259676 LibJS: Add off-thread function bytecode artifacts
Add Rust and C++ integration points for cloning lazy function compile
payloads, compiling them to GC-free bytecode off the main thread, and
materializing the result later on the main thread.

The cloned payload lets background compilation race with lazy
main-thread compilation without sharing AST ownership between threads.
Compiled function artifacts recursively include nested functions, so
materialization can discard the corresponding AST subtree.
2026-05-06 08:20:06 +02:00
Andreas Kling
d9dd412440 LibJS: Use foldhash in parser and scope-collector hash maps
The std default RandomState (SipHash) was using ~9 percentage points
of CPU on hash_one and write across the parse hot path, with the
string interner adding another ~3 pp on top. The cost was spread
across the interner, the scope collector's IndexMap<Utf16String, _>,
and several parser-side HashSet<Utf16String> declarations.

Use foldhash::quality::RandomState for the parser, scope collector,
and string interner via a new fast_hash module. Quality keeps
HashDoS resistance (keys are lexer tokens, attacker-controlled in a
browser context) while shedding SipHash's per-byte cost, and on this
workload it benchmarks slightly faster than foldhash::fast.
2026-05-05 13:53:51 +02:00
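A `fast_hash`-style module of the kind described above might look like this. The aliases are hypothetical; `foldhash` is an external crate, so std's `RandomState` stands in for `foldhash::quality::RandomState` to keep the sketch runnable.

```rust
// Sketch: alias the hash containers once so the hasher can be swapped
// project-wide in a single place.
mod fast_hash {
    use std::collections::hash_map::RandomState; // real code: foldhash::quality::RandomState
    use std::collections::{HashMap, HashSet};

    pub type FastHashMap<K, V> = HashMap<K, V, RandomState>;
    pub type FastHashSet<T> = HashSet<T, RandomState>;
}

use fast_hash::{FastHashMap, FastHashSet};

fn main() {
    let mut interner: FastHashMap<String, u32> = FastHashMap::default();
    interner.insert("arguments".to_string(), 0);
    assert_eq!(interner.get("arguments"), Some(&0));

    let mut declared: FastHashSet<&str> = FastHashSet::default();
    declared.insert("let_binding");
    assert!(declared.contains("let_binding"));
    println!("fast_hash aliases ok");
}
```

Because the hasher is a type parameter, swapping it touches only the alias, not the call sites.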
Andreas Kling
17d3a285a7 LibJS: Move ScopeData into ScopeArena and reference it by ScopeId
The Rust AST kept every scope in Rc<RefCell<ScopeData>>. The Rc made
the AST !Send (cross-thread codegen needed unsafe impl Send), and the
RefCell added a runtime borrow check on every hot-path read.

AST nodes (Block, FunctionBody, Program, SwitchStatement, SwitchCase)
now hold a ScopeId index into ScopeArena. The scope collector and
codegen take &mut/&ScopeArena, so the borrow checker enforces the
previously-implicit invariant that two phases never touch the same
scope at once.

ParsedProgram is now naturally Send. The unsafe impl Send and the
arc_with_non_send_sync allow go away. CompiledProgram keeps its
hand-rolled Send impl because it carries codegen-time state outside
the AST.

FunctionDeclarationData::is_hoisted was a Cell<bool> only because the
old &[ScopeRecord] traversal couldn't get &mut to the AST. It is now
a plain bool.
2026-05-05 13:53:51 +02:00
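The arena-plus-index shape described above can be sketched as follows (illustrative names, not the actual LibJS definitions): nodes hold a `Copy` `ScopeId`, and the two phases borrow the whole arena either mutably or shared, so the borrow checker enforces the phase separation.

```rust
// Sketch: ScopeData lives in a ScopeArena; AST nodes reference it by index.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ScopeId(u32);

#[derive(Default)]
struct ScopeData {
    locals: Vec<String>,
    is_hoisted: bool, // plain bool: no Cell<> needed with &mut access
}

#[derive(Default)]
struct ScopeArena {
    scopes: Vec<ScopeData>,
}

impl ScopeArena {
    fn allocate(&mut self) -> ScopeId {
        self.scopes.push(ScopeData::default());
        ScopeId(self.scopes.len() as u32 - 1)
    }
    fn get_mut(&mut self, id: ScopeId) -> &mut ScopeData {
        &mut self.scopes[id.0 as usize]
    }
    fn get(&self, id: ScopeId) -> &ScopeData {
        &self.scopes[id.0 as usize]
    }
}

// Scope collection takes &mut ScopeArena ...
fn collect(arena: &mut ScopeArena, scope: ScopeId) {
    arena.get_mut(scope).locals.push("x".to_string());
    arena.get_mut(scope).is_hoisted = true;
}

// ... while codegen only takes &ScopeArena, so the borrow checker proves
// the two phases never touch the same scope at once.
fn codegen(arena: &ScopeArena, scope: ScopeId) -> usize {
    arena.get(scope).locals.len()
}

fn main() {
    let mut arena = ScopeArena::default();
    let scope = arena.allocate();
    collect(&mut arena, scope);
    assert_eq!(codegen(&arena, scope), 1);
    println!("scope arena ok");
}
```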
Andreas Kling
f2bf914874 LibJS: Intern identifier names in a per-arena string table
Identifier::name was SharedUtf16String (Rc<Utf16String>), so equality
checks against literals walked the slice and the Rc made the AST
!Send.

Replace it with a StringId (u32 index) backed by a StringInterner on
AstArena. Repeated names dedupe to the same id, so name comparisons
collapse to u32 == u32. The lexer's short/recent identifier caches
and the shared_identifier_value field on Token go away; the interner
already deduplicates everything.

Methods that previously took &mut IdentifierArena now also take
&StringInterner so they can resolve names from StringId during
analyze. Codegen helpers in bytecode/codegen.rs uniformly take
&AstArena. Generator gains intern_identifier_id, intern_property_key_id,
and intern_string_id helpers.
2026-05-05 13:53:51 +02:00
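A minimal interner of the kind this commit describes might look like this (hypothetical shape, not the LibJS `StringInterner`): repeated names dedupe to one `u32` id, so name equality collapses to an integer compare.

```rust
// Sketch: intern strings once, compare StringIds forever after.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct StringId(u32);

#[derive(Default)]
struct StringInterner {
    ids: HashMap<String, StringId>,
    strings: Vec<String>,
}

impl StringInterner {
    fn intern(&mut self, name: &str) -> StringId {
        if let Some(&id) = self.ids.get(name) {
            return id; // already interned: same id, no new allocation
        }
        let id = StringId(self.strings.len() as u32);
        self.strings.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }
    fn resolve(&self, id: StringId) -> &str {
        &self.strings[id.0 as usize]
    }
}

fn main() {
    let mut interner = StringInterner::default();
    let a = interner.intern("value");
    let b = interner.intern("value");
    assert_eq!(a, b); // comparison is u32 == u32
    assert_eq!(interner.resolve(a), "value");
    println!("interner ok");
}
```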
Andreas Kling
9ace32e276 LibJS: Drop Cell<> wrappers from Identifier scope-analysis fields
scope_collector now reaches Identifier through &mut IdentifierArena
indexing instead of through Rc<Identifier>'s shared reference, so the
Cell<> wrappers on local_type, local_index, is_global,
is_inside_scope_with_eval, and declaration_kind no longer earn their
keep.

Replace each Cell<T> with a plain T. The borrow checker now enforces
the existing "only scope_collector mutates these post-parse"
invariant. Shrinks Identifier and removes a layer of indirection on
hot-path field reads in codegen and ast_dump.
2026-05-05 13:53:51 +02:00
Andreas Kling
3e15e59cd1 LibJS: Move identifiers into a contiguous IdentifierArena
Replace per-AST-node Rc<Identifier> with a Copy IdentifierId index
into a Vec<Identifier> arena, plumbed through Parser, scope_collector,
codegen, ast_dump, and the FFI. The arena lives on the parser during
parse, ships out via Arc<AstArena> on ParsedProgram, and is shared by
each child Generator and FunctionPayload through Arc clones.

Eliminates the per-occurrence Rc::new in the parser: every identifier
reference, parameter binding, function name, class name, and
binding-pattern target lands in the arena's Vec instead of getting its
own malloc plus Rc control block. Identifier field reads in codegen
become direct array indexing.

Identifier still carries Cell<>-wrapped scope-analysis state, so
AstArena is not yet Send + Sync; the existing unsafe-impl-Send wrapper
on ParsedProgram covers cross-thread handoff. Removing the Cells is
the next step.
2026-05-05 13:53:51 +02:00
Andreas Kling
d9b9925914 LibJS: Make CompiledRegex thread-safe with Arc + AtomicPtr
The regex literal handle is shared between AST clones (e.g. class
field initializers reuse the same compiled regex), so shared ownership
has to stay. Switch from Rc to Arc and from Cell<*mut c_void> to
AtomicPtr<c_void> so the regex can travel with a function payload to
a worker thread without UB on the non-atomic Rc refcount.
2026-05-05 13:53:51 +02:00
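The ownership change above can be sketched as follows (illustrative struct, not the real `CompiledRegex`): `Rc` plus `Cell<*mut c_void>` becomes `Arc` plus `AtomicPtr<c_void>`, so the shared handle may cross a thread boundary.

```rust
// Sketch: an atomically updated engine handle shared via Arc.
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

struct CompiledRegex {
    // Lazily filled pointer to the engine-side compiled pattern.
    handle: AtomicPtr<c_void>,
}

fn main() {
    let regex = Arc::new(CompiledRegex { handle: AtomicPtr::new(ptr::null_mut()) });

    // A worker thread can inspect the handle; an Rc clone could not even
    // cross this boundary without UB on the non-atomic refcount.
    let clone = Arc::clone(&regex);
    let worker = thread::spawn(move || clone.handle.load(Ordering::Acquire).is_null());
    assert!(worker.join().unwrap());

    // The main thread installs a (fake) compiled pointer atomically.
    let fake = Box::into_raw(Box::new(42u32)) as *mut c_void;
    regex.handle.store(fake, Ordering::Release);
    assert!(!regex.handle.load(Ordering::Acquire).is_null());
    unsafe { drop(Box::from_raw(fake as *mut u32)) }; // cleanup
    println!("atomic handle ok");
}
```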
Andreas Kling
5b2dd60d11 LibJS: Add AST arena types for identifiers, scopes, and interned strings
Introduce IdentifierArena, ScopeArena, and StringInterner plus their
opaque IdentifierId/ScopeId/StringId index newtypes, bundled under
AstArena.
2026-05-05 13:53:51 +02:00
Andreas Kling
47c82b38d0 LibJS: Range-check enum-typed bytecode fields in the validator
The validator now bounds-checks the five enum-shaped field types
that appear in Bytecode.def: Completion::Type, IteratorHint,
EnvironmentMode, PutKind, and ArgumentsKind. The codegen
recognizes each by its .def type name and emits a u32 read plus a
range check against the corresponding variant count.

The variant counts ride across the FFI as new fields on
FFIValidatorBounds rather than being hardcoded on the Rust side,
so the Rust validator never has to know which variants the C++
enum currently defines. The C++ side computes each count as
`to_underlying(LastVariant) + 1` with a static_assert pinning the
expected value, so adding or removing a variant in any of these
enums fails the build until the validator is updated.
2026-05-03 08:43:19 +02:00
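On the Rust side, the check reduces to comparing a decoded `u32` against a count the C++ side supplied. The struct and field names below are illustrative, and the counts in the example are made up, not the real variant counts.

```rust
// Sketch: variant counts ride across the FFI; Rust never hardcodes them.
#[derive(Debug, PartialEq)]
enum ValidateError {
    EnumTagOutOfRange,
}

// Mirror of a hypothetical FFI bounds struct; C++ computes each count as
// to_underlying(LastVariant) + 1 with a static_assert pinning it.
#[repr(C)]
struct FfiValidatorBounds {
    completion_type_count: u32,
    iterator_hint_count: u32,
    environment_mode_count: u32,
    put_kind_count: u32,
    arguments_kind_count: u32,
}

fn validate_enum_tag(raw: u32, variant_count: u32) -> Result<(), ValidateError> {
    if raw >= variant_count {
        return Err(ValidateError::EnumTagOutOfRange);
    }
    Ok(())
}

fn main() {
    let bounds = FfiValidatorBounds {
        completion_type_count: 4, // made-up counts for illustration
        iterator_hint_count: 2,
        environment_mode_count: 2,
        put_kind_count: 3,
        arguments_kind_count: 2,
    };
    assert!(validate_enum_tag(1, bounds.iterator_hint_count).is_ok());
    assert_eq!(
        validate_enum_tag(7, bounds.completion_type_count),
        Err(ValidateError::EnumTagOutOfRange)
    );
    println!("enum tag bounds ok");
}
```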
Andreas Kling
2782fa1559 LibJS: Tighten the bytecode validator's argument operand bound
Until now the validator passed `u32::MAX` as the argument-region
upper bound because nothing on Executable tracked how many
argument slots a given bytecode buffer might reference. That left
the largest validation hole open: any flat operand index above
`registers + locals + constants` slid through the check.

The Rust assembler already walks every operand during phase 1 so
it can offset each one into the runtime's flat layout. This commit
piggybacks on that walk to record the highest `Operand::argument`
index touched and surfaces `(max + 1)` (or zero if no argument is
ever referenced) on `AssembledBytecode`. The value rides through
`FFIExecutableData` onto a new `Executable::number_of_arguments`
field, which `Validator.cpp` then feeds into `FFIValidatorBounds`.

The bound is now tight: every operand index in the encoded stream
is range-checked against the actual runtime array size, including
the argument region.
2026-05-03 08:43:19 +02:00
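The bound derivation piggybacking on the operand walk can be sketched as follows (hypothetical `Operand` shape): record the highest argument index seen and surface `max + 1`, or zero when no argument operand occurs.

```rust
// Sketch: derive number_of_arguments from the phase-1 operand walk.
#[derive(Clone, Copy)]
enum Operand {
    Register(u32),
    Local(u32),
    Constant(u32),
    Argument(u32),
}

fn number_of_arguments(operands: &[Operand]) -> u32 {
    operands
        .iter()
        .filter_map(|op| match op {
            Operand::Argument(index) => Some(index + 1),
            _ => None,
        })
        .max()
        .unwrap_or(0) // zero when no argument is ever referenced
}

fn main() {
    use Operand::*;
    assert_eq!(number_of_arguments(&[Register(0), Local(1)]), 0);
    assert_eq!(number_of_arguments(&[Argument(0), Argument(3), Constant(2)]), 4);
    println!("argument bound ok");
}
```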
Andreas Kling
306060b448 LibJS: Add negative tests for the bytecode validator
Until now we had only confirmed that real, encoder-produced bytecode
passes the validator. That tells us we don't false-fail, but says
nothing about whether we actually catch a corrupted buffer.

This commit fills that gap with a set of Rust unit tests that
hand-craft minimal buffers and assert that each error category
triggers exactly when expected.

Coverage spans the three passes: unknown opcodes and truncated /
misaligned instructions for the structural walk, operand and label
out-of-range cases for the per-instruction checks, and basic block
/ exception handler / source map offsets for the structural
metadata pass. There's also a pair of cache-pointer tests that
pin the BeforeFixup vs AfterFixup behavior down: an out-of-range
cache index is rejected before fixup and silently skipped after,
because by then the slot holds a real pointer.

To make `cargo test` work for the staticlib crate without dragging
in the C++ allocator, RustAllocator falls back to the standard
system allocator under cfg(test). The test harness only ever runs
in cargo's test profile, so the production builds keep using the
ladybird-side allocator unchanged.
2026-05-03 08:43:19 +02:00
Andreas Kling
d3ca680a62 LibJS: Validate basic blocks, exception handlers, and source map
Pass 3 cross-checks the structural metadata stored alongside the
bytecode buffer on Executable against the offset set built during
Pass 1. Every basic block start offset must point at an instruction
boundary; exception handler start, end, and handler offsets must
either be at an instruction boundary or, for the inclusive-start /
exclusive-end pair, equal to the bytecode length; source map
entries must do the same.

Of these, the exception handler's handler_offset is the safety-critical
one for the disk-cache use case: a corrupted offset there
sends control flow into the middle of an instruction. The other
checks tighten the cache-load surface area and catch obvious file
corruption.

The metadata is plumbed across the FFI as a separate
FFIValidatorExtras struct so the validator entry point keeps the
single-call shape, with a flat-offset mirror struct for exception
handlers since the original carries no source data we need.
2026-05-03 08:43:19 +02:00
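The Pass 3 offset rules can be sketched like this (illustrative names): every offset must be in the instruction-boundary set from Pass 1, except that a range endpoint (inclusive start / exclusive end) may also equal the bytecode length.

```rust
// Sketch: cross-check structural metadata offsets against Pass 1 boundaries.
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ValidateError {
    OffsetNotOnBoundary,
}

struct Boundaries {
    starts: HashSet<usize>, // instruction start offsets from Pass 1
    length: usize,          // total bytecode length
}

impl Boundaries {
    // Strict check, e.g. for a handler_offset: control flow must land on an
    // instruction boundary, never mid-instruction.
    fn check_boundary(&self, offset: usize) -> Result<(), ValidateError> {
        if self.starts.contains(&offset) {
            Ok(())
        } else {
            Err(ValidateError::OffsetNotOnBoundary)
        }
    }
    // Range endpoints may additionally equal the bytecode length, since an
    // exclusive end can sit one past the last instruction.
    fn check_boundary_or_end(&self, offset: usize) -> Result<(), ValidateError> {
        if offset == self.length {
            Ok(())
        } else {
            self.check_boundary(offset)
        }
    }
}

fn main() {
    let bounds = Boundaries { starts: [0usize, 8, 24].into_iter().collect(), length: 32 };
    assert!(bounds.check_boundary(8).is_ok()); // handler_offset: strict
    assert_eq!(bounds.check_boundary(12), Err(ValidateError::OffsetNotOnBoundary));
    assert!(bounds.check_boundary_or_end(32).is_ok()); // exclusive end at EOF
    println!("offset checks ok");
}
```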
Andreas Kling
0b8fbc03ef LibJS: Add per-field bytecode validation generated from Bytecode.def
Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies every reference points somewhere
sensible. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bound-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.

The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the right hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking a count that walks off the end of the instruction.

Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.

The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.

Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.

The error category enum is renumbered to drop the per-operand-kind
codes, since at the bytecode level we no longer differentiate.
2026-05-03 08:43:19 +02:00
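The trailing-array guard described above amounts to a checked-arithmetic comparison against the instruction's own encoded length. The function below is a hypothetical sketch of that check, not the generated code.

```rust
// Sketch: cross-check a variable-length instruction's count field against
// its encoded length before walking the trailing array.
#[derive(Debug, PartialEq)]
enum ValidateError {
    CountExceedsLength,
}

fn validate_trailing_array(
    instruction_length: usize, // the m_length encoded in the stream
    header_size: usize,        // fixed fields before the trailing array
    element_size: usize,
    count: usize,
) -> Result<(), ValidateError> {
    let needed = count
        .checked_mul(element_size)
        .and_then(|bytes| bytes.checked_add(header_size))
        .ok_or(ValidateError::CountExceedsLength)?; // overflow also rejects
    if needed > instruction_length {
        return Err(ValidateError::CountExceedsLength);
    }
    Ok(())
}

fn main() {
    // A 16-byte header plus three 8-byte trailing operands fits in 40 bytes.
    assert!(validate_trailing_array(40, 16, 8, 3).is_ok());
    // A forged count of four would need 48 bytes and walk off the instruction.
    assert_eq!(
        validate_trailing_array(40, 16, 8, 4),
        Err(ValidateError::CountExceedsLength)
    );
    println!("trailing array guard ok");
}
```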
Andreas Kling
d4ed658429 LibJS: Add bytecode validator scaffolding driven from Bytecode.def
The plan is to start caching compiled JS bytecode on disk. Before
loading anything from a cache we need confidence that the bytes are
structurally well-formed, since a corrupted or tampered-with cache
file could otherwise hand the interpreter an out-of-bounds jump or a
constant-pool index that points past the end of the table.

This commit lays down the scaffolding for that validator. The walker
lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so
that it can share the existing Bytecode.def-driven layout machinery
with the encoder. C++ calls into it through cbindgen, the same way
the rest of the Rust pipeline is wired up.

For now, the validator only does Pass 1: walk the byte stream,
verify each instruction is 8-byte aligned, the opcode byte is in
range, and the reported length keeps us inside the buffer. The
length lookup is generated from Bytecode.def so fixed-length and
variable-length instructions stay in sync with the rest of the
codegen automatically. Per-field bounds checks (operands, labels,
table indices, cache indices) and structural extras (basic block
offsets, exception handlers, source map) come in follow-up commits.

The validator runs after every successful compilation in debug and
sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we
get an extra sanity check on every executable the encoder produces
without paying for it in release builds. Failure trips a
VERIFY_NOT_REACHED with the offset, opcode, and error category
logged via dbgln().
2026-05-03 08:43:19 +02:00
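Pass 1 as described can be sketched like this. The length table is generated from Bytecode.def in the real code; here a two-entry toy table stands in, and everything else (error names, return shape) is illustrative.

```rust
// Sketch of Pass 1: walk the stream, requiring 8-byte alignment, an
// in-range opcode byte, and a length that stays inside the buffer.
#[derive(Debug, PartialEq)]
enum ValidateError {
    Misaligned,
    UnknownOpcode,
    Truncated,
}

// Toy table: opcode -> instruction length in bytes (multiples of 8).
const LENGTHS: [usize; 2] = [8, 16];

fn pass1(bytecode: &[u8]) -> Result<Vec<usize>, ValidateError> {
    let mut boundaries = Vec::new();
    let mut offset = 0;
    while offset < bytecode.len() {
        if offset % 8 != 0 {
            return Err(ValidateError::Misaligned);
        }
        boundaries.push(offset);
        let opcode = bytecode[offset] as usize;
        let length = *LENGTHS.get(opcode).ok_or(ValidateError::UnknownOpcode)?;
        offset = offset.checked_add(length).ok_or(ValidateError::Truncated)?;
        if offset > bytecode.len() {
            return Err(ValidateError::Truncated);
        }
    }
    Ok(boundaries) // the instruction-boundary set, reused by later passes
}

fn main() {
    // opcode 1 (16 bytes) followed by opcode 0 (8 bytes)
    let good = [
        1u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, //
        0, 0, 0, 0, 0, 0, 0, 0,
    ];
    assert_eq!(pass1(&good).unwrap(), vec![0, 16]);
    // opcode 1 claims 16 bytes but only 8 remain
    let truncated = [1u8, 0, 0, 0, 0, 0, 0, 0];
    assert_eq!(pass1(&truncated), Err(ValidateError::Truncated));
    println!("pass 1 ok");
}
```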
Aliaksandr Kalenik
2171563daf LibJS: Avoid function envs for lexical-this arrows
Track whether a function needs environment-backed this resolution
separately from whether it needs to allocate its own function
environment. Arrow functions that only capture lexical this can now
resolve through the outer environment without allocating an empty
function environment for every call.

Keep the asm Call path conservative by routing functions that still need
lexical-this resolution through the C++ inline-call helper, so the call
receiver is not cached as the arrow function's this value.

Microbenchmark:

    function makeLexicalThisArrow() {
        return () => this.value;
    }

    let object = { value: 1, makeLexicalThisArrow };
    let fn = object.makeLexicalThisArrow();
    for (let i = 0; i < 20_000_000; ++i)
        fn();

Measured with the same Release build toggling this patch:

    baseline:  1069.2 ms mean over 12 runs
    optimized:  501.2 ms mean over 12 runs
    speedup:    2.13 times faster
2026-04-30 18:44:34 +02:00
Andreas Kling
e65e85cb8c LibJS: Materialize arguments object for shorthand { arguments }
The parser only set `might_need_arguments_object` when an `arguments`
or `eval` Identifier went through `consume()`, but shorthand object
properties create the reference via `make_identifier()` directly. As
a result `function f() { return { arguments } }` allocated an
`arguments` local, never initialized it, and crashed at runtime when
the property was read.

Fall back to scope-driven detection: if scope analysis allocated a
non-lexical `arguments` local for the function, treat it as a real
arguments-object reference and emit `CreateArguments`. Skip the
fallback when a function declaration named `arguments` claims the
local, since that local belongs to the function, not the arguments
object.

Add a runtime test covering shorthand inside a free function and a
method, plus a regression test for `({ eval } = ...)` to confirm
destructuring assignment doesn't accidentally trigger arguments
materialization.
2026-04-27 08:04:11 +02:00
Andreas Kling
c1bc0cdfa9 LibJS: Allocate local variable indices in source order
The scope collector stored identifier_groups and variables in
HashMaps and then sorted them alphabetically before assigning local
register indices. The sorts existed only because HashMap iteration
order is non-deterministic; alphabetical was a stable choice for
comparing bytecode against the now-removed C++ port.

Switch both maps to indexmap::IndexMap so iteration follows the order
of first reference (= source order), and drop the alphabetical sorts.
Local indices now reflect declaration order, which matches what shows
up in bytecode dumps and is easier to read alongside the source.

Add a focused bytecode test using zebra/yak/aardvark to pin the new
allocation order; existing tests using let/var declarations have
their local indices renumbered to match.
2026-04-27 08:04:11 +02:00
Andreas Kling
010deec578 LibJS: Build functions_to_initialize in source order
ECMAScript hoisting keeps the LAST function declaration with a given
name. The Rust scope_collector and script GDI extraction implemented
this with a single reverse scan that pushed first-seen entries, which
left the resulting list in REVERSE source order. The C++ side then
iterated `m_functions_to_initialize.in_reverse()` to undo that.

Switch the Rust side to a two-pass forward scan that records the last
position per name and emits entries in source order, and drop the
matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp.
Same hoisting semantics; NewFunction emission and global property
iteration order now follow the source.

The HashMap that tracks last positions is keyed on `SharedUtf16String`,
so each insert is a refcount bump on the AST's existing Rc instead of
a deep `Vec<u16>` clone.

Add bytecode tests at script and nested-function scope that exercise
multiple declarations and a duplicate name to pin the new ordering.
2026-04-27 08:04:11 +02:00
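The two-pass forward scan can be sketched as follows (illustrative shape; real entries carry function data, not bare names): pass one records the last position per name, pass two emits only the winning occurrence, already in source order.

```rust
// Sketch: last declaration per name wins, output stays in source order.
use std::collections::HashMap;

fn functions_to_initialize(declarations: &[&str]) -> Vec<(usize, String)> {
    // Pass 1: record the LAST index for each name (hoisting keeps the last).
    let mut last_position: HashMap<&str, usize> = HashMap::new();
    for (index, name) in declarations.iter().enumerate() {
        last_position.insert(name, index);
    }
    // Pass 2: forward scan emits only winning occurrences, in source order,
    // so no downstream .in_reverse() is needed.
    declarations
        .iter()
        .enumerate()
        .filter(|&(index, name)| last_position[name] == index)
        .map(|(index, name)| (index, name.to_string()))
        .collect()
}

fn main() {
    let declarations = ["foo", "bar", "foo", "baz"];
    // "foo" is declared twice; its second declaration wins, and the list
    // stays in source order: bar, foo, baz.
    assert_eq!(
        functions_to_initialize(&declarations),
        vec![(1, "bar".to_string()), (2, "foo".to_string()), (3, "baz".to_string())]
    );
    println!("hoisting order ok");
}
```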
Andreas Kling
30394ece8d LibJS: Use natural source positions for parser-synthesized identifiers
The Rust parser used to copy several "rule_start"-derived positions
from the C++ implementation: every identifier inside a binding pattern
inherited the pattern's `[`/`{` position, every property identifier
after `.` inherited the period's position, every spread element
inherited the surrounding `[`/`{` position, and identifier-name
property keys inherited the object/class start position. This was
useful while comparing bytecode against the C++ port; with the C++
side gone, those quirks just hide the actual source positions in
source maps and devtools.

Drop the dedicated `binding_pattern_start` parser field and the
`ident_pos_override` parameter on `parse_property_key`, and capture
each identifier's own start position at the consume site.

Add an AST snapshot test that pins the new per-identifier positions
for object, array, nested, and parameter binding patterns.
2026-04-27 08:04:11 +02:00
Andreas Kling
cec0be6f3d LibJS: Replace in_property_key_context flag with explicit consume helper
The parser used to suppress the arguments/eval reference check via a
state flag that was set during the entire `parse_property_key` call.
That was over-broad: identifiers inside a computed property key like
`{ [arguments]: 1 }` are real references, but the flag silenced their
check too, leaving the function unmarked as needing the arguments
object. Reading the resulting property at runtime crashed.

Replace the flag with a `consume_property_key_token()` method used at
the specific consume sites for the property key token itself, so the
suppression is narrow. Inner consumes inside computed keys now go
through regular `consume()` and run the check normally.

Add a focused AST snapshot test covering plain, shorthand, computed,
binding-pattern, and method-name property-key cases.
2026-04-27 08:04:11 +02:00
Andreas Kling
141063e91d LibJS: Precompile top-level IIFEs off-thread
Mark direct calls to function expressions while generating top-level
Rust bytecode, then compile those functions before returning the
off-thread compilation result to WebContent.

The main thread still performs all VM and GC-backed materialization. It
now receives an already assembled executable for each eager IIFE and
attaches it to the SharedFunctionInstanceData while creating the parent
Executable. Nested functions owned by the eager executable remain lazy.

This targets large wrapper IIFEs that are invoked as soon as top-level
code starts running. Their bytecode generation now runs on the existing
script compilation worker instead of blocking the main thread on first
call.
2026-04-26 21:51:52 +02:00
Andreas Kling
4a7dc45b3f LibWeb+LibJS: Compile fetched top-level JS off-thread
Split Rust program compilation so code generation and assembly finish
before the main thread materializes GC-backed executable objects. The
new CompiledProgram handle owns the parsed program, generator state, and
bytecode until C++ consumes it on the main thread.

Wire WebContent script fetching through that handle for classic scripts
and modules. Syntax-error paths still return ParsedProgram, so existing
error reporting stays in place. Successful fetches now do top-level
codegen on the thread pool before deferred_invoke hands control back to
the main thread.

Executable creation, SharedFunctionInstanceData materialization, module
metadata extraction, and declaration data extraction still run on the
main thread where VM and GC access is valid.
2026-04-26 21:51:52 +02:00
Andreas Kling
0d120019df LibJS: Resolve VM constants during executable creation
Rust bytecode generation still reached into the VM to encode well-known
symbols and intrinsic abstract-operation functions as raw JS::Value
constants. That is not compatible with running top-level code generation
away from the main thread.

Keep those constants symbolic in the Rust constant pool instead. The C++
Executable materialization step now resolves them into real VM values
while it is already decoding the rest of the constant table on the main
thread.

This removes another VM dependency from Rust bytecode emission without
changing when the resulting constants become visible to the bytecode
interpreter.
2026-04-26 21:51:52 +02:00
Andreas Kling
56625c13d7 LibJS: Defer function data materialization
Rust bytecode generation currently creates SharedFunctionInstanceData
and ClassBlueprint GC objects as soon as nested functions and classes
are encountered. That keeps the whole code generation phase tied to
the main-thread VM and heap.

Record pending descriptors on the Generator instead, then materialize
those descriptors while creating the C++ Executable. This keeps the GC
allocation boundary exactly where it already belongs, but removes the
last direct function-data allocations from the codegen walk.

This is a preparatory step for compiling top-level bytecode off-thread
and only doing C++ materialization after returning to the main thread.
2026-04-26 21:51:52 +02:00
Andreas Kling
63d6ef1026 LibJS: Track nested function ids during Rust parsing
FunctionTable::extract_reachable() used to rediscover a function's
nested functions by walking the full body and parameter list during
bytecode generation. This is hot during page loading because creating
every lazy SFD pays for an extra structural AST traversal.

Record each parser-created function's direct child function ids while
parsing instead. Extraction can then recursively move that known
subtree without scanning the enclosing function again.

Keep the old structural scan for codegen-synthesized wrappers, such as
class field initializers, where no parser function context exists.
This preserves the sparse FunctionTable storage while making the common
extraction path proportional to the nested function count.
2026-04-26 21:51:52 +02:00
Timothy Flynn
5c34c7f554 Meta: Move python code generators to a subdirectory
Let's have a bit of organization here, rather than an ever-growing Meta
folder.
2026-04-23 07:31:19 -04:00
Andreas Kling
eb9432fcb8 LibJS: Preserve source positions in bytecode source maps
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.

This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.

We're trading some space for time here, but I believe it's worth it at
this stage, as this saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)

Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
2026-04-22 22:34:54 +02:00
Andreas Kling
51758f3022 LibJS: Make bytecode register allocator O(1)
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!

This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)

So drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing and popping the back
of the Vec are both O(1), and peak register usage is unchanged since
the policy only affects which specific register gets reused, not how
aggressively.
2026-04-21 13:59:55 +02:00
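The allocator change above can be sketched like this (hypothetical struct, not the actual Generator): the free pool is a plain stack, so allocation and freeing are both O(1) pushes and pops on the back of a Vec.

```rust
// Sketch: LIFO register pool instead of a lowest-numbered-first scan.
struct RegisterAllocator {
    next_register: u32,
    free_pool: Vec<u32>, // back of the Vec is the top of the stack
}

impl RegisterAllocator {
    fn allocate(&mut self) -> u32 {
        // O(1): reuse the most recently freed register, or mint a new one.
        self.free_pool.pop().unwrap_or_else(|| {
            let register = self.next_register;
            self.next_register += 1;
            register
        })
    }
    fn free(&mut self, register: u32) {
        self.free_pool.push(register); // O(1)
    }
}

fn main() {
    let mut alloc = RegisterAllocator { next_register: 0, free_pool: Vec::new() };
    let a = alloc.allocate();
    let b = alloc.allocate();
    assert_eq!((a, b), (0, 1));
    alloc.free(a);
    alloc.free(b);
    // LIFO: the most recently freed register (b = 1) comes back first.
    assert_eq!(alloc.allocate(), 1);
    assert_eq!(alloc.allocate(), 0);
    // Peak usage is unchanged: no new register was minted for the reuse.
    assert_eq!(alloc.next_register, 2);
    println!("lifo pool ok");
}
```

The policy only changes which free register gets reused, not how many registers exist at peak, which is why the commit can drop the ordering without affecting register counts.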
Timothy Flynn
10ce847931 LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript
Now that LibUnicode exports its character type APIs in Rust, we can use
them to lex identifiers and whitespace.

Fixes #8870.
2026-04-19 10:39:26 +02:00
Andrew Kaster
f26cb24751 Rust: Add a config file for rustfmt
This sets max_width to 120, which causes a lot of reformatting.
2026-04-18 08:05:47 -04:00
Andreas Kling
530f6fb05c LibJS: Fold nested Rust match conditionals
Move several let/const checks and the `instanceof` keyword check into
match guards.
2026-04-16 22:44:41 +02:00
Andreas Kling
c301a21960 LibJS: Skip preserving zero-argument call callees
The callee and this-value preservation copies only matter while later
argument expressions are still being evaluated. For zero-argument calls
there is nothing left to clobber them, so we can keep the original
operand and let the interpreter load it directly.

This removes the hot Mov arg0->reg pattern from zero-argument local
calls and reduces register pressure.
2026-04-13 18:29:43 +02:00
Andreas Kling
3a08f7b95f LibJS: Drop dead entry GetLexicalEnvironment loads
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.

We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
2026-04-13 18:29:43 +02:00
Andreas Kling
3e18136a8c LibJS: Add a String.fromCharCode builtin opcode
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.

The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
2026-04-12 19:15:50 +02:00
Andreas Kling
7bc40bd54a LibJS: Add a charAt builtin bytecode fast path
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.

The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
2026-04-12 19:15:50 +02:00
Andreas Kling
d31750a43c LibJS: Add a charCodeAt builtin bytecode fast path
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.

The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
2026-04-12 19:15:50 +02:00
Andreas Kling
7ffe01cee3 LibJS: Split builtin call bytecode opcodes
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.

Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
2026-04-12 19:15:50 +02:00
RubenKelevra
a1ae402bb9 LibJS: Make folded non-decimal prefix parsing UTF-8-safe
Folded StringToNumber() and StringToBigInt() detected non-decimal
prefixes by slicing the string at byte offset 2. On UTF-8 input this
could split inside a multi-byte scalar value and panic.

To prevent this, we replace the byte-based split with ASCII prefix
stripping, and explicitly reject inputs with empty suffixes, such as
"0x", "0o", and "0b", before parsing the remaining digits.

This makes non-decimal prefix folding UTF-8-safe and preserves the
expected invalid-result behavior for empty prefixed literals.

Tests:

Add regression coverage for folded StringToNumber() and StringToBigInt()
non-decimal prefix handling in
'string-to-number-and-bigint-non-decimal-prefixes.js' to validate the
UTF-8 safety fix.

These tests ensure that inputs with empty suffixes, like "0x", "0o",
and "0b", and other invalid prefixed forms stay invalid, while valid
prefixed literals continue to be accepted.

The folded StringToNumber()/StringToBigInt() coercion previously split
at byte index 2, which could panic when that index landed inside a
multi-byte UTF-8 scalar. We therefore also add regression tests for
representative panic-shape inputs in
'string-to-number-and-bigint-utf8-boundary.js', ensuring these
coercions now return invalid results instead of crashing.
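The shape of the fix can be sketched in plain Rust (function name and digit parsing invented for illustration; the real coercion produces JS numbers/BigInts):

```rust
// Hypothetical sketch of the fix: strip the ASCII prefix with
// str::strip_prefix (which can never split a multi-byte scalar) instead of
// slicing at byte offset 2, and reject empty suffixes like "0x" up front.
fn parse_non_decimal(input: &str) -> Option<u64> {
    for (prefix, radix) in [("0x", 16), ("0X", 16), ("0o", 8), ("0O", 8), ("0b", 2), ("0B", 2)] {
        if let Some(digits) = input.strip_prefix(prefix) {
            if digits.is_empty() {
                return None; // bare "0x"/"0o"/"0b" stays invalid
            }
            return u64::from_str_radix(digits, radix).ok();
        }
    }
    None // no recognized non-decimal prefix
}

fn main() {
    assert_eq!(parse_non_decimal("0xff"), Some(255));
    assert_eq!(parse_non_decimal("0b101"), Some(5));
    assert_eq!(parse_non_decimal("0x"), None);
    // 'é' occupies bytes 1..3 here, so the old `&input[..2]` split would
    // have panicked; strip_prefix simply fails to match instead.
    assert_eq!(parse_non_decimal("0é"), None);
}
```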
2026-04-12 17:36:51 +02:00
Andreas Kling
879ac36e45 LibJS: Cache stable for-in iteration at bytecode sites
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.

Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.

Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.

Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
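The guard set can be pictured as a cache key compared on each re-entry (a hedged sketch with invented field names, not the engine's actual representation):

```rust
// Hypothetical sketch of the per-site for..in cache key: the flattened key
// snapshot is only reused while every guard captured at snapshot time still
// matches, including the magical-length state so arrays and plain objects
// never share snapshots.
#[derive(Clone, Copy, PartialEq, Eq)]
struct ForInCacheKey {
    shape_id: u64,              // receiver shape identity
    dictionary_generation: u64, // bumped when dictionary properties mutate
    indexed_storage_kind: u8,   // e.g. packed vs. sparse element storage
    indexed_length: usize,
    has_magical_length: bool,   // true for arrays
}

fn can_reuse_snapshot(cached: ForInCacheKey, current: ForInCacheKey,
                      prototype_chain_valid: bool) -> bool {
    // Any guard mismatch (or an invalidated prototype chain) forces the
    // slow iterator path to rebuild the snapshot.
    prototype_chain_valid && cached == current
}

fn main() {
    let array_key = ForInCacheKey {
        shape_id: 1,
        dictionary_generation: 0,
        indexed_storage_kind: 0,
        indexed_length: 3,
        has_magical_length: true,
    };
    assert!(can_reuse_snapshot(array_key, array_key, true));
    // A plain object with an otherwise identical key must not share the
    // array's snapshot.
    let plain_key = ForInCacheKey { has_magical_length: false, ..array_key };
    assert!(!can_reuse_snapshot(array_key, plain_key, true));
    assert!(!can_reuse_snapshot(array_key, array_key, false));
}
```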
2026-04-10 15:12:53 +02:00