Commit Graph

182 Commits

Andreas Kling
afa1f77252 LibJS: Materialize decoded bytecode cache blobs
Create parser-free script and module materializers for decoded cache
blobs. Cached functions create SFDs without Rust compile inputs and
attach their precompiled executable immediately, while declaration
metadata is populated from decoded records.

Treat cache blobs as external input from the HTTP disk cache. Run
bytecode validation unconditionally before fixing up cache pointers, and
reject decoded source ranges or metadata indices that would be
out-of-bounds during C++ materialization.

Report executable validation failures as parser errors so callers can
reject corrupt sidecars and fall back to source compilation. LibJS tests
cover corrupt top-level bytecode, declaration bytecode, and declaration
source spans.
2026-05-06 08:20:06 +02:00
Andreas Kling
b265694f0d LibJS: Match bytecode cache blobs to their source
Store a SHA-256 fingerprint of the decoded source text in each bytecode
cache blob, and require callers to provide the expected fingerprint when
validating or decoding a blob.

This rejects sidecars for stale HTTP cache entries whose URL and request
headers still match but whose source body has been replaced. Bytecode
cache tests cover the mismatched-source rejection path.
2026-05-06 08:20:06 +02:00
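The fingerprint check this commit describes can be sketched as follows. This is a hypothetical shape, not the LibJS implementation: the real code stores a SHA-256 digest, while a std `DefaultHasher` stands in here so the sketch stays dependency-free, and all names are illustrative.

```rust
// Hypothetical sketch: reject a cache blob whose stored source fingerprint
// does not match the source the caller is about to run. The real code uses
// SHA-256; std's DefaultHasher stands in here.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn fingerprint(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    SourceMismatch,
}

struct CacheBlob {
    stored_fingerprint: u64,
    // ... serialized records would follow
}

fn validate_blob(blob: &CacheBlob, expected_source: &str) -> Result<(), DecodeError> {
    // The caller supplies the source it actually fetched; a stale sidecar
    // whose body was replaced fails here even if URL and headers still match.
    if blob.stored_fingerprint != fingerprint(expected_source) {
        return Err(DecodeError::SourceMismatch);
    }
    Ok(())
}

fn main() {
    let blob = CacheBlob { stored_fingerprint: fingerprint("let x = 1;") };
    assert!(validate_blob(&blob, "let x = 1;").is_ok());
    assert_eq!(validate_blob(&blob, "let x = 2;"), Err(DecodeError::SourceMismatch));
    println!("fingerprint check ok");
}
```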
Andreas Kling
de9f2b8343 LibJS: Decode bytecode cache blobs over FFI
Expose an owned decoded bytecode cache handle through RustIntegration.
This lets C++ callers keep validated metadata and executable records in
Rust-owned cache structures without invoking the parser.

Extend the js bytecode cache validation mode to create and free the FFI
handle so blob generation exercises the ownership path.
2026-05-06 08:20:06 +02:00
Andreas Kling
fb11f81305 LibJS: Cache declaration function bytecode
Precompile top-level function declarations used during script and
module instantiation while producing a full bytecode cache entry. Keep
normal off-thread execution artifacts on the existing eager path, and
store declaration records separately from executable nested functions.

Place those records after declaration metadata in the cache blob, and
bump the blob version so older sidecars are rejected.
2026-05-06 08:20:06 +02:00
Andreas Kling
9b33cd1c21 LibJS: Return decoded bytecode cache blobs
Make bytecode cache validation return a decoded blob containing the
validated program record, declaration metadata, and executable records.

This keeps a single Rust-owned object alive for consumers that need to
materialize cached bytecode after validation succeeds.
2026-05-06 08:20:06 +02:00
Andreas Kling
16f4775a99 LibJS: Decode bytecode cache executable records
Decode bytecode cache executable records into owned Rust data instead
of only skipping over their serialized fields during validation.

Keep cached bytecode, constants, nested functions, and class blueprints
available for later materialization without rebuilding ASTs.
2026-05-06 08:20:06 +02:00
Andreas Kling
4f1bf52eb3 LibJS: Persist bytecode cache declaration metadata
Store and decode script declaration-instantiation metadata and module
import, export, request, and declaration metadata in bytecode cache
blobs.

Return decoded metadata as owned Rust records so warm-cache script and
module construction can recover parser-derived facts from the sidecar.
2026-05-06 08:20:06 +02:00
Andreas Kling
c5b6739c47 LibJS: Tag bytecode cache blobs with program type
Record whether each bytecode cache blob contains a classic script or a
module, and pass that type through the serializer call sites.

Require validation callers to provide the expected program type so
script and module sidecars cannot be reused for the wrong loader.
2026-05-06 08:20:06 +02:00
Andreas Kling
b327f61ab3 LibJS: Validate bytecode cache blob records
Add structural validation for bytecode cache blobs using the serialized
record layout. The decoder shares primitive helpers across cache
sections instead of duplicating flat parsing logic.

Reject bad magic values, unsupported versions, invalid enum tags, and
truncated sections before materialization. Bound sequence reads against
the remaining blob so malformed sidecars cannot force large allocations.
2026-05-06 08:20:06 +02:00
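The bounded-sequence-read idea above can be sketched like this (illustrative names, not the actual LibJS decoder): a length prefix is never trusted until it has been checked against the bytes actually remaining, so a forged count cannot force a large up-front allocation.

```rust
// Sketch: read a length-prefixed sequence, bounding the count by the
// remaining input before reserving any memory.
#[derive(Debug, PartialEq)]
enum ValidateError {
    Truncated,
    CountOutOfBounds,
}

struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn read_u32(&mut self) -> Result<u32, ValidateError> {
        let end = self.pos.checked_add(4).ok_or(ValidateError::Truncated)?;
        let chunk = self.bytes.get(self.pos..end).ok_or(ValidateError::Truncated)?;
        self.pos = end;
        Ok(u32::from_le_bytes(chunk.try_into().unwrap()))
    }

    // Read `count` fixed-size records; the count is checked against the
    // remaining buffer before any allocation happens.
    fn read_records(&mut self, record_size: usize) -> Result<Vec<&'a [u8]>, ValidateError> {
        let count = self.read_u32()? as usize;
        let remaining = self.bytes.len() - self.pos;
        if count.checked_mul(record_size).map_or(true, |total| total > remaining) {
            return Err(ValidateError::CountOutOfBounds);
        }
        let mut records = Vec::with_capacity(count); // safe: bounded above
        for _ in 0..count {
            let end = self.pos + record_size;
            records.push(&self.bytes[self.pos..end]);
            self.pos = end;
        }
        Ok(records)
    }
}

fn main() {
    // count = 2, then two 4-byte records
    let good = [2u8, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0];
    let mut reader = Reader { bytes: &good, pos: 0 };
    assert_eq!(reader.read_records(4).unwrap().len(), 2);

    // count claims u32::MAX records but the buffer is tiny
    let evil = [0xFFu8, 0xFF, 0xFF, 0xFF];
    let mut reader = Reader { bytes: &evil, pos: 0 };
    assert_eq!(reader.read_records(4), Err(ValidateError::CountOutOfBounds));
    println!("bounded reads ok");
}
```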
Andreas Kling
96a6782800 LibJS: Serialize compiled bytecode cache blobs
Add a versioned Rust bytecode cache writer for fully compiled programs.
The blob records executable bytecode, metadata tables, source maps,
exception handlers, nested function bytecode, and class blueprints
without materializing GC objects.

Expose the serialized blob through RustIntegration as an owned
ByteBuffer so Web-facing callers can store it as HTTP cache data.
2026-05-06 08:20:06 +02:00
Andreas Kling
f25f245b3c LibJS: Split full off-thread script compilation
Keep fetched script and module compilation on the latency-sensitive path
limited to top-level code and eager direct IIFE compilation before
returning bytecode to the main thread.

Add a separate full off-thread compile entry point for bytecode cache
generation. Cache jobs can use it to compile every nested function after
the execution path has already been unblocked.
2026-05-06 08:20:06 +02:00
Andreas Kling
3604259676 LibJS: Add off-thread function bytecode artifacts
Add Rust and C++ integration points for cloning lazy function compile
payloads, compiling them to GC-free bytecode off the main thread, and
materializing the result later on the main thread.

The cloned payload lets background compilation race with lazy
main-thread compilation without sharing AST ownership between threads.
Compiled function artifacts recursively include nested functions, so
materialization can discard the corresponding AST subtree.
2026-05-06 08:20:06 +02:00
Andreas Kling
d9dd412440 LibJS: Use foldhash in parser and scope-collector hash maps
The std default RandomState (SipHash) was using ~9 percentage points
of CPU on hash_one and write across the parse hot path, with the
string interner adding another ~3 pp on top. The cost was spread
across the interner, the scope collector's IndexMap<Utf16String, _>,
and several parser-side HashSet<Utf16String> declarations.

Use foldhash::quality::RandomState for the parser, scope collector,
and string interner via a new fast_hash module. Quality keeps
HashDoS resistance (keys are lexer tokens, attacker-controlled in a
browser context) while shedding SipHash's per-byte cost, and on this
workload it benchmarks slightly faster than foldhash::fast.
2026-05-05 13:53:51 +02:00
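A `fast_hash`-style module of the kind described above might look like this. The aliases are hypothetical; `foldhash` is an external crate, so std's `RandomState` stands in for `foldhash::quality::RandomState` to keep the sketch runnable.

```rust
// Sketch: alias the hash containers once so the hasher can be swapped
// project-wide in a single place.
mod fast_hash {
    use std::collections::hash_map::RandomState; // real code: foldhash::quality::RandomState
    use std::collections::{HashMap, HashSet};

    pub type FastHashMap<K, V> = HashMap<K, V, RandomState>;
    pub type FastHashSet<T> = HashSet<T, RandomState>;
}

use fast_hash::{FastHashMap, FastHashSet};

fn main() {
    let mut interner: FastHashMap<String, u32> = FastHashMap::default();
    interner.insert("arguments".to_string(), 0);
    assert_eq!(interner.get("arguments"), Some(&0));

    let mut declared: FastHashSet<&str> = FastHashSet::default();
    declared.insert("let_binding");
    assert!(declared.contains("let_binding"));
    println!("fast_hash aliases ok");
}
```

Because the hasher is a type parameter, swapping it touches only the alias, not the call sites.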
Andreas Kling
17d3a285a7 LibJS: Move ScopeData into ScopeArena and reference it by ScopeId
The Rust AST kept every scope in Rc<RefCell<ScopeData>>. The Rc made
the AST !Send (cross-thread codegen needed unsafe impl Send), and the
RefCell added a runtime borrow check on every hot-path read.

AST nodes (Block, FunctionBody, Program, SwitchStatement, SwitchCase)
now hold a ScopeId index into ScopeArena. The scope collector and
codegen take &mut/&ScopeArena, so the borrow checker enforces the
previously-implicit invariant that two phases never touch the same
scope at once.

ParsedProgram is now naturally Send. The unsafe impl Send and the
arc_with_non_send_sync allow go away. CompiledProgram keeps its
hand-rolled Send impl because it carries codegen-time state outside
the AST.

FunctionDeclarationData::is_hoisted was a Cell<bool> only because the
old &[ScopeRecord] traversal couldn't get &mut to the AST. It is now
a plain bool.
2026-05-05 13:53:51 +02:00
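The arena-plus-index shape described above can be sketched as follows (illustrative names, not the actual LibJS definitions): nodes hold a `Copy` `ScopeId`, and the two phases borrow the whole arena either mutably or shared, so the borrow checker enforces the phase separation.

```rust
// Sketch: ScopeData lives in a ScopeArena; AST nodes reference it by index.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ScopeId(u32);

#[derive(Default)]
struct ScopeData {
    locals: Vec<String>,
    is_hoisted: bool, // plain bool: no Cell<> needed with &mut access
}

#[derive(Default)]
struct ScopeArena {
    scopes: Vec<ScopeData>,
}

impl ScopeArena {
    fn allocate(&mut self) -> ScopeId {
        self.scopes.push(ScopeData::default());
        ScopeId(self.scopes.len() as u32 - 1)
    }
    fn get_mut(&mut self, id: ScopeId) -> &mut ScopeData {
        &mut self.scopes[id.0 as usize]
    }
    fn get(&self, id: ScopeId) -> &ScopeData {
        &self.scopes[id.0 as usize]
    }
}

// Scope collection takes &mut ScopeArena ...
fn collect(arena: &mut ScopeArena, scope: ScopeId) {
    arena.get_mut(scope).locals.push("x".to_string());
    arena.get_mut(scope).is_hoisted = true;
}

// ... while codegen only takes &ScopeArena, so the borrow checker proves
// the two phases never touch the same scope at once.
fn codegen(arena: &ScopeArena, scope: ScopeId) -> usize {
    arena.get(scope).locals.len()
}

fn main() {
    let mut arena = ScopeArena::default();
    let scope = arena.allocate();
    collect(&mut arena, scope);
    assert_eq!(codegen(&arena, scope), 1);
    println!("scope arena ok");
}
```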
Andreas Kling
f2bf914874 LibJS: Intern identifier names in a per-arena string table
Identifier::name was SharedUtf16String (Rc<Utf16String>), so equality
checks against literals walked the slice and the Rc made the AST
!Send.

Replace it with a StringId (u32 index) backed by a StringInterner on
AstArena. Repeated names dedupe to the same id, so name comparisons
collapse to u32 == u32. The lexer's short/recent identifier caches
and the shared_identifier_value field on Token go away; the interner
already deduplicates everything.

Methods that previously took &mut IdentifierArena now also take
&StringInterner so they can resolve names from StringId during
analyze. Codegen helpers in bytecode/codegen.rs uniformly take
&AstArena. Generator gains intern_identifier_id, intern_property_key_id,
and intern_string_id helpers.
2026-05-05 13:53:51 +02:00
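A minimal interner of the kind this commit describes might look like this (hypothetical shape, not the LibJS `StringInterner`): repeated names dedupe to one `u32` id, so name equality collapses to an integer compare.

```rust
// Sketch: intern strings once, compare StringIds forever after.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct StringId(u32);

#[derive(Default)]
struct StringInterner {
    ids: HashMap<String, StringId>,
    strings: Vec<String>,
}

impl StringInterner {
    fn intern(&mut self, name: &str) -> StringId {
        if let Some(&id) = self.ids.get(name) {
            return id; // already interned: same id, no new allocation
        }
        let id = StringId(self.strings.len() as u32);
        self.strings.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }
    fn resolve(&self, id: StringId) -> &str {
        &self.strings[id.0 as usize]
    }
}

fn main() {
    let mut interner = StringInterner::default();
    let a = interner.intern("value");
    let b = interner.intern("value");
    assert_eq!(a, b); // comparison is u32 == u32
    assert_eq!(interner.resolve(a), "value");
    println!("interner ok");
}
```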
Andreas Kling
9ace32e276 LibJS: Drop Cell<> wrappers from Identifier scope-analysis fields
scope_collector now reaches Identifier through &mut IdentifierArena
indexing instead of through Rc<Identifier>'s shared reference, so the
Cell<> wrappers on local_type, local_index, is_global,
is_inside_scope_with_eval, and declaration_kind no longer earn their
keep.

Replace each Cell<T> with a plain T. The borrow checker now enforces
the existing "only scope_collector mutates these post-parse"
invariant. Shrinks Identifier and removes a layer of indirection on
hot-path field reads in codegen and ast_dump.
2026-05-05 13:53:51 +02:00
Andreas Kling
3e15e59cd1 LibJS: Move identifiers into a contiguous IdentifierArena
Replace per-AST-node Rc<Identifier> with a Copy IdentifierId index
into a Vec<Identifier> arena, plumbed through Parser, scope_collector,
codegen, ast_dump, and the FFI. The arena lives on the parser during
parse, ships out via Arc<AstArena> on ParsedProgram, and is shared by
each child Generator and FunctionPayload through Arc clones.

Eliminates the per-occurrence Rc::new in the parser: every identifier
reference, parameter binding, function name, class name, and
binding-pattern target lands in the arena's Vec instead of getting its
own malloc plus Rc control block. Identifier field reads in codegen
become direct array indexing.

Identifier still carries Cell<>-wrapped scope-analysis state, so
AstArena is not yet Send + Sync; the existing unsafe-impl-Send wrapper
on ParsedProgram covers cross-thread handoff. Removing the Cells is
the next step.
2026-05-05 13:53:51 +02:00
Andreas Kling
d9b9925914 LibJS: Make CompiledRegex thread-safe with Arc + AtomicPtr
The regex literal handle is shared between AST clones (e.g. class
field initializers reuse the same compiled regex), so shared ownership
has to stay. Switch from Rc to Arc and from Cell<*mut c_void> to
AtomicPtr<c_void> so the regex can travel with a function payload to
a worker thread without UB on the non-atomic Rc refcount.
2026-05-05 13:53:51 +02:00
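The ownership change above can be sketched as follows (illustrative struct, not the real `CompiledRegex`): `Rc` plus `Cell<*mut c_void>` becomes `Arc` plus `AtomicPtr<c_void>`, so the shared handle may cross a thread boundary.

```rust
// Sketch: an atomically updated engine handle shared via Arc.
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

struct CompiledRegex {
    // Lazily filled pointer to the engine-side compiled pattern.
    handle: AtomicPtr<c_void>,
}

fn main() {
    let regex = Arc::new(CompiledRegex { handle: AtomicPtr::new(ptr::null_mut()) });

    // A worker thread can inspect the handle; an Rc clone could not even
    // cross this boundary without UB on the non-atomic refcount.
    let clone = Arc::clone(&regex);
    let worker = thread::spawn(move || clone.handle.load(Ordering::Acquire).is_null());
    assert!(worker.join().unwrap());

    // The main thread installs a (fake) compiled pointer atomically.
    let fake = Box::into_raw(Box::new(42u32)) as *mut c_void;
    regex.handle.store(fake, Ordering::Release);
    assert!(!regex.handle.load(Ordering::Acquire).is_null());
    unsafe { drop(Box::from_raw(fake as *mut u32)) }; // cleanup
    println!("atomic handle ok");
}
```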
Andreas Kling
5b2dd60d11 LibJS: Add AST arena types for identifiers, scopes, and interned strings
Introduce IdentifierArena, ScopeArena, and StringInterner plus their
opaque IdentifierId/ScopeId/StringId index newtypes, bundled under
AstArena.
2026-05-05 13:53:51 +02:00
Andreas Kling
47c82b38d0 LibJS: Range-check enum-typed bytecode fields in the validator
The validator now bounds-checks the five enum-shaped field types
that appear in Bytecode.def: Completion::Type, IteratorHint,
EnvironmentMode, PutKind, and ArgumentsKind. The codegen
recognizes each by its .def type name and emits a u32 read plus a
range check against the corresponding variant count.

The variant counts ride across the FFI as new fields on
FFIValidatorBounds rather than being hardcoded on the Rust side,
so the Rust validator never has to know which variants the C++
enum currently defines. The C++ side computes each count as
`to_underlying(LastVariant) + 1` with a static_assert pinning the
expected value, so adding or removing a variant in any of these
enums fails the build until the validator is updated.
2026-05-03 08:43:19 +02:00
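On the Rust side, the check reduces to comparing a decoded `u32` against a count the C++ side supplied. The struct and field names below are illustrative, and the counts in the example are made up, not the real variant counts.

```rust
// Sketch: variant counts ride across the FFI; Rust never hardcodes them.
#[derive(Debug, PartialEq)]
enum ValidateError {
    EnumTagOutOfRange,
}

// Mirror of a hypothetical FFI bounds struct; C++ computes each count as
// to_underlying(LastVariant) + 1 with a static_assert pinning it.
#[repr(C)]
struct FfiValidatorBounds {
    completion_type_count: u32,
    iterator_hint_count: u32,
    environment_mode_count: u32,
    put_kind_count: u32,
    arguments_kind_count: u32,
}

fn validate_enum_tag(raw: u32, variant_count: u32) -> Result<(), ValidateError> {
    if raw >= variant_count {
        return Err(ValidateError::EnumTagOutOfRange);
    }
    Ok(())
}

fn main() {
    let bounds = FfiValidatorBounds {
        completion_type_count: 4, // made-up counts for illustration
        iterator_hint_count: 2,
        environment_mode_count: 2,
        put_kind_count: 3,
        arguments_kind_count: 2,
    };
    assert!(validate_enum_tag(1, bounds.iterator_hint_count).is_ok());
    assert_eq!(
        validate_enum_tag(7, bounds.completion_type_count),
        Err(ValidateError::EnumTagOutOfRange)
    );
    println!("enum tag bounds ok");
}
```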
Andreas Kling
2782fa1559 LibJS: Tighten the bytecode validator's argument operand bound
Until now the validator passed `u32::MAX` as the argument-region
upper bound because nothing on Executable tracked how many
argument slots a given bytecode buffer might reference. That left
the largest validation hole open: any flat operand index above
`registers + locals + constants` slid through the check.

The Rust assembler already walks every operand during phase 1 so
it can offset each one into the runtime's flat layout. This commit
piggybacks on that walk to record the highest `Operand::argument`
index touched and surfaces `(max + 1)` (or zero if no argument is
ever referenced) on `AssembledBytecode`. The value rides through
`FFIExecutableData` onto a new `Executable::number_of_arguments`
field, which `Validator.cpp` then feeds into `FFIValidatorBounds`.

The bound is now tight: every operand index in the encoded stream
is range-checked against the actual runtime array size, including
the argument region.
2026-05-03 08:43:19 +02:00
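The bound derivation piggybacking on the operand walk can be sketched as follows (hypothetical `Operand` shape): record the highest argument index seen and surface `max + 1`, or zero when no argument operand occurs.

```rust
// Sketch: derive number_of_arguments from the phase-1 operand walk.
#[derive(Clone, Copy)]
enum Operand {
    Register(u32),
    Local(u32),
    Constant(u32),
    Argument(u32),
}

fn number_of_arguments(operands: &[Operand]) -> u32 {
    operands
        .iter()
        .filter_map(|op| match op {
            Operand::Argument(index) => Some(index + 1),
            _ => None,
        })
        .max()
        .unwrap_or(0) // zero when no argument is ever referenced
}

fn main() {
    use Operand::*;
    assert_eq!(number_of_arguments(&[Register(0), Local(1)]), 0);
    assert_eq!(number_of_arguments(&[Argument(0), Argument(3), Constant(2)]), 4);
    println!("argument bound ok");
}
```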
Andreas Kling
306060b448 LibJS: Add negative tests for the bytecode validator
Until now we had only confirmed that real, encoder-produced bytecode
passes the validator. That tells us we don't false-fail, but says
nothing about whether we actually catch a corrupted buffer.

This commit fills that gap with a set of Rust unit tests that
hand-craft minimal buffers and assert that each error category
triggers exactly when expected.

Coverage spans the three passes: unknown opcodes and truncated /
misaligned instructions for the structural walk, operand and label
out-of-range cases for the per-instruction checks, and basic block
/ exception handler / source map offsets for the structural
metadata pass. There's also a pair of cache-pointer tests that
pin the BeforeFixup vs AfterFixup behavior down: an out-of-range
cache index is rejected before fixup and silently skipped after,
because by then the slot holds a real pointer.

To make `cargo test` work for the staticlib crate without dragging
in the C++ allocator, RustAllocator falls back to the standard
system allocator under cfg(test). The test harness only ever runs
in cargo's test profile, so the production builds keep using the
ladybird-side allocator unchanged.
2026-05-03 08:43:19 +02:00
Andreas Kling
d3ca680a62 LibJS: Validate basic blocks, exception handlers, and source map
Pass 3 cross-checks the structural metadata stored alongside the
bytecode buffer on Executable against the offset set built during
Pass 1. Every basic block start offset must point at an instruction
boundary; exception handler start, end, and handler offsets must
either be at an instruction boundary or, for the inclusive-start /
exclusive-end pair, equal to the bytecode length; source map
entries must do the same.

Of these, the exception handler's handler_offset is the safety-critical
one for the disk-cache use case: a corrupted offset there
sends control flow into the middle of an instruction. The other
checks tighten the cache-load surface area and catch obvious file
corruption.

The metadata is plumbed across the FFI as a separate
FFIValidatorExtras struct so the validator entry point keeps the
single-call shape, with a flat-offset mirror struct for exception
handlers since the original carries no source data we need.
2026-05-03 08:43:19 +02:00
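The Pass 3 offset rules can be sketched like this (illustrative names): every offset must be in the instruction-boundary set from Pass 1, except that a range endpoint (inclusive start / exclusive end) may also equal the bytecode length.

```rust
// Sketch: cross-check structural metadata offsets against Pass 1 boundaries.
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ValidateError {
    OffsetNotOnBoundary,
}

struct Boundaries {
    starts: HashSet<usize>, // instruction start offsets from Pass 1
    length: usize,          // total bytecode length
}

impl Boundaries {
    // Strict check, e.g. for a handler_offset: control flow must land on an
    // instruction boundary, never mid-instruction.
    fn check_boundary(&self, offset: usize) -> Result<(), ValidateError> {
        if self.starts.contains(&offset) {
            Ok(())
        } else {
            Err(ValidateError::OffsetNotOnBoundary)
        }
    }
    // Range endpoints may additionally equal the bytecode length, since an
    // exclusive end can sit one past the last instruction.
    fn check_boundary_or_end(&self, offset: usize) -> Result<(), ValidateError> {
        if offset == self.length {
            Ok(())
        } else {
            self.check_boundary(offset)
        }
    }
}

fn main() {
    let bounds = Boundaries { starts: [0usize, 8, 24].into_iter().collect(), length: 32 };
    assert!(bounds.check_boundary(8).is_ok()); // handler_offset: strict
    assert_eq!(bounds.check_boundary(12), Err(ValidateError::OffsetNotOnBoundary));
    assert!(bounds.check_boundary_or_end(32).is_ok()); // exclusive end at EOF
    println!("offset checks ok");
}
```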
Andreas Kling
0b8fbc03ef LibJS: Add per-field bytecode validation generated from Bytecode.def
Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies every reference points somewhere
sensible. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bound-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.

The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the right hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking a count that walks off the end of the instruction.

Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.

The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.

Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.

The error category enum is renumbered to drop the per-operand-kind
codes, since at the bytecode level we no longer differentiate.
2026-05-03 08:43:19 +02:00
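The trailing-array guard described above amounts to a checked-arithmetic comparison against the instruction's own encoded length. The function below is a hypothetical sketch of that check, not the generated code.

```rust
// Sketch: cross-check a variable-length instruction's count field against
// its encoded length before walking the trailing array.
#[derive(Debug, PartialEq)]
enum ValidateError {
    CountExceedsLength,
}

fn validate_trailing_array(
    instruction_length: usize, // the m_length encoded in the stream
    header_size: usize,        // fixed fields before the trailing array
    element_size: usize,
    count: usize,
) -> Result<(), ValidateError> {
    let needed = count
        .checked_mul(element_size)
        .and_then(|bytes| bytes.checked_add(header_size))
        .ok_or(ValidateError::CountExceedsLength)?; // overflow also rejects
    if needed > instruction_length {
        return Err(ValidateError::CountExceedsLength);
    }
    Ok(())
}

fn main() {
    // A 16-byte header plus three 8-byte trailing operands fits in 40 bytes.
    assert!(validate_trailing_array(40, 16, 8, 3).is_ok());
    // A forged count of four would need 48 bytes and walk off the instruction.
    assert_eq!(
        validate_trailing_array(40, 16, 8, 4),
        Err(ValidateError::CountExceedsLength)
    );
    println!("trailing array guard ok");
}
```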
Andreas Kling
d4ed658429 LibJS: Add bytecode validator scaffolding driven from Bytecode.def
The plan is to start caching compiled JS bytecode on disk. Before
loading anything from a cache we need confidence that the bytes are
structurally well-formed, since a corrupted or tampered-with cache
file could otherwise hand the interpreter an out-of-bounds jump or a
constant-pool index that points past the end of the table.

This commit lays down the scaffolding for that validator. The walker
lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so
that it can share the existing Bytecode.def-driven layout machinery
with the encoder. C++ calls into it through cbindgen, the same way
the rest of the Rust pipeline is wired up.

For now, the validator only does Pass 1: walk the byte stream,
verify each instruction is 8-byte aligned, the opcode byte is in
range, and the reported length keeps us inside the buffer. The
length lookup is generated from Bytecode.def so fixed-length and
variable-length instructions stay in sync with the rest of the
codegen automatically. Per-field bounds checks (operands, labels,
table indices, cache indices) and structural extras (basic block
offsets, exception handlers, source map) come in follow-up commits.

The validator runs after every successful compilation in debug and
sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we
get an extra sanity check on every executable the encoder produces
without paying for it in release builds. Failure trips a
VERIFY_NOT_REACHED with the offset, opcode, and error category
logged via dbgln().
2026-05-03 08:43:19 +02:00
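Pass 1 as described can be sketched like this. The length table is generated from Bytecode.def in the real code; here a two-entry toy table stands in, and everything else (error names, return shape) is illustrative.

```rust
// Sketch of Pass 1: walk the stream, requiring 8-byte alignment, an
// in-range opcode byte, and a length that stays inside the buffer.
#[derive(Debug, PartialEq)]
enum ValidateError {
    Misaligned,
    UnknownOpcode,
    Truncated,
}

// Toy table: opcode -> instruction length in bytes (multiples of 8).
const LENGTHS: [usize; 2] = [8, 16];

fn pass1(bytecode: &[u8]) -> Result<Vec<usize>, ValidateError> {
    let mut boundaries = Vec::new();
    let mut offset = 0;
    while offset < bytecode.len() {
        if offset % 8 != 0 {
            return Err(ValidateError::Misaligned);
        }
        boundaries.push(offset);
        let opcode = bytecode[offset] as usize;
        let length = *LENGTHS.get(opcode).ok_or(ValidateError::UnknownOpcode)?;
        offset = offset.checked_add(length).ok_or(ValidateError::Truncated)?;
        if offset > bytecode.len() {
            return Err(ValidateError::Truncated);
        }
    }
    Ok(boundaries) // the instruction-boundary set, reused by later passes
}

fn main() {
    // opcode 1 (16 bytes) followed by opcode 0 (8 bytes)
    let good = [
        1u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, //
        0, 0, 0, 0, 0, 0, 0, 0,
    ];
    assert_eq!(pass1(&good).unwrap(), vec![0, 16]);
    // opcode 1 claims 16 bytes but only 8 remain
    let truncated = [1u8, 0, 0, 0, 0, 0, 0, 0];
    assert_eq!(pass1(&truncated), Err(ValidateError::Truncated));
    println!("pass 1 ok");
}
```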
Aliaksandr Kalenik
2171563daf LibJS: Avoid function envs for lexical-this arrows
Track whether a function needs environment-backed this resolution
separately from whether it needs to allocate its own function
environment. Arrow functions that only capture lexical this can now
resolve through the outer environment without allocating an empty
function environment for every call.

Keep the asm Call path conservative by routing functions that still need
lexical-this resolution through the C++ inline-call helper, so the call
receiver is not cached as the arrow function's this value.

Microbenchmark:

    function makeLexicalThisArrow() {
        return () => this.value;
    }

    let object = { value: 1, makeLexicalThisArrow };
    let fn = object.makeLexicalThisArrow();
    for (let i = 0; i < 20_000_000; ++i)
        fn();

Measured with the same Release build toggling this patch:

    baseline:  1069.2 ms mean over 12 runs
    optimized:  501.2 ms mean over 12 runs
    speedup:    2.13 times faster
2026-04-30 18:44:34 +02:00
Andreas Kling
e65e85cb8c LibJS: Materialize arguments object for shorthand { arguments }
The parser only set `might_need_arguments_object` when an `arguments`
or `eval` Identifier went through `consume()`, but shorthand object
properties create the reference via `make_identifier()` directly. As
a result `function f() { return { arguments } }` allocated an
`arguments` local, never initialized it, and crashed at runtime when
the property was read.

Fall back to scope-driven detection: if scope analysis allocated a
non-lexical `arguments` local for the function, treat it as a real
arguments-object reference and emit `CreateArguments`. Skip the
fallback when a function declaration named `arguments` claims the
local, since that local belongs to the function, not the arguments
object.

Add a runtime test covering shorthand inside a free function and a
method, plus a regression test for `({ eval } = ...)` to confirm
destructuring assignment doesn't accidentally trigger arguments
materialization.
2026-04-27 08:04:11 +02:00
Andreas Kling
c1bc0cdfa9 LibJS: Allocate local variable indices in source order
The scope collector stored identifier_groups and variables in
HashMaps and then sorted them alphabetically before assigning local
register indices. The sorts existed only because HashMap iteration
order is non-deterministic; alphabetical was a stable choice for
comparing bytecode against the now-removed C++ port.

Switch both maps to indexmap::IndexMap so iteration follows the order
of first reference (= source order), and drop the alphabetical sorts.
Local indices now reflect declaration order, which matches what shows
up in bytecode dumps and is easier to read alongside the source.

Add a focused bytecode test using zebra/yak/aardvark to pin the new
allocation order; existing tests using let/var declarations have
their local indices renumbered to match.
2026-04-27 08:04:11 +02:00
Andreas Kling
010deec578 LibJS: Build functions_to_initialize in source order
ECMAScript hoisting keeps the LAST function declaration with a given
name. The Rust scope_collector and script GDI extraction implemented
this with a single reverse scan that pushed first-seen entries, which
left the resulting list in REVERSE source order. The C++ side then
iterated `m_functions_to_initialize.in_reverse()` to undo that.

Switch the Rust side to a two-pass forward scan that records the last
position per name and emits entries in source order, and drop the
matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp.
Same hoisting semantics; NewFunction emission and global property
iteration order now follow the source.

The HashMap that tracks last positions is keyed on `SharedUtf16String`,
so each insert is a refcount bump on the AST's existing Rc instead of
a deep `Vec<u16>` clone.

Add bytecode tests at script and nested-function scope that exercise
multiple declarations and a duplicate name to pin the new ordering.
2026-04-27 08:04:11 +02:00
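The two-pass forward scan can be sketched as follows (illustrative shape; real entries carry function data, not bare names): pass one records the last position per name, pass two emits only the winning occurrence, already in source order.

```rust
// Sketch: last declaration per name wins, output stays in source order.
use std::collections::HashMap;

fn functions_to_initialize(declarations: &[&str]) -> Vec<(usize, String)> {
    // Pass 1: record the LAST index for each name (hoisting keeps the last).
    let mut last_position: HashMap<&str, usize> = HashMap::new();
    for (index, name) in declarations.iter().enumerate() {
        last_position.insert(name, index);
    }
    // Pass 2: forward scan emits only winning occurrences, in source order,
    // so no downstream .in_reverse() is needed.
    declarations
        .iter()
        .enumerate()
        .filter(|&(index, name)| last_position[name] == index)
        .map(|(index, name)| (index, name.to_string()))
        .collect()
}

fn main() {
    let declarations = ["foo", "bar", "foo", "baz"];
    // "foo" is declared twice; its second declaration wins, and the list
    // stays in source order: bar, foo, baz.
    assert_eq!(
        functions_to_initialize(&declarations),
        vec![(1, "bar".to_string()), (2, "foo".to_string()), (3, "baz".to_string())]
    );
    println!("hoisting order ok");
}
```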
Andreas Kling
30394ece8d LibJS: Use natural source positions for parser-synthesized identifiers
The Rust parser used to copy several "rule_start"-derived positions
from the C++ implementation: every identifier inside a binding pattern
inherited the pattern's `[`/`{` position, every property identifier
after `.` inherited the period's position, every spread element
inherited the surrounding `[`/`{` position, and identifier-name
property keys inherited the object/class start position. This was
useful while comparing bytecode against the C++ port; with the C++
side gone, those quirks just hide the actual source positions in
source maps and devtools.

Drop the dedicated `binding_pattern_start` parser field and the
`ident_pos_override` parameter on `parse_property_key`, and capture
each identifier's own start position at the consume site.

Add an AST snapshot test that pins the new per-identifier positions
for object, array, nested, and parameter binding patterns.
2026-04-27 08:04:11 +02:00
Andreas Kling
cec0be6f3d LibJS: Replace in_property_key_context flag with explicit consume helper
The parser used to suppress the arguments/eval reference check via a
state flag that was set during the entire `parse_property_key` call.
That was over-broad: identifiers inside a computed property key like
`{ [arguments]: 1 }` are real references, but the flag silenced their
check too, leaving the function unmarked as needing the arguments
object. Reading the resulting property at runtime crashed.

Replace the flag with a `consume_property_key_token()` method used at
the specific consume sites for the property key token itself, so the
suppression is narrow. Inner consumes inside computed keys now go
through regular `consume()` and run the check normally.

Add a focused AST snapshot test covering plain, shorthand, computed,
binding-pattern, and method-name property-key cases.
2026-04-27 08:04:11 +02:00
Andreas Kling
141063e91d LibJS: Precompile top-level IIFEs off-thread
Mark direct calls to function expressions while generating top-level
Rust bytecode, then compile those functions before returning the
off-thread compilation result to WebContent.

The main thread still performs all VM and GC-backed materialization. It
now receives an already assembled executable for each eager IIFE and
attaches it to the SharedFunctionInstanceData while creating the parent
Executable. Nested functions owned by the eager executable remain lazy.

This targets large wrapper IIFEs that are invoked as soon as top-level
code starts running. Their bytecode generation now runs on the existing
script compilation worker instead of blocking the main thread on first
call.
2026-04-26 21:51:52 +02:00
Andreas Kling
4a7dc45b3f LibWeb+LibJS: Compile fetched top-level JS off-thread
Split Rust program compilation so code generation and assembly finish
before the main thread materializes GC-backed executable objects. The
new CompiledProgram handle owns the parsed program, generator state, and
bytecode until C++ consumes it on the main thread.

Wire WebContent script fetching through that handle for classic scripts
and modules. Syntax-error paths still return ParsedProgram, so existing
error reporting stays in place. Successful fetches now do top-level
codegen on the thread pool before deferred_invoke hands control back to
the main thread.

Executable creation, SharedFunctionInstanceData materialization, module
metadata extraction, and declaration data extraction still run on the
main thread where VM and GC access is valid.
2026-04-26 21:51:52 +02:00
Andreas Kling
0d120019df LibJS: Resolve VM constants during executable creation
Rust bytecode generation still reached into the VM to encode well-known
symbols and intrinsic abstract-operation functions as raw JS::Value
constants. That is not compatible with running top-level code generation
away from the main thread.

Keep those constants symbolic in the Rust constant pool instead. The C++
Executable materialization step now resolves them into real VM values
while it is already decoding the rest of the constant table on the main
thread.

This removes another VM dependency from Rust bytecode emission without
changing when the resulting constants become visible to the bytecode
interpreter.
2026-04-26 21:51:52 +02:00
Andreas Kling
56625c13d7 LibJS: Defer function data materialization
Rust bytecode generation currently creates SharedFunctionInstanceData
and ClassBlueprint GC objects as soon as nested functions and classes
are encountered. That keeps the whole code generation phase tied to
the main-thread VM and heap.

Record pending descriptors on the Generator instead, then materialize
those descriptors while creating the C++ Executable. This keeps the GC
allocation boundary exactly where it already belongs, but removes the
last direct function-data allocations from the codegen walk.

This is a preparatory step for compiling top-level bytecode off-thread
and only doing C++ materialization after returning to the main thread.
2026-04-26 21:51:52 +02:00
Andreas Kling
63d6ef1026 LibJS: Track nested function ids during Rust parsing
FunctionTable::extract_reachable() used to rediscover a function's
nested functions by walking the full body and parameter list during
bytecode generation. This is hot during page loading because creating
every lazy SFD pays for an extra structural AST traversal.

Record each parser-created function's direct child function ids while
parsing instead. Extraction can then recursively move that known
subtree without scanning the enclosing function again.

Keep the old structural scan for codegen-synthesized wrappers, such as
class field initializers, where no parser function context exists.
This preserves the sparse FunctionTable storage while making the common
extraction path proportional to the nested function count.
2026-04-26 21:51:52 +02:00
Timothy Flynn
5c34c7f554 Meta: Move python code generators to a subdirectory
Let's have a bit of organization here, rather than an ever-growing Meta
folder.
2026-04-23 07:31:19 -04:00
Andreas Kling
eb9432fcb8 LibJS: Preserve source positions in bytecode source maps
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.

This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.

We're trading some space for time here, but I believe it's worth it at
this stage, as this saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)

Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
2026-04-22 22:34:54 +02:00
Andreas Kling
51758f3022 LibJS: Make bytecode register allocator O(1)
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!

This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)

So drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing and popping the back
of the Vec are both O(1), and peak register usage is unchanged since
the policy only affects which specific register gets reused, not how
aggressively.
2026-04-21 13:59:55 +02:00
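The allocator change above can be sketched like this (hypothetical struct, not the actual Generator): the free pool is a plain stack, so allocation and freeing are both O(1) pushes and pops on the back of a Vec.

```rust
// Sketch: LIFO register pool instead of a lowest-numbered-first scan.
struct RegisterAllocator {
    next_register: u32,
    free_pool: Vec<u32>, // back of the Vec is the top of the stack
}

impl RegisterAllocator {
    fn allocate(&mut self) -> u32 {
        // O(1): reuse the most recently freed register, or mint a new one.
        self.free_pool.pop().unwrap_or_else(|| {
            let register = self.next_register;
            self.next_register += 1;
            register
        })
    }
    fn free(&mut self, register: u32) {
        self.free_pool.push(register); // O(1)
    }
}

fn main() {
    let mut alloc = RegisterAllocator { next_register: 0, free_pool: Vec::new() };
    let a = alloc.allocate();
    let b = alloc.allocate();
    assert_eq!((a, b), (0, 1));
    alloc.free(a);
    alloc.free(b);
    // LIFO: the most recently freed register (b = 1) comes back first.
    assert_eq!(alloc.allocate(), 1);
    assert_eq!(alloc.allocate(), 0);
    // Peak usage is unchanged: no new register was minted for the reuse.
    assert_eq!(alloc.next_register, 2);
    println!("lifo pool ok");
}
```

The policy only changes which free register gets reused, not how many registers exist at peak, which is why the commit can drop the ordering without affecting register counts.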
Timothy Flynn
10ce847931 LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript
Now that LibUnicode exports its character type APIs in Rust, we can use
them to lex identifiers and whitespace.

Fixes #8870.
2026-04-19 10:39:26 +02:00
Andrew Kaster
f26cb24751 Rust: Add a config file for rustfmt
This sets max_width to 120, which causes a lot of reformatting.
2026-04-18 08:05:47 -04:00
Andreas Kling
530f6fb05c LibJS: Fold nested Rust match conditionals
Move several let/const checks and the `instanceof` keyword check into
match guards.
2026-04-16 22:44:41 +02:00
Andreas Kling
c301a21960 LibJS: Skip preserving zero-argument call callees
The callee and this-value preservation copies only matter while later
argument expressions are still being evaluated. For zero-argument calls
there is nothing left to clobber them, so we can keep the original
operand and let the interpreter load it directly.

This removes the hot Mov arg0->reg pattern from zero-argument local
calls and reduces register pressure.
2026-04-13 18:29:43 +02:00
Andreas Kling
3a08f7b95f LibJS: Drop dead entry GetLexicalEnvironment loads
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.

We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
2026-04-13 18:29:43 +02:00
Andreas Kling
3e18136a8c LibJS: Add a String.fromCharCode builtin opcode
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.

The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
2026-04-12 19:15:50 +02:00
Andreas Kling
7bc40bd54a LibJS: Add a charAt builtin bytecode fast path
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.

The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
2026-04-12 19:15:50 +02:00
Andreas Kling
d31750a43c LibJS: Add a charCodeAt builtin bytecode fast path
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.

The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
2026-04-12 19:15:50 +02:00
Andreas Kling
7ffe01cee3 LibJS: Split builtin call bytecode opcodes
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.

Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
2026-04-12 19:15:50 +02:00
RubenKelevra
a1ae402bb9 LibJS: Make folded non-decimal prefix parsing UTF-8-safe
Folded StringToNumber() and StringToBigInt() detected non-decimal
prefixes by slicing the string at byte offset 2. On UTF-8 input this
could split inside a multi-byte scalar value and panic.

To prevent this, we replace the byte-based split with ASCII prefix
stripping, and explicitly reject inputs with empty suffixes, such as
"0x", "0o", and "0b", before parsing the remaining digits.

This makes non-decimal prefix folding UTF-8-safe and preserves the
expected invalid-result behavior for empty prefixed literals.

Tests:

Add regression coverage for folded StringToNumber() and StringToBigInt()
non-decimal prefix handling in
'string-to-number-and-bigint-non-decimal-prefixes.js' to validate the
UTF-8 safety fix.

These tests ensure that inputs with empty suffixes, like "0x", "0o",
and "0b", and other invalid prefixed forms stay invalid, while valid
prefixed literals continue to be accepted.

The folded StringToNumber()/StringToBigInt() coercion previously split
at byte index 2, which could panic when that index landed inside a
multi-byte UTF-8 scalar. We therefore also add regression tests for
representative panic-shape inputs in
'string-to-number-and-bigint-utf8-boundary.js', ensuring these
coercions now return invalid results instead of crashing.
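The shape of the fix can be sketched in plain Rust (function name and digit parsing invented for illustration; the real coercion produces JS numbers/BigInts):

```rust
// Hypothetical sketch of the fix: strip the ASCII prefix with
// str::strip_prefix (which can never split a multi-byte scalar) instead of
// slicing at byte offset 2, and reject empty suffixes like "0x" up front.
fn parse_non_decimal(input: &str) -> Option<u64> {
    for (prefix, radix) in [("0x", 16), ("0X", 16), ("0o", 8), ("0O", 8), ("0b", 2), ("0B", 2)] {
        if let Some(digits) = input.strip_prefix(prefix) {
            if digits.is_empty() {
                return None; // bare "0x"/"0o"/"0b" stays invalid
            }
            return u64::from_str_radix(digits, radix).ok();
        }
    }
    None // no recognized non-decimal prefix
}

fn main() {
    assert_eq!(parse_non_decimal("0xff"), Some(255));
    assert_eq!(parse_non_decimal("0b101"), Some(5));
    assert_eq!(parse_non_decimal("0x"), None);
    // 'é' occupies bytes 1..3 here, so the old `&input[..2]` split would
    // have panicked; strip_prefix simply fails to match instead.
    assert_eq!(parse_non_decimal("0é"), None);
}
```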
2026-04-12 17:36:51 +02:00
Andreas Kling
879ac36e45 LibJS: Cache stable for-in iteration at bytecode sites
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.

Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.

Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.

Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
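The guard set can be pictured as a cache key compared on each re-entry (a hedged sketch with invented field names, not the engine's actual representation):

```rust
// Hypothetical sketch of the per-site for..in cache key: the flattened key
// snapshot is only reused while every guard captured at snapshot time still
// matches, including the magical-length state so arrays and plain objects
// never share snapshots.
#[derive(Clone, Copy, PartialEq, Eq)]
struct ForInCacheKey {
    shape_id: u64,              // receiver shape identity
    dictionary_generation: u64, // bumped when dictionary properties mutate
    indexed_storage_kind: u8,   // e.g. packed vs. sparse element storage
    indexed_length: usize,
    has_magical_length: bool,   // true for arrays
}

fn can_reuse_snapshot(cached: ForInCacheKey, current: ForInCacheKey,
                      prototype_chain_valid: bool) -> bool {
    // Any guard mismatch (or an invalidated prototype chain) forces the
    // slow iterator path to rebuild the snapshot.
    prototype_chain_valid && cached == current
}

fn main() {
    let array_key = ForInCacheKey {
        shape_id: 1,
        dictionary_generation: 0,
        indexed_storage_kind: 0,
        indexed_length: 3,
        has_magical_length: true,
    };
    assert!(can_reuse_snapshot(array_key, array_key, true));
    // A plain object with an otherwise identical key must not share the
    // array's snapshot.
    let plain_key = ForInCacheKey { has_magical_length: false, ..array_key };
    assert!(!can_reuse_snapshot(array_key, plain_key, true));
    assert!(!can_reuse_snapshot(array_key, array_key, false));
}
```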
2026-04-10 15:12:53 +02:00