ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-26 17:55:07 +02:00

Author	SHA1	Message	Date
Andreas Kling	31606fddd3	LibJS: Add Mov2/Mov3 instructions to reduce dispatch overhead Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register moves in a single dispatch. A peephole optimization pass during bytecode assembly merges consecutive Mov instructions within each basic block into these combined instructions. When merging, identical Movs are deduplicated (e.g. two identical Movs become a single Mov, not a Mov2). This optimization is implemented in both the C++ and Rust codegen pipelines. The goal is to reduce the per-instruction dispatch overhead, which is significant compared to the actual cost of moving a value. This isn't fancy or elegant, but provides a real speed-up on many workloads. As an example, Kraken/imaging-desaturate.js improves by ~1.07x on my laptop.	2026-03-11 17:04:32 +01:00
Andreas Kling	96d02d5249	LibJS: Remove derivable fields from ExecutionContext Remove four fields that are trivially derivable from other fields already present in the ExecutionContext: - global_object (from realm) - global_declarative_environment (from realm) - identifier_table (from executable) - property_key_table (from executable) This shrinks ExecutionContext from 192 to 160 bytes (-17%). The asmint's GetGlobal/SetGlobal handlers now load through the realm pointer, taking advantage of the cached declarative environment pointer added in the previous commit.	2026-03-11 13:33:47 +01:00
Andreas Kling	d5eed2632f	AsmInt: Add branch_zero32/branch_nonzero32 to the asmint DSL These test only the low 32 bits of a register, replacing the previous pattern of `and reg, 0xFFFFFFFF` followed by `branch_zero` or `branch_nonzero`. On aarch64 the old pattern emitted `mov w1, w1; cbnz x1` (2 insns), now it's just `cbnz w1` (1 insn). Used in JumpIf, JumpTrue, JumpFalse, and Not for the int32 truthiness fast path.	2026-03-08 23:04:55 +01:00
Andreas Kling	368efef620	AsmIntGen: Support [pb, pc, field] three-operand memory access Teach the DSL and both arch backends to handle memory operands of the form [pb, pc, field_ref], meaning base + index + field_offset. On aarch64, since x21 already caches pb + pc (the instruction pointer), this emits a single `ldr dst, [x21, #offset]` instead of the previous `mov t0, x21` + `ldr dst, [t0, #offset]` two-instruction sequence. On x86_64, this emits `[r14 + r13 + offset]` which is natively supported by x86 addressing modes. Convert all `lea t0, [pb, pc]` + `loadNN tX, [t0, field]` pairs in the DSL to the new single-instruction form, saving one instruction per IC access and other field loads in GetById, PutById, GetLength, GetGlobal, SetGlobal, and CallBuiltin handlers.	2026-03-08 10:27:13 +01:00
Andreas Kling	54a1a66112	LibJS: Store cache pointers directly in bytecode instructions Instead of storing a u32 index into a cache vector and looking up the cache at runtime through a chain of dependent loads (load Executable, load vector data pointer, multiply index, add), store the actual cache pointer as a u64 directly in the instruction stream. A fixup pass (Executable::fixup_cache_pointers()) runs after Executable construction in both the Rust and C++ pipelines, walking the bytecode and replacing each index with the corresponding pointer. The cache pointer type is encoded in Bytecode.def (e.g. PropertyLookupCache, GlobalVariableCache*) so the fixup switch is auto-generated by the Python Op code generator, making it impossible to forget updating the fixup when adding new cached instructions. This eliminates 3-4 dependent loads on every inline cache access in both the C++ interpreter and the assembly interpreter.	2026-03-08 10:27:13 +01:00
Andreas Kling	fe48e27a05	LibJS: Replace GC::Weak with GC::RawPtr in inline cache entries Property lookup cache entries previously used GC::Weak<T> for shape, prototype, and prototype_chain_validity pointers. Each GC::Weak requires a ref-counted WeakImpl allocation and an extra indirection on every access. Replace these with GC::RawPtr<T> and make Executable a WeakContainer so the GC can clear stale pointers during sweep via remove_dead_cells. For static PropertyLookupCache instances (used throughout the runtime for well-known property lookups), introduce StaticPropertyLookupCache which registers itself in a global list that also gets swept. Now that inline cache entries use GC::RawPtr instead of GC::Weak, we can compare shape/prototype pointers directly without going through the WeakImpl indirection. This removes one dependent load from each IC check in GetById, PutById, GetLength, GetGlobal, and SetGlobal handlers.	2026-03-08 10:27:13 +01:00
Andreas Kling	271cd0173d	AsmInt: Remove redundant accessor check from GetByValue SimpleIndexedPropertyStorage can only hold default-attributed data properties. Any attempt to store a property with non-default attributes (such as accessors) triggers conversion to GenericIndexedPropertyStorage first. So when we've already verified is_simple_storage, the accessor check is dead code.	2026-03-08 10:27:13 +01:00
Andreas Kling	95eaac03fb	AsmInt: Inline environment binding path for GetGlobal/SetGlobal Instead of calling into C++ helpers for global let/const variable access, inline the binding lookup directly in the asm handlers. This avoids the overhead of a C++ call for the common case. Module environments still use the C++ helper since they require additional lookups that aren't worth inlining.	2026-03-08 10:27:13 +01:00
Andreas Kling	e486ad2c0c	AsmIntGen: Use platform-optimal codegen for NaN-boxing operations Convert extract_tag, unbox_int32, unbox_object, box_int32, and box_int32_clean from DSL macros into codegen instructions, allowing each backend to emit optimal platform-specific code. On aarch64, this produces significant improvements: - extract_tag: single `lsr xD, xS, #48` instead of `mov` + `lsr` (3-operand shifts are free on ARM). Saves 1 instruction at 57 call sites. - unbox_object: single `and xD, xS, #0xffffffffffff` instead of `mov` + `shl` + `shr`. The 48-bit mask is a valid ARM64 logical immediate. Saves 2 instructions at 6 call sites. - box_int32: `mov wD, wS` + `movk xD, #tag, lsl #48` instead of `mov` + `and 0xFFFFFFFF` + `movabs tag` + `or`. The w-register mov zero-extends, and movk overwrites just the top 16 bits. Saves 2 instructions and no longer clobbers t0 (rax). - box_int32_clean: `movk xD, #tag, lsl #48` (1 instruction) instead of `mov` + `movabs tag` + `or` (saves 2 instructions, no t0 clobber). On x86_64, the generated code is equivalent to the old macros.	2026-03-07 22:18:22 +01:00
Andreas Kling	c65e1955a1	AsmInt: Skip redundant zero-extension in more box_int32 sites UnsignedRightShift: after shr on a zero-extended value, upper bits are already clear. GetByValue typed array path: load32/load8/load16/load8s/load16s all write to 32-bit destination registers, zeroing the upper 32 bits. Both can use box_int32_clean to skip the redundant AND 0xFFFFFFFF.	2026-03-07 22:18:22 +01:00
Andreas Kling	3e7fa8b09a	AsmInt: Use 32-bit NOT in BitwiseNot handler Add a not32 DSL instruction that operates on the 32-bit sub-register, zeroing the upper 32 bits (x86_64: not r32, aarch64: mvn w_reg). Use it in BitwiseNot to avoid the sign-extension (unbox_int32), 64-bit NOT, and explicit AND 0xFFFFFFFF. The 32-bit NOT produces a clean upper half, so we can use box_int32_clean directly. Before: movsxd + not r64 + and 0xFFFFFFFF + and 0xFFFFFFFF + or tag After: mov + not r32 + or tag	2026-03-07 22:18:22 +01:00
Andreas Kling	8e6cd7f5a8	AsmInt: Eliminate redundant copy in boolean coercion of int32 In JumpIf, JumpTrue, JumpFalse, and Not, the int32 zero-test path copied the value to a temporary before masking: mov t3, t1; and t3, 0xFFFFFFFF; branch_zero t3. Since t1 is dead after the test, operate on it directly: and t1, 0xFFFFFFFF; branch_zero t1. Saves one mov instruction per handler on the int32 truthiness path.	2026-03-07 22:18:22 +01:00
Andreas Kling	b5f203e9f5	AsmInt: Skip redundant zero-extension in box_int32 Add box_int32_clean for sites where the upper 32 bits are already known to be zero, skipping the redundant zero-extension. On x86_64, 32-bit register writes (add esi, edi; neg esi; etc.) implicitly clear the upper 32 bits, making the truncation in box_int32 unnecessary. Use box_int32_clean at 9 call sites after add32_overflow, sub32_overflow, mul32_overflow, and neg32_overflow, saving one instruction per site on the hot int32 arithmetic paths.	2026-03-07 22:18:22 +01:00
Andreas Kling	b9a12066e9	AsmInt: Use tag extraction for double checks instead of 64-bit mask Replace the check_is_double pattern that loaded the full 64-bit CANON_NAN_BITS constant (10-byte movabs on x86_64) and masked the entire value, with a cheaper approach: extract the upper 16-bit tag and check if (tag & NAN_BASE_TAG) == NAN_BASE_TAG. This saves instructions at every double-check site. Additionally, add a check_tag_is_double macro for call sites where the tag has already been extracted into a register, avoiding redundant extract_tag operations. This is used in 11 call sites across coerce_to_doubles, strict_equality_core, numeric_compare, Div, UnaryPlus, UnaryMinus, and ToInt32.	2026-03-07 22:18:22 +01:00
Andreas Kling	5b8114a96b	AsmInt: Use hardware overflow flag for int32 arithmetic Replace the pattern of 64-bit arithmetic + sign-extend + compare with dedicated 32-bit overflow instructions that use the hardware overflow flag directly. Before: add t3, t4 / unbox_int32 t5, t3 / branch_ne t3, t5, .overflow After: add32_overflow t3, t4, .overflow On x86_64 this compiles to `add r32, r32; jo label` (the 32-bit register write implicitly zeros the upper 32 bits). On aarch64, `adds w, w, w; b.vs label` for add/sub, `smull + sxtw + cmp + b.ne` for multiply, and `negs + b.vs` for negate. Nine call sites updated: Add, Sub, Mul, Increment, Decrement, PostfixIncrement, PostfixDecrement, UnaryMinus, and CallBuiltin(abs).	2026-03-07 22:18:22 +01:00
Andreas Kling	200103b0af	LibJS: Add AsmInterpreter, a low-level assembly bytecode interpreter Add a new interpreter that executes bytecode via generated assembly, written in a custom DSL (asmint.asm) that AsmIntGen compiles to native x86_64 or aarch64 code. The interpreter keeps the bytecode program counter and register file pointer in machine registers for fast access, dispatching opcodes through a jump table. Hot paths (arithmetic, comparisons, property access on simple objects) are handled entirely in assembly, with cold/complex operations calling into C++ helper functions defined in AsmInterpreter.cpp. A small build-time tool (gen_asm_offsets) uses offsetof() to emit struct field offsets as constants consumed by the DSL, ensuring the assembly stays in sync with C++ struct layouts. The interpreter is enabled by default on platforms that support it. The C++ interpreter can be selected via LIBJS_USE_CPP_INTERPRETER=1. Currently supported platforms: - Linux/x86_64 - Linux/aarch64 - macOS/x86_64 - macOS/aarch64	2026-03-07 13:09:59 +01:00
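
For readers unfamiliar with the NaN-boxing helpers named in e486ad2c0c (extract_tag, unbox_int32, unbox_object, box_int32, box_int32_clean), the following is a minimal C++ sketch of what those operations compute, based only on the instruction sequences quoted in that commit message. The tag value INT32_TAG is a placeholder assumption, not a constant taken from LibJS, and the function signatures are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder tag for illustration only; the real value is defined in LibJS.
constexpr std::uint64_t INT32_TAG = 0x7FFA;

// The tag lives in the upper 16 bits (cf. `lsr xD, xS, #48`).
inline std::uint64_t extract_tag(std::uint64_t boxed) { return boxed >> 48; }

// The payload pointer lives in the low 48 bits (cf. `and xD, xS, #0xffffffffffff`).
inline std::uint64_t unbox_object(std::uint64_t boxed) { return boxed & 0xFFFFFFFFFFFFull; }

// Zero-extend the low 32 bits, then place the tag in the top 16 bits.
inline std::uint64_t box_int32(std::uint64_t raw)
{
    return (raw & 0xFFFFFFFFull) | (INT32_TAG << 48);
}

// Same as box_int32, but the caller guarantees the upper 32 bits are already zero,
// so the masking step can be skipped.
inline std::uint64_t box_int32_clean(std::uint64_t raw)
{
    assert((raw >> 32) == 0);
    return raw | (INT32_TAG << 48);
}

// Recover the signed 32-bit payload (the sign-extension the log calls unbox_int32).
inline std::int32_t unbox_int32(std::uint64_t boxed)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(boxed));
}
```

The split between box_int32 and box_int32_clean mirrors the log: after a 32-bit register write the upper half is already zero, so the mask is redundant at those call sites.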

16 Commits
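
The peephole pass described in 31606fddd3 (merging consecutive Mov instructions into Mov2/Mov3 and dropping identical ones) could look roughly like the sketch below. This is an illustration under stated assumptions, not the LibJS implementation: the Instruction, Move, and Op types and the merge_moves function are hypothetical, the input is assumed to be a single basic block, and only adjacent identical moves are dropped because the log does not spell out the exact deduplication rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model for illustration.
struct Move {
    int dst;
    int src;
    bool operator==(Move const& other) const { return dst == other.dst && src == other.src; }
};

enum class Op { Other, Mov, Mov2, Mov3 };

struct Instruction {
    Op op;
    std::vector<Move> moves; // 1 entry for Mov, 2 for Mov2, 3 for Mov3, empty otherwise
};

// Merge runs of consecutive Mov instructions within one basic block.
std::vector<Instruction> merge_moves(std::vector<Instruction> const& block)
{
    std::vector<Instruction> out;
    std::vector<Move> pending;

    // Emit the collected moves as a single Mov, Mov2, or Mov3.
    auto flush = [&] {
        if (pending.empty())
            return;
        Op op = pending.size() == 1 ? Op::Mov : pending.size() == 2 ? Op::Mov2 : Op::Mov3;
        out.push_back(Instruction { op, pending });
        pending.clear();
    };

    for (auto const& insn : block) {
        if (insn.op != Op::Mov) {
            flush(); // any other instruction ends the current run
            out.push_back(insn);
            continue;
        }
        assert(insn.moves.size() == 1);
        Move move = insn.moves[0];
        if (!pending.empty() && pending.back() == move)
            continue; // repeating the same move back to back has no effect
        if (pending.size() == 3)
            flush(); // a combined instruction holds at most three moves
        pending.push_back(move);
    }
    flush();
    return out;
}
```

The moves inside a combined instruction are assumed to execute in their original order, so merging does not change what the block computes; it only reduces the number of dispatches.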