Replace all unwrap_or(0) and parse().unwrap_or(0) calls in the
asmint code generator with expect()/panic! so that missing
constants or unparseable literals cause a build-time failure
instead of silently generating wrong code.
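The shape of the change, as a sketch — `constants` and `field_offset` here are hypothetical stand-ins for the generator's constant table, not its real API:

```rust
use std::collections::HashMap;

// Illustrative only: look up a struct-field offset for codegen.
// Before: constants.get(name).copied().unwrap_or(0) silently emitted
// offset 0 for a missing constant. After: fail the build loudly.
fn field_offset(constants: &HashMap<&str, i64>, name: &str) -> i64 {
    *constants
        .get(name)
        .unwrap_or_else(|| panic!("asmint: missing constant `{name}`"))
}
```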
The smull instruction writes the full 64-bit product to the destination
register. For negative results like 1 * -1 = -1, the upper 32 bits are
all 1s (the sign bits of the 64-bit product).
The subsequent box_int32_clean assumed the upper 32 bits were
already zero, so it just set the NaN-boxing tag with movk. This
produced a corrupted Value where strict equality (===) would fail
even though the numeric value was correct.
Fix this by adding a mov wN, wN after the overflow check to
zero-extend the 32-bit result, matching what add32_overflow and
sub32_overflow already do by writing to W registers.
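The corruption reproduces bit-for-bit in a few lines of Rust (the int32 tag value used here is an assumption for illustration; the fix corresponds to the `mov wN, wN` zero-extension):

```rust
// Simulate `movk xD, #tag, lsl #48`: overwrite only bits 48..63,
// leaving bits 0..47 untouched.
fn movk_lsl48(x: u64, imm16: u64) -> u64 {
    (x & 0x0000_FFFF_FFFF_FFFF) | (imm16 << 48)
}

const INT32_TAG: u64 = 0x7FFA; // assumed tag value, for illustration only

fn demo() -> (u64, u64) {
    let product = (1i64 * -1) as u64; // smull result: 0xFFFF_FFFF_FFFF_FFFF
    let buggy = movk_lsl48(product, INT32_TAG); // bits 32..47 are still 1s
    let fixed = movk_lsl48(product as u32 as u64, INT32_TAG); // mov wN, wN first
    (buggy, fixed)
}
```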
These branch instructions test only the low 32 bits of a register,
replacing the previous pattern of `and reg, 0xFFFFFFFF` followed by
`branch_zero` or `branch_nonzero`.
On aarch64 the old pattern emitted `mov w1, w1; cbnz x1` (2 insns),
now it's just `cbnz w1` (1 insn). Used in JumpIf, JumpTrue, JumpFalse,
and Not for the int32 truthiness fast path.
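The equivalence the new pattern relies on, sketched in Rust:

```rust
// `and reg, 0xFFFFFFFF` + branch_nonzero: mask, then test the full register.
fn old_nonzero(x: u64) -> bool {
    (x & 0xFFFF_FFFF) != 0
}

// `cbnz w1`: test the low 32 bits directly — same predicate, one instruction.
fn new_nonzero(x: u64) -> bool {
    (x as u32) != 0
}
```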
x21 (instruction pointer = pb + pc) is already the primary dispatch
register. Maintaining w25 (the 32-bit pc offset) in parallel on every
dispatch_next, goto_handler, and dispatch_variable was redundant.
Compute the 32-bit pc on demand via `sub w1, w21, w26` only when
calling into C++ (slow paths), which is the cold path. This removes
one instruction from every hot dispatch sequence and every jump target.
The generated output shrinks from 4692 to 4345 lines (~347 instructions
removed), with every handler benefiting from shorter dispatch tails.
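The on-demand computation is plain pointer arithmetic; a sketch in Rust, with the register roles taken from the text (x21 = pb + pc, x26 = pb) and the function name hypothetical:

```rust
// `sub w1, w21, w26`: recover the 32-bit pc offset from the pinned
// instruction pointer and the bytecode base, only on slow paths.
fn pc_from_ip(ip: u64, pb: u64) -> u32 {
    ip.wrapping_sub(pb) as u32
}
```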
These three tag constants (0x7FFA, 0x7FF9, 0x7FF8) exceed the 12-bit
cmp immediate range on aarch64, so every comparison required a mov+cmp
pair. Pin them in x22, x23, x24 (callee-saved, previously unused) to
turn ~160 two-instruction sequences into single cmp instructions.
Teach the DSL and both arch backends to handle memory operands of
the form [pb, pc, field_ref], meaning base + index + field_offset.
On aarch64, since x21 already caches pb + pc (the instruction
pointer), this emits a single `ldr dst, [x21, #offset]` instead of
the previous `mov t0, x21` + `ldr dst, [t0, #offset]` two-instruction
sequence.
On x86_64, this emits `[r14 + r13 + offset]` which is natively
supported by x86 addressing modes.
Convert all `lea t0, [pb, pc]` + `loadNN tX, [t0, field]` pairs in
the DSL to the new single-instruction form, saving one instruction
per IC access and other field loads in GetById, PutById, GetLength,
GetGlobal, SetGlobal, and CallBuiltin handlers.
Pin x21 = pb + pc (the instruction pointer) as a callee-saved register
that survives C++ calls. x21 is set during dispatch and remains valid
throughout the entire handler.
This eliminates redundant `add x9, x26, x25` instructions from every
load_operand, store_operand, load_label, and dispatch_next sequence.
Also optimizes `lea dst, [pb, pc]` to `mov dst, x21`.
For dispatch_next, the next opcode is loaded via `ldrb w9, [x21, #size]`
and x21 is updated incrementally (`add x21, x21, #size`), which also
improves the dependency chain vs recomputing from x26 + x25.
dispatch_current is promoted from a DSL macro to a codegen instruction
so it can set x21 for the next handler.
Load CANON_NAN_BITS into d8 (a callee-saved FP register) at
interpreter entry. This avoids materializing the 64-bit constant
in every canonicalize_nan cold fixup block.
Before: cold block was `movz x9, ... / movk x9, ... / b ret`
After: cold block is just `fmov xD, d8 / b ret`
The hot path (fmov + fcmp + b.vs) is unchanged. The constant is
only needed when the result is actually NaN, which is rare, but
this still shrinks code size and avoids the multi-instruction
immediate materialization at 11 call sites.
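What canonicalize_nan computes, as a sketch — the concrete CANON_NAN_BITS value below is an assumption (the standard quiet-NaN pattern); only its role is taken from the text:

```rust
const CANON_NAN_BITS: u64 = 0x7FF8_0000_0000_0000; // assumed canonical quiet NaN

// Hot path compares; the cold fixup (rarely taken) replaces the bits
// with the single canonical NaN pattern so NaN-boxed values stay valid.
fn canonicalize_nan(bits: u64) -> u64 {
    if f64::from_bits(bits).is_nan() {
        CANON_NAN_BITS
    } else {
        bits
    }
}
```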
Convert extract_tag, unbox_int32, unbox_object, box_int32, and
box_int32_clean from DSL macros into codegen instructions, allowing
each backend to emit optimal platform-specific code.
On aarch64, this produces significant improvements:
- extract_tag: single `lsr xD, xS, #48` instead of `mov` + `lsr`
(3-operand shifts are free on ARM). Saves 1 instruction at 57
call sites.
- unbox_object: single `and xD, xS, #0xffffffffffff` instead of
`mov` + `shl` + `shr`. The 48-bit mask is a valid ARM64 logical
immediate. Saves 2 instructions at 6 call sites.
- box_int32: `mov wD, wS` + `movk xD, #tag, lsl #48` instead of
`mov` + `and 0xFFFFFFFF` + `movabs tag` + `or`. The w-register
mov zero-extends, and movk overwrites just the top 16 bits.
Saves 2 instructions and no longer clobbers t0 (rax).
- box_int32_clean: `movk xD, #tag, lsl #48` (1 instruction) instead
of `mov` + `movabs tag` + `or` (saves 2 instructions, no t0
clobber).
On x86_64, the generated code is equivalent to the old macros.
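The aarch64 sequences above implement these bit manipulations, sketched in Rust (the tag value passed in is whatever constant the handler uses):

```rust
fn extract_tag(v: u64) -> u64 {
    v >> 48 // lsr xD, xS, #48
}

fn unbox_object(v: u64) -> u64 {
    v & 0x0000_FFFF_FFFF_FFFF // 48-bit mask: a valid ARM64 logical immediate
}

fn box_int32(tag: u64, x: i32) -> u64 {
    // mov wD, wS zero-extends; movk #tag, lsl #48 overwrites bits 48..63
    (x as u32 as u64) | (tag << 48)
}
```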
Add a not32 DSL instruction that operates on the 32-bit sub-register,
zeroing the upper 32 bits (x86_64: not r32, aarch64: mvn w_reg).
Use it in BitwiseNot to avoid the sign-extension (unbox_int32), 64-bit
NOT, and explicit AND 0xFFFFFFFF. The 32-bit NOT produces a clean
upper half, so we can use box_int32_clean directly.
Before: movsxd + not r64 + and 0xFFFFFFFF + and 0xFFFFFFFF + or tag
After: mov + not r32 + or tag
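The upper-half guarantee that lets BitwiseNot use box_int32_clean, in Rust terms:

```rust
// not r32 / mvn wD, wS: 32-bit NOT whose register write zeroes the
// upper 32 bits, so the result is already clean for tagging.
fn not32(x: u64) -> u64 {
    (!(x as u32)) as u64
}
```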
When consecutive branch_fp_* instructions use the same operands (e.g.
branch_fp_unordered followed by branch_fp_equal), the 2nd ucomisd/fcmp
is redundant since flags are still valid from the first comparison.
Track the last FP comparison operands in HandlerState and skip the
comparison instruction when it would be identical. This is common in
the double_equality_compare macro which checks for unordered (NaN)
before testing equality.
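A minimal sketch of the elision — HandlerState's real fields and methods may differ, and any flag-clobbering instruction must invalidate the cached operands:

```rust
#[derive(Default)]
struct HandlerState {
    last_fp_cmp: Option<(String, String)>,
}

impl HandlerState {
    /// Emit an fcmp/ucomisd unless flags are already valid for these operands.
    fn emit_fp_cmp(&mut self, lhs: &str, rhs: &str, out: &mut Vec<String>) {
        let key = (lhs.to_string(), rhs.to_string());
        if self.last_fp_cmp.as_ref() != Some(&key) {
            out.push(format!("fcmp {lhs}, {rhs}"));
            self.last_fp_cmp = Some(key);
        }
    }

    /// Anything that clobbers flags must reset the cache.
    fn clobber_flags(&mut self) {
        self.last_fp_cmp = None;
    }
}
```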
Values in the range 0x80000000..0xFFFFFFFF were incorrectly emitted
as plain `mov r64, imm` which GAS encodes as a 10-byte movabs. Use
`mov r32, imm32` instead (5 bytes, implicitly zero-extends to 64
bits). This affects constants like ENVIRONMENT_COORDINATE_INVALID
(0xFFFFFFFE) which appeared 5 times in the generated assembly.
canonicalize_nan previously emitted its full NaN fixup inline:
on x86_64, a 10-byte movabs + cmovp; on aarch64, a multi-instruction
mov sequence + fcsel. These were always on the hot path even though
NaN results from arithmetic are extremely rare.
Move the NaN fixup to a cold block emitted after the handler body.
The hot path is now just: movq/fmov + ucomisd/fcmp + jp/b.vs (a
forward branch predicted not-taken). This removes 14 bytes of
instructions from the hot path of every handler that produces
double results (Add, Sub, Mul, Div, and several builtins).
Both backends gain a HandlerState struct (shared between them) that
accumulates cold fixup blocks during code generation, emitted after
the main body.
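The accumulation mechanism can be sketched like this — names are illustrative, and the real cold bodies also branch back to the hot path:

```rust
#[derive(Default)]
struct ColdBlocks {
    blocks: Vec<String>,
}

impl ColdBlocks {
    /// Emit a forward branch (predicted not-taken) to a fresh cold label
    /// and queue the fixup body for later emission.
    fn defer(&mut self, body: &str, hot: &mut Vec<String>) -> String {
        let label = format!(".Lcold{}", self.blocks.len());
        hot.push(format!("jp {label}"));
        self.blocks.push(format!("{label}:\n{body}"));
        label
    }

    /// Flush after the handler body so cold code never sits on the hot path.
    fn flush(self, out: &mut Vec<String>) {
        out.extend(self.blocks);
    }
}
```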
Replace the pattern of 64-bit arithmetic + sign-extend + compare
with dedicated 32-bit overflow instructions that use the hardware
overflow flag directly.
Before: add t3, t4 / unbox_int32 t5, t3 / branch_ne t3, t5, .overflow
After: add32_overflow t3, t4, .overflow
On x86_64 this compiles to `add r32, r32; jo label` (the 32-bit
register write implicitly zeros the upper 32 bits). On aarch64,
`adds w, w, w; b.vs label` for add (`subs` for sub), `smull + sxtw + cmp + b.ne`
for multiply, and `negs + b.vs` for negate.
Nine call sites updated: Add, Sub, Mul, Increment, Decrement,
PostfixIncrement, PostfixDecrement, UnaryMinus, and CallBuiltin(abs).
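The semantics of add32_overflow, sketched with Rust's checked arithmetic (the name mirrors the DSL instruction; the Result shape is illustrative):

```rust
// add r32, r32; jo / adds w,w,w; b.vs — either take the overflow branch,
// or keep the 32-bit result, whose register write clears the upper half.
fn add32_overflow(a: i32, b: i32) -> Result<u64, ()> {
    match a.checked_add(b) {
        Some(r) => Ok(r as u32 as u64), // upper 32 bits implicitly zero
        None => Err(()),                // overflow branch taken
    }
}
```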
AsmIntGen is a Rust tool that compiles a custom assembly DSL into
native x86_64 or aarch64 assembly (.S files). It reads struct field
offsets from a generated constants file and instruction layouts from
Bytecode.def (via the BytecodeDef crate) to emit platform-specific
code for each bytecode handler.
The DSL provides a portable instruction set with register aliases,
field access syntax, labels, conditionals, and calls. Each backend
(codegen_x86_64.rs, codegen_aarch64.rs) translates this into the
appropriate platform assembly with correct calling conventions
(SysV AMD64, AAPCS64).