Commit Graph

15 Commits

Author SHA1 Message Date
Andreas Kling
c3deaa4746 AsmIntGen: Panic on missing constants
Replace all unwrap_or(0) and parse().unwrap_or(0) calls in the
asmint code generator with expect()/panic! so that missing
constants or unparseable literals cause a build-time failure
instead of silently generating wrong code.
2026-03-08 23:04:55 +01:00
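
A minimal sketch of the difference between the silent default and the hard failure described in c3deaa4746. The function names and the `HashMap` of constants are hypothetical and are not taken from the AsmIntGen sources.

```rust
use std::collections::HashMap;

/// Old behaviour (sketch): a missing constant silently becomes 0,
/// so the generator would emit a wrong offset without complaint.
fn lookup_lenient(constants: &HashMap<String, i64>, name: &str) -> i64 {
    constants.get(name).copied().unwrap_or(0)
}

/// New behaviour (sketch): a missing constant aborts code generation.
fn lookup_strict(constants: &HashMap<String, i64>, name: &str) -> i64 {
    match constants.get(name) {
        Some(value) => *value,
        None => panic!("missing constant: {}", name),
    }
}

/// Same idea for literals: an unparseable literal is a hard error.
fn parse_literal_strict(text: &str) -> i64 {
    match text.parse::<i64>() {
        Ok(value) => value,
        Err(err) => panic!("unparseable literal {:?}: {}", text, err),
    }
}
```

Because the generator runs as part of the build, a panic here surfaces as a build failure rather than as wrong generated assembly.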
Andreas Kling
30d7b7db20 AsmIntGen: Fix aarch64 mul32_overflow leaving garbage in upper 32 bits
The smull instruction writes a 64-bit result to the destination
register. For negative results like 1 * -1 = -1, this means the
upper 32 bits are all 1s (sign extension of the 64-bit value).

The subsequent box_int32_clean assumed the upper 32 bits were
already zero, so it just set the NaN-boxing tag with movk. This
produced a corrupted Value where strict equality (===) would fail
even though the numeric value was correct.

Fix this by adding a mov wN, wN after the overflow check to
zero-extend the 32-bit result, matching what add32_overflow and
sub32_overflow already do by writing to W registers.
2026-03-08 23:04:55 +01:00
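
The bug in 30d7b7db20 can be reproduced in a small register-level model. This is only a sketch: the helper names are hypothetical, and `INT32_TAG = 0x7FFA` is an assumption based on the order of the tag list in 949454feb9.

```rust
/// Assumed tag for boxed int32 values (see lead-in).
const INT32_TAG: u64 = 0x7FFA;

/// Model of aarch64 `smull xD, wA, wB`: the full 64-bit signed product.
fn smull(a: i32, b: i32) -> u64 {
    (i64::from(a) * i64::from(b)) as u64
}

/// Model of `mov wD, wD`: writing a W register zeroes bits 32..63.
fn mov_w(reg: u64) -> u64 {
    u64::from(reg as u32)
}

/// Model of `movk xD, #tag, lsl #48`: replaces only bits 48..63.
fn movk_lsl48(reg: u64, tag: u64) -> u64 {
    (reg & 0x0000_FFFF_FFFF_FFFF) | (tag << 48)
}

/// Before the fix: the tag is inserted over a sign-extended product.
fn box_product_buggy(a: i32, b: i32) -> u64 {
    movk_lsl48(smull(a, b), INT32_TAG)
}

/// After the fix: the product is zero-extended first.
fn box_product_fixed(a: i32, b: i32) -> u64 {
    movk_lsl48(mov_w(smull(a, b)), INT32_TAG)
}
```

For `1 * -1` the buggy path leaves bits 32..47 set, so the boxed value differs bitwise from a correctly boxed `-1` even though its low 32 bits are identical.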
Andreas Kling
d5eed2632f AsmInt: Add branch_zero32/branch_nonzero32 to the asmint DSL
These test only the low 32 bits of a register, replacing the previous
pattern of `and reg, 0xFFFFFFFF` followed by `branch_zero` or
`branch_nonzero`.

On aarch64 the old pattern emitted `mov w1, w1; cbnz x1` (2 insns),
now it's just `cbnz w1` (1 insn). Used in JumpIf, JumpTrue, JumpFalse,
and Not for the int32 truthiness fast path.
2026-03-08 23:04:55 +01:00
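
In terms of plain integer operations, the two patterns from d5eed2632f test the same condition. The sketch below uses hypothetical function names and an arbitrary example tag in the upper bits.

```rust
/// Old DSL pattern: `and reg, 0xFFFFFFFF`, then `branch_zero` on the
/// full 64-bit register.
fn is_zero_masked(reg: u64) -> bool {
    (reg & 0xFFFF_FFFF) == 0
}

/// New DSL instruction: `branch_zero32` looks at the low 32 bits only,
/// so no separate masking step is needed.
fn is_zero32(reg: u64) -> bool {
    reg as u32 == 0
}
```

Both return `true` for a boxed int32 zero, whatever tag sits in the upper bits, which is what the int32 truthiness fast path relies on.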
Andreas Kling
2db4d30e56 AsmIntGen: Stop maintaining w25 (pc) on every asmint dispatch on aarch64
x21 (instruction pointer = pb + pc) is already the primary dispatch
register. Maintaining w25 (the 32-bit pc offset) in parallel on every
dispatch_next, goto_handler, and dispatch_variable was redundant.

Compute the 32-bit pc on demand via `sub w1, w21, w26` only when
calling into C++ (slow paths), which is the cold path. This removes
one instruction from every hot dispatch sequence and every jump target.

The generated output shrinks from 4692 to 4345 lines (347 lines, i.e.
roughly 347 instructions, removed), with every handler benefiting from
shorter dispatch tails.
2026-03-08 23:04:55 +01:00
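
The on-demand computation in 2db4d30e56 works because `sub w1, w21, w26` is a 32-bit subtraction that wraps, so the low halves of the two pointers are enough to recover the offset. A sketch with a hypothetical function name:

```rust
/// Model of `sub w1, w21, w26`: recover the 32-bit pc offset from the
/// instruction pointer (ip = pb + pc) and the program base (pb),
/// using only the low 32 bits of each register.
fn pc_on_demand(ip: u64, pb: u64) -> u32 {
    (ip as u32).wrapping_sub(pb as u32)
}
```

The result stays correct even when `pb + pc` carries out of the low 32 bits of the address, since the wrapping subtraction undoes that carry.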
Andreas Kling
949454feb9 AsmIntGen: Pin {INT32,BOOLEAN,NAN_BASE}_TAG in callee-saved registers
These three tag constants (0x7FFA, 0x7FF9, 0x7FF8) exceed the 12-bit
cmp immediate range on aarch64, so every comparison required a mov+cmp
pair. Pin them in x22, x23, x24 (callee-saved, previously unused) to
turn ~160 two-instruction sequences into single cmp instructions.
2026-03-08 23:04:55 +01:00
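
Why the three tags in 949454feb9 need a register can be checked with a small predicate. This is a sketch with a hypothetical function name; it models the aarch64 add/sub/cmp immediate form, which takes a 12-bit unsigned value optionally shifted left by 12.

```rust
/// Returns true if `value` can be encoded directly as the immediate of
/// an aarch64 `cmp` (a 12-bit unsigned value, optionally `lsl #12`).
fn fits_cmp_immediate(value: u64) -> bool {
    let plain = value <= 0xFFF;
    let shifted = (value & 0xFFF) == 0 && (value >> 12) <= 0xFFF;
    plain || shifted
}
```

None of 0x7FFA, 0x7FF9, or 0x7FF8 passes this check, so each comparison previously needed the constant materialized in a scratch register first.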
Andreas Kling
368efef620 AsmIntGen: Support [pb, pc, field] three-operand memory access
Teach the DSL and both arch backends to handle memory operands of
the form [pb, pc, field_ref], meaning base + index + field_offset.

On aarch64, since x21 already caches pb + pc (the instruction
pointer), this emits a single `ldr dst, [x21, #offset]` instead of
the previous `mov t0, x21` + `ldr dst, [t0, #offset]` two-instruction
sequence.

On x86_64, this emits `[r14 + r13 + offset]` which is natively
supported by x86 addressing modes.

Convert all `lea t0, [pb, pc]` + `loadNN tX, [t0, field]` pairs in
the DSL to the new single-instruction form, saving one instruction
per IC access and other field loads in GetById, PutById, GetLength,
GetGlobal, SetGlobal, and CallBuiltin handlers.
2026-03-08 10:27:13 +01:00
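
A rough sketch of how the two backends in 368efef620 might render a `[pb, pc, field]` operand. The function names are hypothetical, and the real generator's types and output formatting are not shown in the log; only the register choices (x21 on aarch64, r14 and r13 on x86_64) come from the commit message.

```rust
/// aarch64: x21 already holds pb + pc, so only the field offset remains.
fn lower_pb_pc_field_aarch64(field_offset: u32) -> String {
    format!("[x21, #{}]", field_offset)
}

/// x86_64: base + index + displacement is a native addressing mode.
fn lower_pb_pc_field_x86_64(field_offset: u32) -> String {
    format!("[r14 + r13 + {}]", field_offset)
}
```

Either way the load becomes a single instruction, with no scratch register holding `pb + pc`.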
Andreas Kling
8936cda523 AsmIntGen: Cache pb+pc in callee-saved x21 on aarch64
Pin x21 = pb + pc (the instruction pointer) as a callee-saved register
that survives C++ calls. x21 is set during dispatch and remains valid
throughout the entire handler.

This eliminates redundant `add x9, x26, x25` instructions from every
load_operand, store_operand, load_label, and dispatch_next sequence.
Also optimizes `lea dst, [pb, pc]` to `mov dst, x21`.

For dispatch_next, the next opcode is loaded via `ldrb w9, [x21, #size]`
and x21 is updated incrementally (`add x21, x21, #size`), which also
improves the dependency chain vs recomputing from x26 + x25.

dispatch_current is promoted from a DSL macro to a codegen instruction
so it can set x21 for the next handler.
2026-03-07 22:18:22 +01:00
Andreas Kling
8afa1df951 AsmIntGen: Pin canonical NaN in callee-saved d8 on aarch64
Load CANON_NAN_BITS into d8 (a callee-saved FP register) at
interpreter entry. This avoids materializing the 64-bit constant
in every canonicalize_nan cold fixup block.

Before: cold block was `movz x9, ... / movk x9, ... / b ret`
After:  cold block is just `fmov xD, d8 / b ret`

The hot path (fmov + fcmp + b.vs) is unchanged. The constant is
only needed when the result is actually NaN, which is rare, but
this still shrinks code size and avoids the multi-instruction
immediate materialization at 11 call sites.
2026-03-07 22:18:22 +01:00
Andreas Kling
e486ad2c0c AsmIntGen: Use platform-optimal codegen for NaN-boxing operations
Convert extract_tag, unbox_int32, unbox_object, box_int32, and
box_int32_clean from DSL macros into codegen instructions, allowing
each backend to emit optimal platform-specific code.

On aarch64, this produces significant improvements:

- extract_tag: single `lsr xD, xS, #48` instead of `mov` + `lsr`
  (3-operand shifts are free on ARM). Saves 1 instruction at 57
  call sites.

- unbox_object: single `and xD, xS, #0xffffffffffff` instead of
  `mov` + `shl` + `shr`. The 48-bit mask is a valid ARM64 logical
  immediate. Saves 2 instructions at 6 call sites.

- box_int32: `mov wD, wS` + `movk xD, #tag, lsl #48` instead of
  `mov` + `and 0xFFFFFFFF` + `movabs tag` + `or`. The w-register
  mov zero-extends, and movk overwrites just the top 16 bits.
  Saves 2 instructions and no longer clobbers t0 (rax).

- box_int32_clean: `movk xD, #tag, lsl #48` (1 instruction) instead
  of `mov` + `movabs tag` + `or` (saves 2 instructions, no t0
  clobber).

On x86_64, the generated code is equivalent to the old macros.
2026-03-07 22:18:22 +01:00
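
The aarch64 sequences listed in e486ad2c0c correspond to the following bit operations. This is a sketch rather than the generator's code: the function names mirror the DSL names, `unbox_object_old` is a hypothetical name for the previous shift pair, and `INT32_TAG = 0x7FFA` is assumed from the tag list in 949454feb9.

```rust
/// Assumed tag for boxed int32 values (see lead-in).
const INT32_TAG: u64 = 0x7FFA;
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// `lsr xD, xS, #48`: the tag is the top 16 bits.
fn extract_tag(value: u64) -> u64 {
    value >> 48
}

/// `and xD, xS, #0xffffffffffff`: keep the 48-bit payload.
fn unbox_object(value: u64) -> u64 {
    value & PAYLOAD_MASK
}

/// Previous form: shift left then right by 16 to drop the tag.
fn unbox_object_old(value: u64) -> u64 {
    (value << 16) >> 16
}

/// `mov wD, wS` + `movk xD, #tag, lsl #48`: zero-extend, then set the tag.
fn box_int32(value: u64) -> u64 {
    u64::from(value as u32) | (INT32_TAG << 48)
}

/// `movk xD, #tag, lsl #48` alone: only valid when bits 32..47 are
/// already zero, because movk leaves them untouched.
fn box_int32_clean(value: u64) -> u64 {
    (value & PAYLOAD_MASK) | (INT32_TAG << 48)
}
```

`box_int32` and `box_int32_clean` agree only for inputs whose upper 32 bits are already zero, which is the precondition the "clean" variant relies on.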
Andreas Kling
3e7fa8b09a AsmInt: Use 32-bit NOT in BitwiseNot handler
Add a not32 DSL instruction that operates on the 32-bit sub-register,
zeroing the upper 32 bits (x86_64: not r32, aarch64: mvn w_reg).

Use it in BitwiseNot to avoid the sign-extension (unbox_int32), 64-bit
NOT, and explicit AND 0xFFFFFFFF. The 32-bit NOT produces a clean
upper half, so we can use box_int32_clean directly.

Before: movsxd + not r64 + and 0xFFFFFFFF + and 0xFFFFFFFF + or tag
After:  mov + not r32 + or tag
2026-03-07 22:18:22 +01:00
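
The before and after sequences in 3e7fa8b09a compute the same boxed result, which a small model can confirm. The function names are hypothetical and `INT32_TAG = 0x7FFA` is assumed from the tag list in 949454feb9.

```rust
/// Assumed tag for boxed int32 values (see lead-in).
const INT32_TAG: u64 = 0x7FFA;

/// Before: sign-extend, 64-bit NOT, mask to 32 bits, then tag.
/// (The original sequence masks twice; once is enough for the model.)
fn bitwise_not_old(boxed: u64) -> u64 {
    let value = i64::from(boxed as u32 as i32); // movsxd
    let inverted = (!value) as u64; // not r64
    (inverted & 0xFFFF_FFFF) | (INT32_TAG << 48) // and + or tag
}

/// After: 32-bit NOT leaves the upper half zero, so the tag can be
/// applied directly.
fn bitwise_not_new(boxed: u64) -> u64 {
    let inverted = !(boxed as u32); // not r32 / mvn w_reg
    u64::from(inverted) | (INT32_TAG << 48) // or tag
}
```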
Andreas Kling
6492c88ad8 AsmIntGen: Elide redundant FP comparisons in consecutive branch_fp_*
When consecutive branch_fp_* instructions use the same operands (e.g.
branch_fp_unordered followed by branch_fp_equal), the 2nd ucomisd/fcmp
is redundant since flags are still valid from the first comparison.

Track the last FP comparison operands in HandlerState and skip the
comparison instruction when it would be identical. This is common in
the double_equality_compare macro which checks for unordered (NaN)
before testing equality.
2026-03-07 22:18:22 +01:00
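
One way the tracking described in 6492c88ad8 could look is sketched below. Only the `HandlerState` name comes from the commit message; its fields, the method names, and the invalidation policy (any other instruction clears the cached comparison) are assumptions made for illustration.

```rust
#[derive(Default)]
struct HandlerState {
    /// Operands of the most recent FP comparison whose flags are still valid.
    last_fp_compare: Option<(String, String)>,
    /// Emitted assembly lines (aarch64 mnemonics used for illustration).
    output: Vec<String>,
}

impl HandlerState {
    /// Emit a branch_fp_* instruction, skipping the compare when the
    /// flags from an identical earlier comparison are still valid.
    fn emit_branch_fp(&mut self, lhs: &str, rhs: &str, branch: &str) {
        let key = (lhs.to_string(), rhs.to_string());
        if self.last_fp_compare.as_ref() != Some(&key) {
            self.output.push(format!("fcmp {}, {}", lhs, rhs));
            self.last_fp_compare = Some(key);
        }
        self.output.push(branch.to_string());
    }

    /// Conservatively treat every other instruction as clobbering the flags.
    fn emit_other(&mut self, instruction: &str) {
        self.last_fp_compare = None;
        self.output.push(instruction.to_string());
    }
}
```

With this model, an unordered check followed by an equality check on the same operands emits one `fcmp` and two branches.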
Andreas Kling
472edb3448 AsmIntGen: Use mov r32 for unsigned 32-bit immediates on x86_64
Values in the range 0x80000000..0xFFFFFFFF were incorrectly emitted
as plain `mov r64, imm` which GAS encodes as a 10-byte movabs. Use
`mov r32, imm32` instead (5 bytes, implicitly zero-extends to 64
bits). This affects constants like ENVIRONMENT_COORDINATE_INVALID
(0xFFFFFFFE) which appeared 5 times in the generated assembly.
2026-03-07 22:18:22 +01:00
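
The selection rule behind 472edb3448 can be sketched as a classification of the immediate. The enum and function names are hypothetical, and the middle case (sign-extended 32-bit immediates) is an assumption about how other values are handled, not something the commit message states.

```rust
#[derive(Debug, PartialEq)]
enum MovEncoding {
    /// `mov r32, imm32`: the 32-bit write zero-extends to 64 bits.
    Mov32ZeroExtend,
    /// `mov r64, imm32` with a sign-extended 32-bit immediate.
    Mov64SignExtend,
    /// `movabs r64, imm64`: full 64-bit immediate.
    Movabs,
}

fn choose_mov(imm: u64) -> MovEncoding {
    let signed = imm as i64;
    if imm <= 0xFFFF_FFFF {
        // Includes 0x80000000..=0xFFFFFFFF, the range the commit fixes.
        MovEncoding::Mov32ZeroExtend
    } else if signed < 0 && signed >= i64::from(i32::MIN) {
        MovEncoding::Mov64SignExtend
    } else {
        MovEncoding::Movabs
    }
}
```

Under this rule a constant such as 0xFFFFFFFE takes the short zero-extending form instead of the 10-byte `movabs`.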
Andreas Kling
c6fd52e317 AsmIntGen: Move NaN canonicalization to cold fixup blocks
canonicalize_nan previously emitted its full NaN fixup inline:
on x86_64, a 10-byte movabs + cmovp; on aarch64, a multi-instruction
mov sequence + fcsel. These were always on the hot path even though
NaN results from arithmetic are extremely rare.

Move the NaN fixup to a cold block emitted after the handler body.
The hot path is now just: movq/fmov + ucomisd/fcmp + jp/b.vs (a
forward branch predicted not-taken). This removes 14 bytes of
instructions from the hot path of every handler that produces
double results (Add, Sub, Mul, Div, and several builtins).

Both backends gain a HandlerState struct (shared between them) that
accumulates cold fixup blocks during code generation, emitted after
the main body.
2026-03-07 22:18:22 +01:00
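
The hot/cold split from c6fd52e317 has a rough high-level analogue in Rust, sketched below. `CANON_NAN_BITS` is assumed here to be the standard quiet NaN pattern 0x7FF8000000000000; the actual value is not given in the log. The `canonical_nan_fixup` name is hypothetical.

```rust
/// Assumed canonical NaN bit pattern (see lead-in).
const CANON_NAN_BITS: u64 = 0x7FF8_0000_0000_0000;

/// Cold path: only reached when the result really is NaN.
#[cold]
#[inline(never)]
fn canonical_nan_fixup() -> u64 {
    CANON_NAN_BITS
}

/// Hot path: move the bits out, test for NaN, and branch to the cold
/// fixup only in the rare NaN case.
fn canonicalize_nan(result: f64) -> u64 {
    if result.is_nan() {
        canonical_nan_fixup()
    } else {
        result.to_bits()
    }
}
```

Any NaN payload, including negative or signalling patterns, collapses to the single canonical encoding, while ordinary doubles pass through unchanged.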
Andreas Kling
5b8114a96b AsmInt: Use hardware overflow flag for int32 arithmetic
Replace the pattern of 64-bit arithmetic + sign-extend + compare
with dedicated 32-bit overflow instructions that use the hardware
overflow flag directly.

Before: add t3, t4 / unbox_int32 t5, t3 / branch_ne t3, t5, .overflow
After:  add32_overflow t3, t4, .overflow

On x86_64 this compiles to `add r32, r32; jo label` (the 32-bit
register write implicitly zeros the upper 32 bits). On aarch64,
`adds w, w, w; b.vs label` for add (`subs` for sub), `smull + sxtw +
cmp + b.ne` for multiply, and `negs + b.vs` for negate.

Nine call sites updated: Add, Sub, Mul, Increment, Decrement,
PostfixIncrement, PostfixDecrement, UnaryMinus, and CallBuiltin(abs).
2026-03-07 22:18:22 +01:00
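
The old and new overflow checks from 5b8114a96b can be compared in a small model. The function names are hypothetical; Rust's `checked_add` stands in for the hardware overflow flag, and the multiply follows the aarch64 `smull + sxtw + cmp` description.

```rust
/// Old pattern: 64-bit add, sign-extend the low 32 bits, compare.
fn add_via_sign_extend(a: i32, b: i32) -> Option<i32> {
    let wide = i64::from(a) + i64::from(b); // add t3, t4
    let narrowed = i64::from(wide as i32); // unbox_int32 t5, t3
    if wide != narrowed {
        None // branch_ne t3, t5, .overflow
    } else {
        Some(wide as i32)
    }
}

/// New pattern: 32-bit add that reports overflow directly.
fn add32_overflow(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// aarch64 multiply: widen, sign-extend the low half, compare.
fn mul32_overflow(a: i32, b: i32) -> Option<i32> {
    let product = i64::from(a) * i64::from(b); // smull
    let narrowed = i64::from(product as i32); // sxtw
    if product != narrowed {
        None // cmp + b.ne
    } else {
        Some(product as i32)
    }
}
```

Both add variants agree on every input; the new one simply gets the answer from the flag the hardware already computes.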
Andreas Kling
9ae5445493 LibJS: Add AsmIntGen assembly interpreter code generator
AsmIntGen is a Rust tool that compiles a custom assembly DSL into
native x86_64 or aarch64 assembly (.S files). It reads struct field
offsets from a generated constants file and instruction layouts from
Bytecode.def (via the BytecodeDef crate) to emit platform-specific
code for each bytecode handler.

The DSL provides a portable instruction set with register aliases,
field access syntax, labels, conditionals, and calls. Each backend
(codegen_x86_64.rs, codegen_aarch64.rs) translates this into the
appropriate platform assembly with correct calling conventions
(SysV AMD64, AAPCS64).
2026-03-07 13:09:59 +01:00