Commit Graph

32 Commits

Author SHA1 Message Date
Andreas Kling
583fa475fb LibJS: Call RawNativeFunction directly from asm Call
The asm interpreter already inlines ECMAScript calls, but builtin calls
still went through the generic C++ Call slow path even when the callee
was a plain native function pointer. That added an avoidable boundary
around hot builtin calls and kept asm from taking full advantage of the
new RawNativeFunction representation.

Teach the asm Call handler to recognize RawNativeFunction, allocate the
callee frame on the interpreter stack, copy the call-site arguments,
and jump straight to the stored C++ entry point.
NativeJavaScriptBackedFunction and other non-raw callees keep falling
through to the existing C++ slow path unchanged.
2026-04-15 15:57:48 +02:00
Andreas Kling
517812647a LibJS: Pack asm Call shared-data metadata
Pack the asm Call fast path metadata next to the executable pointer
so the interpreter can fetch both values with one paired load. This
removes several dependent shared-data loads from the hot path.

Keep the executable pointer and packed metadata in separate registers
through the `this` binding so the fast path can still use the
paired-load layout after any non-strict `this` adjustment.

Lower the packed metadata flag checks correctly on x86_64 as well.
Those bits now live above bit 31, so the generator uses bt for single-
bit high masks and covers that path with a unit test.

Add a runtime test that exercises both object and global this binding
through the asm Call fast path.
2026-04-14 12:37:12 +02:00
Andreas Kling
fa931612e1 LibJS: Pair-store the asm Call frame setup
Teach the asm Call fast path to use paired stores for the fixed
ExecutionContext header writes and for the caller linkage fields.
This also initializes the five reserved Value slots directly instead
of looping over them as part of the general register clear path.

That keeps the hot frame setup work closer to the actual data layout:
reserved registers are seeded with a couple of fixed stores, while the
remaining register and local slots are cleared in wider chunks.

On x86_64, keep the new explicit-offset formatting on store_pair*
and load_pair* without changing ordinary [base, index, scale]
operands into base-plus-index-plus-offset addresses. Add unit
tests covering both the paired zero-offset form and the preserved
scaled-index lowering.
2026-04-14 12:37:12 +02:00
Andreas Kling
fcbbc6a4b8 LibJS: Add paired stores to the AsmInt DSL
Teach AsmIntGen about store_pair32 and store_pair64 so hot handlers
can describe adjacent writes just as explicitly as adjacent reads.
The DSL now requires naming both memory operands and rejects
non-adjacent or reordered pairs at code generation time.

On aarch64 the new instructions lower to stp when the address is
encodable, while x86_64 keeps the same semantics with two scalar
stores. The shared validation keeps the paired access rules consistent
across both load and store primitives.
2026-04-14 12:37:12 +02:00
Andreas Kling
ce753047b0 LibJS: Add verifiable paired loads to the AsmInt DSL
Add load_pair32 and load_pair64 to the AsmInt DSL and make the
generator verify that both named memory operands are truly adjacent.
That keeps paired loads self-documenting in the DSL instead of
hiding the second field behind an implicit adjacency assumption.

AArch64 now lowers valid pairs to ldp when the address form allows
it, while x86_64 keeps the same behavior with two obvious scalar loads.
Add unit tests for the shared validator so reversed or non-adjacent
field pairs are rejected during code generation.
2026-04-14 12:37:12 +02:00
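The adjacency rule this commit describes can be sketched in plain Rust. This is an illustrative model, not the generator's actual code; `FieldRef` and `validate_pair` are hypothetical names.

```rust
// Sketch of the paired-access rule: the second memory operand must
// start exactly where the first one ends, in that order, at the same
// width. Reversed or non-adjacent pairs are rejected at generation time.
#[derive(Clone, Copy)]
struct FieldRef {
    offset: u32, // byte offset within the struct
    size: u32,   // access width in bytes (4 for pair32, 8 for pair64)
}

fn validate_pair(a: FieldRef, b: FieldRef) -> Result<(), &'static str> {
    if a.size != b.size {
        return Err("paired operands must have the same width");
    }
    if b.offset != a.offset + a.size {
        return Err("paired operands must be adjacent and in order");
    }
    Ok(())
}

fn main() {
    let lo = FieldRef { offset: 0, size: 8 };
    let hi = FieldRef { offset: 8, size: 8 };
    assert!(validate_pair(lo, hi).is_ok());
    // Reversed and non-adjacent pairs are both rejected.
    assert!(validate_pair(hi, lo).is_err());
    assert!(validate_pair(lo, FieldRef { offset: 16, size: 8 }).is_err());
}
```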
Andreas Kling
4405c52042 LibJS: Zero-extend 32-bit AArch64 asm immediates
Teach the AArch64 AsmInt generator to materialize immediates through
w-register writes when the upper 32 bits are known zero.

That keeps the same x-register value while letting common constants
use shorter instruction sequences.
2026-04-14 08:14:43 +02:00
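The selection predicate here is simple enough to sketch: on AArch64, writing a w-register zero-extends into the full x-register, so the w-form is safe exactly when the upper 32 bits of the immediate are known zero. A minimal model (function name illustrative):

```rust
// If the upper 32 bits of the immediate are zero, materializing it
// through the w-register yields the same x-register value, because
// 32-bit register writes zero-extend on AArch64.
fn fits_in_w_register(imm: u64) -> bool {
    imm >> 32 == 0
}

fn main() {
    assert!(fits_in_w_register(0));
    assert!(fits_in_w_register(0xFFFF_FFFF));
    assert!(!fits_in_w_register(0x1_0000_0000));
}
```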
Andreas Kling
960a36db53 LibJS: Lower zero store immediates to zero registers on AArch64
Teach the AArch64 AsmInt generator to lower zero-immediate stores
through xzr or wzr instead of materializing a temporary register.

This covers store64 as well as the narrow store8, store16, and
store32 forms, keeping the generated code shorter on the zero
store fast path.
2026-04-14 08:14:43 +02:00
Andreas Kling
87797e9161 LibJS: Use tbz and tbnz for single-bit asm branches
AsmIntGen already lowers branch_zero and branch_nonzero to the compact
AArch64 branch-on-bit forms when possible, but branch_bits_set and
branch_bits_clear still expanded single-bit immediates into tst plus a
separate conditional branch.

Teach the AArch64 backend to recognize power-of-two masks and emit
tbnz or tbz directly. This shortens several hot interpreter paths.
2026-04-14 08:14:43 +02:00
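The power-of-two recognition this lowering needs can be sketched as follows; `tbnz`/`tbz` take the bit index, which for a single-bit mask is its trailing-zero count. The function name is illustrative, not the backend's actual API.

```rust
// A single-bit mask is a nonzero power of two; its bit index (the
// operand tbz/tbnz need) is the trailing-zero count of the mask.
fn single_bit_index(mask: u64) -> Option<u32> {
    if mask != 0 && mask & (mask - 1) == 0 {
        Some(mask.trailing_zeros())
    } else {
        None
    }
}

fn main() {
    assert_eq!(single_bit_index(0x1), Some(0));
    assert_eq!(single_bit_index(0x8000_0000_0000_0000), Some(63));
    assert_eq!(single_bit_index(0x6), None); // two bits: keep tst + branch
    assert_eq!(single_bit_index(0), None);
}
```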
Andreas Kling
b1dab18e42 LibJS: Teach AsmIntGen helper primitives
Add load_vm, memory-operand macro substitution, and a generic
inc32_mem instruction to the AsmInt DSL.

Also drop redundant mov reg, reg copies in the backends so handlers
that use the new helpers expand to cleaner assembly.
2026-04-14 08:14:43 +02:00
Andreas Kling
2ca7dfa649 LibJS: Move bytecode interpreter state to VM
The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
2026-04-13 18:29:43 +02:00
Andreas Kling
114eeddea1 AsmIntGen: Avoid clobbering r11 in store_operand
The x86_64 asm interpreter mapped t8 to r11, but store_operand
also used r11 as its scratch register for operand loads. When a
handler stored a JS value from t8, the scratch load overwrote the
value first and wrote raw operand bits into the register file.
2026-04-10 15:12:53 +02:00
Andreas Kling
1ff61754a7 LibJS: Re-box double arithmetic results as Int32 when possible
When the asmint computes a double result for Add, Sub, Mul,
Math.floor, Math.ceil, or Math.sqrt, try to store it as Int32
if the value is a whole number in [INT32_MIN, INT32_MAX] and
not -0.0. This mirrors the JS::Value(double) constructor and
allows downstream int32 fast paths to fire.

Also add label uniquification to the DSL macro expander so the
same macro can be used multiple times in one handler without
label collisions.
2026-03-19 09:42:04 +01:00
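The re-boxing condition described above (a whole number in `[INT32_MIN, INT32_MAX]`, excluding `-0.0`) can be sketched directly; `as_int32` is an illustrative name, not LibJS's actual helper.

```rust
// Mirror of the Int32 re-boxing rule: store the double as Int32 only
// if it is a whole number in int32 range and not negative zero.
fn as_int32(d: f64) -> Option<i32> {
    if d.trunc() == d
        && d >= i32::MIN as f64
        && d <= i32::MAX as f64
        && !(d == 0.0 && d.is_sign_negative())
    {
        Some(d as i32)
    } else {
        None
    }
}

fn main() {
    assert_eq!(as_int32(3.0), Some(3));
    assert_eq!(as_int32(-2147483648.0), Some(i32::MIN));
    assert_eq!(as_int32(3.5), None);
    assert_eq!(as_int32(-0.0), None); // -0.0 must stay a double
    assert_eq!(as_int32(f64::NAN), None);
}
```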
Andreas Kling
5e403af5be LibJS: Tighten asmint ToInt32 boxing
Teach js_to_int32 to leave a clean low 32-bit result on success, then
use box_int32_clean in the ToInt32 fast path and adjacent boolean
coercions. This removes one instruction from the AArch64 fjcvtzs path
and trims the boolean boxing path without changing behavior.
2026-03-19 09:42:04 +01:00
Andreas Kling
645f481825 LibJS: Fast-path Float32Array indexed access
Add the small AsmIntGen float32 load, store, and conversion operations
needed to handle Float32Array directly in the AsmInt typed-array
GetByValue and PutByValue paths.

This covers direct indexed reads plus both int32 and double stores,
and adds regression coverage for Math.fround rounding, negative zero,
and NaN.
2026-03-19 09:42:04 +01:00
Luke Wilde
75725e283d LibJS/AsmIntGen: Emit unwind info for AArch64
This restores backtrace functionality on AArch64 when crossing a
bytecode call frame, where it would previously stop.
2026-03-16 19:30:40 -05:00
Luke Wilde
1378d37e92 LibJS/AsmIntGen: Emit unwind info for x86_64
This restores backtrace functionality on x86_64 when crossing a
bytecode call frame, where it would previously stop.
2026-03-16 19:30:40 -05:00
Timothy Flynn
b894278295 LibJS: Use the 2024 Rust edition for AsmIntGen and ByteCodeDef
2026-03-11 16:57:09 -04:00
Andreas Kling
c3deaa4746 AsmIntGen: Panic on missing constants
Replace all unwrap_or(0) and parse().unwrap_or(0) calls in the
asmint code generator with expect()/panic! so that missing
constants or unparseable literals cause a build-time failure
instead of silently generating wrong code.
2026-03-08 23:04:55 +01:00
Andreas Kling
30d7b7db20 AsmIntGen: Fix aarch64 mul32_overflow leaving garbage in upper 32 bits
The smull instruction writes a 64-bit result to the destination
register. For negative results like 1 * -1 = -1, this means the
upper 32 bits are all 1s (sign extension of the 64-bit value).

The subsequent box_int32_clean assumed the upper 32 bits were
already zero, so it just set the NaN-boxing tag with movk. This
produced a corrupted Value where strict equality (===) would fail
even though the numeric value was correct.

Fix this by adding a mov wN, wN after the overflow check to
zero-extend the 32-bit result, matching what add32_overflow and
sub32_overflow already do by writing to W registers.
2026-03-08 23:04:55 +01:00
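The corruption is straightforward to reproduce in plain bit arithmetic. The sketch below models `smull` and a `movk`-style `box_int32_clean` (tag value `0x7FFA`, as given in the pinned-constants commit further down); names are illustrative.

```rust
const INT32_TAG: u64 = 0x7FFA; // INT32_TAG from the pinned-constants commit

// smull: 64-bit product of two 32-bit inputs, sign-extended.
fn smull(a: i32, b: i32) -> u64 {
    (a as i64 * b as i64) as u64
}

// movk xD, #tag, lsl #48: overwrite only the top 16 bits with the tag.
// This is only correct if bits 32..47 are already zero.
fn box_int32_clean(bits: u64) -> u64 {
    (bits & 0x0000_FFFF_FFFF_FFFF) | (INT32_TAG << 48)
}

fn main() {
    let raw = smull(1, -1);
    assert_eq!(raw, 0xFFFF_FFFF_FFFF_FFFF); // upper 32 bits are all 1s
    let correct = box_int32_clean(-1i32 as u32 as u64);
    // Boxing without zero-extending first produces a corrupted Value:
    assert_ne!(box_int32_clean(raw), correct);
    // The fix (`mov wN, wN`) zero-extends before boxing:
    assert_eq!(box_int32_clean(raw as u32 as u64), correct);
}
```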
Andreas Kling
d5eed2632f AsmInt: Add branch_zero32/branch_nonzero32 to the asmint DSL
These test only the low 32 bits of a register, replacing the previous
pattern of `and reg, 0xFFFFFFFF` followed by `branch_zero` or
`branch_nonzero`.

On aarch64 the old pattern emitted `mov w1, w1; cbnz x1` (2 insns),
now it's just `cbnz w1` (1 insn). Used in JumpIf, JumpTrue, JumpFalse,
and Not for the int32 truthiness fast path.
2026-03-08 23:04:55 +01:00
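The semantics of the new instructions reduce to a truncating test; a one-line sketch (illustrative name):

```rust
// branch_nonzero32 tests only the low 32 bits of the register, which
// is what both the old `and reg, 0xFFFFFFFF` + branch pattern and the
// new single `cbnz w1` compute.
fn low32_is_nonzero(reg: u64) -> bool {
    reg as u32 != 0
}

fn main() {
    assert!(low32_is_nonzero(0x0000_0000_0000_0001));
    assert!(!low32_is_nonzero(0xFFFF_FFFF_0000_0000)); // high bits ignored
}
```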
Andreas Kling
2db4d30e56 AsmIntGen: Stop maintaining w25 (pc) on every asmint dispatch on aarch64
x21 (instruction pointer = pb + pc) is already the primary dispatch
register. Maintaining w25 (the 32-bit pc offset) in parallel on every
dispatch_next, goto_handler, and dispatch_variable was redundant.

Compute the 32-bit pc on demand via `sub w1, w21, w26` only when
calling into C++ (slow paths), which is the cold path. This removes
one instruction from every hot dispatch sequence and every jump target.

The generated output shrinks from 4692 to 4345 lines (~347 instructions
removed), with every handler benefiting from shorter dispatch tails.
2026-03-08 23:04:55 +01:00
Andreas Kling
949454feb9 AsmIntGen: Pin {INT32,BOOLEAN,NAN_BASE}_TAG in callee-saved registers
These three tag constants (0x7FFA, 0x7FF9, 0x7FF8) exceed the 12-bit
cmp immediate range on aarch64, so every comparison required a mov+cmp
pair. Pin them in x22, x23, x24 (callee-saved, previously unused) to
turn ~160 two-instruction sequences into single cmp instructions.
2026-03-08 23:04:55 +01:00
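Why pinning pays off: AArch64 `cmp` accepts a 12-bit unsigned immediate, optionally shifted left by 12, and none of the three tags fit either form. A sketch of that encodability check (function name illustrative):

```rust
// AArch64 cmp immediate: 12-bit unsigned value, optionally LSL #12.
fn fits_cmp_immediate(value: u64) -> bool {
    value <= 0xFFF || ((value & 0xFFF) == 0 && value <= 0xFFF_000)
}

fn main() {
    for tag in [0x7FFAu64, 0x7FF9, 0x7FF8] {
        // Each tag needs a mov+cmp pair, or a pinned register.
        assert!(!fits_cmp_immediate(tag));
    }
    assert!(fits_cmp_immediate(0xFF));
    assert!(fits_cmp_immediate(0xFF000)); // encodable via LSL #12
}
```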
Andreas Kling
368efef620 AsmIntGen: Support [pb, pc, field] three-operand memory access
Teach the DSL and both arch backends to handle memory operands of
the form [pb, pc, field_ref], meaning base + index + field_offset.

On aarch64, since x21 already caches pb + pc (the instruction
pointer), this emits a single `ldr dst, [x21, #offset]` instead of
the previous `mov t0, x21` + `ldr dst, [t0, #offset]` two-instruction
sequence.

On x86_64, this emits `[r14 + r13 + offset]` which is natively
supported by x86 addressing modes.

Convert all `lea t0, [pb, pc]` + `loadNN tX, [t0, field]` pairs in
the DSL to the new single-instruction form, saving one instruction
per IC access and other field loads in GetById, PutById, GetLength,
GetGlobal, SetGlobal, and CallBuiltin handlers.
2026-03-08 10:27:13 +01:00
Andreas Kling
8936cda523 AsmIntGen: Cache pb+pc in callee-saved x21 on aarch64
Pin x21 = pb + pc (the instruction pointer) as a callee-saved register
that survives C++ calls. x21 is set during dispatch and remains valid
throughout the entire handler.

This eliminates redundant `add x9, x26, x25` instructions from every
load_operand, store_operand, load_label, and dispatch_next sequence.
Also optimizes `lea dst, [pb, pc]` to `mov dst, x21`.

For dispatch_next, the next opcode is loaded via `ldrb w9, [x21, #size]`
and x21 is updated incrementally (`add x21, x21, #size`), which also
improves the dependency chain vs recomputing from x26 + x25.

dispatch_current is promoted from a DSL macro to a codegen instruction
so it can set x21 for the next handler.
2026-03-07 22:18:22 +01:00
Andreas Kling
8afa1df951 AsmIntGen: Pin canonical NaN in callee-saved d8 on aarch64
Load CANON_NAN_BITS into d8 (a callee-saved FP register) at
interpreter entry. This avoids materializing the 64-bit constant
in every canonicalize_nan cold fixup block.

Before: cold block was `movz x9, ... / movk x9, ... / b ret`
After:  cold block is just `fmov xD, d8 / b ret`

The hot path (fmov + fcmp + b.vs) is unchanged. The constant is
only needed when the result is actually NaN, which is rare, but
this still shrinks code size and avoids the multi-instruction
immediate materialization at 11 call sites.
2026-03-07 22:18:22 +01:00
Andreas Kling
e486ad2c0c AsmIntGen: Use platform-optimal codegen for NaN-boxing operations
Convert extract_tag, unbox_int32, unbox_object, box_int32, and
box_int32_clean from DSL macros into codegen instructions, allowing
each backend to emit optimal platform-specific code.

On aarch64, this produces significant improvements:

- extract_tag: single `lsr xD, xS, #48` instead of `mov` + `lsr`
  (3-operand shifts are free on ARM). Saves 1 instruction at 57
  call sites.

- unbox_object: single `and xD, xS, #0xffffffffffff` instead of
  `mov` + `shl` + `shr`. The 48-bit mask is a valid ARM64 logical
  immediate. Saves 2 instructions at 6 call sites.

- box_int32: `mov wD, wS` + `movk xD, #tag, lsl #48` instead of
  `mov` + `and 0xFFFFFFFF` + `movabs tag` + `or`. The w-register
  mov zero-extends, and movk overwrites just the top 16 bits.
  Saves 2 instructions and no longer clobbers t0 (rax).

- box_int32_clean: `movk xD, #tag, lsl #48` (1 instruction) instead
  of `mov` + `movabs tag` + `or` (saves 2 instructions, no t0
  clobber).

On x86_64, the generated code is equivalent to the old macros.
2026-03-07 22:18:22 +01:00
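The lowered sequences above translate into plain bit arithmetic as follows. The tag value comes from the pinned-constants commit below (`0x7FFA`); the function names mirror the DSL instructions but the bodies are illustrative models, not the generator's code.

```rust
const INT32_TAG: u64 = 0x7FFA;

// box_int32: `mov wD, wS` zero-extends the low 32 bits, then
// `movk xD, #tag, lsl #48` overwrites just the top 16 bits.
fn box_int32(src: u64) -> u64 {
    let zero_extended = src as u32 as u64;
    (zero_extended & 0x0000_FFFF_FFFF_FFFF) | (INT32_TAG << 48)
}

// extract_tag: `lsr xD, xS, #48`.
fn extract_tag(value: u64) -> u64 {
    value >> 48
}

// unbox_object: the 48-bit mask, a valid ARM64 logical immediate.
fn unbox_object(value: u64) -> u64 {
    value & 0x0000_FFFF_FFFF_FFFF
}

fn main() {
    let v = box_int32(0xDEAD_BEEF_0000_002A);
    assert_eq!(extract_tag(v), INT32_TAG); // garbage upper bits discarded
    assert_eq!(v as u32, 0x2A);
    assert_eq!(unbox_object(0xABCD_0000_1234_5678), 0x1234_5678);
}
```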
Andreas Kling
3e7fa8b09a AsmInt: Use 32-bit NOT in BitwiseNot handler
Add a not32 DSL instruction that operates on the 32-bit sub-register,
zeroing the upper 32 bits (x86_64: not r32, aarch64: mvn w_reg).

Use it in BitwiseNot to avoid the sign-extension (unbox_int32), 64-bit
NOT, and explicit AND 0xFFFFFFFF. The 32-bit NOT produces a clean
upper half, so we can use box_int32_clean directly.

Before: movsxd + not r64 + and 0xFFFFFFFF + and 0xFFFFFFFF + or tag
After:  mov + not r32 + or tag
2026-03-07 22:18:22 +01:00
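The not32 semantics reduce to a 32-bit complement with a zeroed upper half, which is why the result can feed `box_int32_clean` directly. A one-function sketch:

```rust
// not32: 32-bit NOT that leaves the upper 32 bits zero
// (x86_64: not r32; aarch64: mvn wD, wS).
fn not32(x: u64) -> u64 {
    !(x as u32) as u64
}

fn main() {
    assert_eq!(not32(0), 0xFFFF_FFFF);
    assert_eq!(not32(0xFFFF_FFFF_0000_0005), 0xFFFF_FFFA); // upper half cleared
}
```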
Andreas Kling
6492c88ad8 AsmIntGen: Elide redundant FP comparisons in consecutive branch_fp_*
When consecutive branch_fp_* instructions use the same operands (e.g.
branch_fp_unordered followed by branch_fp_equal), the 2nd ucomisd/fcmp
is redundant since flags are still valid from the first comparison.

Track the last FP comparison operands in HandlerState and skip the
comparison instruction when it would be identical. This is common in
the double_equality_compare macro which checks for unordered (NaN)
before testing equality.
2026-03-07 22:18:22 +01:00
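The tracking logic can be sketched like this; `HandlerState` is named in the commit, but this body is an illustrative model, and a real implementation must also invalidate the cached operands at labels and across any flag-clobbering instruction.

```rust
// Remember the operands of the last FP compare and skip an identical
// back-to-back compare, since the flags are still valid.
#[derive(Default)]
struct HandlerState {
    last_fp_cmp: Option<(String, String)>,
}

impl HandlerState {
    // Returns true if a ucomisd/fcmp must actually be emitted.
    fn needs_fp_compare(&mut self, lhs: &str, rhs: &str) -> bool {
        let ops = (lhs.to_string(), rhs.to_string());
        if self.last_fp_cmp.as_ref() == Some(&ops) {
            return false; // flags still valid from the previous compare
        }
        self.last_fp_cmp = Some(ops);
        true
    }
}

fn main() {
    let mut state = HandlerState::default();
    assert!(state.needs_fp_compare("fp0", "fp1"));  // branch_fp_unordered
    assert!(!state.needs_fp_compare("fp0", "fp1")); // branch_fp_equal: elided
    assert!(state.needs_fp_compare("fp0", "fp2"));  // different operands
}
```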
Andreas Kling
472edb3448 AsmIntGen: Use mov r32 for unsigned 32-bit immediates on x86_64
Values in the range 0x80000000..0xFFFFFFFF were incorrectly emitted
as plain `mov r64, imm` which GAS encodes as a 10-byte movabs. Use
`mov r32, imm32` instead (5 bytes, implicitly zero-extends to 64
bits). This affects constants like ENVIRONMENT_COORDINATE_INVALID
(0xFFFFFFFE) which appeared 5 times in the generated assembly.
2026-03-07 22:18:22 +01:00
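The encoding rule behind this fix: any immediate that fits in 32 unsigned bits can use the 5-byte `mov r32, imm32`, which implicitly zero-extends; only genuinely 64-bit values need the 10-byte movabs. A sketch (function name illustrative):

```rust
// x86_64 immediate selection: mov r32, imm32 zero-extends, so movabs
// is only required for values that do not fit in 32 unsigned bits.
fn needs_movabs(imm: u64) -> bool {
    imm > u32::MAX as u64
}

fn main() {
    assert!(!needs_movabs(0xFFFF_FFFE)); // ENVIRONMENT_COORDINATE_INVALID
    assert!(!needs_movabs(0x7FFF_FFFF));
    assert!(needs_movabs(0x1_0000_0000));
    // The zero-extension really does preserve the value:
    assert_eq!(0xFFFF_FFFEu32 as u64, 0xFFFF_FFFE);
}
```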
Andreas Kling
c6fd52e317 AsmIntGen: Move NaN canonicalization to cold fixup blocks
canonicalize_nan previously emitted its full NaN fixup inline:
on x86_64, a 10-byte movabs + cmovp; on aarch64, a multi-instruction
mov sequence + fcsel. These were always on the hot path even though
NaN results from arithmetic are extremely rare.

Move the NaN fixup to a cold block emitted after the handler body.
The hot path is now just: movq/fmov + ucomisd/fcmp + jp/b.vs (a
forward branch predicted not-taken). This removes 14 bytes of
instructions from the hot path of every handler that produces
double results (Add, Sub, Mul, Div, and several builtins).

Both backends gain a HandlerState struct (shared between them) that
accumulates cold fixup blocks during code generation, emitted after
the main body.
2026-03-07 22:18:22 +01:00
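The fixup being moved out of line is just "if the result is NaN, replace it with the one canonical bit pattern". A sketch, assuming `CANON_NAN_BITS` is the standard quiet-NaN pattern (consistent with the `NAN_BASE_TAG` of `0x7FF8` named in the pinned-constants commit above):

```rust
// Assumed value of CANON_NAN_BITS: the canonical quiet NaN,
// matching the 0x7FF8 NAN_BASE tag in the top 16 bits.
const CANON_NAN_BITS: u64 = 0x7FF8_0000_0000_0000;

fn canonicalize_nan(bits: u64) -> u64 {
    if f64::from_bits(bits).is_nan() {
        CANON_NAN_BITS // cold path: NaN arithmetic results are rare
    } else {
        bits // hot path: value passes through unchanged
    }
}

fn main() {
    assert_eq!(canonicalize_nan(1.5f64.to_bits()), 1.5f64.to_bits());
    let weird_nan = 0x7FF8_0000_0000_0001; // NaN with a payload
    assert_eq!(canonicalize_nan(weird_nan), CANON_NAN_BITS);
    assert_eq!(canonicalize_nan((0.0f64 / 0.0).to_bits()), CANON_NAN_BITS);
}
```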
Andreas Kling
5b8114a96b AsmInt: Use hardware overflow flag for int32 arithmetic
Replace the pattern of 64-bit arithmetic + sign-extend + compare
with dedicated 32-bit overflow instructions that use the hardware
overflow flag directly.

Before: add t3, t4 / unbox_int32 t5, t3 / branch_ne t3, t5, .overflow
After:  add32_overflow t3, t4, .overflow

On x86_64 this compiles to `add r32, r32; jo label` (the 32-bit
register write implicitly zeros the upper 32 bits). On aarch64,
`adds w, w, w; b.vs label` for add/sub, `smull + sxtw + cmp + b.ne`
for multiply, and `negs + b.vs` for negate.

Nine call sites updated: Add, Sub, Mul, Increment, Decrement,
PostfixIncrement, PostfixDecrement, UnaryMinus, and CallBuiltin(abs).
2026-03-07 22:18:22 +01:00
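The overflow check the hardware flag provides is the same one Rust's `overflowing_add` computes, which makes the fast-path contract easy to sketch (function name illustrative; on overflow the handler bails to the double path):

```rust
// Model of add32_overflow: 32-bit add that reports overflow, the same
// condition `add r32; jo` / `adds w; b.vs` branch on.
fn add32_overflow(a: i32, b: i32) -> Option<i32> {
    let (result, overflowed) = a.overflowing_add(b);
    if overflowed { None } else { Some(result) }
}

fn main() {
    assert_eq!(add32_overflow(1, 2), Some(3));
    assert_eq!(add32_overflow(i32::MAX, 1), None); // .overflow taken
    assert_eq!(add32_overflow(i32::MIN, -1), None);
}
```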
Andreas Kling
9ae5445493 LibJS: Add AsmIntGen assembly interpreter code generator
AsmIntGen is a Rust tool that compiles a custom assembly DSL into
native x86_64 or aarch64 assembly (.S files). It reads struct field
offsets from a generated constants file and instruction layouts from
Bytecode.def (via the BytecodeDef crate) to emit platform-specific
code for each bytecode handler.

The DSL provides a portable instruction set with register aliases,
field access syntax, labels, conditionals, and calls. Each backend
(codegen_x86_64.rs, codegen_aarch64.rs) translates this into the
appropriate platform assembly with correct calling conventions
(SysV AMD64, AAPCS64).
2026-03-07 13:09:59 +01:00