asmint.asm no longer references any positional temp register name --
every handler and macro declares its temporaries by name with `temp` /
`ftemp` and lets the register allocator place them. Migrate the last
two holdout macros:
* dispatch_current uses a macro-local `opcode` temp for the load8 +
indirect jmp.
* pop_inline_frame_and_resume names its return-pc, dst-index, value-
address, vm-pointer, and executable temps explicitly.
With nothing left referring to the positional aliases, drop the
tN / ftN -> physical-register fallback from registers::resolve_register
and update the DSL reference comments at the top of asmint.asm and in
main.rs to describe the named-temp model. The two pre-existing codegen
tests that probed the old positional behavior get rewritten to use the
post-allocation physical-register names directly, since that is now
the actual contract of resolve_op.
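For reference, the post-migration contract of the lookup is easy to
state. A rough Rust sketch (names and signature are illustrative, not
the real registers::resolve_register): named temps resolve through the
allocation map, literal physical-register names pass through, and the
positional fallback is simply gone.

```rust
use std::collections::HashMap;

/// Illustrative register type; the real codegen has its own representation.
#[derive(Clone, Copy)]
enum PhysReg { Rax, Rcx, Rdx, Xmm0 }

/// Hypothetical stand-in for recognizing a literal physical-register name.
fn parse_physical_register(name: &str) -> Option<PhysReg> {
    match name {
        "rax" => Some(PhysReg::Rax),
        "rcx" => Some(PhysReg::Rcx),
        "rdx" => Some(PhysReg::Rdx),
        "xmm0" => Some(PhysReg::Xmm0),
        _ => None,
    }
}

/// Sketch of the new contract: a named temp comes from the allocator's
/// placement map, a physical name is used directly, and the old
/// "t0" -> rax / "ft1" -> xmm1 positional fallback no longer exists.
fn resolve_register(name: &str, allocation: &HashMap<String, PhysReg>) -> Option<PhysReg> {
    if let Some(&reg) = allocation.get(name) {
        return Some(reg);
    }
    parse_physical_register(name)
}
```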
The Call handler is the asm interpreter's hot path for inline JS-to-JS
dispatch, and it carries roughly fifty distinct named values across
its body (callee object, packed metadata word, formal/passed/total
counts, two stack-side pointers, the argument-copy cursor, the
script-or-module pair-load scratch, the native return value and
variant tag, ...).
The native-exception path adds a small `mov helper_arg, native_return`
bridge between call_raw_native (output pinned to rax by
fixed_operands) and call_helper (input pinned to rcx by
fixed_operands), since the two ABIs land their values in different
registers and the explicit mov is the cleanest way to express the
hand-off.
Convert the for-in iteration fast path to named temps. The handler
threads ~25 distinct values through its body (the iterator value,
its tag, the unboxed iterator, the fast-path discriminator, the
property cache pointer, the cached and current shapes, the receiver,
the dictionary-generation pair, the indexed-count and next-indexed
counters, the named-index, the named-key data pointer, and a few
scratch slots), so the migrated form is much easier to read than the
positional spaghetti where t0..t8 each meant five different things in
different basic blocks.
Convert the array / typed-array load fast paths in GetByValue, the
mirror of PutByValue. Roughly the same shape: ~18 named GPR temps
and a slot_dbl FPR temp covering the loaded value, the bytes-or-int
raw integer view, the address scratch, and the negative-zero compare.
The .ta_float32 / .ta_float64 paths become readable: where the
positional code had "Exclude negative zero early (t1 gets clobbered
by double_to_int32)" because t1 had to do double duty, the migrated
form names slot, slot_dbl, neg_zero, and raw separately and the
allocator lays them out without the implicit clobber dance.
Convert the array / typed-array store fast paths in PutByValue --
the largest single handler so far at 150+ positional t-uses -- to
declare its 19 GPR temps and 1 FPR temp by name (kind, base, prop,
index, obj, flags, storage_kind, size, elements, src, capacity_addr,
capacity, slot, empty_tag, kind_byte, addr, src_int32, max, result,
src_dbl, plus base_tag/prop_tag scratches reused across paths).
The migrated body removes the "save in t0 before load_operand
clobbers it" / "compute store address before check_is_double clobbers
t4" maneuvers that the positional code needed -- the allocator
handles those constraints automatically.
Convert the load_primitive_string_utf16_code_unit macro to take its
inputs (string pointer, index) and output (code_unit) as explicit
parameters, plus migrate CallBuiltinStringFromCharCode,
CallBuiltinStringPrototypeCharCodeAt, and
CallBuiltinStringPrototypeCharAt.
The macro previously hard-coded inputs to t2/t4 and output to t0,
which forced callers to remember those slot assignments and to wrap
the call_helper interactions around them. The migrated form -- where
the caller names what it wants in each slot -- removes another
"clobbers t3, t5" comment.
Convert the global variable IC and the array-length fast path to named
temps. These three handlers (GetGlobal, SetGlobal, and GetLength) are
large -- GetGlobal and SetGlobal carry ~20 named values across each
path (realm pointer,
global object, declarative environment, cache pointer, two serial
numbers, the shape, the cached shape, the dictionary generations,
the property offset, the named-properties pointer, the loaded value
and tag, plus environment-binding state) -- and the migration makes
those values easy to read instead of having to remember which slot
each piece of state was occupying.
GetLength's "magical" int-to-double widening picks up a named
temp for the sign-bit check and the result, which removes another
"NB: load_operand clobbers t0" comment.
Convert the inline-cache fast paths for property load/store to named
temps. Both handlers carry a substantial amount of state across the
ic-hit check (the boxed base value, its tag, the unboxed Object*,
the shape, the PLC pointer, the cached shape and prototype, the
property offset, the current and cached dictionary generations, the
named-properties pointer, and the loaded value), so giving each
piece an explicit name removes a real readability burden. The IC-miss
path uses the new 3-operand call_interp form.
Notably, PutById's old code carried a "save property offset in t4
before load_operand clobbers t0 (rax)" maneuver -- that is exactly
the kind of cross-cut the allocator handles automatically once the
offset is named. The migrated handler just keeps using `prop_offset`
across the load_operand and the store, no scratch shuffle needed.
Convert the validate_callee_builtin macro and the Math.abs / floor /
ceil / sqrt / exp and ResolveThisBinding handlers to named DSL
temporaries. Math.exp uses the 3-operand call_helper form; the
allocator pins its `arg` to t1 (rcx) and `result` to t0 (rax)
automatically.
Convert the Div and Mod handlers to named DSL temporaries. Mod is the
first migrated handler that uses divmod, with its quot/rem pinned to
rax/rdx by fixed_operands and its dividend/divisor kept off both by
the operand-vs-implicit-output interference rule.
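The rax/rdx pinning falls straight out of the hardware instruction. A
minimal Rust inline-asm illustration (not the DSL, and not the
generator's output) of why divmod's two results have fixed homes:

```rust
use std::arch::asm;

/// x86-64 `idiv` always leaves the quotient in rax and the remainder in
/// rdx, so a divmod-style operation has its outputs pinned there no matter
/// where the dividend and divisor start out. (Divide-by-zero and
/// INT_MIN / -1 faults are ignored in this sketch.)
#[cfg(target_arch = "x86_64")]
fn divmod_i64(dividend: i64, divisor: i64) -> (i64, i64) {
    let quot: i64;
    let rem: i64;
    unsafe {
        asm!(
            "cqo",      // sign-extend rax into rdx:rax
            "idiv {d}", // rdx:rax / divisor -> quotient in rax, remainder in rdx
            d = in(reg) divisor,
            inout("rax") dividend => quot,
            out("rdx") rem,
        );
    }
    (quot, rem)
}
```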
Convert the walk_env_chain macro to take its outputs (target_env,
bind_index) as explicit parameters, then migrate every handler that
walks the environment chain: GetBinding, GetInitializedBinding,
InitializeLexicalBinding, SetLexicalBinding, and
GetCalleeAndThisFromEnvironment.
The macro's internal scratch (the EnvironmentCoordinate address, the
hops counter, the invalid-coord sentinel, and the screwed-by-eval
flag) is now declared as macro-local temps that the allocator places
without touching whatever the caller has live. Each handler picks
its own names for env / idx / binding / value, instead of remembering
that the macro happens to leave them in t3 and t2.
Convert coerce_to_int32s and bitwise_op to take operand temps as
explicit parameters, then migrate UnaryPlus.
The bitwise_op macro is the actual body of BitwiseAnd, BitwiseOr, and
BitwiseXor (which just call into it), so a single macro migration
covers all three handlers. UnaryPlus is migrated alongside since it
shares the same coerce-to-int32 conceptual path.
Convert the equality dispatch chain to take operand temps as explicit
parameters: equality_same_tag, double_equality_compare,
strict_equality_core, and loose_equality_core. Then migrate the
StrictlyEquals, StrictlyInequals, LooselyEquals, LooselyInequals,
JumpStrictlyEquals, JumpStrictlyInequals, JumpLooselyEquals, and
JumpLooselyInequals handlers.
These macros are tightly coupled -- strict_equality_core and
loose_equality_core both invoke equality_same_tag and reach
double_equality_compare via .double_compare, a label that crosses
macro boundaries. The existing label-uniquification rule (only
self-contained labels are renamed) keeps that contract working
without further plumbing.
Convert the heavy comparison and arithmetic macros to take their
operand temps as explicit parameters, then migrate every handler that
uses them: coerce_to_doubles, numeric_compare, numeric_compare_coerce,
boolean_result_epilogue, jump_binary_epilogue, plus the Add, Sub, Mul,
LessThan, LessThanEquals, GreaterThan, GreaterThanEquals,
JumpLessThan, JumpGreaterThan, JumpLessThanEquals, and
JumpGreaterThanEquals handlers.
Convert check_is_double, check_both_double, and box_double_or_int32 to
declare their internal scratch as macro-local `temp` decls instead of
hardcoding t3/t4. Each invocation now gets a freshly uniquified temp
that the allocator places independently of any other live values in
the caller, so adding a `temp` declaration around a sequence that
calls into one of these macros no longer needs to know which
positional slot the macro happened to be using.
box_double_or_int32 in particular is on the hot arithmetic path
(Add/Sub/Mul/PostfixIncrement/etc.), so this opens the door to
migrating those handlers without conflicting with the macro's
internal scratch use.
Convert another batch of macro-free handlers to named DSL temporaries.
The shift handlers in particular are nice exercises of the allocator's
fixed-operand support: x86's `shl`/`shr`/`sar` need the count register
to be rcx, and the allocator pins the named `count` temp accordingly
without the user having to remember it.
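For reference, the constraint being encoded, as a minimal Rust
inline-asm illustration (not the DSL): the variable-count shifts read
their count from cl, so whichever temp holds the count has to end up
in rcx.

```rust
use std::arch::asm;

/// x86-64 `shl`/`shr`/`sar` with a variable count only take that count
/// from cl, the low byte of rcx, which is why the allocator pins the
/// named `count` temp to rcx.
#[cfg(target_arch = "x86_64")]
fn shift_left(value: u64, count: u64) -> u64 {
    let result: u64;
    unsafe {
        asm!(
            "shl {v}, cl", // the count is implicitly taken from cl
            v = inout(reg) value => result,
            in("rcx") count,
        );
    }
    result
}
```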
Convert another batch of straightforward handlers to named DSL
temporaries: Increment, Decrement, Not, GetLexicalEnvironment, Return,
and End. The migration uncovers a pleasing property of the new
pipeline: when a migrated handler invokes an un-migrated macro
(pop_inline_frame_and_resume here, which still uses positional
t2/t3/t4 internally), the macro body's positional uses appear as physical-
register defs in the flat instruction list and the allocator's
liveness analysis avoids placing live named temps in those registers.
Convert the three boolean-conditional jump handlers to named temps,
exercising the 3-operand call_helper form so the input value and
truthy result no longer have to be pinned to t1/t0 by hand. The
allocator places `condition` in rcx (matching call_helper's input
convention) and `truthy` in rax (matching its output) automatically.
Convert Jump, JumpNullish, and JumpUndefined to declare named
temporaries instead of pinning values to t0..t2 directly. These three
handlers are macro-free and call-free, so the migration is mechanical
and the allocator picks register assignments equivalent in cost to
the hand-written positional code.
The other Jump* handlers (JumpIf/JumpTrue/JumpFalse) drive
call_helper, which still needs the value to be in t1 (rcx) on x86 by
the existing register convention. Migrating those needs DSL syntax to
attach call_helper's input/output to named temps; until that lands
they stay on the positional form.
These three trivial data-movement handlers are a useful first
migration target -- they exercise the new `temp` declaration syntax
and the register allocator end-to-end without interacting with macros
or fixed-operand constraints. The allocator picks t1-equivalent
registers for the first temp (which is the smallest live range that
must survive load_operand's rax/x9 scratch use) and reuses the slot
across non-overlapping live ranges where possible.
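The slot reuse is ordinary live-range allocation. A minimal sketch of
the idea in Rust (not the actual allocator; spilling and fixed operands
are omitted): walk temps in order of live-range start and recycle a
register as soon as the range holding it has expired.

```rust
/// Each temp is a half-open live range; a register freed by an expired
/// range is handed to the next temp whose range does not overlap it.
#[derive(Clone, Copy)]
struct LiveRange {
    start: u32,
    end: u32,
}

fn assign_registers(ranges: &[LiveRange], num_regs: usize) -> Vec<Option<usize>> {
    let mut order: Vec<usize> = (0..ranges.len()).collect();
    order.sort_by_key(|&i| ranges[i].start);

    let mut free: Vec<usize> = (0..num_regs).rev().collect();
    let mut active: Vec<(u32, usize)> = Vec::new(); // (end, register)
    let mut assignment = vec![None; ranges.len()];

    for &i in &order {
        // Expire ranges that ended before this one starts; their registers
        // become available again for non-overlapping temps.
        active.retain(|&(end, reg)| {
            if end <= ranges[i].start {
                free.push(reg);
                false
            } else {
                true
            }
        });
        if let Some(reg) = free.pop() {
            assignment[i] = Some(reg);
            active.push((ranges[i].end, reg));
        }
    }
    assignment
}
```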
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.
This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.
We're trading some space for time here, but I believe it's worth it at
this stage, as it saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)
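A rough sketch of the shape of the trade (field names are illustrative,
not the engine's): each source-map entry carries the already-computed
line and column alongside the offsets, so a lookup is an array read
rather than a source rescan.

```rust
/// Illustrative layout only: the point is that line/column ride along
/// with each entry, so exception paths never rebuild a position cache.
struct Position {
    line: u32,
    column: u32,
}

struct SourceMapEntry {
    bytecode_offset: u32,
    source_offset: u32,
    position: Position, // previously reconstructed via range_from_offsets()
}

/// Entries are assumed sorted by bytecode offset; return the position of
/// the last entry at or before the queried offset.
fn position_for(entries: &[SourceMapEntry], bytecode_offset: u32) -> Option<&Position> {
    let idx = entries.partition_point(|e| e.bytecode_offset <= bytecode_offset);
    idx.checked_sub(1).map(|i| &entries[i].position)
}
```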
Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
GetCalleeAndThisFromEnvironment treated a binding as initialized when
its value slot was not <empty>. Declarative bindings do not encode TDZ
in that slot, though: uninitialized bindings keep a separate initialized
flag and their value starts as undefined.
That let the first slow-path TDZ failure populate the environment cache,
then a second call at the same site reused the cached coordinate and
turned the required ReferenceError into a TypeError from calling
undefined.
Check Binding.initialized in the asm fast path instead and cover the
cached second-hit case with a regression test.
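A sketch of why the old test was wrong (type and field names are
illustrative, not the engine's Binding layout): an uninitialized
declarative binding already reads as undefined, so TDZ state has to
come from the separate flag rather than the value slot.

```rust
/// Illustrative only.
enum Value {
    Empty,
    Undefined,
    // ...
}

struct Binding {
    value: Value,      // starts as Undefined for let/const, never Empty
    initialized: bool, // the actual TDZ state
}

fn in_temporal_dead_zone(binding: &Binding) -> bool {
    // Correct: consult the flag.
    !binding.initialized
    // Wrong (the old fast-path test): matches!(binding.value, Value::Empty),
    // which never reports TDZ for declarative bindings and so let the
    // environment cache be populated after a TDZ failure.
}
```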
The asm interpreter already inlines ECMAScript calls, but builtin calls
still went through the generic C++ Call slow path even when the callee
was a plain native function pointer. That added an avoidable boundary
around hot builtin calls and kept asm from taking full advantage of the
new RawNativeFunction representation.
Teach the asm Call handler to recognize RawNativeFunction, allocate the
callee frame on the interpreter stack, copy the call-site arguments,
and jump straight to the stored C++ entry point.
NativeJavaScriptBackedFunction and other non-raw callees keep falling
through to the existing C++ slow path unchanged.
Place Realm's cached declarative environment next to its global object
so the asm global access fast paths can fetch the two pointers with a
paired load. These handlers never use the intervening GlobalEnvironment
pointer directly.
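The payoff is one paired load where two adjacent pointer fetches used
to be needed. A minimal aarch64 illustration in Rust inline asm (not
the DSL's load_pair form):

```rust
use std::arch::asm;

/// A single `ldp` fetches two adjacent 8-byte fields at once -- the shape
/// of load the global-access fast paths can now use for the global object
/// and the cached declarative environment. Illustrative only.
#[cfg(target_arch = "aarch64")]
fn load_adjacent_pair(fields: &[u64; 2]) -> (u64, u64) {
    let first: u64;
    let second: u64;
    unsafe {
        asm!(
            "ldp {a}, {b}, [{p}]",
            a = out(reg) first,
            b = out(reg) second,
            p = in(reg) fields.as_ptr(),
        );
    }
    (first, second)
}
```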
Mirror Executable's constants size and data pointer in adjacent fields
so the asm Call fast path can pair-load them together. The underlying
Vector layout keeps size and data apart, so a small cached raw span
lets the hot constant-copy loop fetch both pieces of metadata at once.
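A sketch of the cached-raw-span arrangement (names illustrative): the
Vector keeps its own layout, and a mirrored size/data pair sits in
adjacent fields that the hot constant-copy loop can fetch with a single
paired load.

```rust
/// Illustrative stand-in: the real storage stays in the vector, and the
/// two mirrored fields are adjacent purely so a paired load can pick
/// them up together.
#[repr(C)]
struct ConstantsTable {
    values: Vec<u64>,
    cached_size: usize,      // adjacent to cached_data below
    cached_data: *const u64,
}

impl ConstantsTable {
    fn new(values: Vec<u64>) -> Self {
        let mut table = ConstantsTable {
            values,
            cached_size: 0,
            cached_data: std::ptr::null(),
        };
        table.refresh_cache();
        table
    }

    /// Refresh the mirror after any mutation of `values`.
    fn refresh_cache(&mut self) {
        self.cached_size = self.values.len();
        self.cached_data = self.values.as_ptr();
    }
}
```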
Load PropertyNameIterator's indexed-property count and next index
together when stepping the fast path. Keeping the paired count live
into the named-property case also avoids reloading it before computing
the flattened index.
Load PropertyNameIterator's cached property cache and shape snapshot
together before validating the receiver shape. The two fields already
sit adjacent in the object layout, so the fast path can fetch both
without any extra reshuffling.
Load EnvironmentCoordinate::hops and ::index together in the asm
environment-walk helper. The pair-load keeps the DSL explicit about
which two fields travel together and removes another scalar metadata
fetch from the fast path.
Load the cached property offset and dictionary generation with paired
loads in the property inline-cache fast paths. AsmIntGen now verifies
these reads against the actual cache layout, so the DSL keeps both
fields named and self-documenting.
Load the cached shape and prototype pointer together in the property
inline-cache fast paths that already read both. This keeps the
cache-entry metadata fetches aligned with the DSL's paired-load model
without changing the surrounding control flow.
Load the inline frame's return pc and destination register at once when
Return or End resumes an asm-managed caller. This keeps the unwind
metadata with the helper that consumes it and removes a separate scalar
load from both handlers.
The asm Call fast path was still reloading the executable pointer while
building the inline callee frame, even though it had already loaded the
same pointer while validating the call target.
Carry that executable pointer through frame setup and reload the passed
argument count from the call bytecode instead of the fresh frame header.
This trims a couple more loads from the hot path.
Pack the asm Call fast path metadata next to the executable pointer
so the interpreter can fetch both values with one paired load. This
removes several dependent shared-data loads from the hot path.
Keep the executable pointer and packed metadata in separate registers
through `this` binding so the fast path can still use the paired-load
layout after any non-strict `this` adjustment.
Lower the packed metadata flag checks correctly on x86_64 as well.
Those bits now live above bit 31, so the generator uses `bt` for
single-bit high masks and covers that path with a unit test.
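The underlying problem is that a 64-bit `test` only takes a
sign-extended 32-bit immediate, so a single set bit above bit 31 has no
immediate-mask encoding. A minimal Rust inline-asm illustration of the
`bt` form (not the generator's actual output):

```rust
use std::arch::asm;

/// `bt reg, n` copies bit n of the register into the carry flag, so a
/// single-bit check works for any bit position, including the high bits
/// that `test reg64, imm32` cannot express.
#[cfg(target_arch = "x86_64")]
fn packed_flag_is_set(packed: u64, bit: u64) -> bool {
    let set: u8;
    unsafe {
        asm!(
            "bt {p}, {b}",
            "setc {s}",
            p = in(reg) packed,
            b = in(reg) bit,
            s = out(reg_byte) set,
        );
    }
    set != 0
}
```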
Add a runtime test that exercises both object and global this binding
through the asm Call fast path.
Executable already caches the combined registers, locals, and constants
count that the asm Call fast path needs for inline frame allocation.
Use that precomputed total instead of rebuilding it from the registers
count and constants vector size in the hot path.
The asm Call fast path already checks SharedFunctionInstanceData's
cached can_inline_call bit before touching the executable pointer.
That cache is only true for ordinary functions with compiled bytecode,
so the extra executable null check is redundant work in the hot path.
The asm Call fast path reads InterpreterStack::m_top and m_limit
back-to-back while checking whether the inline callee frame fits.
Those fields are adjacent, so we can load them together with one
paired load and keep the stack-size check otherwise unchanged.
Teach the asm Call fast path to use paired stores for the fixed
ExecutionContext header writes and for the caller linkage fields.
This also initializes the five reserved Value slots directly instead
of looping over them as part of the general register clear path.
That keeps the hot frame setup work closer to the actual data layout:
reserved registers are seeded with a couple of fixed stores, while the
remaining register and local slots are cleared in wider chunks.
On x86_64, keep the new explicit-offset formatting on store_pair*
and load_pair* without changing ordinary [base, index, scale]
operands into base-plus-index-plus-offset addresses. Add unit
tests covering both the paired zero-offset form and the preserved
scaled-index lowering.
Use the new paired-load DSL operations in the inline Call path for the
adjacent environment, ScriptOrModule, caller metadata, and callee-entry
loads. The flow stays the same, but the hot call setup now needs fewer
scalar memory operations on aarch64.
Handle inline-eligible JS-to-JS Call directly in asmint.asm instead
of routing the whole operation through AsmInterpreter.cpp.
The asm handler now validates the callee, binds `this` for the
non-allocating cases, reserves the callee InterpreterStack frame,
populates the ExecutionContext header and Value tail, and enters the
callee bytecode at pc 0.
Keep the cases that need NewFunctionEnvironment() or sloppy `this`
boxing on a narrow helper that still builds an inline frame. This
preserves the existing inline-call semantics for promise-job ordering,
receiver binding, and sloppy global-this handling while keeping the
common path in assembly.
Add regression coverage for closure-capturing callees, sloppy
primitive receivers, and sloppy undefined receivers.
Emit the ExecutionContext, function-object, executable, and realm
offsets that the asm Call path needs to inspect and initialize
directly when building inline frames.
Handle Return and End entirely in AsmInt when leaving an inline frame.
The handlers now restore the caller, update the interpreter stack
bookkeeping directly, and bump the execution generation without
bouncing through AsmInterpreter.cpp.
Add WeakRef tests that exercise both inline Return and inline End
so this path stays covered.
Store whether a function can participate in JS-to-JS inline calls on
SharedFunctionInstanceData instead of recomputing the function kind,
class-constructor bit, and bytecode availability at each fast-path
call site.
Keep JS-to-JS inline calls out of m_execution_context_stack and walk
the active stack from the running execution context instead. Base
pushes now record the previous running context so duplicate
TemporaryExecutionContext pushes and host re-entry still restore
correctly.
This keeps the fast JS-to-JS path off the vector without losing GC
root collection, stack traces, or helpers that need to inspect the
active execution context chain.
The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.
The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
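A rough sketch of that decision ladder (names illustrative, not the
engine's): mask the int32 input to a uint16 code unit, serve 7-bit
results from the VM's cached single-character strings, and hand
everything else to the helper.

```rust
use std::rc::Rc;

/// Illustrative only: the real fast path lives in asm and uses the VM's
/// actual single-character ASCII string cache.
fn from_char_code_int32(ascii_cache: &[Rc<str>; 128], code: i32) -> Option<Rc<str>> {
    let unit = (code as u32 & 0xFFFF) as u16; // ToUint16 on the int32 input
    if unit < 0x80 {
        // 7-bit representable: reuse the cached single-character string.
        Some(Rc::clone(&ascii_cache[unit as usize]))
    } else {
        // Non-ASCII single code unit: fall back to the dedicated helper.
        None
    }
}
```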
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.
The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.
The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.
Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.
Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.
Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.
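Collected in one place, the reuse guards look roughly like this (field
names illustrative, not the engine's):

```rust
/// Illustrative summary of the per-site cache key: the cached snapshot is
/// reused only while every recorded property of the receiver still matches
/// what was observed when the snapshot was built.
struct ForInSnapshotGuards {
    shape: usize,             // receiver shape identity
    dictionary_generation: u64,
    indexed_storage_kind: u8,
    indexed_length: u32,
    prototype_chain_valid: bool,
    has_magical_length: bool, // keeps arrays and plain objects on separate snapshots
}

fn can_reuse(cached: &ForInSnapshotGuards, current: &ForInSnapshotGuards) -> bool {
    cached.shape == current.shape
        && cached.dictionary_generation == current.dictionary_generation
        && cached.indexed_storage_kind == current.indexed_storage_kind
        && cached.indexed_length == current.indexed_length
        && current.prototype_chain_valid
        && cached.has_magical_length == current.has_magical_length
}
```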
Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.