Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.
This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.
We're trading some space for time here, but I believe it's worth it,
as this saves ~250ms of main-thread time while loading https://x.com/
on my Linux machine. :^)
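A minimal sketch of the idea, with illustrative names (SourceMapEntry,
Position, and position_for are assumptions, not the actual LibJS types):
each source map entry carries the resolved line/column, so a lookup can
return a Position directly without touching SourceCode's position cache.

    #[derive(Clone, Copy, Debug)]
    struct Position {
        line: u32,
        column: u32,
        offset: u32,
    }

    struct SourceMapEntry {
        bytecode_offset: u32,
        // Previously only source offsets lived here; line/column had to be
        // reconstructed via SourceCode::range_from_offsets().
        start: Position,
        end: Position,
    }

    struct SourceMap {
        entries: Vec<SourceMapEntry>, // sorted by bytecode_offset
    }

    impl SourceMap {
        // Find the source position for an instruction pointer without ever
        // building SourceCode's lazily computed position cache.
        fn position_for(&self, bytecode_offset: u32) -> Option<Position> {
            let idx = self
                .entries
                .partition_point(|entry| entry.bytecode_offset <= bytecode_offset);
            idx.checked_sub(1).map(|i| self.entries[i].start)
        }
    }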
Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
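A rough sketch of the lexer fix, under assumed names (the real Lexer
tracks more state than this): the point is that new_at_offset() should
consume only the characters before `offset`, so the reported line/column
describe the character at `offset` rather than the one after it.

    struct Lexer<'a> {
        source: &'a [u8],
        position: usize,
        line: u32,
        column: u32,
    }

    impl<'a> Lexer<'a> {
        fn new_at_offset(source: &'a [u8], offset: usize) -> Self {
            let mut lexer = Lexer { source, position: 0, line: 1, column: 1 };
            // Fix: stop before the character at `offset`, so (line, column)
            // point at it instead of one column past it.
            while lexer.position < offset {
                lexer.consume();
            }
            lexer
        }

        fn consume(&mut self) {
            match self.source.get(self.position) {
                Some(&b'\n') => {
                    self.line += 1;
                    self.column = 1;
                }
                Some(_) => self.column += 1,
                None => {}
            }
            self.position += 1;
        }
    }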
Generator::allocate_register used to scan the free pool to find the
lowest-numbered register and then Vec::remove it, making every
allocation O(n) in the size of the pool. When loading https://x.com/
on my Linux machine, we spent ~800ms in this function alone!
This logic only existed to match the C++ register allocation ordering
while transitioning from C++ to Rust in the LibJS compiler, so now
we can simply get rid of it and make it instant. :^)
So drop the "always hand out the lowest-numbered free register" policy
and use the pool as a plain LIFO stack. Pushing to and popping from the
back of the Vec are both O(1), and peak register usage is unchanged,
since the policy only affects which specific register gets reused, not
how aggressively registers are reused.
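A minimal sketch of the new policy, with hypothetical names (the real
Generator tracks more per-register state):

    struct RegisterAllocator {
        free_registers: Vec<u32>, // plain LIFO stack
        next_register: u32,
    }

    impl RegisterAllocator {
        fn allocate(&mut self) -> u32 {
            // Previously: scan for the minimum and Vec::remove() it, O(n).
            // Now: pop the back, O(1).
            if let Some(reg) = self.free_registers.pop() {
                return reg;
            }
            let reg = self.next_register;
            self.next_register += 1;
            reg
        }

        fn free(&mut self, reg: u32) {
            self.free_registers.push(reg); // O(1)
        }
    }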
Teach the Rust bytecode generator to treat the synthetic entry
GetLexicalEnvironment as a removable prologue load.
We still model reg4 as the saved entry lexical environment during
codegen, but assemble() now deletes that load when no emitted
instruction refers to the saved environment register. This keeps the
semantics of unwinding and environment restoration intact while letting
empty functions and other simple bodies start at their first real
instruction.
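Roughly the shape of the check, with an illustrative instruction type
(the real assemble() walks basic blocks and the full instruction set;
RestoreSavedEnvironment is a stand-in for anything that reads the saved
register, not an actual opcode):

    const SAVED_ENVIRONMENT_REGISTER: u32 = 4; // "reg4" above

    #[derive(Clone)]
    enum Instruction {
        GetLexicalEnvironment { dst: u32 },
        RestoreSavedEnvironment { src: u32 },
        // ... the rest of the instruction set
    }

    impl Instruction {
        // Does this instruction read the given register?
        fn reads_register(&self, reg: u32) -> bool {
            match self {
                Instruction::GetLexicalEnvironment { .. } => false,
                Instruction::RestoreSavedEnvironment { src } => *src == reg,
            }
        }
    }

    fn assemble(mut instructions: Vec<Instruction>) -> Vec<Instruction> {
        let saved_environment_is_used = instructions
            .iter()
            .any(|instruction| instruction.reads_register(SAVED_ENVIRONMENT_REGISTER));
        if !saved_environment_is_used {
            // Nothing restores the entry environment, so the synthetic
            // prologue load can be deleted; simple bodies then start at
            // their first real instruction.
            instructions.retain(|instruction| {
                !matches!(instruction, Instruction::GetLexicalEnvironment { dst }
                    if *dst == SAVED_ENVIRONMENT_REGISTER)
            });
        }
        instructions
    }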
Add a metadata header showing register count, block count, local
variable names, and the constants table. Resolve jump targets to
block labels (e.g. "block1") instead of raw hex addresses, and add
visual separation between basic blocks.
Make identifier and property key formatting more concise by using
backtick quoting and showing base_identifier as a trailing
parenthetical hint that joins the base and property names.
Generate a stable name for each executable by hashing the source text
it covers, so the name stays stable across codegen changes. Named functions
show as "foo$9beb91ec", anonymous ones as "$43362f3f". Also show
the source filename, line, and column.
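For illustration, a sketch of the naming scheme; the hash function and
helper name here are assumptions, not necessarily what the dumper uses:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    fn executable_name(function_name: Option<&str>, covered_source: &str) -> String {
        // Hash only the covered source text, so the name survives codegen
        // changes but changes when the source does.
        let mut hasher = DefaultHasher::new();
        covered_source.hash(&mut hasher);
        let digest = hasher.finish() as u32; // truncate to 8 hex digits
        match function_name {
            Some(name) => format!("{name}${digest:08x}"), // e.g. "foo$9beb91ec"
            None => format!("${digest:08x}"),             // e.g. "$43362f3f"
        }
    }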
Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register
moves in a single dispatch. A peephole optimization pass during
bytecode assembly merges consecutive Mov instructions within each
basic block into these combined instructions.
When merging, identical Movs are deduplicated (e.g. two identical Movs
become a single Mov, not a Mov2). This optimization is implemented in
both the C++ and Rust codegen pipelines.
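A sketch of the peephole pass under assumed names (Op, merge_movs); the
real pass runs per basic block during assembly, and this version only
collapses adjacent identical Movs:

    #[derive(Clone)]
    enum Op {
        Mov { dst: u32, src: u32 },
        Mov2 { dst1: u32, src1: u32, dst2: u32, src2: u32 },
        Mov3 { dst1: u32, src1: u32, dst2: u32, src2: u32, dst3: u32, src3: u32 },
        Other, // stand-in for every non-Mov instruction
    }

    fn merge_movs(block: &[Op]) -> Vec<Op> {
        let mut out = Vec::with_capacity(block.len());
        let mut run: Vec<(u32, u32)> = Vec::new();

        // Turn a pending run of Movs into Mov3/Mov2/Mov chunks.
        let flush = |run: &mut Vec<(u32, u32)>, out: &mut Vec<Op>| {
            for chunk in run.chunks(3) {
                out.push(match chunk {
                    &[(d1, s1)] => Op::Mov { dst: d1, src: s1 },
                    &[(d1, s1), (d2, s2)] => Op::Mov2 { dst1: d1, src1: s1, dst2: d2, src2: s2 },
                    &[(d1, s1), (d2, s2), (d3, s3)] => Op::Mov3 {
                        dst1: d1, src1: s1, dst2: d2, src2: s2, dst3: d3, src3: s3,
                    },
                    _ => unreachable!(),
                });
            }
            run.clear();
        };

        for op in block {
            match op {
                Op::Mov { dst, src } => {
                    // Deduplicate: a back-to-back identical Mov is dropped, so
                    // two identical Movs become one Mov rather than a Mov2.
                    if run.last() != Some(&(*dst, *src)) {
                        run.push((*dst, *src));
                    }
                }
                other => {
                    flush(&mut run, &mut out);
                    out.push(other.clone());
                }
            }
        }
        flush(&mut run, &mut out);
        out
    }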
The goal is to reduce the per-instruction dispatch overhead, which is
significant compared to the actual cost of moving a value.
This isn't fancy or elegant, but provides a real speed-up on many
workloads. As an example, Kraken/imaging-desaturate.js improves by
~1.07x on my laptop.
These tests pass when run normally, but produce a diff when
rebaselined. We should probably find out where that difference comes
from, but for now, rebaseline all affected tests so that bytecode diffs
of upcoming commits come out clean.
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.
A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.
The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*), so the fixup switch is
auto-generated by the Python Op code generator, making it impossible to
forget to update the fixup when adding new cached instructions.
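As a rough illustration of the mechanism (field and type names are
assumptions; the real pass is generated from Bytecode.def and patches
slots inside the raw instruction stream):

    struct PropertyLookupCache {
        // ... inline cache state
    }

    struct Executable {
        // Must live at a stable address for the Executable's lifetime,
        // since the bytecode will hold raw pointers into it.
        property_lookup_caches: Vec<PropertyLookupCache>,
        // Stand-in for the operand slots of cached instructions; in the real
        // bytecode these are u64 fields inside the instruction stream.
        cache_slots: Vec<u64>,
    }

    impl Executable {
        fn fixup_cache_pointers(&mut self) {
            for slot in &mut self.cache_slots {
                // Before: *slot holds a u32 index into the cache vector.
                // After: *slot holds the cache's address, so the interpreter
                // dereferences it directly instead of chasing Executable* ->
                // vector data pointer -> element.
                let index = *slot as usize;
                *slot = &self.property_lookup_caches[index] as *const PropertyLookupCache as u64;
            }
        }
    }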
This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
The scope collector uses HashMaps for identifier groups and variables,
which means their iteration order is non-deterministic. This causes
local variable indices and function declaration instantiation (FDI)
bytecode to vary between runs.
Fix this by sorting identifier group keys alphabetically before
assigning local variable indices, and sorting vars_to_initialize by
name before emitting FDI bytecode.
Also make register allocation deterministic by always picking the
lowest-numbered free register instead of whichever one happens to be
at the end of the free list.
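A small sketch of the sorting, with assumed shapes for the collector's
maps:

    use std::collections::HashMap;

    // Assign local variable indices in alphabetical key order, so the result
    // no longer depends on HashMap iteration order.
    fn assign_local_indices<V>(identifier_groups: &HashMap<String, V>) -> HashMap<String, u32> {
        let mut names: Vec<&String> = identifier_groups.keys().collect();
        names.sort();
        names
            .into_iter()
            .enumerate()
            .map(|(index, name)| (name.clone(), index as u32))
            .collect()
    }

    // Likewise, emit FDI bytecode for vars in name order.
    fn sort_vars_to_initialize(vars_to_initialize: &mut [String]) {
        vars_to_initialize.sort();
    }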
This is preparation for bringing in a new source->bytecode pipeline
written in Rust. Checking for regressions is significantly easier
if we can expect identical output from both pipelines.