Remove four fields that are trivially derivable from other fields
already present in the ExecutionContext:
- global_object (from realm)
- global_declarative_environment (from realm)
- identifier_table (from executable)
- property_key_table (from executable)
This shrinks ExecutionContext from 192 to 160 bytes (-17%).
The asmint's GetGlobal/SetGlobal handlers now load through the realm
pointer, taking advantage of the cached declarative environment
pointer added in the previous commit.
Realm now caches a direct pointer to the global declarative
environment record, updated when set_global_environment() is called.
This avoids an extra pointer chase through GlobalEnvironment in hot
paths like the asmint's GetGlobal/SetGlobal handlers.
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).
The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.
With this patch, which is almost a 1-to-1 mapping of ICU invocations, we
pass 100% of all Temporal test262 tests.
The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
This adds international calendar support to our Temporal implementation,
using the Intl Era and Month Code Proposal as a guide. See:
https://tc39.es/proposal-intl-era-monthcode/
Same as commit f9fa548d43.
These are String from the outset, so this patch is almost entirely just
changing function parameter types. This will allow us to cache calendar
objects in ICU without invoking any extra allocations.
Property lookup cache entries previously used GC::Weak<T> for shape,
prototype, and prototype_chain_validity pointers. Each GC::Weak
requires a ref-counted WeakImpl allocation and an extra indirection
on every access.
Replace these with GC::RawPtr<T> and make Executable a WeakContainer
so the GC can clear stale pointers during sweep via remove_dead_cells.
For static PropertyLookupCache instances (used throughout the runtime
for well-known property lookups), introduce StaticPropertyLookupCache
which registers itself in a global list that also gets swept.
Now that inline cache entries use GC::RawPtr instead of GC::Weak,
we can compare shape/prototype pointers directly without going
through the WeakImpl indirection. This removes one dependent load
from each IC check in GetById, PutById, GetLength, GetGlobal, and
SetGlobal handlers.
Replace individual bool bitfields in Object (m_is_extensible,
m_has_parameter_map, m_has_magical_length_property, etc.) with a
single u8 m_flags field and Flag:: constants.
This consolidates 8 scattered bitfields into one byte with explicit
bit positions, making them easy to access from generated assembly
code at a known offset. It also converts the virtual is_function()
and is_ecmascript_function_object() methods to flag-based checks,
avoiding virtual dispatch for these hot queries.
ProxyObject now explicitly clears the IsFunction flag in its
constructor when wrapping a non-callable target, instead of relying
on a virtual is_function() override.
Instead of recursing through 5 native stack frames per JS function
call (execute_call -> internal_call -> ordinary_call_evaluate_body ->
run_executable -> run_bytecode), handle Call and CallConstruct for
normal ECMAScript functions directly in the dispatch loop.
The fast path allocates the callee's execution context on the
InterpreterStack, copies arguments, sets up the environment, and
jumps to the callee's bytecode entry point. Return and End unwind
inline frames by restoring the caller's state. Exception unwinding
walks through inline frames to find handlers.
The fast path code is kept in NEVER_INLINE helper functions
(try_inline_call, try_inline_call_construct, pop_inline_frame) to
minimize register pressure in the dispatch loop. handle_exception
takes program_counter by value to avoid forcing it onto the stack.
Reloading of bytecode/program_counter after frame switches is done
inline at each call site via RELOAD_AND_GOTO_START to preserve a
single dispatch entry point for optimal indirect branch prediction.
Replace alloca-based execution context allocation with InterpreterStack
bump allocation across all call sites: bytecode call instructions,
AbstractOperations call/construct, script evaluation, module evaluation,
and LibWeb module script evaluation.
Also replace the native stack space check with an InterpreterStack
exhaustion check, and remove the now-unused alloca macros from
ExecutionContext.h.
Add a managed bump allocator backed by an 8 MB mmap region to replace
alloca-based execution context allocation. Pages are committed on
demand by the OS, so only the memory actually touched is resident.
The InterpreterStack is stored as a member of VM and provides simple
LIFO allocation and deallocation of ExecutionContext frames.
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.
This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
LIBJS_COMPARE_PIPELINES previously only compared top-level
script/eval/module bytecodes. Function bodies are compiled lazily
via compile_function(), and that path had no comparison at all.
Fix this by pairing each Rust-compiled SharedFunctionInstanceData
with its C++ counterpart during top-level compilation. When a
function is later lazily compiled, compile_function() runs both
pipelines and compares the bytecodes (crashing on mismatch, same
as the top-level comparisons). The pairing is done recursively so
nested functions are also covered.
Escaped surrogate sequences should not combine with adjacent literal
surrogates in Unicode mode.
We now use `\u{XXXX}` braces instead of `\uXXXX` when escaping code
units in Unicode mode, so LibRegex treats each as a standalone code
point. Also prevent GenericLexer from combining `\uXXXX` and `\u{XXXX}`.
If we have a UTF-16 string with ASCII storage, we don't need to convert
the string to UTF-8 to get a StringView. This fast-path is hit over 76k
times during test-js and over 31 million times during test262.
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
In the benchmark added here, fmt's dragonbox is ~3x faster than our own
Ryu implementation (1197ms for dragonbox vs. 3435ms for Ryu).
Daniel Lemire recently published an article about these algorithms:
https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/
In this article, fmt's dragonbox implementation is actually one of the
slower ones (with the caveat that some comments note that the article is
a bit out-of-date). I've gone with fmt here because:
1. It has a readily available recent version on vcpkg.
2. It provides the methods we need to actually convert a floating point
to decimal exponential form.
3. There is an ongoing effort to replace dragonbox with a new algorithm,
zmij, which promises to be faster.
4. It is one of the only users of AK/UFixedBigInt, so we can potentially
remove that as well soon.
5. Bringing in fmt opens the door to replacing a bunch of AK::format
facilities with fmt as well.
When the computed significand lands exactly on 10 ^ (precision - 1), the
value sits right on a power-of-10 boundary where two representations are
possible. For example, consider 1e-21. The nearest double value to 1e-21
is actually slightly less than 1e-21. So with `toPrecision(16)`, we must
choose between:
exponent=-21 significand=1000000000000000 -> 1.000000000000000e-21
exponent=-22 significand=9999999999999999 -> 9.999999999999999e-22
The spec dictates that we must pick the value that is closer to the true
value. In this case, the second value is actually closer.
The arithmetic here nearly exactly matches that of toPrecision from
commit cf180bd4da.
The only difference is test262 contains tests for which our exponent
estimate is off-by-1. We now handle this by detecting this inaccuracy.
adjusting the exponent, and recomputing the significand.
This will be needed by Number.prototype.toExponential. It will also need
some changes, so moving it up ahead of time will make that diff more
practical to read.
Rework our hash functions a bit for significant better performance:
* Rename int_hash to u32_hash to mirror u64_hash.
* Make pair_int_hash call u64_hash instead of multiple u32_hash()es.
* Implement MurmurHash3's fmix32 and fmix64 for u32_hash and u64_hash.
On my machine, this speeds up u32_hash by 20%, u64_hash by ~290%, and
pair_int_hash by ~260%.
We lose the property that an input of 0 results in something that is not
0. I've experimented with an offset to both hash functions, but it
resulted in a measurable performance degradation for u64_hash. If
there's a good use case for 0 not to result in 0, we can always add in
that offset as a countermeasure in the future.
Our previous implementation produced incorrect results for values near
the limits of double precision. This new implementation avoid floating-
point arithmetic entirely by:
1. Decomposing the double value into its exact binary form.
2. Computing the formulas from the spec using bigints.
3. Using Ryu to calculate the decimal exponent.
The result of parsing an identifier cannot change. It is not cheap to do
so, so let's cache the result.
This is hammered in a few test262 tests. On my machine, this reduces the
runtime of each test/staging/sm/Date/dst-offset-caching-{N}-of-8.js by
0.3 to 0.5 seconds. For example:
dst-offset-caching-1-of-8.js: Reduces from 1.2s -> 0.9s
dst-offset-caching-3-of-8.js: Reduces from 1.5s -> 1.1s
These are String from the outset, so this patch is almost entirely just
changing function parameter types. This will allow us to cache time zone
parse results without invoking any extra allocations.