We specialize `Optional<T>` for value types that inherently support some
kind of "empty" value, or whose value range allows for an
unlikely-to-be-useful sentinel value that can mean "empty", stored in
place of the boolean flag a regular `Optional<T>` needs. Because of
padding, this often means saving 4 to 8 bytes per instance.
By extending the new `SentinelOptional<T, Traits>`, these
specializations are reduced to defining what the sentinel value is and
how to identify it.
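As a loose sketch (the trait and member names here are hypothetical,
not the actual interface), a specialization could look like this:

    // Use an address that can never be a valid pointer as the sentinel.
    struct FlatPtrSentinelTraits {
        static constexpr FlatPtr sentinel() { return NumericLimits<FlatPtr>::max(); }
        static constexpr bool is_sentinel(FlatPtr value) { return value == sentinel(); }
    };

    // No boolean flag is stored, so sizeof(Optional<FlatPtr>) can equal
    // sizeof(FlatPtr).
    template<>
    class Optional<FlatPtr> : public SentinelOptional<FlatPtr, FlatPtrSentinelTraits> {
    public:
        using SentinelOptional<FlatPtr, FlatPtrSentinelTraits>::SentinelOptional;
    };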
Clean up leftover references to the removed C++ pipeline:
- Remove stale forward declarations from Forward.h (ASTNode,
Parser, Program, FunctionNode, ScopeNode, etc.)
- Delete unused FunctionParsingInsights.h
- Remove dead get_builtin(MemberExpression const&) declaration
from Builtins.h
- Update stale comments referencing ASTCodegen.cpp and
generate_bytecode()
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and the js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
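A hedged sketch of what the callback-based FFI surface might look like
(the exact signature and token payload are assumptions):

    extern "C" {
    // Invoked once per token; offsets index into the original source buffer.
    using TokenCallback = void (*)(void* user_data, u32 token_type,
        u32 token_category, u32 start_offset, u32 end_offset);

    // Tokenizes `source` and calls back for each token, so no token array
    // needs to be marshaled across the FFI boundary.
    void rust_tokenize(char const* source, size_t length,
        TokenCallback callback, void* user_data);
    }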
Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
Delete AST.cpp, AST.h, ASTDump.cpp, ScopeRecord.h, and the dead
get_builtin(MemberExpression const&) from Builtins.cpp.
Extract ImportEntry and ExportEntry into a new ModuleEntry.h,
since they are data types used by the module system, not AST
node types.
Inline ModuleRequest's sorting constructor and
SourceRange::filename().
Remove the dead annex_b_function_declarations field from
EvalDeclarationData, which was only populated by the C++ parser.
Remove the constructor that took C++ AST nodes (FunctionParameters,
Statement), along with create_for_function_node() and the
m_formal_parameters / m_ecmascript_code fields. These were only used
by the now-removed C++ compilation pipeline.
Also remove the dead EvalDeclarationData::create(VM&, Program&, bool)
and ECMAScriptFunctionObject::ecmascript_code() accessor.
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.
Also remove the `if constexpr` parse-node branch from
async_block_start, since the Statement template instantiation
was already removed.
Add explicit includes for headers that were previously only pulled
in transitively through Generator.h.
Port all remaining users of the C++ Parser/Lexer/Generator to
use the Rust pipeline instead:
- Intrinsics: Remove C++ fallback in parse_builtin_file()
- ECMAScriptFunctionObject: Remove C++ compile() fallback
- NativeJavaScriptBackedFunction: Remove C++ compile() fallback
- EventTarget: Port to compile_dynamic_function
- WebDriver/ExecuteScript: Port to compile_dynamic_function
- LibTest/JavaScriptTestRunner.h: Remove Parser/Lexer includes
- FuzzilliJs: Remove unused Parser/Lexer includes
Also remove the dead Statement-based template instantiation of
async_block_start/async_function_start.
Now that the Rust pipeline is the sole compilation path, remove all
C++ parser/codegen fallback paths from the callers:
- Script::parse() no longer falls back to C++ Parser
- SourceTextModule::parse() no longer falls back to C++ Parser
- perform_eval() no longer falls back to C++ Parser + Generator
- create_dynamic_function() no longer falls back to C++ Parser
- ShadowRealm eval no longer falls back to C++ Parser + Generator
- Interpreter::run(Script&) no longer falls back to Generator
Also remove the now-dead old constructors that took C++ AST nodes,
the module_requests() helper, and AST dump code from js.cpp.
Remove PipelineComparison.cpp/h and all LIBJS_COMPARE_PIPELINES
support from RustIntegration.cpp. This includes:
- The compare_pipelines_enabled() function
- All comparison blocks in compile_script/eval/module/function
- The pair_shared_function_data() helper
- The m_cpp_comparison_sfd field on SharedFunctionInstanceData
The Rust pipeline has been validated extensively through comparison
testing and no longer needs the side-by-side verification harness.
Cache raw data pointers on fixed-length typed array views so the asm
interpreter's GetByValue and PutByValue handlers can use them directly
for indexed element access.
Replace the asm interpreter's typed-array hot-path walk through
ArrayBuffer/DataBlock/ByteBuffer with a single cached_data_ptr load.
This removes six unconditional loads, four branches, and the
byte_offset addition before the element access, trading them for one
cached_data_ptr null check.
Keep direct C++ typed-array access on IsValidIntegerIndex-based
checks, invalidate cached pointers eagerly when a backing
ArrayBuffer is detached, and add regression coverage for shrink,
regrow, and detach on number and BigInt typed arrays.
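A minimal sketch of the resulting fast path (accessor names are
hypothetical; the real handlers operate on the asm interpreter's
register state):

    static bool try_fast_typed_array_get(TypedArrayBase& typed_array, u32 index, Value& out)
    {
        // cached_data_ptr() is null whenever the pointer may not be cached
        // (e.g. the backing ArrayBuffer was detached), so one null check
        // replaces the ArrayBuffer/DataBlock/ByteBuffer walk.
        auto* data = typed_array.cached_data_ptr();
        if (!data || index >= typed_array.array_length())
            return false;
        // Float64 shown for illustration; each element kind has its own load.
        out = Value { reinterpret_cast<double const*>(data)[index] };
        return true;
    }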
Use dedicated Packed branches in GetByValue and PutByValue so
in-bounds indexed accesses can skip hole checks and slot
reloads.
Keep Holey writes on the guarded arm, and keep append writes on
the C++ slow path so PutByValue still respects non-extensible
indexed objects and arrays with a non-writable length.
Add a bytecode regression test that exercises both append failure
cases through the real js binary path.
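A sketch of the Packed read branch, assuming the hypothetical Object
accessors from the inline indexed-storage work described below:

    static bool try_packed_get(Object& object, u32 index, Value& out)
    {
        if (object.indexed_storage_kind() != IndexedStorageKind::Packed)
            return false;
        if (index >= object.indexed_size())
            return false;
        // Packed storage is guaranteed hole-free, so no empty-sentinel
        // check and no storage-pointer reload are needed.
        out = object.indexed_elements()[index];
        return true;
    }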
Arrays created via new Array(N) or by setting .length start out Holey,
since their elements are not present. After a sequential fill (e.g.
for (i = 0; i < N; i++) a[i] = v), all holes are filled, but the array
previously remained Holey, preventing the Packed fast paths in the asm
interpreter from triggering.
Now, whenever indexed_put() writes to the last index of a Holey
array, we scan for remaining holes and promote to Packed if none
are found. Only checking on writes to the last index avoids O(N^2)
scanning on partial fills while still catching the common
sequential fill pattern.
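A sketch of the promotion check (accessor names are hypothetical),
called from indexed_put() after the store:

    void Object::maybe_promote_holey_to_packed(u32 index)
    {
        if (indexed_storage_kind() != IndexedStorageKind::Holey)
            return;
        // Only scan when the last index was just written; this catches the
        // sequential-fill pattern without O(N^2) work on partial fills.
        if (index + 1 != indexed_size())
            return;
        for (u32 i = 0; i < indexed_size(); ++i) {
            if (indexed_elements()[i].is_special_empty_value()) // hole sentinel
                return;
        }
        set_indexed_storage_kind(IndexedStorageKind::Packed);
    }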
When the receiver is an Array with packed storage and an intact default
prototype chain, some methods can skip the generic property access
machinery and operate directly on the indexed element storage.
This patch adds fast paths for push(), pop(), concat(), slice() and
splice().
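A sketch of the guard these fast paths might share (helper names are
assumptions):

    static bool can_use_packed_array_fast_path(VM& vm, Object& receiver)
    {
        if (!is<Array>(receiver))
            return false;
        if (receiver.indexed_storage_kind() != IndexedStorageKind::Packed)
            return false;
        // An intact default prototype chain guarantees no indexed accessor,
        // proxy, or overridden method can observe the skipped machinery.
        auto& intrinsics = vm.current_realm()->intrinsics();
        return receiver.prototype() == intrinsics.array_prototype()
            && intrinsics.array_prototype()->prototype() == intrinsics.object_prototype();
    }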
Replace the OwnPtr<IndexedPropertyStorage> indirection with inline
indexed element storage directly on Object. This eliminates virtual
dispatch and reduces indirection for indexed property access.
The new system uses three storage kinds tracked by IndexedStorageKind:
- Packed: Dense array, no holes. Elements stored in a malloc'd Value*
array with capacity header (same layout as named properties).
- Holey: Dense array with possible holes marked by empty sentinel.
Same physical layout as Packed.
- Dictionary: Sparse storage using GenericIndexedPropertyStorage,
type-punned into the m_indexed_elements pointer.
Transitions: None->Packed->Holey->Dictionary (mostly monotonic).
Dictionary mode triggers on non-default attributes or sparse arrays.
Object keeps the same 48-byte size since m_indexed_elements (8 bytes)
replaces IndexedProperties (8 bytes), and the storage kind + array
size fit in existing padding alongside m_flags.
The asm interpreter benefits from one fewer indirection: it now reads
the element pointer and array size directly from Object fields instead
of chasing through OwnPtr -> IndexedPropertyStorage -> Vector.
Removes: IndexedProperties, SimpleIndexedPropertyStorage,
IndexedPropertyStorage, IndexedPropertyIterator.
Keeps: GenericIndexedPropertyStorage (for Dictionary mode).
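A hypothetical sketch of the kind tracking (real field names, widths,
and bit packing differ):

    enum class IndexedStorageKind : u8 {
        None,       // no indexed elements yet
        Packed,     // dense, hole-free Value array
        Holey,      // dense Value array; holes marked by the empty sentinel
        Dictionary, // GenericIndexedPropertyStorage*, type-punned into the pointer
    };

    struct ObjectIndexedStorageSketch {
        // Value* for Packed/Holey, GenericIndexedPropertyStorage* for Dictionary.
        void* m_indexed_elements { nullptr };
        // On the real Object, kind and size fit in padding alongside m_flags.
        IndexedStorageKind m_indexed_storage_kind { IndexedStorageKind::None };
        u32 m_indexed_size { 0 };
    };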
Replace the 24-byte Vector<Value> m_storage with an 8-byte raw
Value* m_named_properties pointer, backed by a malloc'd allocation
with an inline capacity header.
Memory layout of the allocation:
[u32 capacity] [u32 padding] [Value 0] [Value 1] ...
m_named_properties points to Value 0.
This shrinks JS::Object from 64 to 48 bytes (on non-Windows
platforms) and removes one level of indirection for property access
in the asm interpreter, since the data pointer is now stored directly
on the object rather than inside a Vector's internal metadata.
Growth policy: max(4, max(needed, old_capacity * 2)).
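A minimal sketch of the header layout and growth policy (helper names
are hypothetical):

    struct NamedPropertyHeader {
        u32 capacity;
        u32 padding; // keeps the Value array 8-byte aligned
    };

    static Value* allocate_named_properties(u32 capacity)
    {
        auto* header = static_cast<NamedPropertyHeader*>(
            malloc(sizeof(NamedPropertyHeader) + capacity * sizeof(Value)));
        header->capacity = capacity;
        // m_named_properties points just past the header, at Value 0.
        return reinterpret_cast<Value*>(header + 1);
    }

    static u32 grown_capacity(u32 needed, u32 old_capacity)
    {
        return max(4u, max(needed, old_capacity * 2));
    }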
When an async function is resumed from a microtask and hits another
await with a non-thenable value (primitive or already-settled native
promise), and the microtask queue is empty, we can resolve the await
synchronously without suspending. No other microtask can observe the
difference in execution order, making this optimization safe.
This avoids the overhead of creating a GC::Function for the microtask
job, enqueuing/dequeuing from the microtask queue, and the execution
context push/pop that comes with it.
A new VM host hook, host_promise_job_queue_is_empty, is added so both
the standalone js binary and LibWeb can provide the appropriate check
for their respective job queue implementations.
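A hedged sketch of where the hook would be consulted (the continuation
helpers are hypothetical names):

    // `completion` already holds the settled await value; it is known to be
    // non-thenable at this point.
    static void settle_await(VM& vm, Completion completion)
    {
        // With an empty job queue, no other microtask can observe whether we
        // suspended, so resuming synchronously is unobservable.
        if (vm.host_promise_job_queue_is_empty && vm.host_promise_job_queue_is_empty()) {
            resume_async_function(vm, move(completion));  // hypothetical
            return;
        }
        enqueue_await_microtask(vm, move(completion));    // hypothetical
    }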
Per spec, every `await` goes through PromiseResolve (which wraps the
value in a new Promise via NewPromiseCapability) and then
PerformPromiseThen (which creates PromiseReaction and JobCallback
objects). This results in 13-16 GC cell allocations per await.
Add a fast path that detects two common cases:
1. Primitive values: These can never have a "then" property, so we
can skip all promise wrapping and directly schedule the async
function's continuation as a microtask.
2. Already-settled native Promises: If the promise has no own
properties and its prototype is the intrinsic %Promise.prototype%,
we can extract the result directly and schedule continuation.
For these cases, we bypass promise_resolve(), new_promise_capability(),
create_resolving_functions(), perform_then(), PromiseReaction creation,
and JobCallback creation -- replacing ~13 GC allocations with 1
(the GC::Function for the microtask job).
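A sketch of the two detection checks (the own-properties predicate is
a hypothetical name):

    static bool can_take_await_fast_path(VM& vm, Value value)
    {
        // Case 1: primitives can never have a "then" property.
        if (!value.is_object())
            return true;
        // Case 2: a settled native promise with no own properties and the
        // intrinsic %Promise.prototype% cannot have a user-visible "then".
        auto& object = value.as_object();
        if (!is<Promise>(object))
            return false;
        auto& promise = static_cast<Promise&>(object);
        return promise.state() != Promise::State::Pending
            && !promise.has_own_properties() // hypothetical predicate
            && promise.prototype() == vm.current_realm()->intrinsics().promise_prototype();
    }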
The fulfilled and rejected closures in AsyncFunctionDriverWrapper::await
were identical in structure, differing only in whether they resumed the
async context with a NormalCompletion or ThrowCompletion.
Since the callback has access to the current promise, we can check its
state at reaction time to determine which completion to use. This allows
us to allocate a single GC-tracked NativeFunction (m_on_settled) and
register it for both the onFulfilled and onRejected slots in
PerformPromiseThen, halving the number of GC allocations on this path.
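A loose sketch of the shared callback's logic (the resume helper is a
hypothetical name; the real allocation goes through the engine's
NativeFunction machinery):

    auto on_settled = [&promise](VM& vm) -> ThrowCompletionOr<Value> {
        // The promise is settled by the time either reaction slot fires, so
        // its state tells us which completion kind to resume with.
        if (promise.state() == Promise::State::Fulfilled)
            return resume_async_context(vm, NormalCompletion(promise.result()));
        return resume_async_context(vm, ThrowCompletion(promise.result()));
    };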
There will be an extremely similar function that accepts a calendar year
rather than ISO year in a subsequent commit. Rename the ISO year variant
so that they are more distinguishable.
Replace the 16-byte Variant<Empty, GC::Ref<Script>, GC::Ref<Module>>
with a simple 8-byte GC::Ptr<Cell> that points to either a Script or
Module (or is null for Empty).
A helper function script_or_module_from_cell() converts back to the
full ScriptOrModule variant when needed (e.g. in
VM::get_active_script_or_module).
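A sketch of the conversion, using LibJS's is<T>() type checks:

    static ScriptOrModule script_or_module_from_cell(GC::Ptr<Cell> cell)
    {
        if (!cell)
            return Empty {};
        if (is<Script>(*cell))
            return GC::Ref<Script> { static_cast<Script&>(*cell) };
        return GC::Ref<Module> { static_cast<Module&>(*cell) };
    }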
This field was written by push_inline_frame but never read anywhere.
The caller's executable is accessible via caller_frame->executable
if ever needed.
Shrinks ExecutionContext from 120 to 112 bytes.
The arguments Span (pointer + size = 16 bytes) was always derivable
from the tail array layout: data = values + (total_count - arg_count).
Replace it with a u32 argument_count and derive the span on demand
via arguments_span() / arguments_data() accessors.
Shrinks ExecutionContext from 136 to 120 bytes.
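A minimal sketch of the derivation (the field names are assumptions
about the tail array layout):

    // Registers and locals occupy the front of the tail array; the passed
    // arguments sit at the end.
    Value* ExecutionContext::arguments_data()
    {
        return m_values + (m_total_count - m_argument_count);
    }

    Span<Value> ExecutionContext::arguments_span()
    {
        return { arguments_data(), m_argument_count };
    }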
CachedSourceRange was a GC-allocated cell stored on the
ExecutionContext, only needed because ExecutionContext must be
trivially destructible.
Move the source range cache to a HashMap<u32, SourceRange> on the
Executable (keyed by program counter), where it belongs. This
eliminates the GC::Cell subclass entirely and removes the
cached_source_range field from ExecutionContext.
StackTraceElement and TracebackFrame now store Optional<SourceRange>
directly instead of GC::Ptr<CachedSourceRange>.
Shrinks ExecutionContext from 144 to 136 bytes.
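A sketch of the relocated cache on Executable (the compute helper is
hypothetical):

    Optional<SourceRange> Executable::source_range_for(u32 program_counter)
    {
        // Entries are computed once per program counter and reused; no
        // GC::Cell is involved since Executable owns the HashMap.
        if (auto existing = m_cached_source_ranges.get(program_counter); existing.has_value())
            return existing;
        auto range = compute_source_range(program_counter); // hypothetical
        m_cached_source_ranges.set(program_counter, range);
        return range;
    }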
This field was only used by LibWeb to prevent GC collection of the
EnvironmentSettingsObject while its execution context is on the stack.
This is unnecessary because the ESO is already reachable through the
realm's host_defined pointer: EC -> realm -> host_defined ->
PrincipalHostDefined -> environment_settings_object.
Shrinks ExecutionContext from 152 to 144 bytes.
Group the u32 fields (passed_argument_count, caller_return_pc,
caller_dst_raw) together so they pack naturally without alignment
padding between pointer-sized fields.
This shrinks ExecutionContext from 160 to 152 bytes.
Remove four fields that are trivially derivable from other fields
already present in the ExecutionContext:
- global_object (from realm)
- global_declarative_environment (from realm)
- identifier_table (from executable)
- property_key_table (from executable)
This shrinks ExecutionContext from 192 to 160 bytes (-17%).
The asmint's GetGlobal/SetGlobal handlers now load through the realm
pointer, taking advantage of the cached declarative environment
pointer added in the previous commit.
Realm now caches a direct pointer to the global declarative
environment record, updated when set_global_environment() is called.
This avoids an extra pointer chase through GlobalEnvironment in hot
paths like the asmint's GetGlobal/SetGlobal handlers.
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).
The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.
With this patch, which is almost a 1-to-1 mapping of ICU invocations,
we pass 100% of the Temporal test262 tests.
The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
This adds international calendar support to our Temporal implementation,
using the Intl Era and Month Code Proposal as a guide. See:
https://tc39.es/proposal-intl-era-monthcode/
Same as commit f9fa548d43.
These are String from the outset, so this patch is almost entirely
just changing function parameter types. This will allow us to cache
calendar objects in ICU without incurring extra allocations.
Property lookup cache entries previously used GC::Weak<T> for shape,
prototype, and prototype_chain_validity pointers. Each GC::Weak
requires a ref-counted WeakImpl allocation and an extra indirection
on every access.
Replace these with GC::RawPtr<T> and make Executable a WeakContainer
so the GC can clear stale pointers during sweep via remove_dead_cells.
For static PropertyLookupCache instances (used throughout the runtime
for well-known property lookups), introduce StaticPropertyLookupCache
which registers itself in a global list that also gets swept.
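A hedged sketch of the sweep hookup (the liveness check and member
names are assumptions about the LibGC interface):

    class Executable : public Cell
        , public GC::WeakContainer {
        // Invoked during sweep; clear cache entries whose cells are dying so
        // the GC::RawPtr members never dangle.
        virtual void remove_dead_cells(Badge<GC::Heap>) override
        {
            for (auto& cache : m_property_lookup_caches) {
                if (cache.shape && !cache.shape->is_alive()) // hypothetical check
                    cache = {};
            }
        }
    };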
Now that inline cache entries use GC::RawPtr instead of GC::Weak,
we can compare shape/prototype pointers directly without going
through the WeakImpl indirection. This removes one dependent load
from each IC check in GetById, PutById, GetLength, GetGlobal, and
SetGlobal handlers.
Replace individual bool bitfields in Object (m_is_extensible,
m_has_parameter_map, m_has_magical_length_property, etc.) with a
single u8 m_flags field and Flag:: constants.
This consolidates 8 scattered bitfields into one byte with explicit
bit positions, making them easy to access from generated assembly
code at a known offset. It also converts the virtual is_function()
and is_ecmascript_function_object() methods to flag-based checks,
avoiding virtual dispatch for these hot queries.
ProxyObject now explicitly clears the IsFunction flag in its
constructor when wrapping a non-callable target, instead of relying
on a virtual is_function() override.
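A hypothetical sketch of the flag constants (actual bit assignments
may differ):

    struct Flag {
        static constexpr u8 IsExtensible               = 1 << 0;
        static constexpr u8 HasParameterMap            = 1 << 1;
        static constexpr u8 HasMagicalLengthProperty   = 1 << 2;
        static constexpr u8 IsFunction                 = 1 << 3;
        static constexpr u8 IsECMAScriptFunctionObject = 1 << 4;
        // ...the remaining flags occupy the other bits of the u8.
    };

    // Flag tests replace virtual dispatch; generated assembly can test a
    // single byte at a known offset from the Object pointer.
    bool Object::is_function() const { return m_flags & Flag::IsFunction; }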
Instead of recursing through 5 native stack frames per JS function
call (execute_call -> internal_call -> ordinary_call_evaluate_body ->
run_executable -> run_bytecode), handle Call and CallConstruct for
normal ECMAScript functions directly in the dispatch loop.
The fast path allocates the callee's execution context on the
InterpreterStack, copies arguments, sets up the environment, and
jumps to the callee's bytecode entry point. Return and End unwind
inline frames by restoring the caller's state. Exception unwinding
walks through inline frames to find handlers.
The fast path code is kept in NEVER_INLINE helper functions
(try_inline_call, try_inline_call_construct, pop_inline_frame) to
minimize register pressure in the dispatch loop. handle_exception
takes program_counter by value to avoid forcing it onto the stack.
Reloading of bytecode/program_counter after frame switches is done
inline at each call site via RELOAD_AND_GOTO_START to preserve a
single dispatch entry point for optimal indirect branch prediction.
Replace alloca-based execution context allocation with InterpreterStack
bump allocation across all call sites: bytecode call instructions,
AbstractOperations call/construct, script evaluation, module evaluation,
and LibWeb module script evaluation.
Also replace the native stack space check with an InterpreterStack
exhaustion check, and remove the now-unused alloca macros from
ExecutionContext.h.
Add a managed bump allocator backed by an 8 MB mmap region to replace
alloca-based execution context allocation. Pages are committed on
demand by the OS, so only the memory actually touched is resident.
The InterpreterStack is stored as a member of VM and provides simple
LIFO allocation and deallocation of ExecutionContext frames.
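A minimal sketch of the allocator (alignment and error handling
omitted for brevity):

    #include <sys/mman.h>

    class InterpreterStack {
    public:
        InterpreterStack()
        {
            // Reserve 8 MB of address space; MAP_ANONYMOUS pages are
            // committed lazily on first touch, so untouched frames cost no
            // resident memory.
            m_base = static_cast<u8*>(mmap(nullptr, 8 * MiB, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
            m_top = m_base;
        }

        void* allocate(size_t size)
        {
            if (m_top + size > m_base + 8 * MiB)
                return nullptr; // exhaustion check replaces the native stack probe
            auto* frame = m_top;
            m_top += size;
            return frame;
        }

        // LIFO deallocation: popping a frame rewinds the bump pointer.
        void deallocate(void* frame) { m_top = static_cast<u8*>(frame); }

    private:
        u8* m_base { nullptr };
        u8* m_top { nullptr };
    };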
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.
This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
LIBJS_COMPARE_PIPELINES previously only compared top-level
script/eval/module bytecodes. Function bodies are compiled lazily
via compile_function(), and that path had no comparison at all.
Fix this by pairing each Rust-compiled SharedFunctionInstanceData
with its C++ counterpart during top-level compilation. When a
function is later lazily compiled, compile_function() runs both
pipelines and compares the bytecodes (crashing on mismatch, same
as the top-level comparisons). The pairing is done recursively so
nested functions are also covered.