Skip the generic HasProperty and Get loop when indexOf operates on a
simple packed array. In that case every index below length is an own
data property, so a direct scan of the packed indexed property storage
gives the same strict-equality result without the per-element property
lookup
ceremony.
Only use the fast path when the current packed storage size still
matches the length captured before fromIndex coercion, since that
coercion can run user code and mutate the receiver. Add coverage for
length and storage mutations during fromIndex coercion.
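A minimal sketch of the mutation hazard the guard covers; coercing
fromIndex can run user code:
```js
const a = [1, 2, 3];
const evil = {
    valueOf() {
        a.length = 1; // runs during fromIndex coercion, shrinking the storage
        return 0;
    },
};
a.indexOf(2, evil); // -1; must not scan the stale packed storage
```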
ECMAScript hoisting keeps the LAST function declaration with a given
name. The Rust scope_collector and script GDI extraction implemented
this with a single reverse scan that pushed first-seen entries, which
left the resulting list in REVERSE source order. The C++ side then
iterated `m_functions_to_initialize.in_reverse()` to undo that.
Switch the Rust side to a two-pass forward scan that records the last
position per name and emits entries in source order, and drop the
matching `.in_reverse()` calls in Script.cpp and AbstractOperations.cpp.
Same hoisting semantics; NewFunction emission and global property
iteration order now follow the source.
The HashMap that tracks last positions is keyed on `SharedUtf16String`,
so each insert is a refcount bump on the AST's existing Rc instead of
a deep `Vec<u16>` clone.
Add bytecode tests at script and nested-function scope that exercise
multiple declarations and a duplicate name to pin the new ordering.
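A minimal example of the hoisting semantics both sides must preserve:
```js
function f() { return 1; }
function f() { return 2; }
f(); // 2: the last declaration with a given name wins
```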
Move owned ArrayBuffer storage directly when transferring stream
buffers instead of copying the bytes before detaching the source.
WebAssembly memory continues to copy because its ArrayBuffer wraps
externally-owned storage.
Preserve the abrupt completion from DetachArrayBuffer before moving
storage so non-transferable buffers, such as WebAssembly.Memory-backed
views, still surface TypeError through stream operations instead of
aborting.
This saves ~130ms of main thread time when loading a YouTube video
on my Linux computer. :^)
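A hedged sketch of the preserved failure mode, assuming a BYOB
byte-stream read is one of the affected stream operations:
```js
const mem = new WebAssembly.Memory({ initial: 1 });
const reader = new ReadableStream({ type: "bytes" }).getReader({ mode: "byob" });
// The read must reject with TypeError (non-transferable buffer) rather
// than abort, since the abrupt completion from DetachArrayBuffer is kept.
reader.read(new Uint8Array(mem.buffer, 0, 16))
    .catch(e => e instanceof TypeError); // true
```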
WebAssembly.Memory-backed ArrayBuffers wrap external
ByteBuffer storage. When that memory grows,
ByteBuffer::try_resize() may realloc the backing storage while
old fixed-length buffer objects remain reachable from JS.
TypedArrayBase cached m_data for all fixed-length buffers, and
the asm interpreter fast path dereferenced that cached pointer
directly. For wasm memory views this could leave a stale
pointer behind across grow().
Restrict cached typed-array data pointers to fixed-length
ArrayBuffers that own stable ByteBuffer storage.
External/unowned buffers, including WebAssembly.Memory
buffers, now keep m_data == nullptr and fall back to code that
re-derives buffer().data() on each access.
Add regression tests for both the original shared-memory grow case
and the second-grow stale-view case.
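A minimal sketch of the stale-pointer scenario, including the second grow:
```js
const mem = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });
const view = new Uint8Array(mem.buffer);
mem.grow(1); // ByteBuffer::try_resize() may realloc the backing storage
view[0] = 1; // must re-derive buffer().data(), not dereference a stale cache
mem.grow(1); // second grow: any refreshed cached pointer would go stale again
view[0];     // still safe: m_data stays nullptr for external storage
```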
invalidate_all_prototype_chains_leading_to_this used to scan every
prototype shape in the realm and walk each one's chain looking for
the mutated shape. That was O(N_prototype_shapes x chain_depth) per
mutation and showed up hot in real profiles when a page churned a
lot of prototype state during startup.
Each prototype shape now keeps a weak list of the prototype shapes
whose immediate [[Prototype]] points at the object that owns this
shape. The list is registered on prototype-shape creation
(clone_for_prototype, set_prototype_shape) and migrated to the new
prototype shape when the owning prototype object transitions to a
new shape. Invalidation is then a recursive walk over this direct-
child registry, costing O(transitive descendants).
Saves ~300 ms of main thread time when loading https://youtube.com/
on my Linux machine. :^)
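A minimal sketch of the startup-style churn this speeds up (illustrative
only):
```js
const proto = {};
const leaves = [];
for (let i = 0; i < 10_000; i++)
    leaves.push(Object.create(proto)); // shapes whose chains reach proto
proto.x = 1; // invalidation now walks only proto's direct-child registry
```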
No other engine defines this function, so it is an observable difference
in our engine. This traces back to the earliest days of LibJS.
We now define `gc` in just the test-js and test262 runners.
indexed_take_first() already memmoves elements down for both Packed and
Holey storage, but the caller at ArrayPrototype::shift() only entered
the fast path for Packed arrays. Holey arrays fell through to the
spec-literal per-element loop (has_property / get / set /
delete_property_or_throw), which is substantially slower.
Add a separate Holey predicate with the additional safety checks the
spec semantics require: default_prototype_chain_intact() (so
HasProperty on a hole doesn't escape to a poisoned prototype) and
extensible() (so set() on a hole slot doesn't create a new own
property on a non-extensible object). The existing Packed predicate
is left unchanged -- packed arrays don't need these checks because
every index in [0, size) is already an own data property.
Allows us to fail at Cloudflare Turnstile much faster!
The element-by-element loop compiled to scalar 8-byte moves that the
compiler could not vectorize: source and destination alias, and strict
aliasing prevented hoisting the m_indexed_elements pointer load out of
the loop body. memmove collapses the shift into a single vectorized
copy.
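A minimal example of an array that now takes the fast path:
```js
const holey = new Array(8); // Holey storage
holey[0] = "a";
holey[7] = "b";
holey.shift(); // single memmove via indexed_take_first(), no per-index
               // has_property / get / set / delete_property_or_throw loop
```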
Previously it used `realm.[[GlobalObject]]` instead of
`realm.[[GlobalEnv]].[[GlobalThisValue]]`.
In LibWeb, that corresponds to Window and WindowProxy respectively.
Shape::visit_edges used to walk every entry of m_property_table and
call PropertyKey::visit_edges on each key. For non-dictionary shapes
that work is redundant: m_property_table is a lazily built cache of
the transition chain, and every key it contains was originally
inserted as some ancestor shape's m_property_key, which is already
kept alive via m_previous.
Intrinsic shapes populated through add_property_without_transition()
in Intrinsics.cpp are not dictionaries and have no m_previous to
reach their keys through, but each of those keys is either a
vm.names.* string or a well-known symbol and is strongly rooted by
the VM for its whole lifetime, so skipping them here is safe too.
Measured on the main WebWorker used by https://www.maptiler.com/maps/
this cuts out ~98% of the PropertyKey::visit_edges calls made by
Shape::visit_edges each GC, reducing time spent in GC by ~1.3 seconds
on my Linux PC while initially loading the map.
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.
This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.
We're trading some space for time here, but I believe it's worth it at
this stage, as this saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)
Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
CompareArrayElements was calling ToString(x) +
PrimitiveString::create(vm, ...) on every comparison, producing a
fresh PrimitiveString that wrapped the original's AK::String but
carried no cached UTF-16. The subsequent IsLessThan then hit
PrimitiveString::utf16_string_view() on that fresh object, which
re-ran simdutf UTF-8 validation + UTF-8 -> UTF-16 conversion for
both sides on every one of the N log N comparisons.
When x and y are already String Values, ToString(x) and
ToPrimitive(x, Number) are the identity per spec, so we can drop
the IsLessThan detour entirely and compare their Utf16Views
directly. The original PrimitiveString caches its UTF-16 on first
access, so subsequent comparisons against the same element hit
the cache; Utf16View::operator<=> additionally gives us a memcmp
fast path when both sides ended up with short-ASCII UTF-16 storage.
Microbenchmark:
```js
function makeStrings(n) {
    let seed = 1234567;
    const rand = () => {
        seed = (seed * 1103515245 + 12345) & 0x7fffffff;
        return seed;
    };
    const out = new Array(n);
    for (let i = 0; i < n; i++)
        out[i] = "item_" + rand().toString(36) + "_" + rand().toString(36);
    return out;
}
const base = makeStrings(100000);
const arr = base.slice();
arr.sort();
```
```
n       before      after      speedup
1k      0.70ms      0.30ms     2.3x
10k     8.33ms      3.33ms     2.5x
50k     49.33ms     17.33ms    2.8x
100k    118.00ms    45.00ms    2.6x
```
WebAssembly.Memory({shared:true}).grow() reallocates the underlying
AK::ByteBuffer outline (kmalloc+kfree) but, per the threads proposal,
must not detach the associated SharedArrayBuffer.
ArrayBuffer::detach_buffer was the only path that walked m_cached_views
and cleared the cached raw m_data pointer on each TypedArrayBase, so
every existing view retained a dangling pointer into the freed outline.
The AsmInterpreter GetByValue / PutByValue fast paths dereference that
cached pointer directly, yielding a use-after-free triggerable from
JavaScript.
Add ArrayBuffer::refresh_cached_typed_array_view_data_pointers() which
re-derives m_data for each registered view from the current outline
base (and refreshes UnownedFixedLengthByteBuffer::size), and call it
from Memory::refresh_the_memory_buffer on the SAB-fixed-length path
where detach is spec-forbidden.
The asm interpreter already inlines ECMAScript calls, but builtin calls
still went through the generic C++ Call slow path even when the callee
was a plain native function pointer. That added an avoidable boundary
around hot builtin calls and kept asm from taking full advantage of the
new RawNativeFunction representation.
Teach the asm Call handler to recognize RawNativeFunction, allocate the
callee frame on the interpreter stack, copy the call-site arguments,
and jump straight to the stored C++ entry point.
NativeJavaScriptBackedFunction and other non-raw callees keep falling
through to the existing C++ slow path unchanged.
Mapped and unmapped arguments objects already use dedicated premade
shapes. Track their [[ParameterMap]] internal slot there instead of
setting an Object flag after construction.
This keeps the information on the shared shape, preserves it through
shape transitions, and still lets Object.prototype.toString()
recognize arguments objects without per-instance bookkeeping.
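A minimal sketch of the behavior the shape slot preserves:
```js
function f() {
    return Object.prototype.toString.call(arguments);
}
f(); // "[object Arguments]", recognized via the premade shape's slot
```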
NativeFunction previously stored an AK::Function for every builtin,
even when the callable was just a plain C++ entry point. That mixed
together two different representations, made simple builtins carry
capture storage they did not need, and forced the GC to treat every
native function as if it might contain captured JS values.
Introduce RawNativeFunction for plain NativeFunctionPointer callees
and keep AK::Function-backed callables on a CapturingNativeFunction
subclass. Update the straightforward native registrations in LibJS
and LibWeb to use the raw representation, while leaving exported
Wasm functions on the capturing path because they still capture
state.
Wrap UniversalGlobalScope's byte-length strategy lambda in
Function<...> explicitly so it keeps selecting the capturing
NativeFunction::create overload.
First: We now pin the icu4x version to an exact number. Minor version
upgrades can result in noisy deprecation warnings and API changes which
cause tests to fail. So let's pin the known-good version exactly.
This patch updates our Rust calendar module to use the new APIs. This
initially caused some test failures due to the new Date::try_new API
(which is the recommended replacement for Date::try_new_from_codes)
having quite a limited year range of +/-9999. So we must use other
APIs (Date::try_from_fields and calendrical_calculations::gregorian)
to avoid these limits.
http://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md#icu4x-22
Place Realm's cached declarative environment next to its global object
so the asm global access fast paths can fetch the two pointers with a
paired load. These handlers never use the intervening GlobalEnvironment
pointer directly.
Pack the asm Call fast path metadata next to the executable pointer
so the interpreter can fetch both values with one paired load. This
removes several dependent shared-data loads from the hot path.
Keep the executable pointer and packed metadata in separate registers
through this binding so the fast path can still use the paired-load
layout after any non-strict this adjustment.
Lower the packed metadata flag checks correctly on x86_64 as well.
Those bits now live above bit 31, so the generator uses bt for single-
bit high masks and covers that path with a unit test.
Add a runtime test that exercises both object and global this binding
through the asm Call fast path.
Store whether a function can participate in JS-to-JS inline calls on
SharedFunctionInstanceData instead of recomputing the function kind,
class-constructor bit, and bytecode availability at each fast-path
call site.
Keep JS-to-JS inline calls out of m_execution_context_stack and walk
the active stack from the running execution context instead. Base
pushes now record the previous running context so duplicate
TemporaryExecutionContext pushes and host re-entry still restore
correctly.
This keeps the fast JS-to-JS path off the vector without losing GC
root collection, stack traces, or helpers that need to inspect the
active execution context chain.
The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.
The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
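A minimal sketch, assuming the fixed unary builtin here is
String.fromCharCode:
```js
String.fromCharCode(0x41);    // "A": the VM's cached ASCII single-char string
String.fromCharCode(0x10041); // ToUint16 masks to 0x0041, also cached "A"
String.fromCharCode(0x3b1);   // "α": non-ASCII single code unit, helper path
String.fromCharCode(65, 66);  // non-unary call: generic Call instruction
```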
Teach the PrimitiveString substring creation path to return the
VM's preallocated single-character ASCII strings instead of always
allocating a deferred Substring.
This keeps one-code-unit ASCII substrings on the same fast path as
direct string creation, including callers like charAt and indexed
string property access.
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.
The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
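A minimal sketch of the covered cases:
```js
const s = "hello";
s.charAt(1);       // "e": the VM's cached single-character ASCII string
s.charAt(9);       // "": the VM's cached empty string
"héllo".charAt(1); // "é": non-ASCII code unit, leaves the fast path
```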
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.
The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
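A minimal sketch of the asm-handled cases:
```js
"abc".charCodeAt(0); // 97: ASCII-backed resident string
"πρ".charCodeAt(1);  // 961: UTF-16-backed resident string
"abc".charCodeAt(5); // NaN: out-of-bounds Int32 index
```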
Keep the legacy regexp static properties backed by PrimitiveString
values instead of eagerly copied Utf16Strings. lastMatch, leftContext,
rightContext, and $1-$9 now materialize lazy Substrings from the
original match input when accessed.
Keep RegExp.input as a separate slot from the match source so manual
writes do not rewrite the last match state. Add coverage for that
behavior and for rope-backed UTF-16 inputs.
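A minimal sketch of the lazily materialized statics and the separate
input slot:
```js
/(\w+)@(\w+)/.exec("user@example");
RegExp.$1;        // "user": a lazy Substring into the match input
RegExp.lastMatch; // "user@example"
RegExp.input = "manual write"; // separate slot, last-match state untouched
RegExp.$2;        // still "example"
```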
Keep the primitive string that segment() creates alongside the UTF-16
buffer used by LibUnicode. Segment data objects can then return lazy
Substring instances for "segment" and reuse the original
PrimitiveString for "input" instead of copying both strings.
Add a rope-backed UTF-16 segmenter test that exercises both
containing() and iterator results.
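A minimal sketch of the two reused strings:
```js
const seg = new Intl.Segmenter("en", { granularity: "word" });
const { segment, input } = seg.segment("lazy word data").containing(5);
segment; // "word": a lazy Substring into the original string
input;   // reuses the original PrimitiveString rather than a copy
```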
Route the obvious substring-producing string operations through the
new PrimitiveString substring factory. Direct indexing, at(), charAt(),
slice(), substring(), substr(), and the plain-string split path can now
return lazy JS::Substring values backed by the original string.
Add runtime coverage for rope-backed string operations so these lazy
string slices stay exercised across both ASCII and UTF-16 inputs.
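A minimal sketch of the operations that now return lazy slices:
```js
const a = "abcdefgh", b = "ijklmnop";
const rope = a + b;  // rope-backed source string
rope.slice(2, 6);    // "cdef" as a lazy JS::Substring
rope.substr(10, 2);  // "kl"
rope[4];             // indexed access stays on the lazy path too
```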
Return JS::Substring objects from the builtin regexp exec and split
paths instead of eagerly copying UTF-16 slices into new strings.
Matches, captures, and split pieces can now point back at the original
input until someone asks for the string contents.
Add focused runtime coverage for UTF-16 captures and regex split
captures so these lazy slices stay exercised.
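A minimal sketch of the exec and split paths:
```js
const [match, capture] = /(\u03b1+)/.exec("xxαααyy");
// match and capture point back at the input until their contents are read
"α,β,γ".split(","); // split pieces are likewise lazy slices of the source
```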
Introduce JS::Substring as a lazily materialized PrimitiveString
variant that stores an originating string plus a UTF-16 offset and
length. This makes substring creation cheap while still reifying to
a normal string when character data is requested.
Track which short strings actually live in the VM caches so lazily
resolved ropes and substrings do not evict unrelated cached strings
when they are finalized. Add focused unit tests for nested ranges,
rope-backed substrings, surrogate boundaries, and cache behavior.
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.
Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.
Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.
Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
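A minimal sketch of same-site reuse:
```js
const site = o => {
    let keys = "";
    for (const k in o) keys += k; // a single for..in site
    return keys;
};
const obj = { 0: "x", a: 1, b: 2 };
site(obj); // "0ab": first pass builds the flattened key snapshot
site(obj); // all guards still match, so the iterator is reused
```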
Increase indexed element growth from 25% to 50% so lazily realized
holey arrays need fewer reallocations and copies during dense fills.
This keeps the lazy length model from paying as much incremental growth
overhead on benchmarks that pre-size with Array(length) and then write
sequentially within bounds.
Treat setting a large array length as a logical length change instead of
forcing dictionary indexed storage or materializing every hole up front.
This keeps dense fills on Array(length) on the holey indexed path and
only falls back to sparse storage when later writes actually create a
large realized gap.
The asm indexed get/put fast paths assumed holey arrays always had a
materialized backing store. Guard those paths with a capacity check so
lazy holey arrays fall back safely until an index has been realized.
Add regression coverage for very large holey arrays and for densely
filling a large holey array after pre-sizing it with Array(length).
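A minimal sketch of the intended behavior across these changes:
```js
const a = new Array(1_000_000); // logical length only, no materialized holes
for (let i = 0; i < a.length; i++)
    a[i] = i;     // dense fill stays on the holey indexed path
a[5_000_000] = 1; // a large realized gap may still fall back to sparse storage
```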
Split JS::ErrorData out of JS::Error so that it can be used both
by JS::Error and WebIDL::DOMException. This adds support for
Error.isError to DOMException, also letting us report DOMException
stack information to the console.
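A minimal sketch of the newly supported check:
```js
const ex = new DOMException("boom", "AbortError");
Error.isError(ex); // true now that DOMException carries JS::ErrorData
```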
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.
This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().
Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.
Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
The proposal has not seemed to progress for a while, and there is
an open issue about module imports which breaks HTML integration.
While we could probably make an ad-hoc change to fix that issue,
it is deep enough in the JS engine that I am not particularly
keen on making that change.
Until other browsers begin to make positive signals about shipping
ShadowRealms, let's remove our implementation for now.
There is still some cleanup that can be done with regard to the
HTML integration, but there are a few more items that need to be
untangled there.
RegExpBuiltinExec used to snap any Unicode lastIndex that landed on a
low surrogate back to the start of the pair. That matched `/😀/u`,
but it skipped valid empty matches when the original low-surrogate
position was itself matchable, such as `/\p{Script=Cyrillic}?(?<!\D)/v`
on `"A😘"` and the longer fuzzed global case.
Try the snapped position first, then retry the original lastIndex when
the snapped match fails. Only keep that second result when it is empty
at the original low-surrogate position, so consuming /u and /v matches
still cannot split a surrogate pair. In the Rust VM, treat backward
Unicode matches that start between surrogate halves as having no
complete code point to their left, which matches V8's lookbehind
behavior for those positions.
Add reduced coverage for both low-surrogate exec cases, the original
global match count regression, and the consuming-match retry regression.
Reject surrogate pairs in named group names unless both halves come
from the same raw form. A literal surrogate half was being
normalized into \uXXXX before LibRegex parsed the pattern, which let
mixed literal and escaped forms sneak through.
Validate surrogate handling on the UTF-16 pattern before
normalization, but only treat \k<...> as a named backreference when
the parser would do that too. Legacy regexes without named groups
still use \k as an identity escape, so their literal text must not be
rejected by the pre-scan.
Add runtime and syntax tests for the mixed forms, the valid literal,
fixed-width, and braced escape cases, and the legacy \k literals.
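A minimal sketch of the three raw-form combinations (non-Unicode
patterns):
```js
new RegExp("(?<\u{1D400}>x)");      // ok: literal surrogate pair in the name
new RegExp("(?<\\uD835\\uDC00>x)"); // ok: both halves escaped
new RegExp("(?<\uD835\\uDC00>x)");  // SyntaxError: mixed raw forms
```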
This better describes what the method returns and avoids the possible
confusion caused by the mismatch in behavior between
`Value::is_array()` and `Value::as_array()`.
In b2d9fd3352, the root cause of the crash
was somewhat misdiagnosed, particularly around where in the code an
allocation could occur while the constants data was uninitialized.
But more importantly, we can do better than the solution in that commit.
Instead of initializing constants with default values and then
overwriting them afterwards, simply initialize them with their actual
values directly when constructing the execution context.
This effectively reverts commit b2d9fd3352.
The additional data being passed will be used in an upcoming commit.
Allows splitting the churn of modified function signatures from the
logically meaningful code change.
No behavior change.
When allocating ExecutionContext, we were skipping allocating constants,
because they are filled in shortly after. However, there is still some
code executing between allocating the ExecutionContext and assigning
constants. This code may allocate GC-aware objects before the
assignment. Allocating any GC object may cause a garbage collection to
be triggered. And running garbage collection on uninitialized objects
might have all kinds of unintended effects.
To avoid that, simply initialize constants right away. To be safe, also
initialize arguments, but I haven't checked more closely if it is needed
for them or not.
The specific case where this bug manifested was JetStream3's
raytrace-private-class-fields.js, which was triggering a segmentation
fault when trying to visit a function's constants during the GC marking
phase. The offending allocation that triggered the garbage collection was
the creation of the corresponding FunctionEnvironment.
This crash was exposed by a combination of multiple things coming
together. In particular, both of 1179e40d3f and 61e6dbe4e7 combined
were exposing the problem, but it seems neither commit is at fault. Most
likely, whether the crash happens is sensitive to the exact amount of GC
pressure present or the size of individual execution contexts, so any
change that affects those might make it appear or go away.
To put in an additional wrinkle, this could only be observed using the
ASM JS interpreter. The CPP interpreter was using a fast path for
function calls that has a different allocation pattern and did not run
into the crash.
No explicit regression test for this change because:
* The problem is very sensitive to implementation details and a
reproduction that stays valid with code changes in the interpreter is
probably impossible to come by.
* The bug was exposed by the JS benchmarks, which are already as good a
regression test as we are going to get here, realistically.
Preserve V8's behavior for bare single-astral literals when a unicode
global search starts in the middle of a surrogate pair. We were
snapping that lastIndex back to the pair start unconditionally,
which let /😀/gu and /\u{1F600}/gu match where V8 returns null.
Expose that literal shape from LibRegex to LibJS and add runtime
coverage for the bare literal case alongside a grouped control.
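A minimal sketch of the preserved V8 behavior:
```js
const re = /😀/gu;
re.lastIndex = 2; // the low surrogate inside "a😀"
re.exec("a😀");   // null for the bare single-astral literal, matching V8
```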
The fast path in RegExp.prototype.test() checked for an own "exec"
property on the instance via storage_has(), but did not detect when
RegExp.prototype.exec itself had been replaced. This meant overriding
exec on the prototype was silently ignored, violating the spec which
requires test() to go through RegExpExec() and thus the overridable
exec method.
Fix this by resolving "exec" via a full prototype chain lookup and
checking whether the result is still the built-in exec, matching the
approach already used in Symbol.replace's fast path.
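A minimal sketch of the now-honored override:
```js
const re = /a/;
RegExp.prototype.exec = function () { return null; };
re.test("a"); // false: test() goes through the replaced exec via RegExpExec
```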