HashMap<_, GC::Ref<_>>::ensure() crashed under UBSan whenever the
initialization callback triggered a GC: lookup_for_writing() stamped
the target bucket as used and added it to the ordered list before the
callback ran, so the marking visitor walked the map, read the
uninitialized slot, and failed the returns_nonnull check in GC::Ref.
Split bucket reservation into two phases. lookup_for_writing() now
hands back the target in the Free state (not in the ordered list,
m_size unchanged); callers placement-new the value and then commit via
commit_inserted_bucket(). The Robin Hood displacement loop still
stamps the slot internally and un-stamps before returning, so probing
is unchanged and the whole operation remains a single hash and a
single probe.
The !has_ascii_storage() && !other.has_ascii_storage() branch did a
byte-wise __builtin_memcmp over a char16_t array, which on little-endian
does not give code-unit order: the low byte is compared first, so
0xD83D (bytes [0x3D, 0xD8]) spuriously compared less than 0x2764
(bytes [0x64, 0x27]) even though the code unit 0xD83D is greater.
No in-tree caller currently uses operator<=> for Utf16View ordering,
so this bug is dormant; the follow-up LibJS change exposes it.
Replace the memcmp branch with a per-code-unit loop, which the compiler
can auto-vectorize and which mirrors what is_code_unit_less_than already
does.
The old implementation stored chunks in a Vector, which meant every
discard() had to call Vector::remove(0, N) to drop the consumed chunks
from the front, shifting every remaining chunk down. For a stream used
as a back-pressure queue, draining it by discarding one chunk at a time
was quadratic in the queued chunk count: in RequestServer that cost
about a second of CPU per large response.
Replace it with a singly-linked list of chunks (head, tail, head read
offset, tail write offset) so push-back and pop-front are both O(1)
and no shifting ever happens. Each chunk now holds its CHUNK_SIZE byte
array inline rather than a separately-allocated ByteBuffer, which also
halves the per-chunk allocations. Teardown unlinks iteratively to avoid
recursive OwnPtr destructors on very long chains.
Expose the next contiguous range of readable bytes at the current read
offset without copying. This lets callers hand the underlying chunk
straight to write/send without round-tripping through a user-space
buffer, which is important when the stream is used as a back-pressure
queue and the total queued size can grow to many megabytes.
The iterator returned by SinglyLinkedList::remove() left `m_prev`
default-initialized to `nullptr`. If the caller removed another element
without first advancing, the previous node's next pointer was left
dangling to the freed node.
This caused a UAF in FinalizationRegistry's `remove_by_token()` when
two consecutive records shared an unregister token.
Make the _string, _fly_string, _utf16, and _utf16_fly_string UDL
operators constexpr, with a fast path for short (<= 7 byte) ASCII
literals that folds directly into an inline ShortString. Previously,
every "foo"_fly_string (and friends) involved an out-of-line call
into the string factory, even though the result is entirely known
at compile time.
With numerators or denominators approaching NumericLimits<u32>::max(),
we could overflow in the sum of the remainder and the rounding
contribution. Instead, divide them separately and sum them afterward.
Overflow could happen in the multiplication of the remainder seconds
back into time units. Instead, take a remainder of the time units from
the division and use that for the nanoseconds.
Add a HistoryStore abstraction with transient and persisted backends,
normalize recorded URLs, and skip non-browsable schemes.
Cover lookup and persistence in TestHistoryStore so history-driven
features can share one backend.
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.
This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().
Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.
Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
Since e47cdc6b, converting `ReadonlyBytes` to a `StringView` will crash
if the `ReadonlyBytes` given to it is empty. This caused a crash when
calling `btoa("")`, which called encode_base64() with an empty
`ReadonlyBytes`. We no longer convert the input from `ReadonlyBytes` to
`StringView`, which avoids the problem.
An empty Span (and so ReadonlyBytes) can be created with a null data
pointer, which is no longer allowed for StringView. Detect that case
and use an empty string instead.
Now that there are no callers of is_null() left, make default-constructed
StringViews represent the empty string and disallow null pointers in the
public constructors, matching ByteString and String.
Keep a private null sentinel only for Optional<StringView>.
The default constructor now initializes to an empty ASCII string rather
than a null pointer.
Also add a VERIFY in the utf16 constructor to assert that the pointer is
non-null, and remove the now-unneeded is_null() method.
Add a maximum nesting depth check in JsonParser so deeply nested
arrays/objects from untrusted input cannot blow the call stack.
Add regression tests for excessive nesting rejection and
reasonable nesting acceptance.
On the MSVC ABI (used by clang-cl), each empty base class is given a
unique address with 1 byte of padding by default. Variant inherits from
a deep chain of empty base classes (InheritFromPacks ->
InheritFromUniqueEntries -> VariantConstructors) for its inherited
constructor mechanism. Without __declspec(empty_bases), the pointer
adjustment in VariantConstructors::internal_cast() from the
VariantConstructors subobject back to the Variant base was computed
incorrectly due to accumulated padding, causing the constructor to write
m_data and m_index at wrong offsets and corrupting the stored object.
This manifested as heap corruption (STATUS_HEAP_CORRUPTION 0xc0000374)
whenever a Variant containing a large type (e.g. ByteCode at 240 bytes)
was destroyed after being constructed through the inherited constructor
path. In practice this crashed `new RegExp('a')` in LibJS on Windows,
preventing ~880 of ~1058 LibJS runtime tests from running.
Add standalone saturating_add(), saturating_sub(), and
saturating_mul() free functions for integral types.
The signed implementations are fully branchless, using
__builtin_add/sub/mul_overflow combined with bitmask selection.
We specialize `Optional<T>` for value types that inherently support some
kind of "empty" value, or whose value range allows for an
unlikely-to-be-useful sentinel value that can mean "empty", instead of
the boolean flag a regular `Optional<T>` needs to store. Because of
padding, this often means saving 4 to 8 bytes per instance.
By extending the new `SentinelOptional<T, Traits>`, these specializations
become significantly simpler: they only have to define what the sentinel
value is and how to identify one.
Robin Hood displacement and `delete_bucket()` shift-up used BucketType's
implicit move operations, which bitwise-copy the `u8` storage array
instead of going through T's move constructor and destructor. This
change adds `relocate_bucket()` and `swap_buckets()` helpers that use a
fast path for trivially-relocatable types and move-construct + destroy
for others.
The microtask queue is a pure FIFO (enqueue at back, dequeue from
front) but was using a Vector, making every dequeue O(n) due to
element shifting.
Replace it with AK::Queue which has O(1) dequeue. This makes a huge
difference when processing large numbers of microtasks, e.g. during
async-heavy JavaScript workloads where each `await` generates a
microtask.
Also add a for_each() method to AK::Queue so the GC can visit the
queued tasks.
Vulkan Validation Layers provides diagnostic feedback about Vulkan API
usage while live testing Ladybird. It is possible to enable this
diagnostic output without changing the code (using the Vulkan SDK),
but having it built into the code makes it very easy to enable whenever
required.
Co-authored-by: Rocco Corsi <5201151+rcorsi@users.noreply.github.com>
If `fast_is<T>` was not available but `static_cast<T>` was, we would call
into `is<T>`, causing a `dynamic_cast` there, then throw away the result
and return a `static_cast`ed pointer.
We can encourage better codegen by checking for the presence of
`fast_is<T>` ourselves. This does not change anything for when
`fast_is<T>` is present, but if it's unavailable we now reduce
`as_if<T>` to a simple tailcall of `dynamic_cast<T>`.
Old:
```
stp x20, x19, [sp, #-32]!
stp x29, x30, [sp, #16]
add x29, sp, #16
mov x19, x0
adrp x1, typeinfo for Base@PAGE
add x1, x1, typeinfo for Base@PAGEOFF
adrp x2, typeinfo for WithoutFastIs@PAGE
add x2, x2, typeinfo for WithoutFastIs@PAGEOFF
mov x3, #0
bl ___dynamic_cast
cmp x0, #0
csel x0, xzr, x19, eq
ldp x29, x30, [sp, #16]
ldp x20, x19, [sp], #32
ret
```
New:
```
adrp x1, typeinfo for Base@PAGE
add x1, x1, typeinfo for Base@PAGEOFF
adrp x2, typeinfo for WithoutFastIs@PAGE
add x2, x2, typeinfo for WithoutFastIs@PAGEOFF
mov x3, #0
b ___dynamic_cast
```
We have a common pattern of creating a `WeakPtr<T>` from a reference and
passing that into a lambda, to then take the strong ref when the lambda
is executed. Add `weak_callback(Weakable, lambda)` that returns a lambda
that only invokes the callback if a strong ref exists, and passes it as
the first argument.
Escaped surrogate sequences should not combine with adjacent literal
surrogates in Unicode mode.
We now use `\u{XXXX}` braces instead of `\uXXXX` when escaping code
units in Unicode mode, so LibRegex treats each as a standalone code
point. Also prevent GenericLexer from combining `\uXXXX` and `\u{XXXX}`.
These new traits are identical to `Traits<Integral>`, except that
calling `.hash()` will return the value itself instead of hashing it.
This should be used in cases where either the value is already a proper
hash, or using the value as a hash will yield "good enough" performance
in e.g. HashTable.
Types larger than 32 bits are folded in on themselves. Collision tests
on some popular hashing algorithms show that XOR folding slightly
increases the number of collisions, but this allows `IdentityHashTraits`
not to make any assumptions on which bits are the most relevant for the
final hash.
In the benchmark added here, fmt's dragonbox is ~3x faster than our own
Ryu implementation (1197ms for dragonbox vs. 3435ms for Ryu).
Daniel Lemire recently published an article about these algorithms:
https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/
In this article, fmt's dragonbox implementation is actually one of the
slower ones (with the caveat that some comments note that the article is
a bit out-of-date). I've gone with fmt here because:
1. It has a readily available recent version on vcpkg.
2. It provides the methods we need to actually convert a floating point
to decimal exponential form.
3. There is an ongoing effort to replace dragonbox with a new algorithm,
zmij, which promises to be faster.
4. Our Ryu implementation is one of the only users of AK/UFixedBigInt, so
we can potentially remove that as well soon.
5. Bringing in fmt opens the door to replacing a bunch of AK::format
facilities with fmt as well.
We were previously using our own defines, however those didn't cover
various systems. This change makes AK_CACHE_ALIGNED use the destructive
interference size as defined by the compiler for the target platform.
These were moved in 0f04c6d but the namespace remained the same. We also
now forward declare these in `AK/Forward.h` rather than
`LibWeb/Forward.h` (or not at all in the case of `ValueComparingRefPtr`).
Measurements on my device have shown this to make lookups and inserts
faster by 30-40% on average, especially for larger numbers of buckets.
The memory cost is significant depending on the exact number of buckets,
increasing anywhere between 10% and 100% compared to the old way of
growing our capacity. The performance benefits still make this worth it,
in my opinion.
Measurements on my device have shown HashTable lookups and inserts to be
faster by up to ~20%, at the cost of 10-20% more memory usage depending
on the exact number of buckets.