112 Commits

Author SHA1 Message Date
Andreas Kling
c66cab7e6b AK: Hide tentative HashTable bucket from iterators across ensure()
HashMap<_, GC::Ref<_>>::ensure() crashed under UBSan whenever the
initialization callback triggered a GC: lookup_for_writing() stamped
the target bucket as used and added it to the ordered list before the
callback ran, so the marking visitor walked the map, read the
uninitialized slot, and failed the returns_nonnull check in GC::Ref.

Split bucket reservation into two phases. lookup_for_writing() now
hands back the target in the Free state (not in the ordered list,
m_size unchanged); callers placement-new the value and then commit via
commit_inserted_bucket(). The Robin Hood displacement loop still
stamps the slot internally and un-stamps before returning, so probing
is unchanged and the whole operation remains a single hash and a
single probe.
2026-04-25 06:21:36 +02:00
Andreas Kling
b23aa38546 AK: Adopt mimalloc v2 as main allocator
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.

This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().

Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.

Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
2026-04-08 09:57:53 +02:00
Tim Ledbetter
972bcdeebe AK: Use correct relocation for all HashTable entry types
Robin Hood displacement and `delete_bucket()` shift-up used BucketType's
implicit move operations, which bitwise-copy the `u8` storage array
instead of going through T's move constructor and destructor. This
change adds `relocate_bucket()` and `swap_buckets()` helpers that use a
fast path for trivially-relocatable types and move-construct + destroy
for others.
2026-03-19 14:21:44 +01:00
Jelle Raaijmakers
a18722093d AK: Use power-of-two growth in HashTable
Measurements on my device have shown this to make lookups and inserts
faster by 30-40% on average, especially for larger number of buckets.
The memory cost is significant depending on the exact number of buckets,
increasing anywhere between 10% and 100% compared to the old way of
growing our capacity. The performance benefits still make this worth it,
in my opinion.
2026-02-20 22:47:24 +01:00
Jelle Raaijmakers
2522aad8b7 AK: Reduce HashTable's load factor to 70%
Measurements on my device have shown HashTable lookups and inserts to be
faster up to ~20%, at the cost of 10-20% more memory usage depending on
the exact amount of buckets.
2026-02-20 22:47:24 +01:00
Andreas Kling
ca772caee6 AK: Add HashTable::ensure(hash, predicate, init_callback)
...and use it to make HashMap::ensure() do a single hash lookup instead
of three.

We achieve this by factoring out everything but the bucket construction
logic from HashTable::write_value() into a lookup_for_writing() helper
so we can use it from more places.
2025-10-05 21:44:06 +02:00
Zaggy1024
744568c912 AK: Destroy non-trivial types by ref in HashTable::clear_with_capacity
Buckets being iterated by pointer instead of reference was causing a
compilation error when calling clear_with_capacity() on a HashTable
containing a non-trivially-destructible type.
2025-09-22 17:28:00 -05:00
Andreas Kling
59a28febc9 AK: Store hash with HashTable entry to avoid expensive equality checks
When T in HashTable<T> has a potentially slow equality check, it can be
very profitable to check for a matching hash before full equality.

This patch adds may_have_slow_equality_check() to AK::Traits and
defaults it to true. For trivial types (pointers, integers, etc) we
default it to false. This means we skip the hash check when the equality
check would be a single-CPU-word compare anyway.

This synergizes really well with things like HashMap<String, V> where
collisions previously meant we may have to churn through multiple O(n)
equality checks.
2025-09-18 22:37:18 +02:00
Idan Horowitz
5097e72174 AK: Implement take_all_matching(predicate) API in HashTable 2025-08-08 13:09:58 -04:00
Timothy Flynn
7280ed6312 Meta: Enforce newlines around namespaces
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Andrew Kaster
5e7e6475c6 AK: Annotate [[no_unique_address]] members with NO_UNIQUE_ADDRESS macro 2025-04-15 02:19:06 -06:00
Jelle Raaijmakers
c7773d0312 Meta: Update my email address everywhere 2024-11-01 12:14:53 +01:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere 2024-10-04 13:19:50 +02:00
Dan Klishch
5ed7cd6e32 Everywhere: Use east const in more places
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
2024-04-19 06:31:19 -04:00
Andreas Kling
6724f840cd AK: Early return from empty hash table lookups to avoid hashing
When calling get() or find() on an empty HashTable or HashMap, we can
avoid hashing the sought-after key.
2024-03-16 14:27:59 +01:00
kleines Filmröllchen
9a026fc8d5 AK: Implement SipHash as the default hash algorithm for most use cases
SipHash is highly HashDoS-resistent, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
2023-10-01 11:06:36 +03:30
Daniel Bertalan
4d2af7c3d6 AK: Implement reverse iterators for OrderedHashTable 2023-09-24 23:36:43 +02:00
Karol Kosek
e575ee4462 AK+Kernel: Unify Traits<T>::equals()'s argument order on different types
There was a small mishmash of argument order, as seen on the table:

                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
   ============= | ======================= | =======================
   uses equals() | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer

[^1]: String, DeprecatedString, their Fly-type equivalents and KString.

This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.

I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_serach). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.

With this change, each of the following lines will now compile
successfully:

    Vector<String>().contains_slow("WHF!"sv);
    HashTable<String>().contains("WHF!"sv);
    HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
2023-08-23 20:21:09 +02:00
Ben Wiederhake
36ff6187f6 Everywhere: Change spelling of 'behaviour' to 'behavior'
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md (L30)

Here's a short statistic of the occurrences of the word "behavio(u)r":

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     24 Behaviour
     32 behaviour
    407 Behavior
    992 behavior

Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded a typo, and "behavior" (1401 occurrences) should be preferred.

Note that The occurrences in LibJS are intentionally NOT changed,
because there are taken verbatim from the specification. Hence:

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     10 behaviour
     24 Behaviour
    407 Behavior
   1014 behavior
2023-05-07 01:05:09 +02:00
Ben Wiederhake
ee47c0275e Everywhere: Run spellcheck on all documentation 2023-05-07 01:05:09 +02:00
Aliaksandr Kalenik
4c6564e3c1 AK: Add values() method in HashTable
Add HashTable::values() method that returns all values.
2023-04-28 18:11:44 +02:00
Jelle Raaijmakers
954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
Hediadyoin1
fd8c54d720 AK: Add take_first to HashTable and rename pop to take_last
This naming scheme matches Vector.

This also changes `take_last` to move the value it takes, and delete by
known pointer, avoiding a full lookup and potential copies.
2023-02-21 22:13:06 +01:00
Hediadyoin1
93945062a7 AK: Update HashTables head and tail when shifting during deletion
Otherwise we end up with invalid pointers to them, breaking iteration.
2023-02-21 22:13:06 +01:00
Jelle Raaijmakers
c08d137fcd AK: Reimplement HashTable with smart linear probing
Instead of rehashing on collisions, we use Robin Hood hashing: a simple
linear probe where we keep track of the distance between the bucket and
its ideal position. On insertion, we allow a new bucket to "steal" the
position of "rich" buckets (those near their ideal position) and move
them further down.

On removal, we shift buckets back up into the freed slot, decrementing
their distance while doing so.

This behavior automatically optimizes the number of required probes for
any value, and removes the need for periodic rehashing (except when
expanding the capacity).
2023-02-17 22:29:51 -07:00
Timothy Flynn
4f5353cbb8 AK: Rename double_hash to rehash_for_collision
The name is currently quite confusing as it indicates it hashes doubles.
2023-01-21 10:36:14 +01:00
Eli Youngs
a2024cfb69 AK: Support popping an arbitrary element from a HashTable 2022-12-16 10:41:56 -07:00
Moustafa Raafat
b8f1e1bed2 Everywhere: Remove unnecessary AK and Detail namespace scoping 2022-12-09 11:25:30 +00:00
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Zaggy1024
a1300d3797 AK: Don't crash in HashTable::clear_with_capacity on an empty table
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
2022-11-11 00:44:04 -07:00
Hendiadyoin1
5bf84a5b0e AK: Zero previous pointer *after* fixing the insertion list in HashTable 2022-06-23 20:25:12 +03:00
Idan Horowitz
eb02425ef9 AK: Clear the previous and next pointers of deleted HashTable buckets
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.

This commit also includes a new HashMap test that reproduces this issue
2022-06-22 21:53:13 +02:00
Vitaly Dyachkov
a0a4d169f4 AK+LibGUI: Pass predicate to *_matching() methods by const reference 2022-05-08 17:02:00 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
kleines Filmröllchen
09a12247fb AK: Use bucket states with special bit patterns in HashTable
This simplifies some of the bucket state handling code, as there's now
an easy way of checking the basic category of bucket state.
2022-03-31 12:06:13 +02:00
kleines Filmröllchen
49d29c8298 AK: Rehash HashTable in-place instead of shrinking
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of
deleted buckets that count towards the load factor. This means that hash
tables which are large enough for their contents need to be resized.
This was fixed in 9d8da16 with a workaround that shrinks the HashTable
back down in these cases, as after the resize and re-hash the load
factor is very low again. However, that's not a good solution. If you
insert and remove repeatedly around a size boundary, you might get
frequent resizes, which involve frequent re-allocations.

The new solution is an in-place rehashing algorithm that I came up with.
(Do complain to me, I'm at fault.) Basically, it iterates the buckets
and re-hashes the used buckets while marking the deleted slots empty.
The issue arises with collisions in the re-hash. For this reason, there
are two kinds of used buckets during the re-hashing: the normal "used"
buckets, which are old and are treated as free space, and the
"re-hashed" buckets, which are new and treated as used space, i.e. they
trigger probing. Therefore, the procedure for relocating a bucket's
contents is as follows:
- Locate the "real" bucket of the contents with the hash. That bucket is
  the starting point for the target bucket, and the current (old) bucket
  is the bucket we want to move.
- While we still need to move the bucket:
  - If we're the target, something strange happened last iteration or we
    just re-hashed to the same location. We're done.
  - If the target is empty or deleted, just move the bucket. We're done.
  - If the target is a re-hashed full bucket, we probe by double-hashing
    our hash as usual. Henceforth, we move our target for the next
    iteration.
  - If the target is an old full bucket, we swap the target and to-move
buckets. Therefore, the bucket to move is a the correct location and the
former target, which still needs to find a new place, is now in the
bucket to move. So we can just continue with the loop; the target is
re-obtained from the bucket to move. This happens for each and every
bucket, though some buckets are "coincidentally" moved before their
point of iteration is reached. Either way, this guarantees full in-place
movement (even without stack storage) and therefore space complexity of
O(1). Time complexity is amortized O(2n) asssuming a good hashing
function.

This leads to a performance improvement of ~30% on the benchmark
introduced with the last commit.

Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
2022-03-31 12:06:13 +02:00
kleines Filmröllchen
bcb8937898 AK: Merge HashTable bucket state into one enum
The hash table buckets had three different state booleans that are in
fact exclusive. In preparation for further states, this commit
consolidates them into one enum. This has the added benefit on not
relying on the compiler's boolean packing anymore; we definitely now
only need one byte for the bucket state.
2022-03-31 12:06:13 +02:00
Daniel Bertalan
e3eb68dd58 AK+Kernel: Avoid double memory clearing of HashTable buckets
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.

The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.

This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
2022-03-15 11:56:46 +01:00
Andreas Kling
9d8da1697e AK: Automatically shrink HashTable when removing entries
If the utilization of a HashTable (size vs capacity) goes below 20%,
we'll now shrink the table down to capacity = (size * 2).

This fixes an issue where tables would grow infinitely when inserting
and removing keys repeatedly. Basically, we would accumulate deleted
buckets with nothing reclaiming them, and eventually deciding that we
needed to grow the table (because we grow if used+deleted > limit!)

I found this because HashTable iteration was taking a suspicious amount
of time in Core::EventLoop::get_next_timer_expiration(). Turns out the
timer table kept growing in capacity over time. That made iteration
slower and slower since HashTable iterators visit every bucket.
2022-03-07 00:08:22 +01:00
Andreas Kling
eb829924da AK: Remove return value from HashTable::remove() and HashMap::remove()
This was only used by remove_all_matching(), where it's no longer used.
2022-03-07 00:08:22 +01:00
Andreas Kling
623bdd8b6a AK: Simplify HashTable::remove_all_matching()
Just walk the table from start to finish, deleting buckets as we go.
This removes the need for remove() to return an iterator, which is
preventing me from implementing hash table auto-shrinking.
2022-03-07 00:08:22 +01:00
Idan Horowitz
9b0d90a71d AK: Support using custom comparison operations for hash compatible keys 2022-01-29 23:01:23 +02:00
James Puleo
10b25d2a57 AK: Implement HashTable::try_ensure_capacity, as used in HashMap
This was used in `HashMap::try_ensure_capacity`, but was missing from
`HashTable`s implementation. No one had used
`HashMap::try_ensure_capacity` before so it went unnoticed!
2022-01-25 09:17:22 +01:00
Andreas Kling
5279a04c78 AK: Make Hash{Map,Table}::remove_all_matching() return removal success
These functions now return whether one or more entries were removed.
2022-01-05 18:57:14 +01:00
Andreas Kling
54cf42fac1 AK: Add HashTable::remove_all_matching(predicate)
This removes all matching entries from a table in a single pass.
2022-01-05 18:57:14 +01:00
Hendiadyoin1
c673b7220a AK: Enable fast path for removal by hash-compatible key in HashMap/Table 2021-12-15 23:35:14 -08:00
Hendiadyoin1
d50360f5dd AK: Allow hash-compatible key types in Hash[Table|Map] lookup
This will allow us to avoid some potentially expensive type conversion
during lookup, like form String to StringView, which would allocate
memory otherwise.
2021-12-15 13:09:49 +03:30
Andrew Kaster
762b92c650 AK: Resolve clang-tidy readability-qualified-auto warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Andrew Kaster
22feb9d47b AK: Resolve clang-tidy readability-bool-conversion warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00