Use a direct anonymous mapping for POSIX BlockAllocator chunks and trim
any temporary padding needed to make the live 2 MiB chunk HeapBlock-
aligned.
The GC only needs each 16 KiB HeapBlock slot to be 16 KiB-aligned, so
that from_cell() can recover the block base by masking off the low
bits. Request that alignment from mach_vm_map() as well, rather than
aligning whole chunks to 2 MiB.
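As a rough sketch (illustrative constant and helper name, not the
actual HeapBlock API), the masking recovery relies only on per-slot
alignment:

    #include <cstdint>

    static constexpr uintptr_t BLOCK_SIZE = 16 * 1024; // 16 KiB slot

    // Clearing the low bits of any interior cell pointer yields the
    // block base, provided every slot is BLOCK_SIZE-aligned.
    static void* block_base_from_cell(void const* cell)
    {
        return reinterpret_cast<void*>(
            reinterpret_cast<uintptr_t>(cell) & ~(BLOCK_SIZE - 1));
    }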
This commit splits the synchronization primitives out of LibThreading
into a new LibSync library. LibThreading depends on LibCore, while
LibCore needs the synchronization primitives from LibThreading. That
worked while the primitives were header-only, but adding an
implementation file exposed the circular dependency. Abstracting away
the pthread implementation requires .cpp files, so the synchronization
primitives were moved into a separate library.
MADV_FREE is lazy: the kernel only reclaims the pages once it actually
needs them. As a result, blocks freed during a transient burst of
allocation can continue to count toward our RSS for an arbitrarily
long time after the burst subsides.
Switch to MADV_DONTNEED on Linux so freed blocks drop out of RSS
immediately. macOS keeps the existing FREE_REUSABLE/FREE_REUSE paired
protocol (which integrates with its RSS accounting and gives the same
"eager release" behavior).
On cloudflare.com, some big GCs now drop ~350 MB instantly instead of
accumulating into a long-lived MADV_FREE backlog.
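A hedged sketch of the resulting platform split (the helper name is
illustrative, not the actual BlockAllocator call site):

    #include <cstddef>
    #include <sys/mman.h>

    static void discard_block_pages(void* base, size_t size)
    {
    #if defined(__APPLE__)
        // macOS: mark the pages reusable on free; the allocator pairs
        // this with MADV_FREE_REUSE before handing the block out again.
        madvise(base, size, MADV_FREE_REUSABLE);
    #else
        // Linux: MADV_DONTNEED drops the pages (and the RSS charge)
        // immediately, unlike the lazy MADV_FREE.
        madvise(base, size, MADV_DONTNEED);
    #endif
    }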
The slot lists used to be drained in a random order to make heap
layout less predictable, but on top of the per-CellAllocator type-
isolation we already enforce, the security delta is negligible.
LIFO via take_last() gives us better cache locality (we hand out
the most recently freed slot, which is more likely to be warm) and
saves a get_random_uniform() call on every allocation.
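A minimal sketch of the LIFO free list, using std::vector in place of
the real slot list type:

    #include <vector>

    struct FreeSlot;

    struct SlotList {
        std::vector<FreeSlot*> slots;

        // Hand out the most recently freed slot: it is the most likely
        // to still be cache-warm, and there is no random draw per
        // allocation.
        FreeSlot* take_last()
        {
            if (slots.empty())
                return nullptr;
            FreeSlot* slot = slots.back();
            slots.pop_back();
            return slot;
        }
    };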
deallocate_block() used to call MADV_FREE_REUSABLE / MADV_FREE /
MADV_DONTNEED inline on every freed block. With sweep typically
freeing many blocks per GC, the cumulative syscall cost shows up
as real GC pause time.
Move the work onto a single global "decommit worker" thread:
- deallocate_block now just poisons the slot and pushes it onto a
per-allocator m_freshly_freed queue. No syscalls.
- allocate_block prefers m_freshly_freed over m_blocks, so a slot
that's recycled before the worker sees it skips the
REUSABLE/REUSE pair entirely. This is the main payoff.
- Heap::sweep_dead_cells kicks the worker at the end of sweep.
The worker sleeps 50 ms after each kick to give the JS thread
breathing room, then drains each registered allocator's
m_freshly_freed, madvises slots in batches of 64 with
sched_yield between batches, and splices them onto m_blocks.
- Per-allocator refcount + condvar lets ~BlockAllocator wait
until the worker has dropped its reference before our storage
goes away. (Chunks themselves remain leaked: type-isolated VM
is permanent, so we never tear them down.)
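In outline (a simplified sketch with std:: primitives; the worker's
drain loop and the refcount/condvar teardown handshake are elided):

    #include <mutex>
    #include <vector>

    struct BlockAllocatorSketch {
        std::mutex mutex;
        std::vector<void*> freshly_freed; // awaiting madvise by worker
        std::vector<void*> blocks;        // already madvised

        // Mutator side: no syscalls, just queue the block.
        void deallocate_block(void* block)
        {
            std::lock_guard lock { mutex };
            freshly_freed.push_back(block);
        }

        // Mutator side: prefer freshly freed blocks so a quick recycle
        // skips the REUSABLE/REUSE pair entirely.
        void* allocate_block()
        {
            std::lock_guard lock { mutex };
            if (!freshly_freed.empty()) {
                void* block = freshly_freed.back();
                freshly_freed.pop_back();
                return block;
            }
            if (!blocks.empty()) {
                void* block = blocks.back();
                blocks.pop_back();
                return block;
            }
            return nullptr; // real code carves a slot from a chunk
        }
    };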
Previously every 16 KiB HeapBlock was its own posix_memalign /
mach_vm_map / VirtualAlloc, which churned VMAs and made the kernel's
vm_area_struct list balloon for any non-trivial heap.
Carve slots out of 2 MiB chunks instead. The kernel now sees one
mmap per 128 blocks. Chunks are owned exclusively by a single
BlockAllocator and are never released back to the OS or shared
across allocators -- that's how we keep the heap's VM permanently
type-isolated, where a virtual address used for a cell of type T
is never reused for any other type. We don't bother tracking chunk
bases for teardown: the destructor leaks them by design.
Per-block memory return is preserved: deallocate_block still calls
MADV_FREE_REUSABLE / MADV_FREE / MADV_DONTNEED / DiscardVirtualMemory
so the kernel can reclaim physical pages under pressure.
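A sketch of the chunk carving (illustrative function name; the
HeapBlock-alignment trimming handled in a separate commit is elided):

    #include <sys/mman.h>
    #include <vector>

    static constexpr size_t BLOCK_SIZE = 16 * 1024;       // one block
    static constexpr size_t CHUNK_SIZE = 2 * 1024 * 1024; // 128 blocks

    // One anonymous mapping per 128 blocks instead of one per block.
    static bool carve_chunk_into_blocks(std::vector<void*>& blocks)
    {
        void* chunk = mmap(nullptr, CHUNK_SIZE, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (chunk == MAP_FAILED)
            return false;
        for (size_t offset = 0; offset < CHUNK_SIZE; offset += BLOCK_SIZE)
            blocks.push_back(static_cast<char*>(chunk) + offset);
        return true;
    }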
MADV_FREE_REUSABLE and MADV_FREE_REUSE are macOS madvise() hints that
keep the kernel's accounting aware of how we're using the GC memory.
This makes our reported memory footprint more accurate.
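A sketch of the paired usage (illustrative helper names; the real call
sites live in the allocator):

    #include <cstddef>
    #include <sys/mman.h>

    #ifdef __APPLE__
    // On free: the pages stop counting toward RSS and may be
    // reclaimed, but the mapping stays intact.
    static void mark_block_unused(void* base, size_t size)
    {
        madvise(base, size, MADV_FREE_REUSABLE);
    }

    // On reuse: tell the kernel we are touching the pages again so
    // the accounting is re-charged to the process.
    static void mark_block_in_use(void* base, size_t size)
    {
        madvise(base, size, MADV_FREE_REUSE);
    }
    #endif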
Before this change, we'd use the system page size as the HeapBlock
size. This caused it to vary on different platforms, going as low
as 4 KiB on most Linux systems.
To make this work, we now use posix_memalign() to ensure we get
size-aligned allocations on every platform.
Also nice: HeapBlock::BLOCK_SIZE is now a constant.
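A sketch of the size-aligned allocation (illustrative wrapper name):

    #include <cstdlib>

    static constexpr size_t BLOCK_SIZE = 16 * 1024;

    static void* allocate_heap_block()
    {
        void* ptr = nullptr;
        // Alignment equal to size: the low bits of any interior
        // pointer are the offset within the block.
        if (posix_memalign(&ptr, BLOCK_SIZE, BLOCK_SIZE) != 0)
            return nullptr;
        return ptr;
    }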
This results in a massive rename across almost the entire codebase!
Alongside the namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root