The phase breakdown was authored when sweep_dead_cells was the
single stop-the-world (STW) sweep phase, with sweep_callbacks and
weak-container work nested under it. After incremental sweep landed,
that nesting no longer matches reality: sweep_callbacks runs
stop-the-world on every collection, the weak-container prune is its
own STW step, and sweep_dead_cells only runs for CollectEverything.
Promote prune_weak_containers and sweep_callbacks to top-level
phases so they show up correctly in the report, and gate the
sweep_dead_cells subsection on a non-zero time so normal
collections no longer print a wall of zero rows.
The prune-weak-containers loop was previously untimed, leaving an
unaccounted gap in the per-GC totals. Wire it through the existing
ScopedPhaseTimer mechanism. The PhaseTimings field for it is
renamed from sweep_weak_containers_us to prune_weak_containers_us
to disambiguate from sweep_weak_blocks_us, which times a different
piece of work.
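As a rough sketch (the container list and the prune call are
assumptions; only ScopedPhaseTimer and the renamed field come from
this change), the wiring looks something like:

    {
        // Time the prune pass with the same RAII helper the other phases use.
        ScopedPhaseTimer timer { s_phase_timings.prune_weak_containers_us };
        for (auto& weak_container : m_weak_containers)
            weak_container.remove_dead_cells({});
    }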
Record incremental sweep batch timing while LIBGC_LOG_LEVEL enables GC
reporting. Print the batch summary once the incremental sweep fully
finishes, so normal collection reports include the delayed sweep work
instead of leaving the sweep section empty for incremental collections.
Now that Heap maintains a persistent set of live heap blocks, use it
in the marking and conservative root scanning phases instead of
rebuilding a local copy on every collection cycle.
Move weak container cleanup (remove_dead_cells) out of both
sweep_dead_cells() and start_incremental_sweep() to the place
where it is actually safe to inspect cell state: collect_garbage().
Previously, remove_dead_cells could access cells that had already
been swept and poisoned by ASAN, causing use-after-poison crashes
when a new GC triggered while an incremental sweep was in progress.
Instead of sweeping all heap blocks in one go after marking, sweep
incrementally, one block at a time, interleaved with program execution.
This significantly reduces worst-case GC pause times by spreading
sweep work across multiple smaller time slices.
Sweep is driven by two complementary mechanisms:
1. Timer-based sweeping: A 16ms repeating timer drives background
sweep work, processing blocks for up to 5ms per timer fire.
2. Allocation-directed sweeping: Each allocator sweeps its own
pending blocks before creating new ones, ensuring forward
progress even without timer events.
Each allocator maintains its own list of blocks pending sweep,
and allocators with pending work are tracked in a separate list
for efficient timer-driven sweeping.
Key implementation details:
- Cells allocated while a sweep is in progress are marked immediately
to prevent premature collection.
- Mark bits are cleared incrementally as each block is swept,
rather than in a separate pass over the entire heap.
- Finalization and weak reference processing remain stop-the-world
since they must complete atomically before any sweeping occurs.
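A simplified sketch of how the two mechanisms cooperate (all
identifiers here are illustrative rather than the actual code; the
16 ms / 5 ms values are the ones described above):

    // Allocation-directed: an allocator drains its own backlog before
    // asking for a brand-new block.
    Cell* CellAllocator::allocate_cell(Heap& heap)
    {
        while (auto* block = m_blocks_pending_sweep.first()) {
            heap.sweep_block(*block); // runs destructors, clears mark bits, rebuilds the freelist
            m_blocks_pending_sweep.remove(*block);
            if (auto* cell = block->allocate())
                return cell;
        }
        return allocate_cell_from_new_block(heap);
    }

    // Timer-driven: a 16 ms repeating timer sweeps pending blocks for at
    // most ~5 ms per fire, then yields back to the program.
    void Heap::on_sweep_timer_fired()
    {
        auto deadline = MonotonicTime::now() + Duration::from_milliseconds(5);
        while (MonotonicTime::now() < deadline) {
            auto* allocator = m_allocators_with_pending_sweep.first();
            if (!allocator)
                return finish_incremental_sweep();
            allocator->sweep_one_pending_block(*this);
        }
    }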
Reading LIBGC_LOG_LEVEL once at startup picks the verbosity for the
collection reports:
0/default: silent.
1: per-GC report with totals and detailed per-phase timings.
2+: everything in level 1, plus the full block allocator dump.
The existing collect_garbage(..., print_report=true) entry point still
works and behaves like a per-call floor of level 1, so the DevTools
"collect-garbage" inspector request keeps emitting a report regardless
of the env var. The allocator dump is now off by default at level 1
since most reports do not need it.
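In sketch form (helper and flag names are illustrative; getenv and
strtoul come from <cstdlib>):

    static unsigned gc_log_level()
    {
        static unsigned const level = [] {
            auto const* value = getenv("LIBGC_LOG_LEVEL");
            return value ? static_cast<unsigned>(strtoul(value, nullptr, 10)) : 0u;
        }();
        return level;
    }

    // In collect_garbage(); the explicit print_report argument acts as a
    // per-call floor of level 1.
    bool const want_report = print_report || gc_log_level() >= 1;
    bool const want_allocator_dump = gc_log_level() >= 2;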
Right-aligned the percentage column in the breakdown so the numbers
line up cleanly.
The per-collection report now includes microsecond timings (with
percentage of total) for each phase and major subphase:
gather_roots
    must-survive scan / embedder roots / explicit roots
    conservative roots
        register scan / stack scan / conservative-vector / cell lookup
mark_live_cells
    initial visit / BFS marking / clear uprooted
finalize_unmarked_cells
sweep_weak_blocks
sweep_dead_cells
    block iteration / weak containers / sweep callbacks
    block reclassify / update threshold
Timings are recorded via a small RAII helper into a file-scope struct,
keeping all the plumbing inside Heap.cpp so the public Heap.h surface
stays untouched. Sweep stats now travel back to collect_garbage() the
same way, which lets the report move out of sweep_dead_cells() into a
single print_gc_report() helper run after every phase has completed.
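The helper itself is tiny; something along these lines (a sketch,
not the exact code):

    struct PhaseTimings {
        u64 gather_roots_us { 0 };
        u64 mark_live_cells_us { 0 };
        // ... one field per phase and subphase ...
    };
    static PhaseTimings s_phase_timings;

    class ScopedPhaseTimer {
    public:
        explicit ScopedPhaseTimer(u64& counter)
            : m_counter(counter)
            , m_start(MonotonicTime::now())
        {
        }
        ~ScopedPhaseTimer() { m_counter += (MonotonicTime::now() - m_start).to_microseconds(); }

    private:
        u64& m_counter;
        MonotonicTime m_start;
    };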
Switch the per-GC report to a precise timer and microsecond output, and
format all byte counts via human_readable_size so they read naturally
(e.g. "12.3 MiB" instead of "12923847 bytes").
deallocate_block() used to call MADV_FREE_REUSABLE / MADV_FREE /
MADV_DONTNEED inline on every freed block. With sweep typically
freeing many blocks per GC, the cumulative syscall cost shows up
as real GC pause time.
Move the work onto a single global "decommit worker" thread:
- deallocate_block now just poisons the slot and pushes it onto a
per-allocator m_freshly_freed queue. No syscalls.
- allocate_block prefers m_freshly_freed over m_blocks, so a slot
that's recycled before the worker sees it skips the
REUSABLE/REUSE pair entirely. This is the main payoff.
- Heap::sweep_dead_cells kicks the worker at the end of sweep.
The worker sleeps 50 ms after each kick to give the JS thread
breathing room, then drains each registered allocator's
m_freshly_freed, madvises slots in batches of 64 with
sched_yield between batches, and splices them onto m_blocks.
- Per-allocator refcount + condvar lets ~BlockAllocator wait
until the worker has dropped its reference before our storage
goes away. (Chunks themselves remain leaked: type-isolated VM
is permanent, so we never tear them down.)
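The fast paths, sketched (method and member names follow the
description above, but the bodies are simplified):

    void BlockAllocator::deallocate_block(void* block)
    {
        // No syscalls here anymore: poison the slot and queue it for the
        // decommit worker, which gets kicked at the end of sweep.
        poison_block(block);
        m_freshly_freed.append(block);
    }

    void* BlockAllocator::allocate_block()
    {
        // Prefer slots the worker has not decommitted yet; this skips the
        // MADV_FREE_REUSABLE / reuse round trip entirely.
        if (!m_freshly_freed.is_empty())
            return unpoison_block(m_freshly_freed.take_last());
        if (!m_blocks.is_empty())
            return recommit_block(m_blocks.take_last());
        return allocate_block_from_new_chunk();
    }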
Add a Cell hook for externally owned memory, and retally live external
bytes while sweeping after a collection.
Use the combined live cell and external byte count when sizing the next
GC threshold. External allocation notifications also participate in the
allocation-since-GC trigger.
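One plausible shape for the hook (the name, signature, and the
notification call are assumptions, not the real API):

    class Cell {
    public:
        // Cells that own off-heap allocations report their size here so the
        // sweep pass can retally live external bytes.
        virtual size_t externally_owned_bytes() const { return 0; }
    };

    // An embedder resizing an external buffer also pokes the heap so the
    // allocation-since-GC trigger sees the growth:
    heap.notify_external_allocation(new_size - old_size);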
Establish post-collection heap thresholds with a 1.75x growth factor
over live byte count, with an 8 MiB minimum.
These constants were chosen based on a benchmark sweep of the
Speedometer browser benchmarks (and a wider set of JS workloads) — see
the PR description for the data behind the choice.
Keep the constants in Heap.cpp instead of Heap.h so future tweaks don't
trigger 1000+ file rebuilds.
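In other words (constant names are illustrative; the 1.75x factor
and the 8 MiB floor are the values described above):

    static constexpr double gc_heap_growth_factor = 1.75;
    static constexpr size_t gc_heap_minimum_threshold = 8 * MiB;

    // live_byte_count is live cell bytes plus reported external bytes.
    m_gc_threshold = max(
        static_cast<size_t>(live_byte_count * gc_heap_growth_factor),
        gc_heap_minimum_threshold);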
In sanitizer builds, we need to convert the fake ASan stack pointers to
the real one in order to perform a conservative scan. We were blindly
scanning these stack frames regardless of whether they belong to the
_active_ stack range, i.e. the current function's frame and everything
above it. It's very likely that stale pointers exist below the stack
pointer, and we now take care to exclude that range.
Fixes a flake in the LibJS builtins/WeakRef/WeakRef.prototype.deref.js
test.
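The relevant check, roughly (using the public interface from
<sanitizer/asan_interface.h>; the surrounding scan loop is
simplified):

    void* fake_frame_begin = nullptr;
    void* fake_frame_end = nullptr;
    void* real_address = __asan_addr_is_in_fake_stack(
        __asan_get_current_fake_stack(), candidate_pointer,
        &fake_frame_begin, &fake_frame_end);
    if (real_address && real_address < current_stack_pointer) {
        // The fake frame maps to a real frame below the active stack range,
        // so anything it contains is stale; don't scan it.
        continue;
    }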
Property lookup cache entries previously used GC::Weak<T> for shape,
prototype, and prototype_chain_validity pointers. Each GC::Weak
requires a ref-counted WeakImpl allocation and an extra indirection
on every access.
Replace these with GC::RawPtr<T> and make Executable a WeakContainer
so the GC can clear stale pointers during sweep via remove_dead_cells.
For static PropertyLookupCache instances (used throughout the runtime
for well-known property lookups), introduce StaticPropertyLookupCache
which registers itself in a global list that also gets swept.
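Sketch of the new entry layout and the sweep hook (member names are
illustrative):

    struct PropertyLookupCacheEntry {
        GC::RawPtr<Shape> shape;
        GC::RawPtr<Object> prototype;
        GC::RawPtr<PrototypeChainValidity> prototype_chain_validity;
    };

    // Executable is now a WeakContainer, so the GC can null out entries
    // whose cells are about to be swept:
    void Executable::remove_dead_cells(Badge<GC::Heap>)
    {
        for (auto& cache : property_lookup_caches) {
            for (auto& entry : cache.entries) {
                if (entry.shape && !entry.shape->is_marked())
                    entry = {};
            }
        }
    }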
Now that inline cache entries use GC::RawPtr instead of GC::Weak,
we can compare shape/prototype pointers directly without going
through the WeakImpl indirection. This removes one dependent load
from each IC check in GetById, PutById, GetLength, GetGlobal, and
SetGlobal handlers.
This adds a stack trace to the JSON output from GC graph dumps which
is shown in a default-collapsed tray on the right side of the graph
explorer. When a stack pointer root is selected, the stack frame it
originated from is highlighted in the tray.
Using ensure_capacity() was a mistake, as that API is for specifying an
exact needed capacity, while grow_capacity() is for growing at a
reasonable rate.
Amusingly, we ended up with very different behavior on macOS and Linux
here, since ensure_capacity() calls kmalloc_good_size() which quantizes
to malloc bucket sizes on macOS, but is effectively a no-op on Linux.
The extreme slowdown on Linux was caught by GarBench/marking-stress.js.
When passing a Vector<JS::Value> to the MarkingVisitor, we were
iterating over the vector and visiting one value at a time. This led
to a very inefficient way of building up the GC's work queue.
By adding a new visit_impl() virtual to Cell::Visitor, we can now
grow the work queue capacity once, and then add without incrementally
growing the storage.
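One possible shape of the batched visit (names are approximate):

    // Cell::Visitor gains an overload that takes a whole span of values,
    // so MarkingVisitor can reserve the work queue growth up front.
    void MarkingVisitor::visit_impl(ReadonlySpan<Value> values)
    {
        m_work_queue.grow_capacity(m_work_queue.size() + values.size());
        for (auto const& value : values) {
            if (value.is_cell())
                m_work_queue.unchecked_append(&value.as_cell());
        }
    }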
Instead of checking whether every single cell overrides the "must
survive GC" virtual, we can track this at the HeapBlock level.
This avoids almost an entire GC heap traversal during the mark phase.
Post-GC tasks may trigger another GC, and things got very confusing
when that happened. Just dump all stats before running tasks.
Also add a separate Heap function to run these tasks. This makes
backtraces much easier to understand.
This had two fatal bugs:
1. We didn't actually mark the cell that must survive GC, we only
visited its edges.
2. Worse, we didn't actually mark anything at all! We just added
cells to MarkingVisitor's work queue, but this happened after
the work queue had already been processed.
This commit fixes these issues by moving the "must survive" pass
earlier in the mark phase.
Before this change, we'd use the system page size as the HeapBlock
size. This caused it to vary on different platforms, going as low
as 4 KiB on most Linux systems.
To make this work, we now use posix_memalign() to ensure we get
size-aligned allocations on every platform.
Also nice: HeapBlock::BLOCK_SIZE is now a constant.
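The allocation itself boils down to (sketch, using posix_memalign
from <stdlib.h>):

    // Request storage that is both BLOCK_SIZE large and BLOCK_SIZE aligned,
    // so a cell pointer can be masked down to its containing block.
    void* block_storage = nullptr;
    if (posix_memalign(&block_storage, HeapBlock::BLOCK_SIZE, HeapBlock::BLOCK_SIZE) != 0)
        VERIFY_NOT_REACHED();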
In our process architecture, there's only ever one JS::VM per process.
This allows us to have a VM::the() singleton getter that optimizes
down to a single global access everywhere.
Seeing 1-2% speed-up on all JS benchmarks from this.
This is a weak pointer that integrates with the garbage collector.
It has a number of differences compared to AK::WeakPtr, including:
- The "control block" is allocated from a well-packed WeakBlock owned by
the GC heap, not just a generic malloc allocation.
- Pointers to dead cells are nulled out by the garbage collector
immediately before running destructors.
- It works on any GC::Cell derived type, meaning you don't have to
inherit from AK::Weakable for the ability to be weakly referenced.
- The Weak always points to a control block, even when "null" (it then
points to a null WeakImpl), which means one less null check when
chasing pointers.
This is a GC-aware wrapper around AK::HashMap. Entry values are treated
as GC roots, much like the GC::RootVector we already had.
We also provide GC::OrderedRootHashMap as a convenience.
Previously, we would only keep the cell that must survive alive, but
none of its edges.
This cropped up with a GC UAF in must_survive_garbage_collection of
WebSocket in .NET's SignalR frontend implementation, where an
out-of-scope WebSocket had its underlying EventTarget properties
garbage collected, and must_survive_garbage_collection read from the
destroyed EventTarget properties.
See: https://github.com/dotnet/aspnetcore/blob/main/src/SignalR/clients/ts/signalr/src/WebSocketTransport.ts#L81
Found on https://www.formula1.com/ during a live session.
Co-Authored-By: Tim Flynn <trflynn89@pm.me>
Before this change, it was possible for a second GC to get triggered
in the middle of a first GC, due to allocations happening in the
FinalizationRegistry cleanup host hook. To avoid this causing problems,
we add a "post-GC task" mechanism and use that to invoke the host hook
once all other GC activity is finished, and we've unset the "collecting
garbage" flag.
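A sketch of the flow (the task-queue and hook invocation names are
assumptions):

    // During finalization, instead of calling the host hook directly:
    m_post_gc_tasks.append([&vm, &registry] {
        vm.host_enqueue_finalization_registry_cleanup_job(registry);
    });

    // At the very end of collect_garbage(), once it is safe to allocate again:
    m_collecting_garbage = false;
    run_post_gc_tasks();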
Note that the test included here only fails reliably when running with
the -g flag (collect garbage after each allocation).
Fixes #3051
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root