LibWeb: Replace spin_until in HTMLParser::the_end() with state machine

HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.

The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits until nothing delays the load event
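
The three phases above can be sketched in isolation as follows. This is a minimal model, not the LibWeb code: FakeDocument, EndState, and their fields are hypothetical stand-ins for the document queries that check_progress() makes. It only shows the shape of a re-entrant check that returns when a condition is unmet and falls through to the next phase when it is met.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for the document state that the real code queries.
struct FakeDocument {
    std::deque<bool> deferred_ready; // one entry per deferred script: is it ready to run?
    int asap_scripts_pending = 0;
    int load_event_delayers = 0;
    int scripts_executed = 0;
};

enum class Phase {
    WaitingForDeferredScripts,
    WaitingForASAPScripts,
    WaitingForLoadEventDelay,
    Completed,
};

struct EndState {
    FakeDocument& doc;
    Phase phase = Phase::WaitingForDeferredScripts;

    // Returns as soon as a condition is unmet; it is simply called again
    // whenever something that could change the outcome happens.
    void check_progress()
    {
        switch (phase) {
        case Phase::WaitingForDeferredScripts:
            while (!doc.deferred_ready.empty()) {
                if (!doc.deferred_ready.front())
                    return; // first script not ready yet, wait for a notification
                doc.deferred_ready.pop_front();
                ++doc.scripts_executed; // models execute_script()
            }
            phase = Phase::WaitingForASAPScripts;
            [[fallthrough]];
        case Phase::WaitingForASAPScripts:
            if (doc.asap_scripts_pending > 0)
                return;
            phase = Phase::WaitingForLoadEventDelay;
            [[fallthrough]];
        case Phase::WaitingForLoadEventDelay:
            if (doc.load_event_delayers > 0)
                return;
            phase = Phase::Completed;
            [[fallthrough]];
        case Phase::Completed:
            return;
        }
    }
};
```

In this model, each notification trigger would amount to another call to check_progress(); the phase field remembers how far the previous call got, so no call ever has to block.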

Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.

Key design decisions and why:

1. Microtask checkpoint in schedule_progress_check(): The old spin_until
   called perform_a_microtask_checkpoint() before checking conditions.
   This is critical because HTMLImageElement::update_the_image_data step
   8 queues a microtask that creates the DocumentLoadEventDelayer.
   Without the checkpoint, check_progress() would see zero delayers and
   complete before images start delaying the load event.

2. deferred_invoke in schedule_progress_check():
   I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
   Timers caused non-deterministic ordering with the HTML event loop's
   task processing timer, leading to image layout tests failing (wrong
   subtest pass/fail patterns). Synchronous calls fired too early during
   image load processing before dimensions were set, causing 0-height
   images in layout tests. queue_global_task had task ordering issues
   with the session history traversal queue. deferred_invoke runs after
   the current callback returns but within the same event loop pump, so
   the check runs late enough to see completed image load processing
   without being reordered against timers or queued tasks.

3. Navigation load event guard (m_navigation_load_event_guard): During
   cross-document navigation, finalize_a_cross_document_navigation step
   2 calls set_delaying_load_events(false) before the session history
   traversal activates the new document. This creates a transient state
   where the parent's load event delay check sees the about:blank
   document (which has ready_for_post_load_tasks=true) as the active
   document and completes prematurely.
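
Design decisions 1 and 2 above can be sketched together as follows. This is a toy model under stated assumptions: MiniEventLoop, ProgressChecker, and pump() are hypothetical and are not LibWeb or LibCore APIs, and the toy pump() only approximates "after the current callback returns". It shows two things: repeated scheduling requests coalesce into one deferred check, and draining the microtask queue before checking lets a delayer created by a microtask be seen in time.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical miniature event loop; none of these names are LibWeb APIs.
struct MiniEventLoop {
    std::queue<std::function<void()>> microtasks;
    std::vector<std::function<void()>> deferred;

    void queue_microtask(std::function<void()> task) { microtasks.push(std::move(task)); }
    void deferred_invoke(std::function<void()> callback) { deferred.push_back(std::move(callback)); }

    void perform_microtask_checkpoint()
    {
        while (!microtasks.empty()) {
            auto task = std::move(microtasks.front());
            microtasks.pop();
            task();
        }
    }

    // Runs the callbacks deferred so far; callbacks deferred while these run
    // wait for the next pump.
    void pump()
    {
        std::vector<std::function<void()>> batch;
        batch.swap(deferred);
        for (auto& callback : batch)
            callback();
    }
};

struct ProgressChecker {
    MiniEventLoop& loop;
    int load_event_delayers = 0;
    bool check_pending = false;
    bool completed = false;
    int checks_run = 0;

    void schedule_progress_check()
    {
        if (completed || check_pending)
            return; // coalesce: at most one check is in flight
        check_pending = true;
        // Capturing `this` is only safe here because the checker outlives the
        // loop's pending callbacks; the real code relies on the GC for that.
        loop.deferred_invoke([this] {
            // Drain microtasks first so delayers created by them are visible.
            loop.perform_microtask_checkpoint();
            ++checks_run;
            if (load_event_delayers == 0)
                completed = true;
            check_pending = false;
        });
    }
};
```

If the checkpoint call were removed from this model, the first check would run with zero delayers and mark itself completed before the queued microtask had a chance to register one, which is the failure described in decision 1.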
Aliaksandr Kalenik
2026-03-28 09:39:51 +01:00
committed by Alexander Kalenik
parent b542617e09
commit df96b69e7a
Notes: github-actions[bot] 2026-03-28 22:15:52 +00:00
12 changed files with 235 additions and 40 deletions


@@ -49,12 +49,14 @@
#include <LibWeb/Infra/Strings.h>
#include <LibWeb/MathML/TagNames.h>
#include <LibWeb/Namespace.h>
#include <LibWeb/Platform/EventLoopPlugin.h>
#include <LibWeb/SVG/SVGScriptElement.h>
#include <LibWeb/SVG/TagNames.h>
namespace Web::HTML {
GC_DEFINE_ALLOCATOR(HTMLParser);
GC_DEFINE_ALLOCATOR(HTMLParserEndState);
static inline void log_parse_error(SourceLocation const& location = SourceLocation::current())
{
@@ -275,8 +277,6 @@ void HTMLParser::run(URL::URL const& url, HTMLTokenizer::StopAtInsertionPoint st
// https://html.spec.whatwg.org/multipage/parsing.html#the-end
void HTMLParser::the_end(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> parser)
{
auto& heap = document->heap();
// Once the user agent stops parsing the document, the user agent must run the following steps:
// NOTE: This is a static method because the spec sometimes wants us to "act as if the user agent had stopped
@@ -332,33 +332,128 @@ void HTMLParser::the_end(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> pa
return;
}
// 5. While the list of scripts that will execute when the document has finished parsing is not empty:
while (!document->scripts_to_execute_when_parsing_has_finished().is_empty()) {
// 1. Spin the event loop until the first script in the list of scripts that will execute when the document has finished parsing
// has its "ready to be parser-executed" flag set and the parser's Document has no style sheet that is blocking scripts.
main_thread_event_loop().spin_until(GC::create_function(heap, [document] {
return document->scripts_to_execute_when_parsing_has_finished().first()->is_ready_to_be_parser_executed()
&& !document->has_a_style_sheet_that_is_blocking_scripts();
}));
// Steps 5-11 are handled by the HTMLParserEndState state machine.
auto state = HTMLParserEndState::create(document, parser);
document->set_html_parser_end_state(state);
state->schedule_progress_check();
}
// 2. Execute the first script in the list of scripts that will execute when the document has finished parsing.
document->scripts_to_execute_when_parsing_has_finished().first()->execute_script();
static constexpr int THE_END_TIMEOUT_MS = 15000;
// 3. Remove the first script element from the list of scripts that will execute when the document has finished parsing (i.e. shift out the first entry in the list).
(void)document->scripts_to_execute_when_parsing_has_finished().take_first();
GC::Ref<HTMLParserEndState> HTMLParserEndState::create(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> parser)
{
return document->heap().allocate<HTMLParserEndState>(document, parser);
}
HTMLParserEndState::HTMLParserEndState(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> parser)
: m_document(document)
, m_parser(parser)
, m_timeout(Platform::Timer::create_single_shot(heap(), THE_END_TIMEOUT_MS, GC::create_function(heap(), [this] {
if (m_phase != Phase::Completed)
dbgln("HTMLParserEndState: timed out in phase {}", to_underlying(m_phase));
})))
{
m_timeout->start();
}
void HTMLParserEndState::visit_edges(Cell::Visitor& visitor)
{
Base::visit_edges(visitor);
visitor.visit(m_document);
visitor.visit(m_parser);
visitor.visit(m_timeout);
}
void HTMLParserEndState::schedule_progress_check()
{
if (m_phase == Phase::Completed)
return;
if (m_check_pending)
return;
m_check_pending = true;
Platform::EventLoopPlugin::the().deferred_invoke(GC::create_function(heap(), [this] {
// NOTE: Pending microtasks (e.g. image load event delayer creation from update_the_image_data
// step 8) must be processed before we check conditions, matching spin_until's behavior.
// Skip the checkpoint when the microtask queue is empty to avoid unnecessary work
// (save/restore execution context stack, notify_about_rejected_promises, etc.).
if (!main_thread_event_loop().microtask_queue_empty()) {
auto& vm = main_thread_event_loop().vm();
vm.save_execution_context_stack();
vm.clear_execution_context_stack();
main_thread_event_loop().perform_a_microtask_checkpoint();
vm.restore_execution_context_stack();
}
check_progress();
m_check_pending = false;
}));
}
void HTMLParserEndState::check_progress()
{
// AD-HOC: Bail out if the document is no longer fully active (e.g. navigated away from).
if (!m_document->is_fully_active()) {
complete();
return;
}
switch (m_phase) {
case Phase::WaitingForDeferredScripts:
// 5. While the list of scripts that will execute when the document has finished parsing is not empty:
while (!m_document->scripts_to_execute_when_parsing_has_finished().is_empty()) {
auto& first_script = *m_document->scripts_to_execute_when_parsing_has_finished().first();
// 1. Spin the event loop until the first script in the list of scripts that will execute when the document has finished parsing
// has its "ready to be parser-executed" flag set and the parser's Document has no style sheet that is blocking scripts.
if (!first_script.is_ready_to_be_parser_executed() || m_document->has_a_style_sheet_that_is_blocking_scripts())
return;
// 2. Execute the first script in the list of scripts that will execute when the document has finished parsing.
first_script.execute_script();
// 3. Remove the first script element from the list of scripts that will execute when the document has finished parsing (i.e. shift out the first entry in the list).
(void)m_document->scripts_to_execute_when_parsing_has_finished().take_first();
}
advance_to_asap_scripts_phase();
[[fallthrough]];
case Phase::WaitingForASAPScripts:
// 7. Spin the event loop until the set of scripts that will execute as soon as possible and the list of scripts
// that will execute in order as soon as possible are empty.
if (!m_document->scripts_to_execute_as_soon_as_possible().is_empty()
|| !m_document->scripts_to_execute_in_order_as_soon_as_possible().is_empty())
return;
m_phase = Phase::WaitingForLoadEventDelay;
[[fallthrough]];
case Phase::WaitingForLoadEventDelay:
// 8. Spin the event loop until there is nothing that delays the load event in the Document.
if (m_document->anything_is_delaying_the_load_event())
return;
m_phase = Phase::Completed;
[[fallthrough]];
case Phase::Completed:
complete();
return;
}
}
void HTMLParserEndState::advance_to_asap_scripts_phase()
{
// AD-HOC: We need to scroll to the fragment on page load somewhere.
// But a script that ran in step 5 above may have scrolled the page already,
// so only do this if there is an actual fragment to avoid resetting the scroll position unexpectedly.
// Spec bug: https://github.com/whatwg/html/issues/10914
auto indicated_part = document->determine_the_indicated_part();
auto indicated_part = m_document->determine_the_indicated_part();
if (indicated_part.has<DOM::Element*>() && indicated_part.get<DOM::Element*>() != nullptr) {
document->scroll_to_the_fragment();
m_document->scroll_to_the_fragment();
}
// 6. Queue a global task on the DOM manipulation task source given the Document's relevant global object to run the following substeps:
queue_global_task(HTML::Task::Source::DOMManipulation, *document, GC::create_function(heap, [document] {
queue_global_task(HTML::Task::Source::DOMManipulation, *m_document, GC::create_function(m_document->heap(), [document = m_document] {
// 1. Set the Document's load timing info's DOM content loaded event start time to the current high resolution time given the Document's relevant global object.
document->load_timing_info().dom_content_loaded_event_start_time = HighResolutionTime::current_high_resolution_time(relevant_global_object(*document));
@@ -375,32 +470,23 @@ void HTMLParser::the_end(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> pa
// FIXME: 5. Invoke WebDriver BiDi DOM content loaded with the Document's browsing context, and a new WebDriver BiDi navigation status whose id is the Document object's navigation id, status is "pending", and url is the Document object's URL.
}));
// 7. Spin the event loop until the set of scripts that will execute as soon as possible and the list of scripts that will execute in order as soon as possible are empty.
main_thread_event_loop().spin_until(GC::create_function(heap, [document] {
// AD-HOC: Also bail out if the document is no longer fully active (e.g. navigated away from).
// Otherwise this spin_until stays on the call stack indefinitely, and all subsequent
// event processing on the same event loop happens in nested spin_until pumping.
if (!document->is_fully_active())
return true;
return document->scripts_to_execute_as_soon_as_possible().is_empty();
}));
m_phase = Phase::WaitingForASAPScripts;
}
// 8. Spin the event loop until there is nothing that delays the load event in the Document.
main_thread_event_loop().spin_until(GC::create_function(heap, [document] {
// AD-HOC: Bail out if the document is no longer fully active.
if (!document->is_fully_active())
return true;
return !document->anything_is_delaying_the_load_event();
}));
void HTMLParserEndState::complete()
{
m_phase = Phase::Completed;
m_timeout->stop();
m_document->set_html_parser_end_state(nullptr);
// 9. Queue a global task on the DOM manipulation task source given the Document's relevant global object to run the following steps:
queue_global_task(HTML::Task::Source::DOMManipulation, *document, GC::create_function(document->heap(), [document, parser] {
queue_global_task(HTML::Task::Source::DOMManipulation, *m_document, GC::create_function(m_document->heap(), [document = m_document, parser = m_parser] {
// 1. Update the current document readiness to "complete".
document->update_readiness(HTML::DocumentReadyState::Complete);
// AD-HOC: We need to wait until the document ready state is complete before detaching the parser, otherwise the DOM complete time will not be set correctly.
if (parser)
document->detach_parser({});
document->detach_parser();
// 2. If the Document object's browsing context is null, then abort these steps.
if (!document->browsing_context())
@@ -442,7 +528,7 @@ void HTMLParser::the_end(GC::Ref<DOM::Document> document, GC::Ptr<HTMLParser> pa
// FIXME: 10. If the Document's print when loaded flag is set, then run the printing steps.
// 11. The Document is now ready for post-load tasks.
document->set_ready_for_post_load_tasks(true);
m_document->set_ready_for_post_load_tasks(true);
}
void HTMLParser::process_using_the_rules_for(InsertionMode mode, HTMLToken& token)