When the regular HTML parser is blocked on an external script, the
speculative parser scans ahead and pre-fetches discoverable
sub-resources. Previously those fetches were tracked only in the
parser's own URL list and never registered in the document's preload
map, so when the regular parser later reached each element, fetch()'s
consume_a_preloaded_resource() lookup found nothing and issued a
duplicate request: every parser-blocked sub-resource was fetched
twice.
issue_speculative_fetch now creates a PreloadEntry, registers it
under create_a_preload_key(request) in the document's preload map,
and supplies a processResponseConsumeBody callback that populates
the entry. The map insertion happens after fetch() starts because
fetch() runs consume_a_preloaded_resource() synchronously, so
registering the entry beforehand would short-circuit the
speculative fetch itself.
The body-handling steps (steps 1, 2, and 5 of the preload algorithm's
processResponseConsumeBody) are factored into a shared
deliver_preload_response helper used by both the speculative parser
and HTMLLinkElement::preload.
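
The ordering constraint, as a standalone sketch (names are hypothetical
and the map is keyed by URL here; the real map is keyed by
create_a_preload_key(request)):

```cpp
// fetch() runs "consume a preloaded resource" synchronously, so the
// speculative fetch must register its PreloadEntry only after fetch()
// has started; registering first would make the fetch consume its own
// empty entry and never hit the network.
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

struct PreloadEntry { std::optional<std::string> response; };
static std::unordered_map<std::string, PreloadEntry> preload_map;

bool fetch(std::string const& url)
{
    if (auto it = preload_map.find(url); it != preload_map.end()) {
        std::cout << "consumed preload for " << url << "\n";
        preload_map.erase(it);
        return false; // no network request issued
    }
    std::cout << "network fetch started for " << url << "\n";
    return true;
}

void issue_speculative_fetch(std::string const& url)
{
    if (!fetch(url))
        return; // something already preloaded this resource
    preload_map.emplace(url, PreloadEntry {}); // registered after fetch() ran
}

int main()
{
    issue_speculative_fetch("https://example.com/app.js"); // hits the network once
    fetch("https://example.com/app.js"); // the regular parser's fetch deduplicates
}
```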
IFrame geometry changes and object representation changes directly
selected style invalidation reasons from their HTML element classes.
Move those mappings into a new
CSS::Invalidation::EmbeddedContentInvalidator.
The HTML elements continue to own their loading, representation, and
layout-tree side effects. CSS invalidation now owns the style dirtiness
associated with those embedded-content changes.
CustomStateSet directly selected the style invalidation reason used when
its JS-visible set is modified. Move that mapping into
CSS::Invalidation::CustomElementInvalidator.
This keeps the custom-state container focused on its set contents while
CSS invalidation owns the style work required by :state() selectors.
HTMLInputElement still mapped its picker open-state change directly to a
style invalidation reason. Move that mapping into
CSS::Invalidation::ElementStateInvalidator alongside the matching select
open-state helper.
This keeps another element-state invalidation decision out of the HTML
implementation without changing the invalidation behavior.
Several HTML element state changes directly selected style invalidation
reasons from their element implementations. Move those mappings into a
new CSS::Invalidation::ElementStateInvalidator helper.
This keeps details, dialog, option, and select code focused on their own
state changes while CSS invalidation owns the style work those changes
require. The existing invalidation breadth is preserved.
HTMLInputElement had two call sites spelling out the same checked and
unchecked pseudo-class invalidation set. Move that selector policy into
FormControlInvalidator.
This keeps the input element responsible for detecting state changes,
while CSS::Invalidation owns the affected selector features.
HTML and SVG link elements both encoded the same pseudo-class list for
hyperlink state changes. Move that CSS policy into LinkInvalidator and
have both call sites delegate to it.
This keeps element-specific code focused on detecting hyperlink state
changes, while the helper owns the affected selector features.
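
The delegation shape all of these refactors share, as a hypothetical
sketch (the real pseudo-class sets and invalidation plumbing live in
the engine; nothing here is the actual API):

```cpp
// The element detects the state change; the CSS::Invalidation helper
// owns which selector features are affected.
#include <initializer_list>
#include <iostream>

enum class PseudoClass { Link, Visited, AnyLink, LocalLink };

struct Element {
    void invalidate_style_for(std::initializer_list<PseudoClass> pseudo_classes)
    {
        // The real engine marks rules using these pseudo-classes dirty.
        std::cout << "invalidating " << pseudo_classes.size() << " pseudo-classes\n";
    }
};

namespace CSS::Invalidation {
// Single owner of the hyperlink pseudo-class policy, shared by the
// HTML and SVG link elements.
inline void link_state_changed(Element& element)
{
    element.invalidate_style_for({ PseudoClass::Link, PseudoClass::Visited,
        PseudoClass::AnyLink, PseudoClass::LocalLink });
}
}

int main()
{
    Element html_anchor, svg_anchor;
    CSS::Invalidation::link_state_changed(html_anchor); // HTMLAnchorElement path
    CSS::Invalidation::link_state_changed(svg_anchor);  // SVGAElement path
}
```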
Introduce IncrementalDocumentParser, which streams the response body
through a TextCodec::StreamingDecoder into the HTMLTokenizer one chunk
at a time. The tokenizer pauses when it runs out of input and resumes
once the next chunk is appended; when the body closes we close the
tokenizer's input stream so it can finish the parse.
DocumentLoading routes HTML responses through the new parser instead of
buffering the full body before handing it to HTMLParser.
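
A rough sketch of the chunk-driving loop, with stand-in types (the real
parser decodes through TextCodec::StreamingDecoder and tokenizes the
input instead of printing it):

```cpp
#include <iostream>
#include <string>

struct Tokenizer {
    std::string input;
    size_t position = 0;
    bool closed = false;

    void append_to_input_stream(std::string chunk) { input += std::move(chunk); }
    void close_input_stream() { closed = true; }

    // Runs until the buffer is exhausted; with an open stream that
    // means "pause", with a closed stream it means "done".
    void run()
    {
        while (position < input.size())
            std::cout << input[position++];
        std::cout << (closed ? "[done]\n" : "[paused, waiting for input]\n");
    }
};

int main()
{
    Tokenizer tokenizer;
    for (auto chunk : { "<p>hel", "lo</p>" }) { // chunks as the body arrives
        tokenizer.append_to_input_stream(chunk);
        tokenizer.run();
    }
    tokenizer.close_input_stream(); // body finished
    tokenizer.run();                // tokenizer can now finish the parse
}
```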
Pull the post-parse-action setup, run loop, and post-parse invocation
out of HTMLParser::run(URL, ...) into a new run_until_completion()
method. The URL overload still calls it; behavior is unchanged. The
incremental parser will use this entry point directly without going
through the URL-setting overload.
Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor
and create_for_scripting(). Only document.open()'s parser sets it to
Yes. Document::close() step 3 now checks is_script_created() so it
correctly skips parsers that weren't created via document.open(),
matching the spec.
Previously the check was just `if (!m_parser)`, which incorrectly let
document.close() insert an EOF into a network-driven parser. The bug
was mostly latent because the network parser used to finish quickly,
but it matters once the network parser stays alive for the duration of
a streamed parse.
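
The corrected guard, sketched with simplified types (the real check is
Document::close() step 3 consulting is_script_created()):

```cpp
#include <iostream>

struct Parser {
    bool script_created = false; // only document.open()'s parser sets this
    void insert_eof() { std::cout << "EOF inserted\n"; }
};

struct Document {
    Parser* parser = nullptr;

    void close()
    {
        // Step 3: return unless there is a script-created parser.
        // Previously this was just `if (!parser)`, which wrongly
        // matched a live network-driven parser too.
        if (!parser || !parser->script_created)
            return;
        parser->insert_eof();
    }
};

int main()
{
    Parser network_parser; // created for a network load
    Document document { &network_parser };
    document.close(); // now a no-op, per spec

    Parser script_parser { true }; // created via document.open()
    document.parser = &script_parser;
    document.close(); // inserts the EOF
}
```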
Add can_run_out_of_characters() and use it in the
NamedCharacterReference state and consume_next_if_match() so that an
open input stream gets the same code-point-at-a-time treatment as an
active document.write insertion point. Without this, a network chunk
that ends partway through a named character reference or a
multi-character match would make the tokenizer commit to a "no match"
decision before the remaining bytes arrive.
No behavior change for existing callers: the new helper still returns
false once the input stream is closed, and the StringView constructor
closes the stream immediately.
Add an explicit "input stream closed" flag plus the streaming-input API
(append_to_input_stream, close_input_stream, is_input_stream_closed) to
let a future incremental driver feed bytes as they arrive. Rewrite
should_pause_before_next_input_character so the tokenizer pauses when
the buffer is exhausted but more bytes may still arrive, including the
case where a chunk ends in CR (CRLF normalization needs one code point
of lookahead).
Existing call sites are unaffected: the StringView constructor
immediately marks the input stream closed, and insert_eof() now also
closes the stream so document.close() drives the same exit path.
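
The pause predicate, as a simplified sketch (byte positions stand in
for code points):

```cpp
#include <cassert>
#include <string>

struct InputStream {
    std::string buffer;
    size_t position = 0;
    bool closed = false;

    bool should_pause_before_next_input_character() const
    {
        if (closed)
            return false; // everything that will ever arrive is here
        if (position >= buffer.size())
            return true; // out of data, but more may still arrive
        // A trailing CR might be the first half of a CRLF pair, so
        // pause one character early until the lookahead exists.
        return position == buffer.size() - 1 && buffer[position] == '\r';
    }
};

int main()
{
    InputStream stream;
    stream.buffer = "line\r"; // a chunk that ends mid-possible-CRLF
    assert(!stream.should_pause_before_next_input_character()); // 'l' is safe
    stream.position = 4;
    assert(stream.should_pause_before_next_input_character()); // CR needs lookahead
    stream.closed = true;
    assert(!stream.should_pause_before_next_input_character()); // CR stands alone
}
```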
HTML newline normalization collapses CRLF into a single LF, so
next_code_point() needs one code point of lookahead at a CR to decide
whether the CR stands alone or is the first half of a CRLF pair. When
the tokenizer is paused at the insertion point and the next code point
to consume is a CR sitting one position before it, that lookahead has
not been written yet.
Previously the tokenizer consumed the CR and emitted it as LF, so a
subsequent document.write() that began with LF surfaced as a second
LF instead of being absorbed into the original CRLF pair.
Stop one code point earlier in this case and wait for the next write
to arrive. This makes four html5lib write_single WPT tests pass.
Fixes flakiness in worker tests that create a Worker or SharedWorker
with a missing script URL and only attach an error handler to it.
Once the test callback returns, nothing keeps the worker rooted from
JavaScript. If GC ran before the WebWorker process reported the
script fetch failure, the Worker/WorkerAgentParent cycle could be
collected and the error event never delivered, leaving the test hung
until timeout.
Hold startup-pending WorkerAgentParents from the outside
EnvironmentSettingsObject and release that edge once the script load
succeeds, fails, or the worker closes. The worker now survives long
enough to deliver its first script-load result.
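
The rooting pattern, sketched with shared_ptr standing in for GC edges
(names hypothetical):

```cpp
#include <iostream>
#include <memory>
#include <set>

struct WorkerAgentParent {
    ~WorkerAgentParent() { std::cout << "worker collected\n"; }
    void deliver_error() { std::cout << "error event delivered\n"; }
};

struct SettingsObject {
    // Strong edges held until the first script-load result arrives.
    std::set<std::shared_ptr<WorkerAgentParent>> startup_pending_workers;
};

int main()
{
    SettingsObject outside_settings;
    std::weak_ptr<WorkerAgentParent> script_side; // what JS still references

    {
        auto worker = std::make_shared<WorkerAgentParent>();
        script_side = worker;
        outside_settings.startup_pending_workers.insert(worker);
    } // test callback returns; JS no longer roots the worker

    // "GC" runs here: the worker survives thanks to the pending edge.
    if (auto worker = script_side.lock()) {
        worker->deliver_error();                          // fetch failure reported
        outside_settings.startup_pending_workers.clear(); // release the edge
    }
}
```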
The HTML parser's script end tag algorithms save the current insertion
point in an "old insertion point" local before executing a script, then
restore that local after script execution. Ladybird modeled that local
as a single tokenizer field, so nested script execution via
document.write() could overwrite the outer script's saved value.
Keep a stack of old insertion points instead, and adjust saved offsets
when document.write() inserts new input before them. This keeps the
normal script and SVG script paths aligned with the spec text while
leaving the parser-blocking script resume path to set the insertion
point to undefined again.
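
A simplified model of the saved-offset stack (integer offsets stand in
for real insertion points):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Parser {
    std::string input;
    size_t insertion_point = 0;
    std::vector<size_t> old_insertion_points; // one saved value per nested script

    void execute_script_begin() { old_insertion_points.push_back(insertion_point); }
    void execute_script_end()
    {
        insertion_point = old_insertion_points.back();
        old_insertion_points.pop_back();
    }
    void write(std::string const& text) // document.write()
    {
        input.insert(insertion_point, text);
        // Shift every saved offset that the new input landed in front of.
        for (auto& saved : old_insertion_points)
            if (saved >= insertion_point)
                saved += text.size();
        insertion_point += text.size();
    }
};

int main()
{
    Parser parser;
    parser.input = "<script>a</script>rest";
    parser.insertion_point = 18;   // just after the outer </script>
    parser.execute_script_begin(); // outer script saves 18
    parser.insertion_point = 9;    // nested write position (hypothetical)
    parser.execute_script_begin(); // nested script saves 9
    parser.write("x");             // outer saved offset shifts to 19
    parser.execute_script_end();
    parser.execute_script_end();
    assert(parser.insertion_point == 19); // outer value survived the nesting
}
```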
When a click handler calls history.replaceState and the link's
cross-document activation behavior runs in the same task, the queued
sync step runs apply-the-history-step on the navigable mid-navigation,
transitioning its ongoing navigation through "traversal" and back to
null. That aborts the link's navigate event and bails out its deferred
work, leaving the link nav abandoned.
Skip this transient transition when the navigable already has a fresh
ongoing navigation. No major engine reproduces the race; long term, sync
same-document nav should bypass the traversal queue entirely (matching
Chromium).
This solves the long-standing issue where clicking on a box of tea
on https://twinings.co.uk/ would freeze the browser. :^)
The spec says to run inline if on the navigable's active window's
relevant agent's event loop, otherwise queue. WebContent is always on
the main thread event loop, so this collapses to "always inline".
Queueing here let the abort cancel a navigate event created later in
the same task, instead of the one it was queued for.
Replace the spin_until in SVGScriptElement::process_the_script_element
with an async fetch that mirrors HTMLScriptElement's mark_as_ready
pattern. External SVG scripts now fetch and execute asynchronously,
matching Chromium's behavior.
For HTML-embedded SVG scripts, the parser pauses via the existing
schedule_resume_check infrastructure, extended to support SVG scripts
through a new pending_parsing_blocking_svg_script slot on Document.
For top-level XML/SVG documents, scripts execute when their fetch
completes; the load event is delayed via DocumentLoadEventDelayer which
the existing XMLDocumentBuilder::document_end already waits on.
Split Rust program compilation so code generation and assembly finish
before the main thread materializes GC-backed executable objects. The
new CompiledProgram handle owns the parsed program, generator state, and
bytecode until C++ consumes it on the main thread.
Wire WebContent script fetching through that handle for classic scripts
and modules. Syntax-error paths still return ParsedProgram, so existing
error reporting stays in place. Successful fetches now do top-level
codegen on the thread pool before deferred_invoke hands control back to
the main thread.
Executable creation, SharedFunctionInstanceData materialization, module
metadata extraction, and declaration data extraction still run on the
main thread where VM and GC access is valid.
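
The handoff pattern, sketched in C++ with stand-in types (the actual
pipeline keeps parse and codegen on the thread pool and GC-object
creation on the main thread):

```cpp
#include <future>
#include <iostream>
#include <string>

struct CompiledProgram { std::string bytecode; }; // no GC pointers inside
struct Executable { std::string bytecode; };      // GC-backed in the real engine

CompiledProgram compile_on_thread_pool(std::string source)
{
    return { "bytecode(" + source + ")" }; // parse + codegen, no VM access
}

Executable materialize_on_main_thread(CompiledProgram program)
{
    return { std::move(program.bytecode) }; // VM/GC access is valid here
}

int main()
{
    auto pending = std::async(std::launch::async, compile_on_thread_pool,
        std::string("function f() {}"));
    // ... the main thread keeps pumping its event loop meanwhile ...
    auto executable = materialize_on_main_thread(pending.get());
    std::cout << executable.bytecode << "\n";
}
```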
When the HTML parser blocks on a synchronous external script, run a
separate tokenizer over the unparsed input and issue speculative fetches
for the resources it finds (script src, link rel=stylesheet|preload, img
src), with <base href> tracking and template/foreign-content skipping.
Also fills in the previously-stubbed "consume a preloaded resource"
algorithm and the document's "map of preloaded resources", so that
<link rel="preload"> followed by a matching consumer deduplicates to
a single fetch.
Generate exact JS buffer types for exact IDL buffer arguments instead
of widening them to BufferSource or ArrayBufferView.
This fixes cases like TextEncoder.encodeInto(), whose IDL requires
a Uint8Array destination. Previously the generated binding accepted
any BufferSource, so DataView, other typed arrays, and
ArrayBuffer-backed values were let through. With exact conversion,
those are rejected at the binding layer as expected.
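
What the exact conversion amounts to, as a simplified sketch (tags
stand in for real JS values):

```cpp
#include <iostream>
#include <stdexcept>

enum class ViewKind { Uint8Array, Int32Array, DataView, ArrayBuffer };

struct JSValue { ViewKind kind; };

// Generated conversion for an IDL `Uint8Array` argument: exact, not
// widened to BufferSource / ArrayBufferView.
void convert_to_uint8_array(JSValue value)
{
    if (value.kind != ViewKind::Uint8Array)
        throw std::invalid_argument("TypeError: expected a Uint8Array");
}

int main()
{
    convert_to_uint8_array({ ViewKind::Uint8Array });   // accepted
    try {
        convert_to_uint8_array({ ViewKind::DataView }); // now rejected
    } catch (std::exception const& error) {
        std::cout << error.what() << "\n";
    }
}
```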
The img inside a <picture> has to re-run "update the image data" when
nearby <source> elements change, so script-driven swaps of srcset (and
the other dimension/media attributes) actually take effect.
Per the HTML spec, the relevant mutations for an img element include:
"The element's parent is a picture element and a source element that
is a previous sibling has its srcset, sizes, media, type, width or
height attributes set, changed, or removed."
The same applies to source insertion, moving, and removal.
Fixes image loading on https://www.apple.com/mac/
Spinning a nested event loop to wait for a parser-blocking script blocks
the calling thread, can deadlock, and creates reentrancy hazards. Switch
to an event-driven pause/resume model, mirroring the prior
HTMLParserEndState refactor (df96b69e7a).
Three WPT document.write tests flip from Fail to Pass and are
rebaselined: all write an external script via document.write() followed
by inline content. With spin_until, control did not return to the caller
of document.write() between writing the script and observing its
effects, so the test's order assertions saw a different sequence than
the spec mandates.
The whitespace-normalization loop in prepare_text() called
StringBuilder::append() on each code point, which resolves to the
`char` overload and truncates non-ASCII characters. measureText("ó")
therefore returned a width of 0, despite fillText painting the glyph.
Use append_code_point() instead, and add a regression test for both
precomposed and decomposed accented text.
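
A standalone illustration of the bug class (the encoder below is a
minimal stand-in for StringBuilder::append_code_point(), two-byte case
only):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

void append_code_point(std::string& builder, uint32_t code_point)
{
    if (code_point < 0x80) {
        builder += static_cast<char>(code_point);
    } else if (code_point < 0x800) {
        builder += static_cast<char>(0xC0 | (code_point >> 6));
        builder += static_cast<char>(0x80 | (code_point & 0x3F));
    } // (three- and four-byte cases elided)
}

int main()
{
    uint32_t o_acute = 0xF3; // U+00F3 'ó'

    std::string truncated;
    truncated += static_cast<char>(o_acute); // the `char` overload: one mangled byte

    std::string correct;
    append_code_point(correct, o_acute); // two valid UTF-8 bytes

    std::cout << truncated.size() << " vs " << correct.size() << "\n"; // 1 vs 2
}
```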
The list previously omitted AVIF even though we ship a working
AVIFImageDecoderPlugin, which meant <picture><source type="image/avif">
candidates and image-set(... type("image/avif")) candidates were
unconditionally skipped.
Extract the file-local is_supported_image_type() helper from
HTMLImageElement into a small standalone translation unit so other
parts of the engine can ask the same question. The next commit reuses
it for the image-set() type() filter.
The list is still hard-coded; deriving it from the registered image
decoders remains a FIXME.
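
Roughly what the standalone helper looks like (the MIME list here is
illustrative, not the engine's exact list):

```cpp
#include <array>
#include <cassert>
#include <string_view>

bool is_supported_image_type(std::string_view mime_type)
{
    // Still hard-coded; deriving this from the registered decoders
    // remains the FIXME mentioned above.
    constexpr std::array supported = {
        std::string_view("image/png"), std::string_view("image/jpeg"),
        std::string_view("image/gif"), std::string_view("image/webp"),
        std::string_view("image/avif"), // previously missing
    };
    for (auto candidate : supported)
        if (candidate == mime_type)
            return true;
    return false;
}

int main()
{
    assert(is_supported_image_type("image/avif"));
    assert(!is_supported_image_type("image/tiff"));
}
```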
HTMLImageElement's update-the-image-data step 16 queues its state
transition and load event dispatch via a 1 ms BatchingDispatcher, so
the current request does not become CompletelyAvailable synchronously
when the fetch finishes. decode()'s on_finish callback, however, was
queuing its resolve task directly on the event loop, bypassing the
batch. That race meant decode() could resolve while the image request
was still in Unavailable state, so any .then() handler inspecting
img.width / img.height (or anything derived from the bitmap) would see
zeros.
Google Maps hits this on its .9.png road shield icons: after awaiting
img.decode() it reads a.width / a.height and calls
ctx.getImageData(0, 0, 0, 0), which throws IndexSizeError and aborts
the tile rendering pipeline.
Route decode()'s on_finish through the same BatchingDispatcher so both
are processed in the same batch, with the decode resolution queued
after step 16's element task.
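
The batching invariant, as a minimal sketch (hypothetical dispatcher;
the real one flushes on a 1 ms timer):

```cpp
#include <functional>
#include <iostream>
#include <vector>

struct BatchingDispatcher {
    std::vector<std::function<void()>> queue;
    void enqueue(std::function<void()> task) { queue.push_back(std::move(task)); }
    void flush()
    {
        for (auto& task : queue)
            task();
        queue.clear();
    }
};

int main()
{
    BatchingDispatcher batch;
    bool completely_available = false;

    // Fetch finished: step 16's state transition goes through the batch...
    batch.enqueue([&] { completely_available = true; });
    // ...and decode()'s resolution is queued behind it, not directly on
    // the event loop, so it can no longer observe the stale state.
    batch.enqueue([&] {
        std::cout << (completely_available ? "decode sees final state\n"
                                           : "decode raced the state change\n");
    });

    batch.flush(); // prints "decode sees final state"
}
```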
notify_about_rejected_promises() is called for every related environment
settings object at the end of every microtask checkpoint. It was
unconditionally reading the realm up front, which showed up at 3.0% self
time in a YouTube playback profile.
This patch moves the realm lookup into the queued task callback, which
runs far less often.
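
The shape of the change, sketched with stand-in types (names
hypothetical):

```cpp
#include <functional>
#include <iostream>
#include <vector>

struct Realm { };
struct SettingsObject {
    std::vector<int> rejected_promises;
    Realm& realm() { std::cout << "realm lookup\n"; static Realm r; return r; }
};

std::vector<std::function<void()>> task_queue;

// Called at the end of every microtask checkpoint.
void notify_about_rejected_promises(SettingsObject& settings)
{
    if (settings.rejected_promises.empty())
        return; // hot path: no realm lookup at all anymore
    task_queue.push_back([&settings] {
        (void)settings.realm(); // lookup deferred into the queued task
        settings.rejected_promises.clear();
    });
}

int main()
{
    SettingsObject settings;
    for (int i = 0; i < 1000; ++i) // checkpoints with nothing to report
        notify_about_rejected_promises(settings);
    settings.rejected_promises.push_back(1); // one actual rejection
    notify_about_rejected_promises(settings);
    for (auto& task : task_queue) // only now does the realm get read
        task();
}
```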
These handlers crashed on several kinds of JS-dispatched input:
- zero-width range (divide by zero in the slider mouse handler),
- step="any" (MUST(step_up) throws InvalidStateError),
- plain Event without clientX/deltaY/key (unchecked as_foo() asserts on
  undefined),
- min > max (trips clamp()'s VERIFY), and
- input.type changes leaving the range listeners attached, which then
  dereference empty Optionals from the range-only min()/max() accessors.
Gate each handler on its expected type_state() and on
allowed_value_step() having a value, validate event property types
before converting, and bail out on zero-width rects or min > max.
Six crash tests cover the new paths.
Hit on a Cloudflare challenge page.
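
The gating pattern for one of the handlers, as a hypothetical sketch:

```cpp
#include <iostream>
#include <optional>

struct MouseEvent { double client_x = 0; };

struct RangeInput {
    enum class TypeState { Range, Text } type_state = TypeState::Range;
    std::optional<double> allowed_value_step = 1.0;
    double min = 0, max = 100;
    double rect_left = 0, rect_width = 200;

    void handle_mouse(MouseEvent const* event) // null models a plain Event
    {
        if (type_state != TypeState::Range)
            return; // input.type changed with the listener still attached
        if (!allowed_value_step.has_value())
            return; // step="any"
        if (!event)
            return; // no clientX to read
        if (rect_width <= 0 || min > max)
            return; // would divide by zero / trip clamp()
        double fraction = (event->client_x - rect_left) / rect_width;
        std::cout << "value: " << min + fraction * (max - min) << "\n";
    }
};

int main()
{
    RangeInput input;
    MouseEvent event { 50 };
    input.handle_mouse(&event);  // value: 25
    input.rect_width = 0;
    input.handle_mouse(&event);  // safely ignored
    input.handle_mouse(nullptr); // safely ignored
}
```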
HTMLLinkElement::removed_from() used `old_root` to find the
StyleSheetList to remove the link's stylesheet from. That's wrong
when the link element lives inside a shadow tree that is itself
nested within a larger removed subtree: Node::remove() hands every
shadow-including descendant the outer subtree's root as `old_root`,
not the descendant's own containing root. So we'd look in the
document's list while the sheet was actually in the shadow root's
list, failing the did_remove VERIFY in StyleSheetList::remove_sheet.
Fix by using the sheet's own owning-root tracking. A link-owned sheet
always has exactly one owning document or shadow root (only constructed
stylesheets can be adopted, and link sheets are never constructed), so
we can just read that entry.
Also make owning_documents_or_shadow_roots() return by const reference
instead of copying the HashTable on every call, which benefits existing
iterating callers too.
Fixes a crash on https://nytimes.com/.
Import WebIDL/Function.idl where TimerHandler uses Function, and let the
bindings generator handle it through the normal callback-function path.
This removes the special C++ mapping for Function and makes TimerHandler
use GC::Root<CallbackType>, matching the generated binding type when IDL
files are parsed together.
Previously, the select button's text was only refreshed inside the
two non-trivial branches of the selectedness setting algorithm.
Paths that left the select with exactly one selected option hit a
no-op branch and skipped the refresh.
Fix this by implementing the "clone selected option into select
button" algorithm and invoking it whenever the set of selected options
may have changed.
Bypass the async body-reading pipeline for about:srcdoc iframes whose
body bytes are already in memory. Set up a deferred parser at document
load time and run the post-activation update synchronously, so the body
element exists before parent script can observe the new document via
contentDocument. This matches Chrome and Firefox behavior for srcdoc
iframes and fixes the flaky test
`set-innerHTML-inside-iframe-srcdoc-document.html` that relied on body
being non-null.
Co-authored-by: Tim Ledbetter <tim.ledbetter@ladybird.org>
Start making IDL::Context represent the shared IDL world used during
code generation.
Move globally visible parsed IDL such as dictionaries, enums,
typedefs, callbacks, mixins, and partial declarations out of individual
Interface objects and into Context.
The main goal of this change is a step towards invoking the IDL
generator on every IDL file at once, rather than per interface.
In the meantime, as standalone improvements, this lets code generation
resolve imported IDL types through the shared Context without copying
imported declarations onto each Interface. It also makes duplicate
local declarations unnecessary for imported shared types, since an
interface can reference an enum or dictionary owned by another
parsed IDL module without re-emitting it itself.
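
A rough sketch of the ownership split (shapes hypothetical):

```cpp
#include <cassert>
#include <map>
#include <string>

struct Dictionary { std::string name; };
struct Enumeration { std::string name; };

// The shared IDL world: declarations any interface may reference.
struct Context {
    std::map<std::string, Dictionary> dictionaries;
    std::map<std::string, Enumeration> enumerations;

    Enumeration const* resolve_enum(std::string const& name) const
    {
        auto it = enumerations.find(name);
        return it == enumerations.end() ? nullptr : &it->second;
    }
};

// Each interface keeps only what it defines and resolves the rest
// through the context instead of carrying copies.
struct Interface {
    std::string name;
    Context const* context = nullptr;
};

int main()
{
    Context context;
    context.enumerations.emplace("RequestMode", Enumeration { "RequestMode" });

    Interface request { "Request", &context };
    Interface window { "Window", &context };
    // Both reference the same owned declaration; neither re-emits it.
    assert(request.context->resolve_enum("RequestMode")
        == window.context->resolve_enum("RequestMode"));
}
```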
This tightens the implementation of video element sizing to match the
spec by implementing two spec concepts:
- The media resource's natural width and height, and
- The video element's natural width and height.
The element's natural dimensions change based on the representation,
which has many inputs, so update checks are triggered from many
locations.
The resize event is fired when the media resource's natural dimensions
change, and the layout is invalidated if the element's natural
dimensions change.
Tests for a few important resize triggers have been added.
This state will indicate to the media element that it's not guaranteed
to have a frame yet, for the purposes of determining the ready state.
JavaScript should be able to rely on video elements with a ready state
of HAVE_CURRENT_DATA or greater already representing the current video
frame.
To allow the state to be exited when audio is disabled, audio tracks
are now added to the buffering set on enable only if the audio sink
exists; without the sink starting the data provider, the track would
never be removed from the set.
This is a step towards making video ref tests.
This allows us to differentiate between having no data available yet,
having current data, and having future data. The main purpose of this
is to allow a new starting state to explicitly force HAVE_METADATA
instead of >= HAVE_CURRENT_DATA.
Note that the SeekingStateHandler returns Current instead of None. This
is deliberate, since the buffered ranges from the demuxer(s) can be
used to inform whether the possibly-current data is actually available
at the seek target.
A while ago, we removed the relayout upon rendering a new frame. In
doing so, it became possible for the layout to remain stale after the
video metadata had loaded, leaving the video drawn in a 0x0 box.
Otherwise, the PlaybackManager may get stuck waiting for enough data to
read the metadata and call on_metadata_parsed.
This is unfortunately difficult to test without direct control over the
fetching process, but it could cause flakes in tests that wait for
loadeddata.
By implementing this method ourselves, we no longer go through
::supported_property_names(), skipping both the vector allocation and
the sorting, neither of which is needed to determine whether a property
name is present.
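
A minimal sketch of the direct check (simplified collection; the real
method answers from the element's own data):

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

struct NamedItemCollection {
    std::unordered_set<std::string> names;

    // One hash lookup: no vector allocation, no sorting.
    bool is_supported_property_name(std::string const& name) const
    {
        return names.count(name) > 0;
    }
};

int main()
{
    NamedItemCollection collection { { "header", "footer" } };
    assert(collection.is_supported_property_name("header"));
    assert(!collection.is_supported_property_name("main"));
}
```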