When the regular HTML parser is blocked on an external script, the
speculative parser scans ahead and pre-fetches discoverable
sub-resources. Previously those fetches were tracked only in the
parser's own URL list and never registered in the document's preload
map, so when the regular parser later reached each element, fetch()'s
consume_a_preloaded_resource() lookup found nothing and issued a
duplicate request: every parser-blocked sub-resource was fetched twice.
issue_speculative_fetch now creates a PreloadEntry, registers it
under create_a_preload_key(request) in the document's preload map,
and supplies a processResponseConsumeBody callback that populates
the entry. The map insertion happens after fetch() starts because
fetch() runs consume_a_preloaded_resource() synchronously, so
registering the entry beforehand would short-circuit the
speculative fetch itself.
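Roughly, the ordering constraint looks like this; the sketch uses plain
std:: types and hypothetical helper names, not the actual LibWeb
declarations:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Minimal model of the ordering constraint; names are illustrative only.
struct PreloadEntry {
    std::string response_body;                   // populated by processResponseConsumeBody
    std::function<void()> on_response_available; // set by a consumer that arrives first
};

using PreloadMap = std::map<std::string, std::shared_ptr<PreloadEntry>>;

void issue_speculative_fetch(PreloadMap& preload_map, std::string const& preload_key,
    std::function<void(std::function<void(std::string)>)> const& start_fetch)
{
    auto entry = std::make_shared<PreloadEntry>();

    // Start the fetch first: fetch() runs "consume a preloaded resource"
    // synchronously, so registering the entry before this call would make the
    // speculative fetch consume itself instead of hitting the network.
    start_fetch([entry](std::string body) {
        // processResponseConsumeBody: populate the entry for the later consumer.
        entry->response_body = std::move(body);
        if (entry->on_response_available)
            entry->on_response_available();
    });

    // Only now does the entry become discoverable under its preload key.
    preload_map[preload_key] = entry;
}
```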
The body-handling steps (1, 2, 5 of the preload algorithm's
processResponseConsumeBody) are factored into a shared
deliver_preload_response helper used by both the speculative parser
and HTMLLinkElement::preload.
Introduce IncrementalDocumentParser, which streams the response body
through a TextCodec::StreamingDecoder into the HTMLTokenizer one chunk
at a time. The tokenizer pauses when it runs out of input and resumes
once the next chunk is appended; when the body closes we close the
tokenizer's input stream so it can finish the parse.
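A minimal sketch of that flow, with stand-in types (Decoder and Tokenizer
here are not the real TextCodec::StreamingDecoder or HTMLTokenizer APIs):

```cpp
#include <string>
#include <string_view>

// Structural sketch only; these stand-ins are not the real LibWeb classes.
struct Decoder {
    // A real streaming decoder buffers byte sequences split across chunks.
    std::string decode_chunk(std::string_view bytes) { return std::string(bytes); }
};

struct Tokenizer {
    std::string input;
    bool input_closed { false };
    void append_to_input_stream(std::string_view text) { input += text; }
    void close_input_stream() { input_closed = true; }
    void run() { /* tokenize until out of input, then pause (or finish if closed) */ }
};

struct IncrementalParser {
    Decoder decoder;
    Tokenizer tokenizer;

    // Called for each network chunk as the response body arrives.
    void on_body_chunk(std::string_view bytes)
    {
        tokenizer.append_to_input_stream(decoder.decode_chunk(bytes));
        tokenizer.run(); // resumes from wherever the previous chunk paused
    }

    // Called when the body is complete.
    void on_body_complete()
    {
        tokenizer.close_input_stream(); // lets the tokenizer reach EOF and finish the parse
        tokenizer.run();
    }
};
```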
DocumentLoading routes HTML responses through the new parser instead of
buffering the full body before handing it to HTMLParser.
Pull the post-parse-action setup, run loop, and post-parse invocation
out of HTMLParser::run(URL, ...) into a new run_until_completion()
method. The URL overload still calls it; behavior is unchanged. The
incremental parser will use this entry point directly without going
through the URL-setting overload.
Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor
and create_for_scripting(). Only document.open()'s parser sets it to
Yes. Document::close() step 3 now checks is_script_created() so it
correctly skips parsers that weren't created via document.open(),
matching the spec.
Previously the check was just `if (!m_parser)`, which incorrectly let
document.close() insert an EOF into a network-driven parser. The bug
was mostly latent because the network parser used to finish quickly,
but it matters once the network parser stays alive for the duration of
a streamed parse.
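A simplified model of the flag and the close() check, with placeholder
types standing in for the real Document and HTMLParser:

```cpp
#include <memory>

// Illustrative only; the real plumbing goes through HTMLParser's constructor
// and create_for_scripting().
enum class ScriptCreatedParser { No, Yes };

struct Parser {
    ScriptCreatedParser script_created { ScriptCreatedParser::No };
    bool is_script_created() const { return script_created == ScriptCreatedParser::Yes; }
    void insert_eof_and_finish() { /* insert explicit EOF and finish the parse */ }
};

struct Document {
    std::unique_ptr<Parser> parser;

    void open()
    {
        parser = std::make_unique<Parser>();
        parser->script_created = ScriptCreatedParser::Yes; // only document.open()'s parser
    }

    void close()
    {
        // Spec step 3: return if there is no script-created parser. The old
        // `if (!m_parser)` check let close() inject an EOF into a
        // network-driven parser as well.
        if (!parser || !parser->is_script_created())
            return;
        parser->insert_eof_and_finish();
    }
};
```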
Add can_run_out_of_characters() and use it in the
NamedCharacterReference state and consume_next_if_match() so that an
open input stream gets the same code-point-at-a-time treatment as an
active document.write insertion point. Without this, a network chunk
that ends partway through a named character reference or a
multi-character match would make the tokenizer commit to a "no match"
decision before the remaining bytes arrive.
No behavior change for existing callers: the new helper still returns
false once the input stream is closed, and the StringView constructor
closes the stream immediately.
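The lookahead hazard can be modeled roughly like this; the types and the
tri-state return value are illustrative, not the actual tokenizer
interface:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Simplified model; not the real HTMLTokenizer code.
struct Tokenizer {
    std::u32string input;
    std::size_t position { 0 };
    bool input_stream_closed { false };
    bool at_insertion_point { false }; // an active document.write() insertion point

    // More code points may still appear after the current buffer, either from
    // a pending document.write() or from a network chunk that has not arrived.
    bool can_run_out_of_characters() const
    {
        return at_insertion_point || !input_stream_closed;
    }

    // nullopt means "cannot decide yet"; false means "definitely no match".
    std::optional<bool> consume_next_if_match(std::u32string_view word)
    {
        for (std::size_t i = 0; i < word.size(); ++i) {
            if (position + i >= input.size()) {
                // Without this check the tokenizer would commit to "no match"
                // even though the rest of the word may arrive in the next chunk.
                if (can_run_out_of_characters())
                    return std::nullopt; // pause and retry once more input arrives
                return false;
            }
            if (input[position + i] != word[i])
                return false;
        }
        position += word.size();
        return true;
    }
};
```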
Add an explicit "input stream closed" flag plus the streaming-input API
(append_to_input_stream, close_input_stream, is_input_stream_closed) to
let a future incremental driver feed bytes as they arrive. Rewrite
should_pause_before_next_input_character so the tokenizer pauses when
the buffer is exhausted but more bytes may still arrive, including the
case where a chunk ends in CR (CRLF normalization needs one code point
of lookahead).
Existing call sites are unaffected: the StringView constructor
immediately marks the input stream closed, and insert_eof() now also
closes the stream so document.close() drives the same exit path.
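A sketch of what the rewritten pause predicate has to account for, using
simplified stand-ins for the real tokenizer state:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Simplified sketch; field and method names mirror the description above but
// this is not the real HTMLTokenizer.
struct Tokenizer {
    std::u32string input;
    std::size_t position { 0 };
    bool input_stream_closed { false };

    void append_to_input_stream(std::u32string_view text) { input += text; }
    void close_input_stream() { input_stream_closed = true; }
    bool is_input_stream_closed() const { return input_stream_closed; }

    bool should_pause_before_next_input_character() const
    {
        if (input_stream_closed)
            return false; // everything is buffered; safe to consume through EOF

        auto remaining = input.size() - position;
        if (remaining == 0)
            return true; // buffer exhausted, but more bytes may still arrive

        // CRLF normalization needs one code point of lookahead: if the chunk
        // ends in CR we cannot yet tell whether an LF follows in the next chunk.
        if (remaining == 1 && input[position] == U'\r')
            return true;

        return false;
    }
};
```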
HTML newline normalization collapses CRLF into a single LF, so
next_code_point() needs one code point of lookahead at a CR to decide
whether the CR stands alone or is the first half of a CRLF pair. When
the tokenizer is paused at the insertion point and the next code point
to consume is a CR sitting one position before it, that lookahead has
not been written yet.
Previously the tokenizer consumed the CR and emitted it as LF, so a
subsequent document.write() that began with LF surfaced as a second
LF instead of being absorbed into the original CRLF pair.
Stop one code point earlier in this case and wait for the next write
to arrive. This makes four html5lib write_single WPT tests pass.
The HTML parser's script end tag algorithms save the current insertion
point in an "old insertion point" local before executing a script, then
restore that local after script execution. Ladybird modeled that local
as a single tokenizer field, so nested script execution via
document.write() could overwrite the outer script's saved value.
Keep a stack of old insertion points instead, and adjust saved offsets
when document.write() inserts new input before them. This keeps the
normal script and SVG script paths aligned with the spec text while
leaving the parser-blocking script resume path to set the insertion
point to undefined again.
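A simplified model of the stack bookkeeping, with plain std:: types
standing in for the real tokenizer state:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative model only; not the actual tokenizer fields.
struct InsertionPointStack {
    std::vector<std::optional<std::size_t>> saved; // one entry per nested script execution

    void push(std::optional<std::size_t> current_insertion_point)
    {
        saved.push_back(current_insertion_point);
    }

    std::optional<std::size_t> pop()
    {
        auto value = saved.back();
        saved.pop_back();
        return value;
    }

    // document.write() inserted `count` code points at `at`; any saved insertion
    // point at or after that position has to shift so it still refers to the
    // same logical location in the input stream.
    void adjust_for_insertion(std::size_t at, std::size_t count)
    {
        for (auto& entry : saved) {
            if (entry.has_value() && *entry >= at)
                *entry += count;
        }
    }
};
```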
Replace the spin_until in SVGScriptElement::process_the_script_element
with an async fetch that mirrors HTMLScriptElement's mark_as_ready
pattern. External SVG scripts now fetch and execute asynchronously,
matching Chromium's behavior.
For HTML-embedded SVG scripts, the parser pauses via the existing
schedule_resume_check infrastructure, extended to support SVG scripts
through a new pending_parsing_blocking_svg_script slot on Document.
For top-level XML/SVG documents, scripts execute when their fetch
completes; the load event is delayed via DocumentLoadEventDelayer which
the existing XMLDocumentBuilder::document_end already waits on.
When the HTML parser blocks on a synchronous external script, run a
separate tokenizer over the unparsed input and issue speculative fetches
for the resources it finds (script src, link rel=stylesheet|preload, img
src), with <base href> tracking and template/foreign-content skipping.
This change also fills in the previously-stubbed "consume a preloaded resource"
algorithm and the document's "map of preloaded resources", so that
<link rel="preload"> followed by a matching consumer deduplicates to
a single fetch.
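The scan over start tags looks roughly like the sketch below; Tag and the
fetch helper are placeholders, and real URL resolution against <base href>
is more involved than shown here:

```cpp
#include <map>
#include <string>

// Rough shape of the speculative scan; not the real LibWeb types.
struct Tag {
    std::string name;
    std::map<std::string, std::string> attributes;
};

struct SpeculativeScanner {
    std::string base_href;   // tracked from <base href>
    int skip_depth { 0 };    // >0 inside <template> or foreign content

    void handle_start_tag(Tag const& tag)
    {
        if (tag.name == "template" || tag.name == "svg" || tag.name == "math") {
            ++skip_depth; // do not prefetch from skipped subtrees
            return;
        }
        if (skip_depth > 0)
            return;

        auto attribute = [&](char const* name) -> std::string {
            auto it = tag.attributes.find(name);
            return it == tag.attributes.end() ? std::string() : it->second;
        };

        if (tag.name == "base" && !attribute("href").empty())
            base_href = attribute("href");
        else if (tag.name == "script" && !attribute("src").empty())
            speculative_fetch(attribute("src"));
        else if (tag.name == "img" && !attribute("src").empty())
            speculative_fetch(attribute("src"));
        else if (tag.name == "link" && !attribute("href").empty()) {
            auto rel = attribute("rel");
            if (rel == "stylesheet" || rel == "preload")
                speculative_fetch(attribute("href"));
        }
    }

    void handle_end_tag(Tag const& tag)
    {
        if ((tag.name == "template" || tag.name == "svg" || tag.name == "math") && skip_depth > 0)
            --skip_depth;
    }

    void speculative_fetch(std::string const& relative_url)
    {
        // Placeholder: resolve relative_url against base_href (or the document
        // URL) and hand it to issue_speculative_fetch / the preload map.
    }
};
```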
Spinning a nested event loop to wait for a parser-blocking script blocks
the calling thread, can deadlock, and creates reentrancy hazards. Switch
to an event-driven pause/resume model, mirroring the prior
HTMLParserEndState refactor (df96b69e7a).
Three WPT document.write tests flip from Fail to Pass and are
rebaselined: all write an external script via document.write() followed
by inline content. With spin_until, control did not return to the caller
of document.write() between writing the script and observing its effects,
so the tests' order assertions saw a different sequence than the spec
mandates.
Inline JS-to-JS frames no longer live in the raw execution context
vector, so LibWeb callers that need to inspect or pop contexts now go
through VM helpers instead of peeking into that storage directly.
This keeps the execution context bookkeeping encapsulated while
preserving existing microtask and realm-entry checks.
HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.
The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits for nothing to delay the load event
Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.
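A rough sketch of the progression, with the condition checks abstracted
into callbacks (this is illustrative, not the actual HTMLParser code):

```cpp
#include <functional>

// Simplified three-phase progression; the predicates stand in for the real checks.
enum class HTMLParserEndState {
    WaitingForDeferredScripts,
    WaitingForASAPScripts,
    WaitingForLoadEventDelay,
    Done,
};

struct EndStateMachine {
    HTMLParserEndState state { HTMLParserEndState::WaitingForDeferredScripts };

    std::function<bool()> has_ready_deferred_script;
    std::function<void()> execute_next_ready_deferred_script;
    std::function<bool()> all_deferred_scripts_done;
    std::function<bool()> asap_script_lists_empty;
    std::function<bool()> nothing_delays_load_event;
    std::function<void()> continue_with_remaining_end_steps;

    // Invoked from the notification triggers instead of spinning the event loop.
    void check_progress()
    {
        if (state == HTMLParserEndState::WaitingForDeferredScripts) {
            while (has_ready_deferred_script())
                execute_next_ready_deferred_script();
            if (!all_deferred_scripts_done())
                return; // wait for the next mark_as_ready notification
            state = HTMLParserEndState::WaitingForASAPScripts;
        }
        if (state == HTMLParserEndState::WaitingForASAPScripts) {
            if (!asap_script_lists_empty())
                return;
            state = HTMLParserEndState::WaitingForLoadEventDelay;
        }
        if (state == HTMLParserEndState::WaitingForLoadEventDelay) {
            if (!nothing_delays_load_event())
                return;
            state = HTMLParserEndState::Done;
            continue_with_remaining_end_steps();
        }
    }
};
```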
Key design decisions and why:
1. Microtask checkpoint in schedule_progress_check(): The old spin_until
called perform_a_microtask_checkpoint() before checking conditions.
This is critical because HTMLImageElement::update_the_image_data step
8 queues a microtask that creates the DocumentLoadEventDelayer.
Without the checkpoint, check_progress() would see zero delayers and
complete before images start delaying the load event.
2. deferred_invoke in schedule_progress_check():
I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
Timers caused non-deterministic ordering with the HTML event loop's
task processing timer, leading to image layout tests failing (wrong
subtest pass/fail patterns). Synchronous calls fired too early during
image load processing before dimensions were set, causing 0-height
images in layout tests. queue_global_task had task ordering issues
with the session history traversal queue. deferred_invoke runs after
the current callback returns but within the same event loop pump,
giving the right balance.
3. Navigation load event guard (m_navigation_load_event_guard): During
cross-document navigation, finalize_a_cross_document_navigation step
2 calls set_delaying_load_events(false) before the session history
traversal activates the new document. This creates a transient state
where the parent's load event delay check sees the about:blank document
(which has ready_for_post_load_tasks=true) as the active document and
completes prematurely; the guard keeps the end-state machine from
completing during that window.
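The scheduling described in points 1 and 2 combines into a helper shaped
roughly like this; the callback parameters are stand-ins for the real
microtask-checkpoint and deferred_invoke facilities:

```cpp
#include <functional>

// Illustrative sketch only; the callbacks are supplied by the caller and
// stand in for the real APIs.
void schedule_progress_check(
    std::function<void()> const& perform_a_microtask_checkpoint,
    std::function<void(std::function<void()>)> const& deferred_invoke,
    std::function<void()> check_progress)
{
    // The old spin_until drained microtasks before re-checking its condition;
    // keeping the checkpoint here lets image loads install their
    // DocumentLoadEventDelayer before the load-event-delay phase is evaluated.
    perform_a_microtask_checkpoint();

    // deferred_invoke (rather than a 0ms timer or queue_global_task) runs the
    // re-check after the current callback returns but within the same event
    // loop pump, which avoids the ordering problems described above.
    deferred_invoke(std::move(check_progress));
}
```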
Remove includes from Node.h that are only needed for forward
declarations (AccessibilityTreeNode.h, XMLSerializer.h,
JsonObjectSerializer.h). Extract StyleInvalidationReason and
FragmentSerializationMode enums into standalone lightweight
headers so downstream headers (CSSStyleSheet.h, CSSStyleProperties.h,
HTMLParser.h) can include just the enum they need instead of all of
Node.h. Replace Node.h with forward declarations in headers that only
use Node by pointer/reference.
This breaks the circular dependency between Node.h and
AccessibilityTreeNode.h, reducing AccessibilityTreeNode.h's
recompilation footprint from ~1399 to ~25 files.
When a document is navigated away from while HTMLParser::the_end() is
spinning the event loop (steps 7 and 8), the spin_until stays on the
call stack indefinitely, causing all subsequent event processing on the
same event loop to happen within nested spin_until pumping. Add
is_fully_active() checks to bail out early in this case.
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
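The pattern being enforced looks like this in miniature; Visitor and the
struct names are placeholders, not the real GC::Cell::Visitor or LibWeb
types:

```cpp
// Illustration of the pattern only.
struct Visitor {
    void visit(void const* ptr) { (void)ptr; /* mark ptr as reachable */ }
};

struct SomeGCObject { };

// A plain helper struct holding GC pointers gets its own visit_edges()...
struct HelperState {
    SomeGCObject* target { nullptr };

    void visit_edges(Visitor& visitor) { visitor.visit(target); }
};

// ...and the owning heap-allocated object must forward to it, otherwise the
// pointers inside the helper are invisible to the garbage collector.
struct OwningCell {
    HelperState helper;

    void visit_edges(Visitor& visitor) { helper.visit_edges(visitor); }
};
```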
Introduce the HTMLSelectedContentElement and integrate it into
<select>, <option> and HTMLParser.
See whatwg/html#10548.
There are two bugs with WPT tests which cause the third subtest
in selectedcontent.html and selectedcontent-mutations.html to fail.
See whatwg/html#11882, web-platform-tests/wpt#55849.
This implements the parsing part of the customizable <select> spec update.
See whatwg/html PR #10548.
Two failing subtests in `html5lib_innerHTML_tests_innerHTML_1.html`
and `customizable-select/select-parsing.html` are due to the spec
still disallowing `<input>` inside `<select>`, even though Chrome
has already implemented this behavior (see whatwg/html#11288).
An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP
for use outside of LibWeb. These CORS and MIME helpers depend on other
LibWeb facilities, however, so they cannot be moved.
Step 2.(a).5 says to abort, but we were instead carrying on and would
run steps 3 and 4. Those steps would not change the result at all, but
this avoids a little unnecessary work.
I wrapped a couple of comments at 120 columns while I was at it.
When detecting an element's opening tag, the spec asks us to skip ahead
to the first whitespace or end chevron character before trying to read
attributes. Instead, we were always skipping 2 positions ahead and then
ignoring all whitespace characters and slashes, which was clearly wrong.
Theoretically this could have caused some weird behaviors if part of the
opening tag matched an expected attribute name, but it's very unlikely
to see that in the wild.
This did not cause any immediate issues except generating instances of
`Attr` with useless values which caused some unnecessary work during
encoding detection.
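A simplified version of the corrected skip (the real code operates on the
raw byte stream during encoding detection, not on a string_view):

```cpp
#include <cstddef>
#include <string_view>

// Illustrative only; not the actual implementation.
std::size_t skip_past_tag_name(std::string_view input, std::size_t position)
{
    // Per the prescan algorithm, advance to the first whitespace or '>' byte so
    // that attribute parsing starts at a real attribute boundary, instead of
    // unconditionally skipping two positions ahead.
    while (position < input.size()) {
        char c = input[position];
        if (c == '>' || c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r')
            break;
        ++position;
    }
    return position;
}
```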
This ends up saving quite a bit of memory on many pages, since UTF-32
uses 4 bytes per code point.
As an example, it reduces the footprint on https://gymgrossisten.com/
by 2 MiB.
Update Element::parse_fragment and Node::unsafely_set_html to
propagate exceptions.
This refactor is needed as a prerequisite for implementing the XML
fragment parser, which requires consistent error handling in fragment
parsing.