Commit Graph

19 Commits

Author SHA1 Message Date
Andreas Kling
2e59dbd10f LibWeb: Don't process frameset token twice in "in body" insertion mode
We were neglecting to return after handling the `frameset` start tag,
which caused us to process it twice, once properly and once generically.

54 new passes in WPT/html/syntax/parsing/ :^)
2025-02-20 14:32:13 +01:00
Andreas Kling
6f7b865cd1 LibWeb: Let HTML parser handle EOF inserted by document.close()
Before this change, the explicit EOF inserted by document.close() would
instantly abort the parser. This meant that parsing algorithms that ran
as part of the parser unwinding on EOF would never actually run.

591 new passes in WPT/html/syntax/parsing/ :^)

This exposed a problem where the parser would try to insert a root
<html> element on EOF in a document where someone already inserted such
an element via direct DOM manipulation. The parser now gracefully
handles this scenario. It's covered by existing tests (which would
crash without this change.)
2025-02-20 14:32:13 +01:00
Andreas Kling
dc652aee75 LibWeb: Also run HTML parser tests in "write" and "write_single" mode
Since we don't support the "variant" meta tag stuff in WPT, I've simply
copied the test files here, and then test.js looks at the filename to
figure out which test function to use.

This incrases our coverage of the HTML parser substantially by also
invoking it via document.write() one-shot, and character-at-a-time.
2025-02-20 14:32:13 +01:00
Tim Ledbetter
8b5e9c2a1d LibWeb: Emit comment token for unterminated bogus comments on EOF 2025-01-11 11:09:47 +01:00
Ryan Liptak
1ba15e1fa3 LibWeb: Fix hex character references accepting all alphabetic ASCII
Instead of just A-F/a-f, any char A-Z/a-z was being accepted as a valid
hexadecimal digit.
2025-01-07 00:43:41 +01:00
Ryan Liptak
df87a9689c LibWeb: Fix numeric character reference at EOF leaking its last digit
Previously, if the NumericCharacterReferenceEnd state was reached when
current_input_character was None, then the
DONT_CONSUME_NEXT_INPUT_CHARACTER macro would restore back before the
EOF, and allow the next state (after the SWITCH_TO_RETURN_STATE) to
proceed with the last digit of the numeric character reference.

For example, with something like `&#1111`, before this commit the
output would incorrectly be `<code point with the value 1111>1` instead
of just `<code point with the value 1111>`.

Instead of putting the `if (current_input_character.has_value())` check
inside NumericCharacterReferenceEnd directly, it was instead added to
DONT_CONSUME_NEXT_INPUT_CHARACTER, because all usages of the macro
benefit from this check, even if the other existing usage sites don't
exhibit any bugs without it:

- In MarkupDeclarationOpen, if the current_input_character is EOF, then
  the previous character is always `!`, so restoring and then checking
  forward for strings like `--`, `DOCTYPE`, etc won't match and the
  BogusComment state will run one extra time (once for `!` and once
  for EOF) with no practical consequences. With the `has_value()` check,
  BogusComment will only run once with EOF.

- In AfterDOCTYPEName, ConsumeNextResult::RanOutOfCharacters can only
  occur when stopping at the insertion point, and because of how
  the code is structured, it is guaranteed that current_input_character
  is either `P` or `S`, so the `has_value()` check is irrelevant.
2025-01-07 00:43:41 +01:00
Ryan Liptak
752deaf6ef LibWeb: Don't skip named-character-references test
This already passes, so there's no reason to skip it anymore
2025-01-07 00:43:41 +01:00
Pavel Shliak
d5cdda0b40 LibWeb: Load external scripts in SVG 2024-12-11 16:29:42 -07:00
Tim Ledbetter
61ae388140 Tests: Create imported WPT test output from completion callback data
This allows us to disable test output, which performs expensive assert
tracking. This was making our imported tests run significantly slower
than tests run via `WPT.sh`.

Formatting the output ourselves also allows us to remove unnecessary
information from the test output.

This commit also rebaselines all existing imported WPT tests to follow
the new format.
2024-12-02 22:41:51 +00:00
Gingeh
0adf261c32 LibWeb: Don't end parsing after reaching the insertion point 2024-11-26 23:50:18 +01:00
Andreas Kling
01f4bbbba7 LibWeb: Abandon Node.replaceChild() if removal rejigs the DOM
This isn't directly in the spec, but since replaceChild is implemented
in terms of remove + insert, the removal step may cause arbitrary code
to execute, and so we have to verify that the replaceChild inputs still
make sense afterwards, before doing the insertion.

This roughly matches what WebKit does, and makes a bunch of HTML parsing
tests in WPT stop asserting.
2024-11-24 11:45:23 +01:00
Andreas Kling
69367194a6 LibWeb: Make HTML tokenizer stop at insertion point after state switch
If we reach the insertion point at the same time as we switch to another
tokenizer state, we have to bail immediately without proceeding with the
next code point. Otherwise we'd fetch the next token, get an EOF marker,
and then proceed as if we're at the end of the input stream, even though
more data may be coming (with more calls to document.write()..)
2024-11-23 19:19:31 +01:00
Psychpsyo
f09ed59351 LibWeb: Add the search element 2024-11-19 23:30:43 +00:00
Andreas Kling
ceedfb34d2 LibWeb: Look at both namespace & tag name in HTML parser stack checks
We were neglecting to check the namespace when looking for a specific
type of element on the stack of open elements in many cases.

This caused us to confuse HTML and SVG elements.
2024-11-05 12:29:27 +01:00
Andreas Kling
49b88fc095 LibWeb: Don't compare against HTML-uppercased tag names in HTML parser
Element::tag_name() returns an uppercased string for HTML elements,
which is usually not what's expected by the parser algorithms that look
at tag names.
2024-11-05 12:29:27 +01:00
Andreas Kling
64747c0397 LibWeb: Make HTML parser "current node" APIs return nullable pointer
It's actually possible for there to be no adjusted current node, when
the stack of open elements is empty. This was covered by one of the WPT
parsing tests.
2024-11-03 20:32:32 +01:00
Andreas Kling
0a47d8cb08 LibWeb: Handle more MathML/HTML integration points in the parser 2024-11-03 20:32:32 +01:00
Andreas Kling
ae39c54e51 LibWeb: Fix SVG tag adjustment for feSpotLight, feTile, feTurbulence 2024-11-03 20:32:32 +01:00
Andreas Kling
88a4a86ece Tests: Import many HTML parsing tests from WPT
I had to skip a lot of these due to assertion failures that we need to
investigate before unskipping.
2024-11-03 17:51:44 +01:00