Commit Graph

10 Commits

Author SHA1 Message Date
sideshowbarker
1b41659efd LibXML+LibWeb: Use existing HTML entities table for XML parsing too
For XHTML documents, resolve named character entities (e.g.,  )
using the HTML entity table via a getEntity SAX callback. This avoids
parsing a large embedded DTD on every document and matches the approach
used by Blink and WebKit.

This also removes the now-unused DTD infrastructure:

- Remove resolve_external_resource callback from Parser::Options
- Remove resolve_xml_resource() function and its ~60KB embedded DTD
- Remove all call sites passing the unused callback
2026-01-09 19:13:41 +00:00
sideshowbarker
fac81e84ba LibXML: Replace the existing XML parser with libxml2 parsing
This change replaces our LibXML parser with a new implementation that
wraps libxml2's SAX2 API.

The new Parser class uses libxml2's SAX2 callbacks to drive the existing
XML::Listener interface. That preserves backward compatibility with all
existing consumers (XMLDocumentBuilder, DOMParser, etc.).
2026-01-07 14:38:52 +01:00
rmg-x
b9554038ff LibWeb+LibXML: Make Listener::set_source(ByteString) fallible
`set_source` takes a ByteString but the implementation might require a
specific encoding. Make it fallible so that we don't need to crash in
the case of invalid UTF-8 or similar.

The test includes a sequence of invalid UTF-8 bytes that crash the
browser without this change.
2025-10-02 02:25:28 +02:00
ayeteadoe
2c91014bbf LibXML: Enable EXPLICIT_SYMBOL_EXPORT 2025-08-24 12:58:27 -06:00
Andreas Kling
b7595013c1 LibWeb+LibXML: Preserve element attribute order in XML documents
We now use OrderedHashMap instead of HashMap to ensure that attributes
on XML elements retain their original order.
2025-08-22 11:35:59 +02:00
Andrew Kaster
d9976b98b9 LibXML: Add parser hooks for CDATASection and ProcessingInstructions
This allows listeners to be notified when a CDATASection or
ProcessingInstruction is encountered during parsing. The non-listener
path still has the incorrect behavior of silently treating CDATASection
as Text nodes, but this allows listeners to handle them correctly.
2025-07-19 14:56:20 +02:00
Timothy Flynn
7280ed6312 Meta: Enforce newlines around namespaces
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Timothy Flynn
488034477a Revert "LibWeb: Set doctype node immediately while parsing XML document"
This reverts commit cd446e5e9c.

This broke about 20k WPT subtests, all related to XML parsing. See:
https://wpt.fyi/results/html/the-xhtml-syntax/parsing-xhtml-documents?diff=&filter=ADC&run_id=5154815472828416&run_id=5090731742199808
2024-11-20 19:11:56 -05:00
Andreas Kling
cd446e5e9c LibWeb: Set doctype node immediately while parsing XML document
Instead of deferring it to the end of parsing, where scripts that
were expecting to look at the doctype may have already run.
2024-11-20 16:10:57 +01:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00