For XHTML documents, resolve named character entities (e.g., )
using the HTML entity table via a getEntity SAX callback. This avoids
parsing a large embedded DTD on every document and matches the approach
used by Blink and WebKit.
This also removes the now-unused DTD infrastructure:
- Remove resolve_external_resource callback from Parser::Options
- Remove resolve_xml_resource() function and its ~60KB embedded DTD
- Remove all call sites passing the unused callback
This change replaces our LibXML parser with a new implementation that
wraps libxml2's SAX2 API.
The new Parser class uses libxml2's SAX2 callbacks to drive the existing
XML::Listener interface. That preserves backward compatibility with all
existing consumers (XMLDocumentBuilder, DOMParser, etc.).
`set_source` takes a ByteString but the implementation might require a
specific encoding. Make it fallible so that we don't need to crash in
the case of invalid UTF-8 or similar.
The test includes a sequence of invalid UTF-8 bytes that crash the
browser without this change.
This allows listeners to be notified when a CDATASection or
ProcessingInstruction is encountered during parsing. The non-listener
path still has the incorrect behavior of silently treating CDATASection
as Text nodes, but this allows listeners to handle them correctly.