script: Prescan byte stream to determine encoding before parsing document (#41376)

Servo currently ignores `<meta charset>` tags entirely. When we find
one whose encoding is incompatible with the current one, we should
reload the page and start over with the new encoding. A common
optimization, which has even made its way into the specification, is to
wait for a few bytes to arrive and inspect them for `meta` tags, so the
browser can use the correct encoding from the very beginning.
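
To illustrate the idea of inspecting the first bytes for `meta` tags,
here is a deliberately simplified sketch. It is not the specification's
prescan algorithm and not the code in this change: it skips comment
handling and proper attribute parsing, and the names
`sniff_meta_charset` and `find_subslice` are hypothetical.

```rust
/// Returns the position of `needle` in `haystack`, if present.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Simplified scan (assumption: no comments, no tricky quoting).
/// Returns the lowercased label after the first `charset=` that
/// appears inside a complete `<meta ...>` tag in `bytes`.
fn sniff_meta_charset(bytes: &[u8]) -> Option<String> {
    // Tag and attribute names are ASCII case-insensitive.
    let lower = bytes.to_ascii_lowercase();
    let mut search_from = 0;
    while let Some(pos) = find_subslice(&lower[search_from..], b"<meta") {
        let tag_start = search_from + pos;
        // Give up if the tag is not closed within the buffered bytes.
        let tag_end = lower[tag_start..]
            .iter()
            .position(|&b| b == b'>')
            .map(|i| tag_start + i)?;
        let tag = &lower[tag_start..tag_end];
        if let Some(attr) = find_subslice(tag, b"charset=") {
            let value = &tag[attr + b"charset=".len()..];
            let label: Vec<u8> = value
                .iter()
                .copied()
                .skip_while(|&b| b == b'"' || b == b'\'')
                .take_while(|&b| !matches!(b, b'"' | b'\'' | b' ' | b'/' | b';'))
                .collect();
            if !label.is_empty() {
                return String::from_utf8(label).ok();
            }
        }
        // Continue looking after this tag.
        search_from = tag_end;
    }
    None
}
```

A real implementation would additionally map the label to a known
encoding and ignore labels it does not recognize.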

In practice, I've run into problems with our WPT harness when reloading
the page after `meta` tags. Therefore, this change implements the
optimization first, so we never have to reload when running WPT. The
prescan waits for 1024 bytes to arrive or for one second to pass,
whichever happens first.
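
The "1024 bytes or one second" rule can be sketched roughly as follows.
This is a minimal illustration, not the code in this change:
`PrescanBuffer`, `PRESCAN_BYTE_LIMIT` and `PRESCAN_TIMEOUT` are
hypothetical names, and the caller is assumed to supply the current
time.

```rust
use std::time::{Duration, Instant};

// Hypothetical constants mirroring the limits described above.
const PRESCAN_BYTE_LIMIT: usize = 1024;
const PRESCAN_TIMEOUT: Duration = Duration::from_secs(1);

/// Collects the start of a response until prescanning may begin.
struct PrescanBuffer {
    bytes: Vec<u8>,
    started_at: Instant,
}

impl PrescanBuffer {
    fn new(started_at: Instant) -> Self {
        Self { bytes: Vec::new(), started_at }
    }

    /// Appends a chunk received from the network.
    fn push(&mut self, chunk: &[u8]) {
        self.bytes.extend_from_slice(chunk);
    }

    /// True once enough bytes have arrived or enough time has passed,
    /// whichever happens first.
    fn is_ready(&self, now: Instant) -> bool {
        self.bytes.len() >= PRESCAN_BYTE_LIMIT
            || now.duration_since(self.started_at) >= PRESCAN_TIMEOUT
    }
}
```

Passing `now` in explicitly keeps the check deterministic, which makes
the timeout path easy to exercise without sleeping.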

This changes the results of a large number of web platform tests. I've
looked at most of the new failures and I believe they're reasonable.

Testing: New tests start to pass.
Part of https://github.com/servo/servo/issues/6414

---------

Signed-off-by: Simon Wülker <simon.wuelker@arcor.de>
Simon Wülker
2025-12-19 10:54:19 +01:00
committed by GitHub
parent b02465bc53
commit 8c344f5641
60 changed files with 1032 additions and 3285 deletions

@@ -82,39 +82,3 @@
[Check encoding windows-1252, zero-around-label-trail.htm]
expected: FAIL
[Check encoding windows-1253, xml-and-meta.htm]
expected: FAIL
[Check encoding windows-1253, incomplete-utf-16le-and-meta.htm]
expected: FAIL
[Check encoding windows-1253, incomplete-utf-16be-and-meta.htm]
expected: FAIL
[Check encoding windows-1253, xml-and-meta-trail.htm]
expected: FAIL
[Check encoding windows-1253, incomplete-utf-16le-and-meta-trail.htm]
expected: FAIL
[Check encoding windows-1253, incomplete-utf-16be-and-meta-trail.htm]
expected: FAIL
[Check encoding UTF-16LE, utf-16le-and-meta.htm]
expected: FAIL
[Check encoding UTF-16LE, utf-16le-and-meta-trail.htm]
expected: FAIL
[Check encoding UTF-16BE, utf-16be-and-meta.htm]
expected: FAIL
[Check encoding UTF-16BE, utf-16be-and-meta-trail.htm]
expected: FAIL
[Check encoding replacement, replacement.htm]
expected: FAIL
[Check encoding replacement, replacement-trail.htm]
expected: FAIL