ladybird/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.cpp
LibWeb: Pause tokenizer at a CR right before the insertion point
Author: Aliaksandr Kalenik, commit b6ffd51d1c
HTML newline normalization collapses CRLF into a single LF, so
next_code_point() needs one code point of lookahead at a CR to decide
whether the CR stands alone or is the first half of a CRLF pair. When
the tokenizer is approaching the insertion point and the next code point
to consume is a CR immediately before it, the code point that this
lookahead needs has not been written yet.

Previously, the tokenizer consumed the CR and emitted it as a lone LF,
so when a subsequent document.write() began with LF, that LF surfaced
as a second LF instead of completing the original CRLF pair.

In this case, stop one code point early and wait for the next write to
supply the lookahead. This makes four html5lib write_single WPT tests
pass.
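A minimal sketch of the idea, assuming a simplified model in which the insertion point is always the end of the buffer and close() stands in for the end of input. NormalizingInput, its next_code_point() and run_writes() are hypothetical illustrations, not LibWeb's HTMLTokenizer API. The pause_at_trailing_cr flag switches between the old behavior (false) and the fix (true).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a document.write() input buffer with HTML newline
// normalization. NOT LibWeb's HTMLTokenizer API; all names are illustrative.
class NormalizingInput {
public:
    explicit NormalizingInput(bool pause_at_trailing_cr)
        : m_pause_at_trailing_cr(pause_at_trailing_cr)
    {
    }

    // Append text at the insertion point (simplified: always the end).
    void write(std::string const& text) { m_buffer += text; }

    // No more writes will arrive, so a trailing CR can be treated as lone.
    void close() { m_closed = true; }

    // Returns the next normalized character, or std::nullopt when the
    // caller must pause until more input is written.
    std::optional<char> next_code_point()
    {
        if (m_pos >= m_buffer.size())
            return std::nullopt; // reached the insertion point

        char c = m_buffer[m_pos];
        if (c != '\r') {
            ++m_pos;
            return c;
        }

        // A CR needs one code point of lookahead to tell CRLF from a lone CR.
        bool lookahead_missing = m_pos + 1 >= m_buffer.size();
        if (lookahead_missing && m_pause_at_trailing_cr && !m_closed)
            return std::nullopt; // the fix: leave the CR unconsumed and wait

        ++m_pos; // consume the CR
        if (!lookahead_missing && m_buffer[m_pos] == '\n')
            ++m_pos; // CRLF collapses into a single LF
        return '\n'; // a lone CR also becomes LF
    }

private:
    std::string m_buffer;
    std::size_t m_pos { 0 };
    bool m_pause_at_trailing_cr;
    bool m_closed { false };
};

// Feed each chunk as a separate write and drain up to the insertion point
// after every write, then close the input and drain whatever is left.
std::string run_writes(std::vector<std::string> const& chunks, bool pause_at_trailing_cr)
{
    NormalizingInput input(pause_at_trailing_cr);
    std::string out;
    for (auto const& chunk : chunks) {
        input.write(chunk);
        while (auto c = input.next_code_point())
            out += *c;
    }
    input.close();
    while (auto c = input.next_code_point())
        out += *c;
    return out;
}
```

With the old behavior, run_writes({"a\r", "\nb"}, false) yields "a\n\nb": the CR was already emitted as LF, so the LF from the second write becomes a second newline. With the pause, run_writes({"a\r", "\nb"}, true) yields "a\nb", matching a single write of "a\r\nb". Because a paused trailing CR is still resolved as a lone CR once the input is closed, the pause never loses a newline.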
2026-04-27 21:44:56 +02:00
