ladybird/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.cpp at 62d9a84b8d7fa20821784112b31ae4e886eb61dd

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-27 10:07:15 +02:00


Andreas Kling 263b125782 LibWeb: Let HTMLTokenizer walk over code points instead of UTF-8

Instead of using UTF-8 iterators to traverse the HTMLTokenizer input
stream one code point at a time, we now do a one-shot conversion up
front from the input encoding to a Vector<u32> of Unicode code points.

This simplifies the tokenizer logic somewhat, and ends up being faster
as well, so win-win.

1.02x speedup on Speedometer 2.1

2025-05-11 01:13:20 +02:00
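The commit message describes converting the input once, up front, into a flat array of code points that the tokenizer then walks by index. The sketch below illustrates that idea under stated assumptions. It uses `std::vector<char32_t>` as a stand-in for AK's `Vector<u32>`. It only handles UTF-8 input, whereas the commit converts from whatever the input encoding is. Both helpers, `decode_utf8_to_code_points` and `CodePointCursor`, are hypothetical and are not Ladybird APIs. Malformed bytes become U+FFFD using a simplified policy, which is not a faithful copy of the WHATWG UTF-8 decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: decode UTF-8 once into a flat array of code points.
// Simplified error handling: each offending lead byte yields one U+FFFD.
std::vector<char32_t> decode_utf8_to_code_points(std::string_view input)
{
    std::vector<char32_t> out;
    out.reserve(input.size()); // upper bound: at most one code point per byte
    size_t i = 0;
    while (i < input.size()) {
        unsigned char lead = static_cast<unsigned char>(input[i]);
        size_t length = 0;
        char32_t cp = 0;
        char32_t min_value = 0; // smallest value allowed for this length (rejects overlong forms)
        if (lead < 0x80) {
            out.push_back(lead);
            ++i;
            continue;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2; cp = lead & 0x1F; min_value = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3; cp = lead & 0x0F; min_value = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4; cp = lead & 0x07; min_value = 0x10000;
        } else {
            out.push_back(0xFFFD); // stray continuation byte or invalid lead byte
            ++i;
            continue;
        }
        bool valid = i + length <= input.size();
        for (size_t k = 1; valid && k < length; ++k) {
            unsigned char cont = static_cast<unsigned char>(input[i + k]);
            if ((cont & 0xC0) != 0x80)
                valid = false;
            else
                cp = (cp << 6) | (cont & 0x3F);
        }
        if (!valid || cp < min_value || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(0xFFFD);
            ++i; // resynchronize on the next byte
            continue;
        }
        out.push_back(cp);
        i += length;
    }
    return out;
}

// Hypothetical cursor: once the input is a code point array, peeking ahead
// and reconsuming are plain index arithmetic instead of UTF-8 iterator moves.
class CodePointCursor {
public:
    explicit CodePointCursor(std::vector<char32_t> code_points)
        : m_code_points(std::move(code_points))
    {
    }

    bool at_end() const { return m_index >= m_code_points.size(); }

    // Look `offset` code points ahead without consuming; nullopt past the end.
    std::optional<char32_t> peek(size_t offset = 0) const
    {
        size_t pos = m_index + offset;
        if (pos >= m_code_points.size())
            return std::nullopt;
        return m_code_points[pos];
    }

    char32_t consume()
    {
        assert(!at_end());
        return m_code_points[m_index++];
    }

    // "Reconsume" in the HTML tokenizer sense: step back by one code point.
    void reconsume()
    {
        assert(m_index > 0);
        --m_index;
    }

private:
    std::vector<char32_t> m_code_points;
    size_t m_index { 0 };
};
```

In this design, decoding cost is paid once per input and every later lookahead is an O(1) array access. The trade-off is memory: each code point takes four bytes, compared with one to four bytes in UTF-8.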

115 KiB
