ladybird/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.cpp
Andreas Kling 263b125782 LibWeb: Let HTMLTokenizer walk over code points instead of UTF-8
Instead of using UTF-8 iterators to traverse the HTMLTokenizer input
stream one code point at a time, we now do a one-shot conversion up
front from the input encoding to a Vector<u32> of Unicode code points.

This simplifies the tokenizer logic somewhat, and ends up being faster
as well, so win-win.
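
The up-front conversion described above can be sketched roughly as follows. This is a standalone illustration, not Ladybird's code: decode_utf8 and CodePointStream are hypothetical names, std::vector<char32_t> stands in for Vector<u32>, and the input is assumed to be UTF-8 already (the real tokenizer converts from whatever the input encoding is). Malformed bytes are replaced with U+FFFD one byte at a time, which is a simplification.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not a Ladybird API): decode the whole input once into
// a flat array of code points. Malformed bytes become U+FFFD.
std::vector<char32_t> decode_utf8(std::string_view input)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static constexpr char32_t min_for_length[] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::vector<char32_t> out;
    out.reserve(input.size());

    std::size_t i = 0;
    while (i < input.size()) {
        auto lead = static_cast<unsigned char>(input[i]);
        std::size_t length = 0;
        char32_t code_point = 0;

        if (lead < 0x80) { length = 1; code_point = lead; }
        else if ((lead & 0xE0) == 0xC0) { length = 2; code_point = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; code_point = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; code_point = lead & 0x07; }

        bool ok = length != 0 && i + length <= input.size();
        for (std::size_t k = 1; ok && k < length; ++k) {
            auto byte = static_cast<unsigned char>(input[i + k]);
            if ((byte & 0xC0) != 0x80)
                ok = false;
            else
                code_point = (code_point << 6) | (byte & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (ok && (code_point < min_for_length[length] || code_point > 0x10FFFF
                || (code_point >= 0xD800 && code_point <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(0xFFFD);
            ++i; // resynchronize on the next byte
            continue;
        }
        out.push_back(code_point);
        i += length;
    }
    return out;
}

// Hypothetical stream: once the input is an array, advancing and lookahead
// are plain index operations with no decoding in the hot loop.
struct CodePointStream {
    std::vector<char32_t> code_points;
    std::size_t position = 0;

    std::optional<char32_t> peek(std::size_t offset = 0) const
    {
        if (position + offset >= code_points.size())
            return std::nullopt;
        return code_points[position + offset];
    }

    std::optional<char32_t> next()
    {
        auto code_point = peek();
        if (code_point.has_value())
            ++position;
        return code_point;
    }
};
```

With this shape, looking ahead by N code points or backing up by one is simple index arithmetic, which is plausibly where the simplification mentioned above comes from. The cost is memory: four bytes per code point for the whole input.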

1.02x speedup on Speedometer 2.1
2025-05-11 01:13:20 +02:00

115 KiB