ladybird/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.cpp at 39fff3cf8a4b8e8f990c8c4ed3fd073c18f8a738

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-27 02:05:07 +02:00


Andreas Kling 263b125782 LibWeb: Let HTMLTokenizer walk over code points instead of UTF-8

Instead of using UTF-8 iterators to traverse the HTMLTokenizer input
stream one code point at a time, we now do a one-shot conversion up
front from the input encoding to a Vector<u32> of Unicode code points.

This simplifies the tokenizer logic somewhat, and ends up being faster
as well, so win-win.

1.02x speedup on Speedometer 2.1
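
The one-shot conversion described in the commit message above can be illustrated with a minimal sketch. It is not the code from HTMLTokenizer.cpp: it assumes UTF-8 input only, uses std::vector<uint32_t> as a stand-in for AK's Vector<u32>, and the names decode_utf8_to_code_points and CodePointStream are hypothetical. Error handling is simplified (malformed bytes become U+FFFD; overlong forms and surrogates are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 into code points in a single up-front pass.
// Simplification: any malformed byte yields U+FFFD and decoding resumes at the
// next byte. Overlong encodings and surrogates are not checked here.
std::vector<uint32_t> decode_utf8_to_code_points(std::string_view input)
{
    std::vector<uint32_t> code_points;
    code_points.reserve(input.size()); // Upper bound: one code point per byte.

    size_t i = 0;
    while (i < input.size()) {
        auto lead = static_cast<uint8_t>(input[i]);
        uint32_t code_point = 0;
        size_t length = 0;

        // Determine the sequence length and payload bits from the lead byte.
        if (lead < 0x80) {
            code_point = lead;
            length = 1;
        } else if ((lead & 0xE0) == 0xC0) {
            code_point = lead & 0x1F;
            length = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            code_point = lead & 0x0F;
            length = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            code_point = lead & 0x07;
            length = 4;
        }

        // Fold in the continuation bytes (each must look like 10xxxxxx).
        bool valid = length != 0 && i + length <= input.size();
        for (size_t k = 1; valid && k < length; ++k) {
            auto byte = static_cast<uint8_t>(input[i + k]);
            if ((byte & 0xC0) != 0x80)
                valid = false;
            else
                code_point = (code_point << 6) | (byte & 0x3F);
        }

        if (!valid) {
            code_points.push_back(0xFFFD);
            ++i;
            continue;
        }

        code_points.push_back(code_point);
        i += length;
    }
    return code_points;
}

// Hypothetical cursor over the decoded buffer. Once the input is a flat array
// of code points, peeking and consuming are plain index operations.
struct CodePointStream {
    std::vector<uint32_t> code_points;
    size_t position = 0;

    bool is_eof() const { return position >= code_points.size(); }

    uint32_t peek(size_t offset = 0) const
    {
        assert(position + offset < code_points.size());
        return code_points[position + offset];
    }

    uint32_t consume()
    {
        assert(!is_eof());
        return code_points[position++];
    }
};
```

With a layout like this, lookahead by N code points is an index addition and backing up is a decrement of the position, where a UTF-8 iterator would have to re-decode variable-length sequences. The trade-off is up to four bytes of memory per input byte for ASCII-heavy documents.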

2025-05-11 01:13:20 +02:00

115 KiB
