Instead of using UTF-8 iterators to traverse the HTMLTokenizer input
stream one code point at a time, we now do a one-shot conversion up
front from the input encoding to a Vector<u32> of Unicode code points.
This simplifies the tokenizer logic somewhat, and ends up being faster
as well, so win-win.
1.02x speedup on Speedometer 2.1