Commit Graph

8 Commits

Author SHA1 Message Date
Andreas Kling
355fb6b825 LibWeb: Stream Rust CSS tokenizer tokens over FFI
Avoid building a temporary Rust token vector before calling back into
C++. The tokenizer now invokes the callback as each token is produced,
while borrowing the already-filtered input for source slices.

Reserve an initial C++ token capacity from the input size so the common
path avoids repeated growth while appending the converted tokens.

With this change, the Rust CSS tokenizer is now ~1.3x faster than the
C++ CSS tokenizer at churning through all the https://vercel.com/ CSS.
2026-05-03 17:22:17 +02:00
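
The streaming idea in this commit can be sketched as follows. This is a minimal illustration, not the LibWeb code: `Kind`, `TokenView`, `tokenize_streaming` and `collect_tokens` are hypothetical names, the token rules are a toy subset, and a Rust closure stands in for the real FFI callback into C++. The capacity heuristic is an arbitrary guess.

```rust
/// Hypothetical, heavily simplified token kinds (not the real CSS token set).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Ident,
    Whitespace,
    Delim,
}

/// A token that borrows its source slice from the input instead of copying it.
#[derive(Debug, PartialEq)]
pub struct TokenView<'a> {
    pub kind: Kind,
    pub text: &'a str,
}

/// Calls `emit` once per token as it is produced; no intermediate Vec is built.
pub fn tokenize_streaming<'a>(input: &'a str, mut emit: impl FnMut(TokenView<'a>)) {
    let mut rest = input;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_ascii_alphabetic() {
            let end = rest.find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'));
            (Kind::Ident, end.unwrap_or(rest.len()))
        } else if first.is_whitespace() {
            let end = rest.find(|c: char| !c.is_whitespace());
            (Kind::Whitespace, end.unwrap_or(rest.len()))
        } else {
            (Kind::Delim, first.len_utf8())
        };
        let (text, tail) = rest.split_at(len);
        emit(TokenView { kind, text });
        rest = tail;
    }
}

/// Receiver side (standing in for the C++ caller): reserve capacity from the
/// input size up front, then append each token as the callback fires.
pub fn collect_tokens(input: &str) -> Vec<TokenView<'_>> {
    // Assumed heuristic: roughly one token per four bytes of input.
    let mut tokens = Vec::with_capacity(input.len() / 4 + 1);
    tokenize_streaming(input, |token| tokens.push(token));
    tokens
}
```

Calling `collect_tokens("color: red")` in this sketch yields four tokens whose `text` fields all point into the original string.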
Sam Atkins
46d0d241e6 LibWeb/CSS: Avoid reconsume-then-consume dance in Rust tokenizer 2026-05-03 09:49:00 +02:00
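
One way to read "reconsume-then-consume" is sketched below, assuming a cursor with `consume_code_point()` and a step-back operation. The `Cursor` type and every method on it are hypothetical and the actual change in the commit may look different. The first method consumes a code point, steps back, and lets a helper consume it again; the second peeks and skips the round trip.

```rust
/// Hypothetical cursor over code points; not the actual LibWeb tokenizer.
pub struct Cursor {
    chars: Vec<char>,
    pos: usize,
}

impl Cursor {
    pub fn new(input: &str) -> Self {
        Self { chars: input.chars().collect(), pos: 0 }
    }

    pub fn position(&self) -> usize {
        self.pos
    }

    pub fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    pub fn consume_code_point(&mut self) -> Option<char> {
        let c = self.peek();
        if c.is_some() {
            self.pos += 1;
        }
        c
    }

    /// Steps back one code point so it can be consumed again.
    pub fn reconsume(&mut self) {
        self.pos = self.pos.saturating_sub(1);
    }

    /// Helper that consumes a run of ASCII digits.
    fn consume_digits(&mut self) -> String {
        let mut out = String::new();
        while let Some(c) = self.peek().filter(char::is_ascii_digit) {
            out.push(c);
            self.pos += 1;
        }
        out
    }

    /// The "dance": consume, step back, then let the helper consume again.
    pub fn number_with_reconsume(&mut self) -> Option<String> {
        match self.consume_code_point() {
            Some(c) if c.is_ascii_digit() => {
                self.reconsume();
                Some(self.consume_digits())
            }
            Some(_) => {
                self.reconsume();
                None
            }
            None => None,
        }
    }

    /// Same result, but peeks first and never steps back.
    pub fn number_with_peek(&mut self) -> Option<String> {
        match self.peek() {
            Some(c) if c.is_ascii_digit() => Some(self.consume_digits()),
            _ => None,
        }
    }
}
```

Both methods leave the cursor in the same position; the peek variant simply does less bookkeeping.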
Sam Atkins
d7bfbe3ee1 LibWeb/CSS: Use a Rust enum for the internal Token type
This will be nicer to work with once we have a Parser in Rust too.
2026-05-03 09:49:00 +02:00
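
As a rough illustration of why an enum is pleasant for parser code, here is a sketch with a hypothetical, reduced `Token` type. The variants and the `as_pixels` helper are assumptions for the example and do not mirror the real LibWeb definitions. Each variant carries only the data it needs, and a consumer can destructure it in a single `match`.

```rust
/// Hypothetical, reduced token type; the real one has many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Number(f64),
    Dimension { value: f64, unit: String },
    Delim(char),
    Whitespace,
    EndOfFile,
}

/// Example consumer: the kind of helper a parser might want.
/// Returns a pixel value for `<n>px` dimensions and for a unitless zero.
pub fn as_pixels(token: &Token) -> Option<f64> {
    match token {
        Token::Dimension { value, unit } if unit.eq_ignore_ascii_case("px") => Some(*value),
        Token::Number(value) if *value == 0.0 => Some(0.0),
        _ => None,
    }
}
```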
Sam Atkins
5e29a2a419 LibWeb/CSS: Use Range for Rust Token ranged fields 2026-05-03 09:49:00 +02:00
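
A small sketch of what a `Range` field buys over a separate start/end pair, assuming the ranged field is a byte span into the input. The `Token` struct, its `source` field and `source_text` are hypothetical; the commit does not say which fields were changed.

```rust
use std::ops::Range;

/// Hypothetical token carrying one ranged field.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    /// Byte offsets of the token's source text within the input.
    pub source: Range<usize>,
}

/// A Range can index the input directly, with no manual start/end handling.
pub fn source_text<'a>(input: &'a str, token: &Token) -> &'a str {
    &input[token.source.clone()]
}
```

Standard helpers such as `len()`, `is_empty()` and `contains()` then come for free on the field.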
Sam Atkins
305d3eeb80 LibWeb/CSS: Replace series of ifs in consume_a_token() with a match
More idiomatic for Rust, and also lets us remove a few of these helper
methods.
2026-05-03 09:49:00 +02:00
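
The shape of that refactor might look like the sketch below. `Kind` and `classify` are hypothetical, and the rules are a toy subset of what consume_a_token() has to handle. A single `match` on the code point replaces a chain of `if is_whitespace(c) … else if is_digit(c) …` checks, so small helper predicates become patterns.

```rust
/// Hypothetical, reduced classification of a token's first code point.
#[derive(Debug, PartialEq)]
pub enum Kind {
    Whitespace,
    Number,
    Ident,
    LeftParen,
    RightParen,
    Delim,
}

/// Dispatches on the first code point with one match instead of an if-chain.
pub fn classify(c: char) -> Kind {
    match c {
        ' ' | '\t' | '\n' => Kind::Whitespace,
        '0'..='9' => Kind::Number,
        'a'..='z' | 'A'..='Z' | '_' => Kind::Ident,
        // Simplification: treat every non-ASCII code point as an ident start.
        c if !c.is_ascii() => Kind::Ident,
        '(' => Kind::LeftParen,
        ')' => Kind::RightParen,
        _ => Kind::Delim,
    }
}
```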
Sam Atkins
77cdbd278d LibWeb/CSS: Use tuples instead of U32Twin and U32Triplet 2026-05-03 09:49:00 +02:00
Sam Atkins
b02d96220c LibWeb/CSS: Rename next_code_point() to consume_code_point()
Makes it clearer that this consumes.
2026-05-03 09:49:00 +02:00
Sam Atkins
4278194d96 LibWeb/CSS: Port the CSS Tokenizer to Rust
test-css-tokenizer is updated to run both the C++ and Rust tokenizers
and compare their output, to ensure they behave identically. The Parser
still uses the C++ Tokenizer.

The LibWeb crate, FFI layer, etc. are all based on the existing ones for
other libraries.

This is a direct AI translation to get us started, and not idiomatic
Rust. Future work can be done to make it more sensible.
2026-05-03 09:49:00 +02:00