4 Commits

Author SHA1 Message Date
Tim Ledbetter
ae0f5ef9ce LibUnicode+LibWeb: Add infrastructure for line segmentation using ICU
No behavior change. This is needed for correct UAX#14 line breaking.
2026-02-14 16:23:18 -05:00
Andreas Kling
8cdfbfed49 LibUnicode+LibWeb: Add fast path grapheme segmenter for ASCII text
For ASCII text, every character is its own grapheme - there are no
combining characters or emoji sequences. This means grapheme boundary
detection is trivial: next_boundary(i) is simply i+1.

This commit adds AsciiGraphemeSegmenter, a simple Segmenter subclass
that performs O(1) boundary lookups without any ICU overhead.

TextNode::grapheme_segmenter() now checks if the text is ASCII and uses
this fast path, avoiding expensive ICU BreakIterator cloning and
boundary detection for the common case of ASCII-only text.
2026-01-11 11:10:19 +01:00
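
The fast path described in the commit above can be illustrated with a minimal, standalone sketch. It is not the code from the commit: the class name `AsciiGraphemeSegmenterSketch`, the `is_ascii` helper, and the use of `std::u16string_view` and `std::optional` are assumptions made so the example compiles on its own, outside the LibUnicode `Segmenter` hierarchy. The sketch follows the commit's simplification that every ASCII character is its own grapheme; note that UAX #29 keeps CR LF together as a single cluster, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: true if every UTF-16 code unit is in the ASCII range.
inline bool is_ascii(std::u16string_view text)
{
    for (char16_t code_unit : text) {
        if (code_unit > 0x7F)
            return false;
    }
    return true;
}

// Illustrative stand-in for an ASCII-only grapheme segmenter.
// The interface here is an assumption, not the LibUnicode API.
class AsciiGraphemeSegmenterSketch {
public:
    explicit AsciiGraphemeSegmenterSketch(std::u16string_view text)
        : m_text(text)
    {
        // The fast path is only valid for ASCII-only text.
        assert(is_ascii(text));
    }

    // O(1): the boundary after index i is i + 1, unless i is at or past the end.
    std::optional<std::size_t> next_boundary(std::size_t index) const
    {
        if (index >= m_text.size())
            return std::nullopt;
        return index + 1;
    }

    // O(1): the boundary before index i is i - 1, unless i is at the start
    // or lies outside the text.
    std::optional<std::size_t> previous_boundary(std::size_t index) const
    {
        if (index == 0 || index > m_text.size())
            return std::nullopt;
        return index - 1;
    }

private:
    std::u16string_view m_text;
};
```

A caller would presumably check `is_ascii(text)` first and fall back to the ICU-backed segmenter whenever the check fails, which mirrors the decision the commit attributes to `TextNode::grapheme_segmenter()`.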
Timothy Flynn
8b6e3cb735 LibWeb+LibUnicode+WebContent: Port DOM::CharacterData to UTF-16
This replaces the underlying storage of CharacterData with Utf16String
and deals with the fallout.
2025-07-24 19:00:20 +02:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level
2024-11-10 12:50:45 +01:00