ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-05-12 01:46:46 +02:00

Author	SHA1	Message	Date
Tim Ledbetter	f161215f53	LibUnicode: Add an ASCII fast path for line break segmentation Add a Segmenter implementation that implements the UAX#14 line breaking rules applicable to ASCII text. This avoids the need to build an ICU BreakIterator for the majority of text on the web.	2026-05-05 18:59:07 +02:00
InvalidUsernameException	1b2bca7831	LibUnicode: Print error codes when calls to ICU fail Primary motivation for this change is the `VERIFY(icu_success(status))` line in `Segmenter::create()` that was failing on multiple systems and where we had to ask people to apply a patch to even know what the error was. Since this seems to be a recurring problem, let's just add a little helper function and print the error codes returned by library calls.	2026-02-21 16:55:36 -05:00
Tim Ledbetter	ae0f5ef9ce	LibUnicode+LibWeb: Add infrastructure for line segmentation using ICU No behavior change This is needed for correct UAX#14 line breaking.	2026-02-14 16:23:18 -05:00
Andreas Kling	8cdfbfed49	LibUnicode+LibWeb: Add fast path grapheme segmenter for ASCII text For ASCII text, every character is its own grapheme - there are no combining characters or emoji sequences. This means grapheme boundary detection is trivial: next_boundary(i) is simply i+1. This commit adds AsciiGraphemeSegmenter, a simple Segmenter subclass that performs O(1) boundary lookups without any ICU overhead. TextNode::grapheme_segmenter() now checks if the text is ASCII and uses this fast path, avoiding expensive ICU BreakIterator cloning and boundary detection for the common case of ASCII-only text.	2026-01-11 11:10:19 +01:00
Timothy Flynn	8b6e3cb735	LibWeb+LibUnicode+WebContent: Port DOM::CharacterData to UTF-16 This replaces the underlying storage of CharacterData with Utf16String and deals with the fallout.	2025-07-24 19:00:20 +02:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
mikiubo	8ec72d6906	LibUnicode: Avoid rejecting end-of-text position as a valid boundary When the cursor was positioned at the end of text, attempting to move it left (using the left arrow key) would fail because align_boundary() was rejecting the end-of-text position as a valid boundary.	2025-04-11 15:30:17 -04:00
Timothy Flynn	e6b7c8cde2	LibUnicode: Consistently reject out-of-bounds segmenter indices In the UTF-8 implementation, this prevents out-of-bounds access of the underlying text data, as the ICU macro would essentially do something akin to `text[text.length()]`. The UTF-16 implementation already checks for out-of-bounds, but would previously return 0. We now return an empty Optional in both impls. This doesn't affect LibJS (the user of the UTF-16 impl), as it already does bounds checking before invoking LibUnicode APIs.	2025-01-16 23:22:48 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00

10 Commits