mirror of
https://github.com/servo/servo
synced 2026-05-11 17:37:21 +02:00
This change is a reworking of the shaping code and a simplification of the `GlyphRun` data structure. The shaper was written between 2012 and 2014 against an early version of Rust. It was originally written to have a single glyph entry per UTF-8 code point. This is useful when you always need to iterate through glyphs based on UTF-8 code points. Unfortunately, this required a tri-level data structure to hold detailed glyphs and meant that CJK characters took over 3x the memory usage of ASCII characters. In addition, iterating through glyphs (the most common and basic operation on shaped text) required a lookup involving a binary search for detailed glyphs (ones that had large advances or mapped to more or fewer than a single UTF-8 code point).

The new design of the `GlyphStore` is instead based on `chars` in the input string. These are tracked during layout so that the resulting glyph output can be interpreted relative to its original character offset in the containing inline formatting context (IFC). We are already dealing with IFC text on a per-character basis for a variety of reasons (such as text transformation and whitespace collapse). In addition, we will now be able to implement mapping between the character offsets before and after layout transformations of the original DOM string.

Now the penalty of more complex glyph iteration is only paid when transforming glyph offsets to character offsets. Currently this is only done for selections and clicking in text boxes, both of which are much less common than layout.

This change does not properly handle selections in RTL text, though rendering, basic selection, and visual movement work (with some bugs).

This change does not seem to affect the performance of shaping, based on measurements using the text shaping performance counters. This likely means that the performance of shaping is dominated on our machines by HarfBuzz. We noticed no performance degradation in Speedometer when run on an M3 Mac.
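To illustrate the character-based design described above, here is a minimal sketch and not the code in this change. The names `Glyph`, `GlyphStore`, `char_offset`, `char_count`, and `shape_one_glyph_per_char` are hypothetical, and the "shaper" is a stand-in that emits one fixed-advance glyph per `char` instead of calling HarfBuzz. It shows that the common operation (summing advances) is a plain linear pass, while mapping a character offset back to a glyph is the rarer, costlier search.

```rust
/// Hypothetical glyph record (illustration only, not Servo's actual type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Glyph {
    id: u32,
    advance: f32,
    /// Offset, counted in `char`s rather than bytes, of the first character
    /// this glyph covers in the original text.
    char_offset: usize,
    /// Number of `char`s covered by this glyph.
    char_count: usize,
}

/// Hypothetical flat store: a single vector, no side table of detailed glyphs.
struct GlyphStore {
    glyphs: Vec<Glyph>,
}

impl GlyphStore {
    /// The common case: a linear pass over glyphs with no per-glyph lookup.
    fn total_advance(&self) -> f32 {
        self.glyphs.iter().map(|glyph| glyph.advance).sum()
    }

    /// The uncommon case (for example selection or clicking in a text box):
    /// search for the glyph that covers a given character offset.
    fn glyph_index_for_char(&self, char_offset: usize) -> Option<usize> {
        self.glyphs.iter().position(|glyph| {
            (glyph.char_offset..glyph.char_offset + glyph.char_count).contains(&char_offset)
        })
    }
}

/// Stand-in for a real shaper: one glyph per `char`, with a fixed advance.
/// A real shaper may produce zero, one, or many glyphs per character.
fn shape_one_glyph_per_char(text: &str) -> GlyphStore {
    let glyphs = text
        .chars()
        .enumerate()
        .map(|(index, character)| Glyph {
            id: character as u32,
            advance: 10.0,
            char_offset: index,
            char_count: 1,
        })
        .collect();
    GlyphStore { glyphs }
}
```

For a string such as `"日本語"`, `str::len()` reports 9 bytes while `chars().count()` reports 3, so a store keyed by `char` needs three entries here, where a layout keyed by UTF-8 byte (an assumption about how the old per-entry layout grew) would need nine.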
Followup work:
 - Properly handle selection in RTL text.
 - Support mapping from original DOM character offsets to offsets in layout after text transformation and whitespace collapse. This is now possible.

Testing: This causes some tests to pass and a few to fail. This is likely due to the fact that we are handling glyphs more consistently while shaping. Of the new failures:
 - `letter-spacing-bengali-yaphala-001.html`, `letter-spacing-cursive-001.html`, `font-feature-settings-tibetan.html` were passing before, probably because we were not applying letter spacing to detailed glyphs. These scripts should not have letter spacing applied to them, because they are cursive -- which we never implemented properly. This will be handled in a followup.
 - `shaping-arabic-diacritics-001.html`: This was a false pass. The test verifies that Arabic diacritics are applied to NBSP. This was not happening before or after this change, but the results matched anyway. Now they don't, but before and after are equally broken.

Fixes: #216
Part of #35540.

---------

Signed-off-by: Martin Robinson <mrobinson@igalia.com>
Co-authored-by: Oriol Brufau <obrufau@igalia.com>