Files
servo/components/script/dom/html
Martin Robinson 2a7d2780b3 fonts: Store shaping output per-character rather than per-code point (#42105)
This change is a reworking of the shaping code and simplification of the
`GlyphRun` data structure.

The shaper was written between 2012 and 2014 against an early version of
Rust. It was originally written to have a single glyph entry per UTF-8
code point. This is useful when you always need to iterate through
glyphs based on UTF-8 code points. Unfortunately, this required a
tri-level data structure to hold detailed glyphs and meant that CJK
characters took over 3x the memory usage of ASCII characters. In
addition, iterating through glyphs (the most common and basic operation
on shaped text) required doing a lookup involving a binary search for
detailed glyphs (ones that had large advances or mapped to more or less
than a single UTF-8 code point).

The new design of the `GlyphStore` is instead based on `chars` in the
input string. These are tracked during layout so that the resulting
glyph output can be interpreted relatively to its original character
offset in the containing IFC. We are already dealing with IFC text on a
per-character basis for a variety of reasons (such as text
transformation and whitespace collapse). In addition, we will now able
to
implement mapping between the character offsets before and after layout
transformations of the original DOM string.

Now the penalty of more complex glyph iteration is only paid when
transforming glyph offsets to character offsets. Currently this is only
done for selections and clicking in text boxes, both of which are much
less common than layout.

This change does not properly handle selections in RTL text, though
rendering and basic selection and visual movement works (though buggy).

It does not seem like this affects the performance of shaping based on
measurement using the text shaping performance counters. This likely
means that the performance of shaping is dominated on our machines by
HarfBuzz. We noticed no performance degradation in Speedometer when run
on a M3 Mac.

Followup work:
 - Properly handle selection in RTL text.
 - Support mapping from original DOM character offsets to offsets in
   layout after text transformation and whitespace collapse. This is now
   possible.

Testing: This causes some tests to pass and a few to fail. This is
likely
due to the fact that we are handling glyphs more consistently while
shaping. Of the new failures:
- `letter-spacing-bengali-yaphala-001.html`,
`letter-spacing-cursive-001.html`, `font-feature-settings-tibetan.html`
where passing before probably because we were not applying letter
spacing to detailed glyphs. These scripts should not have letter spacing
applied to them, because they are cursive -- which we never implemented
properly. It will be handled in a a followup.
- `shaping-arabic-diacritics-001.html`: This was a false pass. The tests
verifies that Arabic diacritics are applied to NBSP. This wasn't
happening before nor after this change, but the results matched anyway.
Now they don't, but before and after are equally broken.
 - 
Fixes: #216
Part of #35540.

---------

Signed-off-by: Martin Robinson <mrobinson@igalia.com>
Co-authored-by: Oriol Brufau <obrufau@igalia.com>
2026-01-26 12:38:04 +00:00
..