AK: Support UTF-16 string formatting

The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.

In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.

Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.

In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.
This commit is contained in:
Timothy Flynn
2025-06-17 16:08:30 -04:00
committed by Tim Flynn
parent fe676585f5
commit 2803d66d87
Notes: github-actions[bot] 2025-07-18 16:47:24 +00:00
11 changed files with 362 additions and 55 deletions

View File

@@ -31,9 +31,15 @@ public:
static NonnullRefPtr<Utf16StringData> from_utf8(StringView, AllowASCIIStorage);
static NonnullRefPtr<Utf16StringData> from_utf16(Utf16View const&);
static NonnullRefPtr<Utf16StringData> from_utf32(Utf32View const&);
static NonnullRefPtr<Utf16StringData> from_string_builder(StringBuilder&);
~Utf16StringData() = default;
[[nodiscard]] static constexpr size_t offset_of_string_storage()
{
return offsetof(Utf16StringData, m_ascii_data);
}
void operator delete(void* ptr)
{
free(ptr);