Commit Graph

12 Commits

Author SHA1 Message Date
Timothy Flynn
f22710abba LibUnicode: Work around ICU bug over-canonicalizing "yes" keyword values
ICU will canonicalize "yes" to "true" in all Unicode keywords. This is a
bug - only some keywords should undergo this change. It will then change
"true" to the empty string. This is a known ICU bug, which we can avoid
for now by detecting which keywords should retain their "yes" value.
2026-03-14 08:17:03 -04:00
Timothy Flynn
86c8a57794 LibJS+LibUnicode: Use icu4x for Temporal calendar operations
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).

The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.

With this patch, which is almost a 1-to-1 mapping of ICU invocations, we
pass 100% of all Temporal test262 tests.

The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
2026-03-11 07:09:57 -04:00
Timothy Flynn
b6699d439a LibUnicode: Add a cache for ICU calendar objects
Analogous to our existing locale and time zone caches.
2026-03-09 11:40:59 +01:00
InvalidUsernameException
1b2bca7831 LibUnicode: Print error codes when calls to ICU fail
Primary motivation for this change is the `VERIFY(icu_success(status))`
line in `Segmenter::create()` that was failing on multiple systems and
where we had to ask people to apply a patch to even know what the error
was.

Since this seems to be a recurring problem, let's just add a little
helper function and print the error codes returned by library calls.
2026-02-21 16:55:36 -05:00
Timothy Flynn
8472e469f4 AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.

This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
2025-08-13 09:56:13 -04:00
Timothy Flynn
abcb2d42bc LibUnicode: Port Intl.PluralRules to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
db2148b44a LibJS+LibUnicode: Port Intl.ListFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
7d80aabbdb LibJS+LibUnicode: Port Intl.DisplayNames to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
ee01f857d1 LibJS+LibUnicode: Port Intl.DateTimeFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
b2f053e783 LibJS+LibUnicode: Port Intl.Collator to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
86b1c78c1a AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as a char16_t. Most function definitions are moved inline and
made constexpr.

This also adds a UDL to construct a Utf16View from a string literal:

    auto string = u"hello"sv;

This let's us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
2025-07-03 09:51:56 -04:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00