Let's not have LibRegex be the home of LibUnicode FFI. Move these to
LibUnicode so that we can:
1. Use these helpers in other libraries more easily.
2. Swap out icu4c methods with icu4x methods all within LibUnicode.
A future commit will make LibJS and LibRegex depend on LibUnicode's rust
module. The global allocator overrides will cause a compliation error:
error: the `#[global_allocator]` in this crate conflicts with global
allocator in: libunicode_rust
This hides the allocator override behind a feature flag that is enabled
for the standalone LibUnicode shared library.
First: We now pin the icu4x version to an exact number. Minor version
upgrades can result in noisy deprecation warnings and API changes which
cause tests to fail. So let's pin the known-good version exactly.
This patch updates our Rust calendar module to use the new APIs. This
initially caused some test failures due to the new Date::try_new API
(which is the recommended replacement for Date::try_new_from_codes)
having quite a limited year range of +/-9999. So we must use other
APIs (Date::try_from_fields and calendrical_calculations::gregorian)
to avoid these limits.
http://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md#icu4x-22
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.
This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().
Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.
Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
Teach import_rust_crate() to track RustFFI.h as a real build output,
and teach the relevant Rust build scripts to rerun when their FFI
inputs change.
Also keep a copy of RustFFI.h in Cargo's own OUT_DIR and restore the
configured FFI output from that cached copy after cargo rustc runs.
This fixes the case where Ninja knows the header is missing, reruns
the custom command, and Cargo exits without rerunning build.rs
because the crate itself is already up to date.
When Cargo leaves multiple hashed build-script outputs behind, pick
the newest root-output before restoring RustFFI.h so we do not copy a
stale header after Rust-side API changes.
Finally, track the remaining Rust-side inputs that could leave build
artifacts stale: LibUnicode and LibJS now rerun build.rs when src/
changes, and the asmintgen rule now depends on Cargo.lock, the
BytecodeDef path dependency, and newly added Rust source files.
ICU will canonicalize "yes" to "true" in all Unicode keywords. This is a
bug - only some keywords should undergo this change. It will then change
"true" to the empty string. This is a known ICU bug, which we can avoid
for now by detecting which keywords should retain their "yes" value.
ICU's Islamic calendar implementations always set ERA=0, even for dates
before the Hijra (622 CE), using negative year values instead. However,
the CLDR defines two eras: "Anno Hegirae" (era 0) and "Before Hijrah"
(era 1). ECMA-402 expects distinct era names in formatToParts output.
Similarly, ICU's Coptic calendar has an empty CLDR era 0 name, causing
the era parts to be omitted entirely from formatted output.
This patch adds another icu::Calendar subclass to handle these cases.
Commit 86c8a57794 caused one regression in
test/intl402/DateTimeFormat. It is expected that Intl.DateTimeFormat and
Temporal produce consistent results.
Due to the píngqì approximation implemented in icu4x, it actually does
not totally align with icu4c for lunisolar calendars at extreme dates.
Ideally, icu4x will one day support all Intl.DateTimeFormat operations
as well. But for now, the fix is to create a custom icu::Calendar class
for lunisolar calendars that pipes calculations to icu4x.
There will be an extremely similar function that accepts a calendar year
rather than ISO year in a subsequent commit. Rename the ISO year variant
so that they are more distinguishable.
It is a known issue that corrosion_import_crate() does not track deps
and will issue a cargo command on every ninja invocation. Now that we
use Rust in LibUnicode, we initially see 3k+ targets queued as dirty.
This quickly drops to zero, but ideally, when no Rust-related files
have changed, ninja would simply report "no work to do".
This introduces an import_rust_crate() function. This is much like
corrosion, but it tracks dependencies. So when nothing changes, then
nothing is queued for rebuilds. It uses rustc's depfile to track Rust
dependencies automatically.
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).
The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.
With this patch, which is almost a 1-to-1 mapping of ICU invocations, we
pass 100% of all Temporal test262 tests.
The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
When AdjustDateTimeStyleFormat determines that no adjustment is needed
(no conflicting fields), the original date/time style pattern should be
used as-is for Temporal type formatting. Previously, the CalendarPattern
was round-tripped through ICU, which apparently can produce a different
pattern than the original result.
This adds international calendar support to our Temporal implementation,
using the Intl Era and Month Code Proposal as a guide. See:
https://tc39.es/proposal-intl-era-monthcode/
Primary motivation for this change is the `VERIFY(icu_success(status))`
line in `Segmenter::create()` that was failing on multiple systems and
where we had to ask people to apply a patch to even know what the error
was.
Since this seems to be a recurring problem, let's just add a little
helper function and print the error codes returned by library calls.
ENABLE_WINDOWS_CI and the *_CI presets were initially added back when
the AK library and all the AK Test* executables were the only targets
that supported building and running in CI. Since then, almost all the
targets in the codebase are built on Windows besides the following:
- LibLine
- test-262-runner
Since these targets above are not required to actually run or test the
browser on Windows in its current experimental state, fully disabling
them should be fine for now.
ENABLE_WINDOWS_CI was also used to exclude test-web from ctest. This
can be fully disabled on Windows for now until proper runtime support
is added.
The remaining locations were all using ENABLE_WINDOWS_CI as a proxy for
ENABLE_ADDRESS_SANITIZER, so we can just be explicit instead.
The new presets map much more directly to the unix Release, Debug, and
Sanitizer presets which should make setting up ladybird on Windows less
confusing.
We also make the new Windows_Experimental_Release preset the default in
ladybird.py to match Unix.
For ASCII text, every character is its own grapheme - there are no
combining characters or emoji sequences. This means grapheme boundary
detection is trivial: next_boundary(i) is simply i+1.
This commit adds AsciiGraphemeSegmenter, a simple Segmenter subclass
that performs O(1) boundary lookups without any ICU overhead.
TextNode::grapheme_segmenter() now checks if the text is ASCII and uses
this fast path, avoiding expensive ICU BreakIterator cloning and
boundary detection for the common case of ASCII-only text.
icu::TimeZone::createDefault() was returning the timezone that was set
when current_time_zone() was first called or the timezone set via
set_current_time_zone().
This meant that even when the system timezone changed and we instructed
all WebContent processes to invoke
ConnectionFromClient::system_time_zone_changed(), the updated timezone
system was not being used as we fetched from icu's default timezone.
icu::TimeZone::detectHostTimeZone() gets the timezone from the current
host system configuration and ensures we are always synced with the host
if we have no timezone cache.
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.
This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.