Commit Graph

76 Commits

Author SHA1 Message Date
Undefine
e39a8719fd Meta: Move most dependency checks to check_for_dependencies.cmake
This file was here for quite a long while now. Let's finally move most
of the dependency checks to one centralized place.
2026-04-20 16:41:29 -06:00
Timothy Flynn
10ce847931 LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript
Now that LibUnicode exports its character type APIs in Rust, we can use
them to lex identifiers and whitespace.

Fixes #8870.
2026-04-19 10:39:26 +02:00
Timothy Flynn
11719369e8 LibRegex+LibUnicode: Migrate Unicode Rust FFI methods to LibUnicode
Let's not have LibRegex be the home of LibUnicode FFI. Move these to
LibUnicode so that we can:

1. Use these helpers in other libraries more easily.
2. Swap out icu4c methods with icu4x methods all within LibUnicode.
2026-04-19 10:39:26 +02:00
Timothy Flynn
46912fce8d LibUnicode: Hide the Unicode allocator override behind a feature
A future commit will make LibJS and LibRegex depend on LibUnicode's Rust
module. The global allocator overrides will cause a compilation error:

    error: the `#[global_allocator]` in this crate conflicts with global
    allocator in: libunicode_rust

This hides the allocator override behind a feature flag that is enabled
for the standalone LibUnicode shared library.
2026-04-19 10:39:26 +02:00
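
The feature-gating approach described in 46912fce8d can be sketched roughly as follows. This is a minimal illustration, not the actual LibUnicode code: the feature name `allocator-override` and the `ForwardingAllocator` type are hypothetical, and the allocator here simply forwards to the system allocator. The point is only that the `#[global_allocator]` item disappears from the crate unless the feature is enabled, so dependent crates no longer see a conflicting definition.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical allocator that forwards every request to the system allocator.
pub struct ForwardingAllocator;

unsafe impl GlobalAlloc for ForwardingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only compiled in when the (hypothetical) feature is enabled, for example by
// the standalone shared-library build. Crates that depend on this one as a
// plain library leave the feature off and never see a second global allocator.
#[cfg(feature = "allocator-override")]
#[global_allocator]
static GLOBAL: ForwardingAllocator = ForwardingAllocator;
```

Under this assumption, the feature would be declared in the crate's Cargo.toml and enabled only by the build that produces the standalone library.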
Andrew Kaster
f26cb24751 Rust: Add a config file for rustfmt
This sets max_width to 120, which causes a lot of reformatting.
2026-04-18 08:05:47 -04:00
Andreas Kling
0755c7d180 LibUnicode: Remove redundant month casts
Remove three no-op `u8` casts from calendar date conversions.
2026-04-16 22:44:41 +02:00
Timothy Flynn
4b1ecbc9df LibJS+LibUnicode: Update icu4x's calendar module to 2.2.0
First: We now pin the icu4x version to an exact number. Minor version
upgrades can result in noisy deprecation warnings and API changes which
cause tests to fail. So let's pin the known-good version exactly.

This patch updates our Rust calendar module to use the new APIs. This
initially caused some test failures due to the new Date::try_new API
(which is the recommended replacement for Date::try_new_from_codes)
having quite a limited year range of +/-9999. So we must use other
APIs (Date::try_from_fields and calendrical_calculations::gregorian)
to avoid these limits.

http://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md#icu4x-22
2026-04-14 18:12:31 -04:00
Andreas Kling
b23aa38546 AK: Adopt mimalloc v2 as main allocator
Use mimalloc for Ladybird-owned allocations without overriding malloc().
Route kmalloc(), kcalloc(), krealloc(), and kfree() through mimalloc,
and put the embedded Rust crates on the same allocator via a shared
shim in AK/kmalloc.cpp.

This also lets us drop kfree_sized(), since it no longer used its size
argument. StringData, Utf16StringData, JS object storage, Rust error
strings, and the CoreAudio playback helpers can all free their AK-backed
storage with plain kfree().

Sanitizer builds still use the system allocator. LeakSanitizer does not
reliably trace references stored in mimalloc-managed AK containers, so
static caches and other long-lived roots can look leaked. Pass the old
size into the Rust realloc shim so aligned fallback reallocations can
move posix_memalign-backed blocks safely.

Static builds still need a little linker help. macOS app binaries need
the Rust allocator entry points forced in from liblagom-ak.a, while
static ELF links can pull in identical allocator shim definitions from
multiple Rust staticlibs. Keep the Apple -u flags and allow those
duplicate shim symbols for LibJS and LibRegex links on Linux and BSD.
2026-04-08 09:57:53 +02:00
Andreas Kling
c167bfd50a Meta: Make Rust FFI headers reproducible
Teach import_rust_crate() to track RustFFI.h as a real build output,
and teach the relevant Rust build scripts to rerun when their FFI
inputs change.

Also keep a copy of RustFFI.h in Cargo's own OUT_DIR and restore the
configured FFI output from that cached copy after cargo rustc runs.
This fixes the case where Ninja knows the header is missing, reruns
the custom command, and Cargo exits without rerunning build.rs
because the crate itself is already up to date.

When Cargo leaves multiple hashed build-script outputs behind, pick
the newest root-output before restoring RustFFI.h so we do not copy a
stale header after Rust-side API changes.

Finally, track the remaining Rust-side inputs that could leave build
artifacts stale: LibUnicode and LibJS now rerun build.rs when src/
changes, and the asmintgen rule now depends on Cargo.lock, the
BytecodeDef path dependency, and newly added Rust source files.
2026-03-31 15:59:04 +02:00
Ali Mohammad Pur
b92e630a59 LibUnicode: Avoid expensive path in canonicalize() for ASCII chars
2026-03-20 16:10:25 -05:00
Andrew Kaster
125c89f9e5 LibUnicode: Apply workspace clippy lints to Rust code
Follow-up to 94d3aa8d89
2026-03-18 08:36:13 -04:00
Andrew Kaster
6b1a346fad LibUnicode: Generate FFI bindings with cbindgen
This places the exported structs and functions in the Unicode::FFI
namespace.
2026-03-18 08:36:13 -04:00
Timothy Flynn
f22710abba LibUnicode: Work around ICU bug over-canonicalizing "yes" keyword values
ICU will canonicalize "yes" to "true" in all Unicode keywords. This is a
bug - only some keywords should undergo this change. It will then change
"true" to the empty string. This is a known ICU bug, which we can avoid
for now by detecting which keywords should retain their "yes" value.
2026-03-14 08:17:03 -04:00
Timothy Flynn
776134ce03 LibJS+LibUnicode: Add an API to loop over Unicode extensions of one type
2026-03-14 08:17:03 -04:00
Timothy Flynn
d094de39fc LibJS: Format "islamic" and "islamic-rgsa" calendars as "islamic-tbla"
This was missed when implementing the Intl Era and Month Code proposal.
2026-03-13 14:43:45 -04:00
Timothy Flynn
34d7a8fa69 LibUnicode: Handle ICU vs ECMA-402 era formatting discrepancies
ICU's Islamic calendar implementations always set ERA=0, even for dates
before the Hijra (622 CE), using negative year values instead. However,
the CLDR defines two eras: "Anno Hegirae" (era 0) and "Before Hijrah"
(era 1). ECMA-402 expects distinct era names in formatToParts output.
Similarly, ICU's Coptic calendar has an empty CLDR era 0 name, causing
the era parts to be omitted entirely from formatted output.

This patch adds another icu::Calendar subclass to handle these cases.
2026-03-13 14:43:45 -04:00
Timothy Flynn
3c0d0d248c LibUnicode: Move icu::Calendar subclass to a Calendars folder
There will be another calendar subclass coming up, so let's give them a
bit of organization.
2026-03-13 14:43:45 -04:00
Timothy Flynn
ef899027c5 LibUnicode: Use a calendar with icu4x glue to format lunisolar calendars
Commit 86c8a57794 caused one regression in
test/intl402/DateTimeFormat. It is expected that Intl.DateTimeFormat and
Temporal produce consistent results.

Due to the píngqì approximation implemented in icu4x, it actually does
not totally align with icu4c for lunisolar calendars at extreme dates.
Ideally, icu4x will one day support all Intl.DateTimeFormat operations
as well. But for now, the fix is to create a custom icu::Calendar class
for lunisolar calendars that pipes calculations to icu4x.
2026-03-12 17:29:59 -05:00
Timothy Flynn
101fee6cb2 LibJS+LibUnicode: Rename a LibUnicode FFI function for clarity
There will be an extremely similar function that accepts a calendar year
rather than ISO year in a subsequent commit. Rename the ISO year variant
so that they are more distinguishable.
2026-03-12 17:29:59 -05:00
Timothy Flynn
397be77866 LibJS+LibUnicode: Migrate MonthCode and its utilities to LibUnicode
Will be used in a Chinese/Dangi calendar implementation.
2026-03-12 17:29:59 -05:00
Timothy Flynn
ba8d63556f Meta: Replace corrosion with custom function that tracks dependencies
It is a known issue that corrosion_import_crate() does not track deps
and will issue a cargo command on every ninja invocation. Now that we
use Rust in LibUnicode, we initially see 3k+ targets queued as dirty.
This quickly drops to zero, but ideally, when no Rust-related files
have changed, ninja would simply report "no work to do".

This introduces an import_rust_crate() function. This is much like
corrosion, but it tracks dependencies. So when nothing changes, then
nothing is queued for rebuilds. It uses rustc's depfile to track Rust
dependencies automatically.
2026-03-11 16:57:09 -04:00
Timothy Flynn
86c8a57794 LibJS+LibUnicode: Use icu4x for Temporal calendar operations
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).

The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.

With this patch, which is almost a 1-to-1 mapping of ICU invocations, we
pass 100% of all Temporal test262 tests.

The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
2026-03-11 07:09:57 -04:00
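
For readers unfamiliar with the month codes mentioned in 86c8a57794: Temporal writes them as `M` followed by a two-digit month number, with an `L` suffix for a leap month (for example `M05L`). The sketch below is a hypothetical Rust type, not the MonthCode utility in LibUnicode. It only checks the textual shape and does not validate the month number against any particular calendar.

```rust
/// Hypothetical representation of a Temporal month code such as "M05" or "M05L".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthCode {
    pub month: u8,
    pub is_leap: bool,
}

impl MonthCode {
    /// Parses "Mnn" or "MnnL". Returns None for any other shape.
    pub fn parse(code: &str) -> Option<Self> {
        let bytes = code.as_bytes();

        // Three bytes is a regular month, four bytes must end in 'L'.
        let is_leap = match bytes.len() {
            3 => false,
            4 if bytes[3] == b'L' => true,
            _ => return None,
        };

        if bytes[0] != b'M' || !bytes[1].is_ascii_digit() || !bytes[2].is_ascii_digit() {
            return None;
        }

        let month = (bytes[1] - b'0') * 10 + (bytes[2] - b'0');
        Some(Self { month, is_leap })
    }

    /// Formats the month code back into its textual form.
    pub fn to_code(self) -> String {
        format!("M{:02}{}", self.month, if self.is_leap { "L" } else { "" })
    }
}
```

Keeping the leap flag separate from the month number is what lets a lunisolar calendar distinguish month 5 from leap month 5, which share a number but differ in ordinal position within the year.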
Timothy Flynn
dc381c4ba7 LibUnicode: Preserve original ICU pattern in CalendarPattern
When AdjustDateTimeStyleFormat determines that no adjustment is needed
(no conflicting fields), the original date/time style pattern should be
used as-is for Temporal type formatting. Previously, the CalendarPattern
was round-tripped through ICU, which apparently can produce a different
pattern than the original result.
2026-03-09 19:02:59 +01:00
Timothy Flynn
b800c97ab8 LibJS+LibUnicode: Add support for non-ISO-8601 Temporal calendars
This adds international calendar support to our Temporal implementation,
using the Intl Era and Month Code Proposal as a guide. See:

https://tc39.es/proposal-intl-era-monthcode/
2026-03-09 11:40:59 +01:00
Timothy Flynn
b6699d439a LibUnicode: Add a cache for ICU calendar objects
Analogous to our existing locale and time zone caches.
2026-03-09 11:40:59 +01:00
Timothy Flynn
a41f2f56a8 LibJS+LibUnicode: Migrate some Temporal calendar types to LibUnicode
These will be needed for calendar operations involving ICU.
2026-03-09 11:40:59 +01:00
Timothy Flynn
88365031f2 LibJS+LibUnicode: Implement support for handling gaps in time zones
2026-03-09 11:40:59 +01:00
Timothy Flynn
49b09b3fbe LibJS+LibUnicode: Always format zoned minutes and seconds with 2 digits
This matches the behavior of other engines.
2026-03-09 11:40:59 +01:00
Timothy Flynn
e549d168ac LibUnicode: Update ICU to 78.2
Contains fixes needed for non-ISO-8601 calendars in Temporal.
2026-03-09 11:40:59 +01:00
InvalidUsernameException
1b2bca7831 LibUnicode: Print error codes when calls to ICU fail
Primary motivation for this change is the `VERIFY(icu_success(status))`
line in `Segmenter::create()` that was failing on multiple systems and
where we had to ask people to apply a patch to even know what the error
was.

Since this seems to be a recurring problem, let's just add a little
helper function and print the error codes returned by library calls.
2026-02-21 16:55:36 -05:00
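
The idea behind 1b2bca7831 can be illustrated with a small helper that reports the numeric status before the caller decides to abort. This is a hypothetical Rust sketch, not the C++ helper added by the commit: `check_status` and its signature are assumptions. It relies only on ICU's convention that a `UErrorCode` greater than zero is a failure, while zero and negative values (warnings) count as success.

```rust
/// Hypothetical helper: treats an ICU-style status code as success when it is
/// zero or negative (warnings), and reports the code otherwise.
pub fn check_status(status: i32, call: &str) -> Result<(), String> {
    if status <= 0 {
        return Ok(());
    }

    let message = format!("{call} failed with ICU error code {status}");
    // Print the code so a failing assertion is diagnosable without a patch.
    eprintln!("{message}");
    Err(message)
}
```

A caller that previously asserted on a bare success check could assert on `check_status(status, "ubrk_open").is_ok()` instead, so the error code appears in the log before the assertion fires.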
aplefull
e1682424aa LibUnicode: Add Unicode case folding support for regex matching
2026-02-16 07:51:00 -05:00
Tim Ledbetter
d8be8e27ed LibUnicode: Add enum for UAX#14 line breaking classes
2026-02-14 16:23:18 -05:00
Tim Ledbetter
ae0f5ef9ce LibUnicode+LibWeb: Add infrastructure for line segmentation using ICU
No behavior change. This is needed for correct UAX#14 line breaking.
2026-02-14 16:23:18 -05:00
Timothy Flynn
72a6f59df5 LibJS+LibUnicode: Support Intl.MathematicalValue in Intl.PluralRules
This is a normative change in the ECMA-402 spec. See:
https://github.com/tc39/ecma402/commit/7344f42

The main difference here is that Intl.PluralRules now supports BigInt.
2026-02-06 12:19:46 -05:00
ayeteadoe
5279e0ce73 CMake: Remove ENABLE_WINDOWS_CI option and adjust presets to match Unix
ENABLE_WINDOWS_CI and the *_CI presets were initially added back when
the AK library and all the AK Test* executables were the only targets
that supported building and running in CI. Since then, almost all the
targets in the codebase are built on Windows besides the following:
    - LibLine
    - test-262-runner

Since these targets above are not required to actually run or test the
browser on Windows in its current experimental state, fully disabling
them should be fine for now.

ENABLE_WINDOWS_CI was also used to exclude test-web from ctest. This
can be fully disabled on Windows for now until proper runtime support
is added.

The remaining locations were all using ENABLE_WINDOWS_CI as a proxy for
ENABLE_ADDRESS_SANITIZER, so we can just be explicit instead.

The new presets map much more directly to the unix Release, Debug, and
Sanitizer presets which should make setting up ladybird on Windows less
confusing.

We also make the new Windows_Experimental_Release preset the default in
ladybird.py to match Unix.
2026-01-16 11:15:16 -07:00
Luke Wilde
183323d329 LibUnicode: Add ability to find time zone transitions
This will be used for Temporal's ZonedDateTime#getTimeZoneTransition.
2026-01-16 07:00:02 -05:00
Andreas Kling
8cdfbfed49 LibUnicode+LibWeb: Add fast path grapheme segmenter for ASCII text
For ASCII text, every character is its own grapheme - there are no
combining characters or emoji sequences. This means grapheme boundary
detection is trivial: next_boundary(i) is simply i+1.

This commit adds AsciiGraphemeSegmenter, a simple Segmenter subclass
that performs O(1) boundary lookups without any ICU overhead.

TextNode::grapheme_segmenter() now checks if the text is ASCII and uses
this fast path, avoiding expensive ICU BreakIterator cloning and
boundary detection for the common case of ASCII-only text.
2026-01-11 11:10:19 +01:00
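
The fast path described in 8cdfbfed49 can be sketched as below. This is a hypothetical Rust illustration of the idea, not the C++ AsciiGraphemeSegmenter from the commit; the constructor and method signatures are assumptions. It follows the commit's premise that each ASCII character is its own grapheme, so it does not model the UAX #29 rule that keeps CR LF together as one cluster.

```rust
/// Hypothetical ASCII-only segmenter: every byte offset is a grapheme boundary.
pub struct AsciiGraphemeSegmenter<'a> {
    text: &'a str,
}

impl<'a> AsciiGraphemeSegmenter<'a> {
    /// Returns a segmenter only if the text is pure ASCII; callers would fall
    /// back to a full Unicode segmenter otherwise.
    pub fn new(text: &'a str) -> Option<Self> {
        text.is_ascii().then_some(Self { text })
    }

    /// O(1): the boundary after `index` is simply `index + 1`.
    pub fn next_boundary(&self, index: usize) -> Option<usize> {
        if index < self.text.len() {
            Some(index + 1)
        } else {
            None
        }
    }

    /// O(1): the boundary before `index` is simply `index - 1`.
    pub fn previous_boundary(&self, index: usize) -> Option<usize> {
        if index == 0 || index > self.text.len() {
            None
        } else {
            Some(index - 1)
        }
    }
}
```

The ASCII check is a single linear scan done once at construction, after which every boundary query is constant-time arithmetic.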
Marcos Del Sol Vives
77cab84620 LibUnicode: Update ICU to 78.1
Also updated IdnaTestV2 for Unicode 17.0.0.
2025-12-08 11:29:12 -05:00
aplefull
a49c39de32 LibRegex: Support matching unicode multi-character sequences
2025-11-26 11:34:38 +01:00
aplefull
7ce4abe330 LibRegex+LibUnicode: Add unicode string properties
2025-10-24 13:24:55 -04:00
ayeteadoe
c5d17e2796 LibUnicode: Query timezone from host config when cache is stale
icu::TimeZone::createDefault() was returning the timezone that was set
when current_time_zone() was first called or the timezone set via
set_current_time_zone().

This meant that even when the system timezone changed and we instructed
all WebContent processes to invoke
ConnectionFromClient::system_time_zone_changed(), the updated system
timezone was not being used, as we fetched ICU's default timezone.

icu::TimeZone::detectHostTimeZone() gets the timezone from the current
host system configuration and ensures we are always synced with the host
if we have no timezone cache.
2025-10-05 15:46:15 +02:00
Jelle Raaijmakers
b51cc00478 LibWeb: Resolve Unicode FIXME in forwardDelete
2025-09-16 06:57:30 -04:00
ayeteadoe
97e8a922ad ImageDecoder: Enable in Windows CI
2025-08-23 16:04:36 -06:00
Timothy Flynn
8472e469f4 AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.

This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
2025-08-13 09:56:13 -04:00
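
The distinction drawn in 8472e469f4 can be made concrete with a strict check. The sketch below is a hypothetical Rust helper, not the AK validation method: any sequence of 16-bit code units is acceptable as WTF-16, so only the strict UTF-16 question (are there any lonely surrogates?) needs an actual scan.

```rust
/// Returns true if `units` is strictly valid UTF-16, i.e. every surrogate is
/// part of a correctly ordered high/low pair. Any `&[u16]` is valid WTF-16,
/// so there is nothing to check in the permissive case.
pub fn is_strict_utf16(units: &[u16]) -> bool {
    char::decode_utf16(units.iter().copied()).all(|result| result.is_ok())
}
```

For example, the pair `[0xD83D, 0xDE00]` passes, while a lone `0xD83D` or a reversed pair fails the strict check even though both remain representable as WTF-16.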
Timothy Flynn
a740bfd8ff AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations
2025-07-25 18:16:22 +02:00
Timothy Flynn
8b6e3cb735 LibWeb+LibUnicode+WebContent: Port DOM::CharacterData to UTF-16
This replaces the underlying storage of CharacterData with Utf16String
and deals with the fallout.
2025-07-24 19:00:20 +02:00
Timothy Flynn
173bb67004 LibJS+LibUnicode: Port Intl.RelativeTimeFormat to UTF-16 strings
2025-07-24 10:39:52 +02:00
Timothy Flynn
abcb2d42bc LibUnicode: Port Intl.PluralRules to UTF-16 strings
2025-07-24 10:39:52 +02:00
Timothy Flynn
6fe0e13474 LibJS+LibUnicode: Port Intl.DurationFormat to UTF-16 strings
2025-07-24 10:39:52 +02:00
Timothy Flynn
e637e148d4 LibJS+LibUnicode: Port Intl.NumberFormat to UTF-16 strings
2025-07-24 10:39:52 +02:00