10 Commits

Author SHA1 Message Date
Timothy Flynn
10ce847931 LibJS+LibUnicode: Use LibUnicode as appropriate for lexing JavaScript
Now that LibUnicode exports its character type APIs in Rust, we can use
them to lex identifiers and whitespace.

Fixes #8870.
2026-04-19 10:39:26 +02:00
Timothy Flynn
11719369e8 LibRegex+LibUnicode: Migrate Unicode Rust FFI methods to LibUnicode
Let's not have LibRegex be the home of LibUnicode FFI. Move these to
LibUnicode so that we can:

1. Use these helpers in other libraries more easily.
2. Swap out icu4c methods with icu4x methods all within LibUnicode.
2026-04-19 10:39:26 +02:00
Zaggy1024
3cfe1f7542 LibGfx: Implement YUV->RGBA color conversion for CPU painting
Using the Rust yuv crate, eagerly convert from YUV to RGBA on the CPU
when a GPU context is unavailable.

Time spent converting an 8-bit YUV frame with this crate is better than
libyuv on ARM by about 20%, and on x86 with AVX2, it achieves similar
numbers to libyuv.
2026-04-18 01:25:00 -05:00
Timothy Flynn
4b1ecbc9df LibJS+LibUnicode: Update icu4x's calendar module to 2.2.0
First: We now pin the icu4x version to an exact number. Minor version
upgrades can result in noisy deprecation warnings and API changes which
cause tests to fail. So let's pin the known-good version exactly.

This patch updates our Rust calendar module to use the new APIs. This
initially caused some test failures due to the new Date::try_new API
(which is the recommended replacement for Date::try_new_from_codes)
having quite a limited year range of +/-9999. So we must use other
APIs (Date::try_from_fields and calendrical_calculations::gregorian)
to avoid these limits.

http://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md#icu4x-22
2026-04-14 18:12:31 -04:00
Andreas Kling
66fb0a8394 LibRegex/Rust: Add the ECMA-262 regex engine
Add LibRegex's new Rust ECMAScript regular expression engine.

Replace the old parser's direct pattern-to-bytecode pipeline with a
split architecture: parse patterns into a lossless AST first, then
lower that AST into bytecode for a dedicated backtracking VM. Keep the
syntax tree as the place for validation, analysis, and optimization
instead of teaching every transformation to rewrite partially built
bytecode.

Specialize this backend for the job LibJS actually needs. The old C++
engine shared one generic parser and matcher stack across ECMA-262 and
POSIX modes and supported both byte-string and UTF-16 inputs. The new
engine focuses on ECMA-262 semantics on WTF-16 data, which lets it
model lone surrogates and other JavaScript-specific behavior directly
instead of carrying POSIX and multi-encoding constraints through the
whole implementation.

Fill in the ECMAScript features needed to replace the old engine for
real web workloads: Unicode properties and sets, lookahead and
lookbehind, named groups and backreferences, modifier groups, string
properties, large quantifiers, lone surrogates, and the parser and VM
corner cases those features exercise.

Reshape the runtime around compile-time pattern hints and a hotter VM
loop. Pre-resolve Unicode properties, derive first-character,
character-class, and simple-scan filters, extract safe trailing
literals for anchored patterns, add literal and literal-alternation
fast paths, and keep reusable scratch storage for registers,
backtracking state, and modifier stacks. Teach `find_all` to stay
inside one VM so global searches stop paying setup costs on every
match.

Make those shortcuts semantics-aware instead of merely fast. In Unicode
mode, do not use literal fast paths for lone surrogates, since
ECMA-262 must not let `/\ud83d/u` match inside a surrogate pair.
Likewise, only derive end-anchor suffix hints when the suffix lies on
every path to `Match`, so lookarounds and disjunctions cannot skip into
a shared tail and produce false negatives.

This commit lands the Rust crate, the C++ wrapper, the build
integration, and the initial LibJS-side plumbing needed to exercise
the new engine under real RegExp callers before removing the legacy
backend.
2026-03-27 17:32:19 +01:00
Andrew Kaster
6b1a346fad LibUnicode: Generate FFI bindings with cbindgen
This places the exported structs and functions in the Unicode::FFI
namespace.
2026-03-18 08:36:13 -04:00
Andrew Kaster
92e4c20ad5 LibJS: Generate FFI header using cbindgen instead of hand-rolling
Replace the BytecodeFactory header with cbindgen.

This will help ensure that types and enums and constants are kept in
sync between the C++ and Rust code. It's also a step in exporting more
Rust enums directly rather than relying on magic constants for
switch statements.

The FFI functions are now all placed in the JS::FFI namespace, which
is the cause for all the churn in the scripting parts of LibJS and
LibWeb.
2026-03-17 20:49:50 -05:00
Timothy Flynn
86c8a57794 LibJS+LibUnicode: Use icu4x for Temporal calendar operations
Replace the icu4c-based calendar implementation with one built on the
icu4x Rust crate (icu_calendar).

The icu4c API does not expose the píngqì month-assignment algorithm
used by the Chinese and Dangi lunisolar calendars. Our old code had to
approximate this by walking months via epoch millisecond arithmetic and
manually tracking leap month positions, which produced incorrect month
codes and ordinal month numbers for certain years. The icu4x calendar
crate handles píngqì natively.

With this patch, which is almost a 1-to-1 mapping of ICU invocations, we
pass 100% of all Temporal test262 tests.

The end goal might be to use icu4x for all of our ICU needs. But it does
not yet provide the APIs needed for all ECMA-402 prototypes.
2026-03-11 07:09:57 -04:00
Andreas Kling
d2760d09b7 LibJS: Extract BytecodeDef into a shared Rust crate
Move the Bytecode.def parser, field type info, and layout computation
out of Rust/build.rs into a standalone BytecodeDef crate. This allows
both the Rust bytecode codegen (build.rs) and the upcoming AsmIntGen
tool to share a single source of truth for instruction field offsets
and sizes.

The AsmIntGen directory is excluded from the workspace since it has
its own Cargo.toml and is built separately by CMake.
2026-03-07 13:09:59 +01:00
Andreas Kling
6cdfbd01a6 LibJS: Add alternative source-to-bytecode pipeline in Rust
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.

The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:

- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
  aborting on any difference in AST or bytecode generated.

The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
2026-02-24 09:39:42 +01:00