Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes.
The facade owns compilation, execution, captures, named groups, and
error translation for the Rust backend, which lets callers stop
depending on the legacy parser and matcher types directly. Use it in the
remaining non-LibJS callers: URLPattern, HTML input pattern handling,
and the places in LibHTTP that only needed token validation.
Where a full regex engine was unnecessary, replace those call sites with
direct character checks. Also update focused LibURL, LibHTTP, and WPT
coverage for the migrated callers and corrected surrogate handling.
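
As a rough illustration of what a direct character check can look like in
place of a regex, here is a minimal sketch of token validation using the
`tchar` set from RFC 9110. The function names are hypothetical and are not
the ones used in LibHTTP.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: true if `c` is a `tchar` as listed in RFC 9110.
constexpr bool is_http_token_code_point(char c)
{
    if (c >= '0' && c <= '9')
        return true;
    if (c >= 'a' && c <= 'z')
        return true;
    if (c >= 'A' && c <= 'Z')
        return true;
    constexpr std::string_view symbols = "!#$%&'*+-.^_`|~";
    return symbols.find(c) != std::string_view::npos;
}

// Hypothetical helper: a token is one or more `tchar`s.
constexpr bool is_http_token(std::string_view value)
{
    if (value.empty())
        return false;
    for (char c : value) {
        if (!is_http_token_code_point(c))
            return false;
    }
    return true;
}

// Usage sketch.
inline void check_token_examples()
{
    assert(is_http_token("Content-Type"));
    assert(!is_http_token("bad token"));
}
```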
Add LibRegex's new Rust ECMAScript regular expression engine.
Replace the old parser's direct pattern-to-bytecode pipeline with a
split architecture: parse patterns into a lossless AST first, then
lower that AST into bytecode for a dedicated backtracking VM. Keep the
syntax tree as the place for validation, analysis, and optimization
instead of teaching every transformation to rewrite partially built
bytecode.
Specialize this backend for the job LibJS actually needs. The old C++
engine shared one generic parser and matcher stack across ECMA-262 and
POSIX modes and supported both byte-string and UTF-16 inputs. The new
engine focuses on ECMA-262 semantics on WTF-16 data, which lets it
model lone surrogates and other JavaScript-specific behavior directly
instead of carrying POSIX and multi-encoding constraints through the
whole implementation.
Fill in the ECMAScript features needed to replace the old engine for
real web workloads: Unicode properties and sets, lookahead and
lookbehind, named groups and backreferences, modifier groups, string
properties, large quantifiers, lone surrogates, and the parser and VM
corner cases those features exercise.
Reshape the runtime around compile-time pattern hints and a hotter VM
loop. Pre-resolve Unicode properties, derive first-character,
character-class, and simple-scan filters, extract safe trailing
literals for anchored patterns, add literal and literal-alternation
fast paths, and keep reusable scratch storage for registers,
backtracking state, and modifier stacks. Teach `find_all` to stay
inside one VM so global searches stop paying setup costs on every
match.
Make those shortcuts semantics-aware instead of merely fast. In Unicode
mode, do not use literal fast paths for lone surrogates, since
ECMA-262 must not let `/\ud83d/u` match inside a surrogate pair.
Likewise, only derive end-anchor suffix hints when the suffix lies on
every path to `Match`, so lookarounds and disjunctions cannot skip into
a shared tail and produce false negatives.
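
To illustrate why a plain code-unit search is not safe in Unicode mode,
here is a minimal sketch that searches for a lone surrogate while stepping
over well-formed surrogate pairs. It is an illustration only, not the
engine's implementation, and `find_lone_surrogate_unicode` is a
hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

constexpr bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Hypothetical helper: find `needle` (a surrogate code unit) in `haystack`,
// treating each well-formed surrogate pair as a single code point.
std::optional<size_t> find_lone_surrogate_unicode(std::u16string_view haystack, char16_t needle)
{
    assert(is_high_surrogate(needle) || is_low_surrogate(needle));

    size_t index = 0;
    while (index < haystack.size()) {
        char16_t unit = haystack[index];
        bool starts_pair = is_high_surrogate(unit)
            && index + 1 < haystack.size()
            && is_low_surrogate(haystack[index + 1]);
        if (starts_pair) {
            // A pair is one code point, so neither half can match a lone surrogate.
            index += 2;
            continue;
        }
        if (unit == needle)
            return index;
        ++index;
    }
    return std::nullopt;
}
```

A naive `std::u16string_view::find` would report a hit at the first half of
a pair, which is the false positive the literal fast path has to avoid.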
This commit lands the Rust crate, the C++ wrapper, the build
integration, and the initial LibJS-side plumbing needed to exercise
the new engine under real RegExp callers before removing the legacy
backend.
For some text decorations (e.g. underline) we were using the line's top
edge as the Y-coordinate to draw the line at, which combined with the
line's thickness meant that it was positioned too high up.
Correct this by calculating the line's center Y position.
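
A minimal sketch of the calculation, assuming the stroke is centered on the
Y coordinate it is drawn at. The helper name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: the Y coordinate to stroke at so that a line of
// `thickness` covers [top_edge, top_edge + thickness].
inline float decoration_center_y(float top_edge, float thickness)
{
    assert(thickness >= 0.0f);
    return top_edge + thickness / 2.0f;
}
```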
Use Skia's SkTextBlob::getIntercepts() to find where glyph outlines
cross the underline/overline band, then split the decoration line into
segments with gaps around those intersections.
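
The splitting step can be sketched roughly as follows. This does not call
Skia; it assumes the intercepts have already been collected as a flat list
of start/end X pairs, and the `Segment` type, the function name, and the
`gap_padding` parameter are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Segment {
    float start;
    float end;
};

// Hypothetical helper: split [line_start, line_end] into the parts that are
// not covered by any (padded) intercept interval.
std::vector<Segment> split_decoration_line(float line_start, float line_end,
    std::vector<float> const& intercepts, float gap_padding)
{
    assert(line_start <= line_end);
    assert(intercepts.size() % 2 == 0);

    // Turn each start/end pair into a padded gap, sorted by start.
    std::vector<Segment> gaps;
    for (size_t i = 0; i + 1 < intercepts.size(); i += 2)
        gaps.push_back({ intercepts[i] - gap_padding, intercepts[i + 1] + gap_padding });
    std::sort(gaps.begin(), gaps.end(),
        [](Segment const& a, Segment const& b) { return a.start < b.start; });

    std::vector<Segment> segments;
    float cursor = line_start;
    for (auto const& gap : gaps) {
        if (gap.end <= cursor)
            continue; // Already covered by an earlier gap.
        if (gap.start >= line_end)
            break; // Remaining gaps lie past the end of the line.
        if (gap.start > cursor)
            segments.push_back({ cursor, gap.start });
        cursor = std::max(cursor, gap.end);
    }
    if (cursor < line_end)
        segments.push_back({ cursor, line_end });
    return segments;
}
```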
Currently this only applies to the `@property` `syntax` descriptor.
As with custom properties in the previous commit, we assumed that any
consumed values were valid, but that's not the case.
The definition of syntax in the "css-properties-values-api" spec (which
is used for the `@property/syntax` descriptor) is slightly different
from the definition of `<syntax>` in the "css-values" spec (which we
implement) in that it limits literal idents to exclusively
`<custom-ident>`s (i.e. not CSS-wide keywords or "default").
`<custom-ident>`s are also case-sensitive, so that behavior is
implemented for syntax matching here as well.
Previously we would consider an alternative syntax child to be a match
as long as parsing produced a value, even if there were trailing tokens
(which would later invalidate it within `parse_with_a_syntax`). This
meant that we wouldn't consider later alternatives which may actually
produce a valid match.
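
The idea can be sketched as follows: an alternative only counts as a match
if it consumes every token, otherwise the next alternative is tried. This
is a simplified model over string tokens, not the parser's actual code, and
all names in it are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
// Returns how many leading tokens were consumed, or nullopt on parse failure.
using Alternative = std::function<std::optional<size_t>(Tokens const&)>;

inline bool is_number_token(std::string const& token)
{
    return !token.empty()
        && std::all_of(token.begin(), token.end(), [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Toy alternative: exactly one number.
inline std::optional<size_t> parse_single_number(Tokens const& tokens)
{
    if (tokens.empty() || !is_number_token(tokens[0]))
        return std::nullopt;
    return 1;
}

// Toy alternative: one or more numbers.
inline std::optional<size_t> parse_number_list(Tokens const& tokens)
{
    size_t count = 0;
    while (count < tokens.size() && is_number_token(tokens[count]))
        ++count;
    if (count == 0)
        return std::nullopt;
    return count;
}

// Returns the index of the first alternative that parses with no trailing tokens.
std::optional<size_t> match_first_complete_alternative(
    std::vector<Alternative> const& alternatives, Tokens const& tokens)
{
    for (size_t index = 0; index < alternatives.size(); ++index) {
        auto consumed = alternatives[index](tokens);
        if (!consumed.has_value())
            continue;
        assert(*consumed <= tokens.size());
        if (*consumed == tokens.size())
            return index;
        // Trailing tokens remain, so keep looking at later alternatives.
    }
    return std::nullopt;
}
```

With the input `1 2`, the first toy alternative parses `1` but leaves `2`
behind, so the second alternative is the one that matches.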
Other browsers appear to only do this for form submission, not for
all javascript URL navigations. Let's remove the handling in the
general javascript URL navigation handling so that our behaviour
difference from other browsers is limited specifically to form
elements, instead of the general case.
Unfortunately, this (as expected) causes the test added in
3e0ea4f62 to start timing out, so that test is marked as skipped.
If we conform with the HTML specification by processing the iframe
synchronously as part of the iframe post connection steps, this test
will time out as the load event is fired in the appendChild call.
Create the promise for the load event before performing the load event
to handle this situation, which also allows the test to pass in
Chromium.
This requires us to front-load computation of writing-mode and direction
before we encounter any logical aliases or their physical counterparts
so that we can create a mapping context.
Doing this at compute rather than cascade time achieves a few things:
1) Brings us into line with the spec
2) Avoids the double cascade that was previously required to compute
mapping contexts
3) We now compute values of logical aliases. While
`style_value_for_computed_property` maps logical aliases to their
physical counterparts, this didn't account for all cases (e.g. if
there was no layout node, Typed OM, etc).
4) Removes a hurdle to moving other upstream processes (i.e. arbitrary
substitution function resolution, custom property computation) to
compute time as the spec requires.
The hover invalidation code only tried matching ::before/::after
selectors when has_pseudo_element() returned true, which requires an
existing layout node. A pseudo-element that doesn't exist yet (because
its content is only set by a hover rule) has no layout node, so the
match was skipped and hovering never triggered a style recompute.
Always try ::before/::after selectors during hover invalidation.
We no longer try to resolve calculated values at parse time which means
we support relative lengths.
We now clamp negative values rather than rejecting them at parse time.
Parsing has been inlined into `parse_ratio_value` and `parse_ratio` has
been removed since `parse_ratio_value` was the only caller.
We only checked if the paintable box had scrollable overflow, but that
was too simple; we now use the same logic that checks whether a box can
be scrolled by a mousewheel event.
The autoscrolling test was updated as well to use rAF+rAF+timeout
instead of a fixed 1000ms timeout, which is prone to flakiness.
Imported WPT tests often load resources using canonical absolute paths,
for example:
/html/browsers/windows/resources/message-parent.html
Our usual strategy is to rewrite such URLs to relative paths during
import so they resolve from the importing test HTML file (which also
allows for loading this HTML as a file://).
That works for same-origin relative loading with the echo server, but it
breaks down when a test constructs a cross-origin URL from an absolute
path, for example:
get_host_info().REMOTE_ORIGIN + "/some/wpt/path.html"
In that case the resource still needs to be fetched at its canonical
absolute path, so rewriting it relative to the importing document is no
longer sufficient.
Update the HTTP test server so /static/ continues to serve from the
general test root, other non-echo requests are served from the imported
WPT tree, and dynamically registered echo responses live under /echo/
to avoid path conflicts.
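
The resulting routing can be sketched roughly like this. It is an
illustration of the prefix rules only, not the test server's code, and the
`Route` enum and function name are hypothetical.

```cpp
#include <cassert>
#include <string_view>

enum class Route {
    Echo,        // Dynamically registered echo responses.
    Static,      // Files served from the general test root.
    ImportedWpt, // Everything else, served from the imported WPT tree.
};

inline bool has_prefix(std::string_view path, std::string_view prefix)
{
    return path.substr(0, prefix.size()) == prefix;
}

// Hypothetical helper: pick where a request path should be served from.
inline Route route_for_path(std::string_view path)
{
    assert(!path.empty() && path.front() == '/');
    if (has_prefix(path, "/echo/"))
        return Route::Echo;
    if (has_prefix(path, "/static/"))
        return Route::Static;
    return Route::ImportedWpt;
}
```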
Make some local file edits to get-host-info.sub.js such that
get_host_info() refers to the echo server through localhost,
and the cross origin server through 127.0.0.1.
This allows many imported WPT tests using this function to work
correctly (if loaded from the echo HTTP server).