LibRegex: Add ECMAScriptRegex and migrate callers

Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes. The facade owns compilation, execution, captures, named groups, and error translation for the Rust backend, which lets callers stop depending on the legacy parser and matcher types directly.

Use it in the remaining non-LibJS callers: URLPattern, HTML input pattern handling, and the places in LibHTTP that only needed token validation. Where a full regex engine was unnecessary, replace those call sites with direct character checks.

Also update focused LibURL, LibHTTP, and WPT coverage for the migrated callers and corrected surrogate handling.
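
As an illustration of replacing a regex with direct character checks, here is a minimal standalone sketch of HTTP token validation. It is not the code from this commit: `is_token_code_point` and `is_token` are hypothetical names, and the non-empty requirement is an assumption taken from the RFC 9110 grammar (`token = 1*tchar`) rather than from the migrated call sites.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for a per-code-point check: accepts ASCII
// alphanumerics plus the punctuation allowed by RFC 9110 "tchar".
constexpr bool is_token_code_point(std::uint32_t code_point)
{
    if ((code_point >= '0' && code_point <= '9')
        || (code_point >= 'A' && code_point <= 'Z')
        || (code_point >= 'a' && code_point <= 'z')) {
        return true;
    }

    constexpr std::string_view punctuation = "!#$%&'*+-.^_`|~";
    for (char c : punctuation) {
        if (code_point == static_cast<std::uint32_t>(c))
            return true;
    }
    return false;
}

// Hypothetical caller-side check: a token is non-empty (assumed from
// RFC 9110) and consists only of token code points. No regex needed.
constexpr bool is_token(std::string_view value)
{
    if (value.empty())
        return false;
    for (unsigned char byte : value) {
        if (!is_token_code_point(byte))
            return false;
    }
    return true;
}

static_assert(is_token("Content-Type"));
static_assert(!is_token("bad token"));
static_assert(!is_token(""));
```

Because both functions are `constexpr`, the checks can be evaluated at compile time, and at run time they reduce to a linear scan with no compilation step or allocation.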
Author: https://github.com/awesomekling
Commit: https://github.com/LadybirdBrowser/ladybird/commit/34d954e2d70
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/8612
Reviewed-by: https://github.com/jdahlin
Reviewed-by: https://github.com/trflynn89
2026-04-26 09:45:06 +02:00 · 2026-03-25 10:52:20 +01:00 · 2026-03-27 16:35:21 +00:00
parent 66fb0a8394
commit 34d954e2d7
21 changed files with 394 additions and 104 deletions
--- a/Libraries/LibHTTP/HTTP.h
+++ b/Libraries/LibHTTP/HTTP.h
@@ -39,6 +39,36 @@ constexpr bool is_http_tab_or_space(u32 code_point)
    return code_point == 0x09u || code_point == 0x20u;
 }

+constexpr bool is_http_token_code_point(u32 code_point)
+{
+    if ((code_point >= '0' && code_point <= '9')
+        || (code_point >= 'A' && code_point <= 'Z')
+        || (code_point >= 'a' && code_point <= 'z')) {
+        return true;
+    }
+
+    switch (code_point) {
+    case '!':
+    case '#':
+    case '$':
+    case '%':
+    case '&':
+    case '\'':
+    case '*':
+    case '+':
+    case '-':
+    case '.':
+    case '^':
+    case '_':
+    case '`':
+    case '|':
+    case '~':
+        return true;
+    default:
+        return false;
+    }
+}
+
 enum class HttpQuotedStringExtractValue {
    No,
    Yes,