ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-26 01:35:08 +02:00

Author	SHA1	Message	Date
Andreas Kling	34d954e2d7	LibRegex: Add ECMAScriptRegex and migrate callers Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes. The facade owns compilation, execution, captures, named groups, and error translation for the Rust backend, which lets callers stop depending on the legacy parser and matcher types directly. Use it in the remaining non-LibJS callers: URLPattern, HTML input pattern handling, and the places in LibHTTP that only needed token validation. Where a full regex engine was unnecessary, replace those call sites with direct character checks. Also update focused LibURL, LibHTTP, and WPT coverage for the migrated callers and corrected surrogate handling.	2026-03-27 17:32:19 +01:00
Shannon Booth	1be69479a6	LibURL+Elsewhere: Consider file:// origins opaque by default This aligns our behaviour closer to other browsers, which _mostly_ consider file scheme URLs as opaque. For test purposes, allow overriding this behaviour with a commandline flag.	2026-02-21 23:00:57 +01:00
Shannon Booth	c8dc5ea27c	LibURL: Optimize parsing of URLs in authority state Previously the authority state was parsed character-by-character in a loop, appending each character to a buffer. When an '@' character was encountered, the entire buffer would be re-processed to extract and percent-encode the username and password portions. This created O(n^2) behavior for URLs with multiple '@' characters, as each '@' would trigger reprocessing of all previously buffered content. This commit changes the authority state parser to process the authority section in chunks rather than character-by-character: 1. Find the next delimiter ('@', '/', '?', '#', or '\' for special URLs) 2. Process the entire chunk up to that delimiter at once 3. Directly extract username/password from the chunk without buffering With an additional change of switching to: iterator_at_byte_offset_without_validation (which does not have any loops), this reduces the time complexity to O(n) and the included test found by fuzzing to actually complete parsing :^)	2026-01-26 18:46:59 +01:00
Shannon Booth	89dbdd3411	LibURL: Add domain concept to URL::Origin to fix same-origin-domain The same-origin domain check always returned true if only the scheme matched. This was because of missing steps to check for the origin's domain, which didn't exist. Add this concept to URL::Origin, even though we do not use it at this stage in document.domain setter. Co-Authored-By: Luke Wilde <luke@ladybird.org>	2025-12-30 13:02:10 +01:00
Shannon Booth	2c11e03582	LibURL: Return a proper registrable domain if it's missing in the PSL Instead of '*', which is a nonsensical value. Similar to what we do for determining the public suffix, if no match could be made via the PSL algorithm, then take everything after the second dot as the registrable domain. This prevents us from considering e.g. b.b.example and a.example as the same site.	2025-12-30 12:40:27 +01:00
Tim Ledbetter	15518f119c	Tests: Add some basic public suffix tests	2025-10-23 15:01:13 +02:00
Viktor Szépe	1c01e183b7	Everywhere: Fix even more typos	2025-08-27 08:48:01 +02:00
ayeteadoe	25f5936dee	CMake: Rename serenity_* helper functions/macros to ladybird_*	2025-07-03 23:19:41 +02:00
Shannon Booth	bd67a5afaa	LibURL: Differentiate cross site opaque origins Previously if we had two opaque origins both URLs were being treated as same site.	2025-06-30 08:06:37 +01:00
Shannon Booth	b49b1b35e4	LibURL: Correct logic for domains not matched by PSL in public_suffix For the AO defined in the URL specification, in the case the domain does not match against the PSL, we should be returning the TLD. This fixes a crash for a bunch of WPT tests using the Document.domain setter when the test is being served by WPT locally. We should be doing similar logic in registrable_domain, but that unfortunately runs into some other issues, so just leave a FIXME for now.	2025-06-29 12:47:57 +01:00
stasoid	8d33a97630	Tests/LibURL: Port to Windows	2025-06-01 16:42:19 -06:00
Shannon Booth	8e37cd2f71	LibURL: Remove URL's valid state No code now relies on using URL's valid state. A URL can still be _technically_ invalid through use of the URL constructor or by directly changing URL fields. However, all URLs should be constructed through the URL parser, and we should ideally be getting rid of the default constructor at some stage. Also, any code which is manually setting URL fields needs to be aware that this is full of pitfalls since there are many different forms of canonicalization which are bypassed by not going through the URL parser.	2025-04-19 07:18:43 -04:00
Shannon Booth	00bbb2105b	LibURL: Port create_with_file_scheme to Optional Removing one of the main remaining users of URL valid state.	2025-04-19 07:18:43 -04:00
Shannon Booth	3f73cd30a2	LibURL: Rename 'cannot have a base URL' to 'has an opaque path' This follows a rename made in the URL specification.	2025-04-06 08:24:54 -04:00
Shannon Booth	e369756e9c	LibURL/Pattern: Implement the constructor string parser This is missing one small bit of functionality where the not-yet implemented component compilation is required.	2025-03-15 07:39:03 -04:00
Timothy Flynn	a34f7a5bd1	LibURL: Correctly acquire the registrable domain for a URL We were using the public suffix of the URL's host as its registrable domain. But the registrable domain is actually the public suffix plus one additional label.	2025-03-11 12:10:42 +01:00
Shannon Booth	d62cf0a807	Everywhere: Remove some use of the URL constructors These make it too easy to construct an invalid URL, which makes it difficult to remove the valid state of URL - which this API relies on.	2025-02-19 08:01:35 -05:00
Shannon Booth	53826995f6	LibURL+LibWeb: Port URL::complete_url to Optional Removing one more source of the URL::is_valid API.	2025-02-15 17:05:55 +00:00
Shannon Booth	5bed8f4055	LibURL+LibWeb: Make URL::basic_parse return an Optional<URL> URL::basic_parse has a subtle bug where the resulting URL is not set to valid when StateOverride is provided and the URL parser early returns a valid URL. This has not surfaced as a problem so far, as the only users of the state override API provide an already valid URL buffer and also ignore the result of basic parsing with a state override. However, this bug surfaces when implementing the URL pattern spec, which as part of URL canonicalization: * Provides a dummy URL record * Basic URL parses that URL with state override * Checks the result of the URL parser to validate the URL While we could set URL validity on every early return of the URL parser during state override, it has been a long-standing FIXME around the code to try and remove the awkward validity state of the URL class. So this commit makes the first stage of this change by migrating the basic parser API to return Optional, which also happens to make this subtle issue not a problem any more.	2025-01-11 10:08:29 -05:00
Sam Atkins	900c131178	LibURL: Make URL::serialized_host() infallible This can no longer fail, so update the return type to match. This makes a few more methods now unable to return errors, but one thing at a time. 😅	2024-11-30 12:07:39 +01:00
Sam Atkins	90e763de4c	LibURL: Replace Host's Empty state with making Url's Host optional A couple of reasons: - Origin's Host (when in the tuple state) can't be null - There's an "empty host" concept in the spec which is NOT the same as a null Host, and that was confusing me.	2024-11-30 12:07:39 +01:00
Gingeh	c10cb8ac8d	LibURL: Use UTF-8 for percent encoding URL fragments	2024-10-23 11:30:59 -06:00
Andreas Kling	cc4b3cbacc	Meta: Update my e-mail address everywhere	2024-10-04 13:19:50 +02:00
Shannon Booth	8723f72f0f	LibURL: Remove unspecified steps in URL file slash parsing state There were some extra steps in there which produced wrong results for relative file URLs. Fixes 7 test cases in: https://wpt.live/url/url-constructor.any.html We also need to adjust the test results in TestURL. The behaviour tested does not match how URL is specified to work as an absolute relative is given.	2024-08-06 07:58:07 +01:00
Shannon Booth	cc55732332	LibURL+Everywhere: Only percent decode URL paths when actually needed Web specs do not return through javascript percent decoded URL path components - but we were doing this in a number of places due to the default behaviour of URL::serialize_path. Since percent encoded URL paths may not contain valid UTF-8 - this was resulting in us crashing in these places. For example - on an HTMLAnchorElement when retrieving the pathname for the URL of: http://ladybird.org/foo%C2%91%91 To fix this make the URL class only return the percent encoded serialized path, matching the URL spec. When the decoded path is required instead explicitly call URL::percent_decode. This fixes a crash running WPT URL tests for the anchor element on: https://wpt.live/url/a-element.html	2024-08-05 09:58:13 +02:00
Shannon Booth	fdf4f1e887	LibURL: Validate for invalid _domain_ code points for non-opaque domains We were previously not checking for C0 control, U+0025 (%), or U+007F DELETE. This makes another good set of URL tests in WPT pass :^)	2024-08-04 18:29:06 +01:00
Shannon Booth	f511c0b441	LibURL+LibWeb: Do not percent decode in password/username getters Doing it is not part of the spec. Whenever needed, the spec will explicitly percent decode the username and password. This fixes some URL WPT tests.	2024-08-04 12:59:02 +01:00
Tim Ledbetter	1a4b042664	LibURL: Convert ASCII only URLs to lowercase during parsing This fixes an issue where entering EXAMPLE.COM into the URL bar in the browser would fail to load as expected.	2024-06-10 20:34:57 -04:00
Timothy Flynn	24ecf31ff5	LibURL+LibWeb: Move data URL processing to LibWeb's fetch infrastructure This is a fetching AO and is only used by LibWeb in the context of fetch tasks. Move it to LibWeb with other fetch methods. The main reason for this is that it requires the use of other LibWeb AOs such as the forgiving Base64 decoder and MIME sniffing. These AOs aren't available within LibURL.	2024-03-25 08:13:27 +01:00
Shannon Booth	e800605ad3	AK+LibURL: Move AK::URL into a new URL library This URL library ends up being a relatively fundamental base library of the system, as LibCore depends on LibURL. This change has two main benefits: * Moving AK back more towards being an agnostic library that can be used between the kernel and userspace. URL has never really fit that description - and is not used in the kernel. * URL _should_ depend on LibUnicode, as it needs punycode support. However, it's not really possible to do this inside of AK as it can't depend on any external library. This change brings us a little closer to being able to do that, but unfortunately we aren't there quite yet, as the code generators depend on LibCore.	2024-03-18 14:06:28 -04:00

30 Commits