This aligns our behaviour closer to other browsers, which
_mostly_ consider file scheme URLs as opaque. For test
purposes, allow overriding this behaviour with a commandline
flag.
This was previously a depdency due to the use of
Crypto::get_secure_random for the nonce used for an opaque origin.
Now that this has been moved to AK, we no longer have any dependency
on LibCrypto.
Previously the authority state was parsed character-by-character in a
loop, appending each character to a buffer. When an '@' character was
encountered, the entire buffer would be re-processed to extract and
percent-encode the username and password portions.
This created O(n^2) behavior for URLs with multiple '@' characters, as
each '@' would trigger reprocessing of all previously buffered content.
This commit changes the authority state parser to process the authority
section in chunks rather than character-by-character:
1. Find the next delimiter ('@', '/', '?', '#', or '\' for special URLs)
2. Process the entire chunk up to that delimiter at once
3. Directly extract username/password from the chunk without buffering
With an additional change of switching to:
iterator_at_byte_offset_without_validation (which does not have any
loops), this reduces the time complexity to O(n) and the included
test found by fuzzing to actually complete parsing :^)
LibURL previously assigned a default port to the IRC schemes,
a carryover from SerenityOS where IRC is supported.
This behavior deviates from the URL Standard and affects URL parsing by
eliding an explicitly specified port when it matches the default (this
is considered a legacy behaviour of the web URL schemes). Remove the IRC
default port to restore spec-compliant behavior.
AK/Random is already the same as SecureRandom. See PR for more details.
ProcessPrng is used on Windows for compatibility w/ sandboxing measures
See e.g. https://crbug.com/40277768
The same-origin domain check always returned true if only the scheme
matched. This was because of missing steps to check for the origin's
domain, which didn't exist. Add this concept to URL::Origin, even
though we do not use it at this stage in document.domain setter.
Co-Authored-By: Luke Wilde <luke@ladybird.org>
Instead of '*', which is a nonsensical value. Similar to what we do
for determining the public suffix, if no match could be made via the
PSL algorithm, then take everything after the second dot as
the registrable domain.
This prevents us from considering e.g. b.b.example and
a.example as the same site.
This required some changes in LibURL & LibIPC since it has its own
definition of an BlobURLEntry. For now, we don't have a concrete usage
of MediaSource in LibURL so it is defined as an empty struct.
This removes one FIXME in an idl file.
Our floating point number parser was based on the fast_float library:
https://github.com/fastfloat/fast_float
However, our implementation only supports 8-bit characters. To support
UTF-16, we will need to be able to convert char16_t-based strings to
numbers as well. This works out-of-the-box with fast_float.
We can also use fast_float for integer parsing.
For the AO defined in the URL specification, in the case the
domain does not match against the PSL, we should be returning
the TLD. This fixes a crash for a bunch of WPT tests using the
Document.domain setter when the test is being served by WPT
locally.
We should be doing similar logic in registrable_domain, but that
unfortunately runs into some other issues, so just leave a FIXME
for now.
It is confusing to have both URL::Host::public_suffix and
URL:get_public_suffix, both with slightly different semantics.
Instead, use PublicSuffixData for cases that just want a direct
match against the list, and URL::Host::public_suffix in LibWeb
land as the URL spec defined AO.
I believe this is in the specification since the spec technically
requires passing through a valid unicode string. However, our
implementation already handles a non valid unicode string, and will
do the replacement character substitution.
Opaque origins are meant to be unique in terms of equality from
one another. Since this uniqueness needs to be across processes,
use a nonce to implement the uniqueness check.
Instead, porting over all users to use the newly created
Origin::create_opaque factory function. This also requires porting
over some users of Origin to avoid default construction.
The spec seems to indicate in its wording that while opaque
origins only serialize to 'null', they can still be tested
for equality with one another. Probably we will need to
generate some unique ID which is unique across processes.
No code now relies on using URL's valid state.
A URL can still be _technically_ invalid through use of the URL
constructor or by directly changing URL fields.
However, all URLs should be constructed through the URL parser,
and we should ideally be getting rid of the default constructor
at some stage.
Also, any code which is manually setting URL fields need to be
aware that this is full of pitfalls since there are many different
forms of canonicalization which is bypassed by not going through
the URL parser.
Creating a URL should almost always go through the URLParser to
handle all of the small edge cases involved. This reduces the
need for URL valid state.
There's a bit of a UTF-8 assumption with this change. But nearly every
caller of these methods were immediately creating a String from the
resulting ByteString anyways.
We were not properly handling the case that prefix code point was the
empty string (which we represent as an OptionalNone). While this
still resulted in the correct pattern string being generated, an
incorrect regular expression was being generated causing matching
to fail.
This has no functional difference as far as I can tell, but for
clarity explicitly do not attempt to do this, which has the nice
side effect of not checking for whitespace known to not exist.
It turns out that the problem here was simply that we were trimming
trailing whitespace when we did not need to, which was meaning that
the port number of '80 ' was being converted to the empty string
per URLPattern elision as the port matches the http scheme.
Corresponds to URL spec change:
https://github.com/whatwg/url/commit/cc8b776b
Note that the new test failure being introduced here is an unrelated
WPT test change bundled in the resources test file update that I am
not convinced is correct.
The URL spec represents its path as a:
Variant<String, Vector<String>>
A URL is defined has having an opaque path if it has a single String,
the URL path otherwise containing a list of path components.
We (like in an older version of the spec) track this through a boolean
and only use a Vector with a single component for opaque paths.
This means it was incorrect to simple assign the path to a list with
a single empty string without setting that URL as opaque, which
meant that the path serialization was producing incorrect results.
It may make sense changing the API so this situation is a little more
clear. But for now, we simply need to set the opaque path boolean
to true here.