handle_connect_state (used by <link rel=preconnect>) attached easy
handles to the multi without setting CURLOPT_RESOLVE, so libcurl spawned
its own threaded resolver for them. When the system stub resolver was
slow, the resolver thread got stuck, and the next ~Request would
pthread_join() it on the main thread, blocking for many seconds.
Route Connect-mode requests through the same DNS + CURLOPT_RESOLVE path
that Fetch uses, so libcurl never spawns the thread.
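A minimal sketch of the idea, assuming our resolver has already produced an
address (the host and address literals here are illustrative):

#include <curl/curl.h>

// Pre-populate curl's DNS cache so it never consults a resolver itself.
// The "host:port:address" entry must outlive the transfer.
void apply_resolved_address(CURL* handle, curl_slist*& resolve_list)
{
    resolve_list = curl_slist_append(resolve_list, "example.com:443:93.184.216.34");
    curl_easy_setopt(handle, CURLOPT_RESOLVE, resolve_list);
}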
Per-request and per-connection logging that surfaces enough detail
to diagnose where time goes when a page load misbehaves. Gated by a
new REQUESTSERVER_WIRE_DEBUG cmakedefine.
Documentation/RequestServerWireLogging.md describes each label
(wire/wire+/wire++/wire^, wire-batch, wire-stall, wire-burst,
wire-pipe-pressure, LibDNS wire-dns, UI wire-cookie) and how to read
them.
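As a sketch, the compile-time gate looks something like this (the macro name
is illustrative, not the actual logging helper):

// REQUESTSERVER_WIRE_DEBUG comes from a cmakedefine, so in normal builds the
// logging compiles away entirely.
#ifdef REQUESTSERVER_WIRE_DEBUG
#    define WIRE_DBG(...) dbgln("wire: " __VA_ARGS__)
#else
#    define WIRE_DBG(...)
#endif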
write_queued_bytes_without_blocking() used to allocate a Vector sized
to the entire queued response buffer and memcpy every queued byte into
it, on every curl on_data_received callback. When the client pipe was
slower than the network, the buffer grew, and each arriving chunk
triggered a full copy of everything still queued. With enough pending
data this added up to tens of gigabytes of memcpy per request and
stalled RequestServer for tens of seconds.
Drain the stream in a loop using peek_some_contiguous() + send()
directly from the underlying chunk, and discard exactly what the
socket accepted. No intermediate buffer, no copy. The loop exits on
EAGAIN and enables the writer notifier, matching the previous
back-pressure behavior.
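A minimal sketch of the new drain loop, as a member function using AK's
ErrorOr/ReadonlyBytes; m_queue, m_fd, and m_write_notifier are stand-ins, and
the queue is assumed to expose peek_some_contiguous() and discard():

ErrorOr<void> write_queued_bytes_without_blocking()
{
    while (!m_queue.is_empty()) {
        // View the largest contiguous span at the head of the queue; no copy.
        ReadonlyBytes chunk = m_queue.peek_some_contiguous();
        ssize_t nwritten = ::send(m_fd, chunk.data(), chunk.size(), MSG_NOSIGNAL);
        if (nwritten < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Socket is full; re-arm the writer notifier to resume later.
                m_write_notifier->set_enabled(true);
                return {};
            }
            return Error::from_errno(errno);
        }
        // Discard exactly what the socket accepted; the rest stays queued.
        m_queue.discard(static_cast<size_t>(nwritten));
    }
    m_write_notifier->set_enabled(false);
    return {};
}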
When we request the HTTP cookie for a SWR request, we were providing the
cookie to the standard request whose ID matched the SWR request's ID.
This had two effects:
1. The SWR request would never finish.
2. If the corresponding standard request happened to be a connect-only
request, this would result in a crash as we were expecting it to have
gone through the normal fetch process.
This was seen on some articles on news.google.com.
The Fetch spec defines HTTP whitespace as tab, LF, CR, and space.
Previously, trim_whitespace was also stripping vertical tab (U+000B)
and form feed (U+000C), which are not HTTP whitespace characters.
Switch to HTTP::normalize_header_value, which matches the Fetch
definition.
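An illustrative sketch of the distinction (not the actual LibHTTP code):

#include <string_view>

// Fetch's HTTP whitespace: tab, LF, CR, and space only. Note the absence of
// U+000B (vertical tab) and U+000C (form feed).
static bool is_http_whitespace(char c)
{
    return c == '\t' || c == '\n' || c == '\r' || c == ' ';
}

static std::string_view trim_http_whitespace(std::string_view value)
{
    while (!value.empty() && is_http_whitespace(value.front()))
        value.remove_prefix(1);
    while (!value.empty() && is_http_whitespace(value.back()))
        value.remove_suffix(1);
    return value;
}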
Fixes 4 subtests for WPT test:
https://wpt.live/cors/origin.htm
It's possible for the client (WebContent) to stop a request while the
request is waiting for an async callback to be invoked. So we cannot
assume the request itself will still be alive once these callbacks are
finally invoked.
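A sketch of the guard, with std::weak_ptr standing in for however the request
is actually referenced:

#include <functional>
#include <memory>

struct Request {
    void continue_fetch() { /* resume the state machine */ }
};

std::function<void()> make_safe_callback(std::shared_ptr<Request> const& request)
{
    return [weak = std::weak_ptr<Request>(request)] {
        auto request = weak.lock();
        if (!request)
            return; // The client stopped the request before this callback ran.
        request->continue_fetch();
    };
}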
Substituted responses were missing the Access-Control-Allow-Origin
header, causing the CORS check to fail and the script to be treated as
a network error.
Fix this by adding `Access-Control-Allow-Origin: *` to all substituted
responses.
Also include the script's src attribute in the error message when a
script element's result is null, to make debugging easier.
The caching RFC is quite strict about the format of date strings. If we
received a revalidation attribute with an invalid date string, we would
previously fail a runtime assertion. This was because to start a
revalidation request, we would simply check for the presence of any
revalidation header; but then when we issued the request, we would fail
to parse the header, and end up with all attributes being null.
We no longer parse the revalidation attributes at all. Whatever we
receive in the Last-Modified response header is what we will send in the
If-Modified-Since request header, verbatim. For better or worse, this is
how other browsers behave. So if the server sends us an invalid date
string, it can receive its own date format for revalidation.
We currently attach HTTP cookie headers from LibWeb within Fetch. This
has the downside that the cookie IPC, and the infrastructure around it,
are all synchronous. This blocks the WebContent process entirely while
the cookie is being retrieved, for every request on a page.
We now attach cookie headers from RequestServer. The state machine in
RequestServer::Request allows us to easily do this work asynchronously.
We can also skip this work entirely when the response is served from
disk cache.
Note that we will continue to parse cookies in the WebContent process.
If something goes awry during parsing, we limit the damage to that
process, instead of the UI or RequestServer.
Also note that WebSocket requests still have cookie headers attached
from LibWeb. This will be handled in a future patch.
In the future, we may want to introduce a memory cache for cookies in
RequestServer to avoid IPC altogether where possible.
We don't want to be terminated when we write to a pipe that's closed,
so set SIGPIPE to be ignored in main. We already pass MSG_NOSIGNAL when
sending over our socket in RequestPipe, so this is a secondary measure.
However, cURL assumes by default that SIGPIPE is unhandled, so before
any operation that interacts with a pipe, it installs its own handler,
interacts with the pipe, then restores the original handler. Since we now
ignore the signal, we can just tell cURL not to do this extra work.
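Roughly (CURLOPT_NOSIGNAL is the relevant option; the rest is a sketch):

#include <csignal>
#include <curl/curl.h>

int main()
{
    // Ignore SIGPIPE process-wide; writes to a closed pipe now fail with
    // EPIPE instead of terminating the process.
    signal(SIGPIPE, SIG_IGN);

    CURL* handle = curl_easy_init();
    // We handle SIGPIPE ourselves, so cURL can skip saving and restoring
    // its own handler around pipe operations.
    curl_easy_setopt(handle, CURLOPT_NOSIGNAL, 1L);
    // ... configure and perform transfers ...
    curl_easy_cleanup(handle);
}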
The only-if-cached directive currently behaves differently in the HTTP
Caching RFC compared to the Fetch spec. In the former, we must return
an HTTP 504 response if we do not find a cache entry. In the latter, we
must return a network error.
Note that similar to commit aa1517b727, we
cannot test the only-if-cached directive. We implement a same-origin
restriction aligned with the Fetch API that prevents our test infra from
exercising this directive.
If a request failed, or was stopped, do not attempt to write the cache
entry footer to disk. Note that at this point, the cache index will not
have been created, thus this entry will not be used in the future. We do
still delete any partial file on disk.
This serves as a more general fix for the issue addressed in commit
9f2ac14521.
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.
Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.
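A sketch of how such a key might be derived; the names and the hashing scheme
are assumptions, not the actual implementation:

#include <cstdint>
#include <functional>
#include <map>
#include <sstream>
#include <string>

using Headers = std::map<std::string, std::string>;

// Combine each header named by Vary with the value the outgoing request
// carries for it; the digest becomes part of the cache entry's file name.
uint64_t compute_vary_key(std::string const& vary, Headers const& request_headers)
{
    std::ostringstream combined;
    std::istringstream names(vary);
    std::string name;
    while (std::getline(names, name, ',')) {
        // (Whitespace trimming and case normalization elided for brevity.)
        auto it = request_headers.find(name);
        combined << name << ':' << (it != request_headers.end() ? it->second : "") << '\n';
    }
    return std::hash<std::string> {}(combined.str());
}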
There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers turn out to be acceptable.
But for now, if we have a Vary mismatch, we issue an unconditional
request; this patch is purely correctness-oriented.
We need to store request headers in order to handle Vary mismatches.
(Note we should also be using BLOB for header storage in SQLite, as
headers are not necessarily UTF-8.)
If the cache mode is no-store, we must not interact with the cache at
all.
If the cache mode is reload, we must not use any cached response.
If the cache mode is only-if-cached or force-cache, we are permitted
to respond with stale cache responses.
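In sketch form (the helper names are illustrative; the real checks live in
the fetch algorithm):

enum class CacheMode { Default, NoStore, Reload, NoCache, ForceCache, OnlyIfCached };

bool may_read_from_cache(CacheMode mode)
{
    return mode != CacheMode::NoStore && mode != CacheMode::Reload;
}

bool may_write_to_cache(CacheMode mode)
{
    return mode != CacheMode::NoStore;
}

bool may_serve_stale(CacheMode mode)
{
    return mode == CacheMode::ForceCache || mode == CacheMode::OnlyIfCached;
}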
Note that we currently cannot test only-if-cached in test-web. Setting
this mode also requires setting the cors mode to same-origin, but our
http-test-server infra requires setting the cors mode to cors.
This adds support for intercepting network requests and serving local
file content instead. When a URL matches an entry in the substitution
map, the local file is served while preserving the original URL's
origin for cross-origin checks.
Usage:
Ladybird --resource-map=/path/to/map.json
The JSON file format is:
{
    "substitutions": [
        {
            "url": "https://example.com/script.js",
            "file": "/path/to/local/script.js",
            "content_type": "application/javascript",
            "status_code": 200
        }
    ]
}
Fields:
- url (required): Exact URL to intercept (query string and fragment
are stripped before matching)
- file (required): Absolute path to local file to serve
- content_type (optional): Override Content-Type header (defaults to
guessing from filename)
- status_code (optional): HTTP status code (defaults to 200)
This is incredibly useful for debugging production websites: you can
intercept any script, stylesheet, or other resource and replace it with
a local copy containing your own debug instrumentation, console.log
statements, or experimental fixes - all without modifying the actual
site or setting up a local dev server.
If the cURL request completes with anything other than CURLE_OK, we must
not keep the cache entry. For example, if the server's connection closes
while transferring data, we receive CURLE_PARTIAL_FILE. We don't want
this cache entry to be treated as valid in a subsequent request.
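A sketch of the completion check (the writer type is a stand-in):

#include <curl/curl.h>

template<typename CacheEntryWriter>
void on_transfer_complete(CURLcode result, CacheEntryWriter& writer)
{
    if (result != CURLE_OK) {
        // e.g. CURLE_PARTIAL_FILE when the connection closed mid-transfer:
        // drop the partial entry so it can never be served from the cache.
        writer.discard();
        return;
    }
    writer.commit();
}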
This directive allows our disk cache to serve stale responses for a time
indicated by the directive itself, while we revalidate the response in
the background.
Issuing requests that weren't initiated by a client is a new thing for
RequestServer. In this implementation, we associate the background request
with the client whose request hit the stale cache entry. This adds a
"background request" mode to the Request object, to prevent us from trying
to send any of the revalidation response over IPC.
Not super important, but this will match an upcoming ID for stale-while-
revalidate requests. It would also technically be UB if this ID had
ever overflowed. Let's make it 64-bit while we're here to avoid the
possibility that it ever will.
We currently have two ongoing implementations of RFC 9111, HTTP caching.
In order to consolidate these, this patch moves the implementation from
RequestServer to LibHTTP for re-use within LibWeb.
The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using our home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
This effectively reverts 9b8f6b8108.
I misunderstood what Chrome was doing here - they will issue a range
request only for what they call "sparse" cache entries. These entries
are basically used to cache a partial large file, e.g. a multi-gigabyte
video. If they hit a legitimate read error, they will fail the request
with an ERR_CACHE_READ_FAILURE status.
We will now (again) fail with a network error when a cache read fails.
For example, we will want to be able to test that a cached object was
expired after N seconds. Rather than waiting that time during testing,
this adds a testing-only request header to internally advance the clock
for a single HTTP request.
This mode allows us to test the HTTP disk cache with two mechanisms:
1. If RequestServer is launched with --http-disk-cache-mode=testing, it
will cache requests with an X-Ladybird-Enable-Disk-Cache header.
2. In test mode, RS will include an X-Ladybird-Disk-Cache-Status response
header indicating how the response was handled by the cache. There is
no standard way for a web request to know what happened with respect
to the disk cache, so this fills that hole for testing.
This mode is not exposed to users.
The Windows RequestPipe implementation uses a non-blocking local socket
pair, which means the non-fatal "resource is temporarily unavailable"
error that can occur in the non-blocking HTTP response data writes can
be retried. This was seen often when loading https://ladybird.org.
While the EAGAIN errno is defined on Windows, WSAEWOULDBLOCK is the
error code returned in this scenario, so we were not detecting that we
could retry and treated the failed write attempt as a proper error.
We now detect WSAEWOULDBLOCK and convert it into the errno equivalent
EWOULDBLOCK. There is precedent for doing a similar conversion in the
Windows PosixSocketHelper::read() implementation.
Finally, we retry when we receive either the EAGAIN or EWOULDBLOCK error
code on all platforms. While POSIX allows these two error codes to have
the same value, which they do on Linux according to
https://www.man7.org/linux/man-pages/man3/errno.3.html, it is not
guaranteed. So we now ensure platforms that return EWOULDBLOCK with a
value different from EAGAIN also perform write retries.
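In sketch form (the helper names are illustrative):

#ifdef _WIN32
#    include <winsock2.h>
#else
#    include <sys/socket.h>
#endif
#include <cerrno>

// After a failed send(), normalize the Winsock error to its errno equivalent
// so the retry check works uniformly on all platforms.
void normalize_write_error()
{
#ifdef _WIN32
    if (WSAGetLastError() == WSAEWOULDBLOCK)
        errno = EWOULDBLOCK;
#endif
}

// POSIX permits EAGAIN == EWOULDBLOCK (true on Linux) but does not require
// it, so check both.
bool should_retry_write()
{
    return errno == EAGAIN || errno == EWOULDBLOCK;
}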
When a request becomes stale, we will now issue a revalidation request
(if the response indicates it may be revalidated). We do this by issuing
a normal fetch request, with If-None-Match and/or If-Modified-Since
request headers.
If the server replies with an HTTP 304 status, we update the stored
response headers to match the 304's headers, and serve the response to
the client from the cache.
If the server replies with any other code, we remove the cache entry.
We will open a new cache entry to cache the new response, if possible.
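A sketch of building the conditional request; the header storage type is a
stand-in, and the validators are echoed verbatim, as described in the
date-string commit above:

#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Copy the stored validators into the outgoing revalidation request.
void add_revalidation_headers(Headers const& stored_response, Headers& request)
{
    if (auto it = stored_response.find("ETag"); it != stored_response.end())
        request["If-None-Match"] = it->second;
    if (auto it = stored_response.find("Last-Modified"); it != stored_response.end())
        request["If-Modified-Since"] = it->second;
}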
We currently store response headers in the cache entry file, before the
response body. When we implement cache revalidation, we will need to
update the stored response headers with whatever headers are received
in a 304 response. It's not unlikely that those headers will have a size
that differs from the stored headers. We would then have to rewrite the
entire response body after the new headers.
Instead of dealing with those inefficiencies, let's instead store the
response headers in the cache index. This will allow us to update the
headers with a simple SQL query.
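For example, with an assumed index schema holding (cache_key,
response_headers) columns, the refresh becomes a single statement:

// Hypothetical schema: refreshing headers after a 304 is one row update
// instead of rewriting the on-disk body file.
char const* update_headers_sql =
    "UPDATE cache_entries SET response_headers = ? WHERE cache_key = ?;";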
The Win32 API equivalent to pipe2() is CreatePipe(), which creates read
and write anonymous pipe handles that we can set to non-blocking via
SetNamedPipeHandleState(); however, this initial approach caused issues
as our Windows infrastructure assumes socket-based handles/fds and that
we don't use Windows pipes at all, see Core::System::is_socket() in
SystemWindows.cpp. So we use socketpair() to keep our current
assumptions true.
Given that Windows uses socketpair() and Unix uses pipe2(), this
RequestPipe abstraction avoids ifdef soup by hiding the details about
how the read/write fds pair is created and how response data is written
to the client.
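A sketch of the split the abstraction hides (the Windows socketpair() here is
Ladybird's shim, and error handling is elided):

#ifndef _WIN32
#    include <fcntl.h>
#    include <unistd.h>
#endif

struct RequestPipe {
    int read_fd { -1 };
    int write_fd { -1 };
};

RequestPipe create_request_pipe()
{
    int fds[2];
#ifdef _WIN32
    // A non-blocking AF_UNIX socket pair keeps Core::System::is_socket()
    // true for every fd our Windows code touches.
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
#else
    pipe2(fds, O_CLOEXEC | O_NONBLOCK);
#endif
    return { fds[0], fds[1] };
}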
We previously had no protection against the same URL being requested
multiple times at the same time. For example, if a URL did not have any
cache entry and became requested twice, we would open two cache writers
concurrently. This would result in both writers piping the response to
disk, and we'd have a corrupt cache file.
We now hold back requests under certain scenarios until existing cache
entries have completed:
* If we are opening a cache entry for reading:
- If there is an existing reader entry, carry on as normal. We can
have multiple readers.
- If there is an existing writer entry, defer the request until it is
complete.
* If we are opening a cache entry for writing:
- If there is an existing reader or writer entry, defer the request
until it is complete.
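The decision table in sketch form (names are illustrative):

#include <cstddef>

enum class OpenMode { Read, Write };
enum class OpenResult { Proceed, Defer };

// Multiple readers may coexist; a writer excludes everyone else.
OpenResult try_open_cache_entry(OpenMode mode, size_t active_readers, bool active_writer)
{
    if (mode == OpenMode::Read)
        return active_writer ? OpenResult::Defer : OpenResult::Proceed;
    return (active_readers > 0 || active_writer) ? OpenResult::Defer : OpenResult::Proceed;
}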
This object will be needed in a future commit to store requests awaiting
other requests to finish. Doing this in a separate commit just to make
that commit less noisy.
We previously waited until we received all response headers before we
would create the cache entry. We now create one immediately, and handle
writing the headers in its own function. This will allow us to know if
a cache entry writer already exists for a given cache key, and thus
prevent creating a second writer at the same time.
We currently manage request lifetime as both an ActiveRequest structure
and a series of lambda callbacks. In an upcoming patch, we will want to
"pause" a request to de-duplicate equivalent requests, such that only
one request goes over the network and saves its response to the disk
cache.
To make that easier to reason about, this adds a Request class to manage
the lifetime of a request via a state machine. We will now be able to
add a "waiting for disk cache" state to stop the request.