This adds per-request and per-connection logging that surfaces enough
detail to diagnose where time goes when a page load misbehaves. It is
gated by a new REQUESTSERVER_WIRE_DEBUG cmakedefine.
Documentation/RequestServerWireLogging.md describes each label
(wire/wire+/wire++/wire^, wire-batch, wire-stall, wire-burst,
wire-pipe-pressure, LibDNS wire-dns, UI wire-cookie) and how to read
them.
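As a rough illustration, a gated log line in this scheme would follow
the usual dbgln_if pattern (the label and fields below are illustrative,
not the actual output):

    dbgln_if(REQUESTSERVER_WIRE_DEBUG, "wire: request {} connect={}ms ttfb={}ms total={}ms",
        request_id, connect_ms, time_to_first_byte_ms, total_ms);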
When requesting the HTTP cookie for a SWR request, we were providing the
cookie to the standard request corresponding to the SWR request's ID.
This had two effects:
1. The SWR request would never finish.
2. If the corresponding standard request happened to be a connect-only
request, this would result in a crash as we were expecting it to have
gone through the normal fetch process.
This was seen on some articles on news.google.com.
This prevents a race condition:
1. Try to connect a websocket
2. DNS lookup starts
3. JS causes the websocket to no longer be alive, and it is GCed
4. websocket_close() is called, but it doesn't find a websocket with
that websocket_id, so nothing happens
5. DNS lookup completes, and opens the websocket
6. This websocket never gets closed
By separately tracking which websockets we are trying to connect, we can
record the fact we tried to close it, and then the DNS lookup callback
can skip creating the now-unwanted websocket.
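A minimal sketch of the bookkeeping this implies (member and helper
names here are hypothetical):

    HashTable<i64> m_pending_websocket_connects;       // DNS lookup still in flight
    HashTable<i64> m_websockets_closed_during_connect; // close() arrived before the lookup finished

    void websocket_close(i64 websocket_id)
    {
        if (m_pending_websocket_connects.contains(websocket_id)) {
            // No socket exists yet; remember the close so the DNS callback
            // can abandon the connection instead of opening an orphaned one.
            m_websockets_closed_during_connect.set(websocket_id);
            return;
        }
        // ... close the already-established websocket as before ...
    }

    // In the DNS lookup completion callback:
    m_pending_websocket_connects.remove(websocket_id);
    if (m_websockets_closed_during_connect.remove(websocket_id))
        return; // JS already dropped this websocket; don't create it
    // ... otherwise proceed to open the websocket ...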
Add IPC::TransportHandle as an abstraction for passing IPC
transports through .ipc messages. This replaces IPC::File at
all sites where a transport (not a generic file) is being
transferred between processes.
TransportHandle provides from_transport(),
clone_from_transport(), and create_transport() methods that
encapsulate the fd-to-socket-to-transport conversion in one
place. This is preparatory work for Mach port support on
macOS -- when that lands, only TransportHandle's internals
need to change while all .ipc definitions and call sites
remain untouched.
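The rough shape of the abstraction, with signatures approximated from
the description above (the exact types may differ):

    class TransportHandle {
    public:
        // Take over the transport's underlying fd for sending over IPC.
        static TransportHandle from_transport(Transport&);
        // Duplicate the fd instead, so the sender keeps a usable copy.
        static TransportHandle clone_from_transport(Transport const&);
        // On the receiving side, turn the handle back into a live transport.
        NonnullOwnPtr<Transport> create_transport() &&;
    };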
Especially for the disk cache, it always felt a bit sketchy to create
these objects in global storage. We now create them in RequestServer's
main() where we can be more certain of destruction order, and pass them
to ConnectionFromClient instances.
This adds a settings box to about:settings to allow users to limit the
disk cache size. This will override the default 5 GiB limit. We do not
automatically delete cache data if the new limit is suddenly less than
the used disk space; this will happen on the next request. This allows
multiple changes to the settings in a row without thrashing the cache.
In the future, we can add more toggles, such as disabling the disk
cache altogether.
We currently attach HTTP cookie headers from LibWeb within Fetch. This
has the downside that the cookie IPC, and the infrastructure around it,
are all synchronous. This blocks the WebContent process entirely while
the cookie is being retrieved, for every request on a page.
We now attach cookie headers from RequestServer. The state machine in
RequestServer::Request allows us to easily do this work asynchronously.
We can also skip this work entirely when the response is served from
disk cache.
Note that we will continue to parse cookies in the WebContent process.
If something goes awry during parsing, we limit the damage to that
process, instead of the UI or RequestServer.
Also note that WebSocket requests still have cookie headers attached
from LibWeb. This will be handled in a future patch.
In the future, we may want to introduce a memory cache for cookies in
RequestServer to avoid the IPC altogether where possible.
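An illustrative shape for the asynchronous step; the names below are
invented to show the flow and are not the actual API:

    void Request::enter_waiting_for_cookie_state()
    {
        // Fire off an asynchronous cookie lookup; the request simply
        // stays parked in this state until the reply arrives over IPC.
        request_cookie_asynchronously(m_url, [this](ByteString cookie) {
            if (!cookie.is_empty())
                m_headers.set("Cookie", cookie);
            transition_to(State::SendingRequest); // resume the normal fetch path
        });
    }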
During process exit, static variables and thread-local variables are
destroyed in an unpredictable order. The connections HashMap was static,
so when destroyed during static destruction, it would destroy
ConnectionFromClient objects whose notifiers would try to unregister
from thread data that may have already been destroyed.
Fix this by moving the connections HashMap from static storage to stack
storage in ladybird_main(). This guarantees it will be destroyed before
the function returns, while the event loop and all thread data are still
fully alive.
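Schematically, the change looks something like this (a sketch, not the
exact code):

    // Before: destroyed during static destruction, possibly after
    // thread-local data is already gone.
    static HashMap<int, RefPtr<ConnectionFromClient>> s_connections;

    // After: owned by ladybird_main()'s stack frame, so it is destroyed
    // before the function returns, while the event loop and thread data
    // are still fully alive.
    ErrorOr<int> ladybird_main(Main::Arguments arguments)
    {
        Core::EventLoop event_loop;
        HashMap<int, RefPtr<ConnectionFromClient>> connections;
        // ... hand `connections` to the code that accepts new clients ...
        return event_loop.exec();
    }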
If the cache mode is no-store, we must not interact with the cache at
all.
If the cache mode is reload, we must not use any cached response.
If the cache mode is only-if-cached or force-cache, we are permitted
to respond with stale cached responses.
Note that we currently cannot test only-if-cached in test-web. Setting
this mode also requires setting the cors mode to same-origin, but our
http-test-server infra requires setting the cors mode to cors.
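Roughly, the cache-mode handling described above boils down to the
following (a sketch with hypothetical helpers; the real code is
structured differently):

    if (cache_mode == CacheMode::NoStore)
        return fetch_from_network(/* store_in_cache */ false);
    if (cache_mode == CacheMode::Reload)
        return fetch_from_network(/* store_in_cache */ true); // ignore any cached response
    bool may_serve_stale = cache_mode == CacheMode::OnlyIfCached
        || cache_mode == CacheMode::ForceCache;
    if (auto entry = lookup_cache_entry(url); entry && (entry->is_fresh() || may_serve_stale))
        return serve_from_cache(*entry);
    // (only-if-cached must fail here instead of hitting the network; omitted for brevity)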
This directive allows our disk cache to serve stale responses for a time
indicated by the directive itself, while we revalidate the response in
the background.
Issuing requests that weren't initiated by a client is a new thing for
RequestServer. In this implementation, we associate the revalidation
request with the client whose original request hit the stale cache
entry. This adds a "background request" mode to the Request object, to
prevent us from trying to send any of the revalidation response over
IPC.
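For reference, the directive appears in response headers along these
lines (values illustrative):

    Cache-Control: max-age=600, stale-while-revalidate=30

Here the response is fresh for 600 seconds; for up to 30 seconds after
that, the stale copy may be served immediately while a background
revalidation request runs.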
Not super important, but this will match an upcoming ID for stale-while-
revalidate requests. It also would technically be UB if this ID had
ever overflowed. Let's make it 64-bit while we are here to also avoid
the possibility that it ever will overflow.
The bug fixed here is that in RequestServer's ConnectionFromClient, we
stored the websocket ID as i32, despite it being i64 everywhere else.
Let's make it unsigned while we are here to be consistent with how all
other RS-related IDs will be soon.
The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using our home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
We currently manage request lifetime as both an ActiveRequest structure
and a series of lambda callbacks. In an upcoming patch, we will want to
"pause" a request to de-duplicate equivalent requests, such that only
one request goes over the network and saves its response to the disk
cache.
To make that easier to reason about, this adds a Request class to manage
the lifetime of a request via a state machine. We will now be able to
add a "waiting for disk cache" state to stop the request.
This makes it easier to use these from other files. It also lets us
hide the Windows.h header necessity in a single location, instead of
needing to remember to include it everywhere we would otherwise include
<curl/curl.h>.
If transferring a cached response body fails for any reason, we will now
issue a network request instead of failing the request outright.
The catch here is that we will have already transferred the response
code and headers to the client, and potentially some of the body. So we
attempt to only request the remaining data over the network using a
range request. This feels a bit sketchy, but this is also how Chromium
behaves.
However, the server may or may not support range requests. If they do,
we can expect an HTTP 206 response with the bytes we need. If not, we
will receive an HTTP 200 (assuming the request succeeded), along with
the entire object's body. In this case, we also behave like Chromium,
and internally drop the bytes we had already transferred.
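A sketch of the fallback path (issue_network_request, stream_to_client
and bytes_already_sent are hypothetical names; the last is how much of
the cached body had already been streamed to the client):

    request_headers.set("Range", ByteString::formatted("bytes={}-", bytes_already_sent));
    issue_network_request(url, request_headers, [bytes_already_sent](u32 status, ReadonlyBytes body) {
        if (status == 206)
            return stream_to_client(body);                           // server honoured the range request
        if (status == 200)
            return stream_to_client(body.slice(bytes_already_sent)); // whole object; skip what the client has
        // ... otherwise fail the request outright ...
    });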
This is a bit of a blunt hammer, but this hooks an action to clear the
HTTP disk cache into the existing Clear Cache action. Upon invocation,
it stops all existing cache entries from making further progress, and
then deletes the entire cache index and all cache files.
In the future, we will of course want more fine-grained control over
cache deletion, e.g. via an about:history page.
This adds an overlay port for curl that adds the features required for
HTTP/3.
This is not quite compatible with the upstream vcpkg.json, because
enabling HTTP/2 makes it use the default SSL backend, which is
sectransp for macOS and schannel on Windows. These backends are not
compatible with ngtcp2. Additionally, we can not build curl with
multiple SSL backends when using ngtcp2.
I couldn't find a way to selectively disable/enable dependencies based
on what features are enabled, so I made HTTP/2 pick OpenSSL in our
overlay port. Upstream vcpkg will likely want to support wolfSSL and
GnuTLS backends for ngtcp2, so there will be additional work to get this
into upstream.
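In the overlay port's vcpkg.json, the http2 feature ends up looking
roughly like this (abridged and approximate):

    "http2": {
      "description": "HTTP/2 support",
      "dependencies": [
        "nghttp2",
        { "name": "curl", "default-features": false, "features": [ "openssl" ] }
      ]
    }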
Instead of wrapping all non-movable members of TransportSocket in OwnPtr
to keep it movable, make TransportSocket itself non-movable and wrap it
in OwnPtr.
This has been a longstanding ergonomic issue with our IPC compiler. Non-
trivial types were previously passed by const&. So if we wanted to avoid
expensive copies, we would have to const_cast and move the data.
We now pass ownership of all transferred data to the client subclasses.
This allows us to remove const_cast from these methods, and lets us
avoid some expensive copies that we never bothered to const_cast away.
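Concretely, the generated handler signatures shift from const
references to by-value parameters, roughly like this (the message shown
is illustrative):

    // Before:
    virtual void start_request(i32 request_id, ByteString const& method, URL::URL const& url,
        HTTP::HeaderMap const& headers, ByteBuffer const& body) = 0;

    // After: the subclass owns its arguments and can move them onward
    // without const_cast.
    virtual void start_request(i32 request_id, ByteString method, URL::URL url,
        HTTP::HeaderMap headers, ByteBuffer body) = 0;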
This implementation can be further improved in the future by ripping
out a lot of the manual logic in LibWebSocket and relying on libcurl to
parse our message payloads. But for now, this uses the 'raw mode' of
curl websockets in connect-only mode to allow for somewhat seamless
integration into our event loop.
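A minimal sketch of the curl setup this relies on (the URL is a
placeholder; error handling and the event-loop wiring are omitted):

    #include <curl/curl.h>

    CURL* easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, "wss://example.com/socket");
    curl_easy_setopt(easy, CURLOPT_CONNECT_ONLY, 2L);                  // 2 = WebSocket connect-only mode
    curl_easy_setopt(easy, CURLOPT_WS_OPTIONS, (long)CURLWS_RAW_MODE); // hand us raw frames; LibWebSocket parses them
    curl_easy_perform(easy);                                           // only performs the handshake in connect-only mode

    // The underlying socket can then be watched by our event loop, with
    // reads and writes going through curl_easy_recv() / curl_easy_send().
    curl_socket_t sockfd;
    curl_easy_getinfo(easy, CURLINFO_ACTIVESOCKET, &sockfd);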