This allows the Request to be cleaned up when it becomes inactive,
which in turn allows the GC to clean up the FetchController which is
indirectly captured by a root in the Request callbacks.
We currently attach HTTP cookie headers from LibWeb within Fetch. This
has the downside that the cookie IPC, and the infrastructure around it,
are all synchronous. This blocks the WebContent process entirely while
the cookie is being retrieved, for every request on a page.
We now attach cookie headers from RequestServer. The state machine in
RequestServer::Request allows us to easily do this work asynchronously.
We can also skip this work entirely when the response is served from
disk cache.
Note that we will continue to parse cookies in the WebContent process.
If something goes awry during parsing. we limit the damage to that
process, instead of the UI or RequestServer.
Also note that WebSocket requests still have cookie headers attached
attached from LibWeb. This will be handled in a future patch.
In the future, we may want to introduce a memory cache for cookies in
RequestServer to avoid IPC altogether as able.
* Remove unnecessary const_casts. These have been here since this was
called LibProtocol, and weren't needed then either.
* Use assign-and-check in if statements more liberally.
If the cache mode is no-store, we must not interact with the cache at
all.
If the cache mode is reload, we must not use any cached response.
If the cache-mode is only-if-cached or force-cache, we are permitted
to respond with stale cache responses.
Note that we currently cannot test only-if-cached in test-web. Setting
this mode also requires setting the cors mode to same-origin, but our
http-test-server infra requires setting the cors mode to cors.
Not super important, but this will match an upcoming ID for stale-while-
revalidate requests. It also would techinically be UB if this ID had
ever overflowed. Let's make it 64-bit while we are here to also avoid
the possibility that it ever will overflow.
The bug fixed here is that in RequestServer's ConnectionFromClient, we
stored the websocket ID as i32, despite it being i64 everywhere else.
Let's make it unsigned while we are here to be consistent with how all
other RS-related IDs will be soon.
We used to need an allocated ByteBuffer to send over IPC. Our IPC now
accepts spans for types like String and ByteBuffer though, so we don't
need to allocate buffers at the caller.
The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using are home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
This effectively reverts 9b8f6b8108.
I misunderstood what Chrome was doing here - they will issue a range
request only for what they call "sparse" cache entries. These entries
are basically used to cache partial large file, e.g. a multi-gigabyte
video. If they hit a legitimate read error, they will fail the request
with a ERR_CACHE_READ_FAILURE status.
We will now (again) fail with a network error when a cache read fails.
Given the RequestServer created the request fd with socketpair() on
Windows and pipe2() on Unix, this abstraction avoids inlined ifdef soup
by hiding the details of how the AK::Stream and Core::Notifier are
created.
On 64-bit Windows, long's are only 4 bytes. This meant that when the
client is deocding the IPC message from a async_request_finished call,
we were corrupting the RequestTimingInfo field and all the fields after
it as we were 4 bytes off.
If transferring a cached response body fails for any reason, we will now
issue a network request instead of failing the request outright.
The catch here is that we will have already transferred the response
code and headers to the client, and potentially some of the body. So we
attempt to only request the remaining data over the network using a
range request. This feels a bit sketchy, but this is also how Chromium
behaves.
However, the server may or may not support range requests. If they do,
we can expect an HTTP 206 response with the bytes we need. If not, we
will receive an HTTP 200 (assuming the request succeeded), along with
the entire object's body. In this case, we also behave like Chromium,
and internally drop number of bytes we had already transferred.
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
Instead of wrapping all non-movable members of TransportSocket in OwnPtr
to keep it movable, make TransportSocket itself non-movable and wrap it
in OwnPtr.
When the request is stopped, we clear its internal stream data. There is
a window where RequestServer may have sent an IPC message whose callback
will try to access that data in the time between the data being cleared
and RS receiving the stop signal. When this happens, just bail from IPC.
This has been a longstanding ergonomic issue with our IPC compiler. Non-
trivial types were previously passed by const&. So if we wanted to avoid
expensive copies, we would have to const_cast and move the data.
We now pass ownership of all transferred data to the client subclasses.
This allows us to remove const_cast from these methods, and allows us to
avoid some trivial expensive copies that we didn't bother to const_cast.