ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-04-27 02:05:07 +02:00

Author	SHA1	Message	Date
Andreas Kling	37bdcc3488	LibWeb: Support MIME type sniffing for streaming HTTP responses Previously, when loading a document, we would try to sniff the MIME type by reading from the response body's source. However, for streaming HTTP responses, the body source is Empty (the data comes through the stream instead), so we had no bytes to sniff. This caused pages like hypr.land (which sends no Content-Type header) to be misidentified as plain text instead of HTML, since the MIME sniffing algorithm would receive zero bytes and fall back to the default type. The fix captures the first bytes of the response body during fetch, storing them on the Body object. These bytes are the "resource header" defined by the MIME Sniffing spec - up to 1445 bytes, which is enough to identify any MIME type the spec can detect. Since bytes may arrive asynchronously during streaming, we use a callback mechanism: if bytes aren't ready yet when load_document() needs them, it registers a callback that fires once enough bytes have been captured (or the stream ends). The flow is: 1. FetchedDataReceiver receives network bytes, buffers them 2. When Body is created, buffered bytes are flushed to Body's sniff buffer, and subsequent bytes are appended as they arrive 3. Before calling load_document(), Navigable waits for sniff bytes 4. load_document() passes the bytes to MimeSniff::Resource::sniff()	2026-01-24 15:21:26 +01:00
Timothy Flynn	d3041dc054	LibHTTP+LibWeb: Support the HTTP Vary response header We now partition the HTTP disk cache based on the Vary response header. If a cached response contains a Vary header, we look for each of the header names in the outgoing HTTP request. The outgoing request must match every header value in the original request for the cache entry to be used; otherwise, a new request will be issued, and a separate cache entry will be created. Note that we must now defer creating the disk cache file itself until we have received the response headers. The Vary key is computed from these headers, and affects the partitioned disk cache file name. There are further optimizations we can make here. If we have a Vary mismatch, we could find the best candidate cached response and issue a conditional HTTP request. The content server may then respond with an HTTP 304 if the mismatched request headers are actually okay. But for now, if we have a Vary mismatch, we issue an unconditional request as a purely correctness-oriented patch.	2026-01-22 08:54:49 -05:00
Timothy Flynn	aa1517b727	LibHTTP+LibWeb+RequestServer: Handle the Fetch API's cache mode If the cache mode is no-store, we must not interact with the cache at all. If the cache mode is reload, we must not use any cached response. If the cache-mode is only-if-cached or force-cache, we are permitted to respond with stale cache responses. Note that we currently cannot test only-if-cached in test-web. Setting this mode also requires setting the cors mode to same-origin, but our http-test-server infra requires setting the cors mode to cors.	2026-01-22 07:05:06 -05:00
Timothy Flynn	6b91199253	LibHTTP+LibWeb: Move Infrastructure::Request::CacheMode to LibHTTP We will need to send this enum over IPC to RequestServer to affect the disk cache's behavior.	2026-01-22 07:05:06 -05:00
Timothy Flynn	4dda144ce0	LibWeb: Return an HTTP cache partition even if the cache is disabled Returning null here results in the fetch cache mode becoming hard-set to no-store. This means the HTTP cache cannot be consulted nor updated. A future commit will make our disk cache respect this flag, as this is a valid client-provided value. So instead of setting this flag when the memory cache is disabled, let's move the check to where the cache later becomes consulted/updated.	2026-01-22 07:05:06 -05:00
Timothy Flynn	1a5cd6b05f	LibWeb: Do not perform any cache revalidation from WebContent We currently will perform some revalidation from both WebContent and RequestServer. For simplicity's sake, now that the memory cache only holds fresh responses, let's remove revalidation handling from the WebContent process. If a memory-cached response is stale, it's fine to just forward that request to RequestServer. It will then either be served by disk cache, or revalidated at that point.	2026-01-19 08:02:14 -05:00
Timothy Flynn	2ac219405f	LibHTTP+LibWeb: Purge non-fresh entries from the memory cache Once a cache entry is not fresh, we now remove it from the memory cache. We will avoid handling revalidation from within WebContent. Instead, we will just forward the request to RequestServer, where the disk cache will handle revalidation for itself if needed.	2026-01-19 08:02:14 -05:00
Timothy Flynn	928522c48e	LibWeb: Define the memory cache flag as static This isn't externally referenced.	2026-01-19 08:02:14 -05:00
Andreas Kling	a39f3c383b	LibWeb: Set up report timing steps on fetch body read errors When fully reading a response body fails, the transform stream's flush algorithm (which calls processResponseEndOfBody) never runs since flush only executes on successful stream completion. This left report_timing_steps null, causing a crash when process_response_consume_body tried to call report_timing. The fetch spec doesn't explicitly handle this case. We work around it by extracting the report timing setup (steps 1-3 of processResponseEndOfBody) into a helper that we call from both the success path (via flush) and the error path (via processBodyError). This is an ad-hoc fix that deviates slightly from the spec, but it ensures failed fetches still produce useful timing data and don't crash.	2026-01-18 00:30:55 +01:00
Andreas Kling	681d00c218	LibDevTools: Pass request initiator type to network panel Propagate the request initiator type (e.g., "xmlhttprequest", "fetch", "script", "stylesheet") from LibWeb through the IPC layer to DevTools. This enables Firefox DevTools to correctly identify XHR/fetch requests and display appropriate cause types in the Network panel's "Initiator" column.	2026-01-15 20:10:19 +01:00
Timothy Flynn	b35645523c	LibHTTP+LibWeb: Make memory cache debug logs consistent with disk cache Let's also not yell.	2026-01-10 09:02:41 -05:00
Timothy Flynn	0d99d54c46	LibHTTP+LibWeb: Do not cache range requests (for now) We currently do not handle responses for range requests at all in our HTTP caches. This means if we issue a request for a range of bytes=1-10, that response will be served to a subsequent request for a range of bytes=10-20. This is obviously invalid - so until we handle these requests, just don't cache them for now.	2026-01-08 11:59:12 +01:00
Andreas Kling	2ac363dcba	LibGC: Only call finalize() on types that override finalize() This dramatically cuts down on time spent in the GC's finalizer pass, since most types don't override finalize().	2026-01-07 20:51:17 +01:00
breakgimme	4f74ced414	LibWeb: Make User-Agent updates apply to HTTP requests on reload	2025-12-28 09:11:13 -05:00
Aliaksandr Kalenik	f6a7df78e7	LibWeb: Add missing GC visits for `XHR::FormDataEntry` Fix-up for `3a6782689` that changes `Vector<XHR::FormDataEntry>` to `GC::ConservativeVector<XHR::FormDataEntry>`.	2025-12-26 19:48:46 +01:00
Andreas Kling	3a6782689f	LibWeb: Don't use GC::Root in FormDataEntryValue variant This was causing reference cycles and leaking entire realms.	2025-12-26 11:57:00 +01:00
Timothy Flynn	696935d8ce	LibWeb: Remove outdated note from LoadRequest creation in fetch The method being referred to here was removed in commit `556364fd76`.	2025-12-21 09:24:51 -06:00
Timothy Flynn	bf7b812d0b	LibHTTP+LibWeb: Store the in-memory HTTP cache without JS realms The in-memory HTTP Fetch cache currently keeps the realm which created each cache entry alive indefinitely. This patch migrates this cache to LibHTTP, to ensure it is completely unaware of any JS objects. Now that we are not interacting with Fetch response objects, we can no longer use Streams infrastructure to pipe the response body into the Fetch response. Fetch also ultimately creates the cache response once the HTTP response headers have arrived. So the LibHTTP cache will hold entries in a pending list until we have received the entire response body. Then it is moved to a completed list and may be used thereafter.	2025-12-21 08:59:31 -06:00
Timothy Flynn	d08bd14928	LibWeb: Accumulate all network bytes in FetchedDataReceiver This will allow us to hand off the bytes to the HTTP memory cache.	2025-12-21 08:59:31 -06:00
Timothy Flynn	46b3218241	LibHTTP+LibWeb: Use LibHTTP to calculate stale-while-revalidate values No need to duplicate this in LibWeb. In doing so, this also fixes an apparent bug for SWR handling in LibWeb. We were previously deciding if we were in the SWR lifetime with: stale_while_revalidate > current_age However, the SWR lifetime is meant to be an additional time on top of the freshness lifetime: freshness_lifetime + stale_while_revalidate > current_age	2025-12-14 11:33:02 -05:00
Timothy Flynn	854981714f	LibWeb: Remove errant "non-standard" comment The method this comment was attached to was removed in commit `a5df972055`.	2025-12-14 11:33:02 -05:00
Aliaksandr Kalenik	f29212703e	LibWeb/Fetch: Use `GC::Function` for `fetch_main_content()` callback ...instead of creating GC roots for captured values.	2025-12-09 08:51:48 -05:00
Timothy Flynn	adcf5462af	LibWeb+WebContent: Rename the http-cache flag to http-memory-cache Rather than having http-cache and http-disk-cache, let's rename the former to http-memory-cache to be extra clear what we are talking about.	2025-12-02 12:19:42 +01:00
Timothy Flynn	2453f0bc04	LibHTTP+LibWeb: Use LibHTTP's cache implementation in LibWeb There are a couple of remaining RFC 9111 methods in LibWeb's Fetch, but these are currently directly tied to the way we store GC-allocated HTTP response objects. So de-coupling that is left as a future exercise.	2025-11-29 08:35:02 -05:00
Timothy Flynn	9375660b64	LibHTTP+LibWeb+RequestServer: Move Fetch's HTTP header infra to LibHTTP The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP caching) implementation. We currently have one implementation in LibWeb for our in-memory cache and another in RequestServer for our disk cache. The implementations both largely revolve around interacting with HTTP headers. But in LibWeb, we are using Fetch's header infra, and in RS we are using our home-grown header infra from LibHTTP. So to give these a common denominator, this patch replaces the LibHTTP implementation with Fetch's infra. Our existing LibHTTP implementation was not particularly compliant with any spec, so this at least gives us a standards-based common implementation. This migration also required moving a handful of other Fetch AOs over to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP folder, so perhaps it makes sense for LibHTTP to be the implementation of that entire set of facilities.)	2025-11-27 14:57:29 +01:00
Timothy Flynn	3dce6766a3	LibWeb: Extract some CORS and MIME Fetch helpers to their own files An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP for use outside of LibWeb. These CORS and MIME helpers depend on other LibWeb facilities, however, so they cannot be moved.	2025-11-27 14:57:29 +01:00
Timothy Flynn	0fd80a8f99	LibTextCodec+LibWeb: Move isomorphic coders to LibTextCodec This will be used outside of LibWeb.	2025-11-27 14:57:29 +01:00
Timothy Flynn	7b0dfa61b1	LibWeb: Do not move header name/values that are re-used Fixes a regression from commit: `f675cfe90f`	2025-11-26 21:22:35 -05:00
Timothy Flynn	cbfae97101	LibWeb: Include empty header values when joining duplicated headers Fixes a regression from commit: `f675cfe90f` It is not sufficient to only check if the builder is empty, as we will then drop empty header values (when the first found value is empty). This is tested in WPT by /cors/origin.htm, but that requires an HTTP server.	2025-11-26 21:22:35 -05:00
Timothy Flynn	f675cfe90f	LibWeb: Store HTTP methods and headers as ByteString The spec declares these as a byte sequence, which we then implemented as a ByteBuffer. This has become pretty awkward to deal with, as evidenced by the plethora of `MUST(ByteBuffer::copy(...))` and `.bytes()` calls everywhere inside Fetch. We would then treat the bytes as a string anyways by wrapping them in StringView everywhere. We now store these as a ByteString. This is more comfortable to deal with, and we no longer need to continually copy underlying storage (as ByteString is ref-counted). This work is largely preparatory for an upcoming HTTP header refactor.	2025-11-26 09:15:06 -05:00
Timothy Flynn	ed27eea091	LibWeb: Do not copy the result of HeaderList::extract_header_list_values There's no need to copy the Vector out of this result every time we call it. We can move it out or access it directly.	2025-11-26 09:15:06 -05:00
Timothy Flynn	44fbf6451e	LibWeb: Simplify Fetch's build-content-range implementation * Don't pass u64 by reference * Don't double-format the range numbers	2025-11-26 09:15:06 -05:00
Timothy Flynn	d70224ad2e	LibWeb: Organize Fetch Headers.h/Headers.cpp a bit Generally just define things in the order they are declared (will make a change to use ByteString in this file a bit easier to follow). Also make a couple of free functions be class methods on Header / HeaderList.	2025-11-26 09:15:06 -05:00
Prajjwal	1f5ffe04c8	LibWeb: Fix race condition between read_all_bytes and stream population There might be a race between read_all_bytes and stream population. If the document load reads the stream before it is populated, the stream will be empty, which might lead to a hang in SessionHistoryTraversalQueue, which is expecting a promise to be resolved on document load. This race can occur when stream population and document source are set very close to each other. For example, when a newly generated blob is set as the source of an iframe. - navigation/multiple-navigable-cross-document-navigation.html has been modified to trigger this race.	2025-11-26 12:27:12 +01:00
Aliaksandr Kalenik	69cede4a0f	AK+LibWeb: Make `StringBase::bytes()` lvalue-only Disallow calling `StringBase::bytes()` on temporaries to avoid returning `ReadonlyBytes` that outlive the underlying string. With this change, we catch a real UAF: `load_result.data = maybe_response.release_value().bytes();` All other updated call sites were already safe, they just needed to use an intermediate named variable to satisfy the new lvalue-only requirement.	2025-11-25 13:02:20 -05:00
Aliaksandr Kalenik	0eb28a1a54	LibWeb: Delete unused `BufferPolicy` from fetch `Request` This is no longer needed since all requests are unbuffered.	2025-11-20 06:29:13 -05:00
Aliaksandr Kalenik	16b0f1e6c2	LibWeb: Delete unused `ResourceLoader::load()` ...and rename `load_unbuffered()` to `load()`.	2025-11-20 06:29:13 -05:00
Aliaksandr Kalenik	3058274386	LibWeb: Use unbuffered network requests for all Fetch requests Previously, unbuffered requests were only available as a special mode for EventSource. With this change, they are enabled by default, which means chunks can be read from the stream as soon as they arrive. This unlocks some interesting possibilities, such as starting to parse HTML documents before the entire response has been received (that, in turn, allows us to initiate subresource fetches earlier or begin executing scripts sooner), or start rendering videos before they are fully downloaded. Co-authored-by: Timothy Flynn <trflynn89@pm.me>	2025-11-20 06:29:13 -05:00
Timothy Flynn	813986237e	LibWeb: Add some tests that exercise the HTTP disk cache Our HTTP disk cache is currently manually tested against various sites. This patch adds some tests to cover various scenarios, including non-cacheable responses, expired responses, and revalidation. In order to ensure we hit the disk cache in RequestServer, we must disable the in-memory cache in WebContent.	2025-11-20 09:33:49 +01:00
Psychpsyo	100f37995f	Everywhere: Clean up AD-HOC and FIXME comments without colons	2025-11-13 15:56:04 +01:00
Luke Wilde	167de08c81	LibWeb: Remove exception throwing from Fetch These were only here to manage OOMs, but there's not really any way to recover from small OOMs in Fetch especially with its async nature.	2025-11-07 04:08:30 +01:00
Timothy Flynn	e0a8eb3767	LibWeb+WebContent: Hook Fetch's HTTP cache into the clear-cache action And fix a typo in an invocation to clear the cache.	2025-11-05 18:27:36 +01:00
Timothy Flynn	6057719f63	LibWeb: Use fetch to retrieve all HTMLLinkElement resources HTMLLinkElement is the final user of Resource/ResourceClient (used for preloads and icons). This ports these link types to use fetch according to the spec. Preloads were particularly goofy because they would be stored in the ResourceLoader's ad-hoc cache. But this cache was never consulted for organic loads, thus were never used. There is more work to be done to use these preloads within fetch, but for now they at least are stored in fetch's HTTP cache for re-use.	2025-11-05 18:27:36 +01:00
Timothy Flynn	5b40398c39	LibWeb: Invoke process_response_consume_body with null in error cases We were previously invoking it with an empty ByteBuffer, which will be interpreted as a successful load by HTMLLinkElement in a future commit.	2025-11-05 18:27:36 +01:00
Luke Wilde	eeb5446c1b	LibWeb: Avoid including Navigable.h in headers This greatly reduces how much is recompiled when changing Navigable.h, from >1000 to 82.	2025-10-20 10:16:55 +01:00
Julian Dominguez-Schatz	4e3387778e	LibWeb: Respect IncludeCredentials for Set-Cookie during fetch Per https://fetch.spec.whatwg.org/#http-network-fetch, Set-Cookie should only store a cookie if IncludeCredentials::Yes is set. Fixes 1 web platform test.	2025-09-24 10:12:56 +01:00
Timothy Flynn	2df4835025	LibWeb: Place HTTP cache logging behind a debug flag It's quite verbose to have logging on by default here.	2025-09-19 13:52:07 +02:00
Timothy Flynn	b4df857a57	LibWeb+LibWebView+WebContent: Replace DNT with GPC Global Privacy Control aims to be a replacement for Do Not Track. DNT ended up not being a great solution, as it wasn't enforced by law. This actually resulted in the DNT header serving as an extra fingerprinting data point. GPC is becoming enforced by law in USA states such as California and Colorado. CA is further working on a bill which requires that browsers implement such an opt-out preference signal (OOPS): https://cppa.ca.gov/announcements/2025/20250911.html This patch replaces DNT with GPC and hooks up the associated settings.	2025-09-16 10:38:20 +02:00
Timothy Flynn	7b3465ab55	LibWeb: Do not require multipart form data to end with CRLF According to RFC 2046, the BNF of the form data body is: multipart-body := [preamble CRLF] dash-boundary transport-padding CRLF body-part *encapsulation close-delimiter transport-padding [CRLF epilogue] Where "epilogue" is any text that "may be ignored or discarded". So we should stop parsing the body once we encounter the terminating delimiter ("--"). Note that our parsing function is from an attempt to standardize the grammar in the spec: https://andreubotella.github.io/multipart-form-data This proposal hasn't been updated in ~4 years, and the fetch spec still does not have a formal definition of the body string.	2025-09-15 18:28:48 +02:00
Luke Wilde	4772e1b0c9	LibWeb/Fetch: Add missing spec step for checking for tuple origin Fixes https://github.com/LadybirdBrowser/ladybird/issues/6188	2025-09-15 09:58:33 +02:00
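
The Vary handling described in commit d3041dc054 above can be illustrated with a small standalone sketch. This is not LibHTTP's actual code: `Headers`, `header_value`, and `vary_matches` are hypothetical names, and header names are assumed to be stored lower-cased. The commit message says the outgoing request must match every header value of the original request named by Vary; the `*` case is not mentioned in the commit and is taken from RFC 9111, where a stored response whose Vary contains `*` always fails to match.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical header container: lower-cased header name -> value.
using Headers = std::map<std::string, std::string>;

// Lower-case a header name so lookups are case-insensitive.
static std::string to_lower(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Returns the header value, or an empty optional if the header is absent.
static std::optional<std::string> header_value(Headers const& headers, std::string const& name)
{
    auto it = headers.find(to_lower(name));
    if (it == headers.end())
        return std::nullopt;
    return it->second;
}

// A cached response is usable only if every header named in its Vary list
// has the same value (or is absent) in both the original and the new request.
bool vary_matches(std::vector<std::string> const& vary_names,
    Headers const& original_request, Headers const& new_request)
{
    for (auto const& name : vary_names) {
        // Per RFC 9111, "Vary: *" always fails to match.
        if (name == "*")
            return false;
        if (header_value(original_request, name) != header_value(new_request, name))
            return false;
    }
    return true;
}
```

With a Vary list of `Accept-Encoding`, a request sending `gzip` would reuse an entry stored for `gzip`, while one sending `br` would not; per the commit message, the mismatch results in a new unconditional request and a separate cache entry.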


143 Commits
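
The stale-while-revalidate fix described in commit 46b3218241 comes down to one comparison. The sketch below contrasts the two conditions quoted in that commit message; the function names are hypothetical and are not LibHTTP's API.

```cpp
#include <chrono>

using Seconds = std::chrono::seconds;

// Corrected check from the commit message: the SWR window is additional
// time on top of the freshness lifetime.
constexpr bool is_within_swr_lifetime(Seconds freshness_lifetime,
    Seconds stale_while_revalidate, Seconds current_age)
{
    return freshness_lifetime + stale_while_revalidate > current_age;
}

// Previous check from the commit message, kept here only for comparison:
// it ignored the freshness lifetime entirely.
constexpr bool old_swr_check(Seconds stale_while_revalidate, Seconds current_age)
{
    return stale_while_revalidate > current_age;
}
```

For a response with a freshness lifetime of 600 seconds, a stale-while-revalidate value of 60 seconds, and a current age of 630 seconds, the corrected check reports that the response is still inside the SWR window, while the previous check reports that it is not.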