If provided, test-web's batches and results will be partitioned into
multiple runs. Each run will execute the same set of tests, and
non-pass results will be stored in run1, run2, etc. directories.
Previously, --rebaseline would unconditionally overwrite expected PNGs
before comparing, causing every screenshot test expectation to be
regenerated even when the actual screenshot already matched. Restructure
to load and compare first, only writing the new expectation on mismatch
or when the expected file doesn't exist yet.
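A minimal sketch of the restructured flow, with hypothetical helper
names (load_expected_png, screenshots_match, and write_png are
illustrative, not the actual test-web code):

```cpp
// Sketch: compare first, write only when needed.
static TestResult rebaseline_screenshot(Test& test, Gfx::Bitmap const& actual)
{
    auto expected = load_expected_png(test.expectation_path); // fails if missing
    if (!expected.is_error() && screenshots_match(*expected.value(), actual)) {
        // The expectation already matches; leave the file untouched.
        return TestResult::Pass;
    }
    // Expected PNG is missing or differs: write the new baseline.
    MUST(write_png(test.expectation_path, actual));
    return TestResult::Pass;
}
```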
Instead of rendering a reference HTML page that wraps an <img> tag
pointing to a PNG, Screenshot tests now load the expected PNG directly
from disk and compare it against the rendered screenshot. This
eliminates the indirection of loading and rendering a second page just
to display a static image.
This also means --rebaseline now works for Screenshot tests, generating
the expected PNG automatically instead of requiring manual screenshot
capture and placement.
Changes:
- Add TestMode::Screenshot with its own collector and runner
- Move PNGs from Screenshot/images/ to Screenshot/expected/ with
normalized names matching input filenames
- Remove all 92 reference HTML wrapper files and the images/
directory
- Remove <link rel="match"> from all 94 Screenshot input HTML
files
- Update add_libweb_test.py Screenshot boilerplate accordingly
- Add Screenshot mode to results viewer image comparison tabs
If the difference between the expected and actual test images was small,
our in-browser diff tool would often fail to highlight differing pixels.
Replace it by generating a new diff PNG, layered as follows (see the
sketch after this list):
1. 50%/50% blend of the actual and expected images
2. 80% blend with white / rgb(255, 255, 255)
3. Differing pixels are highlighted in red / rgb(255, 0, 0)
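Per pixel, the layering boils down to something like the sketch below
(names illustrative; for matching pixels the 50/50 blend is the
identity, but it mirrors the image-level description):

```cpp
// Sketch of the per-pixel diff composition; not the actual code.
static Gfx::Color diff_pixel(Gfx::Color expected, Gfx::Color actual)
{
    if (expected != actual)
        return Gfx::Color(255, 0, 0); // 3. differing pixels in pure red

    // 1. 50%/50% blend of the actual and expected images.
    auto mix = [](u8 a, u8 b) { return static_cast<u8>((a + b) / 2); };
    u8 r = mix(actual.red(), expected.red());
    u8 g = mix(actual.green(), expected.green());
    u8 b = mix(actual.blue(), expected.blue());

    // 2. 80% blend with white: keep 20% of the blend, add 80% of 255.
    auto lighten = [](u8 c) { return static_cast<u8>(c / 5 + 204); };
    return Gfx::Color(lighten(r), lighten(g), lighten(b));
}
```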
Many web features work differently when pages are loaded from file://
URLs, since mechanisms such as origin checks no longer hold. For
example, document.cookie is disabled when a file URL is loaded. This
also means that many WPT tests do not work as intended when imported.
While it would be nice to load almost all tests from an HTTP server,
doing so makes running tests significantly slower. Furthermore, loading
tests over a file:// URL is sufficient for our needs in most scenarios
anyhow.
To support both use cases, add the ability to configure a specific test
in TestConfig.ini to be loaded over an HTTP server instead of its
file:// URL.
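For illustration only, the runner-side decision could look like this;
the actual TestConfig.ini section and key names may differ:

```cpp
#include <string>

// Hypothetical: suppose TestConfig.ini gains a section such as
//
//   [ServeViaHttp]
//   Text/input/document-cookie.html
//
// and the test collector sets serve_via_http while parsing it.
struct TestEntry {
    std::string relative_path;
    bool serve_via_http { false };
};

static std::string url_for_test(TestEntry const& test,
    std::string const& http_origin, std::string const& file_root)
{
    if (test.serve_via_http)
        return http_origin + "/" + test.relative_path; // e.g. http://127.0.0.1:8000/...
    return "file://" + file_root + "/" + test.relative_path;
}
```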
Commit 84db5d8c1c introduced the ability
to load tests over an http:// URL instead of a file:// URL. Each time
this happens, we switch to a new WebContent process due to site
isolation. Our WebContent output capture was not handling this.
For some reason, this was causing a wide array of test failures and
timeouts. Often, the failures were accompanied by the content of the
files loaded over HTTP being dumped to stdout. It's not quite clear
what was going on here.
I added the VERBOSITY_LEVEL_LOG_TEST_OUTPUT level for when I run
test-web with a filter for a single file and just want to see the test
output in the terminal. But for CI, we always want to capture output
for the uploaded test artifacts.
This patch changes this verbosity level to tee the output instead: we
always create the results artifacts with all logs, and optionally echo
the received output.
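Conceptually the tee is tiny (illustrative, not the actual code):

```cpp
#include <fstream>
#include <iostream>
#include <string_view>

// Always persist output for the results artifacts; echo it only when
// the verbosity level asks for it.
static void on_test_output(std::string_view chunk, std::ofstream& log_file,
    bool echo_to_terminal)
{
    log_file << chunk;
    if (echo_to_terminal)
        std::cout << chunk;
}
```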
When dumping a GC graph, we now write the output as a .js file
containing `var GC_GRAPH_DUMP = <json>;` instead of raw JSON.
This allows gc-heap-explorer.html to load the dump via a
dynamically created <script> element, avoiding CORS restrictions
that prevent file:// pages from fetching other file:// URLs.
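The emitted file is just the JSON behind a one-line JS wrapper; a
sketch, assuming the graph has already been serialized into json_text:

```cpp
#include <fstream>
#include <string>

// Wrapping the JSON in a script makes it loadable from file:// pages
// via a <script src=...> element, which is exempt from CORS checks.
static void write_gc_graph_dump(std::string const& path, std::string const& json_text)
{
    std::ofstream out(path); // e.g. "gc-graph-dump.js"
    out << "var GC_GRAPH_DUMP = " << json_text << ";\n";
}
```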
After dumping, both the browser and test-web print a clickable
file:// URL that opens the heap explorer with the dump pre-loaded.
The heap explorer's drag-and-drop file picker also accepts both
the new .js format and plain .json files.
If tests fail, let's just always create the results page regardless of
verbosity. This allows viewing logs afterwards, as well as uploading the
test-dumps collection in CI.
As a result, this removes the --dump-failed-ref-tests flag. It's a bit
overloaded with our new results format, and it would be awkward to keep
both separately working here.
We were previously using the absence of a results directory to decide
if we should disable live display. However, the directory was never
absent: it is set unconditionally in main(). This caused all test-web
output in CI to be hidden.
Instead, let's add a verbosity mode to display test output explicitly.
This is set for CI.
This also makes it much easier to debug single tests locally. The live
capture and results file combo is a bit overkill for running a single
test; we can now just add "-v" to the command line to see the output
directly in the terminal.
This fixes a regression introduced in bf77aeb3d, which added support
for WPT variant meta tags.
The problem occurs when a text test has no corresponding expectation
file. The test should fail, but instead it passes. This happens because
of the following execution flow:
1. Test loads and on_load_finish is called
2. internals.loadTestVariants() is executed (for all text tests)
3. The test JS calls signalTestIsDone(), setting did_finish_test = true
4. The variant metadata callback fires with an empty array (no variants)
5. In on_test_variant_metadata, since did_finish_test is true, it
directly calls
view.on_test_complete({ test_index, TestResult::Pass })
The bug is in step 5: the code bypasses handle_completed_test(), which
is where the expectation file comparison happens. Instead of calling
on_test_complete() (the lambda that invokes handle_completed_test()),
it reports a hard-coded Pass.
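In sketch form (the callback names come from this commit; the
surrounding code is illustrative):

```cpp
// Illustrative sketch of step 5; not the actual test-web code.
view.on_test_variant_metadata = [&](auto const& variants) {
    if (variants.is_empty() && did_finish_test) {
        // Buggy: hard-coded Pass, skipping the expectation comparison.
        // view.on_test_complete({ test_index, TestResult::Pass });

        // Fixed: go through the normal completion path, which invokes
        // handle_completed_test() and compares the expectation file.
        on_test_complete();
    }
    // ... expansion of non-empty variant lists elided ...
};
```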
When tests are being collected, the paths get canonicalized via
FileSystem::real_path(). This means that on Windows, the paths use `\`
as the directory separator.
Tests get removed by performing string-based matching on a hardcoded
set of support file globs and an optional user defined glob. Even if we
normalized these globs to use `\` instead of `/` on Windows, they still
wouldn't match because AK::StringUtils::matches() uses `\` to escape
metacharacters. For example, `my\relative\windows\path` will not match
`my\*\path` because the mask is interpreted as `"my*\path"` instead.
So we normalize paths to use POSIX separators to support the existing
glob matching infrastructure. This allows Windows to filter out support
files properly which prevents us from crashing when Ref tests try to
load one of those files. It also allows Windows to use `/`-based paths
for the -f argument.
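The normalization itself is tiny; an illustrative version:

```cpp
#include <string>

// Illustrative helper (not the actual test-web code): rewrite Windows
// separators so the existing `/`-based glob matching keeps working.
static std::string normalize_to_posix_separators(std::string path)
{
    for (auto& ch : path) {
        if (ch == '\\')
            ch = '/';
    }
    return path;
}
```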
Use to_truncated_seconds() instead of to_seconds() for the per-view
duration display. The latter rounds up, causing "1s" to appear almost
immediately when a test starts. Now "1s" only appears after a full
second has elapsed.
Previously, when a test's expectation file didn't exist, test-web would
return an error immediately without generating diff output files. This
made it difficult to see what the actual test output was.
Now, when an expectation file is missing, test-web continues to generate
the diff files (showing all output as additions) so users can inspect
the actual output and create expectation files if needed.
Also defer the "Failed opening" warning to avoid cluttering the live
display output.
Add support for WPT test variants, which allow a single test file to be
run multiple times with different URL query parameters. Tests declare
variants using `<meta name="variant" content="?param=value">` tags.
When test-web encounters a test with variants, it expands that test into
multiple runs, each with its own expectation file using the naming
convention `testname@variant.txt` (e.g., `test@run_type=uri.txt`).
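Roughly, the expansion looks like the sketch below (illustrative types;
the actual code differs):

```cpp
#include <string>
#include <vector>

struct Test {
    std::string path;             // e.g. "html5lib_tests1.html"
    std::string query;            // e.g. "?run_type=uri"
    std::string expectation_path; // e.g. "html5lib_tests1@run_type=uri.txt"
};

// Expand one test into one queued run per declared variant.
static void expand_variants(Test const& base,
    std::vector<std::string> const& variants, std::vector<Test>& tests)
{
    for (auto const& variant : variants) {
        Test expanded = base;
        expanded.query = variant;
        auto stem = base.path.substr(0, base.path.rfind('.'));
        expanded.expectation_path = stem + "@" + variant.substr(1) + ".txt"; // drop '?'
        tests.push_back(expanded);
    }
}
```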
Implementation details:
- WebContent observes variant meta tags and communicates them to the
test runner via a new `did_receive_test_variant_metadata` IPC call
- test-web dynamically expands tests with variants during execution,
waking idle views after each test completion to pick up new work
- Use index-based test tracking to avoid dangling references when the
test vector grows during variant expansion
- Introduce TestRunContext to group test run state, and store a static
pointer to it for signal handler access
This enables proper testing of WPT tests that use variants, such as the
html5lib parsing tests (which test uri, write, and write_single modes)
and the editing/bold tests (which split across multiple ranges).
Two fixes for crash handling:
1. Re-set up output capture after WebContent respawns. When WebContent
crashes, a new process is spawned with new stdout/stderr pipes.
We need to set up new notifiers to drain them, otherwise the new
WebContent blocks on write() when its pipe buffer fills up (see the
sketch after this list).
2. Disconnect child view crash handlers between tests. Child views
(from window.open, iframes, etc.) persist after a test completes.
If they crash later, we don't want that to affect subsequent tests.
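For fix 1, the shape of the change is roughly this (the callback and
helper names are hypothetical):

```cpp
// A respawned WebContent process comes with fresh stdout/stderr pipes,
// so the old notifiers are useless. Re-create them, or the new child
// blocks on write() once its pipe buffer fills.
view.on_web_content_crashed = [&view] {
    respawn_web_content(view);  // spawns the new process and pipes
    setup_output_capture(view); // new notifiers draining the new pipes
};
```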
The timeout timer for ref tests was stored in a local variable that
went out of scope when run_ref_test() returned, causing the timer to
be destroyed before it could fire.
Store the timer in test.timeout_timer (like dump tests already do)
so it persists until the test completes.
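In sketch form (Core::Timer usage illustrative):

```cpp
// Before: a local `auto timer = ...` died when run_ref_test() returned.
// After: keep the timer alive on the test object until completion.
test.timeout_timer = Core::Timer::create_single_shot(timeout_in_ms, [&test] {
    handle_test_timeout(test); // hypothetical timeout handler
});
test.timeout_timer->start();
```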
When a WebContent process crashes, its stdout/stderr pipes will signal
POLLHUP. The output capture notifiers were not handling this case,
causing the event loop to spin at 100% CPU as poll() kept returning
immediately with POLLHUP set.
Fix by disabling the notifiers when read() returns 0 or an error.
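A sketch of the fix (names illustrative):

```cpp
// Stop watching the pipe on EOF or error, so poll() no longer returns
// immediately with POLLHUP forever.
notifier->on_activation = [notifier, fd, &captured_output] {
    char buffer[4096];
    auto nread = read(fd, buffer, sizeof(buffer));
    if (nread <= 0) {
        notifier->set_enabled(false); // pipe closed or errored; we're done
        return;
    }
    captured_output.append(buffer, static_cast<size_t>(nread));
};
```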
After a WebContent process crashes, the view's client is no longer
valid. Calling reset_zoom() would attempt to send IPC messages to the
dead process. Skip this operation when the test result indicates a
crash.
- Generate colorized HTML diff files (.diff.html) alongside plain text
diffs (.diff.txt) for each failing test
- Add total test count to results summary
- Improve the results-index.html viewer with a dark theme, keyboard
navigation, search/filter, and tabbed interface for viewing diffs,
expected/actual output, and stdout/stderr
- Move s_total_tests assignment outside live display block so it's
always set
- Add --results-dir CLI flag to specify output directory
- Default to {tempdir}/test-web-results if not specified
- Capture stdout/stderr from all helper processes (WebContent,
RequestServer, ImageDecoder) to prevent output spam
- Save captured output to per-test files in results directory
- Save test diffs (expected vs actual) to results directory
- Generate HTML index of failed tests with links to diffs
- Display live-updating concurrent test status with progress bar
- Defer warning messages until after test run completes
Clearing the callback opens a window for the WebContent process to crash
while we do not have a callback set. In practice, we see this with ASAN
crashes, where the crash occurs after we've already signaled that the
test has completed.
We now set the crash handler once during init. This required moving the
clearing of other callbacks to the test completion handler (we were
clearing these callbacks in several different ways anyway, so now we
will at least be consistent).
When trying to repro a failed CI test, it is handy to know the order in
which test-web ran its tests. We've seen in the past that the exact
order can be important when debugging flaky tests.
This adds the index of the WebView running the test to the verbose log
statements (verbose logging is enabled in CI).
When a test is active in a test-web view, show the relative path to the
test instead of the view's URL. This gives a better starting point for
debugging than whatever the last loaded URL happened to be.
If no test is active, we still show the view's URL.
If you interrupt the test runner (Ctrl+C, SIGINT) or it is being
gracefully terminated (SIGTERM), it now reports the current status of
all the spawned views with their URL and, if an active test is still
running, the time since the start of the test.
Hopefully this will help gain insight into which tests are hanging.
The test logs tend to get a bit mixed together, so this makes it
possible to identify which values correspond to which test when multiple
are failing at once.
With the newly supported fuzzy matching in our test-web runner, we can
now define the expected maximum color channel and pixel count errors per
failing test and set a baseline they should not exceed.
The figures I added to these tests all come from my macOS M4 machine.
Most discrepancies seem to come from color calculations being slightly
off.