We were previously using the absence of a results directory to decide
whether to disable the live display. However, that condition never
held: the results directory is unconditionally set in main(). As a
result, all test-web output in CI was hidden.
Instead, let's add a verbosity mode that displays test output
explicitly. This mode is enabled in CI.
This also makes it much easier to debug single tests locally. The live
capture and results file combo is a bit overkill for running a single
test; we can now just add "-v" to the command line to see the output
directly in the terminal.
This fixes a regression introduced in bf77aeb3d which added support for
WPT variant meta tags.
The problem occurs when a text test has no corresponding expectation
file. The test should fail, but instead it passes. This happens because
of the following execution flow:
1. Test loads and on_load_finish is called
2. internals.loadTestVariants() is executed (for all text tests)
3. The test JS calls signalTestIsDone(), setting did_finish_test = true
4. The variant metadata callback fires with an empty array (no variants)
5. In on_test_variant_metadata, since did_finish_test is true, it
directly calls
view.on_test_complete({ test_index, TestResult::Pass })
The bug is in step 5: the code bypasses handle_completed_test(), which
is where the expectation-file comparison happens. Instead of calling
on_test_complete() (the lambda that invokes handle_completed_test()),
it directly reports Pass.
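The corrected flow can be sketched in a minimal Python model. The names
handle_completed_test and on_test_variant_metadata come from the commit;
the dict-based test representation and return values are hypothetical. It
shows why the completion path must run the expectation comparison rather
than short-circuit to Pass:

```python
def handle_completed_test(test, actual_output):
    # The expectation-file comparison lives here. A text test with no
    # expectation file must fail rather than silently pass.
    if test.get("expectation") is None:
        return "Fail"
    return "Pass" if actual_output == test["expectation"] else "Fail"

def on_test_variant_metadata(test, variants, did_finish_test, actual_output):
    if not variants and did_finish_test:
        # The buggy version reported Pass here directly, bypassing
        # handle_completed_test() entirely.
        return handle_completed_test(test, actual_output)
    return None  # variants present: the test is expanded and re-run
```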
When tests are being collected, the paths get canonicalized via the
FileSystem::real_path() call. This means on Windows the paths are using
`\` as the directory separator.
Tests get removed by performing string-based matching on a hardcoded
set of support file globs and an optional user-defined glob. Even if we
normalized these globs to use `\` instead of `/` on Windows, they still
wouldn't match because AK::StringUtils::matches() uses `\` to escape
metacharacters. For example, `my\relative\windows\path` will not match
`my\*\path` because the mask is interpreted as `"my*\path"` instead.
So we normalize paths to use POSIX separators to support the existing
glob matching infrastructure. This allows Windows to filter out support
files properly which prevents us from crashing when Ref tests try to
load one of those files. It also allows Windows to use `/`-based paths
for the -f argument.
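The separator problem can be reproduced with a toy matcher. This Python
sketch is not the actual AK::StringUtils::matches() implementation; it
supports `*` and treats `\` as an escape for the next character, which is
enough to show why backslash-separated paths defeat the globs while
normalized POSIX paths match:

```python
def glob_match(mask, s):
    # '*' matches any run of characters; '\' escapes the next character
    # (a simplification of AK::StringUtils::matches() behavior).
    if not mask:
        return not s
    if mask[0] == "*":
        return any(glob_match(mask[1:], s[i:]) for i in range(len(s) + 1))
    expected, rest = mask[0], mask[1:]
    if expected == "\\" and rest:
        expected, rest = rest[0], rest[1:]  # escaped: match literally
    return bool(s) and s[0] == expected and glob_match(rest, s[1:])

def normalize_separators(path):
    # Canonicalized Windows paths use '\'; convert to POSIX separators
    # so the existing glob infrastructure works unchanged.
    return path.replace("\\", "/")
```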
Use to_truncated_seconds() instead of to_seconds() for the per-view
duration display. The latter rounds up, causing "1s" to appear almost
immediately when a test starts. Now "1s" only appears after a full
second has elapsed.
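The difference is plain integer math. In this Python sketch, the two
functions are stand-ins mirroring the described behavior, not AK's actual
Duration API:

```python
def to_truncated_seconds(milliseconds):
    # Rounds toward zero: "1s" appears only after a full second elapses.
    return milliseconds // 1000

def to_rounded_up_seconds(milliseconds):
    # Rounds up, as described for to_seconds(): a test running for a few
    # milliseconds already displays as "1s".
    return -(-milliseconds // 1000)
```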
Previously, when a test's expectation file didn't exist, test-web would
return an error immediately without generating diff output files. This
made it difficult to see what the actual test output was.
Now, when an expectation file is missing, test-web continues to generate
the diff files (showing all output as additions) so users can inspect
the actual output and create expectation files if needed.
Also defer the "Failed opening" warning to avoid cluttering the live
display output.
Add support for WPT test variants, which allow a single test file to be
run multiple times with different URL query parameters. Tests declare
variants using `<meta name="variant" content="?param=value">` tags.
When test-web encounters a test with variants, it expands that test into
multiple runs, each with its own expectation file using the naming
convention `testname@variant.txt` (e.g., `test@run_type=uri.txt`).
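Under that convention, deriving the expectation path from a variant string
might look like this Python sketch (the helper name and the exact handling
of the leading `?` are assumptions based on the example above):

```python
def expectation_path(test_name, variant):
    # variant is the content of a <meta name="variant"> tag,
    # e.g. "?run_type=uri".
    query = variant[1:] if variant.startswith("?") else variant
    if not query:
        return f"{test_name}.txt"
    return f"{test_name}@{query}.txt"
```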
Implementation details:
- WebContent observes variant meta tags and communicates them to the
test runner via a new `did_receive_test_variant_metadata` IPC call
- test-web dynamically expands tests with variants during execution,
waking idle views after each test completion to pick up new work
- Use index-based test tracking to avoid dangling references when the
test vector grows during variant expansion
- Introduce TestRunContext to group test run state, and store a static
pointer to it for signal handler access
This enables proper testing of WPT tests that use variants, such as the
html5lib parsing tests (which test uri, write, and write_single modes)
and the editing/bold tests (which split across multiple ranges).
Two fixes for crash handling:
1. Re-setup output capture after WebContent respawns. When WebContent
crashes, a new process is spawned with new stdout/stderr pipes.
We need to set up new notifiers to drain them, otherwise the new
WebContent blocks on write() when its pipe buffer fills up.
2. Disconnect child view crash handlers between tests. Child views
(from window.open, iframes, etc.) persist after a test completes.
If they crash later, we don't want that to affect subsequent tests.
The timeout timer for ref tests was stored in a local variable that
went out of scope when run_ref_test() returned, causing the timer to
be destroyed before it could fire.
Store the timer in test.timeout_timer (like dump tests already do)
so it persists until the test completes.
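The lifetime bug can be modeled with weak references: an event loop that
only fires timers still alive behaves like a timer whose destructor
cancels it, so a timer owned by a local variable is effectively cancelled
when the function returns. A minimal Python sketch (names modeled on the
commit; the real timer type differs):

```python
import weakref

class Timer:
    def __init__(self, on_timeout):
        self.on_timeout = on_timeout

# The loop holds only weak references: a destroyed Timer is cancelled,
# mirroring a timer that deregisters itself on destruction.
_pending = []

def start_timer(timer):
    _pending.append(weakref.ref(timer))

def fire_due_timers():
    for ref in _pending:
        timer = ref()
        if timer is not None:
            timer.on_timeout()
    _pending.clear()
```

Storing the timer on the test object keeps it alive until completion, as
test.timeout_timer already does for dump tests.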
When a WebContent process crashes, its stdout/stderr pipes will signal
POLLHUP. The output capture notifiers were not handling this case,
causing the event loop to spin at 100% CPU as poll() kept returning
immediately with POLLHUP set.
Fix by disabling the notifiers when read() returns 0 or an error.
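The fix can be illustrated with a small POSIX pipe example. This Python
sketch (not the LibCore notifier API) drains a pipe and stops watching
the descriptor on EOF, which is what prevents poll() from returning
immediately forever once the writer has died:

```python
import os
import select

def drain_until_eof(fd):
    captured = b""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    while True:
        for _fd, _events in poller.poll():
            chunk = os.read(fd, 4096)
            if not chunk:
                # Writer is gone (POLLHUP/EOF): stop watching the fd,
                # otherwise poll() keeps waking us at 100% CPU.
                poller.unregister(fd)
                return captured
            captured += chunk
```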
After a WebContent process crashes, the view's client is no longer
valid. Calling reset_zoom() would attempt to send IPC messages to the
dead process. Skip this operation when the test result indicates a
crash.
- Generate colorized HTML diff files (.diff.html) alongside plain text
diffs (.diff.txt) for each failing test
- Add total test count to results summary
- Improve the results-index.html viewer with a dark theme, keyboard
navigation, search/filter, and tabbed interface for viewing diffs,
expected/actual output, and stdout/stderr
- Move s_total_tests assignment outside live display block so it's
always set
- Add --results-dir CLI flag to specify output directory
- Default to {tempdir}/test-web-results if not specified
- Capture stdout/stderr from all helper processes (WebContent,
RequestServer, ImageDecoder) to prevent output spam
- Save captured output to per-test files in results directory
- Save test diffs (expected vs actual) to results directory
- Generate HTML index of failed tests with links to diffs
- Display live-updating concurrent test status with progress bar
- Defer warning messages until after test run completes
Clearing the callback opens a window for the WebContent process to crash
while we do not have a callback set. In practice, we see this with ASAN
crashes, where the crash occurs after we've already signaled that the
test has completed.
We now set the crash handler once during init. This required moving the
clearing of other callbacks to the test completion handler (we were
clearing these callbacks in several different ways anyway, so now we
will at least be consistent).
When trying to repro a failed CI test, it is handy to know the order in
which test-web ran its tests. We've seen in the past that the exact
order can be important when debugging flaky tests.
This adds the index of the WebView running the test to verbose log
statements (which are enabled in CI).
When a test is active in a test-web view, show the relative path to the
test instead of the view's URL. This gives a better starting point for
debugging than whatever the last loaded URL happened to be.
If no test is active, we still show the view's URL.
If you interrupt the test runner (Ctrl+C, SIGINT) or it is being
gracefully terminated (SIGTERM), it now reports the current status of
all spawned views with their URL and, if a test is still actively
running, the time elapsed since the start of the test.
Hopefully this will help gain insight into which tests are hanging.
The test logs tend to get a bit mixed together, so this makes it
possible to identify which values correspond to which test when multiple
are failing at once.
With the newly supported fuzzy matching in our test-web runner, we can
now define the expected maximum color channel and pixel count errors per
failing test and set a baseline they should not exceed.
The figures I added to these tests all come from my macOS M4 machine.
Most discrepancies seem to come from color calculations being slightly
off.
WPT reference tests can add metadata to tests to instruct the test
runner how to interpret the results. Because of this, it is not enough
to have an action that starts loading the (mis)match reference: we need
the test runner to receive the metadata so it can act accordingly.
This sets our test runner up for potentially supporting multiple
(mis)match references, and fuzzy rendering matches - the latter will be
implemented in the following commit.
This produces more granular information on the actual pixel errors
present between two bitmaps. It now also asserts that both bitmaps are
the same size, since we should never compare two differently sized
bitmaps anyway.
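A sketch of such a comparison in Python (the real implementation operates
on Gfx bitmaps; the flat pixel-list input and return shape here are
assumptions): it reports the maximum per-channel error and the number of
differing pixels, and asserts the sizes match up front:

```python
def compare_bitmaps(a, b):
    # a and b are equal-length sequences of (r, g, b, a) channel tuples.
    assert len(a) == len(b), "never compare differently sized bitmaps"
    max_channel_error = 0
    differing_pixels = 0
    for pixel_a, pixel_b in zip(a, b):
        pixel_error = max(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b))
        if pixel_error:
            differing_pixels += 1
            max_channel_error = max(max_channel_error, pixel_error)
    return max_channel_error, differing_pixels
```

A fuzzy baseline then passes as long as both figures stay at or below the
per-test maxima.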
Of the 71 screenshot tests we have, 42 fail on macOS arm64. Let's make
it possible to skip those and run the remaining
succeeding screenshot tests on arm64 anyway.
When a test has a `reftest-wait` class, we now wait for fonts to load
and then for 2 `requestAnimationFrame` callbacks. This matches the
behavior of the WPT runner and allows more imported WPT tests to work
correctly.
Now that headless mode is built into the main Ladybird executable, the
headless-browser's only purpose is to run tests. So let's move it to the
testing directory and rename it to test-web (a la test-js / test-wasm).