We were previously using the absence of a results directory to decide
whether we should disable live display. However, the results directory
was never absent: it was unconditionally set in main(). This caused all
test-web output in CI to be hidden.
Instead, let's add a verbosity mode to display test output explicitly.
This is set for CI.
This also makes it much easier to debug single tests locally. The live
capture and results file combo is a bit overkill for running a single
test; we can now just add "-v" to the command line to see the output
directly in the terminal.
This fixes a regression introduced in bf77aeb3d which added support for
WPT variant meta tags.
The problem occurs when a text test has no corresponding expectation
file. The test should fail, but instead it passes. This happens because
of the following execution flow:
1. Test loads and on_load_finish is called
2. internals.loadTestVariants() is executed (for all text tests)
3. The test JS calls signalTestIsDone(), setting did_finish_test = true
4. The variant metadata callback fires with an empty array (no variants)
5. In on_test_variant_metadata, since did_finish_test is true, it
directly calls
view.on_test_complete({ test_index, TestResult::Pass })
The bug is in step 5: the code bypasses handle_completed_test() which is
where the expectation file comparison happens. Instead of calling
on_test_complete() (the lambda that invokes handle_completed_test), it
directly returns Pass.
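A minimal sketch of the corrected flow. It uses hypothetical types (`TestState`, a stand-in `handle_completed_test()`) rather than the real test-web code. Once the test has finished and no variants were reported, the result comes from the same function that compares against the expectation file, so a missing expectation file fails instead of silently passing.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class TestResult { Pass, Fail };

// Hypothetical per-test state; names are illustrative only.
struct TestState {
    std::string actual_output;
    std::optional<std::string> expectation; // std::nullopt: no expectation file
    bool did_finish_test = false;
};

// Stand-in for handle_completed_test(): the single place where the
// actual output is compared against the expectation file.
TestResult handle_completed_test(TestState const& test)
{
    if (!test.expectation.has_value())
        return TestResult::Fail; // a missing expectation file must not pass
    return *test.expectation == test.actual_output ? TestResult::Pass : TestResult::Fail;
}

// Stand-in for on_test_variant_metadata().
std::optional<TestResult> on_test_variant_metadata(TestState const& test, std::vector<std::string> const& variants)
{
    if (!variants.empty())
        return std::nullopt; // the test would be expanded into variant runs instead
    if (test.did_finish_test)
        return handle_completed_test(test); // not a hardcoded TestResult::Pass
    return std::nullopt; // still waiting for signalTestIsDone()
}
```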
When tests are being collected, the paths get canonicalized via the
FileSystem::real_path() call. This means on Windows the paths are using
`\` as the directory separator.
Support files are filtered out of the collected tests by string-based
matching against a hardcoded set of support file globs and an optional
user-defined glob. Even if we
normalized these globs to use `\` instead of `/` on Windows, they still
wouldn't match because AK::StringUtils::matches() uses `\` to escape
metacharacters. For example, `my\relative\windows\path` will not match
`my\*\path` because the mask is interpreted as `"my*\path"` instead.
So we normalize paths to use POSIX separators to support the existing
glob matching infrastructure. This allows Windows to filter out support
files properly, which prevents us from crashing when ref tests try to
load one of those files. It also allows Windows to use `/`-based paths
for the -f argument.
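A sketch of why normalizing helps. The matcher below is modeled on the escaping behavior described above, where `\` escapes a following `*`, `?` or `\`. It is an illustration, not AK::StringUtils::matches() itself, and `to_posix_separators()` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

static bool is_metacharacter(char c)
{
    return c == '*' || c == '?' || c == '\\';
}

// '*' matches any run of characters, '?' matches one character, and '\'
// followed by a metacharacter matches that metacharacter literally.
bool glob_matches(std::string_view str, std::string_view mask)
{
    if (mask.empty())
        return str.empty();

    if (mask[0] == '*') {
        // Let '*' consume 0..N characters and try to match the rest.
        for (std::size_t i = 0; i <= str.size(); ++i) {
            if (glob_matches(str.substr(i), mask.substr(1)))
                return true;
        }
        return false;
    }

    if (mask[0] == '?')
        return !str.empty() && glob_matches(str.substr(1), mask.substr(1));

    char literal = mask[0];
    std::size_t consumed = 1;
    if (literal == '\\' && mask.size() > 1 && is_metacharacter(mask[1])) {
        literal = mask[1]; // escaped metacharacter is matched literally
        consumed = 2;
    }
    return !str.empty() && str[0] == literal && glob_matches(str.substr(1), mask.substr(consumed));
}

// Convert Windows separators to POSIX ones before glob matching.
std::string to_posix_separators(std::string path)
{
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

With `\` separators, `my\*\path` only matches the literal string `my*\path`. After normalization, the same glob written as `my/*/path` matches as intended.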
ENABLE_WINDOWS_CI and the *_CI presets were initially added back when
the AK library and all the AK Test* executables were the only targets
that supported building and running in CI. Since then, almost all the
targets in the codebase are built on Windows except the following:
- LibLine
- test-262-runner
Since these targets are not required to actually run or test the
browser on Windows in its current experimental state, fully disabling
them should be fine for now.
ENABLE_WINDOWS_CI was also used to exclude test-web from ctest. This
can be fully disabled on Windows for now until proper runtime support
is added.
The remaining locations were all using ENABLE_WINDOWS_CI as a proxy for
ENABLE_ADDRESS_SANITIZER, so we can just be explicit instead.
The new presets map much more directly to the Unix Release, Debug, and
Sanitizer presets, which should make setting up Ladybird on Windows less
confusing.
We also make the new Windows_Experimental_Release preset the default in
ladybird.py to match Unix.
Use to_truncated_seconds() instead of to_seconds() for the per-view
duration display. The latter rounds up, causing "1s" to appear almost
immediately when a test starts. Now "1s" only appears after a full
second has elapsed.
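The difference between the two rounding modes, sketched with `std::chrono` instead of Ladybird's own duration type. `std::chrono::ceil` rounds up like the old display did, and `duration_cast` truncates toward zero. The helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;
using std::chrono::seconds;

// Rounding up: any non-zero elapsed time is shown as at least 1s.
long long rounded_up_seconds(milliseconds elapsed)
{
    return std::chrono::ceil<seconds>(elapsed).count();
}

// Truncating: "1s" is only shown once a full second has elapsed.
long long truncated_seconds(milliseconds elapsed)
{
    return std::chrono::duration_cast<seconds>(elapsed).count();
}
```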
Previously, when a test's expectation file didn't exist, test-web would
return an error immediately without generating diff output files. This
made it difficult to see what the actual test output was.
Now, when an expectation file is missing, test-web continues to generate
the diff files (showing all output as additions) so users can inspect
the actual output and create expectation files if needed.
Also defer the "Failed opening" warning to avoid cluttering the live
display output.
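A minimal sketch of the missing-expectation case. It assumes a unified-diff style `+` prefix for added lines, since the exact format test-web writes is not shown here. With no expectation file, the expected text is treated as empty, so every line of actual output appears as an addition. The function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text into lines (without trailing newline characters).
std::vector<std::string> split_lines(std::string const& text)
{
    std::vector<std::string> lines;
    std::istringstream stream(text);
    for (std::string line; std::getline(stream, line);)
        lines.push_back(line);
    return lines;
}

// The expectation file is missing, so the expected text is empty and
// every line of the actual output is reported as an addition.
std::string diff_against_missing_expectation(std::string const& actual)
{
    std::string diff;
    for (auto const& line : split_lines(actual))
        diff += "+" + line + "\n";
    return diff;
}
```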
Add support for WPT test variants, which allow a single test file to be
run multiple times with different URL query parameters. Tests declare
variants using `<meta name="variant" content="?param=value">` tags.
When test-web encounters a test with variants, it expands that test into
multiple runs, each with its own expectation file using the naming
convention `testname@variant.txt` (e.g., `test@run_type=uri.txt`).
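A sketch of the naming convention. It assumes the leading `?` of the meta tag's `content` value is dropped and that `test_name` is the test's name without extension, as the `test@run_type=uri.txt` example suggests. `variant_expectation_name()` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Build the expectation file name for one variant of a test, following
// the `testname@variant.txt` convention. An empty variant yields the
// plain `testname.txt` name.
std::string variant_expectation_name(std::string_view test_name, std::string_view variant)
{
    if (!variant.empty() && variant.front() == '?')
        variant.remove_prefix(1); // "?run_type=uri" -> "run_type=uri" (assumption)
    std::string name(test_name);
    if (!variant.empty()) {
        name += '@';
        name += variant;
    }
    name += ".txt";
    return name;
}
```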
Implementation details:
- WebContent observes variant meta tags and communicates them to the
test runner via a new `did_receive_test_variant_metadata` IPC call
- test-web dynamically expands tests with variants during execution,
waking idle views after each test completion to pick up new work
- Use index-based test tracking to avoid dangling references when the
test vector grows during variant expansion
- Introduce TestRunContext to group test run state, and store a static
pointer to it for signal handler access
This enables proper testing of WPT tests that use variants, such as the
html5lib parsing tests (which test uri, write, and write_single modes)
and the editing/bold tests (which split across multiple ranges).
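A sketch of why index-based tracking matters, using hypothetical `Test` and `expand_variants()` names. Appending variant runs to a `std::vector` can reallocate it, so a `Test&` taken before the append may dangle, while an index stays valid. Treating the original entry as the first variant run is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Test {
    std::string path;
    std::string variant;
};

// Expand the test at `index` into one run per variant by appending new
// entries. push_back may reallocate `tests`, invalidating any Test& or
// iterator held across this call, so callers keep indices instead.
void expand_variants(std::vector<Test>& tests, std::size_t index, std::vector<std::string> const& variants)
{
    if (variants.empty())
        return;
    // The original entry becomes the first variant run.
    tests[index].variant = variants[0];
    for (std::size_t i = 1; i < variants.size(); ++i) {
        // The temporary Test copies the path before push_back runs.
        tests.push_back({ tests[index].path, variants[i] });
    }
}
```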
Two fixes for crash handling:
1. Set up output capture again after WebContent respawns. When WebContent
crashes, a new process is spawned with new stdout/stderr pipes.
We need to set up new notifiers to drain them, otherwise the new
WebContent blocks on write() when its pipe buffer fills up.
2. Disconnect child view crash handlers between tests. Child views
(from window.open, iframes, etc.) persist after a test completes.
If they crash later, we don't want that to affect subsequent tests.
The timeout timer for ref tests was stored in a local variable that
went out of scope when run_ref_test() returned, causing the timer to
be destroyed before it could fire.
Store the timer in test.timeout_timer (like dump tests already do)
so it persists until the test completes.
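A sketch of the lifetime bug, using a hypothetical `Timer` that is cancelled when its last owner goes away (standing in for a reference-counted event loop timer). A timer held only by a local never fires, while one stored on the test does.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-shot timer. The "event loop" only holds a weak
// reference, so the timeout is effectively cancelled once the last
// strong owner is destroyed.
class Timer {
public:
    explicit Timer(std::function<void()> on_timeout)
        : m_on_timeout(std::move(on_timeout))
    {
    }

    // Stand-in for the event loop firing the timer.
    static void fire_if_alive(std::weak_ptr<Timer> const& timer)
    {
        if (auto alive = timer.lock())
            alive->m_on_timeout();
    }

private:
    std::function<void()> m_on_timeout;
};

struct TestState {
    std::shared_ptr<Timer> timeout_timer; // keeps the timer alive for the test's duration
};

// Buggy: the only strong reference is a local, so the timer dies on return.
std::weak_ptr<Timer> start_timeout_in_local(bool& timed_out)
{
    auto timer = std::make_shared<Timer>([&timed_out] { timed_out = true; });
    return timer;
}

// Fixed: the test owns the timer until it completes.
std::weak_ptr<Timer> start_timeout_in_test(TestState& test, bool& timed_out)
{
    test.timeout_timer = std::make_shared<Timer>([&timed_out] { timed_out = true; });
    return test.timeout_timer;
}
```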
When a WebContent process crashes, its stdout/stderr pipes will signal
POLLHUP. The output capture notifiers were not handling this case,
causing the event loop to spin at 100% CPU as poll() kept returning
immediately with POLLHUP set.
Fix by disabling the notifiers when read() returns 0 or an error.
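A POSIX sketch of the fix, with a plain poll() loop standing in for the event loop's notifiers (`drain_until_hangup()` is hypothetical). After the writer closes its end, read() returns 0 and the fd is no longer watched, so the loop terminates instead of spinning on POLLHUP.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <poll.h>
#include <unistd.h>

// Drain a pipe until the writer goes away. Once read() returns 0 (EOF)
// or an error, stop watching the fd; otherwise poll() keeps reporting
// POLLHUP immediately and the loop spins.
std::string drain_until_hangup(int read_fd)
{
    std::string captured;
    pollfd watched {};
    watched.fd = read_fd;
    watched.events = POLLIN;

    bool enabled = true; // stand-in for the notifier's enabled state
    while (enabled) {
        if (poll(&watched, 1, -1) < 0)
            break;
        char buffer[256];
        ssize_t n = read(read_fd, buffer, sizeof(buffer));
        if (n <= 0) {
            enabled = false; // EOF or error: disable the notifier
            continue;
        }
        captured.append(buffer, static_cast<std::size_t>(n));
    }
    return captured;
}
```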
After a WebContent process crashes, the view's client is no longer
valid. Calling reset_zoom() would attempt to send IPC messages to the
dead process. Skip this operation when the test result indicates a
crash.
- Generate colorized HTML diff files (.diff.html) alongside plain text
diffs (.diff.txt) for each failing test
- Add total test count to results summary
- Improve the results-index.html viewer with a dark theme, keyboard
navigation, search/filter, and tabbed interface for viewing diffs,
expected/actual output, and stdout/stderr
- Move s_total_tests assignment outside live display block so it's
always set
- Add --results-dir CLI flag to specify output directory
- Default to {tempdir}/test-web-results if not specified
- Capture stdout/stderr from all helper processes (WebContent,
RequestServer, ImageDecoder) to prevent output spam
- Save captured output to per-test files in results directory
- Save test diffs (expected vs actual) to results directory
- Generate HTML index of failed tests with links to diffs
- Display live-updating concurrent test status with progress bar
- Defer warning messages until after test run completes
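A sketch of the directory fallback using `std::filesystem`. `resolve_results_directory()` and representing the parsed `--results-dir` value as an optional string are assumptions.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Use the --results-dir value when given, otherwise fall back to
// {tempdir}/test-web-results.
fs::path resolve_results_directory(std::optional<std::string> const& results_dir_flag)
{
    if (results_dir_flag.has_value() && !results_dir_flag->empty())
        return fs::path(*results_dir_flag);
    return fs::temp_directory_path() / "test-web-results";
}
```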
Clearing the crash callback opens a window in which the WebContent
process can crash while no callback is set. In practice, we see this with ASAN
crashes, where the crash occurs after we've already signaled that the
test has completed.
We now set the crash handler once during init. This required moving the
clearing of other callbacks to the test completion handler (we were
clearing these callbacks in several different ways anyway, so now we
will at least be consistent).
This mode allows us to test the HTTP disk cache with two mechanisms:
1. If RequestServer is launched with --http-disk-cache-mode=testing, it
will cache requests with an X-Ladybird-Enable-Disk-Cache header.
2. In test mode, RS will include an X-Ladybird-Disk-Cache-Status response
header indicating how the response was handled by the cache. There is
no standard way for a web request to know what happened with respect
to the disk cache, so this fills that hole for testing.
This mode is not exposed to users.
The function currently has two purposes: (1) Copy dependent DLLs for
executables to the output binary directory. This ensures that these
helper processes can be run after a build, since not all DLLs from vcpkg
libs get implicitly copied to the bin folder. (2) Allow fully background
and/or GUI processes to use the Windows Subsystem. This prevents
unnecessarily launching a console for the process, as we either require
no user interaction or the user interaction is all handled in the GUI.
This is used by tests to set the default time zone to UTC, because
certain tests create JavaScript Date objects, which use the current
time zone.
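The mechanism behind this option is not spelled out here. One common POSIX way to pin a process's local time zone is to set `TZ` and call tzset(). The sketch below uses that approach as an assumption, to show why pinning matters: the local hour of the same timestamp depends on the zone. `UTC0` is a POSIX TZ string (name UTC, offset 0) that does not need the tz database, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <stdlib.h>
#include <time.h>

// Pin the process-wide local time zone to UTC (POSIX setenv/tzset).
void force_utc_time_zone()
{
    setenv("TZ", "UTC0", 1);
    tzset();
}

// Local hour-of-day for a Unix timestamp, as local-time Date APIs see it.
int local_hour(std::time_t timestamp)
{
    std::tm parts {};
    localtime_r(&timestamp, &parts);
    return parts.tm_hour;
}
```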
We only supported headless clipboard management in test-web. So when WPT
tests the clipboard APIs, we would blindly try to access the Qt app,
which does not exist.
Note that the AppKit UI has no such restriction, as the NSPasteboard is
accessible even without a GUI.
When trying to repro a failed CI test, it is handy to know the order in
which test-web ran its tests. We've seen in the past that the exact
order can be important when debugging flaky tests.
This adds the index of the WebView running the test to verbose log
statements (which is enabled in CI).
Clipboard handling largely has nothing to do with the individual web
views. Rather, we interact with the system clipboard at the application
level. So let's move these implementations to the Application.
For every ref test, actual and expected screenshots are taken. These
screenshots are only needed while the individual test executes. However,
they are never freed during the run, leading to a continuous increase in
memory usage of the test runner while executing ref tests.
With the number of ref tests growing, this currently amounts to nearly 3
GB of uncompressed bitmap data being held in memory. Let's avoid that by
clearing the screenshot data at the end of each test. With this change
applied, the memory usage of test-web stays stable and below 100 MB for
the entire test run.
When a test is active in a test-web view, show the relative path to the
test instead of the view's URL. This gives a better starting point for
debugging than whatever the last loaded URL happened to be.
If no test is active, we still show the view's URL.
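A sketch of the label choice using `std::filesystem`. `view_label()` and computing the relative path against a single test root directory are assumptions, and the paths in the usage are made up.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Text shown for a view: the active test's path relative to the test
// root, or the view's URL if no test is running.
std::string view_label(std::optional<fs::path> const& active_test, fs::path const& test_root, std::string const& url)
{
    if (active_test.has_value())
        return active_test->lexically_relative(test_root).generic_string();
    return url;
}
```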
The BUILD_RPATH/INSTALL_RPATH CMake infrastructure is not supported
on Windows, but we want to ensure Ladybird executables are runnable
after the build phase so there can be an efficient dev loop.
lagom_copy_runtime_dlls() can be used by executable targets so all
their dependent dlls are copied to their output directory in their
post-build step.
We occasionally (frequently) time out in CI. If ctest triggers this
timeout, it sends a SIGSTOP followed by a SIGKILL. Since we want to
gracefully exit the test runner, ask ctest to send a SIGTERM instead.
This should cause active test status reports to show up in CI.
If you interrupt the test runner (Ctrl+C, SIGINT) or if the test runner
is gracefully being terminated (SIGTERM), it now reports the current
status of all the spawned views with their URL and, if an active test is
still being run, the time since the start of the test.
Hopefully this will help gain insight into which tests are hanging.
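A minimal POSIX sketch of the signal side, assuming a flag-based design (test-web may route signals through its own event loop instead). The handler only sets an async-signal-safe flag, and the main loop is expected to notice it and print each view's status. `install_status_report_handlers()` and `g_report_requested` are hypothetical names.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Set from the signal handler; only async-signal-safe work happens there.
volatile std::sig_atomic_t g_report_requested = 0;

static void handle_termination_signal(int)
{
    g_report_requested = 1;
}

// Install the same handler for Ctrl+C (SIGINT) and graceful termination
// (SIGTERM). The main loop would check g_report_requested and report the
// URL and elapsed time of every view's active test before exiting.
bool install_status_report_handlers()
{
    struct sigaction action {};
    action.sa_handler = handle_termination_signal;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGINT, &action, nullptr) == 0
        && sigaction(SIGTERM, &action, nullptr) == 0;
}
```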
The test logs tend to get a bit mixed together, so this makes it
possible to identify which values correspond to which test when multiple
are failing at once.
With the newly supported fuzzy matching in our test-web runner, we can
now define the expected maximum color channel and pixel count errors per
failing test and set a baseline they should not exceed.
The figures I added to these tests all come from my macOS M4 machine.
Most discrepancies seem to come from color calculations being slightly
off.
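A sketch of the two error measures and the baseline check. It assumes 8-bit RGBA pixels and counts a pixel as differing when any channel differs. The type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgba {
    std::uint8_t r, g, b, a;
};

struct FuzzyResult {
    int max_channel_difference = 0;   // largest per-channel error over all pixels
    std::size_t differing_pixels = 0; // pixels with any channel differing
};

FuzzyResult compare_pixels(std::vector<Rgba> const& actual, std::vector<Rgba> const& expected)
{
    assert(actual.size() == expected.size());
    FuzzyResult result;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        int diff = std::max({
            std::abs(actual[i].r - expected[i].r),
            std::abs(actual[i].g - expected[i].g),
            std::abs(actual[i].b - expected[i].b),
            std::abs(actual[i].a - expected[i].a),
        });
        if (diff > 0)
            ++result.differing_pixels;
        result.max_channel_difference = std::max(result.max_channel_difference, diff);
    }
    return result;
}

// The test passes if neither error exceeds its declared baseline.
bool within_fuzzy_baseline(FuzzyResult const& result, int max_channel_difference, std::size_t max_differing_pixels)
{
    return result.max_channel_difference <= max_channel_difference
        && result.differing_pixels <= max_differing_pixels;
}
```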