93 Commits

Author SHA1 Message Date
Magnus Müller
aef11dc6f1 Parallel tests 2025-10-25 09:56:33 -07:00
Magnus Müller
4db51f9e24 CI/CD test update 2025-10-25 09:12:15 -07:00
Magnus Müller
9d2e379af7 Api key 2025-10-24 02:40:35 -07:00
Magnus Müller
c1abe88048 Higher timeout + weekly chrome cache 2025-10-24 02:30:29 -07:00
Magnus Müller
9f711b4281 Reduce timeout 2025-10-24 01:41:22 -07:00
Magnus Müller
46e727f7bd Enhance CI workflow by adding conditional caching for Chromium installation
This update introduces a caching mechanism for Chromium binaries in the GitHub Actions workflow. The installation of Chromium will now only occur if it is not already cached, reducing unnecessary downloads and speeding up the CI process. This change aims to optimize the workflow efficiency, particularly during parallel test runs.
2025-10-24 01:28:39 -07:00
Magnus Müller
e6cb8e7587 Add explicit uv package caching to speed up CI
The built-in astral-sh/setup-uv cache wasn't working properly,
causing 2m+ downloads of Python packages on every test run.

Added explicit ~/.cache/uv caching keyed on uv.lock hash.

Before: 2m 9s downloading packages (numpy, oci, imageio-ffmpeg, etc)
After: ~5s restoring from cache

Saves ~2 minutes per test job × 40 parallel jobs = 80 runner-minutes saved
2025-10-24 01:25:03 -07:00
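The idea of keying a cache on the hash of uv.lock can be sketched as below. This is only an illustration, not the workflow's actual configuration: the `cache_key` helper, the `uv-cache` prefix, and the stand-in lock contents are all hypothetical. The point is that the key stays the same while the lock file is unchanged and changes as soon as any dependency pin changes.

```python
import hashlib


def cache_key(prefix: str, lock_contents: bytes) -> str:
    """Build a cache key that changes whenever the lock file contents change."""
    digest = hashlib.sha256(lock_contents).hexdigest()
    return f"{prefix}-{digest}"


# Stand-in bytes, not the real uv.lock format (assumption for illustration).
old_key = cache_key("uv-cache", b"numpy==2.0.0\n")
same_key = cache_key("uv-cache", b"numpy==2.0.0\n")
new_key = cache_key("uv-cache", b"numpy==2.1.0\n")

print(old_key == same_key)  # unchanged lock file: cache hit
print(old_key == new_key)   # changed lock file: cache miss
```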
Magnus Müller
2c7d3372e1 Add setup-chromium job to pre-cache chromium before parallel test runs
Prevents 40 parallel runners from racing to install chromium simultaneously on cache miss.

Before: 40 runners × 2min = 80 runner-minutes wasted on first run
After: 1 runner installs (2min), then 40 runners use cached version (10s each)

Savings on cache miss: ~78 runner-minutes per workflow run
2025-10-24 01:19:06 -07:00
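The runner-minute arithmetic in this commit message can be checked with a short sketch. The function name and parameters are hypothetical; the inputs (40 runners, 2 minutes per install, 10 seconds per cache restore) are the figures quoted in the message. The ~78 figure corresponds to ignoring restore time (80 - 2); counting the 10-second restores gives a somewhat smaller saving.

```python
def runner_minutes_on_cache_miss(runners: int, install_min: float, restore_sec: float):
    """Return (before, after) runner-minutes for one workflow run on a cache miss."""
    # Before: every parallel runner installs Chromium itself.
    before = runners * install_min
    # After: one setup job installs, then every runner restores from cache.
    after = install_min + runners * restore_sec / 60
    return before, after


before, after = runner_minutes_on_cache_miss(runners=40, install_min=2.0, restore_sec=10.0)
print(f"before: {before:.1f} min")
print(f"after: {after:.1f} min")
print(f"saved, ignoring restore time: {before - 2.0:.1f} min")
print(f"saved, counting restore time: {before - after:.1f} min")
```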
Magnus Müller
a2b0f084be Refactor GitHub workflows to remove Playwright version detection and update cache keys for Chromium binaries based on uv.lock file. 2025-10-24 01:07:39 -07:00
Magnus Müller
2ca0ba38ec Bug playwright not found 2025-10-24 01:03:36 -07:00
Magnus Müller
633fb53b1a Enhance GitHub workflows by adding Playwright version detection and updating cache keys for Chromium binaries based on the installed Playwright version. 2025-10-24 01:00:26 -07:00
Magnus Müller
aba7e2cbae cancel tests on new commit 2025-09-08 08:52:25 -07:00
Magnus Müller
6bba023d38 Reduce timeout for GitHub Actions tests from 15 to 10 minutes to prevent hanging. 2025-09-02 00:24:09 -07:00
Magnus Müller
172951b8cd Fix GitHub Actions workflow hanging: add timeouts and force fresh checkout
- Add timeout-minutes to prevent jobs from hanging indefinitely
- Force fresh checkout with fetch-depth: 1 to avoid cache issues
- Add file existence check to handle renamed/deleted tests gracefully
- Add debug output to track test discovery process
2025-09-02 00:20:54 -07:00
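A file existence check of the kind described in this commit might look like the sketch below. It is not the workflow's actual code: `existing_tests` and the file names are hypothetical, and the real check may well live in a shell step. It shows the general pattern of dropping renamed or deleted test files from a discovered list while printing debug output for each skip.

```python
import tempfile
from pathlib import Path


def existing_tests(candidates):
    """Keep only test files that still exist; report the ones that are skipped."""
    kept = []
    for name in candidates:
        if Path(name).is_file():
            kept.append(name)
        else:
            # Debug output so a skipped test is visible in the CI log.
            print(f"skipping missing test file: {name}")
    return kept


# Small demonstration with one real file and one that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "test_present.py"
    present.write_text("def test_ok():\n    assert True\n")
    missing = Path(tmp) / "test_renamed.py"
    print(existing_tests([str(present), str(missing)]))
```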
Magnus Müller
dd6a187fc0 Type 2025-09-02 00:14:20 -07:00
Magnus Müller
616c81e435 Skip nonexistent tests 2025-09-02 00:14:06 -07:00
Magnus Müller
723c68c20c Test fails if 0% 2025-08-29 18:38:24 -07:00
Magnus Müller
657ea42efd Remove Playwright/Patchright dependencies - use uvx playwright install for browser only
- Updated test.yaml to use uvx playwright install chromium --with-deps --no-shell
- Removed Chrome stable installation and patchright dependencies
- Updated Dockerfile to use uvx playwright temporarily for browser installation
- Updated chromium Dockerfile to use uvx playwright without permanent dependency
- Simplified CI pipeline while maintaining browser functionality
2025-08-26 17:27:41 -07:00
Magnus Müller
caa0e7ef1b Rename controller to tools instances 2025-08-26 11:30:39 -07:00
Nick Sweeting
f5925b1080 fix browser-use extensions dir to use .config/browseruse like everything else and cache in actions 2025-07-29 13:04:44 -07:00
Nick Sweeting
fd07360f57 simpler page title fetching 2025-07-10 18:35:24 -07:00
Nick Sweeting
cdca6339f6 try all browsers for evals 2025-07-10 16:24:47 -07:00
Nick Sweeting
4b4b93f6cc tweak chrome used for test.yaml evaluate_Tasks 2025-07-09 15:32:10 -07:00
Nick Sweeting
3bd76cea98 tweak chrome used for test.yaml evaluate_Tasks 2025-07-09 15:26:07 -07:00
Nick Sweeting
435426fc9a bump cache action version 2025-07-08 18:40:36 -07:00
Nick Sweeting
b4a8776fec speed up chrome install in CI 2025-07-08 18:19:08 -07:00
Nick Sweeting
1fa7fee4f6 fix cache key for tests 2025-07-08 18:10:14 -07:00
Nick Sweeting
7cf6a26664 fix flipped order 2025-07-08 18:07:43 -07:00
Nick Sweeting
1c6b510f07 use sudo for curl to update 2025-07-08 18:05:20 -07:00
Nick Sweeting
28f0d4d401 use runner arch in cache key 2025-07-08 18:03:11 -07:00
Nick Sweeting
b206db41a1 use consistent bin name 2025-07-08 18:01:24 -07:00
Nick Sweeting
32e5430b62 only cache actual binary 2025-07-08 18:00:55 -07:00
Nick Sweeting
14030006db fix missing sudo 2025-07-08 17:57:18 -07:00
Nick Sweeting
4599f815f2 try to cache chrome apt package 2025-07-08 17:55:44 -07:00
Nick Sweeting
3f84d1c460 set in_docker in evals 2025-07-08 06:09:30 -07:00
Nick Sweeting
4d8bdb3dbf install all browser versions for evals and tests 2025-07-08 06:05:20 -07:00
Nick Sweeting
7403c33be3 fix user-data-dir matching 2025-07-08 05:05:49 -07:00
Magnus Müller
8ed1f6cb88 Update failing test 2025-07-08 13:43:12 +02:00
Nick Sweeting
fe3af9479a run only one proc for tests for now 2025-06-27 03:02:03 -07:00
Nick Sweeting
74d02c07a7 increase screenshot timeout to default timeout 2025-06-26 02:59:57 -07:00
Nick Sweeting
170c3e0bb7 remove duplicate timeout config 2025-06-26 01:24:03 -07:00
Nick Sweeting
7c317e9515 always group tests by class so they can reuse fixtures 2025-06-25 23:31:02 -07:00
Nick Sweeting
e3d21d33a1 fix evaluate_tasks.py errors in CI 2025-06-16 17:20:27 -07:00
Nick Sweeting
06488e11ba fix clickaction error handling test 2025-06-11 00:05:58 -07:00
Nick Sweeting
1fd8e0ec92 try statuses-write option 2025-06-10 23:57:41 -07:00
Nick Sweeting
ffd36eb5da tweak env vars for CI 2025-06-10 03:58:37 -07:00
Magnus Müller
eaab9f04d7 Enhance GitHub Actions workflow and evaluate_tasks.py to include detailed task evaluation results. The workflow now captures and displays detailed results in a structured format, while the Python script outputs detailed results as JSON for better integration with GitHub Actions. This improves visibility and understanding of task outcomes in the evaluation process. 2025-06-07 13:39:16 +02:00
Magnus Müller
3ecee462a2 Update GitHub Actions workflow permissions to allow writing comments on pull requests and issues, enhancing interaction capabilities for automated testing processes. 2025-06-07 13:22:43 +02:00
Magnus Müller
bdf29c34fb Add PR comment functionality to GitHub Actions workflow for agent task evaluation results. This includes a summary of passed tasks, percentage score, and status emoji based on task outcomes, enhancing visibility of evaluation results directly in pull requests. 2025-06-07 13:16:30 +02:00
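A summary line with a passed count, percentage score, and status emoji could be built roughly as follows. This is a sketch under assumptions: `format_summary`, the wording of the line, and the 80% / 50% thresholds are hypothetical and not taken from the workflow.

```python
def format_summary(passed: int, total: int) -> str:
    """Format a one-line evaluation summary for a PR comment."""
    # Guard against division by zero when no tasks ran.
    percent = 100.0 * passed / total if total else 0.0
    # Thresholds are illustrative assumptions, not the workflow's values.
    if percent >= 80:
        emoji = "✅"
    elif percent >= 50:
        emoji = "🟡"
    else:
        emoji = "❌"
    return f"{emoji} Agent tasks: {passed}/{total} passed ({percent:.0f}%)"


print(format_summary(8, 10))
print(format_summary(0, 10))
```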
Magnus Müller
8d9b24b03a Add summary output for agent tasks evaluation in CI workflow 2025-06-07 11:27:06 +02:00