Commit Graph

4515 Commits

Author SHA1 Message Date
Nick Sweeting
b061e9bc22 limit navigation time for tests so they fail faster 2025-07-10 18:40:08 -07:00
Nick Sweeting
fd07360f57 simpler page title fetching 2025-07-10 18:35:24 -07:00
Nick Sweeting
848fdf8aad clearer exception handling 2025-07-10 18:26:08 -07:00
Nick Sweeting
3cb45f1c8e don't include crashed tabs in list for agent 2025-07-10 18:05:53 -07:00
Nick Sweeting
4f55a0e916 bump sleep time 2025-07-10 18:02:54 -07:00
Nick Sweeting
c96af9db7f allow more time in recovery 2025-07-10 18:02:30 -07:00
Nick Sweeting
f2091a86d5 make screenshot tests parallel more lax 2025-07-10 17:57:09 -07:00
Nick Sweeting
275a848d12 force-close pages that have crashed opportunistically 2025-07-10 17:48:47 -07:00
Nick Sweeting
651ae9101b force page list refresh after force CDP close of page 2025-07-10 17:29:19 -07:00
Nick Sweeting
b05da141b9 lax timeouts for pageload for easier tests 2025-07-10 17:19:14 -07:00
Nick Sweeting
5f9120338d Merge branch 'main' into cdp-loading 2025-07-10 16:58:51 -07:00
Nick Sweeting
c321656ffe use 1px white png for about:blank screenshots 2025-07-10 16:58:24 -07:00
Nick Sweeting
253f7cff62 use 1px white png for about:blank screenshots 2025-07-10 16:54:49 -07:00
Saurav Panda
22da5225ac updated gmail example with oauth setup guide (#2393)

## Summary by cubic
Updated the Gmail 2FA integration example to include a step-by-step
OAuth setup guide and a new grant mechanism for handling missing or
invalid credentials.

- **New Features**
  - Added GmailGrantManager for interactive OAuth credential setup,
    validation, and recovery.
  - Improved error handling with clear user guidance and fallback
    authentication flows.
  - Expanded example to demonstrate 2FA code detection and a complete
    login flow.

2025-07-10 16:51:09 -07:00
Nick Sweeting
317b2d1fe3 allow agent to attempt to use crashed pages 2025-07-10 16:30:58 -07:00
Nick Sweeting
cdca6339f6 try all browsers for evals 2025-07-10 16:24:47 -07:00
Saurav Panda
da88567a87 updated gmail example with oauth setup guide 2025-07-10 16:23:13 -07:00
Nick Sweeting
d66cf32615 tweak nav error handling and logging 2025-07-10 16:22:18 -07:00
Nick Sweeting
fad1fb38a9 hard cap on nav 2025-07-10 06:29:22 -07:00
Nick Sweeting
ed528276af fix test 2025-07-10 06:12:48 -07:00
Nick Sweeting
cb4f6be2a8 Merge branch 'main' into cdp-loading 2025-07-10 05:55:01 -07:00
Nick Sweeting
5a15642d6a crashed page recovery 2025-07-10 05:17:45 -07:00
Gregor Žunič
072b877e7e Openrouter evals support (#2391)

## Summary by cubic
Added support for evaluating OpenRouter models by enabling wildcard
model matching and updating environment variable handling.

- **New Features**
  - Added OpenRouter wildcard pattern to supported models.
  - Updated model config logic to handle wildcard matches.
  - Included OpenRouter API key in workflow environment.
  - Added an example script for running OpenRouter models.

2025-07-10 13:13:15 +02:00
Gregor Žunič
1eac01dfab openrouter evals support 2025-07-10 12:51:52 +02:00
Gregor Žunič
b850a43a4f added grok-4 2025-07-10 11:00:17 +02:00
Alexander Yue
ea6b46600a add 2FA usage into the auth prompt injection section (#2389)

## Summary by cubic
Updated the authentication prompt to include instructions for handling
2FA and OTP codes using the get_recent_emails action.

2025-07-09 18:37:43 -07:00
Alexander Yue
4fa0d74f61 Merge branch 'main' into 2FA-prompt 2025-07-09 18:26:25 -07:00
Alezander9
bb7db834d5 add 2FA usage into the auth prompt injection section 2025-07-09 18:24:27 -07:00
Alexander Yue
6f7a813698 always tell agent to not use credentials unless necessary when provided credentials in evals (#2388)
    
---

## Summary by cubic
Updated agent instructions to only use provided credentials if required
for the task, reducing unnecessary logins.

2025-07-09 18:16:19 -07:00
Alezander9
e035d490c3 always tell agent to not use credentials unless necessary when provided credentials in evals 2025-07-09 18:03:49 -07:00
Nick Sweeting
4b4b93f6cc tweak chrome used for test.yaml evaluate_Tasks 2025-07-09 15:32:10 -07:00
Nick Sweeting
3bd76cea98 tweak chrome used for test.yaml evaluate_Tasks 2025-07-09 15:26:07 -07:00
Nick Sweeting
1f698ff1c2 handle timeouts during JS execution 2025-07-09 15:05:33 -07:00
Nick Sweeting
1fb1254fe3 handle timeouts during JS execution 2025-07-09 15:04:50 -07:00
Nick Sweeting
ab0f9cae8a lower initial nav timeout because we wait for loading later on 2025-07-09 14:54:02 -07:00
Nick Sweeting
e461380998 switch evaluate_tasks back to patchright+chrome 2025-07-09 14:51:42 -07:00
Nick Sweeting
d1f7af3636 better logging of crashed pages 2025-07-09 14:50:36 -07:00
Nick Sweeting
bdf4f3caab remove duplicate prepare_user_data_dir 2025-07-09 14:42:29 -07:00
Nick Sweeting
6f2e60d9a9 assert pages are still usable after skipping loading 2025-07-09 14:41:08 -07:00
Nick Sweeting
f81c381a65 loading animation never blocks 2025-07-09 14:19:25 -07:00
Nick Sweeting
73fe288370 only warn storage_state conflict with user_data_dir when not tmp user_data_dir 2025-07-09 14:00:17 -07:00
Nick Sweeting
c0b139e8e9 ignore nav during nav 2025-07-09 13:29:50 -07:00
Magnus Müller
91a61e9c34 eval remove laminar args (#2384)
Auto-generated PR for: eval remove laminar args
    
---

## Summary by cubic
Removed the laminar logging level argument from the eval workflow to
simplify command arguments.

2025-07-09 20:02:27 +02:00
Magnus Müller
bc5ff33b09 eval remove laminar args 2025-07-09 20:02:00 +02:00
Magnus Müller
359a83dda3 eval laminar key name (#2382)
Auto-generated PR for: eval laminar key name
    
---

## Summary by cubic
Updated the eval workflow to use LAMINAR_LOGGING_LEVEL instead of
BROWSER_LOGGING_LEVEL for logging configuration. This ensures the
correct environment variable and script argument are set during
evaluation runs.

2025-07-09 18:01:21 +02:00
Magnus Müller
f5763bdf49 add browser use logging 2025-07-09 18:01:09 +02:00
Magnus Müller
db3fa28442 eval laminar key name 2025-07-09 17:59:47 +02:00
Magnus Müller
559afe938b Ignore about:blank tab in context (#2374)

## Summary by cubic
Fixed a bug where tabs with real websites were incorrectly reported as
"about:blank" and marked as unusable. Now, only true about:blank tabs
are ignored, and tabs with valid URLs show a fallback title if the page
title can't be retrieved.

- **Bug Fixes**
  - Preserves the real URL for tabs when title retrieval fails.
  - Uses a descriptive fallback title instead of marking valid tabs as
    unusable.

2025-07-09 16:53:07 +02:00
Magnus Müller
418802c810 Merge branch 'main' into cursor/ignore-about-blank-tab-in-context-d397 2025-07-09 16:52:27 +02:00
Magnus Müller
53d595d967 change laminar debug secret (#2378)
Auto-generated PR for: change laminar debug secret
    
---

## Summary by cubic
Switched debug mode checks from BROWSER_USE_LOGGING_LEVEL to
LMNR_LOGGING_LEVEL and removed unused verbose observability logging.

2025-07-09 16:51:53 +02:00