Nick Sweeting
|
2c9eaf700a
|
Update claude.yml
|
2025-07-15 18:42:37 -04:00 |
|
Nick Sweeting
|
f46fbc6ce5
|
Update claude.yml to use opus
|
2025-07-15 17:51:33 -04:00 |
|
Nick Sweeting
|
464a51512f
|
more claude code permissions tweaks and set up python env
|
2025-07-15 17:42:52 -04:00 |
|
Nick Sweeting
|
520fc3abc4
|
Give claude code action more bash permissions
|
2025-07-15 17:29:09 -04:00 |
|
Alexander Yue
|
4a2e52ab36
|
Merge branch 'main' into move-eval
|
2025-07-11 22:36:41 -07:00 |
|
Alezander9
|
883c47bb04
|
move eval workflow
|
2025-07-11 22:35:47 -07:00 |
|
Nick Sweeting
|
fd07360f57
|
simpler page title fetching
|
2025-07-10 18:35:24 -07:00 |
|
Nick Sweeting
|
cdca6339f6
|
try all browsers for evals
|
2025-07-10 16:24:47 -07:00 |
|
Nick Sweeting
|
cb4f6be2a8
|
Merge branch 'main' into cdp-loading
|
2025-07-10 05:55:01 -07:00 |
|
Gregor Žunič
|
1eac01dfab
|
openrouter evals support
|
2025-07-10 12:51:52 +02:00 |
|
Nick Sweeting
|
4b4b93f6cc
|
tweak chrome used for test.yaml evaluate_Tasks
|
2025-07-09 15:32:10 -07:00 |
|
Nick Sweeting
|
3bd76cea98
|
tweak chrome used for test.yaml evaluate_Tasks
|
2025-07-09 15:26:07 -07:00 |
|
Magnus Müller
|
bc5ff33b09
|
eval remove laminar args
|
2025-07-09 20:02:00 +02:00 |
|
Magnus Müller
|
f5763bdf49
|
add browser use logging
|
2025-07-09 18:01:09 +02:00 |
|
Magnus Müller
|
db3fa28442
|
eval laminar key name
|
2025-07-09 17:59:47 +02:00 |
|
Nick Sweeting
|
435426fc9a
|
bump cache action version
|
2025-07-08 18:40:36 -07:00 |
|
Nick Sweeting
|
b4a8776fec
|
speed up chrome install in CI
|
2025-07-08 18:19:08 -07:00 |
|
Nick Sweeting
|
1fa7fee4f6
|
fix cache key for tests
|
2025-07-08 18:10:14 -07:00 |
|
Nick Sweeting
|
7cf6a26664
|
fix flipped order
|
2025-07-08 18:07:43 -07:00 |
|
Nick Sweeting
|
1c6b510f07
|
use sudo for curl to update
|
2025-07-08 18:05:20 -07:00 |
|
Nick Sweeting
|
28f0d4d401
|
use runner arch in cache key
|
2025-07-08 18:03:11 -07:00 |
|
Nick Sweeting
|
b206db41a1
|
use consistent bin name
|
2025-07-08 18:01:24 -07:00 |
|
Nick Sweeting
|
32e5430b62
|
only cache actual binary
|
2025-07-08 18:00:55 -07:00 |
|
Nick Sweeting
|
14030006db
|
fix missing sudo
|
2025-07-08 17:57:18 -07:00 |
|
Nick Sweeting
|
4599f815f2
|
try to cache chrome apt package
|
2025-07-08 17:55:44 -07:00 |
|
Aitor
|
e409c36fd7
|
feat: forward unikraft secrets to the eval workflow .yaml
|
2025-07-08 17:40:59 +02:00 |
|
Nick Sweeting
|
3f84d1c460
|
set in_docker in evals
|
2025-07-08 06:09:30 -07:00 |
|
Nick Sweeting
|
4d8bdb3dbf
|
install all browser versions for evals and tests
|
2025-07-08 06:05:20 -07:00 |
|
Nick Sweeting
|
7403c33be3
|
fix user-data-dir matching
|
2025-07-08 05:05:49 -07:00 |
|
Magnus Müller
|
8ed1f6cb88
|
Update failing test
|
2025-07-08 13:43:12 +02:00 |
|
Nick Sweeting
|
fdba54fb34
|
add pyright to pre-commit hooks
|
2025-07-07 18:03:55 -07:00 |
|
reformedot
|
b7fa04d336
|
feat: add parameters to remove images and css in the eval.yaml
|
2025-07-07 16:26:54 +02:00 |
|
Aitor
|
d032a1ec61
|
fix: update eval.yaml to use full HD screen resolution
|
2025-07-07 09:44:50 +02:00 |
|
reformedot
|
9de712d702
|
feat: added browser settings to browser profile
|
2025-07-06 20:00:06 +02:00 |
|
Mert Unsal
|
1124e82cd3
|
Merge branch 'main' into mert/fix_encoding
|
2025-07-06 18:15:22 +02:00 |
|
mertunsall
|
3bec6fc9bf
|
add qwen2.5-vl-72b-instruct into evals
|
2025-07-06 18:10:13 +02:00 |
|
Magnus Müller
|
2b7367677f
|
eval-log-level
|
2025-07-06 17:53:49 +02:00 |
|
Magnus Müller
|
25392d9cde
|
add multiple last screenshots to llm input message
|
2025-07-06 13:50:31 +02:00 |
|
Magnus Müller
|
0612eb0aae
|
eval-repeat-judge
|
2025-07-06 12:25:13 +02:00 |
|
reformedot
|
8c5672b1ef
|
feat: added support for Browserbase and Hyperbrowser as available browsers in the eval
fix: revert example change
|
2025-07-05 16:20:24 +02:00 |
|
reformedot
|
8754e22ce3
|
feat: added browser arg to the eval script
|
2025-07-04 23:40:12 +02:00 |
|
Saurav Panda
|
8cf64699ad
|
refc: removed debug logs from eval.yaml
|
2025-07-04 03:21:16 -07:00 |
|
Saurav Panda
|
f3fa86ea21
|
feat: added 2fa token parsing logic
|
2025-07-04 02:58:04 -07:00 |
|
Saurav Panda
|
d87380b643
|
debugging: gmail 2fa json data
|
2025-07-04 01:29:53 -07:00 |
|
Saurav Panda
|
4ed5d96ef5
|
updated eval with toJson mapping
|
2025-07-04 01:17:43 -07:00 |
|
Saurav Panda
|
bbfbcebd6e
|
feat: added multi credential support
|
2025-07-04 00:15:38 -07:00 |
|
Saurav Panda
|
c7fedf5117
|
Merge branch 'main' into 2fa_gmail_integration
|
2025-07-03 22:03:35 -07:00 |
|
Magnus Müller
|
fc8d6b1c14
|
eval enable debug
|
2025-07-03 23:33:25 +02:00 |
|
Saurav Panda
|
2cd21e18e6
|
feat: removed debug from evals
|
2025-07-03 10:19:37 -07:00 |
|
Saurav Panda
|
20f66b9fc7
|
Update eval.yaml
|
2025-07-03 00:11:39 -07:00 |
|