browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Author	SHA1	Message	Date
Alezander9	d9f1fb7bb4	track and report repo that code was run on in evals	2025-06-03 14:07:42 -07:00
Alezander9	61f9c31a3d	feat: support changing eval task set	2025-06-03 10:27:34 -07:00
Alezander9	4c7e173b62	add logs to track semaphore management	2025-05-24 12:49:04 -07:00
Alezander9	25b9c44bc3	fix issue where out of place return statement could skip server upload function	2025-05-24 12:03:07 -07:00
Alezander9	38c7307169	add more logs to track down missing tasks in pipeline	2025-05-24 10:25:22 -07:00
Alezander9	a3dd8b004b	update eval workflow with new arguments	2025-05-23 14:46:14 -07:00
Alezander9	3cfda361d1	move navigating to starting url code into the browser session setup	2025-05-23 11:44:04 -07:00
Alezander9	620fe6d254	add new layers of stage specific exception handlers	2025-05-23 11:35:12 -07:00
Alezander9	8233338d1f	new fix for gpt-o4-mini	2025-05-22 16:46:26 -07:00
Alezander9	c2d1ec50dc	refactor run task function and exception handling to be more elegant, and add fix for gpt-o4-mini	2025-05-22 16:42:28 -07:00
Alezander9	b8c453ee47	add gpt o4 mini	2025-05-22 16:03:57 -07:00
Alezander9	5f5a0931fe	add claude 4 opus	2025-05-22 11:07:34 -07:00
Alezander9	4a7e9113ca	add claude 4 support and cleanup eval script arguments	2025-05-22 10:54:00 -07:00
Alezander9	bd4091fd1a	remove old debug messages	2025-05-21 16:01:22 -07:00
Alezander9	a68b371ab5	remove old code and add timeouts to agent runs to prevent possible github action infinite stall	2025-05-21 14:50:23 -07:00
Alezander9	6b15840dc6	remove old model specific eval files	2025-05-21 14:43:18 -07:00
Alezander9	a8d661b2d0	consolidated changes: adapt refactored eval service to work with new browser and on github actions	2025-05-21 14:36:57 -07:00
Nick Sweeting	3e66046046	linter	2025-05-13 17:18:50 -07:00
Shoya SHIRAKI	4f117a5956	fix(eval): update GOOGLE_API_KEY comment to GEMINI_API_KEY	2025-05-03 09:16:57 +09:00
Nick Sweeting	2be4ba4f70	more pyupgrade changes	2025-05-02 20:50:21 +08:00
Magnus Müller	db29a1c9d5	Track the source with a parameter	2025-05-01 20:04:31 +08:00
Parth A. Patel	5250aefe58	fix: temperature can not be zero for reasoning models	2025-04-29 01:45:21 -07:00
Parth A. Patel	3860d526d4	eval: adding o4 mini support	2025-04-29 01:38:34 -07:00
Christian Clauss	fb3282527d	Detect blocking synchronous commands in asyncio code	2025-04-21 22:30:43 +02:00
Nick Sweeting	25d067d67b	Merge pull request #1421 from Alezander9/connect-eval-tool	2025-04-21 12:45:19 -07:00
Nick Sweeting	26d869c4f1	Merge branch 'main' into improve-eval-tool	2025-04-21 11:18:45 -07:00
Alezander9	aeaef4af57	feat: add gpt-4o-mini	2025-04-21 10:24:22 -07:00
Alezander9	8449b11ae0	refactored run task to more elegantly handle various failure modes and still send results to server	2025-04-19 15:28:05 -07:00
Alezander9	bc95b0e2fa	feat: add user message as argument, pass total tasks into run info sent to server	2025-04-19 11:58:33 -07:00
Alezander9	0890da1eb3	feat: clear old files on new eval run unless fresh start is disabled	2025-04-19 11:15:20 -07:00
dha-aa	9b51ddd773	upgrade Grok model to for improved capabilities	2025-04-19 01:37:18 +00:00
Alezander9	8cf206997c	addressed feedback from mrge bot	2025-04-18 10:57:30 -07:00
Alezander9	4b0ebc3189	feat: update service to improve efficiency, fetch task via server and post results to server	2025-04-18 10:18:07 -07:00
Alezander9	8e07e31bb7	feat: add --no-vision as an argument to disable vision	2025-04-17 19:58:09 -07:00
Alezander9	057a69e298	feat: allow eval script to use any model for BU agent, add vision disabling for XAI models, test all models work in eval script	2025-04-17 19:51:50 -07:00
Alezander9	a386390652	address feedback from PR	2025-04-16 14:20:36 -07:00
Alezander9	a999325181	feat: add new evaluation script	2025-04-16 12:09:57 -07:00
Parth A. Patel	0c4a1ee0f9	nit: rename eval file to correct model name	2025-04-14 15:50:12 -07:00
Parth A. Patel	e448513f9d	evals: more models for evals	2025-04-14 15:48:54 -07:00
lorenss-m	2615e1286d	custom browser addition	2025-04-04 18:42:49 -07:00
Nick Sweeting	fb6fa259a8	apply ruff safe fixes	2025-03-28 18:11:36 -07:00
Nick Sweeting	e85e8f468d	add more pre-commit-hooks	2025-03-28 17:15:10 -07:00
Nick Sweeting	ec56bfe81b	run ruff on the entire codebase	2025-03-28 01:22:48 -07:00
Siddhant Somani	1f9386d636	Add grok eval	2025-03-15 18:09:27 -07:00
Magnus Müller	f2f8cf850d	Eval models	2025-02-23 09:15:28 -08:00
Magnus Müller	de57b4f55c	More eval examples	2025-02-22 08:51:55 -08:00
Magnus Müller	688a465fc5	Include test for no bounding-boxes	2025-02-21 16:18:57 -08:00
Magnus Müller	01a01312d6	Eval for claude and no-vision	2025-02-20 19:40:50 -08:00
Magnus Müller	76e4c2630d	Update gpt-4o.py	2025-02-16 02:25:04 +01:00
magmueller	7fb04ed9f1	Eval file	2025-02-10 16:20:14 -08:00

50 Commits