browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-05-13 17:56:35 +02:00

Author	SHA1	Message	Date
Magnus Müller	0f9ffa1072	Set user_data_dir to None	2025-06-20 10:38:01 +02:00
Magnus Müller	f89a97f256	Delete laminar files since it is integrated to the main eval service	2025-06-19 11:26:53 +02:00
Magnus Müller	2c627a93fc	Fix event loop error	2025-06-18 23:55:27 +02:00
Magnus Müller	5b4b3dd5aa	Integrate Laminar tracing and enhance evaluation workflow - Added integration for Laminar tracing by initializing `AsyncLaminarClient` and creating evaluation links during task execution. - Updated `TaskResult` class to include an optional `laminar_task_link` for task-specific links. - Enhanced logging for task results and added error handling for Laminar datapoint creation. - Improved the `run_task_with_semaphore` function to manage Laminar evaluation links and update datapoints with evaluation scores. These changes aim to streamline the evaluation process and improve tracking of task performance.	2025-06-18 23:27:30 +02:00
Magnus Müller	081f5747c6	Refactor run_task_with_semaphore to use laminar_task_link - Renamed variable `link` to `laminar_task_link` for clarity in the `run_task_with_semaphore` function. - Updated the creation of `TaskResult` to utilize `laminar_task_link` instead of the previous `link` variable. - Improved logging to reflect the new variable name, enhancing readability and maintainability. These changes aim to improve code clarity and maintain consistency in the evaluation workflow.	2025-06-18 22:41:56 +02:00
Magnus Müller	158aa5b719	Enhance evaluation workflow in laminar_eval.py and update dependencies - Removed the `lmnr[all]` dependency from `pyproject.toml`. - Added `browser-use[dev,eval]` to `dev-dependencies` for improved development support. - Updated `TaskResult` class to include an optional `laminar_task_link` for task-specific links. - Modified `run_task_with_semaphore` to handle Laminar evaluation links and improved logging for task results. - Added logic to create a Laminar evaluation link during task execution. These changes aim to streamline the evaluation process and enhance the overall functionality of the evaluation workflow.	2025-06-18 22:41:05 +02:00
Magnus Müller	9547ebb3bb	Refactor laminar_eval.py to enhance task evaluation workflow - Updated `run_task_with_semaphore` to use `lmnr_run_id` for evaluation ID instead of `run_id`. - Added a new helper function `start_new_run` to initiate evaluation runs on the server. - Improved logging for task results and server interactions. - Ensured proper handling of environment variables for server configuration. This refactor aims to streamline the evaluation process and improve error handling.	2025-06-18 21:04:47 +02:00
Robert Kim	3548d0fe35	lint	2025-06-18 14:55:00 +01:00
Magnus Müller	3e796a5773	Merge branch 'main' into laminar_evals	2025-06-18 15:34:26 +02:00
Robert Kim	a0d6a08119	v0	2025-06-18 14:21:48 +01:00
Magnus Müller	eb2aabb7f8	Remove duplicate	2025-06-17 19:21:29 +02:00
Magnus Müller	c66880a8fa	Add lmnr package for tracing integration and update eval workflow	2025-06-17 18:58:56 +02:00
Nick Sweeting	86abd92b79	reset timeouts in evals back to faster defaults	2025-06-11 01:46:41 -07:00
Nick Sweeting	520bbbe6c0	Update eval/service.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-06-11 02:28:13 -04:00
Nick Sweeting	1c64c483d9	allow https errors on evals because they include http urls for some reason	2025-06-10 06:27:35 -07:00
Nick Sweeting	ed6db5802b	fix timeouts order of magnitude	2025-06-10 05:42:23 -07:00
Nick Sweeting	a0ee5de2ad	tweak browsersession timeouts	2025-06-10 05:18:56 -07:00
Alezander9	8e7663758d	modify service to accept tasks with login cookies	2025-06-09 22:09:48 -07:00
Alezander9	739a3b2c87	add flag to toggle whether webjudge gets to see final agent result	2025-06-08 17:49:05 -07:00
Alezander9	bc56c73fe7	add ability to append results to existing run, so we can parallelize	2025-06-07 21:47:47 -07:00
Alezander9	b1cc677b01	add samba nova models	2025-06-06 17:38:36 -07:00
Alezander9	d9f1fb7bb4	track and report repo that code was run on in evals	2025-06-03 14:07:42 -07:00
Alezander9	61f9c31a3d	feat: support changing eval task set	2025-06-03 10:27:34 -07:00
Alezander9	4c7e173b62	add logs to track semaphore management	2025-05-24 12:49:04 -07:00
Alezander9	25b9c44bc3	fix issue where out of place return statement could skip server upload function	2025-05-24 12:03:07 -07:00
Alezander9	38c7307169	add more logs to track down missing tasks in pipeline	2025-05-24 10:25:22 -07:00
Alezander9	a3dd8b004b	update eval workflow with new arguments	2025-05-23 14:46:14 -07:00
Alezander9	3cfda361d1	move navigating to starting url code into the browser session setup	2025-05-23 11:44:04 -07:00
Alezander9	620fe6d254	add new layers of stage specific exception handlers	2025-05-23 11:35:12 -07:00
Alezander9	8233338d1f	new fix for gpt-o4-mini	2025-05-22 16:46:26 -07:00
Alezander9	c2d1ec50dc	refactor run task function and exception handling to be more elegant, and add fix for gpt-o4-mini	2025-05-22 16:42:28 -07:00
Alezander9	b8c453ee47	add gpt o4 mini	2025-05-22 16:03:57 -07:00
Alezander9	5f5a0931fe	add claude 4 opus	2025-05-22 11:07:34 -07:00
Alezander9	4a7e9113ca	add claude 4 support and cleanup eval script arguments	2025-05-22 10:54:00 -07:00
Alezander9	bd4091fd1a	remove old debug messages	2025-05-21 16:01:22 -07:00
Alezander9	a68b371ab5	remove old code and add timeouts to agent runs to prevent possible github action infinite stall	2025-05-21 14:50:23 -07:00
Alezander9	6b15840dc6	remove old model specific eval files	2025-05-21 14:43:18 -07:00
Alezander9	a8d661b2d0	consolidated changes: adapt refactored eval service to work with new browser and on github actions	2025-05-21 14:36:57 -07:00
Nick Sweeting	3e66046046	linter	2025-05-13 17:18:50 -07:00
Shoya SHIRAKI	4f117a5956	fix(eval): update GOOGLE_API_KEY comment to GEMINI_API_KEY	2025-05-03 09:16:57 +09:00
Nick Sweeting	2be4ba4f70	more pyupgrade changes	2025-05-02 20:50:21 +08:00
Magnus Müller	db29a1c9d5	Track the source with an parameter	2025-05-01 20:04:31 +08:00
Parth A. Patel	5250aefe58	fix: temperature can not be zero for reasoning models	2025-04-29 01:45:21 -07:00
Parth A. Patel	3860d526d4	eval: adding o4 mini support	2025-04-29 01:38:34 -07:00
Christian Clauss	fb3282527d	Detect blocking synchronous commands in asyncio code	2025-04-21 22:30:43 +02:00
Nick Sweeting	25d067d67b	Merge pull request #1421 from Alezander9/connect-eval-tool	2025-04-21 12:45:19 -07:00
Nick Sweeting	26d869c4f1	Merge branch 'main' into improve-eval-tool	2025-04-21 11:18:45 -07:00
Alezander9	aeaef4af57	feat: add gpt-4o-mini	2025-04-21 10:24:22 -07:00
Alezander9	8449b11ae0	refactored run task to more elegeantly handle various failure modes and still send results to server	2025-04-19 15:28:05 -07:00
Alezander9	bc95b0e2fa	feat: add user message as argument, pass total tasks into run info sent to server	2025-04-19 11:58:33 -07:00

71 Commits