browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Author	SHA1	Message	Date
Magnus Müller	d4a29c4b93	Improves evaluation robustness and reporting Enhances evaluation by improving error handling, providing more detailed logging, and adding a local summary calculation. The changes include: - Adds comprehensive judge fallback to Mind2Web judge and ensures backward compatibility. - Improves error handling during evaluation by capturing and logging the last part of the output on failure. - Adds a new function to calculate a summary of local evaluation results, displaying total tasks, success rate, and average score. - Includes comprehensive evaluation data for debugging purposes.	2025-06-23 00:08:14 +02:00
Magnus Müller	be170fb17a	Ensures payload serialization preserves dict structure Adds a type assertion to ensure that the payload remains a dictionary after serialization. Also, adds type hints to `make_json_serializable` for better code clarity and maintainability.	2025-06-22 23:50:46 +02:00
Magnus Müller	4a26f07c66	Ensures JSON serializability of task results Adds a utility function to convert objects within a payload to JSON-serializable types before returning the task result. This change addresses potential issues where the task result contains non-serializable objects (e.g., enums, custom objects), preventing proper data handling.	2025-06-22 23:46:39 +02:00
Magnus Müller	0a5a29e4a8	Updates judge system import path Updates the import path for the comprehensive judge system to reflect its new location in the project structure. This resolves an issue where the previous relative import was causing import errors.	2025-06-22 23:26:56 +02:00
Magnus Müller	eeb8024184	Handles varied LLM response formats Ensures the judge system can correctly parse LLM responses, accommodating both string and list content types. Adds a fallback mechanism to guarantee a result even if maximum retry attempts are exceeded, enhancing robustness and type safety.	2025-06-22 23:12:37 +02:00
Magnus Müller	4629a8d9b7	Fixes relative import and type hints Fixes a relative import issue for the judge system. Updates type hints to allow None values for laminar_link and critical_error. Comments out unused code related to Laminar link updates.	2025-06-22 23:07:10 +02:00
Magnus Müller	be16ff3f69	Implement comprehensive judge system for task evaluation Added a new judge system in `judge_system.py` that evaluates browser-use agent runs, providing detailed structured feedback. Updated the evaluation workflow in `eval.yaml` to include a new command-line argument for using the comprehensive judge. Modified `service.py` to integrate the new judge system, allowing for fallback to the original Mind2Web evaluation if specified. Enhanced error handling and logging throughout the evaluation process.	2025-06-22 22:43:57 +02:00
Nick Sweeting	4a8a4155b3	try keep alive browsers	2025-06-21 07:23:54 -07:00
Nick Sweeting	ac22e6ae20	Test fixes, eventbus tweaks, docs updates, and better warnings (#2027)	2025-06-21 06:32:11 -07:00
Nick Sweeting	0af8c8c0fe	imports	2025-06-21 06:29:10 -07:00
Nick Sweeting	3209fd95f7	lint and hint fixes	2025-06-21 06:07:21 -07:00
Nick Sweeting	aad78d93ab	more type hint fixes	2025-06-21 05:44:49 -07:00
Nick Sweeting	6bc1f7985f	more type hint fixes	2025-06-21 04:56:27 -07:00
Magnus Müller	aeea1788fa	fix: CLI argument default conflict for highlight_elements - Change from --highlight-elements (action='store_true') to --no-highlight-elements (action='store_false') - Fix CLI argument defaulting to False when flag not provided, conflicting with function default of True - Update GitHub workflow to use new flag logic (add flag when highlight_elements=false) - Ensure consistent behavior: highlighting enabled by default, can be disabled with --no-highlight-elements Resolves bug where CLI users got highlighting disabled by default instead of enabled	2025-06-21 12:52:40 +02:00
Magnus Müller	9292e6c48d	feat: add highlight_elements flag for controlling element highlighting - Add --highlight-elements CLI argument to eval/service.py - Pass highlight_elements through entire execution pipeline - Add highlight_elements support to GitHub workflow (eval.yaml) - Allow users to control whether interactive elements are highlighted on pages during automation - Improves debugging and visibility options for browser automation	2025-06-21 12:40:41 +02:00
Nick Sweeting	d8c6876e08	tweak BrowserSession failure message details	2025-06-21 01:03:09 -07:00
Magnus Müller	9b72be9ea0	Update logging format in service.py to include logger name for better context in log messages.	2025-06-21 09:55:38 +02:00
Magnus Müller	f1d5dc5a17	Pass laminar_eval_id from frontend	2025-06-21 09:31:14 +02:00
Magnus Müller	83d92513a4	Monitor eval cpu	2025-06-20 23:35:56 +02:00
Magnus Müller	4a8cf30dac	Merge branch 'main' into mert/new_everything	2025-06-20 12:27:19 +02:00
Magnus Müller	0e5a8942f3	Add gemini-2.5-flash	2025-06-20 12:19:47 +02:00
Magnus Müller	4c2952d640	Squashed commit of the following: commit `a9cf53a1b1` Merge: `5aa62c11` `0f9ffa10` Author: Magnus Müller <67061560+MagMueller@users.noreply.github.com> Date: Fri Jun 20 10:41:19 2025 +0200 Set user_data_dir to None (#2015) Changed browser session setup to use incognito mode by setting user_data_dir to None, preventing persistent state between evaluation runs. commit `0f9ffa1072` Author: Magnus Müller <67061560+MagMueller@users.noreply.github.com> Date: Fri Jun 20 10:38:01 2025 +0200 Set user_data_dir to None commit `5aa62c1113` Merge: `d8a9d21b` `e559ff5e` Author: Nick Sweeting <git@sweeting.me> Date: Thu Jun 19 23:01:49 2025 -0700 Fix cross-origin iframe DOM retrieval (#1965) commit `d8a9d21b00` Merge: `3e5f3049` `b6be1583` Author: Nick Sweeting <git@sweeting.me> Date: Thu Jun 19 23:01:21 2025 -0700 Fix critical domain restriction bypass vulnerability (#2006) commit `b6be158319` Author: Sahar <saharhashai@gmail.com> Date: Thu Jun 19 02:28:34 2025 -0700 Delete tests/ci/test_security_url_validation.py commit `aca4b57329` Author: Sahar <saharhashai@gmail.com> Date: Thu Jun 19 02:27:57 2025 -0700 Delete SECURITY_FIX_REPORT.md commit `45872c1e45` Author: Your Name <your.email@example.com> Date: Thu Jun 19 11:24:50 2025 +0200 fix(security): prevent domain restriction bypass in controller actions - Add domain validation to controller.click() and controller.type() methods - Implement comprehensive security checks before executing actions - Prevent potential prompt injection and unauthorized data access - Add extensive test coverage for domain validation scenarios - Update documentation with security considerations This critical fix prevents complete bypass of domain restrictions that could enable attackers to perform unauthorized actions on any domain.
commit `e559ff5eaa` Merge: `19ae8a11` `f348e0c5` Author: Nick Sweeting <git@sweeting.me> Date: Sat Jun 14 01:56:09 2025 -0700 Merge branch 'main' into main commit `19ae8a1146` Merge: `e1b3ff9e` `08ed0be3` Author: Nick Sweeting <git@sweeting.me> Date: Sat Jun 14 00:31:30 2025 -0700 Merge branch 'main' into main commit `e1b3ff9e9d` Author: Ilya Biryukov <ilbiryuk@microsoft.com> Date: Thu Jun 12 17:40:40 2025 -0700 Revert changes to examples/features/multiple_agents_same_browser.py commit `d20a3b55d6` Author: Ilya Biryukov <ilbiryuk@microsoft.com> Date: Thu Jun 12 17:30:59 2025 -0700 Fix pre-commit lint issues and compile error in multiple_agents_same_browser commit `13d5468aa2` Author: Ilya Biryukov <ilbiryuk@microsoft.com> Date: Thu Jun 12 14:07:21 2025 -0700 Fix cross-origin iframe DOM retrieval	2025-06-20 10:51:06 +02:00
Magnus Müller	0f9ffa1072	Set user_data_dir to None	2025-06-20 10:38:01 +02:00
Magnus Müller	90ae26316e	Refactor ActionResult to standardize the inclusion of extracted content, replacing update_only_read_state with include_extracted_content_only_once across multiple services. This change enhances clarity in memory management and ensures consistent handling of extracted content.	2025-06-19 23:18:30 +02:00
Magnus Müller	ce880e5e35	Refactor ActionResult handling across multiple services to standardize the use of long_term_memory, replacing memory references. Update related logic to ensure extracted content is consistently managed for improved clarity and error handling.	2025-06-19 23:11:55 +02:00
Magnus Müller	c62d14d9ed	Update action result handling in service.py to include extracted content in memory for paused agent states and SERPER API key checks, enhancing error reporting consistency.	2025-06-19 22:52:53 +02:00
mertunsall	86083b09c9	reset eval to main to be comparable	2025-06-19 17:02:58 +02:00
mertunsall	76ef41da45	Merge branch 'main' into mert/new_everything	2025-06-19 14:11:56 +02:00
Magnus Müller	f89a97f256	Delete laminar files since it is integrated to the main eval service	2025-06-19 11:26:53 +02:00
Magnus Müller	0b6ebea431	Merge branch 'main' into mert/new_everything	2025-06-19 09:49:43 +02:00
Magnus Müller	2c627a93fc	Fix event loop error	2025-06-18 23:55:27 +02:00
Magnus Müller	5b4b3dd5aa	Integrate Laminar tracing and enhance evaluation workflow - Added integration for Laminar tracing by initializing `AsyncLaminarClient` and creating evaluation links during task execution. - Updated `TaskResult` class to include an optional `laminar_task_link` for task-specific links. - Enhanced logging for task results and added error handling for Laminar datapoint creation. - Improved the `run_task_with_semaphore` function to manage Laminar evaluation links and update datapoints with evaluation scores. These changes aim to streamline the evaluation process and improve tracking of task performance.	2025-06-18 23:27:30 +02:00
Magnus Müller	081f5747c6	Refactor run_task_with_semaphore to use laminar_task_link - Renamed variable `link` to `laminar_task_link` for clarity in the `run_task_with_semaphore` function. - Updated the creation of `TaskResult` to utilize `laminar_task_link` instead of the previous `link` variable. - Improved logging to reflect the new variable name, enhancing readability and maintainability. These changes aim to improve code clarity and maintain consistency in the evaluation workflow.	2025-06-18 22:41:56 +02:00
Magnus Müller	158aa5b719	Enhance evaluation workflow in laminar_eval.py and update dependencies - Removed the `lmnr[all]` dependency from `pyproject.toml`. - Added `browser-use[dev,eval]` to `dev-dependencies` for improved development support. - Updated `TaskResult` class to include an optional `laminar_task_link` for task-specific links. - Modified `run_task_with_semaphore` to handle Laminar evaluation links and improved logging for task results. - Added logic to create a Laminar evaluation link during task execution. These changes aim to streamline the evaluation process and enhance the overall functionality of the evaluation workflow.	2025-06-18 22:41:05 +02:00
Magnus Müller	9547ebb3bb	Refactor laminar_eval.py to enhance task evaluation workflow - Updated `run_task_with_semaphore` to use `lmnr_run_id` for evaluation ID instead of `run_id`. - Added a new helper function `start_new_run` to initiate evaluation runs on the server. - Improved logging for task results and server interactions. - Ensured proper handling of environment variables for server configuration. This refactor aims to streamline the evaluation process and improve error handling.	2025-06-18 21:04:47 +02:00
Robert Kim	3548d0fe35	lint	2025-06-18 14:55:00 +01:00
Magnus Müller	3e796a5773	Merge branch 'main' into laminar_evals	2025-06-18 15:34:26 +02:00
Robert Kim	a0d6a08119	v0	2025-06-18 14:21:48 +01:00
mertunsall	49bdba1578	Add evaluation criterion for handling vague user tasks in service.py	2025-06-18 13:18:40 +02:00
mertunsall	3c5c3b69cb	Refine evaluation criteria and system message in service.py	2025-06-18 11:53:34 +02:00
mertunsall	1864e52635	Merge branch 'main' into mert/new_everything	2025-06-18 10:00:25 +02:00
Magnus Müller	eb2aabb7f8	Remove duplicate	2025-06-17 19:21:29 +02:00
Magnus Müller	c66880a8fa	Add lmnr package for tracing integration and update eval workflow	2025-06-17 18:58:56 +02:00
mertunsall	0c34c399db	Merge branch 'main' into mert/new_everything	2025-06-17 11:19:06 +02:00
Nick Sweeting	86abd92b79	reset timeouts in evals back to faster defaults	2025-06-11 01:46:41 -07:00
Nick Sweeting	520bbbe6c0	Update eval/service.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-06-11 02:28:13 -04:00
Nick Sweeting	1c64c483d9	allow https errors on evals because they include http urls for some reason	2025-06-10 06:27:35 -07:00
Nick Sweeting	ed6db5802b	fix timeouts order of magnitude	2025-06-10 05:42:23 -07:00
Nick Sweeting	a0ee5de2ad	tweak browsersession timeouts	2025-06-10 05:18:56 -07:00
Alezander9	8e7663758d	modify service to accept tasks with login cookies	2025-06-09 22:09:48 -07:00
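
Commit aeea1788fa above describes replacing `--highlight-elements` (`action='store_true'`) with `--no-highlight-elements` (`action='store_false'`) so that highlighting stays enabled when no flag is passed. The following is a minimal sketch of that argparse pattern, not the actual code in `eval/service.py`; the function names `build_parser` and `parse_highlight` and the `dest` wiring are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser illustrating the flag change; not the project's real CLI."""
    parser = argparse.ArgumentParser(description="highlight flag sketch")
    # With action='store_false', argparse defaults the destination to True
    # and sets it to False only when the flag is present on the command line.
    parser.add_argument(
        "--no-highlight-elements",
        dest="highlight_elements",
        action="store_false",
        help="Disable highlighting of interactive elements (enabled by default).",
    )
    return parser


def parse_highlight(argv: list[str]) -> bool:
    """Return the effective highlight_elements value for the given arguments."""
    return build_parser().parse_args(argv).highlight_elements
```

With a `store_true` flag the destination defaults to `False` when the flag is omitted, which is how a CLI default can end up contradicting a function default of `True`. Inverting the flag keeps both defaults aligned.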

109 Commits
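
Commits 4a26f07c66 and be170fb17a in the log describe a `make_json_serializable` utility that converts non-serializable objects (for example enums and custom objects) in a task result payload, followed by a type assertion that the payload is still a dictionary. The sketch below shows one way such a helper could look. Only the function name comes from the commit message; the conversion rules (enum to its value, dataclass to dict, anything else to `str`) are assumptions, not the project's implementation.

```python
import json
from dataclasses import asdict, is_dataclass
from enum import Enum
from typing import Any


def make_json_serializable(obj: Any) -> Any:
    """Recursively convert obj into types that json.dumps accepts (illustrative rules)."""
    # Enums are checked first so str/int-mixin enums are also reduced to plain values.
    if isinstance(obj, Enum):
        return make_json_serializable(obj.value)
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(k): make_json_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [make_json_serializable(v) for v in obj]
    if is_dataclass(obj) and not isinstance(obj, type):
        return make_json_serializable(asdict(obj))
    # Assumed fallback: represent unknown objects by their string form.
    return str(obj)


def serialize_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Serialize a payload and assert the dict structure is preserved."""
    result = make_json_serializable(payload)
    assert isinstance(result, dict), "payload must remain a dict after serialization"
    return result
```

A caller would pass the task result through `serialize_payload` before handing it to `json.dumps`; the assertion mirrors the dict-preservation check mentioned in be170fb17a.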