Enhances evaluation with improved error handling, more detailed logging, and a local summary calculation.
The changes include:
- Adds a fallback from the comprehensive judge to the Mind2Web judge and ensures backward compatibility.
- Improves error handling during evaluation by capturing and logging the tail of the output on failure.
- Adds a new function that summarizes local evaluation results: total tasks, success rate, and average score (see the sketch after this list).
- Includes comprehensive evaluation data for debugging purposes.
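A minimal sketch of what the local summary calculation could look like; the per-result field names (`success`, `score`) are assumptions, not the eval code's actual schema.
```
def calculate_local_summary(results: list[dict]) -> dict:
    """Aggregate local evaluation results into a short summary."""
    total_tasks = len(results)
    if total_tasks == 0:
        return {'total_tasks': 0, 'success_rate': 0.0, 'average_score': 0.0}
    successes = sum(1 for r in results if r.get('success'))
    average_score = sum(r.get('score', 0.0) for r in results) / total_tasks
    return {
        'total_tasks': total_tasks,
        'success_rate': successes / total_tasks,
        'average_score': average_score,
    }
```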
Adds a type assertion to ensure that the payload remains a dictionary after serialization.
Also, adds type hints to `make_json_serializable` for better code clarity and maintainability.
Adds a utility function that converts objects within a payload to JSON-serializable types before the task result is returned.
This addresses cases where the task result contains non-serializable objects (e.g., enums, custom objects) that previously broke downstream data handling.
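The PR names `make_json_serializable` and the dictionary assertion; the following is a hedged sketch of one common way to implement such a converter, not the repository's actual code.
```
import dataclasses
from enum import Enum
from typing import Any

def make_json_serializable(obj: Any) -> Any:
    """Recursively convert enums, dataclasses, and other custom objects
    into plain JSON-serializable types."""
    if isinstance(obj, Enum):
        return make_json_serializable(obj.value)
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return make_json_serializable(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {str(k): make_json_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [make_json_serializable(v) for v in obj]
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    return str(obj)  # last resort: stringify unknown objects

class Status(Enum):
    DONE = 'done'

payload = make_json_serializable({'status': Status.DONE, 'steps': (1, 2)})
assert isinstance(payload, dict)  # the type assertion described above
```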
Updates the import path for the comprehensive judge system to reflect
its new location in the project structure.
This resolves an issue where the previous relative import was causing
import errors.
---
## Summary by cubic
Updated the import path for the judge system to fix import errors after
moving the file.
Added a new judge system in `judge_system.py` that evaluates browser-use
agent runs, providing detailed structured feedback. Updated the
evaluation workflow in `eval.yaml` to include a new command-line
argument for using the comprehensive judge. Modified `service.py` to
integrate the new judge system, allowing for fallback to the original
Mind2Web evaluation if specified. Enhanced error handling and logging
throughout the evaluation process.
---
## Summary by cubic
Added a comprehensive judge system for evaluating browser-use agent
runs, providing detailed, structured feedback and multi-dimensional
scoring. Updated the evaluation workflow to support both the new judge
and the original Mind2Web judge, with improved error handling and
logging.
- **New Features**
- Introduced `judge_system.py` with a multi-criteria evaluation and JSON
feedback.
- Integrated the new judge into `service.py` with a command-line flag
for judge selection.
- Enhanced error handling and logging during evaluation.
- **Dependencies**
- Updated `.github/workflows/eval.yaml` to add a flag for selecting the
judge system.
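A hedged sketch of the judge routing described above, with hypothetical stand-ins for the real judge entry points in `service.py` and `judge_system.py`:
```
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the real judges.
def comprehensive_judge(task, result):
    raise NotImplementedError  # detailed, multi-criteria evaluation

def mind2web_judge(task, result):
    return {'judge': 'mind2web', 'success': False}

def evaluate_task(task, result, use_mind2web: bool = False):
    """Route to the requested judge, falling back to Mind2Web on failure."""
    if use_mind2web:
        return mind2web_judge(task, result)
    try:
        return comprehensive_judge(task, result)
    except Exception as exc:
        # A judge crash should never leave the task without a result.
        logger.warning('Comprehensive judge failed (%s); falling back to Mind2Web', exc)
        return mind2web_judge(task, result)
```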
Ensures the judge system can correctly parse LLM responses, accommodating both string and list content types.
Adds a fallback mechanism to guarantee a result even if maximum retry attempts are exceeded, enhancing robustness and type safety.
Fixes a relative import issue for the judge system.
Updates type hints to allow `None` values for `laminar_link` and `critical_error`.
Comments out unused code related to Laminar link updates.
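A minimal sketch of the string/list content handling and the retry fallback described above, assuming an LLM response whose `content` is either a string or a list of text blocks; the retry wrapper and the default-result shape are assumptions:
```
def extract_text(content) -> str:
    """Accept both string and list-of-blocks LLM content."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict):
                parts.append(block.get('text', ''))
            else:
                parts.append(getattr(block, 'text', ''))
        return ''.join(parts)
    return str(content)

def judge_with_retries(call_llm, parse, max_retries: int = 3) -> dict:
    for _ in range(max_retries):
        try:
            return parse(extract_text(call_llm()))
        except Exception:
            continue
    # Guarantee a typed result even when every attempt fails.
    return {'success': False, 'score': 0.0, 'reasoning': 'max retries exceeded'}
```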
Improved input text handling for DOM elements by prioritizing
click-and-type, adding better error handling, and returning clear error
messages on failure.
Text input previously failed on https://flights.google.com/ and now works.
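A rough sketch of the click-and-type-first strategy using a Playwright-style element handle; the `fill()` fallback and the returned error shape are assumptions, not the controller's actual code:
```
async def input_text_to_element(element_handle, text: str) -> dict:
    try:
        # Prefer click-and-type: it fires the key events that JS-heavy
        # widgets (e.g. the Google Flights search box) listen for.
        await element_handle.click()
        await element_handle.type(text)
        return {'success': True}
    except Exception as primary_error:
        try:
            await element_handle.fill(text)  # fallback: set the value directly
            return {'success': True}
        except Exception:
            return {'success': False, 'error': f'Failed to input text: {primary_error}'}
```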
fix: [Bug: Failed to load system prompt template: 'gbk' codec can't
decode byte 0x94 in position 2390: illegal multibyte sequence
#2038](https://github.com/browser-use/browser-use/issues/2038)
In Python, when `open()` is called without an explicit encoding, the default is platform-dependent: since Python 3.10 it uses the encoding returned by `locale.getpreferredencoding(False)`.
On my Windows machine, `locale.getpreferredencoding(False)` returns `cp936` (equivalent to GBK), so I added an `encoding` parameter to the `open()` call to force `utf-8`.
You can check your platform default with:
```
import locale
print(locale.getpreferredencoding(False))
```
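The fix itself is a one-line change: pass an explicit encoding instead of relying on the platform default. The template path below is illustrative, not the repository's actual location.
```
from pathlib import Path

template_path = Path('system_prompt.md')  # illustrative path
with open(template_path, encoding='utf-8') as f:  # was: open(template_path)
    prompt_template = f.read()
```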
---
## Summary by cubic
Fixed a file encoding error when loading the system prompt template on
Windows by setting the file to open with UTF-8 encoding.
## Summary
This PR adds a new `highlight_elements` flag that allows users to
control whether interactive elements are highlighted on web pages during
browser automation.
## Changes Made
- ✅ **Frontend (UI)**: Added `highlightElements` field to run settings
store with default value `true`
- ✅ **Backend**: Added `--highlight-elements` CLI argument to
`eval/service.py`
- ✅ **Pipeline**: Pass `highlight_elements` parameter through entire
execution pipeline
- ✅ **GitHub Workflow**: Added support for `highlight_elements` in
`eval.yaml`
- ✅ **Browser Configuration**: Correctly pass flag to `BrowserSession` →
`BrowserProfile`
## How it works
- **UI**: Users can toggle "Highlight Elements" in the Flags section
- **CLI**: Can be enabled with `--highlight-elements` argument
- **Backend**: Parameter flows through all execution stages
- **Browser**: Controls whether interactive elements are highlighted on
pages during automation
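A hedged sketch of the plumbing from CLI flag to browser configuration; the constructor calls follow the PR's description (`BrowserSession` → `BrowserProfile`) but may not match the repository's current API exactly:
```
import argparse

from browser_use import BrowserProfile, BrowserSession

parser = argparse.ArgumentParser()
parser.add_argument('--highlight-elements', action='store_true',
                    help='Highlight interactive elements on pages')
args = parser.parse_args()

# The flag flows through the pipeline into the browser configuration.
profile = BrowserProfile(highlight_elements=args.highlight_elements)
session = BrowserSession(browser_profile=profile)
```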
## Benefits
- 🎯 **Better debugging**: Users can see exactly which elements the agent
is interacting with
- 🔧 **Flexible control**: Can be disabled for performance or cleaner
screenshots
- 📱 **UI Integration**: Seamlessly integrated into the evaluation
platform interface
- 🛠️ **CLI Support**: Available for both UI and command-line usage
## Testing
- Verified UI toggle functionality in evaluation platform
- Tested CLI argument parsing and parameter flow
- Confirmed GitHub workflow integration
- Validated browser configuration handling
Resolves the need for user-controllable element highlighting during
browser automation.
---
## Summary by cubic
Added a highlight_elements flag to let users control if interactive
elements are highlighted during browser automation.
- **New Features**
- Added UI toggle and CLI flag for element highlighting.
- Passed highlight_elements through the backend and GitHub workflow.
- Change from --highlight-elements (action='store_true') to --no-highlight-elements (action='store_false')
- Fix CLI argument defaulting to False when flag not provided, conflicting with function default of True
- Update GitHub workflow to use new flag logic (add flag when highlight_elements=false)
- Ensure consistent behavior: highlighting enabled by default, can be disabled with --no-highlight-elements
Resolves bug where CLI users got highlighting disabled by default instead of enabled
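A minimal sketch of the corrected argparse semantics: highlighting stays on by default and is only disabled when the new flag is passed.
```
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--no-highlight-elements',
    dest='highlight_elements',
    action='store_false',  # flag absent -> True, flag present -> False
    help='Disable highlighting of interactive elements',
)

assert parser.parse_args([]).highlight_elements is True
assert parser.parse_args(['--no-highlight-elements']).highlight_elements is False
```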