browser-use

eliott/browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Commit Graph

Author	SHA1	Message	Date
Magnus Müller	eeb8024184	Handles varied LLM response formats. Ensures the judge system can correctly parse LLM responses, accommodating both string and list content types. Adds a fallback mechanism to guarantee a result even if maximum retry attempts are exceeded, enhancing robustness and type safety.	2025-06-22 23:12:37 +02:00
Magnus Müller	4629a8d9b7	Fixes relative import and type hints. Fixes a relative import issue for the judge system. Updates type hints to allow None values for laminar_link and critical_error. Comments out unused code related to Laminar link updates.	2025-06-22 23:07:10 +02:00
Magnus Müller	be16ff3f69	Implement comprehensive judge system for task evaluation. Added a new judge system in `judge_system.py` that evaluates browser-use agent runs, providing detailed structured feedback. Updated the evaluation workflow in `eval.yaml` to include a new command-line argument for using the comprehensive judge. Modified `service.py` to integrate the new judge system, allowing for fallback to the original Mind2Web evaluation if specified. Enhanced error handling and logging throughout the evaluation process.	2025-06-22 22:43:57 +02:00

3 Commits
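
Commit eeb8024184 describes two behaviours: accepting LLM content that is either a string or a list, and returning a fallback result once the retry limit is reached. The following is a minimal sketch of that idea, not the code from `judge_system.py`. The names `content_to_text` and `judge_with_fallback` are hypothetical, and the assumption that list parts are plain strings or dicts with a `"text"` key is ours.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def content_to_text(content: Any) -> str:
    """Flatten message content that may be a string or a list of parts.

    Assumption: list parts are either plain strings or dicts carrying
    their text under a "text" key; anything else is skipped.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        pieces = []
        for part in content:
            if isinstance(part, str):
                pieces.append(part)
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                pieces.append(part["text"])
        return "".join(pieces)
    return str(content)


def judge_with_fallback(
    call_llm: Callable[[], Any],
    parse: Callable[[str], T],
    fallback: T,
    max_retries: int = 3,
) -> T:
    """Call the model up to max_retries times and parse its reply.

    Hypothetical helper: `parse` is expected to raise ValueError on a
    malformed reply. If every attempt fails, `fallback` is returned so
    the caller always receives a result.
    """
    for _ in range(max_retries):
        try:
            return parse(content_to_text(call_llm()))
        except ValueError:
            continue  # malformed reply, try again
    return fallback
```

A caller would pass its own model call and parser, for example `judge_with_fallback(lambda: response.content, parse_verdict, default_verdict)`, where `response`, `parse_verdict` and `default_verdict` are placeholders.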