browser-use/eval/service.py at d4a29c4b93a65bc3894de3228c3f264d008c041d

mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Magnus Müller d4a29c4b93 Improves evaluation robustness and reporting

Enhances evaluation by improving error handling, providing more detailed logging, and adding a local summary calculation.

The changes include:

- Adds comprehensive judge fallback to Mind2Web judge and ensures backward compatibility.
- Improves error handling during evaluation by capturing and logging the last part of the output on failure.
- Adds a new function to calculate a summary of local evaluation results, displaying total tasks, success rate, and average score.
- Includes comprehensive evaluation data for debugging purposes.
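
As a rough illustration of two of the changes listed above (keeping only the last part of the output on failure, and the local summary), here is a minimal sketch. It is not the code from `eval/service.py`: the function names `tail_of_output` and `summarize_local_results`, the `success` and `score` keys, and the 2000-character default are all assumptions made for this example.

```python
from __future__ import annotations

from typing import Any


def tail_of_output(output: str, max_chars: int = 2000) -> str:
    """Return at most the last `max_chars` characters of `output`.

    Hypothetical helper: the cutoff value is an assumption, not taken
    from the commit.
    """
    if max_chars <= 0:
        # Guard needed because output[-0:] would return the whole string.
        return ""
    return output[-max_chars:]


def summarize_local_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute total tasks, success rate, and average score.

    Hypothetical helper: assumes each result is a dict with an optional
    boolean `success` and an optional numeric `score`.
    """
    total = len(results)
    if total == 0:
        # Avoid division by zero when nothing was evaluated.
        return {"total_tasks": 0, "success_rate": 0.0, "average_score": 0.0}

    successes = sum(1 for r in results if r.get("success"))
    # Only tasks that actually received a score count toward the average.
    scores = [float(r["score"]) for r in results if r.get("score") is not None]
    average = sum(scores) / len(scores) if scores else 0.0

    return {
        "total_tasks": total,
        "success_rate": successes / total,
        "average_score": average,
    }
```

In this sketch a task without a score still counts toward the success rate but is left out of the average; whether the real implementation makes the same choice is not stated in the commit message.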

2025-06-23 00:08:14 +02:00

94 KiB
