browser-use/.github
Magnus Müller d4a29c4b93 Improves evaluation robustness and reporting
Enhances evaluation by improving error handling, providing more detailed logging, and adding a local summary calculation.

The changes include:

- Adds comprehensive judge fallback to Mind2Web judge and ensures backward compatibility.
- Improves error handling during evaluation by capturing and logging the last part of the output on failure.
- Adds a new function to calculate a summary of local evaluation results, displaying total tasks, success rate, and average score.
- Includes comprehensive evaluation data for debugging purposes.
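The failure logging and local summary described above might look roughly like the following. This is a minimal sketch, not the code from this commit: the function names `tail_of_output` and `summarize_local_results`, the `success` and `score` result keys, and the 500-character default are all assumptions made for illustration.

```python
def tail_of_output(output: str, max_chars: int = 500) -> str:
    """Return the last max_chars characters of output, for logging on failure.

    Hypothetical helper; the 500-character default is an assumption.
    """
    if max_chars <= 0:
        return ""
    return output[-max_chars:]


def summarize_local_results(results: list[dict]) -> dict:
    """Compute total tasks, success rate, and average score.

    Assumes each result is a dict with an optional boolean "success"
    and an optional numeric "score" (hypothetical schema).
    """
    total = len(results)
    if total == 0:
        # Avoid division by zero when no tasks were evaluated.
        return {"total_tasks": 0, "success_rate": 0.0, "average_score": 0.0}

    successes = sum(1 for r in results if r.get("success"))
    # Results without a score are left out of the average.
    scores = [r["score"] for r in results if r.get("score") is not None]
    average = sum(scores) / len(scores) if scores else 0.0

    return {
        "total_tasks": total,
        "success_rate": successes / total,
        "average_score": average,
    }
```

Leaving unscored results out of the average, rather than counting them as zero, is one possible design choice; the actual evaluation script may handle them differently.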
2025-06-23 00:08:14 +02:00