browser-use/.github/workflows
Magnus Müller d4a29c4b93 Improves evaluation robustness and reporting
Enhances evaluation by improving error handling, providing more detailed logging, and adding a local summary calculation.

The changes include:

- Adds a comprehensive judge fallback to the Mind2Web judge and ensures backward compatibility.
- Improves error handling during evaluation by capturing and logging the last part of the output on failure.
- Adds a new function to calculate a summary of local evaluation results, displaying total tasks, success rate, and average score.
- Includes comprehensive evaluation data for debugging purposes.
2025-06-23 00:08:14 +02:00
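The local summary described in the commit message (total tasks, success rate, average score) can be sketched as follows. This is a minimal illustration, not the repository's code. The function name `summarize_local_results`, the `EvalSummary` type, and the per-task `success`/`score` keys are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass
class EvalSummary:
    total_tasks: int
    success_rate: float  # fraction of tasks marked successful, 0.0 to 1.0
    average_score: float  # mean of the per-task scores


def summarize_local_results(results: Iterable[Mapping[str, Any]]) -> EvalSummary:
    """Hypothetical helper: aggregate per-task results into a summary.

    Assumes each result has a boolean "success" key and a numeric "score" key.
    A missing key is treated as a failure and a score of 0.0.
    """
    results = list(results)
    total = len(results)
    if total == 0:
        # Avoid division by zero when no tasks were run.
        return EvalSummary(total_tasks=0, success_rate=0.0, average_score=0.0)
    successes = sum(1 for r in results if r.get("success", False))
    total_score = sum(float(r.get("score", 0.0)) for r in results)
    return EvalSummary(
        total_tasks=total,
        success_rate=successes / total,
        average_score=total_score / total,
    )


if __name__ == "__main__":
    demo = [
        {"task_id": "a", "success": True, "score": 1.0},
        {"task_id": "b", "success": False, "score": 0.25},
        {"task_id": "c", "success": True, "score": 0.75},
        {"task_id": "d", "success": False, "score": 0.0},
    ]
    summary = summarize_local_results(demo)
    print(f"Total tasks:   {summary.total_tasks}")
    print(f"Success rate:  {summary.success_rate:.1%}")
    print(f"Average score: {summary.average_score:.2f}")
```

Returning a small dataclass rather than printing directly keeps the aggregation testable; the console output is then just one way to display it.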