browser-use/eval
Magnus Müller e004a7efd8 Add explicit scoring criteria for tasks with no output
- Clarify that tasks producing no output must receive low scores
- Improve evaluation consistency for incomplete task scenarios
- Ensure judges properly penalize agents that fail to produce results

This helps maintain evaluation accuracy by explicitly stating that
a lack of output is a significant failure condition that should be
reflected in a low score.
2025-06-26 15:34:01 +02:00