mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
- Clarify that tasks producing no output must receive low scores
- Improve evaluation consistency for incomplete task scenarios
- Ensure judges properly penalize agents that fail to produce results

This helps maintain evaluation accuracy by explicitly stating that lack of output is a significant failure condition that should be reflected in low scoring.