browser-use

Mirror of https://github.com/browser-use/browser-use, last synced 2026-05-06 17:52:15 +02:00.

Files

Latest commit e004a7efd8 by Magnus Müller, 2025-06-26 15:34:01 +02:00

Add explicit scoring criteria for tasks with no output

- Clarify that tasks producing no output must receive low scores
- Improve evaluation consistency for incomplete task scenarios
- Ensure judges properly penalize agents that fail to produce results

This helps maintain evaluation accuracy by explicitly stating that
lack of output is a significant failure condition that should be
reflected in low scoring.

judge_system.py: Add explicit scoring criteria for tasks with no output (2025-06-26 15:34:01 +02:00)

service.py: Complete last_message flow to comprehensive judge (2025-06-25 10:34:50 +02:00)