browser-use/eval
Magnus Müller 033418092f Enhance system prompt clarity and evaluation criteria
Refines the system prompt in judge_system.py by improving the context about the browser-use agent and updating evaluation criteria for better clarity. Adjusts the JSON response structure to reflect changes in task satisfaction and trajectory quality metrics.
2025-06-24 22:41:26 +02:00