browser-use/eval
Magnus Müller 1b51780428 Improve judge system alignment with browser-use architecture
- Add new error categories for output format and data extraction issues
- Update system prompt to clarify browser_state contains readable text
- Remove misaligned focus on extract_structured_data usage
- Improve evaluation criteria for browser-use specific behaviors
- Streamline error categories and remove unused ones
- Better alignment with agent's actual capabilities and expected behaviors
2025-06-25 08:36:20 +02:00