browser-use/eval
Magnus Müller b7fff509f9 Refactor JudgeResult to use Pydantic model and streamline evaluation process
- Replaced dataclass ScoreBreakdown with Pydantic's BaseModel for JudgeResult.
- Updated scoring guidelines to reflect percentage-based final scores.
- Removed unnecessary fields and improved JSON response structure.
- Enhanced error handling and logging for evaluation failures.
- Simplified parsing logic for structured responses from the model.
2025-06-29 15:47:45 +02:00