Mirror of https://github.com/we-promise/sure (synced 2026-04-25 17:15:07 +02:00)
* feat(ci): improve LLM eval visibility in GitHub Actions

  - Add step summary output for each eval run (shown in the GitHub UI)
  - Add a new 'summarize_evals' job that aggregates results from all matrix runs
  - Generate a markdown table with accuracy, cost, and duration for all evals
  - Add threshold checking (fails the workflow if accuracy < 70%)
  - Include status icons (✅/❌) for quick visual assessment
  - Show the overall pass/fail status at the end of the summary

* Fix LLM eval workflow summary

---------

Co-authored-by: SureBot <sure-bot@we-promise.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
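The aggregation and threshold logic described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual script: the `EvalResult` record shape and the `build_summary`, `write_step_summary`, and `summarize_and_check` helpers are hypothetical names chosen for this example. The only GitHub-specific piece assumed is the `GITHUB_STEP_SUMMARY` environment variable, which GitHub Actions sets to a file path whose markdown contents appear on the job summary page.

```python
import os
from dataclasses import dataclass

# The 70% accuracy threshold mentioned in the commit message.
ACCURACY_THRESHOLD = 0.70


@dataclass
class EvalResult:
    # Hypothetical shape of one matrix run's result; the real format is not shown in the commit.
    name: str
    accuracy: float  # fraction in [0, 1]
    cost_usd: float
    duration_s: float


def build_summary(results: list[EvalResult], threshold: float = ACCURACY_THRESHOLD) -> tuple[str, bool]:
    """Render a markdown table of eval results and return (markdown, all_passed)."""
    lines = [
        "| Status | Eval | Accuracy | Cost (USD) | Duration (s) |",
        "|:------:|------|---------:|-----------:|-------------:|",
    ]
    # An empty result set is treated as a failure rather than a silent pass.
    all_passed = bool(results)
    for r in results:
        passed = r.accuracy >= threshold  # fail only when accuracy < threshold
        all_passed = all_passed and passed
        icon = "✅" if passed else "❌"
        lines.append(
            f"| {icon} | {r.name} | {r.accuracy:.1%} | {r.cost_usd:.4f} | {r.duration_s:.1f} |"
        )
    if not results:
        overall = "❌ No eval results found"
    elif all_passed:
        overall = "✅ All evals passed"
    else:
        overall = f"❌ At least one eval is below {threshold:.0%} accuracy"
    lines += ["", f"**Overall:** {overall}"]
    return "\n".join(lines) + "\n", all_passed


def write_step_summary(markdown: str) -> None:
    """Append markdown to the job summary file, if running inside GitHub Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(markdown)


def summarize_and_check(results: list[EvalResult]) -> int:
    """Write the summary and return a process exit code (non-zero fails the job)."""
    markdown, all_passed = build_summary(results)
    write_step_summary(markdown)
    return 0 if all_passed else 1
```

In a summary job, a step could load the per-run results (for example from downloaded artifacts) and finish with `sys.exit(summarize_and_check(results))`, so that any eval under the threshold fails the workflow while the table still renders in the UI. Writing the summary before returning the exit code is a deliberate choice here: the table is most useful precisely when the job fails.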