# OpenWork UI evals

Human-readable scenarios that an LLM (or a person) can replay against a live
OpenWork instance to verify end-to-end behavior of the UI.

Each eval consists of:
- A short list of steps written in plain English.
- An **expected outcome** with observable signals.
- The most useful Chrome DevTools MCP calls to drive it.

These evals are not unit tests. They intentionally exercise the running stack
(OpenCode + OpenWork server + React UI) so that regressions in wiring, not
just in types, get caught.

## How to run

1. Start the Docker dev stack:
   ```bash
   packaging/docker/dev-up.sh
   ```
   Use the web URL that the script prints (e.g. `http://localhost:50423`).
   The port is assigned at random, so never hard-code `5173` or `8787`.

2. Pick a runner:
   - **Chrome DevTools MCP** (recommended). The tool names referenced in each
     flow are the `chrome-devtools_*` tools from the Chrome DevTools MCP
     server. Open the printed web URL in a fresh page via
     `chrome-devtools_new_page` and drive from there.
   - **Manual browser** — open the URL and follow the step lists by hand.

3. Walk through each eval from top to bottom. Mark an eval ✅ only when every
   expected signal is visible. To keep evidence, capture a screenshot with
   `chrome-devtools_take_screenshot`.

4. Stop the stack with the exact `docker compose -p openwork-dev-... down`
   line printed by `dev-up.sh`.
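
Because the port changes between runs, it can help to pull the URL out of the
script's output rather than copying it by hand. The snippet below is a minimal
sketch, not part of this repo: `first_localhost_url` is a hypothetical helper,
and it assumes the web URL is the first `http://localhost:<port>` string that
`dev-up.sh` prints. Check the real output before relying on that ordering.

```bash
# Hypothetical helper (not part of the repo): read text on stdin and print
# the first http://localhost:<port> URL it contains.
first_localhost_url() {
  grep -oE 'http://localhost:[0-9]+' | head -n 1
}

# Possible usage, assuming the web URL is the first one dev-up.sh prints:
#   packaging/docker/dev-up.sh | tee /tmp/dev-up.log
#   WEB_URL="$(first_localhost_url < /tmp/dev-up.log)"
#   echo "Open ${WEB_URL} with chrome-devtools_new_page"
```

If the script prints several URLs in a different order, filter on the label of
the web line first and then apply the same pattern.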

## Conventions

- Selectors in the steps are descriptive, not CSS. Resolve each one to an
  element `uid` via `chrome-devtools_take_snapshot`, then click by that `uid`.
- When asked to "wait for X", use `chrome-devtools_wait_for` with the exact
  visible text. Keep the text short and unique.
- If a step expects a connected provider and the stack shows none, note it
  as an environment limitation, not an eval failure.

## Files

- [`react-session-flows.md`](./react-session-flows.md) — the 9 core
  session/settings flows verified during the React port cutover.