Merge pull request #2 from stephenleo/monitor-pattern

feat(bad): Monitor tool support + progressive disclosure refactor (v1.1.0)
Authored by Marie Stephen Leo on 2026-04-11 19:51:57 +08:00, committed by GitHub.
15 changed files with 649 additions and 222 deletions

View File

@@ -24,7 +24,7 @@
"name": "bmad-bad",
"source": "./",
"description": "Autonomous development orchestrator for the BMad Method. Runs fully autonomous parallel multi-agent pipelines through the full story lifecycle (create → dev → review → PR) driven by your sprint backlog and dependency graph.",
"version": "1.0.0",
"version": "1.1.0",
"author": { "name": "Marie Stephen Leo" },
"skills": [
"./skills/bad"

View File

@@ -12,17 +12,18 @@ Once your epics and stories are planned, BAD takes over:
1. *(`MODEL_STANDARD` subagent)* Builds a dependency graph from your sprint backlog — maps story dependencies, syncs GitHub PR status, and identifies what's ready to work on
2. Picks ready stories from the graph, respecting epic ordering and dependencies
3. Runs up to `MAX_PARALLEL_STORIES` stories simultaneously — each in its own isolated git worktree — each through a sequential 4-step pipeline:
- **Step 1** *(`MODEL_STANDARD` subagent)* `bmad-create-story`: generates the story spec
3. Runs up to `MAX_PARALLEL_STORIES` stories simultaneously — each in its own isolated git worktree — each through a sequential 5-step pipeline:
- **Step 1** *(`MODEL_STANDARD` subagent)* `bmad-create-story`: generates and validates the story spec
- **Step 2** *(`MODEL_STANDARD` subagent)* `bmad-dev-story`: implements the code
- **Step 3** *(`MODEL_QUALITY` subagent)* `bmad-code-review`: reviews and fixes the implementation
- **Step 4** *(`MODEL_STANDARD` subagent)* — commit, push, open PR, monitor CI, fix any failing checks, resolve code review comments, and resolve merge conflicts
- **Step 4** *(`MODEL_STANDARD` subagent)* — commit, push, open PR, monitor CI, fix any failing checks
- **Step 5** *(`MODEL_STANDARD` subagent)* — PR code review: reviews diff, applies fixes, pushes clean
4. *(`MODEL_STANDARD` subagent)* Optionally auto-merges batch PRs sequentially (lowest story number first), resolving any conflicts
5. Waits, then loops back for the next batch — until the entire sprint is done
## Requirements
- [BMad Method](https://docs.bmad-method.org/) installed in your project
- [BMad Method](https://docs.bmad-method.org/) installed in your project: `npx bmad-method install --modules bmm,tea`
- A sprint plan with epics, stories, and `sprint-status.yaml`
- Git + GitHub CLI (`gh`) installed and authenticated:
1. `brew install gh`
@@ -31,6 +32,15 @@ Once your epics and stories are planned, BAD takes over:
```bash
export GITHUB_PERSONAL_ACCESS_TOKEN=$(gh auth token)
```
4. If running Claude Code with sandbox mode, allow `gh` to reach GitHub's API — add to `.claude/settings.json`:
```json
{
"sandbox": {
...
"enableWeakerNetworkIsolation": true
}
}
```
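Before launching BAD, it can be worth confirming that the token from step 3 actually made it into the environment. A minimal shell check (the variable name matches step 3; everything else is illustrative):

```shell
# Sanity check: is the token from step 3 exported in this shell?
if [ -n "${GITHUB_PERSONAL_ACCESS_TOKEN:-}" ]; then
  echo "token exported"
else
  echo "token missing: run  export GITHUB_PERSONAL_ACCESS_TOKEN=\$(gh auth token)"
fi
```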
## Installation
@@ -46,6 +56,14 @@ Then run setup in your project:
## Usage
BAD spawns subagents for every step of the pipeline. For the full autonomous experience — no permission prompts — start Claude Code with:
```bash
claude --dangerously-skip-permissions
```
Then run:
```
/bad
```
@@ -60,7 +78,7 @@ Run with optional overrides:
### Configuration
BAD is configured at install time (`/bad setup`) and stores settings in `_bmad/bad/config.yaml`. All values can be overridden at runtime with `KEY=VALUE` args.
BAD is configured at install time (`/bad setup`) and stores settings in the `bad:` section of `_bmad/config.yaml`. All values can be overridden at runtime with `KEY=VALUE` args.
| Variable | Default | Description |
|---|---|---|
@@ -72,6 +90,12 @@ BAD is configured at install time (`/bad setup`) and stores settings in `_bmad/b
| `RUN_CI_LOCALLY` | `false` | Run CI locally instead of GitHub Actions |
| `WAIT_TIMER_SECONDS` | `3600` | Wait between batches |
| `RETRO_TIMER_SECONDS` | `600` | Delay before auto-retrospective |
| `CONTEXT_COMPACTION_THRESHOLD` | `80` | Context window % at which to compact context |
| `TIMER_SUPPORT` | `true` | Use native platform timers; `false` for prompt-based continuation |
| `MONITOR_SUPPORT` | `true` | Use the Monitor tool for CI/PR-merge polling; `false` for Bedrock/Vertex/Foundry |
| `API_FIVE_HOUR_THRESHOLD` | `80` | (Claude Code) 5-hour usage % at which to pause |
| `API_SEVEN_DAY_THRESHOLD` | `95` | (Claude Code) 7-day usage % at which to pause |
| `API_USAGE_THRESHOLD` | `80` | (Other harnesses) Generic usage % at which to pause |
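The precedence rule (args win over config, and config over built-in defaults) can be sketched in shell. This is an illustrative parser only; `apply_overrides` is a hypothetical helper, not part of BAD:

```shell
# Hypothetical sketch of runtime overrides: config supplies defaults,
# KEY=VALUE args passed to /bad win over them.
MAX_PARALLEL_STORIES=3        # default (from config)
WAIT_TIMER_SECONDS=3600       # default

apply_overrides() {
  for arg in "$@"; do
    case "$arg" in
      *=*)
        key="${arg%%=*}"        # text before the first '='
        val="${arg#*=}"         # text after the first '='
        eval "$key=\"\$val\""   # overwrite the matching variable
        ;;
    esac
  done
}

apply_overrides MAX_PARALLEL_STORIES=5 WAIT_TIMER_SECONDS=600
echo "$MAX_PARALLEL_STORIES $WAIT_TIMER_SECONDS"   # prints: 5 600
```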
## Agent Harness Support

View File

@@ -10,7 +10,7 @@ BAD is a [BMad Method](https://docs.bmad-method.org/) module that automates your
## Requirements
- [BMad Method](https://docs.bmad-method.org/) installed in your project
- [BMad Method](https://docs.bmad-method.org/) installed in your project: `npx bmad-method install --modules bmm,tea`
- A sprint plan with epics, stories, and `sprint-status.yaml`
- Git + GitHub CLI (`gh`) installed and authenticated:
1. `brew install gh`
@@ -19,6 +19,15 @@ BAD is a [BMad Method](https://docs.bmad-method.org/) module that automates your
```bash
export GITHUB_PERSONAL_ACCESS_TOKEN=$(gh auth token)
```
4. If running Claude Code with sandbox mode, allow `gh` to reach GitHub's API — add to `.claude/settings.json`:
```json
{
"sandbox": {
...
"enableWeakerNetworkIsolation": true
}
}
```
## Installation
@@ -34,6 +43,14 @@ Then run setup in your project:
## Usage
BAD spawns subagents for every step of the pipeline. For the full autonomous experience — no permission prompts — start Claude Code with:
```bash
claude --dangerously-skip-permissions
```
Then run:
```
/bad
```
@@ -52,17 +69,18 @@ Once your epics and stories are planned, BAD takes over:
1. *(`MODEL_STANDARD` subagent)* Builds a dependency graph from your sprint backlog — maps story dependencies, syncs GitHub PR status, and identifies what's ready to work on
2. Picks ready stories from the graph, respecting epic ordering and dependencies
3. Runs up to `MAX_PARALLEL_STORIES` stories simultaneously — each in its own isolated git worktree — each through a sequential 4-step pipeline. **Every step runs in a dedicated subagent with a fresh context window**, keeping the coordinator lean and each agent fully focused on its single task:
- **Step 1** *(`MODEL_STANDARD` subagent)* — `bmad-create-story`: generates the story spec
3. Runs up to `MAX_PARALLEL_STORIES` stories simultaneously — each in its own isolated git worktree — each through a sequential 5-step pipeline. **Every step runs in a dedicated subagent with a fresh context window**, keeping the coordinator lean and each agent fully focused on its single task:
- **Step 1** *(`MODEL_STANDARD` subagent)* — `bmad-create-story`: generates and validates the story spec
- **Step 2** *(`MODEL_STANDARD` subagent)* — `bmad-dev-story`: implements the code
- **Step 3** *(`MODEL_QUALITY` subagent)* — `bmad-code-review`: reviews and fixes the implementation
- **Step 4** *(`MODEL_STANDARD` subagent)* — commit, push, open PR, monitor CI, fix any failing checks, resolve code review comments, and resolve merge conflicts
- **Step 4** *(`MODEL_STANDARD` subagent)* — commit, push, open PR, monitor CI, fix any failing checks
- **Step 5** *(`MODEL_STANDARD` subagent)* — PR code review: reviews diff, applies fixes, pushes clean
4. *(`MODEL_STANDARD` subagent)* Optionally auto-merges batch PRs sequentially (lowest story number first), resolving any conflicts
5. Waits, then loops back for the next batch — until the entire sprint is done
## Configuration
BAD is configured at install time (`/bad setup`) and stores settings in `_bmad/bad/config.yaml`. All values can be overridden at runtime with `KEY=VALUE` args.
BAD is configured at install time (`/bad setup`) and stores settings in the `bad:` section of `_bmad/config.yaml`. All values can be overridden at runtime with `KEY=VALUE` args.
| Variable | Config Key | Default | Description |
|---|---|---|---|
@@ -76,6 +94,7 @@ BAD is configured at install time (`/bad setup`) and stores settings in `_bmad/b
| `RETRO_TIMER_SECONDS` | `retro_timer_seconds` | `600` | Seconds before auto-retrospective after epic completion |
| `CONTEXT_COMPACTION_THRESHOLD` | `context_compaction_threshold` | `80` | Context window % at which to compact context |
| `TIMER_SUPPORT` | `timer_support` | `true` | Use native platform timers; `false` for prompt-based continuation |
| `MONITOR_SUPPORT` | `monitor_support` | `true` | Use the Monitor tool for CI/PR-merge polling; `false` for Bedrock/Vertex/Foundry |
| `API_FIVE_HOUR_THRESHOLD` | `api_five_hour_threshold` | `80` | (Claude Code) 5-hour usage % at which to pause |
| `API_SEVEN_DAY_THRESHOLD` | `api_seven_day_threshold` | `95` | (Claude Code) 7-day usage % at which to pause |
| `API_USAGE_THRESHOLD` | `api_usage_threshold` | `80` | (Other harnesses) Generic usage % at which to pause |

View File

@@ -13,16 +13,16 @@ The `setup`/`configure` argument always triggers `./assets/module-setup.md`, eve
After setup completes (or if config already exists), load the `bad` config and continue to Startup below.
You are a **coordinator**. You delegate every step to subagents. You never read files, run git/gh commands, or write to disk yourself.
You are a **coordinator**. You delegate every step to subagents via the **Agent tool**. You never read files, run git/gh commands, or write to disk yourself.
**Coordinator-only responsibilities:**
- Pick stories from subagent-reported data
- Spawn subagents (in parallel where allowed)
- Call the Agent tool to spawn subagents (in parallel where allowed — multiple Agent tool calls in one message)
- Manage timers (CronCreate / CronDelete)
- Run Pre-Continuation Checks (requires session stdin JSON — coordinator only)
- Handle user input, print summaries, and send channel notifications
**Everything else** — file reads, git operations, gh commands, disk writes — happens in subagents with fresh context.
**Everything else** — file reads, git operations, gh commands, disk writes — happens inside Agent tool subagents with fresh context windows.
## Startup: Capture Channel Context
@@ -33,7 +33,7 @@ Before doing anything else, determine how to send notifications:
- If another channel type is connected, save its equivalent identifier.
- If no channel is connected, set `NOTIFY_SOURCE="terminal"`.
2. **Send the BAD started notification** using the [Notify Pattern](#notify-pattern):
2. **Send the BAD started notification** using the [Notify Pattern](references/coordinator/pattern-notify.md):
```
🤖 BAD started — building dependency graph...
```
@@ -44,18 +44,19 @@ Then proceed to Phase 0.
## Configuration
Load base values from `_bmad/bad/config.yaml` at startup (via `/bmad-init --module bad --all`). Then parse any `KEY=VALUE` overrides from arguments passed to `/bad` — args win over config. For any variable not in config or args, use the default below.
Load base values from the `bad` section of `_bmad/config.yaml` at startup. Then parse any `KEY=VALUE` overrides from arguments passed to `/bad` — args win over config. For any variable not in config or args, use the default below.
| Variable | Config Key | Default | Description |
|----------|-----------|---------|-------------|
| `MAX_PARALLEL_STORIES` | `max_parallel_stories` | `3` | Max stories to run in a single batch |
| `WORKTREE_BASE_PATH` | `worktree_base_path` | `.worktrees` | Root directory for git worktrees |
| `MODEL_STANDARD` | `model_standard` | `sonnet` | Model for Steps 1, 2, 4 and Phase 3 (auto-merge) |
| `MODEL_STANDARD` | `model_standard` | `sonnet` | Model for Steps 1, 2, 4, 5 and Phase 3 (auto-merge) |
| `MODEL_QUALITY` | `model_quality` | `opus` | Model for Step 3 (code review) |
| `RETRO_TIMER_SECONDS` | `retro_timer_seconds` | `600` | Auto-retrospective countdown after epic completion (10 min) |
| `WAIT_TIMER_SECONDS` | `wait_timer_seconds` | `3600` | Post-batch wait before re-checking PR status (1 hr) |
| `CONTEXT_COMPACTION_THRESHOLD` | `context_compaction_threshold` | `80` | Context window % at which to compact/summarise context |
| `TIMER_SUPPORT` | `timer_support` | `true` | When `true`, use native platform timers; when `false`, use prompt-based continuation |
| `MONITOR_SUPPORT` | `monitor_support` | `true` | When `true`, use the Monitor tool for CI and PR-merge polling; when `false`, fall back to manual polling loops (required for Bedrock/Vertex/Foundry) |
| `API_FIVE_HOUR_THRESHOLD` | `api_five_hour_threshold` | `80` | (Claude Code) 5-hour rate limit % that triggers a pause |
| `API_SEVEN_DAY_THRESHOLD` | `api_seven_day_threshold` | `95` | (Claude Code) 7-day rate limit % that triggers a pause |
| `API_USAGE_THRESHOLD` | `api_usage_threshold` | `80` | (Other harnesses) Generic API usage % that triggers a pause |
@@ -83,9 +84,9 @@ Phase 1: Discover stories [coordinator logic]
└─ If none ready → skip to Phase 4
Phase 2: Run the pipeline [subagents — stories parallel, steps sequential]
├─► Story A ──► Step 1 → Step 2 → Step 3 → Step 4
├─► Story B ──► Step 1 → Step 2 → Step 3 → Step 4
└─► Story C ──► Step 1 → Step 2 → Step 3 → Step 4
├─► Story A ──► Step 1 → Step 2 → Step 3 → Step 4 → Step 5
├─► Story B ──► Step 1 → Step 2 → Step 3 → Step 4 → Step 5
└─► Story C ──► Step 1 → Step 2 → Step 3 → Step 4 → Step 5
Phase 3: Auto-Merge Batch PRs [subagents — sequential]
└─ One subagent per story (lowest → highest story number)
@@ -102,65 +103,25 @@ Phase 4: Batch Completion & Continuation
## Phase 0: Build or Update the Dependency Graph
Spawn a **single `MODEL_STANDARD` subagent** (yolo mode) with these instructions. The coordinator waits for the report.
```
You are the Phase 0 dependency graph builder. Auto-approve all tool calls (yolo mode).
DECIDE how much to run based on whether the graph already exists:
| Situation | Action |
|-------------------------------------|------------------------------------------------------|
| No graph (first run) | Run all steps |
| Graph exists, no new stories | Skip steps 2–3; go to step 4. Preserve all chains. |
| Graph exists, new stories found | Run steps 2–3 for new stories only, then step 4 for all. |
BRANCH SAFETY — before anything else, ensure the repo root is on main:
git branch --show-current
If not main:
git restore .
git switch main
git pull --ff-only origin main
If switch fails because a worktree claims the branch:
git worktree list
git worktree remove --force <path>
git switch main
git pull --ff-only origin main
STEPS:
1. Read `_bmad-output/implementation-artifacts/sprint-status.yaml`. Note current story
statuses. Compare against the existing graph (if any) to identify new stories.
2. Read `_bmad-output/planning-artifacts/epics.md` for dependency relationships of
new stories. (Skip if no new stories.)
3. Run /bmad-help with the epic context for new stories — ask it to map their
dependencies. Merge the result into the existing graph. (Skip if no new stories.)
4. Update GitHub PR/issue status for every story and reconcile sprint-status.yaml.
Follow the procedure in `references/phase0-dependency-graph.md` exactly.
5. Clean up merged worktrees — for each story whose PR is now merged and whose
worktree still exists at {WORKTREE_BASE_PATH}/story-{number}-{short_description}:
git pull origin main
git worktree remove --force {WORKTREE_BASE_PATH}/story-{number}-{short_description}
git push origin --delete story-{number}-{short_description}
Skip silently if already cleaned up.
6. Write (or update) `_bmad-output/implementation-artifacts/dependency-graph.md`.
Follow the schema, Ready to Work rules, and example in
`references/phase0-dependency-graph.md` exactly.
7. Pull latest main (if step 5 didn't already do so):
git pull origin main
REPORT BACK to the coordinator with this structured summary:
- ready_stories: list of { number, short_description, status } for every story
marked "Ready to Work: Yes" that is not done
- all_stories_done: true/false — whether every story across every epic is done
- current_epic: name/number of the lowest incomplete epic
- any warnings or blockers worth surfacing
```
Before spawning the subagent, **create the full initial task list** using TaskCreate so the user can see the complete pipeline at a glance. Mark Phase 0 `in_progress`; all others start as `[ ]`. Apply the Phase 3 rule at creation time:
```
[in_progress] Phase 0: Dependency graph
[ ] Phase 1: Story selection
[ ] Phase 2: Step 1 — Create story
[ ] Phase 2: Step 2 — Develop
[ ] Phase 2: Step 3 — Code review
[ ] Phase 2: Step 4 — PR + CI
[ ] Phase 2: Step 5 — PR review
[ ] Phase 3: Auto-merge ← if AUTO_PR_MERGE=true
[completed] Phase 3: Auto-merge — skipped (AUTO_PR_MERGE=false) ← if AUTO_PR_MERGE=false
[ ] Phase 4: Batch summary + continuation
```
Call the **Agent tool** with `model: MODEL_STANDARD`, `description: "Phase 0: dependency graph"`, and this prompt. The coordinator waits for the report.
```
Read `references/subagents/phase0-prompt.md` and follow its instructions exactly.
```
The coordinator uses the report to drive Phase 1. No coordinator-side file reads.
@@ -172,6 +133,27 @@ Ready: {N} stories — {comma-separated story numbers}
Blocked: {N} stories (if any)
```
After Phase 0 completes, **rebuild the task list in correct execution order** — tasks display in creation order, so delete and re-add to ensure Phase 2 story tasks appear before Phase 3 and Phase 4:
1. Mark `Phase 0: Dependency graph` → `completed`
2. Mark `Phase 1: Story selection` → `completed` (already done)
3. Delete all seven generic startup tasks: the five `Phase 2: Step N` tasks, `Phase 3: Auto-merge`, and `Phase 4: Batch summary + continuation`
4. Re-add in execution order using TaskCreate:
```
[ ] Phase 2 | Story {N}: Step 1 — Create story ← one set per selected story, all stories first
[ ] Phase 2 | Story {N}: Step 2 — Develop
[ ] Phase 2 | Story {N}: Step 3 — Code review
[ ] Phase 2 | Story {N}: Step 4 — PR + CI
[ ] Phase 2 | Story {N}: Step 5 — PR review
← repeat for each story in the batch
[ ] Phase 3: Auto-merge ← if AUTO_PR_MERGE=true
[completed] Phase 3: Auto-merge — skipped (AUTO_PR_MERGE=false) ← if AUTO_PR_MERGE=false
[ ] Phase 4: Batch summary + continuation
```
Update each story step task to `in_progress` when its subagent is spawned, and `completed` (or `failed`) when it reports back. Update Phase 3 and Phase 4 tasks similarly as they execute.
---
## Phase 1: Discover Stories
@@ -200,12 +182,12 @@ Launch all stories' Step 1 subagents **in a single message** (parallel). Each st
| `review` | Step 3 | Steps 1–2 |
| `done` | — | all |
**After each step:** run **Pre-Continuation Checks** (see `references/pre-continuation-checks.md`) before spawning the next subagent. Pre-Continuation Checks are the only coordinator-side work between steps.
**After each step:** run **Pre-Continuation Checks** (see `references/coordinator/gate-pre-continuation.md`) before spawning the next subagent. Pre-Continuation Checks are the only coordinator-side work between steps.
**On failure:** stop that story's pipeline. Report step, story, and error. Other stories continue.
**Exception:** rate/usage limit failures → run Pre-Continuation Checks (which auto-pauses until reset) then retry.
📣 **Notify per story** as each pipeline concludes (Step 4 success or any step failure):
📣 **Notify per story** as each pipeline concludes (Step 5 success or any step failure):
- Success: `✅ Story {number} done — PR #{pr_number}`
- Failure: `❌ Story {number} failed at Step {N} — {brief error}`
@@ -228,7 +210,11 @@ Working directory: {repo_root}. Auto-approve all tool calls (yolo mode).
3. Run /bmad-create-story {number}-{short_description}.
4. Update sprint-status.yaml at the REPO ROOT (not the worktree copy):
4. Run "validate story {number}-{short_description}". For every finding,
apply a fix directly to the story file using your best engineering judgement.
Repeat until no findings remain.
5. Update sprint-status.yaml at the REPO ROOT (not the worktree copy):
_bmad-output/implementation-artifacts/sprint-status.yaml
Set story {number} status to `ready-for-dev`.
@@ -282,18 +268,32 @@ Auto-approve all tool calls (yolo mode).
If the result is NOT story-{number}-{short_description}, stash changes, checkout the
correct branch, and re-apply. Never push to main or create a new branch.
3. Run /commit-commands:commit-push-pr.
PR title: story-{number}-{short_description} - fixes #{gh_issue_number}
(look up gh_issue_number from the epic file or sprint-status.yaml; omit "fixes #" if none)
Add "Fixes #{gh_issue_number}" to the PR description if an issue number exists.
3. Look up the GitHub issue number for this story:
Read the story's section in `_bmad-output/planning-artifacts/epics.md` and extract
the `**GH Issue:**` field. Save as `gh_issue_number`. If the field is absent
(local-only mode — no GitHub auth), proceed without it.
4. CI:
4. Run /commit-commands:commit-push-pr.
PR title: story-{number}-{short_description} - fixes #{gh_issue_number}
Include "Fixes #{gh_issue_number}" in the PR description body (omit only if
no issue number was found in step 3).
5. CI:
- If RUN_CI_LOCALLY is true → skip GitHub Actions and run the Local CI Fallback below.
- Otherwise monitor CI in a loop:
- Otherwise, if MONITOR_SUPPORT is true → use the Monitor tool to watch CI status:
Write a poller script:
while true; do gh run view --json status,conclusion 2>&1; sleep 30; done
Start it with Monitor. React to each output line as it arrives:
- conclusion=success → stop Monitor, report success
- conclusion=failure or cancelled → stop Monitor, diagnose, fix, push, restart Monitor
- Billing/spending limit error in output → stop Monitor, run Local CI Fallback
- gh TLS/auth error in output → stop Monitor, switch to curl poller from `references/coordinator/pattern-gh-curl-fallback.md`
- Otherwise → poll manually in a loop:
gh run view
(If `gh` fails, use `gh run view` curl equivalent from `references/coordinator/pattern-gh-curl-fallback.md`)
- Billing/spending limit error → exit loop, run Local CI Fallback
- CI failed for other reason, or Claude bot left PR comments → fix, push, loop
- CI green → proceed to step 5
- CI green → report success
LOCAL CI FALLBACK (when RUN_CI_LOCALLY=true or billing-limited):
a. Read all .github/workflows/ files triggered on pull_request events.
@@ -309,35 +309,34 @@ LOCAL CI FALLBACK (when RUN_CI_LOCALLY=true or billing-limited):
- [failure] → [fix]
All rows must show ✅ Pass before this step is considered complete.
5. CODE REVIEW — spawn a dedicated MODEL_DEV subagent (yolo mode) after CI passes:
```
You are the code review agent for story {number}-{short_description}.
Working directory: {repo_root}/{WORKTREE_BASE_PATH}/story-{number}-{short_description}.
Auto-approve all tool calls (yolo mode).
1. Run /code-review:code-review (reads the PR diff via gh pr diff).
2. For every finding, apply a fix using your best engineering judgement.
Do not skip or defer any finding — fix them all.
3. Commit all fixes and push to the PR branch.
4. If any fixes were pushed, re-run /code-review:code-review once more to confirm
no new issues were introduced. Repeat fix → commit → push → re-review until
the review comes back clean.
Report: clean (no findings or all fixed) or failure with details.
```
Wait for the subagent to report before continuing. If it reports failure,
stop this story and surface the error.
6. Update sprint-status.yaml at the REPO ROOT:
{repo_root}/_bmad-output/implementation-artifacts/sprint-status.yaml
Set story {number} status to `done`.
Report: success or failure, and the PR number/URL if opened.
### Step 5: PR Code Review (`MODEL_STANDARD`)
Spawn with model `MODEL_STANDARD` (yolo mode):
```
You are the Step 5 PR code reviewer for story {number}-{short_description}.
Working directory: {repo_root}/{WORKTREE_BASE_PATH}/story-{number}-{short_description}.
Auto-approve all tool calls (yolo mode).
1. Run /code-review:code-review (reads the PR diff via gh pr diff).
2. For every finding, apply a fix using your best engineering judgement.
Do not skip or defer any finding — fix them all.
3. Commit all fixes and push to the PR branch.
4. If any fixes were pushed, re-run /code-review:code-review once more to confirm
no new issues were introduced. Repeat fix → commit → push → re-review until
the review comes back clean.
5. Update sprint-status.yaml at the REPO ROOT:
{repo_root}/_bmad-output/implementation-artifacts/sprint-status.yaml
Set story {number} status to `done`.
Report: clean (no findings or all fixed) or failure with details.
```
---
## Phase 3: Auto-Merge Batch PRs (when `AUTO_PR_MERGE=true`)
## Phase 3: Auto-Merge Batch PRs (when AUTO_PR_MERGE=true)
After all batch stories complete Phase 2, merge every successful story's PR into `main` — one subagent per story, **sequentially** (lowest story number first).
@@ -348,7 +347,7 @@ After all batch stories complete Phase 2, merge every successful story's PR into
1. Collect all stories from the current batch that reached Step 4 successfully (have a PR). Sort ascending by story number.
2. For each story **sequentially** (wait for each to complete before starting the next):
- Pull latest main at the repo root: either spawn a quick subagent for it or fold it into the merge subagent.
- Spawn a `MODEL_STANDARD` subagent (yolo mode) with the instructions from `references/phase4-auto-merge.md`.
- Spawn a `MODEL_STANDARD` subagent (yolo mode) with the instructions from `references/subagents/phase3-merge.md`.
- Run Pre-Continuation Checks after the subagent completes. If it fails (unresolvable conflict, CI blocking), report the error and continue to the next story.
3. Print a merge summary (coordinator formats from subagent reports):
```
@@ -399,11 +398,11 @@ After all batch stories complete Phase 2, merge every successful story's PR into
Coordinator prints immediately — no file reads, formats from Phase 2 step results:
```
Story | Step 1 | Step 2 | Step 3 | Step 4 | Result
--------|--------|--------|--------|--------|-------
9.1 | OK | OK | OK | OK | PR #142
9.2 | OK | OK | FAIL | -- | Review failed: ...
9.3 | OK | OK | OK | OK | PR #143
Story | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Result
--------|--------|--------|--------|--------|--------|-------
9.1 | OK | OK | OK | OK | OK | PR #142
9.2 | OK | OK | FAIL | -- | -- | Review failed: ...
9.3 | OK | OK | OK | OK | OK | PR #143
```
If arriving from Phase 1 with no ready stories:
@@ -428,28 +427,39 @@ Epic completion assessment. Auto-approve all tool calls (yolo mode).
Read:
- _bmad-output/planning-artifacts/epics.md
- _bmad-output/implementation-artifacts/sprint-status.yaml
- _bmad-output/implementation-artifacts/dependency-graph.md
Use the dependency graph's PR Status column as the authoritative source for whether a PR is
merged. sprint-status `done` means the pipeline finished (code review complete) — it does NOT
mean the PR is merged.
Report back:
- current_epic_complete: true/false (all stories done or have open PRs)
- all_epics_complete: true/false (every story across every epic is done)
- current_epic_merged: true/false — every story in the current epic has PR Status = `merged`
in the dependency graph
- current_epic_prs_open: true/false — every story in the current epic has a PR number in the
dependency graph, but at least one PR Status is not `merged`
- all_epics_complete: true/false — every story across every epic has PR Status = `merged`
in the dependency graph
- current_epic_name: name/number of the lowest incomplete epic
- next_epic_name: name/number of the next epic (if any)
- stories_remaining: count of non-done stories in the current epic
- stories_remaining: count of stories in the current epic whose PR Status is not `merged`
```
Using the assessment report:
**If `current_epic_complete = true`:**
**If `current_epic_merged = true`:**
1. Print: `🎉 Epic {current_epic_name} is complete! Starting retrospective countdown ({RETRO_TIMER_SECONDS ÷ 60} minutes)...`
📣 **Notify:** `🎉 Epic {current_epic_name} complete! Running retrospective in {RETRO_TIMER_SECONDS ÷ 60} min...`
2. Start a timer using the **[Timer Pattern](#timer-pattern)** with:
2. Start a timer using the **[Timer Pattern](references/coordinator/pattern-timer.md)** with:
- **Duration:** `RETRO_TIMER_SECONDS`
- **Fire prompt:** `"BAD_RETRO_TIMER_FIRED — The retrospective countdown has elapsed. Auto-run the retrospective: spawn a MODEL_DEV subagent (yolo mode) to run /bmad-retrospective, accept all changes. Run Pre-Continuation Checks after it completes, then proceed to Phase 4 Step 3."`
- **Fire prompt:** `"BAD_RETRO_TIMER_FIRED — The retrospective countdown has elapsed. Auto-run the retrospective: spawn a MODEL_STANDARD subagent (yolo mode) to run /bmad-retrospective, accept all changes. Run Pre-Continuation Checks after it completes, then proceed to Phase 4 Step 3."`
- **[C] label:** `Run retrospective now`
- **[S] label:** `Skip retrospective`
- **[C] / FIRED action:** Spawn MODEL_DEV subagent (yolo mode) to run `/bmad-retrospective`. Accept all changes. Run Pre-Continuation Checks after.
- **[X] label:** `Stop BAD`
- **[C] / FIRED action:** Spawn MODEL_STANDARD subagent (yolo mode) to run `/bmad-retrospective`. Accept all changes. Run Pre-Continuation Checks after.
- **[S] action:** Skip retrospective.
- **[X] action:** `CronDelete(JOB_ID)`, stop BAD, print final summary, and 📣 **Notify:** `🛑 BAD stopped by user.`
3. Proceed to Step 3 after the retrospective decision resolves.
### Step 3: Gate & Continue
@@ -465,118 +475,60 @@ Using the assessment report from Step 2, follow the applicable branch:
**Branch B — More work remains:**
1. Print a status line:
- Epic just completed: `✅ Epic {current_epic_name} complete. Next up: Epic {next_epic_name} ({stories_remaining} stories remaining).`
- More stories in current epic: `✅ Batch complete. Ready for the next batch.`
2. Start a timer using the **[Timer Pattern](#timer-pattern)** with:
- **Duration:** `WAIT_TIMER_SECONDS`
- **Fire prompt:** `"BAD_WAIT_TIMER_FIRED — The post-batch wait has elapsed. Run Pre-Continuation Checks, then re-run Phase 0, then proceed to Phase 1."`
- **[C] label:** `Continue now`
- **[S] label:** `Stop BAD`
- **[C] / FIRED action:** Run Pre-Continuation Checks, then re-run Phase 0.
- **[S] action:** Stop BAD, print a final summary, and 📣 **Notify:** `🛑 BAD stopped by user.`
- `current_epic_merged = true` (epic fully landed): `✅ Epic {current_epic_name} complete. Next up: Epic {next_epic_name} ({stories_remaining} stories remaining).`
- `current_epic_prs_open = true` (all stories have PRs, waiting for merges): `⏸ Epic {current_epic_name} in review — waiting for PRs to merge before continuing.`
- Otherwise (more stories to develop in current epic): `✅ Batch complete. Ready for the next batch.`
2. Start the wait using the **[Monitor Pattern](references/coordinator/pattern-monitor.md)** (when `MONITOR_SUPPORT=true`) or the **[Timer Pattern](references/coordinator/pattern-timer.md)** (when `MONITOR_SUPPORT=false`):
**If `MONITOR_SUPPORT=true` — Monitor + CronCreate fallback:**
- Fill in `BATCH_PRS` from the Phase 0 pending-PR report (space-separated numbers, e.g. `"101 102 103"`). Use the PR-merge watcher script from [pattern-monitor.md](references/coordinator/pattern-monitor.md) with that value substituted. Save the Monitor handle as `PR_MONITOR`.
- Also start a CronCreate fallback timer using the [Timer Pattern](references/coordinator/pattern-timer.md) with:
- **Duration:** `WAIT_TIMER_SECONDS`
- **Fire prompt:** `"BAD_WAIT_TIMER_FIRED — Max wait elapsed. Stop PR_MONITOR, run Pre-Continuation Checks, then re-run Phase 0."`
- **[C] label:** `Continue now`
- **[S] label:** `Stop BAD`
- **[C] / FIRED action:** Stop `PR_MONITOR`, run Pre-Continuation Checks, then re-run Phase 0.
- **[S] action:** Stop `PR_MONITOR`, CronDelete, stop BAD, print final summary, and 📣 **Notify:** `🛑 BAD stopped by user.`
- **On `MERGED: #N` event:** log progress — `✅ PR #N merged — waiting for remaining batch PRs`; keep `PR_MONITOR` running.
- **On `ALL_MERGED` event:** CronDelete the fallback timer, stop `PR_MONITOR`, run Pre-Continuation Checks, re-run Phase 0.
- 📣 **Notify:** `⏳ Watching for PR merges (max wait: {WAIT_TIMER_SECONDS ÷ 60} min)...`
**If `MONITOR_SUPPORT=false` — Timer only:**
- Use the [Timer Pattern](references/coordinator/pattern-timer.md) with:
- **Duration:** `WAIT_TIMER_SECONDS`
- **Fire prompt:** `"BAD_WAIT_TIMER_FIRED — The post-batch wait has elapsed. Run Pre-Continuation Checks, then re-run Phase 0, then proceed to Phase 1."`
- **[C] label:** `Continue now`
- **[S] label:** `Stop BAD`
- **[C] / FIRED action:** Run Pre-Continuation Checks, then re-run Phase 0.
- **[S] action:** Stop BAD, print a final summary, and 📣 **Notify:** `🛑 BAD stopped by user.`
3. After Phase 0 completes:
- At least one story unblocked → proceed to Phase 1.
- All stories still blocked → print which PRs are pending (from Phase 0 report), restart Branch B for another `WAIT_TIMER_SECONDS` countdown.
- All stories still blocked → print which PRs are pending (from Phase 0 report), restart Branch B for another wait.
---
## Notify Pattern
Use this pattern every time a `📣 Notify:` callout appears **anywhere in this skill** — including inside the Timer Pattern.
**If `NOTIFY_SOURCE="telegram"`:** call `mcp__plugin_telegram_telegram__reply` with:
- `chat_id`: `NOTIFY_CHAT_ID`
- `text`: the message
**If `NOTIFY_SOURCE="terminal"`** (or if the Telegram tool call fails): print the message in the conversation as a normal response.
Always send both a terminal print and a channel message — the terminal print keeps the in-session transcript readable, and the channel message reaches the user on their device.
Read `references/coordinator/pattern-notify.md` whenever a `📣 Notify:` callout appears. It covers Telegram and terminal output.
---
## Timer Pattern
Both the retrospective and post-batch wait timers use this pattern. The caller supplies the duration, fire prompt, option labels, and actions.
Behaviour depends on `TIMER_SUPPORT`:
Read `references/coordinator/pattern-timer.md` when instructed to start a timer. It covers both `TIMER_SUPPORT=true` (CronCreate) and `TIMER_SUPPORT=false` (prompt-based) paths.
---
### If `TIMER_SUPPORT=true` (native platform timers)
## Monitor Pattern
**Step 1 — compute target cron expression** (convert seconds to minutes: `SECONDS ÷ 60`):
```bash
# macOS
date -v +{N}M '+%M %H %d %m *'
# Linux
date -d '+{N} minutes' '+%M %H %d %m *'
```
Save as `CRON_EXPR`. Save `TIMER_START=$(date +%s)`.
**Step 2 — create the one-shot timer** via `CronCreate`:
- `cron`: expression from Step 1
- `recurring`: `false`
- `prompt`: the caller-supplied fire prompt
Save the returned job ID as `JOB_ID`.
**Step 3 — print the options menu** (always all three options):
> Timer running (job: {JOB_ID}). I'll act in {N} minutes.
>
> - **[C] Continue** — {C label}
> - **[S] Stop** — {S label}
> - **[M] <minutes>** — modify the timer to that many minutes (shorten or extend the countdown)
📣 **Notify** using the [Notify Pattern](#notify-pattern) with the same options so the user can respond from their device:
```
⏱ Timer set — {N} minutes (job: {JOB_ID})
[C] {C label}
[S] {S label}
[M] <minutes> — modify countdown
```
Wait for whichever arrives first — user reply or fired prompt. On any human reply, print elapsed time first:
```bash
ELAPSED=$(( $(date +%s) - TIMER_START ))
echo "⏱ Time elapsed: $((ELAPSED / 60))m $((ELAPSED % 60))s"
```
- **[C]** → `CronDelete(JOB_ID)`, run the [C] action
- **[S]** → `CronDelete(JOB_ID)`, run the [S] action
- **[M] N** → `CronDelete(JOB_ID)`, recompute cron for N minutes from now, `CronCreate` again with same fire prompt, update `JOB_ID` and `TIMER_START`, print updated countdown, then 📣 **Notify** using the [Notify Pattern](#notify-pattern):
```
⏱ Timer updated — {N} minutes (job: {JOB_ID})
[C] {C label}
[S] {S label}
[M] <minutes> — modify countdown
```
- **FIRED (no prior reply)** → run the [C] action automatically
Read `references/coordinator/pattern-monitor.md` when `MONITOR_SUPPORT=true`. It covers CI status polling (Step 4) and PR-merge watching (Phase 4 Branch B), plus the `MONITOR_SUPPORT=false` fallback for each.
---
### If `TIMER_SUPPORT=false` (prompt-based continuation)
## gh → curl Fallback Pattern
Save `TIMER_START=$(date +%s)`. No native timer is created — print the options menu immediately and wait for user reply:
> Waiting {N} minutes before continuing. Reply when ready.
>
> - **[C] Continue** — {C label}
> - **[S] Stop** — {S label}
> - **[M] N** — remind me after N minutes (reply with `[M] <minutes>`)
📣 **Notify** using the [Notify Pattern](#notify-pattern) with the same options.
On any human reply, print elapsed time first:
```bash
ELAPSED=$(( $(date +%s) - TIMER_START ))
echo "⏱ Time elapsed: $((ELAPSED / 60))m $((ELAPSED % 60))s"
```
- **[C]** → run the [C] action
- **[S]** → run the [S] action
- **[M] N** → update `TIMER_START`, print updated wait message, 📣 **Notify**, and wait again
Read `references/coordinator/pattern-gh-curl-fallback.md` when any `gh` command fails (TLS error, sandbox restriction, spending limit, etc.). Pass the path to subagents that run `gh` commands so they can self-recover. Note: `gh pr merge` has no curl fallback — if unavailable, surface the failure to the user.
---
@@ -584,7 +536,7 @@ echo "⏱ Time elapsed: $((ELAPSED / 60))m $((ELAPSED % 60))s"
1. **Delegate mode only** — never read files, run git/gh commands, or write to disk yourself. The only platform command the coordinator may run directly is context compaction via Pre-Continuation Checks (when `CONTEXT_COMPACTION_THRESHOLD` is exceeded). All other slash commands and operations are delegated to subagents.
2. **One subagent per step per story** — spawn only after the previous step reports success.
3. **Sequential steps within a story** — Steps 1→2→3→4 run strictly in order.
3. **Sequential steps within a story** — Steps 1→2→3→4→5 run strictly in order.
4. **Parallel stories** — launch all stories' Step 1 in one message (one tool call per story). Phase 3 runs sequentially by design.
5. **Dependency graph is authoritative** — never pick a story whose dependencies are not fully merged. Use Phase 0's report, not your own file reads.
6. **Phase 0 runs before every batch** — always after the Phase 4 wait. Always as a fresh subagent.

View File

@@ -0,0 +1,25 @@
#!/bin/bash
# BAD session-state capture — installed to {project-root}/.claude/bad-statusline.sh
# during /bad setup. Configured as the Claude Code statusLine script so it runs
# after every API response and writes the session JSON to a temp file that the
# BAD coordinator reads during Pre-Continuation Checks.
#
# To chain with an existing statusline script:
# SESSION_JSON=$(cat)
# echo "$SESSION_JSON" | /path/to/your-existing-script.sh
# echo "$SESSION_JSON" > "${TMPDIR:-/tmp}/bad-session-state.json"
SESSION_JSON=$(cat)
echo "$SESSION_JSON" > "${TMPDIR:-/tmp}/bad-session-state.json"
# Output nothing — add your own status text here if desired, e.g.:
# python3 -c "
# import sys, json
# d = json.loads(sys.stdin.read())
# cw = d.get('context_window', {})
# pct = cw.get('used_percentage')
# rl = d.get('rate_limits', {})
# fh = rl.get('five_hour', {}).get('used_percentage')
# parts = [f'ctx:{pct:.0f}%' if pct is not None else None,
# f'rate5h:{fh:.0f}%' if fh is not None else None]
# print(' '.join(p for p in parts if p))
# " <<< \"\$SESSION_JSON\" 2>/dev/null || true

View File

@@ -2,7 +2,7 @@
Standalone module self-registration for BMad Autonomous Development. This file is loaded when:
- The user passes `setup`, `configure`, or `install` as an argument
- The module is not yet registered in `{project-root}/_bmad/bad/config.yaml`
- The module is not yet registered in `{project-root}/_bmad/config.yaml` (no `bad:` section)
## Overview
@@ -37,6 +37,47 @@ Check for the presence of harness directories at the project root:
Store all detected harnesses. Determine the **current harness** from this skill's own file path — whichever harness directory contains this running skill is the current harness. Use the current harness to drive the question branch in Step 3.
## Step 2b: Session-State Hook (Claude Code only)
Skip this step if `claude-code` was not detected in Step 2.
The BAD coordinator's Pre-Continuation Checks (rate-limit pausing, context compaction) need
access to Claude Code session state — `context_window.used_percentage` and `rate_limits.*`.
Claude Code exposes this data via the `statusLine` script mechanism: it pipes a JSON blob to
the script on every API response. This step installs a lightweight script that writes that JSON
to a temp file the coordinator reads with the Bash tool.
Ask: **"Install BAD session-state capture (writes rate-limit / context data to a temp file for Pre-Continuation Checks)? [Y/n]"**
Default: **yes** (or auto-accept if `--headless` / `accept all defaults`).
If **yes**:
1. Copy `./assets/bad-statusline.sh` to `{project-root}/.claude/bad-statusline.sh`.
Make it executable: `chmod +x {project-root}/.claude/bad-statusline.sh`.
2. Read `{project-root}/.claude/settings.json` (create `{}` if absent).
3. Check if a `statusLine` key already exists:
- **If absent:** set it to:
```json
"statusLine": {
"type": "command",
"command": ".claude/bad-statusline.sh"
}
```
- **If already set:** do not overwrite. Instead, print:
```
⚠️ A statusLine is already configured in .claude/settings.json.
To enable BAD's session-state capture, chain it manually:
1. Open .claude/bad-statusline.sh — it shows how to pipe to an existing script.
2. Add .claude/bad-statusline.sh as the last step in your existing statusline pipeline.
```
4. Write the updated `settings.json` back (only if you modified it).
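Steps 2–4 can be sketched as follows (a minimal sketch only: it assumes the project root as the working directory and preserves any other keys already present in `settings.json`):

```shell
# Minimal sketch of steps 2-4: install the statusLine key only when absent.
SETTINGS=".claude/settings.json"
mkdir -p "$(dirname "$SETTINGS")"
[ -f "$SETTINGS" ] || echo '{}' > "$SETTINGS"
python3 - "$SETTINGS" <<'PY'
import json, sys
path = sys.argv[1]
with open(path) as f:
    settings = json.load(f)
if "statusLine" in settings:
    # Do not overwrite: the user must chain manually (see the warning above).
    print("statusLine already configured")
else:
    settings["statusLine"] = {"type": "command",
                              "command": ".claude/bad-statusline.sh"}
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)
    print("statusLine installed")
PY
```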
---
## Step 3: Collect Configuration
Show defaults in brackets. Present all values together so the user can respond once with only what they want to change. Never say "press enter" or "leave blank".
@@ -80,7 +121,7 @@ Present as **"Claude Code settings"**:
- `api_five_hour_threshold` — 5-hour API usage % at which to pause [80]
- `api_seven_day_threshold` — 7-day API usage % at which to pause [95]
Automatically write `timer_support: true` — no prompt needed.
Automatically write `timer_support: true` and `monitor_support: true` — no prompt needed.
#### All Other Harnesses
@@ -90,7 +131,7 @@ Present as **"{HarnessName} settings"**:
- `model_quality` — Model for code review step (e.g. `best`, `o1`, `pro`)
- `api_usage_threshold` — API usage % at which to pause for rate limits [80]
Automatically write `timer_support: false` — no prompt needed. BAD will use prompt-based continuation instead of native timers on this harness.
Automatically write `timer_support: false` and `monitor_support: false` — no prompt needed. BAD will use prompt-based continuation instead of native timers, and manual polling loops instead of the Monitor tool, on this harness.
## Step 4: Write Files
@@ -107,6 +148,7 @@ Write a temp JSON file with collected answers structured as:
"retro_timer_seconds": "600",
"context_compaction_threshold": "80",
"timer_support": true,
"monitor_support": true,
"model_standard": "sonnet",
"model_quality": "opus",
"api_five_hour_threshold": "80",

View File

@@ -1,7 +1,7 @@
code: bad
name: "BMad Autonomous Development"
description: "Orchestrates parallel BMad story implementation pipelines — automatically runs bmad-create-story, bmad-dev-story, bmad-code-review, and commit/PR in batches, driven by the sprint backlog and dependency graph"
module_version: "1.0.0"
module_version: "1.1.0"
module_greeting: "BAD is ready. Run /bad to start. Pass KEY=VALUE args to override config at runtime (e.g. /bad MAX_PARALLEL_STORIES=2)."
header: "BAD — BMad Autonomous Development"

View File

@@ -1,18 +1,28 @@
# Pre-Continuation Checks
Run these checks **in order** every time you (the coordinator) are about to re-enter Phase 0 — whether triggered by a user reply, a timer firing, or the automatic loop.
Run these checks **in order** at every gate point: between Phase 2 steps, after each Phase 3 merge, after the retrospective, and before re-entering Phase 0 — whether triggered by a user reply, a timer firing, or the automatic loop.
**Harness note:** Checks 2 and 3 read from platform-provided session state (e.g. Claude Code's stdin JSON). On other harnesses this data may not be available — each check gracefully skips if its fields are absent.
## Channel Reconnect (run first, before the numbered checks)
Read the current session state from whatever mechanism your platform provides (e.g. Claude Code pipes session JSON to stdin). The relevant fields:
If `NOTIFY_SOURCE` is not `"terminal"` (i.e. a channel like Telegram is configured), run `/reload-plugins` now. This is a no-op when the plugin is already connected, and silently restores it when it has dropped. No user-visible output needed unless the channel was actually missing.
**Harness note:** Checks 2 and 3 require session state data. On Claude Code, this is available via the session-state hook installed by `/bad setup` (Step 2b). On other harnesses this data may not be available — each check gracefully skips if its fields are absent.
Read the current session state using the Bash tool:
```bash
cat "${TMPDIR:-/tmp}/bad-session-state.json" 2>/dev/null || echo "{}"
```
Parse the output as JSON. The relevant fields:
- `context_window.used_percentage` — 0–100, percentage of context window consumed (treat null as 0)
- `rate_limits.five_hour.used_percentage` — 0–100 (Claude Code: Pro/Max subscribers only)
- `rate_limits.five_hour.used_percentage` — 0–100 (Claude Code Pro/Max only)
- `rate_limits.five_hour.resets_at` — Unix epoch seconds when the 5-hour window resets
- `rate_limits.seven_day.used_percentage` — 0–100 (Claude Code only)
- `rate_limits.seven_day.resets_at` — Unix epoch seconds when the 7-day window resets
Each field may be independently absent. If absent, skip the corresponding check.
Each field may be independently absent. If the file does not exist or a field is absent, skip the corresponding check.
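The graceful-skip logic can be sketched like this (an inline demo blob stands in for the live temp file, which the real check pipes through the same filter):

```shell
# Demo of the parse-and-skip logic. The real check pipes the temp-file
# contents (possibly empty or missing) into the same Python filter.
DEMO='{"context_window":{"used_percentage":42.5},"rate_limits":{"seven_day":{"used_percentage":12}}}'
RESULT=$(printf '%s' "$DEMO" | python3 -c '
import sys, json
try:
    d = json.load(sys.stdin)
except ValueError:
    d = {}                                    # missing/empty file: skip all checks
ctx = (d.get("context_window") or {}).get("used_percentage")
print("context_pct:", 0 if ctx is None else ctx)        # treat null/absent as 0
five = ((d.get("rate_limits") or {}).get("five_hour") or {}).get("used_percentage")
print("five_hour:", "absent-skip" if five is None else five)
')
echo "$RESULT"
```

Each field gates only its own check, so a partially populated file still drives whatever checks it can.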
---
@@ -85,4 +95,4 @@ If `rate_limits.seven_day.used_percentage` is present and **> `API_SEVEN_DAY_THR
---
Only after all applicable checks pass, proceed to re-run Phase 0 in full.
Only after all applicable checks pass, proceed with the next step (or re-run Phase 0, if that is what triggered this gate).

View File

@@ -0,0 +1,102 @@
# gh → curl Fallback Pattern
If any `gh` command fails for any reason (TLS error, sandbox restriction, spending limit, etc.), use the curl equivalents below.
## Setup
```bash
TOKEN=$(gh auth token 2>/dev/null)
REMOTE=$(git remote get-url origin)
OWNER=$(echo "$REMOTE" | sed 's|.*github\.com[:/]\([^/]*\)/.*|\1|')
REPO=$(echo "$REMOTE" | sed 's|.*github\.com[:/][^/]*/\([^.]*\).*|\1|')
```
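A quick sanity check of those sed extractions against both remote URL styles (demo values, not a real repository):

```shell
# Verify the OWNER/REPO extraction on SSH and HTTPS remotes (demo URLs).
OUT=$(for REMOTE in "git@github.com:acme/widgets.git" \
                    "https://github.com/acme/widgets.git"; do
  OWNER=$(echo "$REMOTE" | sed 's|.*github\.com[:/]\([^/]*\)/.*|\1|')
  REPO=$(echo "$REMOTE" | sed 's|.*github\.com[:/][^/]*/\([^.]*\).*|\1|')
  echo "$OWNER/$REPO"
done)
echo "$OUT"
```

Both styles should resolve to the same `owner/repo` pair.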
---
## Read operations
### gh pr list
```bash
# gh pr list --search "story-{number}" --state all --json number,title,state,mergedAt
curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/pulls?state=all&per_page=100" | \
python3 -c "
import sys, json
for p in json.load(sys.stdin):
if 'story-{number}' in p['head']['ref']:
print(p['number'], p['title'], p['state'], p.get('merged_at') or '')
"
```
### gh pr view
```bash
# gh pr view {pr_number} --json number,title,mergeable,mergeStateStatus,state,mergedAt
curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/pulls/{pr_number}" | \
python3 -c "
import sys, json; p = json.load(sys.stdin)
print('number:', p['number'])
print('state:', p['state'])
print('mergeable:', p.get('mergeable'))
print('mergeStateStatus:', p.get('merge_state_status'))
print('mergedAt:', p.get('merged_at'))
"
```
### gh pr checks
```bash
# gh pr checks {pr_number}
SHA=$(curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/pulls/{pr_number}" | \
python3 -c "import sys,json; print(json.load(sys.stdin)['head']['sha'])")
curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/commits/$SHA/check-runs" | \
python3 -c "
import sys, json
for r in json.load(sys.stdin).get('check_runs', []):
print(r['name'], r['status'], r.get('conclusion') or 'pending')
"
```
For polling (replacing `gh pr checks --watch`): call the above in a 30-second sleep loop until all `status` values are `completed`.
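That loop might look like the following (network calls are stubbed with canned payloads here; in real use, pipe the curl output from the snippet above into the helper):

```shell
# Sketch of the --watch replacement: a parse helper plus the 30s loop shape.
all_completed() {   # reads check-runs JSON on stdin; prints DONE or PENDING
  python3 -c '
import sys, json
runs = json.load(sys.stdin).get("check_runs", [])
print("DONE" if all(r["status"] == "completed" for r in runs) else "PENDING")
'
}
# Canned payloads stand in for the curl call during this demo:
P1='{"check_runs":[{"name":"ci","status":"in_progress"}]}'
P2='{"check_runs":[{"name":"ci","status":"completed","conclusion":"success"}]}'
S1=$(printf '%s' "$P1" | all_completed)
S2=$(printf '%s' "$P2" | all_completed)
echo "$S1 $S2"
# Real loop shape:
#   while curl -s ... /check-runs | all_completed | grep -q PENDING; do sleep 30; done
```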
### gh issue list
```bash
# gh issue list --search "Story {n}:" --json number,title,state
curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/issues?state=all&per_page=100" | \
python3 -c "
import sys, json
for i in json.load(sys.stdin):
if 'Story {n}:' in i['title']:
print(i['number'], i['title'], i['state'])
"
```
### gh run view
```bash
# Get latest workflow run for current branch
BRANCH=$(git branch --show-current)
curl -s -H "Authorization: Bearer $TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/actions/runs?branch=$BRANCH&per_page=5" | \
python3 -c "
import sys, json
runs = json.load(sys.stdin).get('workflow_runs', [])
if runs:
r = runs[0]
print('status:', r['status'])
print('conclusion:', r.get('conclusion') or 'pending')
"
```
---
## Write operations
`gh pr merge` has no safe curl equivalent (requires a GraphQL mutation). If `gh` is unavailable for the merge step, report the failure and ask the user to merge manually.

View File

@@ -0,0 +1,67 @@
# Monitor Pattern
Use this pattern when `MONITOR_SUPPORT=true`. It covers two use cases in BAD: CI status polling (Step 4) and PR-merge watching (Phase 4 Branch B). The caller supplies the poll script and the reaction logic.
> **Requires Claude Code v2.1.98+.** Uses the same permission rules as the Bash tool. Not available on Amazon Bedrock, Google Vertex AI, or Microsoft Azure Foundry — set `MONITOR_SUPPORT=false` on those platforms.
## How it works
1. **Write a poll script** — a `while true; do ...; sleep N; done` loop that emits one line per status change to stdout.
2. **Start Monitor** — pass the script to the Monitor tool. Claude receives each stdout line as a live event and can react immediately without blocking the conversation.
3. **React to events** — on each line, apply the caller's reaction logic (e.g. CI green → proceed; PR merged → continue).
4. **Stop Monitor** — call stop/cancel on the Monitor handle when done (success, failure, or user override).
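In miniature, a poll script is just a loop that prints one line per state change (a toy counter here stands in for the gh polling used by the two real scripts in this file):

```shell
# Toy poll script: one stdout line per event, a terminal line when done.
OUT=$(
  COUNT=0
  while [ "$COUNT" -lt 3 ]; do
    COUNT=$((COUNT + 1))
    echo "EVENT: tick $COUNT"   # Monitor surfaces each line as a live event
    sleep 1                     # real scripts sleep 30-60s between polls
  done
  echo "ALL_DONE"               # reaction logic stops the Monitor on this
)
echo "$OUT"
```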
## CI status polling (Step 4)
Poll script (run inside the Step 4 subagent):
```bash
GH_BIN="$(command -v gh)"
while true; do
"$GH_BIN" run view --json status,conclusion 2>&1
sleep 30
done
```
React to each output line:
- `"conclusion":"success"` → stop Monitor, proceed to step 5
- `"conclusion":"failure"` or `"conclusion":"cancelled"` → stop Monitor, diagnose, fix, push, restart Monitor
- Billing/spending limit text in output → stop Monitor, run Local CI Fallback
## PR-merge watching (Phase 4 Branch B)
The coordinator fills in `BATCH_PRS` (space-separated PR numbers from the Phase 0 pending-PR report) before starting Monitor.
Poll script (run by the coordinator):
```bash
GH_BIN="$(command -v gh)"
BATCH_PRS="101 102 103" # coordinator replaces with actual pending PR numbers
ALREADY_REPORTED=""
while true; do
MERGED_NOW=$(cd /path/to/repo && "$GH_BIN" pr list --state merged --json number \
--jq '.[].number' 2>/dev/null | tr '\n' ' ' || echo "")
ALL_DONE=true
for PR in $BATCH_PRS; do
if echo " $MERGED_NOW " | grep -q " $PR "; then
if ! echo " $ALREADY_REPORTED " | grep -q " $PR "; then
echo "MERGED: #$PR"
ALREADY_REPORTED="$ALREADY_REPORTED $PR"
fi
else
ALL_DONE=false
fi
done
[ "$ALL_DONE" = "true" ] && echo "ALL_MERGED" && break
sleep 60
done
```
> **Note:** Replace `/path/to/repo` with the absolute path to the project repository. Common mistakes: (1) `gh pr list` defaults to `--state open` — merged PRs are invisible without `--state merged`; (2) `gh` has no `-C` flag (unlike `git`) — use `cd` instead; (3) the Monitor shell inherits a stripped PATH — `gh` must be resolved with `command -v gh` before the script runs, not looked up by name inside it.
React to each output line:
- `MERGED: #N` → log progress (e.g. `✅ PR #N merged — waiting for remaining batch PRs`); keep Monitor running
- `ALL_MERGED` → CronDelete the fallback timer, stop Monitor, run Pre-Continuation Checks, re-run Phase 0
## If `MONITOR_SUPPORT=false`
- **CI polling:** use the manual `gh run view` loop in Step 4 (see Step 4 fallback path in SKILL.md).
- **PR-merge watching:** use the CronCreate-only Timer Pattern in Phase 4 Branch B (see fallback path in SKILL.md).

View File

@@ -0,0 +1,16 @@
# Notify Pattern
Use this pattern every time a `📣 Notify:` callout appears **anywhere in the BAD skill** — including inside the Timer Pattern and Monitor Pattern.
**If `NOTIFY_SOURCE="telegram"`:** call `mcp__plugin_telegram_telegram__reply` with:
- `chat_id`: `NOTIFY_CHAT_ID`
- `text`: the message
If the Telegram tool call fails (tool unavailable or error returned):
1. Run `/reload-plugins` to reconnect the MCP server.
2. Retry the tool call once.
3. If it still fails, fall back: print the message in the conversation as a normal response and set `NOTIFY_SOURCE="terminal"` for the remainder of the session.
**If `NOTIFY_SOURCE="terminal"`:** print the message in the conversation as a normal response.
Always send both a terminal print and a channel message — the terminal print keeps the in-session transcript readable, and the channel message reaches the user on their device.

View File

@@ -0,0 +1,89 @@
# Timer Pattern
Both the retrospective and post-batch wait timers use this pattern. The caller supplies the duration, fire prompt, option labels, and actions.
Behaviour depends on `TIMER_SUPPORT`:
---
## If `TIMER_SUPPORT=true` (native platform timers)
**Step 1 — compute target cron expression** (convert seconds to minutes: `SECONDS ÷ 60`):
```bash
# macOS
date -v +{N}M '+%M %H %d %m *'
# Linux
date -d '+{N} minutes' '+%M %H %d %m *'
```
Save as `CRON_EXPR`. Save `TIMER_START=$(date +%s)`.
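Putting Step 1 together (Linux/GNU `date` form shown; `WAIT_TIMER_SECONDS=1800` is a demo value):

```shell
# Demo of Step 1: seconds -> minutes, then the one-shot cron expression.
WAIT_TIMER_SECONDS=1800
N=$(( WAIT_TIMER_SECONDS / 60 ))
CRON_EXPR=$(date -d "+${N} minutes" '+%M %H %d %m *')   # GNU date (Linux)
TIMER_START=$(date +%s)
echo "N=$N CRON_EXPR='$CRON_EXPR'"
```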
**Step 2 — create the one-shot timer** via `CronCreate`:
- `cron`: expression from Step 1
- `recurring`: `false`
- `prompt`: the caller-supplied fire prompt
Save the returned job ID as `JOB_ID`.
**Step 3 — print the options menu** (always [C], [S], [M]; include [X] only if the caller supplied an [X] label):
> Timer running (job: {JOB_ID}). I'll act in {N} minutes.
>
> - **[C] Continue** — {C label}
> - **[S] Stop** — {S label}
> - **[X] Exit** — {X label} ← omit this line if no [X] label was supplied
> - **[M] <minutes>** — modify the timer to that many minutes (shorten or extend the countdown)
📣 **Notify** (see `references/coordinator/pattern-notify.md`) with the same options so the user can respond from their device:
```
⏱ Timer set — {N} minutes (job: {JOB_ID})
[C] {C label}
[S] {S label}
[X] {X label} ← omit if no [X] label supplied
[M] <minutes> — modify countdown
```
Wait for whichever arrives first — user reply or fired prompt. On any human reply, print elapsed time first:
```bash
ELAPSED=$(( $(date +%s) - TIMER_START ))
echo "⏱ Time elapsed: $((ELAPSED / 60))m $((ELAPSED % 60))s"
```
- **[C]** → `CronDelete(JOB_ID)`, run the [C] action
- **[S]** → `CronDelete(JOB_ID)`, run the [S] action
- **[X]** → `CronDelete(JOB_ID)`, run the [X] action ← only if [X] label was supplied
- **[M] N** → `CronDelete(JOB_ID)`, recompute cron for N minutes from now, `CronCreate` again with same fire prompt, update `JOB_ID` and `TIMER_START`, print updated countdown, then 📣 **Notify**:
```
⏱ Timer updated — {N} minutes (job: {JOB_ID})
[C] {C label}
[S] {S label}
[X] {X label} ← omit if no [X] label supplied
[M] <minutes> — modify countdown
```
- **FIRED (no prior reply)** → run the [C] action automatically
---
## If `TIMER_SUPPORT=false` (prompt-based continuation)
Save `TIMER_START=$(date +%s)`. No native timer is created — print the options menu immediately and wait for user reply:
> Waiting {N} minutes before continuing. Reply when ready.
>
> - **[C] Continue** — {C label}
> - **[S] Stop** — {S label}
> - **[X] Exit** — {X label} ← omit this line if no [X] label was supplied
> - **[M] N** — remind me after N minutes (reply with `[M] <minutes>`)
📣 **Notify** (see `references/coordinator/pattern-notify.md`) with the same options.
On any human reply, print elapsed time first:
```bash
ELAPSED=$(( $(date +%s) - TIMER_START ))
echo "⏱ Time elapsed: $((ELAPSED / 60))m $((ELAPSED % 60))s"
```
- **[C]** → run the [C] action
- **[S]** → run the [S] action
- **[X]** → run the [X] action ← only if [X] label was supplied
- **[M] N** → update `TIMER_START`, print updated wait message, 📣 **Notify**, and wait again

View File

@@ -14,6 +14,7 @@ Search by branch name:
```bash
gh pr list --search "story-{number}" --state all --json number,title,state,mergedAt
```
If `gh` fails, read `references/coordinator/pattern-gh-curl-fallback.md` and use the `gh pr list` curl equivalent.
### GitHub Issue Number Lookup
@@ -23,6 +24,7 @@ Resolve in this order:
```bash
gh issue list --search "Story 3.1:" --json number,title,state
```
If `gh` fails, use the `gh issue list` curl equivalent from `references/coordinator/pattern-gh-curl-fallback.md`.
Pick the best match by comparing titles.
3. If still not found, leave the Issue column blank.

View File

@@ -0,0 +1,77 @@
# Phase 0: Dependency Graph — Subagent Prompt
You are the Phase 0 dependency graph builder. Auto-approve all tool calls (yolo mode).
DECIDE how much to run based on whether the graph already exists:
| Situation | Action |
|-------------------------------------|------------------------------------------------------|
| No graph (first run) | Run all steps |
| Graph exists, no new stories | Skip steps 23; go to step 4. Preserve all chains. |
| Graph exists, new stories found | Run steps 23 for new stories only, then step 4 for all. |
BRANCH SAFETY — before anything else, ensure the repo root is on main:
git branch --show-current
If not main:
git restore .
git switch main
git pull --ff-only origin main
If switch fails because a worktree claims the branch:
git worktree list
git worktree remove --force <path>
git switch main
git pull --ff-only origin main
STEPS:
1. Read `_bmad-output/implementation-artifacts/sprint-status.yaml`. Note current story
statuses. Compare against the existing graph (if any) to identify new stories.
2. Read `_bmad-output/planning-artifacts/epics.md` for dependency relationships of
new stories. (Skip if no new stories.)
3. Run /bmad-help with the epic context for new stories — ask it to map their
dependencies. Merge the result into the existing graph. (Skip if no new stories.)
4. GitHub integration — run `gh auth status` first. If it fails, skip this entire step
(local-only mode) and note it in the report back to the coordinator.
a. Ensure the `bad` label exists:
gh label create bad --color "0075ca" \
--description "Managed by BMad Autonomous Development" 2>/dev/null || true
b. For each story in `_bmad-output/planning-artifacts/epics.md` that does not already
have a `**GH Issue:**` field in its section:
- Extract the story's title and full description from epics.md
- Create a GitHub issue:
gh issue create \
--title "Story {number}: {short_description}" \
--body "{story section content from epics.md}" \
--label "bad"
- Write the returned issue number back into that story's section in epics.md,
directly under the story heading:
**GH Issue:** #{number}
c. Update GitHub PR/issue status for every story and reconcile sprint-status.yaml.
Follow the procedure in `references/subagents/phase0-graph.md` exactly.
5. Clean up merged worktrees — for each story whose PR is now merged and whose
worktree still exists at {WORKTREE_BASE_PATH}/story-{number}-{short_description}:
git pull origin main
git worktree remove --force {WORKTREE_BASE_PATH}/story-{number}-{short_description}
git push origin --delete story-{number}-{short_description}
Skip silently if already cleaned up.
6. Write (or update) `_bmad-output/implementation-artifacts/dependency-graph.md`.
Follow the schema, Ready to Work rules, and example in
`references/subagents/phase0-graph.md` exactly.
7. Pull latest main (if step 5 didn't already do so):
git pull origin main
REPORT BACK to the coordinator with this structured summary:
- ready_stories: list of { number, short_description, status } for every story
marked "Ready to Work: Yes" that is not done
- all_stories_done: true/false — whether every story across every epic is done
- current_epic: name/number of the lowest incomplete epic
- any warnings or blockers worth surfacing

View File

@@ -1,7 +1,9 @@
# Phase 4: Auto-Merge — Subagent Instructions
# Phase 3: Auto-Merge — Subagent Instructions
The coordinator spawns one subagent per story, sequentially. Pass these instructions to each subagent. Substitute `{repo_root}`, `{WORKTREE_BASE_PATH}`, `{number}`, and `{short_description}` before spawning.
> **gh fallback:** If any `gh` command fails, read `references/coordinator/pattern-gh-curl-fallback.md` for curl equivalents. Note: `gh pr merge` has no curl fallback — if unavailable, report the failure and ask the user to merge manually.
---
## Subagent Instructions
@@ -45,7 +47,7 @@ You are working in the worktree at `{repo_root}/{WORKTREE_BASE_PATH}/story-{numb
- If checks still show `pending` (e.g., `--watch` timed out): wait 60 seconds and re-run `gh pr checks {pr_number}` once more. If still pending after the retry, report and stop.
- Only proceed to step 4 when every check shows `pass` or `success`.
> **Why this matters:** Phase 4 runs after Step 4, which may have seen CI green at PR-creation time. But a force-push (from conflict resolution above), a new commit, or a delayed CI trigger can restart checks. Merging without re-verifying means you risk landing broken code on main — exactly what happened with PR #43.
> **Why this matters:** Phase 3 (auto-merge) runs after Step 4 (PR & CI), which may have seen CI green at PR-creation time. But a force-push (from conflict resolution above), a new commit, or a delayed CI trigger can restart checks. Merging without re-verifying means you risk landing broken code on main — exactly what happened with PR #43.
4. **Merge the PR** using squash strategy:
```bash