fix(executor): enforce acceptance_criteria as hard gate, not advisory text (#1959)

The existing MANDATORY acceptance_criteria instruction is purely advisory: executor agents read it, then silently skip criteria when they run low on context or hit a complex task. Planned work is dropped without any signal to the orchestrator or verifier.

Changes:
- Replace the advisory text with a structured 5-step verification loop
- Each criterion must be proven with a grep, file check, or CLI command
- The agent is BLOCKED from starting the next task until all criteria pass
- Criteria still failing after 2 fix attempts are logged as deviations (never silently skipped)
- The self-check step now re-runs ALL acceptance criteria before the SUMMARY is finalized
- The self-check also re-runs the plan-level verification commands

Closes #1958

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
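The per-task gate described above can be sketched as a small shell script. This is a hypothetical illustration and is not part of the workflow file. `run_criteria`, `acceptance_gate` and the `fix_attempt` hook are invented names. Each criterion is assumed to be a shell command that exits 0 when the criterion holds. In the real workflow, the executor agent does the fixing itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-task acceptance_criteria gate.
# Assumption: each criterion is a shell command that exits 0 when it holds.

# Placeholder for the executor agent's fix step; override it in real use.
fix_attempt() { :; }

# Steps 1-2: run every criterion and log PASS/FAIL with the command output.
run_criteria() {
  local failed=0 criterion output
  for criterion in "$@"; do
    if output=$(sh -c "$criterion" 2>&1); then
      printf 'PASS: %s\n' "$criterion"
    else
      failed=1
      printf 'FAIL: %s\n' "$criterion"
      if [ -n "$output" ]; then printf '  %s\n' "$output"; fi
    fi
  done
  return "$failed"
}

# Steps 3-5: after each fix, re-run ALL criteria. After 2 failed fix
# attempts, report a deviation instead of silently moving on.
acceptance_gate() {
  local attempt=0 max_fixes=2
  until run_criteria "$@"; do
    if [ "$attempt" -ge "$max_fixes" ]; then
      printf 'DEVIATION: criteria still failing after %d fix attempts\n' "$max_fixes"
      return 1
    fi
    attempt=$((attempt + 1))
    fix_attempt "$attempt"
  done
  printf 'GATE CLEARED: all criteria pass\n'
}
```

An example call, with illustrative criteria: `acceptance_gate 'grep -q "HARD GATE" get-shit-done/workflows/execute-plan.md' '[ -f SUMMARY.md ]'`. The gate returns non-zero on a deviation, so a caller cannot mistake an unresolved criterion for a pass.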
2026-04-25 17:25:23 +02:00 · 2026-04-10 21:47:43 +07:00
parent 50cce89a7c
commit 49645b04aa
1 changed file with 9 additions and 1 deletion
--- a/get-shit-done/workflows/execute-plan.md
+++ b/get-shit-done/workflows/execute-plan.md
@@ -110,6 +110,8 @@ Pattern B only (verify-only checkpoints). Skip for A/C.
 3. After ALL segments: aggregate files/deviations/decisions → create SUMMARY.md → commit → self-check:
   - Verify key-files.created exist on disk with `[ -f ]`
   - Check `git log --oneline --all --grep="{phase}-{plan}"` returns ≥1 commit
+   - Re-run ALL `<acceptance_criteria>` from every task — if any fail, fix before finalizing SUMMARY
+   - Re-run the plan-level `<verification>` commands — log results in SUMMARY
   - Append `## Self-Check: PASSED` or `## Self-Check: FAILED` to SUMMARY

   **Known Claude Code bug (classifyHandoffIfNeeded):** If any segment agent reports "failed" with `classifyHandoffIfNeeded is not defined`, this is a Claude Code runtime bug — not a real failure. Run spot-checks; if they pass, treat as successful.
@@ -145,7 +147,13 @@ Deviations are normal — handle via rules below.
   - **MANDATORY read_first gate:** If the task has a `<read_first>` field, you MUST read every listed file BEFORE making any edits. This is not optional. Do not skip files because you "already know" what's in them — read them. The read_first files establish ground truth for the task.
   - `type="auto"`: if `tdd="true"` → TDD execution. Implement with deviation rules + auth gates. Verify done criteria. Commit (see task_commit). Track hash for Summary.
   - `type="checkpoint:*"`: STOP → checkpoint_protocol → wait for user → continue only after confirmation.
-   - **MANDATORY acceptance_criteria check:** After completing each task, if it has `<acceptance_criteria>`, verify EVERY criterion before moving to the next task. Use grep, file reads, or CLI commands to confirm each criterion. If any criterion fails, fix the implementation before proceeding. Do not skip criteria or mark them as "will verify later".
+   - **HARD GATE — acceptance_criteria verification:** After completing each task, if it has `<acceptance_criteria>`, you MUST run a verification loop before proceeding:
+     1. For each criterion: execute the grep, file check, or CLI command that proves it passes
+     2. Log each result as PASS or FAIL with the command output
+     3. If ANY criterion fails: fix the implementation immediately, then re-run ALL criteria
+     4. Repeat until all criteria pass — you are BLOCKED from starting the next task until this gate clears
+     5. If a criterion cannot be satisfied after 2 fix attempts, log it as a deviation with reason — do NOT silently skip it
+     This is not advisory. A task with failing acceptance criteria is an incomplete task.
 3. Run `<verification>` checks
 4. Confirm `<success_criteria>` met
 5. Document deviations in Summary
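The extended self-check from the first hunk can also be expressed as a shell function, shown here as a minimal sketch. `self_check` and its argument layout are hypothetical. It assumes the `key-files.created` entries arrive as a space-separated list without spaces in paths. It also assumes the acceptance and `<verification>` commands arrive one per line. Every result is appended to the SUMMARY, and the final verdict line follows the `## Self-Check: PASSED` / `FAILED` convention from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the end-of-plan self-check.
# Args: SUMMARY path, "{phase}-{plan}" tag, created files (space-separated),
# check commands (one per line, each exiting 0 on success).
self_check() {
  local summary=$1 tag=$2 files=$3 checks=$4
  local status=PASSED f cmd
  {
    # 1. Every key-files.created entry must exist on disk.
    for f in $files; do
      if [ -f "$f" ]; then
        printf 'FOUND: %s\n' "$f"
      else
        printf 'MISSING: %s\n' "$f"
        status=FAILED
      fi
    done
    # 2. At least one commit must mention the {phase}-{plan} tag.
    if [ -n "$(git log --oneline --all --grep="$tag" -n 1 2>/dev/null)" ]; then
      printf 'COMMIT: found for %s\n' "$tag"
    else
      printf 'COMMIT: none for %s\n' "$tag"
      status=FAILED
    fi
    # 3. Re-run acceptance criteria and plan-level verification commands.
    while IFS= read -r cmd; do
      [ -z "$cmd" ] && continue
      if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
        printf 'PASS: %s\n' "$cmd"
      else
        printf 'FAIL: %s\n' "$cmd"
        status=FAILED
      fi
    done <<EOF
$checks
EOF
    printf '\n## Self-Check: %s\n' "$status"
  } >> "$summary"
  [ "$status" = PASSED ]
}
```

The function returns the verdict as its exit status as well as writing it to the SUMMARY. A FAILED result can therefore block finalization, which matches the "fix before finalizing SUMMARY" rule.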