Files
get-shit-done/.changeset/codex-discuss-fallback.md
Tom Boucher 6df9b44297 fix(#3018): codex adapter must stop and ask, not silently default decisions (#3027)
* fix(#3018): codex adapter must stop and ask, not silently default decisions

@jon-hendry: running `\$gsd-discuss-phase 81` in Codex Default mode
proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoint
artifacts without surfacing the discussion questions to the user. The
generated Codex skill adapter explicitly told it to do that:

  Execute mode fallback:
  - When `request_user_input` is rejected (Execute mode), present a
    plain-text numbered list and pick a reasonable default.

That instruction is wrong for any workflow whose contract is to
discuss with the user (most prominently `$gsd-discuss-phase`). The
fallback now requires the agent to:

1. STOP. Present the questions as a plain-text numbered list, then
   wait for the user's reply.
2. Only proceed without a user answer when one of these is true:
   (a) invocation included --auto / --all,
   (b) user explicitly approved a default for this question, or
   (c) workflow's documented contract permits autonomous defaults.
3. Do NOT write CONTEXT.md, DISCUSSION-LOG.md, PLAN.md, or checkpoint
   files until the user has answered or one of (a)-(c) above applies.

Tests:
- bug-3018-codex-discuss-fallback.test.cjs (5 tests, structural-IR):
  parses the generated header into sections via regex,  asserts on
  the Execute-mode-fallback section's content (must contain stop/
  wait + plain-text directives, must NOT contain "pick a reasonable
  default", must name a permission path, must forbid artifact
  writing). No raw text snapshot — the assertions describe the
  behavioral invariant, so prose can be reworded without test churn.
- codex-config.test.cjs:128 still passes — section still mentions
  "Execute mode" as required.

Verification:
- 5/5 pass on new regression test
- 116/116 pass on bug-3018 + codex-config combined
- 6763/6763 pass on full suite
- lint-no-source-grep clean

Closes #3018

* test(#3018): parse fallback into typed semantic-flag record (CR)

CodeRabbit nitpick on PR #3027: the regression tests grepped the
generated header prose with regex, which is brittle and tests wording
rather than semantics. Per CONTRIBUTING.md "no-source-grep" standard.

Refactored to a structural-IR shape:

- New `parseExecuteModeFallback(section)` walks the section text once
  and returns a typed record:
    {
      ok, sectionLength,
      instructsStop,                          // STOP/HALT/WAIT directive
      presentsPlainTextQuestions,             // plain-text / numbered list
      namesPermissionPath,                    // --auto / --all / explicit approval
      forbidsWritingArtifactsBeforeAnswer,    // write-ban + named artifact class
      silentlyPicksDefaults,                  // anti-pattern guard (must be false)
    }
- Each positive invariant gets its own test asserting on the parsed
  boolean, so a failure points at the exact invariant that broke.
- A final test does a single assert.deepStrictEqual against the full
  expected contract — gives a structured diff when any flag flips.
- The artifact-write ban now requires BOTH a "do not write" intent
  AND a named artifact class (was a single broad regex), so generic
  "do not write" prose elsewhere in the section can't satisfy it.

Verification: 8/8 pass; lint-no-source-grep clean.
2026-05-02 11:45:36 -04:00

536 B

type, pr
type pr
Fixed TBD

Codex skill adapter no longer instructs the agent to silently default discuss-phase decisions. When request_user_input was rejected (Default mode), the generated adapter said "pick a reasonable default" — so $gsd-discuss-phase proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoints without ever asking the user. Adapter prose now requires the agent to STOP, present plain-text questions, and wait, with explicit named exceptions (--auto/--all/explicit user approval). See #3018.