mirror of
https://github.com/glittercowboy/get-shit-done
synced 2026-05-13 10:36:38 +02:00
* fix(#3018): codex adapter must stop and ask, not silently default decisions

@jon-hendry: running `$gsd-discuss-phase 81` in Codex Default mode proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoint artifacts without surfacing the discussion questions to the user. The generated Codex skill adapter explicitly told it to do that:

    Execute mode fallback:
    - When `request_user_input` is rejected (Execute mode), present a plain-text numbered list and pick a reasonable default.

That instruction is wrong for any workflow whose contract is to discuss with the user (most prominently `$gsd-discuss-phase`).

The fallback now requires the agent to:
1. STOP. Present the questions as a plain-text numbered list, then wait for the user's reply.
2. Only proceed without a user answer when one of these is true:
   (a) invocation included --auto / --all,
   (b) user explicitly approved a default for this question, or
   (c) workflow's documented contract permits autonomous defaults.
3. Do NOT write CONTEXT.md, DISCUSSION-LOG.md, PLAN.md, or checkpoint files until the user has answered or one of (a)-(c) above applies.

Tests:
- bug-3018-codex-discuss-fallback.test.cjs (5 tests, structural-IR): parses the generated header into sections via regex, asserts on the Execute-mode-fallback section's content (must contain stop/wait + plain-text directives, must NOT contain "pick a reasonable default", must name a permission path, must forbid artifact writing). No raw text snapshot: the assertions describe the behavioral invariant, so prose can be reworded without test churn.
- codex-config.test.cjs:128 still passes; the section still mentions "Execute mode" as required.

Verification:
- 5/5 pass on new regression test
- 116/116 pass on bug-3018 + codex-config combined
- 6763/6763 pass on full suite
- lint-no-source-grep clean

Closes #3018

* test(#3018): parse fallback into typed semantic-flag record (CR)

CodeRabbit nitpick on PR #3027: the regression tests grepped the generated header prose with regex, which is brittle and tests wording rather than semantics. Per CONTRIBUTING.md "no-source-grep" standard.

Refactored to a structural-IR shape:
- New `parseExecuteModeFallback(section)` walks the section text once and returns a typed record:

      {
        ok,
        sectionLength,
        instructsStop,                        // STOP/HALT/WAIT directive
        presentsPlainTextQuestions,           // plain-text / numbered list
        namesPermissionPath,                  // --auto / --all / explicit approval
        forbidsWritingArtifactsBeforeAnswer,  // write-ban + named artifact class
        silentlyPicksDefaults,                // anti-pattern guard (must be false)
      }

- Each positive invariant gets its own test asserting on the parsed boolean, so a failure points at the exact invariant that broke.
- A final test does a single assert.deepStrictEqual against the full expected contract, which gives a structured diff when any flag flips.
- The artifact-write ban now requires BOTH a "do not write" intent AND a named artifact class (was a single broad regex), so generic "do not write" prose elsewhere in the section can't satisfy it.

Verification: 8/8 pass; lint-no-source-grep clean.
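
To make the typed-record idea concrete, here is a minimal sketch of what a parser like `parseExecuteModeFallback(section)` might look like. The field names come from the commit message; every regex, the empty-input handling, and the two sample strings are assumptions for illustration, not the code in bug-3018-codex-discuss-fallback.test.cjs.

```javascript
'use strict';

// Hypothetical sketch: the patterns below are illustrative assumptions,
// not the ones used by the real regression test.
function parseExecuteModeFallback(section) {
  if (typeof section !== 'string' || section.trim() === '') {
    return { ok: false, sectionLength: 0 };
  }
  const text = section.toLowerCase();

  // The write ban needs BOTH an intent and a named artifact class,
  // so a stray "do not write" elsewhere cannot satisfy it on its own.
  const hasWriteBan = /do not write|don't write|never write/.test(text);
  const namesArtifact = /context\.md|discussion-log\.md|plan\.md|checkpoint/.test(text);

  return {
    ok: true,
    sectionLength: section.length,
    instructsStop: /\b(stop|halt|wait)\b/.test(text),
    presentsPlainTextQuestions: /plain-text|numbered list/.test(text),
    namesPermissionPath: /--auto|--all|explicitly approved/.test(text),
    forbidsWritingArtifactsBeforeAnswer: hasWriteBan && namesArtifact,
    silentlyPicksDefaults: /pick a reasonable default/.test(text),
  };
}

// Sample inputs (assumed wording, loosely based on the commit message).
const fixedSection = [
  'Execute mode fallback:',
  "1. STOP. Present the questions as a plain-text numbered list, then wait for the user's reply.",
  '2. Only proceed without an answer when the invocation included --auto / --all.',
  '3. Do NOT write CONTEXT.md or DISCUSSION-LOG.md until the user has answered.',
].join('\n');
const oldSection =
  'Execute mode fallback: present a plain-text numbered list and pick a reasonable default.';

const fixedFlags = parseExecuteModeFallback(fixedSection);
const oldFlags = parseExecuteModeFallback(oldSection);
```

With this shape, a test can compare `fixedFlags` against one expected object via `assert.deepStrictEqual`, and the old wording fails on specific flags (`instructsStop`, `silentlyPicksDefaults`) rather than on a text mismatch.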
536 B
| type | pr |
|---|---|
| Fixed | TBD |
Codex skill adapter no longer instructs the agent to silently default discuss-phase decisions. When `request_user_input` was rejected (Default mode), the generated adapter said "pick a reasonable default", so `$gsd-discuss-phase` proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoints without ever asking the user. The adapter prose now requires the agent to STOP, present plain-text questions, and wait, with explicit named exceptions (`--auto` / `--all` / explicit user approval). See #3018.
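
The stop-and-ask rule can be read as a small decision function. The sketch below is only a model of the contract described in the commit message: the adapter itself is prose, and the function names, the `invocation` object shape, and the action strings are all hypothetical.

```javascript
'use strict';

// Hypothetical model of the fallback contract; names and shapes are assumptions.
function mayProceedWithoutAnswer(invocation = {}) {
  const {
    flags = [],                               // e.g. ['--auto']
    userApprovedDefault = false,              // (b) explicit approval for this question
    workflowAllowsAutonomousDefaults = false, // (c) documented workflow contract
  } = invocation;
  const hasAutoFlag = flags.includes('--auto') || flags.includes('--all'); // (a)
  return hasAutoFlag || userApprovedDefault || workflowAllowsAutonomousDefaults;
}

// Artifacts may be written only after an answer or a named exception.
function nextAction(invocation, userHasAnswered) {
  if (userHasAnswered || mayProceedWithoutAnswer(invocation)) {
    return 'write-artifacts';
  }
  return 'stop-and-ask';
}
```

Under this model, a plain `$gsd-discuss-phase` invocation with no answer yet maps to `'stop-and-ask'`, while the same invocation with `--auto` maps to `'write-artifacts'`.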