Commit Graph

2 Commits

Author SHA1 Message Date
Tom Boucher
fb92d1e596 fix(#2983): classifier exit-code discipline, base-tag staging, drop vestigial merge-back (#2984)
* fix(#2983): classifier exit-code discipline, base-tag staging, drop vestigial merge-back

Three issues surfaced by CodeRabbit's post-merge review of #2981 plus
a production failure on the v1.39.1 release run.

(1) Overloaded classifier exit code

scripts/diff-touches-shipped-paths.cjs reused exit 1 for both the
legitimate "no shipped paths" result and Node's default exit on
uncaught throw, so any classifier failure (corrupt package.json,
EPERM, etc.) was indistinguishable from a normal skip — the workflow's
`if ! ... ; then skip` idiom would silently drop the commit.

Distinct exit codes now:
  0  shipped       — at least one path is in the npm `files` whitelist
  1  not shipped   — CI / test / docs / planning only
  2  classifier error — workflow MUST fail-fast

uncaughtException + unhandledRejection + try/catch around fs/JSON
parsing all route to exit 2 with stderr context.

(2) Classifier missing at the base tag (CRITICAL)

`Prepare hotfix branch` runs `git checkout -b "$BRANCH" "$BASE_TAG"`
BEFORE the cherry-pick loop, replacing the working tree with the base
tag's contents. Base tags predating #2980 (notably v1.39.0, the most
likely next hotfix base) don't have scripts/diff-touches-shipped-paths.cjs
at all. There, `node <missing>` exits non-zero, `if !` skips every
commit, and an empty hotfix branch is published. That is strictly worse
than the original #2980 push-rejection, which at least failed loudly.

Stage the classifier from the dispatched ref's working tree into
$RUNNER_TEMP at the top of the run script (before any working-tree-
mutating git command). The cherry-pick loop now references $CLASSIFIER
(staged) instead of the in-tree path. Sanity guards: refuse to start
if scripts/diff-touches-shipped-paths.cjs is missing in the dispatched
ref, refuse to proceed if cp didn't materialize $CLASSIFIER.
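
The staging step and its two guards might look roughly like the
sketch below. It is not the workflow's actual script: the function
name, the messages, and the temporary directories standing in for the
repository and $RUNNER_TEMP are assumptions made for illustration.

```shell
# Sketch only: copy a script out of the working tree before any git
# command can replace that tree. Names and messages are illustrative.
stage_classifier() {
  src="$1"        # in-tree path of the classifier
  dest_dir="$2"   # stand-in for "$RUNNER_TEMP"

  # Guard 1: the dispatched ref must contain the classifier.
  if [ ! -f "$src" ]; then
    echo "::error::classifier missing in dispatched ref: $src" >&2
    return 1
  fi

  staged="$dest_dir/$(basename "$src")"
  # A cp failure is caught by guard 2 rather than aborting here.
  cp "$src" "$staged" || true

  # Guard 2: cp must actually have produced the staged copy.
  if [ ! -f "$staged" ]; then
    echo "::error::failed to stage classifier at $staged" >&2
    return 1
  fi

  CLASSIFIER="$staged"
}

# Demo with throwaway directories in place of the repo and runner temp.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/repo/scripts" "$demo_dir/runner-temp"
echo 'process.exit(1)' > "$demo_dir/repo/scripts/diff-touches-shipped-paths.cjs"

stage_classifier "$demo_dir/repo/scripts/diff-touches-shipped-paths.cjs" \
  "$demo_dir/runner-temp"
echo "staged at: $CLASSIFIER"
```

Because the staged copy lives outside the repository, a later
`git checkout` of an older base tag cannot remove it.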

The cherry-pick loop captures node's exit via ${PIPESTATUS[1]} and
dispatches via explicit case:
  0  proceed with cherry-pick
  1  skip into NON_SHIPPED_SKIPPED
  *  emit ::error:: + exit "$CLASSIFIER_RC"
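
A minimal sketch of that three-way dispatch, written as a function so
it can be exercised in isolation. The function name and the printed
action words are hypothetical; the real loop exits with
"$CLASSIFIER_RC", whereas this sketch returns a fixed 2 so it can be
called repeatedly.

```shell
# Sketch: map a classifier exit code to an action.
dispatch_classifier_rc() {
  rc="$1"
  case "$rc" in
    0) echo "pick" ;;   # shipped: proceed with the cherry-pick
    1) echo "skip" ;;   # not shipped: record in NON_SHIPPED_SKIPPED
    *)                  # anything else, including empty, is a failure
      echo "::error::classifier failed with exit code '$rc'" >&2
      return 2
      ;;
  esac
}

for rc in 0 1 2; do
  action=$(dispatch_classifier_rc "$rc" 2>/dev/null) || action="fail"
  echo "rc=$rc -> $action"
done
```

The catch-all branch is the point of the design: an unexpected or
missing exit code fails the run instead of being read as "skip".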

(3) Drop the merge-back PR step

Auto-cherry-pick only picks commits already on main (`git cherry HEAD
origin/main` outputs the unmerged ones; we filter fix:/chore: from
main). By construction every code commit on the hotfix branch is
already on main. The only hotfix-branch-only commit is `chore: bump
version to X.Y.Z for hotfix`, which either no-ops against main or
rewinds main's in-progress version. The merge-back PR was vestigial.

It also failed in production on run 25232968975 with `GitHub Actions
is not permitted to create or approve pull requests (createPullRequest)`
— org policy blocks PR creation from the workflow's GH_TOKEN. Even
without that block, the PR would have nothing useful to merge.

Step removed. The `pull-requests: write` permission granted solely
for the merge-back step has been dropped from the release job
(least-privilege).

Regression coverage

tests/bug-2983-classifier-exit-codes-and-base-tag-staging.test.cjs
adds 12 assertions across two describe blocks:

  - 5 classifier behavioral: exit 0/1 preserved, exit 2 on missing
    package.json, exit 2 on malformed JSON, exit-code constants
    exported.
  - 7 workflow contract: classifier staged before checkout, target
    is $RUNNER_TEMP, missing-source guard, missing-staged guard,
    PIPESTATUS-based dispatch, error branch fails workflow, loop uses
    staged path (not in-tree).

tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs updated
where it asserted the pre-#2983 `if ! ... ; then` shape: now accepts
the post-#2983 case-dispatch form. The test still proves the
classifier participates; bug-2983 enforces the specific shape.

Run summary references for the curious reviewer:

  - Run 25232010071 — original #2980 trigger (workflow-file push
    rejection)
  - Run 25232968975 — failed merge-back step that prompted the
    "is this even useful?" question that drove the removal

Closes #2983

* fix(#2983): address CodeRabbit findings on PR #2984

Two findings, both real, both fixed.

(1) [Critical] PIPESTATUS capture clobbered by `|| true`

Pre-fix shape:
  git diff-tree ... | node "$CLASSIFIER" || true
  CLASSIFIER_RC="${PIPESTATUS[1]}"

When the classifier exits 1 ("not shipped" — common case) or 2
(error), `|| true` triggers the right-hand side. `true` is a
one-command "pipeline" that overwrites PIPESTATUS to (0).
${PIPESTATUS[1]} on the next line is therefore unset (and an
unbound-variable error under set -u). Without set -u the case
dispatch matched the empty string, falling into `*)` and failing
the workflow on every non-shipped commit, OR matching `0)` in a
shell that default-initializes unset to 0 and silently picking
commits that don't ship.

Local repro confirms the issue:

  $ bash -c 'set -euo pipefail; false | sh -c "exit 7" || true; \
       echo "PIPESTATUS: ${PIPESTATUS[*]}"; \
       echo "[1]: ${PIPESTATUS[1]:-<unset>}"'
  PIPESTATUS: 0
  [1]: <unset>

Fix: bracket the pipeline in `set +e`/`set -e`, snapshot
PIPESTATUS into a local array on the very next line, then
dispatch on the snapshot:

  set +e
  git diff-tree ... | node "$CLASSIFIER"
  PIPE_RC=("${PIPESTATUS[@]}")
  set -e
  DIFFTREE_RC="${PIPE_RC[0]}"
  CLASSIFIER_RC="${PIPE_RC[1]}"

The snapshot must happen on the first line after the pipeline;
any intervening simple command resets PIPESTATUS. Once copied,
PIPE_RC is an ordinary array that later commands do not touch.
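
A counterpart to the local repro above, run the same way, suggests
that the snapshot keeps both statuses. This is a sketch, not output
quoted from the PR; `false` and `sh -c "exit 7"` are stand-ins for
git diff-tree and the classifier, as in the repro.

```shell
# Run the fixed shape under bash, since PIPESTATUS is bash-specific.
out=$(bash -c '
  set -euo pipefail
  set +e
  false | sh -c "exit 7"
  PIPE_RC=("${PIPESTATUS[@]}")   # snapshot on the very next line
  set -e
  echo "first=${PIPE_RC[0]} second=${PIPE_RC[1]}"
')
echo "$out"
```

This should print `first=1 second=7`: both exit codes survive because
nothing runs between the pipeline and the copy.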

Bonus from the new shape: $DIFFTREE_RC is now also captured.
git diff-tree is unlikely to fail on a known-good $SHA, but if
it does, we no longer feed partial/empty input to the classifier
and call it "not shipped." A non-zero DIFFTREE_RC emits
::error::git diff-tree failed and exits.

(2) [Minor] Stale "Merge-back PR opened against main" summary line

The hotfix run summary still printed:
  echo "- Merge-back PR opened against main"

But the merge-back step itself was removed in the previous commit
on this branch. Operators reading the summary would expect a PR
that doesn't exist. Replaced with explicit non-action text:

  echo "- No merge-back PR (auto-picked commits are already on main)"

Test coverage

bug-2983 test file gains 3 assertions:
  - PIPE_RC array-snapshot pattern is required (regex matches the
    exact `PIPE_RC=("${PIPESTATUS[@]}")` form).
  - The `pipeline || true; ${PIPESTATUS[1]}` antipattern is
    explicitly forbidden via assert.doesNotMatch.
  - DIFFTREE_RC is captured from PIPE_RC[0] and a non-zero value
    triggers ::error::git diff-tree failed.
  - Run summary forbids `Merge-back PR opened against main` and
    requires the new non-action sentence.

bug-2964 test's loop-anchor window bumped 6 KB → 8 KB to
accommodate the additional pre-pick scaffolding (the test's own
comment had already anticipated this kind of growth, citing prior
precedents from #2970 and #2980).

Mark CodeRabbit comments resolved post-commit.

Refs CR finding ids 3175253571, 3175253578 on PR #2984.
2026-05-01 17:25:20 -04:00
Tom Boucher
7424271aa0 fix(#2980): hotfix cherry-pick only picks commits that change what ships (#2981)
* fix(#2980): pre-skip workflow-file cherry-picks in release-sdk hotfix loop

The default GITHUB_TOKEN issued to the release-sdk run lacks the
`workflow` scope, so the prepare job's `git push origin "$BRANCH"` is
rejected by GitHub when any cherry-picked commit modifies a file under
`.github/workflows/`:

  ! [remote rejected] hotfix/X.YY.Z -> hotfix/X.YY.Z
    (refusing to allow a GitHub App to create or update workflow ...
     without `workflows` permission)

Pre-#2980 behavior: the auto_cherry_pick loop happily picked
workflow-file commits, then the trailing push exploded with no clear
signal which commit was the culprit. v1.39.1 hit this on PR #2977
(run 25232010071) — earlier release-sdk fixes (#2965, #2967, #2970)
had been skipped on conflict so their workflow-file changes never
reached the push step, masking the bug; #2977 was the first
workflow-file commit to apply cleanly and the push immediately
exploded.

Fix: pre-pick guard in the cherry-pick loop. Inspect each candidate
commit's file list via `git diff-tree --no-commit-id --name-only -r`
BEFORE attempting the pick. If any path matches `^\.github/workflows/`,
skip the commit, emit a `::warning::` annotation naming the dropped
commit, and append to a new `WORKFLOW_SKIPPED` bucket. The run summary
surfaces this bucket in its own section, distinct from `CONFLICT_SKIPPED`
(real merge conflicts) and `POLICY_SKIPPED` (feat/refactor exclusions),
so operators reviewing the run never confuse the remediation paths.
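
The guard described above could be sketched as follows. The commit id
and path list are made up, and the literal path list stands in for
the output of `git diff-tree --no-commit-id --name-only -r`; only the
regex, the `::warning::` annotation, and the WORKFLOW_SKIPPED bucket
come from the description.

```shell
# Sketch: succeed when any path on stdin is under .github/workflows/.
touches_workflow_files() {
  grep -q '^\.github/workflows/'
}

WORKFLOW_SKIPPED=""
sha="abc1234"   # hypothetical commit id
# Stand-in for the commit's changed-file list.
changed_paths=".github/workflows/release-sdk.yml
README.md"

if printf '%s\n' "$changed_paths" | touches_workflow_files; then
  echo "::warning::skipping $sha: touches .github/workflows/"
  WORKFLOW_SKIPPED="$WORKFLOW_SKIPPED $sha"
fi
```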

The loud-warning piece is non-negotiable: silent drops were explicitly
rejected as a failure mode during the option-1/2/3 tradeoff discussion.
If a workflow-file fix genuinely needs to ship in a hotfix, the
operator applies it manually on the hotfix branch using a token with
`workflow` scope, or lands it on main and re-cuts the release.

Regression covered by tests/bug-2980-skip-workflow-file-cherrypicks.test.cjs
(5 assertions: pre-pick guard exists, uses `git diff-tree`, emits
`::warning::`, lands in dedicated bucket, surfaces in summary).

The bug-2964 test's 4 KB window after the cherry-pick-loop anchor was
nudged to 6 KB to accommodate the new pre-pick scaffolding — the test's
own comment had already anticipated this kind of growth (citing #2970's
merge-commit pre-skip as prior precedent).

Closes #2980

* refactor(#2980): replace workflow-file pre-skip with shipped-paths filter

The previous commit on this branch caught only the .github/workflows/*
subset of the bug, treating the symptom (push rejection on workflow-file
changes) rather than the root cause (the fix:/chore: filter is too broad
— it picks any commit with that conventional-commit type even when the
diff cannot affect the published npm package).

CI-only fixes (release-sdk.yml itself, hotfix tooling, test-only
commits) shouldn't flow through hotfix runs at all — they cannot change
what `npm install get-shit-done-cc@X.YY.Z` produces. The
.github/workflows/* push rejection is just the loudest of these
"shouldn't have been picked" cases; tests/, docs/, .planning/ commits
get picked silently with the same lack of effect on consumers.

Replace the workflow-file pre-skip with a shipped-paths filter:

- New scripts/diff-touches-shipped-paths.cjs reads package.json `files`,
  plus package.json itself (always-shipped per `npm pack` semantics),
  and exits 0 iff any input path is in the shipped set. Lockfile is
  not shipped (npm pack excludes it unless explicitly in `files`).
- Workflow loop now pipes `git diff-tree --no-commit-id --name-only -r`
  through the classifier; on exit 1 the commit is skipped and
  appended to a new NON_SHIPPED_SKIPPED bucket (replaces
  WORKFLOW_SKIPPED).
- Run summary surfaces NON_SHIPPED_SKIPPED as informational — no
  ::warning:: annotation. A non-shipping commit cannot affect the
  package, so a yellow alert would imply remediation is possible
  and would mislead operators.

The classifier lives in a separate .cjs file (rather than an inline
bash heredoc) so that its rules (directory-prefix vs exact-match,
package.json-always-shipped, lockfile-not-shipped) are unit-testable
in tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs (11 new
assertions: 4 static workflow + 6 classifier behavioral + 1
mixed-diff edge case).
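
To make those rules concrete, here is an illustration in shell. It is
not the classifier itself, which is a Node script: the `files`
entries below are hypothetical, glob entries are ignored, and the
error path (exit 2) is left out.

```shell
# Hypothetical `files` entries; the real list comes from package.json.
FILES_ENTRIES="bin
lib
README.md"

# Succeed when a single path is in the shipped set.
is_shipped() {
  path="$1"
  # package.json always ships, whatever `files` says.
  if [ "$path" = "package.json" ]; then return 0; fi
  for entry in $FILES_ENTRIES; do
    case "$path" in
      "$entry")   return 0 ;;   # exact match on a file entry
      "$entry"/*) return 0 ;;   # path inside a directory entry
    esac
  done
  return 1
}

# Succeed when any path on stdin is shipped (the mixed-diff case).
diff_touches_shipped() {
  while IFS= read -r p; do
    if is_shipped "$p"; then return 0; fi
  done
  return 1
}

for f in package.json lib/index.js library/x.js package-lock.json; do
  if is_shipped "$f"; then echo "$f: shipped"; else echo "$f: not shipped"; fi
done
```

Matching on `entry/` rather than a bare string prefix keeps
`library/x.js` from being mistaken for something under `lib`, and the
lockfile is skipped simply because it is not listed.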

Why this dissolves the original push-rejection bug: workflow files
aren't in `files`, so workflow-only commits are skipped pre-pick.
The push step never sees them.

If a workflow-file fix genuinely needs to ship in a hotfix release
(extremely rare — the hotfix workflow is read from main's ref, not
the hotfix branch's), the operator applies it manually using a token
with `workflow` scope. The pre-skip puts that requirement in the run
summary explicitly.

Closes #2980
2026-05-01 16:59:49 -04:00