fix(digest): dense-fill topicOf with -1 sentinel + surface missed indices

Greptile P2 on PR #3247: `new Array(top.length)` creates a sparse array.
If a future injected clusterFn doesn't cover every input index,
topicOf[i] would be undefined, which then silently poisons the phase-1
aggregates (topicSize[undefined] / topicMax[undefined]) and degrades
the topic sort without any observable failure.
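
A minimal sketch of that failure mode, for illustration only. The names `top`, `clusters`, and `topicSize` here are hypothetical stand-ins, and the array-shaped aggregate is an assumption about how phase 1 accumulates, not the real code:

```javascript
// Hypothetical input: three reps, but the cluster result forgot index 2.
const top = ['a', 'b', 'c'];
const clusters = [[0, 1]];

// Old behavior: new Array(n) creates n holes, not n values.
const topicOf = new Array(top.length);
clusters.forEach((members, tIdx) => {
  for (const i of members) topicOf[i] = tIdx;
});

// Assumed phase-1 style aggregate indexed by topic id.
const topicSize = new Array(clusters.length).fill(0);
for (let i = 0; i < top.length; i++) {
  const t = topicOf[i]; // undefined for the missed index
  // For t === undefined this writes NaN to the property "undefined".
  topicSize[t] += 1;
}

console.log(2 in topicOf);         // the slot is still a hole
console.log(topicSize[undefined]); // a stray NaN, and nothing threw
```

Nothing in this path raises, which is why the damage only shows up later as a degraded sort.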

Fill with -1 so absence is unambiguous, then validate after clusterFn
runs and throw if any index is still -1. The outer try/catch captures
the error and returns {reps: top, topicCount: top.length, error}
matching the existing contract — primary order is preserved, no crash.
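
The guard and the fallback can be sketched as follows. `assignTopics` and `groupOrFallback` are hypothetical helpers written for this example, and the success-path return shape is an assumption; only the error message and the fallback fields come from this commit:

```javascript
// Hypothetical helper: dense-fill with -1, assign, then validate.
function assignTopics(top, clusters) {
  const topicOf = new Array(top.length).fill(-1);
  clusters.forEach((members, tIdx) => {
    for (const i of members) topicOf[i] = tIdx;
  });
  // Any index still at -1 was never covered by a cluster.
  const missed = topicOf.indexOf(-1);
  if (missed !== -1) {
    throw new Error(`topic grouping: clusterFn missed index ${missed}`);
  }
  return topicOf;
}

// Hypothetical wrapper standing in for the outer try/catch.
function groupOrFallback(top, clusters) {
  try {
    const topicOf = assignTopics(top, clusters);
    // Assumed success shape, for illustration only.
    return { topicOf, topicCount: clusters.length, error: null };
  } catch (error) {
    // Fallback from the commit message: primary order preserved.
    return { reps: top, topicCount: top.length, error };
  }
}

const reps = ['a', 'b', 'c'];
const incomplete = groupOrFallback(reps, [[0, 1]]);
const complete = groupOrFallback(reps, [[0, 2], [1]]);
console.log(incomplete.error.message);
console.log(complete.topicOf);
```

Using `indexOf(-1)` reports the first missed index, the same one the explicit loop in the diff would throw on.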

No behavior change today: singleLinkCluster's union-find guarantees
every index is covered. This just guards the invariant for future
clusterFn injections.
Elie Habib
2026-04-21 08:52:49 +04:00
parent f5205bdb57
commit 93eca7bbbf

@@ -319,10 +319,19 @@ export function groupTopicsPostDedup(top, cfg, embeddingByHash, deps = {}) {
       vetoFn: null,
     });
-    const topicOf = new Array(top.length);
+    // Dense-fill with -1 sentinel so an incomplete clusterFn (a future
+    // injection that doesn't cover every input index) surfaces as an
+    // explicit error instead of silently poisoning the phase-1 aggregates
+    // (topicSize[undefined] / topicMax[undefined] would degrade the sort).
+    const topicOf = new Array(top.length).fill(-1);
     clusters.forEach((members, tIdx) => {
       for (const i of members) topicOf[i] = tIdx;
     });
+    for (let i = 0; i < topicOf.length; i++) {
+      if (topicOf[i] === -1) {
+        throw new Error(`topic grouping: clusterFn missed index ${i}`);
+      }
+    }
     const hashOf = top.map((rep) =>
       titleHashHex(normalizeForEmbedding(rep.title ?? '')),