mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
fix(digest): dense-fill topicOf with -1 sentinel + surface missed indices
Greptile P2 on PR #3247: `new Array(top.length)` creates a sparse array. If a future injected clusterFn doesn't cover every input index, topicOf[i] would be undefined, which would then silently poison the phase-1 aggregates (topicSize[undefined] / topicMax[undefined]) and degrade the topic sort without any observable failure.

Fill with -1 so absence is unambiguous, then validate after clusterFn runs and throw if any index is still -1. The outer try/catch captures the error and returns {reps: top, topicCount: top.length, error}, matching the existing contract: primary order is preserved, no crash.

No behavior change today: singleLinkCluster's union-find guarantees every index is covered. This just guards the invariant for future clusterFn injections.
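
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `top`, `clusters`, and the `topicSize` tally are hypothetical stand-ins, and the cluster list deliberately omits index 2 to imitate an incomplete clusterFn.

```javascript
// Hypothetical stand-in data: three items, and clusters that never mention index 2.
const top = ['a', 'b', 'c'];
const clusters = [[0, 1]];

// Sparse variant: index 2 stays a hole and reads back as undefined.
const sparse = new Array(top.length);
clusters.forEach((members, tIdx) => {
  for (const i of members) sparse[i] = tIdx;
});

// A phase-1 style tally keyed by topic index (assumed shape, for illustration).
const topicSize = [];
for (let i = 0; i < sparse.length; i++) {
  const t = sparse[i];
  topicSize[t] = (topicSize[t] ?? 0) + 1;
}
// The missed item is counted under the string key "undefined", not under a topic.
console.log(topicSize[0], topicSize['undefined'], topicSize.length); // 2 1 1

// Dense variant: the -1 sentinel makes the gap explicit and easy to detect.
const dense = new Array(top.length).fill(-1);
clusters.forEach((members, tIdx) => {
  for (const i of members) dense[i] = tIdx;
});
const missed = dense.flatMap((t, i) => (t === -1 ? [i] : []));
console.log(missed); // [ 2 ]
```

In the sparse case nothing throws, which is why the problem would go unnoticed; in the dense case the sentinel gives the validation loop something concrete to check.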
@@ -319,10 +319,19 @@ export function groupTopicsPostDedup(top, cfg, embeddingByHash, deps = {}) {
       vetoFn: null,
     });
 
-    const topicOf = new Array(top.length);
+    // Dense-fill with -1 sentinel so an incomplete clusterFn (a future
+    // injection that doesn't cover every input index) surfaces as an
+    // explicit error instead of silently poisoning the phase-1 aggregates
+    // (topicSize[undefined] / topicMax[undefined] would degrade the sort).
+    const topicOf = new Array(top.length).fill(-1);
     clusters.forEach((members, tIdx) => {
       for (const i of members) topicOf[i] = tIdx;
     });
+    for (let i = 0; i < topicOf.length; i++) {
+      if (topicOf[i] === -1) {
+        throw new Error(`topic grouping: clusterFn missed index ${i}`);
+      }
+    }
 
     const hashOf = top.map((rep) =>
       titleHashHex(normalizeForEmbedding(rep.title ?? '')),
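
To illustrate how the thrown error is meant to degrade into the fallback result rather than a crash, here is a hedged sketch of the validate-then-fallback pattern. `assignTopics` and `groupOrFallback` are hypothetical names, and the success-path return shape is an assumption; only the fallback shape `{reps: top, topicCount: top.length, error}` and the error message come from the commit.

```javascript
// Hypothetical helper: dense-fill, assign, then reject any index left at -1.
function assignTopics(itemCount, clusters) {
  const topicOf = new Array(itemCount).fill(-1);
  clusters.forEach((members, tIdx) => {
    for (const i of members) topicOf[i] = tIdx;
  });
  for (let i = 0; i < topicOf.length; i++) {
    if (topicOf[i] === -1) {
      throw new Error(`topic grouping: clusterFn missed index ${i}`);
    }
  }
  return topicOf;
}

// Hypothetical wrapper: an outer try/catch turns the error into the fallback
// result, so the caller keeps the primary order and never sees an exception.
function groupOrFallback(top, clusterFn) {
  try {
    const clusters = clusterFn(top);
    const topicOf = assignTopics(top.length, clusters);
    // Assumed success shape, for illustration only.
    return { reps: top, topicCount: clusters.length, topicOf, error: null };
  } catch (error) {
    return { reps: top, topicCount: top.length, error };
  }
}

// Usage: a deliberately incomplete cluster function that skips index 2.
const items = [{ title: 'a' }, { title: 'b' }, { title: 'c' }];
const result = groupOrFallback(items, () => [[0, 1]]);
console.log(result.error.message); // topic grouping: clusterFn missed index 2
console.log(result.reps === items, result.topicCount); // true 3
```

Throwing inside the helper keeps the invariant check close to the assignment, while the single catch site preserves the existing return contract.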