- Replaced instances of silentDebug with happy_path_error__with_fallback across multiple files to improve error logging and handling. - Updated the utility function to provide clearer semantics for error handling when expected values are missing. - Introduced a script to find potential silent failures in the codebase that may need to be addressed with the new error handling approach.
17 KiB
Claude-Mem Smart Install & Plugin Hooks - Comprehensive Analysis
Generated: 2025-12-09 Scope: Smart install system, all plugin hooks, cross-platform compatibility, error handling, edge cases
Executive Summary
This report provides a comprehensive analysis of claude-mem's smart install system and plugin hook infrastructure. The analysis focuses on cross-platform compatibility, error handling patterns, artificial blockers, and edge case handling.
Key Findings:
- ✅ Overall architecture is well-designed with clear separation of concerns
- ⚠️ Multiple cross-platform compatibility issues identified
- ⚠️ Several silent failure patterns that hinder debugging
- ⚠️ Artificial blockers that could prevent legitimate use cases
- ⚠️ Inconsistent timeout values across different components
- ✅ No nested try-catch anti-patterns found
Architecture Overview
Smart Install System Flow
User Invokes Hook
↓
ensureWorkerRunning() [worker-utils.ts]
↓
isWorkerHealthy() → fetch /health endpoint
↓
├─ [HEALTHY] → Continue
└─ [UNHEALTHY] → startWorker()
↓
├─ [Windows] → PowerShell Start-Process (hidden window)
└─ [Unix] → PM2 start ecosystem.config.cjs
↓
Wait for health check (15 retries × 1000ms)
↓
├─ [SUCCESS] → Continue
└─ [FAILURE] → Throw error with manual recovery instructions
Plugin Hook Lifecycle
-
SessionStart (context-hook.ts + user-message-hook.ts)
- context-hook: Fetches context via HTTP/curl
- user-message-hook: Displays context to user via stderr
-
UserPromptSubmit (new-hook.ts)
- Creates/retrieves SDK session
- Strips privacy tags from prompt
- Initializes session via HTTP
-
PostToolUse (save-hook.ts)
- Filters skipped tools
- Sends observation to worker via HTTP
-
Stop (summary-hook.ts)
- Parses transcript JSONL
- Extracts last user/assistant messages
- Requests summary generation via HTTP
-
SessionEnd (cleanup-hook.ts)
- Marks session complete
- Fire-and-forget HTTP request
Cross-Platform Compatibility Issues
🔴 CRITICAL: curl Dependency (context-hook.ts)
Location: src/hooks/context-hook.ts:32
const result = execSync(`curl -s "${url}"`, { encoding: "utf-8", timeout: 5000 });
Issues:
- Windows Compatibility: curl is not guaranteed to be available on Windows systems (though included in Windows 10 1803+, it may be missing on older systems or custom installations)
- Error Handling: No try-catch around execSync - will throw unhandled exception if curl fails
- Redundancy: Uses curl when JavaScript's native
fetchis already used everywhere else in the codebase
Impact: High - SessionStart hook will crash if curl is unavailable or returns non-zero exit code
Edge Cases:
- Corporate proxies blocking curl
- Systems without curl in PATH
- curl returning non-zero exit with valid output (warnings, etc.)
Recommendation:
// Replace curl with fetch (already used in user-message-hook.ts)
const response = await fetch(url, { signal: AbortSignal.timeout(5000) });
const result = await response.text();
🟡 MEDIUM: Platform-Specific Process Spawning (worker-utils.ts)
Location: src/shared/worker-utils.ts:55-93
Windows Implementation:
spawnSync('powershell.exe', [
'-NoProfile',
'-NonInteractive',
'-Command',
`Start-Process -FilePath 'node' -ArgumentList '${workerScript}' -WorkingDirectory '${MARKETPLACE_ROOT}' -WindowStyle Hidden`
])
Issues:
- PowerShell Dependency: Assumes PowerShell is available and in PATH
- Command Injection Risk: Worker script path inserted directly into command string without escaping
- Process Monitoring: Windows approach launches detached process with no PM2 monitoring - harder to debug/restart
- Health Check Timeout: Comment says "Windows needs longer timeouts" but timeout is same for all platforms (500ms)
Edge Cases:
- Windows systems with PowerShell execution policy restrictions
- Paths containing single quotes or special characters
- Windows subsystem for Linux (WSL) environments
- Wine/Proton compatibility layers
Unix Implementation:
const localPm2Base = path.join(MARKETPLACE_ROOT, 'node_modules', '.bin', 'pm2');
const pm2Command = existsSync(localPm2Base) ? localPm2Base : 'pm2';
Issues:
- PM2 Dependency: Falls back to global pm2 if local not found, but doesn't verify it exists
- Silent Failure: If PM2 not installed globally, spawnSync will fail with cryptic ENOENT error
Recommendation:
- Add pm2 existence check before spawn
- Implement consistent process monitoring across platforms
- Add path escaping for Windows command construction
- Actually implement longer timeout for Windows if needed
🟡 MEDIUM: Git Dependency (paths.ts)
Location: src/shared/paths.ts:89-97
export function getCurrentProjectName(): string {
try {
const gitRoot = execSync('git rev-parse --show-toplevel', {
cwd: process.cwd(),
encoding: 'utf8',
stdio: ['pipe', 'pipe', 'ignore']
}).trim();
return basename(gitRoot);
} catch {
return basename(process.cwd());
}
}
Issues:
- Git Assumption: Assumes git is installed and available in PATH
- Non-Git Projects: Silently falls back to cwd basename, but this behavior is undocumented
Edge Cases:
- Projects not using git
- Monorepos where cwd !== git root is desired
- Systems without git installed
Status: ✅ Already handled with fallback, but could benefit from debug logging
Error Handling Analysis
🔴 CRITICAL: Silent Failures Without Logging
1. Settings File Loading (early-settings.ts:20-28)
try {
if (existsSync(SETTINGS_PATH)) {
const data = JSON.parse(readFileSync(SETTINGS_PATH, 'utf-8'));
const fileValue = data.env?.[key];
if (fileValue !== undefined) return fileValue;
}
} catch {
// Fail silently - fall through to env var
}
Problem:
- Invalid JSON in settings file fails silently
- File read permission errors fail silently
- Users have no way to know their settings file is being ignored
Impact: High - Users may think settings are applied when they're actually using defaults
Recommendation:
} catch (error) {
logger.warn('SETTINGS', 'Failed to load settings file', { path: SETTINGS_PATH }, error);
}
2. Worker Startup Failure (worker-utils.ts:104-107)
try {
// ... worker startup logic ...
} catch (error) {
// Failed to start worker
return false;
}
Problem:
- Catches ALL errors during worker startup
- Returns boolean with no information about what failed
- User only gets generic error after all retries exhausted
Impact: High - Makes debugging worker startup issues extremely difficult
Recommendation:
} catch (error) {
logger.error('WORKER', 'Failed to start worker', {}, error as Error);
return false;
}
3. Worker Health Check (worker-utils.ts:30-40)
async function isWorkerHealthy(): Promise<boolean> {
try {
const port = getWorkerPort();
const response = await fetch(`http://127.0.0.1:${port}/health`, {
signal: AbortSignal.timeout(HEALTH_CHECK_TIMEOUT_MS)
});
return response.ok;
} catch {
return false;
}
}
Problem:
- Network errors, timeouts, and non-200 responses all indistinguishable
- No logging at all - completely silent
Impact: Medium - Hard to debug why health checks fail
Recommendation:
} catch (error) {
logger.debug('WORKER', 'Health check failed', { port }, error);
return false;
}
4. Tool Formatting (logger.ts:122-124)
try {
const input = typeof toolInput === 'string' ? JSON.parse(toolInput) : toolInput;
// ...
} catch {
return toolName;
}
Problem:
- Invalid JSON in tool input fails silently
- Could mask data corruption issues
Impact: Low - Only affects log formatting
Status: ✅ Acceptable for log formatting, but could log at DEBUG level
🟢 GOOD: No Nested Try-Catch Anti-Patterns
Analysis confirmed zero instances of nested try-catch blocks. Error handling is consistently at single level per function.
Artificial Blockers & Unnecessary Checks
🔴 CRITICAL: First-Run Detection (user-message-hook.ts:14-40)
const nodeModulesPath = join(pluginDir, 'node_modules');
if (!existsSync(nodeModulesPath)) {
// Show first-time setup message
console.error(`...`);
process.exit(3);
}
Problems:
- False Positive: Will trigger if user manually deletes node_modules (e.g., for troubleshooting)
- Installation Race: Could fail if installation is still in progress
- Hook-Level Check: Runs on EVERY SessionStart, not just actual first run
Impact: High - Prevents usage until node_modules exists, even if dependencies are installed elsewhere
Edge Cases:
- User runs
rm -rf node_modulesfor troubleshooting - Package manager installation interrupted
- Symlinked node_modules (some package managers)
Recommendation:
- Use a
.first-run-completemarker file instead - Move check to npm postinstall script
- Make check more robust (check for specific required modules)
🟡 MEDIUM: Overly Specific Validation (paths.ts:117-119)
if (!existsSync(join(commandsDir, 'save.md'))) {
throw new Error('Package commands directory missing required files');
}
Problem:
- Checks for ONE specific file to validate entire directory
- Hardcoded filename could break if files reorganized
- Error message doesn't specify what's missing
Impact: Medium - Could prevent package from working after internal refactoring
Recommendation:
- Remove check entirely (let actual command invocation fail with better error)
- Or check all required files if validation is critical
🟡 MEDIUM: Duplicate Health Endpoints
Locations:
src/services/worker-service.ts:107-/api/healthsrc/services/worker/http/routes/ViewerRoutes.ts:27-/health
Usage:
worker-utils.tsuses/healthmcp-server.tsuses/api/health
Problem:
- Redundant endpoints doing the same thing
- Inconsistent usage across codebase
- Maintenance burden
Impact: Low - Both work, but creates confusion
Recommendation:
- Standardize on
/api/health(follows REST convention) - Remove
/healthendpoint - Update worker-utils.ts to use
/api/health
Timeout Configuration Issues
Inconsistent Timeouts Across Components
| Component | Timeout | Location | Purpose |
|---|---|---|---|
| Health check | 500ms | worker-utils.ts:13 | Check if worker alive |
| Worker startup wait | 1000ms | worker-utils.ts:14 | Wait between health checks |
| Worker startup retries | 15x | worker-utils.ts:15 | Max retries (15s total) |
| Hook HTTP requests | 2000ms | cleanup-hook.ts:61, save-hook.ts:70, summary-hook.ts:164 | Send data to worker |
| New hook session init | 5000ms | new-hook.ts:129 | Initialize session |
| Context hook fetch | 5000ms | context-hook.ts:32 | Fetch context via curl |
| User message hook | 5000ms | user-message-hook.ts:52 | Fetch context display |
Problems:
- Health Check Too Aggressive: 500ms may be too short for loaded systems or slow network
- No Platform Adjustment: Comment says "Windows needs longer timeouts" but values are same
- Hook Timeout Variation: Some hooks use 2s, others use 5s with no clear reasoning
Recommendations:
- Increase health check timeout to 1000ms minimum
- Actually implement longer timeouts for Windows
- Standardize hook timeouts to 5000ms across the board
- Make timeouts configurable via settings
Edge Case Analysis
Handled Well ✅
- JSONL Parsing: summary-hook.ts continues on malformed lines (60-64, 117-121)
- Git Not Available: paths.ts falls back to cwd basename (89-97)
- Settings File Missing: early-settings.ts falls back to env vars and defaults (20-28)
- Privacy Tags: new-hook.ts handles fully-private prompts (99-109)
- Tool Skipping: save-hook.ts filters low-value tools (24-30)
Missing Edge Case Handling ⚠️
- curl Failure: context-hook.ts has no error handling for curl failures
- PM2 Not Installed: worker-utils.ts assumes pm2 exists globally
- PowerShell Restrictions: worker-utils.ts doesn't check execution policy
- Concurrent Worker Starts: No locking to prevent multiple hooks from starting worker simultaneously
- Port Already In Use: No detection or recovery if worker port is taken
- Zombie Processes: Windows approach doesn't track PIDs, can't detect/kill zombies
Recommendations Summary
High Priority 🔴
-
Replace curl with fetch in context-hook.ts
- Eliminates external dependency
- Consistent with rest of codebase
- Better error handling
-
Add logging to silent failures
- early-settings.ts: Log when settings file fails to load
- worker-utils.ts: Log startup failures with details
- worker-utils.ts: Log health check failures at debug level
-
Fix first-run detection
- Use marker file instead of node_modules check
- More reliable and intentional
Medium Priority 🟡
-
Verify PM2 availability before attempting to use it
- Check existence before spawn
- Provide clear error message if missing
-
Implement platform-specific timeouts
- Actually use longer timeouts on Windows as comment suggests
- Make timeouts configurable
-
Standardize health endpoints
- Remove duplicate
/healthendpoint - Use
/api/healtheverywhere
- Remove duplicate
-
Add path escaping for Windows PowerShell commands
- Prevent injection issues
- Handle paths with special characters
Low Priority 🟢
- Standardize HTTP timeouts across all hooks
- Add concurrent startup protection (locking mechanism)
- Improve error messages with actionable recovery steps
Testing Recommendations
Cross-Platform Testing Needed
-
Windows Environments:
- Windows 10 (various versions)
- Windows 11
- Windows Server
- WSL/WSL2
- PowerShell execution policies (Restricted, RemoteSigned, Unrestricted)
-
Unix Environments:
- macOS (Intel + Apple Silicon)
- Linux (Ubuntu, Fedora, Arch)
- FreeBSD
-
Edge Environments:
- Docker containers
- CI/CD environments
- Systems without git installed
- Systems without curl (or with restricted curl)
- Corporate networks with proxies
- Low-spec systems (slow startup)
Test Scenarios
- Cold Start: First run with no existing data
- Corrupt Settings: Invalid JSON in settings.json
- Missing Dependencies: No PM2, no git, no curl
- Port Conflicts: Worker port already in use
- Rapid Hook Invocations: Multiple hooks trying to start worker simultaneously
- Permission Issues: Read-only filesystem, restricted execution
- Network Issues: Localhost blocked, slow network
Code Quality Assessment
Strengths ✅
- Clean separation of concerns (hooks → worker → database)
- No nested try-catch anti-patterns
- Consistent use of modern async/await
- Good use of TypeScript for type safety
- Idempotent database operations
- Clear documentation in critical sections
Weaknesses ⚠️
- Silent failures hinder debugging
- Inconsistent error handling patterns
- Platform-specific code not fully tested/documented
- Timeout configuration hardcoded and inconsistent
- Some artificial blockers prevent legitimate use cases
Technical Debt
- Duplicate health endpoints
- curl dependency when fetch available
- PM2 dependency on Unix but not Windows (inconsistent monitoring)
- First-run detection using node_modules existence
- Hardcoded timeout values
Conclusion
The claude-mem smart install and plugin hook system is architecturally sound with a well-designed separation of concerns. However, several cross-platform compatibility issues and silent failure patterns could cause problems in production, particularly on Windows systems or in edge case scenarios.
The highest priority improvements are:
- Removing the curl dependency
- Adding proper logging to silent failures
- Fixing the fragile first-run detection
- Verifying external dependencies before use
These changes would significantly improve debuggability and cross-platform reliability without requiring major architectural changes.
Analysis Methodology:
- Systematic review of all TypeScript source files
- Static analysis of error handling patterns
- Cross-platform compatibility assessment
- Edge case identification through code path analysis
- Comparison against best practices and KISS principles
Files Analyzed:
- src/hooks/*.ts (6 files)
- src/services/worker-service.ts
- src/services/worker/*.ts (10+ files)
- src/servers/mcp-server.ts
- src/shared/*.ts (worker-utils, early-settings, paths)
- src/utils/*.ts (logger, silent-debug, tag-stripping)