mirror of https://github.com/thedotmack/claude-mem synced 2026-04-25 17:15:04 +02:00

Claude-Mem Smart Install & Plugin Hooks - Comprehensive Analysis

Generated: 2025-12-09
Scope: Smart install system, all plugin hooks, cross-platform compatibility, error handling, edge cases

Executive Summary

This report provides a comprehensive analysis of claude-mem's smart install system and plugin hook infrastructure. The analysis focuses on cross-platform compatibility, error handling patterns, artificial blockers, and edge case handling.

Key Findings:

✅ Overall architecture is well-designed with clear separation of concerns
⚠️ Multiple cross-platform compatibility issues identified
⚠️ Several silent failure patterns that hinder debugging
⚠️ Artificial blockers that could prevent legitimate use cases
⚠️ Inconsistent timeout values across different components
✅ No nested try-catch anti-patterns found

Architecture Overview

Smart Install System Flow

User Invokes Hook
    ↓
ensureWorkerRunning() [worker-utils.ts]
    ↓
isWorkerHealthy() → fetch /health endpoint
    ↓
    ├─ [HEALTHY] → Continue
    └─ [UNHEALTHY] → startWorker()
        ↓
        ├─ [Windows] → PowerShell Start-Process (hidden window)
        └─ [Unix] → PM2 start ecosystem.config.cjs
            ↓
        Wait for health check (15 retries × 1000ms)
            ↓
            ├─ [SUCCESS] → Continue
            └─ [FAILURE] → Throw error with manual recovery instructions
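
The flow above amounts to a check, start, then poll loop. A minimal sketch of that shape follows, assuming hypothetical injected `isHealthy` and `start` callbacks standing in for the real isWorkerHealthy() and startWorker(); the defaults mirror the figures in the diagram, and everything else is illustrative rather than the project's actual code:

```typescript
// Hypothetical dependencies, injected so the loop can run without a real worker.
interface WorkerDeps {
  isHealthy: () => Promise<boolean>;
  start: () => Promise<boolean>;
  retries?: number; // diagram: 15
  waitMs?: number;  // diagram: 1000
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function ensureRunning(deps: WorkerDeps): Promise<void> {
  const { isHealthy, start, retries = 15, waitMs = 1000 } = deps;

  // Fast path: the worker already answers the health check.
  if (await isHealthy()) return;

  if (!(await start())) {
    throw new Error('Worker failed to start; check the logs for the spawn error.');
  }

  // Poll until the worker is healthy or the retry budget is spent.
  for (let attempt = 1; attempt <= retries; attempt++) {
    await sleep(waitMs);
    if (await isHealthy()) return;
  }
  throw new Error(`Worker not healthy after ${retries} checks; try starting it manually.`);
}
```

Injecting the two callbacks keeps the retry logic testable on any platform, independent of PowerShell or PM2.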

Plugin Hook Lifecycle

SessionStart (context-hook.ts + user-message-hook.ts)
- context-hook: Fetches context via HTTP/curl
- user-message-hook: Displays context to user via stderr
UserPromptSubmit (new-hook.ts)
- Creates/retrieves SDK session
- Strips privacy tags from prompt
- Initializes session via HTTP
PostToolUse (save-hook.ts)
- Filters skipped tools
- Sends observation to worker via HTTP
Stop (summary-hook.ts)
- Parses transcript JSONL
- Extracts last user/assistant messages
- Requests summary generation via HTTP
SessionEnd (cleanup-hook.ts)
- Marks session complete
- Fire-and-forget HTTP request

Cross-Platform Compatibility Issues

🔴 CRITICAL: curl Dependency (context-hook.ts)

Location: src/hooks/context-hook.ts:32

const result = execSync(`curl -s "${url}"`, { encoding: "utf-8", timeout: 5000 });

Issues:

Windows Compatibility: curl is not guaranteed to be available on Windows systems (though included in Windows 10 1803+, it may be missing on older systems or custom installations)
Error Handling: No try-catch around execSync - will throw unhandled exception if curl fails
Redundancy: Uses curl when JavaScript's native fetch is already used everywhere else in the codebase

Impact: High - SessionStart hook will crash if curl is unavailable or returns non-zero exit code

Edge Cases:

Corporate proxies blocking curl
Systems without curl in PATH
curl returning non-zero exit with valid output (warnings, etc.)

Recommendation:

// Replace curl with fetch (already used in user-message-hook.ts)
const response = await fetch(url, { signal: AbortSignal.timeout(5000) });
const result = await response.text();

🟡 MEDIUM: Platform-Specific Process Spawning (worker-utils.ts)

Location: src/shared/worker-utils.ts:55-93

Windows Implementation:

spawnSync('powershell.exe', [
  '-NoProfile',
  '-NonInteractive',
  '-Command',
  `Start-Process -FilePath 'node' -ArgumentList '${workerScript}' -WorkingDirectory '${MARKETPLACE_ROOT}' -WindowStyle Hidden`
])

Issues:

PowerShell Dependency: Assumes PowerShell is available and in PATH
Command Injection Risk: Worker script path inserted directly into command string without escaping
Process Monitoring: Windows approach launches detached process with no PM2 monitoring - harder to debug/restart
Health Check Timeout: Comment says "Windows needs longer timeouts" but timeout is same for all platforms (500ms)

Edge Cases:

Windows systems with PowerShell execution policy restrictions
Paths containing single quotes or special characters
Windows Subsystem for Linux (WSL) environments
Wine/Proton compatibility layers

Unix Implementation:

const localPm2Base = path.join(MARKETPLACE_ROOT, 'node_modules', '.bin', 'pm2');
const pm2Command = existsSync(localPm2Base) ? localPm2Base : 'pm2';

Issues:

PM2 Dependency: Falls back to global pm2 if local not found, but doesn't verify it exists
Silent Failure: If PM2 is not installed globally, spawnSync does not throw; it returns a result whose error field carries a cryptic ENOENT, which is easy to overlook
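
One way to surface that failure is to inspect the object spawnSync returns instead of relying on an exception. The sketch below assumes a hypothetical `explainSpawnFailure` helper and a pared-down `SpawnResult` shape containing only the two fields it reads; neither name exists in the codebase:

```typescript
// Minimal shape of the fields used from child_process.spawnSync()'s return value.
interface SpawnResult {
  error?: Error & { code?: string };
  status: number | null;
}

// Returns null on success, otherwise a message suitable for logging.
function explainSpawnFailure(command: string, result: SpawnResult): string | null {
  if (result.error?.code === 'ENOENT') {
    // The executable could not be found at all.
    return `'${command}' was not found. Install it or add it to PATH.`;
  }
  if (result.error) {
    return `Failed to launch '${command}': ${result.error.message}`;
  }
  if (result.status !== 0) {
    return `'${command}' exited with status ${result.status}`;
  }
  return null;
}
```

A caller could pass the result of spawnSync(pm2Command, [...]) straight in and log any non-null message before returning false.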

Recommendation:

Add pm2 existence check before spawn
Implement consistent process monitoring across platforms
Add path escaping for Windows command construction
Actually implement longer timeout for Windows if needed
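
For the path-escaping recommendation, a minimal sketch is shown below. `psQuote` and `buildStartProcessCommand` are hypothetical names. The approach relies on PowerShell's rule that a single quote inside a single-quoted string is escaped by doubling it:

```typescript
// Wrap a value in a PowerShell single-quoted string literal.
// Inside single quotes everything is literal except the quote itself,
// which is escaped by doubling it.
function psQuote(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build the -Command string with every interpolated value quoted.
function buildStartProcessCommand(workerScript: string, workingDir: string): string {
  return [
    'Start-Process',
    '-FilePath', psQuote('node'),
    '-ArgumentList', psQuote(workerScript),
    '-WorkingDirectory', psQuote(workingDir),
    '-WindowStyle', 'Hidden',
  ].join(' ');
}
```

This only addresses quoting at the PowerShell level. Script paths containing spaces may still need an extra layer of embedded double quotes inside -ArgumentList, which the sketch does not attempt.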

🟡 MEDIUM: Git Dependency (paths.ts)

Location: src/shared/paths.ts:89-97

export function getCurrentProjectName(): string {
  try {
    const gitRoot = execSync('git rev-parse --show-toplevel', {
      cwd: process.cwd(),
      encoding: 'utf8',
      stdio: ['pipe', 'pipe', 'ignore']
    }).trim();
    return basename(gitRoot);
  } catch {
    return basename(process.cwd());
  }
}

Issues:

Git Assumption: Assumes git is installed and available in PATH
Non-Git Projects: Silently falls back to cwd basename, but this behavior is undocumented

Edge Cases:

Projects not using git
Monorepos where cwd !== git root is desired
Systems without git installed

Status: ✅ Already handled with fallback, but could benefit from debug logging

Error Handling Analysis

🔴 CRITICAL: Silent Failures Without Logging

1. Settings File Loading (early-settings.ts:20-28)

try {
  if (existsSync(SETTINGS_PATH)) {
    const data = JSON.parse(readFileSync(SETTINGS_PATH, 'utf-8'));
    const fileValue = data.env?.[key];
    if (fileValue !== undefined) return fileValue;
  }
} catch {
  // Fail silently - fall through to env var
}

Problem:

Invalid JSON in settings file fails silently
File read permission errors fail silently
Users have no way to know their settings file is being ignored

Impact: High - Users may think settings are applied when they're actually using defaults

Recommendation:

} catch (error) {
  logger.warn('SETTINGS', 'Failed to load settings file', { path: SETTINGS_PATH }, error);
}
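
Beyond logging, the lookup itself can report why no value was returned. The following is a sketch only: `lookupSetting` and the `SettingLookup` type are hypothetical, and it takes the file contents as a string so that file reading (and its permission errors) stays with the caller:

```typescript
type SettingLookup =
  | { kind: 'found'; value: string }
  | { kind: 'absent' }                    // file parsed, key not set
  | { kind: 'invalid'; reason: string };  // file could not be parsed

function lookupSetting(rawJson: string, key: string): SettingLookup {
  let data: unknown;
  try {
    data = JSON.parse(rawJson);
  } catch (error) {
    // Keep the parser's message so the caller can log it.
    return { kind: 'invalid', reason: (error as Error).message };
  }
  const env = (data as { env?: Record<string, unknown> } | null)?.env;
  const value = env?.[key];
  return value === undefined ? { kind: 'absent' } : { kind: 'found', value: String(value) };
}
```

With this shape the caller can warn on `invalid`, stay quiet on `absent`, and fall through to the environment variable in both cases.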

2. Worker Startup Failure (worker-utils.ts:104-107)

try {
  // ... worker startup logic ...
} catch (error) {
  // Failed to start worker
  return false;
}

Problem:

Catches ALL errors during worker startup
Returns boolean with no information about what failed
User only gets generic error after all retries exhausted

Impact: High - Makes debugging worker startup issues extremely difficult

Recommendation:

} catch (error) {
  logger.error('WORKER', 'Failed to start worker', {}, error as Error);
  return false;
}

3. Worker Health Check (worker-utils.ts:30-40)

async function isWorkerHealthy(): Promise<boolean> {
  try {
    const port = getWorkerPort();
    const response = await fetch(`http://127.0.0.1:${port}/health`, {
      signal: AbortSignal.timeout(HEALTH_CHECK_TIMEOUT_MS)
    });
    return response.ok;
  } catch {
    return false;
  }
}

Problem:

Network errors, timeouts, and non-2xx responses are all indistinguishable to the caller
No logging at all - completely silent

Impact: Medium - Hard to debug why health checks fail

Recommendation:

} catch (error) {
  // `port` must be declared before the try block to be in scope here
  logger.debug('WORKER', 'Health check failed', { port }, error as Error);
  return false;
}
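
To make the three failure modes distinguishable, the check could return a small tagged result instead of a boolean. This is a sketch under assumptions: `checkHealth`, `HealthResult`, and the `fetchImpl` parameter are hypothetical and not part of the current code. It relies on AbortSignal.timeout() aborting with an error named "TimeoutError":

```typescript
type HealthResult =
  | { state: 'healthy' }
  | { state: 'unhealthy'; status: number }
  | { state: 'timeout' }
  | { state: 'unreachable'; message: string };

async function checkHealth(
  url: string,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch, // hypothetical injection point for tests
): Promise<HealthResult> {
  try {
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return response.ok ? { state: 'healthy' } : { state: 'unhealthy', status: response.status };
  } catch (error) {
    // A timed-out request rejects with an error whose name is "TimeoutError".
    const name = (error as { name?: unknown } | null)?.name;
    if (name === 'TimeoutError') return { state: 'timeout' };
    return { state: 'unreachable', message: error instanceof Error ? error.message : String(error) };
  }
}
```

A boolean wrapper can still be kept for existing callers (`(await checkHealth(...)).state === 'healthy'`), while the richer result is logged at debug level.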

4. Tool Formatting (logger.ts:122-124)

try {
  const input = typeof toolInput === 'string' ? JSON.parse(toolInput) : toolInput;
  // ...
} catch {
  return toolName;
}

Problem:

Invalid JSON in tool input fails silently
Could mask data corruption issues

Impact: Low - Only affects log formatting

Status: ✅ Acceptable for log formatting, but could log at DEBUG level

🟢 GOOD: No Nested Try-Catch Anti-Patterns

Analysis confirmed zero instances of nested try-catch blocks. Error handling is consistently kept at a single level per function.

Artificial Blockers & Unnecessary Checks

🔴 CRITICAL: First-Run Detection (user-message-hook.ts:14-40)

const nodeModulesPath = join(pluginDir, 'node_modules');

if (!existsSync(nodeModulesPath)) {
  // Show first-time setup message
  console.error(`...`);
  process.exit(3);
}

Problems:

False Positive: Will trigger if user manually deletes node_modules (e.g., for troubleshooting)
Installation Race: Could fail if installation is still in progress
Hook-Level Check: Runs on EVERY SessionStart, not just actual first run

Impact: High - Prevents usage until node_modules exists, even if dependencies are installed elsewhere

Edge Cases:

User runs rm -rf node_modules for troubleshooting
Package manager installation interrupted
Symlinked node_modules (some package managers)

Recommendation:

Use a .first-run-complete marker file instead
Move check to npm postinstall script
Make check more robust (check for specific required modules)
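
The marker-file option could look roughly like the sketch below. The marker name comes from the recommendation above; `isFirstRun` and `markFirstRunComplete` are hypothetical helpers, and where the marker gets written (for example at the end of the install step) is an assumption:

```typescript
import { existsSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Marker name taken from the recommendation; its location is an assumption.
const MARKER = '.first-run-complete';

// True until setup has explicitly recorded that it finished.
function isFirstRun(pluginDir: string): boolean {
  return !existsSync(join(pluginDir, MARKER));
}

// Called once setup has finished successfully.
function markFirstRunComplete(pluginDir: string): void {
  writeFileSync(join(pluginDir, MARKER), new Date().toISOString() + '\n');
}
```

Because the marker is written only after setup succeeds, deleting node_modules for troubleshooting no longer looks like a first run, and an interrupted install still does.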

🟡 MEDIUM: Overly Specific Validation (paths.ts:117-119)

if (!existsSync(join(commandsDir, 'save.md'))) {
  throw new Error('Package commands directory missing required files');
}

Problem:

Checks for ONE specific file to validate entire directory
Hardcoded filename could break if files reorganized
Error message doesn't specify what's missing

Impact: Medium - Could prevent package from working after internal refactoring

Recommendation:

Remove check entirely (let actual command invocation fail with better error)
Or check all required files if validation is critical
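
If the validation is kept, it can at least name what is missing. A sketch, assuming hypothetical `findMissingFiles` and `assertCommandsDir` helpers; the list of required files is supplied by the caller, since the report only names save.md:

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Return the required file names that are not present in dir.
function findMissingFiles(
  dir: string,
  required: readonly string[],
  exists: (path: string) => boolean = existsSync, // injectable for tests
): string[] {
  return required.filter((name) => !exists(join(dir, name)));
}

// Throw an error that lists every missing file, not just the first one.
function assertCommandsDir(
  dir: string,
  required: readonly string[],
  exists: (path: string) => boolean = existsSync,
): void {
  const missing = findMissingFiles(dir, required, exists);
  if (missing.length > 0) {
    throw new Error(`Commands directory ${dir} is missing: ${missing.join(', ')}`);
  }
}
```

Keeping the required list in one place also means a later file reorganization changes a single array rather than a hardcoded check.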

🟡 MEDIUM: Duplicate Health Endpoints

Locations:

src/services/worker-service.ts:107 - /api/health
src/services/worker/http/routes/ViewerRoutes.ts:27 - /health

Usage:

worker-utils.ts uses /health
mcp-server.ts uses /api/health

Problem:

Redundant endpoints doing the same thing
Inconsistent usage across codebase
Maintenance burden

Impact: Low - Both work, but creates confusion

Recommendation:

Standardize on /api/health (keeps the health check under the same /api prefix that mcp-server.ts already uses)
Remove /health endpoint
Update worker-utils.ts to use /api/health

Timeout Configuration Issues

Inconsistent Timeouts Across Components

Component	Value	Location	Purpose
Health check timeout	500ms	worker-utils.ts:13	Check if worker alive
Worker startup wait	1000ms	worker-utils.ts:14	Wait between health checks
Worker startup retries	15×	worker-utils.ts:15	Max retries (about 15s of waiting in total)
Hook HTTP requests	2000ms	cleanup-hook.ts:61, save-hook.ts:70, summary-hook.ts:164	Send data to worker
New hook session init	5000ms	new-hook.ts:129	Initialize session
Context hook fetch	5000ms	context-hook.ts:32	Fetch context via curl
User message hook	5000ms	user-message-hook.ts:52	Fetch context display

Problems:

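Health Check Too Aggressive: 500ms may be too short on heavily loaded systems or when the worker is slow to respond (the request only travels over loopback, so network latency is rarely the cause)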
No Platform Adjustment: Comment says "Windows needs longer timeouts" but values are same
Hook Timeout Variation: Some hooks use 2s, others use 5s with no clear reasoning

Recommendations:

Increase health check timeout to 1000ms minimum
Actually implement longer timeouts for Windows
Standardize hook timeouts to 5000ms across the board
Make timeouts configurable via settings
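
These recommendations could be combined into a single resolver. In the sketch below, the defaults follow the values recommended above, while `resolveTimeouts`, the `Timeouts` shape, and the Windows multiplier of 2 are assumptions chosen for illustration, not measured values:

```typescript
interface Timeouts {
  healthCheckMs: number;
  hookRequestMs: number;
}

// Defaults follow the recommendations; the multiplier is a placeholder.
const DEFAULTS: Timeouts = { healthCheckMs: 1000, hookRequestMs: 5000 };
const WINDOWS_MULTIPLIER = 2;

// Explicit overrides win; otherwise defaults are scaled per platform.
function resolveTimeouts(platform: string, overrides: Partial<Timeouts> = {}): Timeouts {
  const scale = platform === 'win32' ? WINDOWS_MULTIPLIER : 1;
  return {
    healthCheckMs: overrides.healthCheckMs ?? DEFAULTS.healthCheckMs * scale,
    hookRequestMs: overrides.hookRequestMs ?? DEFAULTS.hookRequestMs * scale,
  };
}
```

A caller would pass process.platform and whatever overrides the settings file provides, so every hook reads its timeout from one place.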

Edge Case Analysis

Handled Well ✅

JSONL Parsing: summary-hook.ts continues on malformed lines (60-64, 117-121)
Git Not Available: paths.ts falls back to cwd basename (89-97)
Settings File Missing: early-settings.ts falls back to env vars and defaults (20-28)
Privacy Tags: new-hook.ts handles fully-private prompts (99-109)
Tool Skipping: save-hook.ts filters low-value tools (24-30)

Missing Edge Case Handling ⚠️

curl Failure: context-hook.ts has no error handling for curl failures
PM2 Not Installed: worker-utils.ts assumes pm2 exists globally
PowerShell Restrictions: worker-utils.ts doesn't check execution policy
Concurrent Worker Starts: No locking to prevent multiple hooks from starting worker simultaneously
Port Already In Use: No detection or recovery if worker port is taken
Zombie Processes: Windows approach doesn't track PIDs, can't detect/kill zombies
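
For the concurrent-start case, one common technique is an exclusive lock file. The sketch below assumes hypothetical `tryAcquireLock` and `releaseLock` helpers and a caller-chosen lock path. It relies on the 'wx' open flag, which fails with EEXIST when the file already exists:

```typescript
import { closeSync, openSync, unlinkSync } from 'node:fs';

// Try to take the lock by creating the file exclusively.
function tryAcquireLock(lockPath: string): boolean {
  try {
    closeSync(openSync(lockPath, 'wx'));
    return true;
  } catch (error) {
    // Another process created the file first, so it holds the lock.
    if ((error as { code?: string }).code === 'EEXIST') return false;
    // Permission errors, a missing directory, etc. should not be swallowed.
    throw error;
  }
}

function releaseLock(lockPath: string): void {
  unlinkSync(lockPath);
}
```

A hook that fails to acquire the lock would skip startWorker() and go straight to polling the health endpoint. A crashed holder leaves a stale lock behind, so a real implementation would also need an expiry or PID check, which this sketch omits.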

Recommendations Summary

High Priority 🔴

Replace curl with fetch in context-hook.ts
- Eliminates external dependency
- Consistent with rest of codebase
- Better error handling
Add logging to silent failures
- early-settings.ts: Log when settings file fails to load
- worker-utils.ts: Log startup failures with details
- worker-utils.ts: Log health check failures at debug level
Fix first-run detection
- Use marker file instead of node_modules check
- More reliable and intentional

Medium Priority 🟡

Verify PM2 availability before attempting to use it
- Check existence before spawn
- Provide clear error message if missing
Implement platform-specific timeouts
- Actually use longer timeouts on Windows as comment suggests
- Make timeouts configurable
Standardize health endpoints
- Remove duplicate /health endpoint
- Use /api/health everywhere
Add path escaping for Windows PowerShell commands
- Prevent injection issues
- Handle paths with special characters

Low Priority 🟢

Standardize HTTP timeouts across all hooks
Add concurrent startup protection (locking mechanism)
Improve error messages with actionable recovery steps

Testing Recommendations

Cross-Platform Testing Needed

Windows Environments:
- Windows 10 (various versions)
- Windows 11
- Windows Server
- WSL/WSL2
- PowerShell execution policies (Restricted, RemoteSigned, Unrestricted)
Unix Environments:
- macOS (Intel + Apple Silicon)
- Linux (Ubuntu, Fedora, Arch)
- FreeBSD
Edge Environments:
- Docker containers
- CI/CD environments
- Systems without git installed
- Systems without curl (or with restricted curl)
- Corporate networks with proxies
- Low-spec systems (slow startup)

Test Scenarios

Cold Start: First run with no existing data
Corrupt Settings: Invalid JSON in settings.json
Missing Dependencies: No PM2, no git, no curl
Port Conflicts: Worker port already in use
Rapid Hook Invocations: Multiple hooks trying to start worker simultaneously
Permission Issues: Read-only filesystem, restricted execution
Network Issues: Localhost blocked, slow network

Code Quality Assessment

Strengths ✅

Clean separation of concerns (hooks → worker → database)
No nested try-catch anti-patterns
Consistent use of modern async/await
Good use of TypeScript for type safety
Idempotent database operations
Clear documentation in critical sections

Weaknesses ⚠️

Silent failures hinder debugging
Inconsistent error handling patterns
Platform-specific code not fully tested/documented
Timeout configuration hardcoded and inconsistent
Some artificial blockers prevent legitimate use cases

Technical Debt

Duplicate health endpoints
curl dependency when fetch available
PM2 dependency on Unix but not Windows (inconsistent monitoring)
First-run detection using node_modules existence
Hardcoded timeout values

Conclusion

The claude-mem smart install and plugin hook system is architecturally sound with a well-designed separation of concerns. However, several cross-platform compatibility issues and silent failure patterns could cause problems in production, particularly on Windows systems or in edge case scenarios.

The highest priority improvements are:

Removing the curl dependency
Adding proper logging to silent failures
Fixing the fragile first-run detection
Verifying external dependencies before use

These changes would significantly improve debuggability and cross-platform reliability without requiring major architectural changes.

Analysis Methodology:

Systematic review of all TypeScript source files
Static analysis of error handling patterns
Cross-platform compatibility assessment
Edge case identification through code path analysis
Comparison against best practices and KISS principles

Files Analyzed:

src/hooks/*.ts (6 files)
src/services/worker-service.ts
src/services/worker/*.ts (10+ files)
src/servers/mcp-server.ts
src/shared/*.ts (worker-utils, early-settings, paths)
src/utils/*.ts (logger, silent-debug, tag-stripping)
