feat: add smart-explore AST-based code navigation (#1244)

* feat: add smart-file-read module for token-optimized semantic code search

- Created package.json for the smart-file-read module with dependencies and scripts.
- Implemented parser.ts for code structure parsing using tree-sitter, supporting multiple languages.
- Developed search.ts for searching code files and symbols with grep-style and structural matching.
- Added test-run.mjs for testing search and outline functionalities.
- Configured TypeScript with tsconfig.json for strict type checking and module resolution.

* fix: update .gitignore to include _tree-sitter and remove unused subproject

* feat: add preliminary results and skill recommendation for smart-explore module

* chore: remove outdated plan.md file detailing session start hook issues

* feat: update Smart File Read integration plan and skill documentation for smart-explore

* feat: migrate Smart File Read to web-tree-sitter WASM for cross-platform compatibility

* refactor: switch to tree-sitter CLI for parsing and enhance search functionality

- Updated `parser.ts` to use the tree-sitter CLI for AST extraction instead of native bindings, improving compatibility and performance.
- Removed grammar loading logic and replaced it with a path resolution for grammar packages.
- Implemented batch parsing in `parseFilesBatch` to handle multiple files in a single CLI call, enhancing search speed.
- Refactored `searchCodebase` to collect files and parse them in batches, streamlining the search process.
- Adjusted symbol extraction logic to accommodate the new parsing method and ensure accurate symbol matching.
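
The batching idea above can be sketched minimally as follows. This is an illustration only, not the module's actual implementation: `printf` stands in for the tree-sitter binary so the sketch runs anywhere, while the real code passes a query file and grammar path as well.

```typescript
import { execFileSync } from "node:child_process";

// Sketch of batch parsing: hand every file to ONE child process rather
// than spawning one process per file, so the per-spawn startup cost is
// paid once. `printf` is a stand-in for the tree-sitter CLI here.
function runBatch(files: string[]): string {
  if (files.length === 0) return "";
  // printf reuses its format string for each trailing argument.
  return execFileSync("printf", ["%s\n", ...files], { encoding: "utf-8" });
}

console.log(runBatch(["a.ts", "b.ts", "c.ts"]));
```

With N files this costs one process spawn instead of N, which is where most of the speedup from a single CLI call comes from.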

* feat: update Smart File Read integration plan to use the tree-sitter CLI for improved performance and cross-platform compatibility

* feat: add smart-file-read parser and search to src/services

Copy validated tree-sitter CLI-based parser and search modules from
smart-file-read prototype into the claude-mem source tree for MCP
tool integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: register smart_search, smart_unfold, smart_outline MCP tools

Add 3 tree-sitter AST-based code exploration tools to the MCP server.
Direct execution (no HTTP delegation) — they call parser/search
functions directly for sub-second response times.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tree-sitter CLI deps to build system and plugin runtime

Externalize tree-sitter packages in esbuild MCP server build. Add
10 grammar packages + CLI to plugin package.json for runtime install.
Remove unused @chroma-core/default-embed from plugin deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: create smart-explore skill with 3-layer workflow docs

Progressive disclosure workflow: search -> outline -> unfold.
Documents all 3 MCP tools with parameters and token economics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add comprehensive documentation for the smart-explore feature

- Introduced a detailed technical reference covering the architecture, parser, search engine, and tool registration for the smart-explore feature in claude-mem.
- Documented the three-layer workflow: search, outline, and unfold, along with their respective MCP tools.
- Explained the parsing process using tree-sitter, including language support, query patterns, and symbol extraction.
- Outlined the search module's functionality, including file discovery, batch parsing, and relevance scoring.
- Provided insights into build system integration and token economics for efficient code exploration.

* chore: remove experiment artifacts, prototypes, and plan files

Remove A/B test docs, prototype smart-file-read directory, and
implementation plans. Keep only production code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: simplify hooks configuration and remove setup script

* fix: use execFileSync to prevent command injection in tree-sitter parser

Replaces execSync shell string with execFileSync + argument array,
eliminating shell interpretation of file paths. Also corrects
file_pattern description from "Glob pattern" to "Substring filter".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
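
The fix above can be illustrated with a small sketch (illustrative only — `printf` stands in for the tree-sitter binary): with `execFileSync` and an argument array, a hostile file path is passed as one literal argument and never reaches a shell.

```typescript
import { execFileSync } from "node:child_process";

// A path containing shell metacharacters. Interpolated into an
// execSync shell string, the `;` would split it into two commands;
// as an execFileSync argument it stays a single literal value.
const hostilePath = "notes; echo INJECTED.ts";

// `printf` stands in for the tree-sitter binary in this sketch.
const out = execFileSync("printf", ["%s", hostilePath], { encoding: "utf-8" });
console.log(out === hostilePath); // true — no shell ever saw the string
```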

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Author: Alex Newman
Date: 2026-02-25 21:00:26 -05:00
Committed by: GitHub
Parent: 9ab119932a
Commit: 0e502dbd21
17 changed files with 1634 additions and 594 deletions

`.gitignore` (vendored) — 1 change

@@ -2,6 +2,7 @@ datasets/
node_modules/
dist/
!installer/dist/
**/_tree-sitter/
*.log
.DS_Store
.env

`.markdownlint.json` (new file) — 3 changes

@@ -0,0 +1,3 @@
{
"MD013": false
}


@@ -117,6 +117,16 @@
"@types/react-dom": "^18.3.0",
"esbuild": "^0.27.2",
"np": "^11.0.2",
"tree-sitter-c": "^0.24.1",
"tree-sitter-cli": "^0.26.5",
"tree-sitter-cpp": "^0.23.4",
"tree-sitter-go": "^0.25.0",
"tree-sitter-java": "^0.23.5",
"tree-sitter-javascript": "^0.25.0",
"tree-sitter-python": "^0.25.0",
"tree-sitter-ruby": "^0.23.1",
"tree-sitter-rust": "^0.24.0",
"tree-sitter-typescript": "^0.23.2",
"tsx": "^4.20.6",
"typescript": "^5.3.0"
}

`plan.md` (deleted) — 52 changes

@@ -1,52 +0,0 @@
# Fix: SessionStart Hook "startup hook error" — Worker Not Waiting
## Root Cause
The **installed plugin** (`~/.claude/plugins/marketplaces/thedotmack/`) is version **10.2.5** and has **none** of the recent fixes:
| Fix | Repo Status | Installed Status |
|-----|-------------|-----------------|
| Hook group split (smart-install isolated from worker start) | In `plugin/hooks/hooks.json` | **Missing** — all 3 hooks in one group, smart-install failure blocks worker |
| `waitForReadiness()` after spawn | In `src/services/infrastructure/HealthMonitor.ts` | **Missing** — 0 occurrences in installed `worker-service.cjs` |
| Early `initializationCompleteFlag` (after DB+search, not MCP) | In `src/services/worker-service.ts` | **Missing** — flag set after MCP connection (5+ minute wait) |
The changes exist in source code but were **never built and synced** to the installed location.
---
## Phase 1: Build and Sync
```bash
npm run build-and-sync
```
### Verification
```bash
# 1. Confirm waitForReadiness exists in installed build
grep -c "waitForReadiness" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs
# Expected: > 0
# 2. Confirm hooks.json has two SessionStart groups (the split)
python3 -c "import json; d=json.load(open('$(echo $HOME)/.claude/plugins/marketplaces/thedotmack/plugin/hooks/hooks.json')); print('SessionStart groups:', len(d['hooks']['SessionStart']))"
# Expected: 2
# 3. Confirm initializationCompleteFlag is set before MCP connection
grep -n "Core initialization complete" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs | head -1
# Expected: appears BEFORE "MCP server connected"
```
## Phase 2: Restart Worker and Test
```bash
# Stop existing worker
bun plugin/scripts/worker-service.cjs stop
# Verify stopped
curl -s http://127.0.0.1:37777/api/health && echo "STILL RUNNING" || echo "STOPPED"
```
Then start a new Claude Code session and verify:
- No "SessionStart:startup hook error" messages
- Worker is running: `curl http://127.0.0.1:37777/api/health`
- Readiness endpoint works: `curl http://127.0.0.1:37777/api/readiness`


@@ -1,40 +1,18 @@
{
"description": "Claude-mem memory system hooks",
"hooks": {
"Setup": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; \"$_R/scripts/setup.sh\"",
"timeout": 300
}
]
}
],
"SessionStart": [
{
"matcher": "startup|clear|compact",
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/smart-install.js\"",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/smart-install.js\"",
"timeout": 300
}
]
},
{
"matcher": "startup|clear|compact",
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" start",
"timeout": 60
},
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code context",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code context",
"timeout": 60
}
]
@@ -45,7 +23,7 @@
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code session-init",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code session-init",
"timeout": 60
}
]
@@ -53,11 +31,10 @@
],
"PostToolUse": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code observation",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code observation",
"timeout": 120
}
]
@@ -68,13 +45,13 @@
"hooks": [
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code summarize",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code summarize",
"timeout": 120
},
{
"type": "command",
"command": "_R=\"${CLAUDE_PLUGIN_ROOT}\"; [ -z \"$_R\" ] && _R=\"$HOME/.claude/plugins/marketplaces/thedotmack/plugin\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code session-complete",
"timeout": 30
"command": "_R=\"${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/thedotmack/plugin}\"; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code session-complete",
"timeout": 120
}
]
}


@@ -5,7 +5,16 @@
"description": "Runtime dependencies for claude-mem bundled hooks",
"type": "module",
"dependencies": {
"@chroma-core/default-embed": "^0.1.9"
"tree-sitter-cli": "^0.26.5",
"tree-sitter-c": "^0.24.1",
"tree-sitter-cpp": "^0.23.4",
"tree-sitter-go": "^0.25.0",
"tree-sitter-java": "^0.23.5",
"tree-sitter-javascript": "^0.25.0",
"tree-sitter-python": "^0.25.0",
"tree-sitter-ruby": "^0.23.1",
"tree-sitter-rust": "^0.24.0",
"tree-sitter-typescript": "^0.23.2"
},
"engines": {
"node": ">=18.0.0",


@@ -1,3 +1,5 @@
Never read built source files in this directory. These are compiled outputs — read the source files in `src/` instead.
<claude-mem-context>
# Recent Activity

File diff suppressed because one or more lines are too long


@@ -1,228 +0,0 @@
#!/usr/bin/env bash
#
# claude-mem Setup Hook
# Ensures dependencies are installed before plugin runs
#
set -euo pipefail
# Use CLAUDE_PLUGIN_ROOT if available, otherwise detect from script location
if [[ -z "${CLAUDE_PLUGIN_ROOT:-}" ]]; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT="$(dirname "$SCRIPT_DIR")"
else
ROOT="$CLAUDE_PLUGIN_ROOT"
fi
MARKER="$ROOT/.install-version"
PKG_JSON="$ROOT/package.json"
# Colors (when terminal supports it)
if [[ -t 2 ]]; then
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
else
RED='' GREEN='' YELLOW='' BLUE='' NC=''
fi
log_info() { echo -e "${BLUE}${NC} $*" >&2; }
log_ok() { echo -e "${GREEN}${NC} $*" >&2; }
log_warn() { echo -e "${YELLOW}${NC} $*" >&2; }
log_error() { echo -e "${RED}${NC} $*" >&2; }
#
# Detect Bun - check PATH and common locations
#
find_bun() {
# Try PATH first
if command -v bun &>/dev/null; then
echo "bun"
return 0
fi
# Check common install locations
local paths=(
"$HOME/.bun/bin/bun"
"/usr/local/bin/bun"
"/opt/homebrew/bin/bun"
)
for p in "${paths[@]}"; do
if [[ -x "$p" ]]; then
echo "$p"
return 0
fi
done
return 1
}
#
# Detect uv - check PATH and common locations
#
find_uv() {
# Try PATH first
if command -v uv &>/dev/null; then
echo "uv"
return 0
fi
# Check common install locations
local paths=(
"$HOME/.local/bin/uv"
"$HOME/.cargo/bin/uv"
"/usr/local/bin/uv"
"/opt/homebrew/bin/uv"
)
for p in "${paths[@]}"; do
if [[ -x "$p" ]]; then
echo "$p"
return 0
fi
done
return 1
}
#
# Get package.json version
#
get_pkg_version() {
if [[ -f "$PKG_JSON" ]]; then
# Simple grep-based extraction (no jq dependency)
grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' "$PKG_JSON" | head -1 | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
#
# Get marker version (if exists)
#
get_marker_version() {
if [[ -f "$MARKER" ]]; then
grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' "$MARKER" | head -1 | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
#
# Get marker's recorded bun version
#
get_marker_bun() {
if [[ -f "$MARKER" ]]; then
grep -o '"bun"[[:space:]]*:[[:space:]]*"[^"]*"' "$MARKER" | head -1 | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
#
# Check if install is needed
#
needs_install() {
# No node_modules? Definitely need install
if [[ ! -d "$ROOT/node_modules" ]]; then
return 0
fi
# No marker? Need install
if [[ ! -f "$MARKER" ]]; then
return 0
fi
local pkg_ver marker_ver bun_ver marker_bun
pkg_ver=$(get_pkg_version)
marker_ver=$(get_marker_version)
# Version mismatch? Need install
if [[ "$pkg_ver" != "$marker_ver" ]]; then
return 0
fi
# Bun version changed? Need install
if BUN_PATH=$(find_bun); then
bun_ver=$("$BUN_PATH" --version 2>/dev/null || echo "")
marker_bun=$(get_marker_bun)
if [[ -n "$bun_ver" && "$bun_ver" != "$marker_bun" ]]; then
return 0
fi
fi
# All good, no install needed
return 1
}
#
# Write version marker after successful install
#
write_marker() {
local bun_ver uv_ver pkg_ver
pkg_ver=$(get_pkg_version)
bun_ver=$("$BUN_PATH" --version 2>/dev/null || echo "unknown")
if UV_PATH=$(find_uv); then
uv_ver=$("$UV_PATH" --version 2>/dev/null | head -1 || echo "unknown")
else
uv_ver="not-installed"
fi
cat > "$MARKER" <<EOF
{
"version": "$pkg_ver",
"bun": "$bun_ver",
"uv": "$uv_ver",
"installedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
EOF
}
#
# Main
#
# 1. Check for Bun
BUN_PATH=$(find_bun) || true
if [[ -z "$BUN_PATH" ]]; then
log_error "Bun runtime not found!"
echo "" >&2
echo "claude-mem requires Bun to run. Please install it:" >&2
echo "" >&2
echo " curl -fsSL https://bun.sh/install | bash" >&2
echo "" >&2
echo "Or on macOS with Homebrew:" >&2
echo "" >&2
echo " brew install oven-sh/bun/bun" >&2
echo "" >&2
echo "Then restart your terminal and try again." >&2
exit 1
fi
BUN_VERSION=$("$BUN_PATH" --version 2>/dev/null || echo "unknown")
log_ok "Bun $BUN_VERSION found at $BUN_PATH"
# 2. Check for uv (optional - for Python/Chroma support)
UV_PATH=$(find_uv) || true
if [[ -z "$UV_PATH" ]]; then
log_warn "uv not found (optional - needed for Python/Chroma vector search)"
echo " To install: curl -LsSf https://astral.sh/uv/install.sh | sh" >&2
else
UV_VERSION=$("$UV_PATH" --version 2>/dev/null | head -1 || echo "unknown")
log_ok "uv $UV_VERSION found"
fi
# 3. Install dependencies if needed
if needs_install; then
log_info "Installing dependencies with Bun..."
if ! "$BUN_PATH" install --cwd "$ROOT"; then
log_error "Failed to install dependencies"
exit 1
fi
write_marker
log_ok "Dependencies installed ($(get_pkg_version))"
else
log_ok "Dependencies up to date ($(get_marker_version))"
fi
exit 0

File diff suppressed because one or more lines are too long


@@ -0,0 +1,141 @@
---
name: smart-explore
description: Token-optimized structural code search using tree-sitter AST parsing. Use instead of reading full files when you need to understand code structure, find functions, or explore a codebase efficiently.
---
# Smart Explore
Structural code exploration using AST parsing. **This skill overrides your default exploration behavior.** While this skill is active, use smart_search/smart_outline/smart_unfold as your primary tools instead of Read, Grep, and Glob.
## Your Next Tool Call
This skill only loads instructions. You must call the MCP tools yourself. Your next action should be one of:
```
smart_search(query="<topic>", path="./src") -- discover files + symbols across a directory
smart_outline(file_path="<file>") -- structural skeleton of one file
smart_unfold(file_path="<file>", symbol_name="<name>") -- full source of one symbol
```
Do NOT run Grep, Glob, Read, or find to discover files first. `smart_search` walks directories, parses all code files, and returns ranked symbols in one call. It replaces the Glob → Grep → Read discovery cycle.
## 3-Layer Workflow
### Step 1: Search -- Discover Files and Symbols
```
smart_search(query="shutdown", path="./src", max_results=15)
```
**Returns:** Ranked symbols with signatures, line numbers, match reasons, plus folded file views (~2-6k tokens)
```
-- Matching Symbols --
function performGracefulShutdown (services/infrastructure/GracefulShutdown.ts:56)
function httpShutdown (services/infrastructure/HealthMonitor.ts:92)
method WorkerService.shutdown (services/worker-service.ts:846)
-- Folded File Views --
services/infrastructure/GracefulShutdown.ts (7 symbols)
services/worker-service.ts (12 symbols)
```
This is your discovery tool. It finds relevant files AND shows their structure. No Glob/find pre-scan needed.
**Parameters:**
- `query` (string, required) -- What to search for (function name, concept, class name)
- `path` (string) -- Root directory to search (defaults to cwd)
- `max_results` (number) -- Max matching symbols, default 20, max 50
- `file_pattern` (string, optional) -- Filter to specific files/paths
### Step 2: Outline -- Get File Structure
```
smart_outline(file_path="services/worker-service.ts")
```
**Returns:** Complete structural skeleton -- all functions, classes, methods, properties, imports (~1-2k tokens per file)
**Skip this step** when Step 1's folded file views already provide enough structure. Most useful for files not covered by the search results.
**Parameters:**
- `file_path` (string, required) -- Path to the file
### Step 3: Unfold -- See Implementation
Review symbols from Steps 1-2. Pick the ones you need. Unfold only those:
```
smart_unfold(file_path="services/worker-service.ts", symbol_name="shutdown")
```
**Returns:** Full source code of the specified symbol including JSDoc, decorators, and complete implementation (~1-7k tokens depending on symbol size)
**Parameters:**
- `file_path` (string, required) -- Path to the file (as returned by search/outline)
- `symbol_name` (string, required) -- Name of the function/class/method to expand
## When to Use Standard Tools Instead
Use these only when smart_* tools are the wrong fit:
- **Grep:** Exact string/regex search ("find all TODO comments", "where is `ensureWorkerStarted` defined?")
- **Read:** Small files under ~100 lines, non-code files (JSON, markdown, config)
- **Glob:** File path patterns ("find all test files")
For code files over ~100 lines, prefer smart_outline + smart_unfold over Read.
## Workflow Examples
**Discover how a feature works (cross-cutting):**
```
1. smart_search(query="shutdown", path="./src")
-> 14 symbols across 7 files, full picture in one call
2. smart_unfold(file_path="services/infrastructure/GracefulShutdown.ts", symbol_name="performGracefulShutdown")
-> See the core implementation
```
**Navigate a large file:**
```
1. smart_outline(file_path="services/worker-service.ts")
-> 1,466 tokens: 12 functions, WorkerService class with 24 members
2. smart_unfold(file_path="services/worker-service.ts", symbol_name="startSessionProcessor")
-> 1,610 tokens: the specific method you need
Total: ~3,076 tokens vs ~12,000 to Read the full file
```
**Write documentation about code (hybrid workflow):**
```
1. smart_search(query="feature name", path="./src") -- discover all relevant files and symbols
2. smart_outline on key files -- understand structure
3. smart_unfold on important functions -- get implementation details
4. Read on small config/markdown/plan files -- get non-code context
```
Use smart_* tools for code exploration, Read for non-code files. Mix freely.
**Exploration then precision:**
```
1. smart_search(query="session", path="./src", max_results=10)
-> 10 ranked symbols: SessionMetadata, SessionQueueProcessor, SessionSummary...
2. Pick the relevant one, unfold it
```
## Token Economics
| Approach | Tokens | Use Case |
|----------|--------|----------|
| smart_outline | ~1,500 | "What's in this file?" |
| smart_unfold | ~1,600 | "Show me this function" |
| smart_search | ~2,000-6,000 | "How does X work?" |
| Read (full file) | ~12,000+ | When you truly need everything |
| Explore agent | ~20,000-40,000 | Same as smart_search, 6-12x more expensive |
**~4x savings** on file understanding (outline + unfold, ~3,100 tokens, vs ~12,000 to Read the full file). **6-12x savings** on exploration vs the Explore agent.
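
As a rough guide, the token figures in the table can be approximated with the common ~4-characters-per-token heuristic. This is an illustrative assumption only — the tools may count tokens differently.

```typescript
// Rough token estimate using the common ~4 characters/token heuristic.
// Illustrative assumption — not the tools' actual tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A ~6,000-character folded view lands near the smart_outline range.
console.log(estimateTokens("x".repeat(6000))); // 1500
```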

`scripts/CLAUDE.md` (new file) — 1 change

@@ -0,0 +1 @@
Never read built source files in this directory. These are compiled outputs — read the source files in `src/` instead.


@@ -59,8 +59,16 @@ async function buildHooks() {
description: 'Runtime dependencies for claude-mem bundled hooks',
type: 'module',
dependencies: {
// Chroma embedding function with native ONNX binaries (can't be bundled)
'@chroma-core/default-embed': '^0.1.9'
'tree-sitter-cli': '^0.26.5',
'tree-sitter-c': '^0.24.1',
'tree-sitter-cpp': '^0.23.4',
'tree-sitter-go': '^0.25.0',
'tree-sitter-java': '^0.23.5',
'tree-sitter-javascript': '^0.25.0',
'tree-sitter-python': '^0.25.0',
'tree-sitter-ruby': '^0.23.1',
'tree-sitter-rust': '^0.24.0',
'tree-sitter-typescript': '^0.23.2',
},
engines: {
node: '>=18.0.0',
@@ -128,7 +136,19 @@ async function buildHooks() {
outfile: `${hooksDir}/${MCP_SERVER.name}.cjs`,
minify: true,
logLevel: 'error',
external: ['bun:sqlite'],
external: [
'bun:sqlite',
'tree-sitter-cli',
'tree-sitter-javascript',
'tree-sitter-typescript',
'tree-sitter-python',
'tree-sitter-go',
'tree-sitter-rust',
'tree-sitter-ruby',
'tree-sitter-java',
'tree-sitter-c',
'tree-sitter-cpp',
],
define: {
'__DEFAULT_PACKAGE_VERSION__': `"${version}"`
},
@@ -166,6 +186,7 @@ async function buildHooks() {
console.log('\n📋 Verifying distribution files...');
const requiredDistributionFiles = [
'plugin/skills/mem-search/SKILL.md',
'plugin/skills/smart-explore/SKILL.md',
'plugin/hooks/hooks.json',
'plugin/.claude-plugin/plugin.json',
];


@@ -76,7 +76,7 @@ try {
const gitignoreExcludes = getGitignoreExcludes(rootDir);
execSync(
`rsync -av --delete --exclude=.git --exclude=/.mcp.json --exclude=bun.lock --exclude=package-lock.json ${gitignoreExcludes} ./ ~/.claude/plugins/marketplaces/thedotmack/`,
`rsync -av --delete --exclude=.git --exclude=bun.lock --exclude=package-lock.json ${gitignoreExcludes} ./ ~/.claude/plugins/marketplaces/thedotmack/`,
{ stdio: 'inherit' }
);


@@ -28,6 +28,10 @@ import {
ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';
import { getWorkerPort, getWorkerHost } from '../shared/worker-utils.js';
import { searchCodebase, formatSearchResults } from '../services/smart-file-read/search.js';
import { parseFile, formatFoldedView, unfoldSymbol } from '../services/smart-file-read/parser.js';
import { readFile } from 'node:fs/promises';
import { resolve } from 'node:path';
/**
* Worker HTTP API configuration
@@ -233,6 +237,118 @@ NEVER fetch full details without filtering first. 10x token savings.`,
handler: async (args: any) => {
return await callWorkerAPIPost('/api/observations/batch', args);
}
},
{
name: 'smart_search',
description: 'Search codebase for symbols, functions, classes using tree-sitter AST parsing. Returns folded structural views with token counts. Use path parameter to scope the search.',
inputSchema: {
type: 'object',
properties: {
query: {
type: 'string',
description: 'Search term — matches against symbol names, file names, and file content'
},
path: {
type: 'string',
description: 'Root directory to search (default: current working directory)'
},
max_results: {
type: 'number',
description: 'Maximum results to return (default: 20)'
},
file_pattern: {
type: 'string',
description: 'Substring filter for file paths (e.g. ".ts", "src/services")'
}
},
required: ['query']
},
handler: async (args: any) => {
const rootDir = resolve(args.path || process.cwd());
const result = await searchCodebase(rootDir, args.query, {
maxResults: args.max_results || 20,
filePattern: args.file_pattern
});
const formatted = formatSearchResults(result, args.query);
return {
content: [{ type: 'text' as const, text: formatted }]
};
}
},
{
name: 'smart_unfold',
description: 'Expand a specific symbol (function, class, method) from a file. Returns the full source code of just that symbol. Use after smart_search or smart_outline to read specific code.',
inputSchema: {
type: 'object',
properties: {
file_path: {
type: 'string',
description: 'Path to the source file'
},
symbol_name: {
type: 'string',
description: 'Name of the symbol to unfold (function, class, method, etc.)'
}
},
required: ['file_path', 'symbol_name']
},
handler: async (args: any) => {
const filePath = resolve(args.file_path);
const content = await readFile(filePath, 'utf-8');
const unfolded = unfoldSymbol(content, filePath, args.symbol_name);
if (unfolded) {
return {
content: [{ type: 'text' as const, text: unfolded }]
};
}
// Symbol not found — show available symbols
const parsed = parseFile(content, filePath);
if (parsed.symbols.length > 0) {
const available = parsed.symbols.map(s => ` - ${s.name} (${s.kind})`).join('\n');
return {
content: [{
type: 'text' as const,
text: `Symbol "${args.symbol_name}" not found in ${args.file_path}.\n\nAvailable symbols:\n${available}`
}]
};
}
return {
content: [{
type: 'text' as const,
text: `Could not parse ${args.file_path}. File may be unsupported or empty.`
}]
};
}
},
{
name: 'smart_outline',
description: 'Get structural outline of a file — shows all symbols (functions, classes, methods, types) with signatures but bodies folded. Much cheaper than reading the full file.',
inputSchema: {
type: 'object',
properties: {
file_path: {
type: 'string',
description: 'Path to the source file'
}
},
required: ['file_path']
},
handler: async (args: any) => {
const filePath = resolve(args.file_path);
const content = await readFile(filePath, 'utf-8');
const parsed = parseFile(content, filePath);
if (parsed.symbols.length > 0) {
return {
content: [{ type: 'text' as const, text: formatFoldedView(parsed) }]
};
}
return {
content: [{
type: 'text' as const,
text: `Could not parse ${args.file_path}. File may use an unsupported language or be empty.`
}]
};
}
}
];


@@ -0,0 +1,666 @@
/**
* Code structure parser — shells out to tree-sitter CLI for AST-based extraction.
*
* No native bindings. No WASM. Just the CLI binary + query patterns.
*
* Supported: JS, TS, Python, Go, Rust, Ruby, Java, C, C++
*
* by Copter Labs
*/
import { execFileSync } from "node:child_process";
import { writeFileSync, mkdtempSync, rmSync, existsSync } from "node:fs";
import { join, dirname } from "node:path";
import { tmpdir } from "node:os";
import { createRequire } from "node:module";
// CJS-safe require for resolving external packages at runtime.
// In ESM: import.meta.url works. In CJS bundle (esbuild): __filename works.
// typeof check avoids ReferenceError in ESM where __filename doesn't exist.
const _require = typeof __filename !== 'undefined'
? createRequire(__filename)
: createRequire(import.meta.url);
// --- Types ---
export interface CodeSymbol {
name: string;
kind: "function" | "class" | "method" | "interface" | "type" | "const" | "variable" | "export" | "struct" | "enum" | "trait" | "impl" | "property" | "getter" | "setter";
signature: string;
jsdoc?: string;
lineStart: number;
lineEnd: number;
parent?: string;
exported: boolean;
children?: CodeSymbol[];
}
export interface FoldedFile {
filePath: string;
language: string;
symbols: CodeSymbol[];
imports: string[];
totalLines: number;
foldedTokenEstimate: number;
}
// --- Language detection ---
const LANG_MAP: Record<string, string> = {
".js": "javascript",
".mjs": "javascript",
".cjs": "javascript",
".jsx": "tsx",
".ts": "typescript",
".tsx": "tsx",
".py": "python",
".pyw": "python",
".go": "go",
".rs": "rust",
".rb": "ruby",
".java": "java",
".c": "c",
".h": "c",
".cpp": "cpp",
".cc": "cpp",
".cxx": "cpp",
".hpp": "cpp",
".hh": "cpp",
};
export function detectLanguage(filePath: string): string {
const dot = filePath.lastIndexOf(".");
if (dot === -1) return "unknown"; // no extension (e.g. Makefile)
return LANG_MAP[filePath.slice(dot)] || "unknown";
}
// --- Grammar path resolution ---
const GRAMMAR_PACKAGES: Record<string, string> = {
javascript: "tree-sitter-javascript",
typescript: "tree-sitter-typescript/typescript",
tsx: "tree-sitter-typescript/tsx",
python: "tree-sitter-python",
go: "tree-sitter-go",
rust: "tree-sitter-rust",
ruby: "tree-sitter-ruby",
java: "tree-sitter-java",
c: "tree-sitter-c",
cpp: "tree-sitter-cpp",
};
function resolveGrammarPath(language: string): string | null {
const pkg = GRAMMAR_PACKAGES[language];
if (!pkg) return null;
try {
const packageJsonPath = _require.resolve(pkg + "/package.json");
return dirname(packageJsonPath);
} catch {
return null;
}
}
// --- Query patterns (declarative symbol extraction) ---
const QUERIES: Record<string, string> = {
jsts: `
(function_declaration name: (identifier) @name) @func
(lexical_declaration (variable_declarator name: (identifier) @name value: [(arrow_function) (function_expression)])) @const_func
(class_declaration name: (type_identifier) @name) @cls
(method_definition name: (property_identifier) @name) @method
(interface_declaration name: (type_identifier) @name) @iface
(type_alias_declaration name: (type_identifier) @name) @tdef
(enum_declaration name: (identifier) @name) @enm
(import_statement) @imp
(export_statement) @exp
`,
python: `
(function_definition name: (identifier) @name) @func
(class_definition name: (identifier) @name) @cls
(import_statement) @imp
(import_from_statement) @imp
`,
go: `
(function_declaration name: (identifier) @name) @func
(method_declaration name: (field_identifier) @name) @method
(type_declaration (type_spec name: (type_identifier) @name)) @tdef
(import_declaration) @imp
`,
rust: `
(function_item name: (identifier) @name) @func
(struct_item name: (type_identifier) @name) @struct_def
(enum_item name: (type_identifier) @name) @enm
(trait_item name: (type_identifier) @name) @trait_def
(impl_item type: (type_identifier) @name) @impl_def
(use_declaration) @imp
`,
ruby: `
(method name: (identifier) @name) @func
(class name: (constant) @name) @cls
(module name: (constant) @name) @cls
(call method: (identifier) @name) @imp
`,
java: `
(method_declaration name: (identifier) @name) @method
(class_declaration name: (identifier) @name) @cls
(interface_declaration name: (identifier) @name) @iface
(enum_declaration name: (identifier) @name) @enm
(import_declaration) @imp
`,
generic: `
(function_declaration name: (identifier) @name) @func
(function_definition name: (identifier) @name) @func
(class_declaration name: (identifier) @name) @cls
(class_definition name: (identifier) @name) @cls
(import_statement) @imp
(import_declaration) @imp
`,
};
function getQueryKey(language: string): string {
switch (language) {
case "javascript":
case "typescript":
case "tsx":
return "jsts";
case "python": return "python";
case "go": return "go";
case "rust": return "rust";
case "ruby": return "ruby";
case "java": return "java";
default: return "generic";
}
}
// --- Temp file management ---
let queryTmpDir: string | null = null;
const queryFileCache = new Map<string, string>();
function getQueryFile(queryKey: string): string {
if (queryFileCache.has(queryKey)) return queryFileCache.get(queryKey)!;
if (!queryTmpDir) {
queryTmpDir = mkdtempSync(join(tmpdir(), "smart-read-queries-"));
}
const filePath = join(queryTmpDir, `${queryKey}.scm`);
writeFileSync(filePath, QUERIES[queryKey]);
queryFileCache.set(queryKey, filePath);
return filePath;
}
// --- CLI execution ---
let cachedBinPath: string | null = null;
function getTreeSitterBin(): string {
if (cachedBinPath) return cachedBinPath;
// Try direct binary from tree-sitter-cli package
try {
const pkgPath = _require.resolve("tree-sitter-cli/package.json");
const binPath = join(dirname(pkgPath), "tree-sitter");
if (existsSync(binPath)) {
cachedBinPath = binPath;
return binPath;
}
} catch { /* fall through */ }
// Fallback: assume it's on PATH
cachedBinPath = "tree-sitter";
return cachedBinPath;
}
interface RawCapture {
tag: string;
startRow: number;
startCol: number;
endRow: number;
endCol: number;
text?: string;
}
interface RawMatch {
pattern: number;
captures: RawCapture[];
}
function runQuery(queryFile: string, sourceFile: string, grammarPath: string): RawMatch[] {
const result = runBatchQuery(queryFile, [sourceFile], grammarPath);
return result.get(sourceFile) || [];
}
function runBatchQuery(queryFile: string, sourceFiles: string[], grammarPath: string): Map<string, RawMatch[]> {
if (sourceFiles.length === 0) return new Map();
const bin = getTreeSitterBin();
const execArgs = ["query", "-p", grammarPath, queryFile, ...sourceFiles];
let output: string;
try {
output = execFileSync(bin, execArgs, { encoding: "utf-8", timeout: 30000, stdio: ["pipe", "pipe", "pipe"] });
} catch {
return new Map();
}
return parseMultiFileQueryOutput(output);
}
function parseMultiFileQueryOutput(output: string): Map<string, RawMatch[]> {
const fileMatches = new Map<string, RawMatch[]>();
let currentFile: string | null = null;
let currentMatch: RawMatch | null = null;
for (const line of output.split("\n")) {
// File header: a line that doesn't start with whitespace and isn't empty
if (line.length > 0 && !line.startsWith(" ") && !line.startsWith("\t")) {
currentFile = line.trim();
if (!fileMatches.has(currentFile)) {
fileMatches.set(currentFile, []);
}
currentMatch = null;
continue;
}
if (!currentFile) continue;
const patternMatch = line.match(/^\s+pattern:\s+(\d+)/);
if (patternMatch) {
currentMatch = { pattern: parseInt(patternMatch[1]), captures: [] };
fileMatches.get(currentFile)!.push(currentMatch);
continue;
}
const captureMatch = line.match(
/^\s+capture:\s+(?:\d+\s*-\s*)?(\w+),\s*start:\s*\((\d+),\s*(\d+)\),\s*end:\s*\((\d+),\s*(\d+)\)(?:,\s*text:\s*`([^`]*)`)?/
);
if (captureMatch && currentMatch) {
currentMatch.captures.push({
tag: captureMatch[1],
startRow: parseInt(captureMatch[2]),
startCol: parseInt(captureMatch[3]),
endRow: parseInt(captureMatch[4]),
endCol: parseInt(captureMatch[5]),
text: captureMatch[6],
});
}
}
return fileMatches;
}
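// A minimal, self-contained sketch of the line format parseMultiFileQueryOutput
// expects. The sample string below is illustrative, inferred from the two
// regexes above — it is not captured tree-sitter CLI output.

```typescript
const sample = [
  "src/foo.ts", // hypothetical file header line (no leading whitespace)
  "  pattern: 0",
  "    capture: 0 - name, start: (3, 9), end: (3, 12), text: `foo`",
].join("\n");

interface MiniCapture { tag: string; startRow: number; text?: string }
interface MiniMatch { pattern: number; captures: MiniCapture[] }

const matches: MiniMatch[] = [];
let current: MiniMatch | null = null;
for (const line of sample.split("\n")) {
  // Same regexes as the parser above, trimmed to the fields this sketch keeps
  const p = line.match(/^\s+pattern:\s+(\d+)/);
  if (p) { current = { pattern: Number(p[1]), captures: [] }; matches.push(current); continue; }
  const c = line.match(/^\s+capture:\s+(?:\d+\s*-\s*)?(\w+),\s*start:\s*\((\d+),\s*(\d+)\),\s*end:\s*\((\d+),\s*(\d+)\)(?:,\s*text:\s*`([^`]*)`)?/);
  if (c && current) current.captures.push({ tag: c[1], startRow: Number(c[2]), text: c[6] });
}
console.log(matches[0].captures[0].tag, matches[0].captures[0].text); // → name foo
```

// The file-header line matches neither regex, so the inner loop naturally
// skips it; the real parser additionally keys matches by that header.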
// --- Symbol building ---
const KIND_MAP: Record<string, CodeSymbol["kind"]> = {
func: "function",
const_func: "function",
cls: "class",
method: "method",
iface: "interface",
tdef: "type",
enm: "enum",
struct_def: "struct",
trait_def: "trait",
impl_def: "impl",
};
const CONTAINER_KINDS = new Set(["class", "struct", "impl", "trait"]);
function extractSignatureFromLines(lines: string[], startRow: number, endRow: number, maxLen: number = 200): string {
const firstLine = lines[startRow] || "";
let sig = firstLine;
if (!sig.trimEnd().endsWith("{") && !sig.trimEnd().endsWith(":")) {
const chunk = lines.slice(startRow, Math.min(startRow + 10, endRow + 1)).join("\n");
const braceIdx = chunk.indexOf("{");
if (braceIdx !== -1 && braceIdx < 500) {
sig = chunk.slice(0, braceIdx).replace(/\n/g, " ").replace(/\s+/g, " ").trim();
}
}
sig = sig.replace(/\s*[{:]\s*$/, "").trim();
if (sig.length > maxLen) sig = sig.slice(0, maxLen - 3) + "...";
return sig;
}
function findCommentAbove(lines: string[], startRow: number): string | undefined {
const commentLines: string[] = [];
let foundComment = false;
for (let i = startRow - 1; i >= 0; i--) {
const trimmed = lines[i].trim();
if (trimmed === "") {
if (foundComment) break;
continue;
}
if (trimmed.startsWith("/**") || trimmed.startsWith("*") || trimmed.startsWith("*/") ||
trimmed.startsWith("//") || trimmed.startsWith("///") || trimmed.startsWith("//!") ||
trimmed.startsWith("#") || trimmed.startsWith("@")) {
commentLines.unshift(lines[i]);
foundComment = true;
} else {
break;
}
}
return commentLines.length > 0 ? commentLines.join("\n").trim() : undefined;
}
function findPythonDocstringFromLines(lines: string[], startRow: number, endRow: number): string | undefined {
for (let i = startRow + 1; i <= Math.min(startRow + 3, endRow); i++) {
const trimmed = lines[i]?.trim();
if (!trimmed) continue;
if (trimmed.startsWith('"""') || trimmed.startsWith("'''")) return trimmed;
break;
}
return undefined;
}
function isExported(
name: string, startRow: number, endRow: number,
exportRanges: Array<{ startRow: number; endRow: number }>,
lines: string[], language: string
): boolean {
switch (language) {
case "javascript":
case "typescript":
case "tsx":
return exportRanges.some(r => startRow >= r.startRow && endRow <= r.endRow);
case "python":
return !name.startsWith("_");
case "go":
return name.length > 0 && name[0] === name[0].toUpperCase() && name[0] !== name[0].toLowerCase();
case "rust":
return lines[startRow]?.trimStart().startsWith("pub") ?? false;
default:
return true;
}
}
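// The Go branch above encodes Go's export rule (an identifier is exported iff
// its first character is an uppercase letter). A standalone restatement of
// just that check, for illustration:

```typescript
// First char must change under toLowerCase (i.e. be a letter) and equal its
// own uppercase form — this rejects "_" and digits, not just lowercase letters.
const goExported = (name: string): boolean =>
  name.length > 0 &&
  name[0] === name[0].toUpperCase() &&
  name[0] !== name[0].toLowerCase();

console.log(goExported("ParseFile"), goExported("parseFile"), goExported("_x")); // → true false false
```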
function buildSymbols(matches: RawMatch[], lines: string[], language: string): { symbols: CodeSymbol[]; imports: string[] } {
const symbols: CodeSymbol[] = [];
const imports: string[] = [];
const exportRanges: Array<{ startRow: number; endRow: number }> = [];
const containers: Array<{ sym: CodeSymbol; startRow: number; endRow: number }> = [];
// Collect exports and imports
for (const match of matches) {
for (const cap of match.captures) {
if (cap.tag === "exp") {
exportRanges.push({ startRow: cap.startRow, endRow: cap.endRow });
}
if (cap.tag === "imp") {
imports.push(cap.text || lines[cap.startRow]?.trim() || "");
}
}
}
// Build symbols
for (const match of matches) {
const kindCapture = match.captures.find(c => KIND_MAP[c.tag]);
const nameCapture = match.captures.find(c => c.tag === "name");
if (!kindCapture) continue;
const name = nameCapture?.text || "anonymous";
const startRow = kindCapture.startRow;
const endRow = kindCapture.endRow;
const kind = KIND_MAP[kindCapture.tag];
const comment = findCommentAbove(lines, startRow);
const docstring = language === "python" ? findPythonDocstringFromLines(lines, startRow, endRow) : undefined;
const sym: CodeSymbol = {
name,
kind,
signature: extractSignatureFromLines(lines, startRow, endRow),
jsdoc: comment || docstring,
lineStart: startRow,
lineEnd: endRow,
exported: isExported(name, startRow, endRow, exportRanges, lines, language),
};
if (CONTAINER_KINDS.has(kind)) {
sym.children = [];
containers.push({ sym, startRow, endRow });
}
symbols.push(sym);
}
// Nest methods inside containers
const nested = new Set<CodeSymbol>();
for (const sym of symbols) {
  // Attach to the innermost enclosing container only, so a method inside a
  // nested class is not duplicated into every surrounding container
  let innermost: (typeof containers)[number] | null = null;
  for (const container of containers) {
    if (sym === container.sym) continue;
    if (sym.lineStart > container.startRow && sym.lineEnd <= container.endRow) {
      if (!innermost || container.startRow > innermost.startRow) innermost = container;
    }
  }
  if (innermost) {
    if (sym.kind === "function") sym.kind = "method";
    innermost.sym.children!.push(sym);
    nested.add(sym);
  }
}
return { symbols: symbols.filter(s => !nested.has(s)), imports };
}
// --- Main parse functions ---
export function parseFile(content: string, filePath: string): FoldedFile {
const language = detectLanguage(filePath);
const lines = content.split("\n");
const grammarPath = resolveGrammarPath(language);
if (!grammarPath) {
return {
filePath, language, symbols: [], imports: [],
totalLines: lines.length, foldedTokenEstimate: 50,
};
}
const queryKey = getQueryKey(language);
const queryFile = getQueryFile(queryKey);
// Write content to temp file with correct extension for language detection
// lastIndexOf returns -1 for extensionless paths, and slice(-1) would silently
// return the last character — guard explicitly so the ".txt" fallback works
const dotIdx = filePath.lastIndexOf(".");
const ext = dotIdx !== -1 ? filePath.slice(dotIdx) : ".txt";
const tmpDir = mkdtempSync(join(tmpdir(), "smart-src-"));
const tmpFile = join(tmpDir, `source${ext}`);
writeFileSync(tmpFile, content);
try {
const matches = runQuery(queryFile, tmpFile, grammarPath);
const result = buildSymbols(matches, lines, language);
const folded = formatFoldedView({
filePath, language,
symbols: result.symbols, imports: result.imports,
totalLines: lines.length, foldedTokenEstimate: 0,
});
return {
filePath, language,
symbols: result.symbols, imports: result.imports,
totalLines: lines.length,
foldedTokenEstimate: Math.ceil(folded.length / 4),
};
} finally {
rmSync(tmpDir, { recursive: true, force: true });
}
}
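// foldedTokenEstimate above uses the rough heuristic of ~4 characters per
// token, applied to the rendered folded view rather than the raw source:

```typescript
// Same arithmetic as parseFile: ceil(chars / 4), floor of 0 for empty views.
const estimateTokens = (folded: string): number => Math.ceil(folded.length / 4);

console.log(estimateTokens("a".repeat(10)), estimateTokens("")); // → 3 0
```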
/**
* Batch parse multiple on-disk files. Groups by language for one CLI call per language.
* Much faster than calling parseFile() per file (one process spawn per language vs per file).
*/
export function parseFilesBatch(
files: Array<{ absolutePath: string; relativePath: string; content: string }>
): Map<string, FoldedFile> {
const results = new Map<string, FoldedFile>();
// Group files by language (and thus by query + grammar)
const languageGroups = new Map<string, typeof files>();
for (const file of files) {
const language = detectLanguage(file.relativePath);
if (!languageGroups.has(language)) languageGroups.set(language, []);
languageGroups.get(language)!.push(file);
}
for (const [language, groupFiles] of languageGroups) {
const grammarPath = resolveGrammarPath(language);
if (!grammarPath) {
// No grammar — return empty results for these files
for (const file of groupFiles) {
const lines = file.content.split("\n");
results.set(file.relativePath, {
filePath: file.relativePath, language, symbols: [], imports: [],
totalLines: lines.length, foldedTokenEstimate: 50,
});
}
continue;
}
const queryKey = getQueryKey(language);
const queryFile = getQueryFile(queryKey);
// Run one batch query for all files of this language
const absolutePaths = groupFiles.map(f => f.absolutePath);
const batchResults = runBatchQuery(queryFile, absolutePaths, grammarPath);
// Build FoldedFile for each file using the batch results
for (const file of groupFiles) {
const lines = file.content.split("\n");
const matches = batchResults.get(file.absolutePath) || [];
const symbolResult = buildSymbols(matches, lines, language);
const folded = formatFoldedView({
filePath: file.relativePath, language,
symbols: symbolResult.symbols, imports: symbolResult.imports,
totalLines: lines.length, foldedTokenEstimate: 0,
});
results.set(file.relativePath, {
filePath: file.relativePath, language,
symbols: symbolResult.symbols, imports: symbolResult.imports,
totalLines: lines.length,
foldedTokenEstimate: Math.ceil(folded.length / 4),
});
}
}
return results;
}
// --- Formatting ---
export function formatFoldedView(file: FoldedFile): string {
const parts: string[] = [];
parts.push(`📁 ${file.filePath} (${file.language}, ${file.totalLines} lines)`);
parts.push("");
if (file.imports.length > 0) {
parts.push(` 📦 Imports: ${file.imports.length} statements`);
for (const imp of file.imports.slice(0, 10)) {
parts.push(` ${imp}`);
}
if (file.imports.length > 10) {
parts.push(` ... +${file.imports.length - 10} more`);
}
parts.push("");
}
for (const sym of file.symbols) {
parts.push(formatSymbol(sym, " "));
}
return parts.join("\n");
}
function formatSymbol(sym: CodeSymbol, indent: string): string {
const parts: string[] = [];
const icon = getSymbolIcon(sym.kind);
const exportTag = sym.exported ? " [exported]" : "";
const lineRange = sym.lineStart === sym.lineEnd
? `L${sym.lineStart + 1}`
: `L${sym.lineStart + 1}-${sym.lineEnd + 1}`;
parts.push(`${indent}${icon} ${sym.name}${exportTag} (${lineRange})`);
parts.push(`${indent} ${sym.signature}`);
if (sym.jsdoc) {
const jsdocLines = sym.jsdoc.split("\n");
const firstLine = jsdocLines.find(l => {
const t = l.replace(/^[\s*/]+/, "").replace(/^['"`]{3}/, "").trim();
return t.length > 0 && !t.startsWith("/**");
});
if (firstLine) {
const cleaned = firstLine.replace(/^[\s*/]+/, "").replace(/^['"`]{3}/, "").replace(/['"`]{3}$/, "").trim();
if (cleaned) {
parts.push(`${indent} 💬 ${cleaned}`);
}
}
}
if (sym.children && sym.children.length > 0) {
for (const child of sym.children) {
parts.push(formatSymbol(child, indent + " "));
}
}
return parts.join("\n");
}
function getSymbolIcon(kind: CodeSymbol["kind"]): string {
const icons: Record<string, string> = {
function: "ƒ", method: "ƒ", class: "◆", interface: "◇",
type: "◇", const: "●", variable: "○", export: "→",
struct: "◆", enum: "▣", trait: "◇", impl: "◈",
property: "○", getter: "⇢", setter: "⇠",
};
return icons[kind] || "·";
}
// --- Unfold ---
export function unfoldSymbol(content: string, filePath: string, symbolName: string): string | null {
const file = parseFile(content, filePath);
const findSymbol = (symbols: CodeSymbol[]): CodeSymbol | null => {
for (const sym of symbols) {
if (sym.name === symbolName) return sym;
if (sym.children) {
const found = findSymbol(sym.children);
if (found) return found;
}
}
return null;
};
const symbol = findSymbol(file.symbols);
if (!symbol) return null;
const lines = content.split("\n");
// Include preceding comments/decorators
let start = symbol.lineStart;
for (let i = symbol.lineStart - 1; i >= 0; i--) {
const trimmed = lines[i].trim();
if (trimmed === "" || trimmed.startsWith("*") || trimmed.startsWith("/**") ||
trimmed.startsWith("///") || trimmed.startsWith("//") ||
trimmed.startsWith("#") || trimmed.startsWith("@") ||
trimmed === "*/") {
start = i;
} else {
break;
}
}
const extracted = lines.slice(start, symbol.lineEnd + 1).join("\n");
return `// 📍 ${filePath} L${start + 1}-${symbol.lineEnd + 1}\n${extracted}`;
}

/**
* Search module — finds code files and symbols matching a query.
*
* Two search modes:
* 1. Grep-style: find files/lines containing the query string
* 2. Structural: parse files and match against symbol names/signatures
*
* Both return folded views, not raw content.
*
* Uses batch parsing (one CLI call per language) for fast multi-file search.
*/
import { readFile, readdir, stat } from "node:fs/promises";
import { join, relative } from "node:path";
import { parseFilesBatch, formatFoldedView, type FoldedFile } from "./parser.js";
const CODE_EXTENSIONS = new Set([
".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs",
".py", ".pyw",
".go",
".rs",
".rb",
".java",
".cs",
".cpp", ".c", ".h", ".hpp",
".swift",
".kt",
".php",
".vue", ".svelte",
]);
const IGNORE_DIRS = new Set([
"node_modules", ".git", "dist", "build", ".next", "__pycache__",
".venv", "venv", "env", ".env", "target", "vendor",
".cache", ".turbo", "coverage", ".nyc_output",
".claude", ".smart-file-read",
]);
const MAX_FILE_SIZE = 512 * 1024; // 512KB — skip huge files
export interface SearchResult {
foldedFiles: FoldedFile[];
matchingSymbols: SymbolMatch[];
totalFilesScanned: number;
totalSymbolsFound: number;
tokenEstimate: number;
}
export interface SymbolMatch {
filePath: string;
symbolName: string;
kind: string;
signature: string;
jsdoc?: string;
lineStart: number;
lineEnd: number;
matchReason: string; // why this matched
}
/**
* Walk a directory recursively, yielding file paths.
*/
async function* walkDir(dir: string, rootDir: string, maxDepth: number = 20): AsyncGenerator<string> {
if (maxDepth <= 0) return;
let entries;
try {
entries = await readdir(dir, { withFileTypes: true });
} catch {
return; // permission denied, etc.
}
for (const entry of entries) {
if (entry.name.startsWith(".")) continue; // skip hidden files/dirs (readdir never yields ".")
if (IGNORE_DIRS.has(entry.name)) continue;
const fullPath = join(dir, entry.name);
if (entry.isDirectory()) {
yield* walkDir(fullPath, rootDir, maxDepth - 1);
} else if (entry.isFile()) {
// lastIndexOf returns -1 for extensionless files; guard so we don't treat
// the file's last character as its extension
const dotIdx = entry.name.lastIndexOf(".");
if (dotIdx !== -1 && CODE_EXTENSIONS.has(entry.name.slice(dotIdx))) {
  yield fullPath;
}
}
}
}
/**
* Read a file safely, skipping if too large or binary.
*/
async function safeReadFile(filePath: string): Promise<string | null> {
try {
const stats = await stat(filePath);
if (stats.size > MAX_FILE_SIZE) return null;
if (stats.size === 0) return null;
const content = await readFile(filePath, "utf-8");
// Quick binary check — if first 1000 chars have null bytes, skip
if (content.slice(0, 1000).includes("\0")) return null;
return content;
} catch {
return null;
}
}
/**
* Search a codebase for symbols matching a query.
*
* Phase 1: Collect files and read content
* Phase 2: Batch parse all files (one CLI call per language)
* Phase 3: Match query against parsed symbols
*/
export async function searchCodebase(
rootDir: string,
query: string,
options: {
maxResults?: number;
includeImports?: boolean;
filePattern?: string;
} = {}
): Promise<SearchResult> {
const maxResults = options.maxResults || 20;
const queryLower = query.toLowerCase();
const queryParts = queryLower.split(/[\s_\-./]+/).filter(p => p.length > 0);
// Phase 1: Collect files
const filesToParse: Array<{ absolutePath: string; relativePath: string; content: string }> = [];
for await (const filePath of walkDir(rootDir, rootDir)) {
if (options.filePattern) {
const relPath = relative(rootDir, filePath);
if (!relPath.toLowerCase().includes(options.filePattern.toLowerCase())) continue;
}
const content = await safeReadFile(filePath);
if (!content) continue;
filesToParse.push({
absolutePath: filePath,
relativePath: relative(rootDir, filePath),
content,
});
}
// Phase 2: Batch parse (one CLI call per language)
const parsedFiles = parseFilesBatch(filesToParse);
// Phase 3: Match query against symbols
const foldedFiles: FoldedFile[] = [];
const matchingSymbols: SymbolMatch[] = [];
let totalSymbolsFound = 0;
for (const [relPath, parsed] of parsedFiles) {
totalSymbolsFound += countSymbols(parsed);
const pathMatch = matchScore(relPath.toLowerCase(), queryParts);
let fileHasMatch = pathMatch > 0;
const fileSymbolMatches: SymbolMatch[] = [];
const checkSymbols = (symbols: typeof parsed.symbols, parent?: string) => {
for (const sym of symbols) {
let score = 0;
let reason = "";
const nameScore = matchScore(sym.name.toLowerCase(), queryParts);
if (nameScore > 0) {
score += nameScore * 3;
reason = "name match";
}
if (sym.signature.toLowerCase().includes(queryLower)) {
score += 2;
reason = reason ? `${reason} + signature` : "signature match";
}
if (sym.jsdoc && sym.jsdoc.toLowerCase().includes(queryLower)) {
score += 1;
reason = reason ? `${reason} + jsdoc` : "jsdoc match";
}
if (score > 0) {
fileHasMatch = true;
fileSymbolMatches.push({
filePath: relPath,
symbolName: parent ? `${parent}.${sym.name}` : sym.name,
kind: sym.kind,
signature: sym.signature,
jsdoc: sym.jsdoc,
lineStart: sym.lineStart,
lineEnd: sym.lineEnd,
matchReason: reason,
});
}
if (sym.children) {
checkSymbols(sym.children, sym.name);
}
}
};
checkSymbols(parsed.symbols);
if (fileHasMatch) {
foldedFiles.push(parsed);
matchingSymbols.push(...fileSymbolMatches);
}
}
// Sort by relevance and trim
matchingSymbols.sort((a, b) => {
const aScore = matchScore(a.symbolName.toLowerCase(), queryParts);
const bScore = matchScore(b.symbolName.toLowerCase(), queryParts);
return bScore - aScore;
});
const trimmedSymbols = matchingSymbols.slice(0, maxResults);
const relevantFiles = new Set(trimmedSymbols.map(s => s.filePath));
const trimmedFiles = foldedFiles.filter(f => relevantFiles.has(f.filePath)).slice(0, maxResults);
const tokenEstimate = trimmedFiles.reduce((sum, f) => sum + f.foldedTokenEstimate, 0);
return {
foldedFiles: trimmedFiles,
matchingSymbols: trimmedSymbols,
totalFilesScanned: filesToParse.length,
totalSymbolsFound,
tokenEstimate,
};
}
/**
* Score how well query parts match a string.
* Returns 0 for no match, higher for better matches.
*/
function matchScore(text: string, queryParts: string[]): number {
let score = 0;
for (const part of queryParts) {
if (text === part) {
score += 10; // exact match
} else if (text.includes(part)) {
score += 5; // substring match
} else {
// Fuzzy: check if all chars appear in order
let ti = 0;
let matched = 0;
for (const ch of part) {
const idx = text.indexOf(ch, ti);
if (idx !== -1) {
matched++;
ti = idx + 1;
}
}
if (matched === part.length) {
score += 1; // loose fuzzy match
}
}
}
return score;
}
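// The scoring rules above (exact part = 10, substring = 5, in-order character
// "fuzzy" match = 1, summed over query parts) in a runnable standalone form:

```typescript
function scoreText(text: string, queryParts: string[]): number {
  let score = 0;
  for (const part of queryParts) {
    if (text === part) {
      score += 10; // exact match
    } else if (text.includes(part)) {
      score += 5; // substring match
    } else {
      // fuzzy: every character of the part appears in order in the text
      let ti = 0;
      let matched = 0;
      for (const ch of part) {
        const idx = text.indexOf(ch, ti);
        if (idx !== -1) { matched++; ti = idx + 1; }
      }
      if (matched === part.length) score += 1;
    }
  }
  return score;
}

console.log(
  scoreText("parse", ["parse"]),     // → 10 (exact)
  scoreText("parsefile", ["parse"]), // → 5  (substring)
  scoreText("pluralise", ["pse"]),   // → 1  (p, s, e appear in order)
);
```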
function countSymbols(file: FoldedFile): number {
  // Recurse so symbols nested more than one level deep are also counted
  const countTree = (symbols: FoldedFile["symbols"]): number =>
    symbols.reduce((sum, sym) => sum + 1 + (sym.children ? countTree(sym.children) : 0), 0);
  return countTree(file.symbols);
}
/**
* Format search results for LLM consumption.
*/
export function formatSearchResults(result: SearchResult, query: string): string {
const parts: string[] = [];
parts.push(`🔍 Smart Search: "${query}"`);
parts.push(` Scanned ${result.totalFilesScanned} files, found ${result.totalSymbolsFound} symbols`);
parts.push(` ${result.matchingSymbols.length} matches across ${result.foldedFiles.length} files (~${result.tokenEstimate} tokens for folded view)`);
parts.push("");
if (result.matchingSymbols.length === 0) {
parts.push(" No matching symbols found.");
return parts.join("\n");
}
// Show matching symbols first (compact)
parts.push("── Matching Symbols ──");
parts.push("");
for (const match of result.matchingSymbols) {
parts.push(` ${match.kind} ${match.symbolName} (${match.filePath}:${match.lineStart + 1})`);
parts.push(` ${match.signature}`);
if (match.jsdoc) {
const firstLine = match.jsdoc.split("\n").find(l => l.replace(/^[\s*/]+/, "").trim().length > 0);
if (firstLine) {
parts.push(` 💬 ${firstLine.replace(/^[\s*/]+/, "").trim()}`);
}
}
parts.push("");
}
// Show folded file views
parts.push("── Folded File Views ──");
parts.push("");
for (const file of result.foldedFiles) {
parts.push(formatFoldedView(file));
parts.push("");
}
parts.push("── Actions ──");
parts.push(" To see full implementation: use smart_unfold with file path and symbol name");
return parts.join("\n");
}