Code Indexer
System Prompt
⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions, no explanations first
- ALWAYS show full output from script/tool execution
- ALWAYS provide summary after execution completes
DO NOT:
- Say "I don't need to take action" - you ALWAYS execute when invoked
- Ask for confirmation unless requires_confirmation: true is set in frontmatter
- Skip execution even if it seems redundant - run it anyway
The user invoking the command IS the confirmation.
Usage
/code-indexer
Index and search codebase: $ARGUMENTS
Arguments
$ARGUMENTS - Command and Options
Commands:
- index <path> - Index a directory or repository
- search <query> - Search indexed repositories for relevant code
- status - Show indexing status and statistics
- list - List all indexed repositories
- update <repo> - Update index for a specific repository
- clear <repo> - Remove index for a repository
Options:
- --extensions <ext1,ext2> - File extensions to index (default: py,ts,js,rs,go,java)
- --exclude <patterns> - Patterns to exclude (default: node_modules,__pycache__,.git)
- --max-files <n> - Maximum files to index (default: 1000)
- --output <path> - Custom output path for index (default: .coditect/indexes/)
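The commands and options above can be parsed with the standard library. The following is a minimal sketch, assuming Python's argparse and shlex; the helper names build_parser and parse_arguments are hypothetical, and attaching the options only to the index command is an assumption, since this document does not say which commands accept them.

```python
import argparse
import shlex

# Defaults copied from the option list; __pycache__ is assumed to be the
# intended spelling of the Python cache directory.
DEFAULT_EXTENSIONS = "py,ts,js,rs,go,java"
DEFAULT_EXCLUDE = "node_modules,__pycache__,.git"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented commands (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="/code-indexer")
    sub = parser.add_subparsers(dest="command", required=True)

    index = sub.add_parser("index")
    index.add_argument("path")
    index.add_argument("--extensions", default=DEFAULT_EXTENSIONS)
    index.add_argument("--exclude", default=DEFAULT_EXCLUDE)
    index.add_argument("--max-files", type=int, default=1000)
    index.add_argument("--output", default=".coditect/indexes/")

    search = sub.add_parser("search")
    search.add_argument("query")

    sub.add_parser("status")
    sub.add_parser("list")
    sub.add_parser("update").add_argument("repo")
    sub.add_parser("clear").add_argument("repo")
    return parser


def parse_arguments(arguments: str) -> argparse.Namespace:
    """Split the raw $ARGUMENTS string shell-style, then parse it."""
    args = build_parser().parse_args(shlex.split(arguments))
    if args.command == "index":
        # Turn comma-separated strings into lists for the later phases
        args.extensions = args.extensions.split(",")
        args.exclude = args.exclude.split(",")
    return args
```

For instance, parse_arguments('index . --extensions py,rs') yields a namespace whose extensions attribute is the list ["py", "rs"], while a quoted search query stays a single string.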
Examples
# Index current directory
/code-indexer index .
# Index specific repository
/code-indexer index ~/projects/my-app --extensions py,rs
# Search for authentication patterns
/code-indexer search "user authentication JWT token"
# Search for error handling
/code-indexer search "error handling retry logic"
# Check status
/code-indexer status
# Update existing index
/code-indexer update my-app
Purpose
Create semantic indexes of codebases for:
- Code generation - Find similar implementations as references
- Pattern matching - Locate existing patterns to follow
- Knowledge retrieval - RAG-style code search for agents
- Cross-repository learning - Learn from indexed examples
Indexing Process
Phase 1: File Discovery
- Scan target directory for supported file types
- Apply exclusion patterns (node_modules, build, etc.)
- Create file manifest with metadata
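The discovery phase can be sketched with pathlib alone. This is an illustration under stated assumptions, not the original implementation: discover_files is a hypothetical name, exclusion is done by exact directory or file name rather than by glob pattern, and the manifest fields (extension, size_bytes, modified) are guesses at what "metadata" might contain.

```python
from pathlib import Path


def discover_files(root, extensions, exclude, max_files=1000):
    """Return a manifest (list of dicts) for matching files under root."""
    wanted = {"." + ext.lstrip(".") for ext in extensions}
    excluded = set(exclude)
    root = Path(root)
    manifest = []
    for path in sorted(root.rglob("*")):
        if len(manifest) >= max_files:
            break
        rel = path.relative_to(root)
        # Skip anything whose relative path contains an excluded name
        if excluded.intersection(rel.parts):
            continue
        if not path.is_file() or path.suffix not in wanted:
            continue
        stat = path.stat()
        manifest.append({
            "path": rel.as_posix(),
            "extension": path.suffix.lstrip("."),
            "size_bytes": stat.st_size,
            "modified": stat.st_mtime,
        })
    return manifest
```

Note that rglob still walks into excluded directories before the filter discards them; a version meant for large repositories would likely prune them during traversal, for example with os.walk.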
Phase 2: Content Analysis
For each file:
- Extract file summary - Purpose, exports, key functions
- Identify patterns - Design patterns, common idioms
- Generate embeddings (optional) - Semantic vectors for similarity search; the current implementation relies on keywords only
- Extract keywords - Function names, class names, imports
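For Python files, the export, import, and keyword extraction steps can be approximated with the standard ast module. The sketch below is an assumption about how such analysis might look; analyze_python_source is a hypothetical name, it only inspects top-level statements, and it does not cover summaries, pattern detection, or embeddings. Other languages would need their own parsers.

```python
import ast


def analyze_python_source(source: str) -> dict:
    """Extract top-level exports, imports, and keywords from Python source."""
    tree = ast.parse(source)  # raises SyntaxError on invalid source
    exports, imports = [], []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Treat names without a leading underscore as public exports
            if not node.name.startswith("_"):
                exports.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    # Keywords: lower-cased fragments of export names, split on underscores
    keywords = sorted(
        {part.lower() for name in exports for part in name.split("_") if part}
    )
    return {"exports": exports, "imports": imports, "keywords": keywords}
```

The returned dict uses the same exports, imports, and keywords field names as the entries shown under Index Structure.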
Phase 3: Relationship Mapping
- Import/dependency analysis - What files depend on what
- Cross-file relationships - Shared types, utilities
- Pattern clusters - Group similar implementations
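Import-based relationships can be derived from the per-file import lists. The following is a rough sketch, assuming each file entry is a dict with path and imports keys as in Index Structure, and assuming imports are written as dotted module paths relative to the repository root; build_relationships is a hypothetical helper, and shared-type detection and pattern clustering are not covered.

```python
def build_relationships(files):
    """Link files whose imports resolve to another indexed Python file."""
    # Map dotted module names (e.g. "src.models.user") to indexed paths
    module_to_path = {}
    for entry in files:
        path = entry["path"]
        if path.endswith(".py"):
            module_to_path[path[:-3].replace("/", ".")] = path

    relationships = []
    for entry in files:
        for imported in entry.get("imports", []):
            target = module_to_path.get(imported)
            # Third-party imports (e.g. "jwt") have no indexed target
            if target and target != entry["path"]:
                relationships.append({
                    "source": entry["path"],
                    "target": target,
                    "type": "imports",
                })
    return relationships
```

Relative imports and package __init__ files are ignored here, so real repositories would need a more careful resolver.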
Phase 4: Index Persistence
- Save to JSON - Structured index data
- Create summary - Human-readable index overview
- Update registry - Track indexed repositories
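Persistence can be sketched with json and pathlib. This example assumes the index is already a plain dict with a repo_name key; save_index is a hypothetical name, and the registry entry fields (index_file, repo_path, total_files, updated) are assumptions because this document does not define the registry schema. Writing the human-readable summary is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_index(index: dict, output_dir=".coditect/indexes/") -> Path:
    """Write <repo_name>.json and record it in registry.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    index_path = out / f"{index['repo_name']}.json"
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")

    registry_path = out / "registry.json"
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    # Registry entry layout is assumed, not taken from the source
    registry[index["repo_name"]] = {
        "index_file": index_path.name,
        "repo_path": index.get("repo_path"),
        "total_files": index.get("total_files", 0),
        "updated": datetime.now(timezone.utc).isoformat(),
    }
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return index_path
```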
Index Structure
{
  "version": "1.0",
  "created": "2025-12-14T10:00:00Z",
  "repo_name": "my-project",
  "repo_path": "/path/to/my-project",
  "total_files": 150,
  "total_lines": 25000,
  "files": [
    {
      "path": "src/auth/handler.py",
      "language": "python",
      "lines": 200,
      "summary": "Authentication handler with JWT validation",
      "exports": ["AuthHandler", "validate_token", "create_session"],
      "imports": ["jwt", "datetime", "typing"],
      "patterns": ["singleton", "factory"],
      "keywords": ["auth", "jwt", "token", "session", "validate"]
    }
  ],
  "relationships": [
    {
      "source": "src/auth/handler.py",
      "target": "src/models/user.py",
      "type": "imports",
      "description": "Uses User model for authentication"
    }
  ],
  "pattern_clusters": {
    "authentication": ["src/auth/handler.py", "src/auth/middleware.py"],
    "data_models": ["src/models/user.py", "src/models/session.py"]
  }
}
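A loaded index can be sanity-checked against this structure before it is searched. The sketch below is one possible check, not part of the original tool: validate_index is a hypothetical name, and which keys count as required is an assumption based on the sample above. Only relationship sources are checked, because the abbreviated sample references targets that are not listed under files.

```python
REQUIRED_TOP_LEVEL = ("version", "created", "repo_name", "repo_path", "files")
REQUIRED_FILE_KEYS = (
    "path", "language", "lines", "summary", "exports", "imports", "keywords",
)


def validate_index(index: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in index
    ]
    known_paths = set()
    for i, entry in enumerate(index.get("files", [])):
        for key in REQUIRED_FILE_KEYS:
            if key not in entry:
                problems.append(f"files[{i}] missing key: {key}")
        if "path" in entry:
            known_paths.add(entry["path"])
    for rel in index.get("relationships", []):
        # Every relationship should start from a file that was indexed
        if rel.get("source") not in known_paths:
            problems.append(f"relationship source not indexed: {rel.get('source')}")
    return problems
```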
Search Algorithm
Relevance Scoring
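def calculate_relevance(query: str, file: "IndexedFile") -> float:
    """Score one indexed file against a query; the result lies in [0.0, 1.0]."""
    # IndexedFile is any object exposing path, summary, keywords, and exports.
    query_terms = query.lower().split()
    if not query_terms:
        return 0.0
    total = 0.0
    for query_term in query_terms:
        score = 0.0
        # 1. File path match (30%)
        if query_term in file.path.lower():
            score += 0.3
        # 2. Summary match (25%)
        if query_term in file.summary.lower():
            score += 0.25
        # 3. Keyword match (25%): 0.05 per matching keyword, capped at 0.25
        keyword_matches = sum(1 for kw in file.keywords if query_term in kw.lower())
        score += min(0.25, keyword_matches * 0.05)
        # 4. Export/function match (20%): 0.05 per matching export, capped at 0.2
        export_matches = sum(1 for exp in file.exports if query_term in exp.lower())
        score += min(0.2, export_matches * 0.05)
        total += score
    # Average over terms so multi-word queries stay on the same 0-1 scale
    return total / len(query_terms)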
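To make the weights concrete, the following worked example scores the single term "auth" against the sample entry from Index Structure. It is a standalone illustration that inlines the four components rather than calling the function above; the variable names are chosen for this example only.

```python
# Worked example: the single term "auth" scored against the sample index entry
entry = {
    "path": "src/auth/handler.py",
    "summary": "Authentication handler with JWT validation",
    "exports": ["AuthHandler", "validate_token", "create_session"],
    "keywords": ["auth", "jwt", "token", "session", "validate"],
}
term = "auth"

# Path and summary are all-or-nothing components
path_score = 0.3 if term in entry["path"].lower() else 0.0
summary_score = 0.25 if term in entry["summary"].lower() else 0.0
# Keywords and exports earn 0.05 per match, up to their caps
keyword_score = min(0.25, 0.05 * sum(term in kw for kw in entry["keywords"]))
export_score = min(0.2, 0.05 * sum(term in e.lower() for e in entry["exports"]))

breakdown = {
    "path": path_score,        # 0.3: "auth" appears in the path
    "summary": summary_score,  # 0.25: "auth" appears in "authentication"
    "keywords": keyword_score, # 0.05: one keyword matches
    "exports": export_score,   # 0.05: only AuthHandler matches
}
total = round(sum(breakdown.values()), 2)
print(breakdown, total)
```

The total here is 0.65. Because keyword and export hits add only 0.05 each, a file needs several of them on top of path and summary matches before it clears a threshold such as the 0.7 used for top matches in the search output.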
Search Output
## Search Results for: "user authentication"
### Top Matches (Score > 0.7)
1. **src/auth/handler.py** (Score: 0.92)
- Summary: Authentication handler with JWT validation
- Key exports: `AuthHandler`, `validate_token`
- Pattern: singleton, factory
- Lines: 200
2. **src/middleware/auth_middleware.py** (Score: 0.85)
- Summary: Express middleware for route authentication
- Key exports: `requireAuth`, `optionalAuth`
- Pattern: middleware, decorator
- Lines: 75
### Related Files
- src/models/user.py (referenced by top matches)
- src/utils/crypto.py (utility for auth)
Integration with Agents
For Code Implementation Agents
# In agent system prompt or context
references = code_indexer.search("authentication middleware express")
context = f"""
Reference implementations found in indexed repositories:
{format_references(references)}
Use these as guidance for implementing the requested feature.
"""
For Pattern Discovery
# Find similar patterns before implementing
similar = code_indexer.search(f"similar to {target_file}")
if similar:
    print(f"Found {len(similar)} similar implementations to reference")
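The format_references helper used in the agent-context snippet above is not defined in this document. A minimal sketch of what it might do follows, assuming each search hit is a dict with path, score, summary, and exports keys; both that result shape and the limit parameter are assumptions.

```python
def format_references(references, limit=3):
    """Render search hits as a short bullet list for inclusion in a prompt."""
    lines = []
    for ref in references[:limit]:
        exports = ", ".join(ref.get("exports", [])) or "n/a"
        lines.append(f"- {ref['path']} (score {ref['score']:.2f}): {ref['summary']}")
        lines.append(f"  exports: {exports}")
    return "\n".join(lines) if lines else "(no references found)"
```

Capping the number of references keeps the injected context small, which matters when the result is embedded in an agent prompt.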
Execution Flow
When this command is invoked:
- Parse arguments - Determine command (index/search/status)
- For index command:
  - Validate target path exists
  - Scan for files matching extensions
  - Create todo list tracking indexing progress
  - Process files in batches
  - Build relationships
  - Save index to .coditect/indexes/
- For search command:
  - Load relevant indexes
  - Parse search query into terms
  - Score all indexed files
  - Return ranked results with context
  - Display formatted output
- For status command:
  - List all indexes in .coditect/indexes/
  - Show statistics (files, lines, last updated)
  - Check index health
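The search branch of this flow (load indexes, split the query into terms, score, rank) can be sketched as follows. This is an illustration only: search_indexes is a hypothetical name, and simple_score is a deliberately crude placeholder that stands in for the weighted relevance scoring described under Search Algorithm.

```python
import json
from pathlib import Path


def simple_score(terms, entry):
    """Placeholder scorer: fraction of terms found in path, summary, or keywords."""
    haystack = " ".join(
        [entry.get("path", ""), entry.get("summary", ""), *entry.get("keywords", [])]
    ).lower()
    return sum(term in haystack for term in terms) / len(terms)


def search_indexes(query, index_dir=".coditect/indexes/", score_fn=simple_score, limit=10):
    """Load every index except the registry, score all files, return ranked hits."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for index_path in sorted(Path(index_dir).glob("*.json")):
        if index_path.name == "registry.json":
            continue
        index = json.loads(index_path.read_text(encoding="utf-8"))
        for entry in index.get("files", []):
            score = score_fn(terms, entry)
            if score > 0:
                hits.append({"repo": index.get("repo_name"), "score": score, **entry})
    # Highest score first; ties keep their load order
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]
```

Passing the scorer as a parameter keeps loading and ranking separate from the scoring policy, so a different relevance function can be swapped in without touching the rest.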
Storage Location
Indexes stored in: .coditect/indexes/
.coditect/indexes/
├── my-project.json # Full index data
├── my-project.summary.md # Human-readable summary
├── external-lib.json
├── external-lib.summary.md
└── registry.json # Index registry
Best Practices
DO
- Index frequently used repositories - Reference code you often need
- Re-index after major changes - Keep indexes current
- Use specific search terms - "JWT auth middleware" not just "auth"
- Combine with pattern search - Find similar implementations
DON'T
- Don't index huge monorepos entirely - Use --max-files
- Don't index generated code - Exclude build/, dist/
- Don't rely solely on search - Verify results are relevant
- Don't forget to update - Stale indexes give stale results
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Search relevance | >80% | Top 3 results are relevant |
| Index speed | <60s | For typical project (<500 files) |
| Storage efficiency | <10MB | Per indexed repository |
| Update performance | <30s | Incremental updates |
Source Reference
This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.
Original files:
- tools/code_indexer.py (1,800 lines)
- tools/code_reference_indexer.py (500+ lines)
Original codebase stats:
- 51 Python files analyzed
- 33,497 lines of code
- 12 patterns extracted
See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.
Implementation Notes
This command provides the interface. Full implementation requires:
- Embedding generation - Optional: use LLM for semantic embeddings
- AST parsing - For accurate export/import extraction
- Incremental updates - Only re-index changed files
- Caching layer - For fast repeated searches
Current implementation uses keyword-based search. Future versions may add vector similarity for semantic matching.
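Incremental updates need some way to tell which files changed since the last run. One common approach, sketched here as an assumption rather than as the planned design, is to store a content hash per file and compare on update. The names file_digest and plan_incremental_update are hypothetical, and the index structure shown in this document does not currently include a digest field.

```python
import hashlib
from pathlib import Path


def file_digest(path) -> str:
    """SHA-256 of the file contents, used as a change fingerprint."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_incremental_update(root, previous_digests, current_paths):
    """Compare stored digests with files on disk and report what to re-index."""
    root = Path(root)
    current = {rel: file_digest(root / rel) for rel in current_paths}
    # New or modified files: digest missing or different from the stored one
    changed = sorted(
        rel for rel, digest in current.items() if previous_digests.get(rel) != digest
    )
    # Files that were indexed before but no longer exist
    removed = sorted(rel for rel in previous_digests if rel not in current)
    return {"reindex": changed, "remove": removed, "digests": current}
```

On a first run with no stored digests, every file lands in reindex; afterwards only edited or new files do, and the returned digests can be saved for the next comparison.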
Required Tools
| Tool | Purpose | Required |
|---|---|---|
| Bash | Execute indexer, file discovery | Yes |
| Read | Analyze file contents | Yes |
| Write | Save index JSON files | Yes |
| Glob | Find files matching extensions | Yes |
Storage:
- Index location: .coditect/indexes/
- Registry: .coditect/indexes/registry.json
Output Validation
For index command:
- Target path validated
- Files discovered and filtered
- Index JSON created
- Summary markdown created
- Registry updated
For search command:
- Indexes loaded
- Results ranked by relevance
- Top matches displayed with context
- Related files identified
For status command:
- All indexes listed
- Statistics shown (files, lines, last updated)
Success Output
When indexer operation completes:
✅ COMMAND COMPLETE: /code-indexer
Command: <index|search|status>
Repository: <repo-name>
Files: N indexed
Lines: X,XXX total
Output: .coditect/indexes/<repo>.json
Completion Checklist
Before marking complete:
- Command parsed correctly
- Target path validated
- Operation completed
- Index saved (for index command)
- Results displayed (for search command)
Failure Indicators
This command has FAILED if:
- ❌ Path not found
- ❌ No files matched extensions
- ❌ Index not saved
- ❌ Search returned no results
When NOT to Use
Do NOT use when:
- Searching single file (use grep)
- Need live code analysis (use /analyze)
- Small codebase (<10 files)
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Index everything | Slow, bloated | Use --max-files |
| Stale indexes | Irrelevant results | Run update regularly |
| Vague search | Too many results | Use specific terms |
Principles
This command embodies:
- #2 Search Before Create - Find existing code
- #9 Based on Facts - Relevance scoring
- #6 Clear, Understandable - Structured output
Full Standard: CODITECT-STANDARD-AUTOMATION.md