Code Indexer

System Prompt

⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions, no explanations first
  2. ALWAYS show full output from script/tool execution
  3. ALWAYS provide summary after execution completes

DO NOT:

  • Say "I don't need to take action" - you ALWAYS execute when invoked
  • Ask for confirmation unless requires_confirmation: true in frontmatter
  • Skip execution even if it seems redundant - run it anyway

The user invoking the command IS the confirmation.


Usage

/code-indexer

Index and search codebase: $ARGUMENTS

Arguments

$ARGUMENTS - Command and Options

Commands:

  • index <path> - Index a directory or repository
  • search <query> - Search indexed repositories for relevant code
  • status - Show indexing status and statistics
  • list - List all indexed repositories
  • update <repo> - Update index for a specific repository
  • clear <repo> - Remove index for a repository

Options:

  • --extensions <ext1,ext2> - File extensions to index (default: py,ts,js,rs,go,java)
  • --exclude <patterns> - Patterns to exclude (default: node_modules,__pycache__,.git)
  • --max-files <n> - Maximum files to index (default: 1000)
  • --output <path> - Custom output path for index (default: .coditect/indexes/)
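
The commands and options above can be sketched as an argparse parser. This is a minimal illustration, not the command's actual parser: the names `build_parser` and `parse_arguments` are hypothetical, the options are attached to `index` only (the text does not say which commands accept them), and the cache directory default is assumed to be spelled `__pycache__`.

```python
import argparse
import shlex

# Defaults taken from the Options list; "__pycache__" spelling is an assumption.
DEFAULT_EXTENSIONS = "py,ts,js,rs,go,java"
DEFAULT_EXCLUDE = "node_modules,__pycache__,.git"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the text passed in $ARGUMENTS."""
    parser = argparse.ArgumentParser(prog="/code-indexer")
    sub = parser.add_subparsers(dest="command", required=True)

    # Assumption: the options apply to the index command only.
    index = sub.add_parser("index")
    index.add_argument("path")
    index.add_argument("--extensions", default=DEFAULT_EXTENSIONS)
    index.add_argument("--exclude", default=DEFAULT_EXCLUDE)
    index.add_argument("--max-files", type=int, default=1000)
    index.add_argument("--output", default=".coditect/indexes/")

    sub.add_parser("search").add_argument("query")
    sub.add_parser("status")
    sub.add_parser("list")
    sub.add_parser("update").add_argument("repo")
    sub.add_parser("clear").add_argument("repo")
    return parser


def parse_arguments(arguments: str) -> argparse.Namespace:
    """Split the raw argument string shell-style, then parse it."""
    args = build_parser().parse_args(shlex.split(arguments))
    if args.command == "index":
        # Turn the comma-separated option values into lists.
        args.extensions = args.extensions.split(",")
        args.exclude = args.exclude.split(",")
    return args
```

Using `shlex.split` keeps a quoted search query together as one argument, so `search "error handling retry logic"` yields a single `query` string.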

Examples

# Index current directory
/code-indexer index .

# Index specific repository
/code-indexer index ~/projects/my-app --extensions py,rs

# Search for authentication patterns
/code-indexer search "user authentication JWT token"

# Search for error handling
/code-indexer search "error handling retry logic"

# Check status
/code-indexer status

# Update existing index
/code-indexer update my-app

Purpose

Create semantic indexes of codebases for:

  1. Code generation - Find similar implementations as references
  2. Pattern matching - Locate existing patterns to follow
  3. Knowledge retrieval - RAG-style code search for agents
  4. Cross-repository learning - Learn from indexed examples

Indexing Process

Phase 1: File Discovery

  1. Scan target directory for supported file types
  2. Apply exclusion patterns (node_modules, build, etc.)
  3. Create file manifest with metadata
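
As a rough sketch of Phase 1, the function below walks a directory, prunes excluded directories, filters by extension, and stops at a file limit. `discover_files` and the manifest fields (`size_bytes`, `modified`) are hypothetical, and exclusions are matched by exact directory name because the pattern syntax is not specified here.

```python
import os
from pathlib import Path


def discover_files(root, extensions, exclude, max_files=1000):
    """Return a manifest (list of dicts) for matching files under root.

    Assumption: an exclusion "pattern" is an exact directory name.
    """
    wanted = {"." + ext.lstrip(".") for ext in extensions}
    manifest = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix not in wanted:
                continue
            stat = path.stat()
            manifest.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,
            })
            if len(manifest) >= max_files:
                return manifest
    return manifest
```

Sorting directory and file names makes the manifest order deterministic, which matters when `max_files` truncates the scan.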

Phase 2: Content Analysis

For each file:

  1. Extract file summary - Purpose, exports, key functions
  2. Identify patterns - Design patterns, common idioms
  3. Generate embeddings - Semantic vectors for similarity search (optional; the current implementation is keyword-based)
  4. Extract keywords - Function names, class names, imports
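
For Python files, the export, import, and keyword extraction in Phase 2 could be done with the standard `ast` module. The sketch below is one possible approach, not the indexer's actual analysis: `analyze_python_source` is a hypothetical name, "exports" is assumed to mean public top-level functions and classes, and the keyword heuristic (splitting export names on underscores) is a simplification. Summaries and pattern detection are not covered.

```python
import ast


def analyze_python_source(source: str) -> dict:
    """Extract exports, imports, and keywords from Python source text."""
    tree = ast.parse(source)
    exports, imports = [], []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Assumption: names starting with "_" are private, not exports.
            if not node.name.startswith("_"):
                exports.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    # Heuristic keywords: lowercase pieces of export names, split on "_".
    keywords = sorted(
        {part.lower() for name in exports for part in name.split("_") if part}
    )
    return {
        "lines": len(source.splitlines()),
        "exports": exports,
        "imports": imports,
        "keywords": keywords,
    }
```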

Phase 3: Relationship Mapping

  1. Import/dependency analysis - What files depend on what
  2. Cross-file relationships - Shared types, utilities
  3. Pattern clusters - Group similar implementations
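
The import/dependency step of Phase 3 might look like the following sketch, which links each file to the indexed files it imports. `build_relationships` is a hypothetical helper, and the module resolution rule (a dotted module name matches a file path ending in the same segments) is a simplifying assumption for Python only. Pattern clustering is not shown.

```python
def build_relationships(files: list[dict]) -> list[dict]:
    """Link each file to the indexed files it imports.

    Each entry needs "path" and "imports". Assumption: module
    "models.user" resolves to any indexed path ending in "/models/user.py".
    """
    relationships = []
    for source in files:
        for module in source["imports"]:
            suffix = "/" + module.replace(".", "/") + ".py"
            for target in files:
                if target is source:
                    continue
                if ("/" + target["path"]).endswith(suffix):
                    relationships.append({
                        "source": source["path"],
                        "target": target["path"],
                        "type": "imports",
                    })
    return relationships
```

Third-party imports such as `jwt` simply find no matching path and produce no relationship.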

Phase 4: Index Persistence

  1. Save to JSON - Structured index data
  2. Create summary - Human-readable index overview
  3. Update registry - Track indexed repositories
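
A minimal sketch of Phase 4 follows. The three file names match the storage layout documented for `.coditect/indexes/`, but `save_index`, the summary wording, and the registry layout (repository name mapped to metadata) are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_index(index: dict, output_dir: str = ".coditect/indexes/") -> Path:
    """Write <repo>.json and <repo>.summary.md, then update registry.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    name = index["repo_name"]

    index_path = out / f"{name}.json"
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")

    summary = (
        f"# Index summary: {name}\n\n"
        f"- Files: {index['total_files']}\n"
        f"- Lines: {index['total_lines']}\n"
    )
    (out / f"{name}.summary.md").write_text(summary, encoding="utf-8")

    # Assumption: the registry maps repo name -> small metadata record.
    registry_path = out / "registry.json"
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    registry[name] = {
        "index_file": index_path.name,
        "repo_path": index.get("repo_path"),
        "updated": datetime.now(timezone.utc).isoformat(),
    }
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return index_path
```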

Index Structure

{
  "version": "1.0",
  "created": "2025-12-14T10:00:00Z",
  "repo_name": "my-project",
  "repo_path": "/path/to/my-project",
  "total_files": 150,
  "total_lines": 25000,
  "files": [
    {
      "path": "src/auth/handler.py",
      "language": "python",
      "lines": 200,
      "summary": "Authentication handler with JWT validation",
      "exports": ["AuthHandler", "validate_token", "create_session"],
      "imports": ["jwt", "datetime", "typing"],
      "patterns": ["singleton", "factory"],
      "keywords": ["auth", "jwt", "token", "session", "validate"]
    }
  ],
  "relationships": [
    {
      "source": "src/auth/handler.py",
      "target": "src/models/user.py",
      "type": "imports",
      "description": "Uses User model for authentication"
    }
  ],
  "pattern_clusters": {
    "authentication": ["src/auth/handler.py", "src/auth/middleware.py"],
    "data_models": ["src/models/user.py", "src/models/session.py"]
  }
}

Search Algorithm

Relevance Scoring

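from dataclasses import dataclass


@dataclass
class IndexedFile:
    # Hypothetical record type mirroring one entry of "files" in the index.
    path: str
    summary: str
    keywords: list[str]
    exports: list[str]


def score_term(query_term: str, file: IndexedFile) -> float:
    score = 0.0

    # 1. File name similarity (30%)
    if query_term in file.path.lower():
        score += 0.3

    # 2. Summary match (25%)
    if query_term in file.summary.lower():
        score += 0.25

    # 3. Keyword match (25%): 0.05 per matching keyword, capped at 0.25
    keyword_matches = sum(1 for kw in file.keywords if query_term in kw.lower())
    score += min(0.25, keyword_matches * 0.05)

    # 4. Export/function match (20%): 0.05 per matching export, capped at 0.2
    export_matches = sum(1 for exp in file.exports if query_term in exp.lower())
    score += min(0.2, export_matches * 0.05)

    return score


def calculate_relevance(query: str, file: IndexedFile) -> float:
    # Assumption: a multi-word query is scored as the mean of its per-term scores.
    terms = query.lower().split()
    return sum(score_term(t, file) for t in terms) / len(terms) if terms else 0.0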
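
To see how the four weights add up, here is a small self-contained sketch that restates the checks over plain dictionaries (the shape of a `files` entry in the index) and ranks files for a query. `score_term` and `rank_files` are illustrative names, and averaging over query terms is an assumption, since the way multiple terms combine is not specified.

```python
def score_term(term: str, entry: dict) -> float:
    """Apply the four weighted checks to one entry of the index "files" list."""
    score = 0.3 if term in entry["path"].lower() else 0.0
    score += 0.25 if term in entry["summary"].lower() else 0.0
    score += min(0.25, 0.05 * sum(term in kw.lower() for kw in entry["keywords"]))
    score += min(0.2, 0.05 * sum(term in ex.lower() for ex in entry["exports"]))
    return score


def rank_files(query: str, entries: list[dict], min_score: float = 0.0):
    """Return (score, path) pairs with score > min_score, best first.

    Assumption: a multi-word query is scored as the mean over its terms.
    """
    terms = query.lower().split()
    if not terms:
        return []
    ranked = []
    for entry in entries:
        score = sum(score_term(t, entry) for t in terms) / len(terms)
        if score > min_score:
            ranked.append((round(score, 3), entry["path"]))
    return sorted(ranked, reverse=True)


handler = {
    "path": "src/auth/handler.py",
    "summary": "Authentication handler with JWT validation",
    "keywords": ["auth", "jwt", "token", "session", "validate"],
    "exports": ["AuthHandler", "validate_token", "create_session"],
}
# "auth": 0.3 (path) + 0.25 (summary) + 0.05 (keyword) + 0.05 (export) = 0.65
# "jwt":  0.25 (summary) + 0.05 (keyword) = 0.30
print(rank_files("auth jwt", [handler]))  # mean of 0.65 and 0.30 is 0.475
```

Because keyword and export matches contribute only 0.05 each, a file needs a path or summary hit to score well under this scheme.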

Search Output

## Search Results for: "user authentication"

### Top Matches (Score > 0.7)

1. **src/auth/handler.py** (Score: 0.92)
- Summary: Authentication handler with JWT validation
- Key exports: `AuthHandler`, `validate_token`
- Pattern: singleton, factory
- Lines: 200

2. **src/middleware/auth_middleware.js** (Score: 0.85)
- Summary: Express middleware for route authentication
- Key exports: `requireAuth`, `optionalAuth`
- Pattern: middleware, decorator
- Lines: 75

### Related Files

- src/models/user.py (referenced by top matches)
- src/utils/crypto.py (utility for auth)
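
The layout above could be produced by a formatter along these lines. This is a sketch only: `format_results` is a hypothetical name, each result is assumed to be a file entry from the index with an added `score` key, and showing at most two exports per file mirrors the sample rather than a documented rule. The "Related Files" section is omitted.

```python
def format_results(query: str, results: list[dict], threshold: float = 0.7) -> str:
    """Render results scoring above threshold as Markdown, best first."""
    lines = [
        f'## Search Results for: "{query}"',
        "",
        f"### Top Matches (Score > {threshold})",
        "",
    ]
    top = sorted(
        (r for r in results if r["score"] > threshold),
        key=lambda r: r["score"],
        reverse=True,
    )
    for rank, r in enumerate(top, start=1):
        # Assumption: list only the first two exports, as in the sample output.
        exports = ", ".join(f"`{name}`" for name in r["exports"][:2])
        lines += [
            f"{rank}. **{r['path']}** (Score: {r['score']:.2f})",
            f"- Summary: {r['summary']}",
            f"- Key exports: {exports}",
            f"- Pattern: {', '.join(r['patterns'])}",
            f"- Lines: {r['lines']}",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"
```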

Integration with Agents

For Code Implementation Agents

# In agent system prompt or context
references = code_indexer.search("authentication middleware express")

context = f"""
Reference implementations found in indexed repositories:

{format_references(references)}

Use these as guidance for implementing the requested feature.
"""

For Pattern Discovery

# Find similar patterns before implementing
similar = code_indexer.search(f"similar to {target_file}")

if similar:
    print(f"Found {len(similar)} similar implementations to reference")

Execution Flow

When this command is invoked:

  1. Parse arguments - Determine the command (index, search, status, list, update, or clear)

  2. For index command:

    • Validate target path exists
    • Scan for files matching extensions
    • Create todo list tracking indexing progress
    • Process files in batches
    • Build relationships
    • Save index to .coditect/indexes/
  3. For search command:

    • Load relevant indexes
    • Parse search query into terms
    • Score all indexed files
    • Return ranked results with context
    • Display formatted output
  4. For status command:

    • List all indexes in .coditect/indexes/
    • Show statistics (files, lines, last updated)
    • Check index health
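
The status steps could be sketched as follows. `collect_status` is a hypothetical helper; it reads the `created` timestamp from each index as the closest documented field to "last updated", and it treats "index health" as "the file parses and contains the expected fields", which is an assumption.

```python
import json
from pathlib import Path


def collect_status(index_dir: str = ".coditect/indexes/") -> list[dict]:
    """Summarize every <repo>.json index found in index_dir."""
    rows = []
    for path in sorted(Path(index_dir).glob("*.json")):
        if path.name == "registry.json":
            continue  # the registry is not an index
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            rows.append({
                "repo": data["repo_name"],
                "files": data["total_files"],
                "lines": data["total_lines"],
                "created": data["created"],
                "healthy": True,
            })
        except (json.JSONDecodeError, KeyError, TypeError):
            # Assumption: unparseable or incomplete indexes count as unhealthy.
            rows.append({"repo": path.stem, "healthy": False})
    return rows
```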

Storage Location

Indexes are stored in: .coditect/indexes/

.coditect/indexes/
├── my-project.json # Full index data
├── my-project.summary.md # Human-readable summary
├── external-lib.json
├── external-lib.summary.md
└── registry.json # Index registry

Best Practices

DO

  • Index frequently used repositories - Reference code you often need
  • Re-index after major changes - Keep indexes current
  • Use specific search terms - "JWT auth middleware" not just "auth"
  • Combine with pattern search - Find similar implementations

DON'T

  • Don't index huge monorepos entirely - Use --max-files
  • Don't index generated code - Exclude build/, dist/
  • Don't rely solely on search - Verify results are relevant
  • Don't forget to update - Stale indexes give stale results

Success Metrics

Metric | Target | Measurement
Search relevance | >80% | Top 3 results are relevant
Index speed | <60s | For typical project (<500 files)
Storage efficiency | <10MB | Per indexed repository
Update performance | <30s | Incremental updates

Source Reference

This pattern was extracted from the DeepCode (HKUDS/DeepCode) multi-agent system.

Original files:

  • tools/code_indexer.py (1,800 lines)
  • tools/code_reference_indexer.py (500+ lines)

Original codebase stats:

  • 51 Python files analyzed
  • 33,497 lines of code
  • 12 patterns extracted

See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.

Implementation Notes

This command provides the interface. Full implementation requires:

  1. Embedding generation - Optional: use LLM for semantic embeddings
  2. AST parsing - For accurate export/import extraction
  3. Incremental updates - Only re-index changed files
  4. Caching layer - For fast repeated searches
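
As an illustration of the incremental update item, the sketch below detects changed and removed files by comparing content hashes against those recorded on a previous run. `file_digest` and `plan_update` are hypothetical names, and storing per-file digests is an assumption: the documented index structure has no such field.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_update(root: Path, paths: list[str], previous: dict[str, str]) -> dict:
    """Compare current files against digests stored by the previous run.

    `previous` maps relative path -> digest (an assumed extra field).
    Only the paths listed under "changed" need to be re-analyzed.
    """
    current = {p: file_digest(root / p) for p in paths}
    return {
        "changed": sorted(p for p, d in current.items() if previous.get(p) != d),
        "removed": sorted(p for p in previous if p not in current),
        "digests": current,
    }
```

Hashing contents rather than comparing modification times avoids re-indexing files that were touched but not edited, at the cost of reading every file once.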

Current implementation uses keyword-based search. Future versions may add vector similarity for semantic matching.

Required Tools

Tool | Purpose | Required
Bash | Execute indexer, file discovery | Yes
Read | Analyze file contents | Yes
Write | Save index JSON files | Yes
Glob | Find files matching extensions | Yes

Storage:

  • Index location: .coditect/indexes/
  • Registry: .coditect/indexes/registry.json

Output Validation

For index command:

  • Target path validated
  • Files discovered and filtered
  • Index JSON created
  • Summary markdown created
  • Registry updated

For search command:

  • Indexes loaded
  • Results ranked by relevance
  • Top matches displayed with context
  • Related files identified

For status command:

  • All indexes listed
  • Statistics shown (files, lines, last updated)

Success Output

When indexer operation completes:

✅ COMMAND COMPLETE: /code-indexer
Command: <index|search|status>
Repository: <repo-name>
Files: N indexed
Lines: X,XXX total
Output: .coditect/indexes/<repo>.json

Completion Checklist

Before marking complete:

  • Command parsed correctly
  • Target path validated
  • Operation completed
  • Index saved (for index command)
  • Results displayed (for search command)

Failure Indicators

This command has FAILED if:

  • ❌ Path not found
  • ❌ No files matched extensions
  • ❌ Index not saved
  • ❌ Search returned no results

When NOT to Use

Do NOT use when:

  • Searching a single file (use grep)
  • You need live code analysis (use /analyze)
  • The codebase is small (<10 files)

Anti-Patterns (Avoid)

Anti-Pattern | Problem | Solution
Index everything | Slow, bloated | Use --max-files
Stale indexes | Irrelevant results | Run update regularly
Vague search | Too many results | Use specific terms

Principles

This command embodies:

  • #2 Search Before Create - Find existing code
  • #9 Based on Facts - Relevance scoring
  • #6 Clear, Understandable - Structured output

Full Standard: CODITECT-STANDARD-AUTOMATION.md