Code Indexer

System Prompt

⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

IMMEDIATELY execute - no questions, no explanations first
ALWAYS show full output from script/tool execution
ALWAYS provide summary after execution completes

DO NOT:

Say "I don't need to take action" - you ALWAYS execute when invoked
Ask for confirmation unless `requires_confirmation: true` is set in the frontmatter
Skip execution even if it seems redundant - run it anyway

The user invoking the command IS the confirmation.

Usage

/code-indexer

Index and search codebase: $ARGUMENTS

Arguments

$ARGUMENTS - Command and Options

Commands:

index <path> - Index a directory or repository
search <query> - Search indexed repositories for relevant code
status - Show indexing status and statistics
list - List all indexed repositories
update <repo> - Update index for a specific repository
clear <repo> - Remove index for a repository

Options:

--extensions <ext1,ext2> - File extensions to index (default: py,ts,js,rs,go,java)
--exclude <patterns> - Patterns to exclude (default: `node_modules,__pycache__,.git`)
--max-files <n> - Maximum files to index (default: 1000)
--output <path> - Custom output path for index (default: .coditect/indexes/)
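A minimal sketch of how the raw `$ARGUMENTS` string could be turned into a command and options, using Python's standard `shlex` and `argparse` modules. The names `build_parser` and `parse_arguments` are hypothetical, and attaching the indexing options only to `index` is an assumption; the source does not say which commands accept which options.

```python
import argparse
import shlex

# Defaults from the Options list above.
DEFAULT_EXTENSIONS = "py,ts,js,rs,go,java"
DEFAULT_EXCLUDE = "node_modules,__pycache__,.git"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: one subcommand per documented command."""
    parser = argparse.ArgumentParser(prog="/code-indexer")
    sub = parser.add_subparsers(dest="command", required=True)

    index = sub.add_parser("index")
    index.add_argument("path")
    index.add_argument("--extensions", default=DEFAULT_EXTENSIONS)
    index.add_argument("--exclude", default=DEFAULT_EXCLUDE)
    index.add_argument("--max-files", type=int, default=1000)
    index.add_argument("--output", default=".coditect/indexes/")

    sub.add_parser("search").add_argument("query")
    sub.add_parser("status")
    sub.add_parser("list")
    for name in ("update", "clear"):
        sub.add_parser(name).add_argument("repo")
    return parser


def parse_arguments(arguments: str) -> argparse.Namespace:
    """Split $ARGUMENTS like a shell would (quotes group words), then parse it."""
    ns = build_parser().parse_args(shlex.split(arguments))
    if ns.command == "index":
        # Turn the comma-separated option values into clean lists.
        ns.extensions = [e.strip().lstrip(".") for e in ns.extensions.split(",") if e.strip()]
        ns.exclude = [p.strip() for p in ns.exclude.split(",") if p.strip()]
    return ns
```

Using `shlex.split` means a quoted query such as `"user authentication JWT token"` reaches `search` as a single argument.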

Examples

# Index current directory
/code-indexer index .

# Index specific repository
/code-indexer index ~/projects/my-app --extensions py,rs

# Search for authentication patterns
/code-indexer search "user authentication JWT token"

# Search for error handling
/code-indexer search "error handling retry logic"

# Check status
/code-indexer status

# Update existing index
/code-indexer update my-app

Purpose

Create semantic indexes of codebases for:

Code generation - Find similar implementations as references
Pattern matching - Locate existing patterns to follow
Knowledge retrieval - RAG-style code search for agents
Cross-repository learning - Learn from indexed examples

Indexing Process

Phase 1: File Discovery

Scan target directory for supported file types
Apply exclusion patterns (node_modules, build, etc.)
Create file manifest with metadata
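Phase 1 might look like the sketch below, assuming a plain `os.walk` traversal with `fnmatch`-style exclusion patterns matched against directory and file names. `discover_files` and the manifest fields (`path`, `size`, `modified`) are illustrative, not the tool's actual format.

```python
import fnmatch
import os
from pathlib import Path


def discover_files(root, extensions, exclude, max_files=1000):
    """Walk `root` and return a manifest of files to index (hypothetical helper)."""
    root = Path(root).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Path not found: {root}")
    wanted = {"." + ext.lstrip(".") for ext in extensions}
    manifest = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames if not any(fnmatch.fnmatch(d, pat) for pat in exclude)
        )
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix not in wanted:
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in exclude):
                continue
            stat = path.stat()
            manifest.append({
                "path": path.relative_to(root).as_posix(),
                "size": stat.st_size,
                "modified": stat.st_mtime,
            })
            # Honour --max-files by stopping as soon as the cap is reached.
            if len(manifest) >= max_files:
                return manifest
    return manifest
```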

Phase 2: Content Analysis

For each file:

Extract file summary - Purpose, exports, key functions
Identify patterns - Design patterns, common idioms
Generate embeddings - Semantic vectors for similarity search (optional; the current implementation uses keyword-based search)
Extract keywords - Function names, class names, imports

Phase 3: Relationship Mapping

Import/dependency analysis - What files depend on what
Cross-file relationships - Shared types, utilities
Pattern clusters - Group similar implementations
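A rough sketch of Phase 3, assuming each file record already carries full dotted module names in `imports` and a `keywords` list. Resolving imports through path-derived module names and clustering on shared keywords are simplifications chosen for illustration; `build_relationships` is a hypothetical name.

```python
from collections import defaultdict


def build_relationships(files):
    """Return (relationships, pattern_clusters) for a list of file records.

    Each record is a dict with "path", "imports" (dotted module names) and "keywords".
    """
    # Map module names to paths: "src/models/user.py" -> "src.models.user" and "user".
    by_module = {}
    for f in files:
        dotted = f["path"].rsplit(".", 1)[0].replace("/", ".")
        by_module[dotted] = f["path"]
        by_module.setdefault(dotted.rsplit(".", 1)[-1], f["path"])

    relationships = []
    for f in files:
        for module in f["imports"]:
            target = by_module.get(module)  # third-party modules simply don't resolve
            if target and target != f["path"]:
                relationships.append({"source": f["path"], "target": target, "type": "imports"})

    # Naive clustering: every keyword shared by two or more files becomes a cluster.
    clusters = defaultdict(list)
    for f in files:
        for kw in f["keywords"]:
            clusters[kw].append(f["path"])
    pattern_clusters = {kw: paths for kw, paths in clusters.items() if len(paths) > 1}
    return relationships, pattern_clusters
```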

Phase 4: Index Persistence

Save to JSON - Structured index data
Create summary - Human-readable index overview
Update registry - Track indexed repositories
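Phase 4 could be as simple as the sketch below. The summary layout and the `registry.json` fields (`path`, `index_file`, `updated`) are assumptions, since the source only names the files; `save_index` is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_index(index: dict, output_dir=".coditect/indexes") -> Path:
    """Write <repo>.json and <repo>.summary.md, then record the repo in registry.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    name = index["repo_name"]

    index_path = out / f"{name}.json"
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")

    summary = [
        f"# Index: {name}",
        "",
        f"- Files: {index['total_files']}",
        f"- Lines: {index['total_lines']:,}",  # thousands separator, e.g. 25,000
        f"- Created: {index['created']}",
    ]
    (out / f"{name}.summary.md").write_text("\n".join(summary) + "\n", encoding="utf-8")

    # Load the existing registry (if any) and upsert this repository's entry.
    registry_path = out / "registry.json"
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    registry[name] = {
        "path": index["repo_path"],
        "index_file": index_path.name,
        "updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return index_path
```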

Index Structure

{
  "version": "1.0",
  "created": "2025-12-14T10:00:00Z",
  "repo_name": "my-project",
  "repo_path": "/path/to/my-project",
  "total_files": 150,
  "total_lines": 25000,
  "files": [
    {
      "path": "src/auth/handler.py",
      "language": "python",
      "lines": 200,
      "summary": "Authentication handler with JWT validation",
      "exports": ["AuthHandler", "validate_token", "create_session"],
      "imports": ["jwt", "datetime", "typing"],
      "patterns": ["singleton", "factory"],
      "keywords": ["auth", "jwt", "token", "session", "validate"]
    }
  ],
  "relationships": [
    {
      "source": "src/auth/handler.py",
      "target": "src/models/user.py",
      "type": "imports",
      "description": "Uses User model for authentication"
    }
  ],
  "pattern_clusters": {
    "authentication": ["src/auth/handler.py", "src/auth/middleware.py"],
    "data_models": ["src/models/user.py", "src/models/session.py"]
  }
}

Search Algorithm

Relevance Scoring

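from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    path: str
    summary: str = ""
    keywords: list[str] = field(default_factory=list)
    exports: list[str] = field(default_factory=list)


def calculate_relevance(query: str, file: IndexedFile) -> float:
    terms = query.lower().split()
    if not terms:
        return 0.0
    score = 0.0

    for query_term in terms:
        # 1. File name similarity (30%)
        if query_term in file.path.lower():
            score += 0.3

        # 2. Summary match (25%)
        if query_term in file.summary.lower():
            score += 0.25

        # 3. Keyword match (25%, 0.05 per matching keyword)
        keyword_matches = sum(1 for kw in file.keywords if query_term in kw.lower())
        score += min(0.25, keyword_matches * 0.05)

        # 4. Export/function match (20%, 0.05 per matching export)
        export_matches = sum(1 for exp in file.exports if query_term in exp.lower())
        score += min(0.2, export_matches * 0.05)

    # Averaging over query terms (one way to combine them) keeps the score in [0, 1]
    return score / len(terms)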
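Ranking then reduces to scoring every file and sorting. The sketch below is generic over the scoring function, so `calculate_relevance` can be passed as `score_fn`; `rank_files`, `top_k` and `min_score` are hypothetical names, with `min_score` playing the role of the `Score > 0.7` cut-off used in the sample search output.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def rank_files(
    query: str,
    files: Sequence[T],
    score_fn: Callable[[str, T], float],
    top_k: int = 10,
    min_score: float = 0.0,
) -> list[tuple[float, T]]:
    """Score every indexed file, drop those at or below min_score, return the best top_k."""
    scored = [(score_fn(query, f), f) for f in files]
    kept = [(s, f) for s, f in scored if s > min_score]
    # Sorting on the score only; Python's sort stays stable with reverse=True,
    # so files with equal scores keep their original order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```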

Search Output

## Search Results for: "user authentication"

### Top Matches (Score > 0.7)

1. **src/auth/handler.py** (Score: 0.92)
   - Summary: Authentication handler with JWT validation
   - Key exports: `AuthHandler`, `validate_token`
   - Pattern: singleton, factory
   - Lines: 200

2. **src/middleware/auth_middleware.ts** (Score: 0.85)
   - Summary: Express middleware for route authentication
   - Key exports: `requireAuth`, `optionalAuth`
   - Pattern: middleware, decorator
   - Lines: 75

### Related Files

- src/models/user.py (referenced by top matches)
- src/utils/crypto.py (utility for auth)
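Output in this shape could be produced by a renderer like the following sketch. It assumes results arrive as `(score, file_dict)` pairs using the field names from the index structure, and it reads "referenced by top matches" as "imported by a top match" via the `relationships` list; free-text notes such as "utility for auth" are not reproduced. `render_results` is a hypothetical name.

```python
def render_results(query, matches, relationships, threshold=0.7):
    """Render ranked (score, file_dict) pairs as Markdown with a Related Files section."""
    top = [(score, entry) for score, entry in matches if score > threshold]
    lines = [f'## Search Results for: "{query}"', "", f"### Top Matches (Score > {threshold})", ""]
    for rank, (score, entry) in enumerate(top, start=1):
        exports = ", ".join(f"`{name}`" for name in entry["exports"][:2])
        lines += [
            f"{rank}. **{entry['path']}** (Score: {score:.2f})",
            f"   - Summary: {entry['summary']}",
            f"   - Key exports: {exports}",
            f"   - Pattern: {', '.join(entry['patterns'])}",
            f"   - Lines: {entry['lines']}",
            "",
        ]
    # Related files: import targets of top matches that are not top matches themselves.
    top_paths = {entry["path"] for _, entry in top}
    related = sorted({r["target"] for r in relationships
                      if r["source"] in top_paths and r["target"] not in top_paths})
    if related:
        lines += ["### Related Files", ""]
        lines += [f"- {path} (referenced by top matches)" for path in related]
    return "\n".join(lines)
```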

Integration with Agents

For Code Implementation Agents

# In the agent's system prompt or context (code_indexer and format_references are illustrative names)
references = code_indexer.search("authentication middleware express")

context = f"""
Reference implementations found in indexed repositories:

{format_references(references)}

Use these as guidance for implementing the requested feature.
"""

For Pattern Discovery

# Find similar patterns before implementing
similar = code_indexer.search(f"similar to {target_file}")

if similar:
    print(f"Found {len(similar)} similar implementations to reference")

Execution Flow

When this command is invoked:

Parse arguments - Determine the command (index, search, status, list, update or clear)
For index command:
- Validate target path exists
- Scan for files matching extensions
- Create a todo list to track indexing progress
- Process files in batches
- Build relationships
- Save index to .coditect/indexes/ (or the --output path)
For search command:
- Load relevant indexes
- Parse search query into terms
- Score all indexed files
- Return ranked results with context
- Display formatted output
For status command:
- List all indexes in .coditect/indexes/
- Show statistics (files, lines, last updated)
- Check index health
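The status step might be implemented roughly as follows. The health checks (readable JSON, repository path still present, `total_files` matching the length of `files`) are assumptions about what "index health" means, the last-updated time is taken from the index file's modification time, and `index_status` is a hypothetical name.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def index_status(index_dir=".coditect/indexes"):
    """Summarize every <repo>.json index: file/line counts, last update, basic health."""
    rows = []
    for path in sorted(Path(index_dir).glob("*.json")):
        if path.name == "registry.json":
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            rows.append({"repo": path.stem, "healthy": False, "problem": "unreadable JSON"})
            continue
        updated = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        repo_path = data.get("repo_path")
        problem = None
        if not repo_path or not Path(repo_path).is_dir():
            problem = "repository path missing"
        elif data.get("total_files") != len(data.get("files", [])):
            problem = "file count mismatch"
        rows.append({
            "repo": data.get("repo_name", path.stem),
            "files": data.get("total_files", 0),
            "lines": data.get("total_lines", 0),
            "updated": updated.isoformat(timespec="seconds"),
            "healthy": problem is None,
            "problem": problem,
        })
    return rows
```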

Storage Location

Indexes stored in: .coditect/indexes/

.coditect/indexes/
├── my-project.json          # Full index data
├── my-project.summary.md    # Human-readable summary
├── external-lib.json
├── external-lib.summary.md
└── registry.json            # Index registry

Best Practices

DO

Index frequently used repositories - Reference code you often need
Re-index after major changes - Keep indexes current
Use specific search terms - "JWT auth middleware" not just "auth"
Combine with pattern search - Find similar implementations

DON'T

Don't index huge monorepos entirely - Use --max-files
Don't index generated code - Exclude build/ and dist/ with --exclude (they are not in the default exclusion list)
Don't rely solely on search - Verify results are relevant
Don't forget to update - Stale indexes give stale results

Success Metrics

Metric	Target	Measurement
Search relevance	>80%	Top 3 results judged relevant to the query
Index speed	<60s	For a typical project (<500 files)
Storage efficiency	<10MB	Per indexed repository
Update performance	<30s	Incremental updates
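The table does not define how relevance is judged. Assuming a hand-labelled set of relevant files per test query, one standard way to turn "top 3 results are relevant" into a number is mean precision@3, sketched below; `mean_precision_at_k` and its inputs are hypothetical.

```python
def mean_precision_at_k(results_by_query, relevant_by_query, k=3):
    """Average, over queries, of the share of the top-k results that are relevant.

    results_by_query: query -> ranked list of file paths (best first)
    relevant_by_query: query -> set of paths a reviewer judged relevant
    """
    if not results_by_query:
        return 0.0
    total = 0.0
    for query, ranked in results_by_query.items():
        relevant = relevant_by_query.get(query, set())
        # Standard precision@k divides by k, so missing results count as misses.
        total += sum(1 for path in ranked[:k] if path in relevant) / k
    return total / len(results_by_query)
```

A value of at least 0.8 over a representative query set would meet the ">80%" target under this reading.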

Source Reference

This pattern was extracted from the DeepCode (HKUDS/DeepCode) multi-agent system.

Original files:

tools/code_indexer.py (1,800 lines)
tools/code_reference_indexer.py (500+ lines)

Original codebase stats:

51 Python files analyzed
33,497 lines of code
12 patterns extracted

See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.

Implementation Notes

This command provides the interface. Full implementation requires:

Embedding generation - Optional; use an LLM or embedding model to produce semantic vectors
AST parsing - For accurate export/import extraction
Incremental updates - Only re-index changed files
Caching layer - For fast repeated searches
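For incremental updates, one common approach (an assumption here, since the source does not say how changes are detected) is to store a content hash per file alongside the index and re-index only files whose hash changed. `plan_incremental_update` and the stored hash map are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_incremental_update(root, current_paths, previous_hashes):
    """Split files into (to_reindex, unchanged, deleted, new_hashes).

    current_paths: relative paths found by file discovery on this run
    previous_hashes: relative path -> digest stored when the index was last built
    """
    root = Path(root)
    new_hashes = {p: file_digest(root / p) for p in current_paths}
    to_reindex = sorted(p for p, d in new_hashes.items() if previous_hashes.get(p) != d)
    unchanged = sorted(p for p, d in new_hashes.items() if previous_hashes.get(p) == d)
    deleted = sorted(set(previous_hashes) - set(new_hashes))
    return to_reindex, unchanged, deleted, new_hashes
```

Hashing still reads every file; comparing size and modification time first is a cheaper pre-filter.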

Current implementation uses keyword-based search. Future versions may add vector similarity for semantic matching.

Required Tools

Tool	Purpose	Required
`Bash`	Execute indexer, file discovery	Yes
`Read`	Analyze file contents	Yes
`Write`	Save index JSON files	Yes
`Glob`	Find files matching extensions	Yes

Storage:

Index location: .coditect/indexes/
Registry: .coditect/indexes/registry.json

Output Validation

For index command:

For search command:

Indexes loaded
Results ranked by relevance
Top matches displayed with context
Related files identified

For status command:

All indexes listed
Statistics shown (files, lines, last updated)

Success Output

When indexer operation completes:

✅ COMMAND COMPLETE: /code-indexer
Command: <index|search|status>
Repository: <repo-name>
Files: N indexed
Lines: X,XXX total
Output: .coditect/indexes/<repo>.json

Completion Checklist

Before marking complete:

Failure Indicators

This command has FAILED if:

❌ Path not found
❌ No files matched extensions
❌ Index not saved
❌ Search returned no results

When NOT to Use

Do NOT use when:

Searching a single file (use grep)
You need live code analysis (use /analyze)
The codebase is small (<10 files)

Anti-Patterns (Avoid)

Anti-Pattern	Problem	Solution
Index everything	Slow, bloated	Use --max-files
Stale indexes	Irrelevant results	Run update regularly
Vague search	Too many results	Use specific terms

Principles

This command embodies:

#2 Search Before Create - Find existing code
#9 Based on Facts - Relevance scoring
#6 Clear, Understandable - Structured output

Full Standard: CODITECT-STANDARD-AUTOMATION.md
