
CLAUDE.md

This file provides guidance to Claude Code and other AI assistants when working with this codebase.

🚀 Starting a New Session?

LATEST STATUS: ✅ Build #18 OPERATIONAL - Pods Healthy, Ready for MVP Scaling (Build #18 Success Report)

  • Build #18 (2025-10-28): ✅ DEPLOYED AND OPERATIONAL
    • CrashLoopBackOff Fixed: Permission denied errors resolved (non-root execution working)
    • All Pods Healthy: coditect-combined-0 and coditect-combined-1 serving traffic
    • Services Running: theia IDE, CODI2 monitoring, File monitor, NGINX all operational
    • Image: 8449bd02-7a28-4de2-8e26-7618396b3c2f
    • Security: Non-root user (coditect, UID 1001, GID 1001)
    • 📄 Fix Applied: Changed log directories from /var/log/* to /app/logs/* (user-writable)
  • ⚠️ MVP SCALING REQUIREMENT (MVP Scaling Analysis):
    • Current: 3 pods = 3-6 user capacity
    • Required: 10-15 pods for 20 user pilot/beta testing
    • Cost Impact: $150/month → $500-750/month
    • Action: Scale to 10 pods + deploy HPA before MVP launch
    • Command: kubectl scale statefulset/coditect-combined --replicas=10 -n coditect-app
  • 📊 Comprehensive Status Report (2025-10-28T00:50:00Z Status Report):
    • Complete Build #18 analysis and deployment results
    • MVP capacity planning and cost analysis
    • Risk assessment: Capacity shortage blocking MVP without scaling
    • Next actions: Scale immediately, deploy HPA, test with beta users
  • Previous Achievements:
    • Build #18 Features: Multi-llm CLI suite (7 providers), zsh + oh-my-zsh, Fixed extensions (38 VSIX), Complete branding
    • Multi-llm: Claude CLI, OpenAI CLI, Aider, Shell-GPT, Grok CLI, Anthropic SDK, Gemini CLI
    • StatefulSet Migration: Complete with persistent storage (100 GB workspace + 10 GB config per pod)
    • Starter Configuration: 10-20 user capacity tier ready (4 vCPU, 8 GB RAM per pod)
  • Socket.IO Investigation (Oct 20, 2025):
    • Root Cause #1 - CDN Caching: GCP CDN was caching Socket.IO requests with stale session IDs → FIXED
      • Solution Applied: Disabled CDN in BackendConfig (k8s/backend-config-no-cdn.yaml)
      • Validation: CDN headers removed from responses
    • Root Cause #2 - Session Affinity Missing: GCP backend service shows SESSION_AFFINITY: NONE → IN PROGRESS
      • Problem: Service missing BackendConfig annotation (NEG requirement)
      • Solution Applied: Annotated Service with BackendConfig reference
      • Status: Propagating to GCP load balancer (2-5 minutes)
    • 📋 Additional Fixes Recommended (from reference docs):
      • P0: WebSocket annotation to Ingress (85% success probability)
      • P0: Health check endpoint creation (70% success probability)
      • P1: Increased backend timeout and connection draining optimization
      • P0: Add PVCs for workspace persistence ✨ NEW (prevents data loss)
  • 🗄️ Hybrid Storage Architecture (ADR-028, Part 2) (2025-10-28):
    • Problem Identified: Pod-local storage causes data loss on scale-down events
    • Solution Designed: Hybrid Storage = Shared Base (Docker layers) + User PVCs (10 GB each)
    • Cost Savings: $7.05/user/month (96% cheaper than NFS at $205/user/month)
    • Performance: <1ms latency with GCE Persistent Disk SSD (15K-30K IOPS)
    • Scalability: 10 user slots per pod, dynamic assignment via backend routing
    • 📋 Implementation Timeline: 30-38 hours (6 phases: Image layers, PVC provisioning, StatefulSet slots, routing, testing, backups)
    • 📊 Daily Backups: VolumeSnapshots with 7-day retention for disaster recovery
    • 🔧 Status: Design complete, ready for implementation (Phase 1: Modify Dockerfile)
  • Comprehensive Documentation:
  • Next Steps (Priority Order):
    1. 🔴 P0 CRITICAL: Add PVCs to deployment (stop data loss immediately) - 30 min
    2. 🔴 P0 CRITICAL: Disable theia Cloud session timeout - 15 min
    3. Validate session affinity propagation (check GCP backend service)
    4. Apply WebSocket annotation to Ingress - 15 min
    5. Create /health and /ready endpoints in nginx - 30 min
    6. Run automated diagnostic script for comprehensive validation
    7. Test at https://coditect.ai/theia after all fixes applied
    8. Verify persistence: Create file in pod → delete pod → check file still exists
  • Architecture: See "Production Architecture (GKE)" section below
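The HPA deployment called out in the next steps can be sketched as a minimal manifest. This is an illustrative assumption, not a tuned production config: the replica bounds and CPU target below should be adjusted to real load data.

```yaml
# Hedged sketch: HorizontalPodAutoscaler for the combined StatefulSet.
# minReplicas/maxReplicas and the 70% CPU target are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coditect-combined-hpa
  namespace: coditect-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: coditect-combined
  minReplicas: 10
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Apply with `kubectl apply -f hpa.yaml -n coditect-app` and confirm with `kubectl get hpa -n coditect-app`.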

IMPORTANT: See docs/reading-order.md for the complete list of documents to read before beginning work.

Critical first reads:

Build #17 Session Exports (2025-10-27):

🛠️ Quick Reference Commands

# Development
npm run dev # Start dev server
npm run build # Production build
npm run type-check # TypeScript check

# Backend (Rust)
cd backend
cargo build --release # Build backend
cargo test # Run unit tests

# GKE Deployment (PRODUCTION)
kubectl get pods -n coditect-app # View all pods
kubectl get services -n coditect-app # View services
kubectl port-forward -n coditect-app service/coditect-api-v5-service 8080:80 # Test V5 API locally
kubectl logs -f deployment/coditect-api-v5 -n coditect-app # View V5 logs

# Testing & Quality Assurance
./scripts/test-runner.sh # Run comprehensive test suite
npm run test # Frontend unit tests
cargo test # Backend unit tests (from backend/)

# File Monitor (Rust-based audit logging)
scripts/monitor/start-file-monitor.sh # Start daemon (use --poll on WSL2!)
tail -f .coditect/logs/events.log # View JSON events

# Combined Build & Deploy (Frontend + theia)
npm run build # Build frontend first (creates dist/)
gcloud builds submit --config cloudbuild-combined.yaml . # Deploy to GKE

⚠️ Note: All production services run on GKE, not Cloud Run


⚡ Skills - Check FIRST Before Reinventing

CRITICAL: Before implementing ANY workflow, check if a skill already exists!

Quick Skill Lookup

Step 1: Check Registry

# View all available skills (14 total)
cat .claude/skills/REGISTRY.json | jq '.skills[] | {name, description, tags}'

Step 2: Use Skill If Match Found

# Example: Deployment workflow
cd .claude/skills/build-deploy-workflow
./core/deploy.sh --build-num=20 --changes="Feature X"

# Example: Git commit
cd .claude/skills/git-workflow-automation
./core/git-helper.sh --commit --message="Fix bug" --type=fix

# Example: Code editor (autonomous modification)
"Use code-editor skill to implement user profile editing with backend + frontend"

Available Production Skills (5 High-Value)

| Skill | Use When | Time Saved / Token Efficiency | Integration |
|---|---|---|---|
| code-editor | Multi-file code modifications (3+ files) | 30-40% token reduction | Orchestrator Phase 3 |
| build-deploy-workflow | Building & deploying to GKE | 40 min (45→5) | Deployment automation |
| gcp-resource-cleanup | Cleaning legacy GCP resources | 28 min (30→2) | Cost optimization |
| git-workflow-automation | Git operations, commits, PRs | 8 min (10→2) | Git workflows |
| cross-file-documentation-update | Syncing 4 doc files | 13 min (15→2) | Documentation sync |

Additional Technical Skills (11 Total)

  • deployment-archeology - Find previous successful deployments
  • foundationdb-queries - FDB query patterns, tenant isolation
  • rust-backend-patterns - Actix-web patterns for T2
  • search-strategies - Grep/Glob optimization
  • framework-patterns - Event-driven, FSM patterns
  • evaluation-framework - llm-as-judge patterns
  • production-patterns - Circuit breakers, error handling
  • communication-protocols - Multi-agent handoff
  • google-cloud-build - Cloud Build optimization
  • internal-comms - Team communication
  • multi-agent-workflow - Token management, orchestration

Token Efficiency Strategy

Why Skills First:

  • ❌ Reinventing solution: 5,000-10,000 tokens
  • ✅ Using existing skill: 1,500-2,500 tokens (70-80% reduction)

Pattern:

  1. Check .claude/skills/REGISTRY.json for matching skill
  2. Load full SKILL.md only if match found (progressive disclosure)
  3. Execute skill's proven workflow
  4. Fall back to agent/custom solution only if no skill exists

Registry Update (after adding new skills):

python3 .claude/scripts/build-skill-registry.py

See: .claude/skills/README.md for complete skill documentation


🔍 Deployment Archeology - Finding Previous Successful Builds

When deployments fail, use this process to find and restore working configurations:

Quick Process

# 1. Get deployment creation date
kubectl get deployment <NAME> -n coditect-app -o jsonpath='{.metadata.creationTimestamp}'

# 2. Find successful builds on that date
gcloud builds list --filter="createTime>='YYYY-MM-DDT00:00:00Z'" --limit=20

# 3. Analyze successful build config
gcloud builds describe <BUILD_ID> --format="yaml(steps,options)"

# 4. Check git history for archived files
git log --all --full-history -- <FILENAME>

# 5. Restore from archive if needed
cp docs/99-archive/deployment-obsolete/<FILE> ./<FILE>

Example: Combined Service Recovery (Oct 18, 2025)

Problem: New dockerfile.combined failing to build

Solution Found:

  1. Deployment created: 2025-10-13T09:58:29Z
  2. Successful build: 6e95a4d9 at 09:50:07Z (8 min before)
  3. Used file: dockerfile.local-test (archived, not dockerfile.combined!)
  4. Machine: E2_HIGHCPU_32 (32 CPUs, NOT 8 CPUs)
  5. Node memory: 8GB heap (NODE_OPTIONS=--max_old_space_size=8192)
  6. Deploy method: gke-deploy (NOT kubectl)

Key Files:

  • Working Dockerfile: dockerfile.local-test (restored from docs/99-archive/)
  • Cloud Build config: cloudbuild-combined.yaml (updated with proven settings)
  • Pre-requisite: Frontend must be built first (npm run build creates dist/)

See: .claude/skills/deployment-archeology/SKILL.md for complete process


🔧 API URL Configuration - Production Best Practices

Critical Issue Fixed (Oct 20, 2025): Frontend was calling http://localhost:8080 instead of /api/v5

The Problem

Deployed frontend bundle contained wrong API URL:

// Deployed bundle (WRONG)
VITE_API_URL=http://localhost:8080

Root Cause: .env file being included in Docker build despite .dockerignore, and Vite processing it at build time.

The Solution Journey (12 Attempts)

| Attempt | Strategy | Result | Why It Failed/Worked |
|---|---|---|---|
| #9 | Added ENV VITE_API_URL=/api/v5 to Dockerfile | ❌ Failed | .env processed by Vite before ENV set |
| #10 | Added RUN rm -f .env* + ENV | ❌ Failed | Docker layer cache reused old build |
| #11 | Fixed .env source file to /api/v5 | ❌ Failed | Docker cache didn't detect .env content change |
| #12 | Hardcoded /api/v5 in TypeScript source | ✅ SUCCESS | Source change invalidates cache |

Production Best Practice

File: src/services/api-client.ts

BEFORE (Environment-Dependent):

// Relies on environment variables - can break in deployment
const API_BASE_URL = import.meta.env.VITE_API_URL || '/api/v5';

AFTER (Production-Ready):

// API base URL - hardcoded for production reliability
// Relative path works in all environments (dev, staging, prod)
const API_BASE_URL = '/api/v5';

Why This Works:

  • ✅ Source code change invalidates Docker layer cache
  • ✅ No dependency on environment variables
  • ✅ Relative path /api/v5 works in ALL environments
  • ✅ Clear and explicit - no hidden configuration
  • ✅ Eliminates entire class of deployment issues

Lessons Learned

  1. Avoid .env files for production config - Too easy for them to be included in builds
  2. Hardcode relative paths in source - More reliable than environment variables
  3. Docker layer caching - ENV changes and file deletions don't invalidate previous layers
  4. Source code changes - Only way to guarantee Docker cache invalidation

Verification

# Find the deployed bundle filename first (shell globs do not expand inside URLs)
BUNDLE=$(curl -s https://coditect.ai/ | grep -o 'assets/[^"]*\.js' | head -1)

# Check deployed bundle (should NOT contain VITE_API_URL)
curl -s "https://coditect.ai/${BUNDLE}" | grep -o 'VITE_API_URL[^,}]*'
# Expected: nothing (hardcoded, no env var in bundle)

# Or check for the hardcoded path
curl -s "https://coditect.ai/${BUNDLE}" | grep -o '/api/v5' | head -1
# Expected: /api/v5

Build #12 Details

  • Status: SUCCESS (10m20s)
  • Build ID: 13e4134c-818e-4192-9963-c7dce7a02265
  • Image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/coditect-combined:13e4134c-818e-4192-9963-c7dce7a02265
  • Change: Hardcoded /api/v5 in src/services/api-client.ts:6
  • Pending: Deploy and verify

Project Overview

Coditect AI IDE (T2) - Browser-based IDE built on Eclipse theia with:

  • 16+ local llm models via LM Studio (no cloud)
  • Multi-agent system (MCP + A2A protocols)
  • Multi-session architecture for parallel work
  • FoundationDB persistence + OPFS browser cache
  • Privacy-first: local-only processing

Key Architectural Decision

Foundation: Eclipse theia (EPL 2.0) - NOT building from scratch

  • ✅ Saves 6-12 months development time
  • ✅ Free commercial use (no license fees)
  • ✅ VS Code extension compatible
  • ✅ Monaco editor + terminal included

Critical: We're customizing theia with extensions, not building a new IDE.

Architecture Evolution

Comprehensive documentation of design evolution:

  1. ADR-028: theia IDE Integration Evolution (2025-10-27)

    • Why we adopted Eclipse theia - Decision rationale and comparison with custom IDE
    • Technical integration - 68 theia packages, custom branding, VS Code extensions
    • Benefits realized - 9-10 months time savings, $225K cost savings, 3x more features at MVP
    • Current architecture - Combined deployment with frontend + theia on GKE
    • Future roadmap - MCP integration, multi-llm chat panel, collaborative editing
    • Lessons learned - Production patterns and best practices
  2. ADR-029: StatefulSet Persistent Storage Migration (2025-10-27)

    • The problem - Data loss on pod restarts (critical user impact)
    • The solution - Migration from Deployment → StatefulSet with persistent volumes
    • Storage architecture - 100 GB workspace + 10 GB config per user (PVCs)
    • Capacity planning - Starter (10-20 users), Production (50-100), Enterprise (500+)
    • Cost analysis - $4-8/user/month storage costs (competitive with Gitpod, GitHub Codespaces)
    • Migration process - 4-step deployment with zero downtime
    • Lessons learned - Kubernetes best practices for stateful applications

🏗️ Production Architecture (GKE)

ALL services run on Google Kubernetes Engine (GKE), NOT Cloud Run:

┌─────────────────────────────────────────────────┐
│ GKE Ingress (34.8.51.57) │
│ - coditect.ai, www.coditect.ai, api.coditect.ai│
└──────────────┬──────────────────────────────────┘

┌───────┴────────┐
│ │
┌──────▼────────┐ ┌───▼──────────────────┐
│ coditect- │ │ coditect-api-v5 │ ← NEW Rust API
│ combined │ │ (3 pods) │
│ (3 pods) │ │ - Actix-web + FDB │
│ ├─ V5 React │ │ - JWT auth │
│ ├─ theia IDE │ │ - Port 8080 │
│ └─ NGINX │ └──────────────────────┘
└───────────────┘

┌──────▼──────────────────────────────────┐
│ FoundationDB │
│ - foundationdb-0/1/2 (StatefulSet) │
│ - fdb-proxy (2 pods) │
│ - Internal LB: 10.128.0.10:4500 │
└──────────────────────────────────────────┘

Current Deployment Status (Oct 19, 2025):

  • coditect-api-v5 (11d old) - V5 Rust backend with JWT auth - WORKING
  • coditect-combined (5d old) - Frontend + theia with MCP SDK fix - WORKING
    • 3/3 pods Running, health checks passing
    • Bundled backend resolves ESM/CJS incompatibility
    • Health check endpoint: / with 60s timeout
  • coditect-api-v2 (19d old) - LEGACY V2 API, TO BE DELETED in Sprint 3
  • Cloud Run deployment - MISTAKEN deployment, TO BE DELETED

Sprint 3 Goals:

  • Integrate frontend with V5 Rust API (replace V2 API calls)
  • Enable LM Studio multi-llm features (16+ models)
  • Delete legacy V2 API and Cloud Run deployment
  • End-to-end testing with real user workflows

🗄️ Storage Architecture - Hybrid Approach

Problem: Pod-local storage tied workspace files to specific pods → data loss on scale-down

Solution: Hybrid Storage = Shared Base (Docker image layers) + User-Specific PVCs

Architecture Overview

┌──────────────────────────────────────────────────────┐
│ Docker Image Layers (Shared Base - Read-Only) │
│ ├─ System dependencies (git, npm, python, etc.) │
│ ├─ .coditect configs (5 agents, 2 skills, 15 tools) │
│ ├─ Multi-llm CLIs (7 providers: Claude, OpenAI...) │
│ └─ theia extensions (38 VSIX plugins) │
└──────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────┐
│ StatefulSet with Pre-Attached PVC Slots │
│ │
│ Pod-0 (10 user slots) │
│ ├─ /workspace/slot-0 → workspace-user-001 (10 GB) │
│ ├─ /workspace/slot-1 → workspace-user-002 (10 GB) │
│ ├─ ... │
│ └─ /workspace/slot-9 → workspace-user-010 (10 GB) │
│ │
│ Pod-1 (10 user slots) │
│ ├─ /workspace/slot-0 → workspace-user-011 (10 GB) │
│ ├─ ... │
│ └─ /workspace/slot-9 → workspace-user-020 (10 GB) │
│ │
│ Pod-2 (10 user slots) │
│ ├─ ... │
└──────────────────────────────────────────────────────┘

Key Components

1. Shared Base (Docker Image Layers)

  • Size: ~5 GB compressed (baked into image)
  • Content: System tools, .coditect configs, theia extensions, multi-llm CLIs
  • Storage: Pulled once per node, shared across all pods
  • Cost: $0 (no PVC charges, included in image)

2. User-Specific PVCs

  • Size: 10 GB per user (GCE Persistent Disk SSD)
  • Performance: <1ms latency, 15K-30K IOPS
  • Access Mode: ReadWriteOnce (single pod mounting)
  • Lifecycle: Independent of pods - persists across scale-down/up
  • Cost: $0.20/GB/month × 10 GB = $2.00/user/month

3. Pre-Attached PVC Slots

  • Slots per pod: 10 (configurable)
  • Total capacity: 3 pods × 10 slots = 30 concurrent users (minimum)
  • Scaling: HPA adds pods as needed (max 30 pods = 300 users)
  • Assignment: Backend routes users to pods with free slots via Kubernetes API

User Experience

Login Flow:

  1. User logs in → Backend checks for existing assignment
  2. If exists: Route to assigned pod + slot
  3. If new: Find pod with free slot → Create assignment → Attach PVC
  4. User workspace mounted at /workspace (transparent to user)

Persistence Guarantee:

  • User creates file in workspace → Written to user's PVC
  • Pod scales down → PVC remains (not deleted)
  • Pod scales up → Backend finds user's PVC → Mounts to new pod + slot
  • User logs in again → Same files, same state (100% persistence)
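The login-flow assignment above can be sketched as pure routing logic. The `Assignment` shape and in-memory map below are illustrative assumptions; real code would query the Kubernetes API for pod/PVC state rather than hold it in memory.

```typescript
// Hedged sketch of slot assignment: reuse an existing assignment, else take
// the first free slot on any pod. Shapes are illustrative, not the real API.
interface Assignment { pod: string; slot: number; }

const SLOTS_PER_POD = 10; // matches the 10-slot-per-pod design above

function assignSlot(
  userId: string,
  assignments: Map<string, Assignment>,
  pods: string[],
): Assignment | null {
  const existing = assignments.get(userId);
  if (existing) return existing; // sticky: same pod + slot, same PVC

  for (const pod of pods) {
    // Slots already taken on this pod
    const used = new Set(
      Array.from(assignments.values())
        .filter(a => a.pod === pod)
        .map(a => a.slot),
    );
    for (let slot = 0; slot < SLOTS_PER_POD; slot++) {
      if (!used.has(slot)) {
        const a = { pod, slot };
        assignments.set(userId, a);
        return a;
      }
    }
  }
  return null; // all slots full -> HPA should add a pod
}
```

Returning `null` when every slot is taken is the signal that capacity is exhausted and a new pod is needed.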

Cost Comparison

| Option | Storage Type | Monthly Cost (20 users) | Latency | POSIX |
|---|---|---|---|---|
| Hybrid | Image + PVCs | $141 ($7.05/user) | <1ms | ✅ |
| User PVCs | 10 GB PVCs | $500 ($25/user) | <1ms | ✅ |
| Google Cloud Storage | GCS buckets | $600 ($30/user) | 50-100ms | ❌ |
| NFS | Filestore | $4,100 ($205/user) | 1-5ms | ✅ |

Cost Breakdown (Hybrid):

  • Shared base: $0 (Docker image layers, no PVC)
  • User PVCs: $2.00/user/month (10 GB SSD)
  • Pod compute: $4.80/user/month (4 vCPU, 8 GB RAM, 2 users/pod)
  • Backup snapshots: $0.26/user/month (10 GB × $0.026/GB/month)
  • Total: $7.05/user/month

Savings: 96% cheaper than NFS, 72% cheaper than plain user PVCs

Implementation Phases

See: docs/07-adr/adr-028-part-2-hybrid-storage-decision-implementation.md for complete implementation plan

Timeline: 30-38 hours (4-5 days)

| Phase | Task | Duration | Status |
|---|---|---|---|
| 1 | Modify Dockerfile (image layers) | 2h | ⏳ Pending |
| 2 | User PVC provisioning script | 3h | ⏳ Pending |
| 3 | StatefulSet with pre-attached slots | 6-8h | ⏳ Pending |
| 4 | Session-based routing logic | 12-16h | ⏳ Pending |
| 5 | Testing & validation | 5-6h | ⏳ Pending |
| 6 | Backup strategy (VolumeSnapshots) | 2-3h | ⏳ Pending |

Backup & Disaster Recovery

Daily Backups (VolumeSnapshots):

  • Schedule: 2 AM daily (CronJob)
  • Retention: 7 days
  • Scope: All user PVCs
  • Recovery: Create a new PVC whose `dataSource` references the VolumeSnapshot (there is no one-line `kubectl create pvc --from-snapshot` command)

Cost: $0.026/GB/month × 10 GB × 7 snapshots = $1.82/user total ($0.26/user/month amortized)
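Restoring a user PVC from a daily snapshot works by creating a new PVC whose `dataSource` points at the VolumeSnapshot. A hedged sketch follows; the PVC/snapshot names and the `premium-rwo` storage class are assumptions to substitute with real values:

```yaml
# Hedged sketch: restore a 10 GB workspace PVC from a VolumeSnapshot.
# Names and storageClassName below are illustrative assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-user-001-restored
  namespace: coditect-app
spec:
  storageClassName: premium-rwo
  dataSource:
    name: workspace-user-001-daily      # assumed snapshot name
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

Once bound, re-point the user's slot at the restored claim.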

Kubernetes Deployment YAMLs:


🔑 Critical Architecture Insights

Non-obvious patterns that drive the entire codebase:

1. Eclipse theia ≠ VS Code

  • theia is a framework for building VS Code-like IDEs
  • Uses dependency injection (InversifyJS)
  • Extensions register via ContainerModule.bind()
  • Look for: *-module.ts and *-contribution.ts, NOT main.ts

2. MCP Tools vs A2A Messages

  • MCP: AI agent → external tool (llm, file ops, database)
  • A2A: AI agent → AI agent (coordination, delegation)
  • Example: Code gen agent uses MCP to call LM Studio, then A2A to request review
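That example flow can be sketched with stub clients. The `callTool`/`send` signatures here are illustrative assumptions, not the project's real `mcpClient`/`a2aClient` interfaces:

```typescript
// Hedged sketch: MCP reaches an external tool, A2A reaches another agent.
// Both clients are stubs; real ones speak the actual protocols.
interface McpClient { callTool(name: string, args: object): string; }
interface A2aClient { send(agent: string, message: string): string; }

function generateAndReview(mcp: McpClient, a2a: A2aClient): string {
  // MCP: agent -> external tool (e.g. an LM Studio completion)
  const code = mcp.callTool("lmstudio.complete", { prompt: "fizzbuzz" });
  // A2A: agent -> agent (delegate review to a peer agent)
  return a2a.send("review-agent", code);
}
```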

3. Session Isolation Architecture

  • Every action happens in session context (sessionId)
  • Sessions ≠ browser tabs (logical workspaces)
  • Multiple sessions can exist in same tab
  • FDB pattern: All keys prefixed tenant_id/session_id/...
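The key-prefix pattern can be sketched as a tiny helper. The `/` separator is an assumption for readability; real code would use FDB tuple encoding:

```typescript
// Hedged sketch: build FDB keys so every read/write is tenant- and
// session-scoped. Real code would use FDB tuple encoding, not "/" strings.
function fdbKey(tenantId: string, sessionId: string, ...parts: string[]): string {
  return [tenantId, sessionId, ...parts].join("/");
}

// A range scan over one session is then a simple prefix match:
function sessionPrefix(tenantId: string, sessionId: string): string {
  return `${tenantId}/${sessionId}/`;
}
```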

4. OPFS vs FoundationDB Split

  • FoundationDB: Source of truth (sessions, files, agent state)
  • OPFS: Browser cache for offline/performance
  • Sync pattern: Write to FDB, cache to OPFS, read OPFS with FDB fallback
  • Critical: NEVER use OPFS as primary storage
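The sync pattern above can be sketched with stub stores. The `Store` interface is an assumption standing in for `fdbService`/`opfsService`, whose real signatures are not shown in this file:

```typescript
// Hedged sketch of the split: FDB is the source of truth, OPFS only a cache.
interface Store {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Write-through: FDB first (source of truth), then cache to OPFS.
function write(fdb: Store, opfs: Store, key: string, value: string): void {
  fdb.set(key, value);
  opfs.set(key, value);
}

// Read: OPFS fast path, FDB fallback, and backfill the cache on a miss.
function read(fdb: Store, opfs: Store, key: string): string | undefined {
  const cached = opfs.get(key);
  if (cached !== undefined) return cached;
  const value = fdb.get(key);
  if (value !== undefined) opfs.set(key, value);
  return value;
}
```

A cold cache (e.g. a fresh browser) still reads correctly because FDB always has the data.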

5. Agent Execution Model

  • Agents are stateless - state lives in sessionId context
  • Agent "memory" = FDB queries with session filters
  • Sub-agents are specialized skill modules, not child processes

6. V4 Reference Usage

  • V4 = custom web app with K8s pods
  • T2 = theia extensions (different architecture)
  • V4 useful for: FDB patterns, agent logic, MCP/A2A examples
  • V4 NOT useful for: UI, file ops, IDE features (theia has these)

Technology Stack

Core

| Component | Technology | Version | Purpose |
|---|---|---|---|
| IDE Framework | Eclipse theia | 1.45+ | Foundation |
| Frontend | React + TypeScript | 18 + 5.3 | UI layer |
| Editor | Monaco editor | 0.45 | Code editing |
| Terminal | xterm.js | 5.3 | Terminal |
| UI | Chakra UI | 2.8 | Components |
| State | Zustand | 4.4 | State mgmt |
| DB | FoundationDB | 7.1+ | Persistence |
| Browser Storage | OPFS | - | Cache |

Protocols

| Protocol | Purpose |
|---|---|
| MCP | Model Context Protocol (Anthropic) - Tools/Resources |
| A2A | Agent2Agent (Google / Linux Foundation) - Agent collaboration |
| LM Studio API | OpenAI-compatible local llm serving |
| Claude Code API | Anthropic AI assistant |

📁 Project Structure

See: docs/project-structure.md for complete directory tree.

Quick overview:

/workspace/PROJECTS/t2/
├── .claude/ # Claude Code config (6 agents, 24 commands, 2 submodules)
├── docs/ # All documentation (see DOCUMENTATION-index.md)
├── src/ # V5 Frontend (React + theia)
├── backend/ # Rust backend (Actix-web, deployed to GCP)
├── .theia/ # theia config (16+ models, 4 MCP servers, 3 agents)
└── archive/ # V4 reference materials (submodules)

⚠️ Important:

  • src/ = V5 Frontend (ACTIVE)
  • backend/ = V5 Backend (ACTIVE)
  • archive/v4-reference/ = V4 Reference (NOT ACTIVE - reference only)

Development Workflows

See: docs/development-guide.md for detailed code examples and workflows.

Common Tasks

  • Add New Agent - Extend Agent base class, use MCP for tools, A2A for collaboration
  • Add theia Extension - Use ContainerModule.bind() with dependency injection
  • Add MCP Tool - Register via server.setRequestHandler()
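The "Add New Agent" task can be sketched as follows. The `Agent` base class here is a hypothetical stand-in: the real base class API is not shown in this file, so treat the shape (a session-scoped `run` method) as an assumption:

```typescript
// Hedged sketch: a hypothetical Agent base class. Every action takes a
// sessionId so agent "memory" stays scoped to session-filtered FDB queries.
abstract class Agent {
  constructor(protected readonly name: string) {}
  abstract run(sessionId: string, input: string): string;
}

class SummarizerAgent extends Agent {
  run(sessionId: string, input: string): string {
    // Real code would call tools via MCP and peer agents via A2A here.
    return `[${this.name}@${sessionId}] ${input.slice(0, 20)}`;
  }
}
```

The point is the pattern: agents are stateless classes, and the session id threads through every call.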

When Working on Code

If asked to build IDE features:

  • STOP - Check if theia already has it
  • ✅ File explorer, editor tabs, terminal, settings → theia has it
  • Build as theia extension, don't reinvent

If asked about persistence:

  • Use fdbService (primary) or opfsService (cache)
  • Don't create new persistence

🤖 Using Specialized Agents

The .claude/agents/ directory contains 12 specialized sub-agents that can be invoked for specific tasks. These agents work alongside you to handle focused responsibilities.

Agent Categories:

  • 8 Original agents (codebase analysis, research, organization)
  • 4 TDD-focused agents ✨ NEW (2025-10-20) (validation, quality gates, research)

When to Use Agents

Invoke agents proactively when tasks match their specializations:

# Example: Analyzing codebase implementation
"Use codebase-analyzer to understand how authentication works in auth.rs"

# Example: Finding files and locations
"Use codebase-locator to find all session management files"

# Example: Organizing project structure
"Use project-organizer to clean up the root directory"

# Example: TDD validation
"Use tdd-validator to run tests before marking task complete"

# Example: Quality gate validation
"Use quality-gate to validate security, performance, and accessibility"

Available Agents

| Agent | Purpose | When to Invoke |
|---|---|---|
| orchestrator | Multi-agent coordination | Complex workflows (full-stack features, security audits) |
| codebase-analyzer | Analyze implementation details | Understanding HOW code works |
| codebase-locator | Find code locations | Searching for specific components |
| codebase-pattern-finder | Identify patterns | Finding similar implementations |
| thoughts-analyzer | Analyze decision-making | Understanding thought processes |
| thoughts-locator | Find documentation | Searching thoughts/ directory |
| web-search-researcher | Web research | Gathering external information |
| project-organizer | Maintain clean structure | Organizing files/directories |
| tdd-validator | TDD validation | Before marking tasks complete, enforcing RED-GREEN-REFACTOR |
| quality-gate | Comprehensive quality check | Pre-deployment validation (security, performance, accessibility) |
| completion-gate | Binary COMPLETE/INCOMPLETE | Evidence-based task completion validation |
| research-agent | Technical research | Implementation decisions, library comparisons, best practices |

Project Organizer Agent (NEW)

Primary responsibility: Maintain production-ready directory structure.

Use this agent when:

  • Root directory is cluttered with research docs, session exports, or analysis files
  • Need to reorganize files into proper subdirectories
  • Want to audit project structure for production readiness
  • Cleaning up after long research/implementation sessions

Example usage:

# Clean up cluttered root directory
"Use project-organizer to analyze the root directory and move files to proper locations"

# Audit project structure
"Use project-organizer to check if our directory structure follows production standards"

# Organize after session
"Use project-organizer to move session exports and research docs to appropriate folders"

What it does:

  1. ✅ Analyzes directory structure
  2. ✅ Categorizes misplaced files (session exports, research docs, status reports)
  3. ✅ Creates organization plan with target locations
  4. ✅ Executes moves using git mv (preserves history)
  5. ✅ Commits changes with descriptive messages

Agent capabilities:

  • Knows production-ready directory structure for T2 project
  • Follows organizational rules (see .claude/agents/project-organizer.md)
  • Uses git mv to preserve file history
  • Creates target directories if needed
  • Groups related moves in atomic commits

Organizational rules (enforced by agent):

✅ Root should contain: package.json, tsconfig.json, vite.config.ts,
README.md, CLAUDE.md, docker files, k8s manifests

❌ Root should NOT contain: Research docs, session exports, status reports,
analysis docs, implementation plans, checkpoint files

→ Target locations:
- Session exports → docs/09-sessions/
- Research documents → docs/11-analysis/
- Status reports → docs/10-execution-plans/
- Implementation plans → docs/10-execution-plans/
- Development guides → docs/01-getting-started/
- Reference materials → docs/reference/
- Sprint checkpoints → thoughts/shared/research/

Workflow with project-organizer:

  1. Agent analyzes root directory
  2. Agent creates organization plan (table of moves)
  3. Agent presents plan for approval
  4. Upon approval, agent executes moves with git mv
  5. Agent commits changes and pushes to repository

See: .claude/agents/project-organizer.md for complete rules and categorization logic.

How to Invoke Agents

Direct invocation:

"Use [agent-name] to [specific task]"

Parallel invocation (multiple agents):

"Use codebase-locator and codebase-analyzer in parallel to find and understand the authentication system"

Agent coordination:

"Use project-organizer to clean root, then use codebase-analyzer to verify no broken imports"

Agent Best Practices

  1. Be specific - Give agents clear, focused tasks
  2. Use right agent - Match task to agent specialization
  3. Review results - Agents return reports for you to act on
  4. Combine agents - Use multiple agents for complex workflows

See: .claude/CLAUDE.md for complete agent documentation and autonomous development mode.


Architecture Decision Records (ADRs)

Always read relevant ADRs before making changes.

Most critical:

| ADR | Decision | When to Read |
|---|---|---|
| ADR-014 | Eclipse theia | READ THIS FIRST |
| ADR-010 | MCP Protocol | Tool/resource work |
| ADR-013 | Agentic Architecture | Agent system work |
| ADR-004 | FoundationDB | Persistence changes |
| ADR-007 | Multi-Session | Session features |

Full list: See docs/07-adr/ (24 ADRs total)


⚠️ Common Pitfalls

Top 5 mistakes that break architecture:

  1. Building features theia has → Search theia docs first
  2. Using global state → Use dependency injection (@inject)
  3. Calling llms directly → Use MCP (mcpClient.callTool)
  4. Copying V4 UI → Use theia widgets, not V4 components
  5. OPFS as primary storage → Write to FDB first, cache to OPFS

Full list: See docs/development-guide.md#troubleshooting (10 total)


Environment Setup

# Configure Claude Code output token limit
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=8192
docker-compose up -d                     # Start all services
# Access: http://localhost:3000

Local Development

npm install
npm run dev
# Access: http://localhost:5173

LM Studio Configuration

  • Host: host.docker.internal (Docker) or localhost (local)
  • Port: 1234
  • API: OpenAI-compatible at /v1
  • Models: 16+ available (qwen, llama, deepseek, etc.)

See: docs/01-getting-started/development-modes.md for deployment modes (Container-Only, Volume Mount, Remote SSH, Native Desktop)


File Monitor (Audit Logging)

Rust-based file system monitoring for compliance.

# Start daemon (use --poll on WSL2!)
scripts/monitor/start-file-monitor.sh

# View events
tail -f .coditect/logs/events.log

See: docs/file-monitor/dual-log-configuration.md


Git Workflow

Strategy:

  1. Create meaningful commits
  2. Use conventional commit format
  3. Reference issues/ADRs
  4. Keep commits atomic

Repository: https://github.com/coditect-ai/LM-Studio-multiple-llm-IDE.git

See: docs/git-workflow.md for detailed git configuration, conventional commits, hooks, and troubleshooting.


Important Constraints

What NOT to Do

  • ❌ Don't build IDE features from scratch → Use theia
  • ❌ Don't create custom persistence → Use fdbService/opfsService
  • ❌ Don't make agents without sessions → Always pass sessionId
  • ❌ Don't bypass MCP → Use mcpClient for llm access
  • ❌ Don't commit secrets → Use .env variables

What TO Do

  • ✅ Build theia extensions for new IDE features
  • ✅ Use existing services (llmService, mcpClient, a2aClient)
  • ✅ Follow agent patterns (extend Agent base class)
  • ✅ Make everything session-aware (pass sessionId)
  • ✅ Use MCP for tools, A2A for agents
  • ✅ Document major decisions (create ADRs)


Success Criteria

When implementing features:

  • ✅ Works in theia - Extensions integrate properly
  • ✅ Session-aware - All state tied to sessionId
  • ✅ Uses MCP/A2A - Protocols used correctly
  • ✅ Privacy-first - No cloud calls (except optional Claude)
  • ✅ Well-documented - ADRs for major decisions
  • ✅ Type-safe - Full TypeScript coverage


Remember

  1. We're building on theia, not from scratch
  2. Use MCP for tools, A2A for agents
  3. Everything is session-aware
  4. Privacy-first: local llms only
  5. Document major decisions as ADRs
  6. Eclipse theia = EPL 2.0 = Free commercial use

When in doubt:

  1. Read ADR-014 (theia decision)
  2. Check if theia already has the feature
  3. Build as extension, not standalone
  4. Follow existing patterns in codebase