
Gap Analysis: Current State → Production-Ready Persistent Pods

Date: 2025-10-26
Analysis Source: claude-code-initial-setup submodule + current deployment
Status: ✅ Complete - Ready for Implementation


📊 Executive Summary

Current State (Build #12): Minimal deployment with missing components
Production-Ready State: Full-featured persistent pods with all development tools

Key Gaps Identified:

  1. ❌ Rust binaries NOT bundled (3 missing: api-server, codi2, file-monitor)
  2. ❌ .coditect configs NOT in Docker image (agents, skills, workflows missing)
  3. ❌ Development tools NOT pre-installed (31 Debian packages missing)
  4. ❌ Cloud Build using insufficient resources (8 CPUs, should be 32)
  5. ❌ Docker layer count potentially exceeding limits (no optimization)

Implementation Time: 2-3 hours (Dockerfile updates + Cloud Build + Deployment)
Build Time Improvement: 60+ min → 30-40 min (with E2_HIGHCPU_32)


🔍 Detailed Gap Analysis

1. Rust Binaries

| Binary | Current State | Production State | Gap | Location |
|---|---|---|---|---|
| coditect-v5-api | ❌ Deployed separately (different container) | ✅ Bundled in combined image | Build Stage 1 | backend/target/release/api-server |
| codi2 | ❌ Not included | ✅ Bundled at /usr/local/bin/codi2 | Build Stage 2 | archive/coditect-v4/codi2/ |
| file-monitor | ❌ Not included | ✅ Bundled at /usr/local/bin/file-monitor | Build Stage 3 | src/file-monitor/ |

Impact:

  • Current: Backend API runs in separate pod (coditect-api-v5 deployment)
  • Missing: File monitoring and audit logging in persistent pods
  • Missing: Multi-agent coordination tools (codi2)

Solution: Add 3 Rust build stages to dockerfile.combined-fixed


2. Debian Libraries

| Tier | Packages | Current | Production | Gap |
|---|---|---|---|---|
| TIER 1 (Essential) | 7 packages (build-essential, jq, wget, tree, htop, vim, nano) | ❌ Not installed | ✅ Pre-installed | Runtime apt-get layer |
| TIER 2 (Dev Tools) | 8 packages (git-lfs, ripgrep, fzf, tmux, rsync, zip, unzip, ag) | ❌ Not installed | ✅ Pre-installed | Runtime apt-get layer |
| TIER 3 (Python) | 6 packages (python3, pip, venv, dev, pylint, black) | ⚠️ Partially (python3 only) | ✅ Full Python stack | Conditional apt-get layer |
| TIER 4 (Network) | 7 packages (netcat, telnet, nmap, traceroute, dnsutils, etc.) | ❌ Not installed | ⏭️ Optional | Skipped for MVP |

Impact:

  • Current: Pods cannot run rg (ripgrep), jq (JSON processing), tree (directory viz)
  • Missing: Essential development workflow tools
  • Missing: Debugging capabilities (htop, network tools)

Solution: Add combined RUN statement with 15-21 packages


3. .coditect Configuration

| Component | Current | Production | Gap | Size |
|---|---|---|---|---|
| Agents | ❌ Not included | ✅ 5 agents (code-reviewer, test-writer, doc-generator, file-organizer, prior-session-agent) | Copy directory | ~50 KB |
| Skills | ❌ Not included | ✅ 2 skills (session-aware, project-tracker) | Copy directory | ~100 KB |
| Scripts | ❌ Not included | ✅ 15 scripts (install-dev-environment, install-modern-dev-stack, etc.) | Copy directory | ~200 KB |
| Workflows | ❌ Not included | ✅ 3 workflows (code-review, testing, cicd) | Copy directory | ~50 KB |
| Settings | ❌ Not included | ✅ settings.json with hooks | Copy file | ~5 KB |

Total Additional Size: ~405 KB (negligible impact on 4.5 GB image)

Impact:

  • Current: Claude Code cannot use automated agents/skills/workflows
  • Missing: Session recovery and prior work analysis
  • Missing: Automated code review and testing workflows

Solution: COPY archive/claude-code-initial-setup/.claude /app/.coditect
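
After the COPY step, a quick structural check can confirm that the expected entries landed under /app/.coditect. The following is a minimal sketch, not part of the existing build: the function name is hypothetical, and the subdirectory names (agents, skills, scripts, workflows) are assumptions inferred from the component table above. Only /app/.coditect/agents is named explicitly elsewhere in this analysis.

```shell
#!/bin/sh
# Report which expected entries are missing under a .coditect root.
# ASSUMPTION: the subdirectory names mirror the component table;
# adjust them if the real layout differs.
check_coditect_layout() {
    layout_root=$1
    layout_missing=0
    for layout_dir in agents skills scripts workflows; do
        if [ ! -d "$layout_root/$layout_dir" ]; then
            echo "missing directory: $layout_dir"
            layout_missing=1
        fi
    done
    if [ ! -f "$layout_root/settings.json" ]; then
        echo "missing file: settings.json"
        layout_missing=1
    fi
    # Exit status is 0 only when every entry was found.
    return "$layout_missing"
}

# Example (inside the container):
# check_coditect_layout /app/.coditect
```

The function prints one line per missing entry and returns a non-zero status if anything is absent, so it could also gate a build step.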


4. Node.js Global Packages

| Category | Packages | Current | Production | Gap |
|---|---|---|---|---|
| TypeScript | 4 packages (typescript, ts-node, eslint, prettier) | ⚠️ Local only (package.json) | ✅ Global install | npm install -g |
| Build Tools | 4 packages (vite, esbuild, webpack, webpack-cli) | ⚠️ vite local | ✅ All global | npm install -g |
| Package Mgrs | 2 packages (pnpm, yarn) | ❌ Not installed | ✅ Global install | npm install -g |
| AI Tools | 1 package (@google/gemini-cli) | ❌ Not installed | ✅ Global install | npm install -g |
| Dev Utils | 6 packages (http-server, nodemon, concurrently, react-devtools, create-react-app, create-next-app) | ❌ Not installed | ✅ Global install | npm install -g |

Total: 17 npm global packages

Impact:

  • Current: TypeScript/ESLint only available in project root (not in user workspaces)
  • Missing: Quick HTTP server for testing (http-server)
  • Missing: Auto-restart for development (nodemon)

Solution: npm install -g [17 packages]


5. Cloud Build Configuration

| Parameter | Current | Production | Gap | Impact |
|---|---|---|---|---|
| Machine Type | ⚠️ E2_HIGHCPU_8 or default | ✅ E2_HIGHCPU_32 | Update cloudbuild-combined.yaml | 60+ min → 30-40 min build |
| Timeout | ⚠️ 1800s (30 min) | ✅ 3600s (60 min) | Update cloudbuild-combined.yaml | Prevents timeout failures |
| Node Heap | ⚠️ Default (1-2 GB) | ✅ 8 GB (NODE_OPTIONS=--max_old_space_size=8192) | Add env var | Prevents OOM during Theia build |
| BuildKit | ❌ Not enabled | ✅ Enabled (DOCKER_BUILDKIT=1) | Add env var | Better layer caching |
| Cache From | ❌ Not used | --cache-from=...coditect-combined:latest | Add docker arg | Faster rebuilds |

Impact:

  • Current: Builds may timeout or fail due to insufficient resources
  • Missing: Layer caching = every build starts from scratch
  • Missing: Proper heap allocation for Node.js = OOM crashes

Solution: Update cloudbuild-combined.yaml with proven E2_HIGHCPU_32 config from Oct 13


6. Docker Layer Optimization

| Issue | Current | Production | Gap | Solution |
|---|---|---|---|---|
| RUN Statements | ⚠️ Multiple separate RUN commands | ✅ Combined with && | Merge RUN statements | Reduce layers |
| Layer Count | ⚠️ Unknown (could exceed 127) | ✅ ~70 layers total | Optimize Dockerfile | Prevent build failures |
| Package Install | ⚠️ Separate apt-get commands | ✅ Single combined install | Combine TIER 1+2 | 1 layer instead of 10+ |

Example Optimization:

BEFORE (BAD - 10+ layers):

    RUN apt-get update
    RUN apt-get install -y build-essential
    RUN apt-get install -y jq
    RUN apt-get install -y wget
    # ... 10 more RUN statements

AFTER (GOOD - 1 layer):

    RUN apt-get update && apt-get install -y \
        build-essential jq wget tree htop vim nano \
        git-lfs ripgrep fzf tmux rsync zip unzip \
        && rm -rf /var/lib/apt/lists/*

Impact:

  • Current: Risk of exceeding Docker's 127 layer limit
  • Missing: Efficient layer caching strategy
  • Missing: Cleanup of package lists (wastes space)

Solution: Combine all apt-get installs into 2 layers (TIER 1+2, TIER 3 separate)
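
Because the current layer count is listed as unknown, a rough estimate from the Dockerfile itself may help before and after merging RUN statements. The sketch below is an approximation under stated assumptions: it counts only RUN, COPY, and ADD instructions in the final build stage, assumes instructions are written in uppercase, and does not include layers inherited from the base image. The function name is hypothetical.

```shell
#!/bin/sh
# Count the layer-creating instructions (RUN, COPY, ADD) in the final
# stage of a Dockerfile. Each FROM resets the counter, so only the
# last stage is reported. Base-image layers are NOT included.
count_final_stage_layers() {
    # $1: path to a Dockerfile
    awk '
        /^[ \t]*FROM[ \t]/            { n = 0 }
        /^[ \t]*(RUN|COPY|ADD)[ \t]/  { n++ }
        END                           { print n + 0 }
    ' "$1"
}

# Example:
# count_final_stage_layers dockerfile.combined-fixed
```

A multi-line RUN joined with `&&` and trailing backslashes counts once, which is what makes the combined form cheaper in layers than one RUN per package.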


📋 Implementation Checklist

Phase 1: Update dockerfile.combined-fixed

  • Add Rust Build Stages (Stages 1-3)

    FROM rust:1.75-slim AS v5-backend-builder
    # Build coditect-v5-api

    FROM rust:1.75-slim AS codi2-builder
    # Build codi2

    FROM rust:1.75-slim AS monitor-builder
    # Build file-monitor
  • Update Runtime Stage (Stage 4)

    # Add TIER 1 + TIER 2 packages (combined)
    RUN apt-get update && apt-get install -y \
        build-essential jq wget tree htop vim nano \
        git-lfs ripgrep fzf tmux rsync zip unzip silversearcher-ag \
        nginx curl ca-certificates \
        && rm -rf /var/lib/apt/lists/*

    # Add TIER 3 Python (separate layer, conditional)
    RUN apt-get update && apt-get install -y \
        python3 python3-pip python3-venv python3-dev \
        && rm -rf /var/lib/apt/lists/*
  • Copy Rust Binaries

    COPY --from=v5-backend-builder /build/backend/target/release/api-server /usr/local/bin/coditect-v5-api
    COPY --from=codi2-builder /build/codi2/target/release/codi2 /usr/local/bin/codi2
    COPY --from=monitor-builder /build/file-monitor/target/release/examples/monitor /usr/local/bin/file-monitor
    RUN chmod +x /usr/local/bin/coditect-v5-api /usr/local/bin/codi2 /usr/local/bin/file-monitor
  • Copy .coditect Configs

    COPY archive/claude-code-initial-setup/.claude /app/.coditect
    RUN mkdir -p /app/.coditect/logs
  • Install Node.js Global Packages

    RUN npm install -g \
        typescript ts-node eslint prettier \
        vite esbuild webpack webpack-cli \
        pnpm yarn \
        http-server nodemon concurrently \
        react-devtools create-react-app create-next-app \
        @google/gemini-cli

Phase 2: Update cloudbuild-combined.yaml

  • Update Machine Type

    options:
      machineType: 'E2_HIGHCPU_32' # ← Change from E2_HIGHCPU_8
      diskSizeGb: 100
  • Increase Timeout

    timeout: '3600s'  # ← Change from 1800s
  • Add Environment Variables

    options:
      env:
        - 'NODE_OPTIONS=--max_old_space_size=8192' # ← Add 8GB heap
        - 'DOCKER_BUILDKIT=1' # ← Enable BuildKit
  • Add Cache From

    steps:
      - name: 'gcr.io/cloud-builders/docker'
        args:
          - 'build'
          - '--cache-from=us-central1-docker.pkg.dev/${PROJECT_ID}/coditect/coditect-combined:latest' # ← Add caching
          - '--build-arg=BUILDKIT_INLINE_CACHE=1' # ← Enable cache export

Phase 3: Testing

  • Local Build Test

    docker build -f dockerfile.combined-fixed -t coditect-combined:prod-test .
  • Verify Binaries

    docker run --rm coditect-combined:prod-test coditect-v5-api --version
    docker run --rm coditect-combined:prod-test codi2 --version
    docker run --rm coditect-combined:prod-test file-monitor --help
  • Verify Debian Packages

    docker run --rm coditect-combined:prod-test which rg jq tree htop vim
  • Verify .coditect Configs

    docker run --rm coditect-combined:prod-test ls -la /app/.coditect
  • Verify npm Global Packages

    docker run --rm coditect-combined:prod-test npm list -g --depth=0
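
The individual checks above can be folded into one pass that lists every missing command at once. This is a minimal sketch rather than an existing project script: the function name and the file name check-tools.sh are hypothetical, and the tool list is taken from the Phase 3 and Phase 5 checks in this document. It relies on `command -v`, a POSIX shell builtin, so it does not depend on `which` being present in the image.

```shell
#!/bin/sh
# Print each command from the argument list that is NOT on PATH.
# Exit status is 0 only when every command was found.
missing_tools() {
    tools_status=0
    for tool_name in "$@"; do
        if ! command -v "$tool_name" >/dev/null 2>&1; then
            echo "$tool_name"
            tools_status=1
        fi
    done
    return "$tools_status"
}

# Example (inside the container):
# missing_tools coditect-v5-api codi2 file-monitor rg jq tree htop vim tsc
```

Saved as check-tools.sh with the example line uncommented, the script could be piped into the test image with `docker run --rm -i coditect-combined:prod-test sh -s < check-tools.sh`, assuming the image does not define an ENTRYPOINT that intercepts the command.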

Phase 4: Cloud Build & Deploy

  • Submit Cloud Build

    gcloud builds submit --config cloudbuild-combined.yaml .
  • Monitor Build Progress

    gcloud builds log --stream <BUILD_ID>
  • Deploy to GKE

    kubectl set image statefulset/coditect-combined \
        combined=us-central1-docker.pkg.dev/.../coditect-combined:BUILD_ID \
        -n coditect-app
  • Verify Deployment

    kubectl rollout status statefulset/coditect-combined -n coditect-app
    kubectl get pods -n coditect-app -l app=coditect-combined

Phase 5: Production Validation

  • Test Theia Icons - Open https://coditect.ai/theia, verify icons visible
  • Test Custom Branding - Verify "Coditect AI Agents" displays
  • Test Rust Binaries - Open a shell in a pod (kubectl exec), run codi2 --version, file-monitor --help
  • Test Development Tools - Run rg --version, jq --version, tree --version
  • Test .coditect Configs - Verify agents/skills/workflows accessible
  • Test Frontend - Verify https://coditect.ai loads correctly
  • Test Backend API - Verify https://api.coditect.ai/health returns 200
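
Right after a rolling update, endpoints can take a short while to start answering, so the health checks are easier to automate with a small retry wrapper. The sketch below is illustrative only: the `retry` helper is hypothetical, and the attempt count and delay in the example are arbitrary values, not project settings.

```shell
#!/bin/sh
# Run a command until it succeeds, up to a maximum number of attempts,
# sleeping a fixed number of seconds between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$retry_n" -ge "$retry_max" ]; then
            echo "gave up after $retry_n attempts: $*" >&2
            return 1
        fi
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}

# Example: poll the backend health endpoint from the checklist.
# curl -f treats HTTP 4xx/5xx responses as failures.
# retry 10 15 curl -fsS -o /dev/null https://api.coditect.ai/health
```

The same wrapper could front the frontend check by swapping in https://coditect.ai as the URL.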

🎯 Expected Outcomes

Build Metrics

| Metric | Current (Estimated) | Production (Expected) | Improvement |
|---|---|---|---|
| Build Time | 60-90 min (8 CPUs) | 30-40 min (32 CPUs) | 40-60% faster |
| Build Success Rate | ~60% (OOM, timeouts) | 95%+ (proper resources) | +35 percentage points or more |
| Layer Count | Unknown (risky) | ~70 layers | Safe margin (127 limit) |
| Image Size | ~3.5 GB | ~4.5 GB | +1 GB (dev tools worth it) |

Runtime Benefits

| Benefit | Impact |
|---|---|
| No apt-get at runtime | Faster pod startup (~30s → ~10s) |
| Pre-compiled Rust binaries | Instant availability of codi2, file-monitor, API server |
| Global npm packages | TypeScript/ESLint work anywhere in pod |
| .coditect configs ready | Claude Code agents/skills work immediately |
| Development tools ready | No waiting for package installs |
| Consistent environment | All pods have identical tooling |

Cost Impact

| Cost | Current | Production | Change |
|---|---|---|---|
| Build Costs | ~$0.80/build (E2_HIGHCPU_8, 60 min) | ~$1.20/build (E2_HIGHCPU_32, 35 min) | +$0.40/build |
| Storage Costs | ~$0.10/GB/month × 3.5 GB = $0.35 | ~$0.10/GB/month × 4.5 GB = $0.45 | +$0.10/month |
| Runtime Costs | Same (3 pods, 2 vCPU, 4 GB each) | Same | $0 |
| Total Extra Cost | - | ~$0.10/month storage + $0.40/build (~$0.50/month at one build per month) | Negligible |

Cost-Benefit Analysis: Worth it. Roughly $0.50/month at one build per month (plus $0.40 for each additional build) buys production-ready persistent pods with all development tools.
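
Since the extra cost depends on how often the image is rebuilt, the table's deltas can be combined for any build frequency. The following sketch simply applies the estimated figures from the cost table; the function name is hypothetical and the build count is an input, not a measured value.

```shell
#!/bin/sh
# Extra monthly cost = storage delta + per-build delta * builds per month.
# All dollar figures are the estimates from the cost table above.
extra_monthly_cost() {
    # $1: number of Cloud Build runs per month
    awk -v builds="$1" 'BEGIN {
        storage_delta = 0.45 - 0.35   # USD/month: 4.5 GB vs 3.5 GB image
        build_delta   = 1.20 - 0.80   # USD/build: E2_HIGHCPU_32 vs E2_HIGHCPU_8
        printf "%.2f\n", storage_delta + build_delta * builds
    }'
}

# Example: one build per month, then twenty.
# extra_monthly_cost 1    # prints 0.50
# extra_monthly_cost 20   # prints 8.10
```

Even at twenty builds a month the increase stays under ten dollars, which is consistent with the "negligible" assessment.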


🚀 Deployment Timeline

| Phase | Duration | Tasks |
|---|---|---|
| Phase 1: Dockerfile Updates | 30-45 min | Add 3 Rust build stages, update runtime stage, optimize layers |
| Phase 2: Cloud Build Updates | 10-15 min | Update machine type, timeout, env vars, caching |
| Phase 3: Local Testing | 20-30 min | Build locally, verify binaries, packages, configs |
| Phase 4: Cloud Build & Deploy | 40-50 min | Submit build (30-40 min), deploy to GKE (5-10 min) |
| Phase 5: Production Validation | 15-20 min | Test all components, verify functionality |
| Total | 2-3 hours | End-to-end production-ready deployment |

✅ Success Criteria

The deployment is considered successful when ALL of the following criteria are met:

Docker Build

  • dockerfile.combined-fixed builds without errors
  • Cloud Build completes in 30-40 minutes
  • Image pushed to Artifact Registry (~4.5 GB)
  • All 3 Rust binaries included and functional
  • All Debian packages installed
  • .coditect configs copied correctly
  • npm global packages installed

GKE Deployment

  • StatefulSet rolling update completes
  • All pods reach Running status (3/3 or 10/10)
  • Health checks passing
  • No CrashLoopBackOff or ImagePullBackOff errors

Functionality Tests

  • Theia Icons: All icons visible in file tree, toolbar, menus
  • Custom Branding: "Coditect AI Agents" displays in AI Chat
  • V5 API: curl https://api.coditect.ai/health returns 200
  • Codi2: kubectl exec ... -- codi2 --version shows v0.2.0
  • File Monitor: kubectl exec ... -- file-monitor --help works
  • Debian Tools: kubectl exec ... -- rg --version works
  • .coditect Configs: kubectl exec ... -- ls /app/.coditect/agents shows 5 agents
  • npm Global: kubectl exec ... -- tsc --version works

No Regressions


Status: ✅ Analysis Complete, Ready for Implementation
Next Action: Update dockerfile.combined-fixed with Rust build stages
ETA to Production: 2-3 hours
Confidence: High (based on proven Oct 13 build configuration)