Critical Infrastructure Roadmap
Date: 2025-10-06
Priority: URGENT - Core infrastructure blockers for V5 MVP
🚨 Critical Issues to Resolve
1. WebSocket Pod OS Connection Issue
Problem: WebSocket connections to user workspace pods fail, apparently because of OS-level networking between pods (root cause not yet confirmed)
Status: 🔴 BLOCKING
Impact: Real-time IDE communication is broken, which limits interactive features
Root Cause Analysis Needed:
# Check WebSocket gateway deployment
kubectl get deployment -n coditect-app | grep websocket
# Check pod-to-pod networking
kubectl exec -n coditect-app <websocket-pod> -- ping <workspace-pod-ip>
# Check service mesh / network policy
kubectl get networkpolicies -n coditect-app
kubectl describe svc -n coditect-app | grep -i websocket
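Because `ping` only exercises ICMP while the WebSocket gateway speaks TCP, a port-level probe can complement the checks above. The following is a minimal sketch using only the Rust standard library; `probe_tcp` is a hypothetical helper, not part of the existing codebase.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` can be opened within `timeout`.
/// Hypothetical helper for diagnosing pod-to-pod reachability on a given port.
pub fn probe_tcp(addr: SocketAddr, timeout: Duration) -> bool {
    // connect_timeout fails on refusal, unreachable network, or timeout.
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

Run from inside the gateway pod against `<workspace-pod-ip>:<port>`, a `false` result where `ping` succeeds would point toward a NetworkPolicy or port-binding problem rather than basic routing.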
Potential Fixes:
- Option A: Use Kubernetes Service for pod-to-pod communication
  - Create ClusterIP service for each workspace pod
  - WebSocket gateway connects via service DNS
  - Example: workspace-{user_id}.coditect-app.svc.cluster.local
- Option B: Deploy WebSocket gateway as sidecar
  - Run WebSocket proxy in same pod as workspace
  - Communicate via localhost (127.0.0.1)
  - No inter-pod networking needed
- Option C: Use Istio/Envoy service mesh
  - Add service mesh for transparent proxying
  - Automatic mTLS and load balancing
  - More complex, but production-grade
Recommended: Option B (sidecar) for simplicity and reliability
Implementation:
# workspace pod with WebSocket sidecar
apiVersion: v1
kind: Pod
metadata:
  name: workspace
  namespace: user-{user_id}
spec:
  containers:
    # Main workspace container
    - name: workspace
      image: gcr.io/serene-voltage-464305-n2/t2-workspace-pod:latest
      ports:
        - containerPort: 3000 # theia
      env:
        - name: WEBSOCKET_URL
          value: "ws://localhost:8765" # Connect to sidecar
    # WebSocket sidecar
    - name: websocket-gateway
      image: gcr.io/serene-voltage-464305-n2/coditect-websocket:latest
      ports:
        - containerPort: 8765 # WebSocket server
      env:
        - name: FDB_CLUSTER_STRING
          value: "coditect:production@10.128.0.8:4500"
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: jwt-secret
              key: secret
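To make the difference between Option A and Option B concrete, the sketch below builds the gateway URL each option would use. `GatewayMode` and `gateway_url` are hypothetical names; the DNS pattern and port 8765 are taken from the examples above, and exposing that same port on the Option A service is an assumption.

```rust
/// How the workspace reaches the WebSocket gateway (hypothetical type for illustration).
pub enum GatewayMode<'a> {
    /// Option A: per-workspace ClusterIP service, addressed through cluster DNS.
    ServiceDns { user_id: &'a str },
    /// Option B: sidecar in the same pod, addressed through loopback.
    Sidecar,
}

/// Port the gateway listens on (assumed to be the same for both options).
const GATEWAY_PORT: u16 = 8765;

/// Builds the WebSocket URL for the chosen mode.
pub fn gateway_url(mode: &GatewayMode<'_>) -> String {
    match mode {
        GatewayMode::ServiceDns { user_id } => format!(
            "ws://workspace-{}.coditect-app.svc.cluster.local:{}",
            user_id, GATEWAY_PORT
        ),
        GatewayMode::Sidecar => format!("ws://localhost:{}", GATEWAY_PORT),
    }
}
```

With the sidecar, the URL is constant for every workspace, which is why no per-user service or DNS record is needed.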
2. JWT Authorization + FoundationDB Integration
Problem: JWT validation is not integrated with FDB user lookup
Status: 🟡 PARTIAL - JWT generation works, validation needs FDB
Impact: Authentication works, but sessions are not stored in or retrieved from FDB
Current Flow (Incomplete):
1. User login → API generates JWT
2. JWT contains {userId, tenantId}
3. Frontend stores JWT in localStorage
4. Frontend sends JWT in Authorization header
5. ❌ API doesn't validate JWT against FDB session
6. ❌ No session persistence in FDB
Target Flow (Complete):
1. User login → API generates JWT
2. API stores session in FDB: {tenant_id}/session/{session_id}
3. JWT contains {userId, tenantId, sessionId}
4. Frontend sends JWT in Authorization header
5. ✅ API validates JWT signature
6. ✅ API looks up session in FDB
7. ✅ API verifies session is active (not expired/terminated)
8. ✅ Request proceeds with user context
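The order of the session checks in steps 6 and 7 can be sketched as a pure function, independent of Actix and FoundationDB. `SessionCheck`, `SessionRecord`, and `check_session` are hypothetical names, and timestamps are simplified to Unix seconds for the sake of a self-contained example.

```rust
/// Outcome of the session checks that follow JWT signature verification.
#[derive(Debug, PartialEq)]
pub enum SessionCheck {
    Ok,
    NotFound,
    Inactive,
    Expired,
}

/// Minimal stand-in for the stored session record (illustrative only).
pub struct SessionRecord {
    pub status: String,
    pub last_access_unix: u64,
}

/// Applies the checks in order: existence, status, then inactivity timeout.
pub fn check_session(
    session: Option<&SessionRecord>,
    now_unix: u64,
    timeout_secs: u64,
) -> SessionCheck {
    let s = match session {
        Some(s) => s,
        None => return SessionCheck::NotFound,
    };
    if s.status != "active" {
        return SessionCheck::Inactive;
    }
    // saturating_sub guards against clock skew making last access appear in the future.
    if now_unix.saturating_sub(s.last_access_unix) > timeout_secs {
        return SessionCheck::Expired;
    }
    SessionCheck::Ok
}
```

Keeping the decision separate from I/O makes it straightforward to unit test each rejection path before wiring it into middleware.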
Implementation (Rust/Actix Web):
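// src/middleware/jwt_auth.rs
use actix_web::{dev::ServiceRequest, Error, HttpMessage};
use actix_web_httpauth::extractors::bearer::BearerAuth;
use foundationdb::Database;
use jsonwebtoken::{decode, DecodingKey, Validation};
use serde::Deserialize;
use crate::models::session::Session; // defined in src/models/session.rs

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")] // payload uses userId, tenantId, sessionId
pub struct JwtClaims {
    pub user_id: String,
    pub tenant_id: String,
    pub session_id: String,
    pub exp: usize,
}
// Request-scoped identity attached by validate_jwt (assumed shape)
#[derive(Debug, Clone)]
pub struct UserContext {
    pub user_id: String,
    pub tenant_id: String,
    pub session_id: String,
}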
pub async fn validate_jwt(
    req: ServiceRequest,
    credentials: BearerAuth,
    fdb: Database,
) -> Result<ServiceRequest, Error> {
    let token = credentials.token();

    // 1. Verify JWT signature (Validation::default() also checks `exp`)
    let secret = std::env::var("JWT_SECRET").expect("JWT_SECRET not set");
    let token_data = decode::<JwtClaims>(
        token,
        &DecodingKey::from_secret(secret.as_bytes()),
        &Validation::default(),
    )
    .map_err(actix_web::error::ErrorUnauthorized)?;
    let claims = token_data.claims;

    // 2. Look up session in FDB
    let fdb_tx = fdb
        .create_trx()
        .map_err(actix_web::error::ErrorInternalServerError)?;
    let session_key = format!("{}/session/{}", claims.tenant_id, claims.session_id);
    let session_data = fdb_tx
        .get(session_key.as_bytes(), false)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;

    match session_data {
        Some(data) => {
            let session: Session = serde_json::from_slice(&data)
                .map_err(actix_web::error::ErrorInternalServerError)?;

            // 3. Verify session is active
            if session.status != "active" {
                return Err(actix_web::error::ErrorUnauthorized("Session inactive"));
            }

            // 4. Check session expiration (24-hour inactivity window)
            let now = chrono::Utc::now();
            let last_access = chrono::DateTime::parse_from_rfc3339(&session.last_access_at)
                .map_err(actix_web::error::ErrorInternalServerError)?;
            let session_timeout = chrono::Duration::hours(24);
            if now.signed_duration_since(last_access) > session_timeout {
                return Err(actix_web::error::ErrorUnauthorized("Session expired"));
            }

            // 5. Update last access time
            let mut updated_session = session.clone();
            updated_session.last_access_at = now.to_rfc3339();
            let updated_bytes = serde_json::to_vec(&updated_session)
                .map_err(actix_web::error::ErrorInternalServerError)?;
            fdb_tx.set(session_key.as_bytes(), &updated_bytes);
            fdb_tx
                .commit()
                .await
                .map_err(|e| actix_web::error::ErrorInternalServerError(e.to_string()))?;

            // 6. Attach user context to request
            req.extensions_mut().insert(UserContext {
                user_id: claims.user_id,
                tenant_id: claims.tenant_id,
                session_id: claims.session_id,
            });
            Ok(req)
        }
        None => Err(actix_web::error::ErrorUnauthorized("Session not found")),
    }
}
Session Model (FDB):
// src/models/session.rs
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Session {
    pub session_id: String,
    pub user_id: String,
    pub tenant_id: String,
    pub workspace_id: String,
    pub pod_namespace: Option<String>,
    pub pod_name: Option<String>,
    pub websocket_connected: bool,
    pub status: String,         // "active" | "suspended" | "terminated"
    pub created_at: String,     // ISO 8601
    pub last_access_at: String, // ISO 8601
    pub metadata: Option<serde_json::Value>,
}

impl Session {
    pub fn fdb_key(&self) -> String {
        format!("{}/session/{}", self.tenant_id, self.session_id)
    }

    pub fn is_active(&self) -> bool {
        self.status == "active"
    }

    pub fn is_expired(&self, timeout_hours: i64) -> bool {
        let now = chrono::Utc::now();
        let last_access = chrono::DateTime::parse_from_rfc3339(&self.last_access_at)
            .expect("Invalid last_access_at format");
        let timeout = chrono::Duration::hours(timeout_hours);
        now.signed_duration_since(last_access) > timeout
    }
}
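Since `status` is a free-form `String`, a misspelled value would compile and only fail at runtime. One possible refinement, sketched here under the assumption that the three values in the comment above are the only valid ones, is a small enum with explicit parsing. `SessionStatus` is a hypothetical type, not part of the current model.

```rust
use std::str::FromStr;

/// Closed set of session states (assumed from the comment on `Session::status`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionStatus {
    Active,
    Suspended,
    Terminated,
}

impl SessionStatus {
    /// String form as stored in the session record.
    pub fn as_str(self) -> &'static str {
        match self {
            SessionStatus::Active => "active",
            SessionStatus::Suspended => "suspended",
            SessionStatus::Terminated => "terminated",
        }
    }
}

impl FromStr for SessionStatus {
    type Err = String;

    /// Rejects anything other than the three known values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "active" => Ok(SessionStatus::Active),
            "suspended" => Ok(SessionStatus::Suspended),
            "terminated" => Ok(SessionStatus::Terminated),
            other => Err(format!("unknown session status: {}", other)),
        }
    }
}
```

If the enum were adopted in `Session`, deriving `Serialize`/`Deserialize` with `#[serde(rename_all = "lowercase")]` would keep the stored JSON values unchanged.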
Deliverables:
- Create JWT validation middleware with FDB lookup
- Implement session expiration logic
- Update last_access_at on each request
- Add session cleanup job (delete expired sessions)
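For the cleanup job, the key layout `{tenant_id}/session/{session_id}` means sessions can be scanned one tenant at a time with a prefix range. The sketch below shows the range computation and the expiry filter using only the standard library; all function names are hypothetical, timestamps are simplified to Unix seconds, and the actual FDB read and delete calls are left out.

```rust
/// Key prefix under which one tenant's sessions are stored.
pub fn session_prefix(tenant_id: &str) -> Vec<u8> {
    format!("{}/session/", tenant_id).into_bytes()
}

/// Exclusive end key for a prefix scan: drop trailing 0xFF bytes, then
/// increment the last remaining byte. Returns None if no such key exists.
pub fn prefix_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last == 0xFF {
            end.pop();
        } else {
            *end.last_mut().unwrap() = last + 1;
            return Some(end);
        }
    }
    None
}

/// Given (key, last_access_unix) pairs from a scan, returns the keys whose
/// inactivity exceeds `timeout_secs` and are therefore candidates for deletion.
pub fn expired_keys(sessions: &[(String, u64)], now_unix: u64, timeout_secs: u64) -> Vec<String> {
    sessions
        .iter()
        .filter(|(_, last)| now_unix.saturating_sub(*last) > timeout_secs)
        .map(|(key, _)| key.clone())
        .collect()
}
```

Because keys begin with the tenant ID, a job that cleans all tenants would need either a tenant list to iterate over or a secondary index; which of these fits is a design decision this sketch does not settle.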
3. Coditect Server + Knowledge Base as a Service (KBaaS)
Problem: Coditect server is not deployed, and the Knowledge Base service is not integrated with GCP
Status: 🔴 NOT DEPLOYED
Impact: No centralized knowledge management; agent coordination is limited
Architecture:
┌─────────────────────────────────────────────────────┐
│                Google Cloud Platform                │
│                                                     │
│ ┌─────────────────────────────────────────────────┐ │
│ │ GKE Cluster (codi-poc-e2-cluster)               │ │
│ │                                                 │ │
│ │ ┌──────────────┐    ┌─────────────────────────┐ │ │
│ │ │ theia IDE    │    │ Coditect Server         │ │ │
│ │ │ (Frontend)   │    │ (Knowledge Hub)         │ │ │
│ │ └──────────────┘    └─────────────────────────┘ │ │
│ │        ↓                         ↓              │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ FoundationDB Cluster (StatefulSet)          │ │ │
│ │ │ - User data                                 │ │ │
│ │ │ - Session state                             │ │ │
│ │ │ - Knowledge base index                      │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
│                                                     │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Cloud Storage (Knowledge Artifacts)             │ │
│ │ gs://serene-voltage-464305-n2-knowledge/        │ │
│ │ - Documentation embeddings                      │ │
│ │ - Code snippets                                 │ │
│ │ - Agent templates                               │ │
│ └─────────────────────────────────────────────────┘ │
│                                                     │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Vertex AI (Embeddings + RAG)                    │ │
│ │ - text-embedding-004                            │ │
│ │ - Context retrieval for agents                  │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Coditect Server Features:
- Knowledge Indexing:
  - Index all documentation from V4
  - Create embeddings using Vertex AI
  - Store in FoundationDB + Cloud Storage
- Agent Coordination:
  - Central registry of available agents
  - Task distribution and load balancing
  - Agent-to-agent communication (A2A protocol)
- Session Management:
  - Centralized session state
  - WebSocket connection registry
  - Pod lifecycle management
Implementation:
// src/coditect-server/main.rs
use actix_web::{web, App, HttpServer};
use foundationdb::{Database, RangeOption};
use serde::{Deserialize, Serialize};
// Placeholder: `vertex_ai::EmbeddingClient` stands in for whichever Vertex AI client wrapper is used.
use vertex_ai::EmbeddingClient;

pub struct CoditectServer {
    fdb: Database,
    embedding_client: EmbeddingClient,
    storage_bucket: String,
}

impl CoditectServer {
    pub async fn new() -> Result<Self, Box<dyn std::error::Error>> {
        // Connect to FoundationDB (default cluster file)
        let fdb = Database::default()?;

        // Initialize Vertex AI
        let embedding_client = EmbeddingClient::new(
            "serene-voltage-464305-n2",
            "text-embedding-004",
        )
        .await?;

        // Cloud Storage bucket
        let storage_bucket = "gs://serene-voltage-464305-n2-knowledge".to_string();

        Ok(Self {
            fdb,
            embedding_client,
            storage_bucket,
        })
    }

    // Index documentation
    pub async fn index_knowledge(
        &self,
        content: &str,
        metadata: serde_json::Value,
    ) -> Result<String, Box<dyn std::error::Error>> {
        // 1. Generate embedding
        let embedding = self.embedding_client.embed_text(content).await?;

        // 2. Store in FDB with embedding.
        //    FDB values are limited to 100 kB, so large documents need chunking.
        let doc_id = uuid::Uuid::new_v4().to_string();
        let fdb_tx = self.fdb.create_trx()?;
        let knowledge_entry = KnowledgeEntry {
            id: doc_id.clone(),
            content: content.to_string(),
            embedding,
            metadata,
            created_at: chrono::Utc::now().to_rfc3339(),
        };
        fdb_tx.set(
            format!("knowledge/{}", doc_id).as_bytes(),
            serde_json::to_vec(&knowledge_entry)?.as_slice(),
        );
        fdb_tx.commit().await?;

        // 3. Upload to Cloud Storage (backup)
        let storage_path = format!("{}/docs/{}.json", self.storage_bucket, doc_id);
        // Upload logic here...

        Ok(doc_id)
    }

    // Retrieve relevant knowledge (RAG)
    pub async fn retrieve_knowledge(
        &self,
        query: &str,
        top_k: usize,
    ) -> Result<Vec<KnowledgeEntry>, Box<dyn std::error::Error>> {
        // 1. Generate query embedding
        let query_embedding = self.embedding_client.embed_text(query).await?;

        // 2. Read knowledge entries from FDB (brute-force scan).
        //    The range end is a byte string because `\xff` is not valid in a UTF-8 string literal.
        //    Assumes the foundationdb crate's RangeOption API; a single get_range call
        //    returns one batch, so larger key spaces need paging.
        let fdb_tx = self.fdb.create_trx()?;
        let mut range = RangeOption::from((&b"knowledge/"[..], &b"knowledge/\xff"[..]));
        range.mode = foundationdb::options::StreamingMode::WantAll;
        let knowledge_docs = fdb_tx.get_range(&range, 1, false).await?;

        // 3. Rank by cosine similarity, skipping entries that fail to deserialize
        let mut results: Vec<(KnowledgeEntry, f64)> = knowledge_docs
            .iter()
            .filter_map(|kv| serde_json::from_slice::<KnowledgeEntry>(kv.value()).ok())
            .map(|entry| {
                let similarity = cosine_similarity(&query_embedding, &entry.embedding);
                (entry, similarity)
            })
            .collect();
        // total_cmp gives a total order, so the sort cannot panic on NaN
        results.sort_by(|a, b| b.1.total_cmp(&a.1));

        Ok(results.into_iter().take(top_k).map(|(entry, _)| entry).collect())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KnowledgeEntry {
    pub id: String,
    pub content: String,
    pub embedding: Vec<f32>,
    pub metadata: serde_json::Value,
    pub created_at: String,
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let magnitude_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let magnitude_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    // A zero vector has no direction; treat its similarity as 0 instead of dividing by zero
    if magnitude_a == 0.0 || magnitude_b == 0.0 {
        return 0.0;
    }
    (dot_product / (magnitude_a * magnitude_b)) as f64
}
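The ranking step can be tried in isolation from FDB and Vertex AI. The sketch below is a self-contained worked example with hand-picked two-dimensional vectors; `cosine` and `top_k` are illustrative names, and the zero-magnitude guard is an added assumption about how degenerate embeddings should be treated.

```rust
/// Cosine similarity in f64; returns 0.0 when either vector has zero magnitude.
pub fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| (*x as f64) * (*y as f64)).sum();
    let norm_a: f64 = a.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the IDs of the `k` documents most similar to `query`, best first.
pub fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f64)> = docs
        .iter()
        .map(|(id, embedding)| (*id, cosine(query, embedding)))
        .collect();
    // Stable sort, descending by score; total_cmp avoids panics on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```

For the query `[1.0, 0.0]`, a document at `[1.0, 0.0]` scores 1.0, one at `[1.0, 1.0]` scores about 0.707, and documents at `[0.0, 1.0]` or `[0.0, 0.0]` score 0.0. A linear scan like this is adequate for a small corpus; a larger one would likely call for an approximate nearest-neighbor index.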
Deployment:
# k8s/coditect-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-server
  namespace: coditect-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-server
  template:
    metadata:
      labels:
        app: coditect-server
    spec:
      containers:
        - name: coditect-server
          image: gcr.io/serene-voltage-464305-n2/coditect-server:latest
          ports:
            - containerPort: 8080
          env:
            - name: FDB_CLUSTER_STRING
              value: "coditect:production@10.128.0.8:4500"
            - name: GCP_PROJECT
              value: "serene-voltage-464305-n2"
            - name: KNOWLEDGE_BUCKET
              value: "gs://serene-voltage-464305-n2-knowledge"
            - name: VERTEX_AI_REGION
              value: "us-central1"
          volumeMounts:
            - name: fdb-cluster-config
              mountPath: /etc/foundationdb
      volumes:
        - name: fdb-cluster-config
          configMap:
            name: fdb-cluster-config
---
apiVersion: v1
kind: Service
metadata:
  name: coditect-server
  namespace: coditect-app
spec:
  selector:
    app: coditect-server
  ports:
    - port: 8080
      targetPort: 8080
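On the server side, the four environment variables set in the Deployment could be read into a typed configuration at startup so that a missing value fails fast. This is a minimal sketch; `ServerConfig` and its methods are hypothetical, and the lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
/// Startup configuration mirroring the Deployment's env entries (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ServerConfig {
    pub fdb_cluster_string: String,
    pub gcp_project: String,
    pub knowledge_bucket: String,
    pub vertex_ai_region: String,
}

impl ServerConfig {
    /// Builds the config from any name -> value lookup; empty values count as missing.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |name: &str| {
            lookup(name)
                .filter(|value| !value.is_empty())
                .ok_or_else(|| format!("missing required variable {}", name))
        };
        Ok(Self {
            fdb_cluster_string: get("FDB_CLUSTER_STRING")?,
            gcp_project: get("GCP_PROJECT")?,
            knowledge_bucket: get("KNOWLEDGE_BUCKET")?,
            vertex_ai_region: get("VERTEX_AI_REGION")?,
        })
    }

    /// Convenience wrapper that reads from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

Calling `ServerConfig::from_env()` at the top of `main` would surface a misconfigured pod as a clear startup error instead of a failure on the first request.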
Deliverables:
- Build Coditect server container
- Create Cloud Storage bucket for knowledge artifacts
- Enable Vertex AI embeddings API
- Deploy to GKE
- Index V4 documentation
- Create agent coordination endpoints
🎯 Priority Task List
Week 1: WebSocket Fix (URGENT)
- Diagnose WebSocket Issue:
  # Check current WebSocket gateway deployment
  kubectl get all -n coditect-app | grep websocket
  # Test pod-to-pod connectivity
  kubectl exec -n coditect-app <api-pod> -- ping <workspace-pod-ip>
  # Check logs
  kubectl logs -n coditect-app -l app=coditect-websocket --tail=100
- Implement Sidecar Solution:
  - Update workspace pod YAML with WebSocket sidecar
  - Build combined image (workspace + WebSocket)
  - Test localhost communication
  - Deploy to test namespace
  - Verify WebSocket connections work
- Update API to use new WebSocket endpoint:
  - Change WebSocket URL from ws://<pod-ip>:8765 to ws://localhost:8765
  - Update frontend WebSocket client
  - Test end-to-end
Week 2: JWT + FDB Integration (HIGH)
- Implement JWT Middleware:
  - Create jwt_auth.rs with FDB session lookup
  - Add session expiration logic
  - Update last_access_at on each request
  - Add to API routes
- Update Login/Register:
  - Store session in FDB on login
  - Include sessionId in JWT payload
  - Return session data to frontend
- Session Cleanup Job:
  - Create CronJob to delete expired sessions
  - Run daily at 2 AM UTC
  - Log cleanup statistics
Week 3-4: Coditect Server + KBaaS (MEDIUM)
- Infrastructure Setup:
  - Create Cloud Storage bucket: serene-voltage-464305-n2-knowledge
  - Enable Vertex AI API
  - Create service account with Vertex AI permissions
- Build Coditect Server:
  - Implement knowledge indexing
  - Implement RAG retrieval
  - Create agent coordination endpoints
  - Add health check endpoint
- Deploy and Index:
  - Deploy to GKE
  - Index all V4 documentation
  - Test knowledge retrieval
  - Integrate with agents
📊 Success Metrics
Week 1 (WebSocket):
- ✅ WebSocket connections stable (no disconnects)
- ✅ Pod-to-pod communication working
- ✅ Real-time messages flowing
Week 2 (JWT/FDB):
- ✅ JWT validation against FDB sessions
- ✅ Session expiration enforced
- ✅ No unauthorized access
Week 3-4 (KBaaS):
- ✅ Documentation fully indexed
- ✅ RAG retrieval working (< 500ms)
- ✅ Agents using knowledge base
- ✅ 95%+ relevance on queries
🔗 Related Documents
- V5 Migration Plan: docs/v4-analysis/v5-migration-plan-serena-reuse.md
- Current Deployment: docs/v4-analysis/current-deployment-status.md
- V4 Frontend Integration: docs/v4-analysis/v4-theia-frontend-integration.md
- V4 CLAUDE.md: coditect-v4/CLAUDE.md
Next Steps:
- Fix WebSocket pod OS connection (sidecar pattern)
- Integrate JWT validation with FDB sessions
- Deploy Coditect server with KBaaS to GKE