Socket.IO CDN Caching Fix
Date: 2025-10-20
Issue: Socket.IO 400 Bad Request errors preventing theia IDE from loading
Status: ✅ FIXED - CDN disabled
Problem
Socket.IO polling requests were failing with HTTP 400 errors when accessing theia IDE at https://coditect.ai/theia.
Symptoms:
- Repeated 400 errors on /theia/socket.io/?EIO=4&transport=polling&t=XXX&sid=YYY
- WebSocket upgrade failures: wss://coditect.ai/theia/socket.io/?EIO=4&transport=websocket&sid=XXX failed
- theia IDE not loading in browser
- BUT Socket.IO worked perfectly inside the cluster when tested with curl
Root Cause
GCP Cloud CDN was caching Socket.IO requests with stale session IDs.
BackendConfig had:
spec:
  cdn:
    cachePolicy:
      includeHost: true
      includeProtocol: true
      includeQueryString: false # ← PROBLEM: Socket.IO query params ignored
    enabled: true # ← CDN caching Socket.IO requests
Why this broke Socket.IO:
- Socket.IO uses query parameters for session management: ?sid=XXXXX&t=YYYYY
- CDN cache policy had includeQueryString: false → query params ignored in cache key
- All Socket.IO requests got same cached response regardless of session ID
- Cached response contained stale session ID → 400 Bad Request
- WebSocket upgrade also failed because session handshake never completed
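The cache-key collision described above can be illustrated with a toy model. This is only a sketch of the reasoning, not a model of how Cloud CDN actually behaves: `cache_key`, `ToyCache`, and `fake_origin` are hypothetical names invented for the example, and the `sid` values are placeholders.

```python
from urllib.parse import urlsplit


def cache_key(url: str, include_query_string: bool) -> str:
    """Build a toy cache key from host and path, optionally adding the query string."""
    parts = urlsplit(url)
    key = f"{parts.netloc}{parts.path}"
    if include_query_string and parts.query:
        key += "?" + parts.query
    return key


def fake_origin(url: str) -> str:
    """Stand-in backend that answers for whichever session the URL names."""
    return "response for " + urlsplit(url).query


class ToyCache:
    """Hypothetical in-memory cache used only to show the key collision."""

    def __init__(self, include_query_string: bool) -> None:
        self.include_query_string = include_query_string
        self.store = {}

    def fetch(self, url: str) -> str:
        key = cache_key(url, self.include_query_string)
        if key not in self.store:
            # Cache miss: ask the fake origin and remember the answer.
            self.store[key] = fake_origin(url)
        return self.store[key]


BASE = "https://coditect.ai/theia/socket.io/?EIO=4&transport=polling"
url_a = BASE + "&sid=AAA"
url_b = BASE + "&sid=BBB"

ignoring_query = ToyCache(include_query_string=False)
first = ignoring_query.fetch(url_a)
second = ignoring_query.fetch(url_b)  # cache hit: session B receives session A's response

keyed_by_query = ToyCache(include_query_string=True)
third = keyed_by_query.fetch(url_a)
fourth = keyed_by_query.fetch(url_b)  # distinct key, so the origin is asked again

print(first == second)  # True
print(third == fourth)  # False
```

With the query string left out of the key, both session URLs collapse to the same entry, which matches the symptom of every client seeing one cached response.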
Solution
Disabled CDN in BackendConfig to allow Socket.IO session-based polling to work correctly.
File: k8s/backend-config-no-cdn.yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: coditect-backend-config
  namespace: coditect-app
spec:
  # CDN DISABLED - Socket.IO requires session affinity without caching
  cdn:
    enabled: false
  connectionDraining:
    drainingTimeoutSec: 60
  healthCheck:
    checkIntervalSec: 10
    healthyThreshold: 2
    port: 80
    requestPath: /health
    timeoutSec: 5
    type: HTTP
    unhealthyThreshold: 3
  # Session affinity REQUIRED for Socket.IO
  sessionAffinity:
    affinityCookieTtlSec: 86400 # 24 hours; only takes effect with affinityType GENERATED_COOKIE
    affinityType: CLIENT_IP
  # 24-hour timeout for long-lived WebSocket connections
  timeoutSec: 86400
Applied with:
kubectl apply -f k8s/backend-config-no-cdn.yaml
Verification
# Check CDN is disabled
kubectl get backendconfig -n coditect-app coditect-backend-config -o jsonpath='{.spec.cdn.enabled}'
# Output: false
# Check ingress still references the BackendConfig
kubectl get ingress -n coditect-app coditect-production-ingress -o jsonpath='{.metadata.annotations.cloud\.google\.com/backend-config}'
# Output: {"default": "coditect-backend-config"}
Timeline
Builds #25-31: nginx Configuration (Oct 19-20, 2025)
- Build #29: Added /bundle.js location block
- Build #30: Added Connection header mapping for WebSocket
- Build #31: Added dedicated Socket.IO location block with regex matching
- Result: Socket.IO worked INSIDE cluster but failed from browser
BackendConfig Fix (Oct 20, 2025)
- Discovery: CDN caching was the root cause (not nginx)
- Fix: Disabled CDN in BackendConfig
- Status: Applied and propagating to GCP load balancer
nginx Configuration (Already Correct)
The nginx configuration in Build #31 was already correct. It properly routes Socket.IO with:
# Socket.IO - MUST come before /theia location for proper matching
location ~ ^/theia/socket\.io/ {
    rewrite ^/theia(.*)$ $1 break;
    proxy_pass http://localhost:3000;
    proxy_http_version 1.1;
    # WebSocket upgrade support
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    # Socket.IO requirements
    proxy_buffering off;
    proxy_cache off;
    # Timeouts (24 hours)
    proxy_read_timeout 86400;
    proxy_send_timeout 86400;
}
Tested inside cluster:
# Direct to theia backend - works
curl "http://localhost:3000/socket.io/?EIO=4&transport=polling"
# HTTP/1.1 200 OK
# 0{"sid":"mExhI5ZBcHQt0b0TAASG","upgrades":["websocket"],...}
# Through nginx - works
curl "http://localhost/theia/socket.io/?EIO=4&transport=polling"
# HTTP/1.1 200 OK
# 0{"sid":"VzxUlVmTF3Drtpr7AATQ","upgrades":["websocket"],...}
Testing
After the GCP load balancer picks up the BackendConfig change (typically 2-5 minutes):
- Visit https://coditect.ai/theia
- Open browser DevTools Console
- Check for Socket.IO errors:
  - ✅ Should see no 400 errors
  - ✅ Should see successful Socket.IO connection
  - ✅ theia IDE should load
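The browser check can be complemented by comparing the bodies of two fresh polling handshakes fetched from outside the cluster. With EIO=4, the polling handshake body is an "open" packet: the character 0 followed by JSON, as in the curl output above. If two separate handshakes return the same sid, a cached response is probably being served. The sketch below covers only the parsing and comparison; fetching the bodies is left out, the function names are hypothetical, and the sample bodies are made-up placeholders.

```python
import json


def parse_open_packet(body: str) -> dict:
    """Parse an Engine.IO "open" packet of the form '0{...json...}'."""
    if not body.startswith("0"):
        raise ValueError(f"not an open packet: {body[:20]!r}")
    return json.loads(body[1:])


def handshakes_look_fresh(body_one: str, body_two: str) -> bool:
    """Return True when two separate handshakes yielded different session IDs."""
    sid_one = parse_open_packet(body_one)["sid"]
    sid_two = parse_open_packet(body_two)["sid"]
    return sid_one != sid_two


# Placeholder bodies in the shape of the handshake responses (sids are invented).
sample_one = '0{"sid":"sid-placeholder-1","upgrades":["websocket"]}'
sample_two = '0{"sid":"sid-placeholder-2","upgrades":["websocket"]}'

print(parse_open_packet(sample_one)["upgrades"])      # ['websocket']
print(handshakes_look_fresh(sample_one, sample_two))  # True
print(handshakes_look_fresh(sample_one, sample_one))  # False
```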
Future Optimization (Optional)
To re-enable CDN while preserving Socket.IO functionality, there are two options. The path-based bypass (Option 2) is the better fit:
Option 1: Include query string in cache key (NOT recommended for Socket.IO):
spec:
  cdn:
    cachePolicy:
      includeQueryString: true # Include Socket.IO session params
    enabled: true
Problem: Still caches Socket.IO responses, just with different keys. Session IDs expire quickly.
Option 2: Path-based CDN bypass (BETTER): Use separate BackendConfigs for static content (with CDN) and dynamic content (without CDN).
# backend-config-static.yaml (for /, /assets/*, etc.)
spec:
  cdn:
    enabled: true
    cachePolicy:
      includeQueryString: false

# backend-config-dynamic.yaml (for /theia/*, /api/*)
spec:
  cdn:
    enabled: false
Then annotate services with different BackendConfigs:
metadata:
  annotations:
    cloud.google.com/backend-config: '{"ports": {"80":"backend-config-static"}}'
Complexity: Requires splitting service into multiple backends with path-based routing.
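To make the two annotation shapes used in this document concrete ("default" and "ports"), the following sketch resolves which BackendConfig name a given Service port would get. It assumes that a per-port entry takes precedence over "default"; `resolve_backend_config` is a hypothetical helper for illustration, not part of any GKE tooling.

```python
import json
from typing import Optional


def resolve_backend_config(annotation: str, port: str) -> Optional[str]:
    """Pick the BackendConfig name for a Service port from the annotation JSON."""
    parsed = json.loads(annotation)
    per_port = parsed.get("ports", {})
    if port in per_port:
        # Assumption: an explicit per-port entry wins over "default".
        return per_port[port]
    return parsed.get("default")


static_annotation = '{"ports": {"80":"backend-config-static"}}'
default_annotation = '{"default": "coditect-backend-config"}'

print(resolve_backend_config(static_annotation, "80"))   # backend-config-static
print(resolve_backend_config(static_annotation, "443"))  # None
print(resolve_backend_config(default_annotation, "80"))  # coditect-backend-config
```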
References
- Build History: See CLAUDE.md "API URL Configuration" section
- nginx Config: nginx-combined.conf:26-49 (Socket.IO location block)
- BackendConfig: k8s/backend-config-no-cdn.yaml
- Ingress: kubectl get ingress -n coditect-app coditect-production-ingress
Key Lessons
- CDN and WebSocket/Socket.IO don't mix - Session-based protocols require direct connection
- Test at multiple layers - Internal cluster tests showed nginx was correct; external tests revealed CDN issue
- Query parameters matter - Socket.IO session IDs in query params must not be ignored
- Session affinity is critical - Both service-level and ingress-level session affinity required
- GCP BackendConfig propagation - Takes 2-5 minutes for load balancer to pick up config changes