Kubernetes + Terraform + Helm Quick Start Guide
⏱️ Time to Deploy: 15 minutes | 💰 Cost: ~$200/month | 🎯 Difficulty: Intermediate
This guide takes you from zero to production-ready Coditect V5 infrastructure in three phases: Terraform provisioning, Kubernetes verification, and a planned (not yet implemented) Helm migration.
📋 Prerequisites Checklist
Before starting, ensure you have:
- Google Cloud Account with billing enabled
- Project created (e.g., serene-voltage-464305-n2)
- gcloud CLI installed and authenticated
- Terraform >= 1.5.0 installed
- kubectl installed
- Terminal access (bash/zsh)
- 15 minutes of focused time
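The tool requirements in the checklist above can be checked from the terminal before starting. The sketch below is a hypothetical preflight helper, not part of the Coditect repository: `missing_tools` and `version_ge` are illustrative names, and the version comparison assumes plain dotted numeric versions such as 1.5.0.

```shell
# missing_tools TOOL...: print every tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# version_ge A B: succeed when dotted version A is >= B.
# Compares the first three numeric fields; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}
```

For example, `missing_tools gcloud terraform kubectl` prints nothing when all three CLIs are installed, and `version_ge "$(terraform version -json | sed -n 's/.*"terraform_version": *"\([^"]*\)".*/\1/p')" 1.5.0` succeeds when the installed Terraform meets the minimum version.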
🚀 Phase 1: Infrastructure Setup (Terraform)
Duration: 12 minutes | What you'll deploy: VPC, GKE cluster, FoundationDB, API v5
Step 1.1: Enable GCP APIs (1 minute)
# Set your project ID
export PROJECT_ID="serene-voltage-464305-n2"
gcloud config set project $PROJECT_ID
# Enable required APIs
gcloud services enable \
compute.googleapis.com \
container.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com \
logging.googleapis.com \
monitoring.googleapis.com
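# Verify APIs are enabled (prints one API name per line, exact matches only)
gcloud services list --enabled --format="value(config.name)" | grep -E '^(compute|container|artifactregistry)\.googleapis\.com$'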
Expected Output:
compute.googleapis.com
container.googleapis.com
artifactregistry.googleapis.com
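Checking the list by eye covers only three of the six APIs enabled above. A minimal sketch of a stricter check follows; `missing_apis` is a hypothetical helper name, and it assumes the enabled API names arrive on stdin one per line.

```shell
# missing_apis API...: read enabled API names (one per line) on stdin
# and print each requested API that is absent from that list.
missing_apis() {
  enabled=$(cat)
  for api in "$@"; do
    printf '%s\n' "$enabled" | grep -qxF -- "$api" || printf '%s\n' "$api"
  done
}
```

A possible invocation is `gcloud services list --enabled --format="value(config.name)" | missing_apis compute.googleapis.com container.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com logging.googleapis.com monitoring.googleapis.com`; empty output means every required API is enabled.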
Step 1.2: Configure Terraform (2 minutes)
# Navigate to Terraform directory
cd /workspace/PROJECTS/t2/infrastructure/terraform
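# Copy example configuration (the heredoc below overwrites this copy with your values)
cp terraform.tfvars.example terraform.tfvars
# Generate JWT secret (it is written into terraform.tfvars below, so keep that file out of version control)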
JWT_SECRET=$(openssl rand -base64 32)
echo "Generated JWT Secret: $JWT_SECRET"
# Update terraform.tfvars with your values
cat > terraform.tfvars << EOF
# Project Configuration
project_id = "$PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
# Network Configuration
network_name = "coditect-vpc"
subnet_cidr_range = "10.128.0.0/20"
pods_cidr_range = "10.4.0.0/14"
services_cidr_range = "10.0.32.0/20"
allowed_ip_ranges = ["$(curl -s ifconfig.me)/32"] # Your IP only
# GKE Configuration
cluster_name = "codi-poc-e2-cluster"
node_pool_config = {
name = "default-pool"
machine_type = "e2-medium"
disk_size_gb = 50
disk_type = "pd-standard"
initial_node_count = 3
min_node_count = 1
max_node_count = 10
preemptible = false
}
# FoundationDB Configuration
fdb_namespace = "foundationdb"
fdb_cluster_name = "fdb-cluster"
fdb_replicas = 3
fdb_storage_class = "standard-rwo"
fdb_storage_size = "10Gi"
fdb_cpu_request = "500m"
fdb_memory_request = "2Gi"
fdb_cpu_limit = "2000m"
fdb_memory_limit = "4Gi"
# API Configuration
api_namespace = "coditect-app"
api_deployment_name = "coditect-api-v5"
api_replicas = 3
image_registry = "us-central1-docker.pkg.dev/$PROJECT_ID/coditect"
api_image_tag = "latest"
jwt_secret = "$JWT_SECRET"
api_service_type = "LoadBalancer"
api_service_port = 80
api_cpu_request = "100m"
api_memory_request = "256Mi"
api_cpu_limit = "1000m"
api_memory_limit = "512Mi"
# Domain Configuration
domains = ["coditect.ai", "www.coditect.ai"]
# Labels
labels = {
environment = "production"
project = "coditect-v5"
managed_by = "terraform"
}
EOF
echo "✅ Configuration complete!"
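The subnet, pod, and service ranges in the network configuration have to be disjoint, since overlapping ranges would conflict inside the VPC. The sketch below shows one way to check that before running Terraform; `cidr_bounds` and `cidrs_overlap` are hypothetical helpers that handle IPv4 only.

```shell
# cidr_bounds CIDR: print the first and last address of an IPv4 CIDR block
# as two unsigned integers.
cidr_bounds() {
  awk -v cidr="$1" 'BEGIN {
    split(cidr, parts, "/"); split(parts[1], o, ".")
    ip = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    size = 2 ^ (32 - parts[2])
    start = ip - (ip % size)
    printf "%.0f %.0f\n", start, start + size - 1
  }'
}

# cidrs_overlap A B: succeed when the two CIDR blocks share any address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

With the values used in this guide, `cidrs_overlap 10.128.0.0/20 10.4.0.0/14` and `cidrs_overlap 10.4.0.0/14 10.0.32.0/20` both fail, which is the desired result.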
Step 1.3: Initialize Terraform (1 minute)
# Initialize Terraform (downloads providers)
terraform init
# Expected output:
# Terraform has been successfully initialized!
What this does:
- Downloads Google Cloud provider plugins
- Initializes backend (local state)
- Installs the modules referenced by the configuration
Step 1.4: Plan Deployment (2 minutes)
# Create execution plan
terraform plan -out=tfplan
# Review output - you should see:
# - ~35-40 resources to be created
# - VPC network, subnet, firewall rules
# - GKE cluster and node pool
# - FoundationDB StatefulSet (3 pods)
# - API deployment (3 pods)
# - Services, ConfigMaps, Secrets
Expected Resources:
Plan: 38 to add, 0 to change, 0 to destroy.
⚠️ IMPORTANT: Review the plan carefully. If it shows a non-zero "to destroy" count or any unexpected changes, STOP and investigate.
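The destroy check can also be scripted. The sketch below parses the summary line of the plan output; `plan_destroy_count` and `plan_is_additive` are hypothetical helper names, and the parser assumes uncolored output (for example from `terraform plan -no-color`).

```shell
# plan_destroy_count: read `terraform plan -no-color` output on stdin and
# print the number of resources the plan would destroy.
# Prints nothing when no "Plan:" summary line is present.
plan_destroy_count() {
  sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy\.$/\1/p'
}

# plan_is_additive: succeed only when a summary line exists and destroys nothing.
plan_is_additive() {
  [ "$(plan_destroy_count)" = "0" ]
}
```

One way to use it is to save the output with `terraform plan -no-color -out=tfplan | tee plan.txt` and then run `plan_is_additive < plan.txt || echo "STOP: review the plan"`.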
Step 1.5: Apply Infrastructure (6 minutes)
# Apply the plan
terraform apply tfplan
# Budgeted at about 6 minutes; GKE cluster creation can stretch it to 10-12 minutes
# Terraform logs each resource as "<address>: Creating..." (there is no numbered counter).
# Expect roughly this order:
# VPC network
# GKE cluster
# FoundationDB pods
# API deployment
# The run ends with "Apply complete!"
⏳ Grab coffee while this runs (~10 minutes)
Expected Final Output:
Apply complete! Resources: 38 added, 0 changed, 0 destroyed.
Outputs:
api_service_ip = "34.123.45.67"
cluster_endpoint = "https://35.223.45.78"
cluster_name = "codi-poc-e2-cluster"
connection_info = {
"api_url" = "http://34.123.45.67/api/v5"
"cluster_name" = "codi-poc-e2-cluster"
"fdb_coordinator" = "10.128.0.8:4500"
"region" = "us-central1"
}
fdb_cluster_ip = "10.128.0.8"
kubectl_config_command = "gcloud container clusters get-credentials codi-poc-e2-cluster --region us-central1 --project serene-voltage-464305-n2"
load_balancer_ip = "34.123.45.67"
🔍 Phase 2: Verification (Kubernetes)
Duration: 2 minutes | What you'll verify: Cluster, pods, services
Step 2.1: Configure kubectl (30 seconds)
# Get cluster credentials
terraform output -raw kubectl_config_command | bash
# Verify connection
kubectl get nodes
# Expected output:
# NAME STATUS ROLES AGE VERSION
# gke-codi-poc-e2-cluster-default-pool-... Ready <none> 5m30s v1.28.3-gke.1203000
# gke-codi-poc-e2-cluster-default-pool-... Ready <none> 5m28s v1.28.3-gke.1203000
# gke-codi-poc-e2-cluster-default-pool-... Ready <none> 5m29s v1.28.3-gke.1203000
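To turn the node check into a pass/fail test, the node list can be counted rather than read. This is a minimal sketch; `ready_node_count` is a hypothetical helper that expects header-less `kubectl get nodes` output on stdin.

```shell
# ready_node_count: read `kubectl get nodes --no-headers` output on stdin
# and print how many nodes report a STATUS of exactly "Ready".
ready_node_count() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

For the three-node pool configured in this guide, `[ "$(kubectl get nodes --no-headers | ready_node_count)" -eq 3 ]` succeeds once every node is up. Nodes shown as `Ready,SchedulingDisabled` are deliberately not counted.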
Step 2.2: Check FoundationDB (30 seconds)
# Check FDB pods
kubectl get pods -n foundationdb
# Expected output:
# NAME READY STATUS RESTARTS AGE
# fdb-cluster-0 1/1 Running 0 3m
# fdb-cluster-1 1/1 Running 0 3m
# fdb-cluster-2 1/1 Running 0 3m
# Verify FDB cluster status
kubectl exec -n foundationdb fdb-cluster-0 -- fdbcli --exec "status"
# Expected output (key lines):
# Replication health: Healthy
# Storage server count: 3
✅ SUCCESS INDICATOR: Replication health: Healthy
Step 2.3: Check API Deployment (30 seconds)
# Check API pods
kubectl get pods -n coditect-app
# Expected output:
# NAME READY STATUS RESTARTS AGE
# coditect-api-v5-xxxxxxxxxx-xxxxx 1/1 Running 0 2m
# coditect-api-v5-xxxxxxxxxx-xxxxx 1/1 Running 0 2m
# coditect-api-v5-xxxxxxxxxx-xxxxx 1/1 Running 0 2m
# Check API service
kubectl get svc -n coditect-app
# Expected output:
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# coditect-api-v5 LoadBalancer 10.0.40.123 34.123.45.67 80:30123/TCP 2m
✅ SUCCESS INDICATOR: All pods 1/1 Running, service has EXTERNAL-IP
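The "all pods 1/1 Running" rule can be expressed as a filter over the pod list. The sketch below uses a hypothetical helper, `not_ready_pods`, and assumes header-less `kubectl get pods` output on stdin.

```shell
# not_ready_pods: read `kubectl get pods --no-headers` output on stdin and
# print the name of each pod that is not Running with all containers ready.
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Running `kubectl get pods -n coditect-app --no-headers | not_ready_pods` prints nothing when the deployment is healthy. The built-in `kubectl wait --for=condition=Ready pod --all -n coditect-app --timeout=300s` is an alternative when blocking until readiness is preferred.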
Step 2.4: Test API Health (30 seconds)
# Get API IP
API_IP=$(terraform output -raw api_service_ip)
# Test health endpoint
curl http://$API_IP/api/v5/health
# Expected output:
# {"success":true,"data":{"service":"coditect-v5-api","status":"healthy"}}
# Test ready endpoint
curl http://$API_IP/api/v5/ready
# Expected output:
# {"success":true,"data":{"service":"coditect-v5-api","status":"ready","database":"connected"}}
✅ SUCCESS INDICATOR: Both endpoints return "success":true
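Right after an apply, the endpoints may need a short while before they answer, so a single curl can fail spuriously. The sketch below polls instead; `retry` and `reports_success` are hypothetical helpers, and the attempt count and delay are arbitrary choices.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY_SECONDS between tries; fail after ATTEMPTS tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  until "$@"; do
    [ "$try" -lt "$attempts" ] || return 1
    try=$((try + 1))
    sleep "$delay"
  done
}

# reports_success: succeed when the JSON body on stdin contains "success":true.
reports_success() {
  grep -q '"success":[[:space:]]*true'
}
```

A possible use is to define `check_ready() { curl -fsS "http://$API_IP/api/v5/ready" | reports_success; }` and then call `retry 10 6 check_ready`, which gives the API about a minute to come up.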
📦 Phase 3: Helm Charts (Future Enhancement)
Status: 🚧 Not implemented yet (planned once the production deployment is stable)
What Helm will provide:
- Better application packaging
- Versioned releases
- Easier rollbacks
- Template-based configuration
Preview of future Helm workflow:
# Future: Deploy with Helm instead of Terraform Kubernetes provider
helm install coditect-api-v5 ./helm/coditect-api-v5 \
--namespace coditect-app \
--set image.tag=v1.2.3 \
--set jwt.secret="$JWT_SECRET"
# Future: Upgrade (same namespace; --reuse-values keeps the JWT secret set at install)
helm upgrade coditect-api-v5 ./helm/coditect-api-v5 \
--namespace coditect-app \
--reuse-values \
--set image.tag=v1.2.4
# Future: Rollback to revision 1
helm rollback coditect-api-v5 1 --namespace coditect-app
When to implement: After production deployment is stable
📊 Quick Reference
Important URLs and IPs
# Get all connection info
terraform output connection_info
# Get specific values
terraform output api_service_ip # API external IP
terraform output cluster_endpoint # GKE master endpoint
terraform output fdb_cluster_ip # FoundationDB internal IP
Useful Commands
# View logs
kubectl logs -n coditect-app -l app=coditect-api-v5 --tail=50 -f
# Scale API
kubectl scale deployment -n coditect-app coditect-api-v5 --replicas=5
# Restart API pods
kubectl rollout restart deployment -n coditect-app coditect-api-v5
# Check HPA status
kubectl get hpa -n coditect-app
# Check pod resources
kubectl top pods -n coditect-app
kubectl top pods -n foundationdb
Terraform Commands
# View all outputs
terraform output
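# Sync state and outputs with real infrastructure (if resources were changed manually)
# (replaces the deprecated "terraform refresh")
terraform apply -refresh-only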
# Check for drift
terraform plan
# Apply changes
terraform apply
# Destroy everything (careful!)
terraform destroy
🐛 Troubleshooting
Issue 1: API Pods in CrashLoopBackOff
Symptom:
kubectl get pods -n coditect-app
# coditect-api-v5-xxx 0/1 CrashLoopBackOff 3 2m
Solution:
# Check logs
kubectl logs -n coditect-app <pod-name>
# Common causes:
# 1. FDB not ready - Wait 2-3 minutes for FDB cluster
# 2. Wrong FDB cluster file - Check ConfigMap
kubectl get configmap -n coditect-app coditect-api-v5-fdb-config -o yaml
# 3. JWT secret missing - Check Secret
kubectl get secret -n coditect-app coditect-api-v5-jwt -o yaml
Issue 2: LoadBalancer Has No External IP
Symptom:
kubectl get svc -n coditect-app
# coditect-api-v5 LoadBalancer 10.0.40.123 <pending> 80:30123/TCP 5m
Solution:
# Wait up to 5 minutes for IP allocation
kubectl get svc -n coditect-app coditect-api-v5 --watch
# Check events
kubectl describe svc -n coditect-app coditect-api-v5
# Verify the regional external IP quota (if stuck on <pending>)
gcloud compute regions describe us-central1 --project $PROJECT_ID | grep -B 1 -A 1 IN_USE_ADDRESSES
Issue 3: FoundationDB "Replication Unhealthy"
Symptom:
kubectl exec -n foundationdb fdb-cluster-0 -- fdbcli --exec "status"
# Replication health: UNHEALTHY
Solution:
# Check all FDB pods are running
kubectl get pods -n foundationdb
# If pod is stuck in Pending:
kubectl describe pod -n foundationdb fdb-cluster-0
# Common cause: PVC not bound
kubectl get pvc -n foundationdb
# Wait 2-3 minutes for cluster to stabilize
# FDB cluster initialization takes time
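When a pod stays in Pending, the claim list shows which volume is holding it up. A minimal sketch follows; `unbound_pvcs` is a hypothetical helper that expects header-less `kubectl get pvc` output on stdin.

```shell
# unbound_pvcs: read `kubectl get pvc --no-headers` output on stdin and
# print the name and status of each claim whose STATUS is not "Bound".
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 ": " $2 }'
}
```

`kubectl get pvc -n foundationdb --no-headers | unbound_pvcs` prints one line per stuck claim and nothing when all claims are bound.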
Issue 4: Terraform Apply Fails
Symptom:
Error: Error creating Network: googleapi: Error 409: Already exists
Solution:
# Option 1: Import existing resource
terraform import module.networking.google_compute_network.vpc \
projects/$PROJECT_ID/global/networks/coditect-vpc
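# Option 2: Delete the conflicting network, then re-apply
# (terraform destroy only removes resources already tracked in state)
gcloud compute networks delete coditect-vpc --project $PROJECT_ID
terraform apply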
# Option 3: Start fresh (nuclear option)
terraform destroy
terraform apply
Issue 5: kubectl Can't Connect to Cluster
Symptom:
Unable to connect to the server: dial tcp: lookup xxx on 8.8.8.8:53: no such host
Solution:
# Re-authenticate
gcloud auth login
gcloud config set project $PROJECT_ID
# Get credentials again
gcloud container clusters get-credentials codi-poc-e2-cluster \
--region us-central1 \
--project $PROJECT_ID
# Verify
kubectl cluster-info
🎯 Next Steps
Immediate (Post-Deployment)
- Configure DNS (if using custom domain):
# Get LoadBalancer IP
terraform output load_balancer_ip
# Create A record in your DNS provider:
# coditect.ai -> <LoadBalancer IP>
# Wait for SSL certificate to provision (~15 minutes)
kubectl get managedcertificate -n coditect-app
- Set up Monitoring:
# View in Cloud Console
echo "https://console.cloud.google.com/kubernetes/clusters/details/us-central1/codi-poc-e2-cluster?project=$PROJECT_ID"
# Create dashboard
echo "https://console.cloud.google.com/monitoring/dashboards?project=$PROJECT_ID"
- Test API Endpoints:
API_IP=$(terraform output -raw api_service_ip)
# Health check
curl http://$API_IP/api/v5/health
# Register user (example)
curl -X POST http://$API_IP/api/v5/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "test@example.com",
"password": "SecurePass123!",
"first_name": "Test",
"last_name": "User"
}'
Short-Term (Week 1)
- Set Up Remote State:
# Create GCS bucket for Terraform state
gsutil mb gs://$PROJECT_ID-terraform-state
gsutil versioning set on gs://$PROJECT_ID-terraform-state
# Update terraform/main.tf backend configuration
# Uncomment backend "gcs" block
# Migrate state
terraform init -migrate-state
- Configure Backups:
- FoundationDB: Set up continuous backup to GCS
- Terraform state: Already versioned in GCS
- Configuration: Git repository backups
- Set Up Alerts:
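# Create alert policy for pod failures
# (incomplete as shown: the condition also needs --condition-filter and --if, or supply the policy with --policy-from-file)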
gcloud alpha monitoring policies create \
--notification-channels=<channel-id> \
--display-name="API Pod Failures" \
--condition-display-name="Pod crash rate > 2/min"
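For the remote state step in this list, it may help to see what a `backend "gcs"` block typically looks like. The sketch below only prints one; `render_gcs_backend` is a hypothetical helper, the prefix value is an assumption, and the block already present in terraform/main.tf may differ.

```shell
# render_gcs_backend BUCKET PREFIX: print a Terraform backend "gcs" block
# that stores state in BUCKET under the PREFIX path.
render_gcs_backend() {
  cat <<EOF
terraform {
  backend "gcs" {
    bucket = "$1"
    prefix = "$2"
  }
}
EOF
}
```

For example, `render_gcs_backend "${PROJECT_ID}-terraform-state" "coditect/state"` prints a block to compare against the commented-out one. A configuration may declare only one backend, so use it as a reference rather than writing it to a second file.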
Medium-Term (Month 1)
- Implement GitOps with ArgoCD:
- Install ArgoCD on cluster
- Configure application sync
- Set up GitHub Actions for CI/CD
- Migrate to Helm Charts:
- Create Helm chart for API deployment
- Replace Terraform Kubernetes provider
- Implement versioned releases
- Multi-Environment Setup:
- Create environments/dev/
- Create environments/staging/
- Establish promotion workflow
- Create
- Cost Optimization:
- Review actual resource usage
- Right-size node pool and pods
- Apply committed use discounts
💰 Cost Breakdown
Monthly Costs (current configuration):
| Resource | Cost |
|---|---|
| GKE Cluster Management | ~$73/month |
| 3× e2-medium nodes | ~$67/month |
| Persistent Disks (30GB) | ~$12/month |
| LoadBalancer | ~$18/month |
| Logging/Monitoring | ~$10-20/month |
| Egress Traffic (est.) | ~$10-30/month |
| Total | ~$190-220/month |
Optimization Tips:
- Apply 1-year committed use: Save ~$25/month (37% discount)
- Right-size after monitoring: Potential 20-30% savings
- Use preemptible nodes for dev: Save 60-80% (not for production)
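The totals and the committed use figure follow directly from the table. The sketch below reproduces the arithmetic; the function names are hypothetical, and it assumes the 37% discount applies to the node line item only.

```shell
# monthly_cost_range: sum the low and high ends of the cost table (USD/month).
monthly_cost_range() {
  awk 'BEGIN {
    fixed = 73 + 67 + 12 + 18      # GKE management, nodes, disks, load balancer
    low  = fixed + 10 + 10         # low end of logging and egress estimates
    high = fixed + 20 + 30         # high end of logging and egress estimates
    print low, high
  }'
}

# cud_savings NODE_COST DISCOUNT_PERCENT: monthly savings from a committed use discount.
cud_savings() {
  awk -v cost="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", cost * pct / 100 }'
}
```

`monthly_cost_range` prints `190 220`, matching the table total, and `cud_savings 67 37` prints `24.79`, which is the source of the ~$25/month figure.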
📚 Documentation Links
- Detailed Terraform Guide: terraform/README.md
- AI Assistant Guidance: terraform/CLAUDE.md
- Infrastructure Overview: README.md
- Implementation Summary: ../docs/03-infrastructure/iac-implementation-summary.md
- Deployment Resolution Report: ../docs/06-backend/backend-deployment-resolution.md
✅ Success Checklist
After completing all phases, verify:
- All 3 GKE nodes are Ready
- All 3 FDB pods are 1/1 Running
- All 3 API pods are 1/1 Running
- FDB cluster status shows Replication health: Healthy
- LoadBalancer has external IP
- /api/v5/health returns {"success":true}
- /api/v5/ready returns {"success":true}
- Terraform state is saved (local or GCS)
- JWT secret is stored securely
- Monitoring is configured
- DNS is pointed to LoadBalancer (if applicable)
If all checked: 🎉 Deployment successful!
🆘 Getting Help
Issues with deployment?
- Check troubleshooting section above ⬆️
- Review logs:
kubectl logs -n coditect-app -l app=coditect-api-v5
kubectl logs -n foundationdb fdb-cluster-0
- Check Terraform state:
terraform plan # Look for drift
terraform show # View current state
- Consult detailed docs:
- Terraform README: terraform/README.md
- Backend Deployment Report: ../docs/06-backend/backend-deployment-resolution.md
Still stuck?
- Check GitHub issues in project repository
- Review GCP Console for resource status
- Verify GCP quotas and billing
⏱️ Total Time: 15 minutes | 🎯 Result: Production-ready Coditect V5 infrastructure
Happy deploying! 🚀