One-Click Installation Readiness Assessment
CODITECT Cloud Infrastructure
Date: November 23, 2025
Repository: coditect-cloud-infra
Purpose: License Management System Infrastructure
Target: Automated one-click installation capability
Executive Summary
Current Installation Complexity Score: 4/10 (Moderate - Requires 15-20 manual steps)
Key Findings:
- Infrastructure modules are production-ready with good defaults
- Manual steps required: GCP authentication, project setup, credential management, OpenTofu initialization
- No Helm charts exist (Kubernetes manifests are skeletal)
- No application deployment automation
- Secrets must be manually populated
- Missing: Prerequisites validation, rollback mechanisms, health checks
Estimated Engineering Effort to One-Click: 120-160 hours (3-4 weeks, 1 engineer)
Recommendation: FEASIBLE - Infrastructure foundation is solid, needs orchestration layer and Helm packaging
1. Current Installation Complexity Analysis
1.1 Manual Steps Breakdown (from README.md)
The current installation requires 15 distinct manual steps:
Phase 1: Prerequisites Setup (5 steps)
Step 1: Install Google Cloud SDK
- Location: Manual download/install
- User Input: OS-specific installation
- Automation: Partially automated via /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh
Step 2: Install OpenTofu 1.6.7+
- Location: Manual download/install
- User Input: Version selection
- Automation: Script exists at /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh
Step 3: Install kubectl 1.28+
- Location: Manual download/install
- User Input: Version selection
- Automation: Script exists at /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh
Step 4: Install Python 3.11+
- Location: Manual download/install
- User Input: Version selection
- Automation: OS-dependent
Step 5: Install Docker (optional)
- Location: Manual download/install
- User Input: Docker Desktop installation
- Automation: Not automated (requires GUI)
Phase 2: GCP Authentication (3 steps)
Step 6: Authenticate with GCP
gcloud auth application-default login
- User Input: Google account credentials (OAuth browser flow)
- Hard-coded Values: None
- Automation: Interactive only
Step 7: Set GCP project
gcloud config set project coditect-citus-prod
- User Input: Project ID (coditect-citus-prod hard-coded)
- Hard-coded Values: YES - Project ID in README
- Parameterization Needed: Project ID should be configurable
Step 8: Create GCP project (if needed)
- Location: /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/gcp-setup.sh
- User Input: Environment selection (dev/staging/production)
- Automation: GOOD - Script exists
Phase 3: Infrastructure Provisioning (4 steps)
Step 9: Initialize OpenTofu
cd opentofu/environments/dev
tofu init
- User Input: None (if backend configured)
- Hard-coded Values: Environment path
- Automation: Partially - Backend creation script exists
Step 10: Review infrastructure plan
tofu plan
- User Input: Manual review required
- Hard-coded Values: None
- Automation: Manual review necessary (best practice)
Step 11: Apply infrastructure
tofu apply
- User Input: Confirmation prompt (yes)
- Hard-coded Values: None
- Automation: Could use -auto-approve (risky)
Step 12: Configure kubectl
gcloud container clusters get-credentials coditect-dev \
--region us-central1 \
--project coditect-citus-prod
- User Input: Cluster name, region, project
- Hard-coded Values: YES - All three values
- Parameterization Needed: Cluster name, region, project ID
Phase 4: Verification (3 steps)
Step 13: Verify nodes
kubectl get nodes
- User Input: None
- Automation: Could be automated
Step 14: Verify pods
kubectl get pods -A
- User Input: None
- Automation: Could be automated
Step 15: Populate secrets manually
gcloud secrets versions add SECRET_NAME --data-file=VALUE_FILE
- User Input: CRITICAL - All secret values must be provided
- Hard-coded Values: None (intentionally manual for security)
- Automation: BLOCKER - No automated secret population
1.2 Hard-Coded Values Requiring Parameterization
| Location | Value | Type | Fix Required |
|---|---|---|---|
| README.md:93 | coditect-citus-prod | Project ID | Use environment variable |
| README.md:104-106 | coditect-dev | Cluster name | Use variable |
| README.md:105 | us-central1 | Region | Use variable |
| .env.example:28 | coditect-dev | Project ID | Already parameterized |
| .env.example:29 | us-central1 | Region | Already parameterized |
| opentofu/backend/create-backends.sh:9 | coditect-citus-prod | Project ID | BLOCKER - Hard-coded |
| opentofu/environments/dev/main.tf:5-9 | Multiple locals | Naming pattern | Good (uses variables) |
Count: 7 entries across documentation and scripts, of which 4 still need parameterization (3 in README.md, 1 in create-backends.sh)
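The hard-coded values in the table could be replaced by environment-driven settings. A minimal sketch, assuming hypothetical variable names PROJECT_ID, REGION and CLUSTER_NAME (the helper function names are also illustrative and not part of the repository); the fallbacks are the values currently hard-coded in the README:

```shell
# Hypothetical settings loader: environment variables win, README values are fallbacks.
load_settings() {
  PROJECT_ID=${PROJECT_ID:-coditect-citus-prod}
  REGION=${REGION:-us-central1}
  CLUSTER_NAME=${CLUSTER_NAME:-coditect-dev}
}

# Prints the kubectl configuration command instead of running it,
# so the resolved values can be reviewed first.
get_credentials_cmd() {
  load_settings
  echo "gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --project $PROJECT_ID"
}
```

Calling `PROJECT_ID=my-project get_credentials_cmd` would print the command with the overridden project, while the region and cluster name fall back to the README values.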
1.3 Dependencies Requiring Pre-Installation
Required Dependencies:
- gcloud CLI - Google Cloud SDK
  - Version: Latest (script installs)
  - Source: https://cloud.google.com/sdk/docs/install
  - Automation: ✅ scripts/install-tools.sh
- OpenTofu - Infrastructure provisioning
  - Version: >= 1.5.0 (enforced in all modules)
  - Source: https://opentofu.org/docs/install
  - Automation: ✅ scripts/install-tools.sh
- kubectl - Kubernetes CLI
  - Version: >= 1.28.0
  - Source: Installed via gcloud components
  - Automation: ✅ scripts/install-tools.sh
- Python 3.11+ - Scripting and automation
  - Version: >= 3.11.0
  - Source: OS package manager
  - Automation: ⚠️ OS-dependent
- Docker (optional) - Local testing
  - Version: Latest
  - Source: Docker Desktop
  - Automation: ❌ Manual GUI installation
Optional Dependencies:
- Helm 3+ - Kubernetes package manager (not currently used)
- pre-commit - Git hooks for validation
- poetry - Python dependency management
Missing Checks:
- Prerequisite version validation exists (scripts/verify-tools.sh) but is not integrated into the install flow
- No automated retry for dependency installation failures
- No OS compatibility check (macOS vs Linux vs Windows)
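A version gate for the minimums listed above could look like the following. This is a minimal sketch, not the logic in verify-tools.sh; the function name version_ge is hypothetical, and it only handles purely numeric dotted versions (no suffixes such as -rc1):

```shell
# Succeeds when dotted numeric version $1 is >= $2 (e.g. 1.28.3 >= 1.28.0).
# Missing components count as 0, so 3.11 is treated like 3.11.0.
version_ge() {
  _have=$1
  _want=$2
  while [ -n "$_have" ] || [ -n "$_want" ]; do
    _h=${_have%%.*}
    _w=${_want%%.*}
    if [ "${_h:-0}" -gt "${_w:-0}" ]; then return 0; fi
    if [ "${_h:-0}" -lt "${_w:-0}" ]; then return 1; fi
    # Drop the component just compared; stop when no dot is left.
    case $_have in *.*) _have=${_have#*.} ;; *) _have= ;; esac
    case $_want in *.*) _want=${_want#*.} ;; *) _want= ;; esac
  done
  return 0
}
```

A caller would extract the installed version however the tool reports it and then run, for example, `version_ge "$installed" 1.28.0 || echo "kubectl is too old"`.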
2. OpenTofu Module Maturity Assessment
2.1 Module Overview
| Module | Purpose | Defaults Quality | Required Vars | Optional Vars | Validation |
|---|---|---|---|---|---|
| networking | VPC, subnets, NAT | ⭐⭐⭐⭐⭐ Excellent | 3 | 19 | ✅ Comprehensive |
| gke | Kubernetes cluster | ⭐⭐⭐⭐⭐ Excellent | 4 | 18 | ✅ Comprehensive |
| cloudsql | PostgreSQL database | ⭐⭐⭐⭐⭐ Excellent | 3 | 25 | ✅ Comprehensive |
| redis | Memorystore cache | ⭐⭐⭐⭐⭐ Excellent | 2 | 15 | ✅ Comprehensive |
| firewall | Network security | ⭐⭐⭐⭐ Good | 2 | 1 | ✅ Basic |
| secrets | Secret Manager | ⭐⭐⭐ Fair | 2 | 5 | ✅ Basic |
Overall Module Quality: ⭐⭐⭐⭐⭐ Production-Ready (5/5)
2.2 Sensible Defaults Analysis
networking module (opentofu/modules/networking/variables.tf)
Excellent defaults:
- network_name: "coditect-vpc" (sensible)
- routing_mode: "REGIONAL" (cost-effective)
- primary_subnet_cidr: "10.0.0.0/20" (4,096 IPs - adequate)
- pods_secondary_cidr: "10.4.0.0/14" (262,144 IPs - generous for GKE)
- services_secondary_cidr: "10.8.0.0/20" (4,096 IPs - adequate)
- flow_logs_sampling: 0.5 (50% - balanced cost/visibility)
- enable_nat_logging: true (good for debugging)
Required inputs: project_id, environment, region
Safe to re-run: ✅ Yes - Idempotent (OpenTofu manages state)
gke module (opentofu/modules/gke/variables.tf)
Excellent defaults:
- cluster_name: "coditect-citus" (sensible)
- machine_type: "n1-standard-4" (4 vCPU, 15GB RAM - production-grade)
- min_node_count: 3 (HA requirement met)
- max_node_count: 20 (auto-scaling headroom)
- disk_size_gb: 100 (adequate for containers)
- release_channel: "STABLE" (production-ready)
- enable_binary_authorization: true (security best practice)
- enable_managed_prometheus: true (observability built-in)
Dev environment overrides:
- machine_type: "n1-standard-2" (cost-optimized)
- min_node_count: 1 (reduced for dev)
- use_preemptible_nodes: true (up to 80% cost savings)
Required inputs: project_id, environment, network_name, subnet_name, node_service_account
Safe to re-run: ✅ Yes - Node pool updates use surge upgrades (max_surge, max_unavailable)
cloudsql module (opentofu/modules/cloudsql/variables.tf)
Excellent defaults:
- database_version: "POSTGRES_16" (recent stable major version)
- tier: "db-custom-4-16384" (4 vCPU, 16GB RAM - production-grade)
- availability_type: "REGIONAL" (HA by default)
- disk_size_gb: 100 (adequate starting size)
- deletion_protection: true (safety net)
- enable_point_in_time_recovery: true (disaster recovery)
- backup_retention_count: 7 (1 week retention)
- max_connections: "200" (reasonable for 16GB RAM)
- shared_buffers: "4096MB" (25% of RAM - PostgreSQL best practice)
Dev environment overrides:
- tier: "db-custom-2-8192" (2 vCPU, 8GB RAM - cost-optimized)
- availability_type: "ZONAL" (single-zone for dev)
- deletion_protection: false (easier cleanup)
Required inputs: project_id, environment, private_network, app_user_password, readonly_user_password
Validation: ✅ Strong - Regex for database version, range checks for disk size, enum validation for availability
Safe to re-run: ✅ Yes - Cloud SQL updates are applied during maintenance window
redis module (opentofu/modules/redis/variables.tf)
Excellent defaults:
- tier: "STANDARD_HA" (high availability)
- memory_size_gb: 5 (adequate for session caching)
- redis_version: "REDIS_7_2" (latest stable)
- auth_enabled: true (security best practice)
- transit_encryption_mode: "SERVER_AUTHENTICATION" (TLS enabled)
- maxmemory_policy: "allkeys-lru" (sensible eviction policy)
- maintenance_window_day: "SUNDAY" (low-traffic day)
Dev environment overrides:
- tier: "BASIC" (single-node, no HA)
- memory_size_gb: 1 (cost-optimized)
Required inputs: project_id, environment, authorized_network
Safe to re-run: ✅ Yes - Redis updates applied during maintenance window
secrets module (opentofu/modules/secrets/variables.tf)
Fair defaults:
- Default secrets defined in main.tf:
  - database-password
  - redis-auth-token
  - django-secret-key
  - stripe-api-key
  - jwt-secret-key
- rotation_period: Not enforced by default (should be)
Required inputs: project_id, environment
Gap: Secret values not populated automatically (intentional security design)
Safe to re-run: ✅ Yes - Secrets module only creates secret containers, not versions
2.3 Variable Validation Summary
All modules validate their input variables (comprehensive in four modules, basic in firewall and secrets):
- ✅ Project ID validation: Regex pattern enforces GCP naming rules
- ✅ Environment validation: Enum check (dev, staging, production only)
- ✅ CIDR validation: Prefix length checks
- ✅ Version validation: Regex for database/Redis versions
- ✅ Range validation: Disk size, memory size, node counts
- ✅ Enum validation: Tier types, availability modes, routing modes
No modules lack validation - High quality IaC code.
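The same two checks can be mirrored in shell so that bad input is rejected before OpenTofu runs. A minimal sketch: the function names are hypothetical, and the regex encodes GCP's documented project ID rules (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen) rather than the exact pattern used in the modules, which is not reproduced in this assessment:

```shell
# Accepts IDs such as coditect-citus-prod; rejects uppercase, underscores,
# a trailing hyphen, and anything shorter than 6 or longer than 30 characters.
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Same enum as the module validation: dev, staging, production only.
valid_environment() {
  case $1 in
    dev|staging|production) return 0 ;;
    *) return 1 ;;
  esac
}
```

An installer could call `valid_project_id "$PROJECT_ID" || exit 1` right after reading its configuration.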
2.4 Idempotency Assessment
| Module | Idempotent | Safe Re-run | Notes |
|---|---|---|---|
| networking | ✅ | ✅ | State-managed, no destructive changes |
| gke | ✅ | ✅ | Surge node pool upgrades (max_surge, max_unavailable) |
| cloudsql | ✅ | ⚠️ | Safe but slow (maintenance window applies) |
| redis | ✅ | ⚠️ | Safe but slow (maintenance window applies) |
| firewall | ✅ | ✅ | Rule updates are instant |
| secrets | ✅ | ✅ | Only creates containers, not versions |
Overall: ✅ Safe to re-run tofu apply multiple times
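Idempotency can be confirmed rather than assumed by checking whether a second plan reports changes. A minimal sketch using the -detailed-exitcode flag of tofu plan (exit code 0 means no changes, 2 means changes are pending, anything else is an error); the function names are hypothetical:

```shell
# Translates the exit code of `tofu plan -detailed-exitcode` into a label.
plan_status() {
  case $1 in
    0) echo "no-changes" ;;
    2) echo "changes-pending" ;;
    *) echo "error" ;;
  esac
}

# Runs a plan in the current environment directory and reports the result.
check_drift() {
  _rc=0
  tofu plan -detailed-exitcode -input=false >/dev/null || _rc=$?
  plan_status "$_rc"
}
```

Running `check_drift` immediately after a successful apply should report no-changes; anything else suggests a resource that is not converging.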
3. Kubernetes Readiness Analysis
3.1 Existing Manifests Inventory
Base manifests (kubernetes/base/):
| File | Purpose | Completeness | Application-Specific |
|---|---|---|---|
| namespaces.yaml | Namespace definitions | ✅ Complete | ❌ Generic (3 envs) |
| rbac.yaml | RBAC roles/bindings | ⚠️ Skeletal | ❌ Generic |
| network-policies.yaml | Network segmentation | ⚠️ Skeletal | ❌ Generic |
| resource-quotas.yaml | Resource limits per namespace | ⚠️ Skeletal | ❌ Generic |
| limit-ranges.yaml | Default container limits | ⚠️ Skeletal | ❌ Generic |
| kustomization.yaml | Kustomize base config | ✅ Complete | ❌ Generic |
Application manifests: ❌ MISSING - No Deployment, Service, Ingress, ConfigMap for actual application
3.2 Missing Kubernetes Components
Critical Blockers:
- No Deployment manifest - No FastAPI backend deployment
- No Service manifest - No LoadBalancer/ClusterIP for API
- No Ingress manifest - No external access configuration
- No ConfigMap - No application configuration (non-sensitive)
- No HorizontalPodAutoscaler - No auto-scaling rules
- No PodDisruptionBudget - No high-availability guarantees
Required for License API:
# Example structure needed (NOT PRESENT)
kubernetes/
  base/
    backend/
      deployment.yaml   # FastAPI containers
      service.yaml      # ClusterIP service
      configmap.yaml    # App configuration
      hpa.yaml          # Auto-scaling
    ingress/
      ingress.yaml      # External access
      tls-secret.yaml   # SSL certificates
Current state: Only infrastructure-level resources exist, no application deployment.
3.3 ConfigMap and Secret Requirements
ConfigMaps Needed:
- backend-config
  - Database connection string (non-password parts)
  - Redis connection string (non-password parts)
  - GCP project ID
  - GCP region
  - Log level
  - Feature flags
- environment-config
  - Environment name (dev/staging/prod)
  - API endpoint URLs
  - CORS allowed origins
Secrets Needed:
- database-credentials
  - DB_PASSWORD (from Cloud SQL)
  - DB_CONNECTION_NAME (Cloud SQL proxy)
- redis-credentials
  - REDIS_AUTH_TOKEN (from Memorystore)
- application-secrets
  - DJANGO_SECRET_KEY
  - JWT_SECRET_KEY
  - STRIPE_API_KEY
Current Gap: Secret Manager secrets exist but no Kubernetes Secret manifests to mount them.
Solution Required: External Secrets Operator or Workload Identity integration.
3.4 Resource Limits/Requests
Current state: kubernetes/base/limit-ranges.yaml exists but contains placeholder values.
Production requirements for License API:
# Backend API container
resources:
  requests:
    cpu: 500m       # 0.5 CPU
    memory: 512Mi   # 512 MB
  limits:
    cpu: 2000m      # 2 CPU
    memory: 2Gi     # 2 GB
Missing: No resource requests/limits defined in actual Deployment manifests (since they don't exist).
3.5 Helm Chart Conversion Requirements
Current state: ❌ No Helm chart exists
Recommended Helm chart structure:
charts/
  coditect-license-api/
    Chart.yaml               # Metadata
    values.yaml              # Default values
    values-dev.yaml          # Dev overrides
    values-staging.yaml      # Staging overrides
    values-prod.yaml         # Production overrides
    templates/
      deployment.yaml        # Backend API deployment
      service.yaml           # ClusterIP service
      ingress.yaml           # External access
      configmap.yaml         # Configuration
      hpa.yaml               # Horizontal Pod Autoscaler
      pdb.yaml               # Pod Disruption Budget
      serviceaccount.yaml    # Workload Identity
      secrets.yaml           # External Secrets integration
      tests/
        test-connection.yaml # Helm test for API health
Effort estimate: 40 hours (1 week)
4. Installation Script Analysis
4.1 Existing Automation Scripts
| Script | Purpose | Error Handling | Rollback | Completeness |
|---|---|---|---|---|
| scripts/install-tools.sh | Install CLI tools | ✅ Good | ❌ None | ⭐⭐⭐⭐ |
| scripts/verify-tools.sh | Validate prerequisites | ✅ Excellent | N/A | ⭐⭐⭐⭐⭐ |
| scripts/gcp-setup.sh | Create GCP project | ✅ Good | ❌ None | ⭐⭐⭐⭐ |
| scripts/iam-setup.sh | Configure IAM | ⚠️ Not found | ❌ None | ❌ Missing |
| opentofu/backend/create-backends.sh | Create state buckets | ✅ Good | ❌ None | ⭐⭐⭐⭐ |
Total scripts: 4 operational, 1 referenced but missing
4.2 Error Handling Assessment
install-tools.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh)
Error handling:
- ✅ set -euo pipefail (exit on error, unset variables, pipe failures)
- ✅ Command existence checks (command -v)
- ✅ Version detection with fallback to "unknown"
- ✅ OS detection with graceful degradation
- ⚠️ Installation failures logged but script continues
- ❌ No rollback if partial installation fails
Example:
# Line 262-278
install_gcloud || log_warn "gcloud installation skipped"
echo ""
install_terraform || log_warn "Terraform installation skipped"
echo ""
Gap: Partial installs leave system in inconsistent state (no cleanup on failure).
verify-tools.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/verify-tools.sh)
Error handling:
- ✅ Excellent - Detailed checks with pass/fail/warn categorization
- ✅ Exit code reflects overall status
- ✅ Clear remediation instructions for each failure
- ✅ Summary at end with counts
Example:
# Lines 350-363
if [ $FAILED -eq 0 ]; then
    if [ $WARNINGS -eq 0 ]; then
        echo -e "${GREEN}✓ All checks passed! Environment is ready.${NC}"
        exit 0
    else
        echo -e "${YELLOW}⚠ Environment is mostly ready but has warnings.${NC}"
        exit 0
    fi
else
    echo -e "${RED}✗ Some required tools are missing or misconfigured.${NC}"
    exit 1
fi
Strength: Best-in-class verification script.
gcp-setup.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/gcp-setup.sh)
Error handling:
- ✅ set -euo pipefail
- ✅ Environment parameter validation
- ✅ gcloud authentication check
- ✅ Billing account detection
- ⚠️ API enablement continues on errors (2>/dev/null || true)
- ❌ No rollback if project creation fails mid-way
Example:
# Lines 142-145
for api in "${REQUIRED_APIS[@]}"; do
    log_info "Enabling $api..."
    gcloud services enable "$api" --project="$PROJECT_ID" 2>/dev/null || true
done
Gap: API enablement failures are silently ignored.
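One way to close this gap is to keep going through the list but remember which APIs failed, then fail the step at the end. A minimal sketch, not the code in gcp-setup.sh; the function name and its argument order are hypothetical:

```shell
# Usage: enable_required_apis PROJECT_ID API [API...]
# Tries every API, reports all failures together, and returns non-zero if any failed.
enable_required_apis() {
  _project=$1
  shift
  _failed=""
  for _api in "$@"; do
    echo "Enabling $_api..."
    if ! gcloud services enable "$_api" --project="$_project"; then
      _failed="$_failed $_api"
    fi
  done
  if [ -n "$_failed" ]; then
    echo "ERROR: failed to enable:$_failed" >&2
    return 1
  fi
}
```

Unlike `2>/dev/null || true`, this keeps gcloud's error output visible and lets the caller decide whether to abort.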
create-backends.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/opentofu/backend/create-backends.sh)
Error handling:
- ✅ set -euo pipefail
- ✅ Bucket existence check (skip if exists)
- ✅ gcloud/gsutil command checks
- ✅ Authentication verification
- ❌ No rollback if bucket creation succeeds but versioning/lifecycle fails
Hard-coded value: PROJECT_ID="coditect-citus-prod" (line 9) - BLOCKER
4.3 Manual Steps That Could Be Automated
| Manual Step | Current Status | Automation Feasibility | Effort (hours) |
|---|---|---|---|
| 1. Install prerequisites | ✅ Automated (install-tools.sh) | N/A | 0 |
| 2. GCP authentication | ❌ Interactive OAuth | ⚠️ Service account possible | 8 |
| 3. Create GCP project | ✅ Automated (gcp-setup.sh) | N/A | 0 |
| 4. Enable APIs | ✅ Automated (gcp-setup.sh) | N/A | 0 |
| 5. Create state buckets | ✅ Automated (create-backends.sh) | N/A | 0 |
| 6. Initialize OpenTofu | ❌ Manual | ✅ Trivial | 2 |
| 7. Review plan | ❌ Manual (best practice) | ⚠️ Auto-approve risky | N/A |
| 8. Apply infrastructure | ❌ Manual confirmation | ✅ -auto-approve flag | 1 |
| 9. Configure kubectl | ❌ Manual | ✅ Script with cluster name | 2 |
| 10. Populate secrets | ❌ Manual (security) | ⚠️ Requires secure input method | 16 |
| 11. Deploy application | ❌ No automation | ✅ Helm install command | 24 |
| 12. Verify health | ❌ Manual | ✅ kubectl wait + curl | 4 |
Total automatable hours: 57 hours
Non-automatable steps: GCP authentication (requires OAuth), manual plan review (best practice)
4.4 Rollback Mechanisms
Current state: ❌ No rollback mechanisms exist
Gaps:
- No state backups before apply
  - OpenTofu state stored in GCS with versioning
  - Manual restoration: gsutil cp gs://bucket/path/to/state.tfstate.backup terraform.tfstate
  - No automated rollback script
- No infrastructure snapshots
  - Cloud SQL: Automated backups exist (7-day retention)
  - GKE: No cluster snapshots (recreate from OpenTofu)
  - Redis: RDB snapshots exist (managed by Google)
  - No coordinated restore procedure
- No application rollback
  - Kubernetes: kubectl rollout undo deployment/backend works
  - No Helm chart = no helm rollback capability
- No health check gates
  - tofu apply completes when resources are created
  - No verification that services are actually healthy
  - Could apply infrastructure but app fails to start
Recommendation: Implement scripts/rollback.sh with state restoration logic.
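Because the state bucket is versioned, the state-restoration part of such a script could select the previous object generation and copy it back over the live object. A minimal sketch under stated assumptions: the function names are hypothetical, the state object URL is supplied by the caller, and versions are ordered by the numeric generation suffix that gsutil ls -a prints after the # sign:

```shell
# Reads "gs://bucket/object#generation" lines on stdin and prints the
# second-newest version. Fails when there is no earlier version.
previous_state_version() {
  _versions=$(sort -t '#' -k 2,2n)
  _count=$(printf '%s\n' "$_versions" | grep -c '#' || true)
  if [ "$_count" -lt 2 ]; then
    echo "no earlier state version available" >&2
    return 1
  fi
  printf '%s\n' "$_versions" | tail -n 2 | head -n 1
}

# Hypothetical restore step: copies the previous generation over the live object.
restore_previous_state() {
  _state_url=$1   # full gs:// URL of the state object (layout is an assumption)
  _previous=$(gsutil ls -a "$_state_url" | previous_state_version) || return 1
  gsutil cp "$_previous" "$_state_url"
}
```

Restoring state alone does not revert cloud resources; a follow-up tofu plan is still needed to see what the restored state implies.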
5. Configuration Management Analysis
5.1 Environment Variables Inventory
From .env.example (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/.env.example):
Total environment variables: 102 variables across 12 categories
Required Variables (Cannot have defaults):
| Variable | Category | Used By | Default Available? |
|---|---|---|---|
| DJANGO_SECRET_KEY | Security | Django | ❌ Must generate |
| DB_PASSWORD | Database | PostgreSQL | ❌ Must generate |
| REDIS_AUTH_TOKEN | Cache | Redis | ❌ Auto-generated by GCP |
| GOOGLE_APPLICATION_CREDENTIALS | GCP | All services | ❌ Service account key path |
| STRIPE_API_KEY | Billing | Payment processing | ❌ External service |
| STRIPE_WEBHOOK_SECRET | Billing | Payment webhooks | ❌ External service |
| OAUTH_CLIENT_SECRET | Auth | Hydra | ❌ Must generate |
| EMAIL_HOST_PASSWORD | Email | SMTP | ❌ External service |
Count: 8 critical variables with no sensible defaults
Optional Variables (Have sensible defaults):
| Variable | Default Value | Safe for Production? |
|---|---|---|
| DJANGO_DEBUG | True | ❌ Must set False |
| DJANGO_ALLOWED_HOSTS | localhost,127.0.0.1 | ❌ Must configure domain |
| DATABASE_URL | postgresql://...@localhost | ❌ Must use Cloud SQL |
| REDIS_URL | redis://localhost:6379 | ❌ Must use Memorystore |
| GOOGLE_REGION | us-central1 | ✅ Sensible default |
| GKE_CLUSTER_NAME | coditect-gke-dev | ⚠️ Should match OpenTofu output |
| LOG_LEVEL | INFO | ✅ Sensible default |
| ENABLE_DEBUG_TOOLBAR | True | ❌ Must set False in prod |
Count: 94 variables with defaults, 12 unsafe for production
5.2 Configuration Validation
Current state: ❌ No configuration validation script exists
Gaps:
- No environment variable validation
  - Script should check required vars are set
  - Script should validate format (URLs, email addresses, JSON)
  - Script should detect dangerous defaults (DEBUG=True in prod)
- No configuration file validation
  - .env file not validated against .env.example
  - No check for missing required variables
  - No type checking (integer vs string)
- No cross-service consistency checks
  - Example: GKE_CLUSTER_NAME should match OpenTofu output
  - Example: REDIS_HOST should match Memorystore IP
  - No automated detection of mismatches
Recommendation: Create scripts/validate-config.sh to check all variables.
Effort: 8 hours
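The first two kinds of check (required variables present, dangerous defaults rejected in production) might be sketched as follows. This is an illustration only: the function names are hypothetical, the required list is a subset of the table in 5.1, and the parser assumes plain unquoted KEY=VALUE lines:

```shell
# Prints the value of KEY from a KEY=VALUE file (last assignment wins).
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Usage: validate_env_file FILE ENVIRONMENT
# Reports every problem found and returns non-zero if there was at least one.
validate_env_file() {
  _file=$1
  _env=$2
  _errors=0
  # Illustrative subset of the required variables.
  for _key in DJANGO_SECRET_KEY DB_PASSWORD STRIPE_API_KEY; do
    if [ -z "$(env_value "$_key" "$_file")" ]; then
      echo "missing required variable: $_key" >&2
      _errors=$((_errors + 1))
    fi
  done
  if [ "$_env" = "production" ] && [ "$(env_value DJANGO_DEBUG "$_file")" = "True" ]; then
    echo "DJANGO_DEBUG=True is not allowed in production" >&2
    _errors=$((_errors + 1))
  fi
  [ "$_errors" -eq 0 ]
}
```

Cross-service consistency checks would additionally need the OpenTofu outputs and are left out of this sketch.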
5.3 Secrets Management Approach
Current approach: Google Cloud Secret Manager
Implementation in OpenTofu:
opentofu/modules/secrets/main.tf creates these secrets:
- database-password (Cloud SQL)
- redis-auth-token (Memorystore)
- django-secret-key (Django)
- stripe-api-key (Stripe)
- stripe-webhook-secret (Stripe)
- jwt-secret-key (JWT signing)
- oauth-client-secret (OAuth)
- smtp-password (Email)
- sentry-dsn (Monitoring)
Gap: Secrets containers created but values not populated.
Manual population required:
gcloud secrets versions add database-password --data-file=password.txt
One-click install blocker: No way to securely provide secret values during automated install.
Possible solutions:
- Interactive prompts (guided wizard)
  # -r keeps backslashes in the password intact; -s hides the typed input
  read -rsp "Enter database password: " DB_PASSWORD
  printf '%s' "$DB_PASSWORD" | gcloud secrets versions add database-password --data-file=-
- Secret file input (one-time config file)
  # secrets.yaml (user-provided)
  database_password: "xxxxx"
  redis_auth_token: "xxxxx"
  # Script reads YAML and populates secrets
- Random generation + display (for auto-generable secrets)
  DJANGO_SECRET=$(openssl rand -base64 32)
  echo "DJANGO_SECRET_KEY (save this): $DJANGO_SECRET"
  printf '%s' "$DJANGO_SECRET" | gcloud secrets versions add django-secret-key --data-file=-
Recommendation: Hybrid approach - generate random secrets where possible, prompt for external API keys.
Effort: 16 hours
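The hybrid approach amounts to deciding, per secret, where its value comes from. A minimal sketch with hypothetical function names; the split between generated and prompted secrets is an assumption based on the tables above, and the generator reads /dev/urandom through POSIX tools instead of the openssl call shown earlier:

```shell
# Decides how each Secret Manager entry gets its value (illustrative split).
secret_source() {
  case $1 in
    django-secret-key|jwt-secret-key|database-password|oauth-client-secret) echo generate ;;
    stripe-api-key|stripe-webhook-secret|smtp-password|sentry-dsn) echo prompt ;;
    redis-auth-token) echo provider ;;   # issued by Memorystore, copied rather than invented
    *) echo "unknown secret: $1" >&2; return 1 ;;
  esac
}

# 32 random bytes rendered as 64 hex characters.
generate_secret() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \t\n'
}

# Uploads a value without writing it to disk.
store_secret() {
  printf '%s' "$2" | gcloud secrets versions add "$1" --data-file=-
}
```

A wizard would loop over the secret names, call `generate_secret` or prompt according to `secret_source`, and pass the result to `store_secret`.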
5.4 Default Values vs Required Inputs
Well-designed defaults:
✅ Networking
- All CIDR ranges have sensible defaults
- No IP conflicts between ranges
- Adequate IP space for scaling
✅ GKE
- Production-grade machine types
- Sensible min/max node counts
- Security features enabled by default
✅ Cloud SQL
- Latest PostgreSQL version
- High availability by default (prod)
- Automated backups enabled
- Point-in-time recovery enabled
✅ Redis
- High availability tier (prod)
- Authentication enabled
- TLS encryption enabled
Required inputs (unavoidable):
- ❌ project_id - Must be globally unique
- ❌ environment - Determines resource naming
- ❌ db_app_user_password - Security requirement
- ❌ db_readonly_user_password - Security requirement
Recommendation: Default values are production-ready, only 4 truly required inputs per environment.
6. Gap Analysis for One-Click Installation
6.1 Current Blockers
| # | Blocker | Severity | Impact | Workaround Available? |
|---|---|---|---|---|
| 1 | No application Helm chart | 🔴 CRITICAL | Cannot deploy License API | ❌ No |
| 2 | No secret population automation | 🔴 CRITICAL | Application won't start | ⚠️ Manual prompts |
| 3 | Hard-coded project ID in scripts | 🟡 HIGH | Cannot use custom project | ✅ Parameterize |
| 4 | No health check validation | 🟡 HIGH | Silent failures | ✅ Add kubectl wait |
| 5 | No rollback mechanism | 🟡 HIGH | Risky deployments | ⚠️ Manual restoration |
| 6 | GCP auth requires OAuth | 🟢 MEDIUM | User interaction needed | ✅ Service account option |
| 7 | No configuration validation | 🟢 MEDIUM | Silent misconfigurations | ✅ Add validation script |
| 8 | Missing IAM setup script | 🟢 MEDIUM | Manual permission setup | ✅ Create script |
Critical path: Blockers #1 and #2 must be resolved for one-click install.
6.2 Missing Components
6.2.1 Helm Charts
Status: ❌ Does not exist
Required charts:
- coditect-license-api (main application)
  - Deployment: FastAPI backend
  - Service: ClusterIP + LoadBalancer
  - Ingress: External access with TLS
  - ConfigMap: Application configuration
  - HorizontalPodAutoscaler: Auto-scaling
  - ServiceAccount: Workload Identity for GCP access
- coditect-monitoring (observability stack)
  - Prometheus: Metrics collection
  - Grafana: Dashboards
  - Loki: Log aggregation
  - Jaeger: Distributed tracing
Effort:
- License API chart: 40 hours
- Monitoring stack: 24 hours (use existing charts)
- Total: 64 hours
6.2.2 Prerequisites Checks
Status: ⚠️ Exists (verify-tools.sh) but not integrated into install flow
Missing checks:
- Billing enabled verification
  gcloud billing projects describe $PROJECT_ID --format="value(billingEnabled)"
- API quota checks
  - Verify sufficient GCE instances quota
  - Verify IP address quota
  - Verify Cloud SQL quota
- IAM permission validation
  - Verify user has roles/owner or equivalent
  - Check service account permissions
- Network connectivity
  - Check internet access
  - Verify GCP API reachability
  - Test DNS resolution
Recommendation: Create scripts/pre-flight-check.sh that runs all validations.
Effort: 8 hours
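A pre-flight script of this kind is mostly a list of named checks plus a failure counter. A minimal sketch with hypothetical function and variable names; the billing check reuses the gcloud command above and assumes it prints True when billing is enabled:

```shell
CHECKS_FAILED=0

# Usage: run_check "label" command [args...]
# Runs the command quietly and records whether it succeeded.
run_check() {
  _label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $_label"
  else
    echo "FAIL  $_label"
    CHECKS_FAILED=$((CHECKS_FAILED + 1))
  fi
}

# Succeeds only when the project reports billing as enabled.
billing_enabled() {
  [ "$(gcloud billing projects describe "$1" --format='value(billingEnabled)')" = "True" ]
}
```

A script would call, for example, `run_check "billing enabled" billing_enabled "$PROJECT_ID"` for each validation and exit non-zero when CHECKS_FAILED is greater than 0.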
6.2.3 Post-Install Configuration
Status: ❌ No automation exists
Required steps:
- DNS configuration
  - Create Cloud DNS zone
  - Configure A records for API endpoint
  - Configure TLS certificate (Let's Encrypt via cert-manager)
- Initial data seeding
  - Create default tenant
  - Create admin user
  - Generate API keys
- Monitoring setup
  - Configure Grafana datasources
  - Import dashboards
  - Setup alerting rules
- Backup verification
  - Test Cloud SQL backup restoration
  - Verify Redis snapshot creation
  - Test disaster recovery procedure
Effort: 24 hours
6.3 Automation Roadmap
Phase 1: Foundation (Week 1 - 40 hours)
Goal: Eliminate hard-coded values and create prerequisite validation
- T1.1: Parameterize all hard-coded values (8h)
  - Convert coditect-citus-prod to $PROJECT_ID variable
  - Make cluster name configurable
  - Make region configurable
  - Create global configuration file
- T1.2: Create scripts/pre-flight-check.sh (8h)
  - Billing verification
  - API quota checks
  - IAM permission validation
  - Network connectivity tests
- T1.3: Create scripts/validate-config.sh (8h)
  - Environment variable validation
  - Type checking
  - Cross-service consistency checks
  - Production safety checks
- T1.4: Create scripts/rollback.sh (8h)
  - OpenTofu state restoration
  - Cloud SQL backup restoration
  - Kubernetes deployment rollback
  - Coordination logic
- T1.5: Integrate scripts into unified installer (8h)
  - scripts/install.sh - Master orchestration script
  - Call sequence: pre-flight → install-tools → gcp-setup → validate-config
  - Error handling and rollback integration
Deliverables:
- ✅ All scripts parameterized
- ✅ Comprehensive pre-flight checks
- ✅ Configuration validation
- ✅ Rollback capability
- ✅ Unified install script
Phase 2: Application Packaging (Week 2 - 40 hours)
Goal: Create Helm chart for License API application
- T2.1: Create Helm chart structure (8h)
  - Chart.yaml with metadata
  - values.yaml with all configurable options
  - Environment-specific value files (dev, staging, prod)
- T2.2: Create deployment templates (16h)
  - templates/deployment.yaml - FastAPI backend
  - templates/service.yaml - ClusterIP + LoadBalancer
  - templates/ingress.yaml - External access with TLS
  - templates/configmap.yaml - Application configuration
  - templates/hpa.yaml - Horizontal Pod Autoscaler
  - templates/pdb.yaml - Pod Disruption Budget
  - templates/serviceaccount.yaml - Workload Identity
- T2.3: Integrate External Secrets Operator (8h)
  - Install ESO Helm chart
  - Create SecretStore for GCP Secret Manager
  - Create ExternalSecret resources
  - Test secret synchronization
- T2.4: Add Helm tests (4h)
  - templates/tests/test-connection.yaml - API health check
  - templates/tests/test-database.yaml - Database connectivity
  - templates/tests/test-redis.yaml - Redis connectivity
- T2.5: Document Helm usage (4h)
  - Update README with Helm install instructions
  - Create values.yaml documentation
  - Add troubleshooting guide
Deliverables:
- ✅ Production-ready Helm chart
- ✅ External Secrets integration
- ✅ Automated tests
- ✅ Documentation
Phase 3: Secret Management (Week 3 - 40 hours)
Goal: Automate secret generation and population
- T3.1: Create secret generation script (8h)
  - Generate random secrets (Django, JWT)
  - Display generated values for user to save
  - Option to provide custom values
- T3.2: Create interactive secret wizard (16h)
  - Guided prompts for all required secrets
  - Validation for external API keys (Stripe, SMTP)
  - Secure input (masked passwords)
  - Confirmation step before population
- T3.3: Implement secret population (8h)
  - Batch upload to GCP Secret Manager
  - Verify secret accessibility
  - Test External Secrets synchronization
- T3.4: Add secret rotation automation (8h)
  - Create scripts/rotate-secrets.sh
  - Zero-downtime rotation procedure
  - Update Secret Manager + Kubernetes pods
Deliverables:
- ✅ Automated secret generation
- ✅ Interactive secret wizard
- ✅ Secret population automation
- ✅ Rotation capability
Phase 4: Health Checks & Validation (Week 4 - 40 hours)
Goal: Ensure deployed infrastructure is healthy
- T4.1: Add infrastructure health checks (8h)
  - GKE cluster ready check (kubectl wait --for=condition=Ready nodes --all)
  - Cloud SQL connectivity test
  - Redis connectivity test
  - Network connectivity validation
- T4.2: Add application health checks (8h)
  - API /health endpoint check
  - Database migration verification
  - Redis session storage test
  - End-to-end API test
- T4.3: Create post-install verification (8h)
  - scripts/verify-deployment.sh
  - Check all Kubernetes resources healthy
  - Verify ingress accessibility
  - Test authentication flow
- T4.4: Add monitoring setup automation (8h)
  - Deploy Prometheus/Grafana via Helm
  - Import pre-built dashboards
  - Configure alerting rules
  - Setup Slack/email notifications
- T4.5: Create comprehensive installer (8h)
  - scripts/one-click-install.sh - Master script
  - Interactive mode with prompts
  - Non-interactive mode with config file
  - Progress indicators and logging
Deliverables:
- ✅ Comprehensive health checks
- ✅ Post-install verification
- ✅ Monitoring automation
- ✅ One-click installer script
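Health checks after an install usually need to wait for services to come up, so a small retry helper is a natural building block for the Phase 4 scripts. A minimal sketch; the function name retry and its argument order are hypothetical:

```shell
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
# Re-runs the command until it succeeds or the attempts are used up.
retry() {
  _attempts=$1
  _delay=$2
  shift 2
  _n=1
  while ! "$@"; do
    if [ "$_n" -ge "$_attempts" ]; then
      return 1
    fi
    _n=$((_n + 1))
    sleep "$_delay"
  done
}
```

For example, `retry 30 10 curl -fsS "https://$API_HOST/health"` would poll the API health endpoint for up to five minutes, where API_HOST is a placeholder for the deployed hostname.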
6.4 Helm Chart Requirements Specification
Chart name: coditect-license-api
Chart version: 1.0.0
App version: 0.1.0
Values Structure
# values.yaml
replicaCount: 3
image:
  repository: gcr.io/coditect-citus-prod/license-api
  tag: "latest"
  pullPolicy: IfNotPresent
service:
  type: LoadBalancer
  port: 80
  targetPort: 8000
  annotations:
    cloud.google.com/load-balancer-type: "External"
ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: api.coditect.ai
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-coditect-tls
      hosts:
        - api.coditect.ai
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
podDisruptionBudget:
  enabled: true
  minAvailable: 2
database:
  host: "10.67.0.3" # Cloud SQL private IP (from OpenTofu output)
  port: 5432
  name: "coditect"
  user: "app_user"
  existingSecret: "database-credentials"
  secretKey: "password"
redis:
  host: "10.121.42.67" # Memorystore IP (from OpenTofu output)
  port: 6378
  existingSecret: "redis-credentials"
  secretKey: "auth-token"
config:
  logLevel: "INFO"
  workers: 4
  maxRequests: 1000
  maxRequestsJitter: 50
externalSecrets:
  enabled: true
  secretStore: "gcpsm-secret-store"
  secrets:
    - name: database-credentials
      key: database-password
    - name: redis-credentials
      key: redis-auth-token
    - name: app-secrets
      keys:
        - django-secret-key
        - jwt-secret-key
        - stripe-api-key
Environment-Specific Overrides
values-dev.yaml:
replicaCount: 1
autoscaling:
  enabled: false
ingress:
  enabled: false
resources:
  requests:
    cpu: 250m
    memory: 256Mi
values-prod.yaml:
replicaCount: 5
autoscaling:
  minReplicas: 5
  maxReplicas: 50
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 4000m
    memory: 4Gi
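One way the base values and an environment override might be combined at deploy time is sketched below. Helm loads the chart's own values.yaml automatically and merges any file passed with -f on top of it, so the override file only needs the keys that differ. The chart path, the release name, the HELM_BIN switch, the function names, and the OpenTofu output names (cloudsql_private_ip, redis_host) are assumptions for illustration and are not taken from the repository.

```shell
# Chart path, release name, and OpenTofu output names below are assumptions.
CHART_DIR="${CHART_DIR:-./helm/coditect-license-api}"
HELM_BIN="${HELM_BIN:-helm}"   # set to "echo" to preview the command

# helm_deploy ENV DB_HOST REDIS_HOST: install or upgrade the release for ENV.
helm_deploy() {
  local env="$1" db_host="$2" redis_host="$3"
  # The chart's values.yaml is loaded implicitly; values-ENV.yaml is merged on
  # top of it, and the --set-string flags override both.
  "$HELM_BIN" upgrade --install coditect-license-api "$CHART_DIR" \
    -f "$CHART_DIR/values-${env}.yaml" \
    --set-string "database.host=${db_host}" \
    --set-string "redis.host=${redis_host}"
}

# deploy_from_tofu ENV: read endpoints from OpenTofu outputs, then deploy.
deploy_from_tofu() {
  local env="$1" db_host redis_host
  db_host="$(tofu output -raw cloudsql_private_ip)" || return 1  # hypothetical output name
  redis_host="$(tofu output -raw redis_host)" || return 1         # hypothetical output name
  helm_deploy "$env" "$db_host" "$redis_host"
}
```

Passing the endpoints on the command line instead of hard-coding them in values.yaml keeps the chart reusable across projects. Running with HELM_BIN=echo prints the command that would be executed without contacting the cluster.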
6.5 Installation Script Requirements
Script name: scripts/one-click-install.sh
Usage:
# Interactive mode (guided wizard)
./scripts/one-click-install.sh --interactive
# Non-interactive mode (config file)
./scripts/one-click-install.sh --config install-config.yaml
# Dry-run mode (plan only)
./scripts/one-click-install.sh --dry-run
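The three invocation modes above could be parsed along these lines. This is a sketch under assumptions: the variable names (MODE, CONFIG_FILE, DRY_RUN), the function name, and the exit code 2 for usage errors are illustrative choices, not the script's actual interface.

```shell
# Hypothetical argument parser for scripts/one-click-install.sh.
MODE=""          # "interactive" or "config"
CONFIG_FILE=""   # path given after --config
DRY_RUN=false    # true when --dry-run is present

parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --interactive)
        MODE="interactive"
        ;;
      --config)
        # --config needs a following file path.
        [ $# -ge 2 ] || { echo "--config requires a file path" >&2; return 2; }
        MODE="config"
        CONFIG_FILE="$2"
        shift
        ;;
      --dry-run)
        DRY_RUN=true
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 2
        ;;
    esac
    shift
  done
}
```

In this sketch --dry-run is an independent switch, so it can be combined with either mode, for example parse_args --config install-config.yaml --dry-run.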
Script Flow
Configuration File Format
install-config.yaml:
# GCP Configuration
gcp:
  project_id: "coditect-prod"
  region: "us-central1"
  billing_account: "012345-67890A-BCDEF0"
# Environment Configuration
environment: "production"
# Cluster Configuration
gke:
  cluster_name: "coditect-prod"
  node_count: 5
  machine_type: "n1-standard-4"
# Database Configuration
database:
  tier: "db-custom-4-16384"
  availability_type: "REGIONAL"
# Application Configuration
application:
  domain: "api.coditect.ai"
  replicas: 5
# Secrets Configuration (optional - will prompt if missing)
secrets:
  stripe_api_key: "sk_live_xxxxx"
  smtp_password: "xxxxx"
  # Auto-generated secrets:
  # - django_secret_key (random)
  # - jwt_secret_key (random)
  # - database_password (random)
  # - redis_auth_token (GCP-managed)
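The secrets marked "(random)" above could be produced and stored roughly as sketched here. The function names, the 32-byte default, and the use of a PROJECT_ID variable are assumptions. The sketch also assumes the target secret already exists in Secret Manager (for example, created by the OpenTofu modules), since gcloud secrets versions add only adds a new version.

```shell
# generate_secret [BYTES]: print BYTES random bytes as lowercase hex
# (2 * BYTES characters). Defaults to 32 bytes.
generate_secret() {
  local bytes="${1:-32}"
  head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# store_secret NAME VALUE: add VALUE as a new version of an existing secret.
# The value is piped on stdin so it does not appear in the process list.
store_secret() {
  local name="$1" value="$2"
  printf '%s' "$value" \
    | gcloud secrets versions add "$name" --data-file=- --project "$PROJECT_ID"
}

# populate_generated_secrets: secret names follow the externalSecrets keys.
populate_generated_secrets() {
  local name
  for name in django-secret-key jwt-secret-key database-password; do
    store_secret "$name" "$(generate_secret 32)" || return 1
  done
}
```

Hex output avoids characters that need escaping in connection strings or YAML, at the cost of a longer string for the same entropy.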
Script Features
✅ Progress Indicators
[1/15] Running pre-flight checks... ✓
[2/15] Installing prerequisites... ✓
[3/15] Authenticating with GCP... ✓
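A step runner that prints this kind of output might look like the sketch below. The names run_step, TOTAL_STEPS, and CURRENT_STEP are hypothetical, and the step's own output is discarded here only to keep the example short; an installer would more likely redirect it to the log file.

```shell
TOTAL_STEPS=15
CURRENT_STEP=0

# run_step DESCRIPTION CMD...: print "[n/TOTAL] DESCRIPTION... " followed by
# a check mark on success or a cross on failure, and return CMD's outcome.
run_step() {
  local desc="$1"
  shift
  CURRENT_STEP=$((CURRENT_STEP + 1))
  printf '[%d/%d] %s... ' "$CURRENT_STEP" "$TOTAL_STEPS" "$desc"
  if "$@" >/dev/null 2>&1; then
    printf '✓\n'
  else
    printf '✗\n'
    return 1
  fi
}
```

A call such as run_step "Running pre-flight checks" ./scripts/preflight.sh would then produce one line per step, where the script path is only a placeholder.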
✅ Logging
# All output logged to install-TIMESTAMP.log
# Errors highlighted in red
# Warnings in yellow
# Success in green
✅ Rollback on Failure
# Automatic rollback if any step fails
# State restoration from backup
# Resource cleanup
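One possible shape for the rollback behaviour is a stack of undo commands that is replayed newest-first when a step fails. The sketch below is an assumption about how this could work, not the planned implementation; the function names and the example undo commands are hypothetical.

```shell
ROLLBACK_ACTIONS=""   # newline-separated command strings, newest first

# register_rollback CMD: remember a command string that undoes the step
# that has just completed.
register_rollback() {
  ROLLBACK_ACTIONS="$1
$ROLLBACK_ACTIONS"
}

# run_rollback: run the registered commands newest-first and keep going
# even if one of them fails, so later cleanup still gets a chance to run.
run_rollback() {
  local action
  while IFS= read -r action; do
    [ -n "$action" ] || continue
    echo "Rolling back: $action"
    eval "$action" </dev/null || echo "Rollback step failed: $action" >&2
  done <<EOF
$ROLLBACK_ACTIONS
EOF
  ROLLBACK_ACTIONS=""
}

# Example wiring (commented out so the sketch has no side effects):
#   helm upgrade --install coditect-license-api ./chart \
#     && register_rollback "helm uninstall coditect-license-api"
#   some_later_step || { run_rollback; exit 1; }
```

Registering the undo command only after a step succeeds means a failed step never leaves an undo entry for work that was not done.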
✅ Idempotency
# Safe to re-run after partial failure
# Skips already-completed steps
# Resumes from last checkpoint
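Checkpointing of this kind can be approximated with one marker file per completed step, as in the sketch below. The state directory name and the run_once helper are assumptions. Marker files only record that a step finished once, so they complement the idempotency that tools such as OpenTofu and helm upgrade --install already provide rather than replacing it.

```shell
# Directory holding one marker file per completed step (location is an assumption).
STATE_DIR="${STATE_DIR:-.install-state}"

# run_once STEP_ID CMD...: skip CMD if STEP_ID already completed; otherwise
# run it and record a marker only when it succeeds.
run_once() {
  local step_id="$1"
  shift
  mkdir -p "$STATE_DIR" || return 1
  if [ -e "$STATE_DIR/$step_id.done" ]; then
    echo "Skipping $step_id (already completed)"
    return 0
  fi
  "$@" || return 1
  touch "$STATE_DIR/$step_id.done"
}
```

Deleting the state directory forces a full re-run, which gives the operator a simple escape hatch if a marker is stale.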
7. Estimated Engineering Effort
7.1 Effort Breakdown by Component
| Component | Tasks | Hours | Engineer Profile |
|---|---|---|---|
| Phase 1: Foundation | 5 tasks | 40 | DevOps Engineer |
| - Parameterization | T1.1 | 8 | |
| - Pre-flight checks | T1.2 | 8 | |
| - Config validation | T1.3 | 8 | |
| - Rollback script | T1.4 | 8 | |
| - Unified installer | T1.5 | 8 | |
| Phase 2: Helm Chart | 5 tasks | 40 | Full-Stack Engineer |
| - Chart structure | T2.1 | 8 | |
| - Deployment templates | T2.2 | 16 | |
| - External Secrets | T2.3 | 8 | |
| - Helm tests | T2.4 | 4 | |
| - Documentation | T2.5 | 4 | |
| Phase 3: Secret Mgmt | 4 tasks | 40 | Security Engineer |
| - Secret generation | T3.1 | 8 | |
| - Interactive wizard | T3.2 | 16 | |
| - Secret population | T3.3 | 8 | |
| - Rotation automation | T3.4 | 8 | |
| Phase 4: Validation | 5 tasks | 40 | SRE / DevOps |
| - Infra health checks | T4.1 | 8 | |
| - App health checks | T4.2 | 8 | |
| - Post-install verify | T4.3 | 8 | |
| - Monitoring setup | T4.4 | 8 | |
| - One-click script | T4.5 | 8 | |
| Total | 19 tasks | 160 hours | 1 Engineer (4 weeks) |
7.2 Timeline
Assumptions:
- 1 full-time engineer
- 40-hour work weeks
- No blockers or dependencies
Schedule:
| Week | Phase | Deliverables | Risk Level |
|---|---|---|---|
| Week 1 | Foundation | Parameterized scripts, pre-flight checks, validation | 🟢 Low |
| Week 2 | Helm Chart | Production-ready chart, External Secrets integration | 🟡 Medium |
| Week 3 | Secret Mgmt | Automated secret generation and population | 🟡 Medium |
| Week 4 | Validation | Health checks, monitoring, one-click installer | 🟢 Low |
Critical Path: Weeks 2-3 (Helm chart + Secret management) - Highest complexity
Earliest Completion: December 21, 2025 (4 weeks from now)
7.3 Risk Factors
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| External Secrets Operator integration issues | Medium | High | Test early in Week 2, fallback to manual secrets |
| GCP quota limits | Low | High | Pre-flight quota checks, request quota increase |
| Helm chart template complexity | Medium | Medium | Start with minimal chart, iterate |
| Secret rotation breaking app | Low | Critical | Extensive testing, staged rollout |
| One-click script platform issues | Medium | Medium | Test on macOS, Linux, and Cloud Shell |
7.4 Dependencies
External dependencies:
- ✅ GCP project with billing enabled (already exists)
- ✅ OpenTofu modules (already complete and tested)
- ⚠️ FastAPI application Docker image (must exist in GCR)
- ⚠️ External Secrets Operator (must deploy first)
- ⚠️ NGINX Ingress Controller (must deploy first)
- ⚠️ cert-manager (must deploy first for TLS)
Recommendation: Add Kubernetes infrastructure prerequisites to Phase 1 (ESO, NGINX, cert-manager).
Additional effort: +16 hours (Week 1)
Revised total effort: 176 hours (4.4 weeks)
8. Recommendations
8.1 Priority Ranking
| Priority | Item | Rationale | Effort (hours) |
|---|---|---|---|
| P0 - CRITICAL | Create Helm chart for License API | Blocks application deployment | 40 |
| P0 - CRITICAL | Automate secret population | Blocks application startup | 40 |
| P1 - HIGH | Parameterize hard-coded values | Enables multi-environment deployment | 8 |
| P1 - HIGH | Add comprehensive health checks | Prevents silent failures | 16 |
| P2 - MEDIUM | Create one-click installer script | User experience improvement | 24 |
| P2 - MEDIUM | Add rollback automation | Risk mitigation | 8 |
| P3 - LOW | Setup monitoring automation | Operational excellence | 8 |
| P3 - LOW | Create configuration validation | Quality of life | 8 |
8.2 Phased Approach
Minimum Viable One-Click Install (MVP)
Scope: Automated infrastructure + manual application deployment
Includes:
- ✅ Parameterized scripts
- ✅ Pre-flight checks
- ✅ Automated OpenTofu apply
- ✅ Basic health checks
- ❌ Application Helm chart (manual helm install)
- ❌ Secret population (manual steps documented)
Effort: 64 hours (1.5 weeks)
Deliverable: Semi-automated install with clear manual steps for app deployment
Full One-Click Install
Scope: End-to-end automation
Includes:
- ✅ Everything in MVP
- ✅ Helm chart for License API
- ✅ Automated secret generation and population
- ✅ Comprehensive health checks
- ✅ Monitoring setup
- ✅ Single command installation
Effort: 176 hours (4.4 weeks)
Deliverable: True one-click install with zero manual steps (except GCP OAuth)
8.3 Quick Wins (Low-Hanging Fruit)
Can be completed in 1 day (8 hours):
- Parameterize project ID (2 hours)
  - Replace coditect-citus-prod with a $PROJECT_ID variable
  - Update all scripts and README
- Add basic health checks (3 hours)
  - kubectl wait --for=condition=Ready checks
  - Cloud SQL connectivity test
  - Redis connectivity test
- Create configuration validation script (3 hours)
  - Check required environment variables
  - Validate variable formats
  - Warn about unsafe defaults
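The configuration validation quick win could start from something like the sketch below. The variable names REGION, ENVIRONMENT, and IMAGE_TAG and the specific warning are assumptions for illustration. The project ID pattern reflects the GCP rule that IDs are 6 to 30 characters long, start with a lowercase letter, contain only lowercase letters, digits, and hyphens, and do not end with a hyphen.

```shell
# validate_project_id ID: succeed if ID matches the GCP project ID format.
validate_project_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# require_vars NAME...: report every unset or empty variable, fail if any.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# validate_config: required variables, format checks, then unsafe-default warnings.
validate_config() {
  require_vars PROJECT_ID REGION ENVIRONMENT || return 1
  validate_project_id "$PROJECT_ID" \
    || { echo "Invalid PROJECT_ID: $PROJECT_ID" >&2; return 1; }
  if [ "$ENVIRONMENT" = "production" ] && [ "${IMAGE_TAG:-latest}" = "latest" ]; then
    echo "Warning: IMAGE_TAG=latest is not reproducible in production" >&2
  fi
  return 0
}
```

Reporting every missing variable in one pass, instead of stopping at the first, saves the operator repeated runs while filling in a new environment.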
Impact: Significantly improves install reliability with minimal effort
8.4 Long-Term Improvements
Beyond one-click install:
- Multi-region deployment (80 hours)
  - Replicate infrastructure across regions
  - Global load balancing
  - Cross-region database replication
- Blue/green deployment (40 hours)
  - Zero-downtime upgrades
  - Automated canary deployments
  - Traffic shifting logic
- Disaster recovery automation (40 hours)
  - Automated backup testing
  - One-click restore procedure
  - Regional failover automation
- Infrastructure cost optimization (24 hours)
  - Right-sizing recommendations
  - Committed use discount analysis
  - Spot instance integration
- Compliance automation (40 hours)
  - CIS GCP Benchmark checks
  - Automated security scanning
  - Compliance reporting
Total future work: 224 hours (5.6 weeks)
9. Conclusion
9.1 Current State Summary
Strengths:
- ✅ OpenTofu modules are production-ready with excellent defaults
- ✅ Comprehensive variable validation
- ✅ Good prerequisite installation scripts
- ✅ Infrastructure deployment is largely automated
- ✅ Idempotent and safe to re-run
Weaknesses:
- ❌ No application deployment automation (Helm chart missing)
- ❌ No secret population automation
- ❌ Hard-coded values in scripts
- ❌ No comprehensive health checks
- ❌ No rollback mechanisms
9.2 Feasibility Assessment
Is one-click installation achievable?
✅ YES - The infrastructure foundation is solid. With focused engineering effort, a true one-click installer is feasible.
Key success factors:
- Create production-ready Helm chart (40 hours)
- Automate secret management (40 hours)
- Add comprehensive health checks (16 hours)
- Create unified installer script (24 hours)
Total critical path: 120 hours (3 weeks)
9.3 Recommended Next Steps
Immediate (This Week):
- ✅ Parameterize all hard-coded values (8 hours)
- ✅ Create pre-flight check script (8 hours)
- ✅ Add basic health checks (8 hours)
Short-Term (Next 2 Weeks):
- ⏸️ Create Helm chart for License API (40 hours)
- ⏸️ Implement External Secrets integration (8 hours)
Medium-Term (Weeks 3-4):
- ⏸️ Automate secret generation and population (40 hours)
- ⏸️ Create comprehensive one-click installer (24 hours)
Timeline to MVP: 1.5 weeks
Timeline to Full One-Click: 4.4 weeks
9.4 Success Metrics
Definition of "One-Click Install":
# User runs single command:
./scripts/one-click-install.sh --config my-config.yaml
# Script completes:
# ✅ Prerequisites installed
# ✅ GCP project created and configured
# ✅ Infrastructure provisioned (GKE, Cloud SQL, Redis)
# ✅ Secrets generated and populated
# ✅ Application deployed and healthy
# ✅ Monitoring configured
# ✅ External access working (https://api.example.com)
# Total time: 15-20 minutes
Acceptance criteria:
- Installation completes without manual intervention
- All health checks pass
- Application accessible via HTTPS
- Monitoring dashboards populated
- Documentation includes rollback procedure
- Script tested on macOS, Linux, and Google Cloud Shell
End of Assessment
Document Version: 1.0 Last Updated: November 23, 2025 Next Review: After Phase 1 completion Owner: CODITECT Infrastructure Team