One-Click Installation Readiness Assessment

CODITECT Cloud Infrastructure

Date: November 23, 2025
Repository: coditect-cloud-infra
Purpose: License Management System Infrastructure
Target: Automated one-click installation capability


Executive Summary

Current Installation Complexity Score: 4/10 (Moderate - Requires 15-20 manual steps)

Key Findings:

  • Infrastructure modules are production-ready with good defaults
  • Manual steps required: GCP authentication, project setup, credential management, OpenTofu initialization
  • No Helm charts exist (Kubernetes manifests are skeletal)
  • No application deployment automation
  • Secrets must be manually populated
  • Missing: Prerequisites validation, rollback mechanisms, health checks

Estimated Engineering Effort to One-Click: 120-160 hours (3-4 weeks, 1 engineer)

Recommendation: FEASIBLE - Infrastructure foundation is solid, needs orchestration layer and Helm packaging


1. Current Installation Complexity Analysis

1.1 Manual Steps Breakdown (from README.md)

The current installation requires 15 distinct manual steps:

Phase 1: Prerequisites Setup (5 steps)

Step 1: Install Google Cloud SDK

  • Location: Manual download/install
  • User Input: OS-specific installation
  • Automation: Partially automated via /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh

Step 2: Install OpenTofu 1.6.7+

  • Location: Manual download/install
  • User Input: Version selection
  • Automation: Script exists at /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh

Step 3: Install kubectl 1.28+

  • Location: Manual download/install
  • User Input: Version selection
  • Automation: Script exists at /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh

Step 4: Install Python 3.11+

  • Location: Manual download/install
  • User Input: Version selection
  • Automation: OS-dependent

Step 5: Install Docker (optional)

  • Location: Manual download/install
  • User Input: Docker Desktop installation
  • Automation: Not automated (requires GUI)

Phase 2: GCP Authentication (3 steps)

Step 6: Authenticate with GCP

gcloud auth application-default login
  • User Input: Google account credentials (OAuth browser flow)
  • Hard-coded Values: None
  • Automation: Interactive only

Step 7: Set GCP project

gcloud config set project coditect-citus-prod
  • User Input: Project ID (coditect-citus-prod hard-coded)
  • Hard-coded Values: YES - Project ID in README
  • Parameterization Needed: Project ID should be configurable

Step 8: Create GCP project (if needed)

  • Location: /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/gcp-setup.sh
  • User Input: Environment selection (dev/staging/production)
  • Automation: GOOD - Script exists

Phase 3: Infrastructure Provisioning (4 steps)

Step 9: Initialize OpenTofu

cd opentofu/environments/dev
tofu init
  • User Input: None (if backend configured)
  • Hard-coded Values: Environment path
  • Automation: Partially - Backend creation script exists

Step 10: Review infrastructure plan

tofu plan
  • User Input: Manual review required
  • Hard-coded Values: None
  • Automation: Manual review necessary (best practice)

Step 11: Apply infrastructure

tofu apply
  • User Input: Confirmation prompt (yes)
  • Hard-coded Values: None
  • Automation: Could use -auto-approve (risky)

Step 12: Configure kubectl

gcloud container clusters get-credentials coditect-dev \
  --region us-central1 \
  --project coditect-citus-prod
  • User Input: Cluster name, region, project
  • Hard-coded Values: YES - All three values
  • Parameterization Needed: Cluster name, region, project ID
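The parameterization called for in Step 12 could look roughly like the sketch below. It only builds and prints the command so it can be reviewed first. PROJECT_ID is an assumed variable name; GKE_CLUSTER_NAME and GOOGLE_REGION mirror names listed in .env.example, and the fallback values are the ones currently hard-coded in the README.

```shell
# Sketch: build the get-credentials command from variables instead of literals.
# PROJECT_ID is an assumed name; there is deliberately no default for it.
kubeconfig_command() {
    if [ -z "${PROJECT_ID:-}" ]; then
        echo "PROJECT_ID must be set" >&2
        return 1
    fi
    cluster="${GKE_CLUSTER_NAME:-coditect-dev}"
    region="${GOOGLE_REGION:-us-central1}"
    printf 'gcloud container clusters get-credentials %s --region %s --project %s\n' \
        "$cluster" "$region" "$PROJECT_ID"
}

# Print the command for an example project (placeholder values).
( PROJECT_ID="example-project"; GKE_CLUSTER_NAME="example-cluster"; kubeconfig_command )
```

Printing rather than executing keeps the sketch safe to run anywhere; an installer would run the resulting command once the values have been confirmed.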

Phase 4: Verification (3 steps)

Step 13: Verify nodes

kubectl get nodes
  • User Input: None
  • Automation: Could be automated

Step 14: Verify pods

kubectl get pods -A
  • User Input: None
  • Automation: Could be automated

Step 15: Populate secrets manually

gcloud secrets versions add SECRET_NAME --data-file=VALUE_FILE
  • User Input: CRITICAL - All secret values must be provided
  • Hard-coded Values: None (intentionally manual for security)
  • Automation: BLOCKER - No automated secret population

1.2 Hard-Coded Values Requiring Parameterization

| Location | Value | Type | Fix Required |
| --- | --- | --- | --- |
| README.md:93 | coditect-citus-prod | Project ID | Use environment variable |
| README.md:104-106 | coditect-dev | Cluster name | Use variable |
| README.md:105 | us-central1 | Region | Use variable |
| .env.example:28 | coditect-dev | Project ID | Already parameterized |
| .env.example:29 | us-central1 | Region | Already parameterized |
| opentofu/backend/create-backends.sh:9 | coditect-citus-prod | Project ID | BLOCKER - Hard-coded |
| opentofu/environments/dev/main.tf:5-9 | Multiple locals | Naming pattern | Good (uses variables) |

Count: 7 locations reviewed across documentation and scripts; 4 still need parameterization (3 in README.md, 1 in create-backends.sh)


1.3 Dependencies Requiring Pre-Installation

Required Dependencies:

  1. gcloud CLI - Google Cloud SDK

  2. OpenTofu - Infrastructure provisioning

  3. kubectl - Kubernetes CLI

    • Version: >= 1.28.0
    • Source: Installed via gcloud components
    • Automation: ✅ scripts/install-tools.sh
  4. Python 3.11+ - Scripting and automation

    • Version: >= 3.11.0
    • Source: OS package manager
    • Automation: ⚠️ OS-dependent
  5. Docker (optional) - Local testing

    • Version: Latest
    • Source: Docker Desktop
    • Automation: ❌ Manual GUI installation

Optional Dependencies:

  • Helm 3+ - Kubernetes package manager (not currently used)
  • pre-commit - Git hooks for validation
  • poetry - Python dependency management

Missing Checks:

  • Prerequisite version validation exists (scripts/verify-tools.sh) but is not integrated into the install flow
  • No automated retry for dependency installation failures
  • No OS compatibility check (macOS vs Linux vs Windows)
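A minimum-version gate for the tools listed above could be sketched as follows. This is an illustration, not the logic in verify-tools.sh: it assumes a `sort` that supports `-V` (GNU coreutils and recent macOS), and the "found" versions in the demo are made up.

```shell
# version_ge A B: succeeds when dotted version A is >= version B.
version_ge() {
    # sort -V orders version strings; if B sorts first (or ties), A >= B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min_version NAME FOUND REQUIRED: report one tool against its minimum.
check_min_version() {
    if version_ge "$2" "$3"; then
        echo "OK   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}

# Hypothetical detected versions checked against the documented minimums.
check_min_version kubectl 1.29.1 1.28.0
check_min_version python 3.9.6 3.11.0 || echo "python needs an upgrade"
```

Comparing with `sort -V` avoids the common mistake of comparing versions as plain strings, where 3.9 would rank above 3.11.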

2. OpenTofu Module Maturity Assessment

2.1 Module Overview

| Module | Purpose | Defaults Quality | Required Vars | Optional Vars | Validation |
| --- | --- | --- | --- | --- | --- |
| networking | VPC, subnets, NAT | ⭐⭐⭐⭐⭐ Excellent | 3 | 19 | ✅ Comprehensive |
| gke | Kubernetes cluster | ⭐⭐⭐⭐⭐ Excellent | 4 | 18 | ✅ Comprehensive |
| cloudsql | PostgreSQL database | ⭐⭐⭐⭐⭐ Excellent | 3 | 25 | ✅ Comprehensive |
| redis | Memorystore cache | ⭐⭐⭐⭐⭐ Excellent | 2 | 15 | ✅ Comprehensive |
| firewall | Network security | ⭐⭐⭐⭐ Good | 2 | 1 | ✅ Basic |
| secrets | Secret Manager | ⭐⭐⭐ Fair | 2 | 5 | ✅ Basic |

Overall Module Quality: ⭐⭐⭐⭐⭐ Production-Ready (5/5)


2.2 Sensible Defaults Analysis

networking module (opentofu/modules/networking/variables.tf)

Excellent defaults:

  • network_name: "coditect-vpc" (sensible)
  • routing_mode: "REGIONAL" (cost-effective)
  • primary_subnet_cidr: "10.0.0.0/20" (4,096 IPs - adequate)
  • pods_secondary_cidr: "10.4.0.0/14" (262,144 IPs - generous for GKE)
  • services_secondary_cidr: "10.8.0.0/20" (4,096 IPs - adequate)
  • flow_logs_sampling: 0.5 (50% - balanced cost/visibility)
  • enable_nat_logging: true (good for debugging)

Required inputs: project_id, environment, region

Safe to re-run: ✅ Yes - Idempotent (OpenTofu manages state)


gke module (opentofu/modules/gke/variables.tf)

Excellent defaults:

  • cluster_name: "coditect-citus" (sensible)
  • machine_type: "n1-standard-4" (4 vCPU, 15GB RAM - production-grade)
  • min_node_count: 3 (HA requirement met)
  • max_node_count: 20 (auto-scaling headroom)
  • disk_size_gb: 100 (adequate for containers)
  • release_channel: "STABLE" (production-ready)
  • enable_binary_authorization: true (security best practice)
  • enable_managed_prometheus: true (observability built-in)

Dev environment overrides:

  • machine_type: "n1-standard-2" (cost-optimized)
  • min_node_count: 1 (reduced for dev)
  • use_preemptible_nodes: true (80% cost savings)

Required inputs: project_id, environment, network_name, subnet_name, node_service_account

Safe to re-run: ✅ Yes - Node pool updates roll out as surge upgrades (max_surge, max_unavailable)


cloudsql module (opentofu/modules/cloudsql/variables.tf)

Excellent defaults:

  • database_version: "POSTGRES_16" (latest stable)
  • tier: "db-custom-4-16384" (4 vCPU, 16GB RAM - production-grade)
  • availability_type: "REGIONAL" (HA by default)
  • disk_size_gb: 100 (adequate starting size)
  • deletion_protection: true (safety net)
  • enable_point_in_time_recovery: true (disaster recovery)
  • backup_retention_count: 7 (1 week retention)
  • max_connections: "200" (reasonable for 16GB RAM)
  • shared_buffers: "4096MB" (25% of RAM - PostgreSQL best practice)

Dev environment overrides:

  • tier: "db-custom-2-8192" (2 vCPU, 8GB RAM - cost-optimized)
  • availability_type: "ZONAL" (single-zone for dev)
  • deletion_protection: false (easier cleanup)

Required inputs: project_id, environment, private_network, app_user_password, readonly_user_password

Validation: ✅ Strong - Regex for database version, range checks for disk size, enum validation for availability

Safe to re-run: ✅ Yes - Cloud SQL updates are applied during maintenance window


redis module (opentofu/modules/redis/variables.tf)

Excellent defaults:

  • tier: "STANDARD_HA" (high availability)
  • memory_size_gb: 5 (adequate for session caching)
  • redis_version: "REDIS_7_2" (latest stable)
  • auth_enabled: true (security best practice)
  • transit_encryption_mode: "SERVER_AUTHENTICATION" (TLS enabled)
  • maxmemory_policy: "allkeys-lru" (sensible eviction policy)
  • maintenance_window_day: "SUNDAY" (low-traffic day)

Dev environment overrides:

  • tier: "BASIC" (single-node, no HA)
  • memory_size_gb: 1 (cost-optimized)

Required inputs: project_id, environment, authorized_network

Safe to re-run: ✅ Yes - Redis updates applied during maintenance window


secrets module (opentofu/modules/secrets/variables.tf)

Fair defaults:

  • Default secrets defined in main.tf:
    • database-password
    • redis-auth-token
    • django-secret-key
    • stripe-api-key
    • jwt-secret-key
  • rotation_period: Not enforced by default (should be)

Required inputs: project_id, environment

Gap: Secret values not populated automatically (intentional security design)

Safe to re-run: ✅ Yes - Secrets module only creates secret containers, not versions


2.3 Variable Validation Summary

Every module validates its inputs (comprehensive in networking, gke, cloudsql, and redis; basic in firewall and secrets):

  • ✅ Project ID validation: Regex pattern enforces GCP naming rules
  • ✅ Environment validation: Enum check (dev, staging, production only)
  • ✅ CIDR validation: Prefix length checks
  • ✅ Version validation: Regex for database/Redis versions
  • ✅ Range validation: Disk size, memory size, node counts
  • ✅ Enum validation: Tier types, availability modes, routing modes

No module lacks validation, which indicates high-quality IaC code.


2.4 Idempotency Assessment

| Module | Idempotent | Safe Re-run | Notes |
| --- | --- | --- | --- |
| networking | ✅ | ✅ | State-managed, no destructive changes |
| gke | ✅ | ✅ | Rolling node pool updates (surge settings) |
| cloudsql | ✅ | ⚠️ | Safe but slow (maintenance window applies) |
| redis | ✅ | ⚠️ | Safe but slow (maintenance window applies) |
| firewall | ✅ | ✅ | Rule updates are instant |
| secrets | ✅ | ✅ | Only creates containers, not versions |

Overall: ✅ Safe to re-run tofu apply multiple times


3. Kubernetes Readiness Analysis

3.1 Existing Manifests Inventory

Base manifests (kubernetes/base/):

| File | Purpose | Completeness | Application-Specific |
| --- | --- | --- | --- |
| namespaces.yaml | Namespace definitions | ✅ Complete | ❌ Generic (3 envs) |
| rbac.yaml | RBAC roles/bindings | ⚠️ Skeletal | ❌ Generic |
| network-policies.yaml | Network segmentation | ⚠️ Skeletal | ❌ Generic |
| resource-quotas.yaml | Resource limits per namespace | ⚠️ Skeletal | ❌ Generic |
| limit-ranges.yaml | Default container limits | ⚠️ Skeletal | ❌ Generic |
| kustomization.yaml | Kustomize base config | ✅ Complete | ❌ Generic |

Application manifests: MISSING - No Deployment, Service, Ingress, or ConfigMap for the actual application


3.2 Missing Kubernetes Components

Critical Blockers:

  1. No Deployment manifest - No FastAPI backend deployment
  2. No Service manifest - No LoadBalancer/ClusterIP for API
  3. No Ingress manifest - No external access configuration
  4. No ConfigMap - No application configuration (non-sensitive)
  5. No HorizontalPodAutoscaler - No auto-scaling rules
  6. No PodDisruptionBudget - No high-availability guarantees

Required for License API:

# Example structure needed (NOT PRESENT)
kubernetes/
  base/
    backend/
      deployment.yaml   # FastAPI containers
      service.yaml      # ClusterIP service
      configmap.yaml    # App configuration
      hpa.yaml          # Auto-scaling
    ingress/
      ingress.yaml      # External access
      tls-secret.yaml   # SSL certificates

Current state: Only infrastructure-level resources exist, no application deployment.


3.3 ConfigMap and Secret Requirements

ConfigMaps Needed:

  1. backend-config

    • Database connection string (non-password parts)
    • Redis connection string (non-password parts)
    • GCP project ID
    • GCP region
    • Log level
    • Feature flags
  2. environment-config

    • Environment name (dev/staging/prod)
    • API endpoint URLs
    • CORS allowed origins

Secrets Needed:

  1. database-credentials

    • DB_PASSWORD (from Cloud SQL)
    • DB_CONNECTION_NAME (Cloud SQL proxy)
  2. redis-credentials

    • REDIS_AUTH_TOKEN (from Memorystore)
  3. application-secrets

    • DJANGO_SECRET_KEY
    • JWT_SECRET_KEY
    • STRIPE_API_KEY

Current Gap: Secret Manager secrets exist but no Kubernetes Secret manifests to mount them.

Solution Required: External Secrets Operator or Workload Identity integration.


3.4 Resource Limits/Requests

Current state: kubernetes/base/limit-ranges.yaml exists but contains placeholder values.

Production requirements for License API:

# Backend API container
resources:
  requests:
    cpu: 500m       # 0.5 CPU
    memory: 512Mi   # 512 MB
  limits:
    cpu: 2000m      # 2 CPU
    memory: 2Gi     # 2 GB

Missing: No resource requests/limits defined in actual Deployment manifests (since they don't exist).


3.5 Helm Chart Conversion Requirements

Current state: ❌ No Helm chart exists

Recommended Helm chart structure:

charts/
  coditect-license-api/
    Chart.yaml                  # Metadata
    values.yaml                 # Default values
    values-dev.yaml             # Dev overrides
    values-staging.yaml         # Staging overrides
    values-prod.yaml            # Production overrides
    templates/
      deployment.yaml           # Backend API deployment
      service.yaml              # ClusterIP service
      ingress.yaml              # External access
      configmap.yaml            # Configuration
      hpa.yaml                  # Horizontal Pod Autoscaler
      pdb.yaml                  # Pod Disruption Budget
      serviceaccount.yaml       # Workload Identity
      secrets.yaml              # External Secrets integration
      tests/
        test-connection.yaml    # Helm test for API health

Effort estimate: 40 hours (1 week)


4. Installation Script Analysis

4.1 Existing Automation Scripts

| Script | Purpose | Error Handling | Rollback | Completeness |
| --- | --- | --- | --- | --- |
| scripts/install-tools.sh | Install CLI tools | ✅ Good | ❌ None | ⭐⭐⭐⭐ |
| scripts/verify-tools.sh | Validate prerequisites | ✅ Excellent | N/A | ⭐⭐⭐⭐⭐ |
| scripts/gcp-setup.sh | Create GCP project | ✅ Good | ❌ None | ⭐⭐⭐⭐ |
| scripts/iam-setup.sh | Configure IAM | ⚠️ Not found | ❌ None | ❌ Missing |
| opentofu/backend/create-backends.sh | Create state buckets | ✅ Good | ❌ None | ⭐⭐⭐⭐ |

Total scripts: 4 operational, 1 referenced but missing


4.2 Error Handling Assessment

install-tools.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/install-tools.sh)

Error handling:

  • set -euo pipefail (exit on error, unset variables, pipe failures)
  • ✅ Command existence checks (command -v)
  • ✅ Version detection with fallback to "unknown"
  • ✅ OS detection with graceful degradation
  • ⚠️ Installation failures logged but script continues
  • ❌ No rollback if partial installation fails

Example:

# Line 262-278
install_gcloud || log_warn "gcloud installation skipped"
echo ""

install_terraform || log_warn "Terraform installation skipped"
echo ""

Gap: Partial installs leave system in inconsistent state (no cleanup on failure).


verify-tools.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/verify-tools.sh)

Error handling:

  • ✅ Excellent - Detailed checks with pass/fail/warn categorization
  • ✅ Exit code reflects overall status
  • ✅ Clear remediation instructions for each failure
  • ✅ Summary at end with counts

Example:

# Lines 350-363
if [ $FAILED -eq 0 ]; then
    if [ $WARNINGS -eq 0 ]; then
        echo -e "${GREEN}✓ All checks passed! Environment is ready.${NC}"
        exit 0
    else
        echo -e "${YELLOW}⚠ Environment is mostly ready but has warnings.${NC}"
        exit 0
    fi
else
    echo -e "${RED}✗ Some required tools are missing or misconfigured.${NC}"
    exit 1
fi

Strength: Best-in-class verification script.


gcp-setup.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/scripts/gcp-setup.sh)

Error handling:

  • set -euo pipefail
  • ✅ Environment parameter validation
  • ✅ gcloud authentication check
  • ✅ Billing account detection
  • ⚠️ API enablement continues on errors (2>/dev/null || true)
  • ❌ No rollback if project creation fails mid-way

Example:

# Lines 142-145
for api in "${REQUIRED_APIS[@]}"; do
    log_info "Enabling $api..."
    gcloud services enable "$api" --project="$PROJECT_ID" 2>/dev/null || true
done

Gap: API enablement failures are silently ignored.
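One way to close this gap is to keep trying every API but remember which ones failed, then stop before provisioning. The sketch below is an assumption about how that might look, not the current gcp-setup.sh. The enable command is passed in as a hypothetical wrapper (`fake_enable` here) so the logic can run without gcloud; in the real script it would wrap `gcloud services enable "$api" --project="$PROJECT_ID"`.

```shell
# enable_apis ENABLE_CMD API...: try every API, collect failures, fail at the end.
enable_apis() {
    enable_cmd="$1"
    shift
    failed=""
    for api in "$@"; do
        if "$enable_cmd" "$api"; then
            echo "enabled: $api"
        else
            echo "FAILED:  $api" >&2
            failed="$failed $api"
        fi
    done
    if [ -n "$failed" ]; then
        echo "could not enable:$failed" >&2
        return 1
    fi
}

# Stand-in for the real gcloud call: pretend one API is rejected.
fake_enable() { [ "$1" != "redis.googleapis.com" ]; }

enable_apis fake_enable container.googleapis.com redis.googleapis.com \
    || echo "aborting: fix the failed APIs before provisioning"
```

Unlike `2>/dev/null || true`, this keeps the loop tolerant of individual failures while still surfacing them in the exit status.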


create-backends.sh (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/opentofu/backend/create-backends.sh)

Error handling:

  • set -euo pipefail
  • ✅ Bucket existence check (skip if exists)
  • ✅ gcloud/gsutil command checks
  • ✅ Authentication verification
  • ❌ No rollback if bucket creation succeeds but versioning/lifecycle fails

Hard-coded value: PROJECT_ID="coditect-citus-prod" (line 9) - BLOCKER


4.3 Manual Steps That Could Be Automated

| Manual Step | Current Status | Automation Feasibility | Effort (hours) |
| --- | --- | --- | --- |
| 1. Install prerequisites | ✅ Automated (install-tools.sh) | N/A | 0 |
| 2. GCP authentication | ❌ Interactive OAuth | ⚠️ Service account possible | 8 |
| 3. Create GCP project | ✅ Automated (gcp-setup.sh) | N/A | 0 |
| 4. Enable APIs | ✅ Automated (gcp-setup.sh) | N/A | 0 |
| 5. Create state buckets | ✅ Automated (create-backends.sh) | N/A | 0 |
| 6. Initialize OpenTofu | ❌ Manual | ✅ Trivial | 2 |
| 7. Review plan | ❌ Manual (best practice) | ⚠️ Auto-approve risky | N/A |
| 8. Apply infrastructure | ❌ Manual confirmation | -auto-approve flag | 1 |
| 9. Configure kubectl | ❌ Manual | ✅ Script with cluster name | 2 |
| 10. Populate secrets | ❌ Manual (security) | ⚠️ Requires secure input method | 16 |
| 11. Deploy application | ❌ No automation | ✅ Helm install command | 24 |
| 12. Verify health | ❌ Manual | ✅ kubectl wait + curl | 4 |

Total automatable hours: 57 hours

Non-automatable steps: GCP authentication (requires OAuth), manual plan review (best practice)


4.4 Rollback Mechanisms

Current state: No automated rollback mechanisms exist

Gaps:

  1. No state backups before apply

    • OpenTofu state stored in GCS with versioning
    • Manual restoration: gsutil cp gs://bucket/path/to/state.tfstate.backup terraform.tfstate
    • No automated rollback script
  2. No infrastructure snapshots

    • Cloud SQL: Automated backups exist (7-day retention)
    • GKE: No cluster snapshots (recreate from OpenTofu)
    • Redis: RDB snapshots exist (managed by Google)
    • No coordinated restore procedure
  3. No application rollback

    • Kubernetes: kubectl rollout undo deployment/backend works
    • No Helm chart = no helm rollback capability
  4. No health check gates

    • tofu apply completes when resources created
    • No verification that services are actually healthy
    • Could apply infrastructure but app fails to start

Recommendation: Implement scripts/rollback.sh with state restoration logic.
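Because the state bucket has object versioning enabled, the state-restoration part of such a script could start by locating the previous object generation. The sketch below only parses a listing in the format printed by `gsutil ls -a` (object URL followed by `#generation`); the bucket and object names are placeholders, and the actual restore step is intentionally left out.

```shell
# previous_generation: read `gsutil ls -a` style lines on stdin and print the
# URL of the second-newest generation (the state before the most recent write).
previous_generation() {
    # Sort numerically on the generation number after '#', keep the last two.
    sort -t '#' -k 2,2n | tail -n 2 | {
        if read -r prev && read -r _latest; then
            echo "$prev"
        else
            false   # fewer than two versions: nothing to roll back to
        fi
    }
}

# Example listing with placeholder names, deliberately out of order.
printf '%s\n' \
    'gs://example-tfstate/dev/default.tfstate#1700000300000000' \
    'gs://example-tfstate/dev/default.tfstate#1700000100000000' \
    'gs://example-tfstate/dev/default.tfstate#1700000200000000' \
    | previous_generation
```

A rollback script would then copy the printed generation back over the live state object, ideally only after an operator has confirmed it.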


5. Configuration Management Analysis

5.1 Environment Variables Inventory

From .env.example (/Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/cloud/coditect-cloud-infra/.env.example):

Total environment variables: 102 variables across 12 categories

Required Variables (Cannot have defaults):

| Variable | Category | Used By | Default Available? |
| --- | --- | --- | --- |
| DJANGO_SECRET_KEY | Security | Django | ❌ Must generate |
| DB_PASSWORD | Database | PostgreSQL | ❌ Must generate |
| REDIS_AUTH_TOKEN | Cache | Redis | ❌ Auto-generated by GCP |
| GOOGLE_APPLICATION_CREDENTIALS | GCP | All services | ❌ Service account key path |
| STRIPE_API_KEY | Billing | Payment processing | ❌ External service |
| STRIPE_WEBHOOK_SECRET | Billing | Payment webhooks | ❌ External service |
| OAUTH_CLIENT_SECRET | Auth | Hydra | ❌ Must generate |
| EMAIL_HOST_PASSWORD | Email | SMTP | ❌ External service |

Count: 8 critical variables with no sensible defaults


Optional Variables (Have sensible defaults):

| Variable | Default Value | Safe for Production? |
| --- | --- | --- |
| DJANGO_DEBUG | True | ❌ Must set False |
| DJANGO_ALLOWED_HOSTS | localhost,127.0.0.1 | ❌ Must configure domain |
| DATABASE_URL | postgresql://...@localhost | ❌ Must use Cloud SQL |
| REDIS_URL | redis://localhost:6379 | ❌ Must use Memorystore |
| GOOGLE_REGION | us-central1 | ✅ Sensible default |
| GKE_CLUSTER_NAME | coditect-gke-dev | ⚠️ Should match OpenTofu output |
| LOG_LEVEL | INFO | ✅ Sensible default |
| ENABLE_DEBUG_TOOLBAR | True | ❌ Must set False in prod |

Count: 94 variables with defaults, 12 unsafe for production


5.2 Configuration Validation

Current state: ❌ No configuration validation script exists

Gaps:

  1. No environment variable validation

    • Script should check required vars are set
    • Script should validate format (URLs, email addresses, JSON)
    • Script should detect dangerous defaults (DEBUG=True in prod)
  2. No configuration file validation

    • .env file not validated against .env.example
    • No check for missing required variables
    • No type checking (integer vs string)
  3. No cross-service consistency checks

    • Example: GKE_CLUSTER_NAME should match OpenTofu output
    • Example: REDIS_HOST should match Memorystore IP
    • No automated detection of mismatches

Recommendation: Create scripts/validate-config.sh to check all variables.

Effort: 8 hours
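A first cut of scripts/validate-config.sh might combine a required-variable check with a production safety check, as sketched below. The variable names come from the tables above, except ENVIRONMENT, which is an assumed name for the deployment target; format and cross-service checks are not covered here.

```shell
# require_vars NAME...: report every listed variable that is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# check_prod_safety: flag development defaults when ENVIRONMENT=production.
check_prod_safety() {
    [ "${ENVIRONMENT:-dev}" = "production" ] || return 0
    unsafe=0
    if [ "${DJANGO_DEBUG:-False}" = "True" ]; then
        echo "unsafe: DJANGO_DEBUG=True in production" >&2
        unsafe=1
    fi
    case "${DATABASE_URL:-}" in
        *localhost*|*127.0.0.1*)
            echo "unsafe: DATABASE_URL points at localhost" >&2
            unsafe=1 ;;
    esac
    return "$unsafe"
}

# Demo with example values: report all problems, then reject once.
(
    ENVIRONMENT=production; DJANGO_DEBUG=True; DB_PASSWORD=example
    status=0
    require_vars DB_PASSWORD DJANGO_SECRET_KEY || status=1
    check_prod_safety || status=1
    exit "$status"
) || echo "configuration rejected"
```

Running both checks before deciding lets the user fix every reported problem in a single pass.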


5.3 Secrets Management Approach

Current approach: Google Cloud Secret Manager

Implementation in OpenTofu:

opentofu/modules/secrets/main.tf creates these secrets:

  1. database-password (Cloud SQL)
  2. redis-auth-token (Memorystore)
  3. django-secret-key (Django)
  4. stripe-api-key (Stripe)
  5. stripe-webhook-secret (Stripe)
  6. jwt-secret-key (JWT signing)
  7. oauth-client-secret (OAuth)
  8. smtp-password (Email)
  9. sentry-dsn (Monitoring)

Gap: Secret containers are created, but their values are not populated.

Manual population required:

gcloud secrets versions add database-password --data-file=password.txt

One-click install blocker: No way to securely provide secret values during automated install.

Possible solutions:

  1. Interactive prompts (guided wizard)

    read -rsp "Enter database password: " DB_PASSWORD; echo
    printf '%s' "$DB_PASSWORD" | gcloud secrets versions add database-password --data-file=-
  2. Secret file input (one-time config file)

    # secrets.yaml (user-provided)
    database_password: "xxxxx"
    redis_auth_token: "xxxxx"
    # Script reads YAML and populates secrets
  3. Random generation + display (for auto-generable secrets)

    DJANGO_SECRET=$(openssl rand -base64 32)
    echo "DJANGO_SECRET_KEY (save this): $DJANGO_SECRET"
    echo -n "$DJANGO_SECRET" | gcloud secrets versions add django-secret-key --data-file=-

Recommendation: Hybrid approach - generate random secrets where possible, prompt for external API keys.

Effort: 16 hours
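The hybrid approach could be organised around two small helpers: one that generates a random value and one that decides whether a given secret is generated or prompted for. The split in `secret_source` is an assumption drawn from the lists in this section, not a confirmed policy, and the sketch stops short of calling gcloud.

```shell
# generate_secret [BYTES]: print BYTES random bytes (default 32) as base64.
generate_secret() {
    head -c "${1:-32}" /dev/urandom | base64 | tr -d '\n'
}

# secret_source NAME: decide how a secret is obtained (assumed split).
secret_source() {
    case "$1" in
        django-secret-key|jwt-secret-key|database-password) echo generate ;;
        *) echo prompt ;;   # external API keys must come from the user
    esac
}

for name in django-secret-key stripe-api-key jwt-secret-key smtp-password; do
    echo "$name -> $(secret_source "$name")"
done
```

A generated value would then be piped into `gcloud secrets versions add NAME --data-file=-`, as in the examples above, so it never has to be written to disk.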


5.4 Default Values vs Required Inputs

Well-designed defaults:

Networking

  • All CIDR ranges have sensible defaults
  • No IP conflicts between ranges
  • Adequate IP space for scaling

GKE

  • Production-grade machine types
  • Sensible min/max node counts
  • Security features enabled by default

Cloud SQL

  • Latest PostgreSQL version
  • High availability by default (prod)
  • Automated backups enabled
  • Point-in-time recovery enabled

Redis

  • High availability tier (prod)
  • Authentication enabled
  • TLS encryption enabled

Required inputs (unavoidable):

  • ❌ project_id - Must be globally unique
  • ❌ environment - Determines resource naming
  • ❌ db_app_user_password - Security requirement
  • ❌ db_readonly_user_password - Security requirement

Recommendation: Default values are production-ready, only 4 truly required inputs per environment.


6. Gap Analysis for One-Click Installation

6.1 Current Blockers

| # | Blocker | Severity | Impact | Workaround Available? |
| --- | --- | --- | --- | --- |
| 1 | No application Helm chart | 🔴 CRITICAL | Cannot deploy License API | ❌ No |
| 2 | No secret population automation | 🔴 CRITICAL | Application won't start | ⚠️ Manual prompts |
| 3 | Hard-coded project ID in scripts | 🟡 HIGH | Cannot use custom project | ✅ Parameterize |
| 4 | No health check validation | 🟡 HIGH | Silent failures | ✅ Add kubectl wait |
| 5 | No rollback mechanism | 🟡 HIGH | Risky deployments | ⚠️ Manual restoration |
| 6 | GCP auth requires OAuth | 🟢 MEDIUM | User interaction needed | ✅ Service account option |
| 7 | No configuration validation | 🟢 MEDIUM | Silent misconfigurations | ✅ Add validation script |
| 8 | Missing IAM setup script | 🟢 MEDIUM | Manual permission setup | ✅ Create script |

Critical path: Blockers #1 and #2 must be resolved for one-click install.


6.2 Missing Components

6.2.1 Helm Charts

Status: ❌ Does not exist

Required charts:

  1. coditect-license-api (main application)

    • Deployment: FastAPI backend
    • Service: ClusterIP + LoadBalancer
    • Ingress: External access with TLS
    • ConfigMap: Application configuration
    • HorizontalPodAutoscaler: Auto-scaling
    • ServiceAccount: Workload Identity for GCP access
  2. coditect-monitoring (observability stack)

    • Prometheus: Metrics collection
    • Grafana: Dashboards
    • Loki: Log aggregation
    • Jaeger: Distributed tracing

Effort:

  • License API chart: 40 hours
  • Monitoring stack: 24 hours (use existing charts)
  • Total: 64 hours

6.2.2 Prerequisites Checks

Status: ⚠️ Exists (verify-tools.sh) but not integrated into install flow

Missing checks:

  1. Billing enabled verification

    gcloud billing projects describe $PROJECT_ID --format="value(billingEnabled)"
  2. API quota checks

    • Verify sufficient GCE instances quota
    • Verify IP address quota
    • Verify Cloud SQL quota
  3. IAM permission validation

    • Verify user has roles/owner or equivalent
    • Check service account permissions
  4. Network connectivity

    • Check internet access
    • Verify GCP API reachability
    • Test DNS resolution

Recommendation: Create scripts/pre-flight-check.sh that runs all validations.

Effort: 8 hours
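The pre-flight script could reuse the pass/fail counting style of verify-tools.sh. In the sketch below the individual checks are stand-ins: `billing_enabled` interprets the True/False value printed by the billing query shown above, and the quota check is a placeholder that always fails so the failure path is visible.

```shell
PASSED=0
FAILED=0

# run_check DESCRIPTION COMMAND [ARGS...]: run one check and count the result.
run_check() {
    desc="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        PASSED=$((PASSED + 1))
        echo "PASS $desc"
    else
        FAILED=$((FAILED + 1))
        echo "FAIL $desc"
    fi
}

# billing_enabled VALUE: interpret the output of the billing query.
billing_enabled() { [ "$1" = "True" ]; }

run_check "shell available" command -v sh
run_check "billing enabled" billing_enabled "True"   # value would come from gcloud
run_check "quota headroom"  false                    # placeholder for a real quota check
echo "passed=$PASSED failed=$FAILED"
```

An installer would exit non-zero when FAILED is greater than zero, before any resource is created.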


6.2.3 Post-Install Configuration

Status: ❌ No automation exists

Required steps:

  1. DNS configuration

    • Create Cloud DNS zone
    • Configure A records for API endpoint
    • Configure TLS certificate (Let's Encrypt via cert-manager)
  2. Initial data seeding

    • Create default tenant
    • Create admin user
    • Generate API keys
  3. Monitoring setup

    • Configure Grafana datasources
    • Import dashboards
    • Setup alerting rules
  4. Backup verification

    • Test Cloud SQL backup restoration
    • Verify Redis snapshot creation
    • Test disaster recovery procedure

Effort: 24 hours


6.3 Automation Roadmap

Phase 1: Foundation (Week 1 - 40 hours)

Goal: Eliminate hard-coded values and create prerequisite validation

  • T1.1: Parameterize all hard-coded values (8h)

    • Convert coditect-citus-prod to $PROJECT_ID variable
    • Make cluster name configurable
    • Make region configurable
    • Create global configuration file
  • T1.2: Create scripts/pre-flight-check.sh (8h)

    • Billing verification
    • API quota checks
    • IAM permission validation
    • Network connectivity tests
  • T1.3: Create scripts/validate-config.sh (8h)

    • Environment variable validation
    • Type checking
    • Cross-service consistency checks
    • Production safety checks
  • T1.4: Create scripts/rollback.sh (8h)

    • OpenTofu state restoration
    • Cloud SQL backup restoration
    • Kubernetes deployment rollback
    • Coordination logic
  • T1.5: Integrate scripts into unified installer (8h)

    • scripts/install.sh - Master orchestration script
    • Call sequence: pre-flight → install-tools → gcp-setup → validate-config
    • Error handling and rollback integration
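The call sequence in T1.5 can be sketched as a loop that stops at the first failing step and hands control to a rollback hook. The step functions below are stand-ins for the real scripts, with one simulated failure; none of the names are taken from the repository.

```shell
# run_steps ROLLBACK_CMD STEP...: run steps in order; on the first failure call
# ROLLBACK_CMD with the name of the failed step and stop.
run_steps() {
    rollback_cmd="$1"
    shift
    for step in "$@"; do
        echo "==> $step"
        if ! "$step"; then
            echo "step failed: $step" >&2
            "$rollback_cmd" "$step"
            return 1
        fi
    done
}

# Stand-ins for the real scripts.
pre_flight()      { echo "checking prerequisites"; }
install_tools()   { echo "installing tools"; }
gcp_setup()       { echo "creating project"; return 1; }   # simulated failure
validate_config() { echo "validating config"; }
rollback()        { echo "rolling back after $1"; }

run_steps rollback pre_flight install_tools gcp_setup validate_config \
    || echo "installation aborted"
```

Passing the failed step name to the rollback hook lets it undo only what has been done so far.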

Deliverables:

  • ✅ All scripts parameterized
  • ✅ Comprehensive pre-flight checks
  • ✅ Configuration validation
  • ✅ Rollback capability
  • ✅ Unified install script

Phase 2: Application Packaging (Week 2 - 40 hours)

Goal: Create Helm chart for License API application

  • T2.1: Create Helm chart structure (8h)

    • Chart.yaml with metadata
    • values.yaml with all configurable options
    • Environment-specific value files (dev, staging, prod)
  • T2.2: Create deployment templates (16h)

    • templates/deployment.yaml - FastAPI backend
    • templates/service.yaml - ClusterIP + LoadBalancer
    • templates/ingress.yaml - External access with TLS
    • templates/configmap.yaml - Application configuration
    • templates/hpa.yaml - Horizontal Pod Autoscaler
    • templates/pdb.yaml - Pod Disruption Budget
    • templates/serviceaccount.yaml - Workload Identity
  • T2.3: Integrate External Secrets Operator (8h)

    • Install ESO Helm chart
    • Create SecretStore for GCP Secret Manager
    • Create ExternalSecret resources
    • Test secret synchronization
  • T2.4: Add Helm tests (4h)

    • templates/tests/test-connection.yaml - API health check
    • templates/tests/test-database.yaml - Database connectivity
    • templates/tests/test-redis.yaml - Redis connectivity
  • T2.5: Document Helm usage (4h)

    • Update README with Helm install instructions
    • Create values.yaml documentation
    • Add troubleshooting guide

Deliverables:

  • ✅ Production-ready Helm chart
  • ✅ External Secrets integration
  • ✅ Automated tests
  • ✅ Documentation

Phase 3: Secret Management (Week 3 - 40 hours)

Goal: Automate secret generation and population

  • T3.1: Create secret generation script (8h)

    • Generate random secrets (Django, JWT)
    • Display generated values for user to save
    • Option to provide custom values
  • T3.2: Create interactive secret wizard (16h)

    • Guided prompts for all required secrets
    • Validation for external API keys (Stripe, SMTP)
    • Secure input (masked passwords)
    • Confirmation step before population
  • T3.3: Implement secret population (8h)

    • Batch upload to GCP Secret Manager
    • Verify secret accessibility
    • Test External Secrets synchronization
  • T3.4: Add secret rotation automation (8h)

    • Create scripts/rotate-secrets.sh
    • Zero-downtime rotation procedure
    • Update Secret Manager + Kubernetes pods

Deliverables:

  • ✅ Automated secret generation
  • ✅ Interactive secret wizard
  • ✅ Secret population automation
  • ✅ Rotation capability

Phase 4: Health Checks & Validation (Week 4 - 40 hours)

Goal: Ensure deployed infrastructure is healthy

  • T4.1: Add infrastructure health checks (8h)

    • GKE cluster ready check (kubectl wait --for=condition=Ready nodes --all)
    • Cloud SQL connectivity test
    • Redis connectivity test
    • Network connectivity validation
  • T4.2: Add application health checks (8h)

    • API /health endpoint check
    • Database migration verification
    • Redis session storage test
    • End-to-end API test
  • T4.3: Create post-install verification (8h)

    • scripts/verify-deployment.sh
    • Check all Kubernetes resources healthy
    • Verify ingress accessibility
    • Test authentication flow
  • T4.4: Add monitoring setup automation (8h)

    • Deploy Prometheus/Grafana via Helm
    • Import pre-built dashboards
    • Configure alerting rules
    • Setup Slack/email notifications
  • T4.5: Create comprehensive installer (8h)

    • scripts/one-click-install.sh - Master script
    • Interactive mode with prompts
    • Non-interactive mode with config file
    • Progress indicators and logging
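The health checks in T4.1 and T4.2 usually need to tolerate services that take a while to come up. A small retry helper, sketched below under that assumption, could gate each step. The probe here is a stand-in that succeeds on its third call; a real gate might run curl against the API /health endpoint instead.

```shell
# wait_for ATTEMPTS DELAY COMMAND [ARGS...]: retry COMMAND until it succeeds or
# ATTEMPTS runs out, sleeping DELAY seconds between tries.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    echo "gave up after $attempts attempts: $*" >&2
    return 1
}

# Stand-in health probe that succeeds on the third call.
TRIES=0
flaky_probe() { TRIES=$((TRIES + 1)); [ "$TRIES" -ge 3 ]; }

wait_for 5 0 flaky_probe && echo "healthy after $TRIES tries"
```

A bounded number of attempts keeps a broken deployment from hanging the installer indefinitely.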

Deliverables:

  • ✅ Comprehensive health checks
  • ✅ Post-install verification
  • ✅ Monitoring automation
  • ✅ One-click installer script

6.4 Helm Chart Requirements Specification

Chart name: coditect-license-api
Chart version: 1.0.0
App version: 0.1.0

Values Structure

# values.yaml
replicaCount: 3

image:
  repository: gcr.io/coditect-citus-prod/license-api
  tag: "latest"
  pullPolicy: IfNotPresent

service:
  type: LoadBalancer
  port: 80
  targetPort: 8000
  annotations:
    cloud.google.com/load-balancer-type: "External"

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: api.coditect.ai
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-coditect-tls
      hosts:
        - api.coditect.ai

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

podDisruptionBudget:
  enabled: true
  minAvailable: 2

database:
  host: "10.67.0.3"   # Cloud SQL private IP (from OpenTofu output)
  port: 5432
  name: "coditect"
  user: "app_user"
  existingSecret: "database-credentials"
  secretKey: "password"

redis:
  host: "10.121.42.67"   # Memorystore IP (from OpenTofu output)
  port: 6378
  existingSecret: "redis-credentials"
  secretKey: "auth-token"

config:
  logLevel: "INFO"
  workers: 4
  maxRequests: 1000
  maxRequestsJitter: 50

externalSecrets:
  enabled: true
  secretStore: "gcpsm-secret-store"
  secrets:
    - name: database-credentials
      key: database-password
    - name: redis-credentials
      key: redis-auth-token
    - name: app-secrets
      keys:
        - django-secret-key
        - jwt-secret-key
        - stripe-api-key

Environment-Specific Overrides

values-dev.yaml:

replicaCount: 1
autoscaling:
  enabled: false
ingress:
  enabled: false
resources:
  requests:
    cpu: 250m
    memory: 256Mi

values-prod.yaml:

replicaCount: 5
autoscaling:
  minReplicas: 5
  maxReplicas: 50
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 4000m
    memory: 4Gi
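To see what an environment file actually yields, it helps to model how overrides layer onto the base values. Helm merges nested maps key by key, while scalars and lists from the later file replace the earlier value. The sketch below approximates that behaviour in Python on a subset of the values from this section; `merge_values` is a hypothetical helper, not Helm's implementation.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base.

    Nested mappings are merged key by key; scalars and lists in the
    override replace the base value outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Subset of the base values and of values-dev.yaml from this section.
base = {
    "replicaCount": 3,
    "autoscaling": {"enabled": True, "minReplicas": 3, "maxReplicas": 20},
    "podDisruptionBudget": {"enabled": True, "minAvailable": 2},
    "resources": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "2000m", "memory": "2Gi"},
    },
}
dev = {
    "replicaCount": 1,
    "autoscaling": {"enabled": False},
    "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}},
}

effective = merge_values(base, dev)
print(effective["resources"]["limits"])   # {'cpu': '2000m', 'memory': '2Gi'}
print(effective["podDisruptionBudget"])   # {'enabled': True, 'minAvailable': 2}
```

One thing the merged result makes visible: the dev override sets `replicaCount: 1` but inherits `podDisruptionBudget.minAvailable: 2`, which would leave no room for voluntary evictions in dev. An explicit PDB override in values-dev.yaml may be worth considering.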

6.5 Installation Script Requirements

Script name: scripts/one-click-install.sh

Usage:

# Interactive mode (guided wizard)
./scripts/one-click-install.sh --interactive

# Non-interactive mode (config file)
./scripts/one-click-install.sh --config install-config.yaml

# Dry-run mode (plan only)
./scripts/one-click-install.sh --dry-run
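The flag contract above can be pinned down before the shell script exists. The sketch below models it with Python's `argparse`. It assumes that `--interactive` and `--config` are mutually exclusive and that `--dry-run` can be combined with either; the assessment does not state this, so treat both as assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the installer's three documented flags (hypothetical semantics)."""
    parser = argparse.ArgumentParser(prog="one-click-install")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--interactive", action="store_true",
                      help="guided wizard")
    mode.add_argument("--config", metavar="FILE",
                      help="path to install-config.yaml")
    parser.add_argument("--dry-run", action="store_true",
                        help="plan only, change nothing")
    return parser
```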

Script Flow

Configuration File Format

install-config.yaml:

# GCP Configuration
gcp:
  project_id: "coditect-prod"
  region: "us-central1"
  billing_account: "012345-67890A-BCDEF0"

# Environment Configuration
environment: "production"

# Cluster Configuration
gke:
  cluster_name: "coditect-prod"
  node_count: 5
  machine_type: "n1-standard-4"

# Database Configuration
database:
  tier: "db-custom-4-16384"
  availability_type: "REGIONAL"

# Application Configuration
application:
  domain: "api.coditect.ai"
  replicas: 5

# Secrets Configuration (optional - will prompt if missing)
secrets:
  stripe_api_key: "sk_live_xxxxx"
  smtp_password: "xxxxx"
  # Auto-generated secrets:
  # - django_secret_key (random)
  # - jwt_secret_key (random)
  # - database_password (random)
  # - redis_auth_token (GCP-managed)
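A non-interactive run benefits from rejecting a bad config file before any resources are created. The sketch below validates an already-parsed config dictionary; YAML parsing is omitted because it needs a third-party library. The required-key list and the billing account pattern are assumptions inferred from the sample file. The project ID pattern follows GCP's documented rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen).

```python
import re

PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
# Assumed shape, inferred from the sample value in this document.
BILLING_ACCOUNT_RE = re.compile(r"^[0-9A-F]{6}-[0-9A-F]{6}-[0-9A-F]{6}$")
# Hypothetical list of keys the installer cannot proceed without.
REQUIRED_PATHS = (
    "gcp.project_id",
    "gcp.region",
    "gcp.billing_account",
    "environment",
    "gke.cluster_name",
    "application.domain",
)


def lookup(config: dict, dotted: str):
    """Return the value at a dotted path, or None if any segment is missing."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing required key: {p}"
              for p in REQUIRED_PATHS if lookup(config, p) is None]

    project_id = lookup(config, "gcp.project_id")
    if project_id is not None and not PROJECT_ID_RE.match(str(project_id)):
        errors.append(f"invalid gcp.project_id: {project_id!r}")

    billing = lookup(config, "gcp.billing_account")
    if billing is not None and not BILLING_ACCOUNT_RE.match(str(billing)):
        errors.append(f"invalid gcp.billing_account: {billing!r}")

    availability = lookup(config, "database.availability_type")
    if availability is not None and availability not in ("ZONAL", "REGIONAL"):
        errors.append(f"invalid database.availability_type: {availability!r}")
    return errors
```

Returning every problem at once, rather than failing on the first, lets a user fix the file in a single pass.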

Script Features

Progress Indicators

[1/15] Running pre-flight checks... ✓
[2/15] Installing prerequisites... ✓
[3/15] Authenticating with GCP... ✓

Logging

# All output logged to install-TIMESTAMP.log
# Errors highlighted in red
# Warnings in yellow
# Success in green

Rollback on Failure

# Automatic rollback if any step fails
# State restoration from backup
# Resource cleanup
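One common way to get automatic rollback is to register an undo action after each step succeeds and, on failure, run the registered actions in reverse order. The sketch below shows that pattern; `RollbackStack` and `install` are hypothetical names, and real undo actions (for example, destroying resources created by OpenTofu) would replace the callables.

```python
from typing import Callable, Iterable

Step = tuple[str, Callable[[], None], Callable[[], None]]


class RollbackStack:
    """Collects undo actions and runs them in reverse order on failure."""

    def __init__(self) -> None:
        self._undo: list[tuple[str, Callable[[], None]]] = []

    def run(self, name: str, action: Callable[[], None],
            undo: Callable[[], None]) -> None:
        """Run action; register undo only if the action succeeded."""
        action()
        self._undo.append((name, undo))

    def rollback(self) -> list[str]:
        """Undo completed steps, most recent first; return their names."""
        rolled_back = []
        while self._undo:
            name, undo = self._undo.pop()
            undo()
            rolled_back.append(name)
        return rolled_back


def install(steps: Iterable[Step]) -> bool:
    """Run steps in order; on any failure roll back and return False."""
    stack = RollbackStack()
    try:
        for name, action, undo in steps:
            stack.run(name, action, undo)
    except Exception as exc:
        print(f"step failed: {exc}; rolling back")
        stack.rollback()
        return False
    return True
```

Because the undo action is registered only after its step succeeds, a step that fails halfway is responsible for cleaning up its own partial work.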

Idempotency

# Safe to re-run after partial failure
# Skips already-completed steps
# Resumes from last checkpoint
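Resuming from the last checkpoint can be as simple as recording each completed step name in a small state file and skipping those names on the next run. The sketch below illustrates this; the JSON checkpoint format and the function name are assumptions. Steps should still be idempotent themselves, since a crash between finishing a step and writing the checkpoint would cause that step to run again.

```python
import json
from pathlib import Path


def run_with_checkpoints(steps, checkpoint_file: Path) -> list[str]:
    """Run (name, action) steps in order, skipping names already recorded.

    Returns the names executed during this invocation. The checkpoint is
    written after every successful step, so a failure leaves earlier
    progress on disk.
    """
    done = json.loads(checkpoint_file.read_text()) if checkpoint_file.exists() else []
    executed = []
    total = len(steps)
    for index, (name, action) in enumerate(steps, start=1):
        if name in done:
            print(f"[{index}/{total}] {name}... skipped (already done)")
            continue
        action()
        done.append(name)
        checkpoint_file.write_text(json.dumps(done))
        executed.append(name)
        print(f"[{index}/{total}] {name}... ok")
    return executed
```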

7. Estimated Engineering Effort

7.1 Effort Breakdown by Component

| Component | Tasks | Hours | Engineer Profile |
|---|---|---|---|
| Phase 1: Foundation | 5 tasks | 40 | DevOps Engineer |
| - Parameterization | T1.1 | 8 | |
| - Pre-flight checks | T1.2 | 8 | |
| - Config validation | T1.3 | 8 | |
| - Rollback script | T1.4 | 8 | |
| - Unified installer | T1.5 | 8 | |
| Phase 2: Helm Chart | 5 tasks | 40 | Full-Stack Engineer |
| - Chart structure | T2.1 | 8 | |
| - Deployment templates | T2.2 | 16 | |
| - External Secrets | T2.3 | 8 | |
| - Helm tests | T2.4 | 4 | |
| - Documentation | T2.5 | 4 | |
| Phase 3: Secret Mgmt | 4 tasks | 40 | Security Engineer |
| - Secret generation | T3.1 | 8 | |
| - Interactive wizard | T3.2 | 16 | |
| - Secret population | T3.3 | 8 | |
| - Rotation automation | T3.4 | 8 | |
| Phase 4: Validation | 5 tasks | 40 | SRE / DevOps |
| - Infra health checks | T4.1 | 8 | |
| - App health checks | T4.2 | 8 | |
| - Post-install verify | T4.3 | 8 | |
| - Monitoring setup | T4.4 | 8 | |
| - One-click script | T4.5 | 8 | |
| Total | 19 tasks | 160 hours | 1 Engineer (4 weeks) |

7.2 Timeline

Assumptions:

  • 1 full-time engineer
  • 40-hour work weeks
  • No blockers or dependencies

Schedule:

| Week | Phase | Deliverables | Risk Level |
|---|---|---|---|
| Week 1 | Foundation | Parameterized scripts, pre-flight checks, validation | 🟢 Low |
| Week 2 | Helm Chart | Production-ready chart, External Secrets integration | 🟡 Medium |
| Week 3 | Secret Mgmt | Automated secret generation and population | 🟡 Medium |
| Week 4 | Validation | Health checks, monitoring, one-click installer | 🟢 Low |

Critical Path: Weeks 2-3 (Helm chart + Secret management) - Highest complexity

Earliest Completion: December 21, 2025 (4 weeks from now)

7.3 Risk Factors

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| External Secrets Operator integration issues | Medium | High | Test early in Week 2, fallback to manual secrets |
| GCP quota limits | Low | High | Pre-flight quota checks, request quota increase |
| Helm chart template complexity | Medium | Medium | Start with minimal chart, iterate |
| Secret rotation breaking app | Low | Critical | Extensive testing, staged rollout |
| One-click script platform issues | Medium | Medium | Test on macOS, Linux, and Cloud Shell |

7.4 Dependencies

External dependencies:

  • ✅ GCP project with billing enabled (already exists)
  • ✅ OpenTofu modules (already complete and tested)
  • ⚠️ FastAPI application Docker image (must exist in GCR)
  • ⚠️ External Secrets Operator (must deploy first)
  • ⚠️ NGINX Ingress Controller (must deploy first)
  • ⚠️ cert-manager (must deploy first for TLS)

Recommendation: Add Kubernetes infrastructure prerequisites to Phase 1 (ESO, NGINX, cert-manager).

Additional effort: +16 hours (Week 1)

Revised total effort: 176 hours (4.4 weeks)
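The effort figures in this section follow from simple arithmetic, reproduced below as a check. The phase hours and the 16-hour prerequisite addition are taken from this section; the 40-hour week and the November 23, 2025 start date are the document's stated assumptions.

```python
from datetime import date, timedelta

HOURS_PER_WEEK = 40
phase_hours = {
    "Foundation": 40,
    "Helm Chart": 40,
    "Secret Mgmt": 40,
    "Validation": 40,
}
prerequisite_hours = 16  # ESO, NGINX Ingress Controller, cert-manager

base_total = sum(phase_hours.values())               # 160
revised_total = base_total + prerequisite_hours      # 176
revised_weeks = revised_total / HOURS_PER_WEEK       # 4.4

start = date(2025, 11, 23)
four_week_finish = start + timedelta(weeks=4)        # 2025-12-21

print(base_total, revised_total, revised_weeks, four_week_finish)
```

Note that the December 21 date corresponds to the 160-hour plan; under the same assumptions, the revised 176-hour figure would land roughly two working days later.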


8. Recommendations

8.1 Priority Ranking

| Priority | Item | Rationale | Effort (hours) |
|---|---|---|---|
| P0 - CRITICAL | Create Helm chart for License API | Blocks application deployment | 40 |
| P0 - CRITICAL | Automate secret population | Blocks application startup | 40 |
| P1 - HIGH | Parameterize hard-coded values | Enables multi-environment deployment | 8 |
| P1 - HIGH | Add comprehensive health checks | Prevents silent failures | 16 |
| P2 - MEDIUM | Create one-click installer script | User experience improvement | 24 |
| P2 - MEDIUM | Add rollback automation | Risk mitigation | 8 |
| P3 - LOW | Setup monitoring automation | Operational excellence | 8 |
| P3 - LOW | Create configuration validation | Quality of life | 8 |

8.2 Phased Approach

Minimum Viable One-Click Install (MVP)

Scope: Automated infrastructure + manual application deployment

Includes:

  • ✅ Parameterized scripts
  • ✅ Pre-flight checks
  • ✅ Automated OpenTofu apply
  • ✅ Basic health checks
  • ❌ Application Helm chart (manual helm install)
  • ❌ Secret population (manual steps documented)

Effort: 64 hours (1.5 weeks)

Deliverable: Semi-automated install with clear manual steps for app deployment


Full One-Click Install

Scope: End-to-end automation

Includes:

  • ✅ Everything in MVP
  • ✅ Helm chart for License API
  • ✅ Automated secret generation and population
  • ✅ Comprehensive health checks
  • ✅ Monitoring setup
  • ✅ Single command installation

Effort: 176 hours (4.4 weeks)

Deliverable: True one-click install with zero manual steps (except GCP OAuth)


8.3 Quick Wins (Low-Hanging Fruit)

Can be completed in 1 day (8 hours):

  1. Parameterize project ID (2 hours)

    • Replace coditect-citus-prod with $PROJECT_ID variable
    • Update all scripts and README
  2. Add basic health checks (3 hours)

    • kubectl wait --for=condition=Ready checks
    • Cloud SQL connectivity test
    • Redis connectivity test
  3. Create configuration validation script (3 hours)

    • Check required environment variables
    • Validate variable formats
    • Warn about unsafe defaults

Impact: Significantly improves install reliability with minimal effort


8.4 Long-Term Improvements

Beyond one-click install:

  1. Multi-region deployment (80 hours)

    • Replicate infrastructure across regions
    • Global load balancing
    • Cross-region database replication
  2. Blue/green deployment (40 hours)

    • Zero-downtime upgrades
    • Automated canary deployments
    • Traffic shifting logic
  3. Disaster recovery automation (40 hours)

    • Automated backup testing
    • One-click restore procedure
    • Regional failover automation
  4. Infrastructure cost optimization (24 hours)

    • Right-sizing recommendations
    • Committed use discount analysis
    • Spot instance integration
  5. Compliance automation (40 hours)

    • CIS GCP Benchmark checks
    • Automated security scanning
    • Compliance reporting

Total future work: 224 hours (5.6 weeks)


9. Conclusion

9.1 Current State Summary

Strengths:

  • ✅ OpenTofu modules are production-ready with excellent defaults
  • ✅ Comprehensive variable validation
  • ✅ Good prerequisite installation scripts
  • ✅ Infrastructure deployment is largely automated
  • ✅ Idempotent and safe to re-run

Weaknesses:

  • ❌ No application deployment automation (Helm chart missing)
  • ❌ No secret population automation
  • ❌ Hard-coded values in scripts
  • ❌ No comprehensive health checks
  • ❌ No rollback mechanisms

9.2 Feasibility Assessment

Is one-click installation achievable?

YES - The infrastructure foundation is solid. With focused engineering effort, a true one-click installer is feasible.

Key success factors:

  1. Create production-ready Helm chart (40 hours)
  2. Automate secret management (40 hours)
  3. Add comprehensive health checks (16 hours)
  4. Create unified installer script (24 hours)

Total critical path: 120 hours (3 weeks)

9.3 Next Steps

Immediate (This Week):

  1. ✅ Parameterize all hard-coded values (8 hours)
  2. ✅ Create pre-flight check script (8 hours)
  3. ✅ Add basic health checks (8 hours)

Short-Term (Next 2 Weeks):

  1. ⏸️ Create Helm chart for License API (40 hours)
  2. ⏸️ Implement External Secrets integration (8 hours)

Medium-Term (Weeks 3-4):

  1. ⏸️ Automate secret generation and population (40 hours)
  2. ⏸️ Create comprehensive one-click installer (24 hours)

Timeline to MVP: 1.5 weeks
Timeline to Full One-Click: 4.4 weeks

9.4 Success Metrics

Definition of "One-Click Install":

# User runs single command:
./scripts/one-click-install.sh --config my-config.yaml

# Script completes:
# ✅ Prerequisites installed
# ✅ GCP project created and configured
# ✅ Infrastructure provisioned (GKE, Cloud SQL, Redis)
# ✅ Secrets generated and populated
# ✅ Application deployed and healthy
# ✅ Monitoring configured
# ✅ External access working (https://api.example.com)

# Total time: 15-20 minutes

Acceptance criteria:

  • Installation completes without manual intervention
  • All health checks pass
  • Application accessible via HTTPS
  • Monitoring dashboards populated
  • Documentation includes rollback procedure
  • Script tested on macOS, Linux, and Google Cloud Shell

End of Assessment

Document Version: 1.0
Last Updated: November 23, 2025
Next Review: After Phase 1 completion
Owner: CODITECT Infrastructure Team