Cloud Architecture Analysis - GCP Portability Assessment

CODITECT Cloud Infrastructure - Multi-Cloud Readiness Report

Version: 1.0
Date: November 23, 2025
Status: Infrastructure Analysis Complete
Purpose: Evaluate GCP-specific dependencies and multi-cloud migration path


Executive Summary

This document analyzes the coditect-cloud-infra repository to identify GCP-specific dependencies, assess cloud portability, and provide a roadmap for multi-cloud deployment capabilities.

Key Findings:

  • Current State: 100% GCP-locked infrastructure
  • Portability Score: 40/100 (significant refactoring required)
  • Migration Effort: 3-4 weeks for AWS, 4-6 weeks for Azure
  • Recommended Approach: Multi-cloud abstraction layer with provider-specific implementations

Table of Contents

  1. Current State: GCP-Specific Dependencies
  2. Infrastructure Layer Analysis
  3. Data Flow & Service Dependencies
  4. Cloud Portability Assessment
  5. Gap Analysis: Multi-Cloud Support
  6. Target Architecture: Cloud-Agnostic Design
  7. Migration Path & Implementation Plan
  8. Recommendations & Next Steps

Current State: GCP-Specific Dependencies

1. OpenTofu Modules - 100% GCP-Locked

All infrastructure modules are tightly coupled to GCP provider resources.

Module: GKE Cluster (opentofu/modules/gke/main.tf)

GCP-Specific Resources:

  • google_container_cluster (line 17) - GKE cluster with Google-specific configuration
  • google_container_node_pool (line 137) - GKE node pool management
  • Workload Identity configuration (line 66-68) - GCP-specific authentication
  • Binary Authorization (commented line 70) - GCP security feature
  • Shielded instance config (line 189-192) - GCP-specific node security

Hard Dependencies:

workload_identity_config {
  workload_pool = "${var.project_id}.svc.id.goog"  # GCP-specific format
}

shielded_instance_config {
  enable_secure_boot          = true
  enable_integrity_monitoring = true
}

AWS Equivalent: EKS cluster would require IRSA (IAM Roles for Service Accounts) instead of Workload Identity
Azure Equivalent: AKS cluster would use Managed Identity instead

Module: Cloud SQL (opentofu/modules/cloudsql/main.tf)

GCP-Specific Resources:

  • google_sql_database_instance (line 25) - Cloud SQL PostgreSQL
  • google_sql_database (line 108) - Database creation
  • google_sql_user (line 122, 129) - User management
  • Regional HA configuration (line 32) - GCP-specific availability model
  • Insights configuration (line 87-92) - GCP-specific query monitoring

Hard Dependencies:

settings {
  tier              = var.tier               # GCP machine types (db-custom-2-7680)
  availability_type = var.availability_type  # REGIONAL vs ZONAL

  insights_config {
    query_insights_enabled = var.enable_query_insights  # GCP-only feature
  }
}

AWS Equivalent: RDS PostgreSQL with Multi-AZ deployment
Azure Equivalent: Azure Database for PostgreSQL with zone-redundant HA

Module: Redis (opentofu/modules/redis/main.tf)

GCP-Specific Resources:

  • google_redis_instance (line 16) - Cloud Memorystore for Redis
  • STANDARD_HA tier configuration (line 22, 32) - GCP-specific tiers
  • read_replicas_mode (line 33) - GCP-specific replication mode
  • maintenance_policy (line 52-60) - GCP weekly maintenance windows

Hard Dependencies:

tier = var.tier  # BASIC or STANDARD_HA (GCP-specific)
read_replicas_mode = var.read_replicas_mode # READ_REPLICAS_ENABLED (GCP-only)

AWS Equivalent: ElastiCache for Redis with cluster mode
Azure Equivalent: Azure Cache for Redis with clustering

Module: Networking (opentofu/modules/networking/main.tf)

GCP-Specific Resources:

  • google_compute_network (line 16) - VPC network
  • google_compute_subnetwork (line 29) - Subnet with secondary IP ranges for GKE
  • google_compute_router (line 60) - Cloud Router for NAT
  • google_compute_router_nat (line 73) - Cloud NAT for egress traffic
  • google_service_networking_connection (line 125) - Private service connection for Cloud SQL
  • google_dns_managed_zone (line 132) - Cloud DNS private zone

Hard Dependencies:

# Secondary IP ranges for GKE pods/services (GCP-specific)
secondary_ip_range {
  range_name    = var.pods_secondary_range_name
  ip_cidr_range = var.pods_secondary_cidr
}

# Cloud NAT configuration (GCP-specific)
resource "google_compute_router_nat" "nat" {
  nat_ip_allocate_option         = var.nat_ip_allocate_option
  enable_dynamic_port_allocation = var.enable_dynamic_port_allocation
}

AWS Equivalent: VPC with NAT Gateway, secondary CIDR blocks for EKS
Azure Equivalent: VNet with NAT Gateway, subnet delegation for AKS

Module: Firewall (opentofu/modules/firewall/main.tf)

GCP-Specific Resources:

  • google_compute_firewall (lines 16, 47, 73, 98, etc.) - VPC firewall rules
  • GCP load balancer health check ranges (line 58-61) - 130.211.0.0/22, 35.191.0.0/16
  • GKE master CIDR configuration (line 162) - GCP-specific master node networking

Hard Dependencies:

# GCP-specific health check source ranges
source_ranges = [
  "130.211.0.0/22", # GCP Health Check ranges
  "35.191.0.0/16",
]

AWS Equivalent: Security Groups with different health check ranges
Azure Equivalent: Network Security Groups with Azure load balancer IPs

Module: Secrets (opentofu/modules/secrets/main.tf)

GCP-Specific Resources:

  • google_secret_manager_secret (line 59) - Secret Manager secrets
  • google_secret_manager_secret_iam_member (line 100, 110, 120) - IAM bindings
  • Automatic replication (line 66-68) - GCP-specific replication strategy
  • Rotation disabled (line 72-85) - GCP Pub/Sub notification integration

Hard Dependencies:

replication {
  auto {
    # Automatic replication across all regions (GCP-specific)
  }
}

# IAM bindings for service accounts (GCP-specific)
member = "serviceAccount:${var.app_runtime_service_account}"

AWS Equivalent: AWS Secrets Manager with replication regions
Azure Equivalent: Azure Key Vault with secret replication

2. Environment Configurations - GCP Provider Lock-In

Environment: Development (opentofu/environments/dev/main.tf)

Provider Configuration:

# providers.tf (line 1-30)
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

Module Instantiation with GCP-Specific Parameters:

# GKE with GCP service account (line 179)
node_service_account = var.gke_service_account_email

# Cloud SQL with private network (line 73)
private_network = module.networking.network_self_link
enable_public_ip = false

# Redis with authorized network (line 128)
authorized_network = module.networking.network_self_link

3. Kubernetes Manifests - Cloud-Agnostic (MISSING)

Status: No Kubernetes manifests exist in the repository yet.

Planned Kubernetes Components:

  • License API deployment (FastAPI)
  • Admin Dashboard deployment (React)
  • Ingress controller (NGINX)
  • Service accounts with Workload Identity annotations (GCP-specific)

Anticipated GCP Lock-In:

# Future Workload Identity annotation (GCP-specific)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: license-api-sa
  annotations:
    iam.gke.io/gcp-service-account: license-api@project.iam.gserviceaccount.com

4. Documentation - GCP-Centric

GCP-Specific References:

  • README.md - "Cloud Provider: Google Cloud Platform (GCP)" (line 11)
  • docs/architecture.md - Entire document assumes GCP (lines 40-54 technology stack)
  • docs/architecture/c2-container-diagram.md - GCP services throughout (Cloud SQL, Memorystore, Cloud KMS, Identity Platform)
  • docs/gcp-infrastructure-inventory.md - Complete GCP resource inventory

Infrastructure Layer Analysis

Compute Layer

| Component | GCP Service | GCP-Locked? | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| Container Orchestration | GKE (Google Kubernetes Engine) | ✅ Yes | ✅ Kubernetes (EKS, AKS) |
| Node Pool Management | GKE Node Pools | ✅ Yes | ❌ Cloud-specific (EKS Node Groups, AKS Node Pools) |
| Workload Identity | GKE Workload Identity | ✅ Yes | ❌ Cloud-specific (IRSA for EKS, Managed Identity for AKS) |
| Auto-scaling | GKE HPA + Cluster Autoscaler | ⚠️ Partial | ✅ Kubernetes HPA (universal), cloud-specific cluster autoscaler |

Data Layer

| Component | GCP Service | GCP-Locked? | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| PostgreSQL | Cloud SQL PostgreSQL | ✅ Yes | ❌ Cloud-specific (RDS, Azure Database) |
| Redis | Cloud Memorystore for Redis | ✅ Yes | ❌ Cloud-specific (ElastiCache, Azure Cache) |
| Database HA | Regional HA (Cloud SQL) | ✅ Yes | ❌ Cloud-specific (Multi-AZ, Zone-redundant) |
| Connection Pooling | Built-in Cloud SQL Proxy | ✅ Yes | ⚠️ Self-managed (PgBouncer on K8s) |

Network Layer

| Component | GCP Service | GCP-Locked? | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| VPC | Google Compute Network | ✅ Yes | ❌ Cloud-specific (AWS VPC, Azure VNet) |
| Subnets | Google Compute Subnetwork | ✅ Yes | ❌ Cloud-specific (AWS Subnet, Azure Subnet) |
| NAT Gateway | Cloud NAT | ✅ Yes | ❌ Cloud-specific (AWS NAT Gateway, Azure NAT Gateway) |
| Load Balancer | GCP Load Balancer | ✅ Yes | ⚠️ Kubernetes Ingress (cloud-agnostic) |
| Private Service Connection | VPC Peering for Cloud SQL | ✅ Yes | ❌ Cloud-specific (VPC Peering, Private Link) |
| DNS | Cloud DNS | ✅ Yes | ⚠️ External DNS provider (Route53, Azure DNS) |

Security Layer

| Component | GCP Service | GCP-Locked? | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| Secrets Management | Secret Manager | ✅ Yes | ❌ Cloud-specific (AWS Secrets Manager, Azure Key Vault) |
| Cryptographic Signing | Cloud KMS | ✅ Yes | ❌ Cloud-specific (AWS KMS, Azure Key Vault) |
| Authentication | Identity Platform (Firebase) | ✅ Yes | ✅ Self-managed (Keycloak, Ory Hydra) |
| IAM | Google IAM | ✅ Yes | ❌ Cloud-specific (AWS IAM, Azure RBAC) |
| Firewall Rules | VPC Firewall Rules | ✅ Yes | ❌ Cloud-specific (Security Groups, NSGs) |

Monitoring & Observability

| Component | GCP Service | GCP-Locked? | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| Logging | Cloud Logging | ✅ Yes | ✅ Self-managed (Fluentd + Loki) |
| Metrics | Cloud Monitoring | ⚠️ Partial | ✅ Self-managed (Prometheus) |
| Tracing | Cloud Trace | ⚠️ Partial | ✅ Self-managed (Jaeger) |
| Dashboards | Cloud Console | ✅ Yes | ✅ Self-managed (Grafana) |

Summary:

  • Compute: 75% GCP-locked (Kubernetes core is portable, node management is not)
  • Data: 100% GCP-locked (managed database services)
  • Network: 100% GCP-locked (VPC primitives)
  • Security: 90% GCP-locked (Identity Platform could be replaced)
  • Monitoring: 50% GCP-locked (self-managed stack exists)

Data Flow & Service Dependencies

1. License Acquisition Flow

GCP-Specific Touchpoints:

CODITECT CLI (Local)
↓ HTTPS
GCP Load Balancer (global)          [GCP-Specific: Cloud Armor, SSL certificates]
  ↓
GKE Ingress Controller (NGINX)      [Cloud-Agnostic]
  ↓
License API Pod (FastAPI)           [Cloud-Agnostic]
├─→ Identity Platform (JWT verification) [GCP-Specific: Firebase Auth]
├─→ Cloud SQL PostgreSQL (license lookup) [GCP-Specific: Private IP via VPC Peering]
├─→ Cloud Memorystore Redis (seat counting) [GCP-Specific: Private IP via authorized_network]
└─→ Cloud KMS (license signing) [GCP-Specific: RSA-4096 asymmetric keys]

GCP Lock-In Points:

  1. Identity Platform: JWT verification via Firebase Admin SDK (line 247-257, docs/architecture/c2-container-diagram.md)
  2. Cloud SQL Connection: Private IP via VPC peering, google_service_networking_connection resource
  3. Redis Connection: Private IP via authorized_network, no VPC peering option
  4. Cloud KMS: AsymmetricSign API call (line 348-353, C2 diagram)
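
To make lock-in point 1 concrete, a minimal sketch of the JWT verification step using the Firebase Admin SDK is shown below. The helper name and the tenant-claim handling are illustrative assumptions; only verify_id_token itself is the documented SDK call.

# Hypothetical verification helper inside the License API (GCP-specific path)
import firebase_admin
from firebase_admin import auth

firebase_admin.initialize_app()  # uses Application Default Credentials on GKE

def verify_request_token(id_token: str) -> dict:
    # Raises if the token is expired, revoked, or issued by the wrong project
    decoded = auth.verify_id_token(id_token)
    # Multi-tenant Identity Platform tokens carry the tenant under the
    # "firebase" claim (assumed layout; confirm against real tokens)
    tenant = decoded.get("firebase", {}).get("tenant")
    return {"uid": decoded["uid"], "tenant": tenant}

Moving to Cognito or Azure AD B2C would replace this SDK call and its credential bootstrapping, which is why the flow above marks it GCP-specific.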

2. Heartbeat Mechanism

GCP-Specific Touchpoints:

CODITECT Heartbeat Thread (Local)
↓ HTTPS (every 5 min)
License API (FastAPI)
↓ Redis protocol
Cloud Memorystore Redis
└─→ EXPIRE session:{id} 360 [Cloud-Agnostic]

GCP Lock-In Points:

  1. Redis Connection: Must use private IP from GCP VPC (line 128, opentofu/environments/dev/main.tf)
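
A minimal sketch of the heartbeat refresh is shown below. The session:{id} key and 360-second TTL come from the flow above; the REDIS_URL environment variable and the use of redis.asyncio are assumptions about how the connection would be abstracted, since the Redis commands themselves are portable and only the endpoint is GCP-specific.

# Hypothetical heartbeat handler inside the License API
import os
import redis.asyncio as redis

# Endpoint differs per cloud (Memorystore, ElastiCache, Azure Cache); commands do not
redis_client = redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379/0"))

async def refresh_session(session_id: str) -> bool:
    # Reset the TTL on every heartbeat; returns False if the key already
    # expired, meaning the seat was released by the cleanup worker
    return bool(await redis_client.expire(f"session:{session_id}", 360))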

3. Session Expiry Cleanup

GCP-Specific Touchpoints:

Cloud Memorystore Redis (TTL expires)
↓ Keyspace Notification
Redis Pub/Sub Channel [Cloud-Agnostic]

Cleanup Worker Pod (FastAPI background task)
├─→ Redis (DECR seat count)
└─→ Cloud SQL PostgreSQL (audit log)

GCP Lock-In Points:

  1. Redis Keyspace Notifications: Works on any Redis, but connection is GCP-specific
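
A minimal sketch of the cleanup worker's subscription loop is shown below, assuming redis.asyncio and a hypothetical seats:active counter (the real key layout is not defined in the repository yet). The keyspace-notification mechanism is standard Redis; only the connection endpoint is GCP-specific.

# Hypothetical cleanup worker (FastAPI background task)
import os
import redis.asyncio as redis

redis_client = redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379/0"))

async def expiry_listener():
    pubsub = redis_client.pubsub()
    # Requires notify-keyspace-events to include "Ex" on the Redis instance
    await pubsub.psubscribe("__keyevent@0__:expired")
    async for message in pubsub.listen():
        if message["type"] != "pmessage":
            continue
        key = message["data"]
        key = key.decode() if isinstance(key, bytes) else key
        if key.startswith("session:"):
            # Hypothetical seat counter; the PostgreSQL audit-log write would follow here
            await redis_client.decr("seats:active")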

4. Stripe Webhook Flow

GCP-Specific Touchpoints:

Stripe API (External)
↓ HTTPS Webhook
GCP Load Balancer
  ↓
License API (FastAPI)
├─→ Cloud SQL PostgreSQL (update tenant plan) [GCP-Specific]
└─→ SendGrid API (send emails) [Cloud-Agnostic]

GCP Lock-In Points:

  1. Database Update: Direct connection to Cloud SQL via private IP

Service Dependency Map

┌─────────────────────────────────────────────────────────────┐
│ GCP Project │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ GKE Cluster (us-central1) │ │
│ │ ┌───────────────┐ ┌───────────────┐ │ │
│ │ │ License API │ │ Admin UI │ │ │
│ │ │ Deployment │ │ Deployment │ │ │
│ │ └───────┬───────┘ └───────────────┘ │ │
│ │ │ │ │
│ │ │ Service Account (Workload Identity) │ │
│ │ └──────┬──────────┬──────────┬──────────┐ │ │
│ └─────────────────┼──────────┼──────────┼──────────┼─┘ │
│ │ │ │ │ │
│ ┌─────────────────▼────┐ ┌──▼───────┐ ┌▼────────┐ ┌▼────┐ │
│ │ Cloud SQL │ │ Redis │ │ KMS │ │ IDP │ │
│ │ PostgreSQL 16 │ │ 6GB │ │ Signing │ │Auth │ │
│ │ Private IP: 10.67.0.3│ │10.121. │ │ RSA-4096│ │JWT │ │
│ └──────────────────────┘ └──────────┘ └─────────┘ └─────┘ │
│ │
│ All components communicate via private IPs (VPC internal) │
└─────────────────────────────────────────────────────────────┘

Critical GCP-Specific Dependencies:

  1. VPC Peering: Cloud SQL requires google_service_networking_connection for private IP access
  2. Authorized Networks: Redis requires authorized_network parameter pointing to VPC
  3. Workload Identity: GKE service accounts bound to Google IAM service accounts
  4. Cloud KMS Access: Requires IAM role roles/cloudkms.signerVerifier on service account

Cloud Portability Assessment

Portability Scoring Methodology

Criteria:

  • Infrastructure as Code: Can modules be rewritten for other providers?
  • Service Availability: Does equivalent service exist on AWS/Azure?
  • Data Portability: Can data be migrated without vendor lock-in?
  • Network Architecture: Can networking patterns translate to other clouds?
  • Application Code: Does backend code use cloud-agnostic libraries?

Component-by-Component Assessment

| Component | Portability Score | Rationale | Migration Effort |
| --- | --- | --- | --- |
| GKE Cluster | 6/10 | Kubernetes is universal, but GKE-specific features (Workload Identity, node pools) require refactoring | 2 weeks |
| Cloud SQL | 7/10 | PostgreSQL is standard, but connection method (VPC peering) is GCP-specific | 1 week |
| Cloud Memorystore | 8/10 | Redis is standard protocol, but authorized_network is GCP-specific | 3 days |
| Cloud KMS | 4/10 | Signing API is GCP-specific, requires complete rewrite for AWS KMS/Azure Key Vault | 1 week |
| Identity Platform | 3/10 | Firebase Auth is GCP-specific, migration to Cognito/Azure AD B2C is complex | 2 weeks |
| VPC Networking | 5/10 | Networking primitives differ significantly across clouds (NAT, peering, secondary IPs) | 1 week |
| Secret Manager | 7/10 | Secrets are just key-value pairs, but IAM bindings differ | 3 days |
| Load Balancer | 9/10 | Kubernetes Ingress abstracts load balancer, minimal changes | 1 day |
| Monitoring | 8/10 | Prometheus/Grafana are cloud-agnostic (if self-managed) | 3 days |

Overall Portability Score: 40/100

Breakdown:

  • Infrastructure Modules: 30/100 (heavily GCP-locked)
  • Application Code: 80/100 (FastAPI is cloud-agnostic, assuming no GCP SDK usage)
  • Data Storage: 50/100 (PostgreSQL/Redis are portable, but connection patterns are not)
  • Authentication: 20/100 (Identity Platform is GCP-specific)
  • Monitoring: 70/100 (self-managed stack exists)

Cloud-Specific Feature Usage Analysis

Features That Prevent Easy Migration

1. Workload Identity (GKE)

  • What it is: Binds Kubernetes service accounts to Google IAM service accounts
  • Why it's locked: Requires GCP IAM API and GKE-specific annotations
  • AWS Equivalent: IRSA (IAM Roles for Service Accounts) - different annotation format
  • Azure Equivalent: Managed Identity - completely different authentication flow

Example:

# GCP Workload Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: license-api@project.iam.gserviceaccount.com

# AWS IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/license-api-role

2. VPC Peering for Cloud SQL

  • What it is: Private service connection for Cloud SQL instances
  • Why it's locked: Uses google_service_networking_connection resource
  • AWS Equivalent: RDS uses VPC security groups, not peering
  • Azure Equivalent: Private Link for Azure Database

Example:

# GCP VPC Peering
resource "google_service_networking_connection" "private_vpc_connection" {
  network                 = google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip_address.name]
}

# AWS doesn't require this - RDS lives in VPC subnets directly

3. Cloud KMS Asymmetric Signing

  • What it is: RSA-4096 asymmetric key signing for license tokens
  • Why it's locked: Uses GCP-specific AsymmetricSign API
  • AWS Equivalent: AWS KMS Sign operation with different parameters
  • Azure Equivalent: Azure Key Vault Sign operation with different SDK

Example:

# GCP Cloud KMS
from google.cloud import kms

kms_client = kms.KeyManagementServiceClient()
sign_response = kms_client.asymmetric_sign(
    request={"name": key_name, "digest": {"sha256": digest}}
)

# AWS KMS
import boto3

kms_client = boto3.client('kms')
sign_response = kms_client.sign(
    KeyId='alias/license-signing-key',
    Message=digest,
    MessageType='DIGEST',
    SigningAlgorithm='RSASSA_PKCS1_V1_5_SHA_256'
)

4. Cloud Memorystore Authorized Networks

  • What it is: Restricts Redis access to specific VPC network
  • Why it's locked: Uses GCP VPC resource ID format
  • AWS Equivalent: ElastiCache uses VPC security groups
  • Azure Equivalent: Azure Cache uses VNet integration

5. Identity Platform Multi-Tenancy

  • What it is: Firebase Auth with multi-tenant support
  • Why it's locked: Uses Firebase Admin SDK with GCP-specific tenant management
  • AWS Equivalent: Cognito User Pools (different API)
  • Azure Equivalent: Azure AD B2C (different authentication flow)

Gap Analysis: Multi-Cloud Support

What We Have (GCP-Only)

| Category | Current Implementation | Cloud Provider | Portability |
| --- | --- | --- | --- |
| Compute | GKE cluster with Workload Identity | GCP | Low |
| Database | Cloud SQL PostgreSQL with VPC peering | GCP | Medium |
| Cache | Cloud Memorystore Redis with authorized_network | GCP | Medium |
| Secrets | Secret Manager with IAM bindings | GCP | Medium |
| Signing | Cloud KMS asymmetric keys | GCP | Low |
| Auth | Identity Platform (Firebase) | GCP | Low |
| Networking | VPC with Cloud NAT and private service connection | GCP | Low |
| Load Balancing | GCP Load Balancer (planned NGINX Ingress) | GCP | High |

What We Need (Multi-Cloud)

| Category | Target Implementation | Abstraction Level | Migration Effort |
| --- | --- | --- | --- |
| Compute | Kubernetes with cloud-agnostic service account mapping | High | 2 weeks |
| Database | PostgreSQL with cloud-agnostic connection string | High | 1 week |
| Cache | Redis with cloud-agnostic connection string | High | 3 days |
| Secrets | HashiCorp Vault or cloud-agnostic secret provider | High | 1 week |
| Signing | Self-managed signing service with HSM | High | 2 weeks |
| Auth | Keycloak or Ory Hydra (self-managed) | High | 2-3 weeks |
| Networking | Cloud-agnostic VPC module with provider-specific implementations | Medium | 1 week |
| Load Balancing | Kubernetes Ingress (NGINX or cloud-agnostic) | High | 1 day |

Missing Abstractions

1. Service Account Abstraction

  • Gap: No abstraction layer between Kubernetes service accounts and cloud IAM
  • Impact: Cannot deploy to AWS/Azure without rewriting IAM bindings
  • Solution: Create abstraction module that generates provider-specific annotations
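
As a sketch of what that abstraction could look like at manifest-generation time, the helper below maps a cloud provider to the service-account annotation it needs. The GKE and EKS annotation keys match the examples later in this document; the Azure workload-identity annotation and the helper itself are illustrative assumptions.

def service_account_annotations(cloud: str, identity: str) -> dict:
    # identity is a GCP service-account email, an AWS role ARN,
    # or an Azure AD application client ID, depending on the provider
    if cloud == "gcp":
        return {"iam.gke.io/gcp-service-account": identity}
    if cloud == "aws":
        return {"eks.amazonaws.com/role-arn": identity}
    if cloud == "azure":
        # Azure AD Workload Identity annotation (assumed)
        return {"azure.workload.identity/client-id": identity}
    raise ValueError(f"Unsupported cloud provider: {cloud}")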

2. Database Connection Abstraction

  • Gap: Hard-coded private IP connections via VPC peering
  • Impact: Connection method changes per cloud provider
  • Solution: Use connection proxy or service mesh for database access

3. Secrets Provider Abstraction

  • Gap: Direct Secret Manager API calls in future application code
  • Impact: Cannot switch secret providers without code changes
  • Solution: Use Kubernetes External Secrets Operator or HashiCorp Vault
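
One way to keep application code ignorant of the secret backend is to read secrets from a Kubernetes Secret that the External Secrets Operator populates from Secret Manager, AWS Secrets Manager, or Key Vault. The sketch below assumes a hypothetical mount path and an environment-variable fallback; neither exists in the repository yet.

import os
from pathlib import Path

def get_secret(name: str, mount_dir: str = "/var/run/secrets/app") -> str:
    # Prefer a file projected from a Kubernetes Secret (populated by the
    # External Secrets Operator), fall back to an env var for local development
    secret_file = Path(mount_dir) / name
    if secret_file.exists():
        return secret_file.read_text().strip()
    value = os.getenv(name.upper().replace("-", "_"))
    if value is None:
        raise KeyError(f"Secret {name!r} not configured")
    return value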

4. Signing Service Abstraction

  • Gap: Direct Cloud KMS API calls for license signing
  • Impact: Cannot switch to AWS KMS or Azure Key Vault without rewrite
  • Solution: Create internal signing microservice that abstracts KMS provider

5. Authentication Provider Abstraction

  • Gap: Identity Platform (Firebase) is GCP-specific
  • Impact: Cannot use Cognito (AWS) or Azure AD B2C without complete rewrite
  • Solution: Migrate to self-managed Keycloak or Ory Hydra
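
Regardless of which provider is chosen, the backend can stay issuer-agnostic by verifying JWTs against the issuer's published JWKS endpoint, which works with Identity Platform today and with Keycloak or Ory Hydra after a migration. The sketch below uses PyJWT's JWKS client; the environment variable names are assumptions.

import os
import jwt  # PyJWT

# Any OIDC-compliant issuer works here: Identity Platform, Keycloak, Ory Hydra, Cognito
JWKS_URL = os.getenv("OIDC_JWKS_URL")   # e.g. https://<issuer>/.well-known/jwks.json
AUDIENCE = os.getenv("OIDC_AUDIENCE")

jwks_client = jwt.PyJWKClient(JWKS_URL)

def verify_token(token: str) -> dict:
    # Fetches (and caches) the signing key referenced by the token's "kid" header
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(token, signing_key.key, algorithms=["RS256"], audience=AUDIENCE)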

Hard-Coded GCP Resource Names

Examples Found:

  1. Workload Identity Pool Format:

# opentofu/modules/gke/main.tf (line 67)
workload_pool = "${var.project_id}.svc.id.goog"  # GCP-specific format

  2. Health Check IP Ranges:

# opentofu/modules/firewall/main.tf (line 58-61)
source_ranges = [
  "130.211.0.0/22", # GCP Health Check ranges
  "35.191.0.0/16",
]

  3. Provider Version Constraints:

# opentofu/environments/dev/providers.tf (line 7-9)
google = {
  source  = "hashicorp/google"
  version = "~> 5.0"
}

Provider-Specific APIs in Use

GCP APIs Detected:

| API | Usage Location | Purpose | Cloud-Agnostic Alternative |
| --- | --- | --- | --- |
| google_container_cluster | opentofu/modules/gke/main.tf:17 | GKE cluster creation | AWS: aws_eks_cluster |
| google_sql_database_instance | opentofu/modules/cloudsql/main.tf:25 | Cloud SQL instance | AWS: aws_db_instance |
| google_redis_instance | opentofu/modules/redis/main.tf:16 | Memorystore Redis | AWS: aws_elasticache_cluster |
| google_compute_network | opentofu/modules/networking/main.tf:16 | VPC network | AWS: aws_vpc |
| google_compute_router_nat | opentofu/modules/networking/main.tf:73 | Cloud NAT | AWS: aws_nat_gateway |
| google_secret_manager_secret | opentofu/modules/secrets/main.tf:59 | Secret storage | AWS: aws_secretsmanager_secret |
| google_service_networking_connection | opentofu/modules/networking/main.tf:125 | Private service connection | AWS: VPC subnet placement |

Target Architecture: Cloud-Agnostic Design

Multi-Cloud Abstraction Strategy

Approach: Terraform/OpenTofu module abstraction with provider-specific implementations.

Directory Structure:

opentofu/
├── modules/
│   ├── compute/              # Cloud-agnostic Kubernetes
│   │   ├── interface.tf      # Common variables/outputs
│   │   ├── gcp/              # GKE implementation
│   │   │   └── main.tf
│   │   ├── aws/              # EKS implementation
│   │   │   └── main.tf
│   │   └── azure/            # AKS implementation
│   │       └── main.tf
│   │
│   ├── database/             # Cloud-agnostic PostgreSQL
│   │   ├── interface.tf
│   │   ├── gcp/              # Cloud SQL
│   │   ├── aws/              # RDS
│   │   └── azure/            # Azure Database
│   │
│   ├── cache/                # Cloud-agnostic Redis
│   │   ├── interface.tf
│   │   ├── gcp/              # Memorystore
│   │   ├── aws/              # ElastiCache
│   │   └── azure/            # Azure Cache
│   │
│   ├── networking/           # Cloud-agnostic VPC
│   │   ├── interface.tf
│   │   ├── gcp/
│   │   ├── aws/
│   │   └── azure/
│   │
│   └── secrets/              # Cloud-agnostic secrets
│       ├── interface.tf
│       ├── gcp/
│       ├── aws/
│       └── azure/

└── environments/
    ├── dev/
    │   ├── main.tf           # Uses cloud-agnostic modules
    │   ├── provider.tf       # Cloud selection: GCP | AWS | Azure
    │   └── variables.tf
    └── production/
        └── ...

Cloud-Agnostic Module Interface

Example: Compute Module Interface

# opentofu/modules/compute/interface.tf

# Input Variables (Cloud-Agnostic)
variable "cluster_name" {
  type        = string
  description = "Name of the Kubernetes cluster"
}

variable "region" {
  type        = string
  description = "Cloud region (e.g., us-central1, us-east-1, eastus)"
}

variable "kubernetes_version" {
  type        = string
  description = "Kubernetes version (e.g., 1.28)"
}

variable "node_machine_type" {
  type        = string
  description = "Machine type for nodes (cloud-agnostic naming)"
  # Examples: "small" (2 vCPU, 4GB), "medium" (4 vCPU, 8GB), "large" (8 vCPU, 16GB)
}

variable "min_nodes" {
  type        = number
  description = "Minimum number of nodes"
}

variable "max_nodes" {
  type        = number
  description = "Maximum number of nodes"
}

variable "network_id" {
  type        = string
  description = "Cloud-agnostic network identifier"
}

variable "subnet_id" {
  type        = string
  description = "Cloud-agnostic subnet identifier"
}

variable "enable_private_nodes" {
  type        = bool
  description = "Enable private IP nodes (no public IPs)"
  default     = true
}

variable "master_authorized_cidrs" {
  type        = list(string)
  description = "List of CIDRs authorized to access Kubernetes API"
  default     = []
}

# Output Values (Cloud-Agnostic)
output "cluster_id" {
  value       = module.cloud_specific.cluster_id
  description = "Unique identifier for the cluster"
}

output "cluster_endpoint" {
  value       = module.cloud_specific.cluster_endpoint
  description = "Kubernetes API endpoint"
  sensitive   = true
}

output "cluster_ca_certificate" {
  value       = module.cloud_specific.cluster_ca_certificate
  description = "Cluster CA certificate (base64 encoded)"
  sensitive   = true
}

output "kubeconfig" {
  value       = module.cloud_specific.kubeconfig
  description = "Kubeconfig for cluster access"
  sensitive   = true
}

output "service_account_email" {
  value       = module.cloud_specific.service_account_email
  description = "Email of the service account for Kubernetes workloads"
}

# Provider-Specific Module Selection
module "cloud_specific" {
  source = var.cloud_provider == "gcp" ? "./gcp" : (
    var.cloud_provider == "aws" ? "./aws" : "./azure"
  )

  cluster_name            = var.cluster_name
  region                  = var.region
  kubernetes_version      = var.kubernetes_version
  node_machine_type       = local.machine_type_map[var.node_machine_type]
  min_nodes               = var.min_nodes
  max_nodes               = var.max_nodes
  network_id              = var.network_id
  subnet_id               = var.subnet_id
  enable_private_nodes    = var.enable_private_nodes
  master_authorized_cidrs = var.master_authorized_cidrs
}

# Machine Type Translation
locals {
  machine_type_map = {
    "small" = var.cloud_provider == "gcp" ? "n1-standard-2" : (
      var.cloud_provider == "aws" ? "t3.medium" : "Standard_D2s_v3"
    )
    "medium" = var.cloud_provider == "gcp" ? "n1-standard-4" : (
      var.cloud_provider == "aws" ? "t3.large" : "Standard_D4s_v3"
    )
    "large" = var.cloud_provider == "gcp" ? "n1-standard-8" : (
      var.cloud_provider == "aws" ? "t3.xlarge" : "Standard_D8s_v3"
    )
  }
}

GCP-Specific Implementation

# opentofu/modules/compute/gcp/main.tf

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region

  remove_default_node_pool = true
  initial_node_count       = 1

  network    = var.network_id
  subnetwork = var.subnet_id

  private_cluster_config {
    enable_private_nodes    = var.enable_private_nodes
    enable_private_endpoint = false
  }

  dynamic "master_authorized_networks_config" {
    for_each = length(var.master_authorized_cidrs) > 0 ? [1] : []
    content {
      dynamic "cidr_blocks" {
        for_each = var.master_authorized_cidrs
        content {
          cidr_block = cidr_blocks.value
        }
      }
    }
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name     = "${var.cluster_name}-node-pool"
  location = var.region
  cluster  = google_container_cluster.primary.name

  autoscaling {
    min_node_count = var.min_nodes
    max_node_count = var.max_nodes
  }

  node_config {
    machine_type = var.node_machine_type
    preemptible  = var.enable_preemptible

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}

# Outputs
output "cluster_id" {
  value = google_container_cluster.primary.id
}

output "cluster_endpoint" {
  value = google_container_cluster.primary.endpoint
}

output "cluster_ca_certificate" {
  value = google_container_cluster.primary.master_auth[0].cluster_ca_certificate
}

output "service_account_email" {
  value = google_service_account.gke_sa.email
}

AWS-Specific Implementation

# opentofu/modules/compute/aws/main.tf

resource "aws_eks_cluster" "primary" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = [var.subnet_id]
    endpoint_private_access = true
    endpoint_public_access  = !var.enable_private_nodes
    public_access_cidrs     = var.master_authorized_cidrs
  }
}

resource "aws_eks_node_group" "primary_nodes" {
  cluster_name    = aws_eks_cluster.primary.name
  node_group_name = "${var.cluster_name}-node-group"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = [var.subnet_id]

  scaling_config {
    desired_size = var.min_nodes
    max_size     = var.max_nodes
    min_size     = var.min_nodes
  }

  instance_types = [var.node_machine_type]
  capacity_type  = var.enable_preemptible ? "SPOT" : "ON_DEMAND"
}

# Outputs
output "cluster_id" {
  value = aws_eks_cluster.primary.id
}

output "cluster_endpoint" {
  value = aws_eks_cluster.primary.endpoint
}

output "cluster_ca_certificate" {
  value = aws_eks_cluster.primary.certificate_authority[0].data
}

output "service_account_email" {
  value = aws_iam_role.eks_service_account.arn
}

Application-Level Cloud Abstraction

License API Backend - Cloud-Agnostic Database Connection

Current (GCP-Locked):

# app/database.py
import asyncpg

DATABASE_URL = "postgresql://user:pass@10.67.0.3:5432/coditect"  # GCP private IP

async def get_db():
    pool = await asyncpg.create_pool(DATABASE_URL)
    async with pool.acquire() as conn:
        yield conn

Target (Cloud-Agnostic):

# app/database.py
import os
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

# Database URL from environment variable (works on any cloud)
DATABASE_URL = os.getenv("DATABASE_URL")  # e.g. postgresql+asyncpg://user:pass@hostname:5432/coditect

engine = create_async_engine(DATABASE_URL, echo=False)

async def get_db():
    async with AsyncSession(engine) as session:
        yield session

Configuration per Cloud:

# GCP: Use Cloud SQL Proxy sidecar container
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/coditect

# AWS: Use RDS endpoint
DATABASE_URL=postgresql+asyncpg://user:pass@coditect-prod.abc123.us-east-1.rds.amazonaws.com:5432/coditect

# Azure: Use Azure Database endpoint
DATABASE_URL=postgresql+asyncpg://user:pass@coditect-prod.postgres.database.azure.com:5432/coditect

License API Backend - Cloud-Agnostic Signing Service

Current (GCP-Locked):

# app/signing.py
from google.cloud import kms
import hashlib
import json
import base64

kms_client = kms.KeyManagementServiceClient()
key_name = "projects/PROJECT/locations/REGION/keyRings/RING/cryptoKeys/KEY"

def sign_license(payload: dict) -> str:
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).digest()
    sign_response = kms_client.asymmetric_sign(
        request={"name": key_name, "digest": {"sha256": digest}}
    )
    return base64.b64encode(sign_response.signature).decode()

Target (Cloud-Agnostic with Provider Factory):

# app/signing.py
import os
from abc import ABC, abstractmethod

class SigningProvider(ABC):
    @abstractmethod
    def sign(self, payload: dict) -> str:
        ...

    def verify(self, payload: dict, signature: str) -> bool:
        # Verification can be done locally with the published public key;
        # providers may override this when server-side verification is needed.
        raise NotImplementedError

class GCPKMSProvider(SigningProvider):
    def __init__(self, key_name: str):
        from google.cloud import kms
        self.client = kms.KeyManagementServiceClient()
        self.key_name = key_name

    def sign(self, payload: dict) -> str:
        import base64, hashlib, json
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).digest()
        sign_response = self.client.asymmetric_sign(
            request={"name": self.key_name, "digest": {"sha256": digest}}
        )
        return base64.b64encode(sign_response.signature).decode()

class AWSKMSProvider(SigningProvider):
    def __init__(self, key_id: str):
        import boto3
        self.client = boto3.client('kms')
        self.key_id = key_id

    def sign(self, payload: dict) -> str:
        import base64, hashlib, json
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).digest()
        sign_response = self.client.sign(
            KeyId=self.key_id,
            Message=digest,
            MessageType='DIGEST',
            SigningAlgorithm='RSASSA_PKCS1_V1_5_SHA_256'
        )
        return base64.b64encode(sign_response['Signature']).decode()

class AzureKeyVaultProvider(SigningProvider):
    def __init__(self, vault_url: str, key_name: str):
        from azure.keyvault.keys.crypto import CryptographyClient
        from azure.identity import DefaultAzureCredential
        self.client = CryptographyClient(
            key=f"{vault_url}/keys/{key_name}",
            credential=DefaultAzureCredential()
        )

    def sign(self, payload: dict) -> str:
        import base64, hashlib, json
        from azure.keyvault.keys.crypto import SignatureAlgorithm
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).digest()
        result = self.client.sign(SignatureAlgorithm.rs256, digest)
        return base64.b64encode(result.signature).decode()

# Factory
def get_signing_provider() -> SigningProvider:
    provider = os.getenv("CLOUD_PROVIDER", "gcp")

    if provider == "gcp":
        key_name = os.getenv("GCP_KMS_KEY_NAME")
        return GCPKMSProvider(key_name)
    elif provider == "aws":
        key_id = os.getenv("AWS_KMS_KEY_ID")
        return AWSKMSProvider(key_id)
    elif provider == "azure":
        vault_url = os.getenv("AZURE_KEYVAULT_URL")
        key_name = os.getenv("AZURE_KEY_NAME")
        return AzureKeyVaultProvider(vault_url, key_name)
    else:
        raise ValueError(f"Unsupported cloud provider: {provider}")

# Usage in FastAPI
signer = get_signing_provider()
signature = signer.sign(license_payload)
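
How the signature is packaged into the license token is not yet defined in the repository; the sketch below assumes a simple detached-signature format (base64url payload joined to the base64 signature) so the CLI can verify the payload offline against the published public key. The format and function name are assumptions, not the implemented scheme.

import base64
import json

def build_license_token(payload: dict, signer: SigningProvider) -> str:
    # Hypothetical token layout: "<base64url(payload)>.<base64(signature)>"
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode()
    ).decode().rstrip("=")
    return f"{body}.{signer.sign(payload)}"

license_token = build_license_token(license_payload, get_signing_provider())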

Migration Path & Implementation Plan

Phase 1: Module Abstraction (2-3 weeks)

Goal: Create cloud-agnostic module interfaces with GCP-specific implementations.

Tasks:

  1. Create Module Interface Layer (3 days)

    • Define standardized input/output variables for each module
    • Create machine type translation maps (small/medium/large → cloud-specific)
    • Document cloud-agnostic parameter naming conventions
  2. Refactor GKE Module (3 days)

    • Move opentofu/modules/gke/ → opentofu/modules/compute/gcp/
    • Create opentofu/modules/compute/interface.tf
    • Test with existing dev environment
  3. Refactor Cloud SQL Module (2 days)

    • Move opentofu/modules/cloudsql/ → opentofu/modules/database/gcp/
    • Create opentofu/modules/database/interface.tf
    • Abstract connection method (private IP vs connection string)
  4. Refactor Redis Module (2 days)

    • Move opentofu/modules/redis/ → opentofu/modules/cache/gcp/
    • Create opentofu/modules/cache/interface.tf
    • Standardize connection parameters
  5. Refactor Networking Module (3 days)

    • Move opentofu/modules/networking/ → opentofu/modules/networking/gcp/
    • Create opentofu/modules/networking/interface.tf
    • Abstract VPC, subnets, NAT, and private service connections
  6. Refactor Secrets Module (2 days)

    • Move opentofu/modules/secrets/ → opentofu/modules/secrets/gcp/
    • Create opentofu/modules/secrets/interface.tf
    • Abstract IAM bindings

Deliverables:

  • 6 cloud-agnostic module interfaces
  • GCP-specific implementations moved to gcp/ subdirectories
  • Updated environment configurations using new module paths
  • Documentation: Module interface specifications

Phase 2: AWS Implementation (3-4 weeks)

Goal: Implement AWS-specific modules matching cloud-agnostic interfaces.

Tasks:

  1. AWS Compute Module (EKS) (5 days)

    • Create opentofu/modules/compute/aws/main.tf
    • Implement EKS cluster with node groups
    • Configure IRSA (IAM Roles for Service Accounts)
    • Test deployment in AWS dev environment
  2. AWS Database Module (RDS) (3 days)

    • Create opentofu/modules/database/aws/main.tf
    • Implement RDS PostgreSQL with Multi-AZ
    • Configure security groups for database access
    • Test connectivity from EKS
  3. AWS Cache Module (ElastiCache) (3 days)

    • Create opentofu/modules/cache/aws/main.tf
    • Implement ElastiCache for Redis with cluster mode
    • Configure security groups and subnet groups
    • Test connectivity from EKS
  4. AWS Networking Module (VPC) (3 days)

    • Create opentofu/modules/networking/aws/main.tf
    • Implement VPC with public/private subnets
    • Configure NAT Gateway for private subnet egress
    • Setup VPC endpoints for AWS services
  5. AWS Secrets Module (Secrets Manager) (2 days)

    • Create opentofu/modules/secrets/aws/main.tf
    • Implement Secrets Manager secrets
    • Configure IAM policies for EKS service accounts
    • Test secret retrieval from pods
  6. AWS Signing Service (KMS) (3 days)

    • Create opentofu/modules/signing/aws/main.tf
    • Implement KMS key with RSA-4096
    • Configure IAM policies for signing
    • Update License API to support AWS KMS provider
  7. AWS Environment Configuration (3 days)

    • Create opentofu/environments/aws-dev/
    • Configure AWS provider and backend (S3 + DynamoDB)
    • Deploy full stack to AWS
    • End-to-end testing

Deliverables:

  • 6 AWS-specific module implementations
  • AWS dev environment fully operational
  • Documentation: AWS deployment guide
  • CI/CD pipeline for AWS deployments

Phase 3: Application Cloud Abstraction (2 weeks)

Goal: Remove cloud-specific dependencies from application code.

Tasks:

  1. Database Connection Abstraction (2 days)

    • Replace private IP connections with environment variable DATABASE_URL
    • Implement Cloud SQL Proxy sidecar for GCP
    • Test with both GCP and AWS database connections
  2. Redis Connection Abstraction (1 day)

    • Replace authorized_network connections with connection string
    • Use environment variable REDIS_URL
    • Test with both GCP Memorystore and AWS ElastiCache
  3. Signing Service Abstraction (3 days)

    • Create SigningProvider interface with GCP/AWS implementations
    • Implement provider factory based on CLOUD_PROVIDER env var
    • Update License API to use abstracted signing
    • Test license signing on both clouds
  4. Authentication Provider Evaluation (3 days)

    • Research self-managed alternatives (Keycloak, Ory Hydra)
    • Proof of concept: Deploy Keycloak on Kubernetes
    • Compare feature parity with Identity Platform
    • Decision: Keep Identity Platform or migrate to self-managed
  5. Secrets Provider Abstraction (2 days)

    • Evaluate External Secrets Operator for Kubernetes
    • Install and configure for GCP Secret Manager
    • Install and configure for AWS Secrets Manager
    • Update Kubernetes manifests to use external secrets
  6. Monitoring Stack Migration (3 days)

    • Deploy Prometheus + Grafana on Kubernetes (cloud-agnostic)
    • Configure Loki for log aggregation
    • Deploy Jaeger for distributed tracing
    • Migrate from Cloud Monitoring to self-managed stack
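
For task 6, the instrumentation itself is cloud-agnostic once metrics are exposed for a self-managed Prometheus to scrape. The sketch below shows one way to do that in the FastAPI service; the endpoint path, metric name, and route are illustrative assumptions.

from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()

# Counter exported identically on GKE, EKS, or AKS
license_checkouts = Counter(
    "license_checkouts_total", "License checkout requests", ["result"]
)

# Expose /metrics for the self-managed Prometheus to scrape
app.mount("/metrics", make_asgi_app())

@app.post("/v1/licenses/checkout")
async def checkout():
    license_checkouts.labels(result="granted").inc()
    return {"status": "granted"}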

Deliverables:

  • Application code runs on both GCP and AWS without changes
  • Signing service provider factory implementation
  • Documentation: Application cloud abstraction guide
  • Decision document: Authentication provider strategy

Phase 4: Azure Implementation (4-6 weeks)

Goal: Implement Azure-specific modules and validate multi-cloud strategy.

Tasks:

  1. Azure Compute Module (AKS) (5 days)
  2. Azure Database Module (Azure Database for PostgreSQL) (3 days)
  3. Azure Cache Module (Azure Cache for Redis) (3 days)
  4. Azure Networking Module (VNet) (3 days)
  5. Azure Secrets Module (Key Vault) (2 days)
  6. Azure Signing Service (Key Vault) (3 days)
  7. Azure Environment Configuration (3 days)

Deliverables:

  • 6 Azure-specific module implementations
  • Azure dev environment fully operational
  • Documentation: Azure deployment guide
  • Multi-cloud comparison matrix

Phase 5: Kubernetes Manifests & CI/CD (2 weeks)

Goal: Create cloud-agnostic Kubernetes manifests with provider-specific customizations.

Tasks:

  1. Base Kubernetes Manifests (3 days)

    • Create kubernetes/base/ with Deployment, Service, Ingress
    • Use Kustomize for cloud-specific overlays
    • Implement service account annotations per cloud
  2. Cloud-Specific Overlays (3 days)

    • Create kubernetes/overlays/gcp/
    • Create kubernetes/overlays/aws/
    • Create kubernetes/overlays/azure/
    • Configure Workload Identity / IRSA / Managed Identity
  3. CI/CD Pipeline (5 days)

    • Create GitHub Actions workflow for multi-cloud deployments
    • Implement environment-specific deployment targets (GCP, AWS, Azure)
    • Add automated testing for each cloud
    • Configure deployment gates and approvals
  4. Documentation (3 days)

    • Multi-cloud deployment guide
    • Cloud-specific configuration differences
    • Troubleshooting guide for each provider
    • Cost comparison matrix

Deliverables:

  • Cloud-agnostic Kubernetes manifests with overlays
  • Multi-cloud CI/CD pipeline
  • Comprehensive deployment documentation
  • Automated testing for all three clouds

Recommendations & Next Steps

Short-Term (Next 3 Months)

Priority 1: Complete GCP Implementation

  • Action: Finish Phase 0 infrastructure deployment (Cloud KMS, Identity Platform)
  • Effort: 2-3 days
  • Rationale: Deliver working license system on GCP first, validate architecture

Priority 2: Document Cloud-Specific Dependencies

  • Action: Create inventory of all GCP-specific resources and APIs
  • Effort: 1 day
  • Rationale: Establish baseline for multi-cloud planning

Priority 3: Design Module Abstraction Layer

  • Action: Create detailed specification for cloud-agnostic module interfaces
  • Effort: 3 days
  • Rationale: Foundation for multi-cloud support

Medium-Term (3-6 Months)

Priority 1: Implement Module Abstraction (Phase 1)

  • Action: Refactor existing GCP modules into abstraction layer
  • Effort: 2-3 weeks
  • Rationale: Enable future multi-cloud support without disrupting current deployments

Priority 2: AWS Implementation (Phase 2)

  • Action: Create AWS-specific modules and deploy to AWS dev environment
  • Effort: 3-4 weeks
  • Rationale: Validate abstraction layer with second cloud provider

Priority 3: Application Cloud Abstraction (Phase 3)

  • Action: Remove cloud-specific dependencies from License API
  • Effort: 2 weeks
  • Rationale: Enable application portability across clouds

Long-Term (6-12 Months)

Priority 1: Azure Implementation (Phase 4)

  • Action: Create Azure-specific modules for complete multi-cloud coverage
  • Effort: 4-6 weeks
  • Rationale: Full multi-cloud capability for enterprise customers

Priority 2: Self-Managed Authentication

  • Action: Migrate from Identity Platform to Keycloak or Ory Hydra
  • Effort: 4 weeks
  • Rationale: Remove last major cloud-specific dependency

Priority 3: Multi-Cloud Monitoring & Observability

  • Action: Deploy self-managed Prometheus/Grafana/Jaeger stack
  • Effort: 2 weeks
  • Rationale: Consistent monitoring across all cloud providers

Cost-Benefit Analysis

Investment Required:

  • Phase 1 (Module Abstraction): 2-3 weeks engineering time ($15K-$22K)
  • Phase 2 (AWS Implementation): 3-4 weeks engineering time ($22K-$30K)
  • Phase 3 (Application Abstraction): 2 weeks engineering time ($15K)
  • Phase 4 (Azure Implementation): 4-6 weeks engineering time ($30K-$45K)
  • Phase 5 (CI/CD): 2 weeks engineering time ($15K)
  • Total: 13-17 weeks, $97K-$127K

Benefits:

  • Customer Flexibility: Support enterprise customers with existing AWS/Azure commitments
  • Vendor Negotiation: Leverage multi-cloud capability for better pricing
  • Risk Mitigation: Reduce dependency on single cloud provider
  • Geographic Expansion: Deploy in regions not available on GCP
  • Disaster Recovery: Cross-cloud failover capabilities

ROI Timeline:

  • Break-even: 6-12 months (assumes 10% revenue increase from multi-cloud support)
  • Long-term value: Significant competitive advantage for enterprise sales

Decision Framework

When to Invest in Multi-Cloud:

Invest Now If:

  • Enterprise customers require AWS/Azure deployment
  • GCP pricing is not competitive for planned scale
  • Geographic regions needed are not available on GCP
  • Business strategy includes multi-cloud as differentiator

Delay Investment If:

  • Current GCP implementation meets all customer needs
  • No near-term enterprise deals requiring other clouds
  • Engineering resources needed for core product features
  • GCP pricing and availability are satisfactory

Recommended Approach:

  1. Complete GCP Implementation (Priority 1)
  2. Implement Module Abstraction (Phase 1) as insurance policy
  3. Wait for customer demand before implementing AWS/Azure
  4. Re-evaluate quarterly based on sales pipeline and customer requests

Appendix

A. Cloud Provider Comparison Matrix

| Feature | GCP | AWS | Azure |
| --- | --- | --- | --- |
| Kubernetes | GKE | EKS | AKS |
| PostgreSQL | Cloud SQL | RDS | Azure Database |
| Redis | Memorystore | ElastiCache | Azure Cache |
| Secrets | Secret Manager | Secrets Manager | Key Vault |
| KMS | Cloud KMS | AWS KMS | Key Vault |
| Auth | Identity Platform | Cognito | Azure AD B2C |
| VPC | VPC | VPC | VNet |
| NAT | Cloud NAT | NAT Gateway | NAT Gateway |
| Load Balancer | Cloud Load Balancer | ALB/NLB | Azure Load Balancer |
| DNS | Cloud DNS | Route53 | Azure DNS |

B. Module Interface Specifications

Complete interface specifications for all cloud-agnostic modules available in:

  • opentofu/modules/compute/interface.tf
  • opentofu/modules/database/interface.tf
  • opentofu/modules/cache/interface.tf
  • opentofu/modules/networking/interface.tf
  • opentofu/modules/secrets/interface.tf
  • opentofu/modules/signing/interface.tf

C. Migration Checklist

Infrastructure Migration Checklist (per cloud):

  • VPC/VNet with subnets created
  • NAT Gateway configured
  • Kubernetes cluster operational
  • PostgreSQL database deployed with HA
  • Redis cluster deployed
  • Secrets service configured
  • KMS signing keys created
  • Load balancer with SSL configured
  • DNS records created
  • Firewall rules configured
  • Monitoring stack deployed
  • Backup and disaster recovery tested
  • Cost monitoring enabled
  • Security audit completed
  • End-to-end integration testing passed

Application Migration Checklist:

  • Database connection uses environment variable
  • Redis connection uses environment variable
  • Signing service uses provider factory
  • Secrets access uses abstraction layer
  • Authentication uses cloud-agnostic provider
  • Monitoring uses self-managed stack
  • Kubernetes manifests use overlays
  • CI/CD pipeline supports multi-cloud
  • Documentation updated for all clouds
  • Load testing completed on all clouds

Document Version: 1.0
Last Updated: November 23, 2025
Next Review: February 2026
Owner: CODITECT Infrastructure Team
Status: Analysis Complete, Awaiting Business Decision