PROJECT PLAN: CODITECT License Management Platform

Project: CODITECT License Management Platform (FastAPI + PostgreSQL)
Date: November 24, 2025
Owner: CODITECT Infrastructure Team
Status: ACTIVE DEVELOPMENT (35% Complete)
Version: 3.0 (License Management Focus)

Executive Summary
Technology Stack
Architecture Overview
Phase 0: Infrastructure & Documentation
Phase 1: Security Services
Phase 2: Backend Development
Phase 3: Deployment
Phase 4: Client SDK
Phase 5: End-to-End Testing
Phase 6: Production Hardening
Timeline & Resource Requirements
Budget & Cost Analysis
Risk Assessment
Quality Gates & Success Metrics

1. Executive Summary

This plan outlines the complete implementation of the CODITECT License Management Platform, a production-grade floating license system that enables CODITECT's local-first AI development framework to validate licenses, track concurrent sessions, and manage multi-tenant licensing through a secure cloud API.

Key Objectives

Floating Concurrent Licensing: Limit simultaneous users, not installations
Check-on-Start Pattern: Fast validation at CODITECT startup (local-first architecture)
Cloud KMS Signing: Tamper-proof licenses verified locally without network
Multi-Tenant Isolation: Complete tenant separation at application and database levels
Production-Ready: Comprehensive monitoring, testing, and deployment automation

Current Status

Overall Completion: 35% of MVP (Phase 0 complete, Phases 1-6 pending)

Completed Work:

✅ Phase 0: Infrastructure & Documentation (100%) - November 20-24, 2025
- GKE cluster deployed (3 nodes, auto-scaling 1-10)
- Cloud SQL PostgreSQL 16 with regional HA
- Redis Memorystore 6GB with RDB persistence
- VPC networking with private subnets and Cloud NAT
- Secret Manager with 9 secrets configured
- Documentation organized (7 categories, 100/100 CODITECT standards)
- 17 comprehensive diagrams (C4 architecture, workflows, deployment)
- Production-ready README.md and CLAUDE.md

Remaining Work:

⏸️ Phase 1: Security Services (2-3 days) - Cloud KMS + Identity Platform
⏸️ Phase 2: Backend Development (5-7 days) - FastAPI license API
⏸️ Phase 3: Deployment (2-3 days) - Kubernetes deployment + SSL/DNS
⏸️ Phase 4: Client SDK (1-2 days) - Python License Client for coditect-core
⏸️ Phase 5: E2E Testing (2 days) - Integration and load testing
⏸️ Phase 6: Production Hardening (2 days) - Monitoring and runbooks

Success Criteria

✅ CODITECT can acquire licenses on startup
✅ License API validates JWT and checks seat availability atomically (Redis Lua)
✅ License API signs tokens with Cloud KMS (RSA-4096)
✅ CODITECT validates signature locally (offline-capable)
✅ Heartbeat keeps session alive (every 5 min)
✅ Graceful license release on CODITECT exit
✅ End-to-end test passing (acquire → heartbeat → release)
✅ Unit test coverage ≥80%
✅ Load test passing (100 concurrent users)

2. Technology Stack

Core Platform

Component	Technology	Version	Purpose
Backend Framework	FastAPI	0.104+	Async REST API framework
Database	PostgreSQL	16+	License and tenant data storage
Session Management	Redis	7.x	Concurrent seat tracking with TTL
Authentication	Identity Platform	Latest	OAuth2/OIDC with Google/GitHub
License Signing	Cloud KMS	Latest	RSA-4096 asymmetric key signing
ORM	SQLAlchemy async	2.0+	Database models and queries
Validation	Pydantic	2.x	Request/response schema validation

Infrastructure

Component	Technology	Purpose
Cloud Platform	Google Cloud Platform (GCP)	Cloud infrastructure
Orchestration	Kubernetes (GKE)	Container orchestration
Infrastructure as Code	OpenTofu v1.10.7	GCP resource provisioning (MPL 2.0)
Load Balancer	GCP Load Balancer + Ingress	Traffic distribution
Container Registry	Google Container Registry	Docker image storage
Secrets Management	GCP Secret Manager	Secure credential storage

Observability & Monitoring (Phase 6)

Component	Technology	Purpose
Metrics	Prometheus	Metrics collection
Visualization	Grafana	Metrics dashboards
Distributed Tracing	Jaeger (optional)	Request tracing
Logging	Google Cloud Logging	Centralized logging
Error Tracking	Sentry (optional)	Exception tracking

Development & CI/CD

Component	Technology	Purpose
Version Control	Git + GitHub	Source code management
CI/CD	GitHub Actions	Automated testing and deployment
Testing	pytest + Locust	Unit/integration/load testing
Code Quality	Ruff + Black + MyPy	Linting and type checking
Security Scanning	Trivy + Safety	Vulnerability scanning

3. Architecture Overview

License Management Pattern: Check-on-Start

CODITECT uses a local-first architecture - the framework runs entirely on the user's machine. The cloud infrastructure only validates licenses and tracks concurrent usage.

Flow:

User starts CODITECT locally
    ↓
CODITECT → License API: "Can I run?" (hardware_id, jwt_token)
    ↓
License API (running on GKE):
    1. Validate JWT token (Identity Platform)
    2. Check license active in PostgreSQL
    3. Atomic seat check in Redis (Lua script)
    4. Sign license with Cloud KMS (RSA-4096)
    ↓
CODITECT ← Signed License Token
    ↓
Validate signature locally (offline-capable)
    ↓
Run CODITECT with periodic heartbeats (every 5 min)
    ↓
On exit: Release seat OR wait for 6-min TTL expiry (automatic cleanup)
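
The seat lifecycle in the flow above can be modelled in a few lines. This is a minimal in-memory sketch of the intended Redis behaviour, not the Lua implementation itself. The class name `SeatPool` and its methods are assumptions introduced for illustration; only the 6-minute TTL and the heartbeat-extends-TTL rule come from this plan.

```python
import time
from typing import Callable, Dict

SESSION_TTL_SECONDS = 6 * 60  # session TTL stated in the plan


class SeatPool:
    """In-memory model of floating seats with TTL expiry (illustrative only)."""

    def __init__(self, seats_total: int, clock: Callable[[], float] = time.monotonic):
        self.seats_total = seats_total
        self._clock = clock
        self._expires_at: Dict[str, float] = {}  # session_id -> expiry time

    def _purge_expired(self) -> None:
        # Drop sessions whose TTL has elapsed (Redis does this automatically).
        now = self._clock()
        self._expires_at = {s: t for s, t in self._expires_at.items() if t > now}

    def acquire(self, session_id: str) -> bool:
        """Take a seat if one is free; return False when the pool is full."""
        self._purge_expired()
        is_new = session_id not in self._expires_at
        if is_new and len(self._expires_at) >= self.seats_total:
            return False
        self._expires_at[session_id] = self._clock() + SESSION_TTL_SECONDS
        return True

    def heartbeat(self, session_id: str) -> bool:
        """Extend the TTL; return False if the session has already expired."""
        self._purge_expired()
        if session_id not in self._expires_at:
            return False
        self._expires_at[session_id] = self._clock() + SESSION_TTL_SECONDS
        return True

    def release(self, session_id: str) -> None:
        """Free the seat immediately (graceful exit)."""
        self._expires_at.pop(session_id, None)

    def active(self) -> int:
        self._purge_expired()
        return len(self._expires_at)
```

Because heartbeats arrive every 5 minutes and the TTL is 6 minutes, a healthy client always refreshes its seat with about a minute to spare, while a crashed client frees its seat within 6 minutes of its last heartbeat.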

Key Architectural Decisions

Floating Concurrent Licensing - Limit simultaneous users, not installations
Session Management - Redis TTL (6 min) prevents zombie sessions automatically
Offline-Capable - Signed licenses verified locally, works without network
Security - Cloud KMS signing (tamper-proof), no client-side secrets
Multi-Tenant - Tenant isolation via application-level filtering in PostgreSQL

See: docs/architecture/c1-system-context.md for detailed architecture

System Context (C4 Level 1)

┌──────────────────────────────────────────────────────────────┐
│                    CODITECT Developer                        │
│  (Uses CODITECT CLI for local AI development)                │
└────────────┬─────────────────────────────────────────────────┘
             │
             │ 1. Request license (hardware_id, JWT)
             │ 2. Send heartbeat (every 5 min)
             │ 3. Release license on exit
             ▼
┌──────────────────────────────────────────────────────────────┐
│          CODITECT License Management Platform                │
│                                                              │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐        │
│  │ FastAPI     │  │ PostgreSQL   │  │ Redis         │        │
│  │ License API │→ │ (licenses,   │  │ (session      │        │
│  │             │  │  tenants,    │  │  tracking,    │        │
│  │             │  │  users)      │  │  TTL 6 min)   │        │
│  └─────────────┘  └──────────────┘  └───────────────┘        │
│         │                                                    │
│         └──────────┬─────────────────┬──────────────────┐    │
│                    ▼                 ▼                  ▼    │
│            ┌──────────────┐  ┌──────────────┐  ┌─────────┐   │
│            │ Identity     │  │ Cloud KMS    │  │ Secret  │   │
│            │ Platform     │  │ (RSA-4096    │  │ Manager │   │
│            │ (OAuth2)     │  │  signing)    │  │         │   │
│            └──────────────┘  └──────────────┘  └─────────┘   │
└──────────────────────────────────────────────────────────────┘

4. Phase 0: Infrastructure & Documentation

Duration: 4 days (November 20-24, 2025)
Status: ✅ COMPLETE (100%)
Team: DevOps Engineer, Documentation Specialist

Objectives

✅ Provision production-grade GCP infrastructure
✅ Deploy GKE cluster, Cloud SQL PostgreSQL, Redis Memorystore
✅ Configure VPC networking with private subnets
✅ Organize documentation to 100/100 CODITECT standards
✅ Create comprehensive architecture diagrams
✅ Update README.md and CLAUDE.md to production quality

Work Completed

Infrastructure Deployment (November 20-21)

GKE Cluster:

3-node cluster in us-central1 (auto-scaling 1-10 nodes)
Node type: n1-standard-2 (2 vCPU, 7.5GB RAM)
Preemptible nodes for cost optimization ($100/month)
Workload Identity enabled for GCP service account integration

Cloud SQL PostgreSQL 16:

Regional HA with automatic failover
Instance type: db-custom-2-7680 (2 vCPU, 7.5GB RAM)
100GB SSD storage with auto-increase enabled
Private IP connectivity via VPC peering
Automated daily backups with 7-day retention
Cost: $150/month

Redis Memorystore:

6GB BASIC tier (RDB persistence enabled)
Private IP connectivity to GKE
Used for atomic seat counting (Lua scripts)
Cost: $30/month

VPC Networking:

Custom VPC with RFC 1918 address space
Private subnets for GKE and Cloud SQL
Cloud NAT for outbound internet access
VPC peering between GKE and Cloud SQL
Private Google Access enabled

Secret Manager:

9 secrets configured (DB passwords, API key placeholders)
Workload Identity for pod access to secrets
Secret rotation policies documented

Total Infrastructure Cost: $310/month (development environment)

Documentation Organization (November 24)

Directory Restructuring:

Created 7 documentation categories:
- docs/architecture/ - C4 diagrams (C1, C2, C3)
- docs/workflows/ - Sequence diagrams with code examples
- docs/deployment/ - Deployment and infrastructure guides
- docs/guides/ - Development setup and troubleshooting
- docs/reference/ - GCP inventory, API reference
- docs/project-management/ - PROJECT-PLAN, TASKLIST, CRITICAL-PATH
- docs/research/ - Gap analysis, OpenTofu migration research
Created comprehensive README files for all directories
Updated master docs/README.md with complete navigation

Diagram Library:

Created 17 comprehensive diagrams:
- C4 Architecture (5 diagrams):
  - C1: System Context
  - C2: Container Architecture
  - C3: GKE Components
  - C3: Networking Components
  - C3: Security Components
- Workflow Sequences (5 diagrams):
  - License acquisition flow
  - Heartbeat mechanism
  - License release flow
  - User registration flow
  - Multi-tenant isolation verification
- Deployment Diagrams (2 diagrams):
  - Blue/Green deployment strategy
  - Infrastructure topology
- Supporting Diagrams (5 diagrams):
  - Redis atomic seat counting
  - Cloud KMS signing flow
  - Session TTL management
  - Error handling patterns
  - Scaling strategy

Documentation Quality:

Updated README.md to 558 lines (production quality)
Updated CLAUDE.md to 672 lines (AI-optimized context)
Created 6 documentation planning documents (101KB total)
Achieved 100% README coverage across all directories
All documentation follows CODITECT standards (100/100 score)

Repository Organization:

Root directory cleaned to 14 essential files
All files properly categorized
Git history preserved (all moves with git mv)
Production-ready structure

Deliverables

✅ Complete GCP infrastructure deployed and operational
✅ 7 documentation categories with comprehensive content
✅ 17 architecture and workflow diagrams
✅ Production-quality README.md and CLAUDE.md
✅ 100/100 CODITECT standards compliance
✅ Complete navigation and cross-referencing

Success Criteria

GKE cluster running in us-central1
Cloud SQL PostgreSQL accessible from GKE
Redis instance operational and accessible
All infrastructure costs within budget ($310/month dev)
Documentation organized to CODITECT standards
All directories have comprehensive README files
Diagrams cover all major system components and flows

Cost Summary (Phase 0)

Component	Monthly Cost
GKE Cluster (3 preemptible nodes)	$100
Cloud SQL PostgreSQL (HA)	$150
Redis Memorystore (6GB)	$30
VPC Networking + Cloud NAT	$20
Secret Manager	$10
Total	$310

5. Phase 1: Security Services

Duration: 2-3 days
Status: ⏸️ NEXT (0% Complete)
Team: DevOps Engineer, Security Specialist
Goal: Deploy Cloud KMS and Identity Platform for OAuth2 authentication

Objectives

Deploy Cloud KMS for license signing (RSA-4096)
Configure Identity Platform for OAuth2 (Google, GitHub)
Test end-to-end authentication flow
Document security architecture

Tasks

Day 1: Cloud KMS Setup

P1-T01: Create Cloud KMS OpenTofu Module

Create opentofu/modules/kms/ directory
Define RSA-4096 asymmetric key for license signing
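Define key rotation procedure (90 days; Cloud KMS does not rotate asymmetric keys automatically, so each rotation is a manually created key version)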
Setup IAM permissions for GKE service account
Time Estimate: 2 hours

P1-T02: Deploy Cloud KMS

Deploy KMS key ring to us-central1
Create signing key: coditect-license-signing-key
Grant roles/cloudkms.signerVerifier (sign and read the public key) to GKE SA
Test signing with gcloud command
Time Estimate: 1 hour

Day 2: Identity Platform Setup

P1-T03: Create Identity Platform Module

Create opentofu/modules/identity-platform/ directory
Configure OAuth2 providers (Google, GitHub)
Setup OAuth consent screen
Configure redirect URIs (localhost, staging, production)
Time Estimate: 4 hours

P1-T04: Deploy Identity Platform

Enable Identity Platform API
Create OAuth2 clients (web app, mobile app)
Configure authorized domains
Test OAuth flow with Google account
Time Estimate: 2 hours

Day 3: Testing & Documentation

P1-T05: End-to-End Auth Testing

Test OAuth2 authorization code flow
Verify JWT token generation
Validate token claims (user_id, email, tenant_id)
Test token refresh flow
Time Estimate: 4 hours

P1-T06: Security Documentation

Document OAuth2 flow with sequence diagrams
Create runbook for token validation
Document KMS signing process
Update C3-Security diagram
Time Estimate: 3 hours

Deliverables

⏸️ Cloud KMS operational with RSA-4096 signing key
⏸️ Identity Platform configured with Google/GitHub OAuth
⏸️ End-to-end authentication flow tested
⏸️ Security architecture documented

Success Criteria

Cloud KMS can sign arbitrary data
Identity Platform OAuth flow works for Google and GitHub
JWT tokens contain correct claims (user_id, tenant_id)
Token validation succeeds with public key
Security documentation complete

Blocking Dependencies

Phase 2 Backend Development REQUIRES Phase 1 completion:

JWT validation middleware needs Identity Platform public keys
License signing needs Cloud KMS integration

6. Phase 2: Backend Development

Duration: 5-7 days
Status: ⏸️ PENDING (0% Complete)
Team: Backend Engineers (2x), Database Architect
Goal: Build complete FastAPI license API with multi-tenant support

Objectives

Setup FastAPI project structure
Create database models (SQLAlchemy async)
Build REST APIs (acquire, heartbeat, release)
Implement Redis Lua scripts for atomic seat counting
Integrate Cloud KMS for license signing
Integrate Identity Platform for JWT validation
Write comprehensive tests (80%+ coverage)

Tasks

Day 1-2: FastAPI Project & Database Models

P2-T01: FastAPI Project Setup

Create backend/ directory structure
Initialize FastAPI project with Poetry
Configure settings.py (environment-based config)
Setup async database connection (asyncpg)
Configure CORS middleware
Time Estimate: 2 hours

P2-T02: Database Models (SQLAlchemy async)

Create models/ directory
Tenant model (id, name, subdomain, plan, max_seats)
User model (id, email, tenant_id, auth_provider_id)
License model (id, tenant_id, plan, seats_total, active)
Session model (id, license_id, user_id, hardware_id, started_at, last_heartbeat)
AuditLog model (id, tenant_id, user_id, action, timestamp)
Create Alembic migrations
Time Estimate: 3 hours

Day 3-4: License API Endpoints

P2-T03: License Acquire Endpoint

POST /api/v1/licenses/acquire
Request: {user_id, hardware_id, jwt_token}
Validate JWT with Identity Platform
Check license active in PostgreSQL
Atomic seat check in Redis (Lua script)
Sign license with Cloud KMS
Response: {license_token, expires_at, public_key}
Time Estimate: 4 hours
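
The ordering of the acquire steps matters: authentication first, then the license check, then the seat check, and signing only after a seat is held. A minimal sketch of that orchestration follows, with each external service passed in as a callable so the logic runs without FastAPI, Redis, or Cloud KMS. The function name, the `AcquireResult` type, and the 401/403/409 status codes are assumptions for illustration, not decisions made in this plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AcquireResult:
    status: int                          # HTTP-style status code (assumed mapping)
    license_token: Optional[str] = None
    error: Optional[str] = None


def acquire_license(
    jwt_token: str,
    hardware_id: str,
    verify_jwt: Callable[[str], Optional[dict]],  # returns claims, or None if invalid
    license_is_active: Callable[[str], bool],     # tenant_id -> active?
    take_seat: Callable[[str, str], bool],        # (tenant_id, session_key) -> seat granted?
    sign: Callable[[dict], str],                  # payload -> signed license token
) -> AcquireResult:
    # 1. Validate the JWT (Identity Platform in the real service).
    claims = verify_jwt(jwt_token)
    if claims is None:
        return AcquireResult(401, error="invalid token")
    tenant_id, user_id = claims["tenant_id"], claims["user_id"]

    # 2. Check that the tenant's license is active (PostgreSQL).
    if not license_is_active(tenant_id):
        return AcquireResult(403, error="license inactive")

    # 3. Atomic seat check (Redis Lua script).
    if not take_seat(tenant_id, f"{user_id}:{hardware_id}"):
        return AcquireResult(409, error="no seats available")

    # 4. Sign the license payload (Cloud KMS).
    payload = {"tenant_id": tenant_id, "user_id": user_id, "hardware_id": hardware_id}
    return AcquireResult(200, license_token=sign(payload))
```

Injecting the four dependencies also makes the unit tests in P2-T10 straightforward, since each branch can be exercised with a stub.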

P2-T04: Heartbeat Endpoint

POST /api/v1/licenses/heartbeat
Request: {session_id, jwt_token}
Validate JWT
Update last_heartbeat timestamp in Redis (extend TTL to 6 min)
Response: {status: "ok", next_heartbeat_at}
Time Estimate: 2 hours

P2-T05: License Release Endpoint

POST /api/v1/licenses/release
Request: {session_id, jwt_token}
Validate JWT
Delete session from Redis (atomic decrement)
Log release in audit log
Response: {status: "released"}
Time Estimate: 2 hours

Day 5: Redis & KMS Integration

P2-T06: Redis Lua Scripts (Atomic Seat Counting)

Create redis_scripts/ directory
acquire_seat.lua - Atomic check and increment
release_seat.lua - Atomic decrement
extend_ttl.lua - Extend session TTL on heartbeat
Load scripts on Redis connection
Time Estimate: 3 hours

P2-T07: Cloud KMS Integration

Create kms/ service module
Implement async KMS signing with aiogoogle
Sign license payload (tenant_id, user_id, hardware_id, expires_at)
Cache public key in Redis (1 hour TTL)
Time Estimate: 3 hours
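
Signing and offline verification only agree if both sides serialize the payload to exactly the same bytes. One way to guarantee that, sketched here under assumptions, is a canonical JSON encoding followed by a SHA-256 digest, which is the form an RSA SHA-256 signing key would operate on. The function names and the ISO 8601 format for `expires_at` are illustrative choices; the four payload fields are the ones listed in P2-T07.

```python
import hashlib
import json


def canonical_payload(tenant_id: str, user_id: str, hardware_id: str, expires_at: str) -> bytes:
    """Serialize the license payload deterministically (sorted keys, no whitespace)."""
    payload = {
        "tenant_id": tenant_id,
        "user_id": user_id,
        "hardware_id": hardware_id,
        "expires_at": expires_at,  # assumed to be an ISO 8601 UTC string
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def payload_digest(payload: bytes) -> bytes:
    """SHA-256 digest of the canonical payload; this is what gets signed."""
    return hashlib.sha256(payload).digest()
```

The client SDK would rebuild the same canonical bytes from the token's claims before checking the signature, so any change to field order or spacing on either side would invalidate every license.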

Day 6: Auth & Admin Endpoints

P2-T08: JWT Auth Middleware

Create middleware/auth.py
Validate JWT signature with Identity Platform public keys
Extract user_id and tenant_id from claims
Add tenant context to request state
Handle token expiration and refresh
Time Estimate: 4 hours
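
Once the JWT signature has been verified, the middleware still has to check the claims before attaching tenant context to the request. The sketch below covers only that second step and assumes signature verification has already happened elsewhere. `tenant_context` and `ClaimsError` are hypothetical names; `user_id` and `tenant_id` are the claims named in this plan, and `exp` is the standard JWT expiry claim.

```python
import time
from typing import Optional

REQUIRED_CLAIMS = ("user_id", "tenant_id", "exp")


class ClaimsError(ValueError):
    """Raised when verified claims are incomplete or expired."""


def tenant_context(claims: dict, now: Optional[float] = None) -> dict:
    """Validate already-verified JWT claims and return the request's tenant context."""
    missing = [name for name in REQUIRED_CLAIMS if name not in claims]
    if missing:
        raise ClaimsError("missing claims: " + ", ".join(missing))

    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise ClaimsError("token expired")

    # Only these two values are propagated to request state.
    return {"user_id": claims["user_id"], "tenant_id": claims["tenant_id"]}
```

In a FastAPI middleware, a `ClaimsError` would typically be translated into a 401 response so the client can refresh its token.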

P2-T09: Admin Endpoints (Tenant CRUD)

POST /api/v1/tenants/ (create tenant)
GET /api/v1/tenants/{id}/ (tenant details)
PUT /api/v1/tenants/{id}/ (update tenant)
GET /api/v1/tenants/{id}/users/ (list users)
POST /api/v1/tenants/{id}/users/ (create user)
Time Estimate: 4 hours

Day 7: Testing

P2-T10: Unit Tests (pytest)

Test database models (validation, constraints)
Test API endpoints (CRUD operations)
Test Redis Lua scripts (atomic operations)
Test KMS signing and verification
Test JWT middleware (valid/invalid tokens)
Target: 80%+ code coverage
Time Estimate: 6 hours

P2-T11: Integration Tests

Test end-to-end license acquisition flow
Test multi-tenant isolation (tenant A can't access tenant B data)
Test concurrent seat acquisition (10 users)
Test session TTL expiry (Redis)
Time Estimate: 4 hours
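
The multi-tenant isolation test comes down to one rule: every query is scoped by the caller's `tenant_id`. The sketch below demonstrates that rule with the standard-library `sqlite3` module standing in for PostgreSQL and SQLAlchemy, which is an assumption made so the example runs anywhere. The helper names are hypothetical; the column names follow the License model in P2-T02.

```python
import sqlite3
from typing import List, Tuple


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with licenses for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE licenses ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
        "plan TEXT NOT NULL, seats_total INTEGER NOT NULL, active INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO licenses (tenant_id, plan, seats_total, active) VALUES (?, ?, ?, ?)",
        [("tenant-a", "team", 5, 1), ("tenant-b", "enterprise", 50, 1)],
    )
    return conn


def licenses_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> List[Tuple]:
    """Return only the rows owned by tenant_id (parameterized, never string-built)."""
    cursor = conn.execute(
        "SELECT id, plan, seats_total FROM licenses WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return cursor.fetchall()
```

An isolation test then asserts that a request authenticated as tenant A never sees a row whose `tenant_id` is tenant B, for every endpoint that reads or writes tenant data.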

Deliverables

⏸️ Complete FastAPI application
⏸️ Database models with migrations
⏸️ License API endpoints (acquire, heartbeat, release)
⏸️ Redis Lua scripts for atomic seat counting
⏸️ Cloud KMS signing integration
⏸️ JWT authentication middleware
⏸️ Admin endpoints for tenant management
⏸️ Comprehensive test suite (80%+ coverage)

Success Criteria

All API endpoints return correct responses
Multi-tenant isolation verified (integration tests)
Unit test coverage ≥80%
Redis atomic operations work correctly
KMS signing and verification functional
JWT middleware validates tokens correctly

Blocking Dependencies

Phase 3 Deployment REQUIRES Phase 2 completion:

Cannot deploy without backend code
Dockerfile needs working FastAPI application

7. Phase 3: Deployment

Duration: 2-3 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Deploy FastAPI backend to GKE with SSL/DNS configuration

Objectives

Create Dockerfile for FastAPI application
Build and push Docker image to GCR
Create Kubernetes manifests (Deployment, Service, Ingress)
Deploy to GKE
Configure SSL certificate + DNS
Verify end-to-end deployment

Tasks

Day 1: Containerization

P3-T01: Create Dockerfile (Multi-Stage Build)

Stage 1: Build dependencies (Poetry install)
Stage 2: Runtime image (Python 3.11-slim)
Copy application code
Set entrypoint: uvicorn main:app --host 0.0.0.0 --port 8000
Time Estimate: 2 hours

P3-T02: Build and Push Docker Image

Build image: docker build -t gcr.io/coditect-cloud-infra/license-api:v1.0.0 .
Test locally: docker run -p 8000:8000 gcr.io/coditect-cloud-infra/license-api:v1.0.0
Push to GCR: docker push gcr.io/coditect-cloud-infra/license-api:v1.0.0
Time Estimate: 1 hour

Day 2: Kubernetes Manifests

P3-T03: Kubernetes Deployment Manifest

Create kubernetes/base/deployment.yaml
3 replicas for high availability
Resource requests: 500m CPU, 512Mi memory
Resource limits: 1000m CPU, 1Gi memory
Liveness probe: /health
Readiness probe: /ready
Environment variables from Secret Manager
Time Estimate: 2 hours

P3-T04: Kubernetes Service Manifest

Create kubernetes/base/service.yaml
Type: ClusterIP (internal load balancing)
Port: 80 → 8000 (container port)
Selector: app: license-api
Time Estimate: 1 hour

P3-T05: Ingress + cert-manager Config

Create kubernetes/base/ingress.yaml
Host: auth.coditect.ai
TLS: Use cert-manager for Let's Encrypt
Annotations: cert-manager.io/cluster-issuer: letsencrypt-prod
Time Estimate: 2 hours

Day 3: Deployment & DNS

P3-T06: Deploy to GKE

Apply Kubernetes manifests: kubectl apply -k kubernetes/base/
Verify pods running: kubectl get pods
Check logs: kubectl logs <pod-name>
Time Estimate: 2 hours

P3-T07: Configure Cloud DNS

Create A record: auth.coditect.ai → GCP Load Balancer IP
Verify DNS propagation: dig auth.coditect.ai
Time Estimate: 1 hour

P3-T08: SSL Certificate Verification

Wait for cert-manager to provision certificate (5-10 min)
Verify certificate: curl -I https://auth.coditect.ai/health
Check expiry: openssl s_client -connect auth.coditect.ai:443 -servername auth.coditect.ai </dev/null | openssl x509 -noout -enddate
Time Estimate: 1 hour

Deliverables

⏸️ Dockerfile with multi-stage build
⏸️ Docker image pushed to GCR
⏸️ Kubernetes manifests (Deployment, Service, Ingress)
⏸️ FastAPI deployed to GKE
⏸️ SSL certificate on auth.coditect.ai
⏸️ DNS configured and propagated

Success Criteria

Docker image builds successfully
Kubernetes deployment healthy (3/3 pods running)
Health check endpoint responds: https://auth.coditect.ai/health
SSL certificate valid (Let's Encrypt)
DNS resolves correctly

Blocking Dependencies

Phase 5 E2E Testing REQUIRES Phase 3 completion:

Cannot test end-to-end without deployed API
Client SDK needs production URL

8. Phase 4: Client SDK

Duration: 1-2 days
Status: ⏸️ PENDING (0% Complete)
Team: Python Developer
Goal: Create Python License Client SDK for coditect-core integration

Objectives

Create license-client Python package
Implement hardware fingerprinting
Implement signature verification (public key)
Implement heartbeat background thread
Implement offline mode with grace period
Write comprehensive tests

Tasks

Day 1: Client SDK Foundation

P4-T01: Create License Client Package

Create license-client/ directory in coditect-core
Create LicenseClient class
Methods: acquire(), heartbeat(), release()
Configuration: API URL, timeout, retry settings
Time Estimate: 4 hours

P4-T02: Hardware Fingerprinting

Generate unique hardware ID (MAC address + CPU ID + disk serial)
Hash hardware ID (SHA256)
Store hardware ID in local cache (~/.coditect/hardware_id)
Time Estimate: 2 hours
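
A fingerprint of this kind is a hash over normalized identifiers. The sketch below shows only the hashing step and one portable way to read a MAC address; collecting the CPU ID and disk serial is platform-specific and left out. The function names and the normalization rule (trim, lowercase, join with `|`) are assumptions for illustration.

```python
import hashlib
import uuid


def hardware_fingerprint(mac: str, cpu_id: str, disk_serial: str) -> str:
    """Return a SHA-256 hex digest over the normalized hardware identifiers."""
    # Normalize so that formatting differences do not change the fingerprint.
    normalized = "|".join(part.strip().lower() for part in (mac, cpu_id, disk_serial))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def local_mac() -> str:
    """Best-effort MAC address as 12 hex digits.

    uuid.getnode() may fall back to a random value when no MAC can be read,
    which is one reason to cache the result on disk after the first run.
    """
    return f"{uuid.getnode():012x}"
```

Hashing means the server never stores raw hardware identifiers, and caching the result in ~/.coditect/hardware_id keeps the ID stable even if one component is later unreadable.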

P4-T03: Signature Verification (Public Key)

Fetch public key from /api/v1/public-key endpoint
Cache public key locally (1 hour TTL)
Verify license token signature with RSA-4096
Validate token claims (tenant_id, user_id, expires_at)
Time Estimate: 2 hours

Day 2: Heartbeat & Offline Mode

P4-T04: Heartbeat Background Thread

Create HeartbeatThread class
Send heartbeat every 5 minutes
Handle network failures (retry 3 times)
Graceful shutdown on CODITECT exit
Time Estimate: 2 hours
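
A heartbeat thread of this shape can be built on `threading.Event`, which gives both the sleep between heartbeats and a prompt shutdown signal. This is a minimal sketch: the `send` callable, the `consecutive_failures` counter, and the choice to treat `OSError` as a network failure are assumptions, while the 5-minute interval and the 3 retries come from the task list above.

```python
import threading
from typing import Callable


class HeartbeatThread(threading.Thread):
    """Calls send() on a fixed interval until stop() is requested."""

    def __init__(self, send: Callable[[], bool], interval_s: float = 300.0, max_retries: int = 3):
        super().__init__(daemon=True)  # daemon: never blocks interpreter exit
        self._send = send
        self._interval_s = interval_s
        self._max_retries = max_retries
        self._stop_event = threading.Event()
        self.consecutive_failures = 0

    def run(self) -> None:
        # Event.wait() returns True as soon as stop() is called, ending the loop
        # without waiting out the rest of the interval.
        while not self._stop_event.wait(self._interval_s):
            if self._send_with_retries():
                self.consecutive_failures = 0
            else:
                self.consecutive_failures += 1

    def _send_with_retries(self) -> bool:
        for _ in range(self._max_retries):
            try:
                if self._send():
                    return True
            except OSError:
                pass  # treat as a network failure and try again
        return False

    def stop(self) -> None:
        """Request a graceful shutdown; call join() afterwards."""
        self._stop_event.set()
```

On CODITECT exit, the client would call `stop()`, `join()` the thread, and then release the seat, so no heartbeat can race with the release call.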

P4-T05: Offline Mode with Grace Period

If heartbeat fails, continue running (grace period: 24 hours)
Check last successful heartbeat timestamp
Display warning: "License server unreachable, offline mode"
Force exit after 24 hours offline
Time Estimate: 2 hours
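
The grace-period rule reduces to a comparison against the last successful heartbeat. A small sketch, with hypothetical function names and the 24-hour window taken from the task above:

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # offline grace period from the plan


def grace_remaining(last_success: datetime, now: datetime) -> timedelta:
    """Time left before the client must exit; never negative."""
    remaining = GRACE_PERIOD - (now - last_success)
    return max(remaining, timedelta(0))


def must_exit(last_success: datetime, now: datetime) -> bool:
    """True once the client has been offline for the full grace period."""
    return grace_remaining(last_success, now) == timedelta(0)
```

Both timestamps should come from the same clock and timezone. A user who can set the system clock back can stretch the window, so the real client may also want a monotonic or server-supplied time source.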

P4-T06: Error Handling and Retries

Retry logic for network failures (exponential backoff)
Handle 429 Too Many Requests (back off)
Handle 401 Unauthorized (re-authenticate)
Handle 503 Service Unavailable (wait and retry)
Time Estimate: 2 hours
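
The retry rules can be expressed as a delay schedule plus a loop that retries only on the retryable statuses. In this sketch the base delay, the 30-second cap, and the function names are assumptions; `send` stands in for whatever HTTP call the client makes and returns a status code. A 401 is returned to the caller immediately, since re-authentication is a different action from retrying.

```python
import time
from typing import Callable, List

RETRYABLE_STATUSES = {429, 503}  # back off and retry; 401 goes back to the caller


def backoff_delays(retries: int, base_s: float = 1.0, cap_s: float = 30.0) -> List[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]


def request_with_retries(
    send: Callable[[], int],
    retries: int = 3,
    base_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(); on a retryable status, wait and retry up to `retries` times."""
    status = send()
    for delay in backoff_delays(retries, base_s):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status = send()
    return status
```

Passing `sleep` as a parameter keeps the unit tests fast. A production client would likely add random jitter to each delay so that many clients recovering from the same outage do not retry in lockstep.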

P4-T07: Client Unit Tests

Test hardware fingerprinting
Test signature verification
Test heartbeat thread
Test offline mode grace period
Test error handling
Target: 80%+ coverage
Time Estimate: 3 hours

Deliverables

⏸️ license-client Python package
⏸️ Hardware fingerprinting implemented
⏸️ Signature verification with public key
⏸️ Heartbeat background thread
⏸️ Offline mode with 24-hour grace period
⏸️ Comprehensive error handling
⏸️ Client unit tests (80%+ coverage)

Success Criteria

Client can acquire license from API
Signature verification succeeds
Heartbeat thread runs in background
Offline mode works (grace period)
Error handling tested (network failures)
Client tests pass (80%+ coverage)

9. Phase 5: End-to-End Testing

Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: QA Engineer, Backend Engineer
Goal: Verify complete system integration with load testing

Objectives

Test end-to-end license flow (acquire → heartbeat → release)
Test multi-user concurrent access (10 users)
Perform load testing (100 concurrent users)
Run security scan (OWASP ZAP)
Fix critical bugs

Tasks

Day 1: Integration Testing

P5-T01: End-to-End Integration Test

Test: User signs up → acquires license → heartbeat → release
Verify JWT token validation
Verify Redis seat counting (atomic)
Verify KMS signature verification
Verify session TTL expiry (wait 6 min, session auto-released)
Time Estimate: 4 hours

P5-T02: Multi-User Concurrent Test (10 users)

Simulate 10 users acquiring licenses simultaneously
Verify seat limit enforced (e.g., 5 seat plan)
Verify 6th user gets "No seats available" error
Verify seat release frees up capacity
Time Estimate: 2 hours
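
The property under test is that check-and-increment happens as one indivisible step, so exactly 5 of 10 simultaneous requests succeed on a 5-seat plan. The sketch below reproduces the scenario with threads and a lock in place of Redis and the Lua script. `SeatCounter` and `simulate` are hypothetical names, and the lock plays the role that Redis's single-threaded script execution plays in the real system.

```python
import threading
from typing import List


class SeatCounter:
    """Seat counter whose check-and-increment is atomic."""

    def __init__(self, seats_total: int):
        self._seats_total = seats_total
        self._in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # The check and the increment happen under one lock, so two callers
        # can never both observe the last free seat.
        with self._lock:
            if self._in_use >= self._seats_total:
                return False
            self._in_use += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_use = max(0, self._in_use - 1)


def simulate(users: int = 10, seats: int = 5) -> List[bool]:
    """Have `users` threads request a seat at the same moment; return each outcome."""
    counter = SeatCounter(seats)
    barrier = threading.Barrier(users)  # releases all threads together
    results = [False] * users

    def worker(index: int) -> None:
        barrier.wait()
        results[index] = counter.try_acquire()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Which five users win is not deterministic, so the test should assert on the counts of successes and failures rather than on specific users.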

Day 2: Load & Security Testing

P5-T03: Load Test with Locust (100 users, 1000 req/min)

Create Locust test script:
- 100 concurrent users
- Each user: acquire → 3 heartbeats → release
- Total: 1000 requests/min
Run for 10 minutes
Measure:
- p50, p95, p99 latency
- Error rate
- Throughput (req/s)
Target: p99 < 500ms, error rate < 1%
Time Estimate: 4 hours
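
Locust reports percentiles itself, but it helps to be explicit about how the pass/fail gate is evaluated. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; the function names are hypothetical, and the thresholds are the targets stated above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_test_passes(
    latencies_ms: Sequence[float],
    errors: int,
    total_requests: int,
    p99_target_ms: float = 500.0,
    max_error_rate: float = 0.01,
) -> bool:
    """Apply the plan's gate: p99 < 500 ms and error rate < 1%."""
    p99_ok = percentile(latencies_ms, 99) < p99_target_ms
    errors_ok = errors / total_requests < max_error_rate
    return p99_ok and errors_ok
```

At 1000 requests/min for 10 minutes the run produces roughly 10,000 samples, so p99 is determined by about the slowest 100 requests; a short run with few samples would make the gate much noisier.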

P5-T04: Security Scan (OWASP ZAP)

Run OWASP ZAP active scan against staging API
Check for:
- SQL injection vulnerabilities
- XSS vulnerabilities
- Authentication bypass
- Sensitive data exposure
Fix critical/high vulnerabilities
Time Estimate: 4 hours

P5-T05: Fix Critical Bugs

Review test results
Fix bugs discovered in E2E testing
Fix performance bottlenecks (if p99 > 500ms)
Re-run tests to verify fixes
Time Estimate: 8 hours

Deliverables

⏸️ End-to-end test passing
⏸️ Multi-user concurrent test passing
⏸️ Load test passing (100 users, p99 < 500ms)
⏸️ Security scan completed (no critical issues)
⏸️ All critical bugs fixed

Success Criteria

E2E test passes (acquire → heartbeat → release)
Multi-user test passes (seat limit enforced)
Load test passes (100 users, p99 < 500ms, error rate < 1%)
Security scan shows no critical/high vulnerabilities
All critical bugs fixed and verified

10. Phase 6: Production Hardening

Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Production readiness with monitoring and runbooks

Objectives

Create GKE monitoring dashboards (Prometheus + Grafana)
Setup alerting (PagerDuty integration)
Write runbook for incident response
Create deployment checklist
Document production procedures

Tasks

Day 1: Monitoring & Alerting

P6-T01: Create GKE Monitoring Dashboards

Setup Prometheus in GKE (or use GCP Managed Prometheus)
Create Grafana dashboards:
- License API performance (latency, error rate, throughput)
- Active sessions by tenant
- Redis connection pool usage
- PostgreSQL query performance
- GKE pod health (CPU, memory, restarts)
Time Estimate: 4 hours

P6-T02: Setup Alerting (PagerDuty)

Configure PagerDuty integration
Create alert rules:
- High error rate (>5% 5xx errors)
- High latency (p99 > 1s)
- Redis connection pool exhaustion
- Database connection failures
- Pod crash loops
Setup on-call rotation
Time Estimate: 2 hours
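
The two alert rules with numeric thresholds can be written down as plain predicates, which is useful for reviewing the thresholds before they are translated into Prometheus alerting rules. This sketch is not Prometheus configuration; the metric names and the `firing_alerts` function are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Rule name -> predicate over a snapshot of current metrics.
ALERT_RULES: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "high_error_rate": lambda m: m["error_rate_5xx"] > 0.05,  # >5% 5xx errors
    "high_latency": lambda m: m["p99_latency_s"] > 1.0,       # p99 > 1s
}


def firing_alerts(metrics: Dict[str, float]) -> List[str]:
    """Return the names of all rules whose condition currently holds, sorted."""
    return sorted(name for name, rule in ALERT_RULES.items() if rule(metrics))
```

The paging threshold (p99 > 1s) is deliberately looser than the load-test target (p99 < 500ms), which leaves headroom between a missed performance target and an on-call page.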

Day 2: Runbooks & Checklists

P6-T03: Write Runbook (Incident Response)

Create docs/RUNBOOK.md
Sections:
- Common Issues (API 5xx errors, database connection failures)
- Troubleshooting Steps (check logs, restart pods, rollback)
- Escalation Procedures (when to page CTO)
- Rollback Procedures (kubectl rollout undo)
- Database Backup and Restore
Time Estimate: 3 hours

P6-T04: Create Deployment Checklist

Create docs/DEPLOYMENT-CHECKLIST.md
Pre-deployment:
- All tests passing (unit, integration, E2E)
- Code review approved
- Database migrations tested
- Staging deployment successful
Deployment:
- Blue-green deployment to production
- Health checks passing
- Monitor for 1 hour (check logs, metrics)
Post-deployment:
- Verify end-to-end flow
- Check error rate (should be < 1%)
- Update CHANGELOG.md
Time Estimate: 1 hour

Deliverables

⏸️ Prometheus + Grafana monitoring dashboards
⏸️ PagerDuty alerting configured
⏸️ Runbook for common issues
⏸️ Deployment checklist

Success Criteria

Monitoring dashboards display real-time data
Alerts fire correctly for test scenarios
Runbook reviewed and approved by team
Deployment checklist complete

11. Timeline & Resource Requirements

Overall Timeline

Total Duration: 14-18 business days (3-4 weeks)
Start Date: November 20, 2025 (Phase 0 started)
Phase 0 Complete: November 24, 2025 ✅
Target MVP Completion: December 10, 2025 (Phases 1-6, matching the Phase 6 end date in the Phase Breakdown)

Phase Breakdown

Phase	Duration	Start Date	End Date	Status	Completion
Phase 0	4 days	Nov 20	Nov 24	✅ COMPLETE	100%
Phase 1	2-3 days	Nov 25	Nov 27	⏸️ NEXT	0%
Phase 2	5-7 days	Nov 28	Dec 4	⏸️ PENDING	0%
Phase 3	2-3 days	Dec 5	Dec 6	⏸️ PENDING	0%
Phase 4	1-2 days	Dec 5*	Dec 6*	⏸️ PENDING	0%
Phase 5	2 days	Dec 6	Dec 9	⏸️ PENDING	0%
Phase 6	2 days	Dec 9	Dec 10	⏸️ PENDING	0%

*Phase 4 (Client SDK) can run in parallel with Phase 2-3

Team Composition

Required Team:

1x DevOps Engineer (40 hrs/week) - Infrastructure, deployment, monitoring
1x Backend Engineer (40 hrs/week) - FastAPI development, testing
1x Python Developer (20 hrs/week) - Client SDK development
1x QA Engineer (20 hrs/week) - Load testing, security scanning

Total: 3 FTEs (Full-Time Equivalents)

Resource Allocation by Phase

Phase	DevOps	Backend	Python Dev	QA	Total Hours
Phase 0 ✅	32	0	0	0	32
Phase 1	16	8	0	0	24
Phase 2	8	40	0	8	56
Phase 3	16	8	0	0	24
Phase 4	0	0	16	4	20
Phase 5	8	8	0	16	32
Phase 6	16	0	0	8	24
Total	96	64	16	36	212

12. Budget & Cost Analysis

Development Costs

Labor Costs (4 weeks):

Role	Rate	Hours	Total
DevOps Engineer	$130/hr	96	$12,480
Backend Engineer	$120/hr	64	$7,680
Python Developer	$120/hr	16	$1,920
QA Engineer	$100/hr	36	$3,600

Total Labor: $25,680

Infrastructure Costs

Development Environment (4 weeks):

GKE cluster: $100/month × 1 month = $100
Cloud SQL: $150/month × 1 month = $150
Redis: $30/month × 1 month = $30
Cloud KMS: $10/month × 1 month = $10
Identity Platform: Free (up to 50K MAU)
Networking: $20/month × 1 month = $20

Development Total: $310

Production Environment (initial month):

GKE cluster: $500/month (production-grade, 5 nodes)
Cloud SQL: $400/month (higher tier, HA)
Redis: $150/month (16GB STANDARD tier)
Cloud KMS: $10/month
Identity Platform: $50/month (budgeted allowance; the development estimate treats usage up to 50K MAU as free)
Load Balancer + SSL: $50/month
Monitoring: $40/month

Production Total: $1,200/month

Total Project Budget

Category	Cost
Labor (4 weeks)	$25,680
Infrastructure (Dev, 1 month)	$310
Infrastructure (Prod, 1 month)	$1,200
Contingency (10%)	$2,719
TOTAL	$29,909
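
The budget figures can be re-derived from the rates and hours in the labor table. The short script below does so using only numbers from this plan; the variable names are arbitrary. It also makes explicit that the 10% contingency is taken on labor plus both infrastructure lines.

```python
# (hourly rate in USD, hours) per role, from the labor cost table
LABOR = {
    "DevOps Engineer": (130, 96),
    "Backend Engineer": (120, 64),
    "Python Developer": (120, 16),
    "QA Engineer": (100, 36),
}
INFRA_DEV = 310     # development environment, 1 month
INFRA_PROD = 1200   # production environment, 1 month

labor_total = sum(rate * hours for rate, hours in LABOR.values())
subtotal = labor_total + INFRA_DEV + INFRA_PROD
contingency = subtotal * 10 // 100  # 10% contingency, whole dollars
total = subtotal + contingency

print(f"Labor:       ${labor_total:,}")
print(f"Subtotal:    ${subtotal:,}")
print(f"Contingency: ${contingency:,}")
print(f"Total:       ${total:,}")
```

Running it reproduces the table: $25,680 labor, $2,719 contingency, and a $29,909 total.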

Ongoing Costs (Post-Launch)

Monthly Recurring Costs:

Production infrastructure: $1,200/month
Support & maintenance (0.5 FTE): $10,000/month

Total: ~$11,200/month

13. Risk Assessment

Critical Risks

Risk	Likelihood	Impact	Mitigation Strategy
Identity Platform OAuth app review delay	Medium (30%)	High (+2-3 days)	Start OAuth app approval process immediately; use Firebase emulator for development
Redis Lua script bugs (race conditions)	Medium (40%)	High (+1-2 days)	Thorough unit testing with concurrent users; code review by senior engineer
GKE networking issues (ingress, DNS)	Low (20%)	Medium (+1 day)	Test networking in dev environment first; have fallback to Cloud Run
FastAPI async/await complexity	Low (15%)	Low (+1 day)	Use existing FastAPI templates; pair programming for complex async code
SSL certificate provisioning delay	Low (10%)	Low (+4 hours)	Use Let's Encrypt staging first; manual cert creation as backup

Medium Risks

Risk	Likelihood	Impact	Mitigation Strategy
KMS signing performance issues	Low (10%)	Low (+4 hours)	Cache public key; use async KMS client
PostgreSQL connection pool exhaustion	Low (5%)	Low (+2 hours)	Set appropriate pool size (20-50); monitor connections
Load test failures (p99 > 500ms)	Medium (25%)	Medium (+1 day)	Database query optimization; add indexes; implement caching

Risk Response Plans

If OAuth Review Delayed:

Use Firebase Auth emulator for development
Continue backend development with mock JWT tokens
Deploy to staging with test OAuth credentials
Switch to production OAuth when approved

If Load Test Fails:

Identify bottleneck (database, API, Redis)
Optimize hot path queries
Add database indexes on frequently queried columns
Implement caching for tenant settings (Redis)
Scale horizontally (add GKE nodes)

If Security Scan Fails:

Review OWASP ZAP findings
Fix critical/high vulnerabilities immediately
Re-run security scan
Document remediation in security log

14. Quality Gates & Success Metrics

Phase Completion Criteria

Each phase must meet the following criteria before proceeding to the next phase:

Phase 0: Infrastructure & Documentation ✅ COMPLETE

GKE cluster deployed and operational
Cloud SQL PostgreSQL accessible from GKE
Redis Memorystore instance operational
All infrastructure costs within budget ($310/month dev)
Documentation organized to CODITECT standards (100/100)
Diagrams cover all major system components

Phase 1: Security Services

Cloud KMS can sign arbitrary data
Identity Platform OAuth flow works for Google and GitHub
JWT tokens contain correct claims (user_id, tenant_id)
Security documentation complete

Phase 2: Backend Development

All API endpoints return correct responses
Multi-tenant isolation verified (integration tests)
Unit test coverage ≥80%
Redis atomic operations work correctly
KMS signing and verification functional
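The atomicity criterion above comes down to one invariant: under concurrent acquires, the number of granted seats never exceeds the license limit. The sketch below illustrates a test for that invariant against an in-memory stand-in that uses a lock. It is not the Redis Lua script itself, and the class and function names are hypothetical. The same harness could be pointed at the real acquire call.

```python
import threading


class InMemorySeatPool:
    """In-memory stand-in for the atomic check-and-add done server-side in Redis."""

    def __init__(self, max_seats: int) -> None:
        self.max_seats = max_seats
        self._sessions = set()
        self._lock = threading.Lock()

    def acquire(self, session_id: str) -> bool:
        # The check and the add happen under one lock, so no two callers
        # can both observe a free seat and both take it.
        with self._lock:
            if session_id in self._sessions:
                return True  # re-acquire by the same session is idempotent
            if len(self._sessions) >= self.max_seats:
                return False
            self._sessions.add(session_id)
            return True

    def release(self, session_id: str) -> None:
        with self._lock:
            self._sessions.discard(session_id)

    def active(self) -> int:
        with self._lock:
            return len(self._sessions)


def run_concurrent_acquires(pool: InMemorySeatPool, n_clients: int) -> int:
    """Start n_clients threads that acquire at the same moment; return grants."""
    results = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(n_clients)  # release all workers together

    def worker(i: int) -> None:
        barrier.wait()
        granted = pool.acquire(f"session-{i}")
        with results_lock:
            results.append(granted)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

A test would then assert that 50 simultaneous clients against a 5-seat pool yield exactly 5 grants.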

Phase 3: Deployment

Docker image builds successfully
Kubernetes deployment healthy (3/3 pods running)
Health check endpoint responds: https://auth.coditect.ai/health
SSL certificate valid (Let's Encrypt)

Phase 4: Client SDK

Client can acquire license from API
Signature verification succeeds
Heartbeat thread runs in background
Offline mode works (grace period)
Client tests pass (≥80% coverage)
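The heartbeat and offline-mode criteria above interact: the background thread records the last successful heartbeat, and the grace period is measured from that moment. The following is a minimal sketch under assumed names; the `send` callable, the interval, and the grace length are placeholders, not the SDK's actual interface. The clock is injectable so the grace logic can be tested without sleeping.

```python
import threading
import time


class Heartbeat:
    """Background heartbeat with an offline grace period (illustrative sketch)."""

    def __init__(self, send, interval_s: float, grace_s: float, clock=time.monotonic):
        self._send = send              # hypothetical callable: returns True on success
        self._interval_s = interval_s
        self._grace_s = grace_s
        self._clock = clock
        self._last_ok = clock()        # treat construction time as the last success
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def beat_once(self) -> bool:
        """Send one heartbeat; any exception counts as a failed beat."""
        try:
            ok = bool(self._send())
        except Exception:
            ok = False
        if ok:
            self._last_ok = self._clock()
        return ok

    def license_usable(self) -> bool:
        """True while the last successful heartbeat is within the grace period."""
        return (self._clock() - self._last_ok) <= self._grace_s

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.beat_once()
```

Using `Event.wait` as the sleep lets `stop()` interrupt the loop immediately instead of waiting out a full interval.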

Phase 5: End-to-End Testing

E2E test passes (acquire → heartbeat → release)
Multi-user test passes (seat limit enforced)
Load test passes (100 users, p99 < 500ms, error rate < 1%)
Security scan shows no critical/high vulnerabilities
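The load-test criterion above combines two thresholds, so it helps to evaluate them in one place. The sketch below assumes the load tool exports per-request latencies in milliseconds and an error count. It uses the nearest-rank percentile definition, which may differ slightly from what a given load tool or Prometheus histogram reports. Function names are illustrative.

```python
def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def load_test_passes(
    latencies_ms,
    errors: int,
    total_requests: int,
    p99_limit_ms: float = 500.0,   # threshold from the Phase 5 criterion
    max_error_rate: float = 0.01,  # threshold from the Phase 5 criterion
) -> bool:
    """True when p99 latency and error rate are both strictly under their limits."""
    p99 = percentile(latencies_ms, 99)
    error_rate = errors / total_requests
    return p99 < p99_limit_ms and error_rate < max_error_rate
```

For example, 1,000 requests with 10 slow outliers still pass on latency, because the 99th-percentile sample is the 990th fastest request.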

Phase 6: Production Hardening

Monitoring dashboards display real-time data
Alerts fire correctly for test scenarios
Runbook complete and reviewed
Deployment checklist complete

Key Performance Indicators (KPIs)

Technical KPIs:

Metric	Target	Measurement
API Latency (p99)	<500ms	Prometheus histogram
API Error Rate	<1%	Prometheus counter
Database Query Time (p99)	<100ms	Prometheus histogram
Uptime	99.9%	Grafana uptime panel
Test Coverage	≥80% backend, ≥80% client	pytest
Security Vulnerabilities	0 critical/high	OWASP ZAP, Trivy
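The uptime target in the table translates directly into a downtime budget, which is often easier to reason about when setting alert thresholds. A short worked calculation, assuming a 30-day month (the window length is an assumption; the plan does not state one):

```python
def downtime_budget_minutes(uptime_target: float, days: int = 30) -> float:
    """Minutes of allowed downtime in a window of `days` for a given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - uptime_target)


print(f"{downtime_budget_minutes(0.999):.1f} min per 30 days")  # -> 43.2 min per 30 days
```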

Business KPIs (Post-Launch):

Metric	Target	Measurement
License Acquisition Success Rate	≥99%	API logs
Heartbeat Reliability	≥99.9%	API logs
Active Sessions by Tenant	100-1000	Database query
API Usage per Tenant	10,000 requests/month	Kong analytics

Acceptance Testing

Pre-Launch Checklist:

All features functional in production environment
Load test passed (100 concurrent users)
Security audit completed (no critical issues)
Disaster recovery tested (backup restore)
Documentation complete (API, runbook, deployment)
On-call rotation established
Monitoring and alerting validated

Appendices

A. Glossary

Floating License: License that limits concurrent users, not installations
Check-on-Start: Validation pattern where license is checked at application startup
Cloud KMS: Google Cloud Key Management Service for cryptographic operations
Identity Platform: Google's OAuth2/OIDC service for authentication
Redis Lua: Server-side scripting in Redis for atomic operations
JWT: JSON Web Token (for authentication)
TTL: Time To Live (automatic expiration in Redis)
Heartbeat: Periodic signal to indicate session is still active

B. References

FastAPI Documentation: https://fastapi.tiangolo.com/
SQLAlchemy Async: https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html
Cloud KMS: https://cloud.google.com/kms/docs
Identity Platform: https://cloud.google.com/identity-platform/docs
Redis Lua Scripting: https://redis.io/docs/manual/programmability/eval-intro/
GKE Documentation: https://cloud.google.com/kubernetes-engine/docs

C. Architecture Decision Records (ADRs)

See docs/adrs/ in the master repository for project-wide architectural decisions:

ADR-001: OpenTofu over Terraform (licensing)
ADR-002: FastAPI over Django (performance)
ADR-003: Floating concurrent licensing pattern
ADR-004: Local-first architecture (check-on-start)
ADR-005: Cloud KMS for license signing

Document Version: 3.0 (License Management Focus)
Last Updated: November 24, 2025
Next Review: November 25, 2025 (Phase 1 kickoff)
Owner: CODITECT Infrastructure Team
Status: PHASE 0 COMPLETE ✅ | PHASE 1-6 PENDING ⏸️

Document Change History

Version	Date	Author	Changes
1.0	Nov 22, 2025	Infrastructure Team	Initial (incorrect Django/Citus focus)
2.0	Nov 23, 2025	Infrastructure Team	Expanded Django plan (still incorrect)
3.0	Nov 24, 2025	Documentation Specialist	Complete rewrite for License Management Platform (FastAPI + PostgreSQL) - Phase 0 completion documented
