
PROJECT PLAN: CODITECT License Management Platform

Project: CODITECT License Management Platform (FastAPI + PostgreSQL)
Date: November 24, 2025
Owner: CODITECT Infrastructure Team
Status: ACTIVE DEVELOPMENT (35% Complete)
Version: 3.0 (License Management Focus)


Table of Contents

  1. Executive Summary
  2. Technology Stack
  3. Architecture Overview
  4. Phase 0: Infrastructure & Documentation
  5. Phase 1: Security Services
  6. Phase 2: Backend Development
  7. Phase 3: Deployment
  8. Phase 4: Client SDK
  9. Phase 5: End-to-End Testing
  10. Phase 6: Production Hardening
  11. Timeline & Resource Requirements
  12. Budget & Cost Analysis
  13. Risk Assessment
  14. Quality Gates & Success Metrics

1. Executive Summary

This plan outlines the complete implementation of the CODITECT License Management Platform, a production-grade floating license system that enables CODITECT's local-first AI development framework to validate licenses, track concurrent sessions, and manage multi-tenant licensing through a secure cloud API.

Key Objectives

  • Floating Concurrent Licensing: Limit simultaneous users, not installations
  • Check-on-Start Pattern: Fast validation at CODITECT startup (local-first architecture)
  • Cloud KMS Signing: Tamper-proof licenses verified locally without network
  • Multi-Tenant Isolation: Complete tenant separation at application and database levels
  • Production-Ready: Comprehensive monitoring, testing, and deployment automation

Current Status

Overall Completion: 35% of MVP (Phase 0 complete, Phase 1-6 pending)

Completed Work:

  • Phase 0: Infrastructure & Documentation (100%) - November 20-24, 2025
    • GKE cluster deployed (3 nodes, auto-scaling 1-10)
    • Cloud SQL PostgreSQL 16 with regional HA
    • Redis Memorystore 6GB with RDB persistence
    • VPC networking with private subnets and Cloud NAT
    • Secret Manager with 9 secrets configured
    • Documentation organized (7 categories, 100/100 CODITECT standards)
    • 17 comprehensive diagrams (C4 architecture, workflows, deployment)
    • Production-ready README.md and CLAUDE.md

Remaining Work:

  • ⏸️ Phase 1: Security Services (2-3 days) - Cloud KMS + Identity Platform
  • ⏸️ Phase 2: Backend Development (5-7 days) - FastAPI license API
  • ⏸️ Phase 3: Deployment (2-3 days) - Kubernetes deployment + SSL/DNS
  • ⏸️ Phase 4: Client SDK (1-2 days) - Python License Client for coditect-core
  • ⏸️ Phase 5: E2E Testing (2 days) - Integration and load testing
  • ⏸️ Phase 6: Production Hardening (2 days) - Monitoring and runbooks

Success Criteria

  • ✅ CODITECT can acquire licenses on startup
  • ✅ License API validates JWT and checks seat availability atomically (Redis Lua)
  • ✅ License API signs tokens with Cloud KMS (RSA-4096)
  • ✅ CODITECT validates signature locally (offline-capable)
  • ✅ Heartbeat keeps session alive (every 5 min)
  • ✅ Graceful license release on CODITECT exit
  • ✅ End-to-end test passing (acquire → heartbeat → release)
  • ✅ Unit test coverage ≥80%
  • ✅ Load test passing (100 concurrent users)

2. Technology Stack

Core Platform

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Backend Framework | FastAPI | 0.104+ | Async REST API framework |
| Database | PostgreSQL | 16+ | License and tenant data storage |
| Session Management | Redis | 7.x | Concurrent seat tracking with TTL |
| Authentication | Identity Platform | Latest | OAuth2/OIDC with Google/GitHub |
| License Signing | Cloud KMS | Latest | RSA-4096 asymmetric key signing |
| ORM | SQLAlchemy async | 2.0+ | Database models and queries |
| Validation | Pydantic | 2.x | Request/response schema validation |

Infrastructure

| Component | Technology | Purpose |
|---|---|---|
| Cloud Platform | Google Cloud Platform (GCP) | Cloud infrastructure |
| Orchestration | Kubernetes (GKE) | Container orchestration |
| Infrastructure as Code | OpenTofu v1.10.7 | GCP resource provisioning (MPL 2.0) |
| Load Balancer | GCP Load Balancer + Ingress | Traffic distribution |
| Container Registry | Google Container Registry | Docker image storage |
| Secrets Management | GCP Secret Manager | Secure credential storage |

Observability & Monitoring (Phase 6)

| Component | Technology | Purpose |
|---|---|---|
| Metrics | Prometheus | Metrics collection |
| Visualization | Grafana | Metrics dashboards |
| Distributed Tracing | Jaeger (optional) | Request tracing |
| Logging | Google Cloud Logging | Centralized logging |
| Error Tracking | Sentry (optional) | Exception tracking |

Development & CI/CD

| Component | Technology | Purpose |
|---|---|---|
| Version Control | Git + GitHub | Source code management |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Testing | pytest + Locust | Unit/integration/load testing |
| Code Quality | Ruff + Black + MyPy | Linting and type checking |
| Security Scanning | Trivy + Safety | Vulnerability scanning |

3. Architecture Overview

License Management Pattern: Check-on-Start

CODITECT uses a local-first architecture - the framework runs entirely on the user's machine. The cloud infrastructure only validates licenses and tracks concurrent usage.

Flow:

User starts CODITECT locally

CODITECT → License API: "Can I run?" (hardware_id, jwt_token)

License API (running on GKE):
1. Validate JWT token (Identity Platform)
2. Check license active in PostgreSQL
3. Atomic seat check in Redis (Lua script)
4. Sign license with Cloud KMS (RSA-4096)

CODITECT ← Signed License Token

Validate signature locally (offline-capable)

Run CODITECT with periodic heartbeats (every 5 min)

On exit: Release seat OR wait for 6-min TTL expiry (automatic cleanup)

Key Architectural Decisions

  1. Floating Concurrent Licensing - Limit simultaneous users, not installations
  2. Session Management - Redis TTL (6 min) prevents zombie sessions automatically
  3. Offline-Capable - Signed licenses verified locally, works without network
  4. Security - Cloud KMS signing (tamper-proof), no client-side secrets
  5. Multi-Tenant - Tenant isolation via application-level filtering in PostgreSQL

See: docs/architecture/c1-system-context.md for detailed architecture

System Context (C4 Level 1)

┌──────────────────────────────────────────────────────────────┐
│                      CODITECT Developer                      │
│         (Uses CODITECT CLI for local AI development)         │
└────────────┬─────────────────────────────────────────────────┘
             │
             │ 1. Request license (hardware_id, JWT)
             │ 2. Send heartbeat (every 5 min)
             │ 3. Release license on exit
             ▼
┌──────────────────────────────────────────────────────────────┐
│             CODITECT License Management Platform             │
│                                                              │
│  ┌─────────────┐   ┌──────────────┐   ┌───────────────┐      │
│  │ FastAPI     │   │ PostgreSQL   │   │ Redis         │      │
│  │ License API │ → │ (licenses,   │   │ (session      │      │
│  │             │   │  tenants,    │   │  tracking,    │      │
│  │             │   │  users)      │   │  TTL 6 min)   │      │
│  └──────┬──────┘   └──────────────┘   └───────────────┘      │
│         │                                                    │
│         └──────────┬─────────────────┬──────────────────┐    │
│                    ▼                 ▼                  ▼    │
│             ┌──────────────┐  ┌──────────────┐   ┌─────────┐ │
│             │ Identity     │  │ Cloud KMS    │   │ Secret  │ │
│             │ Platform     │  │ (RSA-4096    │   │ Manager │ │
│             │ (OAuth2)     │  │  signing)    │   │         │ │
│             └──────────────┘  └──────────────┘   └─────────┘ │
└──────────────────────────────────────────────────────────────┘

4. Phase 0: Infrastructure & Documentation

Duration: 4 days (November 20-24, 2025)
Status: ✅ COMPLETE (100%)
Team: DevOps Engineer, Documentation Specialist

Objectives

  • ✅ Provision production-grade GCP infrastructure
  • ✅ Deploy GKE cluster, Cloud SQL PostgreSQL, Redis Memorystore
  • ✅ Configure VPC networking with private subnets
  • ✅ Organize documentation to 100/100 CODITECT standards
  • ✅ Create comprehensive architecture diagrams
  • ✅ Update README.md and CLAUDE.md to production quality

Work Completed

Infrastructure Deployment (November 20-21)

GKE Cluster:

  • 3-node cluster in us-central1 (auto-scaling 1-10 nodes)
  • Node type: n1-standard-2 (2 vCPU, 7.5GB RAM)
  • Preemptible nodes for cost optimization ($100/month)
  • Workload Identity enabled for GCP service account integration

Cloud SQL PostgreSQL 16:

  • Regional HA with automatic failover
  • Instance type: db-custom-2-7680 (2 vCPU, 7.5GB RAM)
  • 100GB SSD storage with auto-increase enabled
  • Private IP connectivity via VPC peering
  • Automated daily backups with 7-day retention
  • Cost: $150/month

Redis Memorystore:

  • 6GB BASIC tier (RDB persistence enabled)
  • Private IP connectivity to GKE
  • Used for atomic seat counting (Lua scripts)
  • Cost: $30/month

VPC Networking:

  • Custom VPC with RFC 1918 address space
  • Private subnets for GKE and Cloud SQL
  • Cloud NAT for outbound internet access
  • VPC peering between GKE and Cloud SQL
  • Private Google Access enabled

Secret Manager:

  • 9 secrets configured (DB passwords, API key placeholders)
  • Workload Identity for pod access to secrets
  • Secret rotation policies documented

Total Infrastructure Cost: $310/month (development environment)

Documentation Organization (November 24)

Directory Restructuring:

  • Created 7 documentation categories:
    • docs/architecture/ - C4 diagrams (C1, C2, C3)
    • docs/workflows/ - Sequence diagrams with code examples
    • docs/deployment/ - Deployment and infrastructure guides
    • docs/guides/ - Development setup and troubleshooting
    • docs/reference/ - GCP inventory, API reference
    • docs/project-management/ - PROJECT-PLAN, TASKLIST, CRITICAL-PATH
    • docs/research/ - Gap analysis, OpenTofu migration research
  • Created comprehensive README files for all directories
  • Updated master docs/README.md with complete navigation

Diagram Library:

  • Created 17 comprehensive diagrams:
    • C4 Architecture (5 diagrams):
      • C1: System Context
      • C2: Container Architecture
      • C3: GKE Components
      • C3: Networking Components
      • C3: Security Components
    • Workflow Sequences (5 diagrams):
      • License acquisition flow
      • Heartbeat mechanism
      • License release flow
      • User registration flow
      • Multi-tenant isolation verification
    • Deployment Diagrams (2 diagrams):
      • Blue/Green deployment strategy
      • Infrastructure topology
    • Supporting Diagrams (5 diagrams):
      • Redis atomic seat counting
      • Cloud KMS signing flow
      • Session TTL management
      • Error handling patterns
      • Scaling strategy

Documentation Quality:

  • Updated README.md to 558 lines (production quality)
  • Updated CLAUDE.md to 672 lines (AI-optimized context)
  • Created 6 documentation planning documents (101KB total)
  • Achieved 100% README coverage across all directories
  • All documentation follows CODITECT standards (100/100 score)

Repository Organization:

  • Root directory cleaned to 14 essential files
  • All files properly categorized
  • Git history preserved (all moves with git mv)
  • Production-ready structure

Deliverables

  • ✅ Complete GCP infrastructure deployed and operational
  • ✅ 7 documentation categories with comprehensive content
  • ✅ 17 architecture and workflow diagrams
  • ✅ Production-quality README.md and CLAUDE.md
  • ✅ 100/100 CODITECT standards compliance
  • ✅ Complete navigation and cross-referencing

Success Criteria

  • GKE cluster running in us-central1
  • Cloud SQL PostgreSQL accessible from GKE
  • Redis instance operational and accessible
  • All infrastructure costs within budget ($310/month dev)
  • Documentation organized to CODITECT standards
  • All directories have comprehensive README files
  • Diagrams cover all major system components and flows

Cost Summary (Phase 0)

| Component | Monthly Cost |
|---|---|
| GKE Cluster (3 preemptible nodes) | $100 |
| Cloud SQL PostgreSQL (HA) | $150 |
| Redis Memorystore (6GB) | $30 |
| VPC Networking + Cloud NAT | $20 |
| Secret Manager | $10 |
| Total | $310 |

5. Phase 1: Security Services

Duration: 2-3 days
Status: ⏸️ NEXT (0% Complete)
Team: DevOps Engineer, Security Specialist
Goal: Deploy Cloud KMS and Identity Platform for OAuth2 authentication

Objectives

  • Deploy Cloud KMS for license signing (RSA-4096)
  • Configure Identity Platform for OAuth2 (Google, GitHub)
  • Test end-to-end authentication flow
  • Document security architecture

Tasks

Day 1: Cloud KMS Setup

P1-T01: Create Cloud KMS OpenTofu Module

  • Create opentofu/modules/kms/ directory
  • Define RSA-4096 asymmetric key for license signing
  • Document a manual key rotation procedure (every 90 days); Cloud KMS does not rotate asymmetric keys automatically
  • Setup IAM permissions for GKE service account
  • Time Estimate: 2 hours

P1-T02: Deploy Cloud KMS

  • Deploy KMS key ring to us-central1
  • Create signing key: coditect-license-signing-key
  • Grant roles/cloudkms.signerVerifier to the GKE service account (cryptoKeyEncrypterDecrypter does not permit asymmetric signing)
  • Test signing with gcloud command
  • Time Estimate: 1 hour

Day 2: Identity Platform Setup

P1-T03: Create Identity Platform Module

  • Create opentofu/modules/identity-platform/ directory
  • Configure OAuth2 providers (Google, GitHub)
  • Setup OAuth consent screen
  • Configure redirect URIs (localhost, staging, production)
  • Time Estimate: 4 hours

P1-T04: Deploy Identity Platform

  • Enable Identity Platform API
  • Create OAuth2 clients (web app, mobile app)
  • Configure authorized domains
  • Test OAuth flow with Google account
  • Time Estimate: 2 hours

Day 3: Testing & Documentation

P1-T05: End-to-End Auth Testing

  • Test OAuth2 authorization code flow
  • Verify JWT token generation
  • Validate token claims (user_id, email, tenant_id)
  • Test token refresh flow
  • Time Estimate: 4 hours

P1-T06: Security Documentation

  • Document OAuth2 flow with sequence diagrams
  • Create runbook for token validation
  • Document KMS signing process
  • Update C3-Security diagram
  • Time Estimate: 3 hours

Deliverables

  • ⏸️ Cloud KMS operational with RSA-4096 signing key
  • ⏸️ Identity Platform configured with Google/GitHub OAuth
  • ⏸️ End-to-end authentication flow tested
  • ⏸️ Security architecture documented

Success Criteria

  • Cloud KMS can sign arbitrary data
  • Identity Platform OAuth flow works for Google and GitHub
  • JWT tokens contain correct claims (user_id, tenant_id)
  • Token validation succeeds with public key
  • Security documentation complete

Blocking Dependencies

Phase 2 Backend Development REQUIRES Phase 1 completion:

  • JWT validation middleware needs Identity Platform public keys
  • License signing needs Cloud KMS integration

6. Phase 2: Backend Development

Duration: 5-7 days
Status: ⏸️ PENDING (0% Complete)
Team: Backend Engineers (2x), Database Architect
Goal: Build complete FastAPI license API with multi-tenant support

Objectives

  • Setup FastAPI project structure
  • Create database models (SQLAlchemy async)
  • Build REST APIs (acquire, heartbeat, release)
  • Implement Redis Lua scripts for atomic seat counting
  • Integrate Cloud KMS for license signing
  • Integrate Identity Platform for JWT validation
  • Write comprehensive tests (80%+ coverage)

Tasks

Day 1-2: FastAPI Project & Database Models

P2-T01: FastAPI Project Setup

  • Create backend/ directory structure
  • Initialize FastAPI project with Poetry
  • Configure settings.py (environment-based config)
  • Setup async database connection (asyncpg)
  • Configure CORS middleware
  • Time Estimate: 2 hours

P2-T02: Database Models (SQLAlchemy async)

  • Create models/ directory
  • Tenant model (id, name, subdomain, plan, max_seats)
  • User model (id, email, tenant_id, auth_provider_id)
  • License model (id, tenant_id, plan, seats_total, active)
  • Session model (id, license_id, user_id, hardware_id, started_at, last_heartbeat)
  • AuditLog model (id, tenant_id, user_id, action, timestamp)
  • Create Alembic migrations
  • Time Estimate: 3 hours
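
As a rough illustration of the field lists above, three of the models can be mirrored as plain dataclasses. This is only a shape sketch: the real models are planned as SQLAlchemy async classes, and the types and defaults below are assumptions, since the plan names the fields but not their types.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


def _now() -> datetime:
    """Timezone-aware current time (assumed convention: store UTC)."""
    return datetime.now(timezone.utc)


@dataclass
class Tenant:
    name: str
    subdomain: str
    plan: str
    max_seats: int
    id: UUID = field(default_factory=uuid4)


@dataclass
class License:
    tenant_id: UUID          # every license row carries its tenant for filtering
    plan: str
    seats_total: int
    active: bool = True
    id: UUID = field(default_factory=uuid4)


@dataclass
class Session:
    license_id: UUID
    user_id: UUID
    hardware_id: str
    started_at: datetime = field(default_factory=_now)
    last_heartbeat: datetime = field(default_factory=_now)
    id: UUID = field(default_factory=uuid4)
```

Keeping tenant_id on License (and on User and AuditLog in the full model set) is what makes application-level tenant filtering possible.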

Day 3-4: License API Endpoints

P2-T03: License Acquire Endpoint

  • POST /api/v1/licenses/acquire
  • Request: {user_id, hardware_id, jwt_token}
  • Validate JWT with Identity Platform
  • Check license active in PostgreSQL
  • Atomic seat check in Redis (Lua script)
  • Sign license with Cloud KMS
  • Response: {license_token, expires_at, public_key}
  • Time Estimate: 4 hours
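
The order of checks in the acquire endpoint can be sketched as a plain function with its integrations injected as callables. Everything here is hypothetical scaffolding: the AcquireDeps names, the HTTP status codes (401, 403, 409), and the session key format are assumptions, not part of the plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AcquireError(Exception):
    """Raised when a step of the acquire flow rejects the request."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


@dataclass
class AcquireDeps:
    # Each callable is a hypothetical stand-in for a real integration.
    validate_jwt: Callable[[str], Optional[dict]]   # claims, or None if invalid
    license_is_active: Callable[[str], bool]        # PostgreSQL lookup by tenant_id
    try_acquire_seat: Callable[[str, str], bool]    # atomic Redis seat check
    sign_license: Callable[[dict], str]             # Cloud KMS signing


def acquire_license(deps: AcquireDeps, user_id: str,
                    hardware_id: str, jwt_token: str) -> dict:
    # 1. Validate the JWT and make sure it belongs to the requesting user.
    claims = deps.validate_jwt(jwt_token)
    if not claims or claims.get("user_id") != user_id or "tenant_id" not in claims:
        raise AcquireError(401, "invalid token")
    tenant_id = claims["tenant_id"]
    # 2. Check that the tenant's license is active.
    if not deps.license_is_active(tenant_id):
        raise AcquireError(403, "license inactive")
    # 3. Atomically claim a seat; fail if the pool is full.
    if not deps.try_acquire_seat(tenant_id, f"{user_id}:{hardware_id}"):
        raise AcquireError(409, "no seats available")
    # 4. Sign the license payload only after a seat is held.
    payload = {"tenant_id": tenant_id, "user_id": user_id, "hardware_id": hardware_id}
    return {"license_token": deps.sign_license(payload)}
```

Signing last means a rejected request never costs a KMS call.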

P2-T04: Heartbeat Endpoint

  • POST /api/v1/licenses/heartbeat
  • Request: {session_id, jwt_token}
  • Validate JWT
  • Update last_heartbeat timestamp in Redis (extend TTL to 6 min)
  • Response: {status: "ok", next_heartbeat_at}
  • Time Estimate: 2 hours

P2-T05: License Release Endpoint

  • POST /api/v1/licenses/release
  • Request: {session_id, jwt_token}
  • Validate JWT
  • Delete session from Redis (atomic decrement)
  • Log release in audit log
  • Response: {status: "released"}
  • Time Estimate: 2 hours

Day 5: Redis & KMS Integration

P2-T06: Redis Lua Scripts (Atomic Seat Counting)

  • Create redis_scripts/ directory
  • acquire_seat.lua - Atomic check and increment
  • release_seat.lua - Atomic decrement
  • extend_ttl.lua - Extend session TTL on heartbeat
  • Load scripts on Redis connection
  • Time Estimate: 3 hours
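
The semantics the three Lua scripts need to provide can be modeled in memory before any Redis code is written. The sketch below is not the Redis implementation: a threading lock stands in for the atomicity Redis gives a script, and the SeatPool class and its method names are hypothetical. The 360-second default mirrors the 6-minute TTL.

```python
import threading
import time
from typing import Callable, Dict


class SeatPool:
    """In-memory model of check-and-increment with a per-session TTL."""

    def __init__(self, seats_total: int, ttl_seconds: float = 360.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._seats_total = seats_total
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()          # stands in for script atomicity
        self._expires_at: Dict[str, float] = {}

    def _purge_expired(self, now: float) -> None:
        for sid in [s for s, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[sid]

    def acquire(self, session_id: str) -> bool:
        """Model of acquire_seat.lua: check capacity and claim in one step."""
        with self._lock:
            now = self._clock()
            self._purge_expired(now)
            is_new = session_id not in self._expires_at
            if is_new and len(self._expires_at) >= self._seats_total:
                return False
            self._expires_at[session_id] = now + self._ttl
            return True

    def heartbeat(self, session_id: str) -> bool:
        """Model of extend_ttl.lua: refresh the TTL of a live session only."""
        with self._lock:
            now = self._clock()
            self._purge_expired(now)
            if session_id not in self._expires_at:
                return False
            self._expires_at[session_id] = now + self._ttl
            return True

    def release(self, session_id: str) -> bool:
        """Model of release_seat.lua: free the seat if it is still held."""
        with self._lock:
            return self._expires_at.pop(session_id, None) is not None

    def active(self) -> int:
        with self._lock:
            self._purge_expired(self._clock())
            return len(self._expires_at)
```

Injecting the clock lets tests simulate TTL expiry without waiting six minutes.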

P2-T07: Cloud KMS Integration

  • Create kms/ service module
  • Implement async KMS signing with aiogoogle
  • Sign license payload (tenant_id, user_id, hardware_id, expires_at)
  • Cache public key in Redis (1 hour TTL)
  • Time Estimate: 3 hours
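
Signature verification on the client only works if both sides serialize the payload identically, so a deterministic encoding is worth fixing early. The sketch below assumes a JSON encoding with sorted keys and a SHA-256 digest; the function names and the encoding choice are assumptions, and the actual Cloud KMS call is deliberately left out.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_payload(tenant_id: str, user_id: str, hardware_id: str,
                      expires_at: datetime) -> bytes:
    """Serialize the license claims deterministically (sorted keys, no spaces)."""
    claims = {
        "tenant_id": tenant_id,
        "user_id": user_id,
        "hardware_id": hardware_id,
        # expires_at is expected to be timezone-aware; it is normalized to UTC.
        "expires_at": expires_at.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")


def payload_digest(payload: bytes) -> bytes:
    """SHA-256 digest of the payload, the value handed to the signer."""
    return hashlib.sha256(payload).digest()
```

Usage: compute `payload_digest(canonical_payload(...))` on the server before signing, and recompute it on the client from the same four fields before verifying.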

Day 6: Auth & Admin Endpoints

P2-T08: JWT Auth Middleware

  • Create middleware/auth.py
  • Validate JWT signature with Identity Platform public keys
  • Extract user_id and tenant_id from claims
  • Add tenant context to request state
  • Handle token expiration and refresh
  • Time Estimate: 4 hours
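
The claim-extraction half of the middleware can be sketched with the standard library alone. Note the loud caveat: this sketch decodes the token WITHOUT verifying its signature, so it covers only the "extract user_id and tenant_id" and "handle expiration" steps. Signature verification against the Identity Platform public keys must happen first in real middleware. Function names and the claim names used for lookup are assumptions.

```python
import base64
import json
import time
from typing import Optional

REQUIRED_CLAIMS = ("user_id", "tenant_id", "exp")


def decode_claims_unverified(token: str) -> dict:
    """Decode the JWT payload segment. Does NOT verify the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    # base64url segments drop their padding; restore it before decoding.
    padded = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(claims: dict, now: Optional[float] = None) -> dict:
    """Return the tenant context, or raise ValueError on missing/expired claims."""
    now = time.time() if now is None else now
    missing = [c for c in REQUIRED_CLAIMS if c not in claims]
    if missing:
        raise ValueError(f"missing claims: {missing}")
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return {"user_id": claims["user_id"], "tenant_id": claims["tenant_id"]}
```

The returned dictionary is what the middleware would attach to the request state as tenant context.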

P2-T09: Admin Endpoints (Tenant CRUD)

  • POST /api/v1/tenants/ (create tenant)
  • GET /api/v1/tenants/{id}/ (tenant details)
  • PUT /api/v1/tenants/{id}/ (update tenant)
  • GET /api/v1/tenants/{id}/users/ (list users)
  • POST /api/v1/tenants/{id}/users/ (create user)
  • Time Estimate: 4 hours

Day 7: Testing

P2-T10: Unit Tests (pytest)

  • Test database models (validation, constraints)
  • Test API endpoints (CRUD operations)
  • Test Redis Lua scripts (atomic operations)
  • Test KMS signing and verification
  • Test JWT middleware (valid/invalid tokens)
  • Target: 80%+ code coverage
  • Time Estimate: 6 hours

P2-T11: Integration Tests

  • Test end-to-end license acquisition flow
  • Test multi-tenant isolation (tenant A can't access tenant B data)
  • Test concurrent seat acquisition (10 users)
  • Test session TTL expiry (Redis)
  • Time Estimate: 4 hours

Deliverables

  • ⏸️ Complete FastAPI application
  • ⏸️ Database models with migrations
  • ⏸️ License API endpoints (acquire, heartbeat, release)
  • ⏸️ Redis Lua scripts for atomic seat counting
  • ⏸️ Cloud KMS signing integration
  • ⏸️ JWT authentication middleware
  • ⏸️ Admin endpoints for tenant management
  • ⏸️ Comprehensive test suite (80%+ coverage)

Success Criteria

  • All API endpoints return correct responses
  • Multi-tenant isolation verified (integration tests)
  • Unit test coverage ≥80%
  • Redis atomic operations work correctly
  • KMS signing and verification functional
  • JWT middleware validates tokens correctly

Blocking Dependencies

Phase 3 Deployment REQUIRES Phase 2 completion:

  • Cannot deploy without backend code
  • Dockerfile needs working FastAPI application

7. Phase 3: Deployment

Duration: 2-3 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Deploy FastAPI backend to GKE with SSL/DNS configuration

Objectives

  • Create Dockerfile for FastAPI application
  • Build and push Docker image to GCR
  • Create Kubernetes manifests (Deployment, Service, Ingress)
  • Deploy to GKE
  • Configure SSL certificate + DNS
  • Verify end-to-end deployment

Tasks

Day 1: Containerization

P3-T01: Create Dockerfile (Multi-Stage Build)

  • Stage 1: Build dependencies (Poetry install)
  • Stage 2: Runtime image (Python 3.11-slim)
  • Copy application code
  • Set entrypoint: uvicorn main:app --host 0.0.0.0 --port 8000
  • Time Estimate: 2 hours

P3-T02: Build and Push Docker Image

  • Build image: docker build -t gcr.io/coditect-cloud-infra/license-api:v1.0.0 .
  • Test locally: docker run -p 8000:8000 gcr.io/coditect-cloud-infra/license-api:v1.0.0
  • Push to GCR: docker push gcr.io/coditect-cloud-infra/license-api:v1.0.0
  • Time Estimate: 1 hour

Day 2: Kubernetes Manifests

P3-T03: Kubernetes Deployment Manifest

  • Create kubernetes/base/deployment.yaml
  • 3 replicas for high availability
  • Resource requests: 500m CPU, 512Mi memory
  • Resource limits: 1000m CPU, 1Gi memory
  • Liveness probe: /health
  • Readiness probe: /ready
  • Environment variables from Secret Manager
  • Time Estimate: 2 hours

P3-T04: Kubernetes Service Manifest

  • Create kubernetes/base/service.yaml
  • Type: ClusterIP (internal load balancing)
  • Port: 80 → 8000 (container port)
  • Selector: app: license-api
  • Time Estimate: 1 hour

P3-T05: Ingress + cert-manager Config

  • Create kubernetes/base/ingress.yaml
  • Host: auth.coditect.ai
  • TLS: Use cert-manager for Let's Encrypt
  • Annotations: cert-manager.io/cluster-issuer: letsencrypt-prod
  • Time Estimate: 2 hours

Day 3: Deployment & DNS

P3-T06: Deploy to GKE

  • Apply Kubernetes manifests: kubectl apply -k kubernetes/base/
  • Verify pods running: kubectl get pods
  • Check logs: kubectl logs <pod-name>
  • Time Estimate: 2 hours

P3-T07: Configure Cloud DNS

  • Create A record: auth.coditect.ai → GCP Load Balancer IP
  • Verify DNS propagation: dig auth.coditect.ai
  • Time Estimate: 1 hour

P3-T08: SSL Certificate Verification

  • Wait for cert-manager to provision certificate (5-10 min)
  • Verify certificate: curl -I https://auth.coditect.ai/health
  • Check expiry: openssl s_client -connect auth.coditect.ai:443 -servername auth.coditect.ai </dev/null | openssl x509 -noout -enddate
  • Time Estimate: 1 hour

Deliverables

  • ⏸️ Dockerfile with multi-stage build
  • ⏸️ Docker image pushed to GCR
  • ⏸️ Kubernetes manifests (Deployment, Service, Ingress)
  • ⏸️ FastAPI deployed to GKE
  • ⏸️ SSL certificate on auth.coditect.ai
  • ⏸️ DNS configured and propagated

Success Criteria

  • Docker image builds successfully
  • Kubernetes deployment healthy (3/3 pods running)
  • Health check endpoint responds: https://auth.coditect.ai/health
  • SSL certificate valid (Let's Encrypt)
  • DNS resolves correctly

Blocking Dependencies

Phase 5 E2E Testing REQUIRES Phase 3 completion:

  • Cannot test end-to-end without deployed API
  • Client SDK needs production URL

8. Phase 4: Client SDK

Duration: 1-2 days
Status: ⏸️ PENDING (0% Complete)
Team: Python Developer
Goal: Create Python License Client SDK for coditect-core integration

Objectives

  • Create license-client Python package
  • Implement hardware fingerprinting
  • Implement signature verification (public key)
  • Implement heartbeat background thread
  • Implement offline mode with grace period
  • Write comprehensive tests

Tasks

Day 1: Client SDK Foundation

P4-T01: Create License Client Package

  • Create license-client/ directory in coditect-core
  • Create LicenseClient class
  • Methods: acquire(), heartbeat(), release()
  • Configuration: API URL, timeout, retry settings
  • Time Estimate: 4 hours

P4-T02: Hardware Fingerprinting

  • Generate unique hardware ID (MAC address + CPU ID + disk serial)
  • Hash hardware ID (SHA256)
  • Store hardware ID in local cache (~/.coditect/hardware_id)
  • Time Estimate: 2 hours
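
A minimal fingerprinting sketch follows. It deliberately swaps in different inputs from the plan: CPU ID and disk serial are not available portably from the Python standard library, so this version hashes the MAC address together with platform.machine() and platform.system() instead. The function names are hypothetical, and the caller supplies the cache path (for example ~/.coditect/hardware_id).

```python
import hashlib
import platform
import uuid
from pathlib import Path


def compute_hardware_id() -> str:
    """SHA-256 over a few machine attributes available from the stdlib."""
    parts = [
        # uuid.getnode() returns the MAC as a 48-bit int; it may fall back
        # to a random value when no hardware address can be read.
        f"{uuid.getnode():012x}",
        platform.machine(),
        platform.system(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def load_or_create_hardware_id(cache_path: Path) -> str:
    """Reuse the cached ID if present so the value stays stable across runs."""
    if cache_path.exists():
        cached = cache_path.read_text(encoding="utf-8").strip()
        if len(cached) == 64:
            return cached
    hardware_id = compute_hardware_id()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(hardware_id, encoding="utf-8")
    return hardware_id
```

Caching matters here because of the random fallback in uuid.getnode(): without the cache, a machine with no readable MAC address could present a new ID on every start.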

P4-T03: Signature Verification (Public Key)

  • Fetch public key from /api/v1/public-key endpoint
  • Cache public key locally (1 hour TTL)
  • Verify license token signature with RSA-4096
  • Validate token claims (tenant_id, user_id, expires_at)
  • Time Estimate: 2 hours

Day 2: Heartbeat & Offline Mode

P4-T04: Heartbeat Background Thread

  • Create HeartbeatThread class
  • Send heartbeat every 5 minutes
  • Handle network failures (retry 3 times)
  • Graceful shutdown on CODITECT exit
  • Time Estimate: 2 hours
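
One way to structure the heartbeat thread is around threading.Event, whose wait() doubles as an interruptible sleep. This is a sketch under assumptions: send_heartbeat is a hypothetical callable that returns True on success, network errors are assumed to surface as OSError, and "retry 3 times" is read as up to three attempts per heartbeat.

```python
import threading
from typing import Callable


class HeartbeatThread(threading.Thread):
    """Calls send_heartbeat() every interval until stop() is called."""

    def __init__(self, send_heartbeat: Callable[[], bool],
                 interval_seconds: float = 300.0, max_attempts: int = 3) -> None:
        super().__init__(daemon=True)
        self._send = send_heartbeat
        self._interval = interval_seconds
        self._max_attempts = max_attempts
        self._stop_event = threading.Event()
        self.consecutive_failures = 0

    def _send_with_retries(self) -> bool:
        for _ in range(self._max_attempts):
            try:
                if self._send():
                    return True
            except OSError:
                pass  # treat a network error as one failed attempt
        return False

    def run(self) -> None:
        # Event.wait() returns True as soon as stop() is called, so the
        # loop exits promptly instead of sleeping out the full interval.
        while not self._stop_event.wait(self._interval):
            if self._send_with_retries():
                self.consecutive_failures = 0
            else:
                self.consecutive_failures += 1

    def stop(self) -> None:
        self._stop_event.set()
```

On CODITECT exit, calling `stop()` followed by `join()` gives the graceful shutdown; the daemon flag is a safety net in case the join is skipped.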

P4-T05: Offline Mode with Grace Period

  • If heartbeat fails, continue running (grace period: 24 hours)
  • Check last successful heartbeat timestamp
  • Display warning: "License server unreachable, offline mode"
  • Force exit after 24 hours offline
  • Time Estimate: 2 hours
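
The grace-period rule reduces to a comparison against the last successful heartbeat. A minimal sketch, assuming three states and hypothetical names (LicenseState, license_state):

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

GRACE_PERIOD = timedelta(hours=24)


class LicenseState(Enum):
    ONLINE = "online"
    OFFLINE_GRACE = "offline_grace"   # warn the user, keep running
    EXPIRED = "expired"               # force exit


def license_state(last_success: datetime, last_attempt_ok: bool, now: datetime,
                  grace: timedelta = GRACE_PERIOD) -> LicenseState:
    """Classify the client state from the last successful heartbeat time."""
    if last_attempt_ok:
        return LicenseState.ONLINE
    if now - last_success < grace:
        return LicenseState.OFFLINE_GRACE
    return LicenseState.EXPIRED
```

Passing `now` in explicitly keeps the function pure, which makes the 24-hour boundary easy to test.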

P4-T06: Error Handling and Retries

  • Retry logic for network failures (exponential backoff)
  • Handle 429 Too Many Requests (back off)
  • Handle 401 Unauthorized (re-authenticate)
  • Handle 503 Service Unavailable (wait and retry)
  • Time Estimate: 2 hours
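
The retry policy above can be split into two small pure functions: one producing the backoff schedule and one mapping a status code to an action. The base delay, cap, and action labels are assumptions, and a production client would usually add random jitter to the delays.

```python
from typing import List


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> List[float]:
    """Exponential backoff: base, 2x base, 4x base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** attempt)
            for attempt in range(max_retries)]


def action_for_status(status_code: int) -> str:
    """Map an HTTP status to the client action named in the task list."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 401:
        return "reauthenticate"
    if status_code in (429, 503):
        return "retry_with_backoff"
    return "fail"


print(backoff_delays(5, base_seconds=1.0, cap_seconds=10.0))
# -> [1.0, 2.0, 4.0, 8.0, 10.0]
```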

P4-T07: Client Unit Tests

  • Test hardware fingerprinting
  • Test signature verification
  • Test heartbeat thread
  • Test offline mode grace period
  • Test error handling
  • Target: 80%+ coverage
  • Time Estimate: 3 hours

Deliverables

  • ⏸️ license-client Python package
  • ⏸️ Hardware fingerprinting implemented
  • ⏸️ Signature verification with public key
  • ⏸️ Heartbeat background thread
  • ⏸️ Offline mode with 24-hour grace period
  • ⏸️ Comprehensive error handling
  • ⏸️ Client unit tests (80%+ coverage)

Success Criteria

  • Client can acquire license from API
  • Signature verification succeeds
  • Heartbeat thread runs in background
  • Offline mode works (grace period)
  • Error handling tested (network failures)
  • Client tests pass (80%+ coverage)

9. Phase 5: End-to-End Testing

Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: QA Engineer, Backend Engineer
Goal: Verify complete system integration with load testing

Objectives

  • Test end-to-end license flow (acquire → heartbeat → release)
  • Test multi-user concurrent access (10 users)
  • Perform load testing (100 concurrent users)
  • Run security scan (OWASP ZAP)
  • Fix critical bugs

Tasks

Day 1: Integration Testing

P5-T01: End-to-End Integration Test

  • Test: User signs up → acquires license → heartbeat → release
  • Verify JWT token validation
  • Verify Redis seat counting (atomic)
  • Verify KMS signature verification
  • Verify session TTL expiry (wait 6 min, session auto-released)
  • Time Estimate: 4 hours

P5-T02: Multi-User Concurrent Test (10 users)

  • Simulate 10 users acquiring licenses simultaneously
  • Verify seat limit enforced (e.g., 5 seat plan)
  • Verify the 6th through 10th users get a "No seats available" error
  • Verify seat release frees up capacity
  • Time Estimate: 2 hours

Day 2: Load & Security Testing

P5-T03: Load Test with Locust (100 users, 1000 req/min)

  • Create Locust test script:
    • 100 concurrent users
    • Each user: acquire → 3 heartbeats → release
    • Total: 1000 requests/min
  • Run for 10 minutes
  • Measure:
    • p50, p95, p99 latency
    • Error rate
    • Throughput (req/s)
  • Target: p99 < 500ms, error rate < 1%
  • Time Estimate: 4 hours
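
Two pieces of arithmetic sit behind this task. Each cycle (acquire, 3 heartbeats, release) is 5 requests, so 100 users generate 500 requests per cycle and must complete 2 cycles per minute to reach 1000 requests/min. The pass/fail targets can then be evaluated from raw results. The sketch below assumes nearest-rank percentiles and that latencies are recorded only for successful requests; the function names are hypothetical and this is not the Locust script itself.

```python
import math
from typing import Sequence


def cycles_per_user_per_minute(users: int, target_req_per_min: int,
                               requests_per_cycle: int = 5) -> float:
    """How often each user must repeat acquire -> 3 heartbeats -> release."""
    return target_req_per_min / (users * requests_per_cycle)


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples (pct in 1..100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_test_passes(latencies_ms: Sequence[float], errors: int,
                     p99_target_ms: float = 500.0,
                     max_error_rate: float = 0.01) -> bool:
    """Apply the plan's targets: p99 < 500 ms and error rate < 1%."""
    total = len(latencies_ms) + errors
    if not latencies_ms:
        return False
    error_rate = errors / total
    return percentile(latencies_ms, 99) < p99_target_ms and error_rate < max_error_rate


print(cycles_per_user_per_minute(100, 1000))
# -> 2.0
```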

P5-T04: Security Scan (OWASP ZAP)

  • Run OWASP ZAP active scan against staging API
  • Check for:
    • SQL injection vulnerabilities
    • XSS vulnerabilities
    • Authentication bypass
    • Sensitive data exposure
  • Fix critical/high vulnerabilities
  • Time Estimate: 4 hours

P5-T05: Fix Critical Bugs

  • Review test results
  • Fix bugs discovered in E2E testing
  • Fix performance bottlenecks (if p99 > 500ms)
  • Re-run tests to verify fixes
  • Time Estimate: 8 hours

Deliverables

  • ⏸️ End-to-end test passing
  • ⏸️ Multi-user concurrent test passing
  • ⏸️ Load test passing (100 users, p99 < 500ms)
  • ⏸️ Security scan completed (no critical issues)
  • ⏸️ All critical bugs fixed

Success Criteria

  • E2E test passes (acquire → heartbeat → release)
  • Multi-user test passes (seat limit enforced)
  • Load test passes (100 users, p99 < 500ms, error rate < 1%)
  • Security scan shows no critical/high vulnerabilities
  • All critical bugs fixed and verified

10. Phase 6: Production Hardening

Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Production readiness with monitoring and runbooks

Objectives

  • Create GKE monitoring dashboards (Prometheus + Grafana)
  • Setup alerting (PagerDuty integration)
  • Write runbook for incident response
  • Create deployment checklist
  • Document production procedures

Tasks

Day 1: Monitoring & Alerting

P6-T01: Create GKE Monitoring Dashboards

  • Setup Prometheus in GKE (or use GCP Managed Prometheus)
  • Create Grafana dashboards:
    • License API performance (latency, error rate, throughput)
    • Active sessions by tenant
    • Redis connection pool usage
    • PostgreSQL query performance
    • GKE pod health (CPU, memory, restarts)
  • Time Estimate: 4 hours

P6-T02: Setup Alerting (PagerDuty)

  • Configure PagerDuty integration
  • Create alert rules:
    • High error rate (>5% 5xx errors)
    • High latency (p99 > 1s)
    • Redis connection pool exhaustion
    • Database connection failures
    • Pod crash loops
  • Setup on-call rotation
  • Time Estimate: 2 hours

Day 2: Runbooks & Checklists

P6-T03: Write Runbook (Incident Response)

  • Create docs/RUNBOOK.md
  • Sections:
    • Common Issues (API 5xx errors, database connection failures)
    • Troubleshooting Steps (check logs, restart pods, rollback)
    • Escalation Procedures (when to page CTO)
    • Rollback Procedures (kubectl rollout undo)
    • Database Backup and Restore
  • Time Estimate: 3 hours

P6-T04: Create Deployment Checklist

  • Create docs/DEPLOYMENT-CHECKLIST.md
  • Pre-deployment:
    • All tests passing (unit, integration, E2E)
    • Code review approved
    • Database migrations tested
    • Staging deployment successful
  • Deployment:
    • Blue-green deployment to production
    • Health checks passing
    • Monitor for 1 hour (check logs, metrics)
  • Post-deployment:
    • Verify end-to-end flow
    • Check error rate (should be < 1%)
    • Update CHANGELOG.md
  • Time Estimate: 1 hour

Deliverables

  • ⏸️ Prometheus + Grafana monitoring dashboards
  • ⏸️ PagerDuty alerting configured
  • ⏸️ Runbook for common issues
  • ⏸️ Deployment checklist

Success Criteria

  • Monitoring dashboards display real-time data
  • Alerts fire correctly for test scenarios
  • Runbook reviewed and approved by team
  • Deployment checklist complete

11. Timeline & Resource Requirements

Overall Timeline

Total Duration: 14-18 business days (3-4 weeks)
Start Date: November 20, 2025 (Phase 0 started)
Phase 0 Complete: November 24, 2025 ✅
Target MVP Completion: December 10, 2025 (Phases 1-6, matching the Phase 6 end date in the phase breakdown)

Phase Breakdown

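| Phase | Duration | Start Date | End Date | Status | Completion |
|---|---|---|---|---|---|
| Phase 0 | 4 days | Nov 20 | Nov 24 | ✅ COMPLETE | 100% |
| Phase 1 | 2-3 days | Nov 25 | Nov 27 | ⏸️ NEXT | 0% |
| Phase 2 | 5-7 days | Nov 28 | Dec 4 | ⏸️ PENDING | 0% |
| Phase 3 | 2-3 days | Dec 5 | Dec 6 | ⏸️ PENDING | 0% |
| Phase 4 | 1-2 days | Dec 5* | Dec 6* | ⏸️ PENDING | 0% |
| Phase 5 | 2 days | Dec 6 | Dec 9 | ⏸️ PENDING | 0% |
| Phase 6 | 2 days | Dec 9 | Dec 10 | ⏸️ PENDING | 0% |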

*Phase 4 (Client SDK) can run in parallel with Phase 2-3

Team Composition

Required Team:

  • 1x DevOps Engineer (40 hrs/week) - Infrastructure, deployment, monitoring
  • 1x Backend Engineer (40 hrs/week) - FastAPI development, testing
  • 1x Python Developer (20 hrs/week) - Client SDK development
  • 1x QA Engineer (20 hrs/week) - Load testing, security scanning

Total: 3 FTEs (Full-Time Equivalents)

Resource Allocation by Phase

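| Phase | DevOps | Backend | Python Dev | QA | Total Hours |
|---|---|---|---|---|---|
| Phase 0 ✅ | 32 | 0 | 0 | 0 | 32 |
| Phase 1 | 16 | 8 | 0 | 0 | 24 |
| Phase 2 | 8 | 40 | 0 | 8 | 56 |
| Phase 3 | 16 | 8 | 0 | 0 | 24 |
| Phase 4 | 0 | 0 | 16 | 4 | 20 |
| Phase 5 | 8 | 8 | 0 | 16 | 32 |
| Phase 6 | 16 | 0 | 0 | 8 | 24 |
| Total | 96 | 64 | 16 | 36 | 212 |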

12. Budget & Cost Analysis

Development Costs

Labor Costs (4 weeks):

| Role | Rate | Hours | Total |
|---|---|---|---|
| DevOps Engineer | $130/hr | 96 | $12,480 |
| Backend Engineer | $120/hr | 64 | $7,680 |
| Python Developer | $120/hr | 16 | $1,920 |
| QA Engineer | $100/hr | 36 | $3,600 |

Total Labor: $25,680

Infrastructure Costs

Development Environment (4 weeks):

  • GKE cluster: $100/month × 1 month = $100
  • Cloud SQL: $150/month × 1 month = $150
  • Redis: $30/month × 1 month = $30
  • Cloud KMS: $10/month × 1 month = $10
  • Identity Platform: Free (up to 50K MAU)
  • Networking: $20/month × 1 month = $20

Development Total: $310

Production Environment (initial month):

  • GKE cluster: $500/month (production-grade, 5 nodes)
  • Cloud SQL: $400/month (higher tier, HA)
  • Redis: $150/month (16GB STANDARD tier)
  • Cloud KMS: $10/month
  • Identity Platform: $50/month (up to 50K MAU)
  • Load Balancer + SSL: $50/month
  • Monitoring: $40/month

Production Total: $1,200/month

Total Project Budget

| Category | Cost |
|---|---|
| Labor (4 weeks) | $25,680 |
| Infrastructure (Dev, 1 month) | $310 |
| Infrastructure (Prod, 1 month) | $1,200 |
| Contingency (10%) | $2,719 |
| TOTAL | $29,909 |
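
The budget total can be reproduced from the figures in this section, which is a cheap check to rerun whenever a rate or hour estimate changes. The sketch assumes the 10% contingency applies to the subtotal of labor plus both infrastructure lines, which is what the published figures imply.

```python
# Figures copied from the labor and infrastructure tables in this plan.
LABOR = {
    "DevOps Engineer": (130, 96),     # (rate in $/hr, hours)
    "Backend Engineer": (120, 64),
    "Python Developer": (120, 16),
    "QA Engineer": (100, 36),
}
INFRA_DEV = 310      # development environment, 1 month
INFRA_PROD = 1200    # production environment, 1 month

labor_total = sum(rate * hours for rate, hours in LABOR.values())
subtotal = labor_total + INFRA_DEV + INFRA_PROD
contingency = subtotal // 10          # 10% of the subtotal
total = subtotal + contingency

print(labor_total, subtotal, contingency, total)
# -> 25680 27190 2719 29909
```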

Ongoing Costs (Post-Launch)

Monthly Recurring Costs:

  • Production infrastructure: $1,200/month
  • Support & maintenance (0.5 FTE): $10,000/month

Total: ~$11,200/month


13. Risk Assessment

Critical Risks

| Risk | Likelihood | Impact | Mitigation Strategy |
|---|---|---|---|
| Identity Platform OAuth app review delay | Medium (30%) | High (+2-3 days) | Start OAuth app approval process immediately; use Firebase emulator for development |
| Redis Lua script bugs (race conditions) | Medium (40%) | High (+1-2 days) | Thorough unit testing with concurrent users; code review by senior engineer |
| GKE networking issues (ingress, DNS) | Low (20%) | Medium (+1 day) | Test networking in dev environment first; have fallback to Cloud Run |
| FastAPI async/await complexity | Low (15%) | Low (+1 day) | Use existing FastAPI templates; pair programming for complex async code |
| SSL certificate provisioning delay | Low (10%) | Low (+4 hours) | Use Let's Encrypt staging first; manual cert creation as backup |

Medium Risks

| Risk | Likelihood | Impact | Mitigation Strategy |
|---|---|---|---|
| KMS signing performance issues | Low (10%) | Low (+4 hours) | Cache public key; use async KMS client |
| PostgreSQL connection pool exhaustion | Low (5%) | Low (+2 hours) | Set appropriate pool size (20-50); monitor connections |
| Load test failures (p99 > 500ms) | Medium (25%) | Medium (+1 day) | Database query optimization; add indexes; implement caching |

Risk Response Plans

If OAuth Review Delayed:

  1. Use Firebase Auth emulator for development
  2. Continue backend development with mock JWT tokens
  3. Deploy to staging with test OAuth credentials
  4. Switch to production OAuth when approved

If Load Test Fails:

  1. Identify bottleneck (database, API, Redis)
  2. Optimize hot path queries
  3. Add database indexes on frequently queried columns
  4. Implement caching for tenant settings (Redis)
  5. Scale horizontally (add GKE nodes)

If Security Scan Fails:

  1. Review OWASP ZAP findings
  2. Fix critical/high vulnerabilities immediately
  3. Re-run security scan
  4. Document remediation in security log

14. Quality Gates & Success Metrics

Phase Completion Criteria

Each phase must meet the following criteria before proceeding to the next phase:

Phase 0: Infrastructure & Documentation ✅ COMPLETE

  • GKE cluster deployed and operational
  • Cloud SQL PostgreSQL accessible from GKE
  • Redis cluster operational
  • All infrastructure costs within budget ($310/month dev)
  • Documentation organized to CODITECT standards (100/100)
  • Diagrams cover all major system components

Phase 1: Security Services

  • Cloud KMS can sign arbitrary data
  • Identity Platform OAuth flow works for Google and GitHub
  • JWT tokens contain correct claims (user_id, tenant_id)
  • Security documentation complete
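The claims criterion above can be checked with a small helper during Phase 1 sign-off. The sketch below decodes a token's payload and reports which required claims are absent. It deliberately does not verify the signature, so it is suitable only as a diagnostic, not as an authentication step; the function names are illustrative.

```python
import base64
import json

# Claims named in the Phase 1 completion criteria.
REQUIRED_CLAIMS = ("user_id", "tenant_id")


def decode_payload_unverified(token: str) -> dict:
    """Decode the JWT payload segment WITHOUT verifying the signature."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = segments[1]
    padding = "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def missing_claims(token: str, required=REQUIRED_CLAIMS) -> list:
    """Return the required claims that are absent or empty in the token."""
    payload = decode_payload_unverified(token)
    return [claim for claim in required if not payload.get(claim)]
```

An empty list from `missing_claims` means the token carries both claims; signature and expiry validation still have to happen separately in the backend.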

Phase 2: Backend Development

  • All API endpoints return correct responses
  • Multi-tenant isolation verified (integration tests)
  • Unit test coverage ≥80%
  • Redis atomic operations work correctly
  • KMS signing and verification functional
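The "atomic operations" criterion is easiest to state as a property: when more clients than seats try to acquire at once, exactly the seat limit succeed. The sketch below illustrates that property with an in-process stand-in that uses a `threading.Lock` in place of a Redis Lua script. It is not the platform's Redis implementation, and `SeatPool` and `run_concurrent_acquires` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SeatPool:
    """In-memory model of a floating-license seat counter (stand-in for Redis)."""

    def __init__(self, max_seats: int):
        self._max_seats = max_seats
        self._sessions = set()
        self._lock = threading.Lock()

    def acquire(self, session_id: str) -> bool:
        # Check and update happen under one lock, so no two callers
        # can both observe a free seat and both take it.
        with self._lock:
            if session_id in self._sessions:
                return True  # re-acquiring an existing session is a no-op
            if len(self._sessions) >= self._max_seats:
                return False
            self._sessions.add(session_id)
            return True

    def release(self, session_id: str) -> None:
        with self._lock:
            self._sessions.discard(session_id)

    @property
    def active(self) -> int:
        with self._lock:
            return len(self._sessions)


def run_concurrent_acquires(pool: SeatPool, n_clients: int) -> list:
    """Fire n_clients acquire calls from a thread pool; return the grant flags."""
    session_ids = [f"session-{i}" for i in range(n_clients)]
    with ThreadPoolExecutor(max_workers=n_clients) as executor:
        return list(executor.map(pool.acquire, session_ids))
```

A test against the real service would assert the same invariant, with the lock replaced by whatever atomic primitive the backend uses.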

Phase 3: Deployment

  • Docker image builds successfully
  • Kubernetes deployment healthy (3/3 pods running)
  • Health check endpoint responds: https://auth.coditect.ai/health
  • SSL certificate valid (Let's Encrypt)

Phase 4: Client SDK

  • Client can acquire license from API
  • Signature verification succeeds
  • Heartbeat thread runs in background
  • Offline mode works (grace period)
  • Client tests pass (80%+ coverage)
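The heartbeat and offline-mode criteria can be sketched together: a background thread sends heartbeats, and the license stays usable as long as the last successful heartbeat is within the grace period. This is a minimal sketch under stated assumptions. The interval, the grace-period length, the `send_heartbeat` callable, and the `HeartbeatWorker` class are all hypothetical; this section does not specify the SDK's actual values or interfaces.

```python
import threading
import time


class HeartbeatWorker:
    """Send periodic heartbeats and track an offline grace period."""

    def __init__(self, send_heartbeat, interval_s, grace_period_s, clock=time.monotonic):
        self._send = send_heartbeat          # callable that raises OSError on network failure
        self._interval = interval_s
        self._grace = grace_period_s
        self._clock = clock
        self._last_ok = clock()              # treat construction time as the last success
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the thread to exit and wait for it (call only after start())."""
        self._stop.set()
        self._thread.join()

    def beat_once(self) -> bool:
        """Send one heartbeat; record the time only if it succeeded."""
        try:
            self._send()
        except OSError:
            return False
        self._last_ok = self._clock()
        return True

    def license_usable(self) -> bool:
        """True while the last successful heartbeat is within the grace period."""
        return self._clock() - self._last_ok <= self._grace

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.beat_once()
```

Injecting the clock keeps the grace-period logic testable without sleeping for the full period.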

Phase 5: End-to-End Testing

  • E2E test passes (acquire → heartbeat → release)
  • Multi-user test passes (seat limit enforced)
  • Load test passes (100 users, p99 < 500ms, error rate < 1%)
  • Security scan shows no critical/high vulnerabilities
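The load test gate above (p99 < 500ms, error rate < 1%) can be evaluated from raw results with a short helper. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions and may differ slightly from what a given load-testing tool reports; the function names and the shape of the inputs are assumptions.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def load_test_passes(latencies_ms, error_count, p99_limit_ms=500, max_error_rate=0.01):
    """Apply the Phase 5 gate: p99 strictly below the limit and error rate strictly below 1%.

    latencies_ms holds one latency per request sent; error_count is how many
    of those requests failed.
    """
    total_requests = len(latencies_ms)
    p99 = percentile(latencies_ms, 99)
    error_rate = error_count / total_requests
    return p99 < p99_limit_ms and error_rate < max_error_rate
```

For example, a run of 1,000 requests where 10 took 600 ms still passes the latency gate, because the 99th-percentile sample is one of the fast ones; an 11th slow request would push p99 over the limit.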

Phase 6: Production Hardening

  • Monitoring dashboards display real-time data
  • Alerts fire correctly for test scenarios
  • Runbook complete and reviewed
  • Deployment checklist complete

Key Performance Indicators (KPIs)

Technical KPIs:

| Metric | Target | Measurement |
|---|---|---|
| API Latency (p99) | <500ms | Prometheus histogram |
| API Error Rate | <1% | Prometheus counter |
| Database Query Time (p99) | <100ms | Prometheus histogram |
| Uptime | 99.9% | Grafana uptime panel |
| Test Coverage | ≥80% backend, ≥80% client | pytest |
| Security Vulnerabilities | 0 critical/high | OWASP ZAP, Trivy |
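The 99.9% uptime target translates into a concrete downtime allowance, which is useful when setting alert thresholds. The sketch below does that conversion; the 30-day measurement window is an assumption, since the plan does not state the period over which uptime is measured.

```python
from fractions import Fraction


def downtime_budget_minutes(uptime_pct: str, period_days: int = 30) -> float:
    """Minutes of allowed downtime for a given uptime percentage over period_days."""
    allowed_fraction = 1 - Fraction(uptime_pct) / 100
    return float(allowed_fraction * period_days * 24 * 60)


monthly_budget = downtime_budget_minutes("99.9")        # 30-day window
yearly_budget = downtime_budget_minutes("99.9", 365)    # 365-day window
```

Under these assumptions the target allows about 43 minutes of downtime in a 30-day month.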

Business KPIs (Post-Launch):

| Metric | Target | Measurement |
|---|---|---|
| License Acquisition Success Rate | ≥99% | API logs |
| Heartbeat Reliability | ≥99.9% | API logs |
| Active Sessions by Tenant | 100-1000 | Database query |
| API Usage per Tenant | 10,000 requests/month | Kong analytics |

Acceptance Testing

Pre-Launch Checklist:

  • All features functional in production environment
  • Load test passed (100 concurrent users)
  • Security audit completed (no critical issues)
  • Disaster recovery tested (backup restore)
  • Documentation complete (API, runbook, deployment)
  • On-call rotation established
  • Monitoring and alerting validated

Appendices

A. Glossary

  • Floating License: License that limits concurrent users, not installations
  • Check-on-Start: Validation pattern where license is checked at application startup
  • Cloud KMS: Google Cloud Key Management Service for cryptographic operations
  • Identity Platform: Google's OAuth2/OIDC service for authentication
  • Redis Lua: Server-side scripting in Redis for atomic operations
  • JWT: JSON Web Token (for authentication)
  • TTL: Time To Live (automatic expiration in Redis)
  • Heartbeat: Periodic signal to indicate session is still active

B. References

C. Architecture Decision Records (ADRs)

See master repository docs/adrs/ for project-wide architectural decisions:

  • ADR-001: OpenTofu over Terraform (licensing)
  • ADR-002: FastAPI over Django (performance)
  • ADR-003: Floating concurrent licensing pattern
  • ADR-004: Local-first architecture (check-on-start)
  • ADR-005: Cloud KMS for license signing

Document Version: 3.0 (License Management Focus)
Last Updated: November 24, 2025
Next Review: November 25, 2025 (Phase 1 kickoff)
Owner: CODITECT Infrastructure Team
Status: PHASE 0 COMPLETE ✅ | PHASE 1-6 PENDING ⏸️


Document Change History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | Nov 22, 2025 | Infrastructure Team | Initial (incorrect Django/Citus focus) |
| 2.0 | Nov 23, 2025 | Infrastructure Team | Expanded Django plan (still incorrect) |
| 3.0 | Nov 24, 2025 | Documentation Specialist | Complete rewrite for License Management Platform (FastAPI + PostgreSQL); Phase 0 completion documented |