PROJECT PLAN: CODITECT License Management Platform
Project: CODITECT License Management Platform (FastAPI + PostgreSQL)
Date: November 24, 2025
Owner: CODITECT Infrastructure Team
Status: ACTIVE DEVELOPMENT (35% Complete)
Version: 3.0 (License Management Focus)
Table of Contents
- Executive Summary
- Technology Stack
- Architecture Overview
- Phase 0: Infrastructure & Documentation
- Phase 1: Security Services
- Phase 2: Backend Development
- Phase 3: Deployment
- Phase 4: Client SDK
- Phase 5: End-to-End Testing
- Phase 6: Production Hardening
- Timeline & Resource Requirements
- Budget & Cost Analysis
- Risk Assessment
- Quality Gates & Success Metrics
1. Executive Summary
This plan outlines the complete implementation of the CODITECT License Management Platform, a production-grade floating license system that enables CODITECT's local-first AI development framework to validate licenses, track concurrent sessions, and manage multi-tenant licensing through a secure cloud API.
Key Objectives
- Floating Concurrent Licensing: Limit simultaneous users, not installations
- Check-on-Start Pattern: Fast validation at CODITECT startup (local-first architecture)
- Cloud KMS Signing: Tamper-proof licenses verified locally without network
- Multi-Tenant Isolation: Complete tenant separation at application and database levels
- Production-Ready: Comprehensive monitoring, testing, and deployment automation
Current Status
Overall Completion: 35% of MVP (Phase 0 complete, Phases 1-6 pending)
Completed Work:
- ✅ Phase 0: Infrastructure & Documentation (100%) - November 20-24, 2025
- GKE cluster deployed (3 nodes, auto-scaling 1-10)
- Cloud SQL PostgreSQL 16 with regional HA
- Redis Memorystore 6GB with RDB persistence
- VPC networking with private subnets and Cloud NAT
- Secret Manager with 9 secrets configured
- Documentation organized (7 categories, 100/100 CODITECT standards)
- 17 comprehensive diagrams (C4 architecture, workflows, deployment)
- Production-ready README.md and CLAUDE.md
Remaining Work:
- ⏸️ Phase 1: Security Services (2-3 days) - Cloud KMS + Identity Platform
- ⏸️ Phase 2: Backend Development (5-7 days) - FastAPI license API
- ⏸️ Phase 3: Deployment (2-3 days) - Kubernetes deployment + SSL/DNS
- ⏸️ Phase 4: Client SDK (1-2 days) - Python License Client for coditect-core
- ⏸️ Phase 5: E2E Testing (2 days) - Integration and load testing
- ⏸️ Phase 6: Production Hardening (2 days) - Monitoring and runbooks
Success Criteria
- ✅ CODITECT can acquire licenses on startup
- ✅ License API validates JWT and checks seat availability atomically (Redis Lua)
- ✅ License API signs tokens with Cloud KMS (RSA-4096)
- ✅ CODITECT validates signature locally (offline-capable)
- ✅ Heartbeat keeps session alive (every 5 min)
- ✅ Graceful license release on CODITECT exit
- ✅ End-to-end test passing (acquire → heartbeat → release)
- ✅ Unit test coverage ≥80%
- ✅ Load test passing (100 concurrent users)
2. Technology Stack
Core Platform
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Backend Framework | FastAPI | 0.104+ | Async REST API framework |
| Database | PostgreSQL | 16+ | License and tenant data storage |
| Session Management | Redis | 7.x | Concurrent seat tracking with TTL |
| Authentication | Identity Platform | Latest | OAuth2/OIDC with Google/GitHub |
| License Signing | Cloud KMS | Latest | RSA-4096 asymmetric key signing |
| ORM | SQLAlchemy async | 2.0+ | Database models and queries |
| Validation | Pydantic | 2.x | Request/response schema validation |
Infrastructure
| Component | Technology | Purpose |
|---|---|---|
| Cloud Platform | Google Cloud Platform (GCP) | Cloud infrastructure |
| Orchestration | Kubernetes (GKE) | Container orchestration |
| Infrastructure as Code | OpenTofu v1.10.7 | GCP resource provisioning (MPL 2.0) |
| Load Balancer | GCP Load Balancer + Ingress | Traffic distribution |
| Container Registry | Google Container Registry | Docker image storage |
| Secrets Management | GCP Secret Manager | Secure credential storage |
Observability & Monitoring (Phase 6)
| Component | Technology | Purpose |
|---|---|---|
| Metrics | Prometheus | Metrics collection |
| Visualization | Grafana | Metrics dashboards |
| Distributed Tracing | Jaeger (optional) | Request tracing |
| Logging | Google Cloud Logging | Centralized logging |
| Error Tracking | Sentry (optional) | Exception tracking |
Development & CI/CD
| Component | Technology | Purpose |
|---|---|---|
| Version Control | Git + GitHub | Source code management |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Testing | pytest + Locust | Unit/integration/load testing |
| Code Quality | Ruff + Black + MyPy | Linting and type checking |
| Security Scanning | Trivy + Safety | Vulnerability scanning |
3. Architecture Overview
License Management Pattern: Check-on-Start
CODITECT uses a local-first architecture - the framework runs entirely on the user's machine. The cloud infrastructure only validates licenses and tracks concurrent usage.
Flow:
User starts CODITECT locally
↓
CODITECT → License API: "Can I run?" (hardware_id, jwt_token)
↓
License API (running on GKE):
1. Validate JWT token (Identity Platform)
2. Check license active in PostgreSQL
3. Atomic seat check in Redis (Lua script)
4. Sign license with Cloud KMS (RSA-4096)
↓
CODITECT ← Signed License Token
↓
Validate signature locally (offline-capable)
↓
Run CODITECT with periodic heartbeats (every 5 min)
↓
On exit: Release seat OR wait for 6-min TTL expiry (automatic cleanup)
Key Architectural Decisions
- Floating Concurrent Licensing - Limit simultaneous users, not installations
- Session Management - Redis TTL (6 min) prevents zombie sessions automatically
- Offline-Capable - Signed licenses verified locally, works without network
- Security - Cloud KMS signing (tamper-proof), no client-side secrets
- Multi-Tenant - Tenant isolation via application-level filtering in PostgreSQL
See: docs/architecture/c1-system-context.md for detailed architecture
System Context (C4 Level 1)
┌──────────────────────────────────────────────────────────────┐
│ CODITECT Developer │
│ (Uses CODITECT CLI for local AI development) │
└────────────┬─────────────────────────────────────────────────┘
│
│ 1. Request license (hardware_id, JWT)
│ 2. Send heartbeat (every 5 min)
│ 3. Release license on exit
▼
┌──────────────────────────────────────────────────────────────┐
│ CODITECT License Management Platform │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ FastAPI │ │ PostgreSQL │ │ Redis │ │
│ │ License API │→ │ (licenses, │ │ (session │ │
│ │ │ │ tenants, │ │ tracking, │ │
│ │ │ │ users) │ │ TTL 6 min) │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
│ │ │
│ └──────────┬─────────────────┬──────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌─────────┐ │
│ │ Identity │ │ Cloud KMS │ │ Secret │ │
│ │ Platform │ │ (RSA-4096 │ │ Manager │ │
│ │ (OAuth2) │ │ signing) │ │ │ │
│ └──────────────┘ └──────────────┘ └─────────┘ │
└──────────────────────────────────────────────────────────────┘
4. Phase 0: Infrastructure & Documentation
Duration: 4 days (November 20-24, 2025)
Status: ✅ COMPLETE (100%)
Team: DevOps Engineer, Documentation Specialist
Objectives
- ✅ Provision production-grade GCP infrastructure
- ✅ Deploy GKE cluster, Cloud SQL PostgreSQL, Redis Memorystore
- ✅ Configure VPC networking with private subnets
- ✅ Organize documentation to 100/100 CODITECT standards
- ✅ Create comprehensive architecture diagrams
- ✅ Update README.md and CLAUDE.md to production quality
Work Completed
Infrastructure Deployment (November 20-21)
GKE Cluster:
- 3-node cluster in us-central1 (auto-scaling 1-10 nodes)
- Node type: n1-standard-2 (2 vCPU, 7.5GB RAM)
- Preemptible nodes for cost optimization ($100/month)
- Workload Identity enabled for GCP service account integration
Cloud SQL PostgreSQL 16:
- Regional HA with automatic failover
- Instance type: db-custom-2-7680 (2 vCPU, 7.5GB RAM)
- 100GB SSD storage with auto-increase enabled
- Private IP connectivity via VPC peering
- Automated daily backups with 7-day retention
- Cost: $150/month
Redis Memorystore:
- 6GB BASIC tier (RDB persistence enabled)
- Private IP connectivity to GKE
- Used for atomic seat counting (Lua scripts)
- Cost: $30/month
VPC Networking:
- Custom VPC with RFC 1918 address space
- Private subnets for GKE and Cloud SQL
- Cloud NAT for outbound internet access
- VPC peering between GKE and Cloud SQL
- Private Google Access enabled
Secret Manager:
- 9 secrets configured (DB passwords, API key placeholders)
- Workload Identity for pod access to secrets
- Secret rotation policies documented
Total Infrastructure Cost: $310/month (development environment)
Documentation Organization (November 24)
Directory Restructuring:
- Created 7 documentation categories:
  - `docs/architecture/` - C4 diagrams (C1, C2, C3)
  - `docs/workflows/` - Sequence diagrams with code examples
  - `docs/deployment/` - Deployment and infrastructure guides
  - `docs/guides/` - Development setup and troubleshooting
  - `docs/reference/` - GCP inventory, API reference
  - `docs/project-management/` - PROJECT-PLAN, TASKLIST, CRITICAL-PATH
  - `docs/research/` - Gap analysis, OpenTofu migration research
- Created comprehensive README files for all directories
- Updated master `docs/README.md` with complete navigation
Diagram Library:
- Created 17 comprehensive diagrams:
  - C4 Architecture (5 diagrams):
    - C1: System Context
    - C2: Container Architecture
    - C3: GKE Components
    - C3: Networking Components
    - C3: Security Components
  - Workflow Sequences (5 diagrams):
    - License acquisition flow
    - Heartbeat mechanism
    - License release flow
    - User registration flow
    - Multi-tenant isolation verification
  - Deployment Diagrams (2 diagrams):
    - Blue/Green deployment strategy
    - Infrastructure topology
  - Supporting Diagrams (5 diagrams):
    - Redis atomic seat counting
    - Cloud KMS signing flow
    - Session TTL management
    - Error handling patterns
    - Scaling strategy
Documentation Quality:
- Updated README.md to 558 lines (production quality)
- Updated CLAUDE.md to 672 lines (AI-optimized context)
- Created 6 documentation planning documents (101KB total)
- Achieved 100% README coverage across all directories
- All documentation follows CODITECT standards (100/100 score)
Repository Organization:
- Root directory cleaned to 14 essential files
- All files properly categorized
- Git history preserved (all moves with `git mv`)
- Production-ready structure
Deliverables
- ✅ Complete GCP infrastructure deployed and operational
- ✅ 7 documentation categories with comprehensive content
- ✅ 17 architecture and workflow diagrams
- ✅ Production-quality README.md and CLAUDE.md
- ✅ 100/100 CODITECT standards compliance
- ✅ Complete navigation and cross-referencing
Success Criteria
- GKE cluster running in us-central1
- Cloud SQL PostgreSQL accessible from GKE
- Redis Memorystore instance operational and accessible
- All infrastructure costs within budget ($310/month dev)
- Documentation organized to CODITECT standards
- All directories have comprehensive README files
- Diagrams cover all major system components and flows
Cost Summary (Phase 0)
| Component | Monthly Cost |
|---|---|
| GKE Cluster (3 preemptible nodes) | $100 |
| Cloud SQL PostgreSQL (HA) | $150 |
| Redis Memorystore (6GB) | $30 |
| VPC Networking + Cloud NAT | $20 |
| Secret Manager | $10 |
| Total | $310 |
5. Phase 1: Security Services
Duration: 2-3 days
Status: ⏸️ NEXT (0% Complete)
Team: DevOps Engineer, Security Specialist
Goal: Deploy Cloud KMS and Identity Platform for OAuth2 authentication
Objectives
- Deploy Cloud KMS for license signing (RSA-4096)
- Configure Identity Platform for OAuth2 (Google, GitHub)
- Test end-to-end authentication flow
- Document security architecture
Tasks
Day 1: Cloud KMS Setup
P1-T01: Create Cloud KMS OpenTofu Module
- Create `opentofu/modules/kms/` directory
- Define RSA-4096 asymmetric key for license signing
- Document a manual key rotation procedure (every 90 days); Cloud KMS does not rotate asymmetric keys automatically
- Setup IAM permissions for GKE service account
- Time Estimate: 2 hours
P1-T02: Deploy Cloud KMS
- Deploy KMS key ring to us-central1
- Create signing key: `coditect-license-signing-key`
- Grant `roles/cloudkms.signerVerifier` (the signing role for asymmetric keys) to GKE SA
- Test signing with gcloud command
- Time Estimate: 1 hour
Day 2: Identity Platform Setup
P1-T03: Create Identity Platform Module
- Create `opentofu/modules/identity-platform/` directory
- Configure OAuth2 providers (Google, GitHub)
- Setup OAuth consent screen
- Configure redirect URIs (localhost, staging, production)
- Time Estimate: 4 hours
P1-T04: Deploy Identity Platform
- Enable Identity Platform API
- Create OAuth2 clients (web app, mobile app)
- Configure authorized domains
- Test OAuth flow with Google account
- Time Estimate: 2 hours
Day 3: Testing & Documentation
P1-T05: End-to-End Auth Testing
- Test OAuth2 authorization code flow
- Verify JWT token generation
- Validate token claims (user_id, email, tenant_id)
- Test token refresh flow
- Time Estimate: 4 hours
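The claim check in P1-T05 could be sketched as follows. This is a minimal illustration, not the project's test code: the helper names and the sample token are hypothetical, and the payload is decoded WITHOUT verifying the signature, so it is only suitable for inspecting claims during testing.

```python
import base64
import json

# Claim names taken from the task list above.
REQUIRED_CLAIMS = ("user_id", "email", "tenant_id")


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # base64url segments omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def missing_claims(claims: dict) -> list:
    """Return the required claim names that are absent or empty."""
    return [name for name in REQUIRED_CLAIMS if not claims.get(name)]


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical, unsigned token used only to exercise the helpers.
example_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({"user_id": "u-1", "email": "dev@example.com"}),
    "signature-placeholder",
])
print(missing_claims(decode_jwt_payload(example_token)))  # ['tenant_id']
```

In the real flow the signature would be verified against the Identity Platform public keys before any claim is trusted.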
P1-T06: Security Documentation
- Document OAuth2 flow with sequence diagrams
- Create runbook for token validation
- Document KMS signing process
- Update C3-Security diagram
- Time Estimate: 3 hours
Deliverables
- ⏸️ Cloud KMS operational with RSA-4096 signing key
- ⏸️ Identity Platform configured with Google/GitHub OAuth
- ⏸️ End-to-end authentication flow tested
- ⏸️ Security architecture documented
Success Criteria
- Cloud KMS can sign arbitrary data
- Identity Platform OAuth flow works for Google and GitHub
- JWT tokens contain correct claims (user_id, tenant_id)
- Token validation succeeds with public key
- Security documentation complete
Blocking Dependencies
Phase 2 Backend Development REQUIRES Phase 1 completion:
- JWT validation middleware needs Identity Platform public keys
- License signing needs Cloud KMS integration
6. Phase 2: Backend Development
Duration: 5-7 days
Status: ⏸️ PENDING (0% Complete)
Team: Backend Engineers (2x), Database Architect
Goal: Build complete FastAPI license API with multi-tenant support
Objectives
- Setup FastAPI project structure
- Create database models (SQLAlchemy async)
- Build REST APIs (acquire, heartbeat, release)
- Implement Redis Lua scripts for atomic seat counting
- Integrate Cloud KMS for license signing
- Integrate Identity Platform for JWT validation
- Write comprehensive tests (80%+ coverage)
Tasks
Day 1-2: FastAPI Project & Database Models
P2-T01: FastAPI Project Setup
- Create `backend/` directory structure
- Initialize FastAPI project with Poetry
- Configure settings.py (environment-based config)
- Setup async database connection (asyncpg)
- Configure CORS middleware
- Time Estimate: 2 hours
P2-T02: Database Models (SQLAlchemy async)
- Create `models/` directory
- Tenant model (id, name, subdomain, plan, max_seats)
- User model (id, email, tenant_id, auth_provider_id)
- License model (id, tenant_id, plan, seats_total, active)
- Session model (id, license_id, user_id, hardware_id, started_at, last_heartbeat)
- AuditLog model (id, tenant_id, user_id, action, timestamp)
- Create Alembic migrations
- Time Estimate: 3 hours
Day 3-4: License API Endpoints
P2-T03: License Acquire Endpoint
- `POST /api/v1/licenses/acquire`
- Request: `{user_id, hardware_id, jwt_token}`
- Validate JWT with Identity Platform
- Check license active in PostgreSQL
- Atomic seat check in Redis (Lua script)
- Sign license with Cloud KMS
- Response: `{license_token, expires_at, public_key}`
- Time Estimate: 4 hours
P2-T04: Heartbeat Endpoint
- `POST /api/v1/licenses/heartbeat`
- Request: `{session_id, jwt_token}`
- Validate JWT
- Update `last_heartbeat` timestamp in Redis (extend TTL to 6 min)
- Response: `{status: "ok", next_heartbeat_at}`
- Time Estimate: 2 hours
P2-T05: License Release Endpoint
- `POST /api/v1/licenses/release`
- Request: `{session_id, jwt_token}`
- Validate JWT
- Delete session from Redis (atomic decrement)
- Log release in audit log
- Response: `{status: "released"}`
- Time Estimate: 2 hours
Day 5: Redis & KMS Integration
P2-T06: Redis Lua Scripts (Atomic Seat Counting)
- Create `redis_scripts/` directory
- `acquire_seat.lua` - Atomic check and increment
- `release_seat.lua` - Atomic decrement
- `extend_ttl.lua` - Extend session TTL on heartbeat
- Load scripts on Redis connection
- Time Estimate: 3 hours
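The behaviour the three scripts need to provide can be modelled in plain Python before any Lua is written. The sketch below is an in-memory stand-in, not the Redis implementation: the `SeatPool` class and its injectable clock are assumptions made for illustration, and the 360-second default mirrors the 6-minute TTL in this plan. In production the atomicity would come from Redis running each Lua script as a single unit; this model is single-threaded.

```python
import time
from typing import Callable, Dict


class SeatPool:
    """In-memory model of the per-license seat counter (hypothetical helper)."""

    def __init__(self, max_seats: int, ttl_seconds: float = 360.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seats = max_seats
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}  # session_id -> expiry time

    def _purge_expired(self) -> None:
        # Mirrors Redis TTL expiry: sessions past their deadline disappear.
        now = self._clock()
        self._expires_at = {s: t for s, t in self._expires_at.items() if t > now}

    def acquire_seat(self, session_id: str) -> bool:
        """Check-and-increment: succeed only while a seat is free."""
        self._purge_expired()
        is_new = session_id not in self._expires_at
        if is_new and len(self._expires_at) >= self.max_seats:
            return False
        self._expires_at[session_id] = self._clock() + self.ttl_seconds
        return True

    def extend_ttl(self, session_id: str) -> bool:
        """Heartbeat: push the expiry out again if the session still exists."""
        self._purge_expired()
        if session_id not in self._expires_at:
            return False
        self._expires_at[session_id] = self._clock() + self.ttl_seconds
        return True

    def release_seat(self, session_id: str) -> bool:
        """Decrement: free the seat; False if it was already gone."""
        self._purge_expired()
        return self._expires_at.pop(session_id, None) is not None

    def seats_in_use(self) -> int:
        self._purge_expired()
        return len(self._expires_at)
```

Injecting the clock keeps tests deterministic: a test can advance time past 360 seconds and confirm that a session without heartbeats no longer holds a seat.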
P2-T07: Cloud KMS Integration
- Create `kms/` service module
- Implement async KMS signing with aiogoogle
- Sign license payload (tenant_id, user_id, hardware_id, expires_at)
- Cache public key in Redis (1 hour TTL)
- Time Estimate: 3 hours
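A signature only verifies if both sides hash exactly the same bytes, so the license payload needs a deterministic serialization. The sketch below shows one way to do that; the function names are hypothetical, and SHA-256 is an assumption that would have to match the algorithm chosen for the KMS key. The actual signing call to Cloud KMS is deliberately left out.

```python
import hashlib
import json


def canonical_payload(tenant_id: str, user_id: str,
                      hardware_id: str, expires_at: int) -> bytes:
    """Serialize the license fields deterministically (sorted keys, no spaces)."""
    payload = {
        "tenant_id": tenant_id,
        "user_id": user_id,
        "hardware_id": hardware_id,
        "expires_at": expires_at,  # assumed to be a Unix timestamp
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def payload_digest(payload: bytes) -> bytes:
    """SHA-256 digest of the canonical payload, the value that would be signed."""
    return hashlib.sha256(payload).digest()


example = canonical_payload("t-1", "u-1", "hw-abc", 1764000000)
print(example.decode("utf-8"))
print(payload_digest(example).hex())
```

The client would rebuild the same canonical bytes from the token fields before checking the signature locally.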
Day 6: Auth & Admin Endpoints
P2-T08: JWT Auth Middleware
- Create `middleware/auth.py`
- Validate JWT signature with Identity Platform public keys
- Extract user_id and tenant_id from claims
- Add tenant context to request state
- Handle token expiration and refresh
- Time Estimate: 4 hours
P2-T09: Admin Endpoints (Tenant CRUD)
- `POST /api/v1/tenants/` (create tenant)
- `GET /api/v1/tenants/{id}/` (tenant details)
- `PUT /api/v1/tenants/{id}/` (update tenant)
- `GET /api/v1/tenants/{id}/users/` (list users)
- `POST /api/v1/tenants/{id}/users/` (create user)
- Time Estimate: 4 hours
Day 7: Testing
P2-T10: Unit Tests (pytest)
- Test database models (validation, constraints)
- Test API endpoints (CRUD operations)
- Test Redis Lua scripts (atomic operations)
- Test KMS signing and verification
- Test JWT middleware (valid/invalid tokens)
- Target: 80%+ code coverage
- Time Estimate: 6 hours
P2-T11: Integration Tests
- Test end-to-end license acquisition flow
- Test multi-tenant isolation (tenant A can't access tenant B data)
- Test concurrent seat acquisition (10 users)
- Test session TTL expiry (Redis)
- Time Estimate: 4 hours
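The tenant isolation test ("tenant A can't access tenant B data") amounts to checking that every query is scoped by the caller's tenant. The sketch below illustrates that idea with the standard-library `sqlite3` module standing in for PostgreSQL; the table layout follows the License model in P2-T02, while `make_demo_db` and `get_license` are hypothetical helpers.

```python
import sqlite3
from typing import Optional, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one license per tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE licenses ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, plan TEXT NOT NULL, "
        "seats_total INTEGER NOT NULL, active INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO licenses (tenant_id, plan, seats_total, active) VALUES (?, ?, ?, ?)",
        [("tenant-a", "team", 5, 1), ("tenant-b", "pro", 20, 1)],
    )
    return conn


def get_license(conn: sqlite3.Connection, tenant_id: str,
                license_id: int) -> Optional[Tuple]:
    """Fetch a license only if it belongs to the caller's tenant."""
    # The tenant_id filter is the application-level isolation boundary.
    return conn.execute(
        "SELECT id, tenant_id, plan, seats_total FROM licenses "
        "WHERE id = ? AND tenant_id = ?",
        (license_id, tenant_id),
    ).fetchone()


conn = make_demo_db()
print(get_license(conn, "tenant-a", 1))  # tenant A reads its own license
print(get_license(conn, "tenant-a", 2))  # tenant B's license is invisible: None
```

An integration test would run the same two lookups through the API, using JWTs for two different tenants.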
Deliverables
- ⏸️ Complete FastAPI application
- ⏸️ Database models with migrations
- ⏸️ License API endpoints (acquire, heartbeat, release)
- ⏸️ Redis Lua scripts for atomic seat counting
- ⏸️ Cloud KMS signing integration
- ⏸️ JWT authentication middleware
- ⏸️ Admin endpoints for tenant management
- ⏸️ Comprehensive test suite (80%+ coverage)
Success Criteria
- All API endpoints return correct responses
- Multi-tenant isolation verified (integration tests)
- Unit test coverage ≥80%
- Redis atomic operations work correctly
- KMS signing and verification functional
- JWT middleware validates tokens correctly
Blocking Dependencies
Phase 3 Deployment REQUIRES Phase 2 completion:
- Cannot deploy without backend code
- Dockerfile needs working FastAPI application
7. Phase 3: Deployment
Duration: 2-3 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Deploy FastAPI backend to GKE with SSL/DNS configuration
Objectives
- Create Dockerfile for FastAPI application
- Build and push Docker image to GCR
- Create Kubernetes manifests (Deployment, Service, Ingress)
- Deploy to GKE
- Configure SSL certificate + DNS
- Verify end-to-end deployment
Tasks
Day 1: Containerization
P3-T01: Create Dockerfile (Multi-Stage Build)
- Stage 1: Build dependencies (Poetry install)
- Stage 2: Runtime image (Python 3.11-slim)
- Copy application code
- Set entrypoint: `uvicorn main:app --host 0.0.0.0 --port 8000`
- Time Estimate: 2 hours
P3-T02: Build and Push Docker Image
- Build image: `docker build -t gcr.io/coditect-cloud-infra/license-api:v1.0.0 .`
- Test locally: `docker run -p 8000:8000 gcr.io/coditect-cloud-infra/license-api:v1.0.0`
- Push to GCR: `docker push gcr.io/coditect-cloud-infra/license-api:v1.0.0`
- Time Estimate: 1 hour
Day 2: Kubernetes Manifests
P3-T03: Kubernetes Deployment Manifest
- Create `kubernetes/base/deployment.yaml`
- 3 replicas for high availability
- Resource requests: 500m CPU, 512Mi memory
- Resource limits: 1000m CPU, 1Gi memory
- Liveness probe: `/health`
- Readiness probe: `/ready`
- Environment variables from Secret Manager
- Time Estimate: 2 hours
P3-T04: Kubernetes Service Manifest
- Create `kubernetes/base/service.yaml`
- Type: ClusterIP (internal load balancing)
- Port: 80 → 8000 (container port)
- Selector: `app: license-api`
- Time Estimate: 1 hour
P3-T05: Ingress + cert-manager Config
- Create `kubernetes/base/ingress.yaml`
- Host: `auth.coditect.ai`
- TLS: Use cert-manager for Let's Encrypt
- Annotations: `cert-manager.io/cluster-issuer: letsencrypt-prod`
- Time Estimate: 2 hours
Day 3: Deployment & DNS
P3-T06: Deploy to GKE
- Apply Kubernetes manifests: `kubectl apply -k kubernetes/base/`
- Verify pods running: `kubectl get pods`
- Check logs: `kubectl logs <pod-name>`
- Time Estimate: 2 hours
P3-T07: Configure Cloud DNS
- Create A record: `auth.coditect.ai` → GCP Load Balancer IP
- Verify DNS propagation: `dig auth.coditect.ai`
- Time Estimate: 1 hour
P3-T08: SSL Certificate Verification
- Wait for cert-manager to provision certificate (5-10 min)
- Verify certificate: `curl -I https://auth.coditect.ai/health`
- Check expiry: `openssl s_client -connect auth.coditect.ai:443 -servername auth.coditect.ai </dev/null | openssl x509 -noout -enddate`
- Time Estimate: 1 hour
Deliverables
- ⏸️ Dockerfile with multi-stage build
- ⏸️ Docker image pushed to GCR
- ⏸️ Kubernetes manifests (Deployment, Service, Ingress)
- ⏸️ FastAPI deployed to GKE
- ⏸️ SSL certificate on `auth.coditect.ai`
- ⏸️ DNS configured and propagated
Success Criteria
- Docker image builds successfully
- Kubernetes deployment healthy (3/3 pods running)
- Health check endpoint responds: `https://auth.coditect.ai/health`
- SSL certificate valid (Let's Encrypt)
- DNS resolves correctly
Blocking Dependencies
Phase 5 E2E Testing REQUIRES Phase 3 completion:
- Cannot test end-to-end without deployed API
- Client SDK needs production URL
8. Phase 4: Client SDK
Duration: 1-2 days
Status: ⏸️ PENDING (0% Complete)
Team: Python Developer
Goal: Create Python License Client SDK for coditect-core integration
Objectives
- Create `license-client` Python package
- Implement hardware fingerprinting
- Implement signature verification (public key)
- Implement heartbeat background thread
- Implement offline mode with grace period
- Write comprehensive tests
Tasks
Day 1: Client SDK Foundation
P4-T01: Create License Client Package
- Create `license-client/` directory in coditect-core
- Create `LicenseClient` class
- Methods: `acquire()`, `heartbeat()`, `release()`
- Configuration: API URL, timeout, retry settings
- Time Estimate: 4 hours
P4-T02: Hardware Fingerprinting
- Generate unique hardware ID (MAC address + CPU ID + disk serial)
- Hash hardware ID (SHA256)
- Store hardware ID in local cache (~/.coditect/hardware_id)
- Time Estimate: 2 hours
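A minimal sketch of the fingerprint-and-cache steps, using only the standard library. It is a simplification: reading the CPU ID and disk serial is platform-specific, so this version substitutes `platform.machine()` and `platform.system()` alongside the MAC address, and both function names are hypothetical.

```python
import hashlib
import platform
import uuid
from pathlib import Path


def compute_hardware_id() -> str:
    """Hash a few machine attributes into a 64-character hex ID."""
    parts = [
        # uuid.getnode() returns the MAC address as an integer; it falls back
        # to a random value when no hardware address can be found.
        format(uuid.getnode(), "012x"),
        platform.machine(),  # stand-in for CPU ID (assumption)
        platform.system(),   # stand-in for disk serial (assumption)
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def load_or_create_hardware_id(cache_file: Path) -> str:
    """Return the cached ID if present, otherwise compute and store it."""
    if cache_file.exists():
        cached = cache_file.read_text(encoding="utf-8").strip()
        if len(cached) == 64:
            return cached
    hardware_id = compute_hardware_id()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(hardware_id, encoding="utf-8")
    return hardware_id
```

A caller would pass the cache location from the task list, for example `load_or_create_hardware_id(Path.home() / ".coditect" / "hardware_id")`. Caching also keeps the ID stable on machines where the MAC lookup falls back to a random value.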
P4-T03: Signature Verification (Public Key)
- Fetch public key from `/api/v1/public-key` endpoint
- Cache public key locally (1 hour TTL)
- Verify license token signature with RSA-4096
- Validate token claims (tenant_id, user_id, expires_at)
- Time Estimate: 2 hours
Day 2: Heartbeat & Offline Mode
P4-T04: Heartbeat Background Thread
- Create `HeartbeatThread` class
- Send heartbeat every 5 minutes
- Handle network failures (retry 3 times)
- Graceful shutdown on CODITECT exit
- Time Estimate: 2 hours
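One possible shape for the heartbeat thread is sketched below. The class name comes from the task list, but the constructor parameters, the `send_heartbeat` callable (assumed to return `True` on success and raise `OSError` on network failure), and the failure counter are assumptions for illustration.

```python
import threading
from typing import Callable


class HeartbeatThread(threading.Thread):
    """Calls send_heartbeat on a fixed interval until stop() is requested."""

    def __init__(self, send_heartbeat: Callable[[], bool],
                 interval_seconds: float = 300.0, max_attempts: int = 3) -> None:
        super().__init__(daemon=True)
        self._send_heartbeat = send_heartbeat
        self._interval_seconds = interval_seconds
        self._max_attempts = max_attempts
        self._stop_event = threading.Event()
        self.consecutive_failures = 0

    def _beat_once(self) -> bool:
        """Try up to max_attempts times; True as soon as one attempt succeeds."""
        for _ in range(self._max_attempts):
            try:
                if self._send_heartbeat():
                    return True
            except OSError:
                pass  # network-level failure: try again
        return False

    def run(self) -> None:
        # Event.wait() returns True once stop() is called, so the loop exits
        # promptly instead of sleeping through a full interval.
        while not self._stop_event.wait(self._interval_seconds):
            if self._beat_once():
                self.consecutive_failures = 0
            else:
                self.consecutive_failures += 1

    def stop(self) -> None:
        """Request shutdown and wait briefly for the thread to finish."""
        self._stop_event.set()
        self.join(timeout=5)
```

Using an `Event` rather than `time.sleep` is what makes the graceful shutdown on CODITECT exit immediate: `stop()` wakes the thread at once.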
P4-T05: Offline Mode with Grace Period
- If heartbeat fails, continue running (grace period: 24 hours)
- Check last successful heartbeat timestamp
- Display warning: "License server unreachable, offline mode"
- Force exit after 24 hours offline
- Time Estimate: 2 hours
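The grace-period rule reduces to a comparison between the current time and the last successful heartbeat. A minimal sketch follows; the `LicenseState` enum and `license_state` function are hypothetical names, and the 24-hour value comes from the task list.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

GRACE_PERIOD = timedelta(hours=24)


class LicenseState(Enum):
    ONLINE = "online"                # last heartbeat succeeded
    OFFLINE_GRACE = "offline_grace"  # server unreachable, still within 24 hours
    EXPIRED = "expired"              # offline for 24 hours or more: force exit


def license_state(last_success: datetime, now: datetime,
                  last_attempt_ok: bool) -> LicenseState:
    """Classify the session from the time of the last successful heartbeat."""
    if last_attempt_ok:
        return LicenseState.ONLINE
    if now - last_success < GRACE_PERIOD:
        return LicenseState.OFFLINE_GRACE
    return LicenseState.EXPIRED


start = datetime(2025, 12, 1, 9, 0, tzinfo=timezone.utc)
print(license_state(start, start + timedelta(hours=3), last_attempt_ok=False))
print(license_state(start, start + timedelta(hours=24), last_attempt_ok=False))
```

The client would show the "License server unreachable, offline mode" warning in the `OFFLINE_GRACE` state and exit in the `EXPIRED` state.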
P4-T06: Error Handling and Retries
- Retry logic for network failures (exponential backoff)
- Handle 429 Too Many Requests (back off)
- Handle 401 Unauthorized (re-authenticate)
- Handle 503 Service Unavailable (wait and retry)
- Time Estimate: 2 hours
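The retry rules above can be expressed as two small pure functions, which keeps them easy to unit test. This is a sketch under assumptions: the function names, the 1-second base delay, the 30-second cap, and the action labels are illustrative and not taken from the plan.

```python
import random
from typing import Callable, List


def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.0,
                   rng: Callable[[], float] = random.random) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Optional jitter adds up to `jitter` * delay to spread out retries.
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


def action_for_status(status: int) -> str:
    """Map an HTTP status code to the client's next step."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "reauthenticate"
    if status in (429, 503):
        return "retry"
    return "fail"


print(backoff_delays(4))       # [1.0, 2.0, 4.0, 8.0]
print(action_for_status(429))  # retry
```

For 429 responses, a fuller client would also honor a `Retry-After` header if the server sends one.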
P4-T07: Client Unit Tests
- Test hardware fingerprinting
- Test signature verification
- Test heartbeat thread
- Test offline mode grace period
- Test error handling
- Target: 80%+ coverage
- Time Estimate: 3 hours
Deliverables
- ⏸️ `license-client` Python package
- ⏸️ Hardware fingerprinting implemented
- ⏸️ Signature verification with public key
- ⏸️ Heartbeat background thread
- ⏸️ Offline mode with 24-hour grace period
- ⏸️ Comprehensive error handling
- ⏸️ Client unit tests (80%+ coverage)
Success Criteria
- Client can acquire license from API
- Signature verification succeeds
- Heartbeat thread runs in background
- Offline mode works (grace period)
- Error handling tested (network failures)
- Client tests pass (80%+ coverage)
9. Phase 5: End-to-End Testing
Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: QA Engineer, Backend Engineer
Goal: Verify complete system integration with load testing
Objectives
- Test end-to-end license flow (acquire → heartbeat → release)
- Test multi-user concurrent access (10 users)
- Perform load testing (100 concurrent users)
- Run security scan (OWASP ZAP)
- Fix critical bugs
Tasks
Day 1: Integration Testing
P5-T01: End-to-End Integration Test
- Test: User signs up → acquires license → heartbeat → release
- Verify JWT token validation
- Verify Redis seat counting (atomic)
- Verify KMS signature verification
- Verify session TTL expiry (wait 6 min, session auto-released)
- Time Estimate: 4 hours
P5-T02: Multi-User Concurrent Test (10 users)
- Simulate 10 users acquiring licenses simultaneously
- Verify seat limit enforced (e.g., 5 seat plan)
- Verify 6th user gets "No seats available" error
- Verify seat release frees up capacity
- Time Estimate: 2 hours
Day 2: Load & Security Testing
P5-T03: Load Test with Locust (100 users, 1000 req/min)
- Create Locust test script:
  - 100 concurrent users
  - Each user: acquire → 3 heartbeats → release
  - Total: 1000 requests/min
- Run for 10 minutes
- Measure:
  - p50, p95, p99 latency
  - Error rate
  - Throughput (req/s)
- Target: p99 < 500ms, error rate < 1%
- Time Estimate: 4 hours
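The pass/fail decision for the load test can be computed from the recorded samples independently of Locust. The sketch below uses the nearest-rank percentile definition, which is an assumption (Locust's own reporting may round differently), and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_test_passed(latencies_ms: Sequence[float], errors: int, total_requests: int,
                     p99_target_ms: float = 500.0, max_error_rate: float = 0.01) -> bool:
    """Apply the plan's targets: p99 < 500 ms and error rate < 1%."""
    error_rate = errors / total_requests
    return percentile(latencies_ms, 99) < p99_target_ms and error_rate < max_error_rate


latencies = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
print(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))
print(load_test_passed(latencies, errors=5, total_requests=1000))
```

With the synthetic samples above, p99 is 99 ms and the error rate is 0.5%, so the check passes.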
P5-T04: Security Scan (OWASP ZAP)
- Run OWASP ZAP active scan against staging API
- Check for:
  - SQL injection vulnerabilities
  - XSS vulnerabilities
  - Authentication bypass
  - Sensitive data exposure
- Fix critical/high vulnerabilities
- Time Estimate: 4 hours
P5-T05: Fix Critical Bugs
- Review test results
- Fix bugs discovered in E2E testing
- Fix performance bottlenecks (if p99 > 500ms)
- Re-run tests to verify fixes
- Time Estimate: 8 hours
Deliverables
- ⏸️ End-to-end test passing
- ⏸️ Multi-user concurrent test passing
- ⏸️ Load test passing (100 users, p99 < 500ms)
- ⏸️ Security scan completed (no critical issues)
- ⏸️ All critical bugs fixed
Success Criteria
- E2E test passes (acquire → heartbeat → release)
- Multi-user test passes (seat limit enforced)
- Load test passes (100 users, p99 < 500ms, error rate < 1%)
- Security scan shows no critical/high vulnerabilities
- All critical bugs fixed and verified
10. Phase 6: Production Hardening
Duration: 2 days
Status: ⏸️ PENDING (0% Complete)
Team: DevOps Engineer
Goal: Production readiness with monitoring and runbooks
Objectives
- Create GKE monitoring dashboards (Prometheus + Grafana)
- Setup alerting (PagerDuty integration)
- Write runbook for incident response
- Create deployment checklist
- Document production procedures
Tasks
Day 1: Monitoring & Alerting
P6-T01: Create GKE Monitoring Dashboards
- Setup Prometheus in GKE (or use GCP Managed Prometheus)
- Create Grafana dashboards:
  - License API performance (latency, error rate, throughput)
  - Active sessions by tenant
  - Redis connection pool usage
  - PostgreSQL query performance
  - GKE pod health (CPU, memory, restarts)
- Time Estimate: 4 hours
P6-T02: Setup Alerting (PagerDuty)
- Configure PagerDuty integration
- Create alert rules:
  - High error rate (>5% 5xx errors)
  - High latency (p99 > 1s)
  - Redis connection pool exhaustion
  - Database connection failures
  - Pod crash loops
- Setup on-call rotation
- Time Estimate: 2 hours
Day 2: Runbooks & Checklists
P6-T03: Write Runbook (Incident Response)
- Create `docs/RUNBOOK.md`
- Sections:
  - Common Issues (API 5xx errors, database connection failures)
  - Troubleshooting Steps (check logs, restart pods, rollback)
  - Escalation Procedures (when to page CTO)
  - Rollback Procedures (kubectl rollout undo)
  - Database Backup and Restore
- Time Estimate: 3 hours
P6-T04: Create Deployment Checklist
- Create `docs/DEPLOYMENT-CHECKLIST.md`
- Pre-deployment:
  - All tests passing (unit, integration, E2E)
  - Code review approved
  - Database migrations tested
  - Staging deployment successful
- Deployment:
  - Blue-green deployment to production
  - Health checks passing
  - Monitor for 1 hour (check logs, metrics)
- Post-deployment:
  - Verify end-to-end flow
  - Check error rate (should be < 1%)
  - Update CHANGELOG.md
- Time Estimate: 1 hour
Deliverables
- ⏸️ Prometheus + Grafana monitoring dashboards
- ⏸️ PagerDuty alerting configured
- ⏸️ Runbook for common issues
- ⏸️ Deployment checklist
Success Criteria
- Monitoring dashboards display real-time data
- Alerts fire correctly for test scenarios
- Runbook reviewed and approved by team
- Deployment checklist complete
11. Timeline & Resource Requirements
Overall Timeline
Total Duration: 14-18 business days (3-4 weeks)
Start Date: November 20, 2025 (Phase 0 started)
Phase 0 Complete: November 24, 2025 ✅
Target MVP Completion: December 10, 2025 (Phases 1-6)
Phase Breakdown
| Phase | Duration | Start Date | End Date | Status | Completion |
|---|---|---|---|---|---|
| Phase 0 | 4 days | Nov 20 | Nov 24 | ✅ COMPLETE | 100% |
| Phase 1 | 2-3 days | Nov 25 | Nov 27 | ⏸️ NEXT | 0% |
| Phase 2 | 5-7 days | Nov 28 | Dec 4 | ⏸️ PENDING | 0% |
| Phase 3 | 2-3 days | Dec 5 | Dec 6 | ⏸️ PENDING | 0% |
| Phase 4 | 1-2 days | Dec 5* | Dec 6* | ⏸️ PENDING | 0% |
| Phase 5 | 2 days | Dec 6 | Dec 9 | ⏸️ PENDING | 0% |
| Phase 6 | 2 days | Dec 9 | Dec 10 | ⏸️ PENDING | 0% |
*Phase 4 (Client SDK) can run in parallel with Phase 2-3
Team Composition
Required Team:
- 1x DevOps Engineer (40 hrs/week) - Infrastructure, deployment, monitoring
- 1x Backend Engineer (40 hrs/week) - FastAPI development, testing
- 1x Python Developer (20 hrs/week) - Client SDK development
- 1x QA Engineer (20 hrs/week) - Load testing, security scanning
Total: 3 FTEs (Full-Time Equivalents)
Resource Allocation by Phase
| Phase | DevOps | Backend | Python Dev | QA | Total Hours |
|---|---|---|---|---|---|
| Phase 0 ✅ | 32 | 0 | 0 | 0 | 32 |
| Phase 1 | 16 | 8 | 0 | 0 | 24 |
| Phase 2 | 8 | 40 | 0 | 8 | 56 |
| Phase 3 | 16 | 8 | 0 | 0 | 24 |
| Phase 4 | 0 | 0 | 16 | 4 | 20 |
| Phase 5 | 8 | 8 | 0 | 16 | 32 |
| Phase 6 | 16 | 0 | 0 | 8 | 24 |
| Total | 96 | 64 | 16 | 36 | 212 |
12. Budget & Cost Analysis
Development Costs
Labor Costs (4 weeks):
| Role | Rate | Hours | Total |
|---|---|---|---|
| DevOps Engineer | $130/hr | 96 | $12,480 |
| Backend Engineer | $120/hr | 64 | $7,680 |
| Python Developer | $120/hr | 16 | $1,920 |
| QA Engineer | $100/hr | 36 | $3,600 |
Total Labor: $25,680
Infrastructure Costs
Development Environment (4 weeks):
- GKE cluster: $100/month × 1 month = $100
- Cloud SQL: $150/month × 1 month = $150
- Redis: $30/month × 1 month = $30
- Cloud KMS: $10/month × 1 month = $10
- Identity Platform: Free (up to 50K MAU)
- Networking: $20/month × 1 month = $20
Development Total: $310
Production Environment (initial month):
- GKE cluster: $500/month (production-grade, 5 nodes)
- Cloud SQL: $400/month (higher tier, HA)
- Redis: $150/month (16GB STANDARD tier)
- Cloud KMS: $10/month
- Identity Platform: $50/month (up to 50K MAU)
- Load Balancer + SSL: $50/month
- Monitoring: $40/month
Production Total: $1,200/month
Total Project Budget
| Category | Cost |
|---|---|
| Labor (4 weeks) | $25,680 |
| Infrastructure (Dev, 1 month) | $310 |
| Infrastructure (Prod, 1 month) | $1,200 |
| Contingency (10%) | $2,719 |
| TOTAL | $29,909 |
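The totals in the table can be reproduced from the rates, hours, and monthly costs listed earlier in this section. The short script below is only a worked check of that arithmetic; the variable names are illustrative.

```python
# (hourly rate in USD, hours) per role, from the labor table
labor = {
    "DevOps Engineer": (130, 96),
    "Backend Engineer": (120, 64),
    "Python Developer": (120, 16),
    "QA Engineer": (100, 36),
}

total_labor = sum(rate * hours for rate, hours in labor.values())
infra_dev = 310     # development environment, 1 month
infra_prod = 1200   # production environment, 1 month

subtotal = total_labor + infra_dev + infra_prod
contingency = round(subtotal * 10 / 100)  # 10% contingency
total = subtotal + contingency

print(total_labor, subtotal, contingency, total)  # 25680 27190 2719 29909
```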
Ongoing Costs (Post-Launch)
Monthly Recurring Costs:
- Production infrastructure: $1,200/month
- Support & maintenance (0.5 FTE): $10,000/month
Total: ~$11,200/month
13. Risk Assessment
Critical Risks
| Risk | Likelihood | Impact | Mitigation Strategy |
|---|---|---|---|
| Identity Platform OAuth app review delay | Medium (30%) | High (+2-3 days) | Start OAuth app approval process immediately; use Firebase emulator for development |
| Redis Lua script bugs (race conditions) | Medium (40%) | High (+1-2 days) | Thorough unit testing with concurrent users; code review by senior engineer |
| GKE networking issues (ingress, DNS) | Low (20%) | Medium (+1 day) | Test networking in dev environment first; have fallback to Cloud Run |
| FastAPI async/await complexity | Low (15%) | Low (+1 day) | Use existing FastAPI templates; pair programming for complex async code |
| SSL certificate provisioning delay | Low (10%) | Low (+4 hours) | Use Let's Encrypt staging first; manual cert creation as backup |
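The mitigation for the Redis Lua risk calls for unit testing with concurrent users. One shape such a test could take is sketched below against an in-memory stand-in rather than Redis. `SeatPool` and `run_concurrent_acquire` are hypothetical names, and the lock plays the role that script atomicity would play in Redis; a real test would run the same assertion against the actual Lua script.

```python
import threading


class SeatPool:
    """Hypothetical in-memory stand-in for the Redis seat counter."""

    def __init__(self, max_seats: int) -> None:
        self.max_seats = max_seats
        self._sessions: set[str] = set()
        self._lock = threading.Lock()  # stands in for Lua script atomicity

    def acquire(self, session_id: str) -> bool:
        # Check-and-add must be a single atomic step; otherwise two callers
        # can both observe a free seat and both take it.
        with self._lock:
            if session_id in self._sessions:
                return True
            if len(self._sessions) >= self.max_seats:
                return False
            self._sessions.add(session_id)
            return True

    def active(self) -> int:
        with self._lock:
            return len(self._sessions)


def run_concurrent_acquire(pool: SeatPool, users: int) -> int:
    """Start `users` threads that all acquire at once; return the grant count."""
    results: list[bool] = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(users)

    def worker(i: int) -> None:
        barrier.wait()  # line every thread up before the acquire call
        granted = pool.acquire(f"session-{i}")
        with results_lock:
            results.append(granted)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


pool = SeatPool(max_seats=5)
granted = run_concurrent_acquire(pool, users=50)
print(f"granted={granted}, active={pool.active()}")
```

The property worth asserting is that the number of grants never exceeds the seat limit, regardless of how the threads interleave.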
Medium Risks
| Risk | Likelihood | Impact | Mitigation Strategy |
|---|---|---|---|
| KMS signing performance issues | Low (10%) | Low (+4 hours) | Cache public key; use async KMS client |
| PostgreSQL connection pool exhaustion | Low (5%) | Low (+2 hours) | Set appropriate pool size (20-50); monitor connections |
| Load test failures (p99 > 500ms) | Medium (25%) | Medium (+1 day) | Database query optimization; add indexes; implement caching |
Risk Response Plans
If OAuth Review Delayed:
- Use Firebase Auth emulator for development
- Continue backend development with mock JWT tokens
- Deploy to staging with test OAuth credentials
- Switch to production OAuth when approved
If Load Test Fails:
- Identify bottleneck (database, API, Redis)
- Optimize hot path queries
- Add database indexes on frequently queried columns
- Implement caching for tenant settings (Redis)
- Scale horizontally (add GKE nodes)
If Security Scan Fails:
- Review OWASP ZAP findings
- Fix critical/high vulnerabilities immediately
- Re-run security scan
- Document remediation in security log
14. Quality Gates & Success Metrics
Phase Completion Criteria
Each phase must meet the following criteria before proceeding to the next phase:
Phase 0: Infrastructure & Documentation ✅ COMPLETE
- GKE cluster deployed and operational
- Cloud SQL PostgreSQL accessible from GKE
- Redis Memorystore instance operational
- All infrastructure costs within budget ($310/month dev)
- Documentation organized to CODITECT standards (100/100)
- Diagrams cover all major system components
Phase 1: Security Services
- Cloud KMS can sign arbitrary data
- Identity Platform OAuth flow works for Google and GitHub
- JWT tokens contain correct claims (user_id, tenant_id)
- Security documentation complete
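The claims criterion above could be checked with a small helper along these lines. This is a minimal sketch: it only decodes the payload and confirms that `user_id` and `tenant_id` are present, and it does not verify the signature, which any real check must do first. The function names and the sample tokens are hypothetical.

```python
import base64
import json

REQUIRED_CLAIMS = ("user_id", "tenant_id")  # claim names from the Phase 1 gate


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def missing_claims(token: str) -> list[str]:
    """Return the required claims absent from the token payload.

    Decodes only; this does NOT verify the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = json.loads(_b64url_decode(parts[1]))
    return [claim for claim in REQUIRED_CLAIMS if claim not in payload]


def make_unsigned_token(payload: dict) -> str:
    # Illustration helper: builds a token with a placeholder signature.
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none', 'typ': 'JWT'})}.{enc(payload)}.sig"


good = make_unsigned_token({"user_id": "u-1", "tenant_id": "t-1"})
bad = make_unsigned_token({"user_id": "u-1"})
print(missing_claims(good))  # no claims missing
print(missing_claims(bad))   # tenant_id is reported as missing
```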
Phase 2: Backend Development
- All API endpoints return correct responses
- Multi-tenant isolation verified (integration tests)
- Unit test coverage ≥80%
- Redis atomic operations work correctly
- KMS signing and verification functional
Phase 3: Deployment
- Docker image builds successfully
- Kubernetes deployment healthy (3/3 pods running)
- Health check endpoint responds: https://auth.coditect.ai/health
- SSL certificate valid (Let's Encrypt)
Phase 4: Client SDK
- Client can acquire license from API
- Signature verification succeeds
- Heartbeat thread runs in background
- Offline mode works (grace period)
- Client tests pass (80%+ coverage)
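The offline-mode criterion depends on a grace-period check in the client. A minimal sketch of that check follows; the 72-hour value and the function name are assumptions, since this plan does not state the grace period length.

```python
from datetime import datetime, timedelta, timezone

# Assumed value for illustration; the plan does not specify the grace period.
GRACE_PERIOD = timedelta(hours=72)


def offline_allowed(last_validated: datetime, now: datetime) -> bool:
    """True while the last successful online validation is within the grace period.

    A negative elapsed time (local clock set backwards) is treated as a failure
    so that rolling the clock back cannot extend offline use.
    """
    elapsed = now - last_validated
    return timedelta(0) <= elapsed <= GRACE_PERIOD


now = datetime(2025, 11, 24, 12, 0, tzinfo=timezone.utc)
print(offline_allowed(now - timedelta(hours=24), now))  # inside the grace period
print(offline_allowed(now - timedelta(hours=73), now))  # grace period exceeded
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to cover in the client test suite.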
Phase 5: End-to-End Testing
- E2E test passes (acquire → heartbeat → release)
- Multi-user test passes (seat limit enforced)
- Load test passes (100 users, p99 < 500ms, error rate < 1%)
- Security scan shows no critical/high vulnerabilities
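The load-test gate above can be expressed as a pass/fail function over the raw results. The sketch below uses the nearest-rank percentile definition, which is one of several in common use and may differ slightly from what a given load-testing tool reports; the function names are illustrative.

```python
import math

P99_LIMIT_MS = 500.0     # from the Phase 5 gate: p99 < 500ms
ERROR_RATE_LIMIT = 0.01  # from the Phase 5 gate: error rate < 1%


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def load_test_passes(latencies_ms: list[float], errors: int, total: int) -> bool:
    """Apply both thresholds; both comparisons are strict, as in the gate."""
    p99 = percentile(latencies_ms, 99)
    error_rate = errors / total
    return p99 < P99_LIMIT_MS and error_rate < ERROR_RATE_LIMIT


# Synthetic data for illustration, not measured results.
fast_run = [120.0] * 990 + [450.0] * 10
slow_tail = [120.0] * 980 + [600.0] * 20
print(load_test_passes(fast_run, errors=5, total=1000))
print(load_test_passes(slow_tail, errors=5, total=1000))
```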
Phase 6: Production Hardening
- Monitoring dashboards display real-time data
- Alerts fire correctly for test scenarios
- Runbook complete and reviewed
- Deployment checklist complete
Key Performance Indicators (KPIs)
Technical KPIs:
| Metric | Target | Measurement |
|---|---|---|
| API Latency (p99) | <500ms | Prometheus histogram |
| API Error Rate | <1% | Prometheus counter |
| Database Query Time (p99) | <100ms | Prometheus histogram |
| Uptime | 99.9% | Grafana uptime panel |
| Test Coverage | ≥80% backend, ≥80% client | pytest with coverage reporting (e.g. pytest-cov) |
| Security Vulnerabilities | 0 critical/high | OWASP ZAP, Trivy |
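For on-call planning, the 99.9% uptime target can be translated into a downtime allowance. A short worked calculation, assuming a 30-day month (the function name is illustrative):

```python
def allowed_downtime_minutes(uptime_target: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime target."""
    total_minutes = days * 24 * 60
    return (1 - uptime_target) * total_minutes


monthly = allowed_downtime_minutes(0.999)
print(f"{monthly:.1f} minutes per 30-day month")
```

At 99.9% this comes to roughly 43 minutes per month, which is the margin that alert thresholds and incident response times need to fit inside.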
Business KPIs (Post-Launch):
| Metric | Target | Measurement |
|---|---|---|
| License Acquisition Success Rate | ≥99% | API logs |
| Heartbeat Reliability | ≥99.9% | API logs |
| Active Sessions by Tenant | 100-1000 | Database query |
| API Usage per Tenant | 10,000 requests/month | Kong analytics |
Acceptance Testing
Pre-Launch Checklist:
- All features functional in production environment
- Load test passed (100 concurrent users)
- Security audit completed (no critical issues)
- Disaster recovery tested (backup restore)
- Documentation complete (API, runbook, deployment)
- On-call rotation established
- Monitoring and alerting validated
Appendices
A. Glossary
- Floating License: License that limits concurrent users, not installations
- Check-on-Start: Validation pattern where license is checked at application startup
- Cloud KMS: Google Cloud Key Management Service for cryptographic operations
- Identity Platform: Google's OAuth2/OIDC service for authentication
- Redis Lua: Server-side scripting in Redis for atomic operations
- JWT: JSON Web Token (for authentication)
- TTL: Time To Live (automatic expiration in Redis)
- Heartbeat: Periodic signal to indicate session is still active
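To make the relationship between heartbeat and TTL concrete, the sketch below models session expiry in memory. `SessionTable` is hypothetical and the 360-second TTL is an assumed value; in the platform itself this behavior would come from key expiration in Redis.

```python
class SessionTable:
    """Hypothetical in-memory model of TTL-based session expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, session_id: str, now: float) -> None:
        # Each heartbeat pushes the expiry forward, like resetting a key's TTL.
        self._expires_at[session_id] = now + self.ttl_seconds

    def active_sessions(self, now: float) -> set[str]:
        # A session whose TTL has lapsed no longer occupies a seat.
        return {sid for sid, exp in self._expires_at.items() if exp > now}


table = SessionTable(ttl_seconds=360)  # assumed TTL for illustration
table.heartbeat("alice", now=0)
table.heartbeat("bob", now=0)
table.heartbeat("alice", now=300)      # alice keeps sending heartbeats
print(sorted(table.active_sessions(now=400)))
```

Here the session that stopped sending heartbeats drops out once its TTL lapses, which is how a floating license frees a seat after a client crashes without releasing it.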
B. References
- FastAPI Documentation: https://fastapi.tiangolo.com/
- SQLAlchemy Async: https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html
- Cloud KMS: https://cloud.google.com/kms/docs
- Identity Platform: https://cloud.google.com/identity-platform/docs
- Redis Lua Scripting: https://redis.io/docs/manual/programmability/eval-intro/
- GKE Documentation: https://cloud.google.com/kubernetes-engine/docs
C. Architecture Decision Records (ADRs)
See master repository docs/adrs/ for project-wide architectural decisions:
- ADR-001: OpenTofu over Terraform (licensing)
- ADR-002: FastAPI over Django (performance)
- ADR-003: Floating concurrent licensing pattern
- ADR-004: Local-first architecture (check-on-start)
- ADR-005: Cloud KMS for license signing
Document Version: 3.0 (License Management Focus)
Last Updated: November 24, 2025
Next Review: November 25, 2025 (Phase 1 kickoff)
Owner: CODITECT Infrastructure Team
Status: PHASE 0 COMPLETE ✅ | PHASE 1-6 PENDING ⏸️
Document Change History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | Nov 22, 2025 | Infrastructure Team | Initial (incorrect Django/Citus focus) |
| 2.0 | Nov 23, 2025 | Infrastructure Team | Expanded Django plan (still incorrect) |
| 3.0 | Nov 24, 2025 | Documentation Specialist | Complete rewrite for License Management Platform (FastAPI + PostgreSQL) - Phase 0 completion documented |