
Dashboard 2.0 - Cloud Migration Architecture

Version: 2.0.0 (Phase 2 - Production Multi-Tenant SaaS)
Status: Architecture Planning
Target Platform: Google Cloud Platform (GKE)
Migration Timeline: 12-16 weeks
Last Updated: 2025-11-28


Executive Summary

This document outlines the comprehensive architecture for migrating Dashboard 2.0 from a local proof-of-concept to a production-ready, multi-tenant SaaS platform deployed on Google Kubernetes Engine (GKE). The migration transforms a single-user, single-project, local SQLite application into a scalable, secure, multi-tenant platform supporting hundreds of organizations with thousands of projects and repositories.

Current State (Phase 1 - POC):

  • ✅ Single-user local deployment
  • ✅ SQLite database (single file)
  • ✅ Manual git hook installation
  • ✅ Single project/repository support
  • ✅ Hybrid AI + TF-IDF linking (proven)
  • ✅ GPS dashboard UI (production-ready)

Target State (Phase 2 - Production):

  • 🎯 Multi-tenant SaaS platform
  • 🎯 Cloud SQL PostgreSQL with tenant isolation
  • 🎯 OAuth2/JWT authentication
  • 🎯 Multi-project, multi-repository support
  • 🎯 GitHub/GitLab webhook integration
  • 🎯 Horizontal autoscaling (100-10,000 users)
  • 🎯 99.9% uptime SLA

Investment Required: $180K development + $2.4K/month operational costs
Expected Timeline: 12-16 weeks (3 phases)
ROI Target: Break-even at 120 paying customers ($10K MRR)
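As a quick sanity check on these figures (arithmetic only; per-customer pricing is not stated in this document):

```python
# Back-of-envelope check of the ROI target above
monthly_opex = 2_400          # $/month operational cost
breakeven_customers = 120
target_mrr = 10_000           # $ MRR at break-even

# Implied average revenue per customer per month
arpu = target_mrr / breakeven_customers
print(round(arpu, 2))  # 83.33

# Break-even MRR comfortably covers monthly operational cost;
# the $180K development cost amortizes separately over time.
assert target_mrr > monthly_opex
```

So break-even implies roughly $83/customer/month on average, a figure worth keeping in mind when setting plan-tier pricing.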


Table of Contents

  1. Phase 2 Architecture Overview
  2. Multi-Tenant Data Isolation
  3. Database Migration Strategy
  4. Authentication & Authorization
  5. Multi-Repository Support
  6. Scalability & Performance
  7. Security Architecture
  8. Deployment Architecture (GKE)
  9. Cost Analysis
  10. Migration Roadmap
  11. Risk Assessment
  12. Success Metrics

Phase 2 Architecture Overview

High-Level Architecture Diagram

INTERNET (Users)
    │
    ▼
Google Cloud Load Balancer (global HTTPS, SSL/TLS)
    │
    ▼
Cloud Armor (WAF / DDoS protection)
    │
    ▼
GKE Cluster (multi-zone, us-central1)
    ├─ Ingress Controller (nginx)
    ├─ Frontend Pods (React SPA) ── HPA: 2-20 pods
    │    • GPS UI, dashboard, filters
    │    • Calls the backend over the REST API
    ├─ Backend Pods (FastAPI) ── HPA: 3-50 pods
    │    • v2.0 API, hybrid AI, TF-IDF, auth
    └─ Redis Cache (sessions) ── 3 nodes, HA
    │
    ▼
Data and messaging services
    ├─ Cloud SQL PostgreSQL (multi-tenant DB)
    │    • HA: primary + read replicas, automated backups, point-in-time recovery
    │    • tenant_id-based row-level security
    ├─ Cloud Storage (artifacts, logs)
    │    • Task exports, backups, user uploads
    ├─ Pub/Sub (webhooks)
    │    • GitHub, GitLab, Bitbucket
    └─ Secret Manager (API keys, OAuth)
         • Anthropic API key, GitHub tokens, JWT secrets
    │
    ▼
External integrations: GitHub API │ GitLab API │ Bitbucket API │ Anthropic Claude

Component Breakdown

1. Frontend (React SPA)

  • Current: Vanilla JS, Python SimpleHTTPServer
  • Target: React 18 + TypeScript, served via nginx
  • Features:
    • GPS navigation dashboard (migrated from POC)
    • Multi-project switcher
    • Organization/team management UI
    • Repository connection wizard
    • Real-time updates (WebSocket)

2. Backend API (Flask → FastAPI)

  • Current: Flask with SQLite
  • Target: FastAPI (async) with PostgreSQL
  • Why FastAPI:
    • Native async/await support (better performance)
    • Automatic OpenAPI docs
    • Pydantic validation (type safety)
    • WebSocket support built-in
  • Endpoints:
    • All v2.0 endpoints (migrated)
    • Multi-tenant authentication
    • Organization/team management
    • Repository webhook handlers
    • Subscription/billing API

3. Database (Cloud SQL PostgreSQL)

  • Instance: db-n1-standard-4 (4 vCPU, 15 GB RAM)
  • Storage: 100 GB SSD (auto-expand to 500 GB)
  • Backups: Daily automated + point-in-time recovery
  • High Availability: Primary + read replica (auto-failover)
  • Connection: Private IP (VPC peering with GKE)

4. Caching Layer (Redis)

  • Purpose: Session storage, API response caching, rate limiting
  • Configuration: 3-node cluster (HA), 4 GB memory per node
  • Cache Strategy:
    • Sessions: 24-hour TTL
    • Task lists: 5-minute TTL (invalidate on update)
    • AI link suggestions: 1-hour TTL

5. Message Queue (Pub/Sub)

  • Purpose: Webhook event processing, async AI analysis
  • Topics:
    • webhook-events (GitHub/GitLab push events)
    • ai-analysis-requests (commit analysis jobs)
    • tenant-provisioning (new signup workflows)
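For illustration, a message published to `webhook-events` might be shaped like this; the field names mirror the webhook handler example later in this document, while the envelope builder itself is an assumption:

```python
import json

def build_webhook_event(tenant_id: str, repository_id: str, commit: dict) -> bytes:
    """Serialize one push-event commit into a Pub/Sub message body."""
    envelope = {
        "tenant_id": tenant_id,
        "repository_id": repository_id,
        "sha": commit["id"],
        "message": commit["message"],
        "author_name": commit["author"]["name"],
        "author_email": commit["author"]["email"],
        "timestamp": commit["timestamp"],
    }
    # Pub/Sub message data must be bytes
    return json.dumps(envelope).encode("utf-8")

commit = {
    "id": "a1b2c3",
    "message": "TASK-DASH-0001: fix login",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "timestamp": "2025-11-28T12:00:00Z",
}
data = build_webhook_event("t-1", "r-1", commit)
```

Keeping the envelope a plain JSON document keeps the workers decoupled from the webhook handler's web framework.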

Multi-Tenant Data Isolation

Tenant Isolation Strategy

We use a shared database and shared schema with row-level security (RLS), which gives the best cost/performance balance at this scale.

Data Model

-- Core tenant table
CREATE TABLE tenants (
    tenant_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_name VARCHAR(255) NOT NULL,
    subdomain VARCHAR(63) UNIQUE NOT NULL,  -- e.g., "acme" → acme.dashboard.app
    plan_tier VARCHAR(50) NOT NULL,         -- free, pro, enterprise
    max_users INTEGER NOT NULL DEFAULT 5,
    max_projects INTEGER NOT NULL DEFAULT 3,
    max_repositories INTEGER NOT NULL DEFAULT 10,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'active',    -- active, suspended, deleted

    -- Billing
    stripe_customer_id VARCHAR(255),
    subscription_status VARCHAR(50),        -- trialing, active, past_due, canceled
    trial_ends_at TIMESTAMP,

    -- Feature flags
    features JSONB DEFAULT '{}'::jsonb      -- {"ai_linking": true, "webhooks": true}
);

CREATE INDEX idx_tenants_subdomain ON tenants(subdomain);
CREATE INDEX idx_tenants_status ON tenants(status);

-- Users (multi-tenant)
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    email VARCHAR(255) NOT NULL,
    full_name VARCHAR(255),
    password_hash VARCHAR(255),             -- bcrypt
    role VARCHAR(50) NOT NULL DEFAULT 'member',  -- owner, admin, member, viewer

    -- OAuth
    github_user_id INTEGER,
    github_username VARCHAR(255),
    gitlab_user_id INTEGER,

    -- Status
    is_active BOOLEAN DEFAULT true,
    email_verified BOOLEAN DEFAULT false,
    last_login_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, email)
);

CREATE INDEX idx_users_tenant ON users(tenant_id);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_github ON users(github_user_id) WHERE github_user_id IS NOT NULL;

-- Enable Row-Level Security
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

-- RLS Policy: Users can only see their own tenant's users
CREATE POLICY users_tenant_isolation ON users
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Projects (replaces single project from POC)
CREATE TABLE projects (
    project_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    short_name VARCHAR(10) NOT NULL,        -- CODITECT standard (2-6 uppercase letters)

    -- Project settings
    enable_ai_linking BOOLEAN DEFAULT true,
    ai_confidence_threshold DECIMAL(3,2) DEFAULT 0.30,
    enable_webhooks BOOLEAN DEFAULT true,

    created_by UUID REFERENCES users(user_id),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, short_name)
);

CREATE INDEX idx_projects_tenant ON projects(tenant_id);

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY projects_tenant_isolation ON projects
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Repositories (NEW - multi-repo support)
CREATE TABLE repositories (
    repository_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    project_id UUID REFERENCES projects(project_id) ON DELETE CASCADE,

    -- Repository identification
    provider VARCHAR(50) NOT NULL,          -- github, gitlab, bitbucket
    provider_repo_id INTEGER NOT NULL,      -- GitHub repo ID
    full_name VARCHAR(255) NOT NULL,        -- "owner/repo"
    clone_url TEXT,
    default_branch VARCHAR(255) DEFAULT 'main',

    -- Webhook
    webhook_id VARCHAR(255),                -- Provider's webhook ID
    webhook_secret VARCHAR(255),            -- HMAC secret for verification
    webhook_enabled BOOLEAN DEFAULT false,

    -- Sync status
    last_sync_at TIMESTAMP,
    sync_status VARCHAR(50) DEFAULT 'pending',  -- pending, syncing, synced, error

    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, provider, provider_repo_id)
);

CREATE INDEX idx_repositories_tenant ON repositories(tenant_id);
CREATE INDEX idx_repositories_project ON repositories(project_id);
CREATE INDEX idx_repositories_provider ON repositories(provider, provider_repo_id);

ALTER TABLE repositories ENABLE ROW LEVEL SECURITY;
CREATE POLICY repositories_tenant_isolation ON repositories
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Tasks (migrated from POC, multi-tenant)
CREATE TABLE tasks (
    task_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    project_id UUID NOT NULL REFERENCES projects(project_id) ON DELETE CASCADE,

    -- Task identification (CODITECT standard)
    task_identifier VARCHAR(50) NOT NULL,   -- TASK-PROJ-0001
    title VARCHAR(500) NOT NULL,
    description TEXT,

    -- Status and metadata
    status VARCHAR(50) NOT NULL DEFAULT 'pending',  -- pending, in_progress, completed, blocked
    complexity VARCHAR(10),                 -- S, M, L, XL
    complexity_hours INTEGER,
    assignee_id UUID REFERENCES users(user_id),
    due_date DATE,

    -- Time tracking
    time_estimate INTEGER,                  -- hours
    time_actual INTEGER,
    time_remaining INTEGER,

    -- Metadata
    tags JSONB DEFAULT '[]'::jsonb,
    acceptance_criteria JSONB DEFAULT '[]'::jsonb,
    dependencies JSONB DEFAULT '[]'::jsonb, -- [{"task_id": "TASK-PROJ-0002"}]
    blockers TEXT,
    notes TEXT,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, project_id, task_identifier)
);

CREATE INDEX idx_tasks_tenant ON tasks(tenant_id);
CREATE INDEX idx_tasks_project ON tasks(project_id);
CREATE INDEX idx_tasks_identifier ON tasks(task_identifier);
CREATE INDEX idx_tasks_status ON tasks(status);

ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
CREATE POLICY tasks_tenant_isolation ON tasks
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Commits (multi-repo support)
CREATE TABLE commits (
    commit_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    repository_id UUID NOT NULL REFERENCES repositories(repository_id) ON DELETE CASCADE,

    sha VARCHAR(40) NOT NULL,
    message TEXT NOT NULL,
    author_name VARCHAR(255),
    author_email VARCHAR(255),
    committer_name VARCHAR(255),
    committer_email VARCHAR(255),
    committed_at TIMESTAMP NOT NULL,

    -- Diff (optional, for AI analysis)
    files_changed JSONB DEFAULT '[]'::jsonb,  -- [{"path": "src/main.py", "additions": 10, "deletions": 5}]
    additions INTEGER DEFAULT 0,
    deletions INTEGER DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, repository_id, sha)
);

CREATE INDEX idx_commits_tenant ON commits(tenant_id);
CREATE INDEX idx_commits_repository ON commits(repository_id);
CREATE INDEX idx_commits_sha ON commits(sha);
CREATE INDEX idx_commits_committed_at ON commits(committed_at DESC);

ALTER TABLE commits ENABLE ROW LEVEL SECURITY;
CREATE POLICY commits_tenant_isolation ON commits
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Task-Commit Links (migrated from POC)
CREATE TABLE task_commit_links (
    link_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
    task_id UUID NOT NULL REFERENCES tasks(task_id) ON DELETE CASCADE,
    commit_id UUID NOT NULL REFERENCES commits(commit_id) ON DELETE CASCADE,

    -- Link metadata
    confidence DECIMAL(5,4) NOT NULL,       -- 0.0000 to 1.0000
    method VARCHAR(50) NOT NULL,            -- ai, tfidf, hybrid, manual
    reasoning TEXT,                         -- AI-generated explanation

    -- Manual override
    is_manual BOOLEAN DEFAULT false,
    created_by UUID REFERENCES users(user_id),

    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(tenant_id, task_id, commit_id)
);

CREATE INDEX idx_links_tenant ON task_commit_links(tenant_id);
CREATE INDEX idx_links_task ON task_commit_links(task_id);
CREATE INDEX idx_links_commit ON task_commit_links(commit_id);
CREATE INDEX idx_links_confidence ON task_commit_links(confidence DESC);

ALTER TABLE task_commit_links ENABLE ROW LEVEL SECURITY;
CREATE POLICY links_tenant_isolation ON task_commit_links
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

Setting Current Tenant

Every API request sets the tenant context:

# FastAPI middleware
@app.middleware("http")
async def set_tenant_context(request: Request, call_next):
    # Extract tenant from JWT or subdomain
    tenant_id = extract_tenant_from_request(request)

    # Set the PostgreSQL session variable the RLS policies read.
    # set_config(..., true) is transaction-scoped and avoids
    # interpolating the value into the SQL string.
    async with database.session() as session:
        await session.execute(
            "SELECT set_config('app.current_tenant_id', :tenant_id, true)",
            {"tenant_id": str(tenant_id)},
        )
        response = await call_next(request)

    return response

Benefits:

  • ✅ Automatic tenant isolation (no manual WHERE clauses)
  • ✅ Prevents cross-tenant data leaks
  • ✅ Database-level enforcement (even if app has bugs)
  • ✅ Performance: PostgreSQL query planner optimizes for RLS

Database Migration Strategy

Migration Phases

Phase 1: Schema Migration (Week 1-2)

Objective: Transform POC schema to multi-tenant production schema

Steps:

  1. Export POC data from SQLite

    sqlite3 data/dashboard.db .dump > poc_data_export.sql
  2. Create Cloud SQL PostgreSQL instance

    gcloud sql instances create dashboard-prod \
      --database-version=POSTGRES_15 \
      --tier=db-n1-standard-4 \
      --region=us-central1 \
      --availability-type=REGIONAL \
      --backup-start-time=03:00 \
      --enable-point-in-time-recovery \
      --storage-size=100GB \
      --storage-auto-increase
  3. Run migration scripts

    -- migrations/001_create_multi_tenant_schema.sql
    -- (Full schema from above)

    -- migrations/002_migrate_poc_data.sql
    -- Create default tenant for POC data
    INSERT INTO tenants (organization_name, subdomain, plan_tier)
    VALUES ('CODITECT Team', 'coditect', 'enterprise')
    RETURNING tenant_id; -- placeholder shown below as abc-123-def

    -- Migrate tasks (add tenant_id)
    INSERT INTO tasks (tenant_id, project_id, task_identifier, title, ...)
    SELECT
        'abc-123-def'::uuid, -- POC tenant (use the real UUID returned above)
        (SELECT project_id FROM projects WHERE short_name = 'DASH'),
        'TASK-DASH-' || LPAD(id::text, 4, '0'), -- Convert old ID format
        title,
        ...
    FROM poc_tasks;
  4. Validate migration

    • Row counts match (tasks, commits, links)
    • All foreign keys valid
    • RLS policies work correctly

Phase 2: Connection Pool Setup (Week 2)

SQLAlchemy Async Configuration:

from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker
from google.cloud.sql.connector import Connector

# Cloud SQL connector (IAM auth); reuse one Connector across connections
connector = Connector()

async def getconn():
    return await connector.connect_async(
        "project-id:us-central1:dashboard-prod",
        "asyncpg",
        user="dashboard-app@project-id.iam",
        enable_iam_auth=True,
        db="dashboard",
    )

# Engine with connection pooling
engine = create_async_engine(
    "postgresql+asyncpg://",
    async_creator=getconn,
    pool_size=20,         # Per backend pod
    max_overflow=10,
    pool_pre_ping=True,   # Validate connections before use
    pool_recycle=3600,    # Recycle after 1 hour
)

# Session factory
AsyncSessionLocal = sessionmaker(
    engine, class_=AsyncSession, expire_on_commit=False
)
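One number worth checking before launch: combined with the backend HPA ceiling of 50 pods from the scalability section, these pool settings bound the total connection count, while Cloud SQL instances default to at most a few hundred allowed connections:

```python
# Upper bound on Postgres connections from the backend tier
pool_size = 20         # per pod, from the engine config above
max_overflow = 10
max_backend_pods = 50  # HPA maxReplicas from the scalability section

max_connections = max_backend_pods * (pool_size + max_overflow)
print(max_connections)  # 1500
```

If that exceeds the instance's `max_connections`, the options are smaller per-pod pools, a raised instance limit, or a pooler such as pgbouncer in front of Cloud SQL.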

Phase 3: Read Replica Setup (Week 3)

Configuration:

# Replica in a different region for disaster recovery
gcloud sql instances create dashboard-prod-replica \
  --master-instance-name=dashboard-prod \
  --region=us-east1 \
  --replica-type=READ \
  --availability-type=ZONAL

Query Routing:

from sqlalchemy import text

# Read from the replica for analytics/reports (no writes needed)
@app.get("/api/v2/analytics/task-velocity")
async def get_task_velocity(tenant_id: UUID):
    async with read_replica_session() as session:
        result = await session.execute(
            text("""
                -- tasks has no completed_at column; use updated_at of
                -- completed tasks as the completion timestamp
                SELECT DATE_TRUNC('week', updated_at) AS week, COUNT(*)
                FROM tasks
                WHERE tenant_id = :tenant_id AND status = 'completed'
                GROUP BY 1 ORDER BY 1 DESC LIMIT 12
            """),
            {"tenant_id": tenant_id},
        )
        return result.fetchall()

Authentication & Authorization

OAuth 2.0 + JWT Architecture

Browser
   │ 1. Click "Sign in with GitHub"
   ▼
Frontend (React SPA): redirects to /auth/github
   │ 2. GET /auth/github?redirect_uri=...
   ▼
Backend API (FastAPI): redirects to GitHub OAuth
   │ 3. Redirect to github.com/login/oauth/authorize
   ▼
GitHub OAuth: user authorizes
   │ 4. Callback: GET /auth/github/callback?code=abc123
   ▼
Backend API:
   1. Exchange code for access token
   2. Fetch user info from GitHub API
   3. Create/update user in database
   4. Generate JWT token
   5. Set httpOnly cookie
   │ 5. Redirect to /dashboard with session cookie
   ▼
Frontend Dashboard: authenticated!

JWT Token Structure

{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "user-uuid-abc-123",
    "tenant_id": "tenant-uuid-def-456",
    "email": "user@example.com",
    "role": "admin",
    "iat": 1732800000,
    "exp": 1732886400,
    "permissions": [
      "tasks:read",
      "tasks:write",
      "projects:read",
      "projects:write",
      "repositories:manage",
      "users:invite"
    ]
  },
  "signature": "..."
}
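To make the structure concrete, a token of this shape can be assembled with the standard library alone. HS256 is used below purely to avoid a key-pair dependency; production uses RS256 as the header above shows:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

token = make_jwt(
    {"sub": "user-uuid-abc-123", "tenant_id": "tenant-uuid-def-456", "role": "admin"},
    b"dev-only-secret",
)
assert token.count(".") == 2  # header.payload.signature
```

With RS256 the structure is identical; only the signature step changes, and verifiers need just the public key.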

Role-Based Access Control (RBAC)

| Role | Permissions |
| --- | --- |
| Owner | All permissions, manage billing, delete organization |
| Admin | Manage projects, invite users, configure webhooks |
| Member | Create/edit tasks, link commits, view all projects |
| Viewer | Read-only access to tasks and dashboards |
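The role table can be encoded as a plain role-to-permissions mapping. The permission strings follow the JWT example above; the exact set granted to each role is an assumption:

```python
ALL_PERMISSIONS = {
    "tasks:read", "tasks:write",
    "projects:read", "projects:write",
    "repositories:manage", "users:invite",
    "billing:manage", "organization:delete",
}

# Assumed role → permission mapping, mirroring the table above
ROLE_PERMISSIONS = {
    "owner": ALL_PERMISSIONS,
    "admin": {"tasks:read", "tasks:write", "projects:read", "projects:write",
              "repositories:manage", "users:invite"},
    "member": {"tasks:read", "tasks:write", "projects:read"},
    "viewer": {"tasks:read", "projects:read"},
}

def has_permission(role: str, permission: str) -> bool:
    """Unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Computing the `permissions` claim from the role at token-issue time keeps the check in `require_permission` a simple set membership test.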

Implementation:

from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from sqlalchemy import text
import jwt

security = HTTPBearer()

async def get_current_user(
    token: HTTPAuthorizationCredentials = Depends(security),
) -> User:
    try:
        payload = jwt.decode(
            token.credentials,
            settings.JWT_PUBLIC_KEY,
            algorithms=["RS256"],
        )
        user_id = UUID(payload["sub"])
        tenant_id = UUID(payload["tenant_id"])

        # Fetch user from database (with tenant context set for RLS)
        async with AsyncSessionLocal() as session:
            await session.execute(
                text("SELECT set_config('app.current_tenant_id', :tid, true)"),
                {"tid": str(tenant_id)},
            )
            user = await session.get(User, user_id)
            if not user or not user.is_active:
                raise HTTPException(status_code=401, detail="Invalid user")
            return user
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except HTTPException:
        raise  # don't mask the 401 raised above
    except Exception:
        raise HTTPException(status_code=401, detail="Invalid token")

# Permission decorator
def require_permission(permission: str):
    async def permission_checker(user: User = Depends(get_current_user)) -> User:
        if permission not in user.permissions:
            raise HTTPException(status_code=403, detail="Insufficient permissions")
        return user
    return permission_checker

# Usage
@app.post("/api/v2/projects")
async def create_project(
    project: ProjectCreate,
    user: User = Depends(require_permission("projects:write")),
):
    # user is guaranteed to have the projects:write permission
    ...

Multi-Repository Support

Webhook Architecture

GitHub/GitLab push event
   │ POST /webhooks/github/{repo_id}
   │ X-Hub-Signature-256: sha256=...
   ▼
GKE Load Balancer
   │
   ▼
Backend Pod (webhook handler)
   1. Verify HMAC signature
   2. Extract commits from payload
   3. Publish to Pub/Sub
   4. Return 200 OK (< 10 s)
   │ publishes to Pub/Sub
   ▼
Pub/Sub topic: webhook-events
   │ subscribers (async workers)
   ▼
Worker Pods (3-10 instances)
   1. Pull message from Pub/Sub
   2. Store commits in database
   3. Trigger AI analysis
   4. Send to ai-analysis-requests
   5. Ack message
   │
   ▼
Pub/Sub topic: ai-analysis-requests
   │
   ▼
AI Analysis Workers (2-5 pods)
   1. Fetch commit + tasks
   2. Call Anthropic Claude API
   3. Generate link suggestions
   4. Store in task_commit_links
   5. Notify frontend via WebSocket

GitHub Webhook Setup (Automated)

# When a user connects a repository
async def setup_github_webhook(
    session: AsyncSession, repo: Repository, github_access_token: str
) -> bool:
    """
    Automatically create a webhook on the GitHub repository
    """
    webhook_url = f"{settings.API_BASE_URL}/webhooks/github/{repo.repository_id}"
    webhook_secret = secrets.token_urlsafe(32)

    # Create webhook via GitHub API
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://api.github.com/repos/{repo.full_name}/hooks",
            headers={
                "Authorization": f"Bearer {github_access_token}",
                "Accept": "application/vnd.github+json",
            },
            json={
                "name": "web",
                "active": True,
                "events": ["push", "pull_request"],
                "config": {
                    "url": webhook_url,
                    "content_type": "json",
                    "secret": webhook_secret,
                    "insecure_ssl": "0",
                },
            },
        )

    if response.status_code == 201:
        webhook_data = response.json()

        # Store webhook credentials (session is provided by the caller)
        repo.webhook_id = str(webhook_data["id"])
        repo.webhook_secret = webhook_secret
        repo.webhook_enabled = True

        await session.commit()
        return True

    return False

# Webhook handler
@app.post("/webhooks/github/{repo_id}")
async def github_webhook(
    repo_id: UUID,
    request: Request,
    x_hub_signature_256: str = Header(None),
):
    """
    Handle GitHub push events
    """
    # 1. Fetch repository
    async with AsyncSessionLocal() as session:
        repo = await session.get(Repository, repo_id)
        if not repo or not repo.webhook_enabled:
            raise HTTPException(status_code=404)

    # 2. Verify HMAC signature against the raw request body
    body = await request.body()
    expected_signature = "sha256=" + hmac.new(
        repo.webhook_secret.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()

    if not x_hub_signature_256 or not hmac.compare_digest(
        expected_signature, x_hub_signature_256
    ):
        raise HTTPException(status_code=401, detail="Invalid signature")

    # 3. Parse payload
    payload = await request.json()
    event_type = request.headers.get("X-GitHub-Event")

    if event_type == "push":
        # Publish each commit to Pub/Sub for async processing
        for commit in payload.get("commits", []):
            await pubsub_publisher.publish(
                topic="webhook-events",
                message={
                    "tenant_id": str(repo.tenant_id),
                    "repository_id": str(repo.repository_id),
                    "sha": commit["id"],
                    "message": commit["message"],
                    "author_name": commit["author"]["name"],
                    "author_email": commit["author"]["email"],
                    "timestamp": commit["timestamp"],
                    "url": commit["url"],
                },
            )

    # 4. Return 200 OK quickly (GitHub times out at 10s)
    return {"status": "received"}

Multi-Provider Support

| Provider | Webhook Event | Authentication | Rate Limits |
| --- | --- | --- | --- |
| GitHub | push, pull_request | HMAC SHA-256 | 5,000 req/hr |
| GitLab | Push Hook, Merge Request Hook | Secret token | 600 req/min |
| Bitbucket | repo:push, pullrequest:created | HMAC SHA-256 | 1,000 req/hr |

Scalability & Performance

Horizontal Pod Autoscaling (HPA)

# Backend API HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dashboard-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dashboard-backend
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 5
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60

Caching Strategy

Redis Cache Layers:

import hashlib
import pickle
from functools import wraps

from redis.asyncio import Redis

redis_client = Redis(
    host="redis-cluster",
    port=6379,
    db=0,
    decode_responses=False,  # Binary mode for pickled values
)

def make_cache_key(func_name: str, *args, **kwargs) -> str:
    # Hash the pickled arguments so keys stay short and printable
    digest = hashlib.sha256(pickle.dumps((args, kwargs))).hexdigest()
    return f"{func_name}:{digest}"

# Cache decorator
def cached(ttl: int = 300):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            cache_key = make_cache_key(func.__name__, *args, **kwargs)

            # Check cache
            cached_value = await redis_client.get(cache_key)
            if cached_value:
                return pickle.loads(cached_value)

            # Execute function and store the result
            result = await func(*args, **kwargs)
            await redis_client.setex(cache_key, ttl, pickle.dumps(result))

            return result
        return wrapper
    return decorator

# Usage
@cached(ttl=300)  # 5-minute cache
async def get_task_list(tenant_id: UUID, project_id: UUID):
    async with AsyncSessionLocal() as session:
        tasks = await session.execute(
            select(Task)
            .where(Task.tenant_id == tenant_id)
            .where(Task.project_id == project_id)
        )
        return tasks.scalars().all()

Cache Invalidation:

# After task update
@app.put("/api/v2/tasks/{task_id}")
async def update_task(task_id: UUID, task_update: TaskUpdate):
    # Update database
    ...

    # Invalidate cache (build the same key the decorator uses)
    await redis_client.delete(
        make_cache_key("get_task_list", tenant_id, project_id)
    )

    # Notify WebSocket clients
    await websocket_manager.broadcast(
        tenant_id,
        {"type": "task_updated", "task_id": str(task_id)},
    )

Performance Targets

| Metric | Target | Measurement |
| --- | --- | --- |
| API Response Time (p95) | < 200 ms | Prometheus + Grafana |
| API Response Time (p99) | < 500 ms | Prometheus + Grafana |
| Database Query Time (p95) | < 50 ms | Cloud SQL Insights |
| Cache Hit Rate | > 80% | Redis INFO stats |
| WebSocket Latency | < 100 ms | Custom metrics |
| Concurrent Users | 10,000+ | Load testing (k6) |
| Requests/second | 5,000+ | Load testing (k6) |
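The throughput and latency targets can be cross-checked with Little's law (in-flight requests = arrival rate × time in system). A rough check at the HPA ceiling of 50 backend pods:

```python
target_rps = 5_000       # requests/second target
p95_latency_s = 0.200    # 200 ms p95 target
max_backend_pods = 50    # HPA maxReplicas

# Little's law: concurrent in-flight requests at steady state
in_flight = target_rps * p95_latency_s
per_pod = in_flight / max_backend_pods
print(in_flight, per_pod)  # 1000.0 20.0
```

Roughly 20 concurrent requests per pod at p95 latency is a comfortable load for an async FastAPI worker, so the targets are at least self-consistent.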

Security Architecture

Security Layers

  1. Network Security

    • VPC isolation (GKE cluster in private network)
    • Cloud Armor (WAF, DDoS protection, rate limiting)
    • Private IP for Cloud SQL (no public internet access)
    • VPC peering between GKE and Cloud SQL
  2. Application Security

    • JWT with RS256 (asymmetric keys)
    • CSRF protection (double-submit cookie)
    • XSS protection (Content-Security-Policy headers)
    • Input validation (Pydantic models)
    • SQL injection prevention (parameterized queries)
  3. Data Security

    • Encryption at rest (Cloud SQL automatic encryption)
    • Encryption in transit (TLS 1.3)
    • Row-level security (PostgreSQL RLS)
    • Secrets in Secret Manager (not env vars)
    • API key rotation (30-day policy)
  4. Compliance

    • GDPR: Right to delete (cascade deletes), data export
    • SOC 2 Type II: Audit logging, access controls
    • HIPAA (if needed): BAA with GCP, encryption, logging

Security Checklist​

  • Enable Cloud Armor with OWASP Core Rule Set
  • Configure VPC Service Controls
  • Setup Cloud IAM least-privilege policies
  • Enable Cloud Audit Logs
  • Configure Secret Manager with automatic rotation
  • Setup Vulnerability Scanning (Artifact Registry)
  • Enable Binary Authorization for container images
  • Configure Network Policies (Kubernetes)
  • Setup WAF rules for SQL injection, XSS
  • Enable DDoS protection (Cloud Armor)
  • Configure rate limiting (per tenant, per IP)
  • Setup security scanning in CI/CD (Snyk, Dependabot)

Deployment Architecture (GKE)​

GKE Cluster Configuration​

# cluster.yaml
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  name: dashboard-prod
spec:
  location: us-central1
  initialNodeCount: 3

  # Release channel (automatic updates)
  releaseChannel:
    channel: REGULAR

  # VPC-native cluster
  ipAllocationPolicy:
    clusterSecondaryRangeName: pods
    servicesSecondaryRangeName: services

  # Network config
  networkConfig:
    enableIntraNodeVisibility: true

  # Workload Identity (IAM for pods)
  workloadIdentityConfig:
    workloadPool: project-id.svc.id.goog

  # Binary Authorization
  binaryAuthorization:
    evaluationMode: PROJECT_SINGLETON_POLICY_ENFORCE

  # Private cluster
  privateClusterConfig:
    enablePrivateNodes: true
    enablePrivateEndpoint: false
    masterIpv4CidrBlock: 172.16.0.0/28

  # Master authorized networks
  masterAuthorizedNetworksConfig:
    enabled: true
    cidrBlocks:
      - displayName: "Office"
        cidrBlock: "203.0.113.0/24"

  # Node pools
  nodePools:
    - name: default-pool
      initialNodeCount: 3
      config:
        machineType: e2-standard-4
        diskSizeGb: 100
        diskType: pd-ssd
        preemptible: false

        # Workload Identity
        workloadMetadataConfig:
          mode: GKE_METADATA

        # OAuth scopes
        oauthScopes:
          - https://www.googleapis.com/auth/cloud-platform

        # Resource labels
        labels:
          env: production
          app: dashboard

      # Autoscaling
      autoscaling:
        enabled: true
        minNodeCount: 3
        maxNodeCount: 10

      # Node management
      management:
        autoUpgrade: true
        autoRepair: true

Kubernetes Manifests​

Backend Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard-backend
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: dashboard-backend
  template:
    metadata:
      labels:
        app: dashboard-backend
    spec:
      serviceAccountName: dashboard-backend-sa

      # Init container (run migrations)
      initContainers:
        - name: migrate-db
          image: gcr.io/project-id/dashboard-backend:latest
          command: ["alembic", "upgrade", "head"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-secrets
                  key: connection-string

      # Main container
      containers:
        - name: backend
          image: gcr.io/project-id/dashboard-backend:latest
          ports:
            - containerPort: 8000
              protocol: TCP

          env:
            - name: ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-secrets
                  key: connection-string
            - name: REDIS_URL
              value: "redis://redis-cluster:6379"
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic

          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10

          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5

          # Resources
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2000m
              memory: 4Gi

          # Security context
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            runAsUser: 1000
            capabilities:
              drop:
                - ALL
Frontend Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard-frontend
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dashboard-frontend
  template:
    metadata:
      labels:
        app: dashboard-frontend
    spec:
      containers:
        - name: frontend
          image: gcr.io/project-id/dashboard-frontend:latest
          ports:
            - containerPort: 80
              protocol: TCP

          # nginx configuration
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d

          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi

      volumes:
        - name: nginx-config
          configMap:
            name: nginx-config

CI/CD Pipeline (Cloud Build)​

# cloudbuild.yaml
steps:
  # 1. Run tests
  - name: 'python:3.14'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        pip install -r requirements.txt
        pytest tests/ -v --cov=backend --cov-report=xml

  # 2. Build Docker image
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/dashboard-backend:$COMMIT_SHA'
      - '-t'
      - 'gcr.io/$PROJECT_ID/dashboard-backend:latest'
      - '.'

  # 3. Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - 'gcr.io/$PROJECT_ID/dashboard-backend:$COMMIT_SHA'

  # 4. Deploy to GKE
  - name: 'gcr.io/cloud-builders/kubectl'
    args:
      - 'set'
      - 'image'
      - 'deployment/dashboard-backend'
      - 'backend=gcr.io/$PROJECT_ID/dashboard-backend:$COMMIT_SHA'
      - '--namespace=production'
    env:
      - 'CLOUDSDK_COMPUTE_REGION=us-central1'
      - 'CLOUDSDK_CONTAINER_CLUSTER=dashboard-prod'

  # 5. Wait for rollout
  - name: 'gcr.io/cloud-builders/kubectl'
    args:
      - 'rollout'
      - 'status'
      - 'deployment/dashboard-backend'
      - '--namespace=production'
      - '--timeout=5m'

options:
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

Cost Analysis​

GCP Monthly Costs (Projected)​

For 500 active tenants, 5,000 users, 50,000 tasks, 100K commits/month

| Service | Configuration | Monthly Cost |
| --- | --- | --- |
| GKE Cluster | 3-10 e2-standard-4 nodes | $450-$1,500 |
| Cloud SQL | db-n1-standard-4 + replica | $600 |
| Cloud Storage | 500 GB (backups, artifacts) | $10 |
| Cloud Load Balancer | Global HTTPS, ~100M requests | $50 |
| Cloud Armor | WAF rules, DDoS protection | $50 |
| Pub/Sub | 10M messages/month | $4 |
| Secret Manager | 50 secrets, 1M accesses | $1 |
| Cloud Build | 100 builds/month (10 min avg) | $15 |
| Cloud Logging | 100 GB logs/month | $50 |
| Cloud Monitoring | Metrics, dashboards, alerts | $20 |
| Egress (Internet) | 500 GB/month | $40 |
| Redis (Memorystore) | 12 GB HA cluster | $120 |
| Anthropic Claude API | 5M tokens/month (avg) | $75 |
| **Total (Minimum)** | | ~$1,485/month |
| **Total (Scale to 5K users)** | | ~$2,400/month |

Cost per Tenant​

  • Free tier (5 users, 3 projects): ~$1.50/month/tenant
  • Pro tier (25 users, 20 projects): ~$3.50/month/tenant
  • Enterprise tier (unlimited): Custom pricing

Revenue Requirements (Break-Even)​

Pricing Model:

  • Free: $0/month (limited features)
  • Pro: $49/month per organization
  • Enterprise: $199/month per organization

Break-Even Analysis:

  • Fixed costs: $2,400/month
  • Break-even: 49 Pro customers ($2,401/month) OR 13 Enterprise customers ($2,587/month)
  • Target (profitability): 120 customers = $5,880/month revenue
  • Profit margin at 120 customers: ~60% ($3,480 profit/month)
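The break-even arithmetic can be checked directly from the prices and fixed costs in this section. Note that 13 (not 12) Enterprise customers are needed to clear $2,400, since 12 × $199 = $2,388:

```python
import math

FIXED_COSTS = 2400      # $/month operational costs
PRO_PRICE = 49          # $/month per organization
ENTERPRISE_PRICE = 199  # $/month per organization

# Smallest customer count whose revenue covers fixed costs
pro_break_even = math.ceil(FIXED_COSTS / PRO_PRICE)            # 49
ent_break_even = math.ceil(FIXED_COSTS / ENTERPRISE_PRICE)     # 13

# Profitability target: 120 customers, all on the Pro tier
revenue_at_target = 120 * PRO_PRICE                            # 5880
profit = revenue_at_target - FIXED_COSTS                       # 3480
margin = profit / revenue_at_target                            # ~0.59
```

The ~60% margin quoted above corresponds to this all-Pro mix; a blend with Enterprise customers would raise both revenue and margin.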

Development Costs (One-Time)​

| Phase | Duration | Team | Cost |
| --- | --- | --- | --- |
| Phase 2A: Multi-tenant DB migration | 2 weeks | 2 backend devs | $20K |
| Phase 2B: Auth + OAuth integration | 2 weeks | 1 backend, 1 frontend | $15K |
| Phase 2C: Multi-repo webhooks | 3 weeks | 2 backend devs | $30K |
| Phase 2D: Frontend React migration | 3 weeks | 2 frontend devs | $30K |
| Phase 2E: GKE deployment + DevOps | 2 weeks | 1 DevOps engineer | $15K |
| Phase 2F: Load testing + optimization | 2 weeks | 1 backend, 1 DevOps | $15K |
| Phase 2G: Security audit + compliance | 1 week | Security consultant | $10K |
| Phase 2H: Documentation + onboarding | 1 week | Technical writer | $5K |
| **Total Development** | 16 weeks | | $140K |
| **Contingency (20%)** | | | $28K |
| **Infrastructure setup** | | | $12K |
| **Grand Total** | 16 weeks | | $180K |

Migration Roadmap​

Phase 2A: Database Migration (Weeks 1-2)​

Objectives:

  • βœ… Migrate SQLite β†’ Cloud SQL PostgreSQL
  • βœ… Implement multi-tenant schema with RLS
  • βœ… Setup connection pooling

Deliverables:

  • Cloud SQL instance provisioned (HA + replica)
  • Migration scripts tested and validated
  • POC data migrated successfully
  • RLS policies enforced and tested
  • Connection pooling configured (SQLAlchemy async)

Acceptance Criteria:

  • All 8 POC tasks migrated with correct tenant_id
  • RLS prevents cross-tenant data access
  • Database performance < 50ms (p95) for queries
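The SQLite → Cloud SQL step might start with an export pass that tags every POC row with its new owning tenant. A sketch, where the table and column names are assumptions and a real loader would COPY into PostgreSQL rather than return dicts:

```python
import sqlite3
import uuid


def export_tasks_with_tenant(sqlite_path: str, tenant_id: uuid.UUID) -> list[dict]:
    """Read POC tasks from the single-file SQLite DB and attach tenant_id."""
    conn = sqlite3.connect(sqlite_path)
    conn.row_factory = sqlite3.Row  # rows behave like dicts
    try:
        rows = conn.execute("SELECT id, title, status FROM tasks").fetchall()
    finally:
        conn.close()
    # Every exported row carries the owning tenant, ready for the
    # multi-tenant schema's NOT NULL tenant_id column.
    return [{**dict(row), "tenant_id": str(tenant_id)} for row in rows]
```

Validating that every migrated row has a non-null `tenant_id` is what makes the "all 8 POC tasks migrated with correct tenant_id" criterion checkable.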

Phase 2B: Authentication & Multi-Tenancy (Weeks 3-4)​

Objectives:

  • βœ… Implement OAuth 2.0 (GitHub, GitLab)
  • βœ… JWT-based authentication
  • βœ… Multi-tenant user management
  • βœ… RBAC (owner, admin, member, viewer)

Deliverables:

  • GitHub OAuth integration working
  • GitLab OAuth integration working
  • JWT generation and validation
  • User registration and tenant creation
  • Role-based permission system
  • Admin UI for user management

Acceptance Criteria:

  • Users can sign up via GitHub/GitLab
  • JWT tokens properly scoped to tenant
  • RBAC enforced at API level
  • Admin can invite users with roles

Phase 2C: Multi-Repository Webhooks (Weeks 5-7)​

Objectives:

  • βœ… GitHub/GitLab webhook integration
  • βœ… Automatic webhook setup
  • βœ… Commit ingestion via Pub/Sub
  • βœ… Background workers for AI analysis

Deliverables:

  • Webhook handlers for GitHub and GitLab
  • Pub/Sub topics and subscriptions
  • Worker pods for async processing
  • Automatic webhook creation on repo connection
  • Repository management UI

Acceptance Criteria:

  • Webhooks process push events in < 10s
  • Commits stored in database correctly
  • AI analysis completes in < 30s
  • Users can connect unlimited repositories
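Webhook handlers should authenticate every delivery before processing it. GitHub signs each payload with an HMAC-SHA256 of the raw body, sent in the `X-Hub-Signature-256` header; GitLab instead sends the shared secret verbatim in `X-Gitlab-Token`. A minimal GitHub check:

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time compare to avoid timing side channels
    return hmac.compare_digest(expected, signature_header)
```

The handler would reject with 401 when this returns False, before the event is ever published to Pub/Sub.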

Phase 2D: Frontend Migration to React (Weeks 8-10)​

Objectives:

  • βœ… Migrate GPS dashboard to React + TypeScript
  • βœ… Add multi-project switcher
  • βœ… Add organization/repository management
  • βœ… Real-time updates via WebSocket

Deliverables:

  • React 18 + TypeScript setup
  • GPS dashboard components migrated
  • Multi-project navigation
  • Repository connection wizard
  • WebSocket integration for live updates
  • Responsive mobile design

Acceptance Criteria:

  • All POC features preserved
  • Multi-project switching works smoothly
  • WebSocket updates < 100ms latency
  • Lighthouse score > 90

Phase 2E: GKE Deployment & DevOps (Weeks 11-12)​

Objectives:

  • βœ… GKE cluster setup
  • βœ… CI/CD pipeline (Cloud Build)
  • βœ… Monitoring (Prometheus + Grafana)
  • βœ… Logging (Cloud Logging)

Deliverables:

  • GKE cluster provisioned (multi-zone HA)
  • Kubernetes manifests (deployments, services, HPA)
  • Cloud Build CI/CD pipeline
  • Prometheus + Grafana dashboards
  • Alerting rules (PagerDuty integration)
  • Disaster recovery plan

Acceptance Criteria:

  • Deployment completes in < 5 minutes
  • HPA scales 3-50 pods based on load
  • 99.9% uptime SLA met
  • Alerts fire within 2 minutes of issues
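The "alerts fire within 2 minutes" criterion maps directly onto a Prometheus rule's `for:` clause. A sketch against the p95 latency target, where the metric name and labels are assumptions about how the backend is instrumented:

```yaml
# prometheus-rules.yaml (sketch)
groups:
  - name: dashboard-slo
    rules:
      - alert: HighAPILatencyP95
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
          ) > 0.2
        for: 2m   # must breach continuously for 2 minutes before firing
        labels:
          severity: page
        annotations:
          summary: "API p95 latency above 200ms"
```

Alertmanager would then route `severity: page` alerts to the PagerDuty integration named in the deliverables.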

Phase 2F: Load Testing & Optimization (Weeks 13-14)​

Objectives:

  • βœ… Load testing (k6)
  • βœ… Performance optimization
  • βœ… Database query tuning
  • βœ… Caching layer validation

Deliverables:

  • Load tests for 10,000 concurrent users
  • Performance optimization report
  • Database indexes optimized
  • Redis cache hit rate > 80%
  • API response time < 200ms (p95)

Acceptance Criteria:

  • System handles 5,000 req/s
  • No errors at 10,000 concurrent users
  • Cache hit rate > 80%
  • Database queries < 50ms (p95)

Phase 2G: Security Audit (Week 15)​

Objectives:

  • βœ… Security penetration testing
  • βœ… OWASP Top 10 compliance
  • βœ… Data encryption validation
  • βœ… GDPR compliance check

Deliverables:

  • Penetration test report
  • OWASP compliance checklist
  • GDPR compliance documentation
  • Security fixes implemented

Acceptance Criteria:

  • Zero critical vulnerabilities
  • All OWASP Top 10 mitigated
  • GDPR data export/delete working
  • Security scan passes (Snyk, Dependabot)

Phase 2H: Beta Launch (Week 16)​

Objectives:

  • βœ… Documentation complete
  • βœ… Onboarding flow tested
  • βœ… Beta user recruitment
  • βœ… Support system ready

Deliverables:

  • User documentation (help center)
  • API documentation (OpenAPI)
  • Onboarding tutorial
  • Beta user feedback system
  • Support ticketing system

Acceptance Criteria:

  • 50 beta users onboarded
  • < 5% onboarding drop-off rate
  • User satisfaction > 4.0/5.0
  • Support response time < 4 hours

Risk Assessment​

| Risk | Impact | Probability | Severity | Mitigation |
| --- | --- | --- | --- | --- |
| Database migration data loss | High | Low | MEDIUM | Thorough testing, rollback plan, backup validation |
| GCP cost overruns | High | Medium | HIGH | Budget alerts, cost monitoring, HPA limits |
| OAuth provider downtime | Medium | Low | LOW | Fallback email/password auth, status monitoring |
| AI API rate limits | Medium | Medium | MEDIUM | Request queuing, TF-IDF fallback, caching |
| Multi-tenant data leakage | Critical | Low | CRITICAL | RLS testing, penetration testing, code review |
| Webhook reliability issues | Medium | Medium | MEDIUM | Retry logic, dead-letter queue, monitoring |
| Performance degradation | High | Medium | HIGH | Load testing, HPA, database optimization |
| Security vulnerabilities | Critical | Low | CRITICAL | Security audit, automated scanning, WAF |
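The Severity column follows an impact × probability pattern; one way to make the scoring explicit in code, with thresholds inferred from the table's rows rather than prescribed by it:

```python
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}
PROBABILITY = {"Low": 1, "Medium": 2, "High": 3}


def severity(impact: str, probability: str) -> str:
    """Score a risk; any critical-impact risk stays CRITICAL regardless of odds."""
    if impact == "Critical":
        return "CRITICAL"
    score = IMPACT[impact] * PROBABILITY[probability]
    if score >= 6:
        return "HIGH"    # e.g. High impact x Medium probability
    if score >= 3:
        return "MEDIUM"  # e.g. High x Low, Medium x Medium
    return "LOW"         # e.g. Medium x Low
```

An explicit function like this keeps severity ratings consistent as new risks are added to the register.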

Critical Path Risks (Require Immediate Attention)​

Risk 1: Multi-Tenant Data Leakage​

  • Impact: Tenant A sees Tenant B's data (catastrophic security breach)
  • Mitigation:
    • Comprehensive RLS policy testing
    • Integration tests with multiple tenants
    • Penetration testing before launch
    • Code review by security expert
    • Database audit logging enabled

Risk 2: GCP Cost Overruns​

  • Impact: Budget exceeded, unexpected bills
  • Mitigation:
    • Budget alerts at 50%, 80%, 100% thresholds
    • HPA max limits enforced (maxReplicas: 50)
    • Preemptible nodes for non-critical workloads
    • Monthly cost review meetings
    • Reserved instances for predictable workloads

Risk 3: Performance Degradation at Scale​

  • Impact: Slow API responses, poor user experience
  • Mitigation:
    • Load testing at 2x expected scale
    • Database query optimization (indexes, EXPLAIN ANALYZE)
    • Redis caching for hot data
    • CDN for static assets
    • Read replicas for analytics queries

Success Metrics​

Technical Metrics​

| Metric | Target | Current (POC) | Phase 2 Target |
| --- | --- | --- | --- |
| API Response Time (p95) | < 200ms | ~50ms | < 150ms |
| API Response Time (p99) | < 500ms | ~100ms | < 300ms |
| Database Query Time (p95) | < 50ms | ~20ms (SQLite) | < 50ms |
| Uptime | 99.9% | N/A (local) | 99.9% |
| Concurrent Users | 10,000+ | 1 | 10,000+ |
| Requests/second | 5,000+ | ~10 | 5,000+ |
| Cache Hit Rate | > 80% | 0% | > 80% |
| Deployment Time | < 5 min | N/A | < 5 min |

Business Metrics​

| Metric | Target | Timeline |
| --- | --- | --- |
| Beta Users | 50 | Month 1 |
| Paying Customers | 120 | Month 6 |
| Monthly Recurring Revenue | $10,000 | Month 6 |
| Customer Acquisition Cost | < $100 | Month 3 |
| Customer Lifetime Value | > $1,200 | Month 6 |
| Churn Rate | < 5% monthly | Month 6 |
| Net Promoter Score | > 50 | Month 3 |

User Experience Metrics​

| Metric | Target |
| --- | --- |
| Onboarding Completion Rate | > 95% |
| Time to First Value | < 5 minutes |
| Support Response Time | < 4 hours |
| Customer Satisfaction | > 4.5/5.0 |
| Feature Adoption Rate | > 70% |

Appendix A: Technology Stack​

Backend​

  • Language: Python 3.14
  • Framework: FastAPI (async)
  • ORM: SQLAlchemy 2.0 (async)
  • Database: PostgreSQL 15 (Cloud SQL)
  • Cache: Redis 7.0 (Memorystore)
  • Task Queue: Pub/Sub
  • AI: Anthropic Claude Sonnet 4.5

Frontend​

  • Language: TypeScript 5.0
  • Framework: React 18
  • State: Redux Toolkit
  • UI: Material-UI (MUI)
  • Build: Vite
  • Testing: Jest + React Testing Library

Infrastructure​

  • Cloud: Google Cloud Platform (GCP)
  • Container Orchestration: Google Kubernetes Engine (GKE)
  • CI/CD: Cloud Build
  • Monitoring: Prometheus + Grafana + Cloud Monitoring
  • Logging: Cloud Logging (Loki)
  • Secrets: Secret Manager
  • CDN: Cloud CDN (for static assets)

Development Tools​

  • Version Control: Git + GitHub
  • IDE: VS Code + Claude Code
  • API Docs: OpenAPI (FastAPI auto-generated)
  • Load Testing: k6
  • Security Scanning: Snyk, Dependabot

Appendix B: API Migration Map (POC β†’ Phase 2)​

| POC Endpoint | Phase 2 Endpoint | Changes |
| --- | --- | --- |
| GET /api/v2/tasks | GET /api/v2/tenants/{tenant_id}/projects/{project_id}/tasks | Add tenant/project scoping |
| POST /api/v2/tasks | POST /api/v2/tenants/{tenant_id}/projects/{project_id}/tasks | Add tenant/project scoping |
| GET /api/v2/commits | GET /api/v2/tenants/{tenant_id}/repositories/{repo_id}/commits | Multi-repo support |
| POST /api/v2/commits | Webhook: POST /webhooks/github/{repo_id} | GitHub webhook handler |
| POST /api/v2/commits/analyze | Async worker (Pub/Sub) | Background processing |
| GET /api/v2/links | GET /api/v2/tenants/{tenant_id}/projects/{project_id}/links | Add tenant/project scoping |
| POST /api/v2/links/suggest | Async worker (Pub/Sub) | Background AI analysis |
| N/A | POST /auth/github | OAuth 2.0 login |
| N/A | POST /auth/logout | Session invalidation |
| N/A | GET /api/v2/tenants/{tenant_id}/users | User management |
| N/A | POST /api/v2/tenants | Organization signup |
| N/A | POST /api/v2/repositories | Connect repository |

Last Updated: 2025-11-28
Version: 2.0.0
Status: Architecture Planning
Next Review: 2025-12-15
Owner: Hal Casteel (Founder/CEO/CTO)