
C3 Component Diagram: PostgreSQL Components

Purpose: Component-level architecture of the Cloud SQL PostgreSQL database, showing Row-Level Security (RLS) enforcement, connection pooling, backup strategy, and performance optimization for multi-tenant data isolation.

Scope: PostgreSQL 16 database layer (Cloud SQL)

Related Diagrams: (links not preserved in this export)

Mermaid Component Diagram

(diagram not preserved in this export)
Component Details

1. Connection Management

Connection Pool (Django + asyncpg)

Purpose: Manage database connections efficiently with pooling.

Configuration:

# settings/base.py
import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME', 'coditect'),
        'USER': os.getenv('DB_USER', 'app_user'),
        'PASSWORD': os.getenv('DB_PASSWORD'),  # From Secret Manager
        'HOST': os.getenv('DB_HOST', '/cloudsql/...'),
        'PORT': os.getenv('DB_PORT', '5432'),
        'ATOMIC_REQUESTS': True,
        'CONN_MAX_AGE': 600,  # Persistent connections (10 min)
        'OPTIONS': {
            'connect_timeout': 10,
            'options': '-c search_path=public',
            # The pool settings below are SQLAlchemy-style and assume a pooling
            # layer in front of the driver; plain django.db.backends.postgresql
            # does not accept them (Django 5.1+ with psycopg 3 uses a 'pool' dict).
            'pool_size': 5,         # Minimum connections
            'max_overflow': 15,     # Additional connections (total 20)
            'pool_timeout': 30,     # Wait up to 30s for a free connection
            'pool_recycle': 3600,   # Recycle connections every hour
            'pool_pre_ping': True,  # Test connection health before use
        },
    }
}

Key Features:

  • Persistent connections: CONN_MAX_AGE = 600 (10 min)
  • Pool size: 5 minimum, 20 maximum connections
  • Pre-ping: Test connection health before use
  • Recycle: Refresh connections every hour (prevents stale connections)
  • Timeout: 30s wait for available connection (prevents deadlock)

Benefits:

  • Reduced latency (no connection overhead per request)
  • Better resource utilization
  • Automatic connection health checks
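The pool parameters above follow classic SQLAlchemy-style semantics. A toy Python sketch of that behavior (an illustration only, not the driver's actual implementation; the `connect` factory and `ping()` method are assumptions):

```python
import queue

class ToyPool:
    """Toy illustration of pool_size / max_overflow / pool_timeout / pre-ping."""

    def __init__(self, connect, pool_size=5, max_overflow=15, pool_timeout=30):
        self._connect = connect                      # factory that opens a connection
        self._idle = queue.Queue()
        self._max_total = pool_size + max_overflow   # hard cap (5 + 15 = 20 above)
        self._timeout = pool_timeout
        self._total = 0
        for _ in range(pool_size):                   # pre-open the minimum pool
            self._idle.put(self._connect())
            self._total += 1

    def acquire(self):
        try:
            conn = self._idle.get_nowait()           # reuse an idle connection
        except queue.Empty:
            if self._total < self._max_total:        # grow into the overflow budget
                conn = self._connect()
                self._total += 1
            else:                                    # pool exhausted: wait, then fail
                conn = self._idle.get(timeout=self._timeout)
        if not conn.ping():                          # pre-ping: swap out dead connections
            conn = self._connect()
        return conn

    def release(self, conn):
        self._idle.put(conn)                         # return connection for reuse
```

Exhausting the pool makes `acquire` block for at most `pool_timeout` seconds and then raise, which is the deadlock-prevention behavior noted above.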

Cloud SQL Proxy (Unix Socket)

Purpose: Secure connection to Cloud SQL without exposing public IP.

Configuration:

# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-backend
spec:
  template:
    spec:
      containers:
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:latest  # pin a version in production
          command:
            - "/cloud_sql_proxy"
            - "-instances=coditect-cloud-infra:us-central1:licenses-db"
            - "-dir=/cloudsql"   # creates the Unix socket under /cloudsql
            - "-credential_file=/secrets/service-account-key.json"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql

        - name: django
          image: gcr.io/coditect-cloud-infra/license-server:latest
          env:
            - name: DB_HOST
              value: "/cloudsql/coditect-cloud-infra:us-central1:licenses-db"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql

      volumes:
        - name: cloudsql
          emptyDir: {}

Key Features:

  • Unix socket connection: No network overhead
  • Automatic TLS: End-to-end encryption
  • IAM authentication: No passwords in connection string
  • Connection multiplexing: Single proxy handles multiple connections

Benefits:

  • No public IP exposure (secure)
  • Automatic SSL/TLS encryption
  • IAM-based authentication
  • Simplified network configuration

2. Row-Level Security (RLS) Enforcement

Session Variable (app.current_tenant_id)

Purpose: Store current tenant ID for RLS policy evaluation.

PostgreSQL Session Variables:

-- Set session variable (called by Django)
SELECT set_current_tenant('tenant-uuid');
-- Internally: PERFORM set_config('app.current_tenant_id', 'tenant-uuid', false)

-- Get session variable (used by RLS policies)
SELECT current_setting('app.current_tenant_id')::UUID;
-- Returns: 'tenant-uuid'

-- Clear session variable (called after request)
SELECT clear_current_tenant();
-- Internally: PERFORM set_config('app.current_tenant_id', '', false)

Lifecycle:

Request Start → TenantMiddleware

set_current_tenant(tenant) → PostgreSQL session variable set

All queries → RLS policies use current_setting('app.current_tenant_id')

Request End → TenantMiddleware

clear_current_tenant() → PostgreSQL session variable cleared
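A minimal sketch of that lifecycle as Django-style middleware (an illustration under assumptions: `request.tenant_id` is resolved earlier in the stack, and `connection` stands in for Django's `django.db.connection`):

```python
class TenantMiddleware:
    """Sketch of the request lifecycle above: set the RLS session variable
    before the view runs, and always clear it afterwards so a pooled
    connection never leaks one tenant's context into the next request."""

    def __init__(self, get_response, connection):
        self.get_response = get_response
        self.connection = connection  # DB-API connection (Django's `connection` in practice)

    def __call__(self, request):
        cur = self.connection.cursor()
        # Equivalent to: SELECT set_current_tenant('tenant-uuid')
        cur.execute("SELECT set_config('app.current_tenant_id', %s, false)",
                    [str(request.tenant_id)])
        try:
            return self.get_response(request)
        finally:
            # Equivalent to: SELECT clear_current_tenant() -- runs even if the view raised
            cur.execute("SELECT set_config('app.current_tenant_id', '', false)")
```

The `finally` block is the important part: clearing in all code paths is what keeps the session variable request-scoped on pooled connections.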

RLS Policy Engine

Purpose: Evaluate RLS policies on every query and rewrite queries to include tenant filter.

RLS Policy Example:

-- Enable RLS on users table
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Apply policies to the table owner too (the owner bypasses RLS otherwise)
ALTER TABLE users FORCE ROW LEVEL SECURITY;

-- Create RLS policy
CREATE POLICY users_tenant_isolation ON users
    USING (tenant_id = current_setting('app.current_tenant_id')::UUID);

Query Rewriting:

-- Original query (Django ORM)
SELECT * FROM users WHERE is_active = TRUE;

-- PostgreSQL RLS rewrites to:
SELECT * FROM users
WHERE is_active = TRUE
AND tenant_id = current_setting('app.current_tenant_id')::UUID;

Performance:

  • RLS policies evaluated at planner stage (not per-row)
  • Filters pushed down to index scans (efficient)
  • Minimal overhead (<5% query time)
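Conceptually the rewrite behaves like appending a predicate to every statement, though PostgreSQL actually injects it into the plan tree rather than the SQL text. A toy illustration (assumption: simple single-table SELECTs only):

```python
# Policy predicate from the RLS example above
RLS_PREDICATE = "tenant_id = current_setting('app.current_tenant_id')::UUID"

def apply_rls(sql: str, predicate: str = RLS_PREDICATE) -> str:
    """Toy version of the rewrite: AND the policy predicate onto the WHERE clause."""
    if " where " in sql.lower():
        return f"{sql} AND {predicate}"   # existing WHERE: conjoin the policy
    return f"{sql} WHERE {predicate}"     # no WHERE: the policy becomes the filter
```

`apply_rls("SELECT * FROM users WHERE is_active = TRUE")` produces exactly the rewritten query shown above.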

Query Rewriter

Purpose: Modify queries to include RLS filters automatically.

Execution Plan:

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';

-- Output:
Index Scan using idx_users_email on users  (cost=0.29..8.31 rows=1 width=...)
  Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
               AND (email = 'user@example.com'::text))
Planning Time: 0.156 ms
Execution Time: 0.087 ms

Key Points:

  • RLS filter included in index condition (efficient)
  • No additional overhead (index already includes tenant_id)
  • Planning time: <1ms

3. Database Instance (High Availability)

Primary Instance

Specifications:

  • Instance Type: db-custom-2-7680 (2 vCPU, 7.5 GB RAM)
  • Storage: 100 GB SSD (auto-expanding)
  • Zone: us-central1-a (primary)
  • PostgreSQL Version: 16.x

Configuration:

-- PostgreSQL configuration
-- (Cloud SQL manages postgresql.conf; in practice these are set as database
--  flags via `gcloud sql instances patch`. ALTER SYSTEM shown for reference.)
ALTER SYSTEM SET max_connections = 100;
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET default_statistics_target = 100;
ALTER SYSTEM SET random_page_cost = 1.1;           -- SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;   -- SSD concurrent I/O

Performance Tuning:

  • shared_buffers: ≈25% of RAM (2 GB on the 7.5 GB instance)
  • effective_cache_size: ≈75% of RAM (6 GB; a planner hint, not an allocation)
  • work_mem: 32 MB per query (prevents temp file spill)
  • random_page_cost: 1.1 (SSD-optimized)
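Those rules of thumb can be written as a small calculator (the ratios are this section's guidance, not hard PostgreSQL rules; on the 7.5 GB instance, 25% is ≈1.9 GB, rounded up to 2 GB in the configuration above):

```python
def memory_tuning(ram_gb: float) -> dict:
    """Rule-of-thumb memory settings from this section."""
    return {
        "shared_buffers_gb": round(ram_gb * 0.25, 2),        # ~25% of RAM
        "effective_cache_size_gb": round(ram_gb * 0.75, 2),  # ~75% of RAM (planner hint)
        "random_page_cost": 1.1,                             # SSD-optimized
    }

print(memory_tuning(7.5))
```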

Standby Instance (HA)

Purpose: Automatic failover for high availability.

Configuration:

  • Zone: us-central1-b (different zone from primary)
  • Replication: Synchronous (zero data loss)
  • Failover: Automatic (< 60s downtime)

Replication Flow:

Primary (us-central1-a)
↓ (synchronous replication)
Standby (us-central1-b)
↓ (ready for automatic promotion)
Failover (if primary fails)

Failover Process:

1. Primary instance failure detected (<30s)
2. Cloud SQL promotes standby to primary (automatic)
3. DNS updated to point to new primary
4. Connections rerouted to new primary
5. Total downtime: <60s
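During the failover window, in-flight connections fail; the standard client-side mitigation is retry with exponential backoff. A minimal sketch (attempt counts and delays are illustrative assumptions, not values from this deployment):

```python
import time

def with_retry(op, attempts=5, base_delay=1.0, retryable=(ConnectionError,)):
    """Retry a database operation across a transient failover window."""
    for attempt in range(attempts):
        try:
            return op()
        except retryable:
            if attempt == attempts - 1:
                raise                              # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # backoff: 1s, 2s, 4s, 8s ...
```

Wrapping e.g. a license-validation query this way turns a sub-60s failover into added latency instead of a hard error.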

Read Replica (Optional)

Purpose: Offload read queries for analytics and reporting.

Configuration:

  • Zone: us-east1-b (different region for geographic distribution)
  • Replication: Asynchronous (eventual consistency)
  • Lag: <10s typical

Use Cases:

  • Analytics queries (long-running reports)
  • Dashboard data (tolerates lag)
  • Backup queries (no impact on primary)

4. Indexing & Performance

B-Tree Indexes

Composite Indexes (tenant_id + column):

-- Users table
CREATE INDEX idx_users_tenant ON users(tenant_id);
CREATE INDEX idx_users_email ON users(tenant_id, email);
CREATE INDEX idx_users_username ON users(tenant_id, username);

-- License sessions table
CREATE INDEX idx_license_sessions_tenant ON license_sessions(tenant_id);
CREATE INDEX idx_license_sessions_status ON license_sessions(tenant_id, status);
CREATE INDEX idx_license_sessions_token ON license_sessions(session_token);

-- Projects table
CREATE INDEX idx_projects_tenant ON projects(tenant_id);
CREATE INDEX idx_projects_slug ON projects(tenant_id, slug);

Why Composite Indexes (tenant_id first)?

  • RLS policies always filter by tenant_id
  • Composite index (tenant_id, column) allows index-only scans
  • Significantly faster than table scans
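The effect can be shown with a toy in-memory "index": a dict keyed by (tenant_id, email) resolves a lookup in one probe, and putting tenant_id first means the same email under another tenant can never match:

```python
def build_composite_index(rows):
    """Toy analogue of idx_users_email: (tenant_id, email) -> row."""
    return {(r["tenant_id"], r["email"]): r for r in rows}

rows = [
    {"tenant_id": "t1", "email": "a@example.com", "username": "alice"},
    {"tenant_id": "t2", "email": "a@example.com", "username": "other-alice"},
]
idx = build_composite_index(rows)

# One probe instead of scanning every row, tenant-scoped by construction:
hit = idx.get(("t1", "a@example.com"))
miss = idx.get(("t3", "a@example.com"))  # unknown tenant: no cross-tenant leak
```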

Performance:

-- Without composite index (slow)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Seq Scan on users (cost=0.00..1234.56 rows=1 width=...)

-- With composite index (fast)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Index Scan using idx_users_email (cost=0.29..8.31 rows=1 width=...)

Partial Indexes

Indexes with WHERE clause (smaller, faster):

-- Index only active license sessions (most common query)
CREATE INDEX idx_license_sessions_expires ON license_sessions(expires_at)
WHERE status = 'active';

-- Index only active tenants (most common query)
CREATE INDEX idx_tenants_active ON tenants(status)
WHERE status = 'active';

Benefits:

  • Smaller index size (faster scans)
  • Only indexes relevant rows
  • Reduced maintenance overhead (no inactive rows indexed)

Statistics Collector (Auto-Analyze)

Purpose: Collect query statistics for query planner optimization.

Configuration:

ALTER SYSTEM SET track_counts = on;
ALTER SYSTEM SET track_functions = 'all';
ALTER SYSTEM SET track_activity_query_size = 4096;
ALTER SYSTEM SET track_io_timing = on;

-- Auto-analyze settings
ALTER SYSTEM SET autovacuum_analyze_threshold = 50;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.1;

Statistics Views:

-- Query performance
SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Index usage
SELECT * FROM pg_stat_user_indexes ORDER BY idx_scan DESC;

-- Table statistics
SELECT * FROM pg_stat_user_tables;

5. Backup & Recovery

Automated Backups

Configuration:

  • Frequency: Daily at 2 AM UTC
  • Retention: 7 days (configurable up to 365 days)
  • Window: 2 AM to 6 AM UTC (4-hour backup window)
  • Transaction logs: Continuous archiving for PITR

Backup Process:

1. Cloud SQL initiates backup at 2 AM UTC
2. Full database snapshot created
3. Compressed and uploaded to Cloud Storage
4. Previous backups rotated (7-day retention)
5. Backup verification (checksum)
6. Backup available for restore

Restore:

# List backups
gcloud sql backups list --instance=licenses-db

# Restore a specific backup into another instance
gcloud sql backups restore BACKUP_ID \
    --restore-instance=licenses-db-restored \
    --backup-instance=licenses-db

Point-in-Time Recovery (PITR)

Purpose: Restore database to any point in time within retention period.

Configuration:

  • WAL Archiving: Enabled (continuous)
  • Retention: 7 days
  • Granularity: 1-second precision

PITR Process:

1. Base backup (daily at 2 AM)
2. WAL segments archived continuously to Cloud Storage
3. PITR: Restore base backup + replay WAL to target time
4. Result: Database state at exact target timestamp

Restore to Specific Time:

# Point-in-time recovery clones the instance at the target timestamp
gcloud sql instances clone licenses-db licenses-db-restored \
    --point-in-time="2025-11-30T12:34:56Z"

Use Cases:

  • Accidental data deletion (restore before deletion)
  • Application bug (restore before bug deployed)
  • Corruption detection (restore to last known good state)

6. Monitoring & Observability

pg_stat_* Views

Key Statistics Views:

-- Table statistics
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC;

-- Index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Database connections
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database;

pg_stat_statements (Slow Query Tracking)

Enable Extension:

-- Requires pg_stat_statements in shared_preload_libraries
-- (a database flag on Cloud SQL; needs a restart on self-managed installs)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Query Analysis:

-- Top 10 slowest queries
SELECT
query,
calls,
total_exec_time / 1000 AS total_time_seconds,
mean_exec_time / 1000 AS mean_time_seconds,
max_exec_time / 1000 AS max_time_seconds
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Most frequent queries
SELECT
query,
calls,
rows,
total_exec_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

Cloud Monitoring Integration

Metrics Exported:

| Metric | Description | Alert Threshold |
|---|---|---|
| database/cpu/utilization | CPU usage | >80% |
| database/memory/utilization | Memory usage | >85% |
| database/disk/utilization | Disk usage | >90% |
| database/connections | Active connections | >80 (of 100 max) |
| database/replication_lag | Standby lag | >30s |
| database/transaction_count | TPS | <100 (unusual drop) |

Alerting:

# Monitoring alert policy (Cloud Monitoring)
alert_policy:
  - name: "Database CPU High"
    condition: database/cpu/utilization > 0.8
    duration: 5m
    notification_channels:
      - email: ops@coditect.ai
      - pagerduty: on-call-team

  - name: "Replication Lag High"
    condition: database/replication_lag > 30s
    duration: 2m
    notification_channels:
      - pagerduty: on-call-team

7. Maintenance

Auto-Vacuum (MVCC Cleanup)

Purpose: Clean up dead tuples from UPDATE/DELETE operations.

Configuration:

-- Auto-vacuum settings
ALTER SYSTEM SET autovacuum = on;
ALTER SYSTEM SET autovacuum_max_workers = 3;
ALTER SYSTEM SET autovacuum_naptime = 60;               -- Check every 60s
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.2;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';

Dead Tuple Cleanup:

1. Table receives UPDATE (old row marked dead, new row inserted)
2. Dead tuples accumulate (20% of table or 50 rows)
3. Auto-vacuum triggered (background process)
4. Dead tuples cleaned up (space reclaimed)
5. Statistics updated (query planner optimization)

Monitoring:

-- Check for tables needing vacuum
SELECT schemaname, relname, n_dead_tup, n_live_tup,
n_dead_tup::float / (n_live_tup + 1) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_ratio DESC;

Reindex (Manual)

Purpose: Rebuild bloated indexes for optimal performance.

When to Reindex:

  • Index bloat >50%
  • Query performance degradation
  • After major data changes

Reindex Strategy:

-- Reindex a single index concurrently (no exclusive table lock)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Reindex all of a table's indexes concurrently (PostgreSQL 12+)
REINDEX TABLE CONCURRENTLY users;

Bloat Detection:

-- Check index sizes (a rough proxy; use pgstattuple for accurate bloat estimates)
SELECT schemaname, relname, indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

Performance Optimization Examples

Query Performance (License Validation)

Query: Validate license session and check expiration

-- Set tenant context
SELECT set_current_tenant('tenant-uuid');

-- Query
SELECT * FROM license_sessions
WHERE session_token = 'session-uuid'
AND status = 'active'
AND expires_at > NOW();

Execution Plan:

Index Scan using idx_license_sessions_token on license_sessions
(cost=0.29..8.31 rows=1 width=...)
Index Cond: (session_token = 'session-uuid'::text)
Filter: ((status = 'active') AND (expires_at > now())
AND (tenant_id = current_setting('app.current_tenant_id')::UUID))
Planning Time: 0.156 ms
Execution Time: 0.087 ms

Performance:

  • Planning: 0.156 ms
  • Execution: 0.087 ms
  • Total: <1ms (well below 100ms target)

Index Selection (User Login)

Query: Login user by email

SELECT set_current_tenant('tenant-uuid');

SELECT * FROM users
WHERE email = 'user@example.com'
AND is_active = TRUE;

Execution Plan:

Index Scan using idx_users_email on users
(cost=0.29..8.31 rows=1 width=...)
Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
AND (email = 'user@example.com'::text))
Filter: (is_active = TRUE)
Planning Time: 0.123 ms
Execution Time: 0.065 ms

Performance:

  • Total: <1ms (excellent)
  • Index Used: Composite index (tenant_id, email)
  • Rows Scanned: 1 (optimal)

Disaster Recovery

Recovery Time Objective (RTO): <5 minutes

Scenario: Primary instance failure

  1. Detection: Cloud SQL health checks fail (30s)
  2. Failover: Standby promoted to primary (60s)
  3. DNS Update: Connection string updated (30s)
  4. Verification: Health checks pass (30s)
  5. Application Reconnect: Connection pool refreshes (60s)

Total RTO: <5 minutes
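The stage budgets above sum to 3.5 minutes, comfortably inside the objective:

```python
# Stage budgets from the failover scenario above, in seconds
rto_stages_s = {
    "detection": 30,     # health checks fail
    "failover": 60,      # standby promoted
    "dns_update": 30,    # connection string updated
    "verification": 30,  # health checks pass
    "reconnect": 60,     # connection pool refreshes
}
total_s = sum(rto_stages_s.values())
print(total_s / 60)  # 3.5 minutes, within the 5-minute RTO
```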


Recovery Point Objective (RPO): 0 seconds (HA failover); <10 seconds (read replicas)

Synchronous Replication:

  • Primary writes to standby before commit acknowledgment
  • Zero data loss on failover
  • RPO: 0 seconds (no data loss)

Asynchronous Replication (Read Replicas):

  • Primary commits before replicating to read replica
  • Potential data loss: <10s lag
  • RPO: <10 seconds


Last Updated: 2025-11-30
Diagram Type: C3 Component (Mermaid)
Scope: PostgreSQL Database Layer (Cloud SQL)