C3 Component Diagram: PostgreSQL Components
Purpose: Component-level architecture of the Cloud SQL PostgreSQL database, showing Row-Level Security (RLS) enforcement, connection pooling, backup strategy, and performance optimization for multi-tenant data isolation.
Scope: PostgreSQL 16 database layer (Cloud SQL)
Related Diagrams:
- C2: Container Diagram - Shows PostgreSQL as a container
- SUP-01: Entity-Relationship Diagram - Complete database schema
- C4-04: Multi-Tenant Context Manager - Application integration
- ADR-007: Django Multi-Tenant Architecture - RLS specification
Mermaid Component Diagram
Component Details
1. Connection Management
Connection Pool (Django + asyncpg)
Purpose: Manage database connections efficiently with pooling.
Configuration:
# settings/base.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME', 'coditect'),
        'USER': os.getenv('DB_USER', 'app_user'),
        'PASSWORD': os.getenv('DB_PASSWORD'),  # From Secret Manager
        'HOST': os.getenv('DB_HOST', '/cloudsql/...'),
        'PORT': os.getenv('DB_PORT', '5432'),
        'ATOMIC_REQUESTS': True,
        'CONN_MAX_AGE': 600,  # Persistent connections (10 min)
        'OPTIONS': {
            'connect_timeout': 10,
            'options': '-c search_path=public',
            # Connection pooling settings.
            # NOTE: the pool_* keys below are not understood by the stock
            # django.db.backends.postgresql backend; they require a pooling
            # layer (e.g. Django 5.1+'s 'pool' option with psycopg 3, or a
            # SQLAlchemy-style pool) and document the intended pool shape.
            'pool_size': 5,         # Minimum connections
            'max_overflow': 15,     # Additional connections (total 20)
            'pool_timeout': 30,     # Wait up to 30s for a connection
            'pool_recycle': 3600,   # Recycle connections every hour
            'pool_pre_ping': True,  # Test connection health before use
        },
    },
}
Key Features:
- Persistent connections: CONN_MAX_AGE = 600 (10 min)
- Pool size: 5 minimum, 20 maximum connections
- Pre-ping: Test connection health before use
- Recycle: Refresh connections every hour (prevents stale connections)
- Timeout: 30s wait for available connection (prevents deadlock)
Benefits:
- Reduced latency (no connection overhead per request)
- Better resource utilization
- Automatic connection health checks
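The pre-ping and recycle checks listed above can be sketched in plain Python. This is an illustration of what a pooling layer does internally, not Django or psycopg API; `PooledConnection` and `FakeRawConnection` are hypothetical names.

```python
import time

class PooledConnection:
    """Wraps a raw DB connection with the health checks described above
    (illustrative sketch, not a real pool implementation)."""

    def __init__(self, raw, recycle_after=3600):
        self.raw = raw
        self.created_at = time.monotonic()
        self.recycle_after = recycle_after  # seconds, mirrors pool_recycle

    def is_stale(self):
        # Recycle: connections older than the threshold are discarded,
        # preventing reuse of connections the server may have closed.
        return time.monotonic() - self.created_at > self.recycle_after

    def pre_ping(self):
        # Pre-ping: issue a trivial query; any failure means the
        # connection is dead and must be replaced before handing it out.
        try:
            self.raw.execute("SELECT 1")
            return True
        except Exception:
            return False

    def is_usable(self):
        return not self.is_stale() and self.pre_ping()


class FakeRawConnection:
    """Stand-in for a DB-API connection so the sketch runs without a database."""
    def __init__(self, alive=True):
        self.alive = alive

    def execute(self, sql):
        if not self.alive:
            raise ConnectionError("server closed the connection")
```

A pool performing these checks on checkout is what makes `CONN_MAX_AGE`-style reuse safe across Cloud SQL maintenance events.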
Cloud SQL Proxy (Unix Socket)
Purpose: Secure connection to Cloud SQL without exposing public IP.
Configuration:
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-backend
spec:
  template:
    spec:
      containers:
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:latest
          command:
            - "/cloud_sql_proxy"
            - "-dir=/cloudsql"
            - "-instances=coditect-cloud-infra:us-central1:licenses-db"
            - "-credential_file=/secrets/service-account-key.json"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql
        - name: django
          image: gcr.io/coditect-cloud-infra/license-server:latest
          env:
            - name: DB_HOST
              # Matches the socket the proxy creates under -dir=/cloudsql
              value: "/cloudsql/coditect-cloud-infra:us-central1:licenses-db"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql
      volumes:
        - name: cloudsql
          emptyDir: {}
Key Features:
- Unix socket connection: No network overhead
- Automatic TLS: End-to-end encryption
- IAM authentication: No passwords in connection string
- Connection multiplexing: Single proxy handles multiple connections
Benefits:
- No public IP exposure (secure)
- Automatic SSL/TLS encryption
- IAM-based authentication
- Simplified network configuration
2. Row-Level Security (RLS) Enforcement
Session Variable (app.current_tenant_id)
Purpose: Store current tenant ID for RLS policy evaluation.
PostgreSQL Session Variables:
-- Set session variable (called by Django)
SELECT set_current_tenant('tenant-uuid');
-- Internally: PERFORM set_config('app.current_tenant_id', 'tenant-uuid', false)
-- Get session variable (used by RLS policies)
SELECT current_setting('app.current_tenant_id')::UUID;
-- Returns: 'tenant-uuid'
-- Clear session variable (called after request)
SELECT clear_current_tenant();
-- Internally: PERFORM set_config('app.current_tenant_id', '', false)
Lifecycle:
Request Start → TenantMiddleware
↓
set_current_tenant(tenant) → PostgreSQL session variable set
↓
All queries → RLS policies use current_setting('app.current_tenant_id')
↓
Request End → TenantMiddleware
↓
clear_current_tenant() → PostgreSQL session variable cleared
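The lifecycle above can be sketched as a context manager that guarantees the set/clear pairing even when a view raises. The `RecordingCursor` stub is an assumption so the sketch runs without PostgreSQL; in Django this logic lives in `TenantMiddleware` against a real connection cursor.

```python
class TenantContext:
    """Mirrors the request lifecycle above: set the session variable on
    entry, clear it on exit (illustrative sketch)."""

    def __init__(self, cursor, tenant_id):
        self.cursor = cursor
        self.tenant_id = tenant_id

    def __enter__(self):
        # Equivalent of TenantMiddleware at request start.
        self.cursor.execute(
            "SELECT set_config('app.current_tenant_id', %s, false)",
            [self.tenant_id],
        )
        return self

    def __exit__(self, *exc):
        # Equivalent of TenantMiddleware at request end -- always runs,
        # even on exceptions, so a pooled connection is never handed to
        # the next request with a stale tenant id.
        self.cursor.execute(
            "SELECT set_config('app.current_tenant_id', '', false)"
        )
        return False


class RecordingCursor:
    """Stub that records executed SQL so the sketch runs without a database."""
    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))
```

The `try/finally` semantics of the context manager are the important part: if the clear step is skipped, connection pooling leaks one tenant's context into another request.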
RLS Policy Engine
Purpose: Evaluate RLS policies on every query and rewrite queries to include tenant filter.
RLS Policy Example:
-- Enable RLS on users table
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Create RLS policy
CREATE POLICY users_tenant_isolation ON users
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
Query Rewriting:
-- Original query (Django ORM)
SELECT * FROM users WHERE is_active = TRUE;
-- PostgreSQL RLS rewrites to:
SELECT * FROM users
WHERE is_active = TRUE
AND tenant_id = current_setting('app.current_tenant_id')::UUID;
Performance:
- RLS policy quals are injected at plan time, not re-derived per statement
- Filters are pushed down into index scans where possible (efficient)
- Minimal overhead (typically <5% added query time)
Query Rewriter
Purpose: Modify queries to include RLS filters automatically.
Execution Plan:
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
-- Output:
Index Scan using idx_users_email on users  (cost=0.29..8.31 rows=1 width=...)
  Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
               AND (email = 'user@example.com'::text))
Planning Time: 0.156 ms
Execution Time: 0.087 ms
Key Points:
- RLS filter included in index condition (efficient)
- No additional overhead (index already includes tenant_id)
- Planning time: <1ms
3. Database Instance (High Availability)
Primary Instance
Specifications:
- Instance Type: db-custom-2-7680 (2 vCPU, 7.5 GB RAM)
- Storage: 100 GB SSD (auto-expanding)
- Zone: us-central1-a (primary)
- PostgreSQL Version: 16.x
Configuration:
-- PostgreSQL configuration
ALTER SYSTEM SET max_connections = 100;
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET default_statistics_target = 100;
ALTER SYSTEM SET random_page_cost = 1.1;          -- SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;  -- SSD concurrent I/O
Performance Tuning:
- shared_buffers: 25% of RAM (2 GB)
- effective_cache_size: ~80% of RAM (6 GB)
- work_mem: 32 MB per query (prevents temp file spill)
- random_page_cost: 1.1 (SSD-optimized)
Standby Instance (HA)
Purpose: Automatic failover for high availability.
Configuration:
- Zone: us-central1-b (different zone from primary)
- Replication: Synchronous (zero data loss)
- Failover: Automatic (< 60s downtime)
Replication Flow:
Primary (us-central1-a)
↓ (synchronous replication)
Standby (us-central1-b)
↓ (ready for automatic promotion)
Failover (if primary fails)
Failover Process:
1. Primary instance failure detected (<30s)
2. Cloud SQL promotes standby to primary (automatic)
3. DNS updated to point to new primary
4. Connections rerouted to new primary
5. Total downtime: <60s
Read Replica (Optional)
Purpose: Offload read queries for analytics and reporting.
Configuration:
- Zone: us-east1-b (different region for geographic distribution)
- Replication: Asynchronous (eventual consistency)
- Lag: <10s typical
Use Cases:
- Analytics queries (long-running reports)
- Dashboard data (tolerates lag)
- Backup queries (no impact on primary)
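Routing those read workloads to the replica is typically done with a Django database router. The sketch below is an assumption about how this deployment might wire it up: the aliases `default`/`replica` and the `READ_YOUR_WRITES` table set are illustrative, not taken from the source.

```python
class ReplicaRouter:
    """Send reads to the replica, writes to the primary (sketch; the
    'default' and 'replica' aliases must match DATABASES in settings)."""

    # Tables whose reads must see their own writes bypass the replica,
    # since asynchronous replication lags by up to ~10s (illustrative set).
    READ_YOUR_WRITES = {"license_sessions"}

    def db_for_read(self, model, **hints):
        table = getattr(getattr(model, "_meta", None), "db_table", None)
        if table in self.READ_YOUR_WRITES:
            return "default"   # replica lag is not acceptable here
        return "replica"       # analytics/dashboard reads tolerate lag

    def db_for_write(self, model, **hints):
        return "default"       # all writes go to the primary

    def allow_relation(self, obj1, obj2, **hints):
        return True            # both aliases expose the same schema
```

Registered via `DATABASE_ROUTERS` in settings, this keeps license-validation reads on the primary while reporting queries land on the replica.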
4. Indexing & Performance
B-Tree Indexes
Composite Indexes (tenant_id + column):
-- Users table
CREATE INDEX idx_users_tenant ON users(tenant_id);
CREATE INDEX idx_users_email ON users(tenant_id, email);
CREATE INDEX idx_users_username ON users(tenant_id, username);
-- License sessions table
CREATE INDEX idx_license_sessions_tenant ON license_sessions(tenant_id);
CREATE INDEX idx_license_sessions_status ON license_sessions(tenant_id, status);
CREATE INDEX idx_license_sessions_token ON license_sessions(session_token);
-- Projects table
CREATE INDEX idx_projects_tenant ON projects(tenant_id);
CREATE INDEX idx_projects_slug ON projects(tenant_id, slug);
Why Composite Indexes (tenant_id first)?
- RLS policies always filter by tenant_id
- A composite index (tenant_id, column) allows index-only scans
- Significantly faster than table scans
Performance:
-- Without composite index (slow)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Seq Scan on users (cost=0.00..1234.56 rows=1 width=...)
-- With composite index (fast)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Index Scan using idx_users_email (cost=0.29..8.31 rows=1 width=...)
Partial Indexes
Indexes with WHERE clause (smaller, faster):
-- Index only active license sessions (most common query)
CREATE INDEX idx_license_sessions_expires ON license_sessions(expires_at)
WHERE status = 'active';
-- Index only active tenants (the partial predicate does the filtering,
-- so the key column just needs to be small, e.g. the primary key)
CREATE INDEX idx_tenants_active ON tenants(id)
WHERE status = 'active';
Benefits:
- Smaller index size (faster scans)
- Only indexes relevant rows
- Reduced maintenance overhead (no inactive rows indexed)
Statistics Collector (Auto-Analyze)
Purpose: Collect query statistics for query planner optimization.
Configuration:
ALTER SYSTEM SET track_counts = on;
ALTER SYSTEM SET track_functions = all;
ALTER SYSTEM SET track_activity_query_size = 4096;
ALTER SYSTEM SET track_io_timing = on;
-- Auto-analyze settings
ALTER SYSTEM SET autovacuum_analyze_threshold = 50;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.1;
Statistics Views:
-- Query performance
SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;
-- Index usage
SELECT * FROM pg_stat_user_indexes ORDER BY idx_scan DESC;
-- Table statistics
SELECT * FROM pg_stat_user_tables;
5. Backup & Recovery
Automated Backups
Configuration:
- Frequency: Daily at 2 AM UTC
- Retention: 7 days (configurable up to 365 days)
- Window: 4-hour window (2 AM - 6 AM UTC)
- Transaction logs: Continuous archiving for PITR
Backup Process:
1. Cloud SQL initiates backup at 2 AM UTC
2. Full database snapshot created
3. Compressed and uploaded to Cloud Storage
4. Previous backups rotated (7-day retention)
5. Backup verification (checksum)
6. Backup available for restore
Restore:
# List backups
gcloud sql backups list --instance=licenses-db
# Restore from specific backup
gcloud sql backups restore BACKUP_ID \
--backup-instance=licenses-db \
--instance=licenses-db-restored
Point-in-Time Recovery (PITR)
Purpose: Restore database to any point in time within retention period.
Configuration:
- WAL Archiving: Enabled (continuous)
- Retention: 7 days
- Granularity: 1-second precision
PITR Process:
1. Base backup (daily at 2 AM)
2. WAL segments archived continuously to Cloud Storage
3. PITR: Restore base backup + replay WAL to target time
4. Result: Database state at exact target timestamp
Restore to Specific Time:
# PITR restores into a clone of the instance at the target timestamp
gcloud sql instances clone licenses-db licenses-db-restored \
    --point-in-time="2025-11-30T12:34:56Z"
Use Cases:
- Accidental data deletion (restore before deletion)
- Application bug (restore before bug deployed)
- Corruption detection (restore to last known good state)
6. Monitoring & Observability
pg_stat_* Views
Key Statistics Views:
-- Table statistics
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC;
-- Index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Database connections
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database;
pg_stat_statements (Slow Query Tracking)
Enable Extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Query Analysis:
-- Top 10 slowest queries
SELECT
query,
calls,
total_exec_time / 1000 AS total_time_seconds,
mean_exec_time / 1000 AS mean_time_seconds,
max_exec_time / 1000 AS max_time_seconds
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Most frequent queries
SELECT
query,
calls,
rows,
total_exec_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
Cloud Monitoring Integration
Metrics Exported:
| Metric | Description | Alert Threshold |
|---|---|---|
| database/cpu/utilization | CPU usage | >80% |
| database/memory/utilization | Memory usage | >85% |
| database/disk/utilization | Disk usage | >90% |
| database/connections | Active connections | >80 (of 100 max) |
| database/replication_lag | Standby lag | >30s |
| database/transaction_count | TPS | <100 (unusual drop) |
Alerting:
# Monitoring alert policy (Cloud Monitoring)
alert_policy:
  - name: "Database CPU High"
    condition: database/cpu/utilization > 0.8
    duration: 5m
    notification_channels:
      - email: ops@coditect.ai
      - pagerduty: on-call-team
  - name: "Replication Lag High"
    condition: database/replication_lag > 30s
    duration: 2m
    notification_channels:
      - pagerduty: on-call-team
7. Maintenance
Auto-Vacuum (MVCC Cleanup)
Purpose: Clean up dead tuples from UPDATE/DELETE operations.
Configuration:
-- Auto-vacuum settings
ALTER SYSTEM SET autovacuum = on;
ALTER SYSTEM SET autovacuum_max_workers = 3;
ALTER SYSTEM SET autovacuum_naptime = 60;  -- Check every 60s
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.2;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';
Dead Tuple Cleanup:
1. Table receives UPDATE (old row marked dead, new row inserted)
2. Dead tuples accumulate (20% of table or 50 rows)
3. Auto-vacuum triggered (background process)
4. Dead tuples cleaned up (space reclaimed)
5. Statistics updated (query planner optimization)
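The trigger condition implied by the settings above (threshold 50, scale factor 0.2) is `n_dead_tup > threshold + scale_factor * reltuples`. A small sketch, approximating `reltuples` with `n_live_tup`:

```python
def autovacuum_due(n_dead_tup, n_live_tup, threshold=50, scale_factor=0.2):
    """PostgreSQL triggers autovacuum on a table when
        n_dead_tup > autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * reltuples
    Defaults mirror the ALTER SYSTEM settings above; reltuples is
    approximated here by n_live_tup."""
    return n_dead_tup > threshold + scale_factor * n_live_tup
```

For a 10,000-row table this means autovacuum fires once dead tuples exceed 50 + 0.2 * 10,000 = 2,050.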
Monitoring:
-- Check for tables needing vacuum
SELECT schemaname, relname, n_dead_tup, n_live_tup,
n_dead_tup::float / (n_live_tup + 1) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_ratio DESC;
Reindex (Manual)
Purpose: Rebuild bloated indexes for optimal performance.
When to Reindex:
- Index bloat >50%
- Query performance degradation
- After major data changes
Reindex Strategy:
-- Reindex a single index concurrently (no blocking table lock)
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Reindex all indexes on a table, still concurrently (PostgreSQL 12+)
REINDEX TABLE CONCURRENTLY users;
Index Size Check:
-- Rough size check for the largest indexes; accurate bloat estimation
-- requires the pgstattuple extension
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;
Performance Optimization Examples
Query Performance (License Validation)
Query: Validate license session and check expiration
-- Set tenant context
SELECT set_current_tenant('tenant-uuid');
-- Query
SELECT * FROM license_sessions
WHERE session_token = 'session-uuid'
AND status = 'active'
AND expires_at > NOW();
Execution Plan:
Index Scan using idx_license_sessions_token on license_sessions
(cost=0.29..8.31 rows=1 width=...)
Index Cond: (session_token = 'session-uuid'::text)
Filter: ((status = 'active') AND (expires_at > now())
AND (tenant_id = current_setting('app.current_tenant_id')::UUID))
Planning Time: 0.156 ms
Execution Time: 0.087 ms
Performance:
- Planning: 0.156 ms
- Execution: 0.087 ms
- Total: <1ms (well below 100ms target)
Index Selection (User Login)
Query: Login user by email
SELECT set_current_tenant('tenant-uuid');
SELECT * FROM users
WHERE email = 'user@example.com'
AND is_active = TRUE;
Execution Plan:
Index Scan using idx_users_email on users
(cost=0.29..8.31 rows=1 width=...)
Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
AND (email = 'user@example.com'::text))
Filter: (is_active = TRUE)
Planning Time: 0.123 ms
Execution Time: 0.065 ms
Performance:
- Total: <1ms (excellent)
- Index Used: Composite index (tenant_id, email)
- Rows Scanned: 1 (optimal)
Disaster Recovery
Recovery Time Objective (RTO): <5 minutes
Scenario: Primary instance failure
- Detection: Cloud SQL health checks fail (30s)
- Failover: Standby promoted to primary (60s)
- DNS Update: Connection string updated (30s)
- Verification: Health checks pass (30s)
- Application Reconnect: Connection pool refreshes (60s)
Total RTO: <5 minutes
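During that failover window, in-flight queries fail and must be retried by the application. A hedged sketch of retry-with-backoff sized to the RTO above; the exception type, delays, and function names are assumptions (a real deployment would catch `psycopg.OperationalError` or similar).

```python
import time

def with_failover_retry(operation, retries=5, base_delay=1.0,
                        transient=(ConnectionError,), sleep=time.sleep):
    """Retry a database operation across the failover window described
    above. Exponential backoff (1s, 2s, 4s, ...) keeps total wait within
    the <60s promotion time. `sleep` is injectable for testing."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return operation()
        except transient:
            if attempt == retries - 1:
                raise  # failover took longer than our budget; surface it
            sleep(delay)   # wait for the standby to be promoted
            delay *= 2
```

Combined with the connection pool's pre-ping, this lets most requests survive a failover with added latency instead of errors.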
Recovery Point Objective (RPO): <1 second
Synchronous Replication:
- Primary writes to standby before commit acknowledgment
- Zero data loss on failover
- RPO: 0 seconds (no data loss)
Asynchronous Replication (Read Replicas):
- Primary commits before replicating to read replica
- Potential data loss: <10s lag
- RPO: <10 seconds
Related Documentation
- C2: Container Diagram - High-level architecture
- SUP-01: Entity-Relationship Diagram - Complete schema
- C4-04: Multi-Tenant Context Manager - RLS integration
- ADR-007: Django Multi-Tenant Architecture - Complete spec
- PostgreSQL Docs: Row-Level Security
- Cloud SQL Docs: High Availability
Last Updated: 2025-11-30 Diagram Type: C3 Component (Mermaid) Scope: PostgreSQL Database Layer - Cloud SQL