
C3 Component Diagram: PostgreSQL Components

Purpose: Component-level architecture of the Cloud SQL PostgreSQL database, showing Row-Level Security (RLS) enforcement, connection pooling, backup strategy, and performance optimization for multi-tenant data isolation.

Scope: PostgreSQL 16 database layer (Cloud SQL)

Related Diagrams: (links not preserved in this export)

Mermaid Component Diagram

(diagram not preserved in this export)
Component Details

1. Connection Management

Connection Pool (Django + asyncpg)

Purpose: Manage database connections efficiently with pooling.

Configuration:

# settings/base.py
import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME', 'coditect'),
        'USER': os.getenv('DB_USER', 'app_user'),
        'PASSWORD': os.getenv('DB_PASSWORD'),  # From Secret Manager
        'HOST': os.getenv('DB_HOST', '/cloudsql/...'),
        'PORT': os.getenv('DB_PORT', '5432'),
        'ATOMIC_REQUESTS': True,
        'CONN_MAX_AGE': 600,  # Persistent connections (10 min)
        'OPTIONS': {
            'connect_timeout': 10,
            'options': '-c search_path=public',
            # The pool settings below are SQLAlchemy-style and assume a pooling
            # layer in front of the driver; plain django.db.backends.postgresql
            # does not accept them (Django 5.1+ with psycopg 3 uses a 'pool' dict).
            'pool_size': 5,         # Minimum connections
            'max_overflow': 15,     # Additional connections (total 20)
            'pool_timeout': 30,     # Wait up to 30s for a free connection
            'pool_recycle': 3600,   # Recycle connections every hour
            'pool_pre_ping': True,  # Test connection health before use
        },
    }
}

Key Features:

  • Persistent connections: CONN_MAX_AGE = 600 (10 min)
  • Pool size: 5 minimum, 20 maximum connections
  • Pre-ping: Test connection health before use
  • Recycle: Refresh connections every hour (prevents stale connections)
  • Timeout: 30s wait for available connection (prevents deadlock)

Benefits:

  • Reduced latency (no connection overhead per request)
  • Better resource utilization
  • Automatic connection health checks
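The pool parameters above follow classic SQLAlchemy-style semantics. A toy Python sketch of that behavior (an illustration only, not the driver's actual implementation; the `connect` factory and `ping()` method are assumptions):

```python
import queue

class ToyPool:
    """Toy illustration of pool_size / max_overflow / pool_timeout / pre-ping."""

    def __init__(self, connect, pool_size=5, max_overflow=15, pool_timeout=30):
        self._connect = connect                      # factory that opens a connection
        self._idle = queue.Queue()
        self._max_total = pool_size + max_overflow   # hard cap (5 + 15 = 20 above)
        self._timeout = pool_timeout
        self._total = 0
        for _ in range(pool_size):                   # pre-open the minimum pool
            self._idle.put(self._connect())
            self._total += 1

    def acquire(self):
        try:
            conn = self._idle.get_nowait()           # reuse an idle connection
        except queue.Empty:
            if self._total < self._max_total:        # grow into the overflow budget
                conn = self._connect()
                self._total += 1
            else:                                    # pool exhausted: wait, then fail
                conn = self._idle.get(timeout=self._timeout)
        if not conn.ping():                          # pre-ping: swap out dead connections
            conn = self._connect()
        return conn

    def release(self, conn):
        self._idle.put(conn)                         # return connection for reuse
```

Exhausting the pool makes `acquire` block for at most `pool_timeout` seconds and then raise, which is the deadlock-prevention behavior noted above.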

Cloud SQL Proxy (Unix Socket)

Purpose: Secure connection to Cloud SQL without exposing public IP.

Configuration:

# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-backend
spec:
  template:
    spec:
      containers:
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:latest  # pin a version in production
          command:
            - "/cloud_sql_proxy"
            - "-instances=coditect-cloud-infra:us-central1:licenses-db"
            - "-dir=/cloudsql"   # creates the Unix socket under /cloudsql
            - "-credential_file=/secrets/service-account-key.json"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql

        - name: django
          image: gcr.io/coditect-cloud-infra/license-server:latest
          env:
            - name: DB_HOST
              value: "/cloudsql/coditect-cloud-infra:us-central1:licenses-db"
          volumeMounts:
            - name: cloudsql
              mountPath: /cloudsql

      volumes:
        - name: cloudsql
          emptyDir: {}

Key Features:

  • Unix socket connection: No network overhead
  • Automatic TLS: End-to-end encryption
  • IAM authentication: No passwords in connection string
  • Connection multiplexing: Single proxy handles multiple connections

Benefits:

  • No public IP exposure (secure)
  • Automatic SSL/TLS encryption
  • IAM-based authentication
  • Simplified network configuration

2. Row-Level Security (RLS) Enforcement

Session Variable (app.current_tenant_id)

Purpose: Store current tenant ID for RLS policy evaluation.

PostgreSQL Session Variables:

-- Set session variable (called by Django)
SELECT set_current_tenant('tenant-uuid');
-- Internally: PERFORM set_config('app.current_tenant_id', 'tenant-uuid', false)

-- Get session variable (used by RLS policies)
SELECT current_setting('app.current_tenant_id')::UUID;
-- Returns: 'tenant-uuid'

-- Clear session variable (called after request)
SELECT clear_current_tenant();
-- Internally: PERFORM set_config('app.current_tenant_id', '', false)

Lifecycle:

Request Start → TenantMiddleware

set_current_tenant(tenant) → PostgreSQL session variable set

All queries → RLS policies use current_setting('app.current_tenant_id')

Request End → TenantMiddleware

clear_current_tenant() → PostgreSQL session variable cleared
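A minimal sketch of that lifecycle as Django-style middleware (an illustration under assumptions: `request.tenant_id` is resolved earlier in the stack, and `connection` stands in for Django's `django.db.connection`):

```python
class TenantMiddleware:
    """Sketch of the request lifecycle above: set the RLS session variable
    before the view runs, and always clear it afterwards so a pooled
    connection never leaks one tenant's context into the next request."""

    def __init__(self, get_response, connection):
        self.get_response = get_response
        self.connection = connection  # DB-API connection (Django's `connection` in practice)

    def __call__(self, request):
        cur = self.connection.cursor()
        # Equivalent to: SELECT set_current_tenant('tenant-uuid')
        cur.execute("SELECT set_config('app.current_tenant_id', %s, false)",
                    [str(request.tenant_id)])
        try:
            return self.get_response(request)
        finally:
            # Equivalent to: SELECT clear_current_tenant() -- runs even if the view raised
            cur.execute("SELECT set_config('app.current_tenant_id', '', false)")
```

The `finally` block is the important part: clearing in all code paths is what keeps the session variable request-scoped on pooled connections.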

RLS Policy Engine

Purpose: Evaluate RLS policies on every query and rewrite queries to include tenant filter.

RLS Policy Example:

-- Enable RLS on users table
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Apply policies to the table owner too (the owner bypasses RLS otherwise)
ALTER TABLE users FORCE ROW LEVEL SECURITY;

-- Create RLS policy
CREATE POLICY users_tenant_isolation ON users
    USING (tenant_id = current_setting('app.current_tenant_id')::UUID);

Query Rewriting:

-- Original query (Django ORM)
SELECT * FROM users WHERE is_active = TRUE;

-- PostgreSQL RLS rewrites to:
SELECT * FROM users
WHERE is_active = TRUE
AND tenant_id = current_setting('app.current_tenant_id')::UUID;

Performance:

  • RLS policies evaluated at planner stage (not per-row)
  • Filters pushed down to index scans (efficient)
  • Minimal overhead (<5% query time)
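Conceptually the rewrite behaves like appending a predicate to every statement, though PostgreSQL actually injects it into the plan tree rather than the SQL text. A toy illustration (assumption: simple single-table SELECTs only):

```python
# Policy predicate from the RLS example above
RLS_PREDICATE = "tenant_id = current_setting('app.current_tenant_id')::UUID"

def apply_rls(sql: str, predicate: str = RLS_PREDICATE) -> str:
    """Toy version of the rewrite: AND the policy predicate onto the WHERE clause."""
    if " where " in sql.lower():
        return f"{sql} AND {predicate}"   # existing WHERE: conjoin the policy
    return f"{sql} WHERE {predicate}"     # no WHERE: the policy becomes the filter
```

`apply_rls("SELECT * FROM users WHERE is_active = TRUE")` produces exactly the rewritten query shown above.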

Query Rewriter

Purpose: Modify queries to include RLS filters automatically.

Execution Plan:

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';

-- Output:
Index Scan using idx_users_email on users  (cost=0.29..8.31 rows=1 width=...)
  Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
               AND (email = 'user@example.com'::text))
Planning Time: 0.156 ms
Execution Time: 0.087 ms

Key Points:

  • RLS filter included in index condition (efficient)
  • No additional overhead (index already includes tenant_id)
  • Planning time: <1ms

3. Database Instance (High Availability)

Primary Instance

Specifications:

  • Instance Type: db-custom-2-7680 (2 vCPU, 7.5 GB RAM)
  • Storage: 100 GB SSD (auto-expanding)
  • Zone: us-central1-a (primary)
  • PostgreSQL Version: 16.x

Configuration:

-- PostgreSQL configuration
-- (Cloud SQL manages postgresql.conf; in practice these are set as database
--  flags via `gcloud sql instances patch`. ALTER SYSTEM shown for reference.)
ALTER SYSTEM SET max_connections = 100;
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET default_statistics_target = 100;
ALTER SYSTEM SET random_page_cost = 1.1;           -- SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;   -- SSD concurrent I/O

Performance Tuning:

  • shared_buffers: ≈25% of RAM (2 GB on the 7.5 GB instance)
  • effective_cache_size: ≈75% of RAM (6 GB; a planner hint, not an allocation)
  • work_mem: 32 MB per query (prevents temp file spill)
  • random_page_cost: 1.1 (SSD-optimized)
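Those rules of thumb can be written as a small calculator (the ratios are this section's guidance, not hard PostgreSQL rules; on the 7.5 GB instance, 25% is ≈1.9 GB, rounded up to 2 GB in the configuration above):

```python
def memory_tuning(ram_gb: float) -> dict:
    """Rule-of-thumb memory settings from this section."""
    return {
        "shared_buffers_gb": round(ram_gb * 0.25, 2),        # ~25% of RAM
        "effective_cache_size_gb": round(ram_gb * 0.75, 2),  # ~75% of RAM (planner hint)
        "random_page_cost": 1.1,                             # SSD-optimized
    }

print(memory_tuning(7.5))
```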

Standby Instance (HA)

Purpose: Automatic failover for high availability.

Configuration:

  • Zone: us-central1-b (different zone from primary)
  • Replication: Synchronous (zero data loss)
  • Failover: Automatic (< 60s downtime)

Replication Flow:

Primary (us-central1-a)
↓ (synchronous replication)
Standby (us-central1-b)
↓ (ready for automatic promotion)
Failover (if primary fails)

Failover Process:

1. Primary instance failure detected (<30s)
2. Cloud SQL promotes standby to primary (automatic)
3. DNS updated to point to new primary
4. Connections rerouted to new primary
5. Total downtime: <60s
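During the failover window, in-flight connections fail; the standard client-side mitigation is retry with exponential backoff. A minimal sketch (attempt counts and delays are illustrative assumptions, not values from this deployment):

```python
import time

def with_retry(op, attempts=5, base_delay=1.0, retryable=(ConnectionError,)):
    """Retry a database operation across a transient failover window."""
    for attempt in range(attempts):
        try:
            return op()
        except retryable:
            if attempt == attempts - 1:
                raise                              # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # backoff: 1s, 2s, 4s, 8s ...
```

Wrapping e.g. a license-validation query this way turns a sub-60s failover into added latency instead of a hard error.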

Read Replica (Optional)

Purpose: Offload read queries for analytics and reporting.

Configuration:

  • Zone: us-east1-b (different region for geographic distribution)
  • Replication: Asynchronous (eventual consistency)
  • Lag: <10s typical

Use Cases:

  • Analytics queries (long-running reports)
  • Dashboard data (tolerates lag)
  • Backup queries (no impact on primary)

4. Indexing & Performance

B-Tree Indexes

Composite Indexes (tenant_id + column):

-- Users table
CREATE INDEX idx_users_tenant ON users(tenant_id);
CREATE INDEX idx_users_email ON users(tenant_id, email);
CREATE INDEX idx_users_username ON users(tenant_id, username);

-- License sessions table
CREATE INDEX idx_license_sessions_tenant ON license_sessions(tenant_id);
CREATE INDEX idx_license_sessions_status ON license_sessions(tenant_id, status);
CREATE INDEX idx_license_sessions_token ON license_sessions(session_token);

-- Projects table
CREATE INDEX idx_projects_tenant ON projects(tenant_id);
CREATE INDEX idx_projects_slug ON projects(tenant_id, slug);

Why Composite Indexes (tenant_id first)?

  • RLS policies always filter by tenant_id
  • Composite index (tenant_id, column) allows index-only scans
  • Significantly faster than table scans
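The effect can be shown with a toy in-memory "index": a dict keyed by (tenant_id, email) resolves a lookup in one probe, and putting tenant_id first means the same email under another tenant can never match:

```python
def build_composite_index(rows):
    """Toy analogue of idx_users_email: (tenant_id, email) -> row."""
    return {(r["tenant_id"], r["email"]): r for r in rows}

rows = [
    {"tenant_id": "t1", "email": "a@example.com", "username": "alice"},
    {"tenant_id": "t2", "email": "a@example.com", "username": "other-alice"},
]
idx = build_composite_index(rows)

# One probe instead of scanning every row, tenant-scoped by construction:
hit = idx.get(("t1", "a@example.com"))
miss = idx.get(("t3", "a@example.com"))  # unknown tenant: no cross-tenant leak
```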

Performance:

-- Without composite index (slow)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Seq Scan on users (cost=0.00..1234.56 rows=1 width=...)

-- With composite index (fast)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Index Scan using idx_users_email (cost=0.29..8.31 rows=1 width=...)

Partial Indexes

Indexes with WHERE clause (smaller, faster):

-- Index only active license sessions (most common query)
CREATE INDEX idx_license_sessions_expires ON license_sessions(expires_at)
WHERE status = 'active';

-- Index only active tenants (most common query)
CREATE INDEX idx_tenants_active ON tenants(status)
WHERE status = 'active';

Benefits:

  • Smaller index size (faster scans)
  • Only indexes relevant rows
  • Reduced maintenance overhead (no inactive rows indexed)

Statistics Collector (Auto-Analyze)

Purpose: Collect query statistics for query planner optimization.

Configuration:

ALTER SYSTEM SET track_counts = on;
ALTER SYSTEM SET track_functions = 'all';
ALTER SYSTEM SET track_activity_query_size = 4096;
ALTER SYSTEM SET track_io_timing = on;

-- Auto-analyze settings
ALTER SYSTEM SET autovacuum_analyze_threshold = 50;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.1;

Statistics Views:

-- Query performance
SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Index usage
SELECT * FROM pg_stat_user_indexes ORDER BY idx_scan DESC;

-- Table statistics
SELECT * FROM pg_stat_user_tables;

5. Backup & Recovery

Automated Backups

Configuration:

  • Frequency: Daily at 2 AM UTC
  • Retention: 7 days (configurable up to 365 days)
  • Window: 2 AM to 6 AM UTC (4-hour backup window)
  • Transaction logs: Continuous archiving for PITR

Backup Process:

1. Cloud SQL initiates backup at 2 AM UTC
2. Full database snapshot created
3. Compressed and uploaded to Cloud Storage
4. Previous backups rotated (7-day retention)
5. Backup verification (checksum)
6. Backup available for restore

Restore:

# List backups
gcloud sql backups list --instance=licenses-db

# Restore a specific backup into another instance
gcloud sql backups restore BACKUP_ID \
    --restore-instance=licenses-db-restored \
    --backup-instance=licenses-db

Point-in-Time Recovery (PITR)

Purpose: Restore database to any point in time within retention period.

Configuration:

  • WAL Archiving: Enabled (continuous)
  • Retention: 7 days
  • Granularity: 1-second precision

PITR Process:

1. Base backup (daily at 2 AM)
2. WAL segments archived continuously to Cloud Storage
3. PITR: Restore base backup + replay WAL to target time
4. Result: Database state at exact target timestamp

Restore to Specific Time:

# Point-in-time recovery clones the instance at the target timestamp
gcloud sql instances clone licenses-db licenses-db-restored \
    --point-in-time="2025-11-30T12:34:56Z"

Use Cases:

  • Accidental data deletion (restore before deletion)
  • Application bug (restore before bug deployed)
  • Corruption detection (restore to last known good state)

6. Monitoring & Observability

pg_stat_* Views

Key Statistics Views:

-- Table statistics
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC;

-- Index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Database connections
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database;

pg_stat_statements (Slow Query Tracking)

Enable Extension:

-- Requires pg_stat_statements in shared_preload_libraries
-- (a database flag on Cloud SQL; needs a restart on self-managed installs)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Query Analysis:

-- Top 10 slowest queries
SELECT
query,
calls,
total_exec_time / 1000 AS total_time_seconds,
mean_exec_time / 1000 AS mean_time_seconds,
max_exec_time / 1000 AS max_time_seconds
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Most frequent queries
SELECT
query,
calls,
rows,
total_exec_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

Cloud Monitoring Integration

Metrics Exported:

| Metric | Description | Alert Threshold |
|---|---|---|
| database/cpu/utilization | CPU usage | >80% |
| database/memory/utilization | Memory usage | >85% |
| database/disk/utilization | Disk usage | >90% |
| database/connections | Active connections | >80 (of 100 max) |
| database/replication_lag | Standby lag | >30s |
| database/transaction_count | TPS | <100 (unusual drop) |

Alerting:

# Monitoring alert policy (Cloud Monitoring)
alert_policy:
  - name: "Database CPU High"
    condition: database/cpu/utilization > 0.8
    duration: 5m
    notification_channels:
      - email: ops@coditect.ai
      - pagerduty: on-call-team

  - name: "Replication Lag High"
    condition: database/replication_lag > 30s
    duration: 2m
    notification_channels:
      - pagerduty: on-call-team

7. Maintenance

Auto-Vacuum (MVCC Cleanup)

Purpose: Clean up dead tuples from UPDATE/DELETE operations.

Configuration:

-- Auto-vacuum settings
ALTER SYSTEM SET autovacuum = on;
ALTER SYSTEM SET autovacuum_max_workers = 3;
ALTER SYSTEM SET autovacuum_naptime = 60;               -- Check every 60s
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.2;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';

Dead Tuple Cleanup:

1. Table receives UPDATE (old row marked dead, new row inserted)
2. Dead tuples accumulate (20% of table or 50 rows)
3. Auto-vacuum triggered (background process)
4. Dead tuples cleaned up (space reclaimed)
5. Statistics updated (query planner optimization)

Monitoring:

-- Check for tables needing vacuum
SELECT schemaname, relname, n_dead_tup, n_live_tup,
n_dead_tup::float / (n_live_tup + 1) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_ratio DESC;

Reindex (Manual)

Purpose: Rebuild bloated indexes for optimal performance.

When to Reindex:

  • Index bloat >50%
  • Query performance degradation
  • After major data changes

Reindex Strategy:

-- Reindex a single index concurrently (no exclusive table lock)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Reindex all of a table's indexes concurrently (PostgreSQL 12+)
REINDEX TABLE CONCURRENTLY users;

Bloat Detection:

-- Check index sizes (a rough proxy; use pgstattuple for accurate bloat estimates)
SELECT schemaname, relname, indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

Performance Optimization Examples

Query Performance (License Validation)

Query: Validate license session and check expiration

-- Set tenant context
SELECT set_current_tenant('tenant-uuid');

-- Query
SELECT * FROM license_sessions
WHERE session_token = 'session-uuid'
AND status = 'active'
AND expires_at > NOW();

Execution Plan:

Index Scan using idx_license_sessions_token on license_sessions
(cost=0.29..8.31 rows=1 width=...)
Index Cond: (session_token = 'session-uuid'::text)
Filter: ((status = 'active') AND (expires_at > now())
AND (tenant_id = current_setting('app.current_tenant_id')::UUID))
Planning Time: 0.156 ms
Execution Time: 0.087 ms

Performance:

  • Planning: 0.156 ms
  • Execution: 0.087 ms
  • Total: <1ms (well below 100ms target)

Index Selection (User Login)

Query: Login user by email

SELECT set_current_tenant('tenant-uuid');

SELECT * FROM users
WHERE email = 'user@example.com'
AND is_active = TRUE;

Execution Plan:

Index Scan using idx_users_email on users
(cost=0.29..8.31 rows=1 width=...)
Index Cond: ((tenant_id = current_setting('app.current_tenant_id')::UUID)
AND (email = 'user@example.com'::text))
Filter: (is_active = TRUE)
Planning Time: 0.123 ms
Execution Time: 0.065 ms

Performance:

  • Total: <1ms (excellent)
  • Index Used: Composite index (tenant_id, email)
  • Rows Scanned: 1 (optimal)

Disaster Recovery

Recovery Time Objective (RTO): <5 minutes

Scenario: Primary instance failure

  1. Detection: Cloud SQL health checks fail (30s)
  2. Failover: Standby promoted to primary (60s)
  3. DNS Update: Connection string updated (30s)
  4. Verification: Health checks pass (30s)
  5. Application Reconnect: Connection pool refreshes (60s)

Total RTO: <5 minutes
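The stage budgets above sum to 3.5 minutes, comfortably inside the objective:

```python
# Stage budgets from the failover scenario above, in seconds
rto_stages_s = {
    "detection": 30,     # health checks fail
    "failover": 60,      # standby promoted
    "dns_update": 30,    # connection string updated
    "verification": 30,  # health checks pass
    "reconnect": 60,     # connection pool refreshes
}
total_s = sum(rto_stages_s.values())
print(total_s / 60)  # 3.5 minutes, within the 5-minute RTO
```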


Recovery Point Objective (RPO): 0 seconds (HA failover); <10 seconds (read replicas)

Synchronous Replication:

  • Primary writes to standby before commit acknowledgment
  • Zero data loss on failover
  • RPO: 0 seconds (no data loss)

Asynchronous Replication (Read Replicas):

  • Primary commits before replicating to read replica
  • Potential data loss: <10s lag
  • RPO: <10 seconds


Last Updated: 2025-11-30
Diagram Type: C3 Component (Mermaid)
Scope: PostgreSQL Database Layer (Cloud SQL)