PostgreSQL Patterns Skill
When to Use This Skill
Use this skill when implementing PostgreSQL patterns in your codebase.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Advanced PostgreSQL patterns for production-grade database design, query optimization, and operational excellence.
Query Patterns
Common Table Expressions (CTEs)
-- Recursive CTE for hierarchical data
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under managers
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY level, name;
-- Materialized CTE for performance (PostgreSQL 12+)
WITH summary AS MATERIALIZED (
SELECT department_id, COUNT(*) as emp_count
FROM employees
GROUP BY department_id
)
SELECT d.name, s.emp_count
FROM departments d
JOIN summary s ON d.id = s.department_id;
Window Functions
-- Running totals and rankings
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as order_rank,
LAG(amount) OVER (ORDER BY order_date) as prev_amount,
amount - LAG(amount) OVER (ORDER BY order_date) as change
FROM orders;
-- Percentile calculations
SELECT
department_id,
salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) as percentile,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary) as quartile
FROM employees;
UPSERT Patterns
-- INSERT ON CONFLICT (upsert)
INSERT INTO products (sku, name, price, updated_at)
VALUES ('ABC123', 'Widget', 29.99, NOW())
ON CONFLICT (sku)
DO UPDATE SET
name = EXCLUDED.name,
price = EXCLUDED.price,
updated_at = EXCLUDED.updated_at
WHERE products.price <> EXCLUDED.price;
-- Returning affected rows
INSERT INTO inventory (product_id, quantity)
VALUES (1, 100)
ON CONFLICT (product_id)
DO UPDATE SET quantity = inventory.quantity + EXCLUDED.quantity
RETURNING *;
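When the upsert target varies at runtime, the statement can be assembled application-side. A minimal Python sketch of rendering an ON CONFLICT upsert (the function, table, and column names are illustrative, not from any library; identifiers are assumed to come from trusted code since they cannot be parameterized):

```python
def build_upsert(table, columns, conflict_col):
    """Render a parameterized INSERT ... ON CONFLICT DO UPDATE statement.

    Identifiers (table/column names) must come from trusted code,
    not user input -- only values can be bound as parameters.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    # Update every non-conflict column from the proposed row (EXCLUDED)
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c != conflict_col
    )
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_col}) DO UPDATE SET {updates}"
    )

sql = build_upsert("products", ["sku", "name", "price"], "sku")
```

The resulting string uses psycopg-style named placeholders, so the row dict can be passed straight to `cursor.execute(sql, row)`.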
Indexing Strategies
Index Types
-- B-tree (default, equality and range queries)
CREATE INDEX idx_users_email ON users (email);
-- Partial index (filtered subset)
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';
-- Covering index (include non-key columns)
CREATE INDEX idx_orders_customer ON orders (customer_id)
INCLUDE (order_date, total);
-- Composite index (multi-column)
CREATE INDEX idx_orders_date_status ON orders (order_date DESC, status);
-- GIN index (arrays, JSONB, full-text)
CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_products_data ON products USING GIN (metadata jsonb_path_ops);
-- GiST index (geometric, range types)
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
-- BRIN index (large tables, naturally ordered)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);
Index Maintenance
-- Check index usage
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Reindex without blocking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Check bloat
SELECT
schemaname, relname,
pg_size_pretty(pg_total_relation_size(relid)) as size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
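The dead-tuple ratio above is easy to replicate in application-side monitoring, e.g. to alert when bloat crosses a threshold. A minimal Python sketch of the same formula (function names and the 30% threshold are illustrative):

```python
def dead_ratio(n_live_tup, n_dead_tup):
    """Dead-tuple percentage, mirroring the SQL formula; None for empty tables."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return None  # equivalent of NULLIF(..., 0)
    return round(n_dead_tup * 100.0 / total, 2)

def tables_needing_vacuum(stats, threshold=30.0):
    """Filter (name, live, dead) rows whose dead ratio exceeds the threshold."""
    flagged = []
    for name, live, dead in stats:
        ratio = dead_ratio(live, dead)
        if ratio is not None and ratio > threshold:
            flagged.append((name, ratio))
    return flagged
```

Feeding this from a periodic query against pg_stat_user_tables gives a simple bloat alert without depending on an external monitoring stack.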
Partitioning
Range Partitioning
-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
customer_id INTEGER,
order_date DATE NOT NULL,
total DECIMAL(10,2),
PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);
-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Default partition for unmatched data
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- Automatic partition creation script
DO $$
DECLARE
start_date DATE := '2024-01-01';
end_date DATE;
partition_name TEXT;
BEGIN
FOR i IN 0..11 LOOP
end_date := start_date + INTERVAL '1 month';
partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
start_date := end_date;
END LOOP;
END $$;
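The same monthly partition DDL can be generated application-side, for instance from a migration tool or scheduled job. A minimal Python sketch (the function name is illustrative; the output mirrors the DO block above):

```python
from datetime import date

def monthly_partition_ddl(table, start, months):
    """Generate one CREATE TABLE ... PARTITION OF statement per month."""
    stmts = []
    for i in range(months):
        # First day of the i-th month after start, and of the month after it
        y, m = divmod(start.month - 1 + i, 12)
        lo = date(start.year + y, m + 1, 1)
        y2, m2 = divmod(start.month + i, 12)
        hi = date(start.year + y2, m2 + 1, 1)
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS {table}_{lo:%Y_%m} "
            f"PARTITION OF {table} "
            f"FOR VALUES FROM ('{lo}') TO ('{hi}')"
        )
    return stmts

ddl = monthly_partition_ddl("orders", date(2024, 1, 1), 12)
```

Each statement uses IF NOT EXISTS, so the job is safe to re-run ahead of the current month.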
List Partitioning
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
data JSONB,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id, event_type)
) PARTITION BY LIST (event_type);
CREATE TABLE events_user PARTITION OF events
FOR VALUES IN ('user.created', 'user.updated', 'user.deleted');
CREATE TABLE events_order PARTITION OF events
FOR VALUES IN ('order.placed', 'order.shipped', 'order.delivered');
Connection Pooling
PgBouncer Configuration
; pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool settings
pool_mode = transaction
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
; Timeouts
server_idle_timeout = 600
server_lifetime = 3600
query_timeout = 300
Application Connection Settings
# SQLAlchemy connection pool
from sqlalchemy import create_engine
engine = create_engine(
"postgresql://user:pass@localhost:6432/mydb",
pool_size=10,
max_overflow=20,
pool_pre_ping=True,
pool_recycle=3600,
connect_args={
"connect_timeout": 10,
"application_name": "my_app"
}
)
JSON/JSONB Operations
JSONB Queries
-- Extract values
SELECT
data->>'name' as name,
data->'address'->>'city' as city,
(data->>'age')::int as age
FROM users;
-- JSON path queries (PostgreSQL 12+)
SELECT * FROM products
WHERE metadata @@ '$.tags[*] == "featured"';
SELECT jsonb_path_query_array(
metadata,
'$.variants[*] ? (@.price < 100)'
) as cheap_variants
FROM products;
-- Update JSONB
UPDATE users
SET data = jsonb_set(data, '{address,zip}', '"12345"')
WHERE id = 1;
-- Append to array
UPDATE products
SET tags = tags || '["new"]'::jsonb
WHERE id = 1;
JSONB Indexing
-- GIN index for containment queries (@>)
CREATE INDEX idx_data ON users USING GIN (data);
-- Optimized for specific path queries
CREATE INDEX idx_data_ops ON users USING GIN (data jsonb_path_ops);
-- Expression index for specific field
CREATE INDEX idx_data_email ON users ((data->>'email'));
Backup & Recovery
Logical Backup
# Full database dump
pg_dump -Fc -f backup.dump mydb
# Schema only
pg_dump -s -f schema.sql mydb
# Data only for specific tables
pg_dump -t 'orders*' --data-only mydb > orders_data.sql
# Parallel dump (PostgreSQL 9.3+)
pg_dump -Fd -j 4 -f backup_dir mydb
Point-in-Time Recovery
# postgresql.conf
archive_mode = on
archive_command = 'cp %p /archive/%f'
# PostgreSQL 12+: put these in postgresql.conf and create an empty
# recovery.signal file in the data directory; older versions use recovery.conf
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
Monitoring Queries
Performance Statistics
-- Slow queries (total_exec_time/mean_exec_time are the PostgreSQL 13+ column names)
SELECT
calls,
round(total_exec_time::numeric, 2) as total_time_ms,
round(mean_exec_time::numeric, 2) as mean_time_ms,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Table statistics
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Active connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
Lock Monitoring
-- Blocked queries (pg_blocking_pids, PostgreSQL 9.6+)
SELECT
blocked.pid as blocked_pid,
blocked.query as blocked_query,
blocking.pid as blocking_pid,
blocking.query as blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
High Availability
Streaming Replication Setup
-- On primary
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
-- pg_hba.conf
-- host replication replicator replica_ip/32 md5
-- On replica
-- pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -R
Logical Replication
-- On publisher
CREATE PUBLICATION my_pub FOR TABLE orders, customers;
-- On subscriber
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=publisher_ip dbname=mydb user=replicator'
PUBLICATION my_pub;
Success Output
When this skill is successfully applied, you should see:
✅ SKILL COMPLETE: postgresql-patterns
Completed:
- [x] Database schema designed with proper normalization
- [x] Indexes created (B-tree, GIN, partial indexes as needed)
- [x] Query optimization applied with measurable improvements
- [x] Monitoring queries configured for performance tracking
- [x] Backup/recovery procedures implemented
Outputs:
- Schema DDL files with table definitions
- Index creation statements with justification
- Optimized query versions with EXPLAIN ANALYZE results
- Monitoring dashboard queries (pg_stat_statements, locks, bloat)
- Documentation of design decisions and tradeoffs
Performance Metrics:
- Query execution time improvements (before/after)
- Index usage statistics
- Table/index size optimizations
Completion Checklist
Before marking this skill as complete, verify:
- Schema follows normalization principles (3NF minimum)
- Primary keys and foreign keys defined with proper constraints
- Indexes created for all foreign keys and frequent query patterns
- Partial indexes used where appropriate to reduce index size
- JSONB fields indexed with GIN where queried
- Partitioning strategy implemented for large tables (>10M rows)
- Query performance measured with EXPLAIN ANALYZE
- Connection pooling configured (PgBouncer or application-level)
- Backup procedures tested and documented
- Monitoring queries added to observability stack
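To make the "measure with EXPLAIN ANALYZE" step repeatable, a small helper can wrap queries and pull the execution time out of the JSON plan. A hedged Python sketch (function names are illustrative), assuming the standard shape of EXPLAIN (FORMAT JSON) output, a one-element list carrying an "Execution Time" key:

```python
def explain_analyze(query):
    """Wrap a query in EXPLAIN ANALYZE with JSON output for programmatic parsing."""
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"

def execution_time_ms(plan_json):
    """Extract total execution time (ms) from EXPLAIN (FORMAT JSON) output."""
    # EXPLAIN FORMAT JSON returns a one-element list; the top-level dict
    # carries 'Plan', 'Planning Time', and 'Execution Time'
    return plan_json[0]["Execution Time"]
```

Running the wrapped query before and after an index change, and diffing the two execution times, gives the before/after numbers the checklist asks for.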
Failure Indicators
This skill has FAILED if:
- ❌ Schema has redundant data or update anomalies (poor normalization)
- ❌ Missing indexes on foreign keys causing sequential scans
- ❌ Queries still slow after optimization (no measurable improvement)
- ❌ Index bloat >30% without maintenance plan
- ❌ No connection pooling (application making 100+ direct connections)
- ❌ Backup procedures untested or missing
- ❌ No monitoring of slow queries or connection exhaustion
- ❌ Partitioning strategy doesn't align with query patterns
- ❌ JSONB queries without appropriate GIN indexes
When NOT to Use
Do NOT use this skill when:
- Simple SQLite suffices - For single-user applications or <10K rows, PostgreSQL is overkill
- No complex queries - If just key-value lookups, use Redis or similar
- Embedded scenarios - Mobile apps should use SQLite, not PostgreSQL
- Serverless functions - Cold start overhead makes PostgreSQL connections expensive (use connection pooling service or Aurora Serverless)
- Real-time analytics - For sub-second aggregations on billions of rows, use ClickHouse or TimescaleDB instead
- Graph queries dominant - If mostly traversing relationships, use Neo4j or similar graph database
- Document store needs - If schema-free JSON documents without relational joins, use MongoDB
Use alternatives:
- High-velocity time-series → TimescaleDB (PostgreSQL extension) or InfluxDB
- Distributed scale-out → CockroachDB or YugabyteDB (PostgreSQL-compatible)
- Graph relationships → Neo4j, AWS Neptune
- Full-text search → Elasticsearch, Meilisearch
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| No indexes on foreign keys | Sequential scans on JOINs, terrible performance | Always index foreign keys: CREATE INDEX idx_orders_customer ON orders(customer_id) |
| SELECT * in production | Fetches unnecessary columns, wastes bandwidth | Explicitly list columns: SELECT id, name, email FROM users |
| N+1 query problem | 1 query for list + N queries for details | Use JOINs or CTEs to fetch related data in single query |
| No connection pooling | Application creates 1000s of connections, exhausts PostgreSQL | Use PgBouncer or application-level pooling (SQLAlchemy pool_size=10) |
| Over-normalization | Excessive JOINs for simple queries | Denormalize hot paths, use materialized views for aggregates |
| Wrong index type | B-tree on JSONB, GIN on equality checks | Match index to query: GIN for JSONB/arrays, B-tree for equality/range |
| No vacuum strategy | Table bloat, dead tuples accumulate | Configure autovacuum or schedule manual VACUUM ANALYZE |
| Unbounded queries | SELECT * FROM logs fetches millions of rows | Always use LIMIT/OFFSET or cursor pagination |
| No query timeout | Runaway queries block connections for hours | Set statement_timeout = '30s' in postgresql.conf |
| Storing files in database | BLOBs cause bloat, slow backups | Store files in object storage (S3), keep URLs in database |
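The cursor-pagination fix from the table above can be sketched as keyset pagination: filter on the last seen key instead of OFFSET, so each page is an index range scan rather than a scan-and-discard. A minimal Python sketch (table and column names are illustrative):

```python
def keyset_page_sql(last_id=None, page_size=50):
    """Build a keyset-paginated query: WHERE id > last seen id, ordered by id.

    int() casts guard against injection since these values are interpolated;
    in real code, bind them as query parameters instead.
    """
    base = "SELECT id, order_date, total FROM orders"
    if last_id is None:
        return f"{base} ORDER BY id LIMIT {int(page_size)}"
    return f"{base} WHERE id > {int(last_id)} ORDER BY id LIMIT {int(page_size)}"
```

The caller feeds the last id of each page back in as `last_id` for the next page; unlike OFFSET, cost stays constant however deep the pagination goes.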
Principles
This skill embodies CODITECT automation principles:
#1 Recycle → Extend → Re-Use → Create
- Recycle existing patterns from pg_stat_statements analysis - optimize what's actually slow
- Extend with indexes rather than rewriting queries when possible
- Re-use proven patterns (window functions, CTEs, partitioning) from this skill
- Create custom solutions only when standard patterns don't fit
#2 First Principles Thinking
- Understand query execution - Use EXPLAIN ANALYZE to see actual plans
- Know index tradeoffs - Indexes speed reads but slow writes and consume space
- Measure before optimizing - Premature optimization based on assumptions wastes time
#5 Eliminate Ambiguity
- Explicit schema constraints - NOT NULL, CHECK, UNIQUE prevent invalid data
- Clear index naming - idx_tablename_columnname shows purpose at a glance
- Document partitioning strategy - Future maintainers understand why tables are partitioned
#6 Clear, Understandable, Explainable
- Comment complex queries - Explain why CTEs or window functions are needed
- Document index rationale - Why GIN vs B-tree, why partial index, etc.
- Provide monitoring queries - Team can diagnose issues without expert help
#8 No Assumptions
- Test backup restoration - Don't assume pg_dump works until you've restored it
- Verify replication lag - Monitor streaming replication, don't assume zero lag
- Check index usage - Use pg_stat_user_indexes to confirm indexes are actually used
#10 Automation First
- Automate partition creation - Use DO blocks or cron to create monthly partitions
- Automate monitoring - pg_stat_statements + Prometheus alerts for slow queries
- Automate backups - Scheduled pg_dump or WAL archiving to cloud storage
Usage Examples
Design Database Schema
Apply postgresql-patterns skill to design normalized schema for e-commerce platform with proper indexing and partitioning strategy
Optimize Slow Queries
Apply postgresql-patterns skill to analyze and optimize slow queries identified in pg_stat_statements
Setup High Availability
Apply postgresql-patterns skill to configure streaming replication with automatic failover for production database