PostgreSQL Patterns Skill

When to Use This Skill

Use this skill when implementing advanced PostgreSQL patterns in your codebase.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Advanced PostgreSQL patterns for production-grade database design, query optimization, and operational excellence.

Query Patterns

Common Table Expressions (CTEs)

-- Recursive CTE for hierarchical data
WITH RECURSIVE org_tree AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees under managers
    SELECT e.id, e.name, e.manager_id, ot.level + 1
    FROM employees e
    INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY level, name;

-- Materialized CTE for performance (PostgreSQL 12+)
WITH summary AS MATERIALIZED (
    SELECT department_id, COUNT(*) AS emp_count
    FROM employees
    GROUP BY department_id
)
SELECT d.name, s.emp_count
FROM departments d
JOIN summary s ON d.id = s.department_id;
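
Since PostgreSQL 12, a CTE that is referenced only once is inlined into the outer query by default; MATERIALIZED forces it to be computed once, while NOT MATERIALIZED forces inlining so outer predicates can be pushed down. A minimal sketch, assuming an orders table with order_date and customer_id columns (as in later examples):

-- NOT MATERIALIZED lets the planner push the customer_id filter into the CTE
WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders
    WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;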

Window Functions

-- Running totals and rankings
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as order_rank,
LAG(amount) OVER (ORDER BY order_date) as prev_amount,
amount - LAG(amount) OVER (ORDER BY order_date) as change
FROM orders;

-- Percentile calculations
SELECT
department_id,
salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) as percentile,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary) as quartile
FROM employees;
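
Window frames control which rows feed each aggregate. A short sketch of a seven-row moving average over the same orders data, using an explicit ROWS frame:

-- Moving average over the current row and the six preceding rows
SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7
FROM orders;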

UPSERT Patterns

-- INSERT ON CONFLICT (upsert)
INSERT INTO products (sku, name, price, updated_at)
VALUES ('ABC123', 'Widget', 29.99, NOW())
ON CONFLICT (sku)
DO UPDATE SET
name = EXCLUDED.name,
price = EXCLUDED.price,
updated_at = EXCLUDED.updated_at
WHERE products.price <> EXCLUDED.price;

-- Returning affected rows
INSERT INTO inventory (product_id, quantity)
VALUES (1, 100)
ON CONFLICT (product_id)
DO UPDATE SET quantity = inventory.quantity + EXCLUDED.quantity
RETURNING *;
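
On PostgreSQL 15 and later, MERGE is an alternative to ON CONFLICT when the upsert logic is driven by a source row set or needs a DELETE branch. A sketch against the same products table (values are illustrative):

MERGE INTO products p
USING (VALUES ('ABC123', 'Widget', 29.99)) AS src (sku, name, price)
ON p.sku = src.sku
WHEN MATCHED THEN
    UPDATE SET name = src.name, price = src.price, updated_at = NOW()
WHEN NOT MATCHED THEN
    INSERT (sku, name, price, updated_at)
    VALUES (src.sku, src.name, src.price, NOW());

Unlike ON CONFLICT, MERGE does not use a unique constraint to resolve races, so concurrent inserts of the same key can still fail.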

Indexing Strategies

Index Types

-- B-tree (default, equality and range queries)
CREATE INDEX idx_users_email ON users (email);

-- Partial index (filtered subset)
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';

-- Covering index (include non-key columns)
CREATE INDEX idx_orders_customer ON orders (customer_id)
INCLUDE (order_date, total);

-- Composite index (multi-column)
CREATE INDEX idx_orders_date_status ON orders (order_date DESC, status);

-- GIN index (arrays, JSONB, full-text)
CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_products_data ON products USING GIN (metadata jsonb_path_ops);

-- GiST index (geometric, range types)
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);

-- BRIN index (large tables, naturally ordered)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);
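
On busy tables, a plain CREATE INDEX blocks writes for the duration of the build. A sketch of the non-blocking variant (index name and column are illustrative):

-- CONCURRENTLY avoids blocking writes; it cannot run inside a transaction block,
-- and a failed build leaves an INVALID index that must be dropped and rebuilt
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);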

Index Maintenance

-- Find unused indexes (never scanned since the last stats reset)
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Reindex without blocking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Estimate bloat from dead tuple counts
SELECT
schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Partitioning

Range Partitioning

-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
customer_id INTEGER,
order_date DATE NOT NULL,
total DECIMAL(10,2),
PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Default partition for unmatched data
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Automatic partition creation script
DO $$
DECLARE
    start_date DATE := '2024-01-01';
    end_date DATE;
    partition_name TEXT;
BEGIN
    FOR i IN 0..11 LOOP
        end_date := start_date + INTERVAL '1 month';
        partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');

        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
             FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );

        start_date := end_date;
    END LOOP;
END $$;
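
Partitioning also simplifies retention: detaching and dropping a whole partition is far cheaper than DELETE plus VACUUM. A sketch, assuming the quarterly partitions above:

-- Detach an old partition, then archive or drop it
-- (DETACH ... CONCURRENTLY exists on PostgreSQL 14+, but is not allowed
--  while the table has a DEFAULT partition, as orders does here)
ALTER TABLE orders DETACH PARTITION orders_2024_q1;
DROP TABLE orders_2024_q1;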

List Partitioning

CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
data JSONB,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id, event_type)
) PARTITION BY LIST (event_type);

CREATE TABLE events_user PARTITION OF events
FOR VALUES IN ('user.created', 'user.updated', 'user.deleted');

CREATE TABLE events_order PARTITION OF events
FOR VALUES IN ('order.placed', 'order.shipped', 'order.delivered');
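
PostgreSQL also supports hash partitioning for keys with no natural range or list grouping. A minimal sketch (the sessions table is hypothetical):

CREATE TABLE sessions (
    id BIGSERIAL,
    user_id INTEGER NOT NULL,
    payload JSONB,
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);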

Connection Pooling

PgBouncer Configuration

; pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool settings
pool_mode = transaction
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3

max_client_conn = 1000
max_db_connections = 50

; Timeouts
server_idle_timeout = 600
server_lifetime = 3600
query_timeout = 300
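
With auth_type = md5, PgBouncer reads credentials from the auth_file. One way to generate userlist.txt entries directly from the server (the app_user role name is illustrative; verify the expected format against your PgBouncer version and password_encryption setting):

-- Each userlist.txt line is: "username" "password-hash"
SELECT '"' || usename || '" "' || passwd || '"'
FROM pg_shadow
WHERE usename = 'app_user';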

Application Connection Settings

# SQLAlchemy connection pool
from sqlalchemy import create_engine

engine = create_engine(
"postgresql://user:pass@localhost:6432/mydb",
pool_size=10,
max_overflow=20,
pool_pre_ping=True,
pool_recycle=3600,
connect_args={
"connect_timeout": 10,
"application_name": "my_app"
}
)

JSON/JSONB Operations

JSONB Queries

-- Extract values
SELECT
data->>'name' as name,
data->'address'->>'city' as city,
(data->>'age')::int as age
FROM users;

-- JSON path queries (PostgreSQL 12+)
SELECT * FROM products
WHERE metadata @@ '$.tags[*] == "featured"';

SELECT jsonb_path_query_array(
metadata,
'$.variants[*] ? (@.price < 100)'
) as cheap_variants
FROM products;

-- Update JSONB
UPDATE users
SET data = jsonb_set(data, '{address,zip}', '"12345"')
WHERE id = 1;

-- Append to array
UPDATE products
SET tags = tags || '["new"]'::jsonb
WHERE id = 1;

JSONB Indexing

-- GIN index for containment queries (@>)
CREATE INDEX idx_data ON users USING GIN (data);

-- Optimized for specific path queries
CREATE INDEX idx_data_ops ON users USING GIN (data jsonb_path_ops);

-- Expression index for specific field
CREATE INDEX idx_data_email ON users ((data->>'email'));
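
For reference, the default jsonb_ops GIN index serves containment (@>), key-existence (?, ?|, ?&), and jsonpath (@@, @?) operators, while jsonb_path_ops is smaller and faster but serves only @> and the jsonpath operators. A sketch of an indexable containment query:

-- Served by either GIN index above
SELECT id, data->>'email' AS email
FROM users
WHERE data @> '{"address": {"city": "Berlin"}}';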

Backup & Recovery

Logical Backup

# Full database dump
pg_dump -Fc -f backup.dump mydb

# Schema only
pg_dump -s -f schema.sql mydb

# Data only for specific tables
pg_dump -t 'orders*' --data-only mydb > orders_data.sql

# Parallel dump (PostgreSQL 9.3+)
pg_dump -Fd -j 4 -f backup_dir mydb
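
Backups are only useful if restores are tested. A sketch of restoring the custom-format dump above into an existing database:

# Drop and recreate objects, restoring with 4 parallel jobs
pg_restore --clean --if-exists -j 4 -d mydb backup.dump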

Point-in-Time Recovery

# postgresql.conf
archive_mode = on
archive_command = 'cp %p /archive/%f'

# recovery.conf (PostgreSQL < 12) or postgresql.conf
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
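
On PostgreSQL 12 and later, the restore settings above go in postgresql.conf, and recovery is triggered by an empty signal file in the data directory (the path shown assumes a default data directory layout):

# Create the signal file, then start the server to begin recovery
touch /var/lib/postgresql/data/recovery.signal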

Monitoring Queries

Performance Statistics

-- Slow queries
SELECT
calls,
round(total_exec_time::numeric, 2) as total_time_ms,
round(mean_exec_time::numeric, 2) as mean_time_ms,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
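
pg_stat_statements is an extension and returns no rows until it is preloaded and created; note also that the timing columns were renamed in PostgreSQL 13 (total_time/mean_time before, total_exec_time/mean_exec_time after). Setup sketch:

-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset counters after a change to compare before/after
SELECT pg_stat_statements_reset();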

-- Table statistics
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Active connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

Lock Monitoring

-- Blocked queries
SELECT
blocked.pid as blocked_pid,
blocked.query as blocked_query,
blocking.pid as blocking_pid,
blocking.query as blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
AND blocked_locks.relation = blocking_locks.relation
AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;
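
The join above only matches relation-level locks; pg_blocking_pids() (PostgreSQL 9.6+) asks the server for the full blocker set and is usually simpler:

-- Sessions that are currently blocked, and who is blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;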

High Availability

Streaming Replication Setup

-- On primary
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

-- pg_hba.conf
-- host replication replicator replica_ip/32 md5

-- On replica
-- pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -R
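
Once the replica is streaming, verify lag on the primary rather than assuming it is zero (column names are the PostgreSQL 10+ spelling):

-- Bytes of WAL not yet replayed on each replica
SELECT client_addr,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;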

Logical Replication

-- On publisher
CREATE PUBLICATION my_pub FOR TABLE orders, customers;

-- On subscriber
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=publisher_ip dbname=mydb user=replicator'
PUBLICATION my_pub;
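
Logical replication copies row changes only; DDL, sequence values, and large objects are not replicated, so schema changes must be applied on both sides. A quick health check on the subscriber:

-- Subscription workers and the last WAL position they have received
SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;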

Success Output

When this skill is successfully applied, you should see:

✅ SKILL COMPLETE: postgresql-patterns

Completed:
- [x] Database schema designed with proper normalization
- [x] Indexes created (B-tree, GIN, partial indexes as needed)
- [x] Query optimization applied with measurable improvements
- [x] Monitoring queries configured for performance tracking
- [x] Backup/recovery procedures implemented

Outputs:
- Schema DDL files with table definitions
- Index creation statements with justification
- Optimized query versions with EXPLAIN ANALYZE results
- Monitoring dashboard queries (pg_stat_statements, locks, bloat)
- Documentation of design decisions and tradeoffs

Performance Metrics:
- Query execution time improvements (before/after)
- Index usage statistics
- Table/index size optimizations

Completion Checklist

Before marking this skill as complete, verify:

  • Schema follows normalization principles (3NF minimum)
  • Primary keys and foreign keys defined with proper constraints
  • Indexes created for all foreign keys and frequent query patterns
  • Partial indexes used where appropriate to reduce index size
  • JSONB fields indexed with GIN where queried
  • Partitioning strategy implemented for large tables (>10M rows)
  • Query performance measured with EXPLAIN ANALYZE
  • Connection pooling configured (PgBouncer or application-level)
  • Backup procedures tested and documented
  • Monitoring queries added to observability stack

Failure Indicators

This skill has FAILED if:

  • ❌ Schema has redundant data or update anomalies (poor normalization)
  • ❌ Missing indexes on foreign keys causing sequential scans
  • ❌ Queries still slow after optimization (no measurable improvement)
  • ❌ Index bloat >30% without maintenance plan
  • ❌ No connection pooling (application making 100+ direct connections)
  • ❌ Backup procedures untested or missing
  • ❌ No monitoring of slow queries or connection exhaustion
  • ❌ Partitioning strategy doesn't align with query patterns
  • ❌ JSONB queries without appropriate GIN indexes

When NOT to Use

Do NOT use this skill when:

  • Simple SQLite suffices - For single-user applications or <10K rows, PostgreSQL is overkill
  • No complex queries - If just key-value lookups, use Redis or similar
  • Embedded scenarios - Mobile apps should use SQLite, not PostgreSQL
  • Serverless functions - Opening a new connection on every invocation is expensive and can exhaust connection slots (use a connection pooling service or Aurora Serverless)
  • Real-time analytics - For sub-second aggregations on billions of rows, use ClickHouse or TimescaleDB instead
  • Graph queries dominant - If mostly traversing relationships, use Neo4j or similar graph database
  • Document store needs - If schema-free JSON documents without relational joins, use MongoDB

Use alternatives:

  • High-velocity time-series → TimescaleDB (PostgreSQL extension) or InfluxDB
  • Distributed scale-out → CockroachDB or YugabyteDB (PostgreSQL-compatible)
  • Graph relationships → Neo4j, AWS Neptune
  • Full-text search → Elasticsearch, Meilisearch

Anti-Patterns (Avoid)

  • No indexes on foreign keys. Problem: sequential scans on JOINs, terrible performance. Solution: always index foreign keys: CREATE INDEX idx_orders_customer ON orders (customer_id)
  • SELECT * in production. Problem: fetches unnecessary columns, wastes bandwidth. Solution: explicitly list columns: SELECT id, name, email FROM users
  • N+1 query problem. Problem: 1 query for the list plus N queries for details. Solution: use JOINs or CTEs to fetch related data in a single query
  • No connection pooling. Problem: application creates thousands of connections, exhausts PostgreSQL. Solution: use PgBouncer or application-level pooling (SQLAlchemy pool_size=10)
  • Over-normalization. Problem: excessive JOINs for simple queries. Solution: denormalize hot paths, use materialized views for aggregates
  • Wrong index type. Problem: B-tree on JSONB, GIN on equality checks. Solution: match the index to the query: GIN for JSONB/arrays, B-tree for equality/range
  • No vacuum strategy. Problem: table bloat, dead tuples accumulate. Solution: configure autovacuum or schedule manual VACUUM ANALYZE
  • Unbounded queries. Problem: SELECT * FROM logs fetches millions of rows. Solution: always use LIMIT/OFFSET or cursor pagination
  • No query timeout. Problem: runaway queries block connections for hours. Solution: set statement_timeout = '30s' in postgresql.conf
  • Storing files in the database. Problem: BLOBs cause bloat and slow backups. Solution: store files in object storage (S3), keep URLs in the database

Principles

This skill embodies CODITECT automation principles:

#1 Recycle → Extend → Re-Use → Create

  • Recycle existing patterns from pg_stat_statements analysis - optimize what's actually slow
  • Extend with indexes rather than rewriting queries when possible
  • Re-use proven patterns (window functions, CTEs, partitioning) from this skill
  • Create custom solutions only when standard patterns don't fit

#2 First Principles Thinking

  • Understand query execution - Use EXPLAIN ANALYZE to see actual plans
  • Know index tradeoffs - Indexes speed reads but slow writes and consume space
  • Measure before optimizing - Premature optimization based on assumptions wastes time

#5 Eliminate Ambiguity

  • Explicit schema constraints - NOT NULL, CHECK, UNIQUE prevent invalid data
  • Clear index naming - idx_tablename_columnname shows purpose at a glance
  • Document partitioning strategy - Future maintainers understand why tables are partitioned

#6 Clear, Understandable, Explainable

  • Comment complex queries - Explain why CTEs or window functions are needed
  • Document index rationale - Why GIN vs B-tree, why partial index, etc.
  • Provide monitoring queries - Team can diagnose issues without expert help

#8 No Assumptions

  • Test backup restoration - Don't assume pg_dump works until you've restored it
  • Verify replication lag - Monitor streaming replication, don't assume zero lag
  • Check index usage - Use pg_stat_user_indexes to confirm indexes are actually used

#10 Automation First

  • Automate partition creation - Use DO blocks or cron to create monthly partitions
  • Automate monitoring - pg_stat_statements + Prometheus alerts for slow queries
  • Automate backups - Scheduled pg_dump or WAL archiving to cloud storage

Usage Examples

Design Database Schema

Apply postgresql-patterns skill to design normalized schema for e-commerce platform with proper indexing and partitioning strategy

Optimize Slow Queries

Apply postgresql-patterns skill to analyze and optimize slow queries identified in pg_stat_statements

Setup High Availability

Apply postgresql-patterns skill to configure streaming replication with automatic failover for production database