PostgreSQL Patterns Skill
When to Use This Skill
Use this skill when implementing PostgreSQL patterns in your codebase.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Advanced PostgreSQL patterns for production-grade database design, query optimization, and operational excellence.
Query Patterns
Common Table Expressions (CTEs)
-- Recursive CTE for hierarchical data
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under managers
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY level, name;
-- Materialized CTE for performance (PostgreSQL 12+)
WITH summary AS MATERIALIZED (
SELECT department_id, COUNT(*) as emp_count
FROM employees
GROUP BY department_id
)
SELECT d.name, s.emp_count
FROM departments d
JOIN summary s ON d.id = s.department_id;
Window Functions
-- Running totals and rankings
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as order_rank,
LAG(amount) OVER (ORDER BY order_date) as prev_amount,
amount - LAG(amount) OVER (ORDER BY order_date) as change
FROM orders;
-- Percentile calculations
SELECT
department_id,
salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) as percentile,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary) as quartile
FROM employees;
UPSERT Patterns
-- INSERT ON CONFLICT (upsert)
INSERT INTO products (sku, name, price, updated_at)
VALUES ('ABC123', 'Widget', 29.99, NOW())
ON CONFLICT (sku)
DO UPDATE SET
name = EXCLUDED.name,
price = EXCLUDED.price,
updated_at = EXCLUDED.updated_at
WHERE products.price <> EXCLUDED.price;
-- Returning affected rows
INSERT INTO inventory (product_id, quantity)
VALUES (1, 100)
ON CONFLICT (product_id)
DO UPDATE SET quantity = inventory.quantity + EXCLUDED.quantity
RETURNING *;
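When the upsert target varies at runtime, the statement can be assembled application-side. A minimal Python sketch of rendering an ON CONFLICT upsert (the function, table, and column names are illustrative, not from any library; identifiers are assumed to come from trusted code since they cannot be parameterized):

```python
def build_upsert(table, columns, conflict_col):
    """Render a parameterized INSERT ... ON CONFLICT DO UPDATE statement.

    Identifiers (table/column names) must come from trusted code,
    not user input -- only values can be bound as parameters.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    # Update every non-conflict column from the proposed row (EXCLUDED)
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c != conflict_col
    )
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_col}) DO UPDATE SET {updates}"
    )

sql = build_upsert("products", ["sku", "name", "price"], "sku")
```

The resulting string uses psycopg-style named placeholders, so the row dict can be passed straight to `cursor.execute(sql, row)`.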
Indexing Strategies
Index Types
-- B-tree (default, equality and range queries)
CREATE INDEX idx_users_email ON users (email);
-- Partial index (filtered subset)
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';
-- Covering index (include non-key columns)
CREATE INDEX idx_orders_customer ON orders (customer_id)
INCLUDE (order_date, total);
-- Composite index (multi-column)
CREATE INDEX idx_orders_date_status ON orders (order_date DESC, status);
-- GIN index (arrays, JSONB, full-text)
CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_products_data ON products USING GIN (metadata jsonb_path_ops);
-- GiST index (geometric, range types)
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
-- BRIN index (large tables, naturally ordered)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);
Index Maintenance
-- Check index usage
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Reindex without blocking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Check bloat
SELECT
schemaname, relname,
pg_size_pretty(pg_total_relation_size(relid)) as size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
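The dead-tuple ratio above is easy to replicate in application-side monitoring, e.g. to alert when bloat crosses a threshold. A minimal Python sketch of the same formula (function names and the 30% threshold are illustrative):

```python
def dead_ratio(n_live_tup, n_dead_tup):
    """Dead-tuple percentage, mirroring the SQL formula; None for empty tables."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return None  # equivalent of NULLIF(..., 0)
    return round(n_dead_tup * 100.0 / total, 2)

def tables_needing_vacuum(stats, threshold=30.0):
    """Filter (name, live, dead) rows whose dead ratio exceeds the threshold."""
    flagged = []
    for name, live, dead in stats:
        ratio = dead_ratio(live, dead)
        if ratio is not None and ratio > threshold:
            flagged.append((name, ratio))
    return flagged
```

Feeding this from a periodic query against pg_stat_user_tables gives a simple bloat alert without depending on an external monitoring stack.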
Partitioning
Range Partitioning
-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
customer_id INTEGER,
order_date DATE NOT NULL,
total DECIMAL(10,2),
PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);
-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Default partition for unmatched data
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- Automatic partition creation script
DO $$
DECLARE
start_date DATE := '2024-01-01';
end_date DATE;
partition_name TEXT;
BEGIN
FOR i IN 0..11 LOOP
end_date := start_date + INTERVAL '1 month';
partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
start_date := end_date;
END LOOP;
END $$;
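The same monthly partition DDL can be generated application-side, for instance from a migration tool or scheduled job. A minimal Python sketch (the function name is illustrative; the output mirrors the DO block above):

```python
from datetime import date

def monthly_partition_ddl(table, start, months):
    """Generate one CREATE TABLE ... PARTITION OF statement per month."""
    stmts = []
    for i in range(months):
        # First day of the i-th month after start, and of the month after it
        y, m = divmod(start.month - 1 + i, 12)
        lo = date(start.year + y, m + 1, 1)
        y2, m2 = divmod(start.month + i, 12)
        hi = date(start.year + y2, m2 + 1, 1)
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS {table}_{lo:%Y_%m} "
            f"PARTITION OF {table} "
            f"FOR VALUES FROM ('{lo}') TO ('{hi}')"
        )
    return stmts

ddl = monthly_partition_ddl("orders", date(2024, 1, 1), 12)
```

Each statement uses IF NOT EXISTS, so the job is safe to re-run ahead of the current month.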
List Partitioning
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
data JSONB,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id, event_type)
) PARTITION BY LIST (event_type);
CREATE TABLE events_user PARTITION OF events
FOR VALUES IN ('user.created', 'user.updated', 'user.deleted');
CREATE TABLE events_order PARTITION OF events
FOR VALUES IN ('order.placed', 'order.shipped', 'order.delivered');
Connection Pooling
PgBouncer Configuration
; pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool settings
pool_mode = transaction
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
; Timeouts
server_idle_timeout = 600
server_lifetime = 3600
query_timeout = 300
Application Connection Settings
# SQLAlchemy connection pool
from sqlalchemy import create_engine
engine = create_engine(
"postgresql://user:pass@localhost:6432/mydb",
pool_size=10,
max_overflow=20,
pool_pre_ping=True,
pool_recycle=3600,
connect_args={
"connect_timeout": 10,
"application_name": "my_app"
}
)
JSON/JSONB Operations
JSONB Queries
-- Extract values
SELECT
data->>'name' as name,
data->'address'->>'city' as city,
(data->>'age')::int as age
FROM users;
-- JSON path queries (PostgreSQL 12+)
SELECT * FROM products
WHERE metadata @@ '$.tags[*] == "featured"';
SELECT jsonb_path_query_array(
metadata,
'$.variants[*] ? (@.price < 100)'
) as cheap_variants
FROM products;
-- Update JSONB
UPDATE users
SET data = jsonb_set(data, '{address,zip}', '"12345"')
WHERE id = 1;
-- Append to array
UPDATE products
SET tags = tags || '["new"]'::jsonb
WHERE id = 1;
JSONB Indexing
-- GIN index for containment queries (@>)
CREATE INDEX idx_data ON users USING GIN (data);
-- Optimized for specific path queries
CREATE INDEX idx_data_ops ON users USING GIN (data jsonb_path_ops);
-- Expression index for specific field
CREATE INDEX idx_data_email ON users ((data->>'email'));
Backup & Recovery
Logical Backup
# Full database dump
pg_dump -Fc -f backup.dump mydb
# Schema only
pg_dump -s -f schema.sql mydb
# Data only for specific tables
pg_dump -t 'orders*' --data-only mydb > orders_data.sql
# Parallel dump (PostgreSQL 9.3+)
pg_dump -Fd -j 4 -f backup_dir mydb
Point-in-Time Recovery
# postgresql.conf
archive_mode = on
archive_command = 'cp %p /archive/%f'
# PostgreSQL 12+: put these in postgresql.conf and create an empty
# recovery.signal file in the data directory; older versions use recovery.conf
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
Monitoring Queries
Performance Statistics
-- Slow queries (total_exec_time/mean_exec_time are the PostgreSQL 13+ column names)
SELECT
calls,
round(total_exec_time::numeric, 2) as total_time_ms,
round(mean_exec_time::numeric, 2) as mean_time_ms,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Table statistics
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Active connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
Lock Monitoring
-- Blocked queries (pg_blocking_pids, PostgreSQL 9.6+)
SELECT
blocked.pid as blocked_pid,
blocked.query as blocked_query,
blocking.pid as blocking_pid,
blocking.query as blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
High Availability
Streaming Replication Setup
-- On primary
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
-- pg_hba.conf
-- host replication replicator replica_ip/32 md5
-- On replica
-- pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -R
Logical Replication
-- On publisher
CREATE PUBLICATION my_pub FOR TABLE orders, customers;
-- On subscriber
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=publisher_ip dbname=mydb user=replicator'
PUBLICATION my_pub;
Success Output
When this skill is successfully applied, you should see:
✅ SKILL COMPLETE: postgresql-patterns
Completed:
- [x] Database schema designed with proper normalization
- [x] Indexes created (B-tree, GIN, partial indexes as needed)
- [x] Query optimization applied with measurable improvements
- [x] Monitoring queries configured for performance tracking
- [x] Backup/recovery procedures implemented
Outputs:
- Schema DDL files with table definitions
- Index creation statements with justification
- Optimized query versions with EXPLAIN ANALYZE results
- Monitoring dashboard queries (pg_stat_statements, locks, bloat)
- Documentation of design decisions and tradeoffs
Performance Metrics:
- Query execution time improvements (before/after)
- Index usage statistics
- Table/index size optimizations
Completion Checklist
Before marking this skill as complete, verify:
- Schema follows normalization principles (3NF minimum)
- Primary keys and foreign keys defined with proper constraints
- Indexes created for all foreign keys and frequent query patterns
- Partial indexes used where appropriate to reduce index size
- JSONB fields indexed with GIN where queried
- Partitioning strategy implemented for large tables (>10M rows)
- Query performance measured with EXPLAIN ANALYZE
- Connection pooling configured (PgBouncer or application-level)
- Backup procedures tested and documented
- Monitoring queries added to observability stack
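To make the "measure with EXPLAIN ANALYZE" step repeatable, a small helper can wrap queries and pull the execution time out of the JSON plan. A hedged Python sketch (function names are illustrative), assuming the standard shape of EXPLAIN (FORMAT JSON) output, a one-element list carrying an "Execution Time" key:

```python
def explain_analyze(query):
    """Wrap a query in EXPLAIN ANALYZE with JSON output for programmatic parsing."""
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"

def execution_time_ms(plan_json):
    """Extract total execution time (ms) from EXPLAIN (FORMAT JSON) output."""
    # EXPLAIN FORMAT JSON returns a one-element list; the top-level dict
    # carries 'Plan', 'Planning Time', and 'Execution Time'
    return plan_json[0]["Execution Time"]
```

Running the wrapped query before and after an index change, and diffing the two execution times, gives the before/after numbers the checklist asks for.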
Failure Indicators
This skill has FAILED if:
- ❌ Schema has redundant data or update anomalies (poor normalization)
- ❌ Missing indexes on foreign keys causing sequential scans
- ❌ Queries still slow after optimization (no measurable improvement)
- ❌ Index bloat >30% without maintenance plan
- ❌ No connection pooling (application making 100+ direct connections)
- ❌ Backup procedures untested or missing
- ❌ No monitoring of slow queries or connection exhaustion
- ❌ Partitioning strategy doesn't align with query patterns
- ❌ JSONB queries without appropriate GIN indexes
When NOT to Use
Do NOT use this skill when:
- Simple SQLite suffices - For single-user applications or <10K rows, PostgreSQL is overkill
- No complex queries - If just key-value lookups, use Redis or similar
- Embedded scenarios - Mobile apps should use SQLite, not PostgreSQL
- Serverless functions - Cold start overhead makes PostgreSQL connections expensive (use connection pooling service or Aurora Serverless)
- Real-time analytics - For sub-second aggregations on billions of rows, use ClickHouse or TimescaleDB instead
- Graph queries dominant - If mostly traversing relationships, use Neo4j or similar graph database
- Document store needs - If schema-free JSON documents without relational joins, use MongoDB
Use alternatives:
- High-velocity time-series → TimescaleDB (PostgreSQL extension) or InfluxDB
- Distributed scale-out → CockroachDB or YugabyteDB (PostgreSQL-compatible)
- Graph relationships → Neo4j, AWS Neptune
- Full-text search → Elasticsearch, Meilisearch
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| No indexes on foreign keys | Sequential scans on JOINs, terrible performance | Always index foreign keys: CREATE INDEX idx_orders_customer ON orders(customer_id) |
| SELECT * in production | Fetches unnecessary columns, wastes bandwidth | Explicitly list columns: SELECT id, name, email FROM users |
| N+1 query problem | 1 query for list + N queries for details | Use JOINs or CTEs to fetch related data in single query |
| No connection pooling | Application creates 1000s of connections, exhausts PostgreSQL | Use PgBouncer or application-level pooling (SQLAlchemy pool_size=10) |
| Over-normalization | Excessive JOINs for simple queries | Denormalize hot paths, use materialized views for aggregates |
| Wrong index type | B-tree on JSONB, GIN on equality checks | Match index to query: GIN for JSONB/arrays, B-tree for equality/range |
| No vacuum strategy | Table bloat, dead tuples accumulate | Configure autovacuum or schedule manual VACUUM ANALYZE |
| Unbounded queries | SELECT * FROM logs fetches millions of rows | Always use LIMIT/OFFSET or cursor pagination |
| No query timeout | Runaway queries block connections for hours | Set statement_timeout = '30s' in postgresql.conf |
| Storing files in database | BLOBs cause bloat, slow backups | Store files in object storage (S3), keep URLs in database |
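The cursor-pagination fix from the table above can be sketched as keyset pagination: filter on the last seen key instead of OFFSET, so each page is an index range scan rather than a scan-and-discard. A minimal Python sketch (table and column names are illustrative):

```python
def keyset_page_sql(last_id=None, page_size=50):
    """Build a keyset-paginated query: WHERE id > last seen id, ordered by id.

    int() casts guard against injection since these values are interpolated;
    in real code, bind them as query parameters instead.
    """
    base = "SELECT id, order_date, total FROM orders"
    if last_id is None:
        return f"{base} ORDER BY id LIMIT {int(page_size)}"
    return f"{base} WHERE id > {int(last_id)} ORDER BY id LIMIT {int(page_size)}"
```

The caller feeds the last id of each page back in as `last_id` for the next page; unlike OFFSET, cost stays constant however deep the pagination goes.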
Principles
This skill embodies CODITECT automation principles:
#1 Recycle → Extend → Re-Use → Create
- Recycle existing patterns from pg_stat_statements analysis - optimize what's actually slow
- Extend with indexes rather than rewriting queries when possible
- Re-use proven patterns (window functions, CTEs, partitioning) from this skill
- Create custom solutions only when standard patterns don't fit
#2 First Principles Thinking
- Understand query execution - Use EXPLAIN ANALYZE to see actual plans
- Know index tradeoffs - Indexes speed reads but slow writes and consume space
- Measure before optimizing - Premature optimization based on assumptions wastes time
#5 Eliminate Ambiguity
- Explicit schema constraints - NOT NULL, CHECK, UNIQUE prevent invalid data
- Clear index naming - idx_tablename_columnname shows purpose at a glance
- Document partitioning strategy - Future maintainers understand why tables are partitioned
#6 Clear, Understandable, Explainable
- Comment complex queries - Explain why CTEs or window functions are needed
- Document index rationale - Why GIN vs B-tree, why partial index, etc.
- Provide monitoring queries - Team can diagnose issues without expert help
#8 No Assumptions
- Test backup restoration - Don't assume pg_dump works until you've restored it
- Verify replication lag - Monitor streaming replication, don't assume zero lag
- Check index usage - Use pg_stat_user_indexes to confirm indexes are actually used
#10 Automation First
- Automate partition creation - Use DO blocks or cron to create monthly partitions
- Automate monitoring - pg_stat_statements + Prometheus alerts for slow queries
- Automate backups - Scheduled pg_dump or WAL archiving to cloud storage
Usage Examples
Design Database Schema
Apply postgresql-patterns skill to design normalized schema for e-commerce platform with proper indexing and partitioning strategy
Optimize Slow Queries
Apply postgresql-patterns skill to analyze and optimize slow queries identified in pg_stat_statements
Setup High Availability
Apply postgresql-patterns skill to configure streaming replication with automatic failover for production database