Agent Skills Framework Extension (Optional)
Database Schema Optimization Skill
When to Use This Skill
Use this skill when implementing database schema optimization patterns in your codebase.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Database design with indexing strategies, query optimization, and multi-tenant data architecture.
Core Capabilities
- Schema Design - Normalization, denormalization, constraints
- Index Optimization - B-tree, GIN, partial indexes
- Query Tuning - EXPLAIN ANALYZE, query rewrites
- Partitioning - Range, list, hash partitioning
- Multi-Tenant Patterns - Row-level, schema-per-tenant
Multi-Tenant Schema Design
-- Core tenant schema with RLS
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) NOT NULL,
slug VARCHAR(50) NOT NULL UNIQUE,
status VARCHAR(20) NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'suspended', 'deleted')),
tier VARCHAR(20) NOT NULL DEFAULT 'free' CHECK (tier IN ('free', 'pro', 'enterprise')),
settings JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Users with tenant association
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
email VARCHAR(255) NOT NULL,
name VARCHAR(100) NOT NULL,
password_hash VARCHAR(255),
mfa_enabled BOOLEAN NOT NULL DEFAULT false,
status VARCHAR(20) NOT NULL DEFAULT 'active',
last_login_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, email)
);
-- Projects with full tenant isolation
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(100) NOT NULL,
description TEXT,
status VARCHAR(20) NOT NULL DEFAULT 'active',
settings JSONB NOT NULL DEFAULT '{}',
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Enable RLS on all tenant tables
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
-- RLS policies
CREATE POLICY users_tenant_isolation ON users
FOR ALL USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
CREATE POLICY projects_tenant_isolation ON projects
FOR ALL USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
-- Function to set tenant context for the current transaction.
-- is_local = true scopes the setting to the transaction, so a pooled
-- connection cannot carry one tenant's context into the next request.
CREATE OR REPLACE FUNCTION set_tenant_context(tenant_uuid UUID)
RETURNS void AS $$
BEGIN
PERFORM set_config('app.current_tenant_id', tenant_uuid::text, true);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
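On the application side, the tenant context should be set per transaction so that pooled connections cannot leak a tenant id between requests. A minimal TypeScript sketch; the `Queryable` shape matches node-postgres clients, and `withTenant` is an illustrative helper name, not part of any library:

```typescript
// Minimal shape of a pg client/pool we rely on.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Run `fn` inside a transaction with app.current_tenant_id set.
// set_config(..., true) is transaction-local, so the setting is
// discarded on COMMIT/ROLLBACK and cannot leak to the next request
// that reuses this pooled connection.
export async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  fn: (client: Queryable) => Promise<T>
): Promise<T> {
  await client.query('BEGIN');
  try {
    await client.query(
      "SELECT set_config('app.current_tenant_id', $1, true)",
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

Every RLS-protected query then runs inside `withTenant(client, tenantId, ...)`; queries issued outside it see no rows (or error on the unset setting), which fails safe.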
Index Optimization
-- Composite indexes for common query patterns
-- Note: the UNIQUE (tenant_id, email) constraint on users already creates
-- that index, so do not duplicate it. In a partial index the predicate
-- column is constant, so it need not appear in the key.
CREATE INDEX idx_users_tenant_status ON users (tenant_id) WHERE status = 'active';
CREATE INDEX idx_projects_tenant_created ON projects (tenant_id, created_at DESC);
-- Partial index for active records only
CREATE INDEX idx_projects_active ON projects (tenant_id, name)
WHERE status = 'active';
-- GIN index for JSONB queries
CREATE INDEX idx_projects_settings ON projects USING GIN (settings);
-- Expression index for case-insensitive search
CREATE INDEX idx_users_email_lower ON users (tenant_id, lower(email));
-- Covering index to avoid heap access
CREATE INDEX idx_projects_list ON projects (tenant_id, created_at DESC)
INCLUDE (name, status);
-- Index for foreign key lookups
CREATE INDEX idx_projects_created_by ON projects (created_by);
-- Analyze query patterns
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Find unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
AND idx_scan < 50
AND pg_relation_size(relid) > 5 * 8192  -- larger than five 8 KB pages
ORDER BY pg_relation_size(i.indexrelid) DESC;
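When index statistics are exported to application code (for example by a monitoring job), the same "unused index" criteria can be applied as a pure function. A sketch; the row shape and function name are illustrative, mirroring the SQL above:

```typescript
// Row shape mirroring the pg_stat_user_indexes query above.
interface IndexStat {
  index: string;
  isUnique: boolean;
  scans: number;     // idx_scan
  sizeBytes: number; // pg_relation_size(indexrelid)
}

// Same criteria as the SQL: non-unique, rarely scanned, and bigger
// than five 8 KB pages, ordered by size so the biggest wins come first.
export function findUnusedIndexes(stats: IndexStat[]): IndexStat[] {
  return stats
    .filter((s) => !s.isUnique && s.scans < 50 && s.sizeBytes > 5 * 8192)
    .sort((a, b) => b.sizeBytes - a.sizeBytes);
}
```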
Query Optimization
-- Enable query timing and statistics (both settings require superuser or ALTER SYSTEM)
SET track_io_timing = on;
SET log_min_duration_statement = 100;  -- log statements slower than 100 ms
-- Analyze slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT p.*, u.name as owner_name
FROM projects p
JOIN users u ON u.id = p.created_by
WHERE p.tenant_id = '550e8400-e29b-41d4-a716-446655440000'
AND p.status = 'active'
ORDER BY p.created_at DESC
LIMIT 20;
-- Query rewrite: limit rows in a CTE before joining
WITH active_projects AS (
SELECT id, name, created_at, created_by
FROM projects
WHERE tenant_id = $1
AND status = 'active'
ORDER BY created_at DESC
LIMIT 20
)
SELECT ap.*, u.name as owner_name
FROM active_projects ap
JOIN users u ON u.id = ap.created_by;
-- Avoid SELECT * in production
SELECT
p.id,
p.name,
p.status,
p.created_at,
u.name as owner_name
FROM projects p
JOIN users u ON u.id = p.created_by
WHERE p.tenant_id = $1
AND p.status = 'active'
ORDER BY p.created_at DESC
LIMIT 20;
-- Use EXISTS instead of IN for subqueries
SELECT p.*
FROM projects p
WHERE p.tenant_id = $1
AND EXISTS (
SELECT 1 FROM project_members pm
WHERE pm.project_id = p.id
AND pm.user_id = $2
);
-- pg_stat_statements for query analysis
SELECT
query,
calls,
total_exec_time / 1000 as total_seconds,
mean_exec_time as avg_ms,
rows
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat%'
ORDER BY total_exec_time DESC
LIMIT 20;
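The LIMIT queries above paginate well with keyset (cursor) pagination instead of OFFSET, which degrades linearly as users page deeper. A sketch of building such a query; the function name, parameter positions, and cursor shape are illustrative:

```typescript
// Build a keyset-paginated query over the project list above.
// `cursor` is the (created_at, id) of the last row on the previous
// page; omitting it fetches the first page. The row-value comparison
// seeks directly into the (tenant_id, created_at DESC) index instead
// of skipping rows the way OFFSET does.
export function projectPageQuery(cursor?: { createdAt: string; id: string }) {
  const cursorParams: string[] = [];
  let where = "p.tenant_id = $1 AND p.status = 'active'";
  if (cursor) {
    where += ' AND (p.created_at, p.id) < ($2, $3)';
    cursorParams.push(cursor.createdAt, cursor.id);
  }
  return {
    text:
      'SELECT p.id, p.name, p.created_at FROM projects p ' +
      `WHERE ${where} ORDER BY p.created_at DESC, p.id DESC LIMIT 20`,
    cursorParams,
  };
}
```

The trailing `id` in both the ORDER BY and the cursor breaks ties when two rows share a `created_at`, so no row is skipped or repeated across pages.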
Partitioning Strategies
-- Range partitioning by date (for time-series data)
CREATE TABLE audit_logs (
id UUID NOT NULL DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
event_type VARCHAR(50) NOT NULL,
user_id UUID,
details JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE audit_logs_2025_01 PARTITION OF audit_logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE audit_logs_2025_02 PARTITION OF audit_logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Automatic partition creation
CREATE OR REPLACE FUNCTION create_audit_partition()
RETURNS void AS $$
DECLARE
partition_date DATE;
partition_name TEXT;
start_date TEXT;
end_date TEXT;
BEGIN
partition_date := DATE_TRUNC('month', NOW() + INTERVAL '1 month');
partition_name := 'audit_logs_' || TO_CHAR(partition_date, 'YYYY_MM');
start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF audit_logs
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
END;
$$ LANGUAGE plpgsql;
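The same month arithmetic is useful when partitions are created from an application-side scheduler instead of (or as a backstop to) the plpgsql function. A sketch mirroring `create_audit_partition()` above; the helper name is illustrative:

```typescript
// Compute next month's partition name and half-open [from, to) bounds,
// matching the audit_logs_YYYY_MM convention used above.
export function nextMonthPartition(now: Date): {
  name: string;
  from: string;
  to: string;
} {
  // First day of next month, then first day of the month after (UTC).
  const start = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
  const end = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + 1, 1));
  const pad = (n: number) => String(n).padStart(2, '0');
  const ym = `${start.getUTCFullYear()}_${pad(start.getUTCMonth() + 1)}`;
  const iso = (d: Date) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  return { name: `audit_logs_${ym}`, from: iso(start), to: iso(end) };
}
```

The returned values slot directly into `CREATE TABLE IF NOT EXISTS ... PARTITION OF audit_logs FOR VALUES FROM (...) TO (...)`.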
-- Hash partitioning by tenant (for even distribution)
CREATE TABLE agent_tasks (
id UUID NOT NULL DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
project_id UUID NOT NULL,
status VARCHAR(20) NOT NULL,
input JSONB,
output JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, tenant_id)
) PARTITION BY HASH (tenant_id);
CREATE TABLE agent_tasks_p0 PARTITION OF agent_tasks
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE agent_tasks_p1 PARTITION OF agent_tasks
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE agent_tasks_p2 PARTITION OF agent_tasks
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE agent_tasks_p3 PARTITION OF agent_tasks
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- Partition maintenance: drop old partitions
-- (on PG14+, run DETACH PARTITION ... CONCURRENTLY first to avoid blocking queries)
DROP TABLE audit_logs_2024_01;
-- Attach existing table as partition
ALTER TABLE audit_logs ATTACH PARTITION audit_logs_archive
FOR VALUES FROM ('2020-01-01') TO ('2024-01-01');
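Deciding which partitions are past retention is also a pure calculation, which keeps the maintenance job testable. A sketch under the `audit_logs_YYYY_MM` naming convention above; the function name and retention policy are illustrative:

```typescript
// Given existing monthly partition names (audit_logs_YYYY_MM) and a
// retention window in months, return the names that are safe to drop.
export function partitionsToDrop(
  names: string[],
  now: Date,
  retentionMonths: number
): string[] {
  const cutoff = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - retentionMonths, 1)
  );
  return names.filter((n) => {
    const m = /^audit_logs_(\d{4})_(\d{2})$/.exec(n);
    if (!m) return false; // skip names outside the convention (e.g. archive)
    const start = new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, 1));
    return start < cutoff; // entire month is older than the window
  });
}
```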
Connection Pooling
// src/database/pool.ts
import { Pool, PoolConfig } from 'pg';
const poolConfig: PoolConfig = {
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
// Connection pool settings
min: 5, // Minimum pool size
max: 20, // Maximum pool size
idleTimeoutMillis: 30000, // Close idle connections after 30s
connectionTimeoutMillis: 5000, // Timeout waiting for connection
// Statement timeout
statement_timeout: 30000, // 30 second query timeout
// SSL configuration
ssl: process.env.NODE_ENV === 'production' ? {
rejectUnauthorized: true,
ca: process.env.DB_CA_CERT,
} : false,
};
export const pool = new Pool(poolConfig);
// Monitor pool health (metrics and logger are app-level helpers assumed
// to exist, e.g. a prom-client registry and a pino/winston instance)
pool.on('connect', (client) => {
metrics.increment('db.connections.created');
});
pool.on('acquire', (client) => {
metrics.increment('db.connections.acquired');
});
pool.on('release', (client) => {
metrics.increment('db.connections.released');
});
pool.on('error', (err, client) => {
logger.error('Unexpected pool error', { error: err.message });
metrics.increment('db.connections.errors');
});
// Periodic pool stats
setInterval(() => {
const stats = {
total: pool.totalCount,
idle: pool.idleCount,
waiting: pool.waitingCount,
};
metrics.gauge('db.pool.total', stats.total);
metrics.gauge('db.pool.idle', stats.idle);
metrics.gauge('db.pool.waiting', stats.waiting);
}, 10000).unref(); // don't keep the process alive just for stats
// Graceful shutdown
process.on('SIGTERM', async () => {
await pool.end();
});
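A pooled connection can still fail transiently (server restart, failover, network blip), so query call sites benefit from a small retry wrapper. A sketch; the error-code list is an assumption to adjust against observed failures, and `withRetry` is an illustrative name:

```typescript
// Retry an operation on transient connection errors with exponential
// backoff; non-transient errors (e.g. syntax errors) fail fast.
// 57P01 is PostgreSQL's admin_shutdown SQLSTATE.
const TRANSIENT = new Set(['ECONNRESET', 'ETIMEDOUT', '57P01']);

export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      const code = (err as { code?: string }).code;
      if (!code || !TRANSIENT.has(code)) throw err; // not transient
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}
```

Only wrap idempotent operations this way; a retried INSERT that actually committed before the connection dropped would be duplicated.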
Migration Strategy
// migrations/001_initial_schema.ts
import { Knex } from 'knex';
export async function up(knex: Knex): Promise<void> {
// Enable extensions (gen_random_uuid() comes from pgcrypto; built in since PG13)
await knex.raw('CREATE EXTENSION IF NOT EXISTS "pgcrypto"');
await knex.raw('CREATE EXTENSION IF NOT EXISTS "pg_trgm"');
// Create tenants table
await knex.schema.createTable('tenants', (table) => {
table.uuid('id').primary().defaultTo(knex.raw('gen_random_uuid()'));
table.string('name', 100).notNullable();
table.string('slug', 50).notNullable().unique();
table.string('status', 20).notNullable().defaultTo('active');
table.string('tier', 20).notNullable().defaultTo('free');
table.jsonb('settings').notNullable().defaultTo('{}');
table.timestamps(true, true);
table.check('status IN (\'active\', \'suspended\', \'deleted\')');
});
// Create users table
await knex.schema.createTable('users', (table) => {
table.uuid('id').primary().defaultTo(knex.raw('gen_random_uuid()'));
table.uuid('tenant_id').notNullable().references('id').inTable('tenants').onDelete('CASCADE');
table.string('email', 255).notNullable();
table.string('name', 100).notNullable();
table.string('password_hash', 255);
table.boolean('mfa_enabled').notNullable().defaultTo(false);
table.string('status', 20).notNullable().defaultTo('active');
table.timestamp('last_login_at');
table.timestamps(true, true);
table.unique(['tenant_id', 'email']);
});
// Enable RLS
await knex.raw('ALTER TABLE users ENABLE ROW LEVEL SECURITY');
await knex.raw(`
CREATE POLICY users_tenant_isolation ON users
FOR ALL USING (tenant_id = current_setting('app.current_tenant_id')::uuid)
`);
}
export async function down(knex: Knex): Promise<void> {
await knex.schema.dropTableIfExists('users');
await knex.schema.dropTableIfExists('tenants');
}
// Zero-downtime migration example
// migrations/005_add_user_preferences.ts
export async function up(knex: Knex): Promise<void> {
// Step 1: Add nullable column
await knex.schema.alterTable('users', (table) => {
table.jsonb('preferences').nullable();
});
// Step 2: Backfill in batches (run separately)
// await knex.raw(`
// UPDATE users SET preferences = '{"theme": "light", "notifications": true}'
// WHERE preferences IS NULL
// `);
// Step 3: Add NOT NULL constraint (after backfill)
// await knex.raw(`
// ALTER TABLE users ALTER COLUMN preferences SET NOT NULL
// `);
}
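Step 2's backfill should run in small batches so each UPDATE holds row locks only briefly and autovacuum can keep up. A sketch of the driver loop; `exec` would wrap something like `UPDATE users SET preferences = '{}'::jsonb WHERE id IN (SELECT id FROM users WHERE preferences IS NULL LIMIT $1)` via `knex.raw` and return the affected row count (the helper name and batch size are illustrative):

```typescript
// Backfill in batches until a batch updates fewer rows than requested,
// which means the final partial batch has been processed.
export async function backfillInBatches(
  exec: (batchSize: number) => Promise<number>, // returns rows updated
  batchSize = 1000
): Promise<number> {
  let total = 0;
  for (;;) {
    const updated = await exec(batchSize);
    total += updated;
    if (updated < batchSize) break; // last (possibly empty) batch
  }
  return total;
}
```

Pausing briefly between batches (or lowering the batch size) further reduces replication lag and lock contention on hot tables.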
Backup and Recovery
#!/bin/bash
# scripts/backup-database.sh
set -euo pipefail
DB_HOST="${DB_HOST:-localhost}"
DB_NAME="${DB_NAME:-coditect}"
DB_USER="${DB_USER:-postgres}"
BACKUP_DIR="${BACKUP_DIR:-/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.dump"
# Create compressed custom-format backup
# (parallel --jobs requires --format=directory, so it is omitted here)
pg_dump \
-h "$DB_HOST" \
-U "$DB_USER" \
-d "$DB_NAME" \
--format=custom \
--compress=9 \
--file="$BACKUP_FILE"
# Upload to cloud storage
gsutil cp "$BACKUP_FILE" "gs://coditect-backups/database/"
# Verify backup
pg_restore --list "$BACKUP_FILE" > /dev/null
# Clean old local backups (local copies kept 7 days; cloud keeps RETENTION_DAYS)
find "$BACKUP_DIR" -name "*.dump" -mtime +7 -delete
# Clean old cloud backups
gsutil ls -l "gs://coditect-backups/database/" | \
awk -v date="$(date -d "-${RETENTION_DAYS} days" +%Y-%m-%d)" '$2 < date {print $3}' | \
xargs -r gsutil rm
echo "Backup completed: $BACKUP_FILE"
Performance Monitoring
-- Create monitoring views
CREATE VIEW db_performance_stats AS
SELECT
datname as database,
numbackends as active_connections,
xact_commit as commits,
xact_rollback as rollbacks,
blks_read as blocks_read,
blks_hit as blocks_hit,
round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as cache_hit_ratio,
tup_returned as rows_returned,
tup_fetched as rows_fetched,
tup_inserted as rows_inserted,
tup_updated as rows_updated,
tup_deleted as rows_deleted
FROM pg_stat_database
WHERE datname = current_database();
-- Table bloat analysis
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
pg_size_pretty(pg_indexes_size((schemaname || '.' || tablename)::regclass)) as indexes_size,
n_live_tup as live_rows,
n_dead_tup as dead_rows,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
-- Lock monitoring
SELECT
blocked_locks.pid as blocked_pid,
blocked_activity.usename as blocked_user,
blocking_locks.pid as blocking_pid,
blocking_activity.usename as blocking_user,
blocked_activity.query as blocked_statement,
blocking_activity.query as blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
Usage Examples
Design Multi-Tenant Schema
Apply database-schema-optimization skill to create PostgreSQL schema with RLS for complete tenant isolation
Optimize Query Performance
Apply database-schema-optimization skill to analyze and optimize slow queries using EXPLAIN ANALYZE
Implement Partitioning
Apply database-schema-optimization skill to add time-based partitioning for audit logs with automatic partition management
Integration Points
- system-architecture-design - Data modeling per bounded context
- multi-tenant-security - Row-level security implementation
- infrastructure-as-code - Database provisioning and configuration
- monitoring-observability - Database metrics and alerting
Success Output
When this skill completes successfully, output:
✅ SKILL COMPLETE: database-schema-optimization
Completed:
- [x] Multi-tenant schema designed with RLS
- [x] Indexes optimized (composite, partial, GIN, covering)
- [x] Query performance tuned (EXPLAIN ANALYZE results improved)
- [x] Partitioning implemented (range/hash strategies)
- [x] Connection pooling configured
- [x] Migration strategy established (zero-downtime)
- [x] Backup and recovery tested
- [x] Performance monitoring views created
Outputs:
- migrations/001_initial_schema.ts (Core schema with RLS)
- migrations/002_indexes.sql (Optimized indexes)
- migrations/003_partitions.sql (Partitioning setup)
- src/database/pool.ts (Connection pool config)
- scripts/backup-database.sh (Backup automation)
Performance Metrics:
- Cache hit ratio: XX% (target: >95%)
- Average query time: XXms (target: <100ms)
- Connection pool utilization: XX% (target: 60-80%)
- Dead tuple ratio: XX% (target: <10%)
Completion Checklist
Before marking this skill as complete, verify:
- Multi-tenant schema created with tenant_id on all scoped tables
- RLS policies enabled and tested (no cross-tenant leakage)
- Indexes created for all foreign keys and common query patterns
- EXPLAIN ANALYZE shows index usage (no sequential scans on large tables)
- Partitioning applied to time-series and high-volume tables
- Connection pool configured with appropriate min/max/timeout settings
- Migrations tested with both UP and DOWN (rollback verified)
- Backup script tested and recovery validated
- Performance monitoring views queried successfully
- pg_stat_statements enabled and analyzed
Failure Indicators
This skill has FAILED if:
- ❌ RLS policies missing or incorrectly configured (data leakage)
- ❌ Sequential scans on large tables (missing indexes)
- ❌ Slow queries (>1s) not identified and optimized
- ❌ Partitioning creates deadlocks or constraint violations
- ❌ Connection pool exhaustion (all connections in use)
- ❌ Migration rollback fails (can't recover)
- ❌ Backup incomplete or recovery untested
- ❌ Cache hit ratio <90% (inefficient queries)
- ❌ Dead tuple ratio >20% (vacuum not running)
- ❌ Lock contention detected (blocking queries)
When NOT to Use
Do NOT use this skill when:
- Early prototyping - Premature optimization slows development velocity
- Small datasets - < 10K rows don't need partitioning or complex indexing
- Read-only workloads - No writes means simpler schema without RLS overhead
- No multi-tenancy - Don't add tenant isolation if not a SaaS application
- NoSQL better fit - Document databases may suit unstructured data better
- Simple CRUD only - Basic operations don't need query optimization
- Testing/dev environments - Use simpler schemas for faster iteration
Use these alternatives instead:
- Prototyping: Start with basic schema, optimize later
- Small data: Single-table designs, add indexes only when needed
- Read-only: Materialized views instead of complex queries
- Single-tenant: Standard schema without RLS
- Unstructured data: MongoDB, DynamoDB for flexibility
- Simple apps: ORMs with auto-migrations (Prisma, TypeORM)
- Dev/test: SQLite or in-memory databases
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| SELECT * in production | Fetches unnecessary columns, slow | Specify explicit column list |
| Missing indexes on foreign keys | Slow JOINs, sequential scans | Create index for every FK |
| Over-indexing | Write performance degrades, storage bloat | Index only high-frequency queries |
| No index on WHERE clauses | Full table scans on every query | Analyze slow queries, add targeted indexes |
| Using IN with large lists | Subquery inefficiency | Use EXISTS or JOIN instead |
| N+1 query pattern | Hundreds of queries instead of one | Use JOINs or batch loading |
| No connection pooling | Connection overhead on every request | Configure pgBouncer or app-level pool |
| Migrations without rollback | Can't undo failed deployments | Always write DOWN migrations |
| No query timeouts | Runaway queries block other operations | Set statement_timeout |
| Skipping VACUUM | Table bloat, degraded performance | Enable autovacuum, monitor dead tuples |
| No partitioning for time-series | Slow queries on large audit logs | Partition by date range (monthly) |
| Unmonitored slow queries | Performance issues go unnoticed | Enable pg_stat_statements logging |
| Storing JSON blobs | Poor queryability, no constraints | Use proper columns with types |
| No backup testing | Backups exist but recovery never tested | Schedule quarterly recovery drills |
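The N+1 fix from the table above is worth spelling out: instead of one owner lookup per project, fetch all owners in a single query with `ANY($1)` and join in memory. A sketch; the function names and row shapes are illustrative:

```typescript
interface Project { id: string; createdBy: string }
interface User { id: string; name: string }

// One query for all owners (deduplicated ids) instead of N queries.
export function ownerQuery(projects: Project[]) {
  const ids = [...new Set(projects.map((p) => p.createdBy))];
  return {
    text: 'SELECT id, name FROM users WHERE id = ANY($1)',
    values: [ids],
  };
}

// Join the single result set back onto the projects in memory.
export function attachOwners(projects: Project[], users: User[]) {
  const byId = new Map(users.map((u) => [u.id, u]));
  return projects.map((p) => ({ ...p, owner: byId.get(p.createdBy) ?? null }));
}
```

Two round trips total, regardless of how many projects are on the page; ORMs call the same idea eager loading or dataloader batching.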
Principles
This skill embodies these CODITECT principles:
#2 First Principles Thinking
- Understand WHY queries are slow (missing indexes, poor schema design)
- Question normalization needs: when to denormalize for performance
- Design from access patterns up, not entity relationships down
#3 Keep It Simple
- Start with normalized schema, denormalize only when measured performance issues
- Use standard B-tree indexes first, exotic types (GIN, GiST) only when needed
- Range partitioning before hash partitioning (simpler to manage)
#4 Separation of Concerns
- Schema design separate from query optimization
- RLS policies enforce security at database level, not application code
- Partitioning strategy separate from business logic
#5 Eliminate Ambiguity
- Explicit column types (no "text" for everything)
- Clear CHECK constraints for valid values
- NOT NULL where applicable (no ambiguous nulls)
#7 Security First
- RLS prevents unauthorized access at database level
- Prepared statements prevent SQL injection
- Least privilege: connection users have minimal permissions
#8 No Assumptions
- Don't assume indexes exist; verify with EXPLAIN ANALYZE
- Don't assume backups work; test recovery regularly
- Don't assume queries are fast; measure with pg_stat_statements
#11 Automate Everything
- Automated migrations with version control
- Automated index creation based on slow query analysis
- Automated backup and retention management
- Automated performance monitoring and alerting
Full Standard: CODITECT-STANDARD-AUTOMATION.md
Version: 1.1.0 | Updated: 2026-01-04 | Quality Standard: SKILL-QUALITY-STANDARD.md v1.0.0