# Migration Strategy Skill

Database migration patterns for safe, zero-downtime schema changes with comprehensive rollback capabilities.

## When to Use This Skill

Use this skill when implementing migration strategy patterns in your codebase.

## How to Use This Skill

- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
## Migration Principles

### Golden Rules

- **Always backward compatible** - Old code must work with the new schema
- **Deployable independently** - Migrations deploy before code changes
- **Reversible when possible** - Include down migrations
- **Tested before production** - Run against production-like data
- **Small and focused** - One logical change per migration

### Deployment Order

1. Deploy migration (expand)
2. Deploy new code
3. Verify functionality
4. Deploy cleanup migration (contract)
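The expand/contract order above can be sketched as a pair of versioned migration files (the Flyway-style file names and the column names are illustrative):

```sql
-- V10__expand_add_full_name.sql  (step 1: deploy BEFORE the code change)
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- ...steps 2-3: deploy and verify application code that uses full_name...

-- V11__contract_drop_name.sql  (step 4: deploy AFTER the code is verified)
ALTER TABLE users DROP COLUMN name;
```

The key property is that at every point between the two migrations, both the old and the new application code can run against the current schema.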
## Zero-Downtime Patterns

### Adding Columns

```sql
-- Safe: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Safe: Add column with default (PostgreSQL 11+; older versions rewrite the table)
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';

-- Unsafe: Adding NOT NULL without a default fails on non-empty tables
-- ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL; -- DON'T DO THIS

-- Safe alternative: Multi-step approach
-- Step 1: Add nullable
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Backfill data
UPDATE users SET phone = 'unknown' WHERE phone IS NULL;
-- Step 3: Add constraint (requires a full-table scan while holding a lock)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
```
### Removing Columns

```sql
-- Step 1: Stop writing to the column (code change)
-- Step 2: Deploy code that doesn't read the column
-- Step 3: Drop the column
ALTER TABLE users DROP COLUMN legacy_field;
```
### Renaming Columns

```sql
-- DON'T: A direct rename instantly breaks running code that uses the old name
-- ALTER TABLE users RENAME COLUMN name TO full_name;

-- DO: Multi-step approach
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Step 2: Keep both columns in sync (install the trigger before backfilling
-- so no concurrent writes are missed)
CREATE OR REPLACE FUNCTION sync_name_columns()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        IF NEW.full_name IS NULL THEN
            NEW.full_name := NEW.name;
        END IF;
        IF NEW.name IS NULL THEN
            NEW.name := NEW.full_name;
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_names
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_name_columns();

-- Step 3: Backfill existing rows
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 4: Update code to use the new column
-- Step 5: Drop the trigger and old column
DROP TRIGGER sync_names ON users;
ALTER TABLE users DROP COLUMN name;
```
### Changing Column Types

```sql
-- Safe: Widening (e.g. VARCHAR(50) to VARCHAR(100)) -- no table rewrite
ALTER TABLE users ALTER COLUMN name TYPE VARCHAR(100);

-- Unsafe: Narrowing or incompatible types -- use a multi-step approach:
-- Step 1: Add new column
ALTER TABLE products ADD COLUMN price_cents BIGINT;
-- Step 2: Backfill
UPDATE products SET price_cents = (price * 100)::BIGINT;
-- Step 3: Sync with a trigger (define sync_price_columns() analogously to
-- sync_name_columns() in the renaming example above)
CREATE TRIGGER sync_price
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION sync_price_columns();
-- Step 4: Update code to use the new column
-- Step 5: Drop the old column and take over its name
ALTER TABLE products DROP COLUMN price;
ALTER TABLE products RENAME COLUMN price_cents TO price;
```
### Adding Indexes

```sql
-- Always use CONCURRENTLY to avoid blocking writes with a table lock
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

-- A concurrent build can fail and leave an INVALID index behind;
-- drop it and retry
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
```
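To check whether a failed concurrent build left an invalid index behind, you can query the standard PostgreSQL catalogs:

```sql
-- List indexes left INVALID by a failed CREATE INDEX CONCURRENTLY
SELECT n.nspname AS schema, c.relname AS index_name
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```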
### Adding Constraints

```sql
-- Adding NOT NULL with a default (no table rewrite in PostgreSQL 11+)
ALTER TABLE users
ADD COLUMN status VARCHAR(20) DEFAULT 'active' NOT NULL;

-- Adding a UNIQUE constraint
-- Step 1: Create the index concurrently
CREATE UNIQUE INDEX CONCURRENTLY idx_users_email_unique ON users (email);
-- Step 2: Add the constraint using the index (fast, no second scan)
ALTER TABLE users
ADD CONSTRAINT users_email_unique UNIQUE USING INDEX idx_users_email_unique;

-- Adding a CHECK constraint without scanning the table up front
ALTER TABLE orders
ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT VALID;
-- Validate separately (doesn't block reads or writes)
ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive;
```
### Adding Foreign Keys

```sql
-- Step 1: Add the constraint without validation
ALTER TABLE orders
ADD CONSTRAINT orders_customer_fk
FOREIGN KEY (customer_id) REFERENCES customers(id)
NOT VALID;

-- Step 2: Validate separately (no long lock on either table)
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
```
## Rollback Strategies

### Reversible Migrations

```sql
-- Migration: Add column
-- UP
ALTER TABLE users ADD COLUMN middle_name VARCHAR(100);

-- DOWN
ALTER TABLE users DROP COLUMN middle_name;
```

### Data Preservation Rollback

```sql
-- UP: Rename column via copy
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name;
-- Don't drop 'name' yet

-- DOWN: Just drop the new column (data preserved in 'name')
ALTER TABLE users DROP COLUMN IF EXISTS full_name;
```
### Feature Flags for Rollback

```python
# Application-level rollback via feature flags
def get_user_name(user):
    if feature_enabled('use_full_name_column'):
        return user.full_name
    return user.name
```
## Migration Tools

### Flyway (Java/Multi-language)

```sql
-- V1__create_users.sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT NOW()
);

-- V2__add_user_status.sql
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';

-- R__refresh_user_view.sql (repeatable)
CREATE OR REPLACE VIEW active_users AS
SELECT * FROM users WHERE status = 'active';
```

```bash
# Run migrations
flyway migrate
# Check status
flyway info
# Rollback (Teams edition)
flyway undo
```
### Alembic (Python/SQLAlchemy)

```python
# alembic/versions/001_create_users.py
from alembic import op
import sqlalchemy as sa

revision = '001'
down_revision = None

def upgrade():
    op.create_table(
        'users',
        sa.Column('id', sa.Integer(), primary_key=True),
        sa.Column('email', sa.String(255), nullable=False, unique=True),
        sa.Column('created_at', sa.DateTime(), server_default=sa.func.now())
    )

def downgrade():
    op.drop_table('users')
```

```bash
# Create migration
alembic revision -m "add_user_status"
# Auto-generate from models
alembic revision --autogenerate -m "add_user_status"
# Run migrations
alembic upgrade head
# Rollback one step
alembic downgrade -1
```
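For reference, the `add_user_status` revision created by the commands above might look like this (a sketch; the revision ids are illustrative):

```python
# alembic/versions/002_add_user_status.py
from alembic import op
import sqlalchemy as sa

revision = '002'
down_revision = '001'

def upgrade():
    # server_default keeps ADD COLUMN fast on PostgreSQL 11+
    op.add_column(
        'users',
        sa.Column('status', sa.String(20), nullable=False,
                  server_default='active')
    )

def downgrade():
    op.drop_column('users', 'status')
```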
### Prisma (TypeScript/JavaScript)

```prisma
// schema.prisma
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  status    String   @default("active")
  createdAt DateTime @default(now())
}
```

```bash
# Create migration
npx prisma migrate dev --name add_user_status
# Apply to production
npx prisma migrate deploy
# Reset database (development only -- drops all data)
npx prisma migrate reset
```
## Data Migrations

### Large Table Updates

```sql
-- DON'T: Update all rows at once
-- UPDATE users SET status = 'active' WHERE status IS NULL;

-- DO: Batch updates
-- (COMMIT inside a DO block requires PostgreSQL 11+ and must not
--  run inside an outer transaction block)
DO $$
DECLARE
    batch_size INT := 10000;
    affected INT;
BEGIN
    LOOP
        UPDATE users
        SET status = 'active'
        WHERE id IN (
            SELECT id FROM users
            WHERE status IS NULL
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        );
        GET DIAGNOSTICS affected = ROW_COUNT;
        EXIT WHEN affected = 0;
        COMMIT;
        PERFORM pg_sleep(0.1); -- Reduce load
    END LOOP;
END $$;
```
### ETL Pattern

```python
# Python batch migration
# (db and transaction() are placeholders for your data-access layer)
import time

def migrate_user_status(batch_size=10000):
    while True:
        with transaction():
            users = db.execute("""
                SELECT id FROM users
                WHERE status IS NULL
                LIMIT %s
                FOR UPDATE SKIP LOCKED
            """, [batch_size])
            if not users:
                break
            ids = [u.id for u in users]
            db.execute("""
                UPDATE users SET status = 'active'
                WHERE id = ANY(%s)
            """, [ids])
        time.sleep(0.1)  # Backpressure between batches, outside the transaction
```
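When the set of target ids is computed up front rather than claimed with `FOR UPDATE SKIP LOCKED`, a small chunking helper keeps each `UPDATE ... WHERE id = ANY(...)` bounded (the `chunked` name is our own, not from any library):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Usage: issue one bounded UPDATE per batch of ids
batches = list(chunked(range(25), 10))
# 25 ids with batch size 10 -> batches of 10, 10, and 5 ids
```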
## Testing Migrations

### Pre-Production Testing

```bash
# 1. Restore production backup to staging
pg_restore -d staging_db production_backup.dump

# 2. Run migrations
flyway migrate -url=jdbc:postgresql://staging/staging_db

# 3. Run application tests
pytest tests/integration/

# 4. Verify schema matches expected
pg_dump -s staging_db > actual_schema.sql
diff expected_schema.sql actual_schema.sql
```
### Migration Test Framework

```python
# tests/test_migrations.py
import pytest
from alembic.config import Config
from alembic import command

@pytest.fixture
def alembic_config():
    return Config("alembic.ini")

def test_upgrade_downgrade(alembic_config, test_db):
    """Test all migrations can upgrade and downgrade"""
    # Upgrade to head
    command.upgrade(alembic_config, "head")
    # Downgrade to base
    command.downgrade(alembic_config, "base")
    # Upgrade again
    command.upgrade(alembic_config, "head")

def test_migration_reversibility(alembic_config, test_db):
    """Test each migration is reversible"""
    revisions = get_all_revisions(alembic_config)
    for rev in revisions:
        command.upgrade(alembic_config, rev)
        command.downgrade(alembic_config, "-1")
        command.upgrade(alembic_config, rev)
```
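`get_all_revisions` is left undefined above; one possible implementation uses Alembic's `ScriptDirectory` API to enumerate revisions in base-to-head order:

```python
from alembic.script import ScriptDirectory

def get_all_revisions(alembic_config):
    """Return revision ids ordered from base to head."""
    script = ScriptDirectory.from_config(alembic_config)
    # walk_revisions() yields head-to-base, so reverse the order
    return [rev.revision for rev in reversed(list(script.walk_revisions()))]
```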
## CI/CD Integration

### GitHub Actions

```yaml
name: Database Migrations

on:
  pull_request:
    paths:
      - 'migrations/**'

jobs:
  test-migrations:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Run migrations
        run: flyway migrate
      - name: Test rollback
        run: flyway undo
      - name: Re-apply migrations
        run: flyway migrate
      - name: Verify schema
        run: |
          pg_dump -s testdb > actual.sql
          diff expected_schema.sql actual.sql
```
## Usage Examples

### Plan Zero-Downtime Migration

Apply migration-strategy skill to plan zero-downtime column rename from 'name' to 'full_name' in users table

### Large Data Migration

Apply migration-strategy skill to design batched migration for updating 50M rows with minimal performance impact

### Setup Migration Pipeline

Apply migration-strategy skill to configure Alembic with automated testing and CI/CD integration
## Success Output

When successful, this skill MUST output:

```
✅ SKILL COMPLETE: migration-strategy

Completed:
- [x] Migration plan created with step-by-step deployment order
- [x] Zero-downtime strategy validated (no table locks)
- [x] Rollback procedure documented and tested
- [x] Migration files generated (SQL, Alembic, Flyway, or Prisma)
- [x] Testing plan executed against production-like data
- [x] CI/CD pipeline configured for migration deployment

Outputs:
- migrations/ directory with versioned migration files
- MIGRATION-PLAN.md with deployment steps
- ROLLBACK-PLAN.md with reversal procedure
- CI/CD workflow configuration (.github/workflows/ or similar)
- Test results from staging environment
```
## Completion Checklist

Before marking this skill as complete, verify:

- Migration tested on production data snapshot in staging
- All schema changes use CONCURRENTLY or non-locking equivalents
- Backward compatibility verified (old code works with new schema)
- Forward compatibility verified (new code works with old schema)
- Rollback procedure documented with exact commands
- Deployment order documented (migration → code → cleanup)
- Data integrity constraints validated after migration
- Performance impact measured (query time, table locks, replication lag)
- Monitoring alerts configured for migration execution
## Failure Indicators

This skill has FAILED if:

- ❌ Migration causes table locks in production (missing CONCURRENTLY)
- ❌ Old code breaks with new schema (backward incompatibility)
- ❌ Rollback procedure not tested or doesn't work
- ❌ Migration untested on production-like dataset
- ❌ Data loss or corruption during migration
- ❌ No deployment order documented (unclear how to deploy safely)
- ❌ Migration timeout or performance degradation in production
- ❌ Constraint violations discovered after migration completes
- ❌ No monitoring for migration progress/failures
## When NOT to Use

Do NOT use this skill when:

- Working with development databases (just drop and recreate)
- Prototyping without production data (schema evolution not critical)
- Using schema-less databases (MongoDB, DynamoDB)
- Migrating data between different database systems (use data-migration-patterns)
- Migrating to cloud-native databases with different paradigms (use vendor-specific guides)
- One-time data imports or exports (use ETL tools)
- Schema changes in single-user applications (no zero-downtime requirements)
## Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Direct ALTER TABLE without CONCURRENTLY | Table locks block production | Use CONCURRENTLY for indexes/constraints |
| Renaming columns directly | Instant breaking change | Multi-step: add → sync → migrate code → drop old |
| Backfilling all rows at once | Long table locks | Batch updates with FOR UPDATE SKIP LOCKED |
| No rollback plan | Can't recover from failures | Document and test rollback before deploying |
| Testing only on empty database | Misses performance issues | Test on production-sized dataset |
| Ignoring replication lag | Replicas fall behind | Monitor lag, add backpressure |
| Running migrations manually | Human error, inconsistency | Automate via CI/CD with Flyway/Alembic |
| Skipping feature flag strategy | Can't roll back code changes | Use feature flags for gradual rollout |
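
For the replication-lag anti-pattern above, a standard check against PostgreSQL's `pg_stat_replication` view (run on the primary, PostgreSQL 10+) looks like:

```sql
-- Bytes of WAL each standby still has to replay
SELECT application_name,
       client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

Pausing batched backfills when this lag grows is a simple form of backpressure.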
## Principles

This skill embodies:

- #2 First Principles - Understand database locking and replication before migrating
- #3 Keep It Simple - Small, incremental migrations over large, risky changes
- #8 No Assumptions - Test migrations on production-like data
- #10 Quality Over Speed - Zero-downtime migrations require careful planning
- #11 Measure Twice, Cut Once - Test rollback before deploying forward migration
- #13 Automation - Automate migration deployment to prevent human error
Full Principles: CODITECT-STANDARD-AUTOMATION.md