ADR-008: Deployment Automation (CI/CD Pipeline)
Status: Proposed
Date: 2025-11-26
Deciders: DevOps Team, Engineering Team
Related ADRs: ADR-001 (Hybrid Architecture), ADR-003 (FastAPI vs Django)
Context
The context intelligence platform must support two deployment modes with automated CI/CD pipelines:
Deployment Requirements
1. Standalone Mode
- Deploy to Kubernetes (GKE, EKS, or AKS)
- FastAPI application + PostgreSQL + Weaviate + Celery + Redis
- Docker containers for each service
- Independent scaling (no CODITECT dependencies)
- Target: Deploy to production in <10 minutes
2. CODITECT Integration Mode
- Deploy as Django app within existing CODITECT platform
- Reuse CODITECT's PostgreSQL, Redis, Celery workers
- Add Weaviate as new service
- Integrate with CODITECT's deployment pipeline (GCP Cloud Build + Cloud Run)
- Target: Deploy to production in <5 minutes (leverages existing infra)
Quality Gates
Both modes must pass:
- ✅ Unit tests (80%+ coverage)
- ✅ Integration tests (API endpoints, database operations)
- ✅ Security scanning (SAST: Bandit, DAST: OWASP ZAP)
- ✅ Performance tests (p95 latency <100ms)
- ✅ Database migrations (zero downtime)
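The two numeric gates in the list above (coverage and p95 latency) can be checked mechanically. The following is a minimal sketch, not the project's actual gate implementation: the function names `p95` and `gates_pass` are hypothetical, and the nearest-rank percentile method is an assumption, since this ADR does not say how p95 is computed.

```python
def p95(latencies_ms: list[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) computed with integers to avoid float rounding
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


def gates_pass(coverage_pct: float, latencies_ms: list[float]) -> bool:
    """Check the numeric gates: coverage >= 80% and p95 latency < 100 ms."""
    return coverage_pct >= 80.0 and p95(latencies_ms) < 100.0
```

With 100 samples the nearest-rank p95 is the 95th smallest value, so a handful of slow outliers does not fail the gate on its own.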
Decision
We will implement GitHub Actions for CI/CD with:
- Standalone Mode: Kubernetes deployment (Helm charts)
- CODITECT Mode: Django app integration (GCP Cloud Build)
- Shared quality gates: Tests run once, results used by both pipelines
- Progressive rollout: Canary deployments (10% → 50% → 100%)
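The progressive rollout can be pictured as a loop over traffic weights with a health check after each shift. This is an illustrative sketch only: `run_rollout`, `set_weight`, and `is_healthy` are hypothetical names, and in practice the weight change would be a Helm or Cloud Run traffic update rather than a Python callback.

```python
from typing import Callable

CANARY_STAGES = (10, 50, 100)  # traffic percentages from the Decision section


def run_rollout(set_weight: Callable[[int], None],
                is_healthy: Callable[[int], bool]) -> int:
    """Shift traffic stage by stage and return the final traffic weight.

    Returns 100 after a complete rollout, or 0 if a stage was unhealthy
    and traffic was pulled back from the new version.
    """
    for weight in CANARY_STAGES:
        set_weight(weight)
        if not is_healthy(weight):
            set_weight(0)  # roll back: send no traffic to the new version
            return 0
    return 100
```

Passing the two operations in as callables keeps the stage logic testable without a cluster.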
Architecture
┌────────────────────────────────────────────────────────────┐
│ GitHub Repository: coditect-dev-context                    │
│                                                            │
│ Event: Push to main OR Pull Request                        │
└─────────────────────────────┬──────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│ GitHub Actions: CI Pipeline (Shared)                       │
│                                                            │
│ 1. Build Docker Images                                     │
│    - app:latest (FastAPI/Django code)                      │
│    - migrations:latest (Alembic/Django migrations)         │
│                                                            │
│ 2. Run Tests (Parallel)                                    │
│    - Unit tests (pytest) - 5 minutes                       │
│    - Integration tests (API endpoints) - 3 minutes         │
│    - Security scan (Bandit, Safety) - 2 minutes            │
│    - Linting (Ruff, Black, MyPy) - 1 minute                │
│                                                            │
│ 3. Build Artifacts                                         │
│    - Helm chart (standalone mode)                          │
│    - Django migration files (CODITECT mode)                │
└─────────────────────────────┬──────────────────────────────┘
                              ↓
              ┌───────────────┴───────────────┐
              ↓                               ↓
     ┌──────────────────┐            ┌──────────────────┐
     │ Standalone       │            │ CODITECT         │
     │ Deployment       │            │ Deployment       │
     │ (Kubernetes)     │            │ (GCP Cloud Run)  │
     │                  │            │                  │
     │ 1. Push to       │            │ 1. Trigger       │
     │    Docker Hub    │            │    Cloud Build   │
     │                  │            │                  │
     │ 2. Deploy to     │            │ 2. Django        │
     │    Staging       │            │    Migration     │
     │    (Helm)        │            │                  │
     │                  │            │ 3. Deploy to     │
     │ 3. Smoke Tests   │            │    Cloud Run     │
     │    (5 min)       │            │                  │
     │                  │            │ 4. Health Check  │
     │ 4. Canary        │            │    (2 min)       │
     │    (10% traffic) │            │                  │
     │                  │            │ 5. Gradual       │
     │ 5. Full Rollout  │            │    Rollout       │
     │    (100%)        │            │    (100%)        │
     │                  │            │                  │
     │ Total: 8-10 min  │            │ Total: 4-5 min   │
     └──────────────────┘            └──────────────────┘
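Because the four test jobs in the diagram run in parallel, the test stage is bounded by its slowest job rather than by the sum of all jobs. A small sketch of that arithmetic, using the per-job durations from the diagram (the helper name `stage_minutes` is hypothetical):

```python
# Durations in minutes, taken from the "Run Tests (Parallel)" box
TEST_JOBS = {"unit": 5, "integration": 3, "security": 2, "lint": 1}


def stage_minutes(jobs: dict[str, int], parallel: bool) -> int:
    """Wall-clock time of a stage: slowest job if parallel, else the sum."""
    return max(jobs.values()) if parallel else sum(jobs.values())


parallel_minutes = stage_minutes(TEST_JOBS, parallel=True)     # 5
sequential_minutes = stage_minutes(TEST_JOBS, parallel=False)  # 11
```

Running the jobs in parallel saves roughly six minutes per run under these estimates, which matters against the <10 minute and <5 minute deployment targets.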
Implementation
Step 1: GitHub Actions CI Pipeline
# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  PYTHON_VERSION: '3.11'
  POETRY_VERSION: '1.7'

jobs:
  # Job 1: Build Docker images
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build FastAPI app image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.app
          tags: coditect-context:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          push: false
          load: true
      - name: Build migrations image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.migrations
          tags: coditect-context-migrations:${{ github.sha }}
          push: false
          load: true

  # Job 2: Unit tests
  unit-tests:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
      - name: Install dependencies
        run: poetry install --with dev
      - name: Run unit tests
        # --cov-fail-under enforces the 80% coverage quality gate
        run: poetry run pytest tests/unit/ --cov=core --cov-report=xml --cov-fail-under=80
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml

  # Job 3: Integration tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: build
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: context_test
        # Publish the port so steps on the runner can reach localhost:5432
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        # Publish the port so steps on the runner can reach localhost:6379
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
      - name: Install dependencies
        run: poetry install --with dev
      - name: Run database migrations
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/context_test
        run: poetry run alembic upgrade head
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/context_test
          REDIS_URL: redis://localhost:6379/0
        run: poetry run pytest tests/integration/ -v

  # Job 4: Security scanning
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
      - name: Install dependencies
        run: poetry install --with dev
      - name: Run Bandit (SAST)
        run: poetry run bandit -r core/ standalone/ coditect/ -f json -o bandit-report.json
      - name: Run Safety (dependency scan)
        run: poetry run safety check --json > safety-report.json
      - name: Upload security reports
        # Upload the reports even when a scan step fails
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-reports
          path: |
            bandit-report.json
            safety-report.json

  # Job 5: Linting
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
      - name: Install dependencies
        run: poetry install --with dev
      - name: Run Ruff
        run: poetry run ruff check .
      - name: Run Black
        run: poetry run black --check .
      - name: Run MyPy
        run: poetry run mypy core/ standalone/ coditect/
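The security job writes `bandit-report.json`, which a follow-up step could inspect to decide whether the build should be blocked. The sketch below is one possible gate, not part of the pipeline above. It assumes the Bandit JSON report contains a top-level `results` list whose entries carry an `issue_severity` field; the function name `high_severity_findings` is hypothetical.

```python
import json


def high_severity_findings(report_json: str) -> list[dict]:
    """Return the findings in a Bandit JSON report marked HIGH severity.

    Assumes a top-level "results" list with an "issue_severity" key per
    entry; a report without results yields an empty list.
    """
    report = json.loads(report_json)
    return [
        finding
        for finding in report.get("results", [])
        if finding.get("issue_severity") == "HIGH"
    ]
```

A CI step could read the report file, call this function, and exit non-zero when the returned list is not empty.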
Step 2: Standalone Deployment (Kubernetes with Helm)
# .github/workflows/deploy-standalone.yml
name: Deploy Standalone (Kubernetes)

on:
  # `needs` cannot reference jobs defined in another workflow file, so this
  # workflow is chained to the CI workflow and deploys only after CI succeeds.
  workflow_run:
    workflows: ["CI Pipeline"]
    types: [completed]
    branches: [main]
  workflow_dispatch:  # Manual trigger

env:
  DOCKER_REGISTRY: docker.io/coditect
  HELM_CHART_VERSION: 1.0.0
  # Commit that passed CI (falls back to github.sha for manual runs)
  DEPLOY_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    # Deploy only for manual runs or for a successful CI run triggered by a push
    if: ${{ github.event_name == 'workflow_dispatch' || (github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push') }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.DEPLOY_SHA }}
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.app
          push: true
          tags: |
            ${{ env.DOCKER_REGISTRY }}/context-app:${{ env.DEPLOY_SHA }}
            ${{ env.DOCKER_REGISTRY }}/context-app:latest
      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: '3.13.0'
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}
      - name: Deploy to staging
        run: |
          helm upgrade --install context-intelligence \
            ./helm/context-intelligence \
            --namespace staging \
            --create-namespace \
            --set image.tag=${{ env.DEPLOY_SHA }} \
            --set postgresql.enabled=true \
            --set weaviate.enabled=true \
            --set redis.enabled=true \
            --wait --timeout 5m
      - name: Run smoke tests
        run: |
          kubectl run smoke-test \
            --image=curlimages/curl:latest \
            --restart=Never \
            --namespace=staging \
            --rm -i \
            --command -- curl -f http://context-intelligence/health || exit 1

  deploy-production:
    runs-on: ubuntu-latest
    environment: production
    needs: deploy-staging
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.DEPLOY_SHA }}
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG_PRODUCTION }}
      - name: Deploy canary (10% traffic)
        run: |
          helm upgrade --install context-intelligence \
            ./helm/context-intelligence \
            --namespace production \
            --set image.tag=${{ env.DEPLOY_SHA }} \
            --set canary.enabled=true \
            --set canary.weight=10 \
            --wait --timeout 5m
      - name: Wait 5 minutes (monitor canary)
        run: sleep 300
      # Note: the 50% stage named in the Decision section is not yet
      # implemented here; this workflow goes from 10% straight to 100%.
      - name: Full rollout (100% traffic)
        run: |
          helm upgrade context-intelligence \
            ./helm/context-intelligence \
            --namespace production \
            --set canary.enabled=false \
            --reuse-values \
            --wait --timeout 5m
Step 3: CODITECT Deployment (GCP Cloud Build)
# .github/workflows/deploy-coditect.yml
name: Deploy to CODITECT (GCP Cloud Run)

on:
  # `needs` cannot reference jobs defined in another workflow file, so this
  # workflow is chained to the CI workflow and deploys only after CI succeeds.
  workflow_run:
    workflows: ["CI Pipeline"]
    types: [completed]
    branches: [main]

env:
  GCP_PROJECT_ID: coditect-production
  CLOUD_RUN_SERVICE: coditect-backend
  # Commit that passed CI
  DEPLOY_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}

jobs:
  deploy-coditect:
    runs-on: ubuntu-latest
    environment: coditect-production
    if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push' }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.DEPLOY_SHA }}
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v2
      - name: Copy Django app to CODITECT repo
        run: |
          # Clone CODITECT repo
          git clone https://github.com/coditect-ai/coditect-backend.git /tmp/coditect
          # Copy context intelligence Django app
          cp -r coditect/ /tmp/coditect/apps/context_intelligence/
          # Update requirements.txt
          cat requirements.txt >> /tmp/coditect/requirements.txt
      - name: Run Django migrations
        # A `cd` does not persist between steps, so set the directory per step
        working-directory: /tmp/coditect
        run: |
          # Assumes database settings for the target environment are
          # available to manage.py on the runner.
          pip install -r requirements.txt
          python manage.py makemigrations context_intelligence
          python manage.py migrate --noinput
      - name: Trigger Cloud Build
        working-directory: /tmp/coditect
        run: |
          # _IMAGE_TAG is an assumed substitution: cloudbuild.yaml must tag
          # the image with it so the deploy step below can find the image.
          gcloud builds submit \
            --config=cloudbuild.yaml \
            --project=${{ env.GCP_PROJECT_ID }} \
            --substitutions=_SERVICE_NAME=${{ env.CLOUD_RUN_SERVICE }},_IMAGE_TAG=${{ env.DEPLOY_SHA }}
      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy ${{ env.CLOUD_RUN_SERVICE }} \
            --image=gcr.io/${{ env.GCP_PROJECT_ID }}/coditect-backend:${{ env.DEPLOY_SHA }} \
            --platform=managed \
            --region=us-central1 \
            --allow-unauthenticated \
            --set-env-vars="DJANGO_SETTINGS_MODULE=coditect.settings.production"
      - name: Health check
        run: |
          URL=$(gcloud run services describe ${{ env.CLOUD_RUN_SERVICE }} \
            --platform=managed \
            --region=us-central1 \
            --format='value(status.url)')
          curl -f "$URL/health" || exit 1
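The health check above is a single `curl` call, which can fail spuriously while a new revision is still starting. One way to make it more tolerant is to retry with a growing delay, sketched below. The names `wait_until_healthy` and `http_probe`, the attempt count, and the delays are all illustrative assumptions rather than values from this ADR.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> Callable[[], bool]:
    """Build a probe that reports True when the URL answers with 2xx."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return 200 <= resp.status < 300
        except OSError:  # covers URLError, HTTPError, and timeouts
            return False
    return probe


def wait_until_healthy(probe: Callable[[], bool],
                       attempts: int = 5,
                       delay_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call the probe up to `attempts` times with linear backoff."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay_s * attempt)  # wait longer after each failure
    return False
```

A deployment script could call `wait_until_healthy(http_probe(url + "/health"))` and exit non-zero when it returns False. The `sleep` parameter exists so the retry logic can be tested without real waiting.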
Step 4: Database Migrations (Zero Downtime)
# migrations/migration_strategy.py
"""
Zero-downtime migration strategy for PostgreSQL.

Pattern: Expand-Migrate-Contract
1. EXPAND: Add new column (nullable)
2. MIGRATE: Backfill data
3. CONTRACT: Make column non-nullable (after all services updated)
"""
import sqlalchemy as sa  # sa.UUID requires SQLAlchemy 2.0+
from alembic import op
from celery import shared_task

# Hypothetical import path; adjust to wherever the Conversation model lives.
from coditect.models import Conversation


# Example: Adding required column

# Step 1 (Week 1): EXPAND
def upgrade_step1():
    # Add nullable column
    op.add_column(
        'conversations',
        sa.Column('organization_id', sa.UUID(), nullable=True),
    )
    # Add index (for performance)
    op.create_index(
        'idx_conversations_org',
        'conversations', ['organization_id'],
    )


# Step 2 (Week 2): MIGRATE (background job)
@shared_task
def backfill_organization_id():
    # Stream the rows that still lack a value; select_related avoids one
    # extra query per row when reading conversation.user
    pending = (
        Conversation.objects
        .filter(organization_id__isnull=True)
        .select_related('user')
    )
    for conversation in pending.iterator(chunk_size=1000):
        conversation.organization_id = conversation.user.organization_id
        conversation.save(update_fields=['organization_id'])


# Step 3 (Week 3): CONTRACT
def upgrade_step3():
    # Make column non-nullable (after backfill complete)
    op.alter_column(
        'conversations', 'organization_id',
        existing_type=sa.UUID(),
        nullable=False,
    )
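To see the Expand and Migrate steps run end to end without a PostgreSQL instance, here is a self-contained sketch using an in-memory SQLite database as a stand-in. The table and column names follow the example above; the helper `backfill_in_batches`, the batch size, and the sample rows are assumptions for illustration. SQLite cannot alter a column to NOT NULL in place, so the Contract step is represented only by the check that no NULL rows remain.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 2) -> int:
    """MIGRATE: copy each user's organization_id onto their conversations.

    Works in small batches with a commit after each one so that no single
    transaction holds locks for long. Returns the number of rows updated.
    """
    total = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT c.id FROM conversations c "
            "JOIN users u ON u.id = c.user_id "
            "WHERE c.organization_id IS NULL "
            "AND u.organization_id IS NOT NULL LIMIT ?",
            (batch_size,),
        )]
        if not ids:
            return total
        conn.executemany(
            "UPDATE conversations SET organization_id = "
            "(SELECT organization_id FROM users "
            " WHERE users.id = conversations.user_id) "
            "WHERE id = ?",
            [(row_id,) for row_id in ids],
        )
        conn.commit()
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, organization_id TEXT NOT NULL);
CREATE TABLE conversations (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
INSERT INTO users VALUES (1, 'org-a'), (2, 'org-b');
INSERT INTO conversations VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 1);
-- EXPAND: add the column as nullable so existing writers keep working
ALTER TABLE conversations ADD COLUMN organization_id TEXT;
""")

migrated = backfill_in_batches(conn)

# CONTRACT precondition: only tighten the constraint once nothing is NULL
remaining = conn.execute(
    "SELECT COUNT(*) FROM conversations WHERE organization_id IS NULL"
).fetchone()[0]
```

Checking `remaining == 0` before running the Contract migration is a cheap safeguard against applying Step 3 while the backfill is still incomplete.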
Consequences
Positive
- ✅ Fast Deployments
  - Standalone: <10 minutes (build → test → deploy → rollout)
  - CODITECT: <5 minutes (leverages existing infrastructure)
- ✅ Automated Quality Gates
  - 80%+ test coverage enforced
  - Security scans block vulnerable code
  - Linting ensures code quality
- ✅ Progressive Rollout (Canary)
  - Catch bugs before affecting all users
  - 10% → 50% → 100% traffic shift
  - Rollback on errors (automatic rollback on an error-rate threshold is planned)
- ✅ Zero-Downtime Migrations
  - Expand-Migrate-Contract pattern
  - Database changes don't block deployments
  - Background backfill jobs
- ✅ Deployment Flexibility
  - Standalone: Deploy to any Kubernetes cluster (GKE, EKS, AKS)
  - CODITECT: Integrated with existing GCP Cloud Run deployment
Negative
- ⚠️ Dual Pipeline Maintenance
  - Complexity: Two deployment pipelines (Kubernetes vs. GCP)
  - Mitigation: 90% shared CI steps (build, test, scan)
  - Overhead: ~10% additional DevOps time
- ⚠️ Migration Coordination
  - Problem: Must coordinate 3-step migration (Expand → Migrate → Contract)
  - Risk: Forgetting Step 3 (Contract) leaves nullable columns
  - Mitigation: Automated reminders, checklist in JIRA
- ⚠️ Canary Monitoring
  - Problem: Must monitor canary for errors (manual vigilance)
  - Mitigation: Automated monitoring (Prometheus alerts)
  - Future: Automatic rollback based on error rate threshold
- ⚠️ Helm Chart Maintenance
  - Complexity: Must maintain Helm chart for Kubernetes deployment
  - Overhead: ~5% additional time for Helm updates
  - Mitigation: Use best-practice templates (Bitnami charts as base)
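The planned automatic rollback on an error-rate threshold could reduce to a small decision function fed by monitoring data. The sketch below is an assumption about how such a rule might look: the name `should_rollback`, the 1% threshold, and the 100-request minimum are hypothetical values, not figures from this ADR.

```python
def should_rollback(canary_errors: int,
                    canary_requests: int,
                    max_error_rate: float = 0.01,
                    min_requests: int = 100) -> bool:
    """Decide whether the canary should be rolled back.

    Returns False while the canary has served too little traffic to judge,
    otherwise True when its error rate exceeds the threshold.
    """
    if canary_requests < min_requests:
        return False  # not enough traffic to judge yet
    return canary_errors / canary_requests > max_error_rate
```

The minimum-request guard keeps one early failure at low traffic from triggering a rollback on its own.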
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Failed Deployment | Low | High | Automated rollback, staging environment |
| Database Migration Failure | Low | Critical | Test migrations in staging, backup database |
| Canary Goes Unnoticed | Medium | Medium | Prometheus alerts, Slack notifications |
| Secret Leakage | Low | Critical | GitHub Secrets, never commit secrets |
Alternatives Considered
Alternative 1: Manual Deployment (No CI/CD)
Architecture: Developers deploy manually via kubectl apply or gcloud run deploy
Pros:
- ✅ No CI/CD setup required
Cons:
- ❌ Error-prone: Human mistakes (wrong image tag, missed migration)
- ❌ Slow: 30-60 minutes per deployment
- ❌ No quality gates: Can deploy broken code
Why Rejected: Manual deployment is unacceptable for a production system.
Alternative 2: GitLab CI/CD (Instead of GitHub Actions)
Architecture: Use GitLab CI/CD pipelines
Pros:
- ✅ Integrated with GitLab (if using GitLab)
- ✅ Powerful CI/CD features
Cons:
- ❌ GitHub-first: Project already on GitHub
- ❌ Migration cost: Must migrate repo or use GitLab mirroring
- ❌ Team familiarity: Team knows GitHub Actions
Why Rejected: Stick with GitHub Actions (project already on GitHub).
Alternative 3: ArgoCD for Kubernetes (GitOps)
Architecture: Use ArgoCD for declarative Kubernetes deployments
Pros:
- ✅ GitOps best practice (desired state in Git)
- ✅ Automatic drift detection
- ✅ Better rollback (just revert Git commit)
Cons:
- ❌ Additional complexity: Must learn ArgoCD
- ❌ Overhead: Extra service to maintain
- ❌ Overkill: Simple deployments don't need GitOps
Why Rejected (for now): Helm is sufficient for Phase 1. Revisit ArgoCD in Year 2 if needed.
Success Metrics
Deployment Metrics
- Deployment frequency: 5+ deployments per week
- Deployment time: <10 minutes (standalone), <5 minutes (CODITECT)
- Success rate: 95%+ deployments succeed without rollback
- Rollback time: <5 minutes if deployment fails
Quality Metrics
- Test coverage: 80%+ maintained
- Security scan failures: 0 critical vulnerabilities in production
- Linting violations: 0 errors (100% compliance)
Reliability Metrics
- Zero-downtime migrations: 100% (no service interruption)
- Canary detection: 90%+ errors caught before full rollout
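The deployment targets above can be checked against a log of past deployments. This is a rough sketch with made-up record fields: the `Deployment` dataclass, `success_rate`, and `meets_targets` are hypothetical names, and where the records come from is left open.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    duration_min: float  # wall-clock time of the deployment
    rolled_back: bool    # True if the deployment had to be rolled back


def success_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that completed without a rollback."""
    if not deployments:
        raise ValueError("need at least one deployment record")
    succeeded = sum(1 for d in deployments if not d.rolled_back)
    return succeeded / len(deployments)


def meets_targets(deployments: list[Deployment], max_minutes: float) -> bool:
    """Check the 95% success-rate target and the per-mode time limit."""
    return (success_rate(deployments) >= 0.95
            and all(d.duration_min < max_minutes for d in deployments))
```

For standalone mode `max_minutes` would be 10, and for CODITECT mode 5, matching the deployment time targets listed above.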
Implementation Plan
Phase 1: CI Pipeline (Week 1)
- Set up GitHub Actions workflows
- Configure unit tests, integration tests, security scans
- Set up Codecov for coverage reporting
- Add status badges to README
Phase 2: Standalone Deployment (Week 2)
- Create Dockerfile for FastAPI app
- Create Helm chart
- Deploy to staging Kubernetes cluster
- Configure canary deployments
Phase 3: CODITECT Deployment (Week 3)
- Integrate with CODITECT's Cloud Build
- Test Django migrations in CODITECT staging
- Deploy to CODITECT production
- Monitor for 1 week
Phase 4: Zero-Downtime Migrations (Week 4)
- Document Expand-Migrate-Contract pattern
- Create migration templates
- Test 3-step migration in staging
- Add automated reminders for Step 3
References
Related ADRs:
- ADR-001: Hybrid Architecture (two deployment modes)
- ADR-003: FastAPI vs Django (different deployment strategies)
- ADR-004: Multi-Tenant RLS (migration strategies for multi-tenant data)
Status: Proposed
Review Date: 2025-12-03
Projected ADR Score: 38/40 (A)
Complexity: Medium-High (dual pipelines, zero-downtime migrations)
Owner: DevOps Team + Engineering Team
Next Steps:
- Approve CI/CD strategy (GitHub Actions + Kubernetes + GCP Cloud Build)
- Set up GitHub Actions workflows (Week 1)
- Create Helm chart for standalone deployment (Week 2)
- Integrate with CODITECT Cloud Build (Week 3)
- Document zero-downtime migration pattern (Week 4)