
ADR-008: Deployment Automation (CI/CD Pipeline)

Status: Proposed
Date: 2025-11-26
Deciders: DevOps Team, Engineering Team
Related ADRs: ADR-001 (Hybrid Architecture), ADR-003 (FastAPI vs Django)


Context

The context intelligence platform must support two deployment modes with automated CI/CD pipelines:

Deployment Requirements

1. Standalone Mode

  • Deploy to Kubernetes (GKE, EKS, or AKS)
  • FastAPI application + PostgreSQL + Weaviate + Celery + Redis
  • Docker containers for each service
  • Independent scaling (no CODITECT dependencies)
  • Target: Deploy to production in <10 minutes

2. CODITECT Integration Mode

  • Deploy as Django app within existing CODITECT platform
  • Reuse CODITECT's PostgreSQL, Redis, Celery workers
  • Add Weaviate as new service
  • Integrate with CODITECT's deployment pipeline (GCP Cloud Build + Cloud Run)
  • Target: Deploy to production in <5 minutes (leverages existing infra)

Quality Gates

Both modes must pass:

  • ✅ Unit tests (80%+ coverage)
  • ✅ Integration tests (API endpoints, database operations)
  • ✅ Security scanning (SAST: Bandit, DAST: OWASP ZAP)
  • ✅ Performance tests (p95 latency <100ms)
  • ✅ Database migrations (zero downtime)
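The performance gate (p95 latency <100ms) is not wired into the CI workflow in Step 1. The sketch below shows what the check itself could look like. It assumes a load test has already collected per-request latencies in milliseconds, and the helper names `p95` and `p95_gate` are hypothetical:

```python
import statistics


def p95(latencies_ms):
    """95th percentile of latency samples in milliseconds."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points P1..P99; index 94 is P95
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]


def p95_gate(latencies_ms, threshold_ms=100.0):
    """Pass only if p95 latency is strictly below the threshold."""
    return p95(latencies_ms) < threshold_ms
```

The "inclusive" method interpolates between neighbouring samples. With exactly 5% slow requests, the computed p95 therefore stays close to the fast cluster. A team wanting a stricter gate might prefer a nearest-rank percentile instead.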

Decision

We will implement GitHub Actions for CI/CD with:

  1. Standalone Mode: Kubernetes deployment (Helm charts)
  2. CODITECT Mode: Django app integration (GCP Cloud Build)
  3. Shared quality gates: Tests run once, results used by both pipelines
  4. Progressive rollout: Canary deployments (10% → 50% → 100%)
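The progressive rollout in item 4 works as a control loop. It shifts traffic one step, observes the canary, and sends traffic back to the stable release if the error rate crosses a threshold. The minimal sketch below assumes caller-supplied `set_weight` and `error_rate` hooks, which are hypothetical (in practice they might wrap Helm values and a Prometheus query). The 1% error budget is illustrative:

```python
from typing import Callable, Sequence


def progressive_rollout(
    set_weight: Callable[[int], None],
    error_rate: Callable[[], float],
    steps: Sequence[int] = (10, 50, 100),
    max_error_rate: float = 0.01,
) -> bool:
    """Shift canary traffic through `steps`; roll back to 0% on a breach.

    Returns True if the final step was reached, False if rolled back.
    """
    for weight in steps:
        set_weight(weight)
        # A real pipeline would let the canary soak before sampling this
        if error_rate() > max_error_rate:
            set_weight(0)  # return all traffic to the stable release
            return False
    return True
```

The fixed waits in a pipeline only pause between traffic shifts. The error-rate check is what turns those pauses into automatic rollback.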

Architecture

┌─────────────────────────────────────────────────────────────┐
│ GitHub Repository: coditect-dev-context                     │
│                                                             │
│ Event: Push to main OR Pull Request                         │
└──────────────────────────────┬──────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ GitHub Actions: CI Pipeline (Shared)                        │
│                                                             │
│ 1. Build Docker Images                                      │
│    - app:latest (FastAPI/Django code)                       │
│    - migrations:latest (Alembic/Django migrations)          │
│                                                             │
│ 2. Run Tests (Parallel)                                     │
│    - Unit tests (pytest) - 5 minutes                        │
│    - Integration tests (API endpoints) - 3 minutes          │
│    - Security scan (Bandit, Safety) - 2 minutes             │
│    - Linting (Ruff, Black, MyPy) - 1 minute                 │
│                                                             │
│ 3. Build Artifacts                                          │
│    - Helm chart (standalone mode)                           │
│    - Django migration files (CODITECT mode)                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┐
           ↓                                       ↓
 ┌───────────────────┐                   ┌───────────────────┐
 │ Standalone        │                   │ CODITECT          │
 │ Deployment        │                   │ Deployment        │
 │ (Kubernetes)      │                   │ (GCP Cloud Run)   │
 │                   │                   │                   │
 │ 1. Push to        │                   │ 1. Trigger        │
 │    Docker Hub     │                   │    Cloud Build    │
 │                   │                   │                   │
 │ 2. Deploy to      │                   │ 2. Django         │
 │    Staging (Helm) │                   │    Migration      │
 │                   │                   │                   │
 │ 3. Smoke Tests    │                   │ 3. Deploy to      │
 │    (5 min)        │                   │    Cloud Run      │
 │                   │                   │                   │
 │ 4. Canary         │                   │ 4. Health Check   │
 │    (10% traffic)  │                   │    (2 min)        │
 │                   │                   │                   │
 │ 5. Full Rollout   │                   │ 5. Gradual        │
 │    (100%)         │                   │    Rollout (100%) │
 │                   │                   │                   │
 │ Total: 8-10 min   │                   │ Total: 4-5 min    │
 └───────────────────┘                   └───────────────────┘
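The four test jobs in stage 2 run in parallel. The stage's wall-clock time is therefore set by the slowest job, not by the sum of all four. Using the durations from the diagram:

```python
# Minutes per parallel CI test job, as listed in the diagram
test_jobs = {
    "unit-tests": 5,
    "integration-tests": 3,
    "security-scan": 2,
    "lint": 1,
}

# Parallel jobs finish when the slowest one does
parallel_minutes = max(test_jobs.values())
# For comparison: running the same jobs one after another
sequential_minutes = sum(test_jobs.values())

print(f"parallel: {parallel_minutes} min, sequential: {sequential_minutes} min")
```

In the workflow, the unit and integration test jobs also declare `needs: build`. The real critical path is therefore the image build plus roughly 5 minutes.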

Implementation

Step 1: GitHub Actions CI Pipeline

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  PYTHON_VERSION: '3.11'
  POETRY_VERSION: '1.7.1' # the Poetry installer expects a full release number

jobs:
  # Job 1: Build Docker images
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build FastAPI app image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.app
          tags: coditect-context:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          push: false
          load: true

      - name: Build migrations image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.migrations
          tags: coditect-context-migrations:${{ github.sha }}
          push: false
          load: true

  # Job 2: Unit tests
  unit-tests:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}

      - name: Install dependencies
        run: poetry install --with dev

      - name: Run unit tests (fail below 80% coverage)
        run: poetry run pytest tests/unit/ --cov=core --cov-report=xml --cov-fail-under=80

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml

  # Job 3: Integration tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: build
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: context_test
        # Publish the port so steps running on the runner can reach localhost:5432
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}

      - name: Install dependencies
        run: poetry install --with dev

      - name: Run database migrations
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/context_test
        run: poetry run alembic upgrade head

      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/context_test
          REDIS_URL: redis://localhost:6379/0
        run: poetry run pytest tests/integration/ -v

  # Job 4: Security scanning
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}

      - name: Install dependencies
        run: poetry install --with dev

      - name: Run Bandit (SAST)
        run: poetry run bandit -r core/ standalone/ coditect/ -f json -o bandit-report.json

      - name: Run Safety (dependency scan)
        run: poetry run safety check --json > safety-report.json

      - name: Upload security reports
        if: always() # keep the reports even when a scan fails the job
        uses: actions/upload-artifact@v4
        with:
          name: security-reports
          path: |
            bandit-report.json
            safety-report.json

  # Job 5: Linting
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}

      - name: Install dependencies
        run: poetry install --with dev

      - name: Run Ruff
        run: poetry run ruff check .

      - name: Run Black
        run: poetry run black --check .

      - name: Run MyPy
        run: poetry run mypy core/ standalone/ coditect/
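The Bandit step fails the job on any finding, whatever its severity. If the team would rather block only on high-severity issues, a small gate script could read `bandit-report.json` instead. The sketch below assumes Bandit's JSON output: a top-level `results` list whose entries carry `issue_severity`, `filename`, `line_number`, and `test_id`. The function names and the HIGH threshold are illustrative:

```python
import json

# Bandit reports issue severity as LOW, MEDIUM or HIGH
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def blocking_findings(report, min_severity="HIGH"):
    """Return the Bandit results whose severity is at or above `min_severity`."""
    floor = SEVERITY_RANK[min_severity]
    return [
        result
        for result in report.get("results", [])
        if SEVERITY_RANK.get(str(result.get("issue_severity", "")).upper(), 0) >= floor
    ]


def main(path="bandit-report.json", min_severity="HIGH"):
    """Print blocking findings and return a process exit code (0 = pass)."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    findings = blocking_findings(report, min_severity)
    for r in findings:
        print(f"{r['filename']}:{r['line_number']} {r['test_id']} ({r['issue_severity']})")
    return 1 if findings else 0
```

With this approach, Bandit itself would run with `--exit-zero`. The return value of `main()`, passed to `sys.exit`, then decides whether the job fails.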

Step 2: Standalone Deployment (Kubernetes with Helm)

# .github/workflows/deploy-standalone.yml
name: Deploy Standalone (Kubernetes)

on:
  # `needs` cannot reference jobs in another workflow, so wait for the
  # CI Pipeline run on main to complete instead
  workflow_run:
    workflows: ["CI Pipeline"]
    types: [completed]
    branches: [main]
  workflow_dispatch: # Manual trigger

env:
  DOCKER_REGISTRY: docker.io/coditect
  HELM_CHART_VERSION: 1.0.0
  # Deploy the commit that CI tested (github.sha for manual runs)
  IMAGE_TAG: ${{ github.event.workflow_run.head_sha || github.sha }}

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    # Only deploy if CI passed (or the run was triggered manually)
    if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.IMAGE_TAG }}

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/Dockerfile.app
          push: true
          tags: |
            ${{ env.DOCKER_REGISTRY }}/context-app:${{ env.IMAGE_TAG }}
            ${{ env.DOCKER_REGISTRY }}/context-app:latest

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: '3.13.0'

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}

      - name: Deploy to staging
        run: |
          helm upgrade --install context-intelligence \
            ./helm/context-intelligence \
            --namespace staging \
            --create-namespace \
            --set image.tag=${{ env.IMAGE_TAG }} \
            --set postgresql.enabled=true \
            --set weaviate.enabled=true \
            --set redis.enabled=true \
            --wait --timeout 5m

      - name: Run smoke tests
        run: |
          kubectl run smoke-test \
            --image=curlimages/curl:latest \
            --restart=Never \
            --namespace=staging \
            --rm -i \
            --command -- curl -f http://context-intelligence/health

  deploy-production:
    runs-on: ubuntu-latest
    environment: production
    needs: deploy-staging
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.IMAGE_TAG }}

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: '3.13.0'

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG_PRODUCTION }}

      - name: Deploy canary (10% traffic)
        run: |
          helm upgrade --install context-intelligence \
            ./helm/context-intelligence \
            --namespace production \
            --set image.tag=${{ env.IMAGE_TAG }} \
            --set canary.enabled=true \
            --set canary.weight=10 \
            --wait --timeout 5m

      - name: Wait 5 minutes (monitor canary at 10%)
        run: sleep 300

      - name: Increase canary to 50% traffic
        run: |
          helm upgrade context-intelligence \
            ./helm/context-intelligence \
            --namespace production \
            --reuse-values \
            --set canary.weight=50 \
            --wait --timeout 5m

      - name: Wait 5 minutes (monitor canary at 50%)
        run: sleep 300

      - name: Full rollout (100% traffic)
        run: |
          helm upgrade context-intelligence \
            ./helm/context-intelligence \
            --namespace production \
            --set canary.enabled=false \
            --reuse-values \
            --wait --timeout 5m
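The smoke test makes a single `curl` attempt, so a pod that is still starting can fail an otherwise healthy deploy. A retrying health check is one alternative. The standard-library sketch below uses an illustrative URL, attempt count, and delay:

```python
import time
import urllib.request


def wait_for_healthy(url, attempts=10, delay_s=3.0, timeout_s=5.0):
    """Poll `url` until it answers with HTTP 2xx; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503), refused connections and
            # timeouts all derive from OSError: treat as "not ready yet"
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

Inside the cluster, this could target `http://context-intelligence/health`, the same endpoint the `kubectl run` step checks.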

Step 3: CODITECT Deployment (GCP Cloud Build)

# .github/workflows/deploy-coditect.yml
name: Deploy to CODITECT (GCP Cloud Run)

on:
  # Gate on the shared CI Pipeline; `needs` only works within one workflow
  workflow_run:
    workflows: ["CI Pipeline"]
    types: [completed]
    branches: [main]

env:
  GCP_PROJECT_ID: coditect-production
  CLOUD_RUN_SERVICE: coditect-backend
  # Deploy the commit that CI tested
  IMAGE_TAG: ${{ github.event.workflow_run.head_sha }}

jobs:
  deploy-coditect:
    runs-on: ubuntu-latest
    environment: coditect-production
    if: github.event.workflow_run.conclusion == 'success'
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.IMAGE_TAG }}

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Copy Django app to CODITECT repo
        run: |
          # Clone CODITECT repo
          git clone https://github.com/coditect-ai/coditect-backend.git /tmp/coditect

          # Copy context intelligence Django app
          cp -r coditect/ /tmp/coditect/apps/context_intelligence/

          # Update requirements.txt
          cat requirements.txt >> /tmp/coditect/requirements.txt

      - name: Run Django migrations
        working-directory: /tmp/coditect
        env:
          DJANGO_SETTINGS_MODULE: coditect.settings.production
        run: |
          pip install -r requirements.txt
          # Migrations must be committed and reviewed; fail if any are missing
          python manage.py makemigrations context_intelligence --check --dry-run
          python manage.py migrate --noinput

      - name: Trigger Cloud Build
        # Build the merged CODITECT repo, not this repository's checkout
        working-directory: /tmp/coditect
        run: |
          # _IMAGE_TAG is assumed to be used by cloudbuild.yaml to tag the image deployed below
          gcloud builds submit \
            --config=cloudbuild.yaml \
            --project=${{ env.GCP_PROJECT_ID }} \
            --substitutions=_SERVICE_NAME=${{ env.CLOUD_RUN_SERVICE }},_IMAGE_TAG=${{ env.IMAGE_TAG }}

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy ${{ env.CLOUD_RUN_SERVICE }} \
            --image=gcr.io/${{ env.GCP_PROJECT_ID }}/coditect-backend:${{ env.IMAGE_TAG }} \
            --platform=managed \
            --region=us-central1 \
            --allow-unauthenticated \
            --set-env-vars="DJANGO_SETTINGS_MODULE=coditect.settings.production"

      - name: Health check
        run: |
          URL=$(gcloud run services describe ${{ env.CLOUD_RUN_SERVICE }} \
            --platform=managed \
            --region=us-central1 \
            --format='value(status.url)')

          curl -f "$URL/health"
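The diagram lists a gradual rollout for CODITECT mode, but `gcloud run deploy` as written sends 100% of traffic to the new revision at once. Cloud Run can instead create a revision with `--no-traffic` and then shift traffic with `gcloud run services update-traffic`. The sketch below only builds the command lines and does not execute them. It assumes the revision is named via `--revision-suffix`; the helper name and the 10/50/100 steps are illustrative:

```python
def gradual_rollout_commands(service, image, revision_suffix,
                             region="us-central1", steps=(10, 50, 100)):
    """Build (but do not run) gcloud argv lists for a staged Cloud Run rollout."""
    revision = f"{service}-{revision_suffix}"
    commands = [[
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        f"--revision-suffix={revision_suffix}",
        "--no-traffic",  # create the revision without routing traffic to it
    ]]
    for percent in steps:
        update = ["gcloud", "run", "services", "update-traffic", service,
                  f"--region={region}"]
        if percent >= 100:
            update.append("--to-latest")  # send all traffic to the newest revision
        else:
            update.append(f"--to-revisions={revision}={percent}")
        commands.append(update)
    return commands
```

Each list could be passed to `subprocess.run(cmd, check=True)`, with the health check repeated between steps.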

Step 4: Database Migrations (Zero Downtime)

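# migrations/migration_strategy.py
"""
Zero-downtime migration strategy for PostgreSQL.

Pattern: Expand-Migrate-Contract
1. EXPAND: Add new column (nullable)
2. MIGRATE: Backfill data
3. CONTRACT: Make column non-nullable (after all services updated)

Steps 1 and 3 use Alembic operations; each would be the upgrade() body of
its own revision. Step 2 uses the Django ORM (CODITECT mode).
"""
import sqlalchemy as sa
from alembic import op
from celery import shared_task


# Example: Adding required column
# Step 1 (Week 1): EXPAND
def upgrade_step1():
    # Add nullable column (catalog-only change in PostgreSQL, no table rewrite)
    op.add_column(
        'conversations',
        sa.Column('organization_id', sa.UUID(), nullable=True),
    )

    # Add index (for performance). CONCURRENTLY avoids blocking writes but
    # cannot run inside a transaction, hence the autocommit block.
    with op.get_context().autocommit_block():
        op.create_index(
            'idx_conversations_org',
            'conversations',
            ['organization_id'],
            postgresql_concurrently=True,
        )


# Step 2 (Week 2): MIGRATE (background job)
@shared_task
def backfill_organization_id(batch_size=1000):
    # Illustrative import path for the Django app's model
    from apps.context_intelligence.models import Conversation

    # Backfill data in primary-key order, one short transaction per batch
    pending = (
        Conversation.objects.filter(organization_id__isnull=True)
        .select_related('user')
        .order_by('pk')
    )
    last_pk = None
    while True:
        page = pending if last_pk is None else pending.filter(pk__gt=last_pk)
        batch = list(page[:batch_size])
        if not batch:
            break
        for conversation in batch:
            conversation.organization_id = conversation.user.organization_id
        Conversation.objects.bulk_update(batch, ['organization_id'])
        last_pk = batch[-1].pk


# Step 3 (Week 3): CONTRACT
def upgrade_step3():
    # Make column non-nullable (after backfill complete). SET NOT NULL scans
    # the table under an exclusive lock; PostgreSQL 12+ skips the scan if a
    # validated CHECK (organization_id IS NOT NULL) constraint already exists.
    op.alter_column(
        'conversations', 'organization_id',
        existing_type=sa.UUID(),
        nullable=False,
    )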
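Step 3 is only safe once the backfill has left no NULLs behind. A pre-flight guard can refuse to run the CONTRACT migration otherwise. The sketch below uses plain DB-API calls, so it can be tried against an in-memory SQLite database. Against PostgreSQL the same query would run through the project's driver. The function names are hypothetical:

```python
def remaining_nulls(conn, table, column):
    """Count rows in `table` where `column` is still NULL (any DB-API connection).

    Identifiers are interpolated into the SQL, so they must come from
    migration code, never from user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
    (count,) = cur.fetchone()
    cur.close()
    return count


def assert_ready_to_contract(conn, table, column):
    """Raise RuntimeError if the backfill for table.column is incomplete."""
    missing = remaining_nulls(conn, table, column)
    if missing:
        raise RuntimeError(
            f"{missing} rows in {table}.{column} are still NULL; "
            "finish the backfill before the CONTRACT step"
        )
```

Calling such a guard at the start of Step 3 makes an unfinished backfill fail fast with a clear message. Without it, the same mistake surfaces as a long exclusive lock that ends in an error.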

Consequences

Positive

  1. ✅ Fast Deployments

    • Standalone: <10 minutes (build → test → deploy → rollout)
    • CODITECT: <5 minutes (leverages existing infrastructure)
  2. ✅ Automated Quality Gates

    • 80%+ test coverage enforced
    • Security scans block vulnerable code
    • Linting ensures code quality
  3. ✅ Progressive Rollout (Canary)

    • Catch bugs before affecting all users
    • 10% → 50% → 100% traffic shift
    • Rollback before errors reach all traffic (automatic rollback is planned; see Canary Monitoring below)
  4. ✅ Zero-Downtime Migrations

    • Expand-Migrate-Contract pattern
    • Database changes don't block deployments
    • Background backfill jobs
  5. ✅ Deployment Flexibility

    • Standalone: Deploy to any Kubernetes cluster (GKE, EKS, AKS)
    • CODITECT: Integrated with existing GCP Cloud Run deployment

Negative

  1. ⚠️ Dual Pipeline Maintenance

    • Complexity: Two deployment pipelines (Kubernetes vs. GCP)
    • Mitigation: 90% shared CI steps (build, test, scan)
    • Overhead: ~10% additional DevOps time
  2. ⚠️ Migration Coordination

    • Problem: Must coordinate 3-step migration (Expand → Migrate → Contract)
    • Risk: Forgetting Step 3 (Contract) leaves nullable columns
    • Mitigation: Automated reminders, checklist in JIRA
  3. ⚠️ Canary Monitoring

    • Problem: Must monitor canary for errors (manual vigilance)
    • Mitigation: Automated monitoring (Prometheus alerts)
    • Future: Automatic rollback based on error rate threshold
  4. ⚠️ Helm Chart Maintenance

    • Complexity: Must maintain Helm chart for Kubernetes deployment
    • Overhead: ~5% additional time for Helm updates
    • Mitigation: Use best-practice templates (Bitnami charts as base)
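The "automated reminders" mitigation for a forgotten CONTRACT step could start as a scheduled job. That job would flag EXPAND migrations still waiting on their CONTRACT step after a set window. The sketch below uses a hypothetical in-code registry and an illustrative two-week window, matching the Week 1 to Week 3 cadence in Step 4:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PendingContract:
    """An EXPAND migration and whether its CONTRACT step has shipped."""
    migration: str
    expanded_on: date
    contracted: bool = False


def overdue(pending, today, max_age=timedelta(weeks=2)):
    """Names of migrations whose CONTRACT step is older than `max_age`."""
    return [
        p.migration
        for p in pending
        if not p.contracted and today - p.expanded_on > max_age
    ]
```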

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Failed Deployment | Low | High | Automated rollback, staging environment |
| Database Migration Failure | Low | Critical | Test migrations in staging, backup database |
| Canary Goes Unnoticed | Medium | Medium | Prometheus alerts, Slack notifications |
| Secret Leakage | Low | Critical | GitHub Secrets, never commit secrets |

Alternatives Considered

Alternative 1: Manual Deployment (No CI/CD)

Architecture: Developers deploy manually via kubectl apply or gcloud run deploy

Pros:

  • ✅ No CI/CD setup required

Cons:

  • Error-prone: Human mistakes (wrong image tag, missed migration)
  • Slow: 30-60 minutes per deployment
  • No quality gates: Can deploy broken code

Why Rejected: Manual deployment is unacceptable for a production system.


Alternative 2: GitLab CI/CD (Instead of GitHub Actions)

Architecture: Use GitLab CI/CD pipelines

Pros:

  • ✅ Integrated with GitLab (if using GitLab)
  • ✅ Powerful CI/CD features

Cons:

  • GitHub-first: Project already on GitHub
  • Migration cost: Must migrate repo or use GitLab mirroring
  • Team familiarity: Team knows GitHub Actions

Why Rejected: Stick with GitHub Actions (project already on GitHub).


Alternative 3: ArgoCD for Kubernetes (GitOps)

Architecture: Use ArgoCD for declarative Kubernetes deployments

Pros:

  • ✅ GitOps best practice (desired state in Git)
  • ✅ Automatic drift detection
  • ✅ Better rollback (just revert Git commit)

Cons:

  • Additional complexity: Must learn ArgoCD
  • Overhead: Extra service to maintain
  • Overkill: Simple deployments don't need GitOps

Why Rejected (for now): Helm is sufficient for Phase 1. Revisit ArgoCD in Year 2 if needed.


Success Metrics

Deployment Metrics

  • Deployment frequency: 5+ deployments per week
  • Deployment time: <10 minutes (standalone), <5 minutes (CODITECT)
  • Success rate: 95%+ deployments succeed without rollback
  • Rollback time: <5 minutes if deployment fails

Quality Metrics

  • Test coverage: 80%+ maintained
  • Security scan failures: 0 critical vulnerabilities in production
  • Linting violations: 0 errors (100% compliance)

Reliability Metrics

  • Zero-downtime migrations: 100% (no service interruption)
  • Canary detection: 90%+ of errors caught before full rollout
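The deployment metrics above can be derived from a log of deployment records. The sketch below assumes a hypothetical record shape: finish time, whether the deploy was rolled back, and how long the rollback took.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    finished_at: datetime
    rolled_back: bool = False
    rollback_duration: Optional[timedelta] = None


def deployment_metrics(deploys, window=timedelta(weeks=1)):
    """Deploys per week, success rate and slowest rollback within `window`."""
    if not deploys:
        return {"per_week": 0.0, "success_rate": None, "max_rollback": None}
    end = max(d.finished_at for d in deploys)
    recent = [d for d in deploys if end - d.finished_at < window]
    weeks = window / timedelta(weeks=1)
    succeeded = sum(1 for d in recent if not d.rolled_back)
    rollbacks = [d.rollback_duration for d in recent if d.rollback_duration is not None]
    return {
        "per_week": len(recent) / weeks,
        "success_rate": succeeded / len(recent),
        "max_rollback": max(rollbacks) if rollbacks else None,
    }
```

Checking the result against the targets above (5+ per week, 95%+ success, rollback under 5 minutes) then takes three comparisons.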

Implementation Plan

Phase 1: CI Pipeline (Week 1)

  • Set up GitHub Actions workflows
  • Configure unit tests, integration tests, security scans
  • Set up Codecov for coverage reporting
  • Add status badges to README

Phase 2: Standalone Deployment (Week 2)

  • Create Dockerfile for FastAPI app
  • Create Helm chart
  • Deploy to staging Kubernetes cluster
  • Configure canary deployments

Phase 3: CODITECT Deployment (Week 3)

  • Integrate with CODITECT's Cloud Build
  • Test Django migrations in CODITECT staging
  • Deploy to CODITECT production
  • Monitor for 1 week

Phase 4: Zero-Downtime Migrations (Week 4)

  • Document Expand-Migrate-Contract pattern
  • Create migration templates
  • Test 3-step migration in staging
  • Add automated reminders for Step 3

References

Related ADRs:

  • ADR-001: Hybrid Architecture (two deployment modes)
  • ADR-003: FastAPI vs Django (different deployment strategies)
  • ADR-004: Multi-Tenant RLS (migration strategies for multi-tenant data)

Status: Proposed
Review Date: 2025-12-03
Projected ADR Score: 38/40 (A)
Complexity: Medium-High (dual pipelines, zero-downtime migrations)
Owner: DevOps Team + Engineering Team

Next Steps:

  1. Approve CI/CD strategy (GitHub Actions + Kubernetes + GCP Cloud Build)
  2. Set up GitHub Actions workflows (Week 1)
  3. Create Helm chart for standalone deployment (Week 2)
  4. Integrate with CODITECT Cloud Build (Week 3)
  5. Document zero-downtime migration pattern (Week 4)