
Work Order QMS Module — Deployment Architecture

Classification: Internal — Engineering / DevOps
Date: 2026-02-13
Artifact: 72 of WO System Series
Status: Proposed
Source Artifacts: 13-tdd.md §3, 14-c4-architecture.md, 65-testing-strategy.md §6, 66-operational-readiness.md, 69-versioning-evolution-strategy.md §6


1. Environment Strategy

1.1 Environment Tiers

| Environment | Purpose | Data | Refresh Cadence | Access |
|---|---|---|---|---|
| local | Developer workstation | Synthetic seed | On demand | Developer only |
| dev | Integration testing, feature branches | Synthetic seed | Daily reset | Engineering team |
| staging | Pre-release validation, OQ execution | Synthetic + golden dataset | Per release candidate | Engineering + QA + Compliance |
| production | Customer-facing | Real customer data (L3/L4) | N/A | SRE + on-call (limited) |
| dr | Disaster recovery standby | Replicated from production | Continuous (WAL) | SRE (activation only) |

1.2 Environment Parity Rules

  • local/dev/staging use identical container images; only configuration differs.
  • staging runs the same PostgreSQL version, RLS policies, and NATS configuration as production.
  • staging audit trail verification and e-signature validation are fully active (not mocked).
  • Feature flags (69-versioning-evolution-strategy.md §2) control feature availability per environment, never code branches.
  • No production data ever flows to non-production environments (63-data-architecture.md §2.1).

1.3 Configuration Hierarchy

Base config (defaults)
└── Environment overlay (dev/staging/prod)
    └── Tenant overlay (per-customer config)
        └── Feature flags (dynamic, runtime)

Configuration sources:

| Source | Contents | Format | Secrets |
|---|---|---|---|
| Git repo (config/) | Base + environment overlays | YAML | Never |
| Terraform state | Infrastructure resource IDs, endpoints | HCL → state | Outputs only |
| GCP Secret Manager / Vault | API keys, DB credentials, signing keys | Key-value | All secrets |
| PostgreSQL tenant_config | Per-tenant settings, compliance framework selection | JSON | Never (references to secrets only) |
| Runtime feature flags | Feature toggles per tenant/role | PostgreSQL feature_flags table | Never |
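The hierarchy above resolves by deep-merging each layer onto the one below it, with later layers winning on conflicts. A minimal sketch of that resolution logic (the example keys and values are illustrative, not actual configuration):

```python
from copy import deepcopy

def merge_config(base: dict, *overlays: dict) -> dict:
    """Deep-merge overlays onto base; later layers win on conflicts."""
    result = deepcopy(base)
    for overlay in overlays:
        for key, value in overlay.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_config(result[key], value)
            else:
                result[key] = value
    return result

# Layering order mirrors the hierarchy: base → environment → tenant.
base = {"db": {"pool_size": 5, "ssl": True}, "features": {}}
staging = {"db": {"pool_size": 20}}
tenant = {"features": {"ai_triage": True}}

effective = merge_config(base, staging, tenant)
# db.ssl survives from base, db.pool_size comes from the staging overlay,
# and features.ai_triage from the tenant overlay.
```

Runtime feature flags then apply on top of the merged result, since they can change without a redeploy.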

2. Infrastructure as Code

2.1 Terraform Structure

infrastructure/
├── modules/
│   ├── gke-cluster/            # GKE cluster + node pools
│   ├── cloud-sql/              # PostgreSQL HA instance + read replicas
│   ├── nats/                   # NATS JetStream cluster (Helm)
│   ├── vault/                  # HashiCorp Vault (Helm + config)
│   ├── networking/             # VPC, subnets, firewall rules, Cloud NAT
│   ├── observability/          # Prometheus, Grafana, OTEL collector
│   ├── dns-ssl/                # Cloud DNS + managed SSL certificates
│   └── iam/                    # Service accounts, IAM bindings
├── environments/
│   ├── dev/
│   │   ├── main.tf             # Module instantiation with dev params
│   │   ├── variables.tf        # Dev-specific variable values
│   │   └── backend.tf          # GCS state bucket (dev)
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── backend.tf
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── backend.tf
├── dr/                         # DR region infrastructure (mirrored)
│   └── main.tf
└── shared/
    ├── container-registry/     # Artifact Registry
    ├── kms/                    # Cloud KMS keys for encryption
    └── state-buckets/          # GCS buckets for Terraform state
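To illustrate how an environment directory instantiates the shared modules, a hypothetical environments/dev/main.tf might look like the following. Module names match the tree above; all values and variable names are placeholders, not the project's actual parameters:

```hcl
module "network" {
  source = "../../modules/networking"
  cidr   = "10.0.0.0/16"
}

module "gke" {
  source     = "../../modules/gke-cluster"
  project_id = var.project_id
  region     = var.region
  # dev runs minimal capacity; staging/production scale this up
  node_count = 1
}

module "db" {
  source     = "../../modules/cloud-sql"
  tier       = "db-custom-1-4096"   # smaller than production's db-custom-4-16384
  ha_enabled = false                # HA only in staging/production
  network    = module.network.vpc_self_link
}
```

Each environment's backend.tf then points at its own GCS state bucket, so state is never shared across tiers.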

2.2 Key Infrastructure Resources

| Resource | Service | Spec (Production) | HA Strategy |
|---|---|---|---|
| Kubernetes cluster | GKE Autopilot | 3 zones, auto-scaling | Multi-zone |
| PostgreSQL | Cloud SQL Enterprise | db-custom-4-16384, HA | Synchronous replica + PITR |
| Event bus | NATS JetStream | 3-node cluster on GKE | Built-in Raft consensus |
| Secrets | Vault on GKE | 3-node HA, GCS backend | Raft storage backend |
| Container registry | Artifact Registry | Standard | Multi-region replication |
| CDN / LB | Cloud Load Balancing | Global HTTP(S) LB | Anycast, multi-region |
| DNS | Cloud DNS | Public zone | 100% SLA |
| KMS | Cloud KMS | Automatic key rotation | Regional + multi-region |
| Object storage | Cloud Storage | Standard (backups, evidence) | Multi-region bucket |

2.3 Network Architecture

VPC: coditect-vpc (10.0.0.0/16)
├── Subnet: gke-nodes (10.0.0.0/20) — GKE node IPs
├── Subnet: gke-pods (10.4.0.0/14) — GKE pod IPs (secondary)
├── Subnet: gke-services (10.8.0.0/20) — GKE service IPs (secondary)
├── Subnet: cloud-sql (10.0.16.0/24) — Private Service Connect
└── Subnet: vault (10.0.17.0/24) — Vault cluster

Firewall rules:
- Ingress: HTTPS (443) from Cloud LB only
- Internal: All pods can reach NATS, PostgreSQL, Vault
- Egress: AI model APIs (Anthropic, OpenAI), SMTP, webhook delivery
- Deny: All other ingress/egress (default deny)

Private Service Connect: Cloud SQL, Secret Manager (no public IPs)
Cloud NAT: Outbound internet for AI API calls and webhook delivery

3. Container Strategy

3.1 Container Images

| Image | Base | Size Target | Contents |
|---|---|---|---|
| wo-api | gcr.io/distroless/nodejs20-debian12 | < 150MB | Express API server, state machine, RBAC |
| wo-compliance | gcr.io/distroless/python3-debian12 | < 200MB | Compliance engine, policy rules, PHI scanner |
| wo-agents | gcr.io/distroless/python3-debian12 | < 200MB | Agent workers, model routing, orchestration |
| wo-migrations | node:20-alpine | < 100MB | Prisma migrations + seed data |
| wo-scheduler | gcr.io/distroless/nodejs20-debian12 | < 100MB | PM automation, schedule-based WO generation |

3.2 Image Security

| Control | Implementation | Reference |
|---|---|---|
| Base images | Distroless only (no shell, no package manager) | 64-security-architecture.md §6 |
| Vulnerability scanning | Trivy on every build; block Critical/High | CI pipeline stage |
| Image signing | Cosign with Cloud KMS key | Supply chain security |
| SBOM generation | Syft → SPDX format, attached to image | 64-security-architecture.md §6 |
| No root | All containers run as non-root (UID 1000) | Kubernetes SecurityContext |
| Read-only filesystem | readOnlyRootFilesystem: true | Pod spec |
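The last two controls land in the pod spec. A sketch of the relevant excerpt, using standard Kubernetes SecurityContext fields (the image reference and volume names are illustrative):

```yaml
# Excerpt from a wo-api Deployment pod template (illustrative)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
containers:
  - name: wo-api
    image: REGISTRY/wo-api@sha256:...    # digest-pinned, Cosign-verified
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    volumeMounts:
      - name: tmp
        mountPath: /tmp                  # writable scratch via emptyDir
volumes:
  - name: tmp
    emptyDir: {}
```

With a read-only root filesystem, any path the process must write to (temp files, caches) needs an explicit emptyDir mount like the one above.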

3.3 Kubernetes Manifests

k8s/
├── base/                           # Kustomize base
│   ├── kustomization.yaml
│   ├── namespace.yaml              # wo-system namespace
│   ├── wo-api/
│   │   ├── deployment.yaml         # 3 replicas, rolling update
│   │   ├── service.yaml            # ClusterIP
│   │   ├── hpa.yaml                # CPU 70% → scale 3-10
│   │   └── pdb.yaml                # minAvailable: 2
│   ├── wo-compliance/
│   │   ├── deployment.yaml         # 2 replicas
│   │   └── ...
│   ├── wo-agents/
│   │   ├── deployment.yaml         # 2-8 replicas (auto-scale on queue depth)
│   │   └── ...
│   ├── wo-scheduler/
│   │   ├── cronjob.yaml            # Schedule-based PM generation
│   │   └── ...
│   ├── networkpolicy.yaml          # Pod-to-pod communication rules
│   └── serviceaccount.yaml         # Workload Identity for GCP access
├── overlays/
│   ├── dev/                        # 1 replica, lower resources
│   ├── staging/                    # Production-like replicas
│   └── production/                 # Full replicas, resource limits
└── jobs/
    ├── migration.yaml              # Database migration Job
    ├── seed.yaml                   # Seed data Job (non-prod only)
    └── audit-verify.yaml           # Periodic audit chain verification

4. CI/CD Pipeline

4.1 Pipeline Architecture

                ┌─────────────┐
                │  Developer  │
                │ pushes code │
                └──────┬──────┘
                       │
                ┌──────▼──────┐
                │  GitHub PR  │
                │   created   │
                └──────┬──────┘
                       │
          ┌────────────▼────────────┐
          │      CI: Validate       │
          │  ┌───────────────────┐  │
          │  │ Lint + Type Check │  │
          │  │ Unit Tests        │  │
          │  │ Contract Tests    │  │
          │  │ Security Scan     │  │
          │  │ SBOM Generate     │  │
          │  └───────────────────┘  │
          └────────────┬────────────┘
                       │ All pass
          ┌────────────▼────────────┐
          │   CI: Build & Publish   │
          │  ┌───────────────────┐  │
          │  │ Docker Build      │  │
          │  │ Trivy Scan        │  │
          │  │ Cosign Sign       │  │
          │  │ Push to Registry  │  │
          │  └───────────────────┘  │
          └────────────┬────────────┘
                       │ Merge to main
          ┌────────────▼────────────┐
          │    CD: Deploy to Dev    │
          │     (auto on merge)     │
          └────────────┬────────────┘
                       │ Integration tests pass
          ┌────────────▼────────────┐
          │  CD: Deploy to Staging  │
          │ (auto after dev green)  │
          └────────────┬────────────┘
                       │
          ┌────────────▼────────────┐
          │    Validation Gates     │
          │  ┌───────────────────┐  │
          │  │ OQ Test Suite     │  │  ← Automated (65-testing-strategy.md)
          │  │ PQ Subset         │  │
          │  │ Compliance Check  │  │  ← Compliance engine validation
          │  │ Performance Test  │  │  ← P95 < 500ms verified
          │  └───────────────────┘  │
          └────────────┬────────────┘
                       │ All pass
          ┌────────────▼────────────┐
          │  Manual Gate: Release   │  ← Human approval required
          │  Approval (QA + Eng)    │    for production deployment
          └────────────┬────────────┘
                       │ Approved
          ┌────────────▼────────────┐
          │   CD: Deploy to Prod    │
          │   (canary → rolling)    │
          └────────────┬────────────┘
                       │
          ┌────────────▼────────────┐
          │ Post-Deploy Validation  │
          │  ┌───────────────────┐  │
          │  │ Smoke Tests       │  │
          │  │ Audit Integrity   │  │
          │  │ Canary Metrics    │  │
          │  └───────────────────┘  │
          └─────────────────────────┘

4.2 CI Stage Details

| Stage | Tool | Duration Target | Failure Action |
|---|---|---|---|
| Lint + format | ESLint, Prettier, Ruff | < 30s | Block PR |
| Type check | tsc --noEmit, mypy | < 60s | Block PR |
| Unit tests | Vitest (TS), pytest (Python) | < 2 min | Block PR |
| Contract tests | Pact | < 1 min | Block PR |
| Security scan | Snyk, semgrep | < 2 min | Block PR (Critical/High) |
| SBOM generation | Syft | < 30s | Informational |
| Docker build | Docker + Buildx | < 3 min | Block PR |
| Image vulnerability scan | Trivy | < 1 min | Block (Critical), warn (High) |
| Image signing | Cosign + Cloud KMS | < 15s | Block PR |
| Total CI time | — | < 10 min | — |
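Since the pipeline is triggered from GitHub PRs, the validate and build stages above could be wired up roughly as follows. This is an illustrative skeleton only; the job names, script commands, and environment variables are placeholders, not the project's actual workflow:

```yaml
# .github/workflows/ci.yaml (illustrative skeleton)
name: ci
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint && npm run typecheck   # ESLint/Prettier, tsc --noEmit
      - run: npm test && pytest                  # Vitest + pytest unit suites
      - run: npm run test:contract               # Pact contract tests
  build:
    needs: validate            # build only runs when validation is green
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker buildx build -t "$IMAGE" .
      - run: trivy image --severity CRITICAL,HIGH --exit-code 1 "$IMAGE"
      - run: cosign sign --key "gcpkms://$KMS_KEY" "$IMAGE"
```

Ordering the jobs with `needs:` enforces the "All pass" gate between the validate and build stages of the diagram.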

4.3 CD Stage Details

| Stage | Trigger | Strategy | Rollback |
|---|---|---|---|
| Deploy to dev | Merge to main | Replace (fast) | Re-deploy previous image |
| Deploy to staging | Dev integration tests pass | Rolling update | Automatic on test failure |
| Validation gates | Staging deploy complete | Automated test suites | Block promotion |
| Release approval | Validation gates pass | Manual (QA + Eng sign-off) | N/A |
| Deploy to production | Release approved | Canary (10% → 50% → 100%) | Automatic on error rate spike |
| Post-deploy validation | Production deploy complete | Smoke tests + canary metrics | Automatic rollback if fail |

4.4 Production Deployment Strategy

# Canary deployment progression
canary:
  steps:
    - setWeight: 10              # 10% of traffic to new version
      pause: { duration: 5m }
      analysis:
        - metric: error_rate
          threshold: 0.01        # < 1% error rate
        - metric: p95_latency
          threshold: 500         # < 500ms
    - setWeight: 50              # 50% if canary healthy
      pause: { duration: 10m }
      analysis: [same metrics]
    - setWeight: 100             # Full rollout if still healthy
  rollback:
    trigger: error_rate > 0.02 OR p95_latency > 1000
    action: automatic_rollback
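Each analysis window reduces to a simple three-way decision over the two metrics. A minimal sketch of that gate logic, with thresholds taken from the config above (the function name and `hold` outcome are illustrative):

```python
def canary_gate(error_rate: float, p95_latency_ms: float,
                max_error_rate: float = 0.01, max_p95_ms: float = 500) -> str:
    """Decide the action for one canary analysis window."""
    # Hard rollback triggers mirror the rollback block above.
    if error_rate > 0.02 or p95_latency_ms > 1000:
        return "rollback"
    # Promote only when both metrics are inside the step thresholds.
    if error_rate < max_error_rate and p95_latency_ms < max_p95_ms:
        return "promote"
    # In between: stay at the current weight and keep observing.
    return "hold"

assert canary_gate(0.002, 320) == "promote"
assert canary_gate(0.05, 320) == "rollback"
assert canary_gate(0.015, 320) == "hold"
```

The asymmetry is deliberate: the rollback thresholds are looser than the promotion thresholds, so a marginally degraded canary is held for observation rather than immediately rolled back or promoted.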

4.5 Database Migration Strategy

Migrations run as Kubernetes Jobs before application deployment:

1. Pre-migration snapshot (automated Cloud SQL backup)
2. Run migration Job (Prisma migrate deploy)
3. Verify migration success (check migration table)
4. If failed → automatic restore from snapshot
5. If succeeded → proceed with application deployment

Rules:
- All migrations are forward-only in production
- Expand-contract pattern for schema changes (69-versioning-evolution-strategy.md §3)
- L4 tables (audit_trail, electronic_signature): additive-only, never ALTER/DROP
- RLS policies verified post-migration
- Migration timing: off-peak hours (2–4 AM UTC) for schema changes
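Step 3 of the flow above ("check migration table") can be sketched as a pure decision over Prisma's migration bookkeeping. This assumes the standard `_prisma_migrations` table, where a completed migration has `finished_at` set and `rolled_back_at` empty; the row contents here are illustrative:

```python
def migrations_healthy(rows: list[dict]) -> bool:
    """Decide whether the migration Job succeeded (step 3 above).

    A row without finished_at indicates a failed or interrupted
    migration, which triggers the snapshot restore in step 4.
    """
    return all(
        row.get("finished_at") is not None and row.get("rolled_back_at") is None
        for row in rows
    )

assert migrations_healthy([
    {"migration_name": "20260210_add_pm_schedule",
     "finished_at": "2026-02-10T02:05:00Z", "rolled_back_at": None},
])
assert not migrations_healthy([
    {"migration_name": "20260213_expand_wo_status",
     "finished_at": None, "rolled_back_at": None},
])
```

In practice the deployment pipeline would query these rows immediately after the Job exits and gate the application rollout on the result.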

5. Release Process

5.1 Release Types

| Type | Cadence | Approval | DB Migration | Downtime |
|---|---|---|---|---|
| Patch (x.y.Z) | As needed | Eng lead | Rarely | Zero |
| Minor (x.Y.0) | Bi-weekly | Eng lead + QA | Sometimes | Zero |
| Major (X.0.0) | Quarterly | Eng lead + QA + Compliance + Product | Usually | Planned (< 15 min) |
| Hotfix | Emergency | CTO + on-call | If needed | Zero (canary) |

5.2 Release Checklist

## Release v[X.Y.Z] Checklist

Pre-Release:
☐ All CI checks passing on release branch
☐ Integration tests passing in staging
☐ OQ test suite passing (automated)
☐ Performance tests meeting SLA targets
☐ Security scan: zero Critical, zero High unmitigated
☐ SBOM generated and attached
☐ Changelog updated
☐ Feature flags configured for target audience

Regulatory (if applicable):
☐ Compliance engine validation passing
☐ Audit trail integrity verified in staging
☐ E-signature flow tested end-to-end
☐ IQ evidence package generated (for new infrastructure)
☐ OQ evidence package generated (for functional changes)
☐ PQ evidence package generated (for performance changes)
☐ Traceability matrix updated

Approval:
☐ Engineering lead sign-off
☐ QA sign-off (for minor+)
☐ Compliance sign-off (for major or regulatory changes)
☐ Product sign-off (for major)

Deploy:
☐ Database migration tested in staging
☐ Pre-migration backup verified
☐ Canary deployment initiated
☐ Canary metrics monitored (10% → 50% → 100%)
☐ Post-deploy smoke tests passing

Post-Release:
☐ Customer release notes published
☐ Status page updated
☐ Monitoring dashboards verified
☐ On-call team briefed on changes

6. Secrets Management in Deployment

| Secret | Storage | Injection Method | Rotation |
|---|---|---|---|
| PostgreSQL credentials | Vault / GCP Secret Manager | Kubernetes External Secrets Operator | 90 days |
| AI model API keys | Vault | Sidecar injection | 90 days |
| E-signature signing key | Cloud KMS | IAM-based access (Workload Identity) | Annual |
| NATS credentials | Vault | Kubernetes secret | 90 days |
| Webhook HMAC keys | Vault | Application bootstrap | Per customer |
| Container signing key | Cloud KMS | CI/CD service account | Annual |

Zero secrets in Git, environment variables, or container images. All secrets injected at runtime via Vault Agent sidecar or External Secrets Operator.
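For the External Secrets Operator path, the PostgreSQL credentials row above might be expressed roughly as follows. The resource, store, and Vault path names are placeholders; the field layout follows the operator's standard ExternalSecret API:

```yaml
# Illustrative ExternalSecret syncing DB credentials from Vault
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: wo-api-db-credentials
  namespace: wo-system
spec:
  refreshInterval: 1h            # re-sync picks up 90-day rotations
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: wo-api-db              # Kubernetes Secret managed by the operator
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: wo-system/postgres  # path in Vault (placeholder)
        property: url
```

The operator keeps the target Secret in sync with Vault, so rotation requires no redeploy: pods pick up the new value on their next restart or secret reload.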


7. Observability in Deployment

| Signal | Collection | Storage | Retention |
|---|---|---|---|
| Metrics | Prometheus (scrape) | Grafana Cloud | 13 months |
| Traces | OTEL SDK → OTEL Collector | Grafana Tempo | 30 days |
| Logs | Structured JSON → Fluentbit | Grafana Loki | 90 days (L0–L2), 7 years (L3–L4 audit) |
| Alerts | Grafana Alerting → PagerDuty | PagerDuty | Per 66-operational-readiness.md §9 |

8. Cross-Reference

| Concern | Specification Source |
|---|---|
| Infrastructure cost model | 66-operational-readiness.md §4 |
| DR failover procedure | 66-operational-readiness.md §6 |
| Test pyramid in CI | 65-testing-strategy.md §1 |
| IQ/OQ/PQ in CD | 70-validation-protocol-templates.md |
| Schema evolution | 69-versioning-evolution-strategy.md §3 |
| Feature flags | 69-versioning-evolution-strategy.md §2 |
| Container security | 64-security-architecture.md §6 |
| API versioning | 69-versioning-evolution-strategy.md §1 |
| Environment config | 13-tdd.md §2 |

The deployment architecture is the bridge between design and reality. Every artifact in the WO system corpus is dead weight until this pipeline delivers it to production safely and repeatably. This document is the operational counterpart to the SDD — it answers "how does it get there" rather than "what does it look like."