Work Order QMS Module — Security Architecture

Classification: Internal (Security Engineering)
Date: 2026-02-13
Artifact: 64 of WO System Series
Prompt Section: v8.0 §5, Security Architecture

1. Threat Model (STRIDE)

1.1 System Boundary

The WO system's attack surface spans six components, each with a distinct trust level: the API Gateway (external-facing), the Agent Orchestrator (internal, trusted), the Compliance Engine (internal, trusted), Agent Workers (internal, semi-trusted), the Vendor Portal (external-facing, limited trust), and the State Store (internal, highest trust).

1.2 STRIDE Analysis

Spoofing (Identity)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
Stolen JWT used to access API	API Gateway	Medium	High — attacker acts as authenticated user	Short-lived JWTs (1hr), refresh token rotation, device fingerprinting	Failed auth monitoring, IP anomaly detection	✅ Designed
Agent token reuse across WO executions	Agent Workers	Low	High — agent acts outside intended scope	Ephemeral per-execution tokens scoped to WO ID, token invalidated on WO completion	Token reuse detection in audit trail	✅ Designed
Vendor impersonation via shared credentials	Vendor Portal	Medium	Medium — unauthorized WO modifications	Per-vendor unique credentials, MFA required, IP allowlisting optional	Login anomaly detection, geo-mismatch alerts	✅ Designed
Forged e-signature (identity claim)	Signature Service	Low	Critical — invalidates regulatory compliance	Re-authentication at signing time (§11.200(a)(1)), cryptographic hash binding (§11.70)	Hash verification on every signature read, chain integrity audit	⚠️ Partial (G02, G05)
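Refresh token rotation (first row) is commonly paired with reuse detection: each refresh consumes the presented token and issues a new one in the same "family", and presenting an already-consumed token is treated as theft and revokes the whole family. The in-memory sketch below illustrates that pattern only; `RefreshTokenStore`, the family model, and 32-byte random tokens are assumptions, not the system's implementation.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical record kept per refresh token. A production store would be
// persistent and shared across gateway instances.
interface RefreshRecord {
  familyId: string;
  userId: string;
  used: boolean;
  expiresAt: number; // epoch ms
}

class RefreshTokenStore {
  private readonly tokens = new Map<string, RefreshRecord>();
  private readonly revokedFamilies = new Set<string>();
  private readonly ttlMs: number;

  constructor(ttlMs = 7 * 24 * 60 * 60 * 1000) { // 7d refresh lifetime
    this.ttlMs = ttlMs;
  }

  issue(userId: string, familyId = randomBytes(16).toString("hex"), now = Date.now()): string {
    const token = randomBytes(32).toString("base64url");
    this.tokens.set(token, { familyId, userId, used: false, expiresAt: now + this.ttlMs });
    return token;
  }

  // Returns a new refresh token, or null if the presented one is invalid.
  // Presenting a token that was already rotated revokes its entire family.
  rotate(token: string, now = Date.now()): string | null {
    const record = this.tokens.get(token);
    if (!record || this.revokedFamilies.has(record.familyId) || now >= record.expiresAt) {
      return null;
    }
    if (record.used) {
      this.revokedFamilies.add(record.familyId); // reuse detected
      return null;
    }
    record.used = true;
    return this.issue(record.userId, record.familyId, now);
  }
}
```

The design choice is that a stolen refresh token becomes useless as soon as either the attacker or the legitimate client uses it a second time.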

Tampering (Integrity)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
Direct DB modification bypassing application	State Store	Low (requires DB admin access)	Critical — violates audit trail integrity	PostgreSQL triggers prevent UPDATE/DELETE on audit_trail; separate DB credentials for app vs. admin	Hash chain verification (nightly job), chain break = immediate P1 alert	⚠️ Partial (G03)
WO field modification after approval	WO Service	Low	High — approved record no longer matches what was approved	Optimistic locking (version field), post-approval fields immutable (application-enforced)	Version mismatch detection, audit trail diff on every mutation	✅ Designed
Agent message tampering in transit	Event Bus (NATS)	Low	Medium — agent acts on false instructions	HMAC-SHA256 message signing, nonce-based replay prevention	Signature verification on receipt, sequence gap detection	⚠️ Partial (G04)
Malicious schema migration	State Store	Very Low	Critical — corrupts regulated data	Migration requires approval (ADR link), pre-migration snapshot, tested rollback	Schema hash comparison, migration audit log	✅ Designed
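The hash chain named in the Detection column (gap G03) can be sketched as follows: each audit entry stores a keyed HMAC-SHA256 over its sequence number, the previous entry's hash, and its payload, so editing, reordering, or deleting an entry breaks verification from that point on. Keying the chain (the key is assumed to live in KMS) means database access alone is not enough to recompute valid links; truncating the newest entries is only detectable against a separately stored latest hash. The entry shape, genesis value, and `seq|prevHash|payload` encoding are assumptions.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical entry shape; the real audit_trail schema may differ.
interface AuditEntry {
  seq: number;
  payload: string;  // canonical serialization of the audited event
  prevHash: string; // link to the previous entry (GENESIS for seq 0)
  hash: string;     // HMAC-SHA256(key, seq|prevHash|payload), hex
}

const GENESIS = "0".repeat(64);

function linkHash(key: Buffer, seq: number, prevHash: string, payload: string): string {
  return createHmac("sha256", key).update(`${seq}|${prevHash}|${payload}`).digest("hex");
}

function append(chain: AuditEntry[], key: Buffer, payload: string): AuditEntry {
  const seq = chain.length;
  const prevHash = seq === 0 ? GENESIS : chain[seq - 1].hash;
  const entry: AuditEntry = { seq, payload, prevHash, hash: linkHash(key, seq, prevHash, payload) };
  chain.push(entry);
  return entry;
}

// Nightly verification: index of the first broken link, or -1 if intact.
// Any non-negative result is the "chain break = immediate P1 alert" case.
function verifyChain(chain: AuditEntry[], key: Buffer): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i];
    if (e.seq !== i || e.prevHash !== expectedPrev || e.hash !== linkHash(key, e.seq, e.prevHash, e.payload)) {
      return i;
    }
    expectedPrev = e.hash;
  }
  return -1;
}
```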

Repudiation (Non-repudiation)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
User denies approving WO	Approval/Signature	Medium	High — regulatory compliance failure	ElectronicSignature with re-auth attestation, cryptographic hash binding, immutable audit trail	Signature chain verification, re-auth log correlation	⚠️ Partial (G02, G05)
Agent denies performing action	Agent Workers	Low	Medium — audit gap	Agent session ID in every audit trail entry, message signing, correlation ID chain	Agent execution trace in observability stack	✅ Designed
Admin denies configuration change	Tenant Settings	Low	Medium — accountability gap	Admin actions generate L4 audit entries with re-authentication	Admin audit trail review (weekly)	✅ Designed
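The "cryptographic hash binding" mitigation for repudiation (§11.70, gap G02) ties a signature to the exact record content it approved. A minimal sketch, assuming a keyed HMAC-SHA256 with a versioned KMS key; the bound fields echo the §11.50 manifestation elements (signer, date/time, meaning), but `SignatureBinding`, `bindSignature`, the meaning values, and the key-version scheme are hypothetical.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical binding stored alongside an ElectronicSignature.
interface SignatureBinding {
  signerId: string;
  meaning: "APPROVED" | "REVIEWED" | "REJECTED";
  signedAt: string;   // ISO-8601 timestamp
  recordHash: string; // SHA-256 of the canonical WO record at signing time
  keyVersion: string; // KMS key version (old versions kept for verification)
  mac: string;        // HMAC-SHA256 over the fields above, hex
}

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function macInput(b: Omit<SignatureBinding, "mac">): string {
  return [b.signerId, b.meaning, b.signedAt, b.recordHash, b.keyVersion].join("\n");
}

function bindSignature(
  key: Buffer, keyVersion: string, signerId: string,
  meaning: SignatureBinding["meaning"], record: string, signedAt: string,
): SignatureBinding {
  const unsigned = { signerId, meaning, signedAt, recordHash: sha256(record), keyVersion };
  return { ...unsigned, mac: createHmac("sha256", key).update(macInput(unsigned)).digest("hex") };
}

// Verification on every signature read: the record must still hash to the
// bound value, and the MAC must match (compared in constant time).
function verifySignature(key: Buffer, b: SignatureBinding, record: string): boolean {
  if (sha256(record) !== b.recordHash) return false;
  const { mac, ...unsigned } = b;
  const expected = createHmac("sha256", key).update(macInput(unsigned)).digest();
  const actual = Buffer.from(mac, "hex");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```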

Information Disclosure (Confidentiality)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
Cross-tenant data leakage	State Store	Low (RLS enforced)	Critical — regulatory violation, trust destruction	PostgreSQL RLS on every table, tenant_id set at connection pool level, RLS penetration tested quarterly	Cross-tenant access attempt logging, automated RLS policy verification	✅ Implemented
PHI exposure in WO descriptions	WO Service	Medium	High — HIPAA violation	PHI detection scanner on WO creation/update, confidence-based response (block/flag/log)	PHI scan results dashboard, false negative review	⚠️ Design Only (G09)
Vendor sees non-assigned WO data	Vendor Portal	Low	Medium — confidentiality breach	Vendor role scoped to assigned WOs only (RBAC + application-level filtering)	Vendor access audit (monthly), access pattern anomaly	✅ Implemented
Credential exposure in job plan	JobPlan Service	Medium	High — lateral movement risk	Vault references only (vault://path), never plaintext; PHI scanner catches credential patterns	Credential pattern detection in L2+ fields	⚠️ Design Only (G01)
Audit trail data exfiltration	API / Export	Low	Critical — bulk regulated data exposure	Export requires AUDITOR or ADMIN role, rate limited, logged, paginated (no bulk dump)	Export volume anomaly detection	✅ Designed
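The confidence-based response of the PHI scanner (gap G09), which also catches credential patterns (gap G01), can be sketched as a set of detectors with confidence scores mapped onto block/flag/log. The patterns and thresholds below are illustrative assumptions only; real PHI detection needs far broader coverage than a handful of regular expressions.

```typescript
type ScanAction = "BLOCK" | "FLAG" | "LOG" | "PASS";

// Illustrative detectors; confidence values are assumptions, not tuned figures.
const DETECTORS: { name: string; pattern: RegExp; confidence: number }[] = [
  { name: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/, confidence: 0.95 },
  { name: "MRN", pattern: /\bMRN[:#\s]*\d{6,10}\b/i, confidence: 0.85 },
  { name: "DOB", pattern: /\b(?:DOB|date of birth)\b/i, confidence: 0.6 },
  // Credential-like assignments whose value is not a vault:// reference
  { name: "CREDENTIAL", pattern: /(?:password|passwd|secret|api[_-]?key)"?\s*[:=]\s*"?(?!vault:\/\/)[^\s"]+/i, confidence: 0.9 },
];

// Hypothetical thresholds for the block / flag / log tiers.
const BLOCK_AT = 0.9;
const FLAG_AT = 0.7;

function scanField(text: string): { action: ScanAction; findings: string[] } {
  const hits = DETECTORS.filter((d) => d.pattern.test(text));
  if (hits.length === 0) return { action: "PASS", findings: [] };
  const top = Math.max(...hits.map((d) => d.confidence));
  const action: ScanAction = top >= BLOCK_AT ? "BLOCK" : top >= FLAG_AT ? "FLAG" : "LOG";
  return { action, findings: hits.map((d) => d.name) };
}
```

The highest-confidence finding decides the response, so a single strong signal (for example an SSN) blocks the write even when weaker signals are also present.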

Denial of Service (Availability)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
API rate abuse	API Gateway	High	Medium — service degradation	Per-tenant token bucket rate limiting, burst + sustained rates	Rate limit hit monitoring, auto-scaling triggers	✅ Designed
Agent execution storm (infinite loop)	Agent Orchestrator	Medium	High — token budget exhaustion, system overload	Token budget controller (hard stop at 95%), max iteration limits, circuit breakers per agent	Budget threshold alerts (80%), iteration count monitoring	✅ Implemented
Database connection exhaustion	State Store	Low	Critical — system-wide outage	Connection pooling (PgBouncer), per-tenant connection limits, query timeout (30s)	Connection pool saturation alerts, slow query logging	✅ Designed
Event bus flood	NATS	Low	Medium — message processing delay	Per-agent publish rate limits, message size limits (1MB), backpressure signaling	Queue depth monitoring, consumer lag alerts	✅ Designed
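Per-tenant token bucket rate limiting (first row) admits a burst of up to `burst` requests and then refills at a sustained `ratePerSec`. A minimal single-process sketch; the class name and parameters are assumptions, and a horizontally scaled gateway would keep bucket state in a shared store.

```typescript
// Hypothetical per-tenant token bucket limiter (single process, in memory).
interface Bucket { tokens: number; lastRefillMs: number }

class TenantRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly burst: number;      // bucket capacity (burst rate)
  private readonly ratePerSec: number; // refill rate (sustained rate)

  constructor(burst: number, ratePerSec: number) {
    this.burst = burst;
    this.ratePerSec = ratePerSec;
  }

  // Returns true if the request is admitted, false if the tenant is throttled.
  tryAcquire(tenantId: string, nowMs = Date.now()): boolean {
    const b = this.buckets.get(tenantId) ?? { tokens: this.burst, lastRefillMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - b.lastRefillMs) / 1000;
    b.tokens = Math.min(this.burst, b.tokens + elapsedSec * this.ratePerSec);
    b.lastRefillMs = nowMs;
    this.buckets.set(tenantId, b);
    if (b.tokens < 1) return false; // rate limit hit (feeds rate-limit monitoring)
    b.tokens -= 1;
    return true;
  }
}
```

Keying buckets by tenant means one noisy tenant exhausts only its own budget.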

Elevation of Privilege (Authorization)

Attack Vector	Target	Likelihood	Impact	Mitigation	Detection	Status
Agent attempts to approve WO (self-elevation)	Agent Workers → Approval	Low (architectural constraint)	Critical — bypasses human control	Agents NEVER hold SYSTEM_OWNER, QA, or ADMIN roles; approval endpoints reject agent tokens	Agent-attempted-approval alert (immediate P1)	✅ Implemented
ASSIGNEE approves their own WO	Approval Service	Medium (user error or intent)	High — SOD violation, Part 11 breach	SOD guard: ASSIGNEE ≠ APPROVER enforced in state machine guard T5	SOD violation audit log, blocked transition logged	✅ Implemented
Admin bypasses approval chain	Admin Console	Low	High — undermines regulatory workflow	ADMIN role can cancel but cannot approve/reject; documented in RBAC matrix	Admin action audit (all admin operations logged at L4)	✅ Implemented
Break-glass abuse (over-broad emergency access)	Break-Glass System	Low	Medium — unauthorized access under emergency cover	4-hour time limit, enhanced audit logging, mandatory post-incident review within 72 hours, break-glass does not bypass SOD	Break-glass activation alert (immediate), usage pattern analysis	⚠️ Design Only (G10)
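The break-glass constraints in the last row (4-hour limit, explicit scope, no SOD bypass, review within 72 hours) can be expressed as a grant check. Since gap G10 is design-only, this is a sketch under assumptions: `BreakGlassGrant`, its fields, the permission strings, and measuring the review deadline from activation are all hypothetical.

```typescript
// Hypothetical break-glass grant model.
interface BreakGlassGrant {
  actorId: string;
  scope: Set<string>;    // permissions explicitly granted for the emergency
  activatedAtMs: number;
}

const MAX_DURATION_MS = 4 * 60 * 60 * 1000;   // 4-hour hard limit
const REVIEW_WINDOW_MS = 72 * 60 * 60 * 1000; // post-incident review deadline

type GrantCheck = { ok: true } | { ok: false; reason: string };

function checkBreakGlass(
  grant: BreakGlassGrant, permission: string, violatesSod: boolean, nowMs: number,
): GrantCheck {
  if (nowMs - grant.activatedAtMs > MAX_DURATION_MS) return { ok: false, reason: "expired" };
  if (!grant.scope.has(permission)) return { ok: false, reason: "outside granted scope" };
  if (violatesSod) return { ok: false, reason: "break-glass never bypasses SOD" };
  return { ok: true };
}

// Deadline for the mandatory post-incident review (assumed: from activation).
const reviewDueMs = (grant: BreakGlassGrant) => grant.activatedAtMs + REVIEW_WINDOW_MS;
```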

1.3 Threat Model Summary

STRIDE Category	Total Vectors	✅ Implemented	✅ Designed	⚠️ Partial / Design Only	Coverage (Implemented + Designed)
Spoofing	4	0	3	1	75%
Tampering	4	0	2	2	50%
Repudiation	3	0	2	1	67%
Information Disclosure	5	2	1	2	60%
Denial of Service	4	1	3	0	100%
Elevation of Privilege	4	3	0	1	75%
Total	24	6	11	7	71%

The 7 partial/design-only items map directly to gap closure prompts G01–G05, G09, G10.

2. Authentication Architecture

2.1 Authentication Flow

                    ┌───────────────────────┐
                    │   Identity Provider   │
                    │   (Okta / Azure AD /  │
                    │    Auth0 / Cognito)   │
                    └──────────┬────────────┘
                               │ OIDC / SAML 2.0
                    ┌──────────▼────────────┐
                    │   API Gateway         │
                    │   ┌─────────────────┐ │
                    │   │ Token Validator │ │
                    │   │ (JWT RS256)     │ │
                    │   └─────────────────┘ │
                    └──────────┬────────────┘
                               │ Validated Claims
              ┌────────────────┼────────────────┐
              ▼                ▼                ▼
        ┌───────────┐  ┌────────────┐  ┌────────────┐
        │ Human     │  │ Service    │  │ Agent      │
        │ Sessions  │  │ Accounts   │  │ Tokens     │
        │           │  │            │  │            │
        │ JWT +     │  │ mTLS +     │  │ Scoped,    │
        │ Refresh   │  │ API Key    │  │ Ephemeral  │
        └───────────┘  └────────────┘  └────────────┘
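The Token Validator step can be sketched with Node's built-in `crypto` module: split the compact JWT, require an RS256 header, verify the RSA-SHA256 signature over `header.payload`, then check expiry. This is an illustration only, assuming the gateway already holds the IdP's public key; a production validator should use a maintained JOSE library and also check `iss`, `aud`, and the key ID. The `signRs256` helper exists only to produce test tokens (the IdP signs tokens in the real flow).

```typescript
import { generateKeyPairSync, sign, verify, type KeyObject } from "node:crypto";

interface Claims { sub: string; exp: number; [claim: string]: unknown }

const b64url = (obj: object) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const b64urlJson = (part: string) => JSON.parse(Buffer.from(part, "base64url").toString("utf8"));

// Returns the claims if the token is a valid, unexpired RS256 JWT; otherwise null.
function validateRs256(token: string, publicKey: KeyObject, nowSec = Math.floor(Date.now() / 1000)): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, sigB64] = parts;
  try {
    if (b64urlJson(headerB64).alg !== "RS256") return null; // reject "none", HS256, etc.
    const signedData = Buffer.from(`${headerB64}.${payloadB64}`);
    if (!verify("sha256", signedData, publicKey, Buffer.from(sigB64, "base64url"))) return null;
    const claims = b64urlJson(payloadB64) as Claims;
    if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null; // expired
    return claims;
  } catch {
    return null; // malformed base64url or JSON
  }
}

// Issuer side, for illustration and testing only.
function signRs256(claims: Claims, key: KeyObject): string {
  const data = `${b64url({ alg: "RS256", typ: "JWT" })}.${b64url(claims)}`;
  return `${data}.${sign("sha256", Buffer.from(data), key).toString("base64url")}`;
}

// Throwaway key pair for the usage example.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
```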

2.2 Authentication Types

Auth Type	Mechanism	Lifetime	Scope	Rotation	WO System Use
Human session	JWT (RS256) + refresh token	1hr access / 7d refresh / 30min idle timeout	Tenant + roles	Refresh on use	All human API calls
E-signature re-auth	Re-authentication attestation	5 minutes	Single signature	Per-signature	Approval signing events
Service-to-service	mTLS + API key	Certificate: 90d	Service identity	Auto-rotate at 60d	Compliance Engine ↔ State Store
Agent execution	Scoped ephemeral token	WO execution duration	WO ID + agent role	Per-execution	All agent API calls
Vendor portal	JWT (RS256) + MFA	1hr access / no refresh (re-login required)	Assigned WOs only	Per-session	Vendor interactions
Break-glass	Emergency override token	4 hours max	Specified scope (not SOD bypass)	Single-use	Emergency access only

2.3 Session Management

Parameter	Value	Regulatory Requirement
Idle timeout	30 minutes (configurable: 5–120 min)	HIPAA §164.312(a)(2)(iii)
Absolute timeout	12 hours	Security best practice
E-signature window	5 minutes	FDA §11.200(a)(1)
Concurrent sessions	3 max per user	Security best practice
Grace warning	2 minutes before idle timeout	UX requirement
Failed login lockout	5 attempts → 15 minute lockout	HIPAA §164.312(a)(1)
Re-auth for signatures	Every signature event	FDA §11.200(a)(1)
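The session parameters combine into a small set of per-request checks. A sketch, assuming the server tracks `createdAtMs` and `lastActivityMs` per session and that a fourth concurrent login is rejected (rather than evicting the oldest session); all names are hypothetical and the constants mirror the table's defaults.

```typescript
const MIN = 60_000;

// Defaults from the session management table.
const SESSION_POLICY = {
  idleTimeoutMs: 30 * MIN,
  absoluteTimeoutMs: 12 * 60 * MIN,
  graceWarningMs: 2 * MIN,
  maxConcurrent: 3,
  lockoutAttempts: 5,
  lockoutMs: 15 * MIN,
};

interface Session { createdAtMs: number; lastActivityMs: number }
type SessionState = "ACTIVE" | "WARN_IDLE" | "EXPIRED_IDLE" | "EXPIRED_ABSOLUTE";

// Clamp a tenant-configured idle timeout to the allowed 5 to 120 minute range.
const tenantIdleTimeoutMs = (minutes: number) => Math.min(120, Math.max(5, minutes)) * MIN;

function sessionState(s: Session, nowMs: number, idleTimeoutMs = SESSION_POLICY.idleTimeoutMs): SessionState {
  if (nowMs - s.createdAtMs >= SESSION_POLICY.absoluteTimeoutMs) return "EXPIRED_ABSOLUTE";
  const idleMs = nowMs - s.lastActivityMs;
  if (idleMs >= idleTimeoutMs) return "EXPIRED_IDLE";
  if (idleMs >= idleTimeoutMs - SESSION_POLICY.graceWarningMs) return "WARN_IDLE";
  return "ACTIVE";
}

// A new login is refused once the user already holds the maximum live sessions.
const canOpenSession = (liveSessions: number) => liveSessions < SESSION_POLICY.maxConcurrent;

// Failed-login lockout: returns the unlock time, or null if not locked.
function lockedUntil(failedAttempts: number, lastFailureMs: number): number | null {
  return failedAttempts >= SESSION_POLICY.lockoutAttempts ? lastFailureMs + SESSION_POLICY.lockoutMs : null;
}
```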

3. Authorization Architecture

3.1 Layered Model

Layer 1: RBAC  ──→  "Does this role have this permission?"
  │                  8 roles: ORIGINATOR, ASSIGNER, ASSIGNEE, SYSTEM_OWNER,
  │                  QA, VENDOR, ADMIN, AUDITOR
  │                  6 agent roles: AGENT_ORCHESTRATOR, AGENT_ASSET_MGMT,
  │                  AGENT_SCHEDULER, AGENT_VENDOR_COORD, AGENT_DOCUMENTATION,
  │                  AGENT_QA_ASSIST
  ▼
Layer 2: RLS   ──→  "Can this tenant see this row?"
  │                  PostgreSQL RLS on all 22 tables
  │                  tenant_id = current_setting('app.tenant_id')
  ▼
Layer 3: SOD   ──→  "Does this create a conflict of interest?"
  │                  ASSIGNEE ≠ APPROVER
  │                  Both SO + QA required for regulatory WOs
  │                  Agents never approve
  ▼
Layer 4: Scope ──→  "Can this actor access THIS specific resource?"
  │                  Vendors: only assigned WOs
  │                  Agents: only current execution scope
  │                  Auditors: read-only everything in tenant
  ▼
Layer 5: Context ─→ "Do special conditions apply?"
                    Break-glass: bypasses RBAC (not SOD)
                    Training expiration: blocks assignment
                    Certification lapse: blocks execution

3.2 Policy Decision Flow

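// Supporting types and helpers. The helper implementations (RBAC lookup,
// assignment queries, break-glass audit logging) live elsewhere in the service.
type AuthzLayer = 'RBAC' | 'RLS' | 'SOD' | 'SCOPE';

interface AuthzRequest {
  actorId: string;
  actorRole: string;
  actorTenantId: string;
  resourceId: string;
  resourceTenantId: string;
  permission: string;
  contextFlags?: { breakGlass?: boolean };
}

interface RbacResult { allowed: boolean }

type AuthzDecision =
  | { allowed: true; request: AuthzRequest; evidence: RbacResult[] }
  | { allowed: false; layer: AuthzLayer; reason: string };

declare function checkRBAC(role: string, permission: string): Promise<RbacResult>;
declare function isActorAssignee(actorId: string, woId: string): Promise<boolean>;
declare function isVendorAssigned(vendorId: string, woId: string): Promise<boolean>;
declare function logBreakGlassAccess(request: AuthzRequest): Promise<void>;

const deny = (layer: AuthzLayer, reason: string): AuthzDecision => ({ allowed: false, layer, reason });
const allow = (request: AuthzRequest, evidence: RbacResult[]): AuthzDecision =>
  ({ allowed: true, request, evidence });

async function authorize(request: AuthzRequest): Promise<AuthzDecision> {
  const breakGlass = request.contextFlags?.breakGlass === true;

  // Layer 5: Context. Every break-glass request gets enhanced audit logging.
  if (breakGlass) {
    await logBreakGlassAccess(request);
  }

  // Layer 1: RBAC. Break-glass may override an RBAC denial, so it must be
  // considered here; a check placed after an early RBAC return would never run.
  const rolePermission = await checkRBAC(request.actorRole, request.permission);
  if (!rolePermission.allowed && !breakGlass) {
    return deny('RBAC', `Role ${request.actorRole} lacks ${request.permission}`);
  }

  // Layer 2: RLS (enforced at DB level, but verified here for defense-in-depth)
  if (request.actorTenantId !== request.resourceTenantId) {
    return deny('RLS', 'Cross-tenant access denied');
  }

  // Layer 3: SOD. Always enforced, including under break-glass.
  if (request.permission === 'APPROVE_WO') {
    if (request.actorRole.startsWith('AGENT_')) {
      return deny('SOD', 'Agents never approve WOs');
    }
    if (await isActorAssignee(request.actorId, request.resourceId)) {
      return deny('SOD', 'Assignee cannot approve own WO (§11.10(g))');
    }
  }

  // Layer 4: Scope
  if (request.actorRole === 'VENDOR' &&
      !(await isVendorAssigned(request.actorId, request.resourceId))) {
    return deny('SCOPE', 'Vendor not assigned to this WO');
  }

  return allow(request, [rolePermission]);
}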

3.3 Agent Permission Boundaries

Constraint	Enforcement Point	Consequence of Violation
Agents cannot hold SO, QA, or ADMIN roles	Token issuer (Orchestrator)	Token rejected at API Gateway
Agents cannot approve or reject WOs	State machine guard (T5)	Guard violation, human checkpoint triggered
Agents cannot sign electronically	Signature service	Request rejected, P1 alert
Agent scope limited to current WO execution	Token claims include WO ID	Requests outside scope return 403
Agent actions always attributed	Audit trail includes agent session ID	Agent actions auditable end-to-end
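The first and fourth constraints can be sketched as one check at token issuance and one at request time. Assumptions: the claim names (`sessionId`, `agentRole`, `roles`, `woId`) and function names are hypothetical; the forbidden roles and the WO-scoped 403 come from the table.

```typescript
// Hypothetical claim set carried by an ephemeral agent execution token.
interface AgentTokenClaims {
  sessionId: string; // agent session ID, recorded in every audit entry
  agentRole: string; // one of the six AGENT_* roles
  roles: string[];   // every role requested for the token
  woId: string;      // the single WO this execution may touch
}

const FORBIDDEN_AGENT_ROLES = new Set(["SYSTEM_OWNER", "QA", "ADMIN"]);

// Issuance guard (Orchestrator): refuse tokens that carry human approval roles.
// Returns a rejection reason, or null if the claims are acceptable.
function validateAgentClaims(c: AgentTokenClaims): string | null {
  if (!c.agentRole.startsWith("AGENT_")) return "not an agent role";
  const bad = c.roles.filter((r) => FORBIDDEN_AGENT_ROLES.has(r));
  return bad.length > 0 ? `forbidden roles for agents: ${bad.join(", ")}` : null;
}

// Request-time scope check: any WO other than the token's own is rejected.
function scopeStatus(c: AgentTokenClaims, requestedWoId: string): 200 | 403 {
  return c.woId === requestedWoId ? 200 : 403;
}
```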

4. Secrets Management

4.1 Secret Inventory

Secret	Classification	Storage	Rotation	Current Status
Database connection strings	L3	Vault (HashiCorp / GCP Secret Manager)	90 days	⚠️ Gap G01 — currently env vars
AI model API keys	L3	Vault	Per provider policy (90d default)	⚠️ Gap G01
JWT signing keys	L3	KMS (cloud-native)	Annual + on-demand	✅ Designed
E-signature hash keys	L4	HSM / Cloud KMS	Versioned (never rotated — new version created)	⚠️ Gap G02
Agent execution tokens	L2	In-memory (ephemeral)	Per-execution	✅ Designed
mTLS certificates	L2	Cert-manager (automated)	90 days (private CA issuer; auto-rotated at 60 days)	✅ Designed
Encryption keys (at-rest)	L4	KMS (cloud-native)	Annual	✅ Designed
NATS credentials	L2	Vault	90 days	⚠️ Gap G01
Vendor portal OAuth client secrets	L3	Vault	180 days	⚠️ Gap G01

4.2 Vault Integration Pattern (Gap G01)

Application Code → Vault Sidecar → Vault Server → Secret Value

Job Plan credential reference:
  Before (gap): { "db_password": "plaintextvalue" }
  After (G01):  { "db_password": "vault://secret/wo-system/db/prod#password" }

Resolution flow:
  1. Agent needs credential → reads vault reference from JobPlan
  2. Agent requests scoped token from Orchestrator (includes WO ID + credential path)
  3. Vault sidecar resolves reference → returns value in memory
  4. Value used for the operation while held only in memory; it is never persisted outside Vault
  5. Vault audit log records: who accessed, when, which secret, from which WO
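The `vault://` reference format and the resolution flow can be sketched as a parser, a save-time check that rejects plaintext credentials in JobPlan fields, and a resolver wrapper. Assumptions: the grammar `vault://<path>#<key>`, the field-name heuristic for credential fields, and the `resolveSecret` callback (standing in for the Vault sidecar call) are hypothetical illustrations of gap G01.

```typescript
// Hypothetical grammar: vault://<path>#<key>
interface VaultRef { path: string; key: string }

const VAULT_REF = /^vault:\/\/([A-Za-z0-9_\-\/]+)#([A-Za-z0-9_\-]+)$/;
const CREDENTIAL_FIELD = /(password|secret|token|api_?key)$/i;

function parseVaultRef(value: string): VaultRef | null {
  const m = VAULT_REF.exec(value);
  return m ? { path: m[1], key: m[2] } : null;
}

// Save-time check: credential-like fields must hold vault references, never plaintext.
function findPlaintextCredentials(jobPlan: Record<string, string>): string[] {
  return Object.entries(jobPlan)
    .filter(([field, value]) => CREDENTIAL_FIELD.test(field) && parseVaultRef(value) === null)
    .map(([field]) => field);
}

// Steps 3 and 4: resolve through the sidecar and hand the value to the operation.
// This helper does not store the resolved value anywhere.
async function withSecret<T>(
  ref: string,
  resolveSecret: (r: VaultRef) => Promise<string>,
  use: (secret: string) => Promise<T>,
): Promise<T> {
  const parsed = parseVaultRef(ref);
  if (!parsed) throw new Error(`not a vault reference: ${ref}`);
  return use(await resolveSecret(parsed));
}
```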

4.3 Key Management

Key Type	Algorithm	Key Size	Storage	Rotation Trigger
JWT signing	RSA	2048-bit	KMS	Annual or compromise
E-signature hash	HMAC-SHA256	256-bit	KMS/HSM	Version-based (new key per year, old keys retained for verification)
Audit trail hash chain	HMAC-SHA256	256-bit	KMS	Never rotated (chain integrity)
At-rest encryption	AES-256-GCM	256-bit	KMS	Annual KEK rotation (envelope encryption; DEKs re-wrapped under the new KEK)
Message signing (agent-to-agent)	HMAC-SHA256	256-bit	Vault (ephemeral per session)	Per agent session
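Agent-to-agent message signing with a per-session HMAC-SHA256 key, combined with the nonce-based replay prevention listed in the threat model (gap G04), might look like the sketch below. Assumptions: the envelope fields, the 5-minute freshness window, and the in-memory nonce cache are illustrative; NATS itself only carries the bytes.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical signed envelope for event bus messages.
interface SignedMessage {
  sessionId: string; // agent session that produced the message
  nonce: string;     // unique per message
  sentAtMs: number;
  body: string;
  mac: string;       // hex HMAC-SHA256 over the fields above
}

const FRESHNESS_MS = 5 * 60 * 1000; // assumed acceptance window

const macOf = (key: Buffer, m: Omit<SignedMessage, "mac">) =>
  createHmac("sha256", key).update(JSON.stringify([m.sessionId, m.nonce, m.sentAtMs, m.body])).digest();

function signMessage(key: Buffer, sessionId: string, body: string, nowMs: number): SignedMessage {
  const unsigned = { sessionId, nonce: randomBytes(16).toString("hex"), sentAtMs: nowMs, body };
  return { ...unsigned, mac: macOf(key, unsigned).toString("hex") };
}

class MessageVerifier {
  private readonly seen = new Map<string, number>(); // nonce -> sentAtMs

  verify(key: Buffer, msg: SignedMessage, nowMs: number): "OK" | "BAD_MAC" | "STALE" | "REPLAY" {
    const { mac, ...unsigned } = msg;
    const expected = macOf(key, unsigned);
    const actual = Buffer.from(mac, "hex");
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return "BAD_MAC";
    if (Math.abs(nowMs - msg.sentAtMs) > FRESHNESS_MS) return "STALE";
    if (this.seen.has(msg.nonce)) return "REPLAY";
    this.seen.set(msg.nonce, msg.sentAtMs);
    // Nonces older than the window can be dropped: such messages are rejected as stale.
    for (const [n, t] of this.seen) if (nowMs - t > FRESHNESS_MS) this.seen.delete(n);
    return "OK";
  }
}
```

The freshness window is what bounds the nonce cache: without it, every nonce ever seen would have to be retained.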

5. Network Security

5.1 Network Boundaries

┌─────────────────────────────────────────────────┐
│  PUBLIC INTERNET                                 │
│  ┌────────────────────────────────────────────┐  │
│  │  DMZ (WAF + DDoS Protection)               │  │
│  │  ┌──────────────────────────────────────┐  │  │
│  │  │  API Gateway (TLS termination)       │  │  │
│  │  │  Vendor Portal (TLS termination)     │  │  │
│  │  └──────────────┬───────────────────────┘  │  │
│  └─────────────────┼─────────────────────────┘  │
│                    │ mTLS                        │
│  ┌─────────────────▼─────────────────────────┐  │
│  │  PRIVATE NETWORK (VPC)                     │  │
│  │  ┌──────────┐ ┌──────────┐ ┌───────────┐  │  │
│  │  │ Agent    │ │Compliance│ │ Observ.   │  │  │
│  │  │Orchestr. │ │ Engine   │ │ Stack     │  │  │
│  │  └────┬─────┘ └────┬─────┘ └───────────┘  │  │
│  │       │  mTLS       │  mTLS                │  │
│  │  ┌────▼─────────────▼────────────────────┐ │  │
│  │  │  DATA PLANE (most restricted)         │ │  │
│  │  │  ┌───────────┐ ┌──────────┐           │ │  │
│  │  │  │PostgreSQL │ │  NATS    │           │ │  │
│  │  │  │(encrypted │ │(TLS +    │           │ │  │
│  │  │  │ at rest)  │ │ authz)   │           │ │  │
│  │  │  └───────────┘ └──────────┘           │ │  │
│  │  └───────────────────────────────────────┘ │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘

5.2 Network Policies

Rule	Source	Destination	Protocol	Port	Justification
Internet → API Gateway	Any	API Gateway	HTTPS	443	Public API access
Internet → Vendor Portal	Vendor IP allowlist (optional)	Vendor Portal	HTTPS	443	Vendor access
API Gateway → Orchestrator	API Gateway	Agent Orchestrator	gRPC over mTLS	8443	Internal routing
Orchestrator → Agent Workers	Agent Orchestrator	Agent Workers	gRPC over mTLS	8444	Agent dispatch
Any service → PostgreSQL	Private VPC services	PostgreSQL	TLS	5432	Data access
Any service → NATS	Private VPC services	NATS cluster	TLS	4222	Event bus
PostgreSQL → External	PostgreSQL	None	—	—	No outbound (data plane isolated)
Agent Workers → AI Models	Agent Workers	Anthropic/OpenAI API	HTTPS	443	Model calls (via egress proxy)
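The policy table reads as a default-deny allowlist: any flow not listed is refused, which is how "PostgreSQL → External: None" is enforced. The sketch encodes the rows as data and evaluates a flow against them. The zone labels are assumptions that simplify the table's sources and destinations, and in practice these rules would live in Kubernetes NetworkPolicies or cloud firewall rules rather than application code.

```typescript
// Simplified zone labels for the table's sources and destinations.
type Zone =
  | "internet" | "api-gateway" | "vendor-portal" | "orchestrator"
  | "agent-workers" | "vpc-service" | "postgres" | "nats" | "ai-egress-proxy";

interface Rule { from: Zone[]; to: Zone; port: number }

const ALLOW: Rule[] = [
  { from: ["internet"], to: "api-gateway", port: 443 },
  { from: ["internet"], to: "vendor-portal", port: 443 }, // optional vendor IP allowlist not modeled
  { from: ["api-gateway"], to: "orchestrator", port: 8443 },
  { from: ["orchestrator"], to: "agent-workers", port: 8444 },
  { from: ["vpc-service", "orchestrator", "agent-workers"], to: "postgres", port: 5432 },
  { from: ["vpc-service", "orchestrator", "agent-workers"], to: "nats", port: 4222 },
  { from: ["agent-workers"], to: "ai-egress-proxy", port: 443 }, // model calls via egress proxy
];

// Default deny: a flow is permitted only if some rule matches it exactly.
function isFlowAllowed(from: Zone, to: Zone, port: number): boolean {
  return ALLOW.some((r) => r.from.includes(from) && r.to === to && r.port === port);
}
```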

5.3 Zero Trust Principles

Principle	WO System Implementation
Never trust, always verify	Every request authenticated + authorized, even internal
Least privilege	Tokens scoped to minimum required access; agent tokens scoped to WO
Assume breach	Audit everything; hash chains detect tampering; circuit breakers limit blast radius
Explicit verification	mTLS between all services; no implicit trust based on network position
Encrypt everything	TLS 1.3 in transit; AES-256-GCM at rest; field-level for L3+

6. Supply Chain Security

6.1 Dependency Management

Control	Tool	Frequency	Gate Type
Vulnerability scanning	Snyk / Trivy	Every PR + daily scan	Block on critical/high CVEs
License compliance	FOSSA	Every PR	Block copyleft in proprietary components
Dependency pinning	Lock files (package-lock.json, poetry.lock)	Always	Exact versions only
Controlled updates	Renovate (configured for grouped weekly PRs)	Weekly	PR with changelog + test results
Transitive dependency audit	Snyk deep scan	Monthly	Review report, create WO for remediation

6.2 Build Artifact Security

Artifact	Signing	Storage	Verification
Container images	Cosign (Sigstore)	Private registry (GCR/ECR)	Admission controller verifies signature before deployment
Helm charts	GPG signed	Private chart repository	Signature verified before `helm install`
Database migrations	SHA-256 hash in migration manifest	Git (source of truth)	Hash verified before execution
SBOM	Auto-generated (CycloneDX format)	Stored alongside build artifact	Included in IQ evidence package
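Hash verification for database migrations (third row) amounts to recomputing each file's SHA-256 and comparing it with the manifest before anything executes. A sketch assuming a hypothetical manifest that maps migration file names to hex digests; file I/O is left to the caller so the check stays pure.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest committed to Git next to the migrations.
type Manifest = Record<string, string>; // file name -> expected SHA-256 (hex)

const sha256Hex = (content: string | Buffer) => createHash("sha256").update(content).digest("hex");

// Returns every problem that must block execution: unknown, modified, or missing files.
function verifyMigrations(manifest: Manifest, files: Map<string, string | Buffer>): string[] {
  const problems: string[] = [];
  for (const [name, content] of files) {
    const expected = manifest[name];
    if (expected === undefined) problems.push(`${name}: not in manifest`);
    else if (sha256Hex(content) !== expected) problems.push(`${name}: hash mismatch`);
  }
  for (const name of Object.keys(manifest)) {
    if (!files.has(name)) problems.push(`${name}: listed but missing`);
  }
  return problems; // empty => safe to proceed (after the pre-migration snapshot)
}
```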

6.3 Base Image Policy

Allowed Base	Use Case	Update Cadence
`gcr.io/distroless/cc-debian12`	Service containers (Go, compiled languages)	Monthly rebuild
`gcr.io/distroless/python3-debian12`	Python services (Orchestrator, Compliance Engine)	Monthly rebuild
`node:22-slim`	TypeScript services (API Gateway, IDE)	Monthly rebuild
`postgres:16-alpine`	Database (dev/test only; managed service in production)	Quarterly

Rejected: Ubuntu/Debian full images, latest tags, unverified third-party images.
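The base image policy can be enforced in CI by checking each Dockerfile `FROM` reference against the allowlist. A sketch with a deliberately simple line-based parser (the function names are hypothetical); references to earlier build stages are skipped because they are not base images.

```typescript
// The four allowed base references from the policy table.
const ALLOWED_BASES = new Set([
  "gcr.io/distroless/cc-debian12",
  "gcr.io/distroless/python3-debian12",
  "node:22-slim",
  "postgres:16-alpine",
]);

// Extracts base image references from FROM lines, ignoring --platform flags
// and skipping references to earlier named stages ("FROM build").
function baseImages(dockerfile: string): string[] {
  const stages = new Set<string>();
  const images: string[] = [];
  for (const line of dockerfile.split("\n")) {
    const m = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(line);
    if (!m) continue;
    if (!stages.has(m[1].toLowerCase())) images.push(m[1]);
    if (m[2]) stages.add(m[2].toLowerCase());
  }
  return images;
}

// Returns one violation message per disallowed base reference.
function checkBaseImages(dockerfile: string): string[] {
  return baseImages(dockerfile)
    .filter((ref) => !ALLOWED_BASES.has(ref))
    .map((ref) => (ref.endsWith(":latest") || !ref.includes(":")
      ? `${ref}: untagged or latest`
      : `${ref}: not an allowed base`));
}
```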

7. Incident Response Integration

7.1 Security Event Taxonomy

Event Category	Source	Severity	Response
Authentication failure (≥5 in 5min)	API Gateway	P3	Auto-lockout + alert to security team
Cross-tenant access attempt	RLS / Application	P1	Immediate block + forensic investigation
SOD violation attempt	State machine guard	P2	Block + log + notify compliance officer
Hash chain integrity failure	Nightly verification job	P1	Freeze affected records + forensic investigation
Agent attempted approval	Signature service	P1	Block + alert + review agent configuration
PHI detected in non-PHI field	PHI scanner	P2	Flag record + notify data owner + quarantine
Token budget exhaustion	Token Budget Controller	P3	Hard stop agent execution + alert orchestrator
Circuit breaker open	Agent Worker monitoring	P3	Route around failed worker + alert SRE
Break-glass activation	Break-glass system	P2	Enhanced audit logging + mandatory 72-hour review
Credential rotation failure	Vault integration	P2	Retry with backoff + alert security team + use cached credential
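The first taxonomy row (at least 5 authentication failures within 5 minutes raises a P3) is a sliding-window threshold. A sketch assuming failures are keyed per user (they could equally be keyed per source IP); the class name and return values are hypothetical.

```typescript
// Sliding-window detector for repeated authentication failures.
class AuthFailureDetector {
  private readonly failures = new Map<string, number[]>(); // key -> failure timestamps (ms)
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 5, windowMs = 5 * 60 * 1000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Records a failure; returns "P3" once the threshold is reached within the window.
  recordFailure(key: string, nowMs: number): "P3" | null {
    const recent = (this.failures.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    recent.push(nowMs);
    this.failures.set(key, recent);
    return recent.length >= this.threshold ? "P3" : null;
  }

  // A successful login clears the failure history for that key.
  recordSuccess(key: string): void {
    this.failures.delete(key);
  }
}
```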

7.2 Security Event → WO Creation (Gap G14)

P1 and P2 security events automatically generate incident Work Orders:

Security Event (P1/P2)
  → Incident WO created automatically
    → Type: MANUAL (source_type override: SECURITY_INCIDENT)
    → Priority: EMERGENCY
    → Assigned to: Security Team (pre-configured)
    → Regulatory flag: true (all security incidents are regulatory-relevant)
    → JobPlan: pre-populated from incident template
    → Mandatory QA review before closure
  → Correlation: incident WO linked to triggering event via correlationId
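The mapping above can be sketched as a pure function from a security event to an incident WO draft. Field names mirror the tree (type, source_type override, priority, assignee, regulatory flag, JobPlan template, QA review, correlation ID); the `SecurityEvent` and `IncidentWoDraft` shapes, the `SECURITY_TEAM` placeholder, and the template key format are assumptions about how gap G14 might be implemented.

```typescript
type Severity = "P1" | "P2" | "P3";

interface SecurityEvent { category: string; severity: Severity; correlationId: string; detectedAt: string }

// Hypothetical draft shape; the real WO schema may use different names.
interface IncidentWoDraft {
  type: "MANUAL";
  sourceType: "SECURITY_INCIDENT";
  priority: "EMERGENCY";
  assignedTo: string;
  regulatory: true;
  jobPlanTemplate: string;
  requiresQaReviewBeforeClosure: true;
  correlationId: string;
  title: string;
}

const SECURITY_TEAM = "SECURITY_TEAM"; // pre-configured assignee (placeholder)

// Only P1/P2 events open an incident WO; P3 events remain alerts.
function incidentWoFor(event: SecurityEvent): IncidentWoDraft | null {
  if (event.severity !== "P1" && event.severity !== "P2") return null;
  return {
    type: "MANUAL",
    sourceType: "SECURITY_INCIDENT",
    priority: "EMERGENCY",
    assignedTo: SECURITY_TEAM,
    regulatory: true,
    jobPlanTemplate: `incident/${event.category}`,
    requiresQaReviewBeforeClosure: true,
    correlationId: event.correlationId,
    title: `[${event.severity}] ${event.category} detected ${event.detectedAt}`,
  };
}
```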

8. Residual Risk Register

Risk ID	Description	STRIDE Category	Severity	Probability	Mitigation Status	Acceptance Criteria	Review Date
SR-001	Plaintext credentials in JobPlan JSONB	Disclosure	Critical	Medium	⚠️ Gap G01	Resolved when vault integration complete	Immediate
SR-002	No cryptographic hash binding on e-signatures	Repudiation	High	Low	⚠️ Gap G02	Resolved when hash function implemented	Immediate
SR-003	Audit trail hash chain not implemented	Tampering	High	Low	⚠️ Gap G03	Resolved when chain verification active	Immediate
SR-004	Agent messages unsigned	Tampering	Medium	Low	⚠️ Gap G04	Resolved when HMAC signing active	Next sprint
SR-005	No PHI scanner on WO fields	Disclosure	High	Medium	⚠️ Gap G09	Resolved when scanner operational	Next sprint
SR-006	Break-glass not implemented	Privilege	Medium	Low	⚠️ Gap G10	Resolved when break-glass system live	Next quarter
SR-007	AI model provider processes L4 data	Disclosure	Medium	Low	Contractual (BAA/DPA)	BAA/DPA signed with all model providers	Annually
SR-008	Single-region deployment (no DR)	Availability	Medium	Low	⚠️ DR gap	Resolved when multi-region deployed	Next quarter
SR-009	Insider threat (malicious admin)	All categories	Medium	Very Low	Admin audit trail + SOD + no admin approval	Accepted — residual risk with quarterly access review	Quarterly

Risk review cadence: Monthly for Critical/High, quarterly for Medium, annually for Low.
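The review cadence maps each severity to an interval in months. A small sketch computing the next review date from the last one, using UTC calendar months and clamping to the end of shorter months; the function name is hypothetical.

```typescript
type RiskSeverity = "Critical" | "High" | "Medium" | "Low";

// Months between reviews: monthly for Critical/High, quarterly for Medium, annually for Low.
const REVIEW_INTERVAL_MONTHS: Record<RiskSeverity, number> = { Critical: 1, High: 1, Medium: 3, Low: 12 };

function nextReviewDate(severity: RiskSeverity, lastReview: Date): Date {
  const next = new Date(lastReview.getTime());
  const day = next.getUTCDate();
  next.setUTCDate(1); // avoid overflow such as Jan 31 + 1 month landing in March
  next.setUTCMonth(next.getUTCMonth() + REVIEW_INTERVAL_MONTHS[severity]);
  const lastDayOfMonth = new Date(Date.UTC(next.getUTCFullYear(), next.getUTCMonth() + 1, 0)).getUTCDate();
  next.setUTCDate(Math.min(day, lastDayOfMonth));
  return next;
}
```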

Security is not a feature; it is a property of the system. Every new endpoint, every new agent capability, and every new data flow must pass through this STRIDE analysis and authorization framework before deployment. Within the gap closure series, prompts G01–G05, G09, and G10 address the 7 partial or design-only items identified in this threat model.
