
Work Order QMS Module — Testing & Validation Strategy

Classification: Internal — Quality Engineering
Date: 2026-02-13
Artifact: 65 of WO System Series
Prompt Section: v8.0 §9 — Testing & Validation Strategy


1. Test Pyramid

1.1 Pyramid Definition

                   ╱╲
                  ╱  ╲   Compliance Validation (IQ/OQ/PQ)
                 ╱ 3% ╲  Full regulatory workflow evidence
                ╱──────╲
               ╱        ╲    E2E Tests
              ╱    5%    ╲   Multi-container, browser-driven
             ╱────────────╲
            ╱              ╲    Contract Tests
           ╱       8%       ╲   API contracts, message schemas
          ╱──────────────────╲
         ╱                    ╲    Integration Tests
        ╱         19%          ╲   Database, NATS, Vault, external
       ╱────────────────────────╲
      ╱                          ╲    Unit Tests
     ╱            65%             ╲   Pure logic, deterministic, fast
    ╱──────────────────────────────╲

1.2 Layer Details

| Layer | Scope | Target Count | Speed | Run When | Tooling | Compliance Role |
|---|---|---|---|---|---|---|
| Unit | Single function, class, or module | ~650 | <10ms each | Every commit (pre-push hook) | Vitest (TS), pytest (Python) | Logic correctness evidence |
| Integration | Component + real dependency | ~190 | <2s each | Every PR | Vitest + Testcontainers, pytest + Testcontainers | Data integrity, query correctness |
| Contract | API endpoint shape, message schema | ~80 | <500ms each | Every PR | Pact (consumer-driven), JSON Schema validation | Interface compliance |
| E2E | Full workflow, multi-container | ~50 | <30s each | Pre-merge + nightly | Playwright (browser), supertest (API) | Workflow correctness |
| Compliance | Regulatory evidence generation | ~30 | <60s each | Pre-release + quarterly | Custom validation harness | IQ/OQ/PQ evidence |
| Total | | ~1,000 | | | | |

1.3 Unit Test Coverage Targets

| Component | Coverage Target | Critical Paths (100% required) |
|---|---|---|
| State machine guards (T1–T8) | 100% | All guard functions, all transition paths |
| Model Router | 95% | All routing rules, all fallback paths |
| RBAC permission checks | 100% | All role × permission combinations |
| SOD enforcement | 100% | All conflict detection rules |
| DAG cycle detection | 95% | Kahn's algorithm, all edge cases |
| Optimistic locking | 95% | Version check, conflict detection, retry logic |
| Audit trail generation | 100% | All entity types × all action types |
| Hash chain computation | 100% | Hash generation, chain verification, break detection |
| Token budget controller | 95% | Budget allocation, threshold enforcement, hard stop |
| Circuit breaker | 95% | State transitions (closed → open → half-open), recovery |
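The DAG cycle-detection row above names Kahn's algorithm explicitly. As a minimal sketch of the logic the unit suite exercises (the function name and types here are illustrative, not the module's actual API):

```typescript
// Hypothetical sketch of the cycle detector behind the 95% target.
// Kahn's algorithm: repeatedly remove zero-in-degree nodes; any leftovers
// after the topological pass imply a dependency cycle.
type Edge = [from: string, to: string];

function hasCycle(nodes: string[], edges: Edge[]): boolean {
  const inDegree = new Map<string, number>(nodes.map((n) => [n, 0]));
  const adjacency = new Map<string, string[]>(nodes.map((n) => [n, []]));
  for (const [from, to] of edges) {
    adjacency.get(from)!.push(to);
    inDegree.set(to, (inDegree.get(to) ?? 0) + 1);
  }
  // Seed the queue with nodes that have no incoming dependencies.
  const queue = nodes.filter((n) => inDegree.get(n) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const node = queue.shift()!;
    visited++;
    for (const next of adjacency.get(node) ?? []) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // If the topological pass consumed every node, the graph is acyclic.
  return visited !== nodes.length;
}
```

The "all edge cases" requirement then reduces to feeding this function empty graphs, self-loops, long chains, and diamond-shaped dependency patterns.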

1.4 Integration Test Scope

| Test Category | What's Tested | Real Dependencies | Mock Dependencies |
|---|---|---|---|
| Database operations | CRUD, RLS enforcement, triggers, migrations | PostgreSQL (Testcontainers) | None |
| Event bus | Publish, subscribe, ordering, backpressure | NATS (Testcontainers) | None |
| Audit trail immutability | Trigger blocks UPDATE/DELETE | PostgreSQL (Testcontainers) | None |
| Cross-tenant isolation | RLS prevents cross-tenant reads | PostgreSQL (Testcontainers) | None |
| Agent message contracts | Message serialization, validation, routing | NATS (Testcontainers) | AI model (deterministic stub) |
| Vault integration | Secret retrieval, rotation | Vault dev mode (Testcontainers) | None |
| API endpoint behavior | Request → response, auth, rate limiting | Express (in-process) | Database (seeded Testcontainer) |

1.5 Contract Tests

| Contract | Provider | Consumer | Schema Source |
|---|---|---|---|
| WO REST API | WO Service | Frontend, Agent Workers | OpenAPI 3.1 spec |
| Agent messages | Agent Workers | Orchestrator, Compliance Engine | TypeScript interfaces (26-agent-message-contracts.md) |
| Audit trail events | WO Service | Compliance Engine, SIEM connector | Event schema (JSON Schema) |
| Webhook payloads | WO Service | External subscribers | Webhook schema (JSON Schema) |
| Approval/signature flow | Signature Service | WO Service, Frontend | Signature API contract |

Contract test verification: provider publishes contract → consumer tests against contract → breaking changes detected before merge.
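The real gates use Pact and JSON Schema; the core idea they enforce can be illustrated with a minimal, hand-rolled shape check (the `woContract` fields below are illustrative, not the actual WO API schema):

```typescript
// Minimal illustration of a consumer-driven contract check: the consumer pins
// the response shape it depends on, and any provider change that breaks that
// shape fails before merge. Real suites use Pact / JSON Schema, not this.
type FieldSpec = { name: string; type: 'string' | 'number' | 'boolean' };

// Hypothetical subset of the WO REST API response contract.
const woContract: FieldSpec[] = [
  { name: 'id', type: 'string' },
  { name: 'summary', type: 'string' },
  { name: 'regulatory', type: 'boolean' },
];

function satisfiesContract(
  payload: Record<string, unknown>,
  contract: FieldSpec[],
): string[] {
  // Returns a list of violations; an empty array means the contract holds.
  const violations: string[] = [];
  for (const field of contract) {
    if (!(field.name in payload)) {
      violations.push(`missing field: ${field.name}`);
    } else if (typeof payload[field.name] !== field.type) {
      violations.push(`wrong type for ${field.name}: expected ${field.type}`);
    }
  }
  return violations;
}
```

A breaking provider change (renamed field, changed type) then surfaces as a non-empty violation list in the consumer's CI run rather than as a production incident.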


2. Test Data Management

2.1 Synthetic Data Strategy

Production data is never used in non-production environments. All test data is synthetic.

| Data Category | Generation Strategy | Regulatory Constraint | Tooling |
|---|---|---|---|
| Person records | Faker-generated names, emails, phone numbers | Must not match any real individual; must pass format validation | @faker-js/faker with deterministic seed |
| Work orders | Template-based generation covering all WO types, statuses, and regulatory flags | Must cover every state machine transition path | Custom seed script (seed-work-orders.ts) |
| Approval chains | Combinatorial generation of all role × decision paths | Must include SOD-compliant and SOD-violating scenarios | Custom generator with constraint solver |
| Audit trails | Generated from WO lifecycle execution (not fabricated independently) | Must be internally consistent — audit entries match WO transitions | Generated as side-effect of WO test execution |
| Asset/tool catalog | Realistic bioscience equipment names and categories | No patient/subject identifiers | Static fixtures in test/fixtures/assets.json |
| Multi-tenant data | Identical schema, different tenant_id values | Must test RLS isolation between tenants | Seed script creates 3 test tenants |
| Edge cases | Property-based generation (boundary values, null fields, max-length strings) | Must test validation boundaries | fast-check property-based testing |
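fast-check generates the edge cases automatically; the boundary values it is expected to surface can be sketched by hand. In the sketch below, `validateSummary` and `MAX_SUMMARY_LENGTH` are stand-ins for the real field validator and schema limit, not the actual WO validation code:

```typescript
// Hand-rolled sketch of the boundary-value checks fast-check automates.
// The validator, field, and limit are illustrative assumptions.
const MAX_SUMMARY_LENGTH = 255; // illustrative limit, not the real schema value

function validateSummary(value: string | null): boolean {
  return value !== null && value.length > 0 && value.length <= MAX_SUMMARY_LENGTH;
}

// The classic boundaries a property-based run should hit: null, empty string,
// exactly at the limit, and one character past it.
const boundaryCases: Array<[string | null, boolean]> = [
  [null, false],
  ['', false],
  ['x'.repeat(MAX_SUMMARY_LENGTH), true],
  ['x'.repeat(MAX_SUMMARY_LENGTH + 1), false],
];

function runBoundaryCases(): number {
  let failures = 0;
  for (const [input, expected] of boundaryCases) {
    if (validateSummary(input) !== expected) failures++;
  }
  return failures;
}
```

The advantage of the property-based tool over this fixed list is that it also explores values no one thought to enumerate, then shrinks any failure to a minimal counterexample.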

2.2 Seed Data Structure

test/
├── fixtures/
│   ├── assets.json                 # 50 bioscience assets
│   ├── tools.json                  # 30 tools with calibration data
│   ├── experiences.json            # 20 experience/certification types
│   ├── materials.json              # 15 material types
│   └── persons.json                # 25 persons across all roles
├── factories/
│   ├── work-order.factory.ts       # Creates WOs with configurable complexity
│   ├── approval.factory.ts         # Creates approval chains (valid and invalid)
│   ├── job-plan.factory.ts         # Creates job plans with requirements
│   └── tenant.factory.ts           # Creates isolated tenant contexts
├── scenarios/
│   ├── happy-path.scenario.ts      # WO: draft → completed → closed
│   ├── regulatory.scenario.ts      # Full Part 11 workflow with signatures
│   ├── master-linked.scenario.ts   # Master WO with 5 linked WOs + dependencies
│   ├── vendor.scenario.ts          # Vendor assignment → execution → evidence
│   ├── rejection.scenario.ts       # WO rejected → revision → re-approval
│   ├── cancellation.scenario.ts    # WO cancelled at various stages
│   └── concurrent.scenario.ts      # Optimistic locking conflict resolution
└── seeds/
    ├── seed-all.ts                 # Master seed script (deterministic)
    ├── seed-minimal.ts             # Minimum viable data for unit tests
    └── seed-performance.ts         # 10,000 WOs for load testing

2.3 Data Isolation Rules

| Rule | Implementation | Verification |
|---|---|---|
| No production data in test environments | Network isolation (test env cannot reach production DB) | Monthly audit of test DB contents |
| Deterministic seed data | All factories use seeded PRNG (faker.seed(42)) | Same seed → same data (verified in CI) |
| PHI-free certification | PHI scanner runs on test fixtures before commit | CI gate blocks commits with PHI patterns |
| Tenant isolation in tests | Each test suite creates its own tenant context | RLS verification test: cross-tenant query returns 0 rows |
| Test data cleanup | Transactional rollback per test (unit/integration); DB reset per suite (E2E) | Post-suite assertion: no orphaned test data |
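The "same seed → same data" guarantee rests on the PRNG beneath the factories: a given seed must always reproduce the same sequence. faker.seed(42) relies on exactly this property; the tiny self-contained mulberry32 generator below just demonstrates it in isolation (it is not faker's actual internal algorithm):

```typescript
// Demonstration of deterministic seeding — the property the CI check
// "same seed → same data" verifies. mulberry32 is a well-known tiny PRNG;
// it stands in here for whatever faker uses internally.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

function sequence(seed: number, n: number): number[] {
  const rng = mulberry32(seed);
  return Array.from({ length: n }, () => rng());
}
```

Because every factory draws from one seeded generator, a CI job can re-run the seed script twice and assert byte-identical output, which is what makes test failures reproducible across machines.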

3. Performance Testing

3.1 Performance Budgets

| Operation | P50 Target | P95 Target | P99 Target | Measurement |
|---|---|---|---|---|
| Create WO (single) | <50ms | <200ms | <500ms | API response time |
| State transition | <100ms | <300ms | <1s | API response time (excludes approval wait) |
| List WOs (paginated, 50/page) | <100ms | <300ms | <1s | API response time |
| Dependency graph query | <50ms | <150ms | <500ms | API response time |
| Critical path calculation | <200ms | <500ms | <2s | API response time |
| Audit trail query (paginated) | <100ms | <300ms | <1s | API response time |
| E-signature creation | <200ms | <500ms | <1s | API response time (includes hash computation) |
| Agent dispatch (task → first action) | <500ms | <1s | <3s | Orchestrator metric |
| Batch WO creation (100 WOs) | <2s | <5s | <10s | API response time |
| Dashboard render (50 WOs) | <1s | <2s | <5s | Frontend time-to-interactive |

3.2 Load Profiles

| Profile | Concurrent Users | WO Volume | Duration | Purpose |
|---|---|---|---|---|
| Baseline | 25 | 500 WOs pre-seeded, 50 new/hr | 1 hour | Establish performance baseline |
| Normal load | 100 | 5,000 WOs pre-seeded, 200 new/hr | 4 hours | Typical enterprise usage |
| Peak load | 500 | 10,000 WOs pre-seeded, 1,000 new/hr | 2 hours | Quarter-end audit preparation |
| Stress | 2,000 | 50,000 WOs pre-seeded, 5,000 new/hr | 30 min | Find breaking point |
| Soak | 100 (sustained) | Continuous creation + transition | 24 hours | Memory leak detection |
| Spike | 0 → 1,000 → 0 (in 60s) | Burst of 10,000 transitions | 10 min | Auto-scaling validation |

3.3 Performance Test Implementation

// k6 load test: WO lifecycle
import http from 'k6/http';
import { check, sleep } from 'k6';

// Target and credentials are injected via environment variables.
const BASE_URL = __ENV.BASE_URL;
const TOKEN = __ENV.TOKEN;

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp to 100 users
    { duration: '5m', target: 100 }, // Hold at 100
    { duration: '2m', target: 500 }, // Ramp to peak
    { duration: '5m', target: 500 }, // Hold at peak
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    'http_req_duration{endpoint:create_wo}': ['p(95)<200'],
    'http_req_duration{endpoint:transition}': ['p(95)<300'],
    'http_req_duration{endpoint:list_wos}': ['p(95)<300'],
    http_req_failed: ['rate<0.01'], // <1% error rate
  },
};

export default function () {
  // Create WO
  const createRes = http.post(
    `${BASE_URL}/api/v1/work-orders`,
    JSON.stringify({
      summary: `Perf test WO ${Date.now()}`,
      type: 'AUTOMATION',
      regulatory: false,
      systemCategoryId: 'test-category',
    }),
    {
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${TOKEN}` },
      tags: { endpoint: 'create_wo' },
    },
  );

  check(createRes, { 'WO created': (r) => r.status === 201 });

  const woId = createRes.json('id');

  // Transition: DRAFT → PLANNED
  const transRes = http.patch(
    `${BASE_URL}/api/v1/work-orders/${woId}/status`,
    JSON.stringify({ status: 'PLANNED', reason: 'Performance test' }),
    {
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${TOKEN}` },
      tags: { endpoint: 'transition' },
    },
  );

  check(transRes, { Transitioned: (r) => r.status === 200 });
  sleep(1);
}

3.4 Performance Test Schedule

| Test Type | Frequency | Environment | Gate | Owner |
|---|---|---|---|---|
| Baseline benchmark | Every release | Staging | Regression > 10% blocks release | Engineering |
| Normal load | Weekly (automated) | Staging | P95 violations create P3 ticket | SRE |
| Stress test | Monthly | Dedicated perf environment | Document capacity ceiling, update scaling guidelines | SRE |
| Soak test | Quarterly | Dedicated perf environment | Memory growth > 5%/hr blocks release | SRE |
| Spike test | Per release (if auto-scaling changed) | Staging | Recovery within 30s | SRE |

4. Chaos Engineering

4.1 Experiment Catalog

| Experiment | Method | Expected Behavior | Recovery Target | Compliance Impact |
|---|---|---|---|---|
| Kill Agent Worker | Terminate container (random agent) | Circuit breaker opens, task re-routed to healthy worker | < 30s | No audit trail gaps; in-flight WO transitions preserved |
| PostgreSQL primary failure | Force replica promotion | Reads continue immediately; writes resume after promotion | < 60s | Zero data loss (synchronous replication for L4); audit trail intact |
| NATS partition | Network partition between NATS nodes | Buffered delivery, no message loss; publishers see backpressure | < 120s | Audit events delayed but not lost; compliance evidence intact |
| Vault unavailable | Block vault endpoint | Cached credentials used (30min TTL); alert fired; new secret requests fail gracefully | < 10s (cache), < 5min (fresh) | Agent executions paused if fresh credentials needed |
| AI model API timeout | Inject 30s latency on model endpoint | Fallback to secondary model; timeout after 10s; circuit breaker opens | < 5s | Agent switches model tier; task continues with different model |
| Disk full on State Store | Fill ephemeral storage to 95% | Alert fires; oldest temp files purged; WAL archiving prioritized | < 30s | No data loss; new writes may temporarily fail |
| API Gateway crash | Kill API Gateway pod | Kubernetes restarts pod; load balancer routes to healthy instances | < 15s | Brief API unavailability; no data loss |
| DNS resolution failure | Drop DNS for model provider | Cached DNS used; circuit breaker opens for model calls; queued for retry | < 5s | Agent execution paused; human checkpoint triggered |
| Clock skew | Inject 5-minute clock offset on one node | Timestamps from skewed node detected via NTP monitoring; affected audit entries flagged | Detection < 60s | Compliance alert: server-side timestamps (§11.10(e)) potentially affected |
| Tenant isolation breach attempt | Inject cross-tenant RLS bypass attempt | Query returns 0 rows; security alert triggered; attacker session terminated | Immediate | P1 incident; forensic investigation initiated |

4.2 Chaos Test Schedule

| Cadence | Experiments | Environment | Stakeholder Notification |
|---|---|---|---|
| Weekly (automated) | Kill Agent Worker, API Gateway crash | Staging | SRE team (automated report) |
| Monthly | All single-component failures | Staging | Engineering + SRE review |
| Quarterly | Multi-component failures (e.g., DB + NATS simultaneously) | Dedicated chaos environment | Engineering + SRE + compliance review |
| Annually | Full DR exercise (see operational-readiness.md) | Production replica | All stakeholders + regulatory |

4.3 Chaos Engineering Maturity Model

| Level | Description | WO System Status |
|---|---|---|
| L0: No chaos | Only reactive incident response | Passed ✓ |
| L1: GameDay | Manual fault injection with team present | Current ← |
| L2: Automated | Scheduled automated fault injection in staging | Target (Q3 2026) |
| L3: Continuous | Continuous fault injection in production with auto-rollback | Future (Q1 2027) |
| L4: Self-healing | System detects and corrects faults autonomously | Future |

5. Compliance Validation Automation (IQ/OQ/PQ)

5.1 Qualification Overview

| Qualification | Purpose | Automation Level | Evidence Output | Trigger |
|---|---|---|---|---|
| IQ (Installation) | Verify correct installation of WO system | 100% automated | Deployment manifest, config verification, dependency check | Every deployment |
| OQ (Operational) | Verify correct operation under normal conditions | 95% automated | Test execution report, expected vs. actual, screenshot evidence | Every release |
| PQ (Performance) | Verify correct operation under real-world conditions | 80% automated | Performance report, SLA compliance, capacity analysis | Quarterly + major release |

5.2 IQ (Installation Qualification) Test Cases

| IQ Test | Verification | Pass Criteria | Evidence |
|---|---|---|---|
| IQ-001: Database schema | Compare deployed schema hash against expected | Hash match | Schema diff report (empty = pass) |
| IQ-002: RLS policies | Verify all 22 tables have active RLS | All policies active | pg_policies query result |
| IQ-003: Audit trail triggers | Verify trigger prevents UPDATE/DELETE on audit_trail | Trigger exists and active | Trigger test execution log |
| IQ-004: Service connectivity | Verify all containers can reach dependencies | All health checks pass | Health check response log |
| IQ-005: TLS configuration | Verify TLS 1.3 on all endpoints | No TLS < 1.3 accepted | SSL scan report (testssl.sh) |
| IQ-006: Vault connectivity | Verify service can retrieve test secret from vault | Secret retrieved successfully | Vault audit log entry |
| IQ-007: NATS connectivity | Verify publish/subscribe on test channel | Message round-trip < 100ms | NATS monitoring metrics |
| IQ-008: Configuration verification | Compare deployed config against approved config | Config hash match | Config diff report |
| IQ-009: Container image verification | Verify Cosign signature on all deployed images | All signatures valid | Sigstore verification log |
| IQ-010: Version verification | Verify deployed version matches release manifest | Version match | Version endpoint response |
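The hash-comparison pattern behind IQ-001 and IQ-008 can be sketched in a few lines with Node's crypto module. The normalization step, baseline handling, and function names are assumptions for illustration, not the actual qualification harness:

```typescript
// Sketch of an IQ-001-style check: hash the deployed schema dump and compare
// it against the approved baseline recorded at release approval.
import { createHash } from 'node:crypto';

function schemaHash(schemaDump: string): string {
  // Normalize whitespace so formatting-only differences don't fail IQ.
  // (Whether the real harness normalizes at all is an assumption.)
  const normalized = schemaDump.replace(/\s+/g, ' ').trim();
  return createHash('sha256').update(normalized).digest('hex');
}

function verifySchema(deployedDump: string, approvedHash: string): boolean {
  return schemaHash(deployedDump) === approvedHash;
}
```

The same compare-against-approved-hash shape covers config verification (IQ-008); only the input artifact changes.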

5.3 OQ (Operational Qualification) Test Cases

| OQ Test | Scenario | Pass Criteria | Evidence |
|---|---|---|---|
| OQ-001: WO lifecycle (happy path) | Create → Plan → Schedule → Execute → Review → Approve → Complete → Close | All transitions succeed; audit trail complete | Full audit trail export |
| OQ-002: Regulatory WO with dual approval | Create regulatory WO → SO approval → QA approval → completion | Both signatures captured with meaning; hash binding valid | Signature records + hash verification |
| OQ-003: SOD enforcement | Assignee attempts to approve own WO | Transition blocked; guard violation logged | Guard violation audit entry |
| OQ-004: Master/Linked WO hierarchy | Create master + 5 linked WOs with dependencies | DAG valid; critical path calculated; progress tracking accurate | Dependency graph export + progress report |
| OQ-005: Cross-tenant isolation | Tenant A queries for Tenant B's WOs | 0 results returned; access logged | Query result + audit log |
| OQ-006: Optimistic locking conflict | Two users update same WO simultaneously | One succeeds, one gets 409 Conflict with diff | API response logs |
| OQ-007: Agent WO creation | Agent creates and transitions WO within scope | WO created; agent attribution in audit trail; scope enforced | Audit trail with agent session ID |
| OQ-008: Vendor portal scoping | Vendor queries WOs outside assignment | 403 Forbidden; access logged | API response + audit log |
| OQ-009: Audit trail immutability | Attempt UPDATE on audit_trail via direct SQL | UPDATE blocked by trigger; error logged | Database error log |
| OQ-010: E-signature re-authentication | User signs after session timeout | Re-auth required; new attestation created; signature valid | Re-auth attestation + signature record |
| OQ-011: WO cancellation | Cancel WO at each stage | Correct transitions; proper audit trail; linked WOs notified | Audit trail + notification log |
| OQ-012: Resource matching | Create WO with experience requirements → agent matches person | Qualified person identified; assignment recommendation valid | Match result + audit log |
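The version check at the heart of OQ-006 fits in a few lines. This is a sketch of the pattern only — the result type, field names, and status handling are illustrative assumptions, not the WO service's actual API:

```typescript
// Sketch of the optimistic-locking check OQ-006 exercises: an update carrying
// a stale version number gets a 409 instead of silently overwriting.
type UpdateResult =
  | { status: 200; version: number }
  | { status: 409; currentVersion: number };

function applyUpdate(storedVersion: number, requestVersion: number): UpdateResult {
  if (requestVersion !== storedVersion) {
    // Stale read: the client must refetch, merge, and retry.
    return { status: 409, currentVersion: storedVersion };
  }
  // Accepted: bump the version so any concurrent writer now fails the check.
  return { status: 200, version: storedVersion + 1 };
}
```

With two concurrent writers both holding version 3, whichever commit lands first bumps the row to 4, and the second writer's stale `requestVersion: 3` is rejected — exactly the "one succeeds, one gets 409" pass criterion.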

5.4 PQ (Performance Qualification) Test Cases

| PQ Test | Scenario | Pass Criteria | Evidence |
|---|---|---|---|
| PQ-001: Normal load performance | 100 concurrent users, 200 WOs/hr for 4 hours | All P95 targets met; error rate < 0.1% | k6 report + Grafana dashboard snapshot |
| PQ-002: Peak load performance | 500 concurrent users, 1,000 WOs/hr for 2 hours | P95 < 500ms; error rate < 1%; auto-scaling triggers | k6 report + scaling event log |
| PQ-003: Audit trail volume | 100,000 audit entries; query performance | Paginated query P95 < 300ms | Query execution plan + timing |
| PQ-004: Concurrent approvals | 50 simultaneous approval signing events | All signatures valid; no hash collision; no race condition | Signature verification report |
| PQ-005: Agent execution under load | 20 concurrent agent WO executions | Token budget respected; circuit breakers functional; no agent starvation | Agent monitoring dashboard snapshot |

5.5 Evidence Package Generation

Every qualification run generates a standardized evidence package:

evidence/
├── iq/
│   ├── IQ-execution-report.json    # Machine-readable results
│   ├── IQ-execution-report.pdf     # Human-readable with screenshots
│   ├── IQ-traceability-matrix.csv  # Requirement → test case → result
│   └── IQ-signature-page.pdf       # E-signed by QA reviewer
├── oq/
│   ├── OQ-execution-report.json
│   ├── OQ-execution-report.pdf
│   ├── OQ-traceability-matrix.csv
│   ├── OQ-screenshots/             # UI workflow evidence
│   └── OQ-signature-page.pdf
├── pq/
│   ├── PQ-execution-report.json
│   ├── PQ-execution-report.pdf
│   ├── PQ-grafana-snapshots/       # Performance dashboard captures
│   ├── PQ-k6-reports/              # Load test detailed results
│   └── PQ-signature-page.pdf
└── summary/
    ├── qualification-summary.pdf   # Executive summary of all qualifications
    ├── deviation-report.pdf        # Any failures and disposition
    └── release-authorization.pdf   # Final sign-off for deployment

6. CI/CD Integration

6.1 Pipeline Gates

Developer Commit
        │
        ▼
Pre-Push Hook (local)
├── Lint (ESLint/Ruff)
├── Type check (tsc --noEmit)
└── Unit tests (affected files only)
        │
        ▼
Pull Request (CI)
├── Full unit test suite (~650 tests, <2min)
├── Integration tests (~190 tests, <5min)
├── Contract tests (~80 tests, <1min)
├── Security scan (Snyk/Trivy, block on critical)
├── License check (FOSSA)
├── PHI scan on test data (block if detected)
└── Coverage gate (overall ≥85%, critical paths = 100%)
        │
        ▼ (all pass → merge allowed)
Main Branch (CI)
├── Full test suite (unit + integration + contract)
├── E2E tests (~50 tests, <15min)
├── Container image build + Cosign signing
├── SBOM generation (CycloneDX)
└── Performance baseline (quick benchmark, 5min)
        │
        ▼ (all pass → deploy to staging)
Staging Deployment
├── IQ automation (full installation qualification)
├── OQ automation (operational qualification)
├── Smoke tests (critical path only, <2min)
└── Weekly: chaos experiments (automated)
        │
        ▼ (all pass + QA sign-off → production eligible)
Production Deployment
├── Blue-green deployment
├── IQ automation (production installation verification)
├── Smoke tests (critical path)
├── Canary metrics monitoring (15min)
└── Rollback trigger: error rate >1% or P95 >2s

6.2 Test Failure Response

| Failure Type | Automated Response | Human Response | SLA |
|---|---|---|---|
| Unit test failure | PR blocked | Developer fixes | Before merge |
| Integration test failure | PR blocked | Developer + reviewer investigate | Before merge |
| Contract test failure | PR blocked + alert to API owner | Breaking change review | Before merge |
| E2E test failure | Deploy blocked + alert to team | Team investigation | Within 4 hours |
| Security scan critical | PR blocked + P2 ticket created | Security team review | Within 24 hours |
| Performance regression >10% | Deploy blocked + alert to SRE | Performance investigation | Within 48 hours |
| IQ failure | Deploy rolled back | SRE + QA investigation | Within 1 hour |
| OQ failure | Release held | QA team investigation + deviation report | Within 24 hours |
| PQ failure | Release held + capacity review | SRE + QA + engineering review | Within 48 hours |

7. Coverage Requirements

| Metric | Target | Enforcement | Measurement |
|---|---|---|---|
| Line coverage (overall) | ≥85% | CI gate | Istanbul/NYC (TS), coverage.py (Python) |
| Branch coverage (overall) | ≥80% | CI gate | Same tooling |
| Critical path coverage | 100% | CI gate (stricter threshold for critical modules) | Module-level coverage config |
| State machine guard coverage | 100% | CI gate | Custom coverage report for guard functions |
| RBAC permission coverage | 100% (all role × permission × entity combinations) | CI gate | Custom matrix test generator |
| API endpoint coverage | 100% (every endpoint has ≥1 integration test) | CI gate | OpenAPI spec cross-reference |
| Agent message contract coverage | 100% (every message type has ≥1 contract test) | CI gate | Message schema cross-reference |
| Mutation testing score | ≥70% (stretch: 80%) | Weekly report (not a gate) | Stryker (TS), mutmut (Python) |
| Compliance test coverage | 100% of regulatory requirements in matrix | Pre-release gate | Traceability matrix cross-reference |
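The "stricter threshold for critical modules" row can be expressed directly in Vitest's coverage configuration, which accepts glob-keyed per-module thresholds alongside the global ones. A sketch — the module paths are assumed, not the repository's actual layout:

```typescript
// vitest.config.ts (sketch) — global gates plus 100% gates for critical paths.
// The src/** paths below are illustrative assumptions about the repo layout.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'istanbul',
      thresholds: {
        lines: 85,     // overall line coverage gate
        branches: 80,  // overall branch coverage gate
        // Stricter per-glob gates for the 100%-required modules:
        'src/state-machine/**': { lines: 100, branches: 100 },
        'src/rbac/**': { lines: 100, branches: 100 },
        'src/audit/**': { lines: 100, branches: 100 },
      },
    },
  },
});
```

Encoding the table in config this way means a drop below a critical-path threshold fails the CI run itself, rather than relying on a reviewer to read the coverage report.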

In regulated environments, tests aren't just quality checks — they're compliance evidence. Every test that runs in the IQ/OQ/PQ pipeline produces an auditable artifact. Every test failure is a potential deviation that requires documented disposition. The testing strategy is not separate from the compliance strategy — it IS the compliance strategy's execution arm.