Sprint 0: Foundations and Config (1 week)
- Goal: Establish configuration surface and basic scaffolding for hardening work.
- Backlog Items: A.9.1.1, A.9.1.2, A.9.1.3, A.9.1.4, A.9.1.6, A.9.1.9 (config only)
- Tasks:
- A.9.0.1 Add config types for diagnostics, CORS, limits, validation, and metrics in types/app-config-types.ts.
- A.9.0.2 Add default config values (new defaults module or config helper).
- A.9.0.3 Wire config into motia.ts and server.ts without changing behavior.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.0.4 Add typed config accessors for diagnostics, CORS, limits, validation, metrics.
- A.9.0.5 Document config flags in a short README section.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.0.6 Add config unit tests or type tests.
- Dependencies: A.9.0.1, A.9.0.2
- Exit Criteria:
- Core compiles with new config fields and defaults.
- No behavioral changes yet, only config surfaces.
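
One possible shape for the Sprint 0 config surface is sketched below. Every name here (HardeningConfig, HardeningOverrides, DEFAULT_HARDENING_CONFIG, resolveHardeningConfig, getSection) and every default value is a hypothetical placeholder, not the contents of types/app-config-types.ts; the sketch only shows how types, defaults, and typed accessors could fit together.

```typescript
// Hypothetical shapes for illustration; the real config types may differ.
interface HardeningConfig {
  diagnostics: { enabled: boolean; requireAuth: boolean };
  cors: { allowedOrigins: string[]; allowCredentials: boolean; allowPrivateNetwork: boolean };
  limits: { maxBodyBytes: number; maxConcurrentSteps: number; defaultStepTimeoutMs: number };
  validation: { strictEvents: boolean };
  metrics: { enabled: boolean };
}

// Callers may override any subset of fields within any section.
type HardeningOverrides = { [K in keyof HardeningConfig]?: Partial<HardeningConfig[K]> };

// Placeholder defaults (assumed values, not the project's real defaults).
const DEFAULT_HARDENING_CONFIG: HardeningConfig = {
  diagnostics: { enabled: true, requireAuth: false },
  cors: { allowedOrigins: [], allowCredentials: false, allowPrivateNetwork: false },
  limits: { maxBodyBytes: 1024 * 1024, maxConcurrentSteps: 16, defaultStepTimeoutMs: 30000 },
  validation: { strictEvents: false },
  metrics: { enabled: false },
};

// Merge user overrides over the defaults, one section at a time.
function resolveHardeningConfig(overrides: HardeningOverrides = {}): HardeningConfig {
  return {
    diagnostics: { ...DEFAULT_HARDENING_CONFIG.diagnostics, ...overrides.diagnostics },
    cors: { ...DEFAULT_HARDENING_CONFIG.cors, ...overrides.cors },
    limits: { ...DEFAULT_HARDENING_CONFIG.limits, ...overrides.limits },
    validation: { ...DEFAULT_HARDENING_CONFIG.validation, ...overrides.validation },
    metrics: { ...DEFAULT_HARDENING_CONFIG.metrics, ...overrides.metrics },
  };
}

// Typed accessor: the return type follows the requested section key.
function getSection<K extends keyof HardeningConfig>(
  config: HardeningConfig,
  key: K,
): HardeningConfig[K] {
  return config[key];
}
```

Merging per section keeps partially specified overrides from dropping sibling fields, which is what a type test under A.9.0.6 would likely want to pin down.
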
Sprint 1: Access Control and CORS (1 week)
- Goal: Close immediate security gaps.
- Backlog Items: A.9.1.1, A.9.1.2
- Tasks:
- A.9.1.1.1 Implement diagnostics guard middleware (enabled/disabled + optional auth).
- Dependencies: A.9.0.1, A.9.0.2, A.9.0.3
- A.9.1.1.2 Add role/permission check stubs (diagnostics read/write) for future RBAC integration.
- A.9.1.1.3 Gate flowsEndpoint, flowsConfigEndpoint, stepEndpoint, and analyticsEndpoint behind the diagnostics guard.
- A.9.1.1.4 Add tests for 404/403 behavior under each mode.
- A.9.1.2.1 Implement allowlist CORS middleware with credential-safe rules.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.2.2 Remove Access-Control-Allow-Private-Network by default; gate via config.
- A.9.1.2.3 Add preflight tests and Origin allowlist tests.
- Exit Criteria:
- All diagnostic endpoints return 404/403 when disabled or unauthorized.
- CORS no longer allows a wildcard origin (*) with credentials.
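
The credential-safe allowlist rule from A.9.1.2.1 and A.9.1.2.2 could be expressed as a pure header-computing function, sketched below. CorsPolicy and corsHeaders are hypothetical names, and the function is framework-agnostic on purpose: it assumes the real middleware would copy the returned headers onto the response.

```typescript
// Hypothetical policy shape; field names are assumptions for this sketch.
interface CorsPolicy {
  allowedOrigins: string[];
  allowCredentials: boolean;
  allowPrivateNetwork: boolean;
}

// Compute CORS response headers for a request's Origin header.
// An empty result means the origin is not allowed and no CORS headers are sent.
function corsHeaders(origin: string | undefined, policy: CorsPolicy): Record<string, string> {
  // A wildcard is only honored when credentials are off; browsers reject
  // "Access-Control-Allow-Origin: *" on credentialed requests.
  if (policy.allowedOrigins.includes("*") && !policy.allowCredentials) {
    return { "Access-Control-Allow-Origin": "*" };
  }
  if (!origin || !policy.allowedOrigins.includes(origin)) {
    return {};
  }
  const headers: Record<string, string> = {
    // Echo the exact allowlisted origin and tell caches the response varies by it.
    "Access-Control-Allow-Origin": origin,
    Vary: "Origin",
  };
  if (policy.allowCredentials) {
    headers["Access-Control-Allow-Credentials"] = "true";
  }
  // Opt-in only; a fuller version would send this solely on preflight
  // responses that carry Access-Control-Request-Private-Network.
  if (policy.allowPrivateNetwork) {
    headers["Access-Control-Allow-Private-Network"] = "true";
  }
  return headers;
}
```

Keeping the decision in a pure function makes the Origin allowlist tests in A.9.1.2.3 table-driven, with no server needed.
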
Sprint 2: Runtime Limits and Payload Transport (2 weeks)
- Goal: Enforce stability limits and eliminate CLI-arg payload risks.
- Backlog Items: A.9.1.3, A.9.1.4, A.9.1.5
- Tasks:
- A.9.1.3.1 Add request size limits to config with safe defaults.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.3.2 Apply limits to body-parser (json, urlencoded, text).
- A.9.1.3.3 Add tests for 413 responses on oversize payloads.
- A.9.1.4.1 Add max concurrency config and implement a semaphore in call-step-file.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.4.2 Add default step timeout and apply when per-step timeout missing.
- A.9.1.4.3 Add tests for timeout enforcement and concurrency backpressure.
- Dependencies: A.9.1.4.1, A.9.1.4.2
- A.9.1.5.1 Add payload size detection threshold.
- A.9.1.5.2 Implement temp-file or stdin payload transport in call-step-file.
- A.9.1.5.3 Update node/python/ruby runners to read payload from file or stdin.
- A.9.1.5.4 Add cross-language tests for large payload handling.
- Exit Criteria:
- No ARG_MAX failures under large payload tests.
- Limits are enforced and observable.
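
The concurrency cap and default timeout from A.9.1.4.1 and A.9.1.4.2 might look roughly like the sketch below. Semaphore and withTimeout are hypothetical helpers, not existing code in call-step-file. The timeout here only rejects the awaiting promise; it is assumed that the real implementation would also terminate the child process.

```typescript
// Minimal counting semaphore: at most `max` tasks hold a permit at once.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(max: number) {
    this.available = max;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the oldest waiter
    } else {
      this.available++;
    }
  }

  // Run a task under the cap, always releasing the permit afterwards.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Reject if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`step timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a stray timer behind
  }
}
```

A step invocation would then be wrapped as `semaphore.run(() => withTimeout(runStep(), stepTimeoutMs ?? defaultTimeoutMs))`, where runStep and both timeout values are stand-ins for whatever call-step-file actually exposes.
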
Sprint 3: Validation and Subscription Reliability (1 week)
- Goal: Deterministic execution and safer input handling.
- Backlog Items: A.9.1.6, A.9.1.7, A.9.1.8
- Tasks:
- A.9.1.6.1 Add validation.strictEvents config and defaults.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.6.2 Make validateEventInput throw or block when strict mode is enabled.
- A.9.1.6.3 Add tests for strict vs non-strict behavior.
- A.9.1.7.1 Replace forEach(async) with for...of or Promise.all for subscriptions.
- A.9.1.7.2 Ensure handlerMap is set only after subscriptions succeed.
- A.9.1.7.3 Add explicit logging for subscription failures.
- A.9.1.8.1 Wrap WS JSON.parse in try/catch.
- A.9.1.8.2 Add message schema validation and error responses.
- A.9.1.8.3 Add tests for malformed WS payloads.
- Exit Criteria:
- No async subscription races on startup.
- Invalid inputs do not crash handlers.
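
The subscription and WebSocket fixes in A.9.1.7 and A.9.1.8.1 can be illustrated with the sketch below. forEach(async) starts every callback and returns immediately, so errors go unobserved and startup can proceed before subscriptions exist; awaiting inside for...of avoids both. The Subscriber interface, subscribeAll, and parseWsMessage are hypothetical names, and the handlerMap here is a plain Map standing in for the project's own structure.

```typescript
type Handler = (data: unknown) => Promise<void>;

// Hypothetical adapter interface for this sketch.
interface Subscriber {
  subscribe(topic: string, handler: Handler): Promise<void>;
}

// Subscribe sequentially; record a handler only after its subscription succeeded.
// Returns the topics that failed so the caller can decide how to react.
async function subscribeAll(
  subscriber: Subscriber,
  subscriptions: Array<{ topic: string; handler: Handler }>,
  handlerMap: Map<string, Handler>,
  logError: (message: string, err: unknown) => void = console.error,
): Promise<string[]> {
  const failed: string[] = [];
  for (const { topic, handler } of subscriptions) {
    try {
      await subscriber.subscribe(topic, handler);
      handlerMap.set(topic, handler);
    } catch (err) {
      failed.push(topic);
      logError(`subscription failed for topic "${topic}"`, err);
    }
  }
  return failed;
}

type ParseResult = { ok: true; value: unknown } | { ok: false; error: string };

// Parse a raw WebSocket frame without letting malformed JSON throw into the handler.
function parseWsMessage(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

If subscriptions are independent and ordering does not matter, the loop could be swapped for Promise.all over the same try/catch body; the sequential form is shown because it makes failure logs deterministic.
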
Sprint 4: Observability and SLO Instrumentation (2 weeks)
- Goal: Add platform-grade metrics, traces, and structured logging.
- Backlog Items: A.9.1.9
- Tasks:
- A.9.1.9.1 Define metrics interface (counters, timers, gauges).
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.9.2 Add API metrics hooks (request count, error count, latency).
- A.9.1.9.3 Add step execution metrics (count, error, latency).
- A.9.1.9.4 Add queue and stream metrics as available.
- A.9.1.9.5 Add structured log format toggle with trace IDs.
- Dependencies: A.9.0.1, A.9.0.2
- A.9.1.9.6 Expose metrics endpoint when enabled.
- Dependencies: A.9.1.9.1, A.9.1.9.2
- A.9.1.9.7 Add tests for metrics increments and log shape.
- Dependencies: A.9.1.9.2, A.9.1.9.3, A.9.1.9.5
- Exit Criteria:
- SLO dashboards can be populated from emitted metrics.
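
A minimal sketch of the metrics interface from A.9.1.9.1 and the API hook from A.9.1.9.2 follows. The Metrics interface, InMemoryMetrics, recordRequest, and the metric name scheme are all assumptions for illustration; a real backend adapter would implement the same interface, and the in-memory version is mainly useful for the tests in A.9.1.9.7.

```typescript
// Hypothetical backend-agnostic metrics interface.
interface Metrics {
  increment(name: string, by?: number): void;
  gauge(name: string, value: number): void;
  timing(name: string, ms: number): void;
}

// In-memory implementation, handy for unit tests and local runs.
class InMemoryMetrics implements Metrics {
  readonly counters = new Map<string, number>();
  readonly gauges = new Map<string, number>();
  readonly timings = new Map<string, number[]>();

  increment(name: string, by: number = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  gauge(name: string, value: number): void {
    this.gauges.set(name, value);
  }

  timing(name: string, ms: number): void {
    const samples = this.timings.get(name) ?? [];
    samples.push(ms);
    this.timings.set(name, samples);
  }
}

// API hook: request count, error count (5xx only, by assumption), and latency.
function recordRequest(metrics: Metrics, route: string, status: number, durationMs: number): void {
  metrics.increment(`api.requests.${route}`);
  if (status >= 500) {
    metrics.increment(`api.errors.${route}`);
  }
  metrics.timing(`api.latency_ms.${route}`, durationMs);
}
```
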
Sprint 5: Scalability Proof (2 weeks)
- Goal: Demonstrate resilience and performance.
- Backlog Items: Test plan execution from platform-hardening/test-plan.md
- Tasks:
- A.9.5.1 Build API load test scripts with mixed GET/POST.
- A.9.5.2 Build event burst and fan-out tests.
- A.9.5.3 Build stream subscription concurrency tests.
- A.9.5.4 Run 24-hour soak test with scheduled spikes.
- Dependencies: A.9.5.1, A.9.5.2, A.9.5.3
- A.9.5.5 Implement Redis outage injection and verify recovery.
- A.9.5.6 Implement event adapter outage injection and verify recovery.
- A.9.5.7 Integrate load/soak tests into CI on schedule.
- Exit Criteria:
- SLO targets met under defined load profiles.
- Recovery behavior documented and verified.
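
Checking the "SLO targets met" exit criterion usually reduces to comparing a latency percentile and an error rate against thresholds. The sketch below shows one way to do that from collected samples using the nearest-rank percentile method. SloTarget, percentile, and meetsSlo are hypothetical names, and any target values passed in are placeholders rather than figures from the test plan.

```typescript
// Hypothetical SLO target shape; real targets would come from the test plan.
interface SloTarget {
  p95LatencyMs: number;
  maxErrorRate: number; // fraction of requests, e.g. 0.01 for 1%
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// True when both the p95 latency and the error rate are within target.
function meetsSlo(latenciesMs: number[], errorCount: number, target: SloTarget): boolean {
  const errorRate = errorCount / latenciesMs.length;
  return percentile(latenciesMs, 95) <= target.p95LatencyMs && errorRate <= target.maxErrorRate;
}
```
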
Dependencies
- RBAC enforcement design from platform-hardening/rbac.md.
- Config system changes in core types.
- Availability of a metrics backend for Sprint 4.
Risks
- Tightening defaults may require migration guidance.
- Performance regressions from added validation or logging.
Sequencing Notes
- Sprint 1 should start only after Sprint 0 config surface is merged.
- Sprint 2 and Sprint 3 can overlap if teams are available.
- Sprint 5 requires the observability metrics from Sprint 4 to measure SLOs.