System Design Document (SDD)

1. Purpose and Scope

Coditect Flow Platform (CFP) is a multi-tenant orchestration and governance platform built as a secure extension to coditect-core. CFP provides an enterprise AI-first workbench for designing, executing, and operating step-based workflows that power other applications. The platform must preserve Motia functionality while strengthening security, compliance, and operational resilience.

1.1 Repository Organization

The platform implementation is maintained in the dedicated coditect-step-dev-platform repository. coditect-core remains the parent intelligence framework and standards source, while the platform repo focuses on delivery, runtime, and UI.

2. Product Definition

CFP is the core step workflow and visualization platform for building other applications. It provides:

  • A visual workbench for workflow authoring and debugging.
  • A Rust runtime for deterministic step execution.
  • A multi-tenant control plane with strict RBAC and auditability.
  • Multi-LLM provider routing and governance.
  • An immutable audit chain for compliance.
  • A mobile-responsive web UI.
  • IDE access through GCP Cloud Workstations opened in a new browser tab.

3. Functional Requirements

The system MUST:

  1. Provide Motia feature parity: Steps, workflows, API endpoints, events, background jobs, cron, streams, typed state, plugins, and workbench.
  2. Support multi-tenant, multi-user, multi-team, and multi-project isolation with explicit scopes.
  3. Enforce RBAC with at least these roles: system admin, tenant admin, tenant viewer, team admin, team viewer, project admin, project editor, project viewer, auditor, support.
  4. Gate all dev and diagnostic endpoints behind explicit configuration flags and/or authenticated roles.
  5. Enforce CORS through origin allowlists, permitting credentialed requests only from approved origins.
  6. Provide configurable limits for request size, timeouts, and max concurrency at tenant and project scopes.
  7. Avoid CLI-argument payload transfer for large inputs by using object storage references.
  8. Support optional strict event input validation with a runtime configuration switch.
  9. Manage the subscription lifecycle correctly, with clear error handling and retries for adapter wiring.
  10. Emit structured logs, traces, and metrics for all control and data plane actions.
  11. Provide per-step latency and error budgets with SLO enforcement alerts.
  12. Provide immutable audit logging with hash-chained append-only storage and verification.
  13. Provide project and executive reporting: contracts, project status reports, and session logs.
  14. Provide a mobile-responsive UI for all public and authenticated pages.
  15. Provide IDE access by opening GCP Cloud Workstations in a separate browser tab.
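
As an illustration of requirement 5, the following is a minimal sketch of an origin allowlist check, not the platform's implementation. The origins and the `cors_headers` helper are hypothetical. It relies on one fact about the CORS protocol: a credentialed response cannot use a wildcard origin, so the approved origin is echoed back explicitly.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical allowlist; real origins would come from tenant configuration.
APPROVED_ORIGINS: FrozenSet[str] = frozenset(
    {"https://app.example.com", "https://admin.example.com"}
)


def cors_headers(
    origin: Optional[str], approved: FrozenSet[str] = APPROVED_ORIGINS
) -> Dict[str, str]:
    """Return CORS response headers, or an empty dict for unapproved origins."""
    if origin is None or origin not in approved:
        # No CORS headers at all: the browser blocks the cross-origin read.
        return {}
    return {
        # Echo the exact origin; "*" is not valid for credentialed requests.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response varies by requesting origin.
        "Vary": "Origin",
    }
```

Matching on the exact origin string, rather than on a prefix or suffix, avoids accidentally approving look-alike hosts.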

4. Non-Functional Requirements

The system MUST:

  1. Achieve 99.9% monthly availability for both the control plane and the data plane.
  2. Maintain p95 API latency below 300 ms for lightweight endpoints.
  3. Maintain p95 event processing below 2 seconds for standard workloads.
  4. Scale to at least 10,000 concurrent workbench websocket connections per cluster.
  5. Provide deterministic execution guarantees for step retry and idempotency.
  6. Ensure data residency by routing tenant data to approved regions only.
  7. Use encryption at rest and in transit for all sensitive data and payloads.
  8. Provide tamper-evident audit verification with periodic external anchoring.
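
To make the availability target concrete, the arithmetic can be sketched as below. The 30-day month is an assumption; the document does not define the measurement window, and `downtime_budget_minutes` is a hypothetical helper.

```python
def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of allowed downtime for a given availability over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in an assumed 30-day month
    return total_minutes * (1.0 - availability)


# 99.9% over 30 days leaves roughly 43.2 minutes of downtime.
budget = round(downtime_budget_minutes(0.999), 1)
```

Under this assumption, each plane may be unavailable for about 43 minutes per month before the target is missed.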

5. Personas and Roles

  • CODITECT System Admin: global governance and platform policy.
  • Tenant Admin: tenant configuration, user management, compliance.
  • Tenant Viewer: read-only tenant visibility.
  • Team Admin: team membership and project access control.
  • Team Viewer: read-only team visibility.
  • Project Admin: workflow and environment management.
  • Project Editor: workflow editing and deployment.
  • Project Viewer: read-only project visibility.
  • Auditor: audit and compliance reporting.
  • Support: limited runtime visibility and diagnostic access.
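
A scoped role check for the project-level roles might look like the sketch below. The action names and the `Grant` and `is_allowed` names are hypothetical; the document lists the roles but does not specify their permissions, so the mapping only mirrors the one-line descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

# Hypothetical action sets derived from the role descriptions above.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "project_admin": {"read", "edit", "deploy", "manage"},
    "project_editor": {"read", "edit", "deploy"},
    "project_viewer": {"read"},
}


@dataclass(frozen=True)
class Grant:
    """A role held by a user within one explicit project scope."""
    role: str
    project_id: str


def is_allowed(grants: Iterable[Grant], project_id: str, action: str) -> bool:
    """True if any grant for this project includes the requested action."""
    return any(
        g.project_id == project_id and action in ROLE_ACTIONS.get(g.role, set())
        for g in grants
    )
```

Unknown roles map to an empty action set, so the check denies by default.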

6. Architecture Overview

6.1 Control Plane

  • Identity and access management (OIDC/SAML).
  • Tenant, team, user, and project management.
  • Contracts, status reports, and session logs.
  • Quotas, rate limits, and usage metering.
  • Audit policy enforcement.

6.2 Data Plane

  • Rust runtime for step execution.
  • Event gateway for ingestion, validation, and routing.
  • Stream service for websocket fan-out and state updates.
  • Artifact service for payloads and step bundles.

6.3 Workbench

  • React Flow canvas for workflows.
  • Real-time logs, traces, metrics, and state views.
  • Role-aware dashboards and management pages.
  • Mobile-responsive UI.

6.4 IDE Integration

  • GCP Cloud Workstations are opened in a separate browser tab.
  • The platform never embeds a full IDE in the primary UI shell.
  • The platform must provide a secure, audited link-out experience.

7. Data Architecture

  • PostgreSQL is the system of record for control-plane entities.
  • Object storage is the system of record for large payloads and artifacts.
  • Redis/KeyDB is used for caching, rate limits, and ephemeral session state.
  • Audit chain metadata is stored in PostgreSQL with hash-chained block references.
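
The hash-chained audit storage described above can be illustrated with a minimal sketch. The entry layout, the all-zero genesis value, and the function names are assumptions; the document does not specify the chain format. Each entry's hash covers the previous entry's hash, so changing any earlier entry invalidates every later one.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # assumed starting value for an empty chain


def entry_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the current tail."""
    prev = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify(chain: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry fails the check."""
    prev = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Publishing the latest hash to an external system at intervals would let a verifier also detect truncation of the tail, which this local check cannot.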

8. Security Architecture (Fort Knox)

  • Zero-trust access at every boundary.
  • mTLS for all inter-service communication.
  • Short-lived access tokens with rotation and revocation.
  • Per-tenant encryption keys with KMS integration.
  • Immutable audit chain with hash verification.
  • Security event logging with alerting and automated response hooks.
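
To illustrate what "short-lived" means for access tokens, here is a minimal HMAC-signed token with an expiry, using only the Python standard library. The token format and function names are hypothetical. A real deployment would more likely rely on the identity provider's OIDC tokens, and revocation is not modeled here.

```python
import hashlib
import hmac
from typing import Optional


def _sign(message: str, secret: bytes) -> str:
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(subject: str, secret: bytes, ttl_seconds: int, now: float) -> str:
    """Create 'subject|expiry|signature' valid for ttl_seconds from `now`."""
    message = f"{subject}|{int(now) + ttl_seconds}"
    return f"{message}|{_sign(message, secret)}"


def verify_token(token: str, secret: bytes, now: float) -> Optional[str]:
    """Return the subject if the signature is valid and the token is unexpired."""
    try:
        subject, expires, signature = token.rsplit("|", 2)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{subject}|{expires}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        return None
    if now >= int(expires):
        return None  # expired
    return subject
```

Rotating the signing secret invalidates every outstanding token at once, which is one coarse way to bound exposure between rotations.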

9. Compliance and Data Residency

  • Tenant data is pinned to an approved region with routing enforcement.
  • Retention policies are configured per tenant with legal hold support.
  • Audit logs are exportable with a verified hash chain.
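
Region pinning with routing enforcement could be sketched as follows. The tenant identifiers, region names, endpoint pattern, and `ResidencyError` are all hypothetical; the point is that a request for any region other than the tenant's pinned one is rejected rather than silently rerouted.

```python
from typing import Dict, Optional


class ResidencyError(Exception):
    """Raised when a request would place tenant data outside its pinned region."""


# Hypothetical pinning table; real values would live in the control plane.
TENANT_REGIONS: Dict[str, str] = {
    "tenant-a": "europe-west1",
    "tenant-b": "us-central1",
}


def storage_endpoint(tenant_id: str, requested_region: Optional[str] = None) -> str:
    """Resolve the storage endpoint for a tenant, enforcing its pinned region."""
    pinned = TENANT_REGIONS.get(tenant_id)
    if pinned is None:
        raise ResidencyError(f"no approved region for tenant {tenant_id!r}")
    if requested_region is not None and requested_region != pinned:
        raise ResidencyError(
            f"tenant {tenant_id!r} is pinned to {pinned!r}, not {requested_region!r}"
        )
    # Placeholder hostname pattern for illustration only.
    return f"https://storage.{pinned}.example.internal"
```

Failing closed for unknown tenants keeps a misconfiguration from defaulting data into an unapproved region.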

10. Observability and SLOs

  • Traces via OpenTelemetry for every request and step.
  • Structured logs with tenant, project, and trace identifiers.
  • Metrics for latency, error rate, queue depth, and concurrency.
  • Per-step error and latency budgets with alert thresholds.
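
A per-step budget check might be evaluated as in the sketch below. The `StepBudget` fields, the alert labels, and the nearest-rank percentile method are assumptions; the document does not state how budgets are expressed or how percentiles are computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StepBudget:
    """Hypothetical per-step thresholds."""
    p95_latency_ms: float
    max_error_rate: float


def p95(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (samples must be non-empty)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def budget_alerts(
    latencies_ms: Sequence[float], errors: int, total: int, budget: StepBudget
) -> List[str]:
    """Return the names of any budgets the step has exceeded in this window."""
    alerts: List[str] = []
    if latencies_ms and p95(latencies_ms) > budget.p95_latency_ms:
        alerts.append("latency")
    if total > 0 and errors / total > budget.max_error_rate:
        alerts.append("error_rate")
    return alerts
```

Evaluating both budgets over the same window keeps latency and error alerts comparable for a given step.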

11. Testing and Scalability

  • Load tests and soak tests for runtime throughput and control plane APIs.
  • Failure injection for queue adapters and runtime execution.
  • Concurrency and burst testing for websocket and event pipelines.

12. Out of Scope

  • Embedded IDE inside the primary platform UI.
  • Anonymous or public execution without authentication.

13. Explicit Decisions

  • IDE access uses GCP Cloud Workstations in a separate browser tab.
  • Mobile responsiveness is a strict requirement for all UI surfaces.
  • The platform is built as a secure extension to coditect-core and reuses coditect-core intelligence services.