Technical Design Document (TDD)

1. Component Topology

1.1 Control Plane Services

  • tenant-service: tenant lifecycle, region and residency policy.
  • identity-service: users, teams, orgs, RBAC, service accounts.
  • project-service: projects, environments, flows, versions.
  • policy-service: policy definitions, enforcement hooks, validation modes.
  • audit-service: immutable audit chain, verification, export.
  • reporting-service: contracts, status reports, session logs.
  • usage-service: quotas, rate limits, metering.

1.2 Data Plane Services

  • runtime-service (Rust): step execution and orchestration.
  • event-gateway: ingest, validate, route to runtime.
  • stream-service: websocket state and log fan-out.
  • artifact-service: payload storage and retrieval.

1.3 Workbench

  • React Flow UI for workflow authoring and runtime inspection.
  • Mobile-responsive layout with reduced panels and adaptive nav.

1.4 IDE Integration

  • GCP Cloud Workstations are opened in a new browser tab.
  • Platform provides a secure, audited, signed access URL.
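A signed, expiring access URL could be sketched as follows. This is a minimal illustration using a generic HMAC-SHA256 signature; it is not the GCP Cloud Workstations access mechanism. The function names, query parameter names (user, expires, sig), and the shared secret are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(base_url: str, user_id: str, expires: int, secret: bytes) -> str:
    # Sign the URL, the user, and the expiry together so none can be swapped.
    message = f"{base_url}|{user_id}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_workstation_url(base_url: str, user_id: str, secret: bytes,
                         ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Return base_url with user, expiry, and signature query parameters."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_s)
    sig = _signature(base_url, user_id, expires, secret)
    return f"{base_url}?{urlencode({'user': user_id, 'expires': expires, 'sig': sig})}"


def verify_workstation_url(url: str, secret: bytes,
                           now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        user_id = params["user"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, user_id, expires, secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig, expected)
```

In a real deployment the issue and verify events would also be appended to the audit chain, and the secret would come from a key management service rather than application code.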

1.5 Repository Organization

  • coditect-step-dev-platform hosts the platform runtime, UI, and rewrite documentation.
  • coditect-core remains the parent intelligence framework and standards source.

2. Runtime Execution Flow

  1. Event arrives at event-gateway.
  2. Authentication (AuthN) and RBAC are enforced at tenant and project scopes.
  3. Payload validation applies the configured validation_mode (strict or permissive).
  4. Event is routed to runtime-service.
  5. Runtime loads workflow graph and executes steps.
  6. State writes and outputs are emitted to streams and storage.
  7. Observability events are emitted to logs, traces, and metrics.
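The flow above can be sketched as a single handler that takes its collaborators as callables. This is an illustrative outline only: the Event and Outcome types, the handle_event function, and the injected is_authorized, validate, load_flow, and emit callables are hypothetical names, not part of the platform API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    tenant_id: str
    project_id: str
    topic: str
    payload: dict
    trace_id: str


@dataclass
class Outcome:
    accepted: bool
    reason: str = ""
    outputs: List[dict] = field(default_factory=list)


def handle_event(
    event: Event,
    is_authorized: Callable[[str, str], bool],
    validate: Callable[[dict], List[str]],
    load_flow: Callable[[str, str], List[Callable[[dict], dict]]],
    emit: Callable[[str, str, str], None],
) -> Outcome:
    # Step 2: enforce access at tenant and project scope.
    if not is_authorized(event.tenant_id, event.project_id):
        emit("rejected", event.trace_id, "unauthorized")
        return Outcome(False, "unauthorized")

    # Step 3: validate the payload; an empty list means no blocking errors.
    errors = validate(event.payload)
    if errors:
        emit("rejected", event.trace_id, "; ".join(errors))
        return Outcome(False, "invalid payload")

    # Steps 4-5: load the workflow for this topic and run its steps in order.
    state = event.payload
    outputs: List[dict] = []
    for index, step in enumerate(load_flow(event.project_id, event.topic)):
        state = step(state)
        outputs.append(state)
        # Steps 6-7: emit one record per completed step.
        emit("step_completed", event.trace_id, f"step {index}")

    return Outcome(True, outputs=outputs)
```

Passing the collaborators in as callables keeps the sketch testable without any of the real services; in the platform these calls would cross service boundaries.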

3. API Surfaces

3.1 REST (Control Plane)

  • /tenants, /teams, /users, /projects, /roles
  • /contracts, /reports, /session-logs
  • /audit/verify, /audit/export
  • /usage/quotas, /usage/limits
  • /workstations (list and access link generation)

3.2 gRPC (Runtime and Internal)

  • Runtime.ExecuteStep
  • Runtime.EmitEvent
  • Runtime.StreamState
  • Artifacts.PutPayload, Artifacts.GetPayload
  • Audit.AppendBlock, Audit.VerifyChain

4. Data Model (High-Level)

  • tenants: id, name, region, status, kms_key_id
  • teams: id, tenant_id, name, lead_user_id
  • users: id, tenant_id, email, role, status, mfa_enabled
  • roles: id, name, description
  • role_bindings: user_id, role_id, scope
  • projects: id, tenant_id, team_id, name, environment
  • flows: id, project_id, name, version
  • steps: id, flow_id, name, type, config_json
  • executions: id, step_id, trace_id, status, latency_ms
  • events: id, topic, payload_ref, trace_id
  • contracts: id, tenant_id, status, value, dates
  • status_reports: id, project_id, status, summary
  • session_logs: id, project_id, visibility, content
  • workstations: id, tenant_id, user_id, status, region, gcp_resource
  • audit_blocks: id, hash, prev_hash, payload_ref, created_at
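As an illustration of how role_bindings might be evaluated, the sketch below resolves a user's roles at a requested scope. The scope string format (a slash-separated path such as "tenant/t1/project/p1") is an assumption; the document does not define how scope is encoded, and the RoleBinding, scope_covers, and roles_for names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class RoleBinding:
    user_id: str
    role_id: str
    scope: str  # assumed format: "tenant/<id>" or "tenant/<id>/project/<id>"


def scope_covers(granted: str, requested: str) -> bool:
    """A binding covers its own scope and anything nested beneath it."""
    # The trailing "/" prevents "tenant/t1" from matching "tenant/t10".
    return requested == granted or requested.startswith(granted + "/")


def roles_for(bindings: Iterable[RoleBinding], user_id: str,
              requested_scope: str) -> Set[str]:
    """Collect every role the user holds at or above the requested scope."""
    return {
        b.role_id
        for b in bindings
        if b.user_id == user_id and scope_covers(b.scope, requested_scope)
    }
```

Under this assumption a tenant-level binding is inherited by every project in that tenant, while a project-level binding grants nothing at tenant level.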

5. Security and Policy Enforcement

  • mTLS between all services.
  • OIDC/SAML with short-lived tokens.
  • CORS allowlist with credentials only for approved origins.
  • Dev and diagnostic endpoints require explicit config and RBAC.
  • Immutable audit chain with hash verification for every privileged action.
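A hash-linked audit chain of the kind described above can be sketched with the audit_blocks fields from the data model (id, hash, prev_hash, payload_ref, created_at). The choice of SHA-256, the all-zero genesis value, and the field serialization are assumptions for illustration, as are the function names.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS_HASH = "0" * 64  # assumed prev_hash for the first block


@dataclass(frozen=True)
class AuditBlock:
    id: int
    prev_hash: str
    payload_ref: str
    created_at: str
    hash: str


def block_hash(block_id: int, prev_hash: str, payload_ref: str,
               created_at: str) -> str:
    # Newline-joined fields; a production format would need a canonical encoding.
    material = "\n".join([str(block_id), prev_hash, payload_ref, created_at])
    return hashlib.sha256(material.encode()).hexdigest()


def append_block(chain: List[AuditBlock], payload_ref: str,
                 created_at: str) -> List[AuditBlock]:
    """Return a new chain with one block linked to the current tail."""
    prev_hash = chain[-1].hash if chain else GENESIS_HASH
    block_id = len(chain) + 1
    digest = block_hash(block_id, prev_hash, payload_ref, created_at)
    return chain + [AuditBlock(block_id, prev_hash, payload_ref, created_at, digest)]


def verify_chain(chain: List[AuditBlock]) -> bool:
    """Check every link and every block hash from the genesis value forward."""
    expected_prev = GENESIS_HASH
    for block in chain:
        if block.prev_hash != expected_prev:
            return False
        recomputed = block_hash(block.id, block.prev_hash,
                                block.payload_ref, block.created_at)
        if block.hash != recomputed:
            return False
        expected_prev = block.hash
    return True
```

Because each hash covers the previous hash, altering any stored block invalidates that block and breaks the link from its successor, which is what Audit.VerifyChain would be expected to detect.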

6. Limits and Concurrency

  • Tenant-wide concurrency limits and per-project execution pools.
  • Request size limits for API and event ingest.
  • Timeouts for step execution, event gateway, and streaming.
  • Large payloads are passed as object storage references, not as CLI arguments.
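One way to combine a tenant-wide limit, per-project pools, and a step timeout is sketched below with asyncio semaphores. The ExecutionLimiter class and its parameters are hypothetical, and the runtime itself is written in Rust, so this only illustrates the shape of the policy.

```python
import asyncio
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")


class ExecutionLimiter:
    """Caps concurrent steps per tenant and per project (illustrative only)."""

    def __init__(self, tenant_limit: int, project_limit: int) -> None:
        self._tenant = asyncio.Semaphore(tenant_limit)
        self._project_limit = project_limit
        self._projects: Dict[str, asyncio.Semaphore] = {}

    def _pool(self, project_id: str) -> asyncio.Semaphore:
        # Create each project's pool lazily on first use.
        if project_id not in self._projects:
            self._projects[project_id] = asyncio.Semaphore(self._project_limit)
        return self._projects[project_id]

    async def run(self, project_id: str,
                  step: Callable[[], Awaitable[T]], timeout_s: float) -> T:
        # Take the project slot first so a saturated project queues on its
        # own pool instead of holding tenant-wide slots while it waits.
        async with self._pool(project_id), self._tenant:
            # wait_for cancels the step and raises on timeout.
            return await asyncio.wait_for(step(), timeout=timeout_s)
```

The limiter should be constructed inside the running event loop. A timed-out step releases both slots, because the semaphores are released when the async with block exits.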

7. Event Validation Modes

  • validation_mode = strict | permissive
  • Strict mode rejects unknown fields and invalid schemas.
  • Permissive mode logs violations and continues.
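The difference between the two modes can be shown with a small validator. This is a sketch under simplifying assumptions: the schema is a plain mapping of field name to Python type rather than a real schema language, and ValidationResult and validate_event are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidationResult:
    accepted: bool
    violations: List[str] = field(default_factory=list)


def validate_event(payload: dict, schema: Dict[str, type],
                   validation_mode: str) -> ValidationResult:
    """Check payload against schema; the mode decides whether violations reject."""
    if validation_mode not in ("strict", "permissive"):
        raise ValueError(f"unknown validation_mode: {validation_mode!r}")

    violations: List[str] = []
    for name in payload:
        if name not in schema:
            violations.append(f"unknown field: {name}")
    for name, expected in schema.items():
        if name not in payload:
            violations.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            violations.append(f"invalid type for {name}")

    # Strict rejects on any violation; permissive records them and continues.
    accepted = not violations or validation_mode == "permissive"
    return ValidationResult(accepted, violations)
```

In permissive mode the caller is expected to log the returned violations, so both modes produce the same list and differ only in the accepted flag.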

8. Subscription Lifecycle and Adapter Hardening

  • Adapter wiring must handle errors with retries and clear failure states.
  • Subscriptions must be idempotent and auditable.
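The two requirements above could be met along the following lines: a registry keyed by subscription, bounded retries with exponential backoff, an explicit "failed" state, and an audit trail of every attempt. The SubscriptionRegistry class, the state names, and the retry defaults are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class SubscriptionRegistry:
    """Tracks subscription state and records every transition (illustrative)."""

    def __init__(self) -> None:
        self.states: Dict[str, str] = {}          # key -> "active" | "failed"
        self.audit: List[Tuple[str, str]] = []    # (key, what happened)

    def subscribe(self, key: str, connect: Callable[[], None],
                  max_attempts: int = 3, base_delay_s: float = 0.1,
                  sleep: Callable[[float], None] = time.sleep) -> str:
        # Idempotency: repeating an active subscription is a recorded no-op.
        if self.states.get(key) == "active":
            self.audit.append((key, "duplicate_ignored"))
            return "active"

        for attempt in range(1, max_attempts + 1):
            try:
                connect()
            except Exception as exc:
                self.audit.append((key, f"attempt_{attempt}_failed: {exc}"))
                if attempt < max_attempts:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    sleep(base_delay_s * 2 ** (attempt - 1))
            else:
                self.states[key] = "active"
                self.audit.append((key, "active"))
                return "active"

        # Retries exhausted: surface a clear terminal state to the caller.
        self.states[key] = "failed"
        self.audit.append((key, "failed"))
        return "failed"
```

A subscription left in the "failed" state can be retried by calling subscribe again, since only the "active" state short-circuits.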

9. Observability

  • OpenTelemetry traces across gateway, runtime, and workbench APIs.
  • Metrics: latency, error rate, queue depth, concurrency, saturation.
  • Logs: structured JSON with tenant and trace identifiers.
  • SLO alerts per step, per project, and per tenant.
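Structured JSON logs carrying tenant and trace identifiers can be produced with a small formatter on top of Python's standard logging module. The JsonFormatter class and the exact key names (tenant_id, trace_id) are assumptions; the document only requires that both identifiers appear in each entry.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Identifiers are attached by callers through the `extra` argument.
            "tenant_id": getattr(record, "tenant_id", None),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as build_logger("runtime").info("step completed", extra={"tenant_id": "t1", "trace_id": "abc123"}) then emits a single JSON line that can be filtered by tenant or joined to a trace.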

10. Testing

  • Contract tests for API boundaries.
  • Load tests for runtime and gateway.
  • Soak tests for long-running streams.
  • Failure injection for event adapters and queue backpressure.

11. Deployment Model

  • Separate scaling domains for control plane and data plane.
  • Multi-region clusters with residency enforcement.
  • Blue/green deployments for runtime upgrades.

12. Mobile Responsiveness

  • All UI layouts must adapt to phone and tablet breakpoints.
  • Tables must support horizontal scrolling on small screens.
  • Primary actions must remain accessible without hover-only controls.