
target architecture (e.g., Coditect‑style multi‑tenant agents, GPU needs, persistence model), a next step can be a concrete prompt scaffold for: “Design a Modal‑backed sandbox executor service for untrusted agent code (API + orchestration + limits + IAM).”

ARE MODAL SANDBOXES open source?

Modal Sandboxes themselves (the backend runtime and orchestration) are not open source; only the client SDKs (Python modal, JS/TS/Go libmodal) are open source under Apache‑2.0.1234

What is open vs closed

  • The Modal Python client (modal on PyPI, modal-client on GitHub) is Apache‑2.0 licensed open source.231
  • The JS/TS and Go SDKs (libmodal on GitHub) are also open source SDKs for interacting with Modal.54
  • The Modal Sandboxes product (gVisor‑based isolation, orchestration, scheduling, infra) is a proprietary managed service, not released as open source. Public materials describe it as a product you use via their API, with no backend repo exposed.678

Suggested next-step prompt scaffold

Design a Modal-backed sandbox executor microservice for Coditect that executes untrusted agent code using Modal Sandboxes. Define: (1) a multi-tenant API surface (create_sandbox, exec, snapshot, destroy) with per-tenant isolation and quotas, (2) orchestration flow for routing agent jobs to Modal, handling stdout/stderr/log streaming, and enforcing CPU/GPU, time, and memory limits, (3) an IAM model where Coditect issues short-lived tokens/roles that map tenants and projects to specific Modal apps, volumes, and network policies, and (4) a persistence model for ephemeral vs project-scoped workspaces (Modal volumes, object storage), including audit logging of all executions and artifacts. Use OpenAPI 3.1 for the external API, describe internal sequence diagrams, and propose a minimal RBAC schema for tenants, projects, and sandboxes.

If you want, this can be further specialized for “GPU-heavy codegen and test runs” vs “CPU-only quick tools,” or aligned with your existing Coditect RLM orchestrator API. 91011121314


Design a Modal-like sandbox executor microservice for Coditect that executes untrusted agent code using Modal-like Sandboxes. Define: (1) a multi-tenant API surface (create_sandbox, exec, snapshot, destroy) with per-tenant isolation and quotas, (2) orchestration flow for routing agent jobs to Modal, handling stdout/stderr/log streaming, and enforcing CPU/GPU, time, and memory limits, (3) an IAM model where Coditect issues short-lived tokens/roles that map tenants and projects to specific Modal-like apps, volumes, and network policies, and (4) a persistence model for ephemeral vs project-scoped workspaces (Modal-like volumes, object storage), including audit logging of all executions and artifacts. Use OpenAPI 3.1 for the external API, describe internal sequence diagrams, and propose a minimal RBAC schema for tenants, projects, and sandboxes.

Below is a comprehensive design document for a Modal-like sandbox executor service for Coditect. This architecture leverages patterns from Modal's Sandbox API while adding multi-tenant IAM, quota enforcement, persistence, and audit logging.


OpenAPI 3.1 Specification (Core Endpoints)

openapi: 3.1.0
info:
  title: Coditect Sandbox Executor API
  version: 1.0.0
  description: Multi-tenant sandbox executor for untrusted agent code

servers:
  - url: https://api.coditect.io/v1

security:
  - BearerAuth: []

paths:
  /sandboxes:
    post:
      operationId: createSandbox
      summary: Create a new sandbox
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSandboxRequest'
      responses:
        '201':
          description: Sandbox created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sandbox'
        '403':
          description: Quota exceeded or permission denied

  /sandboxes/{sandbox_id}/exec:
    post:
      operationId: execCommand
      summary: Execute command in sandbox
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ExecRequest'
      responses:
        '200':
          description: Execution started
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExecHandle'

  /sandboxes/{sandbox_id}/snapshot:
    post:
      operationId: snapshotFilesystem
      summary: Snapshot sandbox filesystem to image
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '201':
          description: Snapshot created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SnapshotResult'

  /sandboxes/{sandbox_id}:
    delete:
      operationId: destroySandbox
      summary: Terminate and destroy sandbox
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '204':
          description: Sandbox terminated

  /sandboxes/{sandbox_id}/streams:
    get:
      operationId: streamLogs
      summary: SSE stream of stdout/stderr
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '200':
          description: Event stream
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/LogEvent'

components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

  parameters:
    SandboxId:
      name: sandbox_id
      in: path
      required: true
      schema:
        type: string
        format: uuid

  schemas:
    CreateSandboxRequest:
      type: object
      required: [project_id]
      properties:
        project_id:
          type: string
          format: uuid
        name:
          type: string
          maxLength: 64
          pattern: '^[a-zA-Z0-9._-]+$'
        image:
          type: string
          description: Image reference or snapshot ID
        workdir:
          type: string
        env:
          type: object
          additionalProperties:
            type: string
        secrets:
          type: array
          items:
            type: string
          description: Secret references to inject
        volumes:
          type: object
          additionalProperties:
            type: string
          description: Mount path -> volume ID mapping
        limits:
          $ref: '#/components/schemas/ResourceLimits'
        network_policy:
          $ref: '#/components/schemas/NetworkPolicy'
        timeout:
          type: integer
          minimum: 1
          maximum: 86400
          default: 300
        idle_timeout:
          type: integer
          minimum: 30
          maximum: 3600

    ResourceLimits:
      type: object
      properties:
        cpu:
          oneOf:
            - type: number
            - type: array
              items:
                type: number
              minItems: 2
              maxItems: 2
        memory_mib:
          oneOf:
            - type: integer
            - type: array
              items:
                type: integer
              minItems: 2
              maxItems: 2
        gpu:
          type: string
          enum: [none, t4, a10g, a100, h100]

    NetworkPolicy:
      type: object
      properties:
        block_network:
          type: boolean
          default: false
        cidr_allowlist:
          type: array
          items:
            type: string
            format: cidr

    ExecRequest:
      type: object
      required: [command]
      properties:
        command:
          type: array
          items:
            type: string
          minItems: 1
        workdir:
          type: string
        env:
          type: object
          additionalProperties:
            type: string
        timeout:
          type: integer
        pty:
          type: boolean
          default: false

    Sandbox:
      type: object
      properties:
        id:
          type: string
          format: uuid
        tenant_id:
          type: string
        project_id:
          type: string
        name:
          type: string
        status:
          type: string
          enum: [creating, running, terminated, failed]
        created_at:
          type: string
          format: date-time
        tunnels:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/Tunnel'

    ExecHandle:
      type: object
      properties:
        exec_id:
          type: string
        sandbox_id:
          type: string
        stream_url:
          type: string
          format: uri

    SnapshotResult:
      type: object
      properties:
        snapshot_id:
          type: string
        image_ref:
          type: string
        size_bytes:
          type: integer

    LogEvent:
      type: object
      properties:
        stream:
          type: string
          enum: [stdout, stderr, system]
        data:
          type: string
        timestamp:
          type: string
          format: date-time

    Tunnel:
      type: object
      properties:
        port:
          type: integer
        url:
          type: string
          format: uri
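For concreteness, here is a hypothetical request body conforming to the CreateSandboxRequest schema above; every ID and value is illustrative, not taken from a real deployment:

```json
{
  "project_id": "11111111-1111-1111-1111-111111111111",
  "name": "codegen-worker-1",
  "image": "python:3.12-slim",
  "env": { "PYTHONUNBUFFERED": "1" },
  "secrets": ["hf-token"],
  "volumes": { "/workspace": "vol-22222222" },
  "limits": { "cpu": [0.5, 2.0], "memory_mib": [256, 1024], "gpu": "none" },
  "network_policy": { "block_network": false, "cidr_allowlist": ["10.0.0.0/8"] },
  "timeout": 600
}
```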

Orchestration Flow (sequence)

Participants, in call order: Agent (client) → API Gateway (IAM) → Orchestrator Service → Sandbox Pool Manager → Modal-like Backend.

1. Create: the agent sends POST /sandboxes; the gateway validates the JWT and extracts tenant/project/roles; the orchestrator issues a CreateSandboxCmd carrying quota context; the pool manager checks the tenant quota and reserves capacity; the backend runs Sandbox.create() with the image, limits, and network_policy and returns a sandbox_id, which flows back to the agent as 201 {sandbox}.
2. Exec: the agent sends POST /sandboxes/{id}/exec; the orchestrator forwards an ExecCmd; the backend runs sb.exec(cmd) and returns an exec handle; the agent receives 200 {exec_handle} with an exec_id and stream_url.
3. Stream: the agent opens GET /sandboxes/{id}/streams; the orchestrator subscribes to the backend's log stream and relays stdout/stderr lines to the agent over SSE.
4. Destroy: the agent sends DELETE /sandboxes/{id}; the orchestrator issues a TerminateCmd; the backend runs sb.terminate(); the pool manager releases the quota reservation; the agent receives 204.

Key orchestration points from Modal's patterns:

  • Sandbox.create() allocates a container with the specified image, volumes, CPU/GPU/memory limits, and network policy (block_network, cidr_allowlist).
  • Sandbox.exec() runs commands inside the sandbox, returning a handle for streaming stdout/stderr.
  • Sandbox.snapshot_filesystem() persists the current filesystem state as a reusable image.
  • Sandbox.terminate() cleans up resources.
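The flow above can be sketched end to end with an in-memory stand-in for the backend. Everything here (FakeBackend, Orchestrator, the method names) is hypothetical scaffolding to show the quota-reserve/exec/release logic, not Modal's actual API:

```python
import uuid

class FakeBackend:
    """Stand-in for the Modal-like backend; a real implementation would
    call Sandbox.create() / sb.exec() / sb.terminate() over the wire."""
    def create(self, image, limits):
        return f"sb-{uuid.uuid4().hex[:8]}"
    def exec(self, sandbox_id, command):
        return {"exec_id": f"ex-{uuid.uuid4().hex[:8]}", "sandbox_id": sandbox_id}
    def terminate(self, sandbox_id):
        return True

class Orchestrator:
    def __init__(self, backend, max_concurrent=10):
        self.backend = backend
        self.max_concurrent = max_concurrent
        self.active = {}   # sandbox_id -> owning tenant_id (tenant isolation)
        self.counts = {}   # tenant_id -> live sandbox count (quota)

    def create_sandbox(self, tenant_id, image="python:3.12-slim", limits=None):
        if self.counts.get(tenant_id, 0) >= self.max_concurrent:
            raise PermissionError("quota exceeded")   # surfaces as HTTP 403
        sandbox_id = self.backend.create(image, limits or {})
        self.active[sandbox_id] = tenant_id
        self.counts[tenant_id] = self.counts.get(tenant_id, 0) + 1
        return sandbox_id

    def exec(self, tenant_id, sandbox_id, command):
        if self.active.get(sandbox_id) != tenant_id:  # tenant isolation check
            raise PermissionError("not your sandbox")
        return self.backend.exec(sandbox_id, command)

    def destroy(self, tenant_id, sandbox_id):
        if self.active.get(sandbox_id) != tenant_id:
            raise PermissionError("not your sandbox")
        self.backend.terminate(sandbox_id)
        del self.active[sandbox_id]
        self.counts[tenant_id] -= 1                   # release quota
```

The key design point is that quota reservation and release live in the control plane, not the backend, so a backend crash cannot leak a tenant's quota.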

IAM Model

Token Structure (JWT Claims)

{
  "sub": "user:uuid",
  "tenant_id": "tenant:uuid",
  "project_ids": ["proj:uuid1", "proj:uuid2"],
  "roles": ["sandbox:execute", "sandbox:snapshot", "volume:read"],
  "quota_tier": "standard",
  "network_policy_override": null,
  "exp": 1705212000,
  "iss": "coditect-iam"
}
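The project-scope and role checks on these claims reduce to a small pure function. This is a minimal sketch (the `authorize` name is hypothetical), assuming signature and expiry verification already happened at the gateway:

```python
def authorize(claims: dict, project_id: str, required_role: str) -> bool:
    """Decide whether a decoded, already-verified JWT may perform an
    action on a project: project scope first, then role membership."""
    if project_id not in claims.get("project_ids", []):
        return False
    return required_role in claims.get("roles", [])

claims = {
    "sub": "user:uuid",
    "tenant_id": "tenant:uuid",
    "project_ids": ["proj:uuid1", "proj:uuid2"],
    "roles": ["sandbox:execute", "sandbox:snapshot", "volume:read"],
}
assert authorize(claims, "proj:uuid1", "sandbox:execute")
assert not authorize(claims, "proj:uuid3", "sandbox:execute")  # out of scope
assert not authorize(claims, "proj:uuid1", "sandbox:destroy")  # missing role
```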

Policy Enforcement Points

  • Token validity – enforced at the API Gateway: JWT signature, expiry, issuer.15
  • Tenant isolation – enforced at the Orchestrator: sandboxes are tagged with tenant_id and all queries are filtered by it.
  • Project scope – enforced at the Orchestrator: project_id must appear in the token's project_ids.
  • Role permission – enforced at the Orchestrator: each action maps to a required role (see RBAC below).
  • Quota enforcement – enforced at the Pool Manager: concurrent sandboxes and GPU allocation per tenant.16
  • Network policy – enforced at the Backend: cidr_allowlist and block_network applied at create time.

Secret Injection

Secrets are referenced by name (e.g., secrets: ["hf-token", "wandb-key"]) and resolved server-side from tenant-scoped secret stores. The orchestrator injects them as environment variables, following the Modal-style Secret.from_dict() pattern.
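A resolution step like this can be sketched as follows. The in-memory SECRET_STORE and the `resolve_secrets` helper are hypothetical; a real system would back this with Vault, KMS, or similar, never a dict:

```python
# Hypothetical tenant-scoped secret store keyed by (tenant_id, secret name).
SECRET_STORE = {
    ("tenant:acme", "hf-token"): "hf_xxx",
    ("tenant:acme", "wandb-key"): "wb_yyy",
}

def resolve_secrets(tenant_id: str, secret_refs: list[str]) -> dict[str, str]:
    """Resolve secret names to env vars, failing closed on unknown refs."""
    env = {}
    for ref in secret_refs:
        value = SECRET_STORE.get((tenant_id, ref))
        if value is None:
            raise KeyError(f"secret {ref!r} not found for {tenant_id}")
        env[ref.upper().replace("-", "_")] = value
    return env

assert resolve_secrets("tenant:acme", ["hf-token"]) == {"HF_TOKEN": "hf_xxx"}
```

Failing closed on an unknown reference matters: a silently missing secret would otherwise surface later as a confusing runtime error inside the sandbox.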


RBAC Schema

Roles

  • sandbox:create – create sandboxes in assigned projects
  • sandbox:execute – run exec commands in owned sandboxes
  • sandbox:snapshot – create filesystem snapshots
  • sandbox:destroy – terminate sandboxes
  • sandbox:admin – all sandbox operations, plus listing all tenant sandboxes
  • volume:read – mount volumes as read-only
  • volume:write – mount volumes with read-write access
  • secret:use – reference secrets for injection
  • network:unrestricted – override default network policies
  • gpu:request – request GPU resources

Role Hierarchy

tenant_admin
└── project_admin
    ├── sandbox:admin
    ├── volume:write
    ├── secret:use
    ├── gpu:request
    └── sandbox:create
        ├── sandbox:execute
        ├── sandbox:snapshot
        └── sandbox:destroy
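The hierarchy can be flattened into effective permission sets with a small transitive-closure helper. This sketch (HIERARCHY and `effective_permissions` are names invented here) mirrors the tree above:

```python
# Parent role -> directly implied roles/permissions (mirrors the tree above).
HIERARCHY = {
    "tenant_admin": {"project_admin"},
    "project_admin": {"sandbox:admin", "volume:write", "secret:use",
                      "gpu:request", "sandbox:create"},
    "sandbox:create": {"sandbox:execute", "sandbox:snapshot", "sandbox:destroy"},
}

def effective_permissions(role: str) -> set[str]:
    """Transitively expand a role into its full permission set."""
    seen, stack = set(), [role]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(HIERARCHY.get(r, ()))
    return seen

perms = effective_permissions("project_admin")
assert "sandbox:destroy" in perms   # inherited via sandbox:create
```

Precomputing these closures at token-issuance time keeps the per-request check down to a set lookup.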

Resource Bindings

# Example: User role binding
bindings:
  - principal: user:alice-uuid
    tenant: tenant:acme-uuid
    projects: [proj:agent-runner]
    roles:
      - sandbox:create
      - sandbox:execute
      - sandbox:snapshot
      - volume:read
      - secret:use
    quotas:
      max_concurrent_sandboxes: 5
      max_gpu_hours_per_day: 10
      allowed_gpu_types: [t4, a10g]

Persistence Model

Workspace Types

  • Ephemeral – lifecycle: sandbox lifetime; backend: tmpfs / overlay; use case: untrusted temp work, scratch.17
  • Project Volume – lifecycle: persistent; backend: Modal Volume / GCS FUSE; use case: shared project artifacts, datasets.
  • Snapshot Image – lifecycle: immutable; backend: container registry; use case: checkpoint/restore, reproducibility.

Volume Mounting (from Modal pattern)

# Orchestrator maps a tenant request to Modal volumes
# (tenant_id, project_id, tenant_app come from the request context)
volumes = {
    "/workspace": modal.Volume.from_name(f"{tenant_id}/{project_id}/workspace"),
    "/data": modal.CloudBucketMount(f"gs://coditect-{tenant_id}-data", read_only=True),
}
sb = modal.Sandbox.create(app=tenant_app, volumes=volumes, ...)

Snapshot Flow

1. Agent requests POST /sandboxes/{id}/snapshot
2. Orchestrator calls sb.snapshot_filesystem()
3. Backend captures overlay diff → creates Image artifact
4. Snapshot metadata stored: {snapshot_id, tenant_id, project_id, parent_image, size, created_at}
5. Snapshot can be used as `image` in subsequent CreateSandboxRequest

Audit Logging Schema

Every sandbox operation emits an audit event to a write-ahead log (e.g., Pub/Sub → BigQuery or FoundationDB):

{
  "event_id": "uuid",
  "timestamp": "2026-01-14T06:47:00Z",
  "event_type": "sandbox.exec",
  "tenant_id": "tenant:acme",
  "project_id": "proj:agent-runner",
  "actor": {
    "type": "user|service_account|agent",
    "id": "uuid",
    "ip": "203.0.113.42"
  },
  "resource": {
    "type": "sandbox",
    "id": "sb-xyz",
    "name": "codegen-worker-1"
  },
  "action": {
    "command": ["python", "run_tests.py"],
    "workdir": "/workspace",
    "timeout": 120
  },
  "result": {
    "status": "success|failure|timeout",
    "exit_code": 0,
    "duration_ms": 4523
  },
  "resources_consumed": {
    "cpu_seconds": 8.2,
    "memory_mib_peak": 512,
    "gpu_seconds": 0
  }
}

Audit events support compliance queries: "Show all executions by agent X in project Y over the last 7 days with GPU usage."
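Assembling these records belongs in one place so every code path emits the same shape. A minimal builder sketch (the `audit_event` helper is a name invented here; actual emission to Pub/Sub or BigQuery is out of scope):

```python
from datetime import datetime, timezone
import uuid

def audit_event(event_type, tenant_id, project_id, actor, resource,
                action, result, consumed):
    """Assemble one audit record in the schema above; the caller is
    responsible for shipping it to the write-ahead log."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "tenant_id": tenant_id,
        "project_id": project_id,
        "actor": actor,
        "resource": resource,
        "action": action,
        "result": result,
        "resources_consumed": consumed,
    }

ev = audit_event(
    "sandbox.exec", "tenant:acme", "proj:agent-runner",
    {"type": "agent", "id": "a-1", "ip": "203.0.113.42"},
    {"type": "sandbox", "id": "sb-xyz", "name": "codegen-worker-1"},
    {"command": ["python", "run_tests.py"], "workdir": "/workspace", "timeout": 120},
    {"status": "success", "exit_code": 0, "duration_ms": 4523},
    {"cpu_seconds": 8.2, "memory_mib_peak": 512, "gpu_seconds": 0},
)
assert ev["event_type"] == "sandbox.exec"
```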


Resource Limits Enforcement

  • Timeout – sandbox auto-terminates after N seconds; default 300 s (max 24 h)
  • Idle timeout – terminate if no exec/stdin/tunnel activity; default 60 s
  • CPU – (request, limit) tuple enforced via throttling; default (0.5, 2.0)
  • Memory – (request, limit) in MiB enforced via OOM kill; default (256, 1024)
  • GPU – type selection plus quota check; default none
  • Concurrent sandboxes – per-tenant counter in the Pool Manager; default 10
  • Network – block_network / cidr_allowlist; default allowlist: []
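Merging a tenant's requested limits over these defaults, while clamping to hard maxima, is a pure function worth unit-testing. A sketch under the defaults listed above (DEFAULTS, MAXIMA, and `resolve_limits` are illustrative names):

```python
DEFAULTS = {
    "timeout": 300,            # seconds
    "idle_timeout": 60,        # seconds
    "cpu": (0.5, 2.0),         # (request, limit)
    "memory_mib": (256, 1024), # (request, limit)
    "gpu": "none",
}
MAXIMA = {"timeout": 86400, "idle_timeout": 3600}

def resolve_limits(requested: dict) -> dict:
    """Merge requested limits over defaults, then clamp to hard maxima
    so a tenant can never request past the platform ceiling."""
    limits = {**DEFAULTS, **requested}
    for key, cap in MAXIMA.items():
        limits[key] = min(limits[key], cap)
    return limits

assert resolve_limits({})["cpu"] == (0.5, 2.0)
assert resolve_limits({"timeout": 999999})["timeout"] == 86400  # clamped
```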

Suggested Follow-Up Prompts

  1. gVisor Sentry syscall policy tuning: "Design a custom gVisor seccomp profile for Coditect sandboxes that allows common ML workloads (PyTorch, JAX) while blocking high-risk syscalls. Include audit logging for denied calls."
  2. Quota billing integration: "Extend the Coditect sandbox API with usage metering endpoints and a Stripe-compatible billing webhook for CPU/GPU/storage consumption per tenant."
  3. Warm pool pre-provisioning: "Design a sandbox warm pool strategy for Coditect that pre-provisions N sandboxes per (tenant, image) tuple to reduce cold-start latency, including eviction policies and cost optimization."

Would you like me to drill into any of these areas or generate implementation stubs (Rust/Python/Go) for specific components? 18192021222324252627282930


WHAT MODAL-LIKE technologies already exist which can provide the technical foundation for the MODAL-LIKE CODITECT sandbox technology platform?

Several “Modal-like” foundations already exist that you can either build on directly or crib heavily from for a Coditect self-hosted sandbox layer.313233

Managed sandbox platforms (API-level inspirations)

These give you patterns for API, orchestration, and DX even if you don’t consume them directly.

  • Northflank Sandboxes / workloads – MicroVM (Kata or cloud-hypervisor) and gVisor isolation, any OCI image, BYOC deployments in your own cloud, and persistent volumes; marketed explicitly as a Modal alternative for secure AI code execution.31
  • E2B.dev – Firecracker-based microVM sandboxes focused on AI agents, with per-session workspaces, SDKs, and 24‑hour max lifetimes. Good reference for “agent-native” sandbox ergonomics.3431
  • Daytona – Fast-provisioning Docker/Kata sandboxes for AI workflows, sub‑90ms spin-up, limited persistence but strong “ephemeral dev env / AI run” mental model.353431
  • Cloudflare Workers + AI Sandbox SDK – V8 isolate boundary (no full Linux) but very strong example of capability-based APIs, no direct filesystem, and extremely fast cold starts.3431
  • Vercel Sandbox (beta) – Firecracker-based ephemeral sandboxes tightly integrated with their platform; 45‑minute limits and “preview” semantics are a useful pattern for time-bounded untrusted code.3631
  • InstaVM, Koyeb, RunPod et al. – Several GPU-oriented platforms now expose “code execution” sandboxes and serverless containers; Koyeb and RunPod are both called out as Modal alternatives for AI workloads.373834

Universal sandbox API libraries

These are especially relevant if you want Coditect to orchestrate multiple backends (Modal, E2B, Daytona, self-hosted, etc.) behind one interface.

  • Cased sandboxes – Open source Python library + CLI that provides a universal API for multiple cloud sandbox providers (Modal, E2B, Daytona, Cloudflare, etc.), with provider selection, failover, sandbox reuse, labels, image selection, and streaming output.3239
    • Example API: async with Sandbox.create(provider="modal") as sandbox: await sandbox.execute("python analyze.py").32
    • This is very close to the Coditect “multi-provider executor” story; you could mirror its provider abstraction while swapping in your own control plane.32

Architectural primitives (self-hosted building blocks)

Luis Cardoso’s “Field guide to sandboxes for AI” lays out clear decision criteria and suggests concrete tech stacks for AI coding agents.4033

  • gVisor – User-space kernel interception for hardened containers; good middle ground if you already run Kubernetes and want better isolation than plain containers without going full microVM.33
  • Kata Containers / Firecracker / cloud-hypervisor – MicroVM isolation that Cardoso recommends explicitly for multi-tenant AI coding agents (hostile code, full Linux semantics).33
  • Kubernetes + CRI runtimes – Use containerd + Kata runtime class or Firecracker-based runtimes to create per-job microVM sandboxes managed via standard K8s Jobs/Pods API. Cardoso walks through how CRI talks to Kata to spin a microVM per pod.33

For a Coditect “Modal-like but self-hosted” design, you can:

  • Use Kubernetes + Kata/Firecracker as the isolation substrate.33
  • Layer a sandbox control plane that exposes /sandboxes, /exec, /snapshot, /streams on top of K8s (like a focused PaaS).3133
  • Optionally integrate with or borrow patterns from Cased sandboxes to get multi-provider abstractions and fallback to external services where appropriate.3932

Decision heuristics you can reuse

The field guide offers a practical decision table you can almost lift into Coditect docs:

  • For multi-tenant AI agent execution with hostile code and full Linux needs, choose microVM (Firecracker / cloud-hypervisor) as the boundary.33
  • If you’re constrained to existing Kubernetes infra and semi-trusted workloads, gVisor-hardened containers are acceptable.33
  • Minimum viable policy: default-deny egress with allowlist, short-lived scoped credentials, workspace-only filesystem, hard CPU/mem/timeout limits, and detailed telemetry (process tree + network).33
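The "default-deny egress with allowlist" baseline maps directly onto a Kubernetes NetworkPolicy. A hedged sketch, assuming sandbox pods live in a dedicated namespace (the namespace name and the DNS CIDR are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-default-deny-egress
  namespace: sandboxes          # illustrative namespace for sandbox pods
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 10.96.0.10/32 # e.g., cluster DNS only; extend per-tenant
      ports:
        - protocol: UDP
          port: 53
```

Because an empty egress allowlist denies everything, each tenant's cidr_allowlist becomes additional `to:` entries generated by the control plane.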

Suggested next prompt

Design a self-hosted Coditect sandbox runtime on Kubernetes using Kata/Firecracker microVMs as the isolation boundary, taking inspiration from Cased sandboxes’ provider abstraction and Modal’s Sandbox API surface. Specify the CRDs, controller logic, and mapping from the Coditect /sandboxes HTTP API to K8s Jobs/Pods, including quota enforcement and network policies. 4142434445


what open source can I leverage, I do not want proprietary 3rd party solutions as part of the stack

For a Coditect-owned, non-proprietary sandbox stack, you can assemble everything from open source components: isolation runtime, orchestrator, and developer-facing API.4647

Isolation runtimes (core “Modal-like” boundary)

These give you the secure execution boundary for untrusted agent code.

  • gVisor – User-space kernel implementing the Linux syscall interface, designed as a sandbox runtime between containers and the host kernel; used in several “untrusted code” runners.4849
  • Kata Containers – OCI-compatible runtime that runs each pod/container in a lightweight VM; effectively “Firecracker with Kubernetes support,” giving VM-grade isolation per sandbox.5048
  • Firecracker – MicroVM VMM originally from AWS, widely recommended as the default for multi-tenant AI coding agents that need full Linux, shell, and package managers.4748
  • Wasm runtimes (for capability-scoped tools) – Wasmtime, Wasmer, and WasmEdge can be used when you can constrain workloads to WASI, avoiding full Linux while gaining strong isolation and fast startup.47

These can all be wired under containerd/CRI on Kubernetes, so that each Coditect sandbox maps to a Pod using a specific runtime class (e.g., Kata for microVM, gVisor for hardened containers).5147
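A sandbox pod pinned to a runtime class is then a few lines of spec. This is a sketch, not a production manifest; the names and the tenant label are hypothetical, while `runtimeClassName` is the standard Kubernetes field:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-sb-xyz            # illustrative sandbox pod name
  labels:
    coditect.io/tenant: acme      # hypothetical label for tenant-scoped queries
spec:
  runtimeClassName: kata          # or "gvisor" for the hardened-container tier
  containers:
    - name: workload
      image: python:3.12-slim
      resources:
        limits:
          cpu: "2"
          memory: 1Gi
```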

Orchestration and execution frameworks

These provide patterns or code for creating “submit code → run in sandbox → stream output” workflows.

  • Sandman (jakhax/sandman) – gVisor-based code execution service that runs and tests untrusted code inside a container sandbox; good reference for using gVisor as an isolation layer and discussing security tradeoffs.49
  • Awesome-sandbox list – Curated overview of modern sandboxing solutions, with entries for e2b, Daytona, and others that show patterns for AI-agent runtimes even if you don’t use their hosted offerings.46
  • Chris Hay’s Code Sandbox MCP server – Not a full infra runtime, but a concrete example of a code-execution service exposing a clean tool protocol (MCP) to LLM clients, which you can mirror with your own backend.52

Universal sandbox API (open source)

Even if you don’t call third-party providers, Cased’s library is a strong reference for the shape of a provider-agnostic sandbox API.

  • Cased sandboxes (MIT-licensed) – Open-source Python library + CLI that defines a universal interface for cloud code execution sandboxes (create, execute, reuse, labels, streaming output).5354
    • Shows how to abstract “provider” as a pluggable backend and implement a uniform Sandbox.create(...).execute(...) API that Coditect can replicate while routing to your own gVisor/Kata/Firecracker control plane.53

Design guidance from open references

For high-level architecture choices and tradeoffs, you can lean on recent sandbox guides.

  • Luis Cardoso’s “Field guide to sandboxes for AI” – Open article that compares containers, gVisor, microVMs, and Wasm, and recommends microVMs (Firecracker/cloud-hypervisor) as the default for multi-tenant AI coding agents, with gVisor as a compromise when you’re already on K8s.47
  • Simon Willison’s mirror / commentary – Reinforces the same taxonomy and serves as an additional reference for justifying your architecture and threat model to stakeholders.5556

Concrete stack suggestion for Coditect

All-open stack, no proprietary services:

  • Boundary: Firecracker or Kata microVMs (for multi-tenant hostile code) plus gVisor for lighter, semi-trusted workloads.4847
  • Orchestration: Kubernetes with runtime classes for Kata/gVisor, plus a custom Coditect sandbox controller and CRD (Sandbox, SandboxExec) that maps your HTTP API to K8s Jobs/Pods.51
  • API / DX layer: Coditect “sandbox executor” service modeled on Cased sandboxes API patterns (create, execute, reuse, labels, streaming) but backed solely by your own runtimes.5453

If you want, the next step can be: “Design a Kubernetes-based Coditect Sandbox CRD and controller that uses Kata Containers as the runtime, including the mapping from HTTP /sandboxes to CRD, and how to plug in gVisor/Firecracker as alternative runtime classes.” 57585960


RESEARCH EACH OF THESE in DEPTH: COMPARE AND CONTRAST, PROs AND CONs: isolation runtimes (core “Modal-like” boundary)

These give you the secure execution boundary for untrusted agent code. gVisor – User-space kernel implementing the Linux syscall interface, designed as a sandbox runtime between containers and the host kernel; used in several “untrusted code” runners.​ Kata Containers – OCI-compatible runtime that runs each pod/container in a lightweight VM; effectively “Firecracker with Kubernetes support,” giving VM-grade isolation per sandbox.​ Firecracker – MicroVM VMM originally from AWS, widely recommended as the default for multi-tenant AI coding agents that need full Linux, shell, and package managers.​ Wasm runtimes (for capability-scoped tools) – Wasmtime, Wasmer, and WasmEdge can be used when you can constrain workloads to WASI, avoiding full Linux while gaining strong isolation and fast startup.​

All four can underpin a “Modal-like” sandbox layer, but they make different tradeoffs on isolation strength, compatibility, performance, and complexity.61626364

Snapshot: when to use what

  • gVisor – isolation: higher than containers, lower than VMs;62 Linux compatibility: very high, but some syscalls slower or unsupported;6566 startup/density: near-container startup, good density;62 best fit in Coditect: semi-trusted multi-tenant, “secure containers” on K8s.
  • Kata – isolation: VM-grade, per-container microVM;676863 Linux compatibility: full Linux, OCI-compatible;6763 startup/density: slower than containers, faster than full VMs;63 best fit: high-risk tenants, “secure pod” profile on K8s.
  • Firecracker – isolation: very strong, FaaS-level;697064 Linux compatibility: full Linux inside the guest, but custom integration;6970 startup/density: 100–125 ms spin-up, huge density;6964 best fit: your own Lambda/Modal-style pool for untrusted agents.
  • Wasm runtimes (Wasmtime/Wasmer/WasmEdge) – isolation: very strong per-module memory and capability isolation;61 Linux compatibility: limited to WASI / host APIs, no full Linux;61 startup/density: microseconds-to-milliseconds startup, extremely high density;6171 best fit: capability-scoped tools, sandboxes for constrained languages.

gVisor

What it is

  • A user-space kernel that implements the Linux syscall interface and sits between containers and the host kernel; it “implements Linux by way of Linux” by intercepting syscalls in a sentry process.666272
  • Deployed as a container runtime sandbox (e.g., runsc), including integration with Kubernetes and GKE Sandbox; often described as “seccomp on steroids.”6562

Pros

  • Better isolation than plain containers: host kernel surface exposed to the workload is drastically reduced; syscalls are handled by the user-space kernel rather than directly by the host.6265
  • Lightweight footprint vs VMs: no guest OS to boot, no per-VM kernel; starts fast and scales like containers while adding an isolation boundary.62
  • Works without hardware virtualization: no need for KVM support, so easier in nested virtualization environments or constrained clouds.62
  • Kubernetes-native: can be plugged in as a runtime class and selectively applied to pods that need extra isolation.62

Cons

  • Not VM-grade isolation: still shares the host kernel; a gVisor escape is less likely than a vanilla container escape, but the blast radius is larger than with Firecracker/Kata microVMs.7362
  • Performance overhead: syscall-heavy workloads pay a noticeable tax; each syscall goes through the user-space kernel.6562
  • Compatibility quirks: some low-level kernel features, /proc behavior, or exotic syscalls may be missing or behave differently, which can surprise deep Linux tooling.6665

When it shines for Coditect

  • Multi-tenant but semi-trusted agent code (e.g., internal teams, controlled languages) where you want better isolation than containers but don’t want to pay microVM costs.6162
  • You already have Kubernetes and want to opt-in sandboxing via a runtimeClass on selected workloads.62

Kata Containers

What it is

  • An open-source runtime that runs each “container” inside its own minimal VM, combining container UX with VM isolation.676863
  • Integrates with Docker/Kubernetes using OCI and CRI, with a runtime plus CRI-friendly shim/library.6367

Pros

  • VM-grade isolation: each pod/container gets its own guest kernel and VM boundary, significantly reducing cross-tenant risk compared to shared-kernel containers.6763
  • Kubernetes/OCI compatible: drop-in runtime that lets you run Kata and standard containers in the same cluster, choosing per-workload isolation.6367
  • Supports multiple VMMs: can use Firecracker or Cloud Hypervisor under the hood, so you get microVM characteristics with K8s integration.6863

Cons

  • Higher overhead than containers: you pay for a guest kernel and VM per sandbox; memory footprint per workload is larger.63
  • Slower cold starts than containers: still typically faster than traditional VMs, but slower than gVisor/container-only setups.63
  • Operational complexity: more moving parts (runtime, agent, hypervisor), guest kernel management, and debugging complexity vs plain containers.63

When it shines for Coditect

  • High-risk, multi-tenant untrusted code (public SaaS) where you want strong isolation but also Kubernetes-native control and scheduling.686763
  • You want a “secure pod” class: map Coditect “high-risk sandboxes” to a K8s runtimeClass that uses Kata, keeping lower-risk workloads on gVisor or runc.63

Firecracker

What it is

  • An open-source microVM VMM built by AWS, designed for secure, multi-tenant container and function workloads with minimal overhead.7064
  • Used under AWS Lambda and Fargate to start thousands of microVMs per second with ~100–125 ms cold-start times and as low as ~5 MB memory footprint per microVM.6964

Pros

  • Very strong isolation: each microVM has its own kernel and minimal device model, tailored for security and multi-tenancy.6469
  • Purpose-built for FaaS/serverless: start thousands of microVMs per second, with cold starts competitive with containers; ideal for short-lived, untrusted code.6964
  • Minimal footprint: small memory and device surface compared to general-purpose hypervisors.6469

Cons

  • Lower-level integration effort: unlike Kata, Firecracker doesn’t come with built-in Kubernetes integration; you must integrate via containerd plugins or build your own control plane.6864
  • Guest VM management: you must manage guest OS images, kernels, and per-VM boot config, similar to running VMs at scale.6964
  • More opinionated: limited device model and focus on network+block devices can complicate some advanced workloads (e.g., complex PCI passthrough).7069

When it shines for Coditect

  • A Modal-like / Lambda-like executor: Coditect runs each agent sandbox in a Firecracker microVM, with its own VM pool, warm instances, and very tight per-tenant isolation.6469
  • You’re willing to build a custom control plane (or K8s integration) and want direct control of microVM lifecycle, warm pools, and scheduling.7069

Wasm runtimes (Wasmtime, Wasmer, WasmEdge)

What they are

  • WebAssembly runtimes that execute Wasm modules with linear memory and no ambient access: all host interactions must be explicitly imported.61
  • Often support WASI (WebAssembly System Interface) for POSIX-like capabilities and provide resource metering (“fuel”) for deterministic preemption.61

Pros

  • Strong memory and capability isolation: modules can’t touch arbitrary host memory or the OS unless explicitly allowed; great fit for capability-based “tools.”61
  • Very fast startup and high density: no guest OS, no VM boot; instantiation is microseconds–milliseconds.7161
  • Fine-grained resource control: e.g., Wasmtime’s fuel mechanism for instruction metering, making runtime limits more deterministic than “CPU time + signals” alone.61
  • Language reach: multiple languages compile to Wasm (Rust, TinyGo, C/C++, Zig, some Python/JS subsets), which can be used for extensions and plugins.61

Cons

  • No full Linux: many agent workloads assume POSIX, /proc, apt, arbitrary shells, etc., which do not exist in pure WASI environments.61
  • Ecosystem gaps: not every language or library is readily portable; some ML stacks or system tools won’t work without heavy adaptation.61
  • Host API design burden: you must define a careful host surface (I/O, network, filesystem) and then maintain that ABI over time.61

When it shines for Coditect

  • Capability-scoped tools and plugins: “safe” tools where Coditect defines a Wasm ABI (e.g., limited file access, HTTP client) and agents can only call those.61
  • Single-tenant or internal deployments where you want maximum density and safety for specific languages (Rust, Go) without the full complexity of Linux sandboxing.7161

How to combine them in a Coditect architecture

A practical hybrid that lines up with current best practice for AI sandboxes:64626361

  • Tier 1 (light sandbox) – gVisor runtime class on Kubernetes:
    • For internal or semi-trusted code where container-level semantics are enough; used for “fast dev” sandboxes.
  • Tier 2 (strong sandbox) – Kata Containers (possibly backed by Firecracker):
    • For external tenants and highly untrusted code; Coditect maps “high-risk agent sessions” to Kata runtimeClass pods.
  • Tier 3 (serverless microVM) – Firecracker pool:
    • For Modal-style, bursty agent workloads: a separate executor service that manages Firecracker microVMs directly, with warm pools and hard quotas.
  • Tier 4 (tool sandbox) – Wasm runtime (Wasmtime/Wasmer/WasmEdge):
    • For Coditect’s “capability-scoped tools” where you control the ABI; used for plugin-style extensions and deterministically metered workloads.

If you want, the next step can be a detailed design prompt like:

Design a tiered Coditect sandbox architecture that uses gVisor, Kata Containers (with Firecracker), and Wasmtime in a single Kubernetes cluster. Specify how the /sandboxes API selects a runtime tier based on tenant, project, and risk level, and how to share common observability, quota, and audit logging across all runtimes. 747576


Design a tiered Coditect sandbox architecture that uses gVisor, Kata Containers (with Firecracker), and Wasmtime in a single Kubernetes cluster. Specify how the /sandboxes API selects a runtime tier based on tenant, project, and risk level, and how to share common observability, quota, and audit logging across all runtimes.

Design a 3‑tier sandbox architecture by treating the “runtime choice” as a policy decision at the /sandboxes control plane, and using Kubernetes RuntimeClasses to route pods to gVisor, Kata+Firecracker, or Wasm-backed workloads.7778798081


1. Runtime tiers and Kubernetes setup

Runtime classes

Define three RuntimeClass objects for Linux-based sandboxes, plus a Wasm integration (via CRI plugin or sidecar).8283808477

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc        # gVisor handler
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc      # Kata Containers using Firecracker
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runc
handler: runc         # baseline container runtime

For Wasm, follow CNCF’s “Wasm on Kubernetes” pattern, using either:8581

  • Wasm containers: using a Wasm-aware runtime (e.g., cri-o/containerd shim) and an annotation like module.wasm.image/variant.81
  • Sidecar pattern: run a Wasmtime/Wasmer sidecar that executes Wasm modules on demand next to a thin HTTP/gRPC proxy container.8581

2. /sandboxes API and runtime selection

API surface

You keep a single tenant-facing HTTP API, with an explicit but optional risk_profile and runtime_hint that the control plane resolves to a runtime tier:

POST /sandboxes
{
  "project_id": "proj-uuid",
  "name": "agent-run-123",
  "image": "ghcr.io/coditect/agent-runner:latest",
  "risk_profile": "untrusted_public | semi_trusted | internal",
  "runtime_hint": "auto | gvisor | kata | wasm",
  "workload_type": "linux_full | wasm_tool",
  "limits": { "cpu": 1.0, "memory_mib": 1024, "gpu": "none" },
  "network_policy": { "block_network": true },
  "code": {
    "language": "python",
    "entrypoint": "main.py"
  }
}

Policy engine

On POST /sandboxes, the Coditect sandbox controller:

  1. Authenticates the caller and loads tenant + project configuration (risk tier, allowed runtimes).
  2. Computes an effective runtime tier (gVisor / Kata+FC / Wasm) based on:
    • Tenant risk classification (e.g., “external SaaS”, “internal corp”).
    • Project tag (e.g., project.security_level = high).
    • Requested runtime_hint and workload_type.
  3. Maps tier to implementation: Kubernetes RuntimeClass for Linux workloads, or Wasm pipeline for capability-scoped tools.78797781

Example pseudo-logic:

def choose_runtime(tenant, project, req):
    # 1. Wasm tools get routed to Wasm
    if req.workload_type == "wasm_tool":
        return "wasm"

    # 2. Force Kata+Firecracker for high-risk external tenants
    if tenant.risk == "external" or project.flags.get("requires_vm_isolation"):
        return "kata-fc"

    # 3. Respect explicit hint if allowed
    if req.runtime_hint == "gvisor" and "gvisor" in tenant.allowed_runtimes:
        return "gvisor"
    if req.runtime_hint == "kata" and "kata-fc" in tenant.allowed_runtimes:
        return "kata-fc"

    # 4. Default policy
    if tenant.risk == "internal":
        return "gvisor"
    return "kata-fc"

3. Mapping to Kubernetes and Wasm

3.1 gVisor tier (semi-trusted)

For runtime = "gvisor", the controller creates a Pod with the gvisor runtimeClassName.838682

apiVersion: v1
kind: Pod
metadata:
  name: sb-123
  labels:
    coditect.sandbox/id: "sb-123"
    coditect.tenant/id: "tenant-abc"
spec:
  runtimeClassName: gvisor
  containers:
    - name: sandbox
      image: ghcr.io/coditect/agent-runner:latest
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "2"
          memory: "1Gi"

gVisor provides an extra boundary beyond runc while still running as fast, OCI-compliant containers.867883

3.2 Kata + Firecracker tier (untrusted/public)

For runtime = "kata-fc", the controller creates Pods using the kata-fc RuntimeClass; Kata then uses Firecracker under the hood.79808788

apiVersion: v1
kind: Pod
metadata:
  name: sb-456
  labels:
    coditect.sandbox/id: "sb-456"
spec:
  runtimeClassName: kata-fc
  containers:
    - name: sandbox
      image: ghcr.io/coditect/agent-runner:latest
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"

This gives you VM-grade isolation and microVM characteristics (fast boot, low footprint) while still scheduling via Kubernetes.80878879

3.3 Wasm tier (capability tools)

For runtime = "wasm", Coditect does not spin up a full Linux sandbox. Instead it:

  • Deploys a Wasm executor service (Deployment + Service) with a Wasmtime/Wasmer runtime.
  • /sandboxes returns a logical sandbox ID, but the “exec” calls go to the Wasm executor’s HTTP/gRPC API.

You can either:8185

  • Run the executor as a sidecar next to a thin API container, or
  • Use a Wasm container runtime (e.g., annotation-based selection as shown in CNCF’s guide).81

4. Shared observability, quota, and audit logging

4.1 Central metadata and audit log

Maintain a cluster-agnostic metadata store (e.g., PostgreSQL, FoundationDB) with a sandboxes table recording tenant, project, runtime tier, and lifecycle state:

CREATE TABLE sandboxes (
  id             uuid PRIMARY KEY,
  tenant_id      uuid NOT NULL,
  project_id     uuid NOT NULL,
  runtime_tier   text NOT NULL,   -- gvisor | kata-fc | wasm
  k8s_namespace  text,
  k8s_pod_name   text,
  status         text,
  created_at     timestamptz,
  terminated_at  timestamptz
);

Every API action (create, exec, snapshot, destroy) writes to an append-only audit_log table or event stream (Pub/Sub, Kafka).8986

CREATE TABLE sandbox_audit_log (
  event_id      uuid PRIMARY KEY,
  sandbox_id    uuid NOT NULL,
  tenant_id     uuid NOT NULL,
  event_type    text NOT NULL,    -- create | exec | destroy | snapshot
  runtime_tier  text NOT NULL,
  actor         jsonb,
  details       jsonb,
  timestamp     timestamptz NOT NULL
);

Because runtime tier is just metadata, you can query “all exec events across gVisor/Kata/Wasm for tenant X” uniformly.
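As a sketch of that uniform query path (illustrative names, with an in-memory list standing in for the audit table):

```python
# Sketch: because runtime_tier is plain metadata on each audit event,
# one filter serves every tier. The function name is illustrative.
def exec_events_for_tenant(events, tenant_id, tiers=("gvisor", "kata-fc", "wasm")):
    """Return exec events for a tenant across all runtime tiers."""
    return [
        e for e in events
        if e["tenant_id"] == tenant_id
        and e["event_type"] == "exec"
        and e["runtime_tier"] in tiers
    ]
```

The equivalent SQL would simply filter `sandbox_audit_log` on `tenant_id` and `event_type` with no per-tier special casing.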

4.2 Common telemetry stack

Instrument all runtimes with the same observability layer:

  • Logs:
    • Use a cluster-wide log collector (Fluent Bit, Vector, OpenTelemetry Collector) to ship container stdout/stderr to a common log backend (e.g., Loki, Elasticsearch, GCP Logging).89
    • Use consistent labels: tenant_id, project_id, sandbox_id, runtime_tier.
  • Metrics:
    • Expose K8s metrics (CPU/mem usage per Pod) and Wasm executor metrics via Prometheus.
    • Implement per-sandbox metrics: CPU seconds, memory peak, exec duration.
  • Traces:
    • Instrument /sandboxes API, controller, and executor with OpenTelemetry spans, including attributes like coditect.runtime_tier.

4.3 Quota enforcement

Implement a quota service used by the API controller before creating or executing a sandbox, regardless of runtime:

  • Per-tenant limits:
    • max_concurrent_sandboxes[gvisor], max_concurrent_sandboxes[kata-fc], max_concurrent_sandboxes[wasm].
    • Daily CPU/GPU time per tier.

Flow on POST /sandboxes:

  1. Controller computes runtime_tier.
  2. Calls quota_service.reserve(tenant_id, runtime_tier, requested_limits) (atomic).
  3. If accepted, create Pod or logical sandbox; on failure, return 403/429.
  4. On sandbox termination, the controller calls quota_service.release(...).

Because this is tier-agnostic (only keyed by runtime_tier), quotas work similarly whether the sandbox is a Pod using gVisor/Kata or a Wasm module in the executor.
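The tier-keyed reserve/release flow above can be sketched as follows; this is an in-memory stand-in for a real quota store (the `reserve`/`release` names follow the text, everything else is illustrative):

```python
import threading
from collections import defaultdict

class QuotaService:
    """Sketch of tier-agnostic quota bookkeeping keyed by (tenant, tier)."""

    def __init__(self, limits):
        # limits: {(tenant_id, runtime_tier): max_concurrent_sandboxes}
        self._limits = limits
        self._in_use = defaultdict(int)
        self._lock = threading.Lock()

    def reserve(self, tenant_id, runtime_tier):
        key = (tenant_id, runtime_tier)
        with self._lock:  # atomic check-and-increment, as step 2 requires
            if self._in_use[key] >= self._limits.get(key, 0):
                return False  # controller maps this to HTTP 403/429
            self._in_use[key] += 1
            return True

    def release(self, tenant_id, runtime_tier):
        key = (tenant_id, runtime_tier)
        with self._lock:
            self._in_use[key] = max(0, self._in_use[key] - 1)
```

A production version would back the counters with a transactional database or Redis so reservations survive controller restarts.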


5. High-level sequence for /sandboxes with tiered runtimes

Client ──POST /sandboxes──────────────────────────► API Gateway
(risk_profile, runtime_hint, workload_type)
Gateway ──JWT validate, tenant/project lookup─────► Sandbox Controller
Controller ──compute runtime tier─────────────────► Policy Engine
◄──────────────────────────────────────
Controller ──quota.reserve(tenant, tier, limits)─► Quota Service
◄──ok / reject─────────────────────────
[if rejected → 403]

[if tier == gvisor or kata-fc]
Controller ──create Pod with runtimeClassName────► Kubernetes API
◄──Pod object──────────────────────────
Controller ──record in metadata & audit_log──────► DB/Event Bus

[if tier == wasm]
Controller ──create logical sandbox in Wasm exec► Wasm Executor
◄──sandbox_id─────────────────────────
Controller ──record in metadata & audit_log──────► DB/Event Bus

Controller ──201 {sandbox_id, runtime_tier,...}─► Client

Exec, snapshot, and destroy operations follow the same pattern but call either the Kubernetes API (for gVisor/Kata pods) or the Wasm executor’s API, while always writing to shared audit and using shared quota bookkeeping.
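That shared pattern can be sketched as a single dispatch layer; the handler signatures are illustrative, not a real Kubernetes or executor client:

```python
# Sketch: lifecycle ops fan out by tier. Kubernetes-backed tiers share one
# code path; wasm routes to the executor API. Every op writes one audit row.
def dispatch(op, sandbox, k8s, wasm_executor, audit_log):
    if sandbox["runtime_tier"] in ("gvisor", "kata-fc"):
        result = k8s(op, sandbox)            # e.g. create/delete Pod, exec via K8s API
    elif sandbox["runtime_tier"] == "wasm":
        result = wasm_executor(op, sandbox)  # HTTP/gRPC call to the Wasm executor
    else:
        raise ValueError(f"unknown tier {sandbox['runtime_tier']}")
    audit_log.append({"op": op, "sandbox_id": sandbox["id"],
                      "runtime_tier": sandbox["runtime_tier"]})
    return result
```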

If you want, the next iteration can focus on:

  • A CRD-based design (Sandbox, SandboxExec) instead of direct Pod creation, or
  • A concrete OpenAPI 3.1 spec extension that exposes runtime_tier and policy decisions back to the client for debugging and observability. 9091

Define security and performance tradeoffs for each runtime tier

  • Example RuntimeClass YAML for gVisor, Kata, and Wasmtime tiers
  • How to configure node pools for Firecracker and Wasm workloads
  • Admission controller policy to enforce /sandboxes runtime selection
  • Pod security considerations when mixing runtimes in one cluster

Security and performance differ by tier; you’ll enforce them with RuntimeClasses, node pools, and admission policies that ensure each sandbox lands on the right infrastructure with consistent guardrails.929394959697


Security and performance tradeoffs by tier

gVisor tier

  • Security
    • Strength: Higher than plain containers (user-space kernel “Sentry” intercepts syscalls), reducing host-kernel attack surface.9893
    • Weakness: Still shares host kernel; not as strong as VM/microVM isolation if gVisor or host kernel are compromised.9598
    • Good use: Semi-trusted or internal multi-tenant workloads where container-level semantics are required.9395
  • Performance
    • Overhead: Syscall-heavy or I/O-heavy workloads see higher latency because every syscall passes through the user-space kernel.9993
    • Startup: Very close to container startup; negligible extra cold start vs runc.93

Kata Containers + Firecracker tier

  • Security
    • Strength: Each pod runs inside a lightweight VM with its own guest kernel, providing VM-grade, hardware-backed isolation.9610098
    • Good use: Untrusted/public code execution and strong tenant isolation scenarios.989596
  • Performance
    • Overhead: Higher memory and CPU overhead per sandbox vs gVisor; you pay for booting a microVM and guest kernel.10096
    • Startup: Faster than traditional VMs, but slower than containers; firecracker-containerd and pre-warmed VMs help mitigate cold start.101102

Wasmtime / Wasm tier

  • Security
    • Strength: Strong in-process isolation—linear memory and capability-based host APIs; no POSIX/host kernel surface unless explicitly exposed.10397
    • Weakness: Only covers code that can be compiled to Wasm/WASI; any unsafe host APIs you expose become the main attack surface.97103
  • Performance
    • Overhead: Extremely low cold start for simple functions (few–tens of ms vs tens–hundreds of ms for microVMs).103
    • Limitations: Complex workloads with large dependency graphs and heavy compute can see slower cold starts and execution than microVM-based setups.104103

Example RuntimeClass YAML for gVisor, Kata, and Wasm

gVisor RuntimeClass

Use runsc as handler; optionally restrict to sandboxes node pool via scheduling.nodeSelector.1051069295

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    coditect.io/node-profile: gvisor
  tolerations:
    - key: "coditect.io/sandbox"
      operator: "Equal"
      value: "gvisor"
      effect: "NoSchedule"

Kata + Firecracker RuntimeClass

Use Kata handler configured to use Firecracker as VMM, and tie to a dedicated node pool.102107108105

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc
scheduling:
  nodeSelector:
    coditect.io/node-profile: kata-firecracker
  tolerations:
    - key: "coditect.io/sandbox"
      operator: "Equal"
      value: "kata-fc"
      effect: "NoSchedule"

Wasm RuntimeClass (if using Wasm containers)

If you’re using a Wasm-aware containerd shim (per CNCF practice guide), define a RuntimeClass that points to the Wasm handler and node pool.10997

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasm-wasmtime
handler: wasmtime
scheduling:
  nodeSelector:
    coditect.io/node-profile: wasm
  tolerations:
    - key: "coditect.io/sandbox"
      operator: "Equal"
      value: "wasm"
      effect: "NoSchedule"

If instead you use a Wasm executor Deployment (sidecar or service), you won’t need a RuntimeClass; the tier is enforced via your control plane.


Node pool configuration for Firecracker and Wasm workloads

Firecracker/Kata nodes

  • Label and taint nodes to ensure only Kata/Firecracker sandboxes land there:11092102
    • Labels: coditect.io/node-profile=kata-firecracker
    • Taints: coditect.io/sandbox=kata-fc:NoSchedule
  • Configure containerd on those nodes with kata-fc runtime pointing to Kata configured for Firecracker:107108102
    • containerd.toml with plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc.
  • Capacity planning: fewer but larger nodes, since microVM overhead per sandbox is higher; account for guest OS memory and disk.

Wasm nodes

  • Option A – Wasm runtimeClass: nodes with Wasm-aware container runtime:97109
    • Labels: coditect.io/node-profile=wasm
    • Taints: coditect.io/sandbox=wasm:NoSchedule
    • containerd configured with a wasmtime/wasmedge runtime handler.
  • Option B – Wasm executor pool: generic nodes running Wasm executor pods.
    • Use node labels for CPU-optimized nodes (node.kubernetes.io/instance-type=c3-highcpu) and schedule Wasm executors there.97

For both tiers, you keep compute isolation by not mixing high-risk runtimes with general workloads on the same nodes.


Admission controller policy for enforcing /sandboxes runtime

Implement a ValidatingAdmissionWebhook (plus optional MutatingAdmissionWebhook) that:

  1. Only allows Coditect sandbox controller to set runtimeClassName.
  2. Enforces mapping between sandbox labels/annotations and RuntimeClass.94111112113

Mutating webhook (optional)

If devs create Pods directly (internal tools), you can auto-set RuntimeClass based on labels or namespace:

  • On Pod CREATE:
    • If label coditect.io/sandbox-runtime=gvisor, set .spec.runtimeClassName = "gvisor".
    • Same for kata-fc and wasm-wasmtime.

Validating webhook (strong enforcement)

  • Reject pods that:
    • Have runtimeClassName set but lack coditect.io/sandbox-id label (prevents arbitrary use of Kata/gVisor).113
    • Use a RuntimeClass that doesn’t match allowed values for the namespace/tenant.
  • Ensure nodeSelector compatibility: intersect node selectors with the chosen RuntimeClass’s scheduling nodeSelector, reject on conflict.9294

This guarantees only the Coditect control plane can allocate expensive, high-isolation runtimes and binds runtime selection to your /sandboxes policy engine.
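The validating rules above can be sketched as a webhook handler; the field paths follow the Kubernetes AdmissionReview shape, but the service-account name and label key are illustrative:

```python
# Sketch of the validating webhook logic. Assumed names: the controller's
# service account and the coditect.io/sandbox-id label from the text.
ALLOWED_RUNTIME_CLASSES = {"gvisor", "kata-fc", "wasm-wasmtime"}
CONTROLLER_SA = "system:serviceaccount:coditect-system:sandbox-controller"

def validate_pod(admission_review):
    req = admission_review["request"]
    pod = req["object"]
    runtime_class = pod["spec"].get("runtimeClassName")
    labels = pod["metadata"].get("labels", {})

    if runtime_class in ALLOWED_RUNTIME_CLASSES:
        # Only the Coditect controller may allocate sandbox runtimes.
        if req["userInfo"]["username"] != CONTROLLER_SA:
            return {"allowed": False,
                    "status": {"message": "runtimeClassName reserved for sandbox controller"}}
        # Every sandbox pod must carry a sandbox id label.
        if "coditect.io/sandbox-id" not in labels:
            return {"allowed": False,
                    "status": {"message": "missing coditect.io/sandbox-id label"}}
    return {"allowed": True}
```

In practice the same rules map directly onto a ValidatingAdmissionPolicy CEL expression or a Gatekeeper constraint.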


Pod security considerations when mixing runtimes

When multiple runtimes share a cluster, lock down Pod specs and namespaces so that invariants hold across all tiers.1149598

  • Pod Security / SecurityContext
    • Enforce non-root, drop all capabilities, no privileged or hostPath mounts, and no host network for all sandbox pods via Namespace-level Pod Security Standards or OPA/Gatekeeper/ValidatingAdmissionPolicy.114
    • Disallow hostPID, hostIPC, and hostNetwork for sandbox pods.95114
  • Namespace isolation
    • Use dedicated namespaces per tenant or per tier (e.g., coditect-gvisor, coditect-kata, coditect-wasm).
    • Apply tighter NetworkPolicies and ResourceQuotas per namespace.114
  • Runtime isolation
    • Do not allow regular workloads to use high-security RuntimeClasses; enforce via admission.113
    • Ensure runtime-specific node pools are not running cluster control plane components or shared databases to reduce blast radius.
  • Observability and secrets
    • Use sidecarless logging (node-level collectors) to avoid sharing sidecars between runtimes.115
    • Scope Kubernetes secrets to namespaces and bind them to specific service accounts used by sandbox pods; enforce via RBAC and admission.111114

If you want, the next iteration can be a concrete ValidatingAdmissionPolicy or Gatekeeper constraint set that encodes: “only Coditect controller service account may create pods with runtimeClassName in {gvisor, kata-fc, wasm-wasmtime}, and all such pods must have sandbox labels and non-privileged securityContext.” 116


WHICH OF THESE COULD RUN MULTI-USER MULTI-TENANT MULTI-TEAM MULTI-PROJECT WORKFLOWS on GOOGLE CLOUD WORKSTATIONS economically?

For multi-user / multi-tenant / multi-team workflows on Google Cloud Workstations, the most economical and operationally straightforward option from your list is gVisor-based containers, with Wasm as a niche add‑on for specific tools; Kata+Firecracker is possible but significantly more complex and costly on GCP.117118119120121122


gVisor on GCP (best economic fit)

  • Google already runs gVisor at scale for Cloud Run, Cloud Functions, App Engine, and GKE Sandbox, which shows it works well for arbitrary user workloads with good compatibility.118123119117
  • Cloud Workstations are just GCE VMs; you can run runsc there or on a companion GKE cluster, giving each user/team “nested” sandboxes without extra hypervisor layers.124122
  • No need for nested virtualization; gVisor works on VMs or bare metal and is explicitly documented as “cloud ready” for all major providers.122

This makes gVisor the most economical and least operationally risky choice for embedding many Coditect sandboxes into a shared pool of Google Cloud Workstations (per-team dev VMs), while still adding a strong isolation boundary around untrusted agent code.119117118122


Kata Containers + Firecracker on GCP (possible, but heavier)

  • Kata requires hardware virtualization (KVM) in the underlying node; on GKE/GCE, you must customize node images or enable nested virtualization to run Kata+Firecracker, which is not available by default.120121125
  • Northflank’s case study notes extra engineering effort to provision Kata on Google Cloud because standard GKE nodes don’t expose KVM; they built tooling for custom images with virtualization enabled.120
  • Firecracker on GKE has been demonstrated by running Kata Containers with Firecracker as the VMM, but again this relies on KVM and custom configuration on GCE nodes.121126127

So while Kata+Firecracker can give you strong isolation for some high-risk multi-tenant workflows, it is less economical and more complex than gVisor on standard Workstations/VMs, and you’d likely reserve it for a smaller pool of critical workloads rather than the general multi-user Workstations fleet.128125121120


Wasm runtimes on GCP (targeted, very cheap per-tool)

  • GCP already exposes Wasm at the edge via Service Extensions (GCPWasmPlugin) and Artifact Registry‑hosted Wasm modules for load balancers.129130
  • You can also run WasmEdge/Wasmtime inside containers on GCE/GKE/Workstations, which is lightweight and doesn’t require nested virtualization; good for per-tenant “tools” rather than full dev workflows.131132133

Economically, Wasm is ideal for capability-scoped tools (linting, small transforms) that Coditect calls from Workstations, but not for full multi-tenant Linux workflows (no full distro, limited POSIX).133134131


Practical recommendation for Coditect on Google Cloud Workstations

  • Use gVisor as the default “Modal-like” sandbox layer inside or alongside Workstations:
    • Run a gVisor-enabled container runtime (runsc) on shared GCE nodes or a GKE cluster attached to the same VPC as Workstations.117119124
    • Treat each user/team/project workflow as a gVisor sandbox, with multi-tenant isolation and quotas enforced at the Coditect layer.
  • Optionally add:
    • Wasm runtimes for scoped tools and plugins (extremely cheap, safe, and portable).130129133
    • A small, specialized Kata+Firecracker node pool if you later need VM-grade isolation for the most sensitive multi-tenant workloads and are willing to handle nested virtualization and custom images on GCE.125121120

If you want, next step could be: “Design a Coditect + Google Cloud Workstations reference architecture using gVisor for in-VM sandboxes (no Kubernetes), including how to provision per-team Workstations, run nested containers with runsc, and enforce tenant/project quotas.” 135136


Design a Coditect + Google Cloud Workstations reference architecture using gVisor for in-VM sandboxes (no Kubernetes), including how to provision per-team Workstations, run nested containers with runsc, and enforce tenant/project quotas. Practical recommendation for Coditect on Google Cloud Workstations

Use gVisor as the default “Modal-like” sandbox layer inside or alongside Workstations: run a gVisor-enabled container runtime (runsc) on shared GCE nodes or a GKE cluster attached to the same VPC as Workstations. Treat each user/team/project workflow as a gVisor sandbox, with multi-tenant isolation and quotas enforced at the Coditect layer.

Use Google Cloud Workstations as “fat agent hosts” and run gVisor (runsc) inside each VM to provide a Modal‑like sandbox for multi-tenant workflows, with Coditect as the control plane enforcing policies and quotas.137138139140141


High-level architecture

  • Per-team Workstations
    • Each team gets one or more Google Cloud Workstations (backed by GCE VMs) in a shared “Coditect Workstations” project/VPC.
    • Workstations run Docker or containerd configured with runsc as an additional OCI runtime for sandboxed workloads.138139140
  • In-VM gVisor sandboxes
    • Agent code and tools run inside containers launched with the runsc runtime (not plain runc), giving a gVisor user-space kernel boundary inside the Workstation VM.141137138
    • Each Coditect sandbox = one gVisor container, with per-sandbox CPU/mem/time limits applied via cgroups and Coditect’s control plane.
  • Coditect control plane (central services)
    • Hosted on GCE or GKE in the same VPC; exposes /sandboxes API, manages Workstation registration, scheduling, quotas, and audit logs.
    • Workstations run a Coditect agent that pulls/receives sandbox tasks, launches runsc containers, streams logs, and reports resource usage.

This avoids Kubernetes entirely for the inner sandboxing, leveraging gVisor’s “runs anywhere existing container tooling does” property.139138141


Provisioning per-team Workstations with gVisor

  1. Base image / template
    • Start from a Linux Workstation image (e.g., Container‑Optimized OS or Ubuntu with Docker preinstalled).
    • Install gVisor runsc following the official installation guide: apt-get install -y runsc or by using the install script; then run runsc install to integrate with Docker/containerd.140139
  2. Docker/containerd config
    • Add runsc as a runtime in /etc/docker/daemon.json (or containerd config):138139140
{
  "runtimes": {
    "runsc": {
      "path": "/usr/bin/runsc"
    }
  },
  "default-runtime": "runc"
}
    • Restart Docker/containerd. After this, docker run --runtime=runsc ... will launch a gVisor sandbox container.
  3. Workstation registration with Coditect
    • On first boot, a Coditect agent on the Workstation:
      • Registers itself to the Coditect control plane with metadata (team, tenant, capabilities, vCPU/RAM).
      • Opens a secure gRPC/WebSocket connection for task dispatch and health reporting.
  4. Per-team isolation
    • Map Workstations to tenants/teams using labels and IAM (e.g., each Workstation has a Coditect “worker_id” and “team_id”).
    • Optionally run multiple tenants per Workstation, but rely on gVisor sandboxes + user-level ACLs for separation.141138


Running nested containers with runsc (Modal-like behavior)

On each Workstation, the Coditect agent:

  1. Receives a CreateSandbox RPC with: tenant, project, image, resources, and network policy.
  2. Executes a gVisor container:
docker run \
  --runtime=runsc \
  --cpus=1.0 \
  --memory=1g \
  --read-only \
  --network=none \
  --name coditect-sb-$SANDBOX_ID \
  -v /workspaces/$TEAM/$PROJECT:/workspace:rw \
  ghcr.io/coditect/agent-runtime:latest \
  sleep infinity

    • runsc enforces a user-space kernel boundary inside the VM.
    • Use per-sandbox volumes for project data; keep the container rootfs ephemeral.

  3. For exec operations, the agent uses docker exec against the running gVisor container to run commands and stream stdout/stderr back to the Coditect control plane.
  4. On destroy, the agent stops and removes the container and cleans up any ephemeral volumes or scratch space.

This is the same pattern Google documentation suggests for “run untrusted binaries with gVisor inside your own container infrastructure.”137138


Tenant/project quota enforcement

Coditect’s control plane maintains a quota service and metadata DB independent of Workstations:

  • Per-tenant and per-project quotas
    • Max concurrent sandboxes.
    • vCPU and memory budgets (e.g., vCPU‑seconds, GiB‑hours) per time window.
    • Optional GPU quotas (if Workstations have GPUs and gVisor GPU access is configured).142143
  • Lifecycle flow
  1. Client calls POST /sandboxes.
  2. Control plane checks quotas in central DB; if OK, allocates a sandbox ID and assigns it to a Workstation with available capacity.
  3. Workstation agent launches the runsc container and periodically reports usage (CPU, mem, wall time).
  4. On completion or timeout, agent reports final metrics; control plane decrements quota and writes an audit event.
  • Enforcement
    • Control plane will refuse new sandboxes when quotas are exceeded.
    • Agents enforce per-container CPU/memory limits (cgroups) and kill long‑running sandboxes after their allowed time.

This keeps quotas consistent across all Workstations and decouples enforcement from individual VMs.
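The agent-side timeout enforcement from the flow above amounts to a periodic sweep; a minimal sketch, assuming the agent tracks start time and wall-clock budget per sandbox:

```python
import time

def expired_sandboxes(running, now=None):
    """Return sandbox ids whose wall-clock budget is spent.
    `running` maps sandbox_id -> (started_at_epoch_s, limit_s); the agent
    would then stop/remove the corresponding runsc containers."""
    now = time.time() if now is None else now
    return [sid for sid, (started_at, limit_s) in running.items()
            if now - started_at > limit_s]
```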


Security and isolation notes

  • gVisor boundary
    • gVisor intercepts syscalls via runsc, providing an extra kernel between sandboxed workloads and the Workstation’s host kernel.144138141
    • Appropriate for running arbitrary code from multiple tenants and teams with better protection than plain containers, but not as strong as full microVM isolation.143144138
  • VM boundary (Workstation)
    • Each Workstation is itself a GCE VM, so you also have a hypervisor boundary separating teams/workflows across different Workstations.
    • For especially sensitive tenants, run them on dedicated Workstations instead of mixing them on shared ones.
  • OS hardening
    • Run minimal OS images, restrict SSH/users, and treat Workstations as “managed worker nodes” controlled by Coditect automation.

Practical Coditect recommendation on Google Cloud Workstations

  • Default: Use gVisor (runsc) inside Workstations as the default sandbox runtime for agent workflows:
    • Good enough isolation for multi-user/multi-project use, with much lower cost and complexity than microVM solutions, and fully supported by Google’s stack.144138141
  • Control plane: Central Coditect service manages /sandboxes, quotas, routing to Workstations, and logs/metrics.
  • Workstation agents: Lightweight daemon per VM that:
    • Launches runsc containers.
    • Streams outputs and metrics.
    • Enforces local CPU/mem/time limits and cleanup.

If you want, next step can be:

“Draft a minimal gRPC API between Coditect control plane and the Workstation agent for creating, executing, and destroying gVisor sandboxes, including streaming stdout/stderr and resource usage metrics.” 145146147148149150151


Draft a minimal gRPC API between Coditect control plane and the Workstation agent for creating, executing, and destroying gVisor sandboxes, including streaming stdout/stderr and resource usage metrics. DESIGN multi-user individual user data persistence, individual projects backed up to GitHub in near real time as development proceeds in a project management CODITECT environment running on top of Anthropic CLAUDE CODE and GOOGLE GEMINI

A minimal control-plane↔Workstation gRPC API plus project-scoped Git workflows lets Coditect run multi-user sandboxes on Workstations while persisting each user’s work to GitHub in near–real time.152153154155156


1. Minimal gRPC API (control plane ↔ agent)

Service definitions (proto sketch)

syntax = "proto3";

package coditect.sandbox.v1;

service SandboxAgent {
  rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse);
  rpc ExecSandbox(stream ExecRequest) returns (stream ExecEvent);
  rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse);
  rpc ReportMetrics(stream MetricsReport) returns (MetricsAcknowledge);
}

// --- Core messages ---

message CreateSandboxRequest {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  string image = 4;
  repeated string command = 5;    // optional initial command
  double cpu_limit = 6;           // cores
  int64 memory_limit_mib = 7;
  int64 idle_timeout_seconds = 8;
  string workspace_path = 9;      // e.g. /workspaces/tenant/project/user
  map<string, string> env = 10;
}

message CreateSandboxResponse {
  string sandbox_id = 1;
  string container_id = 2;
  string status = 3;              // CREATED | FAILED
  string error_message = 4;
}

message ExecRequest {
  string sandbox_id = 1;
  oneof payload {
    ExecStart start = 2;
    ExecStdin stdin = 3;
    ExecControl control = 4;
  }
}

message ExecStart {
  repeated string command = 1;
  string workdir = 2;
  int64 timeout_seconds = 3;
  bool pty = 4;
}

message ExecStdin {
  bytes data = 1;
}

message ExecControl {
  bool terminate = 1;
}

message ExecEvent {
  string sandbox_id = 1;
  oneof event {
    ExecStdout stdout = 2;
    ExecStderr stderr = 3;
    ExecExit exit = 4;
    ExecStatus status = 5;
  }
}

message ExecStdout { bytes data = 1; }
message ExecStderr { bytes data = 1; }

message ExecExit {
  int32 exit_code = 1;
  string reason = 2;              // NORMAL | TIMEOUT | KILLED
}

message ExecStatus {
  string phase = 1;               // STARTING | RUNNING | DONE | ERROR
  string message = 2;
}

message DestroySandboxRequest {
  string sandbox_id = 1;
  bool force = 2;
}

message DestroySandboxResponse {
  string sandbox_id = 1;
  string status = 2;              // TERMINATED | NOT_FOUND | ERROR
}

// --- Metrics streaming ---

message MetricsReport {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  int64 timestamp_unix_ms = 4;
  double cpu_seconds = 5;
  int64 memory_peak_mib = 6;
  int64 bytes_read = 7;
  int64 bytes_written = 8;
}

message MetricsAcknowledge {
  bool ok = 1;
}
  • CreateSandbox → agent does docker run --runtime=runsc ... on the Workstation.154155156
  • ExecSandbox is bidirectional streaming: control plane sends start/stdin, agent streams stdout/stderr/exit/status.153157152
  • ReportMetrics is a long-lived stream where the agent periodically pushes resource usage (derived from cgroups and container stats) for quota/billing.158159160

2. Agent behavior on Workstation (gVisor inside VM)

  • CreateSandbox:
    • docker run --runtime=runsc with CPU/mem limits and mounted user workspace.156161154
    • Enforce network=none or egress-locked config for untrusted code.
  • ExecSandbox (stream):
    • On ExecStart, run docker exec (optionally with a PTY) and hook process stdout/stderr to the gRPC stream back to the control plane.157152153
    • On ExecStdin, write to process stdin.
    • On timeout or ExecControl.terminate, kill the process and report ExecExit with reason.
  • Metrics:
    • Use docker stats/cgroup FS to sample per-container CPU/bytes/memory and send MetricsReport every N seconds.159160158
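The cgroup sampling step above can be sketched as a small parser, assuming the cgroup v2 `cpu.stat` format (`usage_usec` in microseconds); the `parseCPUSeconds` helper name is illustrative, not part of the agent API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSeconds extracts total CPU time in seconds from the contents
// of a cgroup v2 cpu.stat file (the "usage_usec" line, in microseconds).
func parseCPUSeconds(cpuStat string) (float64, error) {
	for _, line := range strings.Split(cpuStat, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "usage_usec" {
			usec, err := strconv.ParseInt(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return float64(usec) / 1e6, nil
		}
	}
	return 0, fmt.Errorf("usage_usec not found in cpu.stat")
}

func main() {
	sample := "usage_usec 2500000\nuser_usec 2000000\nsystem_usec 500000"
	secs, _ := parseCPUSeconds(sample)
	fmt.Printf("cpu_seconds=%.2f\n", secs) // cpu_seconds=2.50
}
```

The agent would read `/sys/fs/cgroup/<container>/cpu.stat` every N seconds and put the delta into `MetricsReport.cpu_seconds`.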

3. Multi-user data model and persistence

Workspace layout (per user / project)

On each Workstation VM:

• Root: `/workspaces/<tenant>/<user>/<project>/`
  • src/ – working tree checked out from GitHub.
  • .coditect/ – agent metadata, run logs, temp artifacts.
  • venv/ or envs/ – optional per-project deps.

A gVisor sandbox mounts this path into the container:

docker run --runtime=runsc \
-v /workspaces/$TENANT/$USER/$PROJECT:/workspace \
-w /workspace/src \
ghcr.io/coditect/agent-runtime:latest \
sleep infinity

Each sandbox therefore operates directly on the user’s Git checkout, so Git becomes the persistence boundary.

Git + GitHub near-real-time sync

In each /workspace/src:

  • Initialize Git with origin GitHub repo (per project).
  • Run a small Coditect sync daemon (inside Workstation or as part of the Coditect agent) that:
    • Watches for filesystem changes (inotify) and triggers:
      - `git add` + `git commit` with machine-generated messages on a **shadow branch** (`coditect/autosave/<user>/<date>`).
      - `git push` to GitHub over a PAT or GitHub App installation token.
    • Batches frequent changes (e.g., commit/push every 15–30 seconds when there are changes).

This yields “near–real time” backups:

  • If a user’s Workstation dies, their work since last commit exists in GitHub on the autosave branch.
  • Human commits (PRs) remain under user control; the autosave branch can be squashed or used for recovery.

You can optionally run a centralized backup job that mirrors GitHub org repos to object storage or another Git remote for compliance.162163164


4. Coditect project management + agents (Claude/Gemini)

Project state

In the Coditect control plane:

  • Tenant → Team → User → Project hierarchy.
  • For each project:
    • GitHub repo URL + default branch.
    • Mapping to one or more Workstations and gVisor sandboxes.
    • Current “session” state: active sandboxes, last Claude/Gemini runs, metrics, etc.

AI-assisted workflows

  • Claude Code and Google Gemini operate on project state stored in GitHub and metadata from Coditect:
    • When a user requests “refactor module X”, Coditect:
      • Ensures project workspace is up to date (Git pull).
      • Calls Claude Code / Gemini with repo snapshot + context.
      • Writes generated changes directly into the workspace.
      • gVisor sandbox runs tests / linters; outputs streamed via ExecSandbox.
      • Autosave daemon captures changes and pushes to GitHub autosave branch.
  • Coditect stores run metadata (prompt, models, files touched, test results) in .coditect/ and central DB, linking to Git commit hashes so all AI-driven changes are auditable.

5. Quotas, multi-tenancy, and economics

  • Quotas in control plane
    • Per tenant/project: max concurrent sandboxes, max vCPU/memory, and daily CPU‑seconds.
    • Per user: sandbox count and total runtime to prevent abuse.
  • Enforcement
    • Before sending CreateSandbox over gRPC, Coditect checks quotas and only dispatches if under limit.
    • MetricsReport streams feed usage accounting; Coditect updates counters and can proactively kill/deny new sandboxes if limits exceeded.160158159
  • Economics
    • On GCP, you pay primarily for Workstation VMs; gVisor just adds software isolation inside those VMs, which is cheaper than microVM stacks while still significantly improving security for untrusted code.155165166154
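The control-plane admission check before dispatching CreateSandbox can be sketched as an in-memory gate; the `quotaGate` type is hypothetical, and production would back it with the usage counters fed by `MetricsReport`:

```go
package main

import (
	"fmt"
	"sync"
)

// quotaGate enforces a per-tenant cap on concurrent sandboxes before
// the control plane dispatches CreateSandbox to an agent.
type quotaGate struct {
	mu     sync.Mutex
	active map[string]int
	limits map[string]int
}

func newQuotaGate(limits map[string]int) *quotaGate {
	return &quotaGate{active: map[string]int{}, limits: limits}
}

// admit reserves a slot for the tenant, returning false if at the limit.
func (q *quotaGate) admit(tenant string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.active[tenant] >= q.limits[tenant] {
		return false
	}
	q.active[tenant]++
	return true
}

// release frees a slot when a sandbox is destroyed.
func (q *quotaGate) release(tenant string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.active[tenant] > 0 {
		q.active[tenant]--
	}
}

func main() {
	g := newQuotaGate(map[string]int{"acme": 2})
	fmt.Println(g.admit("acme"), g.admit("acme"), g.admit("acme")) // true true false
}
```

CPU-second and runtime quotas would layer on top of the same gate, decremented from metric reports.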

If you want, the next step can be a focused proto + Go server stub for SandboxAgent (with docker/runsc integration) or a concrete Git autosave strategy (branch naming, commit messages, conflict handling). 167168169170171172


Next, sketch the focused proto + Go server stub for SandboxAgent (with docker/runsc integration) and the concrete Git autosave strategy (branch naming, commit messages, conflict handling). Show a protobuf definition for the control plane API messages and services

Example Go server and agent implementation for the gRPC API · Authentication and authorization patterns for this control plane · How to stream stdout/stderr reliably over unstable networks · Best practices for gVisor sandbox lifecycle management

A minimal, production-friendly design is: a protobuf-defined control-plane API, a Go agent that wraps docker run --runtime=runsc, mTLS for authn/z, streaming with backpressure and resume semantics, and disciplined sandbox lifecycle policies.173174175176177178


1. Protobuf: control plane ↔ agent API

syntax = "proto3";

package coditect.sandbox.v1;

option go_package = "github.com/coditect/platform/sandboxpb";

// Control plane <-> workstation agent
service SandboxAgent {
// One-shot create
rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse);

// Bidirectional exec stream: control plane sends start/stdin/control;
// agent streams stdout/stderr/status/exit events.
rpc ExecSandbox(stream ExecRequest) returns (stream ExecEvent);

// Destroy sandbox container + cleanup
rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse);

// Telemetry stream from agent to control plane
rpc ReportMetrics(stream MetricsReport) returns (MetricsAcknowledge);
}

// ---------- Create / Destroy ----------

message CreateSandboxRequest {
string sandbox_id = 1;
string tenant_id = 2;
string project_id = 3;
string image = 4;
string workspace_path = 5; // host path in workstation VM
double cpu_limit = 6; // cores
int64 memory_limit_mib = 7;
int64 idle_timeout_seconds = 8;
map<string,string> env = 9;
}

message CreateSandboxResponse {
string sandbox_id = 1;
string container_id = 2;
string status = 3; // CREATED | FAILED
string error = 4;
}

message DestroySandboxRequest {
string sandbox_id = 1;
bool force = 2;
}

message DestroySandboxResponse {
string sandbox_id = 1;
string status = 2; // TERMINATED | NOT_FOUND | ERROR
string error = 3;
}

// ---------- Exec streaming ----------

message ExecRequest {
string sandbox_id = 1;
oneof payload {
ExecStart start = 2;
ExecStdin stdin = 3;
ExecControl control = 4;
}
}

message ExecStart {
repeated string command = 1;
string workdir = 2;
int64 timeout_seconds = 3;
bool pty = 4;
}

message ExecStdin {
bytes data = 1;
}

message ExecControl {
bool terminate = 1;
}

message ExecEvent {
string sandbox_id = 1;
oneof event {
ExecStdout stdout = 2;
ExecStderr stderr = 3;
ExecExit exit = 4;
ExecStatus status = 5;
}
}

message ExecStdout { bytes data = 1; }
message ExecStderr { bytes data = 1; }

message ExecExit {
int32 exit_code = 1;
string reason = 2; // NORMAL | TIMEOUT | KILLED | INTERNAL_ERROR
}

message ExecStatus {
string phase = 1; // STARTING | RUNNING | DONE | ERROR
string message = 2;
}

// ---------- Metrics streaming ----------

message MetricsReport {
string sandbox_id = 1;
string tenant_id = 2;
string project_id = 3;
int64 timestamp_unix_ms = 4;
double cpu_seconds = 5;
int64 memory_peak_mib = 6;
int64 bytes_read = 7;
int64 bytes_written = 8;
}

message MetricsAcknowledge {
bool ok = 1;
string error = 2;
}
  • Shapes match gRPC streaming best practices (bidirectional Exec, client-stream metrics).179180181173

2. Go agent skeleton with docker + runsc

Below is a condensed agent implementation sketch using Go gRPC; it assumes Docker is configured with a runsc runtime.182175178183

//go:build workstation-agent

package main

import (
"context"
"fmt"
"io"
"log"
"net"
"os/exec"

"google.golang.org/grpc"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/peer"

pb "github.com/coditect/platform/sandboxpb"
)

type agentServer struct {
pb.UnimplementedSandboxAgentServer
}

func (s *agentServer) CreateSandbox(ctx context.Context, req *pb.CreateSandboxRequest) (*pb.CreateSandboxResponse, error) {
containerName := "coditect-sb-" + req.SandboxId

args := []string{
"run", "-d",
"--runtime=runsc", // gVisor runtime
"--cpus", formatCPU(req.CpuLimit),
"--memory", formatMem(req.MemoryLimitMib),
"--name", containerName,
"--network", "none",
"--read-only",
"-v", req.WorkspacePath + ":/workspace",
}
for k, v := range req.Env {
args = append(args, "-e", k+"="+v)
}
args = append(args, req.Image, "sleep", "infinity")

cmd := exec.CommandContext(ctx, "docker", args...)
out, err := cmd.CombinedOutput()
if err != nil {
return &pb.CreateSandboxResponse{
SandboxId: req.SandboxId,
Status: "FAILED",
Error: string(out),
}, nil
}

return &pb.CreateSandboxResponse{
SandboxId: req.SandboxId,
ContainerId: containerName,
Status: "CREATED",
}, nil
}

func (s *agentServer) ExecSandbox(stream pb.SandboxAgent_ExecSandboxServer) error {
// Receive first message with ExecStart
first, err := stream.Recv()
if err != nil {
return err
}
start := first.GetStart()
if start == nil {
return io.EOF
}
sandboxID := first.SandboxId
containerName := "coditect-sb-" + sandboxID

ctx := stream.Context()
cmdArgs := []string{"exec", "-i"} // -i keeps stdin open for ExecStdin
if start.Pty {
cmdArgs = append(cmdArgs, "-t")
}
if start.Workdir != "" {
cmdArgs = append(cmdArgs, "-w", start.Workdir)
}
// Flags must precede the container name in docker exec.
cmdArgs = append(cmdArgs, containerName)
cmdArgs = append(cmdArgs, start.Command...)

cmd := exec.CommandContext(ctx, "docker", cmdArgs...)
stdin, _ := cmd.StdinPipe()
stdout, _ := cmd.StdoutPipe()
stderr, _ := cmd.StderrPipe()

if err := cmd.Start(); err != nil {
stream.Send(&pb.ExecEvent{
SandboxId: sandboxID,
Event: &pb.ExecEvent_Status{
Status: &pb.ExecStatus{
Phase: "ERROR",
Message: err.Error(),
},
},
})
return nil
}

// Goroutine: forward stdout
go func() {
buf := make([]byte, 32*1024)
for {
n, readErr := stdout.Read(buf)
if n > 0 {
_ = stream.Send(&pb.ExecEvent{
SandboxId: sandboxID,
Event: &pb.ExecEvent_Stdout{
Stdout: &pb.ExecStdout{Data: buf[:n]},
},
})
}
if readErr != nil {
return
}
}
}()

// Goroutine: forward stderr
go func() {
buf := make([]byte, 32*1024)
for {
n, readErr := stderr.Read(buf)
if n > 0 {
_ = stream.Send(&pb.ExecEvent{
SandboxId: sandboxID,
Event: &pb.ExecEvent_Stderr{
Stderr: &pb.ExecStderr{Data: buf[:n]},
},
})
}
if readErr != nil {
return
}
}
}()

// Receive stdin/control
for {
msg, recvErr := stream.Recv()
if recvErr == io.EOF {
break
}
if recvErr != nil {
break
}
switch p := msg.Payload.(type) {
case *pb.ExecRequest_Stdin:
if _, err := stdin.Write(p.Stdin.Data); err != nil {
// ignore, process might be done
}
case *pb.ExecRequest_Control:
if p.Control.Terminate {
_ = cmd.Process.Kill()
}
}
}

waitErr := cmd.Wait()
exitCode := 0
reason := "NORMAL"
if exitErr, ok := waitErr.(*exec.ExitError); ok {
exitCode = exitErr.ExitCode()
if exitCode < 0 { // killed by a signal (e.g. ExecControl.terminate)
reason = "KILLED"
}
} else if waitErr != nil {
reason = "INTERNAL_ERROR"
}
return stream.Send(&pb.ExecEvent{
SandboxId: sandboxID,
Event: &pb.ExecEvent_Exit{
Exit: &pb.ExecExit{
ExitCode: int32(exitCode),
Reason: reason,
},
},
})
}

func (s *agentServer) DestroySandbox(ctx context.Context, req *pb.DestroySandboxRequest) (*pb.DestroySandboxResponse, error) {
containerName := "coditect-sb-" + req.SandboxId
cmd := exec.CommandContext(ctx, "docker", "rm", "-f", containerName)
out, err := cmd.CombinedOutput()
if err != nil {
return &pb.DestroySandboxResponse{
SandboxId: req.SandboxId,
Status: "ERROR",
Error: string(out),
}, nil
}
return &pb.DestroySandboxResponse{
SandboxId: req.SandboxId,
Status: "TERMINATED",
}, nil
}

func (s *agentServer) ReportMetrics(stream pb.SandboxAgent_ReportMetricsServer) error {
for {
report, err := stream.Recv()
if err == io.EOF {
return stream.SendAndClose(&pb.MetricsAcknowledge{Ok: true})
}
if err != nil {
return err
}
// Write metrics to local exporter / forward to control plane, etc.
_ = report
}
}

func main() {
creds, err := credentials.NewServerTLSFromFile("agent-cert.pem", "agent-key.pem")
if err != nil {
log.Fatal(err)
}
server := grpc.NewServer(grpc.Creds(creds), authInterceptor())
pb.RegisterSandboxAgentServer(server, &agentServer{})

lis, err := net.Listen("tcp", ":8443")
if err != nil {
log.Fatal(err)
}
log.Println("Sandbox agent listening on :8443")
if err := server.Serve(lis); err != nil {
log.Fatal(err)
}
}

func authInterceptor() grpc.ServerOption {
// NOTE: this guards only unary RPCs; ExecSandbox and ReportMetrics are
// streams, so production should also register a grpc.StreamInterceptor
// performing the same check.
return grpc.UnaryInterceptor(func(
ctx context.Context,
req interface{},
info *grpc.UnaryServerInfo,
handler grpc.UnaryHandler,
) (interface{}, error) {
if p, ok := peer.FromContext(ctx); ok && p.AuthInfo != nil {
// Check mTLS cert SANs / SPIFFE IDs here.
}
return handler(ctx, req)
})
}

func formatCPU(cpu float64) string { return fmt.Sprintf("%.2f", cpu) }
func formatMem(mib int64) string { return fmt.Sprintf("%dm", mib) } // docker --memory accepts k/m/g suffixes

3. Authentication and authorization patterns

  • mTLS for service-to-service
    • Use gRPC with TLS and mutual TLS; control-plane and agents each present X.509 certs.185174
    • Encode identity in certificate SAN / SPIFFE ID (e.g., spiffe://coditect/workstation/<id>).
    • Agent only accepts connections from certs signed by Coditect CA with appropriate SAN prefix; control plane similarly validates agents.174186185
  • Per-call metadata (RBAC)
    • Control plane includes tenant_id, project_id, and sandbox_id in gRPC metadata.
    • Agent uses cert identity + metadata to validate that the caller is allowed to manage that sandbox on that workstation.
  • Least privilege
    • Only Coditect control plane uses SandboxAgent API; user sessions never talk directly to Workstations.
    • Per-tenant quotas and allowed operations enforced at control-plane before calling agent.

4. Streaming stdout/stderr over unstable networks

Patterns from gRPC streaming best practices:181184187173180

  • Chunked messages
    • Send stdout/stderr as small chunks (e.g., 32 KiB) in ExecStdout/ExecStderr messages to avoid head-of-line blocking and huge messages.
  • Application-level sequence numbers
    • Optionally add int64 seq to ExecStdout/ExecStderr and ExecRequest to detect gaps or duplicate messages if the control plane reconnects or needs to reconstruct logs.
  • Backpressure
    • Use gRPC flow-control; avoid buffering unbounded data server-side. If client is slow, writes will block, naturally throttling output.184181
  • Reconnect + resume
    • Treat Exec streams as long-lived but restartable sessions:
      • If connection drops, control plane can reconnect and reattach to the same sandbox in “logs-only” mode, starting from last-seen sequence number.
      • For reliability, also persist logs on disk (/workspace/.coditect/logs/<sandbox>.log) and expose a “download logs” API as a fallback.
  • Heartbeat messages
    • Periodic ExecStatus messages (e.g., every 5–10 seconds) let the control plane detect broken connections and mark runs as stale.
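The sequence-number idea above can be sketched on the control-plane side as a small reassembler; `logReassembler` is an illustrative name, not an existing library type:

```go
package main

import "fmt"

// logReassembler restores ordering of chunked output events using an
// application-level sequence number, dropping duplicates that arrive
// after a stream reconnect and buffering out-of-order chunks.
type logReassembler struct {
	nextSeq int64
	pending map[int64][]byte
	out     []byte
}

func newLogReassembler() *logReassembler {
	return &logReassembler{pending: make(map[int64][]byte)}
}

// accept ingests one chunk; already-delivered sequence numbers are ignored.
func (r *logReassembler) accept(seq int64, data []byte) {
	if seq < r.nextSeq {
		return // duplicate of an already-delivered chunk
	}
	r.pending[seq] = data
	// Flush any contiguous run starting at nextSeq.
	for {
		d, ok := r.pending[r.nextSeq]
		if !ok {
			break
		}
		r.out = append(r.out, d...)
		delete(r.pending, r.nextSeq)
		r.nextSeq++
	}
}

func main() {
	r := newLogReassembler()
	r.accept(2, []byte("c"))
	r.accept(0, []byte("a"))
	r.accept(1, []byte("b"))
	r.accept(0, []byte("a")) // replayed duplicate after reconnect, ignored
	fmt.Println(string(r.out)) // abc
}
```

Any gap still pending after a timeout would trigger a fallback to the on-disk log file.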

5. gVisor sandbox lifecycle best practices

Based on gVisor docs and production usage patterns:188176177189190191

  • Immutable, minimal images
    • Build small base images for agent workloads; avoid writing to container root, use workspace volume for project data.190188
  • Strict resource limits
    • Always set CPU and memory limits on docker run --runtime=runsc; gVisor handles resource isolation via underlying cgroups.192193190
    • Apply wall-clock and idle timeouts; agent enforces timeouts and cleans up containers.
  • Network minimization
    • Use --network=none or minimal egress (proxy) for untrusted code, since many LLM-generated code exploits involve network exfiltration.194195
  • Lifecycle hooks
    • Define states: CREATING -> RUNNING -> IDLE -> TERMINATING -> TERMINATED, and enforce transitions centrally.
    • Periodically reconcile: list Docker containers with coditect-sb-* prefix, and clean up orphaned or long-idle sandboxes.
  • Observability
    • Expose per-sandbox metrics (CPU, memory, I/O) via MetricsReport, and integrate with a global metrics backend for alerts (e.g., runaway sandboxes, OOM loops).193192
  • Compatibility testing
    • Some syscalls and kernel features behave differently under gVisor; test typical workloads and frameworks (Python, Node, build tools) under runsc early.191196188190
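The lifecycle states above can be encoded as a small transition table the control plane consults before mutating sandbox state; this is a sketch, not a prescribed API:

```go
package main

import "fmt"

// allowedTransitions encodes the sandbox lifecycle:
// CREATING -> RUNNING -> IDLE -> TERMINATING -> TERMINATED,
// with RUNNING/IDLE able to oscillate and any live state able to terminate.
var allowedTransitions = map[string][]string{
	"CREATING":    {"RUNNING", "TERMINATING"},
	"RUNNING":     {"IDLE", "TERMINATING"},
	"IDLE":        {"RUNNING", "TERMINATING"},
	"TERMINATING": {"TERMINATED"},
}

// canTransition rejects illegal state changes centrally.
func canTransition(from, to string) bool {
	for _, t := range allowedTransitions[from] {
		if t == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("RUNNING", "IDLE"))       // true
	fmt.Println(canTransition("TERMINATED", "RUNNING")) // false
}
```

The reconcile loop that sweeps `coditect-sb-*` containers would use the same table to decide whether an observed state is legal or orphaned.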

If you want, next prompt could be:

“Sketch the Git autosave strategy: exact branch naming scheme, commit-message conventions, and conflict-resolution flow for Coditect autosave branches pushed from Workstations to GitHub.” 197198199200


Sketch the Git autosave strategy: exact branch naming scheme, commit-message conventions, and conflict-resolution flow for Coditect autosave branches pushed from Workstations to GitHub

Use autosave as a separate, machine-owned history that never collides with human feature branches, and treat it as an append-only stream that can be squashed into clean commits later.201202203


Branch naming scheme

Goal: globally unique, machine-owned, easy to query by tenant/user/project and date.

  • Pattern
    • autosave/<tenant-slug>/<project-slug>/<user-slug>/<yyyymmdd>
  • Examples
    • autosave/acme/payments/api-user123/20260114
    • autosave/corp-ml/llm-router/alice/20260114

Properties:

  • Uses slash-separated hierarchy per best practices for descriptive branch names.204205206207
  • Rotates daily so branches don’t grow unbounded; you can archive old days or squash them.
  • Automatically indicates ownership (tenant, project, user, date).

Implementation details:

  • Slugify each segment: lowercase, replace characters outside `[a-z0-9-]` with `-`, and truncate to a length bound to avoid absurd branch names.205206208
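That slugify rule can be sketched in Go (the function name and length cap are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// slugify lowercases, replaces anything outside [a-z0-9-] with "-",
// collapses runs of "-", and caps the length for safe ref names.
func slugify(s string, maxLen int) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	out := b.String()
	for strings.Contains(out, "--") {
		out = strings.ReplaceAll(out, "--", "-")
	}
	out = strings.Trim(out, "-")
	if len(out) > maxLen {
		out = strings.Trim(out[:maxLen], "-")
	}
	return out
}

func main() {
	fmt.Println(slugify("Acme Payments (EU)!", 20)) // acme-payments-eu
}
```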

Commit-message conventions

Autosave commits should be clearly machine-generated, compact, and self-describing.

  • Subject line pattern
    • `autosave: <short summary> [<user>@<timestamp>]`
  • Short summary
    • “workspace snapshot”, or
    • “edited (+N, -M lines)” if cheap to compute.

Examples:

  • autosave: workspace snapshot [alice@2026-01-14T04:05:12Z]
  • autosave: edited payment_service.py (+32 -8) [bob@2026-01-14T04:06:45Z]

Guidelines:

  • Keep subject ≤ 72 chars for readability.203209201
  • No body text unless you want to embed a JSON summary blob (file list, tools used), which can be parsed by Coditect later.
  • Consider a fixed prefix autosave: so automation can distinguish these commits from human ones.210203

Autosave workflow and frequency

Inside the Workstation (per user/project):

  1. Tracking changes
    • Watch /workspace/src for file changes (inotify) or poll git status --porcelain every N seconds.
  2. Autosave trigger
    • If there are uncommitted changes and no Git operation in progress (.git/index.lock absent), and last autosave > 15–30s ago, then autosave.
  3. Autosave algorithm
# Pseudocode
git status --porcelain
if dirty:
    branch="autosave/<tenant>/<project>/<user>/<yyyymmdd>"
    git fetch origin
    git checkout -B "$branch" "origin/$branch" || git checkout -b "$branch"
    git add -A
    git commit -m "autosave: workspace snapshot [user@timestamp]"
    git push origin "$branch"

    • This keeps autosave isolated while allowing rebasing/merging from the main feature branch into autosave when needed.

  4. User feature branches
    • Users develop on normal branches like feature/PROJ-123-new-api.207205
    • A Coditect “promote” action later merges/squashes autosave changes into the user’s feature branch via PR or manual cherry-picking.


Conflict-resolution flow

Key principle: never rewrite autosave public history; treat it as an append-only log to avoid breaking in-flight Workstation pushes.202211

1. Push conflicts (non-fast-forward)

When autosave git push origin autosave/... fails due to non-FF:

  • The autosave daemon:
    • Runs git fetch origin autosave/....
    • Rebases local autosave branch onto the remote:
git fetch origin autosave/...
git rebase origin/autosave/... # safe if only automation writes here
git push origin autosave/...
  • Because only automation writes these branches, controlling rebase via tooling is acceptable; no human consumers rely on autosave’s exact history.211202

If rebase fails (e.g., due to manual edits):

  • Mark the autosave branch as dirty in Coditect metadata and stop autosaving until a resolver job or a human cleans it up.

2. Divergence from main branches

Autosave branches will drift from main / feature/*:

  • Periodically (or when user wants to “promote” changes), Coditect:
    • Checks out the user’s feature branch.
    • Merges or cherry-picks from the autosave branch:
      • Squash merge to produce one coherent commit:
git checkout feature/PROJ-123
git merge --squash autosave/acme/payments/api-user123/20260114
git commit -m "feat: integrate autosave work from 2026-01-14"
      • This avoids dragging noisy machine-generated history into the main branch.
  • After promotion, autosave continues accumulating; the next day rotates into a new .../20260115 branch.

3. Handling local uncommitted changes when switching branches

Autosave logic should not change the current checked-out branch in a way that disturbs the user:

  • Preferred pattern:
    • Keep user working on their feature branch.
    • Use separate clone or worktree for autosave if you need to track autosave branches explicitly, or:
    • Autosave directly on the current branch to a private remote (mirrored) and only push consolidated snapshots to GitHub branches.211

A practical compromise:

  • Treat autosave as branchless local commits in a hidden remote:
    • Local autosave repository pointing to a “shadow” remote or ref namespace (refs/autosave/...) and mirroring that to GitHub.211

But for simplicity in first iteration, using standard branches with automation allowed to re-checkout inside a detached worktree is usually sufficient.


Summary shape Coditect can adopt

  • Branch naming:
- `autosave/<tenant>/<project>/<user>/<yyyymmdd>`  
  • Commit messages:
    • autosave: workspace snapshot [user@ISO8601]
  • Autosave frequency:
    • Every 15–30 seconds of idle time with dirty working tree, batched and pushed to GitHub.
  • Conflict handling:
    • Automated rebase of autosave branches on push conflicts.
    • “Promotion” into human branches via squash merges or cherry-picks.
  • Safety:
    • Never auto-rewrite user feature branches.
    • Treat autosave branches as machine-only, append-only logs that can be squashed/archived after promotion.

If you want, the next iteration can be a shell/Go daemon sketch that implements this autosave loop, including detection of ongoing rebase/merge/commit operations to avoid interfering with active Git commands. 212213214215


Conflict-resolution steps when autosave push fails due to divergence from main branch

When an autosave push fails because the autosave branch has diverged from its remote (usually because main/feature branches moved and autosave was rebased/merged against them), use a clean, automated, autosave-only reconciliation flow.


1. Detect and classify the failure

On git push origin autosave/<tenant>/<project>/<user>/<date>:

  • If error is non-fast-forward (remote has commits that local doesn’t), classify as autosave divergence, not a main-branch conflict.
  • Important: treat autosave/* as machine-owned only; humans don’t commit there, so you can safely rewrite it without affecting user branches.216217
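Classifying the push failure can be done with a simple heuristic over git's stderr; the matched phrases are messages git commonly prints on a rejected non-fast-forward push, but treat this as a sketch rather than a complete parser:

```go
package main

import (
	"fmt"
	"strings"
)

// isNonFastForward heuristically classifies `git push` stderr as a
// non-fast-forward rejection (remote has commits the local lacks),
// as opposed to auth or connectivity failures.
func isNonFastForward(stderr string) bool {
	s := strings.ToLower(stderr)
	return strings.Contains(s, "non-fast-forward") ||
		strings.Contains(s, "fetch first") ||
		strings.Contains(s, "[rejected]")
}

func main() {
	msg := "! [rejected] autosave/x -> autosave/x (non-fast-forward)"
	fmt.Println(isNonFastForward(msg))                         // true
	fmt.Println(isNonFastForward("remote: permission denied")) // false
}
```

Only the non-FF case enters the rebase-and-retry path; other failures should alert instead of looping.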

2. Reconcile autosave branch with remote

Steps inside the Workstation’s autosave daemon:

  1. Fetch remote state
git fetch origin autosave/<...>
  2. Rebase local autosave on remote
git checkout autosave/<...>
git rebase origin/autosave/<...>
    • Because both sides are machine-generated, the rebase should normally be trivial (fast-forward or linear append).

3. Resolve rare conflicts (if any)

If rebase pauses with conflicts (e.g., a previous autosave bug or manual edits to the autosave branch):

  • Abort the automated rebase and mark the autosave branch as “needs manual intervention”:

git rebase --abort

  • Record a flag in Coditect metadata (e.g., `autosave_conflict=true`) and stop autosaving for that branch until a background “repair job” or a human resolves it.

4. Retry push

git push origin autosave/<...>

3. Keeping in sync with main / feature branches

Autosave divergence from main/feature is expected; resolution should be one-way, from autosave into user branches, never the reverse.

  • Periodically (or on user action “sync autosave”), Coditect:
  1. Checks out user’s feature branch:
git checkout feature/PROJ-123
git pull origin feature/PROJ-123
  2. Integrates autosave history via squash merge:
git merge --squash autosave/<tenant>/<project>/<user>/<date>
git commit -m "feat: integrate autosave work from 2026-01-14"
  3. Pushes feature branch and opens a PR if desired.
  • This avoids ever forcing main/feature to incorporate noisy autosave commit history and keeps autosave conflicts decoupled from “real” Git history.218216

4. Fallback strategy for persistent autosave failures

If autosave cannot rebase/push after N attempts (e.g., due to repository corruption or manual edits):

  • Stop autosave for that project/user and mark in Coditect UI: “Autosave paused due to Git conflicts.”
  • Keep local working tree intact so user doesn’t lose work.
  • Offer one-click “repair” action that:
    • Creates a new autosave branch (e.g., autosave/.../<date>-repair),
    • Commits the current workspace snapshot there,
    • Pushes and resumes autosave from that new branch.

This minimizes risk of data loss while keeping autosave history clean and machine-controlled.


USE GIT WORKTREES and research

Which branch prefix should Workstations create for autosave changes · How to include ticket and workstation ID in branch names · Exact commit message template for autosave snapshots · How to tag autosave commits for easy cleanup

Use worktrees so autosave runs in an isolated working copy per user/ticket, and encode ticket + workstation identity directly into branch and commit metadata for traceability and cleanup.219220221222


Branch prefix and naming with worktrees

Prefix

Use a dedicated autosave prefix and ticket-first naming, so you get uniqueness + traceability:223224225

  • Prefix: autosave/
  • Full pattern:
    • `autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation>/<yyyymmdd>`

Examples:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
  • autosave/NOTICKET/acme/spike-service/bob/ws-nyc-02/20260114

Notes:

  • Ticket-first aligns with common branch-naming best practices and makes it easy to correlate autosave streams to work-items.226225223
  • NOTICKET (or similar) can be used when there is no tracker ID.224

Worktree usage

For each active autosave context:

git worktree add -b autosave/PROJ-1234/acme/payments/alice/ws-01/20260114 \
../.wt-autosave-PROJ-1234
  • The autosave daemon operates in ../.wt-autosave-PROJ-1234, leaving the main working tree free for the developer’s feature branch.220221222219

Commit message template

Use a strict, machine-readable template:

Template

autosave: <ticket-id> <workstation-id> <user> <ISO8601-UTC>

Examples:

  • autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z
  • autosave: NOTICKET ws-nyc-02 bob 2026-01-14T04:08:01Z

Rationale:

  • Starts with autosave: to differentiate from human commits and allow easy filtering.227228229
  • Contains ticket ID and workstation ID to diagnose which environment generated the snapshot.
  • ISO timestamp makes it sortable and avoids locale issues.

Optional body (for Coditect-only consumption):

files:
- payment_service.py
- handlers/webhook.go
tool:
- claude-code
- gemini
summary: "Refactor payment flow and add retry logic"

Tags for autosave commits (for cleanup and indexing)

Treat tags as lightweight markers for autosave epochs, not for every commit.230231

Tag naming

  • Prefix: autosave/
  • Pattern: `autosave/<ticket-id>/<user>/<yyyymmdd>/<n>`

Examples:

  • autosave/PROJ-1234/alice/20260114/001
  • autosave/PROJ-1234/alice/20260114/002

Usage:

  • Optionally tag every Nth autosave commit (e.g., every 20th) to create anchors for cleanup and recovery.
  • Store mapping in Coditect (ticket → last autosave tag).

Cleanup

  • To prune old autosave history for a ticket/date after promoting to a clean feature branch, Coditect can:
    • Enumerate tags matching autosave/PROJ-1234/alice/20260114/*.
    • Decide retention (keep last tag, delete older ones).
    • Optionally delete old autosave branches past an age threshold (e.g., > 30 days).

This keeps autosave history discoverable but bounded.


Conflict handling with worktrees

  • Autosave worktree operates solely on its autosave branch; developer worktree remains on feature/PROJ-1234-something.221222219220
  • On push non-FF: autosave process uses its worktree to git fetch origin autosave/... && git rebase origin/autosave/... and re-push, with no impact on the developer’s branch.
  • If rebase fails repeatedly, Coditect can:
    • Create a fresh autosave branch and worktree with a suffix, e.g., autosave/PROJ-1234/.../20260114-repair,
    • Commit the current state there and continue, leaving the old branch as an artifact for manual inspection.

If you want, next step can be a small shell or Go daemon loop that: creates the autosave worktree from the feature branch, watches for changes, and applies this branch + commit + tag scheme automatically. 232233234235236


Should autosave branches be local or pushed to remote

Recommended prefix for workstation autosave branches · How to format branch names with ticket and workstation ID · Examples of safe unique workstation identifiers · Branch-naming collision handling strategy for autosave branches

Autosave branches should be pushed to the remote (for recovery and central visibility), but treated as machine-owned, low-value history that can be rotated and squashed.


Local vs remote autosave branches

  • Local-only autosave:
    • Pros: No noise in the remote, simpler Git perms.
    • Cons: No protection against Workstation loss; Coditect cannot see or manage autosave state centrally.
  • Pushed autosave branches (recommended):
    • Pros: Survives VM failure, enables Coditect-wide audit, consolidation, and cleanup; fits your “near real-time backup” goal.
    • Cons: Extra refs in the remote, but mitigated by strict naming and retention policies.

For Coditect, autosave branches should be pushed to the remote, with clear prefixes and automated pruning.


Recommended branch prefix

Use a distinct, machine-only prefix at the root:

  • Recommended: autosave/

Examples with hierarchy:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
  • autosave/NOTICKET/corp-ml/llm-router/bob/ws-nyc-02/20260114

This keeps autosave clearly separated from feature/, bugfix/, etc., and allows easy listing/cleanup (git branch --list 'autosave/*').


Branch name format with ticket and workstation ID

Template:

`autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation-id>/<yyyymmdd>`
  • <ticket-id>: JIRA-style or tracker ID (e.g., PROJ-1234), or NOTICKET when absent.
  • `<tenant>` / `<project>`: short slugs, kebab-case.
  • <user>: user slug (GitHub handle or SSO username).
  • <workstation-id>: short, stable identifier (see below).
  • <yyyymmdd>: date (UTC) to rotate branches daily.

Example:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114

Slugify each segment (lowercase, replace characters outside `[a-z0-9-]` with `-`, cap the length) to keep branch names safe and Git-friendly.


Safe unique workstation identifiers

Properties: stable per Workstation, non-sensitive, short, and unique enough within the repo.

Good options:

  • Short host-based ID (preferred):
    • E.g., ws-01, ws-nyc-02, ws-br-cj-01.
    • Derived from hostname + region or team: `ws-<region>-<seq>`.
  • Hash-based ID:
    • E.g., ws-6f3a9b, first 6–8 chars of a hash of instance ID.
  • Cloud instance metadata (normalized):
    • GCE instance name slugged: ws-<instance-name> (but keep short).

Avoid:

  • Raw IP addresses, internal hostnames that leak topology, or long opaque UUIDs; they clutter branch names and can expose infra details.
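The hash-based option can be sketched by truncating a SHA-256 digest of the instance ID; `workstationID` is an illustrative helper:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// workstationID derives a short, stable, non-sensitive identifier from
// a cloud instance ID by truncating its SHA-256 hex digest to 6 chars,
// so branch names never leak raw hostnames or topology.
func workstationID(instanceID string) string {
	sum := sha256.Sum256([]byte(instanceID))
	return fmt.Sprintf("ws-%x", sum[:3])
}

func main() {
	fmt.Println(workstationID("projects/123/zones/us-east1-b/instances/ws-vm-7"))
}
```

Six hex characters are enough within a single repo; uniqueness is still confirmed at registration time in the control plane.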

Handling branch naming collisions

A collision happens if two autosave processes produce the same branch name (e.g., two Workstations with misconfigured IDs). Strategy:

  1. Prevent collisions by design
    • Make <workstation-id> unique per repo or per project (control this in Coditect config).
    • Enforce uniqueness check at Workstation registration time in Coditect’s control plane.
  2. If collision still occurs (non-fast-forward push)

When git push origin autosave/... fails with a non-FF error:

    • Try automated reconciliation:
      • git fetch origin autosave/...
      • If the remote branch was also autosave from the same workstation (detected via commit metadata), rebase and push.
    • If the remote branch appears to be from a different workstation (commit messages show different workstation IDs):
      • Treat it as a collision and fork a new branch name with a numeric suffix:
        • Original: autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
        • New (forked): autosave/PROJ-1234/acme/payments/alice/ws-01b/20260114 or autosave/PROJ-1234/acme/payments/alice/ws-01/20260114-2
      • Update the Workstation’s autosave config to use the forked branch going forward.
  3. Central coordination (Coditect)
    • Maintain a mapping in Coditect: (tenant, project, user, workstation-id, date) → branch name.
    • On registration or first autosave, the control plane assigns the canonical autosave branch name and ensures uniqueness; Workstations never invent it on their own.
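The suffix-forking step can be sketched as a pure function (hypothetical helper; the control plane would consult its branch mapping rather than an in-memory set):

```python
def fork_branch_name(branch: str, taken: set) -> str:
    """On a detected collision, append an increasing numeric suffix (-2, -3, ...)
    until the name is free; data on the existing remote branch is preserved."""
    if branch not in taken:
        return branch
    n = 2
    while f"{branch}-{n}" in taken:
        n += 1
    return f"{branch}-{n}"
```

The Workstation then persists the forked name in its autosave config so subsequent pushes target the new branch.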

This gives you:

  • Remote-backed, recoverable autosave.
  • A clear naming scheme carrying ticket + workstation identity.
  • A collision strategy that always preserves data and isolates conflicting producers into new branches.

SUMMARIZE OUR FINDINGS

Coditect’s sandbox + autosave design converged on a practical, mostly open stack with clear separation of concerns across runtimes, execution control, and Git workflows.


Sandbox runtimes and tiers

  • For a Modal-like execution boundary, gVisor, Kata+Firecracker, Firecracker alone, and Wasm runtimes all emerged as viable building blocks, each trading isolation strength against performance and complexity. gVisor sits between containers and the host kernel via a user-space kernel, giving better isolation than runc with near-container startup. Kata+Firecracker adds VM-grade isolation via microVMs at the cost of higher overhead and more complex node setup, while Wasm (Wasmtime/Wasmer/WasmEdge) is ideal for capability-scoped tools with very fast startup but without a full Linux environment.237238239240241242
  • A tiered architecture on Kubernetes was outlined: gVisor for semi-trusted workloads via a gvisor RuntimeClass, Kata+Firecracker for untrusted/public workloads via a kata-fc RuntimeClass bound to special node pools, and a Wasm tier using either a Wasm-aware runtimeClass or a dedicated Wasm executor service. Runtime choice is policy-driven, based on tenant, project, and risk level, with shared observability, quotas, and audit logging across all tiers.242243244245246

gVisor on Google Cloud Workstations

  • For Coditect on Google Cloud Workstations (no Kubernetes), the recommended approach is to run gVisor (runsc) as an alternate container runtime inside each Workstation VM, treating each agent execution as a gVisor sandbox container. Google’s own use of gVisor for serverless (Cloud Run, GKE Sandbox) shows it is a good fit for untrusted multi-tenant workloads with acceptable performance.247248249250251
  • Each team gets one or more Workstations; a Coditect agent on each VM handles docker run --runtime=runsc ... with CPU/memory/time limits, mounts per-tenant project workspaces, streams stdout/stderr back to the control plane, and periodically reports resource usage for quotas and billing. gVisor provides an extra isolation boundary inside each VM without needing nested virtualization or microVM orchestration.250251252253

Control plane ↔ agent gRPC API

  • A minimal gRPC API was defined with a SandboxAgent service exposing CreateSandbox, ExecSandbox (bidirectional stream), DestroySandbox, and ReportMetrics (client stream). This shape matches gRPC streaming best practices for long-lived exec sessions and metrics telemetry.254255256257
  • A Go agent implementation sketch uses Docker with the runsc runtime to launch sandbox containers, wires docker exec stdout/stderr into the ExecSandbox stream, reads stdin and control messages from the control plane, and reports exit codes and resource usage. Authentication and authorization are handled via mTLS (X.509 certs with SAN/SPIFFE IDs) and gRPC metadata, with the control plane enforcing tenant/project quotas before calling the agent.252258259260261262

Streaming and lifecycle best practices

  • For stdout/stderr over unstable networks, the design uses chunked messages (e.g., 32 KiB), backpressure via gRPC’s flow control, optional application-level sequence numbers, heartbeat ExecStatus messages, and the ability to reconnect and reattach to running sandboxes, with local log files as a fallback. These patterns follow gRPC streaming guidance for robust long-running streams.256257262263254
  • gVisor lifecycle guidance includes: minimal immutable images; strict per-container CPU/memory/time limits; default-deny or tightly restricted network; periodic reconciliation to clean up orphan containers; and testing workloads under gVisor to catch syscall/compatibility quirks.251253264265266267250

Git autosave with worktrees and remote branches

  • Autosave is treated as a machine-owned, remote-backed history to provide near–real time backup and central visibility. Dedicated autosave branches are pushed to the remote, using a prefix like autosave/ and a hierarchical name including ticket, tenant, project, user, workstation, and date, e.g.: autosave/PROJ-1234/acme/payments/alice/ws-01/20260114. This follows general branch-naming best practices while keeping autosave clearly segregated from human branches.268269270271272
  • Git worktrees are used to isolate autosave operations from the user’s main working copy: a separate worktree pointed at the autosave branch lets the autosave daemon commit and push snapshots without disturbing the developer’s current branch. Prior art and guides show this pattern is well-suited for multi-branch workflows and AI tooling.273274275276
  • Commit messages follow a strict template such as autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z, making them easy to filter and audit. Autosave-only tags like autosave/PROJ-1234/alice/20260114/001 can mark epochs for cleanup. Conflict handling uses automated fetch+rebase on autosave branches (machine-only), with collision detection on workstation IDs and, if needed, creation of new suffixed branch names (e.g., .../20260114-2), while leaving user feature branches untouched. This aligns with Git guidance to keep main history clean while allowing frequent automated commits.277278279280281

CREATE A SDD TDD AND ADR describing the CODITECT GOOGLE CLOUD WORKSTATION IMPLEMENTATION as research, analyzed and recommended

Below is a concise SDD, TDD, and ADR set for the Coditect + Google Cloud Workstations + gVisor implementation as designed and researched.


1. SDD – System Design Document

1.1 Overview

Coditect runs multi-tenant, agentic development workflows on Google Cloud Workstations, using gVisor (runsc) inside each Workstation VM as a Modal-like sandbox runtime for untrusted agent code.282283284285

Core elements:

  • Google Cloud Workstations cluster (controller + gateway) providing per-team IDE VMs with persistent disks and VPC access.283284
  • Per-VM Coditect Agent exposing a gRPC SandboxAgent service and managing docker --runtime=runsc containers.286287
  • Coditect Control Plane (API + Orchestrator) that exposes /sandboxes to clients, enforces quotas and IAM, and orchestrates Workstations/agents.
  • Git-backed project workspaces with autosave branches using Git worktrees and machine-owned branches pushed to GitHub for near real-time persistence.288289290291

1.2 Architecture components

  1. Google Cloud Workstations
    • Managed cluster per region; Workstations are GCE VMs managed by the Workstations controller and reachable via a gateway.284283
    • Each Workstation VM has:
      • Docker or containerd configured with gVisor runsc runtime.287282286
      • Coditect Agent daemon (gRPC server).
      • Workspace root: /workspaces/<tenant>/<user>/<project> with persistent disk.
  2. gVisor sandbox runtime
    • Installed via runsc install and configured as Docker runtime runsc.286287
    • Sandbox containers launched by agent as:
      docker run --runtime=runsc \
        --cpus=<limit> --memory=<limit> \
        --network=none --read-only \
        -v /workspaces/...:/workspace \
        --name coditect-sb-<id> <image> sleep infinity
    • Provides stronger isolation between agent code and the Workstation OS by interposing a user-space kernel.[^17_11][^17_12][^17_13][^17_4][^17_1]

  3. Coditect Control Plane
    • Exposes HTTP API (/sandboxes, /exec, /destroy) to Coditect UI and orchestration agents.
    • Maintains metadata DB (tenants, projects, sandboxes, quotas, autosave branches).
    • Implements gRPC client to per-Workstation SandboxAgent.
  4. SandboxAgent gRPC API (per Workstation)
    • Proto (summarized):
      • CreateSandbox(CreateSandboxRequest) -> CreateSandboxResponse
      • ExecSandbox(stream ExecRequest) <-> (stream ExecEvent) (bidirectional).292293294
      • DestroySandbox(DestroySandboxRequest) -> DestroySandboxResponse
      • ReportMetrics(stream MetricsReport) -> MetricsAcknowledge
    • Control Plane selects a Workstation, calls CreateSandbox, then runs exec sessions via ExecSandbox.
  5. Git autosave + worktrees
    • For each (tenant, project, user, ticket, workstation, date) Coditect creates a worktree checked out to an autosave branch:289290291288

Branch pattern:

`autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation-id>/<yyyymmdd>`  
  • Autosave daemon in the Workstation’s context:
    • Watches workspace changes.
    • Periodically runs git add -A, git commit with the machine template, and git push origin autosave/....

1.3 Data flows

  1. Sandbox lifecycle
    • Client → Control Plane: POST /sandboxes with tenant/project/session info.
    • Control Plane: quota + IAM checks; chooses Workstation; calls CreateSandbox on its agent.
    • Agent: launches gVisor container and returns sandbox/container IDs.
    • Control Plane: records sandbox metadata and returns sandbox handle.
  2. Execution + streaming
    • Client → Control Plane: POST /sandboxes/{id}/exec.
    • Control Plane ↔ Agent: ExecSandbox stream.
      • Control Plane sends ExecStart and optional ExecStdin.
      • Agent streams ExecStdout, ExecStderr, ExecStatus, ExecExit.293292
  3. Metrics + quotas
    • Agent periodically sends MetricsReport (CPU seconds, peak mem, bytes I/O) to Control Plane.295296297
    • Control Plane updates usage counters per tenant/project and may deny new sandboxes or terminate existing ones when quotas exceeded.
  4. Git autosave
    • Autosave daemon operates in autosave worktree, committing snapshots and pushing to GitHub.298288
    • Coditect central DB tracks mapping: (tenant, project, user, ticket, workstation, date) -> autosave branch.

1.4 Non-functional requirements

  • Security:
    • gVisor sandbox isolating untrusted code from Workstation host kernel and other workloads.299300301285282
    • mTLS between Control Plane and Agents, strict RBAC in Control Plane.302303
  • Reliability:
    • Resilient streaming with backpressure and reconnect semantics for stdout/stderr.294304292
    • Autosave branches on remote Git for recovery if Workstation fails.
  • Performance:
    • gVisor performance tuned with recent FS improvements (VFS2/LISAFS) to keep overhead close to containers for typical I/O patterns.297305

2. TDD – Technical Design Details

2.1 gVisor configuration on Workstations

  • Install runsc from gVisor releases.306287
  • sudo runsc install to add runsc runtime to Docker and update daemon.json.287286
  • Restart Docker; test with docker run --runtime=runsc hello-world.286
  • Hardening: configure Docker to use cgroupfs when required by gVisor, per docs.287

2.2 SandboxAgent implementation (Go)

  • gRPC server with TLS + mTLS; SandboxAgent service from proto.307308302
  • CreateSandbox builds docker run args and executes them using exec.CommandContext.
  • ExecSandbox uses bidirectional streams:292293294
    • On first message (with ExecStart), start docker exec and attach to stdout/stderr.
    • Forward stdout/stderr as chunked ExecStdout/ExecStderr events.
    • Accept ExecStdin messages and write to process stdin.
    • On termination or timeout, send ExecExit.
  • ReportMetrics reads from Docker stats/cgroup FS and streams metrics periodically to Control Plane.296295
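The chunked-forwarding pattern for stdout/stderr can be illustrated with a small Python generator (a sketch; the seq and payload field names are placeholders for the actual ExecStdout proto fields):

```python
CHUNK_SIZE = 32 * 1024  # 32 KiB, matching the design's suggested chunk size

def chunk_stream(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a captured stdout/stderr buffer into ExecStdout-sized chunks,
    attaching an application-level sequence number to each chunk so the
    control plane can detect gaps after a reconnect."""
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        yield {"seq": seq, "payload": data[offset:offset + chunk_size]}
```

In the real agent, each yielded chunk becomes one streamed ExecStdout/ExecStderr event, and gRPC flow control provides the backpressure.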

2.3 Control Plane internals

  • Scheduler:
    • Workstation registry with capacity metrics.
    • Placement algorithm (simple round-robin, least-loaded, or tenant-aware).
  • Quota service:
    • DB schema for per-tenant/project quotas and usage counters.
    • Atomic operations to “reserve” and “release” capacity when sandboxes start/stop.
  • IAM:
    • JWTs or session tokens tying user to tenant/project; Control Plane enforces ACLs at /sandboxes API.
  • Audit logging:
    • Append-only event log of sandbox create/exec/destroy with tenant/project IDs, user, and model (e.g., Claude/Gemini) context.
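The reserve/release semantics of the quota service can be sketched with an in-memory counter guarded by a lock (illustrative only; the design calls for atomic DB operations in the real control plane):

```python
import threading

class QuotaService:
    """Atomic reserve/release of per-tenant concurrent-sandbox capacity."""

    def __init__(self, limits: dict):
        self._limits = limits            # tenant -> max concurrent sandboxes
        self._used: dict = {}            # tenant -> currently reserved count
        self._lock = threading.Lock()

    def reserve(self, tenant: str) -> bool:
        """Reserve one sandbox slot; False means the quota is exhausted."""
        with self._lock:
            used = self._used.get(tenant, 0)
            if used >= self._limits.get(tenant, 0):
                return False
            self._used[tenant] = used + 1
            return True

    def release(self, tenant: str) -> None:
        """Release a slot when a sandbox is destroyed."""
        with self._lock:
            self._used[tenant] = max(0, self._used.get(tenant, 0) - 1)
```

The control plane would call reserve() before CreateSandbox and release() after DestroySandbox or on agent-reported termination.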

2.4 Git autosave + worktrees details

  • For each active ticket/workstation/project:
git worktree add ../.wt-autosave-$ID autosave/<ticket>/<tenant>/<project>/<user>/<ws>/<yyyymmdd>
  • Autosave daemon in that worktree:290291288289
    • Debounced loop: if dirty and no index.lock, then:
git add -A
git commit -m "autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z"
git push origin autosave/...
  • On non-FF push: git fetch + git rebase origin/autosave/... and retry; if conflict persists, create suffix branch and update mapping.
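The debounced commit/push cycle with non-fast-forward recovery can be sketched as follows (the injected run callback and return strings are illustrative; a real daemon would shell out to git and log each step):

```python
def autosave_tick(branch: str, message: str, run) -> str:
    """One autosave cycle. `run(cmd) -> int` executes a git command and
    returns its exit code (injected so the flow is testable without a repo)."""
    run(["git", "add", "-A"])
    run(["git", "commit", "-m", message])
    if run(["git", "push", "origin", branch]) == 0:
        return "pushed"
    # Non-fast-forward: fetch + rebase onto the remote autosave branch, retry.
    run(["git", "fetch", "origin", branch])
    run(["git", "rebase", f"origin/{branch}"])
    if run(["git", "push", "origin", branch]) == 0:
        return "pushed-after-rebase"
    # Persistent conflict: caller forks a suffixed branch and updates mapping.
    return "collision"
```

With subprocess.call as the runner this executes real git commands inside the autosave worktree.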

2.5 Integration with Claude Code and Gemini

  • Control Plane keeps project context (Git repo URLs, file trees, tests).
  • When user invokes AI action, Control Plane:
    • Pulls autosave branch or feature branch;
    • Calls Claude Code or Gemini with repo snapshot and task;
    • Writes modifications into workspace;
    • Triggers tests inside gVisor sandbox;
    • Autosave daemon snapshots changes to GitHub.

3. ADR – Architecture Decision Record

Title: Use Google Cloud Workstations with gVisor (runsc) and Workstation-local gRPC agents for Coditect sandboxes and Git autosave.

Context

Coditect must execute untrusted, multi-tenant agent code for software development workflows, integrate with GitHub for near real-time project backups, and support AI agents (Claude Code, Gemini) operating over codebases. Requirements include:

  • Stronger isolation than plain containers on shared hosts.
  • Economic viability and simplicity on GCP.
  • Deep IDE integration via Google Cloud Workstations.283284
  • Centralized quotas, audit, and streaming execution control for sandboxes.

Decision

  1. Use Google Cloud Workstations as the primary execution environment, with each Workstation being a GCE VM managed by Workstations controller/gateway in Coditect’s VPC.284283
  2. Install gVisor’s runsc runtime on each Workstation and configure Docker to use runsc for sandbox containers, leveraging gVisor’s user-space kernel for added isolation.300282286287
  3. Deploy a Coditect Agent on each Workstation that exposes a gRPC SandboxAgent service, implements sandbox lifecycle via Docker+runsc, streams stdout/stderr, and reports metrics.293294292
  4. Keep a central Coditect Control Plane exposing /sandboxes to clients, performing IAM + quota checks, and orchestrating work across Workstations and agents.
  5. Implement Git autosave using remote-pushed autosave branches and git worktrees, with branches named autosave/<ticket>/<tenant>/<project>/<user>/<workstation>/<yyyymmdd> and machine-generated commits pushed to GitHub, giving near real-time backup.[^17_10][^17_28][^17_7][^17_8][^17_9]

Rationale

  • gVisor is designed as an application kernel for containers and is widely used within Google’s own serverless and container platforms, providing a well-tested isolation layer suitable for untrusted code.301285282299300
  • Google Cloud Workstations architecture already gives per-team VMs, persistent disks, and VPC connectivity, and can be centrally managed and snapshotted.283284
  • Using gRPC streaming between control plane and agents matches best practices for long-lived process I/O (stdout/stderr) and metrics, and keeps the control plane simple and language-agnostic.304294292293
  • Git worktrees and autosave branches provide local isolation and remote resiliency: work continues even if autosave fails, and GitHub holds recovery points if a Workstation dies.288289290291

Alternatives considered

  • Kubernetes + Kata/Firecracker: Stronger isolation but higher operational cost and complexity on GCP; less aligned with the Workstations-first developer experience.309310311
  • Remote-only autosave (no worktrees): Simpler but interferes with the developer’s active branch and makes automation riskier; worktrees were chosen to isolate automation.312313

Consequences

  • Pros
    • Stronger isolation than vanilla containers with minimal changes to the developer experience.285282300301
    • Clear separation between Control Plane and Workstation Agent responsibilities.
    • Robust autosave and recovery with GitHub as the durable store.
  • Cons
    • Requires OS-level access to Workstations to install runsc and configure Docker.306286287
    • gVisor compatibility quirks must be tested for typical dev workloads and tools.314315297

This ADR, SDD, and TDD give a concrete blueprint for implementing the Coditect Google Cloud Workstation architecture as designed in the research. 316317


Define the scope and success criteria for the SDD TDD and ADR

List stakeholders and approval authorities for the documents
Provide an outline template for the SDD specific to Cloud Workstations
Provide an outline template for the TDD focused on gVisor and runsc
Provide an ADR template with decision options and tradeoffs section

For Coditect’s Google Cloud Workstations + gVisor effort, the SDD, TDD, and ADR should have explicit scope, measurable success criteria, clear owners, and reusable templates tailored to this stack.


Scope and success criteria

SDD (System Design Document)

  • Scope
    • End-to-end system architecture for Coditect on Google Cloud Workstations: control plane, Workstations, Coditect agent, gVisor sandboxes, Git autosave, IAM, quotas, and observability.
  • Success criteria
    • All stakeholders can describe the system at a high level from the SDD alone.
    • Architecture decisions are consistent with GCP Workstations and gVisor capabilities (no “magic infra”).318319320
    • Interfaces between components (API, gRPC, Git, IAM) are unambiguous enough to drive TDD work.

TDD (Technical Design Details)

  • Scope
    • Implementation-level design for gVisor and runsc integration on Workstations, the SandboxAgent gRPC service, and autosave Git worktrees.
    • Container launch and lifecycle, resource limits, metrics collection, and integration with Docker/containerd.321322323324
  • Success criteria
    • Go/Rust implementation teams can build the agent and control plane without guessing behaviors.
    • Operational teams can configure runsc and Docker on Workstations using only TDD steps.325321
    • gRPC APIs and message schemas are stable enough to generate client/server stubs and tests.326327328

ADR (Architecture Decision Record)

  • Scope
    • Capture major architectural choices (Workstations vs GKE, gVisor vs microVMs, autosave strategy, etc.), rationale, and implications.
  • Success criteria
    • Future engineers can understand why Workstations + gVisor was chosen and what alternatives were rejected.
    • Changes to the architecture can be evaluated against documented decisions and tradeoffs.

Stakeholders and approval authorities

  • Product / Platform Lead (Coditect)
    • Owns overall platform direction and approves SDD/ADR alignment with product roadmap.
  • Chief Architect / Principal Engineer
    • Accountable for SDD and ADR technical soundness and long-term maintainability.
  • Infra / DevOps Lead
    • Approves TDD sections on Workstations provisioning, Docker/containerd + gVisor setup, monitoring, and rollout.319322318325
  • Security / Compliance Officer
    • Reviews SDD/TDD for sandbox isolation, IAM, audit logging, and data handling; signs off on ADR security tradeoffs (gVisor vs microVM).320323329330
  • Team Leads (Agent Orchestration, Git Integration)
    • Ensure SDD/TDD requirements are implementable by their teams; sign off on scope and milestones.

Approval suggestion:

  • SDD: Product Lead + Chief Architect + Security.
  • TDD: Chief Architect + Infra Lead + relevant Team Leads.
  • ADR: Chief Architect + Security Officer (and Product Lead if impact is high).

SDD outline template (Cloud Workstations–specific)

1. Document control

  • Version, date, author.
  • Reviewers and approvers (names/roles).
  • Related ADRs and TDDs.

2. Overview

  • Purpose and scope (Coditect on GCP Workstations).
  • Objectives (multi-tenant agent compute, near real-time Git persistence, safe untrusted code execution).

3. System context

  • Context diagram:
    • Coditect Control Plane, Google Cloud Workstations, GitHub, Anthropic Claude, Google Gemini, identity provider.318319
  • External dependencies (GCP services, GitHub, auth providers).

4. High-level architecture

  • Components:
    • Workstations cluster (controller/gateway, per-team VMs).319318
    • Coditect Control Plane.
    • Workstation Agent + gVisor runtime.320321
    • GitHub and autosave branches.
  • Deployment topology (regions, VPCs, projects).

5. Workstation and sandbox model

  • Workstation lifecycle (provisioning, scaling, deprovisioning).318319
  • Sandbox abstraction (one gVisor container per sandbox).323324330320
  • Workspace layout (/workspaces/<tenant>/<user>/<project>).

6. Control Plane responsibilities

  • /sandboxes API surface.
  • Scheduling logic (Workstation selection).
  • Quota enforcement and billing.
  • IAM model (tenants, projects, users, roles).

7. gRPC and messaging

  • Description of SandboxAgent gRPC services and message flows (Create/Exec/Destroy/ReportMetrics).327328326
  • Error handling and retry semantics.

8. Git integration and autosave

  • Git repository mapping (tenant/project → repo).
  • Autosave branch naming and worktree strategy.331332333334
  • Promotion from autosave to feature branches.

9. Non-functional requirements

  • Security (gVisor isolation, network policies, mTLS).329330335323320
  • Reliability and availability (Workstation/node failure behavior).
  • Performance expectations (latency, throughput, cost).336337
  • Observability (logging, metrics, tracing).

10. Risks and open questions

  • gVisor compatibility hot spots.338339340
  • Workstations lifecycle edge cases.
  • Future evolution (microVM tier, Wasm tier).

TDD outline template (gVisor + runsc–focused)

1. Document control

  • Version, date, author, reviewers.

2. Purpose and scope

  • Detailed design for:
    • gVisor installation and configuration on Workstations.
    • Docker/containerd runtime integration.
    • SandboxAgent implementation.
    • Metrics, logs, and lifecycle policies.

3. Workstation environment

  • Base OS/image and Workstations configuration.319318
  • Required packages (Docker/containerd, runsc, etc.).322325
  • Security hardening (user accounts, SSH, file permissions).

4. gVisor (runsc) setup

  • Installation steps (commands, versions) referencing gVisor docs.321322325
  • Docker/containerd configuration snippets (daemon.json, runtime definitions).
  • Validation tests (docker run --runtime=runsc hello-world).339321

5. Sandbox lifecycle implementation

  • Container naming and labels (coditect-sb-<id>).
  • CreateSandbox behavior (CPU/mem/network/volume args).
  • Exec behavior (PTY support, working dirs, env).
  • Destroy behavior and cleanup (timeouts, orphan detection).
  • Lifecycle state machine and transitions.

6. SandboxAgent gRPC server

  • Service definitions (from proto).328326327
  • Go package layout (agent binary, config, logging).
  • Streaming implementation details:
    • stdout/stderr buffering and chunk size.
    • Stdin handling and control messages.
    • Heartbeats and idle detection.

7. Metrics and logging

  • Metrics collection (Docker stats, cgroups, sampling interval).337341342
  • Mapping to MetricsReport fields and quota counters.
  • Log routing from Workstations to central logging (e.g., via fluentd/Vector).

8. Security and authN/Z

  • mTLS configuration (cert distribution, rotation, validation).335343
  • Agent-side checks on caller identity (peer cert SANs).
  • Least-privilege OS users and Docker group configuration.

9. Failure handling and resiliency

  • Behavior on gRPC disconnects (Exec retries, metrics stream reconnection).344328
  • Handling of Docker failures and runsc errors.
  • Health checks for Agent and gVisor.

10. Testing strategy

  • Unit tests (command building, gRPC handlers).
  • Integration tests (agent + control plane on a real Workstation).
  • Performance and soak tests under gVisor (CPU/I/O patterns).337339

ADR template (with options and tradeoffs)

Title: <Short decision title>
Status: Proposed | Accepted | Superseded
Date: <YYYY-MM-DD>
Authors: <names>

1. Context

  • Problem statement and constraints.
  • Relevant background (e.g., Coditect’s need for sandboxing untrusted code on GCP Workstations).330320318319

2. Decision

  • Clear statement of the chosen option.
  • Brief description of the solution (e.g., “Use gVisor (runsc) on Workstations with Docker, plus per-VM SandboxAgent gRPC service.”).324323330320321

3. Options considered

List each option with a short summary:

  • Option A: GCP Workstations + gVisor + Docker/runsc (chosen).330320321
  • Option B: GKE + Kata Containers/Firecracker.345346347
  • Option C: GKE + gVisor (GKE Sandbox) with remote IDE.348349
  • Option D: Direct Firecracker microVM pool on GCE.350

4. Tradeoffs

For each option, discuss:

  • Security
    • Isolation strength, blast radius, kernel sharing vs VM boundary.347323345350
  • Performance
    • Cold start, I/O overhead, CPU/memory footprint.336337
  • Operational complexity
    • Setup effort (Workstations vs GKE vs custom microVM), support, upgrades.346318319
  • Developer experience
    • IDE quality, latency, friction for AI-assisted workflows.
  • Cost
    • VM count, node types, overhead from isolation layer.

Clearly highlight why the chosen option wins overall in this context (e.g., best balance of security, DX, and implementation time).

5. Consequences

  • Positive
    • List benefits and what becomes simpler.
  • Negative / Risks
    • New complexity, vendor dependencies, potential performance or compatibility issues.338339337
  • Mitigations
    • How risks will be handled (e.g., compatibility testing, fallbacks to plain containers/microVMs).
  • Links to other ADRs (e.g., autosave strategy, multi-runtime tiering, AI model choices).

This gives you a structured way to lock in the Workstations + gVisor direction while keeping future alternatives explicit and comparable.


CODITECT runs a DJANGO backend with a REACT TYPESCRIPT (strict) front end for user, tenant, project, and license management, with JWT tokens for individuals and RBAC - analyze and describe how this integrates into the GOOGLE CLOUD WORKSTATION provisioning, orchestration, and gVisor assignments for individuals working within shared Google Cloud Workstations

Coditect’s Django/React SaaS becomes the control plane and identity source for who can use which Cloud Workstations and which gVisor sandboxes run on them, with JWT+RBAC driving provisioning, orchestration, and sandbox assignment.351352353354355356357


Identity, JWT, and RBAC as the control layer

  • Django issues JWTs per user that embed user_id, tenant_id, project_ids, and role claims (roles: ["tenant_admin", "developer", ...]).
  • React TS frontend authenticates via these JWTs and calls Coditect’s backend (Django REST / GraphQL), not GCP directly.
  • Coditect backend maps its internal RBAC to Google Cloud IAM:
    • E.g., a tenant admin can request workstation templates but not edit cluster-wide settings.352358351
  • JWT subject and tenant/project claims are the canonical identity for:
    • Which Workstation(s) a user may attach to.
    • Which sandboxes (gVisor containers) they may start, exec into, or destroy on those Workstations.354359

Workstation provisioning driven by Django

  • Cloud Workstations are provisioned via GCP APIs or Terraform/Pulumi from a Coditect service account, not from the browser.360361356357
  • Django integrates user/tenant/project models with Workstation configs:
    • For each tenant/team, Coditect stores the Workstation cluster and one or more workstation configs (image, machine type, disk, tools).358356357
    • When a user creates or resumes a dev session from the React UI, Django:
      • Validates they have a role allowing dev sessions in that project.
      • Calls the Workstations API (with roles/workstations.workstationUser or workstationCreator bound to its service account) to create or attach to a workstation.351352354358
  • IAM bindings are managed centrally: tenant or team groups are bound to Workstation configs with appropriate roles; individual user access comes from Coditect’s RBAC layer mapping to these groups or directly to Workstations.352360358351

gVisor sandbox assignment per user in shared Workstations

Within each Workstation VM, a Coditect Agent runs; it trusts only the control plane’s JWT-bearing requests and never exposes raw Docker/gVisor to end users directly:

  • The React frontend calls Django with the user’s JWT to request sandbox creation or execution.
  • Django validates RBAC (e.g., project:execute_sandbox) and then calls the Agent over gRPC as the only caller, using mTLS service identity and passing user/tenant/project context in metadata.359362363
  • The Agent runs per-sandbox gVisor containers (docker run --runtime=runsc ...) tagged with labels derived from the JWT claims: tenant_id, project_id, user_id, sandbox_id.364355365366367
  • This allows:
    • Multi-user, multi-tenant isolation inside a shared Workstation: each user’s agent instructions correspond to separate gVisor sandboxes, giving a user-space kernel boundary between them.353355364
    • The control plane to enforce per-user and per-project quotas on concurrent sandboxes and resource usage before a CreateSandbox call is made.

The net effect: Django’s JWT+RBAC governs not only access to Coditect features but also which gVisor sandboxes can be created and on which Workstations, with Google IAM governing underlying Workstation access.353354358351352 368369370371372
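The claims-to-permission check that gates CreateSandbox can be sketched in Python (role names follow the examples above; the permission strings and mapping are illustrative, not Coditect's actual schema):

```python
# Hypothetical role -> permission mapping; actual Coditect roles may differ.
ROLE_PERMISSIONS = {
    "tenant_admin": {"project:create_sandbox", "project:execute_sandbox",
                     "project:destroy_sandbox"},
    "developer":    {"project:create_sandbox", "project:execute_sandbox"},
    "viewer":       set(),
}

def authorize(claims: dict, project_id: str, permission: str) -> bool:
    """Check a decoded JWT's claims before the control plane calls the Agent:
    the user must belong to the project and hold a role granting the action."""
    if project_id not in claims.get("project_ids", []):
        return False
    granted = set()
    for role in claims.get("roles", []):
        granted |= ROLE_PERMISSIONS.get(role, set())
    return permission in granted
```

Only after this check does Django open the mTLS gRPC call to the Workstation's Agent, passing the tenant/project/user context in metadata.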


Map required IAM roles and permissions for each service component

Describe JWT token flow and refresh strategy for tenants
Design RBAC role hierarchy for users, tenants, and projects
Specify secure storage and rotation for signing keys
List API endpoints that need tenant isolation checks, and whether a single check per request cycle suffices for caching or optimization

Coditect’s stack needs clear separation between Google IAM (for Workstations) and app-level JWT/RBAC (for tenants/projects). Below is a concise mapping and design.


IAM roles and permissions per component

1. Coditect Control Plane (Django backend) Service account: coditect-control-plane-sa

  • Project-level roles on the GCP project(s) hosting Workstations:
    • roles/workstations.admin for creating/managing workstation clusters/configs (if Coditect manages infra).373374
    • Or narrower:
      • roles/workstations.workstationCreator to create Workstations from configs.374375
      • roles/workstations.workstationUser if only starting/stopping and connecting.375374
  • Possibly roles/iam.serviceAccountUser on coditect-workstation-agent-sa if Workstations run agents with that SA and need to impersonate.376374

2. Workstation Agent (per VM) Service account: coditect-workstation-agent-sa

  • Minimal roles:
    • roles/logging.logWriter to send logs to Cloud Logging.
    • roles/monitoring.metricWriter if directly pushing metrics.
  • No direct Workstations API access needed; it only talks to the Control Plane via gRPC.

3. CI / Infra automation Service account: coditect-infra-sa

  • roles/workstations.admin to create/update Workstations clusters/configs.373374
  • roles/iam.serviceAccountAdmin only if managing service accounts for agents.

4. Human users

  • Google IAM roles for direct Workstations usage (if ever used outside Coditect):
    • Typically roles/workstations.user or roles/workstations.workstationUser mapped to groups, but ideally humans only interact via Coditect UI.377374375

JWT token flow and refresh strategy

Claims (access token)

  • Standard: sub, iat, exp, iss.
  • Custom:
    • tenant_id
    • user_id
    • project_ids (or current project)
    • roles (tenant/global: ["tenant_admin", "project_admin", "developer"])
    • Optional workstation_id when bound to a session.

Flow

  1. User logs in via SSO/OIDC; Django maps identity to tenant_id and roles.
  2. Django issues a short-lived access JWT (e.g., 15–30 minutes) signed with HS256/RS256.378379
  3. React TS frontend attaches this JWT in Authorization: Bearer header for API calls.
  4. Django validates token and uses claims for RBAC and tenant isolation checks per request.

Refresh

  • Maintain a longer-lived refresh token (HTTP-only, Secure cookie) mapped server-side to user/device.
  • When access token is near expiry, frontend calls /auth/refresh; Django:
    • Validates refresh token.
    • Issues new access JWT with updated claims (e.g., changed roles/permissions).
  • Immediate revocation: server-side invalidation list keyed by refresh token ID; access JWTs naturally expire soon.
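In production, Django would issue and validate these tokens through a vetted library (e.g., PyJWT or djangorestframework-simplejwt) with keys loaded from KMS. The stdlib-only sketch below merely illustrates the claim set and expiry mechanics described above; the issuer name and claim layout are assumptions.

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_access_token(key, tenant_id, user_id, roles, ttl_seconds=1800):
    """Issue a short-lived HS256 access JWT carrying the custom claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {"iss": "coditect", "sub": user_id, "iat": now,
              "exp": now + ttl_seconds,
              "tenant_id": tenant_id, "user_id": user_id, "roles": roles}
    signing_input = (_b64url(json.dumps(header).encode()) + "." +
                     _b64url(json.dumps(claims).encode()))
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)

def verify_access_token(key, token):
    """Verify signature and expiry; return claims or raise ValueError."""
    signing_input, _, sig_b64 = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    pad = "=" * (-len(sig_b64) % 4)
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64 + pad)):
        raise ValueError("bad signature")
    claims_b64 = signing_input.split(".")[1]
    claims = json.loads(base64.urlsafe_b64decode(claims_b64 + "=" * (-len(claims_b64) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("expired")
    return claims
```

The refresh endpoint would call `issue_access_token` again with freshly resolved roles, which is how claim changes propagate within one access-token lifetime.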

RBAC role hierarchy (users, tenants, projects)

Structure: tenant-scoped roles + project-scoped roles.

Tenant-level roles

  • tenant_owner
    • Full management of tenant settings, billing, all projects and Workstations within tenant.
  • tenant_admin
    • Manage projects, users, licenses; cannot change billing/legal.
  • tenant_auditor
    • Read-only access to logs, audit, and project configs.

Project-level roles

  • project_admin
    • Manage project membership, settings, Workstation configs for that project.
    • Can create/destroy sandboxes and adjust quotas within limits set by tenant.
  • developer
    • Create/exec/destroy sandboxes within project.
    • Access project repo, autosave, AI tools (Claude/Gemini) according to policies.
  • viewer
    • Read-only access to logs, code (if allowed), no sandbox execution.

Role mapping and evaluation

  • JWT contains both tenant and project roles, e.g.:
{
  "tenant_id": "t-acme",
  "user_id": "u-alice",
  "tenant_roles": ["tenant_admin"],
  "project_roles": {
    "proj-foo": ["project_admin"],
    "proj-bar": ["developer"]
  }
}
  • On each request, Django:
    • Checks tenant-level role for tenant-scoped endpoints (user management, workstation config).
    • Checks project role for project-scoped endpoints (sandboxes, autosave, AI runs).

Hierarchy:

  • tenant_owner → tenant_admin → {project_admin, developer, viewer}
  • project_admin → {developer, viewer}
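This evaluation can be sketched as a small claims check. The implication tables mirror the roles above; the rule that tenant owners/admins implicitly hold project_admin on every project in their tenant is an assumption of this sketch, not a fixed requirement.

```python
# Higher roles imply lower ones (illustrative mapping).
TENANT_IMPLIES = {
    "tenant_owner": {"tenant_owner", "tenant_admin", "tenant_auditor"},
    "tenant_admin": {"tenant_admin", "tenant_auditor"},
    "tenant_auditor": {"tenant_auditor"},
}
PROJECT_IMPLIES = {
    "project_admin": {"project_admin", "developer", "viewer"},
    "developer": {"developer", "viewer"},
    "viewer": {"viewer"},
}

def has_project_role(claims, project_id, required):
    """True if the JWT grants `required` on the project, directly or via hierarchy."""
    tenant_roles = set()
    for r in claims.get("tenant_roles", []):
        tenant_roles |= TENANT_IMPLIES.get(r, set())
    # Assumption: tenant owners/admins act as project_admin tenant-wide.
    if {"tenant_owner", "tenant_admin"} & tenant_roles:
        return required in PROJECT_IMPLIES["project_admin"]
    granted = set()
    for r in claims.get("project_roles", {}).get(project_id, []):
        granted |= PROJECT_IMPLIES.get(r, set())
    return required in granted
```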

Secure storage and rotation for signing keys

Key types

  • Access/refresh token signing keys (JWT).
  • mTLS certs/keys for gRPC between Control Plane and Agents.380381

Storage

  • Store JWT signing keys in a managed KMS (e.g., Google Cloud KMS) and never embed in images/env vars.378
    • Django uses KMS to sign/verify tokens or loads keys from KMS at startup with caching.
  • Store mTLS certs/keys in:
    • Secret manager or KMS, distributed to Workstations via startup scripts or Workstation images.382383384

Rotation

  • JWT signing:
    • Use a key ID (kid) in JWT header and maintain a keyset (current + previous).
    • Rotate keys periodically (e.g., every 90 days) by introducing new key, updating keyset, and invalidating old one once old tokens expire.
  • mTLS certs:
    • Issue short-lived certs per agent (e.g., via internal CA or GCP CA Service).
    • Implement automated renewal and hot-reload on agents and control plane.
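The kid-based keyset rotation can be sketched as below. Key material is held in memory purely for illustration; in production it would stay in Cloud KMS and only the kid bookkeeping would live in the application.

```python
import hashlib, hmac

class SigningKeyset:
    """One current signing key plus previous keys kept for verification.

    Rotation: add_key() a new key (it becomes current), keep the old one
    verifiable until tokens signed with it expire, then retire() it.
    """

    def __init__(self):
        self._keys = {}            # kid -> key bytes
        self._current_kid = None

    def add_key(self, kid, key, make_current=True):
        self._keys[kid] = key
        if make_current:
            self._current_kid = kid

    def retire(self, kid):
        if kid == self._current_kid:
            raise ValueError("cannot retire the current signing key")
        self._keys.pop(kid, None)

    def sign(self, payload):
        kid = self._current_kid
        sig = hmac.new(self._keys[kid], payload, hashlib.sha256).digest()
        return kid, sig            # kid is placed in the JWT header

    def verify(self, kid, payload, sig):
        key = self._keys.get(kid)
        if key is None:
            return False           # unknown or retired kid: reject token
        return hmac.compare_digest(hmac.new(key, payload, hashlib.sha256).digest(), sig)
```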

Endpoints requiring tenant isolation checks (once per request)

These are endpoints where you must validate tenant and project from JWT against the resource, but can then cache the result for the duration of the request to avoid repeated lookups:

  • Auth-related endpoints
    • POST /auth/refresh (validates tenant/user but no heavy tenant lookup).
  • Tenant-scoped management
    • GET /tenants/{tenant_id}
    • GET/POST /tenants/{tenant_id}/users
    • GET/POST /tenants/{tenant_id}/workstations/configs
    • GET/POST /tenants/{tenant_id}/projects
  • Project-scoped execution and resources
    • POST /projects/{project_id}/sandboxes
    • POST /projects/{project_id}/sandboxes/{id}/exec
    • DELETE /projects/{project_id}/sandboxes/{id}
    • GET /projects/{project_id}/sandboxes (list)
    • GET/POST /projects/{project_id}/autosave/* (branch info, promotion)
    • POST /projects/{project_id}/ai/claude-run
    • POST /projects/{project_id}/ai/gemini-run

Pattern:

  1. At the beginning of each request, Django:
    • Extracts JWT; verifies signature and expiry.
    • Resolves tenant + project membership and roles once (potentially cached in Redis per token ID).
  2. Stores a request-scoped context (tenant, projects, roles).
  3. Downstream handlers and database calls rely on this context instead of re-checking tenant isolation repeatedly.

This keeps tenant isolation robust and explicit while minimizing redundant checks on a per-request basis.
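A minimal sketch of that per-token caching pattern, assuming a plain dict stands in for Redis and `resolver()` is the expensive tenant/project membership lookup; class and field names are illustrative.

```python
import time

class AuthContext:
    """Request-scoped identity resolved once per request."""
    def __init__(self, tenant_id, user_id, tenant_roles, project_roles):
        self.tenant_id = tenant_id
        self.user_id = user_id
        self.tenant_roles = tenant_roles
        self.project_roles = project_roles

class ContextCache:
    """Per-token cache so membership/role lookups run once, not per handler.

    In production this would be Redis with a TTL no longer than the access
    token lifetime; the dict-based version just shows the shape.
    """
    def __init__(self, ttl_seconds=300, now=time.monotonic):
        self._ttl = ttl_seconds
        self._now = now
        self._cache = {}   # token_id -> (expires_at, AuthContext)

    def get_or_resolve(self, token_id, resolver):
        hit = self._cache.get(token_id)
        if hit and hit[0] > self._now():
            return hit[1]
        ctx = resolver()               # the expensive DB lookup, done once
        self._cache[token_id] = (self._now() + self._ttl, ctx)
        return ctx
```

Middleware would attach the returned context to the request object; downstream handlers read it instead of re-querying the database.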


ANYTHING ELSE WE SHOULD EXPAND ON?

Two areas are worth expanding next, given your goals and stack:


1. Security / abuse-resistance of sandboxes

For Coditect-as-SaaS, you likely want a more explicit security model for:

  • Threat modeling
    • Enumerate attacker types (malicious tenant, compromised agent, model-generated exploit) and what gVisor + VM boundaries do or do not protect against.385386387388
  • Network controls inside Workstations
    • Egress controls per sandbox (DNS, HTTP, cloud metadata access), ideally via iptables/ebpf or a local proxy.389390
  • File system & secrets isolation
    • Exact pattern for which secrets are visible to sandboxes (e.g., no direct cloud creds; only tightly scoped per-project tokens).
  • Abuse detection
    • Metrics/heuristics for detecting abuse: long-running CPU-bound loops, port scans, mass outbound requests, etc., and how the control plane auto-kills or bans.

You could capture this as a Security Design Addendum to the SDD/TDD, tying gVisor’s documented guarantees and limitations into concrete policies.391392385


2. Developer UX and AI workflow integration

You’ve defined infra and autosave; two UX layers would benefit from a design pass:

  • Session model and UI semantics
    • How Coditect represents “I’m working in project X on Workstation Y with sandboxes Z1–Zn” in the React UI and APIs.
    • Standard flows: start session, resume session, sync autosave -> feature branch, run tests via Claude/Gemini, review and promote changes.393394395
  • AI tool contracts
    • Define a small, explicit “AI tool” API surface (for Claude Code / Gemini) that describes:
      • What they can read/write (files, branches).
      • How they request sandboxes (e.g., a “run tests” tool always goes through /projects/{id}/sandboxes).
    • Logging / provenance format tying AI actions to Git commits and sandbox executions for audit and explainability.

If you want to go deeper, good next prompts would be:

  • “Design a security threat model and concrete mitigations for Coditect’s gVisor-based sandboxes on Google Cloud Workstations.”
  • “Design the user-facing ‘project session’ model and API for Coditect (sessions, sandboxes, autosave, and AI runs) and how it’s represented in the React TS frontend.”


Two concrete extensions make sense: a Security Design Addendum (threat model + mitigations + abuse detection) and a Project Session model (how users see sessions/sandboxes/autosave/AI runs).


1. Security Design Addendum: gVisor sandboxes on Cloud Workstations

1.1 Threat model

Assets

  • Tenant code and data in /workspaces/<tenant>/<user>/<project>.
  • Coditect secrets (GitHub tokens, model keys, internal APIs).
  • Workstation VMs and underlying GCP project.
  • Control plane (Django, DB, audit logs).

Adversaries

  • Malicious tenant user (tries to escape sandbox, exfiltrate data or abuse compute).
  • Compromised user account (legit user’s JWT stolen).
  • Malicious or buggy AI-generated code (infinite loops, network abuse).
  • Compromised Workstation (agent host taken over).

Trust boundaries

  • gVisor sandbox boundary between untrusted workload and Workstation kernel.396397398
  • VM boundary between Workstations and other GCP workloads.399400401
  • mTLS + RBAC between Control Plane and Agents.402403

1.2 Concrete mitigations

Sandbox isolation

  • All agent code runs in containers with --runtime=runsc, --network=none (or very constrained egress), --read-only rootfs, and fixed CPU/memory limits.404405406396
  • Each sandbox mounts only its project workspace and ephemeral scratch; no host paths, no Docker socket.
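The flag set above can be captured in a single argv builder the Agent uses before launching each sandbox. The label keys and mount layout are illustrative assumptions; the isolation flags themselves (`--runtime=runsc`, `--network=none`, `--read-only`, CPU/memory caps) are the ones named in this section.

```python
def sandbox_run_args(image, tenant_id, project_id, user_id, sandbox_id,
                     workspace_path, cpus=1.0, memory_mib=1024):
    """Build the `docker run` argv for an untrusted-code sandbox."""
    return [
        "docker", "run", "--detach",
        "--runtime=runsc",                       # gVisor user-space kernel
        "--network=none",                        # default: no egress at all
        "--read-only",                           # immutable rootfs
        f"--cpus={cpus}",
        f"--memory={memory_mib}m",
        "--security-opt", "no-new-privileges",
        "--label", f"coditect.tenant_id={tenant_id}",
        "--label", f"coditect.project_id={project_id}",
        "--label", f"coditect.user_id={user_id}",
        "--label", f"coditect.sandbox_id={sandbox_id}",
        "--mount", f"type=bind,src={workspace_path},dst=/workspace",
        "--mount", "type=tmpfs,dst=/tmp",        # ephemeral scratch only
        image,
    ]
```

Because the labels are derived from JWT claims, the control plane can later list, meter, and kill sandboxes by tenant/project without trusting the sandbox itself.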

Secrets & identity

  • No cloud credentials or GitHub tokens inside sandbox by default; AI tools and Git operations are brokered via Coditect backend.
  • Per-project, scoped tokens if absolutely needed (e.g., Git LFS or artifact fetch).

Network controls

  • Default network=none for most sandboxes; “networked” sandboxes use:
    • Egress proxy with allowlists (GitHub, package registries).
    • Egress quotas (requests/hour) and rate limiting per tenant.

Abuse detection metrics/heuristics

Collected via MetricsReport from agent + host-level firewalls/logs:

  • CPU abuse
    • High CPU utilization over threshold (e.g., >80% of core) for >N seconds with no I/O.
    • Many sandboxes at or near CPU limit for same tenant.
    • Mitigation:
      • Hard per-sandbox CPU time limit.
      • Tenant-level CPU budget (vCPU-seconds per hour); auto-throttle or reject new sandboxes when exceeded.
  • Memory abuse
    • Repeated OOM kills by same tenant or sandbox pattern.
    • Rapid growth of memory usage without progress signals (no logs).
    • Mitigation:
      • Strict mem limits per sandbox; repeated OOM → cool-down for tenant/project.
  • Network abuse (for allowed-network sandboxes)
    • High rate of outbound connections to distinct IPs (port scan signature).
    • Large outbound volume to non-approved domains.
    • Mitigation:
      • Egress proxy detecting port scans / connection bursts.
      • Auto-kill sandbox on detection; temporarily block tenant from networked sandboxes.
  • Filesystem abuse
    • Excessive writes (GiB/min) or inode creation in workspace or scratch.
    • Mitigation:
      • Quotas on workspace volume size and inode count.
      • Kill sandboxes exceeding thresholds; alert.
  • Command behavior heuristics
    • Detect repeated fork bombs, suspicious binaries, or known exploit toolchains via process monitoring inside sandbox (as far as gVisor allows), plus signatures in stdout/stderr.406407

Automated responses

  • Sandbox-level:
    • Hard kill (SIGKILL) + mark run as “abuse suspected”.
    • Lock that sandbox ID and do not permit further execs.
  • Project-level:
    • Temporary throttle (e.g., max 1 concurrent sandbox for 30 minutes).
    • Require manual approval for networked sandboxes.
  • Tenant-level:
    • If multiple projects trigger abuse heuristics within a time window, soft-ban network access or sandbox creation, pending admin review.

All actions logged to audit logs with tenant/project/user/sandbox IDs for post-incident review.
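The CPU heuristic and the tiered responses above can be sketched as two small functions; the sample format, thresholds, and strike-to-response mapping are illustrative and would be tuned per deployment.

```python
def classify_cpu_abuse(samples, cpu_threshold=0.8, min_seconds=120):
    """Flag a sandbox whose CPU stays above threshold with no output.

    `samples` is a list of (seconds_since_start, cpu_fraction, output_bytes),
    matching the ">80% CPU for >N seconds with no I/O" heuristic above.
    """
    hot_start = None
    for t, cpu, out_bytes in samples:
        if cpu > cpu_threshold and out_bytes == 0:
            if hot_start is None:
                hot_start = t
            if t - hot_start >= min_seconds:
                return True
        else:
            hot_start = None
    return False

def escalate(strikes_in_window):
    """Map repeated abuse flags within a window to the tiered responses."""
    if strikes_in_window >= 3:
        return "tenant_soft_ban"       # multiple projects tripping heuristics
    if strikes_in_window == 2:
        return "project_throttle"      # e.g. max 1 concurrent sandbox for 30 min
    if strikes_in_window == 1:
        return "kill_sandbox"          # SIGKILL + lock the sandbox ID
    return "none"
```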


2. Project Session model and API (React TS + Django)

2.1 Conceptual model

Entities

  • Tenant: organization.
  • Project: codebase + configuration (Git repo, AI tools enabled, quotas).
  • Session: a developer’s active workspace in a project, bound to a Workstation and one or more sandboxes.
  • Sandbox: a gVisor-backed execution environment inside the Workstation.
  • AI Run: an invocation of Claude or Gemini on a project (code edits or analysis).
  • Autosave: background Git snapshots in autosave/... branches.

2.2 REST/GraphQL API shape

Sessions

  • POST /projects/{project_id}/sessions
    • Creates or attaches to a session; returns session_id, workstation info, active sandboxes.
  • GET /projects/{project_id}/sessions/{session_id}
    • Returns current state: Workstation, sandboxes, autosave status, active AI runs.
  • DELETE /projects/{project_id}/sessions/{session_id}
    • Ends session (may leave Workstation running but cleans up sandboxes/autosave processes).

Sandboxes

  • POST /projects/{project_id}/sessions/{session_id}/sandboxes
    • Create sandbox; Django calls CreateSandbox on relevant Agent.
  • POST /projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}/exec
    • Start an exec; returns stream token or WebSocket URL for front-end to attach.
  • DELETE /projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}
    • Destroy sandbox.

Autosave

  • GET /projects/{project_id}/autosave/status
    • Summarizes autosave branches and last snapshot time per user/session.
  • POST /projects/{project_id}/autosave/promote
    • Promotes autosave branch into a feature branch (e.g., squash merge) and opens PR.

AI runs

  • POST /projects/{project_id}/ai/claude-run
    • Body: task description, scope (files), optional session/sandbox IDs.
  • POST /projects/{project_id}/ai/gemini-run
    • Same shape.
  • GET /projects/{project_id}/ai/runs/{run_id}
    • Status, logs, diff summary, linked sandbox execs, and autosave commits.

Each endpoint enforces tenant/project isolation based on JWT once per request at the Django layer, as discussed earlier.

2.3 React TS frontend representation

State model (per user)

  • currentTenant, currentProject.
  • currentSession {id, workstation, sandboxes[], autosaveSummary, aiRuns[]}.
  • Derived UI states:
    • Session status: CONNECTING, ACTIVE, IDLE, ENDED.
    • Sandbox list: each with status (CREATING, RUNNING, IDLE, ERROR), last exec, CPU/mem utilization.
    • Autosave indicator: last snapshot time, branch, and Git status compared to main.
    • AI activity: list of recent runs with statuses (RUNNING, SUCCEEDED, FAILED).

UI affordances

  • “Open Project Session” button → POST /sessions → shows Workstation IDE URL and Coditect panel.
  • “Run in Sandbox” buttons associated with:
    • Test commands (pytest/go test).
    • Build commands.
    • Arbitrary shell commands.
  • A “Sandboxes” pane showing each sandbox’s logs (via WebSocket/stream connection to ExecSandbox forwarded by Django).
  • “Autosave” indicator (e.g., green when autosave branch is healthy and up-to-date, orange when paused due to conflicts).
  • “AI Assist” panel where Claude/Gemini suggestions are tied to AI runs and autosave commits (each suggestion lists the Git commit/branch produced).

Session lifecycle flows

  • Start: user picks project → frontend calls POST /sessions → backend provisions Workstation (if needed) and returns session.
  • Work: user edits code (via Workstation IDE), triggers tests and AI tools through Coditect UI; autosave daemon pushes to Git.
  • Pause/Resume: session can be resumed if Workstation still alive; Coditect uses session ID to rehydrate state (sandboxes, autosave, AI runs).
  • End: session ends, sandboxes are destroyed; Workstation may be stopped or left for reuse based on policy.

If you want, next steps could be:

  • A dedicated Security Design Addendum document outline (sections for threat model, STRIDE analysis, policies per risk) filled in for this architecture.
  • A React TS interface contract (types/interfaces) for Session, Sandbox, AIRun, and AutosaveSummary that you can drop straight into your front-end code.


Below is a compact Security Design Addendum outline (pre-filled for Coditect on Cloud Workstations + gVisor) and a React TS interface contract you can paste into your FE codebase.


Security Design Addendum (Coditect + Cloud Workstations + gVisor)

1. Purpose and scope

  • Document security posture and controls for Coditect’s gVisor-based sandboxes running inside Google Cloud Workstations.
  • Extend SDD/TDD with explicit threat model, STRIDE analysis, and policies for sandboxed agent execution.408409410411412413414

Applies to:

  • Django control plane + React TS frontend.
  • Workstations VMs and Coditect Agent.
  • gVisor runsc sandboxes for untrusted agent code.
  • Git autosave and AI (Claude/Gemini) tooling.

2. Assets

  • Code & data: project repositories, configuration, secrets in .coditect/, autosave branches.
  • Identity & auth: JWTs, refresh tokens, user/tenant/project mappings.
  • Infra: Workstations VMs, gVisor runtime, Coditect Agents, Control Plane, DB, logs.
  • Third-party credentials: GitHub tokens, AI model keys, any per-tenant API keys.

3. Trust boundaries

  • Browser ↔ Django: HTTPS, JWT-based auth; browser untrusted.
  • Django ↔ Workstation Agent: gRPC over mTLS; only Coditect Control Plane may call agents.415416
  • Agent ↔ sandbox: Docker + gVisor runsc runtime; sandbox is untrusted code, separated from host kernel.410411414417418
  • Workstation VM ↔ GCP project: hypervisor isolation; Workstations managed by Cloud Workstations controller.409419408

4. STRIDE analysis (per threat, with mitigations/policies)

4.1 Spoofing

Risks:

  • Attacker impersonates a user or control-plane service.
  • Rogue client tries to talk directly to Workstation Agent.

Mitigations:

  • User auth: SSO/OIDC → short-lived access JWTs; refresh tokens in HTTP-only cookies; per-tenant RBAC enforced server-side.420421
  • Service auth: mTLS between Control Plane and Agents, with CA-issued certs and SANs (spiffe://coditect/control-plane vs .../workstation/<id>).416415
  • Agents reject any non-mTLS or invalid cert; only accept control-plane CN/SAN.

Policies:

  • Tokens: access tokens ≤30 min; refresh tokens revocable server-side.
  • Regular rotation of certs and JWT signing keys via KMS/CA.

4.2 Tampering

Risks:

  • Malicious sandbox modifies files outside workspace or tampers with other sandboxes.
  • Attacker alters logs or audit records.

Mitigations:

  • gVisor sandbox: untrusted code runs with --runtime=runsc, read-only rootfs, only /workspace volume mounted.414417418422410
  • No hostPath or Docker socket mounts; each sandbox has its own container filesystem.
  • Central, append-only audit log in Control Plane; sandboxes cannot access it.

Policies:

  • All sandbox containers must use a hard-coded runsc runtime; no fall-back to runc for untrusted workloads.
  • Control Plane rejects any attempt to run execs on containers not labeled as Coditect-owned sandboxes.

4.3 Repudiation

Risks:

  • Users deny having run specific code or AI actions; incidents lack attribution.

Mitigations:

  • Detailed audit logs: user_id, tenant_id, project_id, sandbox_id, Workstation ID, exec commands, AI tool used, timestamps.
  • AI runs tied to autosave commits and Git author metadata.

Policies:

  • Audit events are immutable, stored in an append-only log or WORM-capable storage.
  • Any admin action manipulating sandboxes or Workstations is logged with actor ID.

4.4 Information disclosure

Risks:

  • Sandbox reads secrets or code belonging to other projects/tenants.
  • Sandbox exfiltrates data over network.

Mitigations:

  • Workspace isolation: each sandbox only mounts /workspaces/<tenant>/<user>/<project> and ephemeral scratch.
  • No global filesystem or /home mount; no cloud metadata access.
  • Default network=none or strict outbound allowlist with egress proxy.419422423
  • Secrets kept out of sandbox: GitHub tokens, AI keys live in Control Plane; any external calls happen via backend, not directly from sandbox.

Policies:

  • Any network-enabled sandbox is tied to project policy and tenant risk level; logs of outbound requests with rate limits.
  • No direct DB or internal service endpoints exposed in sandbox environment.

4.5 Denial of Service

Risks:

  • Infinite loops / CPU bombs.
  • Memory bombs, fork bombs.
  • Port scans or outbound floods.

Mitigations:

  • gVisor with cgroup CPU/mem limits per container; enforced timeout_seconds and idle_timeout_seconds.424425426
  • Quota service: per-tenant limits on concurrent sandboxes, vCPU-seconds, memory, and networked sandbox count.
  • Abuse heuristics:
    • CPU >80% for >N seconds with no output → flagged.
    • Repeated OOMs / process restarts → auto-kill and cool-down.
    • Outbound connection patterns matching port scans → immediate kill, tenant throttling.

Policies:

  • Sandbox is auto-terminated upon exceeding CPU-time/memory or triggering heuristics; tenant may be temporarily banned from new sandboxes based on configurable thresholds.

4.6 Elevation of privilege

Risks:

  • Sandbox escapes gVisor to host Workstation.
  • Compromised Workstation tries to impersonate Control Plane.

Mitigations:

  • gVisor: user-space kernel intercepting syscalls, reducing host attack surface.411412413410414
  • Each Workstation runs under a restricted service account with minimal GCP IAM permissions.
  • Control Plane authenticates agent identity via cert SANs and uses allowlists of agent IDs; compromised Workstation cannot impersonate another.415416

Policies:

  • Regularly update gVisor per production guide and CVEs.410411
  • Periodic security scans of Workstation images; limit installed tooling.

React TS interface contract

You can drop this into a types/session.ts or similar.

// Session, Sandbox, AI Run, AutosaveSummary types for Coditect FE

export type SessionStatus = 'CONNECTING' | 'ACTIVE' | 'IDLE' | 'ENDING' | 'ENDED' | 'ERROR';

export interface Session {
  id: string;
  tenantId: string;
  projectId: string;
  userId: string;

  workstationId: string;
  workstationName?: string;
  workstationRegion?: string;

  status: SessionStatus;
  createdAt: string; // ISO 8601
  lastActiveAt: string; // ISO 8601

  sandboxes: Sandbox[];
  autosave: AutosaveSummary | null;
  aiRuns: AIRunSummary[];
}

export type SandboxStatus = 'CREATING' | 'RUNNING' | 'IDLE' | 'TERMINATING' | 'TERMINATED' | 'FAILED';

export interface Sandbox {
  id: string;
  sessionId: string;
  tenantId: string;
  projectId: string;
  userId: string;

  image: string;
  runtime: 'gvisor';
  cpuLimit: number; // cores
  memoryLimitMiB: number;

  status: SandboxStatus;
  createdAt: string;
  lastExecAt?: string;

  // Live metrics snapshot (optional, can be filled from metrics stream)
  metrics?: SandboxMetrics;
}

export interface SandboxMetrics {
  cpuSeconds: number;
  cpuPercent?: number;
  memoryPeakMiB: number;
  bytesRead: number;
  bytesWritten: number;
  lastUpdatedAt: string;
}

export type AIRunStatus = 'QUEUED' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'CANCELLED';

export interface AIRunSummary {
  id: string;
  sessionId: string;
  projectId: string;
  userId: string;

  modelProvider: 'claude' | 'gemini';
  modelName: string;

  status: AIRunStatus;
  createdAt: string;
  completedAt?: string;

  // human-readable summary for UI
  title: string;
  description?: string;

  // link to code changes produced
  autosaveBranch?: string;
  autosaveCommitId?: string;
}

export interface AutosaveSummary {
  tenantId: string;
  projectId: string;
  userId: string;
  ticketId?: string;

  branchName: string; // e.g. autosave/PROJ-1234/acme/foo/alice/ws-01/20260114
  lastCommitId: string;
  lastCommitAt: string;

  // is the autosave loop healthy
  status: 'HEALTHY' | 'PAUSED' | 'ERROR';

  // optional human description for UI
  statusMessage?: string;
}

This gives you:

  • Session as the top-level unit tying together Workstation, sandboxes, autosave, and AI runs.
  • Sandbox as a gVisor-backed execution unit with metrics.
  • AIRunSummary for UI lists and details panes.
  • AutosaveSummary for status indicators and promotion flows.

Footnotes

  1. https://pypi.org/project/modal/ 2

  2. https://github.com/modal-labs/modal-client/blob/main/pyproject.toml 2

  3. https://github.com/modal-labs/modal-client 2

  4. https://github.com/modal-labs/libmodal 2

  5. https://github.com/modal-labs/libmodal/

  6. https://modal.com/docs/guide/sandbox

  7. https://modal.com/products/sandboxes

  8. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution

  9. https://www.linkedin.com/posts/modal-labs_modal-sandboxes-are-generally-available-activity-7287589012498755585-ERGa

  10. https://x.com/modal_labs/status/1881778355043012885

  11. https://www.linkedin.com/posts/pawalt_modal-sandboxes-are-generally-available-activity-7287543122111135744-b5_b

  12. https://modal.com/docs/examples/jupyter_sandbox

  13. https://deps.dev/pypi/modal-client/0.49.2437

  14. https://github.com/modal-labs/modal-examples/blob/main/13_sandboxes/codelangchain/agent.py

  15. interests.computing_infrastructure.modal_sandboxes

  16. https://dzone.com/articles/serverless-iam-architecture-with-security-lessons

  17. https://www.ranthebuilder.cloud/post/14-aws-lambda-security-best-practices-for-building-secure-serverless-applications

  18. https://docs.cloud.google.com/kubernetes-engine/docs/best-practices/enterprise-multitenancy

  19. https://github.com/modal-labs/modal-client/blob/main/CHANGELOG.md

  20. https://stackoverflow.com/questions/76590131/error-while-build-ios-app-in-xcode-sandbox-rsync-samba-13105-deny1-file-w

  21. http://faculty.washington.edu/wlloyd/courses/tcss562_f2024/presentations/2024/team-3.pdf

  22. https://docs.aws.amazon.com/pt_br/emr/latest/EMR-Serverless-UserGuide/emr-serverless-user-guide.pdf

  23. https://inspect.aisi.org.uk/sandboxing.html

  24. https://www.luiscardoso.dev/blog/sandboxes-for-ai

  25. https://www.ikangai.com/the-complete-guide-to-sandboxing-autonomous-agents-tools-frameworks-and-safety-essentials/

  26. https://modal.com/docs/reference/modal.Sandbox

  27. https://modal.com/docs/guide/sandboxes

  28. https://github.com/cased/sandboxes

  29. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution

  30. https://modal.com/docs/examples/safe_code_execution

  31. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution 2 3 4 5 6 7

  32. https://cased.com/blog/2025-10-05-sandboxes 2 3 4 5

  33. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6 7 8 9 10

  34. https://www.koyeb.com/blog/top-sandbox-code-execution-platforms-for-ai-code-execution-2025 2 3 4

  35. https://betterstack.com/community/comparisons/best-sandbox-runners/

  36. https://northflank.com/blog/top-vercel-sandbox-alternatives-for-secure-ai-code-execution-and-sandbox-environments

  37. https://www.runpod.io/articles/alternatives/modal

  38. https://instavm.io/blog/sandboxed-ai-code-execution-tools

  39. https://github.com/cased/sandboxes 2

  40. https://simonwillison.net/2026/Jan/6/a-field-guide-to-sandboxes-for-ai/

  41. https://modal.com/solutions/coding-agents

  42. https://luiscardoso.dev/blog

  43. https://developer.salesforce.com/docs/commerce/b2c-commerce/guide/b2c-manage-sb.html

  44. https://simonw.substack.com/p/llm-predictions-for-2026-shared-with

  45. https://manus.im/blog/best-ai-coding-assistant-tools

  46. https://github.com/restyler/awesome-sandbox 2

  47. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6

  48. https://www.tencentcloud.com/techpedia/118267 2 3 4

  49. https://github.com/jakhax/sandman 2

  50. https://www.reddit.com/r/docker/comments/1fmuv5b/kata_containers_vs_firecracker_vs_gvisor/

  51. https://dev.to/agentsphere/choosing-a-workspace-for-ai-agents-the-ultimate-showdown-between-gvisor-kata-and-firecracker-b10 2

  52. https://skywork.ai/skypage/en/chris-hays-code-sandbox-ai-engineers/1980120660590239744

  53. https://cased.com/blog/2025-10-05-sandboxes 2 3

  54. https://github.com/cased/sandboxes 2

  55. https://www.alldevblogs.com/article/simon-willison/a-field-guide-to-sandboxes-for-ai

  56. https://simonwillison.net/2026/Jan/6/a-field-guide-to-sandboxes-for-ai/

  57. https://northflank.com/blog/how-to-spin-up-a-secure-code-sandbox-and-microvm-in-seconds-with-northflank-firecracker-gvisor-kata-clh

  58. https://developers.cloudflare.com/sandbox/

  59. https://www.youtube.com/watch?v=sVtqsH5oG4c

  60. https://www.youtube.com/watch?v=sV8HKlwsFag

  61. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

  62. https://zesty.co/finops-glossary/gvisor-in-kubernetes/ 2 3 4 5 6 7 8 9 10 11 12 13 14

  63. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/ 2 3 4 5 6 7 8 9 10 11 12 13 14 15

  64. https://github.com/firecracker-microvm/firecracker 2 3 4 5 6 7 8 9 10 11 12

  65. https://news.ycombinator.com/item?id=19921564 2 3 4 5

  66. https://github.com/open-lambda/gvisor 2 3

  67. https://www.anantacloud.com/post/transforming-kubernetes-security-how-kata-containers-improve-workload-isolation 2 3 4 5 6 7

  68. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/ 2 3 4 5

  69. https://www.anthony-balitrand.fr/2025/08/12/firecracker-microvms-the-power-behind-aws-lambda/ 2 3 4 5 6 7 8 9 10 11

  70. https://firecracker-microvm.github.io 2 3 4 5

  71. https://wasmer.io/wasmer-vs-wasmtime 2 3

  72. https://github.com/google/gvisor

  73. https://www.tencentcloud.com/techpedia/118267

  74. https://gvisor.dev/docs/

  75. https://gvisor.dev/docs/user_guide/production/

  76. https://www.upwind.io/feed/unlock-runtime-visibility-for-gvisor-sandboxed-containers

  77. https://kubernetes.io/docs/concepts/containers/runtime-class/ 2 3

  78. https://zesty.co/finops-glossary/gvisor-in-kubernetes/ 2 3

  79. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/ 2 3 4

  80. https://devopstales.github.io/kubernetes/firecracker-containerd/ 2 3 4

  81. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/ 2 3 4 5 6 7

  82. https://gvisor.dev/docs/user_guide/quick_start/kubernetes/ 2

  83. https://egashira.dev/blog/gvisor-on-kubernetes-cluster 2 3

  84. https://labs.iximiuz.com/tutorials/kubernetes-runtime-class-61506808

  85. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/ 2 3

  86. https://www.youtube.com/watch?v=nV2UCE5iWAU 2 3

  87. http://arun-gupta.github.io/kata-firecracker/ 2

  88. https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md 2

  89. https://dev.to/signadot/creating-sandboxes-in-kubernetes-at-scale-5f6p 2

  90. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/Using-Runtimes-in-Kubernetes

  91. https://www.alibabacloud.com/blog/getting-started-with-kubernetes-|-understanding-kubernetes-runtimeclass-and-using-multiple-container-runtimes_596341

  92. https://kubernetes.io/docs/concepts/containers/runtime-class/

  93. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  94. https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/

  95. https://blog.sighup.io/how-to-run-untrusted-containers-in-kubernetes/

  96. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  97. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/

  98. https://www.appsecengineer.com/blog/defending-kubernetes-clusters-against-container-escape-attacks

  99. https://news.ycombinator.com/item?id=19921564

  100. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  101. https://www.youtube.com/watch?v=0wEiizErKZw

  102. https://devopstales.github.io/kubernetes/firecracker-containerd/

  103. https://arxiv.org/html/2509.09400v1

  104. https://news.ycombinator.com/item?id=34081170

  105. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/Using-Runtimes-in-Kubernetes

  106. https://docs.catalystcloud.nz/tutorials/kubernetes/sandboxed-containers-with-gvisor.html

  107. http://arun-gupta.github.io/kata-firecracker/

  108. https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md

  109. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/

  110. https://www.alibabacloud.com/blog/getting-started-with-kubernetes-|-understanding-kubernetes-runtimeclass-and-using-multiple-container-runtimes_596341

  111. https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/3.73/html/operating/use-admission-controller-enforcement

  112. https://www.sysdig.com/learn-cloud-native/kubernetes-admission-controllers

  113. https://www.vcluster.com/docs/vcluster/0.27.0/configure/vcluster-yaml/sync/from-host/runtime-classes

  114. https://kubeops.net/blog/effective-container-isolation-techniques-for-secure-kubernetes

  115. https://dev.to/signadot/creating-sandboxes-in-kubernetes-at-scale-5f6p

  116. https://github.com/firecracker-microvm/firecracker/issues/908

  117. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  118. https://gvisor.dev/docs/user_guide/compatibility/

  119. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  120. https://katacontainers.io/blog/kata-containers-northflank-case-study/

  121. https://publish.obsidian.md/kruzenshtern/writings/2021-02-24-running-firecracker-on-google-kubernetes-engine

  122. https://gvisor.dev

  123. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  124. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  125. https://github.com/kata-containers/kata-containers/issues/10536

  126. https://devopstales.github.io/kubernetes/firecracker-containerd/

  127. http://arun-gupta.github.io/kata-firecracker/

  128. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html

  129. https://docs.cloud.google.com/kubernetes-engine/docs/how-to/configure-gke-service-extensions

  130. https://cloud.google.com/blog/products/application-development/go-1-24-expands-support-for-wasm

  131. https://itnext.io/webassembly-on-kubernetes-c5c652e8c1f1

  132. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/

  133. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/

  134. https://arxiv.org/html/2509.09400v1

  135. https://stackoverflow.com/questions/69846927/how-to-run-untrusted-code-using-gvisor-on-google-cloud-run

  136. https://news.ycombinator.com/item?id=37253921

  137. https://stackoverflow.com/questions/69846927/how-to-run-untrusted-code-using-gvisor-on-google-cloud-run

  138. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  139. https://gvisor.dev/docs/user_guide/install/

  140. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  141. https://gvisor.dev

  142. https://github.com/google/gvisor/issues/11069

  143. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  144. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  145. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  146. https://www.packtpub.com/sa-th/learning/tech-news/gke-sandbox-a-gvisor-based-feature-to-increase-security-and-isolation-in-containers

  147. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  148. https://www.trendmicro.com/cloudoneconformity/knowledge-base/gcp/GKE/enable-gke-sandbox-with-gvisor.html

  149. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  150. https://gvisor.dev/docs/user_guide/containerd/quick_start/

  151. https://gvisor.dev/docs/user_guide/platforms/

  152. https://grpc.io/docs/languages/go/basics/

  153. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  154. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  155. https://gvisor.dev

  156. https://gvisor.dev/docs/user_guide/install/

  157. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/

  158. https://github.com/google/gvisor/issues/190

  159. https://gvisor.dev/docs/architecture_guide/performance/

  160. https://github.com/Datadog/datadog-agent/issues/44084

  161. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  162. https://rewind.com/blog/build-vs-buy-backup-solutions-github/

  163. https://www.reddit.com/r/github/comments/ujvsdb/github_backup/

  164. https://github.com/erikw/restic-automatic-backup-scheduler

  165. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  166. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  167. https://github.com/pahanini/go-grpc-bidirectional-streaming-example

  168. https://dev.to/yash_mahakal/implementing-bidirectional-grpc-streaming-a-practical-guide-3afi

  169. https://grpc.io/docs/languages/cpp/basics/

  170. https://stackoverflow.com/questions/56076703/how-do-i-make-sure-that-a-message-was-received-in-grpc-bidirectional-streaming

  171. https://www.thesocialrobot.org/posts/grpc-brain-2/

  172. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  173. https://grpc.io/docs/languages/go/basics/

  174. https://grpc.io/docs/guides/auth/

  175. https://gvisor.dev/docs/user_guide/quick_start/docker/

  176. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  177. https://gvisor.dev

  178. https://gvisor.dev/docs/user_guide/install/

  179. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  180. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  181. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  182. https://nanikgolang.netlify.app/post/runsc/

  183. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  184. https://grpc.io/docs/guides/performance/

  185. https://knabben.github.io/2023/0608/

  186. https://doc.akka.io/libraries/akka-grpc/current/mtls.html

  187. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/

  188. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  189. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  190. https://gvisor.dev/docs/architecture_guide/performance/

  191. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  192. https://github.com/google/gvisor/issues/190

  193. https://github.com/Datadog/datadog-agent/issues/44084

  194. https://blog.sighup.io/how-to-run-untrusted-containers-in-kubernetes/

  195. https://www.appsecengineer.com/blog/defending-kubernetes-clusters-against-container-escape-attacks

  196. https://gvisor.dev/docs/user_guide/compatibility/

  197. https://dev.to/yash_mahakal/implementing-bidirectional-grpc-streaming-a-practical-guide-3afi

  198. https://www.talentica.com/blogs/part-3-building-a-bidirectional-streaming-grpc-service-using-golang/

  199. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  200. https://www.infracloud.io/blogs/understanding-grpc-concepts-best-practices/

  201. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  202. https://sethrobertson.github.io/GitBestPractices/

  203. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/

  204. https://stackoverflow.com/questions/273695/what-are-some-examples-of-commonly-used-practices-for-naming-git-branches

  205. https://www.geeksforgeeks.org/git/how-to-naming-conventions-for-git-branches/

  206. https://digilent.com/reference/software/development/git/start

  207. https://pullpanda.io/blog/git-branch-naming-conventions-best-practices

  208. https://www.zignuts.com/blog/master-git-branch-naming-conventions

  209. https://gist.github.com/luismts/495d982e8c5b1a0ced4a57cf3d93cf60

  210. https://dev.to/shnjd/git-good-best-practices-for-branch-naming-and-commit-messages-oj4

  211. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/

  212. https://conventional-branch.github.io

  213. https://dev.to/varbsan/a-simplified-convention-for-naming-branches-and-commits-in-git-il4

  214. https://graphite.com/guides/git-branch-naming-conventions

  215. https://www.reddit.com/r/git/comments/xddr5d/manual_branches_names_convention/

  216. https://sethrobertson.github.io/GitBestPractices/

  217. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/

  218. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  219. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  220. https://git-scm.com/docs/git-worktree

  221. https://nx.dev/blog/git-worktrees-ai-agents

  222. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  223. https://dev.to/jps27cse/github-branching-name-best-practices-49ei

  224. https://architecture.lullabot.com/adr/20220920-git-branch-naming/

  225. https://graphite.com/guides/git-branch-naming-conventions

  226. https://gist.github.com/jasonk/c29679fa77f4c81d20a31608795ab265

  227. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  228. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/

  229. https://dev.to/shnjd/git-good-best-practices-for-branch-naming-and-commit-messages-oj4

  230. https://verreauxblack.hashnode.dev/git-release-management-should-you-use-branches-or-tags

  231. https://circleci.com/blog/git-tags-vs-branches/

  232. https://dev.to/livecodelife/how-i-supercharged-my-workflow-with-git-worktrees-2jgj

  233. https://stackoverflow.com/questions/31935776/what-would-i-use-git-worktree-for

  234. https://www.reddit.com/r/golang/comments/s0m0vz/showcase_autosaved_a_utility_that_autosaves/

  235. https://github.com/mateimicu/auto-tag

  236. https://irskep.github.io/autowt/

  237. https://www.luiscardoso.dev/blog/sandboxes-for-ai

  238. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  239. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  240. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  241. https://github.com/firecracker-microvm/firecracker

  242. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/

  243. https://kubernetes.io/docs/concepts/containers/runtime-class/

  244. https://gvisor.dev/docs/user_guide/quick_start/kubernetes/

  245. https://devopstales.github.io/kubernetes/firecracker-containerd/

  246. https://labs.iximiuz.com/tutorials/kubernetes-runtime-class-61506808

  247. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  248. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  249. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  250. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  251. https://gvisor.dev

  252. https://gvisor.dev/docs/user_guide/install/

  253. https://gvisor.dev/docs/user_guide/quick_start/docker/

  254. https://grpc.io/docs/languages/go/basics/

  255. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  256. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  257. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  258. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  259. https://nanikgolang.netlify.app/post/runsc/

  260. https://grpc.io/docs/guides/auth/

  261. https://knabben.github.io/2023/0608/

  262. https://grpc.io/docs/guides/performance/

  263. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/

  264. https://gvisor.dev/docs/user_guide/compatibility/

  265. https://gvisor.dev/docs/architecture_guide/performance/

  266. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  267. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  268. https://stackoverflow.com/questions/273695/what-are-some-examples-of-commonly-used-practices-for-naming-git-branches

  269. https://www.geeksforgeeks.org/git/how-to-naming-conventions-for-git-branches/

  270. https://dev.to/jps27cse/github-branching-name-best-practices-49ei

  271. https://architecture.lullabot.com/adr/20220920-git-branch-naming/

  272. https://graphite.com/guides/git-branch-naming-conventions

  273. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  274. https://git-scm.com/docs/git-worktree

  275. https://nx.dev/blog/git-worktrees-ai-agents

  276. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  277. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  278. https://sethrobertson.github.io/GitBestPractices/

  279. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/

  280. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/

  281. https://circleci.com/blog/git-tags-vs-branches/

  282. https://gvisor.dev

  283. https://docs.cloud.google.com/workstations/docs/architecture

  284. https://cloud.google.com/workstations/docs/architecture

  285. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  286. https://gvisor.dev/docs/user_guide/quick_start/docker/

  287. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  288. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  289. https://git-scm.com/docs/git-worktree

  290. https://nx.dev/blog/git-worktrees-ai-agents

  291. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  292. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  293. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  294. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  295. https://github.com/google/gvisor/issues/190

  296. https://github.com/Datadog/datadog-agent/issues/44084

  297. https://gvisor.dev/docs/architecture_guide/performance/

  298. https://www.reddit.com/r/golang/comments/s0m0vz/showcase_autosaved_a_utility_that_autosaves/

  299. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  300. https://gvisor.dev/docs/user_guide/production/

  301. https://github.com/google/gvisor

  302. https://grpc.io/docs/guides/auth/

  303. https://knabben.github.io/2023/0608/

  304. https://grpc.io/docs/guides/performance/

  305. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  306. https://gvisor.dev/docs/user_guide/install/

  307. https://grpc.io/docs/languages/go/basics/

  308. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  309. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  310. https://devopstales.github.io/kubernetes/firecracker-containerd/

  311. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  312. https://stackoverflow.com/questions/31935776/what-would-i-use-git-worktree-for

  313. https://irskep.github.io/autowt/

  314. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  315. https://gvisor.dev/docs/user_guide/compatibility/

  316. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  317. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  318. https://docs.cloud.google.com/workstations/docs/architecture

  319. https://cloud.google.com/workstations/docs/architecture

  320. https://gvisor.dev

  321. https://gvisor.dev/docs/user_guide/quick_start/docker/

  322. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  323. https://gvisor.dev/docs/user_guide/production/

  324. https://github.com/google/gvisor

  325. https://gvisor.dev/docs/user_guide/install/

  326. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  327. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  328. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  329. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  330. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  331. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  332. https://git-scm.com/docs/git-worktree

  333. https://nx.dev/blog/git-worktrees-ai-agents

  334. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  335. https://grpc.io/docs/guides/auth/

  336. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  337. https://gvisor.dev/docs/architecture_guide/performance/

  338. https://gvisor.dev/docs/user_guide/compatibility/

  339. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  340. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  341. https://github.com/google/gvisor/issues/190

  342. https://github.com/Datadog/datadog-agent/issues/44084

  343. https://knabben.github.io/2023/0608/

  344. https://grpc.io/docs/guides/performance/

  345. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  346. https://devopstales.github.io/kubernetes/firecracker-containerd/

  347. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  348. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  349. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  350. https://github.com/firecracker-microvm/firecracker

  351. https://docs.cloud.google.com/workstations/docs/access-control

  352. https://docs.cloud.google.com/iam/docs/roles-permissions/workstations

  353. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  354. https://cloud.google.com/workstations/docs/access-control

  355. https://gvisor.dev

  356. https://docs.cloud.google.com/workstations/docs/architecture

  357. https://cloud.google.com/workstations/docs/architecture

  358. https://fotc.com/blog/cloud-workstations/

  359. https://blog.salrashid.dev/articles/2019/kubernetes_oidc_gcp_identity_platform/

  360. https://www.pulumi.com/guides/how-to/gcp-workstations-iam-binding/

  361. https://www.skills.google/focuses/60391?parent=catalog

  362. https://grpc.io/docs/guides/auth/

  363. https://knabben.github.io/2023/0608/

  364. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  365. https://gvisor.dev/docs/user_guide/quick_start/docker/

  366. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  367. https://gvisor.dev/docs/user_guide/production/

  368. https://cloud.google.com/workstations

  369. https://dzone.com/articles/google-cloud-workstations

  370. https://www.youtube.com/watch?v=-3pdAavNepg

  371. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  372. https://www.googlecloudcommunity.com/gc/Apigee/Accessing-a-Google-Cloud-Platform-based-service-using-JWT-and-a/m-p/21306

  373. https://docs.cloud.google.com/workstations/docs/access-control

  374. https://docs.cloud.google.com/iam/docs/roles-permissions/workstations

  375. https://cloud.google.com/workstations/docs/access-control

  376. https://www.skills.google/focuses/60391?parent=catalog

  377. https://fotc.com/blog/cloud-workstations/

  378. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  379. https://www.youtube.com/watch?v=-3pdAavNepg

  380. https://grpc.io/docs/guides/auth/

  381. https://knabben.github.io/2023/0608/

  382. https://docs.cloud.google.com/workstations/docs/architecture

  383. https://cloud.google.com/workstations/docs/architecture

  384. https://www.pulumi.com/guides/how-to/gcp-workstations-iam-binding/

  385. https://gvisor.dev/docs/user_guide/production/

  386. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  387. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  388. https://gvisor.dev

  389. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  390. https://fotc.com/blog/cloud-workstations/

  391. https://github.com/google/gvisor

  392. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  393. https://docs.cloud.google.com/workstations/docs/architecture

  394. https://cloud.google.com/workstations/docs/architecture

  395. https://dzone.com/articles/google-cloud-workstations

  396. https://gvisor.dev/docs/user_guide/production/

  397. https://github.com/google/gvisor

  398. https://gvisor.dev

  399. https://docs.cloud.google.com/workstations/docs/architecture

  400. https://cloud.google.com/workstations/docs/architecture

  401. https://fotc.com/blog/cloud-workstations/

  402. https://grpc.io/docs/guides/auth/

  403. https://knabben.github.io/2023/0608/

  404. https://gvisor.dev/docs/user_guide/quick_start/docker/

  405. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  406. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  407. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  408. https://docs.cloud.google.com/workstations/docs/architecture

  409. https://cloud.google.com/workstations/docs/architecture

  410. https://gvisor.dev/docs/user_guide/production/

  411. https://github.com/google/gvisor

  412. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  413. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  414. https://gvisor.dev

  415. https://grpc.io/docs/guides/auth/

  416. https://knabben.github.io/2023/0608/

  417. https://gvisor.dev/docs/user_guide/quick_start/docker/

  418. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  419. https://fotc.com/blog/cloud-workstations/

  420. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  421. https://www.youtube.com/watch?v=-3pdAavNepg

  422. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  423. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  424. https://github.com/google/gvisor/issues/190

  425. https://github.com/Datadog/datadog-agent/issues/44084

  426. https://gvisor.dev/docs/architecture_guide/performance/