
target architecture (e.g., Coditect‑style multi‑tenant agents, GPU needs, persistence model), a next step can be a concrete prompt scaffold for: “Design a Modal‑backed sandbox executor service for untrusted agent code (API + orchestration + limits + IAM).”

ARE MODAL SANDBOXES open source?

Modal Sandboxes themselves (the backend runtime and orchestration) are not open source; only the client SDKs (the Python modal package and the JS/TS/Go libmodal SDKs) are open source under Apache‑2.0.

What is open vs closed

  • The Modal Python client (modal on PyPI, modal-client on GitHub) is Apache‑2.0 licensed open source.231
  • The JS/TS and Go SDKs (libmodal on GitHub) are also open source SDKs for interacting with Modal.54
  • The Modal Sandboxes product (gVisor‑based isolation, orchestration, scheduling, infra) is a proprietary managed service, not released as open source. Public materials describe it as a product you use via their API, with no backend repo exposed.678

Suggested next-step prompt scaffold

Design a Modal-backed sandbox executor microservice for Coditect that executes untrusted agent code using Modal Sandboxes. Define: (1) a multi-tenant API surface (create_sandbox, exec, snapshot, destroy) with per-tenant isolation and quotas, (2) orchestration flow for routing agent jobs to Modal, handling stdout/stderr/log streaming, and enforcing CPU/GPU, time, and memory limits, (3) an IAM model where Coditect issues short-lived tokens/roles that map tenants and projects to specific Modal apps, volumes, and network policies, and (4) a persistence model for ephemeral vs project-scoped workspaces (Modal volumes, object storage), including audit logging of all executions and artifacts. Use OpenAPI 3.1 for the external API, describe internal sequence diagrams, and propose a minimal RBAC schema for tenants, projects, and sandboxes.

If you want, this can be further specialized for “GPU-heavy codegen and test runs” vs “CPU-only quick tools,” or aligned with your existing Coditect RLM orchestrator API.


Design a Modal-like sandbox executor microservice for Coditect that executes untrusted agent code using Modal-like Sandboxes. Define: (1) a multi-tenant API surface (create_sandbox, exec, snapshot, destroy) with per-tenant isolation and quotas, (2) orchestration flow for routing agent jobs to Modal, handling stdout/stderr/log streaming, and enforcing CPU/GPU, time, and memory limits, (3) an IAM model where Coditect issues short-lived tokens/roles that map tenants and projects to specific Modal-like apps, volumes, and network policies, and (4) a persistence model for ephemeral vs project-scoped workspaces (Modal-like volumes, object storage), including audit logging of all executions and artifacts. Use OpenAPI 3.1 for the external API, describe internal sequence diagrams, and propose a minimal RBAC schema for tenants, projects, and sandboxes.

Below is a comprehensive design document for a Modal-like sandbox executor service for Coditect. This architecture leverages patterns from Modal's Sandbox API while adding multi-tenant IAM, quota enforcement, persistence, and audit logging.


OpenAPI 3.1 Specification (Core Endpoints)

openapi: 3.1.0
info:
  title: Coditect Sandbox Executor API
  version: 1.0.0
  description: Multi-tenant sandbox executor for untrusted agent code

servers:
  - url: https://api.coditect.io/v1

security:
  - BearerAuth: []

paths:
  /sandboxes:
    post:
      operationId: createSandbox
      summary: Create a new sandbox
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSandboxRequest'
      responses:
        '201':
          description: Sandbox created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sandbox'
        '403':
          description: Quota exceeded or permission denied

  /sandboxes/{sandbox_id}/exec:
    post:
      operationId: execCommand
      summary: Execute command in sandbox
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ExecRequest'
      responses:
        '200':
          description: Execution started
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExecHandle'

  /sandboxes/{sandbox_id}/snapshot:
    post:
      operationId: snapshotFilesystem
      summary: Snapshot sandbox filesystem to image
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '201':
          description: Snapshot created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SnapshotResult'

  /sandboxes/{sandbox_id}:
    delete:
      operationId: destroySandbox
      summary: Terminate and destroy sandbox
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '204':
          description: Sandbox terminated

  /sandboxes/{sandbox_id}/streams:
    get:
      operationId: streamLogs
      summary: SSE stream of stdout/stderr
      parameters:
        - $ref: '#/components/parameters/SandboxId'
      responses:
        '200':
          description: Event stream
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/LogEvent'

components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

  parameters:
    SandboxId:
      name: sandbox_id
      in: path
      required: true
      schema:
        type: string
        format: uuid

  schemas:
    CreateSandboxRequest:
      type: object
      required: [project_id]
      properties:
        project_id:
          type: string
          format: uuid
        name:
          type: string
          maxLength: 64
          pattern: '^[a-zA-Z0-9._-]+$'
        image:
          type: string
          description: Image reference or snapshot ID
        workdir:
          type: string
        env:
          type: object
          additionalProperties:
            type: string
        secrets:
          type: array
          items:
            type: string
          description: Secret references to inject
        volumes:
          type: object
          additionalProperties:
            type: string
          description: Mount path -> volume ID mapping
        limits:
          $ref: '#/components/schemas/ResourceLimits'
        network_policy:
          $ref: '#/components/schemas/NetworkPolicy'
        timeout:
          type: integer
          minimum: 1
          maximum: 86400
          default: 300
        idle_timeout:
          type: integer
          minimum: 30
          maximum: 3600

    ResourceLimits:
      type: object
      properties:
        cpu:
          oneOf:
            - type: number
            - type: array
              items:
                type: number
              minItems: 2
              maxItems: 2
        memory_mib:
          oneOf:
            - type: integer
            - type: array
              items:
                type: integer
              minItems: 2
              maxItems: 2
        gpu:
          type: string
          enum: [none, t4, a10g, a100, h100]

    NetworkPolicy:
      type: object
      properties:
        block_network:
          type: boolean
          default: false
        cidr_allowlist:
          type: array
          items:
            type: string
            format: cidr

    ExecRequest:
      type: object
      required: [command]
      properties:
        command:
          type: array
          items:
            type: string
          minItems: 1
        workdir:
          type: string
        env:
          type: object
          additionalProperties:
            type: string
        timeout:
          type: integer
        pty:
          type: boolean
          default: false

    Sandbox:
      type: object
      properties:
        id:
          type: string
          format: uuid
        tenant_id:
          type: string
        project_id:
          type: string
        name:
          type: string
        status:
          type: string
          enum: [creating, running, terminated, failed]
        created_at:
          type: string
          format: date-time
        tunnels:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/Tunnel'

    ExecHandle:
      type: object
      properties:
        exec_id:
          type: string
        sandbox_id:
          type: string
        stream_url:
          type: string
          format: uri

    SnapshotResult:
      type: object
      properties:
        snapshot_id:
          type: string
        image_ref:
          type: string
        size_bytes:
          type: integer

    LogEvent:
      type: object
      properties:
        stream:
          type: string
          enum: [stdout, stderr, system]
        data:
          type: string
        timestamp:
          type: string
          format: date-time

    Tunnel:
      type: object
      properties:
        port:
          type: integer
        url:
          type: string
          format: uri

Orchestration Flow (Sequence Diagram)

Participants: Agent (client) -> API Gateway (IAM) -> Orchestrator Service -> Sandbox Pool Manager -> Modal-like Backend

Create:
  Agent        -> API Gateway  : POST /sandboxes
  API Gateway                  : validate JWT; extract tenant/project/roles
  API Gateway  -> Orchestrator : CreateSandboxCmd (with quotas)
  Orchestrator -> Pool Manager : check tenant quota, reserve
  Pool Manager -> Backend      : Sandbox.create(image, limits, network_policy)
  Backend      -> Pool Manager : sandbox_id
  Pool Manager -> Orchestrator : sandbox_id, status
  API Gateway  -> Agent        : 201 {sandbox}

Exec:
  Agent        -> API Gateway  : POST /sandboxes/{id}/exec
  Orchestrator -> Pool Manager : ExecCmd
  Pool Manager -> Backend      : sb.exec(cmd)
  Backend      -> Pool Manager : exec_handle
  Orchestrator -> API Gateway  : exec_id, stream_url
  API Gateway  -> Agent        : 200 {exec_handle}

Stream:
  Agent        -> API Gateway  : GET /sandboxes/{id}/streams
  Orchestrator -> Pool Manager : subscribe to log stream
  Backend      -> Agent        : SSE stdout/stderr lines (relayed via gateway)

Destroy:
  Agent        -> API Gateway  : DELETE /sandboxes/{id}
  Orchestrator -> Pool Manager : TerminateCmd
  Pool Manager -> Backend      : sb.terminate()
  Backend      -> Pool Manager : ack
  Orchestrator                 : release quota
  API Gateway  -> Agent        : 204

Key orchestration points from Modal's patterns:

  • Sandbox.create() allocates a container with specified image, volumes, CPU/GPU/memory limits, and network policy (block_network, cidr_allowlist).
  • Sandbox.exec() runs commands inside the sandbox, returning a handle for streaming stdout/stderr.
  • Sandbox.snapshot_filesystem() persists the current filesystem state as a reusable image.
  • Sandbox.terminate() cleans up resources.
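The four lifecycle calls above can be sketched as one orchestrator routine against a pluggable backend. This is an illustrative, stdlib-only sketch: `SandboxBackend`, `InMemoryBackend`, and `run_job` are hypothetical names, and the toy backend stands in for a real Modal (or Modal-like) client.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SandboxBackend(Protocol):
    """Abstract backend; a real implementation would wrap the Modal SDK."""
    def create(self, image: str, limits: dict) -> str: ...
    def exec(self, sandbox_id: str, command: list[str]) -> str: ...
    def snapshot(self, sandbox_id: str) -> str: ...
    def terminate(self, sandbox_id: str) -> None: ...


@dataclass
class InMemoryBackend:
    """Toy stand-in so the flow is runnable without any external service."""
    sandboxes: dict = field(default_factory=dict)
    _next: int = 0

    def create(self, image, limits):
        self._next += 1
        sid = f"sb-{self._next}"
        self.sandboxes[sid] = {"image": image, "limits": limits, "status": "running"}
        return sid

    def exec(self, sandbox_id, command):
        assert self.sandboxes[sandbox_id]["status"] == "running"
        return f"exec-{sandbox_id}-{len(command)}"

    def snapshot(self, sandbox_id):
        return f"snap-{sandbox_id}"

    def terminate(self, sandbox_id):
        self.sandboxes[sandbox_id]["status"] = "terminated"


def run_job(backend: SandboxBackend, image: str, command: list[str]) -> str:
    """create -> exec -> snapshot -> terminate, mirroring the flow above."""
    sid = backend.create(image, {"cpu": 2.0, "memory_mib": 1024})
    backend.exec(sid, command)
    snap = backend.snapshot(sid)
    backend.terminate(sid)
    return snap
```

Swapping `InMemoryBackend` for a Modal-backed implementation is then a pure control-plane change; the orchestrator logic stays identical.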

IAM Model

Token Structure (JWT Claims)

{
  "sub": "user:uuid",
  "tenant_id": "tenant:uuid",
  "project_ids": ["proj:uuid1", "proj:uuid2"],
  "roles": ["sandbox:execute", "sandbox:snapshot", "volume:read"],
  "quota_tier": "standard",
  "network_policy_override": null,
  "exp": 1705212000,
  "iss": "coditect-iam"
}
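For illustration, HS256 signing and verification of such a token can be done with the standard library alone. `sign_token` and `verify_token` are hypothetical helpers; a production gateway would normally use a vetted JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_token(claims: dict, secret: bytes) -> str:
    """Issue a compact HS256 JWT carrying claims like those above."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_token(token: str, secret: bytes, issuer: str = "coditect-iam") -> dict:
    """Check signature, issuer, and expiry; return the decoded claims."""
    header, payload, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), sig):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```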

Policy Enforcement Points

| Check             | Enforced At  | Description                                       |
|-------------------|--------------|---------------------------------------------------|
| Token validity    | API Gateway  | JWT signature, expiry, issuer                     |
| Tenant isolation  | Orchestrator | Sandbox tagged with tenant_id, query filters      |
| Project scope     | Orchestrator | project_id must be in token's project_ids         |
| Role permission   | Orchestrator | Action mapped to required role (see RBAC below)   |
| Quota enforcement | Pool Manager | Concurrent sandboxes, GPU allocation per tenant   |
| Network policy    | Backend      | cidr_allowlist, block_network applied at create   |
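The orchestrator-side checks in this table can be composed into a single guard function. A minimal sketch, assuming already-decoded claims are passed in; `authorize` and its parameters are hypothetical names.

```python
def authorize(claims: dict, action: str, project_id: str,
              required_role: str, live_sandboxes: int, quota: int) -> None:
    """Raise PermissionError unless every enforcement check passes.

    Token validity is assumed to have been verified at the gateway already.
    """
    if project_id not in claims.get("project_ids", []):
        raise PermissionError("project not in token scope")
    if required_role not in claims.get("roles", []):
        raise PermissionError(f"missing role: {required_role}")
    if action == "create_sandbox" and live_sandboxes >= quota:
        raise PermissionError("concurrent sandbox quota exceeded")
```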

Secret Injection

Secrets are referenced by name (e.g., secrets: ["hf-token", "wandb-key"]) and resolved server-side using tenant-scoped secret stores. The orchestrator injects them as environment variables via Modal-style Secret.from_dict() pattern.
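A sketch of that server-side resolution, assuming a simple in-memory tenant-scoped store. `SECRET_STORE`, `resolve_secrets`, and the env-var naming convention are all illustrative; a real deployment would back this with a secret manager.

```python
# Hypothetical store layout: tenant_id -> secret name -> value
SECRET_STORE = {
    "tenant:acme": {"hf-token": "hf_abc", "wandb-key": "wb_xyz"},
}


def resolve_secrets(tenant_id: str, refs: list[str]) -> dict[str, str]:
    """Map secret references to env vars, scoped to the calling tenant."""
    store = SECRET_STORE.get(tenant_id, {})
    missing = [r for r in refs if r not in store]
    if missing:
        raise KeyError(f"unknown secrets for {tenant_id}: {missing}")
    # Env var names like HF_TOKEN, WANDB_KEY (illustrative convention)
    return {r.upper().replace("-", "_"): store[r] for r in refs}
```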


RBAC Schema

Roles

| Role                 | Permissions                                        |
|----------------------|----------------------------------------------------|
| sandbox:create       | Create sandboxes in assigned projects              |
| sandbox:execute      | Run exec commands in owned sandboxes               |
| sandbox:snapshot     | Create filesystem snapshots                        |
| sandbox:destroy      | Terminate sandboxes                                |
| sandbox:admin        | All sandbox operations + list all tenant sandboxes |
| volume:read          | Mount volumes as read-only                         |
| volume:write         | Mount volumes with read-write access               |
| secret:use           | Reference secrets for injection                    |
| network:unrestricted | Override default network policies                  |
| gpu:request          | Request GPU resources                              |

Role Hierarchy

tenant_admin
└── project_admin
    ├── sandbox:admin
    ├── volume:write
    ├── secret:use
    ├── gpu:request
    └── sandbox:create
        ├── sandbox:execute
        ├── sandbox:snapshot
        └── sandbox:destroy

Resource Bindings

# Example: User role binding
bindings:
  - principal: user:alice-uuid
    tenant: tenant:acme-uuid
    projects: [proj:agent-runner]
    roles:
      - sandbox:create
      - sandbox:execute
      - sandbox:snapshot
      - volume:read
      - secret:use
    quotas:
      max_concurrent_sandboxes: 5
      max_gpu_hours_per_day: 10
      allowed_gpu_types: [t4, a10g]
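The role hierarchy above can be evaluated by transitively expanding implied roles. A sketch with a hypothetical `ROLE_IMPLIES` table mirroring the tree:

```python
# Parent role -> directly implied roles (mirrors the hierarchy diagram)
ROLE_IMPLIES = {
    "tenant_admin": {"project_admin"},
    "project_admin": {"sandbox:admin", "volume:write", "secret:use",
                      "gpu:request", "sandbox:create"},
    "sandbox:admin": {"sandbox:create", "sandbox:execute",
                      "sandbox:snapshot", "sandbox:destroy"},
    "sandbox:create": {"sandbox:execute", "sandbox:snapshot", "sandbox:destroy"},
    "volume:write": {"volume:read"},
}


def expand_roles(roles: list[str]) -> set[str]:
    """Transitively expand assigned roles into effective permissions."""
    seen, stack = set(), list(roles)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(ROLE_IMPLIES.get(r, ()))
    return seen
```

A binding check is then simply `required_role in expand_roles(binding_roles)`.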

Persistence Model

Workspace Types

| Type           | Lifecycle        | Backend                  | Use Case                             |
|----------------|------------------|--------------------------|--------------------------------------|
| Ephemeral      | Sandbox lifetime | tmpfs / overlay          | Untrusted temp work, scratch         |
| Project Volume | Persistent       | Modal Volume / GCS FUSE  | Shared project artifacts, datasets   |
| Snapshot Image | Immutable        | Container registry       | Checkpoint/restore, reproducibility  |

Volume Mounting (from Modal pattern)

# Orchestrator maps tenant request to Modal Volume
volumes = {
    "/workspace": modal.Volume.from_name(f"{tenant_id}/{project_id}/workspace"),
    "/data": modal.CloudBucketMount(f"gs://coditect-{tenant_id}-data", read_only=True),
}
sb = modal.Sandbox.create(app=tenant_app, volumes=volumes, ...)

Snapshot Flow

1. Agent requests POST /sandboxes/{id}/snapshot
2. Orchestrator calls sb.snapshot_filesystem()
3. Backend captures overlay diff → creates Image artifact
4. Snapshot metadata stored: {snapshot_id, tenant_id, project_id, parent_image, size, created_at}
5. Snapshot can be used as `image` in subsequent CreateSandboxRequest

Audit Logging Schema

Every sandbox operation emits an audit event to a write-ahead log (e.g., Pub/Sub → BigQuery or FoundationDB):

{
  "event_id": "uuid",
  "timestamp": "2026-01-14T06:47:00Z",
  "event_type": "sandbox.exec",
  "tenant_id": "tenant:acme",
  "project_id": "proj:agent-runner",
  "actor": {
    "type": "user|service_account|agent",
    "id": "uuid",
    "ip": "203.0.113.42"
  },
  "resource": {
    "type": "sandbox",
    "id": "sb-xyz",
    "name": "codegen-worker-1"
  },
  "action": {
    "command": ["python", "run_tests.py"],
    "workdir": "/workspace",
    "timeout": 120
  },
  "result": {
    "status": "success|failure|timeout",
    "exit_code": 0,
    "duration_ms": 4523
  },
  "resources_consumed": {
    "cpu_seconds": 8.2,
    "memory_mib_peak": 512,
    "gpu_seconds": 0
  }
}

Audit events support compliance queries: "Show all executions by agent X in project Y over the last 7 days with GPU usage."
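That example query can be expressed directly against the event schema. A sketch in Python over in-memory events (`gpu_executions` is a hypothetical helper; in production this would typically be a SQL query against the audit store):

```python
from datetime import datetime, timedelta, timezone


def gpu_executions(events: list[dict], actor_id: str,
                   project_id: str, days: int = 7) -> list[dict]:
    """sandbox.exec events by one actor in one project, last N days, GPU > 0."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [
        e for e in events
        if e["event_type"] == "sandbox.exec"
        and e["actor"]["id"] == actor_id
        and e["project_id"] == project_id
        and e["resources_consumed"].get("gpu_seconds", 0) > 0
        and datetime.fromisoformat(e["timestamp"]) >= cutoff
    ]
```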


Resource Limits Enforcement

| Limit                | Enforcement Mechanism                            | Default         |
|----------------------|--------------------------------------------------|-----------------|
| Timeout              | Sandbox auto-terminate after N seconds           | 300s (max 24h)  |
| Idle timeout         | Terminate if no exec/stdin/tunnel activity       | 60s             |
| CPU                  | (request, limit) tuple → throttling              | (0.5, 2.0)      |
| Memory               | (request, limit) MiB → OOM kill                  | (256, 1024)     |
| GPU                  | Type selection + quota check                     | none            |
| Concurrent sandboxes | Per-tenant counter in Pool Manager               | 10              |
| Network              | block_network / cidr_allowlist                   | allowlist: []   |
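The (request, limit) convention in the CPU and memory rows can be normalized once at the API boundary. A sketch with hypothetical defaults matching the table:

```python
# Defaults from the limits table above (illustrative)
DEFAULTS = {"cpu": (0.5, 2.0), "memory_mib": (256, 1024)}


def normalize_limit(name: str, value) -> tuple:
    """Accept None (defaults), a scalar (request == limit), or a (request, limit) pair."""
    if value is None:
        return DEFAULTS[name]
    if isinstance(value, (int, float)):
        return (value, value)
    request, limit = value
    if request > limit:
        raise ValueError(f"{name}: request {request} exceeds limit {limit}")
    return (request, limit)
```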

Suggested Follow-Up Prompts

  1. gVisor Sentry syscall policy tuning: "Design a custom gVisor seccomp profile for Coditect sandboxes that allows common ML workloads (PyTorch, JAX) while blocking high-risk syscalls. Include audit logging for denied calls."
  2. Quota billing integration: "Extend the Coditect sandbox API with usage metering endpoints and a Stripe-compatible billing webhook for CPU/GPU/storage consumption per tenant."
  3. Warm pool pre-provisioning: "Design a sandbox warm pool strategy for Coditect that pre-provisions N sandboxes per (tenant, image) tuple to reduce cold-start latency, including eviction policies and cost optimization."

Would you like me to drill into any of these areas or generate implementation stubs (Rust/Python/Go) for specific components?


WHAT MODAL-LIKE technologies already exist which can provide the technical foundation for the MODAL-LIKE CODITECT sandbox technology platform.

Several “Modal-like” foundations already exist that you can either build on directly or crib heavily from for a Coditect self-hosted sandbox layer.313233

Managed sandbox platforms (API-level inspirations)

These give you patterns for API, orchestration, and DX even if you don’t consume them directly.

  • Northflank Sandboxes / workloads – MicroVM (Kata or cloud-hypervisor) and gVisor isolation, any OCI image, BYOC deployments in your own cloud, and persistent volumes; marketed explicitly as a Modal alternative for secure AI code execution.31
  • E2B.dev – Firecracker-based microVM sandboxes focused on AI agents, with per-session workspaces, SDKs, and 24‑hour max lifetimes. Good reference for “agent-native” sandbox ergonomics.3431
  • Daytona – Fast-provisioning Docker/Kata sandboxes for AI workflows, sub‑90ms spin-up, limited persistence but strong “ephemeral dev env / AI run” mental model.353431
  • Cloudflare Workers + AI Sandbox SDK – V8 isolate boundary (no full Linux) but very strong example of capability-based APIs, no direct filesystem, and extremely fast cold starts.3431
  • Vercel Sandbox (beta) – Firecracker-based ephemeral sandboxes tightly integrated with their platform; 45‑minute limits and “preview” semantics are a useful pattern for time-bounded untrusted code.3631
  • InstaVM, Koyeb, RunPod et al. – Several GPU-oriented platforms now expose “code execution” sandboxes and serverless containers; Koyeb and RunPod are both called out as Modal alternatives for AI workloads.373834

Universal sandbox API libraries

These are especially relevant if you want Coditect to orchestrate multiple backends (Modal, E2B, Daytona, self-hosted, etc.) behind one interface.

  • Cased sandboxes – Open source Python library + CLI that provides a universal API for multiple cloud sandbox providers (Modal, E2B, Daytona, Cloudflare, etc.), with provider selection, failover, sandbox reuse, labels, image selection, and streaming output.3239
    • Example API: async with Sandbox.create(provider="modal") as sandbox: await sandbox.execute("python analyze.py").32
    • This is very close to the Coditect “multi-provider executor” story; you could mirror its provider abstraction while swapping in your own control plane.32

Architectural primitives (self-hosted building blocks)

Luis Cardoso’s “Field guide to sandboxes for AI” lays out clear decision criteria and suggests concrete tech stacks for AI coding agents.4033

  • gVisor – User-space kernel interception for hardened containers; good middle ground if you already run Kubernetes and want better isolation than plain containers without going full microVM.33
  • Kata Containers / Firecracker / cloud-hypervisor – MicroVM isolation that Cardoso recommends explicitly for multi-tenant AI coding agents (hostile code, full Linux semantics).33
  • Kubernetes + CRI runtimes – Use containerd + Kata runtime class or Firecracker-based runtimes to create per-job microVM sandboxes managed via standard K8s Jobs/Pods API. Cardoso walks through how CRI talks to Kata to spin a microVM per pod.33

For a Coditect “Modal-like but self-hosted” design, you can:

  • Use Kubernetes + Kata/Firecracker as the isolation substrate.33
  • Layer a sandbox control plane that exposes /sandboxes, /exec, /snapshot, /streams on top of K8s (like a focused PaaS).3133
  • Optionally integrate with or borrow patterns from Cased sandboxes to get multi-provider abstractions and fallback to external services where appropriate.3932

Decision heuristics you can reuse

The field guide offers a practical decision table you can almost lift into Coditect docs:

  • For multi-tenant AI agent execution with hostile code and full Linux needs, choose microVM (Firecracker / cloud-hypervisor) as the boundary.33
  • If you’re constrained to existing Kubernetes infra and semi-trusted workloads, gVisor-hardened containers are acceptable.33
  • Minimum viable policy: default-deny egress with allowlist, short-lived scoped credentials, workspace-only filesystem, hard CPU/mem/timeout limits, and detailed telemetry (process tree + network).33
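The “default-deny egress with allowlist” item maps directly onto a Kubernetes NetworkPolicy. A minimal sketch, assuming sandbox pods run in a dedicated namespace (all names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-default-deny
  namespace: sandboxes        # hypothetical namespace for sandbox pods
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress: []                  # no egress allowed; add per-tenant allowlist rules
```

Per-tenant allowlists then become additional NetworkPolicy objects selecting that tenant's pods, which is additive over this default deny.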

Suggested next prompt

Design a self-hosted Coditect sandbox runtime on Kubernetes using Kata/Firecracker microVMs as the isolation boundary, taking inspiration from Cased sandboxes’ provider abstraction and Modal’s Sandbox API surface. Specify the CRDs, controller logic, and mapping from the Coditect /sandboxes HTTP API to K8s Jobs/Pods, including quota enforcement and network policies.


what open source can I leverage, I do not want proprietary 3rd party solutions as part of the stack

For a Coditect-owned, non-proprietary sandbox stack, you can assemble everything from open source components: isolation runtime, orchestrator, and developer-facing API.4647

Isolation runtimes (core “Modal-like” boundary)

These give you the secure execution boundary for untrusted agent code.

  • gVisor – User-space kernel implementing the Linux syscall interface, designed as a sandbox runtime between containers and the host kernel; used in several “untrusted code” runners.4849
  • Kata Containers – OCI-compatible runtime that runs each pod/container in a lightweight VM; effectively “Firecracker with Kubernetes support,” giving VM-grade isolation per sandbox.5048
  • Firecracker – MicroVM VMM originally from AWS, widely recommended as the default for multi-tenant AI coding agents that need full Linux, shell, and package managers.4748
  • Wasm runtimes (for capability-scoped tools) – Wasmtime, Wasmer, and WasmEdge can be used when you can constrain workloads to WASI, avoiding full Linux while gaining strong isolation and fast startup.47

These can all be wired under containerd/CRI on Kubernetes, so that each Coditect sandbox maps to a Pod using a specific runtime class (e.g., Kata for microVM, gVisor for hardened containers).5147

Orchestration and execution frameworks

These provide patterns or code for creating “submit code → run in sandbox → stream output” workflows.

  • Sandman (jakhax/sandman) – gVisor-based code execution service that runs and tests untrusted code inside a container sandbox; good reference for using gVisor as an isolation layer and discussing security tradeoffs.49
  • Awesome-sandbox list – Curated overview of modern sandboxing solutions, with entries for e2b, Daytona, and others that show patterns for AI-agent runtimes even if you don’t use their hosted offerings.46
  • Chris Hay’s Code Sandbox MCP server – Not a full infra runtime, but a concrete example of a code-execution service exposing a clean tool protocol (MCP) to LLM clients, which you can mirror with your own backend.52

Universal sandbox API (open source)

Even if you don’t call third-party providers, Cased’s library is a strong reference for the shape of a provider-agnostic sandbox API.

  • Cased sandboxes (MIT-licensed) – Open-source Python library + CLI that defines a universal interface for cloud code execution sandboxes (create, execute, reuse, labels, streaming output).5354
    • Shows how to abstract “provider” as a pluggable backend and implement a uniform Sandbox.create(...).execute(...) API that Coditect can replicate while routing to your own gVisor/Kata/Firecracker control plane.53
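Coditect can mirror that shape with a provider-pluggable facade. A stdlib-only sketch (synchronous for brevity, unlike Cased's async API; `Provider`, `PROVIDERS`, `Sandbox`, and `EchoProvider` are hypothetical names):

```python
from typing import Protocol


class Provider(Protocol):
    """Pluggable backend: Modal-like service, Kata pool, Firecracker pool, etc."""
    def create(self) -> str: ...
    def execute(self, sandbox_id: str, command: str) -> str: ...


PROVIDERS: dict[str, Provider] = {}  # registry keyed by provider name


class Sandbox:
    """Context-managed sandbox handle routed through a named provider."""
    def __init__(self, provider_name: str):
        self._provider = PROVIDERS[provider_name]
        self._id: str | None = None

    def __enter__(self) -> "Sandbox":
        self._id = self._provider.create()
        return self

    def __exit__(self, *exc) -> bool:
        return False  # a real implementation would terminate the sandbox here

    def execute(self, command: str) -> str:
        assert self._id is not None, "use inside a `with` block"
        return self._provider.execute(self._id, command)


class EchoProvider:
    """Toy provider so the facade is runnable without any backend."""
    def create(self) -> str:
        return "sb-echo-1"

    def execute(self, sandbox_id: str, command: str) -> str:
        return f"{sandbox_id}:{command}"


PROVIDERS["echo"] = EchoProvider()
```

Usage mirrors the Cased shape: `with Sandbox("echo") as sb: sb.execute("python analyze.py")`, with the provider swapped for a gVisor/Kata/Firecracker control plane in production.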

Design guidance from open references

For high-level architecture choices and tradeoffs, you can lean on recent sandbox guides.

  • Luis Cardoso’s “Field guide to sandboxes for AI” – Open article that compares containers, gVisor, microVMs, and Wasm, and recommends microVMs (Firecracker/cloud-hypervisor) as the default for multi-tenant AI coding agents, with gVisor as a compromise when you’re already on K8s.47
  • Simon Willison’s mirror / commentary – Reinforces the same taxonomy and serves as an additional reference for justifying your architecture and threat model to stakeholders.5556

Concrete stack suggestion for Coditect

All-open stack, no proprietary services:

  • Boundary: Firecracker or Kata microVMs (for multi-tenant hostile code) plus gVisor for lighter, semi-trusted workloads.4847
  • Orchestration: Kubernetes with runtime classes for Kata/gVisor, plus a custom Coditect sandbox controller and CRD (Sandbox, SandboxExec) that maps your HTTP API to K8s Jobs/Pods.51
  • API / DX layer: Coditect “sandbox executor” service modeled on Cased sandboxes API patterns (create, execute, reuse, labels, streaming) but backed solely by your own runtimes.5453

If you want, the next step can be: “Design a Kubernetes-based Coditect Sandbox CRD and controller that uses Kata Containers as the runtime, including the mapping from HTTP /sandboxes to CRD, and how to plug in gVisor/Firecracker as alternative runtime classes.”


RESEARCH EACH OF THESE in DEPTH: COMPARE AND CONTRAST, PROs AND CONs: Isolation runtimes (core “Modal-like” boundary)

These give you the secure execution boundary for untrusted agent code. gVisor – User-space kernel implementing the Linux syscall interface, designed as a sandbox runtime between containers and the host kernel; used in several “untrusted code” runners.​ Kata Containers – OCI-compatible runtime that runs each pod/container in a lightweight VM; effectively “Firecracker with Kubernetes support,” giving VM-grade isolation per sandbox.​ Firecracker – MicroVM VMM originally from AWS, widely recommended as the default for multi-tenant AI coding agents that need full Linux, shell, and package managers.​ Wasm runtimes (for capability-scoped tools) – Wasmtime, Wasmer, and WasmEdge can be used when you can constrain workloads to WASI, avoiding full Linux while gaining strong isolation and fast startup.​

All four can underpin a “Modal-like” sandbox layer, but they make different tradeoffs on isolation strength, compatibility, performance, and complexity.61626364

Snapshot: when to use what

| Runtime | Isolation strength | Linux compatibility | Startup / density | Best fit in Coditect |
|---|---|---|---|---|
| gVisor | Higher than containers, lower than VMs | Very high, but some syscalls slower/unsupported | Near-container startup, good density | Semi-trusted multi-tenant, “secure containers” on K8s |
| Kata | VM-grade, per-container microVM | Full Linux, OCI-compatible | Slower than containers; faster than full VMs | High-risk tenants; “secure pod” profile on K8s |
| Firecracker | Very strong, FaaS-level isolation | Full Linux inside guest, but custom integration | 100–125 ms spin-up, huge density | Your own Lambda/Modal-style pool for untrusted agents |
| Wasm runtimes (Wasmtime/Wasmer/WasmEdge) | Very strong per-module memory + capability isolation | Limited to WASI / host APIs; no full Linux | Microseconds-to-ms startup, extremely high density | Capability-scoped tools, sandboxes for constrained languages |

gVisor

What it is

  • A user-space kernel that implements the Linux syscall interface and sits between containers and the host kernel; it “implements Linux by way of Linux” by intercepting syscalls in a sentry process.666272
  • Deployed as a container runtime sandbox (e.g., runsc), including integration with Kubernetes and GKE Sandbox; often described as “seccomp on steroids.”6562

Pros

  • Better isolation than plain containers: host kernel surface exposed to the workload is drastically reduced; syscalls are handled by the user-space kernel rather than directly by the host.6265
  • Lightweight footprint vs VMs: no guest OS to boot, no per-VM kernel; starts fast and scales like containers while adding an isolation boundary.62
  • Works without hardware virtualization: no need for KVM support, so easier in nested virtualization environments or constrained clouds.62
  • Kubernetes-native: can be plugged in as a runtime class and selectively applied to pods that need extra isolation.62

Cons

  • Not VM-grade isolation: still shares the host kernel; a gVisor escape is less likely than a vanilla container escape, but the blast radius is larger than with Firecracker/Kata microVMs.7362
  • Performance overhead: syscall-heavy workloads pay a noticeable tax; each syscall goes through the user-space kernel.6562
  • Compatibility quirks: some low-level kernel features, /proc behavior, or exotic syscalls may be missing or behave differently, which can surprise deep Linux tooling.6665

When it shines for Coditect

  • Multi-tenant but semi-trusted agent code (e.g., internal teams, controlled languages) where you want better isolation than containers but don’t want to pay microVM costs.6162
  • You already have Kubernetes and want to opt-in sandboxing via a runtimeClass on selected workloads.62

Kata Containers

What it is

  • An open-source runtime that runs each “container” inside its own minimal VM, combining container UX with VM isolation.676863
  • Integrates with Docker/Kubernetes using OCI and CRI, with a runtime plus CRI-friendly shim/library.6367

Pros

  • VM-grade isolation: each pod/container gets its own guest kernel and VM boundary, significantly reducing cross-tenant risk compared to shared-kernel containers.6763
  • Kubernetes/OCI compatible: drop-in runtime that lets you run Kata and standard containers in the same cluster, choosing per-workload isolation.6367
  • Supports multiple VMMs: can use Firecracker or Cloud Hypervisor under the hood, so you get microVM characteristics with K8s integration.6863

Cons

  • Higher overhead than containers: you pay for a guest kernel and VM per sandbox; memory footprint per workload is larger.63
  • Slower cold starts than containers: still typically faster than traditional VMs, but slower than gVisor/container-only setups.63
  • Operational complexity: more moving parts (runtime, agent, hypervisor), guest kernel management, and debugging complexity vs plain containers.63

When it shines for Coditect

  • High-risk, multi-tenant untrusted code (public SaaS) where you want strong isolation but also Kubernetes-native control and scheduling.686763
  • You want a “secure pod” class: map Coditect “high-risk sandboxes” to a K8s runtimeClass that uses Kata, keeping lower-risk workloads on gVisor or runc.63

Firecracker

What it is

  • An open-source microVM VMM built by AWS, designed for secure, multi-tenant container and function workloads with minimal overhead.7064
  • Used under AWS Lambda and Fargate to start thousands of microVMs per second with ~100–125 ms cold-start times and as low as ~5 MB memory footprint per microVM.6964

Pros

  • Very strong isolation: each microVM has its own kernel and minimal device model, tailored for security and multi-tenancy.6469
  • Purpose-built for FaaS/serverless: start thousands of microVMs per second, with cold starts competitive with containers; ideal for short-lived, untrusted code.6964
  • Minimal footprint: small memory and device surface compared to general-purpose hypervisors.6469

Cons

  • Lower-level integration effort: unlike Kata, Firecracker doesn’t come with built-in Kubernetes integration; you must integrate via containerd plugins or build your own control plane.6864
  • Guest VM management: you must manage guest OS images, kernels, and per-VM boot config, similar to running VMs at scale.6964
  • More opinionated: limited device model and focus on network+block devices can complicate some advanced workloads (e.g., complex PCI passthrough).7069

When it shines for Coditect

  • A Modal-like / Lambda-like executor: Coditect runs each agent sandbox in a Firecracker microVM, with its own VM pool, warm instances, and very tight per-tenant isolation.6469
  • You’re willing to build a custom control plane (or K8s integration) and want direct control of microVM lifecycle, warm pools, and scheduling.7069

Wasm runtimes (Wasmtime, Wasmer, WasmEdge)

What they are

  • WebAssembly runtimes that execute Wasm modules with linear memory and no ambient access: all host interactions must be explicitly imported.61
  • Often support WASI (WebAssembly System Interface) for POSIX-like capabilities and provide resource metering (“fuel”) for deterministic preemption.61

Pros

  • Strong memory and capability isolation: modules can’t touch arbitrary host memory or the OS unless explicitly allowed; great fit for capability-based “tools.”61
  • Very fast startup and high density: no guest OS, no VM boot; instantiation is microseconds–milliseconds.7161
  • Fine-grained resource control: e.g., Wasmtime’s fuel mechanism for instruction metering, making runtime limits more deterministic than “CPU time + signals” alone.61
  • Language reach: multiple languages compile to Wasm (Rust, TinyGo, C/C++, Zig, some Python/JS subsets), which can be used for extensions and plugins.61

Cons

  • No full Linux: many agent workloads assume POSIX, /proc, apt, arbitrary shells, etc., which do not exist in pure WASI environments.61
  • Ecosystem gaps: not every language or library is readily portable; some ML stacks or system tools won’t work without heavy adaptation.61
  • Host API design burden: you must define a careful host surface (I/O, network, filesystem) and then maintain that ABI over time.61

When it shines for Coditect

  • Capability-scoped tools and plugins: “safe” tools where Coditect defines a Wasm ABI (e.g., limited file access, HTTP client) and agents can only call those.61
  • Single-tenant or internal deployments where you want maximum density and safety for specific languages (Rust, Go) without the full complexity of Linux sandboxing.7161

How to combine them in a Coditect architecture

A practical hybrid that lines up with current best practice for AI sandboxes:64626361

  • Tier 1 (light sandbox) – gVisor runtime class on Kubernetes:
    • For internal or semi-trusted code where container-level semantics are enough; used for “fast dev” sandboxes.
  • Tier 2 (strong sandbox) – Kata Containers (possibly backed by Firecracker):
    • For external tenants and highly untrusted code; Coditect maps “high-risk agent sessions” to Kata runtimeClass pods.
  • Tier 3 (serverless microVM) – Firecracker pool:
    • For Modal-style, bursty agent workloads: a separate executor service that manages Firecracker microVMs directly, with warm pools and hard quotas.
  • Tier 4 (tool sandbox) – Wasm runtime (Wasmtime/Wasmer/WasmEdge):
    • For Coditect’s “capability-scoped tools” where you control the ABI; used for plugin-style extensions and deterministically metered workloads.

If you want, the next step can be a detailed design prompt like:

Design a tiered Coditect sandbox architecture that uses gVisor, Kata Containers (with Firecracker), and Wasmtime in a single Kubernetes cluster. Specify how the /sandboxes API selects a runtime tier based on tenant, project, and risk level, and how to share common observability, quota, and audit logging across all runtimes. 747576


Design a tiered Coditect sandbox architecture that uses gVisor, Kata Containers (with Firecracker), and Wasmtime in a single Kubernetes cluster. Specify how the /sandboxes API selects a runtime tier based on tenant, project, and risk level, and how to share common observability, quota, and audit logging across all runtimes.

Design a 3‑tier sandbox architecture by treating the “runtime choice” as a policy decision at the /sandboxes control plane, and using Kubernetes RuntimeClasses to route pods to gVisor, Kata+Firecracker, or Wasm-backed workloads.7778798081


1. Runtime tiers and Kubernetes setup

Runtime classes

Define three RuntimeClass objects for Linux-based sandboxes, plus a Wasm integration (via CRI plugin or sidecar).8283808477

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc  # gVisor handler [web:73][web:65]
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc  # Kata Containers using Firecracker [web:66][web:78][web:84]
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runc
handler: runc  # baseline container runtime

For Wasm, follow CNCF’s “Wasm on Kubernetes” pattern, using either:8581

  • Wasm containers: using a Wasm-aware runtime (e.g., cri-o/containerd shim) and an annotation like module.wasm.image/variant.81
  • Sidecar pattern: run a Wasmtime/Wasmer sidecar that executes Wasm modules on demand next to a thin HTTP/gRPC proxy container.8581

2. /sandboxes API and runtime selection

API surface

You keep a single tenant-facing HTTP API, with an explicit but optional risk_profile and runtime_hint that the control plane resolves to a runtime tier:

POST /sandboxes
{
  "project_id": "proj-uuid",
  "name": "agent-run-123",
  "image": "ghcr.io/coditect/agent-runner:latest",
  "risk_profile": "untrusted_public | semi_trusted | internal",
  "runtime_hint": "auto | gvisor | kata | wasm",
  "workload_type": "linux_full | wasm_tool",
  "limits": { "cpu": 1.0, "memory_mib": 1024, "gpu": "none" },
  "network_policy": { "block_network": true },
  "code": {
    "language": "python",
    "entrypoint": "main.py"
  }
}
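Before the policy engine runs, the controller should reject malformed requests. A minimal validation sketch for the request body above (field names and enum values are the ones listed in the example; `SandboxRequest` and `parse_request` are illustrative names, not an existing API):

```python
# Illustrative validation of the /sandboxes request body shown above.
# Field names mirror the example; the enum values are the ones it lists.
from dataclasses import dataclass, field

RISK_PROFILES = {"untrusted_public", "semi_trusted", "internal"}
RUNTIME_HINTS = {"auto", "gvisor", "kata", "wasm"}
WORKLOAD_TYPES = {"linux_full", "wasm_tool"}

@dataclass
class SandboxRequest:
    project_id: str
    image: str
    risk_profile: str = "untrusted_public"
    runtime_hint: str = "auto"
    workload_type: str = "linux_full"
    limits: dict = field(default_factory=lambda: {"cpu": 1.0, "memory_mib": 1024})

def parse_request(body: dict) -> SandboxRequest:
    req = SandboxRequest(
        project_id=body["project_id"],
        image=body["image"],
        risk_profile=body.get("risk_profile", "untrusted_public"),
        runtime_hint=body.get("runtime_hint", "auto"),
        workload_type=body.get("workload_type", "linux_full"),
        limits=body.get("limits", {"cpu": 1.0, "memory_mib": 1024}),
    )
    if req.risk_profile not in RISK_PROFILES:
        raise ValueError(f"unknown risk_profile: {req.risk_profile}")
    if req.runtime_hint not in RUNTIME_HINTS:
        raise ValueError(f"unknown runtime_hint: {req.runtime_hint}")
    if req.workload_type not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload_type: {req.workload_type}")
    return req
```

Rejecting unknown enum values here keeps the policy engine's input space closed, so the runtime-selection logic never sees a value it has no rule for.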

Policy engine

On POST /sandboxes, the Coditect sandbox controller:

  1. Authenticates the caller and loads tenant + project configuration (risk tier, allowed runtimes).
  2. Computes an effective runtime tier (gVisor / Kata+FC / Wasm) based on:
    • Tenant risk classification (e.g., “external SaaS”, “internal corp”).
    • Project tag (e.g., project.security_level = high).
    • Requested runtime_hint and workload_type.
  3. Maps tier to implementation: Kubernetes RuntimeClass for Linux workloads, or Wasm pipeline for capability-scoped tools.78797781

Example pseudo-logic:

def choose_runtime(tenant, project, req):
    # 1. Wasm tools get routed to Wasm
    if req.workload_type == "wasm_tool":
        return "wasm"

    # 2. Force Kata+Firecracker for high-risk external tenants
    if tenant.risk == "external" or project.flags.get("requires_vm_isolation"):
        return "kata-fc"

    # 3. Respect explicit hint if allowed
    if req.runtime_hint == "gvisor" and "gvisor" in tenant.allowed_runtimes:
        return "gvisor"
    if req.runtime_hint == "kata" and "kata-fc" in tenant.allowed_runtimes:
        return "kata-fc"

    # 4. Default policy
    if tenant.risk == "internal":
        return "gvisor"
    else:
        return "kata-fc"
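The same routing logic can be made self-contained and checked against the policy table. A sketch with minimal stand-ins (`Tenant`, `Project`, `Req` are illustrative placeholders for whatever the real control plane loads):

```python
# Self-contained version of the routing pseudo-logic above; Tenant, Project,
# and Req are minimal stand-ins, not the real control-plane models.
from dataclasses import dataclass, field

@dataclass
class Tenant:
    risk: str  # "external" | "internal"
    allowed_runtimes: set = field(default_factory=lambda: {"gvisor", "kata-fc", "wasm"})

@dataclass
class Project:
    flags: dict = field(default_factory=dict)

@dataclass
class Req:
    workload_type: str = "linux_full"
    runtime_hint: str = "auto"

def choose_runtime(tenant, project, req):
    if req.workload_type == "wasm_tool":
        return "wasm"                      # 1. Wasm tools always go to Wasm
    if tenant.risk == "external" or project.flags.get("requires_vm_isolation"):
        return "kata-fc"                   # 2. Force microVM isolation
    if req.runtime_hint == "gvisor" and "gvisor" in tenant.allowed_runtimes:
        return "gvisor"                    # 3. Respect allowed hints
    if req.runtime_hint == "kata" and "kata-fc" in tenant.allowed_runtimes:
        return "kata-fc"
    return "gvisor" if tenant.risk == "internal" else "kata-fc"  # 4. Default
```

Keeping this function pure (no I/O) makes the policy trivially unit-testable against the tenant/project matrix.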

3. Mapping to Kubernetes and Wasm

3.1 gVisor tier (semi-trusted)

For runtime = "gvisor", the controller creates a Pod with the gvisor runtimeClassName.838682

apiVersion: v1
kind: Pod
metadata:
  name: sb-123
  labels:
    coditect.sandbox/id: "sb-123"
    coditect.tenant/id: "tenant-abc"
spec:
  runtimeClassName: gvisor
  containers:
  - name: sandbox
    image: ghcr.io/coditect/agent-runner:latest
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2"
        memory: "1Gi"

gVisor provides an extra boundary beyond runc while still running as fast, OCI-compliant containers.867883
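The controller can render such manifests programmatically before submitting them to the Kubernetes API. A minimal sketch (the tier-to-RuntimeClass mapping matches the YAML above; `render_sandbox_pod` is an illustrative helper, not an existing API):

```python
# Illustrative: render the Pod manifest the controller would submit for a
# Linux-tier sandbox. The tier-to-RuntimeClass mapping matches the YAML above.
TIER_TO_RUNTIME_CLASS = {"gvisor": "gvisor", "kata-fc": "kata-fc"}

def render_sandbox_pod(sandbox_id, tenant_id, image, tier, cpu="500m", memory="512Mi"):
    if tier not in TIER_TO_RUNTIME_CLASS:
        raise ValueError(f"tier {tier!r} is not a Linux pod tier")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"sb-{sandbox_id}",
            "labels": {
                "coditect.sandbox/id": f"sb-{sandbox_id}",
                "coditect.tenant/id": tenant_id,
            },
        },
        "spec": {
            "runtimeClassName": TIER_TO_RUNTIME_CLASS[tier],
            "containers": [{
                "name": "sandbox",
                "image": image,
                "command": ["sleep", "infinity"],
                "resources": {"requests": {"cpu": cpu, "memory": memory}},
            }],
        },
    }
```

Raising on the "wasm" tier is deliberate: that tier never produces a Pod and must be routed to the Wasm executor path instead.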

3.2 Kata + Firecracker tier (untrusted/public)

For runtime = "kata-fc", the controller creates Pods using the kata-fc RuntimeClass; Kata then uses Firecracker under the hood.79808788

apiVersion: v1
kind: Pod
metadata:
  name: sb-456
  labels:
    coditect.sandbox/id: "sb-456"
spec:
  runtimeClassName: kata-fc
  containers:
  - name: sandbox
    image: ghcr.io/coditect/agent-runner:latest
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"

This gives you VM-grade isolation and microVM characteristics (fast boot, low footprint) while still scheduling via Kubernetes.80878879

3.3 Wasm tier (capability tools)

For runtime = "wasm", Coditect does not spin up a full Linux sandbox. Instead it:

  • Deploys a Wasm executor service (Deployment + Service) with a Wasmtime/Wasmer runtime.
  • /sandboxes returns a logical sandbox ID, but the “exec” calls go to the Wasm executor’s HTTP/gRPC API.

You can either:8185

  • Run the executor as a sidecar next to a thin API container, or
  • Use a Wasm container runtime (e.g., annotation-based selection as shown in CNCF’s guide).81

4. Shared observability, quota, and audit logging

4.1 Central metadata and audit log

Maintain a cluster-agnostic metadata store (e.g., PostgreSQL, FoundationDB) with a sandboxes table recording tenant, project, runtime tier, and lifecycle state:

CREATE TABLE sandboxes (
  id uuid PRIMARY KEY,
  tenant_id uuid NOT NULL,
  project_id uuid NOT NULL,
  runtime_tier text NOT NULL,  -- gvisor | kata-fc | wasm
  k8s_namespace text,
  k8s_pod_name text,
  status text,
  created_at timestamptz,
  terminated_at timestamptz
);

Every API action (create, exec, snapshot, destroy) writes to an append-only audit_log table or event stream (Pub/Sub, Kafka).8986

CREATE TABLE sandbox_audit_log (
  event_id uuid PRIMARY KEY,
  sandbox_id uuid NOT NULL,
  tenant_id uuid NOT NULL,
  event_type text NOT NULL,    -- create | exec | destroy | snapshot
  runtime_tier text NOT NULL,
  actor jsonb,
  details jsonb,
  timestamp timestamptz NOT NULL
);

Because runtime tier is just metadata, you can query “all exec events across gVisor/Kata/Wasm for tenant X” uniformly.
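That uniform query can be demonstrated end to end with sqlite3 standing in for the production store (columns match the DDL above, with text ids instead of uuids for brevity):

```python
# Demonstrates the uniform cross-tier audit query, using sqlite3 as a
# stand-in for the production store (text ids instead of uuid for brevity).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sandbox_audit_log (
    event_id text PRIMARY KEY, sandbox_id text, tenant_id text,
    event_type text, runtime_tier text, timestamp text)""")
rows = [
    ("e1", "sb-1", "tenant-x", "exec", "gvisor",  "2025-01-01T00:00:00Z"),
    ("e2", "sb-2", "tenant-x", "exec", "kata-fc", "2025-01-01T00:01:00Z"),
    ("e3", "sb-3", "tenant-x", "exec", "wasm",    "2025-01-01T00:02:00Z"),
    ("e4", "sb-4", "tenant-y", "exec", "gvisor",  "2025-01-01T00:03:00Z"),
]
db.executemany("INSERT INTO sandbox_audit_log VALUES (?,?,?,?,?,?)", rows)

def exec_events_for_tenant(tenant_id):
    # One query spans all runtime tiers, because the tier is just a column.
    cur = db.execute(
        "SELECT sandbox_id, runtime_tier FROM sandbox_audit_log "
        "WHERE tenant_id = ? AND event_type = 'exec' ORDER BY timestamp",
        (tenant_id,))
    return cur.fetchall()
```

Because the tier is a plain column rather than a separate store per runtime, no fan-out or join across systems is needed.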

4.2 Common telemetry stack

Instrument all runtimes with the same observability layer:

  • Logs:
    • Use a cluster-wide log collector (Fluent Bit, Vector, OpenTelemetry Collector) to ship container stdout/stderr to a common log backend (e.g., Loki, Elasticsearch, GCP Logging).89
    • Use consistent labels: tenant_id, project_id, sandbox_id, runtime_tier.
  • Metrics:
    • Expose K8s metrics (CPU/mem usage per Pod) and Wasm executor metrics via Prometheus.
    • Implement per-sandbox metrics: CPU seconds, memory peak, exec duration.
  • Traces:
    • Instrument /sandboxes API, controller, and executor with OpenTelemetry spans, including attributes like coditect.runtime_tier.

4.3 Quota enforcement

Implement a quota service used by the API controller before creating or executing a sandbox, regardless of runtime:

  • Per-tenant limits:
    • max_concurrent_sandboxes[gvisor], max_concurrent_sandboxes[kata-fc], max_concurrent_sandboxes[wasm].
    • Daily CPU/GPU time per tier.

Flow on POST /sandboxes:

  1. Controller computes runtime_tier.
  2. Calls quota_service.reserve(tenant_id, runtime_tier, requested_limits) (atomic).
  3. If accepted, create Pod or logical sandbox; on failure, return 403/429.
  4. On sandbox termination, the controller calls quota_service.release(...).

Because this is tier-agnostic (only keyed by runtime_tier), quotas work similarly whether the sandbox is a Pod using gVisor/Kata or a Wasm module in the executor.


5. High-level sequence for /sandboxes with tiered runtimes

Client ──POST /sandboxes───────────────────────────► API Gateway
          (risk_profile, runtime_hint, workload_type)
Gateway ──JWT validate, tenant/project lookup──────► Sandbox Controller
Controller ──compute runtime tier──────────────────► Policy Engine
           ◄──runtime_tier─────────────────────────
Controller ──quota.reserve(tenant, tier, limits)───► Quota Service
           ◄──ok / reject──────────────────────────
[if rejected → 403/429]

[if tier == gvisor or kata-fc]
Controller ──create Pod with runtimeClassName──────► Kubernetes API
           ◄──Pod object───────────────────────────
Controller ──record in metadata & audit_log────────► DB/Event Bus

[if tier == wasm]
Controller ──create logical sandbox────────────────► Wasm Executor
           ◄──sandbox_id───────────────────────────
Controller ──record in metadata & audit_log────────► DB/Event Bus

Controller ──201 {sandbox_id, runtime_tier, ...}───► Client

Exec, snapshot, and destroy operations follow the same pattern but call either the Kubernetes API (for gVisor/Kata pods) or the Wasm executor’s API, while always writing to shared audit and using shared quota bookkeeping.

If you want, the next iteration can focus on:

  • A CRD-based design (Sandbox, SandboxExec) instead of direct Pod creation, or
  • A concrete OpenAPI 3.1 spec extension that exposes runtime_tier and policy decisions back to the client for debugging and observability. 9091

Define security and performance tradeoffs for each runtime tier

  • Example RuntimeClass YAML for gVisor, Kata, and Wasmtime tiers
  • How to configure node pools for Firecracker and Wasm workloads
  • Admission controller policy to enforce /sandboxes runtime selection
  • Pod security considerations when mixing runtimes in one cluster

Security and performance differ by tier; you’ll enforce them with RuntimeClasses, node pools, and admission policies that ensure each sandbox lands on the right infrastructure with consistent guardrails.929394959697


Security and performance tradeoffs by tier

gVisor tier

  • Security
    • Strength: Higher than plain containers (user-space kernel “Sentry” intercepts syscalls), reducing host-kernel attack surface.9893
    • Weakness: Still shares host kernel; not as strong as VM/microVM isolation if gVisor or host kernel are compromised.9598
    • Good use: Semi-trusted or internal multi-tenant workloads where container-level semantics are required.9395
  • Performance
    • Overhead: Syscall-heavy or I/O-heavy workloads see higher latency because every syscall passes through the user-space kernel.9993
    • Startup: Very close to container startup; negligible extra cold start vs runc.93

Kata Containers + Firecracker tier

  • Security
    • Strength: Each pod runs inside a lightweight VM with its own guest kernel, providing VM-grade, hardware-backed isolation.9610098
    • Good use: Untrusted/public code execution and strong tenant isolation scenarios.989596
  • Performance
    • Overhead: Higher memory and CPU overhead per sandbox vs gVisor; you pay for booting a microVM and guest kernel.10096
    • Startup: Faster than traditional VMs, but slower than containers; firecracker-containerd and pre-warmed VMs help mitigate cold start.101102

Wasmtime / Wasm tier

  • Security
    • Strength: Strong in-process isolation—linear memory and capability-based host APIs; no POSIX/host kernel surface unless explicitly exposed.10397
    • Weakness: Only covers code that can be compiled to Wasm/WASI; any unsafe host APIs you expose become the main attack surface.97103
  • Performance
    • Overhead: Extremely low cold start for simple functions (few–tens of ms vs tens–hundreds of ms for microVMs).103
    • Limitations: Complex workloads with large dependency graphs and heavy compute can see slower cold starts and execution than microVM-based setups.104103

Example RuntimeClass YAML for gVisor, Kata, and Wasm

gVisor RuntimeClass

Use runsc as handler; optionally restrict to sandboxes node pool via scheduling.nodeSelector.1051069295

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    coditect.io/node-profile: gvisor
  tolerations:
  - key: "coditect.io/sandbox"
    operator: "Equal"
    value: "gvisor"
    effect: "NoSchedule"

Kata + Firecracker RuntimeClass

Use Kata handler configured to use Firecracker as VMM, and tie to a dedicated node pool.102107108105

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc
scheduling:
  nodeSelector:
    coditect.io/node-profile: kata-firecracker
  tolerations:
  - key: "coditect.io/sandbox"
    operator: "Equal"
    value: "kata-fc"
    effect: "NoSchedule"

Wasm RuntimeClass (if using Wasm containers)

If you’re using a Wasm-aware containerd shim (per CNCF practice guide), define a RuntimeClass that points to the Wasm handler and node pool.10997

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasm-wasmtime
handler: wasmtime
scheduling:
  nodeSelector:
    coditect.io/node-profile: wasm
  tolerations:
  - key: "coditect.io/sandbox"
    operator: "Equal"
    value: "wasm"
    effect: "NoSchedule"

If instead you use a Wasm executor Deployment (sidecar or service), you won’t need a RuntimeClass; the tier is enforced via your control plane.


Node pool configuration for Firecracker and Wasm workloads

Firecracker/Kata nodes

  • Label and taint nodes to ensure only Kata/Firecracker sandboxes land there:11092102
    • Labels: coditect.io/node-profile=kata-firecracker
    • Taints: coditect.io/sandbox=kata-fc:NoSchedule
  • Configure containerd on those nodes with kata-fc runtime pointing to Kata configured for Firecracker:107108102
    • containerd.toml with plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc.
  • Capacity planning: fewer but larger nodes, since microVM overhead per sandbox is higher; account for guest OS memory and disk.

Wasm nodes

  • Option A – Wasm runtimeClass: nodes with Wasm-aware container runtime:97109
    • Labels: coditect.io/node-profile=wasm
    • Taints: coditect.io/sandbox=wasm:NoSchedule
    • containerd configured with a wasmtime/wasmedge runtime handler.
  • Option B – Wasm executor pool: generic nodes running Wasm executor pods.
    • Use node labels for CPU-optimized nodes (node.kubernetes.io/instance-type=c3-highcpu) and schedule Wasm executors there.97

For both tiers, you keep compute isolation by not mixing high-risk runtimes with general workloads on the same nodes.


Admission controller policy for enforcing /sandboxes runtime

Implement a ValidatingAdmissionWebhook (plus optional MutatingAdmissionWebhook) that:

  1. Only allows Coditect sandbox controller to set runtimeClassName.
  2. Enforces mapping between sandbox labels/annotations and RuntimeClass.94111112113

Mutating webhook (optional)

If devs create Pods directly (internal tools), you can auto-set RuntimeClass based on labels or namespace:

  • On Pod CREATE:
    • If label coditect.io/sandbox-runtime=gvisor, set .spec.runtimeClassName = "gvisor".
    • Same for kata-fc and wasm-wasmtime.

Validating webhook (strong enforcement)

  • Reject pods that:
    • Have runtimeClassName set but lack coditect.io/sandbox-id label (prevents arbitrary use of Kata/gVisor).113
    • Use a RuntimeClass that doesn’t match allowed values for the namespace/tenant.
  • Ensure nodeSelector compatibility: intersect node selectors with the chosen RuntimeClass’s scheduling nodeSelector, reject on conflict.9294

This guarantees only the Coditect control plane can allocate expensive, high-isolation runtimes and binds runtime selection to your /sandboxes policy engine.
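The validating rules reduce to a small decision function. A sketch of the decision logic only (no webhook HTTP server; `SANDBOX_RUNTIME_CLASSES` and the controller service-account name follow this document's conventions and are assumptions):

```python
# Decision logic only (no admission webhook server): given a Pod spec dict
# and the requesting identity, decide allow/deny per the rules above.
# The RuntimeClass names and label key follow this document's conventions.
SANDBOX_RUNTIME_CLASSES = {"gvisor", "kata-fc", "wasm-wasmtime"}
CONTROLLER_SA = "system:serviceaccount:coditect-system:sandbox-controller"

def validate_pod(pod, requester):
    rtc = pod.get("spec", {}).get("runtimeClassName")
    if rtc is None or rtc not in SANDBOX_RUNTIME_CLASSES:
        return True, "not a sandbox runtime, no restriction"
    if requester != CONTROLLER_SA:
        return False, "only the sandbox controller may set a sandbox RuntimeClass"
    labels = pod.get("metadata", {}).get("labels", {})
    if "coditect.io/sandbox-id" not in labels:
        return False, "sandbox pods must carry coditect.io/sandbox-id"
    return True, "ok"
```

In production this function would sit behind a ValidatingAdmissionWebhook handler that extracts `pod` and `requester` from the AdmissionReview payload.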


Pod security considerations when mixing runtimes

When multiple runtimes share a cluster, lock down Pod specs and namespaces so that invariants hold across all tiers.1149598

  • Pod Security / SecurityContext
    • Enforce non-root, drop all capabilities, no privileged or hostPath mounts, and no host network for all sandbox pods via Namespace-level Pod Security Standards or OPA/Gatekeeper/ValidatingAdmissionPolicy.114
    • Disallow hostPID, hostIPC, and hostNetwork for sandbox pods.95114
  • Namespace isolation
    • Use dedicated namespaces per tenant or per tier (e.g., coditect-gvisor, coditect-kata, coditect-wasm).
    • Apply tighter NetworkPolicies and ResourceQuotas per namespace.114
  • Runtime isolation
    • Do not allow regular workloads to use high-security RuntimeClasses; enforce via admission.113
    • Ensure runtime-specific node pools are not running cluster control plane components or shared databases to reduce blast radius.
  • Observability and secrets
    • Use sidecarless logging (node-level collectors) to avoid sharing sidecars between runtimes.115
    • Scope Kubernetes secrets to namespaces and bind them to specific service accounts used by sandbox pods; enforce via RBAC and admission.111114

If you want, the next iteration can be a concrete ValidatingAdmissionPolicy or Gatekeeper constraint set that encodes: “only Coditect controller service account may create pods with runtimeClassName in {gvisor, kata-fc, wasm-wasmtime}, and all such pods must have sandbox labels and non-privileged securityContext.” 116


WHICH OF THESE COULD RUN MULTI-USER MULTI-TENANT MULTI-TEAM MULTI-PROJECT WORKFLOWS on GOOGLE CLOUD WORKSTATIONS economically?

For multi-user / multi-tenant / multi-team workflows on Google Cloud Workstations, the most economical and operationally straightforward option from your list is gVisor-based containers, with Wasm as a niche add‑on for specific tools; Kata+Firecracker is possible but significantly more complex and costly on GCP.117118119120121122


gVisor on GCP (best economic fit)

  • Google already runs gVisor at scale for Cloud Run, Cloud Functions, App Engine, and GKE Sandbox, which shows it works well for arbitrary user workloads with good compatibility.118123119117
  • Cloud Workstations are just GCE VMs; you can run runsc there or on a companion GKE cluster, giving each user/team “nested” sandboxes without extra hypervisor layers.124122
  • No need for nested virtualization; gVisor works on VMs or bare metal and is explicitly documented as “cloud ready” for all major providers.122

This makes gVisor the most economical and least operationally risky choice for embedding many Coditect sandboxes into a shared pool of Google Cloud Workstations (per-team dev VMs), while still adding a strong isolation boundary around untrusted agent code.119117118122


Kata Containers + Firecracker on GCP (possible, but heavier)

  • Kata requires hardware virtualization (KVM) in the underlying node; on GKE/GCE, you must customize node images or enable nested virtualization to run Kata+Firecracker, which is not available by default.120121125
  • Northflank’s case study notes extra engineering effort to provision Kata on Google Cloud because standard GKE nodes don’t expose KVM; they built tooling for custom images with virtualization enabled.120
  • Firecracker on GKE has been demonstrated by running Kata Containers with Firecracker as the VMM, but again this relies on KVM and custom configuration on GCE nodes.121126127

So while Kata+Firecracker can give you strong isolation for some high-risk multi-tenant workflows, it is less economical and more complex than gVisor on standard Workstations/VMs, and you’d likely reserve it for a smaller pool of critical workloads rather than the general multi-user Workstations fleet.128125121120


Wasm runtimes on GCP (targeted, very cheap per-tool)

  • GCP already exposes Wasm at the edge via Service Extensions (GCPWasmPlugin) and Artifact Registry‑hosted Wasm modules for load balancers.129130
  • You can also run WasmEdge/Wasmtime inside containers on GCE/GKE/Workstations, which is lightweight and doesn’t require nested virtualization; good for per-tenant “tools” rather than full dev workflows.131132133

Economically, Wasm is ideal for capability-scoped tools (linting, small transforms) that Coditect calls from Workstations, but not for full multi-tenant Linux workflows (no full distro, limited POSIX).133134131


Practical recommendation for Coditect on Google Cloud Workstations

  • Use gVisor as the default “Modal-like” sandbox layer inside or alongside Workstations:
    • Run a gVisor-enabled container runtime (runsc) on shared GCE nodes or a GKE cluster attached to the same VPC as Workstations.117119124
    • Treat each user/team/project workflow as a gVisor sandbox, with multi-tenant isolation and quotas enforced at the Coditect layer.
  • Optionally add:
    • Wasm runtimes for scoped tools and plugins (extremely cheap, safe, and portable).130129133
    • A small, specialized Kata+Firecracker node pool if you later need VM-grade isolation for the most sensitive multi-tenant workloads and are willing to handle nested virtualization and custom images on GCE.125121120

If you want, next step could be: “Design a Coditect + Google Cloud Workstations reference architecture using gVisor for in-VM sandboxes (no Kubernetes), including how to provision per-team Workstations, run nested containers with runsc, and enforce tenant/project quotas.” 135136


Design a Coditect + Google Cloud Workstations reference architecture using gVisor for in-VM sandboxes (no Kubernetes), including how to provision per-team Workstations, run nested containers with runsc, and enforce tenant/project quotas. Practical recommendation for Coditect on Google Cloud Workstations

Use gVisor as the default “Modal-like” sandbox layer inside or alongside Workstations: Run a gVisor-enabled container runtime (runsc) on shared GCE nodes or a GKE cluster attached to the same VPC as Workstations.​ Treat each user/team/project workflow as a gVisor sandbox, with multi-tenant isolation and quotas enforced at the Coditect layer.

Use Google Cloud Workstations as “fat agent hosts” and run gVisor (runsc) inside each VM to provide a Modal‑like sandbox for multi-tenant workflows, with Coditect as the control plane enforcing policies and quotas.137138139140141


High-level architecture

  • Per-team Workstations
    • Each team gets one or more Google Cloud Workstations (backed by GCE VMs) in a shared “Coditect Workstations” project/VPC.
    • Workstations run Docker or containerd configured with runsc as an additional OCI runtime for sandboxed workloads.138139140
  • In-VM gVisor sandboxes
    • Agent code and tools run inside containers launched with the runsc runtime (not plain runc), giving a gVisor user-space kernel boundary inside the Workstation VM.141137138
    • Each Coditect sandbox = one gVisor container, with per-sandbox CPU/mem/time limits applied via cgroups and Coditect’s control plane.
  • Coditect control plane (central services)
    • Hosted on GCE or GKE in the same VPC; exposes /sandboxes API, manages Workstation registration, scheduling, quotas, and audit logs.
    • Workstations run a Coditect agent that pulls/receives sandbox tasks, launches runsc containers, streams logs, and reports resource usage.

This avoids Kubernetes entirely for the inner sandboxing, leveraging gVisor’s “runs anywhere existing container tooling does” property.139138141


Provisioning per-team Workstations with gVisor

  1. Base image / template
    • Start from a Linux Workstation image (e.g., Container‑Optimized OS or Ubuntu with Docker preinstalled).
    • Install gVisor runsc following the official installation guide: apt-get install -y runsc or by using the install script; then run runsc install to integrate with Docker/containerd.140139
  2. Docker/containerd config
    • Add runsc as a runtime in /etc/docker/daemon.json (or containerd config):138139140
{
  "runtimes": {
    "runsc": {
      "path": "/usr/bin/runsc"
    }
  },
  "default-runtime": "runc"
}
    • Restart Docker/containerd. After this, docker run --runtime=runsc ... will launch a gVisor sandbox container.[^9_3][^9_4][^9_2]

  3. Workstation registration with Coditect
    • On first boot, a Coditect agent on the Workstation:
      • Registers itself to the Coditect control plane with metadata (team, tenant, capabilities, vCPU/RAM).
      • Opens a secure gRPC/WebSocket connection for task dispatch and health reporting.
  4. Per-team isolation
    • Map Workstations to tenants/teams using labels and IAM (e.g., each Workstation has a Coditect “worker_id” and “team_id”).
    • Optionally run multiple tenants per Workstation, but rely on gVisor sandboxes + user-level ACLs for separation.141138


Running nested containers with runsc (Modal-like behavior)

On each Workstation, the Coditect agent:

  1. Receives a CreateSandbox RPC with: tenant, project, image, resources, and network policy.
  2. Executes a gVisor container:
docker run \
  --runtime=runsc \
  --cpus=1.0 \
  --memory=1g \
  --read-only \
  --network=none \
  --name coditect-sb-$SANDBOX_ID \
  -v /workspaces/$TEAM/$PROJECT:/workspace:rw \
  ghcr.io/coditect/agent-runtime:latest \
  sleep infinity
    • runsc enforces a user-space kernel boundary inside the VM.[^9_1][^9_2][^9_5]
    • Use per-sandbox volumes for project data; keep the container rootfs ephemeral.

  3. For exec operations, the agent uses docker exec against the running gVisor container to run commands and stream stdout/stderr back to the Coditect control plane.
  4. On destroy, the agent stops and removes the container and cleans up any ephemeral volumes or scratch space.

This is the same pattern Google documentation suggests for “run untrusted binaries with gVisor inside your own container infrastructure.”137138


Tenant/project quota enforcement

Coditect’s control plane maintains a quota service and metadata DB independent of Workstations:

  • Per-tenant and per-project quotas
    • Max concurrent sandboxes.
    • vCPU and memory budgets (e.g., vCPU‑seconds, GiB‑hours) per time window.
    • Optional GPU quotas (if Workstations have GPUs and gVisor GPU access is configured).142143
  • Lifecycle flow
  1. Client calls POST /sandboxes.
  2. Control plane checks quotas in central DB; if OK, allocates a sandbox ID and assigns it to a Workstation with available capacity.
  3. Workstation agent launches the runsc container and periodically reports usage (CPU, mem, wall time).
  4. On completion or timeout, agent reports final metrics; control plane decrements quota and writes an audit event.
  • Enforcement
    • Control plane will refuse new sandboxes when quotas are exceeded.
    • Agents enforce per-container CPU/memory limits (cgroups) and kill long‑running sandboxes after their allowed time.

This keeps quotas consistent across all Workstations and decouples enforcement from individual VMs.
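The agent-side wall-time enforcement from the last bullet can be sketched as a deadline table swept periodically (`TimeoutEnforcer` is an illustrative name; the real agent would kill the container for each expired id):

```python
# Sketch of agent-side wall-time enforcement: track an absolute deadline per
# sandbox, and on each sweep report which sandboxes exceeded their allowed
# time (the real agent would then kill the corresponding container).
import time

class TimeoutEnforcer:
    def __init__(self):
        self._deadlines = {}   # sandbox_id -> absolute deadline (monotonic)

    def track(self, sandbox_id, allowed_seconds, now=None):
        now = time.monotonic() if now is None else now
        self._deadlines[sandbox_id] = now + allowed_seconds

    def sweep(self, now=None):
        now = time.monotonic() if now is None else now
        expired = [sid for sid, dl in self._deadlines.items() if now >= dl]
        for sid in expired:
            del self._deadlines[sid]
        return expired
```

Using a monotonic clock avoids false expirations when the VM's wall clock is adjusted; the `now` parameter exists only to make the logic testable.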


Security and isolation notes

  • gVisor boundary
    • gVisor intercepts syscalls via runsc, providing an extra kernel between sandboxed workloads and the Workstation’s host kernel.144138141
    • Appropriate for running arbitrary code from multiple tenants and teams with better protection than plain containers, but not as strong as full microVM isolation.143144138
  • VM boundary (Workstation)
    • Each Workstation is itself a GCE VM, so you also have a hypervisor boundary separating teams/workflows across different Workstations.
    • For especially sensitive tenants, run them on dedicated Workstations instead of mixing them on shared ones.
  • OS hardening
    • Run minimal OS images, restrict SSH/users, and treat Workstations as “managed worker nodes” controlled by Coditect automation.

Practical Coditect recommendation on Google Cloud Workstations

  • Default: Use gVisor (runsc) inside Workstations as the default sandbox runtime for agent workflows:
    • Good enough isolation for multi-user/multi-project use, with much lower cost and complexity than microVM solutions, and fully supported by Google’s stack.144138141
  • Control plane: Central Coditect service manages /sandboxes, quotas, routing to Workstations, and logs/metrics.
  • Workstation agents: Lightweight daemon per VM that:
    • Launches runsc containers.
    • Streams outputs and metrics.
    • Enforces local CPU/mem/time limits and cleanup.

If you want, next step can be:

“Draft a minimal gRPC API between Coditect control plane and the Workstation agent for creating, executing, and destroying gVisor sandboxes, including streaming stdout/stderr and resource usage metrics.” 145146147148149150151


Draft a minimal gRPC API between Coditect control plane and the Workstation agent for creating, executing, and destroying gVisor sandboxes, including streaming stdout/stderr and resource usage metrics DESIGN multi-user individual user data persistence, individual projects backed up to github in near real time as development proceeds in a project management CODITECT environment running on top of Anthropic CLAUDE CODE and GOOGLE GEMINIE

A minimal control-plane↔Workstation gRPC API plus project-scoped Git workflows lets Coditect run multi-user sandboxes on Workstations while persisting each user’s work to GitHub in near–real time.152153154155156


1. Minimal gRPC API (control plane ↔ agent)

Service definitions (proto sketch)

syntax = "proto3";

package coditect.sandbox.v1;

service SandboxAgent {
  rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse);
  rpc ExecSandbox(stream ExecRequest) returns (stream ExecEvent);
  rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse);
  rpc ReportMetrics(stream MetricsReport) returns (MetricsAcknowledge);
}

// --- Core messages ---

message CreateSandboxRequest {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  string image = 4;
  repeated string command = 5;        // optional initial command
  double cpu_limit = 6;               // cores
  int64 memory_limit_mib = 7;
  int64 idle_timeout_seconds = 8;
  string workspace_path = 9;          // e.g. /workspaces/tenant/project/user
  map<string, string> env = 10;
}

message CreateSandboxResponse {
  string sandbox_id = 1;
  string container_id = 2;
  string status = 3;                  // CREATED | FAILED
  string error_message = 4;
}

message ExecRequest {
  string sandbox_id = 1;
  oneof payload {
    ExecStart start = 2;
    ExecStdin stdin = 3;
    ExecControl control = 4;
  }
}

message ExecStart {
  repeated string command = 1;
  string workdir = 2;
  int64 timeout_seconds = 3;
  bool pty = 4;
}

message ExecStdin {
  bytes data = 1;
}

message ExecControl {
  bool terminate = 1;
}

message ExecEvent {
  string sandbox_id = 1;
  oneof event {
    ExecStdout stdout = 2;
    ExecStderr stderr = 3;
    ExecExit exit = 4;
    ExecStatus status = 5;
  }
}

message ExecStdout { bytes data = 1; }
message ExecStderr { bytes data = 1; }

message ExecExit {
  int32 exit_code = 1;
  string reason = 2;                  // NORMAL | TIMEOUT | KILLED
}

message ExecStatus {
  string phase = 1;                   // STARTING | RUNNING | DONE | ERROR
  string message = 2;
}

message DestroySandboxRequest {
  string sandbox_id = 1;
  bool force = 2;
}

message DestroySandboxResponse {
  string sandbox_id = 1;
  string status = 2;                  // TERMINATED | NOT_FOUND | ERROR
}

// --- Metrics streaming ---

message MetricsReport {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  int64 timestamp_unix_ms = 4;
  double cpu_seconds = 5;
  int64 memory_peak_mib = 6;
  int64 bytes_read = 7;
  int64 bytes_written = 8;
}

message MetricsAcknowledge {
  bool ok = 1;
}
  • CreateSandbox → agent does docker run --runtime=runsc ... on the Workstation.154155156
  • ExecSandbox is bidirectional streaming: control plane sends start/stdin, agent streams stdout/stderr/exit/status.153157152
  • ReportMetrics is a long-lived stream where the agent periodically pushes resource usage (derived from cgroups and container stats) for quota/billing.158159160
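The ExecSandbox bidirectional stream can be simulated in pure Python to pin down its event ordering, without generated gRPC stubs (dicts stand in for the proto messages; the stdout echo is a placeholder for real process output):

```python
# Pure-Python simulation of the ExecSandbox bidirectional stream: the agent
# consumes ExecRequest-like dicts and yields ExecEvent-like dicts. Generated
# gRPC stubs are omitted; a real agent would wire this to docker exec.
def exec_stream(requests):
    """requests: iterable of dicts mirroring ExecRequest's oneof payload."""
    yield {"status": {"phase": "STARTING"}}
    for req in requests:
        if "start" in req:
            yield {"status": {"phase": "RUNNING"}}
        elif "stdin" in req:
            # Echo stdin back as stdout, standing in for real process output.
            yield {"stdout": req["stdin"]}
        elif "control" in req and req["control"].get("terminate"):
            yield {"exit": {"exit_code": -1, "reason": "KILLED"}}
            return
    yield {"exit": {"exit_code": 0, "reason": "NORMAL"}}
```

The invariant worth preserving in the real agent is the one visible here: exactly one terminal `exit` event per stream, after which no further events are emitted.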

2. Agent behavior on Workstation (gVisor inside VM)

  • CreateSandbox:
    • docker run --runtime=runsc with CPU/mem limits and mounted user workspace.156161154
    • Enforce network=none or egress-locked config for untrusted code.
  • ExecSandbox (stream):
    • On ExecStart, run docker exec (optionally with a PTY) and hook process stdout/stderr to the gRPC stream back to the control plane.157152153
    • On ExecStdin, write to process stdin.
    • On timeout or ExecControl.terminate, kill the process and report ExecExit with reason.
  • Metrics:
    • Use docker stats/cgroup FS to sample per-container CPU/bytes/memory and send MetricsReport every N seconds.159160158
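The metrics sampling above can be sketched as a small parser over cgroup v2 accounting files; `parseCPUSeconds` is a hypothetical helper (the `usage_usec` field is the standard cgroup v2 `cpu.stat` layout, but the surrounding wiring, including the container's cgroup path, is assumed):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSeconds extracts usage_usec from a cgroup v2 cpu.stat file body
// (e.g. read from /sys/fs/cgroup/<container-scope>/cpu.stat) and converts
// it to seconds, matching the MetricsReport.cpu_seconds field.
func parseCPUSeconds(cpuStat string) (float64, error) {
	for _, line := range strings.Split(cpuStat, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "usage_usec" {
			usec, err := strconv.ParseInt(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return float64(usec) / 1e6, nil
		}
	}
	return 0, fmt.Errorf("usage_usec not found")
}

func main() {
	sample := "usage_usec 2500000\nuser_usec 2000000\nsystem_usec 500000"
	secs, _ := parseCPUSeconds(sample)
	fmt.Printf("cpu_seconds=%.2f\n", secs) // cpu_seconds=2.50
}
```

The agent would sample this every N seconds, diff against the previous reading, and stream the delta in a MetricsReport.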

3. Multi-user data model and persistence

Workspace layout (per user / project)

On each Workstation VM:

  • Root: `/workspaces/<tenant>/<user>/<project>/`
  • src/ – working tree checked out from GitHub.
  • .coditect/ – agent metadata, run logs, temp artifacts.
  • venv/ or envs/ – optional per-project deps.

A gVisor sandbox mounts this path into the container:

docker run --runtime=runsc \
-v /workspaces/$TENANT/$USER/$PROJECT:/workspace \
-w /workspace/src \
ghcr.io/coditect/agent-runtime:latest \
sleep infinity

Each sandbox therefore operates directly on the user’s Git checkout, so Git becomes the persistence boundary.

Git + GitHub near–real time

In each /workspace/src:

  • Initialize Git with origin GitHub repo (per project).
  • Run a small Coditect sync daemon (inside Workstation or as part of the Coditect agent) that:
    • Watches for filesystem changes (inotify) and triggers:
    - `git add` + `git commit` with machine-generated messages on a shadow branch (`coditect/autosave/<user>/<date>`).
    - `git push` to GitHub over a PAT or GitHub App installation token.
    - Batches frequent changes (e.g., commit/push every 15–30 seconds when there are changes).

This yields “near–real time” backups:

  • If a user’s Workstation dies, their work since last commit exists in GitHub on the autosave branch.
  • Human commits (PRs) remain under user control; the autosave branch can be squashed or used for recovery.

You can optionally run a centralized backup job that mirrors GitHub org repos to object storage or another Git remote for compliance.162163164


4. Coditect project management + agents (Claude/Gemini)

Project state

In the Coditect control plane:

  • Tenant → Team → User → Project hierarchy.
  • For each project:
    • GitHub repo URL + default branch.
    • Mapping to one or more Workstations and gVisor sandboxes.
    • Current “session” state: active sandboxes, last Claude/Gemini runs, metrics, etc.

AI-assisted workflows

  • Claude Code and Google Gemini operate on project state stored in GitHub and metadata from Coditect:
    • When a user requests “refactor module X”, Coditect:
      • Ensures project workspace is up to date (Git pull).
      • Calls Claude Code / Gemini with repo snapshot + context.
      • Writes generated changes directly into the workspace.
      • gVisor sandbox runs tests / linters; outputs streamed via ExecSandbox.
      • Autosave daemon captures changes and pushes to GitHub autosave branch.
  • Coditect stores run metadata (prompt, models, files touched, test results) in .coditect/ and central DB, linking to Git commit hashes so all AI-driven changes are auditable.

5. Quotas, multi-tenancy, and economics

  • Quotas in control plane
    • Per tenant/project: max concurrent sandboxes, max vCPU/memory, and daily CPU‑seconds.
    • Per user: sandbox count and total runtime to prevent abuse.
  • Enforcement
    • Before sending CreateSandbox over gRPC, Coditect checks quotas and only dispatches if under limit.
    • MetricsReport streams feed usage accounting; Coditect updates counters and can proactively kill/deny new sandboxes if limits exceeded.160158159
  • Economics
    • On GCP, you pay primarily for Workstation VMs; gVisor just adds software isolation inside those VMs, which is cheaper than microVM stacks while still significantly improving security for untrusted code.155165166154
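The quota gate described above can be factored into a pure control-plane check; the `Quota`/`Usage` shapes here are hypothetical (real counters would be fed by MetricsReport accounting):

```go
package main

import "fmt"

// Hypothetical quota model, evaluated in the control plane before a
// CreateSandbox RPC is ever dispatched to a workstation agent.
type Quota struct {
	MaxConcurrentSandboxes int
	MaxDailyCPUSeconds     float64
}

type Usage struct {
	ActiveSandboxes int
	CPUSecondsToday float64
}

// allowCreate returns (false, reason) when a tenant is over a limit.
func allowCreate(q Quota, u Usage) (bool, string) {
	if u.ActiveSandboxes >= q.MaxConcurrentSandboxes {
		return false, "concurrent sandbox limit reached"
	}
	if u.CPUSecondsToday >= q.MaxDailyCPUSeconds {
		return false, "daily CPU budget exhausted"
	}
	return true, ""
}

func main() {
	q := Quota{MaxConcurrentSandboxes: 5, MaxDailyCPUSeconds: 3600}
	ok, reason := allowCreate(q, Usage{ActiveSandboxes: 5, CPUSecondsToday: 120})
	fmt.Println(ok, reason) // false concurrent sandbox limit reached
}
```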

If you want next, a focused proto + Go server stub for SandboxAgent (with docker/runsc integration) or a concrete Git autosave strategy (branch naming, commit messages, conflict handling) can be sketched. 167168169170171172


I want the next step: a focused proto + Go server stub for SandboxAgent (with docker/runsc integration) and a concrete Git autosave strategy (branch naming, commit messages, conflict handling). Show a protobuf definition for the control plane API messages and services.

  • Example Go server and agent implementation for the gRPC API
  • Authentication and authorization patterns for this control plane
  • How to stream stdout/stderr reliably over unstable networks
  • Best practices for gVisor sandbox lifecycle management

A minimal, production-friendly design is: a protobuf-defined control-plane API, a Go agent that wraps docker run --runtime=runsc, mTLS for authn/z, streaming with backpressure and resume semantics, and disciplined sandbox lifecycle policies.173174175176177178


1. Protobuf: control plane ↔ agent API

syntax = "proto3";

package coditect.sandbox.v1;

option go_package = "github.com/coditect/platform/sandboxpb";

// Control plane <-> workstation agent
service SandboxAgent {
  // One-shot create
  rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse);

  // Bidirectional exec stream: control plane sends start/stdin/control;
  // agent streams stdout/stderr/status/exit events.
  rpc ExecSandbox(stream ExecRequest) returns (stream ExecEvent);

  // Destroy sandbox container + cleanup
  rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse);

  // Telemetry stream from agent to control plane
  rpc ReportMetrics(stream MetricsReport) returns (MetricsAcknowledge);
}

// ---------- Create / Destroy ----------

message CreateSandboxRequest {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  string image = 4;
  string workspace_path = 5; // host path in workstation VM
  double cpu_limit = 6;      // cores
  int64 memory_limit_mib = 7;
  int64 idle_timeout_seconds = 8;
  map<string, string> env = 9;
}

message CreateSandboxResponse {
  string sandbox_id = 1;
  string container_id = 2;
  string status = 3; // CREATED | FAILED
  string error = 4;
}

message DestroySandboxRequest {
  string sandbox_id = 1;
  bool force = 2;
}

message DestroySandboxResponse {
  string sandbox_id = 1;
  string status = 2; // TERMINATED | NOT_FOUND | ERROR
  string error = 3;
}

// ---------- Exec streaming ----------

message ExecRequest {
  string sandbox_id = 1;
  oneof payload {
    ExecStart start = 2;
    ExecStdin stdin = 3;
    ExecControl control = 4;
  }
}

message ExecStart {
  repeated string command = 1;
  string workdir = 2;
  int64 timeout_seconds = 3;
  bool pty = 4;
}

message ExecStdin {
  bytes data = 1;
}

message ExecControl {
  bool terminate = 1;
}

message ExecEvent {
  string sandbox_id = 1;
  oneof event {
    ExecStdout stdout = 2;
    ExecStderr stderr = 3;
    ExecExit exit = 4;
    ExecStatus status = 5;
  }
}

message ExecStdout { bytes data = 1; }
message ExecStderr { bytes data = 1; }

message ExecExit {
  int32 exit_code = 1;
  string reason = 2; // NORMAL | TIMEOUT | KILLED | INTERNAL_ERROR
}

message ExecStatus {
  string phase = 1; // STARTING | RUNNING | DONE | ERROR
  string message = 2;
}

// ---------- Metrics streaming ----------

message MetricsReport {
  string sandbox_id = 1;
  string tenant_id = 2;
  string project_id = 3;
  int64 timestamp_unix_ms = 4;
  double cpu_seconds = 5;
  int64 memory_peak_mib = 6;
  int64 bytes_read = 7;
  int64 bytes_written = 8;
}

message MetricsAcknowledge {
  bool ok = 1;
  string error = 2;
}
  • Shapes match gRPC streaming best practices (bidirectional Exec, client-stream metrics).179180181173

2. Go agent skeleton with docker + runsc

Below is a condensed agent implementation sketch using Go gRPC; it assumes Docker is configured with a runsc runtime.182175178183

//go:build workstation-agent

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "net"
    "os/exec"
    "sync"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
    "google.golang.org/grpc/peer"

    pb "github.com/coditect/platform/sandboxpb"
)

type agentServer struct {
    pb.UnimplementedSandboxAgentServer
}

func (s *agentServer) CreateSandbox(ctx context.Context, req *pb.CreateSandboxRequest) (*pb.CreateSandboxResponse, error) {
    containerName := "coditect-sb-" + req.SandboxId

    args := []string{
        "run", "-d",
        "--runtime=runsc", // gVisor runtime
        "--cpus", formatCPU(req.CpuLimit),
        "--memory", formatMem(req.MemoryLimitMib),
        "--name", containerName,
        "--network", "none",
        "--read-only",
        "-v", req.WorkspacePath + ":/workspace",
    }
    for k, v := range req.Env {
        args = append(args, "-e", k+"="+v)
    }
    args = append(args, req.Image, "sleep", "infinity")

    cmd := exec.CommandContext(ctx, "docker", args...)
    out, err := cmd.CombinedOutput()
    if err != nil {
        return &pb.CreateSandboxResponse{
            SandboxId: req.SandboxId,
            Status:    "FAILED",
            Error:     string(out),
        }, nil
    }

    return &pb.CreateSandboxResponse{
        SandboxId:   req.SandboxId,
        ContainerId: containerName,
        Status:      "CREATED",
    }, nil
}

func (s *agentServer) ExecSandbox(stream pb.SandboxAgent_ExecSandboxServer) error {
    // The first message on the stream must carry ExecStart.
    first, err := stream.Recv()
    if err != nil {
        return err
    }
    start := first.GetStart()
    if start == nil {
        return io.EOF
    }
    sandboxID := first.SandboxId
    containerName := "coditect-sb-" + sandboxID

    // Enforce the requested wall-clock timeout via the stream context.
    ctx := stream.Context()
    if start.TimeoutSeconds > 0 {
        var cancel context.CancelFunc
        ctx, cancel = context.WithTimeout(ctx, time.Duration(start.TimeoutSeconds)*time.Second)
        defer cancel()
    }

    // Flags must precede the container name in `docker exec`.
    cmdArgs := []string{"exec", "-i"}
    if start.Pty {
        cmdArgs = append(cmdArgs, "-t")
    }
    if start.Workdir != "" {
        cmdArgs = append(cmdArgs, "-w", start.Workdir)
    }
    cmdArgs = append(cmdArgs, containerName)
    cmdArgs = append(cmdArgs, start.Command...)

    cmd := exec.CommandContext(ctx, "docker", cmdArgs...)
    stdin, _ := cmd.StdinPipe()   // errors only occur once the command started
    stdout, _ := cmd.StdoutPipe()
    stderr, _ := cmd.StderrPipe()

    // gRPC server streams are not safe for concurrent Send calls;
    // serialize all writers through a mutex.
    var sendMu sync.Mutex
    send := func(ev *pb.ExecEvent) error {
        sendMu.Lock()
        defer sendMu.Unlock()
        return stream.Send(ev)
    }

    if err := cmd.Start(); err != nil {
        _ = send(&pb.ExecEvent{
            SandboxId: sandboxID,
            Event: &pb.ExecEvent_Status{
                Status: &pb.ExecStatus{
                    Phase:   "ERROR",
                    Message: err.Error(),
                },
            },
        })
        return nil
    }

    // Goroutine: forward stdout in bounded chunks.
    go func() {
        buf := make([]byte, 32*1024)
        for {
            n, readErr := stdout.Read(buf)
            if n > 0 {
                _ = send(&pb.ExecEvent{
                    SandboxId: sandboxID,
                    Event: &pb.ExecEvent_Stdout{
                        Stdout: &pb.ExecStdout{Data: buf[:n]},
                    },
                })
            }
            if readErr != nil {
                return
            }
        }
    }()

    // Goroutine: forward stderr in bounded chunks.
    go func() {
        buf := make([]byte, 32*1024)
        for {
            n, readErr := stderr.Read(buf)
            if n > 0 {
                _ = send(&pb.ExecEvent{
                    SandboxId: sandboxID,
                    Event: &pb.ExecEvent_Stderr{
                        Stderr: &pb.ExecStderr{Data: buf[:n]},
                    },
                })
            }
            if readErr != nil {
                return
            }
        }
    }()

    // Receive stdin/control until the client closes its side of the stream.
    killed := false
    for {
        msg, recvErr := stream.Recv()
        if recvErr != nil { // io.EOF or transport error
            break
        }
        switch p := msg.Payload.(type) {
        case *pb.ExecRequest_Stdin:
            _, _ = stdin.Write(p.Stdin.Data) // ignore: process may be done
        case *pb.ExecRequest_Control:
            if p.Control.Terminate {
                killed = true
                _ = cmd.Process.Kill()
            }
        }
    }
    _ = stdin.Close() // let the child see EOF on stdin

    waitErr := cmd.Wait()
    exitCode := 0
    reason := "NORMAL"
    if waitErr != nil {
        if ee, ok := waitErr.(*exec.ExitError); ok {
            exitCode = ee.ExitCode()
        } else {
            exitCode = -1
            reason = "INTERNAL_ERROR"
        }
    }
    if killed {
        reason = "KILLED"
    } else if ctx.Err() == context.DeadlineExceeded {
        reason = "TIMEOUT"
    }
    return send(&pb.ExecEvent{
        SandboxId: sandboxID,
        Event: &pb.ExecEvent_Exit{
            Exit: &pb.ExecExit{
                ExitCode: int32(exitCode),
                Reason:   reason,
            },
        },
    })
}

func (s *agentServer) DestroySandbox(ctx context.Context, req *pb.DestroySandboxRequest) (*pb.DestroySandboxResponse, error) {
    containerName := "coditect-sb-" + req.SandboxId
    cmd := exec.CommandContext(ctx, "docker", "rm", "-f", containerName)
    out, err := cmd.CombinedOutput()
    if err != nil {
        return &pb.DestroySandboxResponse{
            SandboxId: req.SandboxId,
            Status:    "ERROR",
            Error:     string(out),
        }, nil
    }
    return &pb.DestroySandboxResponse{
        SandboxId: req.SandboxId,
        Status:    "TERMINATED",
    }, nil
}

func (s *agentServer) ReportMetrics(stream pb.SandboxAgent_ReportMetricsServer) error {
    for {
        report, err := stream.Recv()
        if err == io.EOF {
            return stream.SendAndClose(&pb.MetricsAcknowledge{Ok: true})
        }
        if err != nil {
            return err
        }
        // Write metrics to a local exporter / forward to the control plane, etc.
        _ = report
    }
}

func main() {
    creds, err := credentials.NewServerTLSFromFile("agent-cert.pem", "agent-key.pem")
    if err != nil {
        log.Fatal(err)
    }
    // Note: a UnaryInterceptor only covers CreateSandbox/DestroySandbox;
    // add a matching grpc.StreamInterceptor for ExecSandbox/ReportMetrics.
    server := grpc.NewServer(grpc.Creds(creds), authInterceptor())
    pb.RegisterSandboxAgentServer(server, &agentServer{})

    lis, err := net.Listen("tcp", ":8443")
    if err != nil {
        log.Fatal(err)
    }
    log.Println("Sandbox agent listening on :8443")
    if err := server.Serve(lis); err != nil {
        log.Fatal(err)
    }
}

func authInterceptor() grpc.ServerOption {
    return grpc.UnaryInterceptor(func(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        if p, ok := peer.FromContext(ctx); ok && p.AuthInfo != nil {
            // Check mTLS cert SANs / SPIFFE IDs here.
        }
        return handler(ctx, req)
    })
}

// docker expects plain values here: --cpus "1.50", --memory "512m".
func formatCPU(cpu float64) string { return fmt.Sprintf("%.2f", cpu) }
func formatMem(mib int64) string   { return fmt.Sprintf("%dm", mib) }

3. Authentication and authorization patterns

  • mTLS for service-to-service
    • Use gRPC with TLS and mutual TLS; control-plane and agents each present X.509 certs.185174
    • Encode identity in certificate SAN / SPIFFE ID (e.g., spiffe://coditect/workstation/<id>).
    • Agent only accepts connections from certs signed by Coditect CA with appropriate SAN prefix; control plane similarly validates agents.174186185
  • Per-call metadata (RBAC)
    • Control plane includes tenant_id, project_id, and sandbox_id in gRPC metadata.
    • Agent uses cert identity + metadata to validate that the caller is allowed to manage that sandbox on that workstation.
  • Least privilege
    • Only Coditect control plane uses SandboxAgent API; user sessions never talk directly to Workstations.
    • Per-tenant quotas and allowed operations enforced at control-plane before calling agent.
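The SAN check can be reduced to a pure helper; `peerAllowed` and the `spiffe://coditect/control-plane/` prefix are assumptions, and in the real interceptor the URI SANs would come from `peer.AuthInfo` via `credentials.TLSInfo.State.PeerCertificates`:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical trust rule: the agent only accepts peers whose certificate
// carries a SPIFFE URI SAN under the Coditect control-plane prefix.
const allowedPrefix = "spiffe://coditect/control-plane/"

// peerAllowed checks the URI SANs extracted from the peer certificate.
func peerAllowed(uriSANs []string) bool {
	for _, u := range uriSANs {
		if strings.HasPrefix(u, allowedPrefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(peerAllowed([]string{"spiffe://coditect/control-plane/api-0"})) // true
	fmt.Println(peerAllowed([]string{"spiffe://other/origin"}))                 // false
}
```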

4. Streaming stdout/stderr over unstable networks

Patterns from gRPC streaming best practices:181184187173180

  • Chunked messages
    • Send stdout/stderr as small chunks (e.g., 32 KiB) in ExecStdout/ExecStderr messages to avoid head-of-line blocking and huge messages.
  • Application-level sequence numbers
    • Optionally add int64 seq to ExecStdout/ExecStderr and ExecRequest to detect gaps or duplicate messages if the control plane reconnects or needs to reconstruct logs.
  • Backpressure
    • Use gRPC flow-control; avoid buffering unbounded data server-side. If client is slow, writes will block, naturally throttling output.184181
  • Reconnect + resume
    • Treat Exec streams as long-lived but restartable sessions:
      • If connection drops, control plane can reconnect and reattach to the same sandbox in “logs-only” mode, starting from last-seen sequence number.
      • For reliability, also persist logs on disk (/workspace/.coditect/logs/<sandbox>.log) and expose a “download logs” API as a fallback.
  • Heartbeat messages
    • Periodic ExecStatus messages (e.g., every 5–10 seconds) let the control plane detect broken connections and mark runs as stale.
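One way to make resume concrete, assuming the optional `seq` field above were added: keep a bounded ring of recent chunks and replay everything newer than the client's last-seen sequence on reconnect (all names here are hypothetical):

```go
package main

import "fmt"

// chunk is a sketch of a sequence-numbered output fragment; in practice
// Seq would be the proposed int64 seq field on ExecStdout/ExecStderr.
type chunk struct {
	Seq  int64
	Data string
}

// replayAfter returns the chunks a reconnecting client still needs,
// given the last sequence number it acknowledged.
func replayAfter(ring []chunk, lastSeen int64) []chunk {
	out := []chunk{}
	for _, c := range ring {
		if c.Seq > lastSeen {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	ring := []chunk{{1, "a"}, {2, "b"}, {3, "c"}}
	for _, c := range replayAfter(ring, 1) {
		fmt.Println(c.Seq, c.Data)
	}
	// 2 b
	// 3 c
}
```

If the ring has already evicted chunks older than `lastSeen`, the control plane falls back to the on-disk log file described above.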

5. gVisor sandbox lifecycle best practices

Based on gVisor docs and production usage patterns:188176177189190191

  • Immutable, minimal images
    • Build small base images for agent workloads; avoid writing to container root, use workspace volume for project data.190188
  • Strict resource limits
    • Always set CPU and memory limits on docker run --runtime=runsc; gVisor handles resource isolation via underlying cgroups.192193190
    • Apply wall-clock and idle timeouts; agent enforces timeouts and cleans up containers.
  • Network minimization
    • Use --network=none or minimal egress (proxy) for untrusted code, since many LLM-generated code exploits involve network exfiltration.194195
  • Lifecycle hooks
    • Define states: CREATING -> RUNNING -> IDLE -> TERMINATING -> TERMINATED, and enforce transitions centrally.
    • Periodically reconcile: list Docker containers with coditect-sb-* prefix, and clean up orphaned or long-idle sandboxes.
  • Observability
    • Expose per-sandbox metrics (CPU, memory, I/O) via MetricsReport, and integrate with a global metrics backend for alerts (e.g., runaway sandboxes, OOM loops).193192
  • Compatibility testing
    • Some syscalls and kernel features behave differently under gVisor; test typical workloads and frameworks (Python, Node, build tools) under runsc early.191196188190
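The reconcile step can be sketched as a pure diff between running container names (as listed by `docker ps --format '{{.Names}}'`) and the sandbox set the control plane believes should exist; `orphanedSandboxes` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orphanedSandboxes returns managed containers (coditect-sb-* prefix)
// that are running but no longer desired, i.e. candidates for cleanup.
func orphanedSandboxes(running []string, desired map[string]bool) []string {
	var orphans []string
	for _, name := range running {
		if !strings.HasPrefix(name, "coditect-sb-") {
			continue // not managed by the agent
		}
		id := strings.TrimPrefix(name, "coditect-sb-")
		if !desired[id] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	running := []string{"coditect-sb-a1", "coditect-sb-b2", "unrelated"}
	desired := map[string]bool{"a1": true}
	fmt.Println(orphanedSandboxes(running, desired)) // [coditect-sb-b2]
}
```

A periodic loop would feed this result into `docker rm -f` for each orphan, with idle-timeout checks layered on top.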

If you want, next prompt could be:

“Sketch the Git autosave strategy: exact branch naming scheme, commit-message conventions, and conflict-resolution flow for Coditect autosave branches pushed from Workstations to GitHub.” 197198199200


Sketch the Git autosave strategy: exact branch naming scheme, commit-message conventions, and conflict-resolution flow for Coditect autosave branches pushed from Workstations to GitHub

Use autosave as a separate, machine-owned history that never collides with human feature branches, and treat it as an append-only stream that can be squashed into clean commits later.201202203


Branch naming scheme

Goal: globally unique, machine-owned, easy to query by tenant/user/project and date.

  • Pattern
    • autosave/<tenant-slug>/<project-slug>/<user-slug>/<yyyymmdd>
  • Examples
    • autosave/acme/payments/api-user123/20260114
    • autosave/corp-ml/llm-router/alice/20260114

Properties:

  • Uses slash-separated hierarchy per best practices for descriptive branch names.204205206207
  • Rotates daily so branches don’t grow unbounded; you can archive old days or squash them.
  • Automatically indicates ownership (tenant, project, user, date).

Implementation details:

  • Slugify: lowercase, replace characters outside [a-z0-9-] with -, and truncate to length bounds to avoid absurd branch names.205206208
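The slugify rule can be sketched as follows (the 64-character cap is an assumed bound, not a Git limit):

```go
package main

import (
	"fmt"
	"strings"
)

// slugify lowercases, maps anything outside [a-z0-9-] to '-', collapses
// repeated dashes, trims edge dashes, and caps the length at 64.
func slugify(s string) string {
	s = strings.ToLower(s)
	var b strings.Builder
	prevDash := false
	for _, r := range s {
		ok := (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-'
		if !ok {
			r = '-'
		}
		if r == '-' && prevDash {
			continue // collapse runs of dashes
		}
		prevDash = r == '-'
		b.WriteRune(r)
	}
	out := strings.Trim(b.String(), "-")
	if len(out) > 64 {
		out = out[:64]
	}
	return out
}

func main() {
	fmt.Println(slugify("Payments API (v2)")) // payments-api-v2
}
```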

Commit-message conventions

Autosave commits should be clearly machine-generated, compact, and self-describing.

  • Subject line pattern
    - `autosave: <short summary> [<user>@<timestamp>]`
  • Short summary
    • “workspace snapshot”, or
    • “edited (+N, -M lines)” if cheap to compute.

Examples:

  • autosave: workspace snapshot [alice@2026-01-14T04:05:12Z]
  • autosave: edited payment_service.py (+32 -8) [bob@2026-01-14T04:06:45Z]

Guidelines:

  • Keep subject ≤ 72 chars for readability.203209201
  • No body text unless you want to embed a JSON summary blob (file list, tools used), which can be parsed by Coditect later.
  • Consider a fixed prefix autosave: so automation can distinguish these commits from human ones.210203

Autosave workflow and frequency

Inside the Workstation (per user/project):

  1. Tracking changes
    • Watch /workspace/src for file changes (inotify) or poll git status --porcelain every N seconds.
  2. Autosave trigger
    • If there are uncommitted changes and no Git operation in progress (.git/index.lock absent), and last autosave > 15–30s ago, then autosave.
  3. Autosave algorithm
# Pseudocode
git status --porcelain
if dirty:
    branch="autosave/<tenant>/<project>/<user>/<yyyymmdd>"
    git fetch origin
    # -B creates or resets the branch; fall back to a fresh local branch
    # when the remote branch does not exist yet
    git checkout -B "$branch" "origin/$branch" || git checkout -B "$branch"
    git add -A
    git commit -m "autosave: workspace snapshot [user@timestamp]"
    git push origin "$branch"
- This keeps autosave isolated while allowing rebasing/merging from the main feature branch into autosave when needed.

  4. User feature branches
    • Users develop on normal branches like feature/PROJ-123-new-api.207205
    • A Coditect “promote” action later merges/squashes autosave changes into the user’s feature branch via PR or manual cherry-picking.


Conflict-resolution flow

Key principle: never rewrite autosave public history; treat it as an append-only log to avoid breaking in-flight Workstation pushes.202211

1. Push conflicts (non-fast-forward)

When autosave git push origin autosave/... fails due to non-FF:

  • The autosave daemon:
    • Runs git fetch origin autosave/....
    • Rebases local autosave branch onto the remote:
git fetch origin autosave/...
git rebase origin/autosave/... # safe if only automation writes here
git push origin autosave/...
  • Because only automation writes these branches, controlling rebase via tooling is acceptable; no human consumers rely on autosave’s exact history.211202

If rebase fails (e.g., due to manual edits):

  • Mark the autosave branch as dirty in Coditect metadata and stop autosaving until a resolver job or a human cleans it up.

2. Divergence from main branches

Autosave branches will drift from main / feature/*:

  • Periodically (or when user wants to “promote” changes), Coditect:
    • Checks out the user’s feature branch.
    • Merges or cherry-picks from the autosave branch:
      • Squash merge to produce one coherent commit:
git checkout feature/PROJ-123
git merge --squash autosave/acme/payments/api-user123/20260114
git commit -m "feat: integrate autosave work from 2026-01-14"
    - This avoids dragging noisy machine-generated history into the main branch.
  • After promotion, autosave continues accumulating; the next day rotates into a new .../20260115 branch.

3. Handling local uncommitted changes when switching branches

Autosave logic should not change the current checked-out branch in a way that disturbs the user:

  • Preferred pattern:
    • Keep user working on their feature branch.
    • Use separate clone or worktree for autosave if you need to track autosave branches explicitly, or:
    • Autosave directly on the current branch to a private remote (mirrored) and only push consolidated snapshots to GitHub branches.211

A practical compromise:

  • Treat autosave as branchless local commits in a hidden remote:
    • Local autosave repository pointing to a “shadow” remote or ref namespace (refs/autosave/...) and mirroring that to GitHub.211

But for simplicity in first iteration, using standard branches with automation allowed to re-checkout inside a detached worktree is usually sufficient.


Summary shape Coditect can adopt

  • Branch naming:
    • autosave/<tenant>/<project>/<user>/<yyyymmdd>
  • Commit messages:
    • autosave: workspace snapshot [user@ISO8601]
  • Autosave frequency:
    • Every 15–30 seconds of idle time with dirty working tree, batched and pushed to GitHub.
  • Conflict handling:
    • Automated rebase of autosave branches on push conflicts.
    • “Promotion” into human branches via squash merges or cherry-picks.
  • Safety:
    • Never auto-rewrite user feature branches.
    • Treat autosave branches as machine-only, append-only logs that can be squashed/archived after promotion.

If you want, the next iteration can be a shell/Go daemon sketch that implements this autosave loop, including detection of ongoing rebase/merge/commit operations to avoid interfering with active Git commands. 212213214215


Conflict-resolution steps when autosave push fails due to divergence from main branch

When an autosave push fails because the autosave branch has diverged from its remote (usually because main/feature branches moved and autosave was rebased/merged against them), use a clean, automated, autosave-only reconciliation flow.


1. Detect and classify the failure

On git push origin autosave/<tenant>/<project>/<user>/<date>:

  • If error is non-fast-forward (remote has commits that local doesn’t), classify as autosave divergence, not a main-branch conflict.
  • Important: treat autosave/* as machine-owned only; humans don’t commit there, so you can safely rewrite it without affecting user branches.216217

2. Reconcile autosave branch with remote

Steps inside the Workstation’s autosave daemon:

  1. Fetch remote state
git fetch origin autosave/<...>
  2. Rebase local autosave on remote
git checkout autosave/<...>
git rebase origin/autosave/<...>
- Because both sides are machine-generated, rebase should normally be trivial (fast-forward or linear append).

3. Resolve rare conflicts (if any)

If rebase pauses with conflicts (e.g., a previous autosave bug or manual edits to the autosave branch):

  • Abort the automated rebase and mark the autosave branch as “needs manual intervention”:

git rebase --abort

  • Record a flag in Coditect metadata (e.g., `autosave_conflict=true`) and stop autosaving for that branch until a background “repair job” or a human resolves it.

4. Retry push

git push origin autosave/<...>

3. Keeping in sync with main / feature branches

Autosave divergence from main/feature is expected; resolution should be one-way, from autosave into user branches, never the reverse.

  • Periodically (or on user action “sync autosave”), Coditect:
  1. Checks out user’s feature branch:
git checkout feature/PROJ-123
git pull origin feature/PROJ-123
  2. Integrates autosave history via squash merge:
git merge --squash autosave/<tenant>/<project>/<user>/<date>
git commit -m "feat: integrate autosave work from 2026-01-14"
  3. Pushes the feature branch and opens a PR if desired.
  • This avoids ever forcing main/feature to incorporate noisy autosave commit history and keeps autosave conflicts decoupled from “real” Git history.218216

4. Fallback strategy for persistent autosave failures

If autosave cannot rebase/push after N attempts (e.g., due to repository corruption or manual edits):

  • Stop autosave for that project/user and mark in Coditect UI: “Autosave paused due to Git conflicts.”
  • Keep local working tree intact so user doesn’t lose work.
  • Offer one-click “repair” action that:
    • Creates a new autosave branch (e.g., autosave/.../<date>-repair),
    • Commits the current workspace snapshot there,
    • Pushes and resumes autosave from that new branch.

This minimizes risk of data loss while keeping autosave history clean and machine-controlled.


USE GIT WORKTREES and research

  • Which branch prefix should Workstations create for autosave changes
  • How to include ticket and workstation ID in branch names
  • Exact commit message template for autosave snapshots
  • How to tag autosave commits for easy cleanup

Use worktrees so autosave runs in an isolated working copy per user/ticket, and encode ticket + workstation identity directly into branch and commit metadata for traceability and cleanup.219220221222


Branch prefix and naming with worktrees

Prefix

Use a dedicated autosave prefix and ticket-first naming, so you get uniqueness + traceability:223224225

  • Prefix: autosave/
  • Full pattern:
    • autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation>/<yyyymmdd>

Examples:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
  • autosave/NOTICKET/acme/spike-service/bob/ws-nyc-02/20260114

Notes:

  • Ticket-first aligns with common branch-naming best practices and makes it easy to correlate autosave streams to work-items.226225223
  • NOTICKET (or similar) can be used when there is no tracker ID.224

Worktree usage

For each active autosave context:

git worktree add ../.wt-autosave-PROJ-1234 \
autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
  • The autosave daemon operates in ../.wt-autosave-PROJ-1234, leaving the main working tree free for the developer’s feature branch.220221222219

Commit message template

Use a strict, machine-readable template:

Template

autosave: <ticket-id> <workstation-id> <user> <ISO8601-UTC>

Examples:

  • autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z
  • autosave: NOTICKET ws-nyc-02 bob 2026-01-14T04:08:01Z

Rationale:

  • Starts with autosave: to differentiate from human commits and allow easy filtering.227228229
  • Contains ticket ID and workstation ID to diagnose which environment generated the snapshot.
  • ISO timestamp makes it sortable and avoids locale issues.

Optional body (for Coditect-only consumption):

files:
- payment_service.py
- handlers/webhook.go
tool:
- claude-code
- gemini
summary: "Refactor payment flow and add retry logic"

Tags for autosave commits (for cleanup and indexing)

Treat tags as lightweight markers for autosave epochs, not for every commit.230231

Tag naming

  • Prefix: autosave/
  • Pattern: autosave/<ticket-id>/<user>/<yyyymmdd>/<n>

Examples:

  • autosave/PROJ-1234/alice/20260114/001
  • autosave/PROJ-1234/alice/20260114/002

Usage:

  • Optionally tag every Nth autosave commit (e.g., every 20th) to create anchors for cleanup and recovery.
  • Store mapping in Coditect (ticket → last autosave tag).
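A sketch of the tag pattern as a formatter; the three-digit zero-padded counter (an assumed width) keeps tags sortable with plain lexicographic ordering:

```go
package main

import "fmt"

// autosaveTag renders "autosave/<ticket-id>/<user>/<yyyymmdd>/<nnn>".
func autosaveTag(ticket, user, yyyymmdd string, n int) string {
	return fmt.Sprintf("autosave/%s/%s/%s/%03d", ticket, user, yyyymmdd, n)
}

func main() {
	fmt.Println(autosaveTag("PROJ-1234", "alice", "20260114", 2))
	// autosave/PROJ-1234/alice/20260114/002
}
```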

Cleanup

  • To prune old autosave history for a ticket/date after promoting to a clean feature branch, Coditect can:
    • Enumerate tags matching autosave/PROJ-1234/alice/20260114/*.
    • Decide retention (keep last tag, delete older ones).
    • Optionally delete old autosave branches past an age threshold (e.g., > 30 days).

This keeps autosave history discoverable but bounded.


Conflict handling with worktrees

  • Autosave worktree operates solely on its autosave branch; developer worktree remains on feature/PROJ-1234-something.221222219220
  • On push non-FF: autosave process uses its worktree to git fetch origin autosave/... && git rebase origin/autosave/... and re-push, with no impact on the developer’s branch.
  • If rebase fails repeatedly, Coditect can:
    • Create a fresh autosave branch and worktree with a suffix, e.g., autosave/PROJ-1234/.../20260114-repair,
    • Commit the current state there and continue, leaving the old branch as an artifact for manual inspection.

If you want, next step can be a small shell or Go daemon loop that: creates the autosave worktree from the feature branch, watches for changes, and applies this branch + commit + tag scheme automatically. 232233234235236


Should autosave branches be local or pushed to remote

  • Recommended prefix for workstation autosave branches
  • How to format branch names with ticket and workstation ID
  • Examples of safe, unique workstation identifiers
  • A collision-handling strategy for autosave branch names

Autosave branches should be pushed to the remote (for recovery and central visibility), but treated as machine-owned, low-value history that can be rotated and squashed.


Local vs remote autosave branches

  • Local-only autosave:
    • Pros: No noise in the remote, simpler Git perms.
    • Cons: No protection against Workstation loss; Coditect cannot see or manage autosave state centrally.
  • Pushed autosave branches (recommended):
    • Pros: Survives VM failure, enables Coditect-wide audit, consolidation, and cleanup; fits your “near real-time backup” goal.
    • Cons: Extra refs in the remote, but mitigated by strict naming and retention policies.

For Coditect, autosave branches should be pushed to the remote, with clear prefixes and automated pruning.


Use a distinct, machine-only prefix at the root:

  • Recommended: autosave/

Examples with hierarchy:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
  • autosave/NOTICKET/corp-ml/llm-router/bob/ws-nyc-02/20260114

This keeps autosave clearly separated from feature/, bugfix/, etc., and allows easy listing/cleanup (git branch --list 'autosave/*').


Branch name format with ticket and workstation ID

Template:

`autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation-id>/<yyyymmdd>`
  • <ticket-id>: JIRA-style or tracker ID (e.g., PROJ-1234), or NOTICKET when absent.
  • <tenant> / <project>: short slugs, kebab-case.
  • <user>: user slug (GitHub handle or SSO username).
  • <workstation-id>: short, stable identifier (see below).
  • <yyyymmdd>: date (UTC) to rotate branches daily.

Example:

  • autosave/PROJ-1234/acme/payments/alice/ws-01/20260114

Slugify each segment (lowercase, replace characters outside [a-z0-9-] with -, cap the length) to keep branch names safe and Git-friendly.


Safe unique workstation identifiers

Properties: stable per Workstation, non-sensitive, short, and unique enough within the repo.

Good options:

  • Short host-based ID (preferred):
    • E.g., ws-01, ws-nyc-02, ws-br-cj-01.
    • Derived from hostname + region or team: ws-<region>-<seq>.
  • Hash-based ID:
    • E.g., ws-6f3a9b, first 6–8 chars of a hash of instance ID.
  • Cloud instance metadata (normalized):
    • GCE instance name slugged: ws-<instance-name> (but keep short).

Avoid:

  • Raw IP addresses, internal hostnames that leak topology, or long opaque UUIDs; they clutter branch names and can expose infra details.

Handling branch naming collisions

A collision happens if two autosave processes produce the same branch name (e.g., two Workstations with misconfigured IDs). Strategy:

  1. Prevent collisions by design
    • Make <workstation-id> unique per repo or per project (control this in Coditect config).
    • Enforce uniqueness check at Workstation registration time in Coditect’s control plane.
  2. If collision still occurs (non-fast-forward push)

When git push origin autosave/... fails with a non-fast-forward error:

  • Try automated reconciliation:
    • git fetch origin autosave/...
    • If the remote branch was also autosave history from the same workstation (detected via commit metadata), rebase and push.
  • If the remote branch appears to be from a different workstation (commit messages show different workstation IDs):
    • Treat it as a collision and fork a new branch name with a numeric suffix:
      • Original: autosave/PROJ-1234/acme/payments/alice/ws-01/20260114
      • New (forked): autosave/PROJ-1234/acme/payments/alice/ws-01b/20260114, or autosave/PROJ-1234/acme/payments/alice/ws-01/20260114-2
    • Update the Workstation’s autosave config to use the forked branch going forward.
  3. Central coordination (Coditect)
    • Maintain a mapping in Coditect: (tenant, project, user, workstation-id, date) → branch name.
    • On registration or first autosave, the control plane assigns the canonical autosave branch name and ensures uniqueness; Workstations never invent it on their own.
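The rebase-vs-fork decision can be isolated into a small pure function (a hypothetical sketch; reconcile_autosave_push is an illustrative name, and extracting the remote workstation ID from commit metadata is assumed to happen elsewhere):

```python
def reconcile_autosave_push(branch: str, push_ok: bool,
                            local_ws: str, remote_ws: str,
                            attempt: int = 2) -> tuple[str, str]:
    """Return (action, branch) after a push attempt.

    action is one of:
      'done'   - push succeeded, keep the current branch
      'rebase' - remote is this workstation's own autosave history:
                 fetch, rebase, and retry the push
      'fork'   - remote belongs to another workstation: switch to a
                 suffixed branch (.../<yyyymmdd>-2) and retry
    """
    if push_ok:
        return "done", branch
    if remote_ws == local_ws:
        return "rebase", branch
    return "fork", f"{branch}-{attempt}"
```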

This gives you:

  • Remote-backed, recoverable autosave.
  • A clear naming scheme carrying ticket + workstation identity.
  • A collision strategy that always preserves data and isolates conflicting producers into new branches.

SUMMARIZE OUR FINDINGS

Coditect’s sandbox + autosave design converged on a practical, mostly open stack with clear separation of concerns across runtimes, execution control, and Git workflows.


Sandbox runtimes and tiers

  • For a Modal-like execution boundary, gVisor, Kata+Firecracker, Firecracker alone, and Wasm runtimes all emerged as viable building blocks, each trading isolation for performance and complexity. gVisor sits between containers and the host kernel via a user-space kernel, giving better isolation than runc with near-container startup. Kata+Firecracker adds VM-grade isolation via microVMs at the cost of higher overhead and more complex node setup, while Wasm (Wasmtime/Wasmer/WasmEdge) is ideal for capability-scoped tools with very fast startup but no full Linux.237238239240241242
  • A tiered architecture on Kubernetes was outlined: gVisor for semi-trusted workloads via a gvisor RuntimeClass, Kata+Firecracker for untrusted/public workloads via a kata-fc RuntimeClass bound to special node pools, and a Wasm tier using either a Wasm-aware runtimeClass or a dedicated Wasm executor service. Runtime choice is policy-driven, based on tenant, project, and risk level, with shared observability, quotas, and audit logging across all tiers.242243244245246

gVisor on Google Cloud Workstations

  • For Coditect on Google Cloud Workstations (no Kubernetes), the recommended approach is to run gVisor (runsc) as an alternate container runtime inside each Workstation VM, treating each agent execution as a gVisor sandbox container. Google’s own use of gVisor for serverless (Cloud Run, GKE Sandbox) shows it is a good fit for untrusted multi-tenant workloads with acceptable performance.247248249250251
  • Each team gets one or more Workstations; a Coditect agent on each VM handles docker run --runtime=runsc ... with CPU/memory/time limits, mounts per-tenant project workspaces, streams stdout/stderr back to the control plane, and periodically reports resource usage for quotas and billing. gVisor provides an extra isolation boundary inside each VM without needing nested virtualization or microVM orchestration.250251252253

Control plane ↔ agent gRPC API

  • A minimal gRPC API was defined with a SandboxAgent service exposing CreateSandbox, ExecSandbox (bidirectional stream), DestroySandbox, and ReportMetrics (client stream). This shape matches gRPC streaming best practices for long-lived exec sessions and metrics telemetry.254255256257
  • A Go agent implementation sketch uses Docker with the runsc runtime to launch sandbox containers, wires docker exec stdout/stderr into the ExecSandbox stream, reads stdin and control messages from the control plane, and reports exit codes and resource usage. Authentication and authorization are handled via mTLS (X.509 certs with SAN/SPIFFE IDs) and gRPC metadata, with the control plane enforcing tenant/project quotas before calling the agent.252258259260261262

Streaming and lifecycle best practices

  • For stdout/stderr over unstable networks, the design uses chunked messages (e.g., 32 KiB), backpressure via gRPC’s flow control, optional application-level sequence numbers, heartbeat ExecStatus messages, and the ability to reconnect and reattach to running sandboxes, with local log files as a fallback. These patterns follow gRPC streaming guidance for robust long-running streams.256257262263254
  • gVisor lifecycle guidance includes: minimal immutable images; strict per-container CPU/memory/time limits; default-deny or tightly restricted network; periodic reconciliation to clean up orphan containers; and testing workloads under gVisor to catch syscall/compatibility quirks.251253264265266267250
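The chunked-streaming idea above (fixed-size chunks plus application-level sequence numbers) reduces to a few lines; this sketch assumes the payload is already buffered as bytes:

```python
def chunk_stream(data: bytes, chunk_size: int = 32 * 1024):
    # Yield (seq, chunk) pairs. The sequence number lets the receiver
    # detect gaps after a reconnect and re-request from its last
    # acknowledged chunk instead of restarting the whole stream.
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        yield seq, data[offset:offset + chunk_size]
```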

Git autosave with worktrees and remote branches

  • Autosave is treated as a machine-owned, remote-backed history to provide near–real time backup and central visibility. Dedicated autosave branches are pushed to the remote, using a prefix like autosave/ and a hierarchical name including ticket, tenant, project, user, workstation, and date, e.g.: autosave/PROJ-1234/acme/payments/alice/ws-01/20260114. This follows general branch-naming best practices while keeping autosave clearly segregated from human branches.268269270271272
  • Git worktrees are used to isolate autosave operations from the user’s main working copy: a separate worktree pointed at the autosave branch lets the autosave daemon commit and push snapshots without disturbing the developer’s current branch. Prior art and guides show this pattern is well-suited for multi-branch workflows and AI tooling.273274275276
  • Commit messages follow a strict template such as autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z, making them easy to filter and audit. Autosave-only tags like autosave/PROJ-1234/alice/20260114/001 can mark epochs for cleanup. Conflict handling uses automated fetch+rebase on autosave branches (machine-only), with collision detection on workstation IDs and, if needed, creation of new suffixed branch names (e.g., .../20260114-2), while leaving user feature branches untouched. This aligns with Git guidance to keep main history clean while allowing frequent automated commits.277278279280281

CREATE A SDD TDD AND ADR describing the CODITECT GOOGLE CLOUD WORKSTATION IMPLEMENTATION as research, analyzed and recommended

Below is a concise SDD, TDD, and ADR set for the Coditect + Google Cloud Workstations + gVisor implementation as designed and researched.


1. SDD – System Design Document

1.1 Overview

Coditect runs multi-tenant, agentic development workflows on Google Cloud Workstations, using gVisor (runsc) inside each Workstation VM as a Modal-like sandbox runtime for untrusted agent code.282283284285

Core elements:

  • Google Cloud Workstations cluster (controller + gateway) providing per-team IDE VMs with persistent disks and VPC access.283284
  • Per-VM Coditect Agent exposing a gRPC SandboxAgent service and managing docker --runtime=runsc containers.286287
  • Coditect Control Plane (API + Orchestrator) that exposes /sandboxes to clients, enforces quotas and IAM, and orchestrates Workstations/agents.
  • Git-backed project workspaces with autosave branches using Git worktrees and machine-owned branches pushed to GitHub for near real-time persistence.288289290291

1.2 Architecture components

  1. Google Cloud Workstations
    • Managed cluster per region; Workstations are GCE VMs managed by the Workstations controller and reachable via a gateway.284283
    • Each Workstation VM has:
      • Docker or containerd configured with gVisor runsc runtime.287282286
      • Coditect Agent daemon (gRPC server).
      • Workspace root: /workspaces/<tenant>/<user>/<project> on the persistent disk.
  2. gVisor sandbox runtime
    • Installed via runsc install and configured as Docker runtime runsc.286287
    • Sandbox containers launched by agent as:
docker run --runtime=runsc \
--cpus=<limit> --memory=<limit> \
--network=none --read-only \
-v /workspaces/...:/workspace \
--name coditect-sb-<id> <image> sleep infinity
    • Provides stronger isolation between agent code and the Workstation OS by interposing a user-space kernel.[^17_11][^17_12][^17_13][^17_4][^17_1]

  3. Coditect Control Plane
    • Exposes HTTP API (/sandboxes, /exec, /destroy) to Coditect UI and orchestration agents.
    • Maintains metadata DB (tenants, projects, sandboxes, quotas, autosave branches).
    • Implements gRPC client to per-Workstation SandboxAgent.
  4. SandboxAgent gRPC API (per Workstation)
    • Proto (summarized):
      • CreateSandbox(CreateSandboxRequest) -> CreateSandboxResponse
      • ExecSandbox(stream ExecRequest) -> (stream ExecEvent) (bidirectional).292293294
      • DestroySandbox(DestroySandboxRequest) -> DestroySandboxResponse
      • ReportMetrics(stream MetricsReport) -> MetricsAcknowledge
    • Control Plane selects a Workstation, calls CreateSandbox, then runs exec sessions via ExecSandbox.
  5. Git autosave + worktrees
    • For each (tenant, project, user, ticket, workstation, date) Coditect creates a worktree checked out to an autosave branch:289290291288

Branch pattern:

`autosave/<ticket-id>/<tenant>/<project>/<user>/<workstation-id>/<yyyymmdd>`

  • Autosave daemon in the Workstation’s context:
    • Watches workspace changes.
    • Periodically runs git add -A, git commit with the machine commit template, and git push origin autosave/....

1.3 Data flows

  1. Sandbox lifecycle
    • Client → Control Plane: POST /sandboxes with tenant/project/session info.
    • Control Plane: quota + IAM checks; chooses Workstation; calls CreateSandbox on its agent.
    • Agent: launches gVisor container and returns sandbox/container IDs.
    • Control Plane: records sandbox metadata and returns sandbox handle.
  2. Execution + streaming
    • Client → Control Plane: POST /sandboxes/{id}/exec.
    • Control Plane ↔ Agent: ExecSandbox stream.
      • Control Plane sends ExecStart and optional ExecStdin.
      • Agent streams ExecStdout, ExecStderr, ExecStatus, ExecExit.293292
  3. Metrics + quotas
    • Agent periodically sends MetricsReport (CPU seconds, peak mem, bytes I/O) to Control Plane.295296297
    • Control Plane updates usage counters per tenant/project and may deny new sandboxes or terminate existing ones when quotas exceeded.
  4. Git autosave
    • Autosave daemon operates in autosave worktree, committing snapshots and pushing to GitHub.298288
    • Coditect central DB tracks mapping: (tenant, project, user, ticket, workstation, date) -> autosave branch.

1.4 Non-functional requirements

  • Security:
    • gVisor sandbox isolating untrusted code from Workstation host kernel and other workloads.299300301285282
    • mTLS between Control Plane and Agents, strict RBAC in Control Plane.302303
  • Reliability:
    • Resilient streaming with backpressure and reconnect semantics for stdout/stderr.294304292
    • Autosave branches on remote Git for recovery if Workstation fails.
  • Performance:
    • gVisor performance tuned with recent FS improvements (VFS2/LISAFS) to keep overhead close to containers for typical I/O patterns.297305

2. TDD – Technical Design Details

2.1 gVisor configuration on Workstations

  • Install runsc from gVisor releases.306287
  • sudo runsc install to add runsc runtime to Docker and update daemon.json.287286
  • Restart Docker; test with docker run --runtime=runsc hello-world.286
  • Hardening: configure Docker to use cgroupfs when required by gVisor, per docs.287

2.2 SandboxAgent implementation (Go)

  • gRPC server with TLS + mTLS; SandboxAgent service from proto.307308302
  • CreateSandbox builds docker run args and executes them using exec.CommandContext.
  • ExecSandbox uses bidirectional streams:292293294
    • On first message (with ExecStart), start docker exec and attach to stdout/stderr.
    • Forward stdout/stderr as chunked ExecStdout/ExecStderr events.
    • Accept ExecStdin messages and write to process stdin.
    • On termination or timeout, send ExecExit.
  • ReportMetrics reads from Docker stats/cgroup FS and streams metrics periodically to Control Plane.296295

2.3 Control Plane internals

  • Scheduler:
    • Workstation registry with capacity metrics.
    • Placement algorithm (simple round-robin, least-loaded, or tenant-aware).
  • Quota service:
    • DB schema for per-tenant/project quotas and usage counters.
    • Atomic operations to “reserve” and “release” capacity when sandboxes start/stop.
  • IAM:
    • JWTs or session tokens tying user to tenant/project; Control Plane enforces ACLs at /sandboxes API.
  • Audit logging:
    • Append-only event log of sandbox create/exec/destroy with tenant/project IDs, user, and model (e.g., Claude/Gemini) context.

2.4 Git autosave + worktrees details

  • For each active ticket/workstation/project:
git worktree add ../.wt-autosave-$ID autosave/<ticket>/<tenant>/<project>/<user>/<ws>/<yyyymmdd>
  • Autosave daemon in that worktree:290291288289
    • Debounced loop: if dirty and no index.lock, then:
git add -A
git commit -m "autosave: PROJ-1234 ws-01 alice 2026-01-14T04:07:15Z"
git push origin autosave/...
  • On non-FF push: git fetch + git rebase origin/autosave/... and retry; if conflict persists, create suffix branch and update mapping.
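The debounced commit-and-push pass can be sketched as follows (illustrative only; the injectable run parameter stands in for subprocess.run so the command sequence can be tested without a real repo):

```python
import subprocess

def autosave_snapshot(worktree: str, branch: str, ticket: str,
                      ws: str, user: str, timestamp: str,
                      run=subprocess.run) -> list[list[str]]:
    # One autosave pass in the dedicated worktree: stage everything,
    # commit with the machine template, and push the autosave branch.
    msg = f"autosave: {ticket} {ws} {user} {timestamp}"
    cmds = [
        ["git", "-C", worktree, "add", "-A"],
        ["git", "-C", worktree, "commit", "-m", msg],
        ["git", "-C", worktree, "push", "origin", branch],
    ]
    for cmd in cmds:
        run(cmd, check=True)
    return cmds
```

Non-fast-forward push failures would be caught around the push step and handed to the fetch+rebase / fork logic described above.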

2.5 Integration with Claude Code and Gemini

  • Control Plane keeps project context (Git repo URLs, file trees, tests).
  • When user invokes AI action, Control Plane:
    • Pulls autosave branch or feature branch;
    • Calls Claude Code or Gemini with repo snapshot and task;
    • Writes modifications into workspace;
    • Triggers tests inside gVisor sandbox;
    • Autosave daemon snapshots changes to GitHub.

3. ADR – Architecture Decision Record

Title: Use Google Cloud Workstations with gVisor (runsc) and Workstation-local gRPC agents for Coditect sandboxes and Git autosave.

Context

Coditect must execute untrusted, multi-tenant agent code for software development workflows, integrate with GitHub for near real-time project backups, and support AI agents (Claude Code, Gemini) operating over codebases. Requirements include:

  • Stronger isolation than plain containers on shared hosts.
  • Economic viability and simplicity on GCP.
  • Deep IDE integration via Google Cloud Workstations.283284
  • Centralized quotas, audit, and streaming execution control for sandboxes.

Decision

  1. Use Google Cloud Workstations as the primary execution environment, with each Workstation being a GCE VM managed by Workstations controller/gateway in Coditect’s VPC.284283
  2. Install gVisor’s runsc runtime on each Workstation and configure Docker to use runsc for sandbox containers, leveraging gVisor’s user-space kernel for added isolation.300282286287
  3. Deploy a Coditect Agent on each Workstation that exposes a gRPC SandboxAgent service, implements sandbox lifecycle via Docker+runsc, streams stdout/stderr, and reports metrics.293294292
  4. Keep a central Coditect Control Plane exposing /sandboxes to clients, performing IAM + quota checks, and orchestrating work across Workstations and agents.
  5. Implement Git autosave using remote-pushed autosave branches and git worktrees, with branches named autosave/<ticket>/<tenant>/<project>/<user>/<workstation>/<yyyymmdd> and machine-generated commits pushed to GitHub, giving near real-time backup.[^17_10][^17_28][^17_7][^17_8][^17_9]

Rationale

  • gVisor is designed as an application kernel for containers and is widely used within Google’s own serverless and container platforms, providing a well-tested isolation layer suitable for untrusted code.301285282299300
  • Google Cloud Workstations architecture already gives per-team VMs, persistent disks, and VPC connectivity, and can be centrally managed and snapshotted.283284
  • Using gRPC streaming between control plane and agents matches best practices for long-lived process I/O (stdout/stderr) and metrics, and keeps the control plane simple and language-agnostic.304294292293
  • Git worktrees and autosave branches provide local isolation and remote resiliency: work continues even if autosave fails, and GitHub holds recovery points if a Workstation dies.288289290291

Alternatives considered

  • Kubernetes + Kata/Firecracker: Stronger isolation but higher operational cost and complexity on GCP; less aligned with the Workstations-first developer experience.309310311
  • Remote-only autosave (no worktrees): Simpler but interferes with the developer’s active branch and makes automation riskier; worktrees were chosen to isolate automation.312313

Consequences

  • Pros
    • Stronger isolation than vanilla containers with minimal changes to the developer experience.285282300301
    • Clear separation between Control Plane and Workstation Agent responsibilities.
    • Robust autosave and recovery with GitHub as the durable store.
  • Cons
    • Requires OS-level access to Workstations to install runsc and configure Docker.306286287
    • gVisor compatibility quirks must be tested for typical dev workloads and tools.314315297

This ADR, SDD, and TDD give a concrete blueprint for implementing the Coditect Google Cloud Workstation architecture as designed in the research. 316317


Define the scope and success criteria for the SDD TDD and ADR

List stakeholders and approval authorities for the documents
Provide an outline template for the SDD specific to Cloud Workstations
Provide an outline template for the TDD focused on gVisor and runsc
Provide an ADR template with decision options and tradeoffs section

For Coditect’s Google Cloud Workstations + gVisor effort, the SDD, TDD, and ADR should have explicit scope, measurable success criteria, clear owners, and reusable templates tailored to this stack.


Scope and success criteria

SDD (System Design Document)

  • Scope
    • End-to-end system architecture for Coditect on Google Cloud Workstations: control plane, Workstations, Coditect agent, gVisor sandboxes, Git autosave, IAM, quotas, and observability.
  • Success criteria
    • All stakeholders can describe the system at a high level from the SDD alone.
    • Architecture decisions are consistent with GCP Workstations and gVisor capabilities (no “magic infra”).318319320
    • Interfaces between components (API, gRPC, Git, IAM) are unambiguous enough to drive TDD work.

TDD (Technical Design Details)

  • Scope
    • Implementation-level design for gVisor and runsc integration on Workstations, the SandboxAgent gRPC service, and autosave Git worktrees.
    • Container launch and lifecycle, resource limits, metrics collection, and integration with Docker/containerd.321322323324
  • Success criteria
    • Go/Rust implementation teams can build the agent and control plane without guessing behaviors.
    • Operational teams can configure runsc and Docker on Workstations using only TDD steps.325321
    • gRPC APIs and message schemas are stable enough to generate client/server stubs and tests.326327328

ADR (Architecture Decision Record)

  • Scope
    • Capture major architectural choices (Workstations vs GKE, gVisor vs microVMs, autosave strategy, etc.), rationale, and implications.
  • Success criteria
    • Future engineers can understand why Workstations + gVisor was chosen and what alternatives were rejected.
    • Changes to the architecture can be evaluated against documented decisions and tradeoffs.

Stakeholders and approval authorities

  • Product / Platform Lead (Coditect)
    • Owns overall platform direction and approves SDD/ADR alignment with product roadmap.
  • Chief Architect / Principal Engineer
    • Accountable for SDD and ADR technical soundness and long-term maintainability.
  • Infra / DevOps Lead
    • Approves TDD sections on Workstations provisioning, Docker/containerd + gVisor setup, monitoring, and rollout.319322318325
  • Security / Compliance Officer
    • Reviews SDD/TDD for sandbox isolation, IAM, audit logging, and data handling; signs off on ADR security tradeoffs (gVisor vs microVM).320323329330
  • Team Leads (Agent Orchestration, Git Integration)
    • Ensure SDD/TDD requirements are implementable by their teams; sign off on scope and milestones.

Approval suggestion:

  • SDD: Product Lead + Chief Architect + Security.
  • TDD: Chief Architect + Infra Lead + relevant Team Leads.
  • ADR: Chief Architect + Security Officer (and Product Lead if impact is high).

SDD outline template (Cloud Workstations–specific)

1. Document control

  • Version, date, author.
  • Reviewers and approvers (names/roles).
  • Related ADRs and TDDs.

2. Overview

  • Purpose and scope (Coditect on GCP Workstations).
  • Objectives (multi-tenant agent compute, near real-time Git persistence, safe untrusted code execution).

3. System context

  • Context diagram:
    • Coditect Control Plane, Google Cloud Workstations, GitHub, Anthropic Claude, Google Gemini, identity provider.318319
  • External dependencies (GCP services, GitHub, auth providers).

4. High-level architecture

  • Components:
    • Workstations cluster (controller/gateway, per-team VMs).319318
    • Coditect Control Plane.
    • Workstation Agent + gVisor runtime.320321
    • GitHub and autosave branches.
  • Deployment topology (regions, VPCs, projects).

5. Workstation and sandbox model

  • Workstation lifecycle (provisioning, scaling, deprovisioning).318319
  • Sandbox abstraction (one gVisor container per sandbox).323324330320
  • Workspace layout (/workspaces/<tenant>/<user>/<project>).

6. Control Plane responsibilities

  • /sandboxes API surface.
  • Scheduling logic (Workstation selection).
  • Quota enforcement and billing.
  • IAM model (tenants, projects, users, roles).

7. gRPC and messaging

  • Description of SandboxAgent gRPC services and message flows (Create/Exec/Destroy/ReportMetrics).327328326
  • Error handling and retry semantics.

8. Git integration and autosave

  • Git repository mapping (tenant/project → repo).
  • Autosave branch naming and worktree strategy.331332333334
  • Promotion from autosave to feature branches.

9. Non-functional requirements

  • Security (gVisor isolation, network policies, mTLS).329330335323320
  • Reliability and availability (Workstation/node failure behavior).
  • Performance expectations (latency, throughput, cost).336337
  • Observability (logging, metrics, tracing).

10. Risks and open questions

  • gVisor compatibility hot spots.338339340
  • Workstations lifecycle edge cases.
  • Future evolution (microVM tier, Wasm tier).

TDD outline template (gVisor + runsc–focused)

1. Document control

  • Version, date, author, reviewers.

2. Purpose and scope

  • Detailed design for:
    • gVisor installation and configuration on Workstations.
    • Docker/containerd runtime integration.
    • SandboxAgent implementation.
    • Metrics, logs, and lifecycle policies.

3. Workstation environment

  • Base OS/image and Workstations configuration.319318
  • Required packages (Docker/containerd, runsc, etc.).322325
  • Security hardening (user accounts, SSH, file permissions).

4. gVisor (runsc) setup

  • Installation steps (commands, versions) referencing gVisor docs.321322325
  • Docker/containerd configuration snippets (daemon.json, runtime definitions).
  • Validation tests (docker run --runtime=runsc hello-world).339321

5. Sandbox lifecycle implementation

  • Container naming and labels (coditect-sb-<id>).
  • CreateSandbox behavior (CPU/mem/network/volume args).
  • Exec behavior (PTY support, working dirs, env).
  • Destroy behavior and cleanup (timeouts, orphan detection).
  • Lifecycle state machine and transitions.
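A CreateSandbox implementation might assemble the docker run invocation like this (a sketch mirroring the launch command from the SDD; label keys such as coditect.sandbox_id are assumptions):

```python
def sandbox_run_args(sandbox_id: str, image: str, workspace: str,
                     cpus: float = 1.0, memory: str = "1g") -> list[str]:
    # gVisor runtime, resource limits, no network, read-only rootfs,
    # and a single bind-mounted per-tenant workspace, per the SDD.
    return [
        "docker", "run", "-d",
        "--runtime=runsc",
        f"--cpus={cpus}", f"--memory={memory}",
        "--network=none", "--read-only",
        "-v", f"{workspace}:/workspace",
        "--label", f"coditect.sandbox_id={sandbox_id}",
        "--name", f"coditect-sb-{sandbox_id}",
        image, "sleep", "infinity",
    ]
```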

6. SandboxAgent gRPC server

  • Service definitions (from proto).328326327
  • Go package layout (agent binary, config, logging).
  • Streaming implementation details:
    • stdout/stderr buffering and chunk size.
    • Stdin handling and control messages.
    • Heartbeats and idle detection.

7. Metrics and logging

  • Metrics collection (Docker stats, cgroups, sampling interval).337341342
  • Mapping to MetricsReport fields and quota counters.
  • Log routing from Workstations to central logging (e.g., via fluentd/Vector).

8. Security and authN/Z

  • mTLS configuration (cert distribution, rotation, validation).335343
  • Agent-side checks on caller identity (peer cert SANs).
  • Least-privilege OS users and Docker group configuration.

9. Failure handling and resiliency

  • Behavior on gRPC disconnects (Exec retries, metrics stream reconnection).344328
  • Handling of Docker failures and runsc errors.
  • Health checks for Agent and gVisor.

10. Testing strategy

  • Unit tests (command building, gRPC handlers).
  • Integration tests (agent + control plane on a real Workstation).
  • Performance and soak tests under gVisor (CPU/I/O patterns).337339

ADR template (with options and tradeoffs)

Title: <Short decision title>
Status: Proposed | Accepted | Superseded
Date: <YYYY-MM-DD>
Authors: <names>

1. Context

  • Problem statement and constraints.
  • Relevant background (e.g., Coditect’s need for sandboxing untrusted code on GCP Workstations).330320318319

2. Decision

  • Clear statement of the chosen option.
  • Brief description of the solution (e.g., “Use gVisor (runsc) on Workstations with Docker, plus per-VM SandboxAgent gRPC service.”).324323330320321

3. Options considered

List each option with a short summary:

  • Option A: GCP Workstations + gVisor + Docker/runsc (chosen).330320321
  • Option B: GKE + Kata Containers/Firecracker.345346347
  • Option C: GKE + gVisor (GKE Sandbox) with remote IDE.348349
  • Option D: Direct Firecracker microVM pool on GCE.350

4. Tradeoffs

For each option, discuss:

  • Security
    • Isolation strength, blast radius, kernel sharing vs VM boundary.347323345350
  • Performance
    • Cold start, I/O overhead, CPU/memory footprint.336337
  • Operational complexity
    • Setup effort (Workstations vs GKE vs custom microVM), support, upgrades.346318319
  • Developer experience
    • IDE quality, latency, friction for AI-assisted workflows.
  • Cost
    • VM count, node types, overhead from isolation layer.

Clearly highlight why the chosen option wins overall in this context (e.g., best balance of security, DX, and implementation time).

5. Consequences

  • Positive
    • List benefits and what becomes simpler.
  • Negative / Risks
    • New complexity, vendor dependencies, potential performance or compatibility issues.338339337
  • Mitigations
    • How risks will be handled (e.g., compatibility testing, fallbacks to plain containers/microVMs).
  • Links to other ADRs (e.g., autosave strategy, multi-runtime tiering, AI model choices).

This gives you a structured way to lock in the Workstations + gVisor direction while keeping future alternatives explicit and comparable.


CODITECT runs DJANGO backend with front-end REACT TYPESCRIPT strict for user, tenant, project, license management with JWT tokens for individuals with RBAC - analyze and describe how this integrates into the GOOGLE CLOUD WORKSTATION provisioning, orchestration, and gVisor assignments for individuals working within shared Google Cloud Workstations

Coditect’s Django/React SaaS becomes the control plane and identity source for who can use which Cloud Workstations and which gVisor sandboxes run on them, with JWT+RBAC driving provisioning, orchestration, and sandbox assignment.351352353354355356357


Identity, JWT, and RBAC as the control layer

  • Django issues JWTs per user that embed user_id, tenant_id, project_ids, and role claims (roles: ["tenant_admin", "developer", ...]).
  • React TS frontend authenticates via these JWTs and calls Coditect’s backend (Django REST / GraphQL), not GCP directly.
  • Coditect backend maps its internal RBAC to Google Cloud IAM:
    • E.g., a tenant admin can request workstation templates but not edit cluster-wide settings.352358351
  • JWT subject and tenant/project claims are the canonical identity for:
    • Which Workstation(s) a user may attach to.
    • Which sandboxes (gVisor containers) they may start, exec into, or destroy on those Workstations.354359
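A deny-by-default authorization check over these claims might look like the following (hypothetical helper; the role names follow the examples above):

```python
def can_exec_sandbox(claims: dict, tenant_id: str, project_id: str) -> bool:
    # Deny by default: the JWT's tenant must match, the project must be
    # listed in the token, and the user needs a role that grants
    # sandbox execution.
    if claims.get("tenant_id") != tenant_id:
        return False
    if project_id not in claims.get("project_ids", []):
        return False
    allowed_roles = {"tenant_admin", "project_admin", "developer"}
    return bool(allowed_roles & set(claims.get("roles", [])))
```

Django would run this check (or its ORM-backed equivalent) before any CreateSandbox or ExecSandbox gRPC call reaches an Agent.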

Workstation provisioning driven by Django

  • Cloud Workstations are provisioned via GCP APIs or Terraform/Pulumi from a Coditect service account, not from the browser.360361356357
  • Django integrates user/tenant/project models with Workstation configs:
    • For each tenant/team, Coditect stores the Workstation cluster and one or more workstation configs (image, machine type, disk, tools).358356357
    • When a user creates or resumes a dev session from the React UI, Django:
      • Validates they have a role allowing dev sessions in that project.
      • Calls the Workstations API (with roles/workstations.workstationUser or workstationCreator bound to its service account) to create or attach to a workstation.351352354358
  • IAM bindings are managed centrally: tenant or team groups are bound to Workstation configs with appropriate roles; individual user access comes from Coditect’s RBAC layer mapping to these groups or directly to Workstations.352360358351

gVisor sandbox assignment per user in shared Workstations

Each Workstation VM runs a Coditect Agent that trusts only the control plane’s JWT-bearing requests and never exposes raw Docker/gVisor to end users directly:

  • The React frontend calls Django with the user’s JWT to request sandbox creation or execution.
  • Django validates RBAC (e.g., project:execute_sandbox) and then calls the Agent over gRPC as the only caller, using mTLS service identity and passing user/tenant/project context in metadata.359362363
  • The Agent runs per-sandbox gVisor containers (docker run --runtime=runsc ...) tagged with labels derived from the JWT claims: tenant_id, project_id, user_id, sandbox_id.364355365366367
  • This allows:
    • Multi-user, multi-tenant isolation inside a shared Workstation: each user’s agent instructions correspond to separate gVisor sandboxes, giving a user-space kernel boundary between them.353355364
    • The control plane to enforce per-user and per-project quotas on concurrent sandboxes and resource usage before a CreateSandbox call is made.

The net effect: Django’s JWT+RBAC governs not only access to Coditect features but also which gVisor sandboxes can be created and on which Workstations, with Google IAM governing underlying Workstation access.353354358351352 368369370371372


Map required IAM roles and permissions for each service component

Describe JWT token flow and refresh strategy for tenants
Design RBAC role hierarchy for users, tenants, and projects
Specify secure storage and rotation for signing keys
List API endpoints that need tenant isolation checks; only once per request cycle is required for caching or optimization

Coditect’s stack needs clear separation between Google IAM (for Workstations) and app-level JWT/RBAC (for tenants/projects). Below is a concise mapping and design.


IAM roles and permissions per component

1. Coditect Control Plane (Django backend) Service account: coditect-control-plane-sa

  • Project-level roles on the GCP project(s) hosting Workstations:
    • roles/workstations.admin for creating/managing workstation clusters/configs (if Coditect manages infra).373374
    • Or narrower:
      • roles/workstations.workstationCreator to create Workstations from configs.374375
      • roles/workstations.workstationUser if only starting/stopping and connecting.375374
  • Possibly roles/iam.serviceAccountUser on coditect-workstation-agent-sa if Workstations run agents with that SA and need to impersonate.376374

2. Workstation Agent (per VM) Service account: coditect-workstation-agent-sa

  • Minimal roles:
    • roles/logging.logWriter to send logs to Cloud Logging.
    • roles/monitoring.metricWriter if directly pushing metrics.
  • No direct Workstations API access needed; it only talks to the Control Plane via gRPC.

3. CI / Infra automation Service account: coditect-infra-sa

  • roles/workstations.admin to create/update Workstations clusters/configs.
  • roles/iam.serviceAccountAdmin only if managing service accounts for agents.

4. Human users

  • Google IAM roles for direct Workstations usage (if ever used outside Coditect):
    • Typically roles/workstations.user or roles/workstations.workstationUser mapped to groups, but ideally humans only interact via the Coditect UI.

JWT token flow and refresh strategy

Claims (access token)

  • Standard: sub, iat, exp, iss.
  • Custom:
    • tenant_id
    • user_id
    • project_ids (or current project)
    • roles (tenant/global: ["tenant_admin", "project_admin", "developer"])
    • Optional workstation_id when bound to a session.

Flow

  1. User logs in via SSO/OIDC; Django maps identity to tenant_id and roles.
  2. Django issues a short-lived access JWT (e.g., 15–30 minutes) signed with HS256/RS256.
  3. React TS frontend attaches this JWT in Authorization: Bearer header for API calls.
  4. Django validates token and uses claims for RBAC and tenant isolation checks per request.

Refresh

  • Maintain a longer-lived refresh token (HTTP-only, Secure cookie) mapped server-side to user/device.
  • When access token is near expiry, frontend calls /auth/refresh; Django:
    • Validates refresh token.
    • Issues new access JWT with updated claims (e.g., changed roles/permissions).
  • Immediate revocation: server-side invalidation list keyed by refresh token ID; access JWTs naturally expire soon.
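A minimal sketch of this issue/verify/refresh pattern, using a stdlib HMAC-SHA256 JWT purely for illustration. The function names and the in-memory revocation set are hypothetical; in production the signing key would come from KMS, revocation state would live in Redis, and you would use a vetted library (e.g., PyJWT) rather than hand-rolled encoding:

```python
import base64, hashlib, hmac, json, time

SIGNING_KEY = b"dev-secret"   # illustration only; load from KMS in production
ACCESS_TTL = 30 * 60          # 30-minute access tokens
revoked_refresh_ids = set()   # server-side revocation list (Redis in production)

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_access_token(user_id, tenant_id, roles):
    """Short-lived HS256 JWT carrying Coditect's custom claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {"iss": "coditect", "sub": user_id, "iat": now, "exp": now + ACCESS_TTL,
              "tenant_id": tenant_id, "user_id": user_id, "roles": roles}
    signing_input = _b64(json.dumps(header).encode()) + "." + _b64(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64(sig)

def verify(token):
    head_b64, claims_b64, sig_b64 = token.split(".")
    expected = hmac.new(SIGNING_KEY, (head_b64 + "." + claims_b64).encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64(expected), sig_b64):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(claims_b64 + "=" * (-len(claims_b64) % 4)))
    if claims["exp"] < time.time():
        raise PermissionError("expired")
    return claims

def refresh(refresh_token_id, user_id, tenant_id, roles):
    # Immediate revocation: reject refresh tokens on the server-side list.
    if refresh_token_id in revoked_refresh_ids:
        raise PermissionError("refresh token revoked")
    # New access token picks up current claims (roles may have changed).
    return issue_access_token(user_id, tenant_id, roles)
```

The sketch only demonstrates the claim set and the revocation flow; RS256 with a kid header (covered under key rotation below) is the better production choice.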

RBAC role hierarchy (users, tenants, projects)

Structure: tenant-scoped roles + project-scoped roles.

Tenant-level roles

  • tenant_owner
    • Full management of tenant settings, billing, all projects and Workstations within tenant.
  • tenant_admin
    • Manage projects, users, licenses; cannot change billing/legal.
  • tenant_auditor
    • Read-only access to logs, audit, and project configs.

Project-level roles

  • project_admin
    • Manage project membership, settings, Workstation configs for that project.
    • Can create/destroy sandboxes and adjust quotas within limits set by tenant.
  • developer
    • Create/exec/destroy sandboxes within project.
    • Access project repo, autosave, AI tools (Claude/Gemini) according to policies.
  • viewer
    • Read-only access to logs, code (if allowed), no sandbox execution.

Role mapping and evaluation

  • JWT contains both tenant and project roles, e.g.:
{
  "tenant_id": "t-acme",
  "user_id": "u-alice",
  "tenant_roles": ["tenant_admin"],
  "project_roles": {
    "proj-foo": ["project_admin"],
    "proj-bar": ["developer"]
  }
}
  • On each request, Django:
    • Checks tenant-level role for tenant-scoped endpoints (user management, workstation config).
    • Checks project role for project-scoped endpoints (sandboxes, autosave, AI runs).

Hierarchy:

  • tenant_owner ⊇ tenant_admin ⊇ {project_admin, developer, viewer}
  • project_admin ⊇ {developer, viewer}
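One way to evaluate this hierarchy in the Django layer is to expand each granted role into the set of roles it implies before checking. A sketch with illustrative names (`effective_project_roles` and `can` are not existing Coditect code):

```python
# Role implication tables mirroring the hierarchy above.
TENANT_IMPLIES = {
    "tenant_owner": {"tenant_owner", "tenant_admin", "tenant_auditor"},
    "tenant_admin": {"tenant_admin", "tenant_auditor"},
    "tenant_auditor": {"tenant_auditor"},
}
PROJECT_IMPLIES = {
    "project_admin": {"project_admin", "developer", "viewer"},
    "developer": {"developer", "viewer"},
    "viewer": {"viewer"},
}

def effective_project_roles(claims: dict, project_id: str) -> set:
    """Expand JWT claims into the effective role set for one project."""
    roles = set()
    for r in claims.get("project_roles", {}).get(project_id, []):
        roles |= PROJECT_IMPLIES[r]
    # tenant_owner/tenant_admin sit above project_admin in the hierarchy.
    for r in claims.get("tenant_roles", []):
        if r in ("tenant_owner", "tenant_admin"):
            roles |= PROJECT_IMPLIES["project_admin"]
    return roles

def can(claims: dict, project_id: str, required: str) -> bool:
    return required in effective_project_roles(claims, project_id)

claims = {
    "tenant_id": "t-acme",
    "tenant_roles": ["tenant_admin"],
    "project_roles": {"proj-foo": ["project_admin"], "proj-bar": ["developer"]},
}
```

Precomputing implication sets keeps per-request checks to set membership, which fits the once-per-request caching pattern described later.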

Secure storage and rotation for signing keys

Key types

  • Access/refresh token signing keys (JWT).
  • mTLS certs/keys for gRPC between Control Plane and Agents.

Storage

  • Store JWT signing keys in a managed KMS (e.g., Google Cloud KMS) and never embed them in images/env vars.
    • Django uses KMS to sign/verify tokens or loads keys from KMS at startup with caching.
  • Store mTLS certs/keys in:
    • Secret manager or KMS, distributed to Workstations via startup scripts or Workstation images.

Rotation

  • JWT signing:
    • Use a key ID (kid) in JWT header and maintain a keyset (current + previous).
    • Rotate keys periodically (e.g., every 90 days) by introducing new key, updating keyset, and invalidating old one once old tokens expire.
  • mTLS certs:
    • Issue short-lived certs per agent (e.g., via internal CA or GCP CA Service).
    • Implement automated renewal and hot-reload on agents and control plane.
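The kid-based keyset above can be sketched as follows (keyset contents and the `rotate` helper are illustrative; real keys would live in KMS and rotation would be an operational workflow, not an in-process mutation):

```python
import hashlib, hmac

# Verification accepts current + previous keys so tokens signed before a
# rotation remain valid until they naturally expire.
KEYSET = {
    "2026-01": b"new-signing-key",   # current
    "2025-10": b"old-signing-key",   # previous, kept until old tokens expire
}
CURRENT_KID = "2026-01"

def sign(payload: bytes, kid: str = None):
    kid = kid or CURRENT_KID
    return kid, hmac.new(KEYSET[kid], payload, hashlib.sha256).digest()

def verify(payload: bytes, kid: str, sig: bytes) -> bool:
    key = KEYSET.get(kid)
    if key is None:          # kid already rotated out -> reject
        return False
    return hmac.compare_digest(hmac.new(key, payload, hashlib.sha256).digest(), sig)

def rotate(new_kid: str, new_key: bytes) -> None:
    """Introduce a new current key; keep only the immediately previous one."""
    global CURRENT_KID
    KEYSET[new_kid] = new_key
    for kid in list(KEYSET):
        if kid not in (new_kid, CURRENT_KID):
            del KEYSET[kid]
    CURRENT_KID = new_kid
```

The same pattern applies whether the kid selects an HMAC secret (as here) or an RSA public key fetched from a JWKS endpoint.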

Endpoints requiring tenant isolation checks (once per request)

These are endpoints where you must validate tenant and project from JWT against the resource, but can then cache the result for the duration of the request to avoid repeated lookups:

  • Auth-related endpoints
    • POST /auth/refresh (validates tenant/user but no heavy tenant lookup).
  • Tenant-scoped management
    • GET /tenants/{tenant_id}
    • GET/POST /tenants/{tenant_id}/users
    • GET/POST /tenants/{tenant_id}/workstations/configs
    • GET/POST /tenants/{tenant_id}/projects
  • Project-scoped execution and resources
    • POST /projects/{project_id}/sandboxes
    • POST /projects/{project_id}/sandboxes/{id}/exec
    • DELETE /projects/{project_id}/sandboxes/{id}
    • GET /projects/{project_id}/sandboxes (list)
    • GET/POST /projects/{project_id}/autosave/* (branch info, promotion)
    • POST /projects/{project_id}/ai/claude-run
    • POST /projects/{project_id}/ai/gemini-run

Pattern:

  1. At the beginning of each request, Django:
    • Extracts JWT; verifies signature and expiry.
    • Resolves tenant + project membership and roles once (potentially cached in Redis per token ID).
  2. Stores a request-scoped context (tenant, projects, roles).
  3. Downstream handlers and database calls rely on this context instead of re-checking tenant isolation repeatedly.

This keeps tenant isolation robust and explicit while minimizing redundant checks on a per-request basis.
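A Django-flavoured sketch of the once-per-request pattern, with a hypothetical `TenantContext` attached to the request object (the middleware class and field names are illustrative, not existing Coditect code):

```python
from dataclasses import dataclass, field

@dataclass
class TenantContext:
    tenant_id: str
    user_id: str
    roles: dict = field(default_factory=dict)   # project_id -> [roles]

class TenantIsolationMiddleware:
    """Resolve JWT claims once per request; handlers reuse request.tenant_ctx."""

    def __init__(self, get_response, verify_jwt):
        self.get_response = get_response
        self.verify_jwt = verify_jwt            # signature + expiry check

    def __call__(self, request):
        token = request.headers["Authorization"].removeprefix("Bearer ")
        claims = self.verify_jwt(token)
        # One isolation resolution per request; downstream code reads this
        # context instead of re-checking tenant membership.
        request.tenant_ctx = TenantContext(
            tenant_id=claims["tenant_id"],
            user_id=claims["user_id"],
            roles=claims.get("project_roles", {}),
        )
        return self.get_response(request)
```

In a real deployment `verify_jwt` would also consult the Redis cache keyed by token ID mentioned above, so repeated requests with the same token skip the membership lookup entirely.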


ANYTHING ELSE WE SHOULD EXPAND ON?

Two areas are worth expanding next, given your goals and stack:


1. Security / abuse-resistance of sandboxes

For Coditect-as-SaaS, you likely want a more explicit security model for:

  • Threat modeling
    • Enumerate attacker types (malicious tenant, compromised agent, model-generated exploit) and what gVisor + VM boundaries do or do not protect against.
  • Network controls inside Workstations
    • Egress controls per sandbox (DNS, HTTP, cloud metadata access), ideally via iptables/eBPF or a local proxy.
  • File system & secrets isolation
    • Exact pattern for which secrets are visible to sandboxes (e.g., no direct cloud creds; only tightly scoped per-project tokens).
  • Abuse detection
    • Metrics/heuristics for detecting abuse: long-running CPU-bound loops, port scans, mass outbound requests, etc., and how the control plane auto-kills or bans.

You could capture this as a Security Design Addendum to the SDD/TDD, tying gVisor’s documented guarantees and limitations into concrete policies.


2. Developer UX and AI workflow integration

You’ve defined infra and autosave; two UX layers would benefit from a design pass:

  • Session model and UI semantics
    • How Coditect represents “I’m working in project X on Workstation Y with sandboxes Z1–Zn” in the React UI and APIs.
    • Standard flows: start session, resume session, sync autosave -> feature branch, run tests via Claude/Gemini, review and promote changes.
  • AI tool contracts
    • Define a small, explicit “AI tool” API surface (for Claude Code / Gemini) that describes:
      • What they can read/write (files, branches).
      • How they request sandboxes (e.g., a “run tests” tool always goes through /projects/{id}/sandboxes).
    • Logging / provenance format tying AI actions to Git commits and sandbox executions for audit and explainability.

If you want to go deeper, good next prompts would be:

  • “Design a security threat model and concrete mitigations for Coditect’s gVisor-based sandboxes on Google Cloud Workstations.”
  • “Design the user-facing ‘project session’ model and API for Coditect (sessions, sandboxes, autosave, and AI runs) and how it’s represented in the React TS frontend.”


Two concrete extensions make sense: a Security Design Addendum (threat model + mitigations + abuse detection) and a Project Session model (how users see sessions/sandboxes/autosave/AI runs).


1. Security Design Addendum: gVisor sandboxes on Cloud Workstations

1.1 Threat model

Assets

  • Tenant code and data in /workspaces/<tenant>/<user>/<project>.
  • Coditect secrets (GitHub tokens, model keys, internal APIs).
  • Workstation VMs and underlying GCP project.
  • Control plane (Django, DB, audit logs).

Adversaries

  • Malicious tenant user (tries to escape sandbox, exfiltrate data or abuse compute).
  • Compromised user account (legit user’s JWT stolen).
  • Malicious or buggy AI-generated code (infinite loops, network abuse).
  • Compromised Workstation (agent host taken over).

Trust boundaries

  • gVisor sandbox boundary between untrusted workload and Workstation kernel.
  • VM boundary between Workstations and other GCP workloads.
  • mTLS + RBAC between Control Plane and Agents.

1.2 Concrete mitigations

Sandbox isolation

  • All agent code runs in containers with --runtime=runsc, --network=none (or very constrained egress), --read-only rootfs, and fixed CPU/memory limits.
  • Each sandbox mounts only its project workspace and ephemeral scratch; no host paths, no Docker socket.
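For illustration, the hardened `docker run` invocation the Agent would build might look like this. The exact flag set is an assumption; adjust to your Docker and gVisor versions (the workspace path follows the /workspaces/<tenant>/<user>/<project> convention used elsewhere in this design):

```python
def sandbox_run_args(tenant, user, project, sandbox_id, image, cmd,
                     cpus=1.0, memory_mib=2048):
    """Build the argv for a hardened gVisor sandbox container."""
    workspace = f"/workspaces/{tenant}/{user}/{project}"
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",                 # gVisor; never fall back to runc
        "--network=none",                  # default-deny egress
        "--read-only",                     # immutable rootfs
        f"--cpus={cpus}", f"--memory={memory_mib}m",
        "--tmpfs", "/tmp:rw,size=256m",    # ephemeral scratch only
        "-v", f"{workspace}:/workspace:rw",  # only the project workspace
        "--label", f"coditect.tenant_id={tenant}",
        "--label", f"coditect.sandbox_id={sandbox_id}",
        image, *cmd,
    ]
```

The Agent would pass this argv to its container runtime wrapper (e.g., subprocess); pinning `--runtime=runsc` in code, rather than relying on a Docker default, enforces the "no fall-back to runc" policy.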

Secrets & identity

  • No cloud credentials or GitHub tokens inside sandbox by default; AI tools and Git operations are brokered via Coditect backend.
  • Per-project, scoped tokens if absolutely needed (e.g., Git LFS or artifact fetch).

Network controls

  • Default network=none for most sandboxes; “networked” sandboxes use:
    • Egress proxy with allowlists (GitHub, package registries).
    • Egress quotas (requests/hour) and rate limiting per tenant.
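Egress quotas of this kind are commonly implemented as a token bucket per tenant, debited by the egress proxy on each outbound request. A minimal sketch (class name and limits are illustrative):

```python
import time

class EgressBucket:
    """Per-tenant token bucket: one token per outbound request."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False        # over quota: proxy returns 429 or drops the connection
```

The proxy keeps one bucket per (tenant, sandbox) key; a sustained burst that drains the bucket is also a useful abuse signal to feed into the heuristics below.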

Abuse detection metrics/heuristics

Collected via MetricsReport from agent + host-level firewalls/logs:

  • CPU abuse
    • High CPU utilization over threshold (e.g., >80% of core) for >N seconds with no I/O.
    • Many sandboxes at or near CPU limit for same tenant.
    • Mitigation:
      • Hard per-sandbox CPU time limit.
      • Tenant-level CPU budget (vCPU-seconds per hour); auto-throttle or reject new sandboxes when exceeded.
  • Memory abuse
    • Repeated OOM kills by same tenant or sandbox pattern.
    • Rapid growth of memory usage without progress signals (no logs).
    • Mitigation:
      • Strict mem limits per sandbox; repeated OOM → cool-down for tenant/project.
  • Network abuse (for allowed-network sandboxes)
    • High rate of outbound connections to distinct IPs (port scan signature).
    • Large outbound volume to non-approved domains.
    • Mitigation:
      • Egress proxy detecting port scans / connection bursts.
      • Auto-kill sandbox on detection; temporarily block tenant from networked sandboxes.
  • Filesystem abuse
    • Excessive writes (GiB/min) or inode creation in workspace or scratch.
    • Mitigation:
      • Quotas on workspace volume size and inode count.
      • Kill sandboxes exceeding thresholds; alert.
  • Command behavior heuristics
    • Detect repeated fork bombs, suspicious binaries, or known exploit toolchains via process monitoring inside the sandbox (as far as gVisor allows), plus signatures in stdout/stderr.

Automated responses

  • Sandbox-level:
    • Hard kill (SIGKILL) + mark run as “abuse suspected”.
    • Lock that sandbox ID and do not permit further execs.
  • Project-level:
    • Temporary throttle (e.g., max 1 concurrent sandbox for 30 minutes).
    • Require manual approval for networked sandboxes.
  • Tenant-level:
    • If multiple projects trigger abuse heuristics within a time window, soft-ban network access or sandbox creation, pending admin review.

All actions logged to audit logs with tenant/project/user/sandbox IDs for post-incident review.
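The graduated sandbox → project → tenant escalation can be sketched as a small policy function (thresholds, window, and return values are illustrative):

```python
import time
from collections import defaultdict

ABUSE_WINDOW = 3600          # look-back window in seconds
TENANT_BAN_THRESHOLD = 2     # distinct projects triggering abuse in the window

events = defaultdict(list)   # tenant_id -> [(timestamp, project_id)]

def record_abuse(tenant_id, project_id, now=None):
    """Record an abuse signal and return the escalation decision."""
    now = time.time() if now is None else now
    events[tenant_id].append((now, project_id))
    recent = {p for (t, p) in events[tenant_id] if now - t <= ABUSE_WINDOW}
    if len(recent) >= TENANT_BAN_THRESHOLD:
        return "tenant_soft_ban"     # block new sandboxes pending admin review
    return "project_throttle"        # e.g., max 1 concurrent sandbox for 30 min
```

The sandbox itself is always hard-killed before this function runs; the return value only governs what additional project- or tenant-level restriction the control plane applies and logs.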


2. Project Session model and API (React TS + Django)

2.1 Conceptual model

Entities

  • Tenant: organization.
  • Project: codebase + configuration (Git repo, AI tools enabled, quotas).
  • Session: a developer’s active workspace in a project, bound to a Workstation and one or more sandboxes.
  • Sandbox: a gVisor-backed execution environment inside the Workstation.
  • AI Run: an invocation of Claude or Gemini on a project (code edits or analysis).
  • Autosave: background Git snapshots in autosave/... branches.

2.2 REST/GraphQL API shape

Sessions

  • POST /projects/{project_id}/sessions
    • Creates or attaches to a session; returns session_id, workstation info, active sandboxes.
  • GET /projects/{project_id}/sessions/{session_id}
    • Returns current state: Workstation, sandboxes, autosave status, active AI runs.
  • DELETE /projects/{project_id}/sessions/{session_id}
    • Ends session (may leave Workstation running but cleans up sandboxes/autosave processes).

Sandboxes

  • POST /projects/{project_id}/sessions/{session_id}/sandboxes
    • Create sandbox; Django calls CreateSandbox on relevant Agent.
  • POST /projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}/exec
    • Start an exec; returns stream token or WebSocket URL for front-end to attach.
  • DELETE /projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}
    • Destroy sandbox.
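The endpoints above compose into a typical "run tests" lifecycle. This sketch simply enumerates the ordered calls a client would issue (paths as defined above; transport, auth headers, and response handling omitted):

```python
def run_tests_flow(project_id: str, session_id: str, sandbox_id: str):
    """Ordered (method, path) calls for one start->exec->teardown cycle."""
    return [
        ("POST",   f"/projects/{project_id}/sessions"),
        ("POST",   f"/projects/{project_id}/sessions/{session_id}/sandboxes"),
        ("POST",   f"/projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}/exec"),
        ("DELETE", f"/projects/{project_id}/sessions/{session_id}/sandboxes/{sandbox_id}"),
        ("DELETE", f"/projects/{project_id}/sessions/{session_id}"),
    ]
```

Each call carries the same JWT, so the per-request tenant/project isolation check described earlier runs once per step, with the exec step returning a stream token or WebSocket URL for log attachment.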

Autosave

  • GET /projects/{project_id}/autosave/status
    • Summarizes autosave branches and last snapshot time per user/session.
  • POST /projects/{project_id}/autosave/promote
    • Promotes autosave branch into a feature branch (e.g., squash merge) and opens PR.

AI runs

  • POST /projects/{project_id}/ai/claude-run
    • Body: task description, scope (files), optional session/sandbox IDs.
  • POST /projects/{project_id}/ai/gemini-run
    • Same shape.
  • GET /projects/{project_id}/ai/runs/{run_id}
    • Status, logs, diff summary, linked sandbox execs, and autosave commits.

Each endpoint enforces tenant/project isolation based on JWT once per request at the Django layer, as discussed earlier.

2.3 React TS frontend representation

State model (per user)

  • currentTenant, currentProject.
  • currentSession {id, workstation, sandboxes[], autosaveSummary, aiRuns[]}.
  • Derived UI states:
    • Session status: CONNECTING, ACTIVE, IDLE, ENDED.
    • Sandbox list: each with status (CREATING, RUNNING, IDLE, ERROR), last exec, CPU/mem utilization.
    • Autosave indicator: last snapshot time, branch, and Git status compared to main.
    • AI activity: list of recent runs with statuses (RUNNING, SUCCEEDED, FAILED).

UI affordances

  • “Open Project Session” button → POST /sessions → shows Workstation IDE URL and Coditect panel.
  • “Run in Sandbox” buttons associated with:
    • Test commands (pytest/go test).
    • Build commands.
    • Arbitrary shell commands.
  • A “Sandboxes” pane showing each sandbox’s logs (via WebSocket/stream connection to ExecSandbox forwarded by Django).
  • “Autosave” indicator (e.g., green when autosave branch is healthy and up-to-date, orange when paused due to conflicts).
  • “AI Assist” panel where Claude/Gemini suggestions are tied to AI runs and autosave commits (each suggestion lists the Git commit/branch produced).

Session lifecycle flows

  • Start: user picks project → frontend calls POST /sessions → backend provisions Workstation (if needed) and returns session.
  • Work: user edits code (via Workstation IDE), triggers tests and AI tools through Coditect UI; autosave daemon pushes to Git.
  • Pause/Resume: session can be resumed if Workstation still alive; Coditect uses session ID to rehydrate state (sandboxes, autosave, AI runs).
  • End: session ends, sandboxes are destroyed; Workstation may be stopped or left for reuse based on policy.

If you want, next steps could be:

  • A dedicated Security Design Addendum document outline (sections for threat model, STRIDE analysis, policies per risk) filled in for this architecture.
  • A React TS interface contract (types/interfaces) for Session, Sandbox, AIRun, and AutosaveSummary that you can drop straight into your front-end code.


Below is a compact Security Design Addendum outline (pre-filled for Coditect on Cloud Workstations + gVisor) and a React TS interface contract you can paste into your FE codebase.


Security Design Addendum (Coditect + Cloud Workstations + gVisor)

1. Purpose and scope

  • Document security posture and controls for Coditect’s gVisor-based sandboxes running inside Google Cloud Workstations.
  • Extend SDD/TDD with explicit threat model, STRIDE analysis, and policies for sandboxed agent execution.

Applies to:

  • Django control plane + React TS frontend.
  • Workstations VMs and Coditect Agent.
  • gVisor runsc sandboxes for untrusted agent code.
  • Git autosave and AI (Claude/Gemini) tooling.

2. Assets

  • Code & data: project repositories, configuration, secrets in .coditect/, autosave branches.
  • Identity & auth: JWTs, refresh tokens, user/tenant/project mappings.
  • Infra: Workstations VMs, gVisor runtime, Coditect Agents, Control Plane, DB, logs.
  • Third-party credentials: GitHub tokens, AI model keys, any per-tenant API keys.

3. Trust boundaries

  • Browser ↔ Django: HTTPS, JWT-based auth; browser untrusted.
  • Django ↔ Workstation Agent: gRPC over mTLS; only Coditect Control Plane may call agents.
  • Agent ↔ sandbox: Docker + gVisor runsc runtime; sandbox is untrusted code, separated from host kernel.
  • Workstation VM ↔ GCP project: hypervisor isolation; Workstations managed by Cloud Workstations controller.

4. STRIDE analysis (per threat, with mitigations/policies)

4.1 Spoofing

Risks:

  • Attacker impersonates a user or control-plane service.
  • Rogue client tries to talk directly to Workstation Agent.

Mitigations:

  • User auth: SSO/OIDC → short-lived access JWTs; refresh tokens in HTTP-only cookies; per-tenant RBAC enforced server-side.
  • Service auth: mTLS between Control Plane and Agents, with CA-issued certs and SANs (spiffe://coditect/control-plane vs .../workstation/<id>).
  • Agents reject any non-mTLS or invalid cert; only accept control-plane CN/SAN.

Policies:

  • Tokens: access tokens ≤30 min; refresh tokens revocable server-side.
  • Regular rotation of certs and JWT signing keys via KMS/CA.

4.2 Tampering

Risks:

  • Malicious sandbox modifies files outside workspace or tampers with other sandboxes.
  • Attacker alters logs or audit records.

Mitigations:

  • gVisor sandbox: untrusted code runs with --runtime=runsc, read-only rootfs, only /workspace volume mounted.
  • No hostPath or Docker socket mounts; each sandbox has its own container filesystem.
  • Central, append-only audit log in Control Plane; sandboxes cannot access it.

Policies:

  • All sandbox containers must use a hard-coded runsc runtime; no fall-back to runc for untrusted workloads.
  • Control Plane rejects any attempt to run execs on containers not labeled as Coditect-owned sandboxes.

4.3 Repudiation

Risks:

  • Users deny having run specific code or AI actions; incidents lack attribution.

Mitigations:

  • Detailed audit logs: user_id, tenant_id, project_id, sandbox_id, Workstation ID, exec commands, AI tool used, timestamps.
  • AI runs tied to autosave commits and Git author metadata.

Policies:

  • Audit events are immutable, stored in an append-only log or WORM-capable storage.
  • Any admin action manipulating sandboxes or Workstations is logged with actor ID.

4.4 Information disclosure

Risks:

  • Sandbox reads secrets or code belonging to other projects/tenants.
  • Sandbox exfiltrates data over network.

Mitigations:

  • Workspace isolation: each sandbox only mounts /workspaces/<tenant>/<user>/<project> and ephemeral scratch.
  • No global filesystem or /home mount; no cloud metadata access.
  • Default network=none or strict outbound allowlist with egress proxy.
  • Secrets kept out of sandbox: GitHub tokens, AI keys live in Control Plane; any external calls happen via backend, not directly from sandbox.

Policies:

  • Any network-enabled sandbox is tied to project policy and tenant risk level; logs of outbound requests with rate limits.
  • No direct DB or internal service endpoints exposed in sandbox environment.

4.5 Denial of Service

Risks:

  • Infinite loops / CPU bombs.
  • Memory bombs, fork bombs.
  • Port scans or outbound floods.

Mitigations:

  • gVisor with cgroup CPU/mem limits per container; enforced timeout_seconds and idle_timeout_seconds.
  • Quota service: per-tenant limits on concurrent sandboxes, vCPU-seconds, memory, and networked sandbox count.
  • Abuse heuristics:
    • CPU >80% for >N seconds with no output → flagged.
    • Repeated OOMs / process restarts → auto-kill and cool-down.
    • Outbound connection patterns matching port scans → immediate kill, tenant throttling.

Policies:

  • Sandbox is auto-terminated upon exceeding CPU-time/memory or triggering heuristics; tenant may be temporarily banned from new sandboxes based on configurable thresholds.

4.6 Elevation of privilege

Risks:

  • Sandbox escapes gVisor to host Workstation.
  • Compromised Workstation tries to impersonate Control Plane.

Mitigations:

  • gVisor: user-space kernel intercepting syscalls, reducing host attack surface.
  • Each Workstation runs under a restricted service account with minimal GCP IAM permissions.
  • Control Plane authenticates agent identity via cert SANs and uses allowlists of agent IDs; a compromised Workstation cannot impersonate another.

Policies:

  • Regularly update gVisor per the production guide and CVEs.
  • Periodic security scans of Workstation images; limit installed tooling.

React TS interface contract

You can drop this into a types/session.ts or similar.

// Session, Sandbox, AI Run, AutosaveSummary types for Coditect FE

export type SessionStatus = 'CONNECTING' | 'ACTIVE' | 'IDLE' | 'ENDING' | 'ENDED' | 'ERROR';

export interface Session {
  id: string;
  tenantId: string;
  projectId: string;
  userId: string;

  workstationId: string;
  workstationName?: string;
  workstationRegion?: string;

  status: SessionStatus;
  createdAt: string; // ISO 8601
  lastActiveAt: string; // ISO 8601

  sandboxes: Sandbox[];
  autosave: AutosaveSummary | null;
  aiRuns: AIRunSummary[];
}

export type SandboxStatus = 'CREATING' | 'RUNNING' | 'IDLE' | 'TERMINATING' | 'TERMINATED' | 'FAILED';

export interface Sandbox {
  id: string;
  sessionId: string;
  tenantId: string;
  projectId: string;
  userId: string;

  image: string;
  runtime: 'gvisor';
  cpuLimit: number; // cores
  memoryLimitMiB: number;

  status: SandboxStatus;
  createdAt: string;
  lastExecAt?: string;

  // Live metrics snapshot (optional, can be filled from metrics stream)
  metrics?: SandboxMetrics;
}

export interface SandboxMetrics {
  cpuSeconds: number;
  cpuPercent?: number;
  memoryPeakMiB: number;
  bytesRead: number;
  bytesWritten: number;
  lastUpdatedAt: string;
}

export type AIRunStatus = 'QUEUED' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'CANCELLED';

export interface AIRunSummary {
  id: string;
  sessionId: string;
  projectId: string;
  userId: string;

  modelProvider: 'claude' | 'gemini';
  modelName: string;

  status: AIRunStatus;
  createdAt: string;
  completedAt?: string;

  // human-readable summary for UI
  title: string;
  description?: string;

  // link to code changes produced
  autosaveBranch?: string;
  autosaveCommitId?: string;
}

export interface AutosaveSummary {
  tenantId: string;
  projectId: string;
  userId: string;
  ticketId?: string;

  branchName: string; // e.g. autosave/PROJ-1234/acme/foo/alice/ws-01/20260114
  lastCommitId: string;
  lastCommitAt: string;

  // is autosave loop healthy
  status: 'HEALTHY' | 'PAUSED' | 'ERROR';

  // optional human description for UI
  statusMessage?: string;
}

This gives you:

  • Session as the top-level unit tying together Workstation, sandboxes, autosave, and AI runs.
  • Sandbox as a gVisor-backed execution unit with metrics.
  • AIRunSummary for UI lists and details panes.
  • AutosaveSummary for status indicators and promotion flows.

Sandboxes

In addition to the Function interface, Modal has a direct interface for defining containers at runtime and securely running arbitrary code inside them. This can be useful if, for example, you want to:

  • Execute code generated by a language model.
  • Create isolated environments for running untrusted code.
  • Check out a git repository and run a command against it, like a test suite, or npm lint.
  • Run containers with arbitrary dependencies and setup scripts.

Each individual job is called a Sandbox and can be created using the Sandbox.create constructor:

import modal

app = modal.App.lookup("my-app", create_if_missing=True)

sb = modal.Sandbox.create(app=app)

p = sb.exec("python", "-c", "print('hello')", timeout=3)
print(p.stdout.read())

p = sb.exec("bash", "-c", "for i in {1..10}; do date +%T; sleep 0.5; done", timeout=5)
for line in p.stdout:
    # Avoid double newlines by using end="".
    print(line, end="")

sb.terminate()

Note: you can run the above example as a script directly with python my_script.py. modal run is not needed here since there is no entrypoint. Sandboxes require an App to be passed when spawned from outside of a Modal container. You may pass in a regular App object or look one up by name with App.lookup. The create_if_missing flag on App.lookup will create an App with the given name if it doesn’t exist.

Lifecycle

Timeouts

Sandboxes have a default maximum lifetime of 5 minutes. You can change this by passing a timeout of up to 24 hours to the Sandbox.create(...) function.

sb = modal.Sandbox.create(app=my_app, timeout=10*60)  # 10 minutes

If you need a Sandbox to run for more than 24 hours, we recommend using Filesystem Snapshots to preserve its state, and then restore from that snapshot with a subsequent Sandbox.

Idle Timeouts

Sandboxes can also be automatically terminated after a period of inactivity by setting the idle_timeout parameter. A Sandbox is considered active if any of the following are true:

  • It has an active command running (via sb.exec(...))
  • Its stdin is being written to (via sb.stdin.write())
  • It has an open TCP connection over one of its Tunnels

Configuration

Sandboxes support nearly all configuration options found in regular modal.Functions. Refer to Sandbox.create for further documentation on Sandbox configs. For example, Images and Volumes can be used just as with functions:

sb = modal.Sandbox.create(
    image=modal.Image.debian_slim().pip_install("pandas"),
    volumes={"/data": modal.Volume.from_name("my-volume")},
    workdir="/repo",
    app=my_app,
)

Environments

Environment variables

You can set environment variables using inline secrets:

secret = modal.Secret.from_dict({"MY_SECRET": "hello"})

sb = modal.Sandbox.create(
    secrets=[secret],
    app=my_app,
)
p = sb.exec("bash", "-c", "echo $MY_SECRET")
print(p.stdout.read())

Custom Images

Sandboxes support custom images just as Functions do. However, while you’ll typically invoke a Modal Function with the modal run CLI, you typically spawn a Sandbox with a simple script call. As such, you may need to manually enable output streaming to see your image build logs:

image = modal.Image.debian_slim().pip_install("pandas", "numpy")

with modal.enable_output():
    sb = modal.Sandbox.create(image=image, app=my_app)

Dynamically defined environments

Note that any valid Image or Mount can be used with a Sandbox, even if those images or mounts have not previously been defined. This also means that Images and Mounts can be built from requirements at runtime. For example, you could use a language model to write some code and define your image, and then spawn a Sandbox with it. Check out devlooper for a concrete example of this.

Running a Sandbox with an entrypoint

In most cases, Sandboxes are treated as a generic container that can run arbitrary commands. However, in some cases, you may want to run a single command or script as the entrypoint of the Sandbox. You can do this by passing command arguments to the Sandbox constructor:

sb = modal.Sandbox.create("python", "-m", "http.server", "8080", app=my_app, timeout=10)
for line in sb.stdout:
    print(line, end="")

This functionality is most useful for running long-lived services that you want to keep running in the background. See the Jupyter notebook example for a more concrete example of this.

Referencing Sandboxes from other code

If you have a running Sandbox, you can retrieve it using the from_id method.

sb = modal.Sandbox.create(app=my_app)
sb_id = sb.object_id

# ... later in the program ...

sb2 = modal.Sandbox.from_id(sb_id)

p = sb2.exec("echo", "hello")
print(p.stdout.read())
sb2.terminate()

A common use case for this is keeping a pool of Sandboxes available for executing tasks as they come in. You can keep a list of object_ids of Sandboxes that are “open” and reuse them, closing over the object_id in whatever function is using them.

Logging

You can see Sandbox execution logs using the verbose option. For example:

sb = modal.Sandbox.create(app=my_app, verbose=True)

p = sb.exec("python", "-c", "print('hello')")
print(p.stdout.read())

with sb.open("test.txt", "w") as f:
    f.write("Hello World\n")

shows Sandbox logs:

Sandbox exec started: python -c print('hello')
Opened file 'test.txt': fd-yErSQzGL9sig6WAjyNgTPR
Wrote to file: fd-yErSQzGL9sig6WAjyNgTPR
Closed file: fd-yErSQzGL9sig6WAjyNgTPR

Named Sandboxes

You can assign a name to a Sandbox when creating it. Each name must be unique within an app; only one running Sandbox can use a given name at a time. Note that the associated app must be a deployed app. Once a Sandbox completely stops running, its name becomes available for reuse. Some applications find Sandbox Names useful for ensuring that no more than one Sandbox is running per resource or project. If a Sandbox with the given name is already running, create() will raise an error.

sb1 = modal.Sandbox.create(app=my_app, name="my-name")

# This will raise a modal.exception.AlreadyExistsError.
sb2 = modal.Sandbox.create(app=my_app, name="my-name")

A named Sandbox may be fetched from a deployed app using from_name(), but only if the Sandbox is currently running. If no running Sandbox is found, from_name() will raise an error.

my_app = modal.App.lookup("my-app", create_if_missing=True)
sb1 = modal.Sandbox.create(app=my_app, name="my-name")

# Returns the currently running Sandbox with the name "my-name" from the
# deployed app named "my-app".
sb2 = modal.Sandbox.from_name("my-app", "my-name")
assert sb1.object_id == sb2.object_id  # sb1 and sb2 refer to the same Sandbox

Sandbox Names may contain only alphanumeric characters, dashes, periods, and underscores, and must be shorter than 64 characters.

Tagging

Sandboxes can also be tagged with arbitrary key-value pairs. These tags can be used to filter results in Sandbox.list.

sandbox_v1_1 = modal.Sandbox.create("sleep", "10", app=my_app)
sandbox_v1_2 = modal.Sandbox.create("sleep", "20", app=my_app)

sandbox_v1_1.set_tags({"major_version": "1", "minor_version": "1"})
sandbox_v1_2.set_tags({"major_version": "1", "minor_version": "2"})

# All sandboxes.
for sandbox in modal.Sandbox.list(app_id=my_app.app_id):
    print(sandbox.object_id)

# Also all sandboxes.
for sandbox in modal.Sandbox.list(
    app_id=my_app.app_id,
    tags={"major_version": "1"},
):
    print(sandbox.object_id)

# Just the latest sandbox.
for sandbox in modal.Sandbox.list(
    app_id=my_app.app_id,
    tags={"major_version": "1", "minor_version": "2"},
):
    print(sandbox.object_id)

Modal’s Sandbox API gives a very clear, high-level contract for “containers defined at runtime” that Coditect’s gVisor-based Workstation model should emulate conceptually, even though the underlying implementation is different.427428429430

Here’s how to translate the key ideas into Coditect’s architecture you’ve been designing:


1. Core conceptual mapping

Modal concept → Coditect equivalent:

  • modal.App (deployment context, config namespace) → Coditect Project (and Tenant): holds sandbox policies, images, volumes, secrets, tags.
  • Sandbox.create(app=..., ...) → POST /projects/{project_id}/sandboxes → Control Plane chooses Workstation + Workstation Agent → docker run --runtime=runsc ... gVisor container.428429431427
  • sb.exec(...) returning process handles and streams → POST /sandboxes/{id}/exec (REST) + gRPC ExecSandbox stream between Control Plane and Agent, with stdout/stderr and exit info.432433434
  • Timeouts / idle timeouts (5 min default, up to 24h) → Coditect sandbox-level timeout_seconds + idle_timeout_seconds enforced by Agent and tracked centrally; gVisor container is killed by Control Plane when limits exceeded.429435436
  • Images, volumes, workdir, env, secrets → Image selection and volume mounts in CreateSandboxRequest (Docker+runsc args); environment variables plus secret injection handled by Agent but sourced from tenant-scoped secret stores.431428429
  • Named sandboxes (name=, uniqueness per app) and from_name → Named Coditect sandboxes (unique sandbox_name per project) and lookups via Control Plane DB, not direct to the runtime.
  • Tags and Sandbox.list(tags=...) → Labels/metadata on Coditect sandbox records, filterable via Control Plane API (and eventually CLI/UI).

2. Patterns worth explicitly copying

Even though Modal is closed-source, its behavior model is a good blueprint:

  • Short-lived, time-bounded sandboxes (5 min default, up to 24h) with explicit timeout and idle_timeout:
    • Coditect’s API should require both, with defaults and hard caps, and enforce them via Agent + Control Plane.
  • Exec as primary interaction:
    • Keep Coditect’s sandbox interface centered on exec commands (one or many) rather than building a full “VM” abstraction.
  • Image + runtime-defined dependencies:
    • Modal’s dynamic image building (e.g., Image.debian_slim().pip_install("pandas")) maps to a build pipeline in Coditect (Dockerfile builder, Nix, etc.); you can accept high-level “requirements” and use an internal builder to create images cached per project.436428
  • Secrets as env injection:
    • Use project/tenant-scoped secret references that the Agent resolves, never push raw secrets into the sandbox API surface.
  • Named and tagged sandboxes:
    • Names for uniqueness (“only one Jupyter/HTTP server per project”), tags for arbitrary attributes (version, tool type, ticket).

3. Where Coditect diverges (by design)

  • Runtime: Coditect uses gVisor (runsc) inside VMs (Workstations) rather than a proprietary serverless microVM fleet; isolation is still improved over plain containers, but you control the whole life-cycle from inside your Workstations.430437438427428429
  • Control plane: Django + React + gRPC, not Modal’s infra; you can tailor quotas, RBAC, and multi-tenant policies tightly to your SaaS model.
  • Autosave and Git integration: Coditect’s autosave branches and worktrees are first-class; each sandbox’s work is automatically reflected in Git in near-real time, which Modal does not provide out of the box.439440441442
  • Security policy layer: you’re layering explicit abuse detection (CPU, network, filesystem) and tenant-level throttling on top of the sandbox abstraction, using metrics the Agent and Workstations can expose.435443444445446447

4. Suggested next focused spec

If you want to turn this Modal-Sandbox mental model directly into Coditect docs/specs, a good next artifact would be:

“Coditect Sandbox API: a Modal-Sandbox–inspired spec for Create/Exec/Terminate, including timeout/idle semantics, named sandboxes, tags, and a clear mapping to gVisor containers on Cloud Workstations.”

That spec would:

  • Enumerate fields like image, volumes, env, secrets, timeout, idle_timeout, name, tags, and workdir.
  • Define exact behaviors for timeouts, idle detection, and reattachment (like from_id / from_name).
  • Tie them to the Workstation Agent’s gRPC API and to your RBAC/tenant model.


Below is a “Coditect Sandbox API” spec, directly modeled on Modal Sandboxes but mapped to your gVisor-on-Workstations implementation, RBAC, and gRPC agent.


1. Overview

Coditect Sandboxes provide runtime-defined containers inside Google Cloud Workstations, backed by gVisor (runsc), for executing untrusted or model-generated code per tenant/project.448449450451452

Core capabilities:

  • Create an isolated sandbox (container) with configurable image, volumes, env, secrets, tags, timeout, idle_timeout, workdir, and optional name.
  • Execute commands (exec) with interactive stdin, streamed stdout/stderr, and timeouts.453454455456
  • Terminate sandboxes cleanly or forcefully.
  • Reattach to running sandboxes by id or name within a project.

All calls are project-scoped, tenant-isolated via JWT+RBAC, and enforced by the Django control plane.


2. REST API surface (high level)

Base path: /v1/projects/{project_id}

2.1 Create sandbox

POST /v1/projects/{project_id}/sandboxes

Request body

{
  "name": "optional-unique-name",
  "image": "ghcr.io/coditect/runtime:py310",
  "volumes": {
    "/workspace": "project-workspace",
    "/data": "project-datasets-ro"
  },
  "workdir": "/workspace",
  "env": {
    "PYTHONUNBUFFERED": "1"
  },
  "secrets": [
    "hf-token-readonly"
  ],
  "timeout_seconds": 300,
  "idle_timeout_seconds": 60,
  "tags": {
    "ticket": "PROJ-1234",
    "tool": "tests",
    "tenant": "acme"
  },
  "runtime": "gvisor"
}

Fields

  • name (optional):
    • Unique per project among running sandboxes.
    • Characters: [A-Za-z0-9._-], length < 64.
    • If a running sandbox with this name exists, creation fails with 409 AlreadyExists.
  • image: container image (string).
  • volumes: map mountPath → volume ID; e.g. project-workspace → /workspaces/<tenant>/<user>/<project>.
  • workdir: default working directory for execs.
  • env: static environment variables.
  • secrets: secret reference IDs (resolved by control plane, not by agent).
  • timeout_seconds: max wall-clock lifetime in seconds, 1–86400 (default 300).
  • idle_timeout_seconds: idle timeout (no active exec/stdin/tunnel) before auto-termination.
  • tags: arbitrary key/value strings, used for filtering and audit.
  • runtime: must be "gvisor" for this implementation.
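The field rules above can be sketched as a small validator (illustrative helper; the real control plane would implement this in a DRF serializer or similar):

```python
import re

# Name charset and length rule from the spec: [A-Za-z0-9._-], shorter than 64 chars.
NAME_RE = re.compile(r"^[A-Za-z0-9._-]{1,63}$")

def validate_create_request(body: dict) -> list[str]:
    """Return a list of validation errors for a CreateSandbox payload."""
    errors = []
    if not body.get("image"):
        errors.append("image is required")
    name = body.get("name")
    if name is not None and not NAME_RE.match(name):
        errors.append("name must match [A-Za-z0-9._-] and be shorter than 64 chars")
    timeout = body.get("timeout_seconds", 300)
    if not (1 <= timeout <= 86400):
        errors.append("timeout_seconds must be in 1-86400")
    if body.get("runtime", "gvisor") != "gvisor":
        errors.append('runtime must be "gvisor"')
    return errors
```

A failing validation maps to 400 Bad Request; name uniqueness (409) is a separate DB-level check at create time.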

Response

{
  "id": "sb-uuid",
  "project_id": "proj-uuid",
  "tenant_id": "tenant-uuid",
  "name": "optional-unique-name",
  "image": "ghcr.io/coditect/runtime:py310",
  "status": "RUNNING",
  "created_at": "2026-01-15T07:51:00Z",
  "timeout_seconds": 300,
  "idle_timeout_seconds": 60,
  "tags": {
    "ticket": "PROJ-1234",
    "tool": "tests"
  }
}

2.2 Exec in sandbox

POST /v1/projects/{project_id}/sandboxes/{sandbox_id}/exec

Request

{
  "command": ["python", "-m", "pytest"],
  "workdir": "/workspace",
  "timeout_seconds": 120,
  "pty": false
}

Behavior

  • Creates a logical exec session under the sandbox.
  • Control plane opens a gRPC ExecSandbox stream to the Agent and returns a stream token or WebSocket URL to the client.454455456453

Response

{
  "exec_id": "exec-uuid",
  "stream_url": "wss://api.coditect.io/v1/streams/exec/exec-uuid"
}

Client then:

  • Connects to stream_url.
  • Sends stdin chunks and control messages (if interactive).
  • Receives stdout/stderr and exit events in a JSON or protobuf-encoded stream.

2.3 Terminate sandbox

DELETE /v1/projects/{project_id}/sandboxes/{sandbox_id}

  • Gracefully terminates the sandbox (SIGTERM then SIGKILL); cleans up container.
  • Returns 204 on success.

2.4 Lookup by name / list with tags

  • GET /v1/projects/{project_id}/sandboxes?name=foo
    • Returns currently running sandbox with that name or 404.
  • GET /v1/projects/{project_id}/sandboxes?tags[ticket]=PROJ-1234&tags[tool]=tests
    • Returns list of matching sandboxes (aggregated from control-plane DB).

This mirrors Sandbox.from_name and Sandbox.list(tags=...) semantics.


3. Timeout and idle semantics

3.1 Total timeout

  • timeout_seconds defines the max wall-clock lifetime from sandbox creation to termination.
  • Enforced by Control Plane and Agent:
    • Control Plane stores created_at + timeout_seconds as deadline_at.
    • Background janitor / per-sandbox timer kills the sandbox when now ≥ deadline.

3.2 Idle timeout

A sandbox is considered active if any of:

  • An ExecSandbox session is running (ExecStatus not DONE/ERROR).
  • There is recent stdin activity (last ExecStdin within N seconds).
  • (Future) There is an open tunnel connection.

If none of the above for idle_timeout_seconds, Control Plane marks sandbox as idle and:

  • Sends a DestroySandbox to Agent.
  • Marks status TERMINATED with reason IDLE_TIMEOUT.

3.3 Exec-level timeout

  • timeout_seconds passed in exec request acts as per-command timeout.
  • The sandbox stays alive, but the exec process is killed when the exec deadline is reached.
  • Exec exit event: exit_code non-zero with reason = "TIMEOUT".

4. Reattachment semantics (from_id / from_name)

4.1 by id

  • GET /v1/projects/{project_id}/sandboxes/{sandbox_id}
    • Returns sandbox metadata if still running; 404 if terminated or never existed.
  • POST /v1/projects/{project_id}/sandboxes/{sandbox_id}/exec
    • Always allowed if sandbox status == RUNNING and within timeout.

4.2 by name

  • Names are unique among running sandboxes per project.
  • GET /v1/projects/{project_id}/sandboxes?name=<name> → returns the running sandbox with that name or 404.

This provides a from_name analog without exposing the underlying runtime’s object IDs.


5. Mapping to Workstation Agent gRPC

The REST API above is implemented via the Workstation Agent gRPC service defined earlier.455456457453

  • POST /sandboxes → SandboxAgent.CreateSandbox(CreateSandboxRequest)
    • Maps:
      • image → Docker image.
      • volumes → -v mount flags, workspace paths.
      • env → -e env flags.
      • timeout_seconds → stored in Control Plane, not passed directly to Docker.
      • idle_timeout_seconds → stored in Control Plane (timer logic).
  • POST /sandboxes/{id}/exec → SandboxAgent.ExecSandbox (bidirectional stream):
    • Control Plane initiates stream, sends ExecStart (command, workdir, timeout, pty), then handles stdin/ctrl.
    • Agent streams stdout/stderr/events.
  • DELETE /sandboxes/{id} → SandboxAgent.DestroySandbox:
    • docker rm -f coditect-sb-<sandbox_id> with runsc runtime.458459448

RBAC and tenancy:

  • Django reads JWT, performs tenant/project RBAC checks once per REST request, and only then calls Agent gRPC with tenant_id, project_id, user_id in metadata.
  • Agent does not perform multi-tenant checks; it trusts the Control Plane identity via mTLS, but uses the metadata for logging and metrics attribution.460461462

6. RBAC and multi-tenant constraints

  • Only roles with sandbox:create may call POST /projects/{project_id}/sandboxes.
  • Only roles with sandbox:execute may exec; sandbox:destroy to terminate.
  • name and tags are always scoped to (tenant_id, project_id); two tenants can have sandboxes named my-name without conflict.
  • Quotas (per tenant/project) enforced at Control Plane before CreateSandbox:
    • Max concurrent sandboxes.
    • Max total vCPU/memory for running sandboxes.

7. Example JSON schemas (abridged)

Sandbox

{
  "id": "sb-uuid",
  "project_id": "proj-uuid",
  "tenant_id": "tenant-uuid",
  "name": "jupyter-dev",
  "image": "ghcr.io/coditect/runtime:py310",
  "runtime": "gvisor",
  "status": "RUNNING",
  "created_at": "2026-01-15T07:51:00Z",
  "timeout_seconds": 7200,
  "idle_timeout_seconds": 300,
  "tags": {
    "ticket": "PROJ-1234",
    "tool": "jupyter"
  }
}

Exec event stream (over WebSocket)

{ "type": "status", "phase": "STARTING", "message": "exec started" }
{ "type": "stdout", "data": "Running tests...\n" }
{ "type": "stderr", "data": "warning: deprecation\n" }
{ "type": "exit", "exit_code": 0, "reason": "NORMAL" }
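A client consuming this newline-delimited event stream could use a small dispatcher like the following (sketch; a real client would read frames from the WebSocket rather than a list):

```python
import json

def consume_exec_stream(lines):
    """Collect stdout/stderr text and the final exit event from an exec stream."""
    out, err, exit_event = [], [], None
    for line in lines:
        event = json.loads(line)
        if event["type"] == "stdout":
            out.append(event["data"])
        elif event["type"] == "stderr":
            err.append(event["data"])
        elif event["type"] == "exit":
            exit_event = event
            break  # the stream ends after the exit event
        # "status" events are informational; ignore them here
    return "".join(out), "".join(err), exit_event
```

This mirrors how the React frontend would render live logs: append stdout/stderr chunks as they arrive, then surface exit_code/reason.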

This spec gives you a clear, Modal-like contract at the Coditect API layer while remaining faithful to your gVisor-on-Workstations implementation, gRPC agent model, and RBAC/tenancy requirements. 463464465466467468469470471


Draft API endpoints and payloads for Create/Exec/Terminate sandbox

Specify timeout and retry semantics for sandbox operations
Define authorization and RBAC for sandbox API calls
Describe observability metrics and logging for sandboxes
List error codes and failure handling behaviors for runsc sandboxes

Below is a concise API spec you can drop into Coditect docs.


1. REST API endpoints and payloads

Base path: /v1/projects/{project_id}

All requests require Authorization: Bearer <JWT> with tenant/project claims.

1.1 Create sandbox

POST /v1/projects/{project_id}/sandboxes

Request body

{
  "name": "optional-unique-name",
  "image": "ghcr.io/coditect/runtime:py310",
  "volumes": {
    "/workspace": "project-workspace",
    "/data": "datasets-ro"
  },
  "workdir": "/workspace",
  "env": {
    "PYTHONUNBUFFERED": "1"
  },
  "secrets": [
    "hf-token-readonly"
  ],
  "timeout_seconds": 300,
  "idle_timeout_seconds": 60,
  "tags": {
    "ticket": "PROJ-1234",
    "tool": "tests"
  },
  "runtime": "gvisor"
}

Response 201

{
  "id": "sb-uuid",
  "project_id": "proj-uuid",
  "tenant_id": "tenant-uuid",
  "name": "optional-unique-name",
  "image": "ghcr.io/coditect/runtime:py310",
  "runtime": "gvisor",
  "status": "RUNNING",
  "created_at": "2026-01-15T07:52:00Z",
  "timeout_seconds": 300,
  "idle_timeout_seconds": 60,
  "tags": {
    "ticket": "PROJ-1234",
    "tool": "tests"
  }
}

1.2 Exec in sandbox

POST /v1/projects/{project_id}/sandboxes/{sandbox_id}/exec

Request

{
  "command": ["python", "-m", "pytest"],
  "workdir": "/workspace",
  "timeout_seconds": 120,
  "pty": false
}

Response 200

{
  "exec_id": "exec-uuid",
  "stream_url": "wss://api.coditect.io/v1/streams/exec/exec-uuid"
}
  • Client connects to stream_url for bidirectional streaming of stdin/stdout/stderr (via WebSocket or SSE proxying gRPC).472473474475

1.3 Terminate sandbox

DELETE /v1/projects/{project_id}/sandboxes/{sandbox_id}

Response

  • 204 No Content on success.

Optional: force=true query param for immediate SIGKILL semantics.


2. Timeout and retry semantics

2.1 Sandbox lifetime

  • timeout_seconds (create request): max wall-clock lifetime from creation to termination.
    • Range: 1–86400 seconds; default 300.
    • Enforced by Control Plane timers and Agent; when exceeded, sandbox is killed and marked TERMINATED with reason TIMEOUT.

2.2 Idle timeout

  • idle_timeout_seconds: max period with no activity (no running exec, no stdin, no tunnel) before auto-terminate.
    • If idle period exceeds this, sandbox is terminated and marked TERMINATED with reason IDLE_TIMEOUT.

2.3 Exec timeout

  • timeout_seconds in exec request: max duration for that command.
    • If exceeded: command killed; sandbox remains; exec result has reason = "TIMEOUT".

2.4 Retries (API-level)

  • Create
    • Clients should treat 429 / 503 as transient and may retry with exponential backoff (jittered), but not automatically on 4xx other than 409 (name conflict).
  • Exec
    • If exec call fails before a stream is established (connection error), client may retry once; after ExecStart has been accepted, treat stream errors as terminal and surface to user.473476
  • Terminate
    • Idempotent: repeated DELETE on already-terminated sandbox returns 204.

3. Authorization and RBAC

All sandbox endpoints are project-scoped and require JWT with tenant_id and project_roles.

3.1 Required roles

Per project:

  • POST /projects/{project_id}/sandboxes → sandbox:create
  • POST /projects/{project_id}/sandboxes/{sandbox_id}/exec → sandbox:execute
  • DELETE /projects/{project_id}/sandboxes/{sandbox_id} → sandbox:destroy
  • GET /projects/{project_id}/sandboxes* → sandbox:view

Typical role mapping:

  • project_admin → {sandbox:create, sandbox:execute, sandbox:destroy, sandbox:view}
  • developer → {sandbox:create, sandbox:execute, sandbox:view}
  • viewer → {sandbox:view}

3.2 Enforcement

On each request:

  1. Django decodes JWT, validates signature/expiry, and extracts tenant_id, project_roles.477478
  2. Verifies that project_id in path belongs to tenant_id in token.
  3. Checks required permission for the endpoint against project_roles[project_id].
  4. Only then calls Workstation Agent gRPC with tenant_id, project_id, user_id in metadata over mTLS.479480481

No direct user calls to agents are allowed.
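Steps 1-3 reduce to a permission check over the JWT claims; a minimal sketch, assuming a project_roles claim shaped like the role mapping above (signature validation is already done by step 1):

```python
# Role → permission sets from the role mapping in section 3.1.
PERMISSIONS = {
    "project_admin": {"sandbox:create", "sandbox:execute",
                      "sandbox:destroy", "sandbox:view"},
    "developer": {"sandbox:create", "sandbox:execute", "sandbox:view"},
    "viewer": {"sandbox:view"},
}

def authorize(claims: dict, project_id: str, permission: str) -> bool:
    """True iff the token's role on this project grants the required permission."""
    role = claims.get("project_roles", {}).get(project_id)
    return role is not None and permission in PERMISSIONS.get(role, set())
```

On failure, the endpoint returns 403 (or 404 if the project is outside the token's tenant) without ever contacting the Agent.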


4. Observability: metrics and logging

4.1 Metrics (per sandbox and per tenant)

Collected via Agent and ReportMetrics stream, then exported (Prometheus/Cloud Monitoring):482483484

Per-sandbox:

  • sandbox_cpu_seconds_total{sandbox_id,tenant_id,project_id}
  • sandbox_memory_peak_mib{sandbox_id,...}
  • sandbox_io_bytes_read_total{...}
  • sandbox_io_bytes_written_total{...}
  • sandbox_executions_total{sandbox_id,...,status="success|failure|timeout"}
  • sandbox_lifetime_seconds{sandbox_id,...}

Per tenant/project:

  • Aggregated CPU seconds, memory usage, and exec counts for quota enforcement and cost reporting.

4.2 Logs

  • All stdout/stderr chunks streamed through ExecSandbox are optionally mirrored to a central logging system with labels:
    • tenant_id, project_id, sandbox_id, exec_id, workstation_id.
  • Sandbox lifecycle events (create/destroy/timeout/idle-kill) and errors logged as structured events.

4.3 Audit log

For each sandbox action:

{
  "event_type": "sandbox.exec",
  "timestamp": "2026-01-15T07:53:00Z",
  "tenant_id": "t-acme",
  "project_id": "proj-foo",
  "sandbox_id": "sb-uuid",
  "user_id": "user-uuid",
  "workstation_id": "ws-01",
  "command": ["python", "-m", "pytest"],
  "result": {
    "status": "success",
    "exit_code": 0,
    "duration_ms": 10234
  },
  "resources": {
    "cpu_seconds": 4.2,
    "memory_peak_mib": 512
  }
}

5. Error codes and failure behaviors (runsc/gVisor sandboxes)

Control Plane translates low-level Docker/runsc errors into API-level responses.485486487488

5.1 Create sandbox

  • 400 Bad Request
    • Invalid payload (missing image, invalid timeout range, invalid name).
  • 401 Unauthorized / 403 Forbidden
    • Missing/invalid JWT or insufficient RBAC.
  • 404 Not Found
    • project_id not found or not accessible to tenant.
  • 409 Conflict
    • Sandbox with name already running in this project.
  • 422 Unprocessable Entity
    • Image not found / invalid image / unsupported runtime.
  • 429 Too Many Requests
    • Tenant/project quotas exceeded (concurrent sandboxes or CPU).
  • 500 Internal Server Error
    • Unexpected Docker/runsc error; message logged, generic message returned.

Common runsc-related issues are surfaced as 422/500; internal error details are logged but redacted from the API response.487488489
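The translation layer might be a simple classifier over the agent-reported error (the error strings matched here are illustrative; a real implementation would match typed gRPC status codes from the Agent):

```python
def create_error_to_status(error: str) -> int:
    """Map a low-level create failure to the API status codes above."""
    msg = error.lower()
    if "no such image" in msg or "manifest unknown" in msg:
        return 422  # image not found / invalid image
    if "name" in msg and "already" in msg:
        return 409  # duplicate running name in this project
    if "quota" in msg:
        return 429  # tenant/project quota exceeded
    return 500      # unexpected Docker/runsc error: log details, return generic error
```

Keeping this mapping in one place makes it easy to audit which internal failures leak which status codes.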

5.2 Exec

  • 404 Not Found
    • Sandbox ID does not exist or is terminated.
  • 409 Conflict
    • Sandbox not in RUNNING state.
  • 422 Unprocessable Entity
    • Invalid command/args.
  • 504 Gateway Timeout
    • Exec setup exceeded a platform timeout before process start (e.g., Docker hung).

On-stream errors:

  • If the process exits due to a gVisor or Docker error, the client receives an exit event with reason="INTERNAL_ERROR" and a generic exit_code.

5.3 Terminate

  • 204 No Content
    • Sandbox terminated or was already gone.
  • 404 Not Found
    • Sandbox never existed (e.g., wrong ID).
  • 500 Internal Server Error
    • Agent failed to remove container; control plane marks sandbox as ERROR and alerts ops.

This spec gives you a crisp, implementable contract for Coditect Sandboxes aligned with your gVisor Workstation architecture, gRPC agent model, and tenant RBAC.


SUGGESTIONS

Three concise directions to deepen this design and make it “build-ready”:


1. Formalize everything into a single “Sandbox RFC”

  • Consolidate the SDD/TDD/ADR + Sandbox API spec into one internal RFC that:
    • Starts with the Modal mental model and explicitly maps each concept to Coditect.
    • Includes finalized OpenAPI 3.1 schema, gRPC proto, and RBAC matrix.
    • Has an implementation plan (phased rollout: “single-tenant pilot → multi-tenant beta → public GA”).
  • Use that RFC as the canonical doc for onboarding new engineers and aligning infra / product.

2. Implementation spike plan

Define 2–3 short spikes with clear goals:

  1. Workstation + gVisor spike
    • Bring up a single Workstation, install runsc, run a minimal agent, and prove:
      • Create/Exec/Terminate work reliably inside gVisor.
      • Basic metrics (CPU/mem) flow back to a dev control plane.
  2. End-to-end sandbox exec demo
    • From React → Django → Agent → gVisor → back to React:
      • Run pytest in a sandbox on a small demo repo.
      • Show streamed logs and autosave commits hitting a GitHub test org.
  3. Autosave + AI run integration
    • Wire a simple Claude/Gemini “run tests and fix failures” flow that:
      • Modifies code, runs sandboxed tests, and creates autosave commits.

Each spike should produce working code plus a short retro doc capturing surprises (especially gVisor quirks).


3. Hardening & operability checklist

Before committing this as the core Coditect runtime, line up a short checklist:

  • Compatibility matrix: which languages/tools are officially supported under gVisor (Python, Node, Go, Rust toolchains) and known caveats.
  • Abuse thresholds: concrete default limits for CPU, mem, network, sandboxes per tenant, and clear docs on what gets auto-killed vs just logged.
  • Runbooks:
    • “Sandbox stuck / cannot be terminated.”
    • “Workstation agent unreachable.”
    • “Tenant hits abuse heuristics repeatedly.”
  • Feature flags:
    • Ability to toggle gVisor vs runc per tenant/project for debugging.
    • Ability to roll back sandbox features without breaking dev workflows.

If you share which part you want to productize first (e.g., “test runner as a service” vs “full REPL/dev shell”), the next iteration can be a very focused build plan for that slice.


1. Formalize everything into a single “Sandbox RFC”


Below is a tight skeleton you can treat as the Sandbox RFC v1 and extend as needed.


1. Title, authors, status

  • Title: Coditect Sandbox Runtime on Google Cloud Workstations
  • Authors: <names>
  • Status: Draft v1
  • Reviewers: Platform, Infra, Security, Frontend
  • Date: 2026‑01‑15

2. Overview & goals

  • Goal: Provide a Modal‑like Sandbox API and runtime for executing untrusted and AI‑generated code per tenant/project, implemented on Google Cloud Workstations using gVisor (runsc) containers.490491492493
  • Non‑goals:
    • Providing general serverless compute to external services.
    • Replacing all CI/CD; initial focus is dev/test workflows and AI agents.

3. Mental model: Modal → Coditect mapping

Modal concept → Coditect equivalent:

  • App → Tenant + Project configuration namespace
  • Sandbox.create() → POST /projects/{project_id}/sandboxes
  • sb.exec() → POST /.../sandboxes/{id}/exec + gRPC ExecSandbox
  • timeout, idle_timeout → Sandbox lifetime + idle timers in Control Plane
  • Named sandboxes → name field, unique per project while running
  • Tags → tags map on sandbox metadata
  • from_id / from_name → GET /sandboxes/{id} + GET /sandboxes?name=

Under the hood, Modal uses proprietary infra; Coditect uses gVisor containers inside Workstations VMs, orchestrated via a Workstation Agent gRPC service.492493494495496


4. Architecture summary (SDD in brief)

  • Control Plane (Django + React)
    • Exposes /v1/projects/{id}/sandboxes REST API.
    • Handles JWT auth, tenant/project RBAC, quotas, and audit logs.
    • Maintains sandbox metadata and routes calls to Workstations.
  • Google Cloud Workstations
    • Per‑team Workstation VMs in our GCP projects.491490
    • Each VM runs Docker/containerd with runsc installed as an alternative runtime (--runtime=runsc).493497498
  • Coditect Agent (per Workstation)
    • gRPC server implementing SandboxAgent (CreateSandbox, ExecSandbox, DestroySandbox, ReportMetrics).499500501502
    • On Create: docker run --runtime=runsc with mounts/env/limits.494495492493
    • On Exec: docker exec wrapped in gRPC bidirectional stream; forwards stdout/stderr, handles stdin and timeouts.
    • On Destroy: docker rm -f sandbox container.
  • Git + autosave
    • Per user/project/ticket, autosave branches and worktrees push snapshots to GitHub for recovery and review.503504505506
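The Agent's CreateSandbox step (the docker run --runtime=runsc call above) can be sketched as argv construction; the exact flag choices are illustrative and would really be derived from CreateSandboxRequest:

```python
def docker_run_args(sandbox_id: str, image: str, workspace_path: str,
                    env: dict, cpu_limit: float, memory_limit_mib: int) -> list:
    """Build the docker run argv for a gVisor-backed sandbox container."""
    args = [
        "docker", "run", "-d",
        "--runtime=runsc",                      # gVisor isolation
        "--name", f"coditect-sb-{sandbox_id}",  # matches DestroySandbox's rm -f target
        "--cpus", str(cpu_limit),
        "--memory", f"{memory_limit_mib}m",
        "-v", f"{workspace_path}:/workspace",
    ]
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    args += [image, "sleep", "infinity"]        # idle until the first exec arrives
    return args
```

The Agent would pass this argv to subprocess (never a shell string), then run docker exec inside the same container for each ExecSandbox stream.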

5. API surface (OpenAPI 3.1 sketch)

Paths

  • POST /v1/projects/{project_id}/sandboxes – create sandbox.
  • GET /v1/projects/{project_id}/sandboxes – list, filter by name or tags.
  • GET /v1/projects/{project_id}/sandboxes/{sandbox_id} – details.
  • POST /v1/projects/{project_id}/sandboxes/{sandbox_id}/exec – start exec, return exec_id + stream_url.
  • DELETE /v1/projects/{project_id}/sandboxes/{sandbox_id} – terminate sandbox.

Core schema fields (Create body)

  • name?: string (optional, unique while running)
  • image: string
  • volumes?: { [mountPath: string]: string }
  • workdir?: string
  • env?: { [key: string]: string }
  • secrets?: string[] (secret IDs)
  • timeout_seconds?: number (default 300, max 86400)
  • idle_timeout_seconds?: number (default 60)
  • tags?: { [key: string]: string }
  • runtime?: "gvisor"

Exec request:

  • command: string[]
  • workdir?: string
  • timeout_seconds?: number
  • pty?: boolean

You already have most of this from earlier; the RFC just freezes it as the canonical API.


6. gRPC proto (TDD in brief)

Service

service SandboxAgent {
  rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse);
  rpc ExecSandbox(stream ExecRequest) returns (stream ExecEvent);
  rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse);
  rpc ReportMetrics(stream MetricsReport) returns (MetricsAcknowledge);
}

Key messages

  • CreateSandboxRequest: sandbox_id, tenant_id, project_id, image, workspace_path, cpu_limit, memory_limit_mib, idle_timeout_seconds, env.
  • ExecRequest / ExecEvent: ExecStart (command, workdir, timeout, pty), ExecStdout, ExecStderr, ExecExit, ExecStatus.
  • MetricsReport: sandbox_id, cpu_seconds, memory_peak_mib, bytes_read, bytes_written.

gRPC is mTLS‑protected; Control Plane identity and tenant metadata are carried in TLS and metadata, not user‑originating JWTs.502507508509


7. RBAC matrix

Per project:

Role → Permissions:

  • project_admin → create/exec/destroy/list sandboxes; manage quotas
  • developer → create/exec/list sandboxes
  • viewer → list/get sandboxes

Permissions:

  • sandbox:create → POST /projects/{id}/sandboxes
  • sandbox:execute → POST /projects/{id}/sandboxes/{sb}/exec
  • sandbox:destroy → DELETE /projects/{id}/sandboxes/{sb}
  • sandbox:view → GET /projects/{id}/sandboxes*

Tenant owners/admins can adjust project quotas and sandbox policies; RBAC evaluation happens once per request at Django.


8. Timeout, idle, and reattachment semantics

  • timeout_seconds: sandbox max lifetime; Control Plane enforces via deadline timers, Agent kills container when exceeded.
  • idle_timeout_seconds: auto‑terminate when no exec/stdin/tunnel activity for this period.
  • Exec calls have their own per‑exec timeout_seconds, with exit reason TIMEOUT.
  • Reattachment:
    • GET /sandboxes/{id} → lookup by ID (if status == RUNNING).
    • GET /sandboxes?name=... → unique by name per project while running; 404 otherwise.

9. Observability and security policies (summary)

  • Metrics: CPU/mem/I/O per sandbox and aggregated per tenant for quotas.510511512
  • Logs: stdout/stderr streams centralised with sandbox/tenant/project labels.
  • Abuse detection: CPU loops, OOM patterns, network scans; auto‑terminate sandbox and rate‑limit/bucket tenants.513514515
  • gVisor guarantees and limits documented; all untrusted workloads must use runsc.495496516492493494

10. Implementation plan (phased rollout)

Phase 0 – Prototype (single tenant, non‑prod)

  • Target: internal “dogfood” tenant.
  • Deliveries:
    • Single Workstation pool with runsc installed.497517493
    • Minimal SandboxAgent implementation + Control Plane integration for Create/Exec/Terminate.
    • Basic metrics and logs.

Phase 1 – Single‑tenant pilot (prod‑like)

  • Expand to 1–2 real tenants under feature flag.
  • Quotas and RBAC enforced; autosave branches wired in.504505506503
  • Abuse heuristics and kill logic in place (safeguards).

Phase 2 – Multi‑tenant beta

  • Hardened IAM and tenant isolation.518519520
  • Named sandboxes + tags, from_name-style lookups.
  • Per-tenant dashboards for resource usage.

Phase 3 – Public GA

  • Documented API, SLAs, and support runbooks.
  • Backwards compatibility guarantees for Sandbox API.
  • Optional additional runtimes (e.g., Wasm, later microVMs) hidden behind the same API.

This RFC skeleton should be the single canonical document in your repo (e.g., rfcs/000-sandbox-runtime.md), with links out to detailed OpenAPI, proto files, and TDD code stubs.


2. Implementation spike plan


Here’s a concise implementation spike plan you can plug into your backlog.


Spike 1 – Workstation + gVisor + Agent

Goal: Prove Create/Exec/Terminate works reliably via runsc on a single Workstation and that basic metrics flow to a dev control plane.

Scope

  • One Google Cloud Workstation (or equivalent GCE VM) in a dev project.521522
  • Docker/containerd configured with gVisor runsc.523524525
  • Minimal Go (or Rust) Agent implementing CreateSandbox, ExecSandbox, DestroySandbox, ReportMetrics over gRPC.526527528

Tasks

  1. Workstation and gVisor setup
    • Provision a single Workstation VM.522521
    • Install runsc and integrate as Docker runtime (runsc install, update daemon.json, restart Docker).524525523
    • Sanity check: docker run --runtime=runsc alpine echo hello.
  2. Agent skeleton
    • Implement gRPC server with:
      • CreateSandbox: docker run --runtime=runsc ... sleep infinity.
      • ExecSandbox: docker exec with stdout/stderr streaming over gRPC.528529530526
      • DestroySandbox: docker rm -f.
      • ReportMetrics: periodically emit fake metrics (or parse docker stats/cgroups).531532533
  3. Dev control plane stub
    • Simple CLI or small Django view to:
      • Call CreateSandbox, then ExecSandbox with echo hello, then DestroySandbox.
      • Print outputs and metrics.
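As a concrete sketch of what the Agent handlers and the control-plane stub would shell out to, the three lifecycle operations can be expressed as pure command builders (the image name and resource flags below are placeholder assumptions for the spike, not decided values):

```python
import shlex

SANDBOX_IMAGE = "python:3.12-slim"  # hypothetical default image for the spike

def create_cmd(sandbox_id: str, cpus: float = 1.0, mem: str = "2g") -> list[str]:
    """CreateSandbox: start a long-running container under the gVisor runtime."""
    return [
        "docker", "run", "-d",
        "--runtime=runsc",            # route the container through runsc
        f"--name=sb-{sandbox_id}",
        f"--cpus={cpus}",
        f"--memory={mem}",
        SANDBOX_IMAGE, "sleep", "infinity",
    ]

def exec_cmd(sandbox_id: str, argv: list[str]) -> list[str]:
    """ExecSandbox: run a command inside the sandbox (stdout/stderr streamed)."""
    return ["docker", "exec", f"sb-{sandbox_id}", *argv]

def destroy_cmd(sandbox_id: str) -> list[str]:
    """DestroySandbox: force-remove the container."""
    return ["docker", "rm", "-f", f"sb-{sandbox_id}"]

# The control-plane stub would issue create → exec → destroy in order, e.g.:
print(shlex.join(exec_cmd("demo01", ["echo", "hello"])))
```

Keeping command construction separate from execution makes the Agent's behavior unit-testable before any Workstation exists.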

Exit criteria

  • Can create a gVisor sandbox, run a trivial command, see output, terminate it without orphan containers.
  • CPU/mem metrics for that sandbox are visible in logs or a simple dashboard.
  • Retro doc:
    • gVisor quirks observed (e.g., runsc perf, filesystem behaviors).533534535
    • Workstation image/permissions gotchas.

Spike 2 – End-to-end sandbox exec demo (React → Django → Agent → gVisor → React)

Goal: Demonstrate a full-path workflow: user triggers tests in UI, tests run in a gVisor sandbox on a demo repo, logs stream back live.

Scope

  • Existing React TS frontend and Django backend.
  • One dev Workstation + Agent from Spike 1.
  • Demo repo (Python with pytest) in a GitHub test org.

Tasks

  1. Django integration
    • Implement REST endpoints:
      • POST /v1/projects/{project_id}/sandboxes (single image, hardcoded resource limits).
      • POST /.../sandboxes/{id}/exec → opens gRPC Exec stream, exposes WebSocket stream_url.
    • Hardcode a mapping project_id -> workstation_id for now.
  2. React integration
    • Add a “Run tests in sandbox” button for a demo project.
    • On click:
      • POST /sandboxes → get sandbox ID.
      • POST /sandboxes/{id}/exec with ["pytest", "-q"] → get stream_url.
      • Connect to stream_url and render stdout/stderr live in UI.
  3. Sandbox teardown
    • On exec completion, auto-terminate sandbox or leave it running and show “Terminate” button.
    • Ensure idle/total timeouts are configured (e.g., 300s total).

Exit criteria

  • From the browser, you can:
    • Click “Run tests”, see a sandbox created, pytest run, and logs stream in real-time.
    • Terminate sandbox and confirm container removed.
  • Retro doc:
    • Latency/UX observations (cold start, streaming behavior).529536528
    • Any gVisor compatibility issues running pytest and common dependencies.535537533

Spike 3 – Autosave + AI run integration (Claude/Gemini)

Goal: Wire a minimal AI-assisted workflow: “run tests and fix failures,” with code changes + sandboxed tests + autosave commits to GitHub.

Scope

  • Same demo project repo as Spike 2, forked into a GitHub test org.
  • Simple Claude/Gemini integration via existing Coditect AI layer.

Tasks

  1. Autosave plumbing

    • In the Workstation’s workspace:
      • Create autosave branch (e.g., autosave/PROJ-1234/demo/user/ws-01/20260115).
      • Use git worktree for autosave operations.538539540541
    • Implement a small autosave daemon:
      • Every 30 seconds if dirty: git add -A, git commit -m "autosave: ...", git push origin autosave/....
  2. AI “fix tests” action

    • In UI: button “Fix failing tests with AI.”
    • Flow:
      1. Run tests in sandbox; capture failing tests summary.
      2. Send repo context + failures + goal to Claude/Gemini.
      3. Apply model’s patch to workspace.
      4. Trigger autosave; run tests again in sandbox.
  3. Visibility in UI

    • Show list of recent autosave commits (branch + commit ID).
    • Show AI runs (model, status, link to autosave diff).
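The “fix tests” flow above can be sketched as a small loop with the sandbox runner and model call injected as plain callables, so the control flow is testable with fakes. All names here are assumptions for illustration, not the Coditect AI layer's real API:

```python
def fix_tests(run_tests, ask_model, apply_patch, autosave, max_rounds: int = 2):
    """One AI repair cycle: run_tests() -> (passed, failure_summary);
    ask_model(summary) -> patch; apply_patch and autosave are side-effecting
    hooks. Caps at max_rounds to avoid unbounded model loops."""
    passed, summary = run_tests()
    rounds = 0
    while not passed and rounds < max_rounds:
        patch = ask_model(summary)      # repo context + failures + goal -> patch
        apply_patch(patch)              # write model's changes into the workspace
        autosave()                      # autosave commit before re-running
        passed, summary = run_tests()   # re-run tests in a fresh sandbox
        rounds += 1
    return passed, rounds
```

Bounding the loop and autosaving before each re-run keeps every model attempt recoverable from the autosave branch.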

Exit criteria

  • End-to-end demo:
    • Break a test intentionally.
    • Click “Fix failing tests with AI.”
    • AI suggests changes; autosave commits appear in GitHub; sandboxed tests re-run and pass (for simple cases).
  • Retro doc:
    • Where gVisor, Git, and model behavior interacted poorly (e.g., file watchers, pytest under gVisor).533535
    • Edge cases in autosave (conflicts, push races).
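The autosave tick from Task 1 reduces to pure command construction, which also makes the push-race edge cases easier to reason about. A sketch assuming the branch layout shown above and a dirty check via `git status --porcelain` (empty output means clean):

```python
from datetime import datetime, timezone

def autosave_branch(ticket: str, project: str, user: str, workstation: str,
                    now: datetime) -> str:
    """Build the autosave branch name, e.g. autosave/PROJ-1234/demo/user/ws-01/20260115."""
    return f"autosave/{ticket}/{project}/{user}/{workstation}/{now:%Y%m%d}"

def autosave_cmds(branch: str, status_porcelain: str) -> list[list[str]]:
    """Return the git commands for one 30 s tick; empty list when the tree is clean."""
    if not status_porcelain.strip():
        return []  # nothing dirty, skip this tick
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", f"autosave: {branch}"],
        ["git", "push", "origin", branch],
    ]

branch = autosave_branch("PROJ-1234", "demo", "user", "ws-01",
                         datetime(2026, 1, 15, tzinfo=timezone.utc))
```

The daemon itself would run these via subprocess inside the git worktree dedicated to autosave, so pushes never disturb the user's checked-out branch.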

This gives you three very focused, demonstrable increments you can assign to small squads and use to de-risk gVisor + Workstations before scaling to full multi-tenant GA.


3. Hardening & operability checklist

Before committing this as the core Coditect runtime, line up a short checklist:

  • Compatibility matrix: which languages/tools are officially supported under gVisor (Python, Node, Go, Rust toolchains) and known caveats.
  • Abuse thresholds: concrete default limits for CPU, memory, network, and sandboxes per tenant, with clear docs on what gets auto-killed vs just logged.
  • Runbooks: “Sandbox stuck / cannot be terminated,” “Workstation agent unreachable,” “Tenant hits abuse heuristics repeatedly.”
  • Feature flags: ability to toggle gVisor vs runc per tenant/project for debugging, and ability to roll back sandbox features without breaking dev workflows.

Here’s a compact hardening & operability checklist you can turn into tickets.


1. Compatibility matrix (gVisor support)

Define a “supported stack” table backed by real tests under runsc.542543544545546547

  • Languages / runtimes
    • Python: 3.9–3.12, pip, pytest, uv, poetry.
    • Node.js: LTS versions, npm, yarn, pnpm.
    • Go: go toolchain, go test, go build.
    • Rust: rustc, cargo, rustfmt, clippy.
    • Shell: bash, zsh.
  • Tools and services
    • Git, SSH (if needed inside sandbox), curl/wget for limited networked sandboxes.
    • Databases: local sqlite only; external DB access only via backend (not from sandbox).
  • Known gVisor caveats (to document)
    • Potential perf overhead on syscall-heavy tools and file watchers.545546548
    • Limited / different behavior for some /proc and networking features.544548
    • GPU access: unsupported / experimental; GPU workloads not in v1.548549

Deliverable: a small “gVisor compatibility” page listing supported stacks and what’s not supported.


2. Abuse thresholds (defaults)

All thresholds are per sandbox and per tenant, with clear behaviors.

  • Per-sandbox defaults
    • CPU: 1 vCPU limit; auto-kill if CPU > 90% sustained for > 60s with no log output.
    • Memory: 1–2 GiB limit; kill on OOM; mark run as RESOURCE_EXCEEDED.
    • Runtime:
      • timeout_seconds default 300, max 86400.
      • idle_timeout_seconds default 60.
    • Network (if enabled):
      • Max connections/minute, max destinations/minute, domain allowlist only.
  • Per-tenant defaults
    • Max concurrent sandboxes: e.g., 5 (free), 10–20 (paid tiers).
    • vCPU-seconds/hour: soft limit with warnings at 80%, hard rejection at 100%.
    • Networked sandboxes: 0 by default; explicit opt-in for “networked tools”.
  • Behavior on breach
    • Sandbox-level breach → kill sandbox, log event, increment tenant abuse counter.
    • Tenant repeated breaches (e.g., 3 in 10 minutes) → throttle new sandboxes and require manual review.

Document this as a table (“What gets auto-killed vs just logged”) and expose a summary in the tenant admin UI.
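The repeated-breach rule (e.g., 3 in 10 minutes triggers throttling) can be sketched as a sliding-window counter per tenant; timestamps are passed in explicitly so the policy is deterministic and testable:

```python
from collections import defaultdict, deque

WINDOW_S = 600      # sliding window: 10 minutes
MAX_BREACHES = 3    # breaches in the window before throttling

class AbuseTracker:
    """Tracks sandbox-level breach events per tenant and decides when to
    throttle new sandboxes pending manual review."""

    def __init__(self):
        self._events = defaultdict(deque)   # tenant_id -> breach timestamps

    def record_breach(self, tenant_id: str, now: float) -> bool:
        """Record a breach; return True if the tenant should now be throttled."""
        q = self._events[tenant_id]
        q.append(now)
        while q and now - q[0] > WINDOW_S:  # drop events outside the window
            q.popleft()
        return len(q) >= MAX_BREACHES
```

The control plane would call `record_breach` whenever a sandbox is killed with `ABUSE_SUSPECTED`, and on a True result lower concurrency quotas and block networked sandboxes for that tenant.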


3. Runbooks

3.1 “Sandbox stuck / cannot be terminated”

Symptoms: sandbox status TERMINATING, but container still exists or exec hangs.

Steps:

  1. Control Plane calls DestroySandbox (graceful).
  2. If no success within N seconds, Agent attempts docker rm -f.
  3. If still stuck:
    • Mark sandbox ERROR; stop sending new execs.
    • Emit alert tagged with sandbox_id, workstation_id.
    • Provide operator script:
      • SSH into Workstation, inspect with docker ps, docker kill, runsc logs.550551
  4. If sandbox failures correlate with specific image/tool, mark that combination as “unsupported” until fixed.
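The escalation ladder in steps 1–3 can be expressed as a small function with the destroy attempts injected as callables (returning True on success), so the runbook logic is testable without Docker; all names here are illustrative:

```python
def terminate_stuck(graceful_destroy, force_remove, mark_error, alert) -> str:
    """Escalate a stuck sandbox: graceful DestroySandbox, then force-remove,
    then mark ERROR and page an operator. Returns the final state."""
    if graceful_destroy():
        return "TERMINATED"
    if force_remove():            # Agent falls back to `docker rm -f`
        return "TERMINATED"
    mark_error()                  # stop routing new execs to this sandbox
    alert()                       # alert tagged with sandbox_id, workstation_id
    return "ERROR"
```

Encoding the ladder once in the control plane keeps the human runbook and the automated behavior from drifting apart.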

3.2 “Workstation agent unreachable”

Symptoms: gRPC errors or health checks failing for Agent.

Steps:

  1. Control Plane health-checks agents periodically; mark Workstation UNHEALTHY after M failures.
  2. Stop scheduling new sandboxes to that Workstation.
  3. If Workstation likely dead:
    • Mark associated sandboxes UNKNOWN; show “session lost” in UI.
    • Allow users to restart sessions (new Workstation).
  4. Operator actions:
    • Check Workstation status in GCP Workstations console.552553
    • Restart or recreate Workstation; redeploy Agent.

3.3 “Tenant hits abuse heuristics repeatedly”

Symptoms: frequent auto-kills for CPU/network/file abuse.

Steps:

  1. Threshold: e.g., ≥5 sandbox kills with ABUSE_SUSPECTED in 30 minutes.
  2. Automatically:
    • Lower tenant’s sandbox concurrency / CPU quotas temporarily.
    • Block networked sandboxes for that tenant.
  3. Notify:
    • Send email / in-app warning to tenant admins with log summary.
  4. Operator decision:
    • Lift or escalate restrictions after review.

4. Feature flags

Implement feature flags in the control plane config / DB:

  • sandbox_runtime_mode per tenant/project
    • Values: gvisor, runc, auto.
    • Default gvisor; runc allowed only for internal debugging and marked as unsafe.
    • Agents accept a runtime hint but may restrict to runsc in prod.
  • sandbox_enabled per tenant/project
    • Globally disable the sandbox feature for specific tenants or environments without impacting other project features.
  • sandbox_api_version
    • For future API evolution; allow routing tenants to v1 or v2 semantics (e.g., new timeouts, new fields) while keeping old ones stable.
  • autosave_enabled and ai_tools_enabled per project
    • Allow rollout/rollback of autosave and AI runs independently of sandbox runtime.

This allows safe experimentation, targeted rollbacks, and gradual enablement across tenants without breaking core dev workflows.
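Flag resolution can follow a simple precedence rule (project overrides tenant, tenant overrides global defaults), with the prod safety rail that a runc hint is never honored. A minimal sketch, with flag storage reduced to plain dicts for illustration:

```python
# Global defaults; flag names match the list above.
DEFAULTS = {
    "sandbox_runtime_mode": "gvisor",
    "sandbox_enabled": True,
    "sandbox_api_version": "v1",
    "autosave_enabled": True,
    "ai_tools_enabled": True,
}

def resolve_flag(name: str, tenant_cfg: dict, project_cfg: dict, env: str = "prod"):
    """Resolve a feature flag: project setting > tenant setting > default."""
    value = project_cfg.get(name, tenant_cfg.get(name, DEFAULTS[name]))
    # Safety rail: outside internal debugging envs, agents pin the runtime to runsc.
    if name == "sandbox_runtime_mode" and env == "prod" and value == "runc":
        return "gvisor"
    return value
```

Making the prod override live in the resolver (and again in the Agent) means a misconfigured tenant record can never downgrade isolation in production.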

Footnotes

  1. https://pypi.org/project/modal/ 2

  2. https://github.com/modal-labs/modal-client/blob/main/pyproject.toml 2

  3. https://github.com/modal-labs/modal-client 2

  4. https://github.com/modal-labs/libmodal 2

  5. https://github.com/modal-labs/libmodal/

  6. https://modal.com/docs/guide/sandbox

  7. https://modal.com/products/sandboxes

  8. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution

  9. https://www.linkedin.com/posts/modal-labs_modal-sandboxes-are-generally-available-activity-7287589012498755585-ERGa

  10. https://x.com/modal_labs/status/1881778355043012885

  11. https://www.linkedin.com/posts/pawalt_modal-sandboxes-are-generally-available-activity-7287543122111135744-b5_b

  12. https://modal.com/docs/examples/jupyter_sandbox

  13. https://deps.dev/pypi/modal-client/0.49.2437

  14. https://github.com/modal-labs/modal-examples/blob/main/13_sandboxes/codelangchain/agent.py

  16. https://dzone.com/articles/serverless-iam-architecture-with-security-lessons

  17. https://www.ranthebuilder.cloud/post/14-aws-lambda-security-best-practices-for-building-secure-serverless-applications

  18. https://docs.cloud.google.com/kubernetes-engine/docs/best-practices/enterprise-multitenancy

  19. https://github.com/modal-labs/modal-client/blob/main/CHANGELOG.md

  20. https://stackoverflow.com/questions/76590131/error-while-build-ios-app-in-xcode-sandbox-rsync-samba-13105-deny1-file-w

  21. http://faculty.washington.edu/wlloyd/courses/tcss562_f2024/presentations/2024/team-3.pdf

  22. https://docs.aws.amazon.com/pt_br/emr/latest/EMR-Serverless-UserGuide/emr-serverless-user-guide.pdf

  23. https://inspect.aisi.org.uk/sandboxing.html

  24. https://www.luiscardoso.dev/blog/sandboxes-for-ai

  25. https://www.ikangai.com/the-complete-guide-to-sandboxing-autonomous-agents-tools-frameworks-and-safety-essentials/

  26. https://modal.com/docs/reference/modal.Sandbox

  27. https://modal.com/docs/guide/sandboxes

  28. https://github.com/cased/sandboxes

  29. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution

  30. https://modal.com/docs/examples/safe_code_execution

  31. https://northflank.com/blog/top-modal-sandboxes-alternatives-for-secure-ai-code-execution 2 3 4 5 6 7

  32. https://cased.com/blog/2025-10-05-sandboxes 2 3 4 5

  33. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6 7 8 9 10

  34. https://www.koyeb.com/blog/top-sandbox-code-execution-platforms-for-ai-code-execution-2025 2 3 4

  35. https://betterstack.com/community/comparisons/best-sandbox-runners/

  36. https://northflank.com/blog/top-vercel-sandbox-alternatives-for-secure-ai-code-execution-and-sandbox-environments

  37. https://www.runpod.io/articles/alternatives/modal

  38. https://instavm.io/blog/sandboxed-ai-code-execution-tools

  39. https://github.com/cased/sandboxes 2

  40. https://simonwillison.net/2026/Jan/6/a-field-guide-to-sandboxes-for-ai/

  41. https://modal.com/solutions/coding-agents

  42. https://luiscardoso.dev/blog

  43. https://developer.salesforce.com/docs/commerce/b2c-commerce/guide/b2c-manage-sb.html

  44. https://simonw.substack.com/p/llm-predictions-for-2026-shared-with

  45. https://manus.im/blog/best-ai-coding-assistant-tools

  46. https://github.com/restyler/awesome-sandbox 2

  47. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6

  48. https://www.tencentcloud.com/techpedia/118267 2 3 4

  49. https://github.com/jakhax/sandman 2

  50. https://www.reddit.com/r/docker/comments/1fmuv5b/kata_containers_vs_firecracker_vs_gvisor/

  51. https://dev.to/agentsphere/choosing-a-workspace-for-ai-agents-the-ultimate-showdown-between-gvisor-kata-and-firecracker-b10 2

  52. https://skywork.ai/skypage/en/chris-hays-code-sandbox-ai-engineers/1980120660590239744

  53. https://cased.com/blog/2025-10-05-sandboxes 2 3

  54. https://github.com/cased/sandboxes 2

  55. https://www.alldevblogs.com/article/simon-willison/a-field-guide-to-sandboxes-for-ai

  56. https://simonwillison.net/2026/Jan/6/a-field-guide-to-sandboxes-for-ai/

  57. https://northflank.com/blog/how-to-spin-up-a-secure-code-sandbox-and-microvm-in-seconds-with-northflank-firecracker-gvisor-kata-clh

  58. https://developers.cloudflare.com/sandbox/

  59. https://www.youtube.com/watch?v=sVtqsH5oG4c

  60. https://www.youtube.com/watch?v=sV8HKlwsFag

  61. https://www.luiscardoso.dev/blog/sandboxes-for-ai 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

  62. https://zesty.co/finops-glossary/gvisor-in-kubernetes/ 2 3 4 5 6 7 8 9 10 11 12 13 14

  63. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/ 2 3 4 5 6 7 8 9 10 11 12 13 14 15

  64. https://github.com/firecracker-microvm/firecracker 2 3 4 5 6 7 8 9 10 11 12

  65. https://news.ycombinator.com/item?id=19921564 2 3 4 5

  66. https://github.com/open-lambda/gvisor 2 3

  67. https://www.anantacloud.com/post/transforming-kubernetes-security-how-kata-containers-improve-workload-isolation 2 3 4 5 6 7

  68. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/ 2 3 4 5

  69. https://www.anthony-balitrand.fr/2025/08/12/firecracker-microvms-the-power-behind-aws-lambda/ 2 3 4 5 6 7 8 9 10 11

  70. https://firecracker-microvm.github.io 2 3 4 5

  71. https://wasmer.io/wasmer-vs-wasmtime 2 3

  72. https://github.com/google/gvisor

  73. https://www.tencentcloud.com/techpedia/118267

  74. https://gvisor.dev/docs/

  75. https://gvisor.dev/docs/user_guide/production/

  76. https://www.upwind.io/feed/unlock-runtime-visibility-for-gvisor-sandboxed-containers

  77. https://kubernetes.io/docs/concepts/containers/runtime-class/ 2 3

  78. https://zesty.co/finops-glossary/gvisor-in-kubernetes/ 2 3

  79. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/ 2 3 4

  80. https://devopstales.github.io/kubernetes/firecracker-containerd/ 2 3 4

  81. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/ 2 3 4 5 6 7

  82. https://gvisor.dev/docs/user_guide/quick_start/kubernetes/ 2

  83. https://egashira.dev/blog/gvisor-on-kubernetes-cluster 2 3

  84. https://labs.iximiuz.com/tutorials/kubernetes-runtime-class-61506808

  85. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/ 2 3

  86. https://www.youtube.com/watch?v=nV2UCE5iWAU 2 3

  87. http://arun-gupta.github.io/kata-firecracker/ 2

  88. https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md 2

  89. https://dev.to/signadot/creating-sandboxes-in-kubernetes-at-scale-5f6p 2

  90. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/Using-Runtimes-in-Kubernetes

  91. https://www.alibabacloud.com/blog/getting-started-with-kubernetes-|-understanding-kubernetes-runtimeclass-and-using-multiple-container-runtimes_596341

  92. https://kubernetes.io/docs/concepts/containers/runtime-class/ 2 3 4

  93. https://zesty.co/finops-glossary/gvisor-in-kubernetes/ 2 3 4 5

  94. https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/ 2 3

  95. https://blog.sighup.io/how-to-run-untrusted-containers-in-kubernetes/ 2 3 4 5 6 7

  96. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/ 2 3 4

  97. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/ 2 3 4 5 6

  98. https://www.appsecengineer.com/blog/defending-kubernetes-clusters-against-container-escape-attacks 2 3 4 5

  99. https://news.ycombinator.com/item?id=19921564

  100. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/ 2

  101. https://www.youtube.com/watch?v=0wEiizErKZw

  102. https://devopstales.github.io/kubernetes/firecracker-containerd/ 2 3 4

  103. https://arxiv.org/html/2509.09400v1 2 3 4

  104. https://news.ycombinator.com/item?id=34081170

  105. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/Using-Runtimes-in-Kubernetes 2

  106. https://docs.catalystcloud.nz/tutorials/kubernetes/sandboxed-containers-with-gvisor.html

  107. http://arun-gupta.github.io/kata-firecracker/ 2

  108. https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md 2

  109. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/ 2

  110. https://www.alibabacloud.com/blog/getting-started-with-kubernetes-|-understanding-kubernetes-runtimeclass-and-using-multiple-container-runtimes_596341

  111. https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/3.73/html/operating/use-admission-controller-enforcement 2

  112. https://www.sysdig.com/learn-cloud-native/kubernetes-admission-controllers

  113. https://www.vcluster.com/docs/vcluster/0.27.0/configure/vcluster-yaml/sync/from-host/runtime-classes 2 3

  114. https://kubeops.net/blog/effective-container-isolation-techniques-for-secure-kubernetes 2 3 4 5

  115. https://dev.to/signadot/creating-sandboxes-in-kubernetes-at-scale-5f6p

  116. https://github.com/firecracker-microvm/firecracker/issues/908

  117. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386 2 3 4

  118. https://gvisor.dev/docs/user_guide/compatibility/ 2 3

  119. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods 2 3 4

  120. https://katacontainers.io/blog/kata-containers-northflank-case-study/ 2 3 4 5

  121. https://publish.obsidian.md/kruzenshtern/writings/2021-02-24-running-firecracker-on-google-kubernetes-engine 2 3 4 5

  122. https://gvisor.dev 2 3 4

  123. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  124. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime 2

  125. https://github.com/kata-containers/kata-containers/issues/10536 2 3

  126. https://devopstales.github.io/kubernetes/firecracker-containerd/

  127. http://arun-gupta.github.io/kata-firecracker/

  128. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html

  129. https://docs.cloud.google.com/kubernetes-engine/docs/how-to/configure-gke-service-extensions 2

  130. https://cloud.google.com/blog/products/application-development/go-1-24-expands-support-for-wasm 2

  131. https://itnext.io/webassembly-on-kubernetes-c5c652e8c1f1 2

  132. https://www.nops.io/blog/how-to-run-webassembly-on-kubernetes/

  133. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/ 2 3

  134. https://arxiv.org/html/2509.09400v1

  135. https://stackoverflow.com/questions/69846927/how-to-run-untrusted-code-using-gvisor-on-google-cloud-run

  136. https://news.ycombinator.com/item?id=37253921

  137. https://stackoverflow.com/questions/69846927/how-to-run-untrusted-code-using-gvisor-on-google-cloud-run 2 3

  138. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime 2 3 4 5 6 7 8 9 10

  139. https://gvisor.dev/docs/user_guide/install/ 2 3 4 5

  140. https://docs.readthedocs.com/dev/latest/guides/gvisor.html 2 3 4

  141. https://gvisor.dev 2 3 4 5 6

  142. https://github.com/google/gvisor/issues/11069

  143. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j 2

  144. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless 2 3

  145. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  146. https://www.packtpub.com/sa-th/learning/tech-news/gke-sandbox-a-gvisor-based-feature-to-increase-security-and-isolation-in-containers

  147. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  148. https://www.trendmicro.com/cloudoneconformity/knowledge-base/gcp/GKE/enable-gke-sandbox-with-gvisor.html

  149. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  150. https://gvisor.dev/docs/user_guide/containerd/quick_start/

  151. https://gvisor.dev/docs/user_guide/platforms/

  152. https://grpc.io/docs/languages/go/basics/ 2 3

  153. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming 2 3

  154. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime 2 3 4

  155. https://gvisor.dev 2 3

  156. https://gvisor.dev/docs/user_guide/install/ 2 3

  157. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/ 2

  158. https://github.com/google/gvisor/issues/190 2 3

  159. https://gvisor.dev/docs/architecture_guide/performance/ 2 3

  160. https://github.com/Datadog/datadog-agent/issues/44084 2 3

  161. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  162. https://rewind.com/blog/build-vs-buy-backup-solutions-github/

  163. https://www.reddit.com/r/github/comments/ujvsdb/github_backup/

  164. https://github.com/erikw/restic-automatic-backup-scheduler

  165. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  166. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  167. https://github.com/pahanini/go-grpc-bidirectional-streaming-example

  168. https://dev.to/yash_mahakal/implementing-bidirectional-grpc-streaming-a-practical-guide-3afi

  169. https://grpc.io/docs/languages/cpp/basics/

  170. https://stackoverflow.com/questions/56076703/how-do-i-make-sure-that-a-message-was-received-in-grpc-bidirectional-streaming

  171. https://www.thesocialrobot.org/posts/grpc-brain-2/

  172. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  173. https://grpc.io/docs/languages/go/basics/ 2 3 4

  174. https://grpc.io/docs/guides/auth/ 2 3

  175. https://gvisor.dev/docs/user_guide/quick_start/docker/ 2 3

  176. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime 2

  177. https://gvisor.dev 2

  178. https://gvisor.dev/docs/user_guide/install/ 2 3

  179. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming 2

  180. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/ 2 3

  181. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g 2 3

  182. https://nanikgolang.netlify.app/post/runsc/ 2

  183. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  184. https://grpc.io/docs/guides/performance/ 2 3

  185. https://knabben.github.io/2023/0608/ 2

  186. https://doc.akka.io/libraries/akka-grpc/current/mtls.html

  187. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/

  188. https://gvisor.dev/docs/tutorials/docker-in-gvisor/ 2 3

  189. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  190. https://gvisor.dev/docs/architecture_guide/performance/ 2 3 4

  191. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j 2

  192. https://github.com/google/gvisor/issues/190 2

  193. https://github.com/Datadog/datadog-agent/issues/44084 2

  194. https://blog.sighup.io/how-to-run-untrusted-containers-in-kubernetes/

  195. https://www.appsecengineer.com/blog/defending-kubernetes-clusters-against-container-escape-attacks

  196. https://gvisor.dev/docs/user_guide/compatibility/

  197. https://dev.to/yash_mahakal/implementing-bidirectional-grpc-streaming-a-practical-guide-3afi

  198. https://www.talentica.com/blogs/part-3-building-a-bidirectional-streaming-grpc-service-using-golang/

  199. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  200. https://www.infracloud.io/blogs/understanding-grpc-concepts-best-practices/

  201. https://about.gitlab.com/blog/keeping-git-commit-history-clean/ 2

  202. https://sethrobertson.github.io/GitBestPractices/ 2 3

  203. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/ 2 3

  204. https://stackoverflow.com/questions/273695/what-are-some-examples-of-commonly-used-practices-for-naming-git-branches

  205. https://www.geeksforgeeks.org/git/how-to-naming-conventions-for-git-branches/ 2 3

  206. https://digilent.com/reference/software/development/git/start 2

  207. https://pullpanda.io/blog/git-branch-naming-conventions-best-practices 2

  208. https://www.zignuts.com/blog/master-git-branch-naming-conventions

  209. https://gist.github.com/luismts/495d982e8c5b1a0ced4a57cf3d93cf60

  210. https://dev.to/shnjd/git-good-best-practices-for-branch-naming-and-commit-messages-oj4

  211. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/ 2 3 4

  212. https://conventional-branch.github.io

  213. https://dev.to/varbsan/a-simplified-convention-for-naming-branches-and-commits-in-git-il4

  214. https://graphite.com/guides/git-branch-naming-conventions

  215. https://www.reddit.com/r/git/comments/xddr5d/manual_branches_names_convention/

  216. https://sethrobertson.github.io/GitBestPractices/ 2

  217. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/

  218. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  219. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/ 2 3

  220. https://git-scm.com/docs/git-worktree 2 3

  221. https://nx.dev/blog/git-worktrees-ai-agents 2 3

  222. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial 2 3

  223. https://dev.to/jps27cse/github-branching-name-best-practices-49ei 2

  224. https://architecture.lullabot.com/adr/20220920-git-branch-naming/ 2

  225. https://graphite.com/guides/git-branch-naming-conventions 2

  226. https://gist.github.com/jasonk/c29679fa77f4c81d20a31608795ab265

  227. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  228. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/

  229. https://dev.to/shnjd/git-good-best-practices-for-branch-naming-and-commit-messages-oj4

  230. https://verreauxblack.hashnode.dev/git-release-management-should-you-use-branches-or-tags

  231. https://circleci.com/blog/git-tags-vs-branches/

  232. https://dev.to/livecodelife/how-i-supercharged-my-workflow-with-git-worktrees-2jgj

  233. https://stackoverflow.com/questions/31935776/what-would-i-use-git-worktree-for

  234. https://www.reddit.com/r/golang/comments/s0m0vz/showcase_autosaved_a_utility_that_autosaves/

  235. https://github.com/mateimicu/auto-tag

  236. https://irskep.github.io/autowt/

  237. https://www.luiscardoso.dev/blog/sandboxes-for-ai

  238. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  239. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  240. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  241. https://github.com/firecracker-microvm/firecracker

  242. https://www.cncf.io/blog/2024/03/28/webassembly-on-kubernetes-the-practice-guide-part-02/ 2

  243. https://kubernetes.io/docs/concepts/containers/runtime-class/

  244. https://gvisor.dev/docs/user_guide/quick_start/kubernetes/

  245. https://devopstales.github.io/kubernetes/firecracker-containerd/

  246. https://labs.iximiuz.com/tutorials/kubernetes-runtime-class-61506808

  247. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  248. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  249. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  250. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime 2 3

  251. https://gvisor.dev 2 3

  252. https://gvisor.dev/docs/user_guide/install/ 2

  253. https://gvisor.dev/docs/user_guide/quick_start/docker/ 2

  254. https://grpc.io/docs/languages/go/basics/ 2

  255. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  256. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/ 2

  257. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g 2

  258. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  259. https://nanikgolang.netlify.app/post/runsc/

  260. https://grpc.io/docs/guides/auth/

  261. https://knabben.github.io/2023/0608/

  262. https://grpc.io/docs/guides/performance/ 2

  263. https://techdozo.dev/grpc-bidirectional-streaming-with-code-example/

  264. https://gvisor.dev/docs/user_guide/compatibility/

  265. https://gvisor.dev/docs/architecture_guide/performance/

  266. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  267. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  268. https://stackoverflow.com/questions/273695/what-are-some-examples-of-commonly-used-practices-for-naming-git-branches

  269. https://www.geeksforgeeks.org/git/how-to-naming-conventions-for-git-branches/

  270. https://dev.to/jps27cse/github-branching-name-best-practices-49ei

  271. https://architecture.lullabot.com/adr/20220920-git-branch-naming/

  272. https://graphite.com/guides/git-branch-naming-conventions

  273. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  274. https://git-scm.com/docs/git-worktree

  275. https://nx.dev/blog/git-worktrees-ai-agents

  276. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  277. https://about.gitlab.com/blog/keeping-git-commit-history-clean/

  278. https://sethrobertson.github.io/GitBestPractices/

  279. https://www.reddit.com/r/git/comments/1b1ormd/best_practices_for_automation_of_private_local/

  280. https://justinjoyce.dev/git-commit-and-commit-message-best-practices/

  281. https://circleci.com/blog/git-tags-vs-branches/

  282. https://gvisor.dev

  283. https://docs.cloud.google.com/workstations/docs/architecture

  284. https://cloud.google.com/workstations/docs/architecture

  285. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  286. https://gvisor.dev/docs/user_guide/quick_start/docker/

  287. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  288. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  289. https://git-scm.com/docs/git-worktree

  290. https://nx.dev/blog/git-worktrees-ai-agents

  291. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  292. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  293. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  294. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  295. https://github.com/google/gvisor/issues/190

  296. https://github.com/Datadog/datadog-agent/issues/44084

  297. https://gvisor.dev/docs/architecture_guide/performance/

  298. https://www.reddit.com/r/golang/comments/s0m0vz/showcase_autosaved_a_utility_that_autosaves/

  299. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  300. https://gvisor.dev/docs/user_guide/production/

  301. https://github.com/google/gvisor

  302. https://grpc.io/docs/guides/auth/

  303. https://knabben.github.io/2023/0608/

  304. https://grpc.io/docs/guides/performance/

  305. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  306. https://gvisor.dev/docs/user_guide/install/

  307. https://grpc.io/docs/languages/go/basics/

  308. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  309. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  310. https://devopstales.github.io/kubernetes/firecracker-containerd/

  311. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  312. https://stackoverflow.com/questions/31935776/what-would-i-use-git-worktree-for

  313. https://irskep.github.io/autowt/

  314. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  315. https://gvisor.dev/docs/user_guide/compatibility/

  316. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  317. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  318. https://docs.cloud.google.com/workstations/docs/architecture

  319. https://cloud.google.com/workstations/docs/architecture

  320. https://gvisor.dev

  321. https://gvisor.dev/docs/user_guide/quick_start/docker/

  322. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  323. https://gvisor.dev/docs/user_guide/production/

  324. https://github.com/google/gvisor

  325. https://gvisor.dev/docs/user_guide/install/

  326. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  327. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  328. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  329. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  330. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  331. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  332. https://git-scm.com/docs/git-worktree

  333. https://nx.dev/blog/git-worktrees-ai-agents

  334. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  335. https://grpc.io/docs/guides/auth/

  336. https://cloud.google.com/blog/products/containers-kubernetes/gvisor-file-system-improvements-for-gke-and-serverless

  337. https://gvisor.dev/docs/architecture_guide/performance/

  338. https://gvisor.dev/docs/user_guide/compatibility/

  339. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  340. https://dev.to/rimelek/comparing-3-docker-container-runtimes-runc-gvisor-and-kata-containers-16j

  341. https://github.com/google/gvisor/issues/190

  342. https://github.com/Datadog/datadog-agent/issues/44084

  343. https://knabben.github.io/2023/0608/

  344. https://grpc.io/docs/guides/performance/

  345. https://aws.amazon.com/blogs/containers/enhancing-kubernetes-workload-isolation-and-security-using-kata-containers/

  346. https://devopstales.github.io/kubernetes/firecracker-containerd/

  347. https://cloudification.io/cloud-blog/kata-containers-workload-isolation/

  348. https://docs.cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods

  349. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  350. https://github.com/firecracker-microvm/firecracker

  351. https://docs.cloud.google.com/workstations/docs/access-control

  352. https://docs.cloud.google.com/iam/docs/roles-permissions/workstations

  353. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  354. https://cloud.google.com/workstations/docs/access-control

  355. https://gvisor.dev

  356. https://docs.cloud.google.com/workstations/docs/architecture

  357. https://cloud.google.com/workstations/docs/architecture

  358. https://fotc.com/blog/cloud-workstations/

  359. https://blog.salrashid.dev/articles/2019/kubernetes_oidc_gcp_identity_platform/

  360. https://www.pulumi.com/guides/how-to/gcp-workstations-iam-binding/

  361. https://www.skills.google/focuses/60391?parent=catalog

  362. https://grpc.io/docs/guides/auth/

  363. https://knabben.github.io/2023/0608/

  364. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  365. https://gvisor.dev/docs/user_guide/quick_start/docker/

  366. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  367. https://gvisor.dev/docs/user_guide/production/

  368. https://cloud.google.com/workstations

  369. https://dzone.com/articles/google-cloud-workstations

  370. https://www.youtube.com/watch?v=-3pdAavNepg

  371. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  372. https://www.googlecloudcommunity.com/gc/Apigee/Accessing-a-Google-Cloud-Platform-based-service-using-JWT-and-a/m-p/21306

  373. https://docs.cloud.google.com/workstations/docs/access-control

  374. https://docs.cloud.google.com/iam/docs/roles-permissions/workstations

  375. https://cloud.google.com/workstations/docs/access-control

  376. https://www.skills.google/focuses/60391?parent=catalog

  377. https://fotc.com/blog/cloud-workstations/

  378. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  379. https://www.youtube.com/watch?v=-3pdAavNepg

  380. https://grpc.io/docs/guides/auth/

  381. https://knabben.github.io/2023/0608/

  382. https://docs.cloud.google.com/workstations/docs/architecture

  383. https://cloud.google.com/workstations/docs/architecture

  384. https://www.pulumi.com/guides/how-to/gcp-workstations-iam-binding/

  385. https://gvisor.dev/docs/user_guide/production/

  386. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  387. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  388. https://gvisor.dev

  389. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  390. https://fotc.com/blog/cloud-workstations/

  391. https://github.com/google/gvisor

  392. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  393. https://docs.cloud.google.com/workstations/docs/architecture

  394. https://cloud.google.com/workstations/docs/architecture

  395. https://dzone.com/articles/google-cloud-workstations

  396. https://gvisor.dev/docs/user_guide/production/

  397. https://github.com/google/gvisor

  398. https://gvisor.dev

  399. https://docs.cloud.google.com/workstations/docs/architecture

  400. https://cloud.google.com/workstations/docs/architecture

  401. https://fotc.com/blog/cloud-workstations/

  402. https://grpc.io/docs/guides/auth/

  403. https://knabben.github.io/2023/0608/

  404. https://gvisor.dev/docs/user_guide/quick_start/docker/

  405. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  406. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  407. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  408. https://docs.cloud.google.com/workstations/docs/architecture

  409. https://cloud.google.com/workstations/docs/architecture

  410. https://gvisor.dev/docs/user_guide/production/

  411. https://github.com/google/gvisor

  412. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  413. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  414. https://gvisor.dev

  415. https://grpc.io/docs/guides/auth/

  416. https://knabben.github.io/2023/0608/

  417. https://gvisor.dev/docs/user_guide/quick_start/docker/

  418. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  419. https://fotc.com/blog/cloud-workstations/

  420. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  421. https://www.youtube.com/watch?v=-3pdAavNepg

  422. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  423. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  424. https://github.com/google/gvisor/issues/190

  425. https://github.com/Datadog/datadog-agent/issues/44084

  426. https://gvisor.dev/docs/architecture_guide/performance/

  427. https://gvisor.dev

  428. https://gvisor.dev/docs/user_guide/quick_start/docker/

  429. https://gvisor.dev/docs/user_guide/production/

  430. https://github.com/google/gvisor

  431. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  432. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  433. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  434. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  435. https://gvisor.dev/docs/architecture_guide/performance/

  436. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  437. https://cloud.google.com/blog/products/containers-kubernetes/how-gvisor-protects-google-cloud-services-from-cve-2020-14386

  438. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  439. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  440. https://git-scm.com/docs/git-worktree

  441. https://nx.dev/blog/git-worktrees-ai-agents

  442. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  443. https://github.com/google/gvisor/issues/190

  444. https://github.com/Datadog/datadog-agent/issues/44084

  445. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  446. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  447. https://fotc.com/blog/cloud-workstations/

  448. https://gvisor.dev/docs/user_guide/quick_start/docker/

  449. https://gvisor.dev/docs/user_guide/production/

  450. https://github.com/google/gvisor

  451. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  452. https://gvisor.dev

  453. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  454. https://dzone.com/articles/advanced-grpc-in-microservices

  455. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  456. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  457. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  458. https://gvisor.dev/docs/user_guide/faq/

  459. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  460. https://grpc.io/docs/guides/auth/

  461. https://knabben.github.io/2023/0608/

  462. https://blog.salrashid.dev/articles/2019/kubernetes_oidc_gcp_identity_platform/

  463. https://github.com/kubernetes/minikube/issues/5463

  464. https://pkg.go.dev/gvisor.dev/gvisor/runsc/config

  465. https://github.com/google/gvisor/issues/9368

  466. https://gvisor.dev/docs/architecture_guide/performance/

  467. https://devsecops.puziol.com.br/kubernetes/cks/solved-questions/question-10-gvisor-runtime-sandbox/

  468. https://grpc.io/docs/guides/performance/

  469. https://stackoverflow.com/questions/72392812/debug-logs-are-not-generated-for-sandboxed-container

  470. https://www.baeldung.com/java-grpc-streaming

  471. https://falco.org/docs/concepts/event-sources/gvisor/

  472. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  473. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  474. https://dzone.com/articles/advanced-grpc-in-microservices

  475. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  476. https://grpc.io/docs/guides/performance/

  477. https://cloud.google.com/solutions/sap/docs/bq-connector/latest/authentication-jwt

  478. https://www.youtube.com/watch?v=-3pdAavNepg

  479. https://grpc.io/docs/guides/auth/

  480. https://knabben.github.io/2023/0608/

  481. https://blog.salrashid.dev/articles/2019/kubernetes_oidc_gcp_identity_platform/

  482. https://github.com/google/gvisor/issues/190

  483. https://github.com/Datadog/datadog-agent/issues/44084

  484. https://gvisor.dev/docs/architecture_guide/performance/

  485. https://gvisor.dev/docs/user_guide/quick_start/docker/

  486. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  487. https://gvisor.dev/docs/user_guide/faq/

  488. https://devsecops.puziol.com.br/kubernetes/cks/solved-questions/question-10-gvisor-runtime-sandbox/

  489. https://github.com/kubernetes/minikube/issues/5463

  490. https://docs.cloud.google.com/workstations/docs/architecture

  491. https://cloud.google.com/workstations/docs/architecture

  492. https://gvisor.dev

  493. https://gvisor.dev/docs/user_guide/quick_start/docker/

  494. https://gvisor.dev/docs/user_guide/production/

  495. https://github.com/google/gvisor

  496. https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime

  497. https://gvisor.dev/docs/user_guide/install/

  498. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  499. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  500. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  501. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  502. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  503. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  504. https://git-scm.com/docs/git-worktree

  505. https://nx.dev/blog/git-worktrees-ai-agents

  506. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  507. https://grpc.io/docs/guides/auth/

  508. https://knabben.github.io/2023/0608/

  509. https://grpc.io/docs/guides/performance/

  510. https://github.com/google/gvisor/issues/190

  511. https://github.com/Datadog/datadog-agent/issues/44084

  512. https://gvisor.dev/docs/architecture_guide/performance/

  513. https://notes.kodekloud.com/docs/Certified-Kubernetes-Security-Specialist-CKS/Minimize-Microservice-Vulnerabilities/gVisor

  514. https://zesty.co/finops-glossary/gvisor-in-kubernetes/

  515. https://fotc.com/blog/cloud-workstations/

  516. https://gvisor.dev/docs/user_guide/faq/

  517. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  518. https://docs.cloud.google.com/workstations/docs/access-control

  519. https://docs.cloud.google.com/iam/docs/roles-permissions/workstations

  520. https://cloud.google.com/workstations/docs/access-control

  521. https://docs.cloud.google.com/workstations/docs/architecture

  522. https://cloud.google.com/workstations/docs/architecture

  523. https://gvisor.dev/docs/user_guide/install/

  524. https://docs.readthedocs.com/dev/latest/guides/gvisor.html

  525. https://gvisor.dev/docs/user_guide/quick_start/docker/

  526. https://codelabs.developers.google.com/grpc/getting-started-grpc-go-streaming

  527. https://programmingpercy.tech/blog/streaming-data-with-grpc/

  528. https://dev.to/ramonberrutti/grpc-streaming-best-practices-and-performance-insights-219g

  529. https://dzone.com/articles/advanced-grpc-in-microservices

  530. https://victoriametrics.com/blog/go-grpc-basic-streaming-interceptor/

  531. https://github.com/google/gvisor/issues/190

  532. https://github.com/Datadog/datadog-agent/issues/44084

  533. https://gvisor.dev/docs/architecture_guide/performance/

  534. https://gvisor.dev/docs/user_guide/faq/

  535. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  536. https://grpc.io/docs/guides/performance/

  537. https://gvisor.dev/docs/user_guide/compatibility/

  538. http://contextkeeper.io/blog/continuous-auto-saving-branch-snapshots-and-git-worktree-support/

  539. https://git-scm.com/docs/git-worktree

  540. https://nx.dev/blog/git-worktrees-ai-agents

  541. https://www.datacamp.com/pt/tutorial/git-worktree-tutorial

  542. https://gvisor.dev/docs/user_guide/production/

  543. https://github.com/google/gvisor

  544. https://gvisor.dev/docs/user_guide/compatibility/

  545. https://gvisor.dev/docs/architecture_guide/performance/

  546. https://gvisor.dev/docs/tutorials/docker-in-gvisor/

  547. https://dev.to/rimelek/using-gvisors-container-runtime-in-docker-desktop-374m

  548. https://gvisor.dev/docs/user_guide/faq/

  549. https://github.com/google/gvisor/issues/9368

  550. https://devsecops.puziol.com.br/kubernetes/cks/solved-questions/question-10-gvisor-runtime-sandbox/

  551. https://github.com/kubernetes/minikube/issues/5463

  552. https://docs.cloud.google.com/workstations/docs/architecture

  553. https://cloud.google.com/workstations/docs/architecture