Coditect Sandbox Platform: Full Technical Architecture (L1-L7)
Level 1: Problem Statement & Context
Coditect addresses the need for secure, autosaving, multi-runtime sandbox environments that support AI agents, ephemeral developer environments, and dynamic runtime workloads in a zero-trust cloud-native architecture.
Context:
- Increasing demand for ephemeral compute across AI and software engineering workflows
- Multi-tenant environments require strong isolation guarantees
- Git-centric workflows require automatic state capture, snapshots, and traceability
- Executable environments must span containers (gVisor/Kata), microVMs (Firecracker), and WASM runtimes
Level 2: High-Level Architecture
Key Components:
- Frontend UI (React): sandbox explorer, logs, creation
- API (FastAPI): JWT-authenticated entrypoint for sandbox lifecycle, autosave, and quota
- Controller (Go): Kubernetes CRD controller for `Sandbox` resources
- Agent (Python or Rust): GCP Workstation-local gRPC server to launch containers in gVisor, Kata, or Wasmtime
- Infrastructure (OpenTofu): GCP project, GKE, WorkstationConfig, Secret Manager, IAM
- Autosave Engine: GitHub worktree commit/push daemon
- Monitoring: Prometheus, Grafana, Cloud Logging
System Flow:
Level 3: Security Architecture
Identity:
- OIDC Login (Firebase/Auth0)
- JWT Access Tokens (5m, RS256-signed via KMS)
- Refresh Tokens (7d, Firebase Secure Storage)
- Agent identity via Workload Identity Federation (WIF)
Network:
- All agent<->API traffic secured via mTLS with GKE-managed certs
- GKE Ingress uses HTTPS, with Cloud Armor IP allowlists
- No Pod ever exposed directly to users
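The mTLS requirement above means the server side must reject any peer that does not present a certificate signed by the trusted root. A minimal sketch using Python's standard `ssl` module follows; the function name `mtls_server_context` and its file-path parameters are hypothetical, and how GKE-managed certificates are actually mounted is not described in this document.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a TLS server context that requires a client certificate.

    All three paths are hypothetical; pass the locations where the
    server certificate, its key, and the pinned root CA are mounted.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a CA loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Loading only the pinned root CA, rather than the system trust store, is what restricts accepted peers to certificates issued under that root.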
Secrets:
- GitHub tokens stored in Secret Manager with restricted IAM access
- mTLS root CA pinned in agents and API pods
Authorization:
- JWT includes:
    {
      "tenant_id": "t-xyz",
      "user_id": "u-abc",
      "sandbox_quota": 1800,
      "roles": ["sandboxer"]
    }
- Enforced at:
  - API
  - Agent (metadata validation)
  - Controller (via webhook)
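The claim checks at each enforcement point could look roughly like the following, applied after the token's RS256 signature and expiry have already been verified. This is a sketch under assumptions: `Claims`, `AuthError`, `parse_claims`, and `authorize` are hypothetical names, and the rule that the caller's tenant must match the sandbox's tenant is an illustrative policy, not one stated in this document.

```python
from dataclasses import dataclass


class AuthError(Exception):
    """Raised when verified claims do not permit the requested action."""


@dataclass(frozen=True)
class Claims:
    tenant_id: str
    user_id: str
    sandbox_quota: int
    roles: tuple


def parse_claims(payload: dict) -> Claims:
    # Fail closed: any missing or malformed field becomes an AuthError.
    try:
        return Claims(
            tenant_id=payload["tenant_id"],
            user_id=payload["user_id"],
            sandbox_quota=int(payload["sandbox_quota"]),
            roles=tuple(payload["roles"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise AuthError(f"malformed claims: {exc!r}") from exc


def authorize(payload: dict, sandbox_tenant: str,
              required_role: str = "sandboxer") -> Claims:
    """Return parsed claims if the caller may act on the sandbox."""
    claims = parse_claims(payload)
    if required_role not in claims.roles:
        raise AuthError("missing role")
    if claims.tenant_id != sandbox_tenant:
        raise AuthError("tenant mismatch")
    return claims
```

Running the same check in the API, the agent, and the admission webhook means a bypass of one layer does not by itself grant cross-tenant access.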
Level 4: Sandbox Lifecycle & CRD Reconciliation
Sandbox CRD (Expanded)
    apiVersion: coditect.io/v1alpha1
    kind: Sandbox
    metadata:
      name: sbx-abc123
    spec:
      runtime: gvisor
      tenantID: t-xyz
      userID: u-abc
      projectID: p-123
      image: python:3.11
      command: ["python", "main.py"]
      timeoutSeconds: 900
      limits:
        cpu: 1
        memoryMiB: 512
      networkPolicy:
        blockNetwork: true
        cidrAllowlist: ["10.0.0.0/8"]
    status:
      phase: Running
      logsURL: https://...
      autosaveURL: https://github.com/org/repo/tree/autosave/...
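Before a spec like the one above is admitted, its fields can be sanity-checked. The sketch below is illustrative only: `validate_sandbox_spec` is a hypothetical helper, the runtime names are taken from the RuntimeClasses table in Level 5, and `MAX_TIMEOUT_SECONDS` is an assumed cap that this document does not specify.

```python
import ipaddress

ALLOWED_RUNTIMES = {"gvisor", "kata-fc", "wasmtime"}
MAX_TIMEOUT_SECONDS = 3600  # assumed upper bound, not from the source


def validate_sandbox_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec looks valid."""
    errors = []
    if spec.get("runtime") not in ALLOWED_RUNTIMES:
        errors.append(f"unknown runtime: {spec.get('runtime')!r}")
    for field in ("tenantID", "userID", "projectID", "image"):
        if not spec.get(field):
            errors.append(f"missing field: {field}")
    timeout = spec.get("timeoutSeconds", 0)
    if not isinstance(timeout, int) or not 0 < timeout <= MAX_TIMEOUT_SECONDS:
        errors.append("timeoutSeconds out of range")
    limits = spec.get("limits", {})
    if limits.get("cpu", 0) <= 0 or limits.get("memoryMiB", 0) <= 0:
        errors.append("limits.cpu and limits.memoryMiB must be positive")
    for cidr in spec.get("networkPolicy", {}).get("cidrAllowlist", []):
        try:
            ipaddress.ip_network(cidr)  # raises ValueError on bad input
        except ValueError:
            errors.append(f"invalid CIDR: {cidr}")
    return errors
```

Returning every problem at once, instead of raising on the first, gives the user a complete rejection message from a single admission request.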
Controller Logic:
- Finalizers: `sandbox.coditect.io/finalize`
- Pod annotations: `sandbox_id`, `runtime`, `project_id`
- CRD → Pod or agent call
- Deletes:
  - Trigger final snapshot
  - Call agent's `TerminateSandbox()`
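The deletion path can be sketched as follows. The real controller is written in Go; this Python version only illustrates the ordering (snapshot, terminate, then release the finalizer). The `snapshot` and `terminate` callables are hypothetical stand-ins for the autosave engine and the agent's `TerminateSandbox()` RPC.

```python
FINALIZER = "sandbox.coditect.io/finalize"


def reconcile_delete(sandbox: dict, snapshot, terminate) -> bool:
    """Handle the finalizer for one Sandbox object.

    Returns True once the object is free to be deleted by Kubernetes.
    """
    meta = sandbox["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    if "deletionTimestamp" not in meta:
        # Not being deleted: make sure the finalizer is registered so a
        # later delete waits for the cleanup below.
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
        return False
    if FINALIZER not in finalizers:
        return True  # cleanup already ran on an earlier reconcile
    snapshot(meta["name"])        # 1. trigger the final snapshot
    terminate(meta["name"])       # 2. stop the workload via the agent
    finalizers.remove(FINALIZER)  # 3. allow the object to be removed
    return True
```

Removing the finalizer last means a failed snapshot or terminate call leaves the object in place, so the next reconcile can retry the cleanup.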
Level 5: Runtime Isolation & Quota Enforcement
RuntimeClasses:
| Name | Isolation Model | Use Case |
|---|---|---|
| gvisor | user-space kernel that intercepts syscalls (runsc) | medium-trust agents |
| kata-fc | KVM-based microVM | untrusted workloads |
| wasmtime | wasm runtime sandbox | wasm toolchains |
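The table above implies a mapping from workload trust level to RuntimeClass. One way to express it is sketched below; the `Trust` enum, the `is_wasm` flag, and both function names are assumptions for illustration. `runtimeClassName` is the standard Kubernetes Pod spec field for selecting a RuntimeClass.

```python
from enum import Enum


class Trust(Enum):
    MEDIUM = "medium"
    UNTRUSTED = "untrusted"


def select_runtime_class(trust: Trust, is_wasm: bool = False) -> str:
    """Pick a RuntimeClass name following the table's use cases."""
    if is_wasm:
        return "wasmtime"   # wasm toolchains
    if trust is Trust.UNTRUSTED:
        return "kata-fc"    # KVM-based microVM for untrusted workloads
    return "gvisor"         # medium-trust agents


def pod_spec_fragment(trust: Trust, is_wasm: bool = False) -> dict:
    # Fragment to merge into the Pod spec the controller creates.
    return {"runtimeClassName": select_runtime_class(trust, is_wasm)}
```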
Quota:
- JWT field `sandbox_quota`
- API tracks usage in Redis
- Prometheus alert on >90% quota
- Rejections return `429` with a `Retry-After` header
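The quota check can be sketched with an in-memory counter standing in for Redis. Several things here are assumptions: the `QuotaTracker` class is hypothetical, `sandbox_quota` is treated as a budget of units per fixed window, and the one-hour window length is invented for the example since the document does not define the quota's unit or reset period.

```python
import time


class QuotaTracker:
    """In-memory stand-in for the Redis usage counters (fixed window)."""

    def __init__(self, window_seconds=3600, clock=time.time):
        self._window = window_seconds
        self._clock = clock
        self._usage = {}  # tenant_id -> (window_start, units_used)

    def _current(self, tenant_id):
        now = self._clock()
        start, used = self._usage.get(tenant_id, (now, 0))
        if now - start >= self._window:
            start, used = now, 0  # window elapsed: start a fresh one
        return now, start, used

    def try_consume(self, tenant_id, quota, requested):
        """Return (status, headers): 200 on success, 429 when over quota."""
        now, start, used = self._current(tenant_id)
        if used + requested > quota:
            retry_after = max(1, int(start + self._window - now))
            return 429, {"Retry-After": str(retry_after)}
        self._usage[tenant_id] = (start, used + requested)
        return 200, {}

    def usage_ratio(self, tenant_id, quota):
        # Value that a >90% Prometheus alert would watch.
        _, _, used = self._current(tenant_id)
        return used / quota
```

With Redis, the read-modify-write in `try_consume` would need to be atomic (for example a server-side script or transaction) so that concurrent API replicas cannot both admit the last remaining units.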
Level 6: Git Autosave, Push, and Recovery
Git Flow:
- `git worktree add ../_autosave autosave/<ticket>/<tenant>/<user>`
- Commits every 30s via daemon: `git add . && git commit -m 'autosave checkpoint' && git push`
- Final snapshot on destroy
- Pushes include:
  - `sandbox_id`
  - UTC timestamp
  - `commitURL` added to CRD
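One autosave cycle can be expressed as a list of git invocations whose commit message carries the `sandbox_id` and a UTC timestamp. This is a sketch, not the daemon itself: the helper names, the message layout, and the `origin` remote are assumptions.

```python
from datetime import datetime, timezone


def autosave_branch(ticket: str, tenant: str, user: str) -> str:
    # Branch layout from the worktree command: autosave/<ticket>/<tenant>/<user>
    return f"autosave/{ticket}/{tenant}/{user}"


def autosave_commands(sandbox_id: str, branch: str, now=None) -> list:
    """Return the argv lists for one autosave checkpoint."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    message = f"autosave checkpoint sandbox_id={sandbox_id} ts={stamp}"
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],  # "origin" is an assumed remote
    ]
```

A daemon could run each entry with `subprocess.run(cmd, cwd="../_autosave", check=True)` every 30 seconds. Note that `git commit` exits non-zero when there is nothing to commit, so the loop should treat that case as a skipped checkpoint and not as a failure.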
Failure Modes:
| Scenario | Resolution |
|---|---|
| Token expired | refresh via Firebase |
| Branch conflict | retry with worktree rebase |
| API timeout | async queue push attempt |
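The resolutions in the table share one pattern: classify the failure, run the matching recovery step, then retry the push. A minimal sketch follows; `PushError`, its `kind` strings, and `push_with_recovery` are hypothetical names, and the exponential backoff is an added assumption.

```python
import time


class PushError(Exception):
    """Push failure tagged with a kind such as 'token_expired'."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


def push_with_recovery(push, recoveries, max_attempts=3,
                       base_delay=1.0, sleep=time.sleep):
    """Call push(); on a known failure kind, recover and retry.

    `recoveries` maps a failure kind to a zero-argument callable, for
    example {"token_expired": refresh_token, "branch_conflict": rebase}.
    """
    for attempt in range(max_attempts):
        try:
            return push()
        except PushError as err:
            handler = recoveries.get(err.kind)
            # Give up on unknown failures or when attempts are exhausted.
            if handler is None or attempt == max_attempts - 1:
                raise
            handler()
            sleep(base_delay * 2 ** attempt)  # back off before retrying
```

Injecting `sleep` keeps the function testable without real delays; a caller that exhausts its attempts can hand the push to the async queue mentioned for API timeouts.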
Level 7: Observability, Audit, Threat Modeling
Prometheus Metrics:
- `sandbox_create_latency_seconds`
- `agent_launch_failures_total`
- `quota_usage_ratio{tenant}`
- `sandbox_active_total{runtime}`
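To show what a labeled sample such as `quota_usage_ratio{tenant}` looks like on the wire, the sketch below renders one line in the Prometheus text exposition format using only the standard library. A real service would more likely use a Prometheus client library; `render_sample` is a hypothetical helper.

```python
def _escape(value: str) -> str:
    # Label values escape backslash, double quote, and newline.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def render_sample(name: str, value, labels=None) -> str:
    """Render one sample line, e.g. metric{label="x"} 0.5."""
    label_str = ""
    if labels:
        pairs = ",".join(
            f'{key}="{_escape(str(val))}"'
            for key, val in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    return f"{name}{label_str} {value}"
```

For instance, `render_sample("sandbox_active_total", 3, {"runtime": "gvisor"})` yields a line that a Prometheus scrape of a `/metrics` endpoint could ingest.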
Grafana Dashboards:
- CPU + mem usage per sandbox
- Quota % per user/project
- Idle sandbox heatmap
Cloud Logs:
- API, Agent, Controller emit: `sandbox_id`, `user_id`, `trace_id`
- Stored in GCS: `coditect-audit-logs`
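Emitting the three correlation fields on every log line is easiest with a structured formatter. The sketch below uses Python's standard `logging` and `json` modules; `JsonFormatter` is a hypothetical class, and the `severity` and `message` keys follow the structured-logging field names that Cloud Logging recognizes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format records as one JSON object per line."""

    FIELDS = ("sandbox_id", "user_id", "trace_id")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.FIELDS:
            # Fields arrive via logger.info(..., extra={...}); absent
            # ones are emitted as null so the schema stays stable.
            entry[field] = getattr(record, field, None)
        return json.dumps(entry, sort_keys=True)
```

A component would attach the formatter to its handler and log with `logger.info("sandbox created", extra={"sandbox_id": "sbx-abc123", "user_id": "u-abc", "trace_id": "..."})`, which lets API, Agent, and Controller entries be joined on `trace_id`.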
Threat Map:
| Threat | Mitigation |
|---|---|
| Sandbox breakout | RuntimeClass, seccomp, readonly FS |
| JWT forgery | RS256 KMS signing, short TTL |
| GitHub token exfiltration | Secret Manager + token TTL + scoping |
| Workstation impersonation | mTLS, WIF, pinned root CA |
| Excess sandbox abuse | Token quota + Prometheus + Redis sync |