📚 Coditect Sandbox Platform — Full Technical Architecture (L1 → L7)

🧭 Level 1 — Problem Statement & Context

Coditect addresses the need for secure, autosaving, multi-runtime sandbox environments that support AI agents, ephemeral developer environments, and dynamic runtime workloads in a zero-trust cloud-native architecture.

Context:

Increasing demand for ephemeral compute across AI and software engineering workflows
Multi-tenant environments require strong isolation guarantees
Git-centric workflows require automatic state capture, snapshots, and traceability
Executable environments must span containers (gVisor/Kata), microVMs (Firecracker), and WASM runtimes

🧠 Level 2 — High-Level Architecture

🔹 Key Components:

Frontend UI (React) — sandbox explorer, logs, creation
API (FastAPI) — JWT-authenticated entrypoint for sandbox lifecycle, autosave, and quota
Controller (Go) — Kubernetes CRD controller for Sandbox resources
Agent (Python or Rust) — GCP Workstation-local gRPC server to launch containers in gVisor, Kata, or Wasmtime
Infrastructure (OpenTofu) — GCP project, GKE, WorkstationConfig, Secret Manager, IAM
Autosave Engine — Git worktree commit/push daemon that pushes checkpoints to GitHub
Monitoring — Prometheus, Grafana, Cloud Logging

🔁 System Flow:

🔐 Level 3 — Security Architecture

Identity:

OIDC Login (Firebase/Auth0)
JWT Access Tokens (5m, RS256-signed via KMS)
Refresh Tokens (7d, Firebase Secure Storage)
Agent identity via Workload Identity Federation (WIF)

Network:

All agent<->API traffic secured via mTLS with GKE-managed certs
GKE Ingress uses HTTPS, with Cloud Armor IP allowlists
No Pod is ever exposed directly to users

Secrets:

GitHub tokens stored in Secret Manager with restricted IAM access
mTLS root CA pinned in agents and API pods

Authorization:

JWT includes:

{
  "tenant_id": "t-xyz",
  "user_id": "u-abc",
  "sandbox_quota": 1800,
  "roles": ["sandboxer"]
}

Enforced at:
- API
- Agent (metadata validation)
- Controller (via webhook)
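As a rough illustration of the API-side enforcement, the sketch below authorizes a request from claims whose RS256 signature has already been verified; signature verification itself is out of scope. It assumes the standard `exp` and `iat` claims are present alongside the custom ones shown above. `authorize`, `SandboxClaims`, and `AuthorizationError` are hypothetical names, not an existing Coditect API.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: mirrors the 5-minute access-token TTL from the Identity section.
ACCESS_TOKEN_TTL_SECONDS = 5 * 60


class AuthorizationError(Exception):
    """Raised when verified claims do not permit the requested action (hypothetical)."""


@dataclass(frozen=True)
class SandboxClaims:
    tenant_id: str
    user_id: str
    sandbox_quota: int
    roles: Tuple[str, ...]


def authorize(claims: dict, resource_tenant: str,
              required_role: str = "sandboxer",
              now: Optional[float] = None) -> SandboxClaims:
    """Check claims from a token whose signature was ALREADY verified.

    Hypothetical helper: enforces expiry, the access-token lifetime cap,
    the required role, and tenant ownership of the target resource.
    """
    now = time.time() if now is None else now
    exp, iat = claims.get("exp"), claims.get("iat")
    if exp is None or now >= exp:
        raise AuthorizationError("token expired or missing exp")
    if iat is not None and exp - iat > ACCESS_TOKEN_TTL_SECONDS:
        raise AuthorizationError("token lifetime exceeds access-token TTL")
    roles = tuple(claims.get("roles", ()))
    if required_role not in roles:
        raise AuthorizationError(f"missing role {required_role!r}")
    # Tenant isolation: a caller may only act on its own tenant's sandboxes.
    if claims.get("tenant_id") != resource_tenant:
        raise AuthorizationError("tenant mismatch")
    return SandboxClaims(
        tenant_id=claims["tenant_id"],
        user_id=claims["user_id"],
        sandbox_quota=int(claims["sandbox_quota"]),
        roles=roles,
    )
```

The same checks could run in the agent (against sandbox metadata) and in the controller webhook; keeping them in one small pure function makes that reuse straightforward.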

🧩 Level 4 — Sandbox Lifecycle & CRD Reconciliation

Sandbox CRD (Expanded)

apiVersion: coditect.io/v1alpha1
kind: Sandbox
metadata:
  name: sbx-abc123
spec:
  runtime: gvisor
  tenantID: t-xyz
  userID: u-abc
  projectID: p-123
  image: python:3.11
  command: ["python", "main.py"]
  timeoutSeconds: 900
  limits:
    cpu: 1
    memoryMiB: 512
  networkPolicy:
    blockNetwork: true
    cidrAllowlist: ["10.0.0.0/8"]
status:
  phase: Running
  logsURL: https://...
  autosaveURL: https://github.com/org/repo/tree/autosave/...  
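A minimal sketch of the checks an admission webhook might apply to this spec before reconciliation. It is written in Python for brevity even though the controller itself is in Go, and it takes the already-parsed YAML as a dict. The rules (allowed runtimes, positive timeout and limits, well-formed CIDRs) are assumptions inferred from the example fields; `validate_sandbox_spec` is a hypothetical helper.

```python
import ipaddress
from typing import List

# Assumption: the three RuntimeClasses from Level 5 are the only valid values.
ALLOWED_RUNTIMES = {"gvisor", "kata-fc", "wasmtime"}


def validate_sandbox_spec(spec: dict) -> List[str]:
    """Return human-readable errors; an empty list means the spec is accepted."""
    errors: List[str] = []
    if spec.get("runtime") not in ALLOWED_RUNTIMES:
        errors.append(f"unsupported runtime: {spec.get('runtime')!r}")
    for key in ("tenantID", "userID", "projectID", "image"):
        if not spec.get(key):
            errors.append(f"missing {key}")
    command = spec.get("command")
    if not (isinstance(command, list) and command
            and all(isinstance(arg, str) for arg in command)):
        errors.append("command must be a non-empty list of strings")
    timeout = spec.get("timeoutSeconds")
    if not (isinstance(timeout, int) and timeout > 0):
        errors.append("timeoutSeconds must be a positive integer")
    limits = spec.get("limits") or {}
    cpu, mem = limits.get("cpu"), limits.get("memoryMiB")
    if not (isinstance(cpu, (int, float)) and cpu > 0):
        errors.append("limits.cpu must be a positive number")
    if not (isinstance(mem, int) and mem > 0):
        errors.append("limits.memoryMiB must be a positive integer")
    policy = spec.get("networkPolicy") or {}
    for cidr in policy.get("cidrAllowlist", []):
        try:
            # Strict parsing rejects entries with host bits set, e.g. 10.0.0.1/8.
            ipaddress.ip_network(cidr)
        except ValueError:
            errors.append(f"invalid CIDR in cidrAllowlist: {cidr!r}")
    return errors
```

Returning a list of errors rather than raising on the first one lets the webhook report every problem in a single admission response.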

Controller Logic:

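Finalizer: sandbox.coditect.io/finalize
Pod annotations:
- sandbox_id, runtime, project_id
Each Sandbox is reconciled into either a Pod or a call to the agent
On delete:
- Trigger a final snapshot
- Call the agent’s TerminateSandbox()
- Remove the finalizer so Kubernetes can complete the deletion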
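The finalizer flow above can be sketched as two pure functions: one plans a reconcile pass and one builds the Pod manifest. This is a Python illustration of logic the Go controller would implement, covering only the Pod path (not the agent call). The action names, treating `metadata.name` as the sandbox_id, and mapping `timeoutSeconds` to the Pod's `activeDeadlineSeconds` are all assumptions.

```python
from typing import List

FINALIZER = "sandbox.coditect.io/finalize"


def plan_reconcile(sandbox: dict, pod_exists: bool) -> List[str]:
    """Decide which (hypothetical) actions one reconcile pass should take."""
    meta = sandbox["metadata"]
    finalizers = meta.get("finalizers", [])
    if meta.get("deletionTimestamp"):
        if FINALIZER not in finalizers:
            return []  # cleanup already done; Kubernetes can finish deleting
        # Snapshot before teardown so no work is lost, then release the object.
        return ["final_snapshot", "terminate_sandbox", "remove_finalizer"]
    actions: List[str] = []
    if FINALIZER not in finalizers:
        actions.append("add_finalizer")  # in place before any resources exist
    if not pod_exists:
        actions.append("create_pod")
    return actions


def build_pod(sandbox: dict) -> dict:
    """Translate a Sandbox object into a Pod manifest (as a plain dict)."""
    name = sandbox["metadata"]["name"]
    spec = sandbox["spec"]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                "sandbox_id": name,  # assumption: sandbox_id == metadata.name
                "runtime": spec["runtime"],
                "project_id": spec["projectID"],
            },
        },
        "spec": {
            "runtimeClassName": spec["runtime"],  # selects gVisor / Kata / Wasmtime
            "restartPolicy": "Never",
            # Assumption: timeoutSeconds is enforced as the Pod's active deadline.
            "activeDeadlineSeconds": spec["timeoutSeconds"],
            "containers": [{
                "name": "sandbox",
                "image": spec["image"],
                "command": spec["command"],
                "resources": {"limits": {
                    "cpu": str(spec["limits"]["cpu"]),
                    "memory": f"{spec['limits']['memoryMiB']}Mi",
                }},
            }],
        },
    }
```

Separating planning from execution keeps each pass idempotent: re-running `plan_reconcile` after a partial failure simply yields the remaining actions.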

⚙️ Level 5 — Runtime Isolation & Quota Enforcement

RuntimeClasses:

Name	Isolation Model	Use Case
`gvisor`	user-space kernel (runsc) that intercepts syscalls	medium-trust agents
`kata-fc`	KVM-based microVM (Kata Containers on Firecracker)	untrusted workloads
`wasmtime`	WebAssembly runtime sandbox	Wasm toolchains
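One way to turn the table into a placement policy is a small lookup that fails closed. The trust labels (`"medium"`, `"untrusted"`) and the `wasm_module` flag are hypothetical request attributes; the document does not say how trust is assigned.

```python
from typing import Dict

# Mirrors the RuntimeClasses table; the trust labels are hypothetical.
RUNTIME_BY_TRUST: Dict[str, str] = {
    "medium": "gvisor",      # user-space kernel (runsc) intercepts syscalls
    "untrusted": "kata-fc",  # each sandbox gets its own KVM-backed microVM
}
STRONGEST_RUNTIME = "kata-fc"


def select_runtime_class(trust_level: str, wasm_module: bool = False) -> str:
    """Pick a RuntimeClass name for a sandbox request."""
    if wasm_module:
        # Wasm toolchains run under the Wasmtime sandbox.
        return "wasmtime"
    # Fail closed: an unknown or missing trust level gets the strongest isolation.
    return RUNTIME_BY_TRUST.get(trust_level, STRONGEST_RUNTIME)
```

Defaulting unknown labels to the microVM runtime trades some startup cost for never under-isolating a workload whose trust level was mislabeled.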

Quota:

Per-user limit comes from the JWT sandbox_quota claim
API tracks usage in Redis
Prometheus alert fires when usage exceeds 90% of quota
Requests over quota are rejected with HTTP 429 and a Retry-After header
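A minimal fixed-window sketch of the quota check, with an in-memory dict standing in for Redis. The document does not state the unit of `sandbox_quota`; this sketch assumes sandbox-seconds per one-hour window, so the window length, the key layout, and every name here are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ALERT_RATIO = 0.9  # matches the ">90% of quota" Prometheus alert

Key = Tuple[str, str, int]  # (tenant_id, user_id, window index)


@dataclass
class QuotaDecision:
    allowed: bool
    usage_ratio: float
    retry_after_seconds: Optional[int] = None  # sent as Retry-After on a 429

    @property
    def near_limit(self) -> bool:
        return self.usage_ratio > ALERT_RATIO


class InMemoryUsageStore:
    """Stand-in for Redis counters (in Redis: INCRBY plus EXPIRE per window key)."""

    def __init__(self) -> None:
        self._counts: Dict[Key, int] = {}

    def get(self, key: Key) -> int:
        return self._counts.get(key, 0)

    def incr(self, key: Key, amount: int) -> int:
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]


def check_quota(store: InMemoryUsageStore, tenant_id: str, user_id: str,
                quota: int, requested: int, now: float,
                window_seconds: int = 3600) -> QuotaDecision:
    """Admit `requested` units if they fit in the current window's quota."""
    window = int(now // window_seconds)
    key = (tenant_id, user_id, window)
    used = store.get(key)
    if used + requested > quota:
        # Ask the client to retry when the current window rolls over.
        retry_after = math.ceil((window + 1) * window_seconds - now)
        return QuotaDecision(False, used / quota, retry_after)
    # With real Redis the check and increment should be atomic (e.g. a Lua script).
    new_used = store.incr(key, requested)
    return QuotaDecision(True, new_used / quota)
```

With `quota=1800` and two 900-second sandboxes, the second request is admitted at 100% usage (tripping `near_limit`), and a third in the same window is rejected with a Retry-After equal to the time left in the window.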

🔁 Level 6 — Git Autosave, Push, and Recovery

Git Flow:

git worktree add -b autosave/<ticket>/<tenant>/<user> ../_autosave
A daemon commits and pushes every 30 s:

git add -A && git commit -m "autosave checkpoint <sandbox_id> $(date -u +%Y-%m-%dT%H:%M:%SZ)" && git push -u origin HEAD

A final snapshot is committed and pushed on destroy
Each autosave commit records:
- sandbox_id
- UTC timestamp
After each push, the commit URL is recorded on the Sandbox resource (commitURL)
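The daemon loop can be sketched with `subprocess` driving the same Git commands. The function names, the commit-message layout, and the `should_stop` hook are assumptions. The sketch skips empty checkpoints by testing the index with `git diff --cached --quiet`, and uses `push -u origin HEAD` so the first push of a fresh autosave branch has an upstream.

```python
import subprocess
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def autosave_message(sandbox_id: str, now: datetime) -> str:
    """Commit message carrying the sandbox_id and a UTC timestamp."""
    ts = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"autosave checkpoint sandbox_id={sandbox_id} ts={ts}"


def _git(worktree: str, *args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["git", "-C", worktree, *args],
                          check=True, capture_output=True, text=True)


def autosave_once(worktree: str, sandbox_id: str,
                  now: Optional[datetime] = None) -> bool:
    """Stage, commit and push one checkpoint; returns False if nothing changed."""
    now = now or datetime.now(timezone.utc)
    _git(worktree, "add", "-A")
    # `git diff --cached --quiet` exits 0 when nothing is staged, 1 otherwise.
    staged = subprocess.run(["git", "-C", worktree, "diff", "--cached", "--quiet"])
    if staged.returncode == 0:
        return False
    _git(worktree, "commit", "-m", autosave_message(sandbox_id, now))
    # -u sets the upstream so the first push of a new autosave branch succeeds.
    _git(worktree, "push", "-u", "origin", "HEAD")
    return True


def run_daemon(worktree: str, sandbox_id: str, interval_seconds: float = 30.0,
               should_stop: Callable[[], bool] = lambda: False) -> None:
    """Checkpoint every `interval_seconds` until `should_stop()` returns True."""
    while not should_stop():
        try:
            autosave_once(worktree, sandbox_id)
        except subprocess.CalledProcessError as exc:
            # A real daemon would route this into the failure-mode handling.
            print(f"autosave failed: {exc.stderr}")
        time.sleep(interval_seconds)
```

The final snapshot on destroy would be one more `autosave_once` call after `should_stop()` turns true, before the agent tears the sandbox down.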

Failure Modes:

Scenario	Resolution
Token expired	Refresh the token via Firebase and retry the push
Branch conflict	Rebase the autosave worktree and retry
API timeout	Queue the push for an asynchronous retry
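The failure-mode table maps naturally onto a retry loop with one handler per error class. The exception types and the injected callables (`refresh_token`, `rebase`, `enqueue_async`) are hypothetical stand-ins for whatever the autosave daemon actually uses; the retry cap is also an assumption.

```python
from typing import Callable


# Hypothetical error types a push wrapper might raise; names are illustrative.
class TokenExpired(Exception):
    pass


class BranchConflict(Exception):
    pass


class PushTimeout(Exception):
    pass


def push_with_recovery(push: Callable[[], None],
                       refresh_token: Callable[[], None],
                       rebase: Callable[[], None],
                       enqueue_async: Callable[[], None],
                       max_attempts: int = 3) -> str:
    """Apply the failure-mode table; returns "pushed" or "queued"."""
    for _ in range(max_attempts):
        try:
            push()
            return "pushed"
        except TokenExpired:
            refresh_token()  # token expired: refresh, then retry
        except BranchConflict:
            rebase()         # branch conflict: rebase the worktree, then retry
        except PushTimeout:
            enqueue_async()  # API timeout: hand off to the async queue
            return "queued"
    # Still failing after the retry budget: defer rather than drop the checkpoint.
    enqueue_async()
    return "queued"
```

Injecting the recovery actions keeps the policy testable without a Git remote, and the final `enqueue_async()` ensures a checkpoint is never silently lost.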

🔍 Level 7 — Observability, Audit, Threat Modeling

Prometheus Metrics:

sandbox_create_latency_seconds
agent_launch_failures_total
quota_usage_ratio{tenant}
sandbox_active_total{runtime}
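To show what these series look like on the wire, here is a small stdlib-only renderer for the Prometheus text exposition format, covering counters and gauges. It is a sketch for illustration; a real service would more likely use the official `prometheus_client` library, which also handles histograms such as `sandbox_create_latency_seconds`. The HELP strings below are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, help_text: str,
                  samples: Iterable[Sample]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {float(value)!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metric("quota_usage_ratio", "gauge",
                        "Fraction of sandbox_quota consumed.",
                        [({"tenant": "t-xyz"}, 0.5)]), end="")
    print(render_metric("sandbox_active_total", "gauge",
                        "Sandboxes currently running.",
                        [({"runtime": "gvisor"}, 4), ({"runtime": "kata-fc"}, 1)]), end="")
```

Prometheus naming conventions reserve the `_total` suffix for counters; because `sandbox_active_total` is a point-in-time count, it is modeled here as a gauge.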

Grafana Dashboards:

CPU + mem usage per sandbox
Quota % per user/project
Idle sandbox heatmap

Cloud Logs:

The API, agent, and controller emit structured logs containing:
- sandbox_id
- user_id
- trace_id
Audit logs are retained in the GCS bucket coditect-audit-logs
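A minimal sketch of the shared log format, using only the standard `logging` and `json` modules. It writes one JSON object per line to stdout, which GKE's logging agent parses into structured entries (`severity` and `message` are recognized fields). The `component` field and the helper names are assumptions, and routing entries into the audit bucket would be done by a log sink, not shown here.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """One JSON object per line, carrying the cross-component correlation fields."""

    CORRELATION_FIELDS = ("sandbox_id", "user_id", "trace_id")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,  # INFO/WARNING/ERROR match Cloud Logging severities
            "message": record.getMessage(),
            "component": record.name,
        }
        for field in self.CORRELATION_FIELDS:
            # Values arrive via `extra=`; missing ones are logged as null.
            entry[field] = getattr(record, field, None)
        return json.dumps(entry)


def make_logger(component: str) -> logging.Logger:
    logger = logging.getLogger(component)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("api")
    log.info("sandbox created",
             extra={"sandbox_id": "sbx-abc123", "user_id": "u-abc",
                    "trace_id": "hypothetical-trace-id"})
```

Emitting the same three keys from every component is what lets a single query join one sandbox's events across the API, agent, and controller.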

Threat Map:

Threat	Mitigation
Sandbox breakout	RuntimeClass, seccomp, readonly FS
JWT forgery	RS256 KMS signing, short TTL
GitHub token exfiltration	Secret Manager + token TTL + scoping
Workstation impersonation	mTLS, WIF, pinned root CA
Excess sandbox abuse	Token quota + Prometheus + Redis sync
