Canvas 3 — Impact on agentic system design (architecture implications)

1) Inference architecture shifts

From: “LLM produces answer”

To: “LLM controls a computer substrate”

Required components:

  • Sandbox provisioner: ephemeral container per task or per session
  • I/O contract: fixed directories + final answer file extraction
  • Tool gateway: minimal, universal tools (shell + file editor + submit)
  • Trajectory store: persist action/observation logs for debugging + training

Design consequence:

  • the environment becomes the long-context buffer, scratchpad, and verifier.
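A minimal sketch of how the components above fit together, under loud assumptions: a plain working directory stands in for the ephemeral container, a scripted list of argv commands stands in for model output, and the names run_task and output/answer.txt are hypothetical. It shows the I/O contract, the action/observation log, and final answer extraction, not real isolation.

```python
import subprocess
import sys
from pathlib import Path


def run_task(actions, workdir: Path, timeout_s: float = 10.0):
    """Execute scripted actions in workdir, log each step, and extract the answer.

    `actions` stands in for model output: each item is an argv list.
    A real deployment would run these inside an isolated container.
    """
    # I/O contract: fixed directories exist before the first action runs.
    for name in ("input", "output"):
        (workdir / name).mkdir(parents=True, exist_ok=True)

    trajectory = []  # trajectory store: one record per action/observation pair
    for argv in actions:
        proc = subprocess.run(
            argv, cwd=workdir, capture_output=True, text=True, timeout=timeout_s
        )
        trajectory.append(
            {
                "action": argv,
                "observation": proc.stdout + proc.stderr,
                "exit_code": proc.returncode,
            }
        )

    # Final answer extraction: the agent "submits" by writing this file.
    answer_file = workdir / "output" / "answer.txt"
    answer = None
    if answer_file.is_file():
        answer = answer_file.read_text(encoding="utf-8").strip()
    return answer, trajectory
```

The caller only ever reads the answer file and the trajectory; everything else the agent did stays in the working directory, which is what lets the environment act as scratchpad and buffer.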

2) Context handling: prompt budget → filesystem budget

Mechanism:

  • keep prompt thin (instructions + pointers)
  • place large context as files
  • rely on search + selective reads

System implications:

  • deterministic retrieval primitives (grep, ripgrep, sqlite), plus embedding indices where lexical search is not enough
  • caching of preprocessed indices for repeated queries
  • document chunking as first-class preprocessing, not prompt engineering
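The mechanism above can be sketched as follows. This is an illustrative assumption, not a prescribed layout: place_context and search are hypothetical helpers, the documents/ directory name is borrowed from the blueprint's directory contract, and a regex scan stands in for grep.

```python
import re
from pathlib import Path


def place_context(documents: dict[str, str], root: Path) -> str:
    """Write large context to files and return a thin prompt with pointers only."""
    doc_dir = root / "documents"
    doc_dir.mkdir(parents=True, exist_ok=True)
    for name, text in documents.items():
        (doc_dir / name).write_text(text, encoding="utf-8")
    listing = ", ".join(sorted(documents))
    return f"Context files are under {doc_dir.name}/: {listing}. Search before reading."


def search(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Grep-like scan: return (file name, 1-based line number, line) per match."""
    regex = re.compile(pattern)
    hits = []
    # Sorted iteration keeps results deterministic across runs.
    for path in sorted((root / "documents").iterdir()):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, number, line))
    return hits
```

The prompt carries file names only; the document text reaches the model one matching line at a time, so prompt size stays flat as the corpus grows.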

3) Tooling philosophy: “meta-tools” dominate

The shell is a universal API surface:

  • install capabilities at runtime
  • run domain software
  • compose pipelines (grep → parse → compute → format)

System implications:

  • fewer bespoke tool integrations
  • higher need for policy (what is allowed) and guardrails (quotas)
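One way the "what is allowed" side could look, as a sketch only. CommandPolicy, the installer list, and the decision to reject redirects and command substitution are all assumptions made for illustration; this is a conservative pre-check, not a complete shell parser, and it does not replace OS-level isolation.

```python
import shlex
from dataclasses import dataclass

STAGE_SEPARATORS = {"|", "&&", "||", ";"}
OPERATOR_CHARS = set("();<>|&")
INSTALLERS = {"pip", "apt-get", "npm"}  # illustrative list, not exhaustive


@dataclass(frozen=True)
class CommandPolicy:
    allowed_programs: frozenset
    allow_install: bool = False

    def check(self, command: str) -> tuple[bool, str]:
        """Allow a command only if every pipeline stage starts with a permitted program."""
        if "`" in command or "$(" in command:
            return False, "command substitution is not supported"
        lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
        lexer.whitespace_split = True
        try:
            tokens = list(lexer)
        except ValueError as exc:  # e.g. unbalanced quotes
            return False, f"unparseable command: {exc}"

        expect_program = True  # the next word token names a program
        for token in tokens:
            if token in STAGE_SEPARATORS:
                expect_program = True
            elif token and set(token) <= OPERATOR_CHARS:
                return False, f"unsupported operator: {token}"
            elif expect_program:
                if token in INSTALLERS:
                    if not self.allow_install:
                        return False, f"installs are blocked: {token}"
                elif token not in self.allowed_programs:
                    return False, f"program not allowed: {token}"
                expect_program = False
        if expect_program:
            return False, "missing program"
        return True, "ok"
```

With allowed_programs set to grep and wc, the pipeline "grep -n foo notes.txt | wc -l" passes while "curl http://example.com" is refused. Quotas are a separate concern and are better enforced by the runtime than by string inspection.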

4) Training pipeline implications (sandbox-native post-training)

Key idea:

  • train exploration behavior on general tasks by moving their context out of the prompt and into sandbox files

System implications:

  • training infrastructure must run many sandboxes concurrently
  • reward functions remain outcome-based; no need for step-level labels
  • logs become training data (trajectory replay, tool-use diagnostics)

Operational consequence:

  • “agentic competence” becomes a trainable, transferable skill layer.
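A sketch of what an outcome-based reward and a replayable log record might look like. The answer file path, the exact-match comparison, and the record fields are assumptions for illustration; real tasks may need task-specific graders.

```python
import json
from pathlib import Path


def outcome_reward(sandbox_root: Path, expected: str) -> float:
    """Return 1.0 if the submitted answer matches the reference, else 0.0."""
    answer_file = sandbox_root / "output" / "answer.txt"
    if not answer_file.is_file():
        return 0.0  # no submission counts as failure
    answer = answer_file.read_text(encoding="utf-8").strip()
    return 1.0 if answer.casefold() == expected.strip().casefold() else 0.0


def training_record(task_id: str, trajectory: list[dict], reward: float) -> str:
    """Serialize one rollout as a JSON line; the reward is attached to the whole run."""
    return json.dumps(
        {"task_id": task_id, "steps": trajectory, "reward": reward}, sort_keys=True
    )
```

Only the final file is scored, so no individual step needs a label; the steps are stored alongside the reward purely for replay and diagnostics.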

5) Efficiency model changes

Token accounting:

  • prompt tokens down (files instead)
  • multi-turn overhead up
  • environment output tokens are mostly processed as “prefill”, which is typically cheaper per token than decoding

Engineering implications:

  • throughput depends on:
    • ratio of env tokens vs decoded tokens,
    • turn count,
    • command latency (install/network).
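The three factors above can be combined into a rough wall-time estimate. This is a back-of-envelope model with assumed names and rates: it treats prefill and decode as fixed tokens-per-second rates and assumes each environment token is prefilled once (for example, with prefix caching), which will not hold in every serving setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunProfile:
    env_tokens: int           # tool output fed back to the model (prefill)
    decoded_tokens: int       # tokens the model generates
    turns: int                # number of action/observation rounds
    command_latency_s: float  # mean latency per command (install, network, ...)


def estimate_wall_time(
    profile: RunProfile, prefill_tok_per_s: float, decode_tok_per_s: float
) -> float:
    """Sum prefill time, decode time, and per-turn command latency, in seconds."""
    prefill = profile.env_tokens / prefill_tok_per_s
    decode = profile.decoded_tokens / decode_tok_per_s
    commands = profile.turns * profile.command_latency_s
    return prefill + decode + commands
```

With illustrative numbers (8000 environment tokens at 4000 tok/s, 600 decoded tokens at 50 tok/s, 10 turns at 0.5 s each) the estimate is 2 + 12 + 5 = 19 seconds, so decoding dominates even though it accounts for far fewer tokens.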

Optimization surface:

  • parallelize safe environment steps (e.g., pre-index files)
  • package caching and pinned dependency layers
  • constrain tool output size; enforce truncation strategies
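One possible truncation strategy, sketched under the assumption that the start and end of a tool's output are the most useful parts (headers first, errors last). The function name and the character-based budget are illustrative; a production system would more likely budget in tokens.

```python
def truncate_output(text: str, max_chars: int = 2000, head_fraction: float = 0.5) -> str:
    """Keep the head and tail of long tool output and mark what was dropped."""
    if len(text) <= max_chars:
        return text
    head = int(max_chars * head_fraction)
    tail = max_chars - head
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters omitted ...]\n"
    # Guard tail == 0: text[-0:] would return the whole string.
    tail_text = text[-tail:] if tail > 0 else ""
    return text[:head] + marker + tail_text
```

The explicit marker tells the model that output was cut, so it can rerun the command with a narrower filter instead of assuming it saw everything.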

6) Reliability + safety envelope expands

New risk classes introduced by “computer access”:

  • network exfiltration and data leakage
  • supply-chain attacks via installs
  • prompt injection via local files or web content
  • resource exhaustion (CPU/RAM/disk), fork bombs, infinite loops

Hard controls required:

  • network egress policy (allowlist/denylist), DNS control
  • CPU/memory/time quotas; syscall filtering
  • read-only mounts for sensitive areas; restrict host integration
  • package install policy: mirror + hashes, pinned versions
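The "mirror + hashes" control can be illustrated with a small check that a downloaded file matches a pinned SHA-256 digest before it is installed. The pins mapping and helper names are hypothetical; package managers offer their own hash-checking modes, and this sketch only shows the underlying idea.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large packages are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, pins: dict[str, str]) -> bool:
    """Accept a file only if its name is pinned and its digest matches the pin."""
    expected = pins.get(path.name)
    if expected is None:
        return False  # unpinned artifacts are rejected by default
    return sha256_of(path) == expected
```

Rejecting unpinned files by default means a new dependency requires an explicit policy change, which is the point of pinning.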

Auditing requirements:

  • immutable logs of commands, files created/modified, outbound requests
  • reproducible runs (snapshot image + dependency lockfiles)
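A hash-chained log is one way to approximate the immutability requirement in software. The sketch below is tamper-evident rather than truly immutable, and its entry layout and function names are assumptions; real deployments would also ship entries to write-once storage outside the sandbox.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers both its content and the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if _digest(entry["prev"], entry["event"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Events would typically be commands, file writes, and outbound requests; changing any past event changes its hash and invalidates every later entry.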

7) Product-level capability expansion (beyond text)

The sandbox makes file artifacts first-class outputs:

  • HTML dashboards, posters, charts, audio, video, datasets, code repos

System design implication:

  • outputs are “deliverables”, not prose.
  • evaluation can become artifact-based (render/execute/test).
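Artifact-based evaluation could start with cheap structural checks per file type before any rendering or test execution. The checks below (JSON parses, Python compiles, HTML has a non-empty title) are illustrative assumptions about what "valid" means, and check_artifact is a hypothetical helper.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _TitleFinder(HTMLParser):
    """Collect the text inside <title>...</title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_artifact(path: Path) -> tuple[bool, str]:
    """Run a structural check chosen by file suffix; return (passed, reason)."""
    if not path.is_file():
        return False, "missing"
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            return False, f"invalid JSON: {exc.msg}"
        return True, "valid JSON"
    if path.suffix == ".py":
        try:
            compile(text, str(path), "exec")  # syntax check only, nothing is executed
        except SyntaxError as exc:
            return False, f"syntax error on line {exc.lineno}"
        return True, "compiles"
    if path.suffix == ".html":
        finder = _TitleFinder()
        finder.feed(text)
        if finder.title.strip():
            return True, "has title"
        return False, "no <title>"
    return False, f"no check for {path.suffix}"
```

Checks like these gate the more expensive steps: only artifacts that pass would be rendered, executed, or run against tests.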

8) Concrete blueprint (minimal viable agentic stack)

Control plane

  • Task router → sandbox allocator → run loop controller → artifact collector

Data plane

  • Container image + runtime dependency cache
  • /input, /documents, /output directory contract
  • log sink (actions, observations, timestamps, resource metrics)

Policy plane

  • capability toggles (network on/off; install allowed/blocked)
  • quotas (turns, tokens, wall time, CPU, RAM, disk)
  • content filters for outbound requests and sensitive file access

Outcome:

  • a general agent runtime where “tools” are emergent from the OS substrate.
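The policy plane can be made concrete as plain data plus a quota check. Everything here is an assumption for illustration: the field names, the tier names (taken from the deployment tiers this document lists), and especially the numeric limits, which are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    network: bool
    install_allowed: bool
    max_turns: int
    max_tokens: int
    max_wall_seconds: float


@dataclass
class Usage:
    turns: int = 0
    tokens: int = 0
    wall_seconds: float = 0.0


# Placeholder limits per deployment tier.
TIERS = {
    "offline": SandboxPolicy(False, False, 50, 200_000, 600.0),
    "restricted_net": SandboxPolicy(True, False, 100, 400_000, 1200.0),
    "full_net": SandboxPolicy(True, True, 200, 800_000, 3600.0),
}


def exceeded(policy: SandboxPolicy, usage: Usage) -> list[str]:
    """Return the names of every quota the run has gone over."""
    over = []
    if usage.turns > policy.max_turns:
        over.append("turns")
    if usage.tokens > policy.max_tokens:
        over.append("tokens")
    if usage.wall_seconds > policy.max_wall_seconds:
        over.append("wall_seconds")
    return over
```

The run loop controller would call exceeded after each turn and stop the run when the list is non-empty. CPU, RAM, and disk limits are left out because they are enforced by the container runtime rather than by the controller.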

9) What changes in “agentic system design” immediately

  • Treat filesystem + terminal as the default tool substrate.
  • Treat context placement as a systems decision, not a prompting decision.
  • Add first-class observability: capability-use rates + wandering detectors (flag runs that spend turns without making progress).
  • Make sandbox policy explicit per deployment tier (offline, restricted net, full net).
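The observability item can be sketched with two small metrics. Both definitions are assumptions, since "capability use" and "wandering" are not pinned down here: a capability counts as used when any command in a run starts with one of its prefixes, and a run counts as wandering when its most recent commands show very little variety.

```python
def capability_use_rates(
    trajectories: list[list[str]], capabilities: dict[str, tuple[str, ...]]
) -> dict[str, float]:
    """Fraction of runs that use each capability at least once."""
    if not trajectories:
        return {name: 0.0 for name in capabilities}
    rates = {}
    for name, prefixes in capabilities.items():
        used = sum(
            1
            for commands in trajectories
            if any(command.startswith(prefixes) for command in commands)
        )
        rates[name] = used / len(trajectories)
    return rates


def is_wandering(commands: list[str], window: int = 6, min_distinct: int = 3) -> bool:
    """Flag a run whose last `window` commands contain fewer than `min_distinct` distinct ones."""
    if len(commands) < window:
        return False  # too early to judge
    return len(set(commands[-window:])) < min_distinct
```

For example, a run that issues ls six times in a row is flagged, while six different commands are not. Prefix matching is crude (a prefix such as "rg" also matches unrelated commands that begin with those letters), so a real detector would parse the command first.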