Canvas 3 — Impact on agentic system design (architecture implications)
1) Inference architecture shifts
From: “LLM produces answer”
To: “LLM controls a computer substrate”
Required components:
- Sandbox provisioner: ephemeral container per task or per session
- I/O contract: fixed directories + final answer file extraction
- Tool gateway: minimal, universal tools (shell + file editor + submit)
- Trajectory store: persist action/observation logs for debugging + training
Design consequence:
- the environment becomes the long-context buffer, scratchpad, and verifier.
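A minimal sketch of the I/O contract and trajectory store, assuming a local directory stands in for the container filesystem. The names `Step`, `TrajectoryStore`, `make_workspace`, `extract_answer`, the JSONL log format, and the `answer.txt` filename are all hypothetical choices for illustration, not part of any specific framework.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Step:
    """One action/observation pair from the run loop."""
    turn: int
    action: str
    observation: str
    timestamp: float


class TrajectoryStore:
    """Append-only JSONL log of steps (assumed format, one JSON object per line)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, step: Step) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(step)) + "\n")

    def load(self) -> list[Step]:
        with self.path.open(encoding="utf-8") as f:
            return [Step(**json.loads(line)) for line in f if line.strip()]


def make_workspace(root: Path) -> dict[str, Path]:
    """Create the fixed directories that the prompt tells the agent about."""
    dirs = {name: root / name for name in ("input", "documents", "output")}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs


def extract_answer(dirs: dict[str, Path], filename: str = "answer.txt") -> Optional[str]:
    """Read the final answer file, or return None if the agent never wrote one."""
    path = dirs["output"] / filename
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8").strip()
```

The point of the fixed layout is that the controller never parses free-form model text for the result: it only checks one known file.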
2) Context handling: prompt budget → filesystem budget
Mechanism:
- keep prompt thin (instructions + pointers)
- place large context as files
- rely on search + selective reads
System implications:
- deterministic retrieval primitives (grep, ripgrep, sqlite, embeddings)
- caching of preprocessed indices for repeated queries
- document chunking as first-class preprocessing, not prompt engineering
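The "search + selective reads" mechanism can be sketched as follows, assuming the large context has already been written out as `.txt` files under one directory. `search` and `read_window` are hypothetical helper names; a real agent would more likely call grep or ripgrep through the shell, and this pure-Python version only illustrates the access pattern.

```python
import re
from pathlib import Path


def search(root: Path, pattern: str, max_hits: int = 20) -> list[tuple[str, int, str]]:
    """grep-like scan: return (relative path, line number, line) for each match."""
    rx = re.compile(pattern)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.txt")):  # sorted for deterministic order
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if rx.search(line):
                    hits.append((str(path.relative_to(root)), lineno, line.rstrip("\n")))
                    if len(hits) >= max_hits:
                        return hits
    return hits


def read_window(path: Path, center: int, radius: int = 2) -> str:
    """Selective read: return only the lines around a 1-indexed hit."""
    lines = path.read_text(encoding="utf-8").splitlines()
    lo = max(center - 1 - radius, 0)
    hi = min(center + radius, len(lines))
    return "\n".join(lines[lo:hi])
```

Only the hit list and the small window enter the model's context; the rest of the document stays on disk.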
3) Tooling philosophy: “meta-tools” dominate
The shell is a universal API surface:
- install capabilities at runtime
- run domain software
- compose pipelines (grep → parse → compute → format)
System implications:
- fewer bespoke tool integrations
- higher need for policy (what is allowed) and guardrails (quotas)
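One way to make "what is allowed" concrete is a pre-execution check on each shell command. The sketch below is an assumption-heavy illustration: `ShellPolicy`, `INSTALLERS`, and `check_command` are hypothetical names, only plain pipelines are accepted, and a string-level check like this is a convenience layer that does not replace enforcement in the sandbox itself.

```python
import shlex
from dataclasses import dataclass

# Hypothetical list of commands treated as "install" actions.
INSTALLERS = frozenset({"pip", "pip3", "apt", "apt-get", "npm", "conda"})
OPERATOR_CHARS = "();<>|&"


@dataclass(frozen=True)
class ShellPolicy:
    allowed_binaries: frozenset
    allow_install: bool = False


def tokenize(cmd: str) -> list[str]:
    """Split like a POSIX shell, returning operators such as | and && as tokens."""
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def check_command(cmd: str, policy: ShellPolicy) -> tuple[bool, str]:
    """Allow only pipelines whose every stage starts with a permitted binary."""
    try:
        tokens = tokenize(cmd)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    expect_binary = True
    for tok in tokens:
        if tok == "|":
            if expect_binary:
                return False, "pipe without a command"
            expect_binary = True
            continue
        # Quoted operators look the same after tokenizing, so this errs toward denial.
        if all(ch in OPERATOR_CHARS for ch in tok):
            return False, f"operator not permitted: {tok}"
        if expect_binary:
            if tok in INSTALLERS:
                if not policy.allow_install:
                    return False, f"installs are disabled: {tok}"
            elif tok not in policy.allowed_binaries:
                return False, f"binary not allowlisted: {tok}"
            expect_binary = False
    if expect_binary:
        return False, "pipe without a command"
    return True, "ok"
```

Denying by default keeps the policy small: new capabilities are added by extending the allowlist, not by writing a new tool integration.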
4) Training pipeline implications (sandbox-native post-training)
Key idea:
- train exploration on general tasks by moving their context out of the prompt and into sandbox files
System implications:
- training infrastructure must run many sandboxes concurrently
- reward functions remain outcome-based; no need for step-level labels
- logs become training data (trajectory replay, tool-use diagnostics)
Operational consequence:
- “agentic competence” becomes a trainable, transferable skill layer.
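A sketch of what "outcome-based, no step-level labels" can look like in code. Exact match after whitespace and case normalization is an assumed scoring rule chosen for brevity, and `Trajectory`, `outcome_reward`, and `to_training_records` are hypothetical names; the notes above do not specify a reward function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    task_id: str
    steps: list[tuple[str, str]]  # (action, observation) pairs from the log
    final_answer: Optional[str]   # None if the agent never submitted


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def outcome_reward(traj: Trajectory, reference: str) -> float:
    """Score only the final answer; individual steps are never labeled."""
    if traj.final_answer is None:
        return 0.0
    return 1.0 if normalize(traj.final_answer) == normalize(reference) else 0.0


def to_training_records(trajs: list[Trajectory], references: dict[str, str]) -> list[dict]:
    """Attach one episode-level reward to each logged trajectory."""
    records = []
    for traj in trajs:
        records.append({
            "task_id": traj.task_id,
            "num_steps": len(traj.steps),
            "reward": outcome_reward(traj, references[traj.task_id]),
        })
    return records
```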
5) Efficiency model changes
Token accounting:
- prompt tokens down (files instead)
- multi-turn overhead up
- environment output enters the model as prefill tokens, which are often cheaper to process than decoded tokens
Engineering implications:
- throughput depends on:
  - ratio of env tokens vs decoded tokens,
  - turn count,
  - command latency (install/network).
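The three throughput factors can be combined into a rough per-task latency estimate. This is a back-of-envelope sketch under stated assumptions: prefix caching means each turn prefills only its new environment tokens, prefill and decode run at fixed rates, and the `Turn` type and `estimate_seconds` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    env_tokens: int         # new observation tokens, processed as prefill
    decoded_tokens: int     # tokens the model generates this turn
    command_seconds: float  # wall time of the shell command (install, network, ...)


def estimate_seconds(turns: list[Turn], prefill_tok_per_s: float, decode_tok_per_s: float) -> float:
    """Sum prefill time, decode time, and command latency over all turns."""
    total = 0.0
    for t in turns:
        total += t.env_tokens / prefill_tok_per_s
        total += t.decoded_tokens / decode_tok_per_s
        total += t.command_seconds
    return total
```

With illustrative rates of 1000 prefill tokens/s and 50 decoded tokens/s, a single turn with 1000 environment tokens, 100 decoded tokens, and a 0.5 s command comes to 1.0 + 2.0 + 0.5 = 3.5 s, so decoding dominates even though it handles a tenth of the tokens.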
Optimization surface:
- parallelize safe environment steps (e.g., pre-index files)
- package caching and pinned dependency layers
- constrain tool output size; enforce truncation strategies
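One possible truncation strategy keeps the head and tail of a command's output and replaces the middle with a marker, so the model still sees both the start of the output and any trailing error. The function name, the character-based limit, and the marker text are assumptions for illustration.

```python
def truncate_output(text: str, max_chars: int = 2000, head_frac: float = 0.5) -> str:
    """Keep the first and last parts of long tool output, eliding the middle."""
    if len(text) <= max_chars:
        return text
    head = int(max_chars * head_frac)
    tail = max_chars - head
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters omitted ...]\n"
    # text[-0:] would return the whole string, so guard the tail == 0 case.
    tail_part = text[-tail:] if tail > 0 else ""
    return text[:head] + marker + tail_part
```

Reporting the omitted count tells the agent that more output exists, which it can then fetch selectively (for example by re-running the command through a filter).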
6) Reliability + safety envelope expands
New risk classes introduced by “computer access”:
- network exfiltration and data leakage
- supply-chain attacks via installs
- prompt injection via local files or web content
- resource exhaustion (CPU/RAM/disk), fork bombs, infinite loops
Hard controls required:
- network egress policy (allowlist/denylist), DNS control
- CPU/memory/time quotas; syscall filtering
- read-only mounts for sensitive areas; restrict host integration
- package install policy: mirror + hashes, pinned versions
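The "mirror + hashes" control can be sketched as a check that a downloaded artifact matches a pinned SHA-256 digest before it is installed. `sha256_file`, `verify_artifact`, and the filename-to-digest mapping are hypothetical; in practice the package manager's own mechanism (for example pip's `--require-hashes` mode) would normally do this.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, pinned: dict[str, str]) -> bool:
    """Accept an artifact only if its name is pinned and its digest matches."""
    expected = pinned.get(path.name)
    if expected is None:
        return False  # unpinned artifacts are rejected by default
    return hmac.compare_digest(sha256_file(path), expected)
```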
Auditing requirements:
- immutable logs of commands, files created/modified, outbound requests
- reproducible runs (snapshot image + dependency lockfiles)
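A log cannot be made truly immutable in application code, but it can be made tamper-evident. The sketch below uses a hash chain, which is a substituted technique chosen for illustration and not something the notes above prescribe: each record stores the hash of the previous record, so editing any earlier entry breaks verification. The record layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with this event's canonical JSON."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited, reordered, or removed record fails."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Truncating the end of the chain is not detected by this check alone; storing the latest hash outside the sandbox would cover that case.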
7) Product-level capability expansion (beyond text)
Sandbox makes file artifacts first-class outputs:
- HTML dashboards, posters, charts, audio, video, datasets, code repos
System design implication:
- outputs are “deliverables”, not prose.
- evaluation can become artifact-based (render/execute/test).
8) Concrete blueprint (minimal viable agentic stack)
Control plane
- Task router → sandbox allocator → run loop controller → artifact collector
Data plane
- Container image + runtime dependency cache
- /input, /documents, /output directory contract
- log sink (actions, observations, timestamps, resource metrics)
Policy plane
- capability toggles (network on/off; install allowed/blocked)
- quotas (turns, tokens, wall time, CPU, RAM, disk)
- content filters for outbound requests and sensitive file access
Outcome:
- a general agent runtime where “tools” are emergent from the OS substrate.
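The run loop controller at the center of the control plane can be sketched in a few lines. This is a simplified illustration: `next_action` stands in for the model call and `execute` for the sandbox shell, both passed in as plain callables, and the `submit <answer>` convention and `RunResult` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Log = list[tuple[str, str]]  # (action, observation) pairs


@dataclass
class RunResult:
    answer: Optional[str]  # None when the turn quota ran out
    turns_used: int
    log: Log


def run_loop(
    next_action: Callable[[Log], str],
    execute: Callable[[str], str],
    max_turns: int = 10,
) -> RunResult:
    """Alternate model actions and environment observations under a turn quota."""
    log: Log = []
    for turn in range(1, max_turns + 1):
        action = next_action(log)
        if action.startswith("submit "):
            return RunResult(action[len("submit "):], turn, log)
        log.append((action, execute(action)))
    return RunResult(None, max_turns, log)
```

A scripted stand-in for the model is enough to exercise the loop:

```python
script = iter(["ls /input", "cat /input/q.txt", "submit 42"])
result = run_loop(lambda log: next(script), lambda cmd: f"ran {cmd}", max_turns=5)
```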
9) What changes in “agentic system design” immediately
- Treat filesystem + terminal as the default tool substrate.
- Treat context placement as a systems decision, not a prompting decision.
- Add first-class observability: capability-use rates + wandering detectors.
- Make sandbox policy explicit per deployment tier (offline, restricted net, full net).
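Making policy explicit per tier can be as simple as a frozen config object looked up by tier name. The sketch below assumes the three tiers named above; the field names, quota values, and the mirror hostname `mirror.internal.example` are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    network: str                         # "none", "allowlist", or "open"
    allowed_hosts: frozenset = frozenset()
    allow_install: bool = False
    max_turns: int = 20                  # placeholder quota values
    wall_time_s: int = 300


TIERS = {
    "offline": SandboxPolicy(network="none"),
    "restricted_net": SandboxPolicy(
        network="allowlist",
        allowed_hosts=frozenset({"mirror.internal.example"}),  # hypothetical mirror
        allow_install=True,
    ),
    "full_net": SandboxPolicy(network="open", allow_install=True, max_turns=50, wall_time_s=1800),
}


def policy_for(tier: str) -> SandboxPolicy:
    """Look up a tier, failing loudly on unknown names instead of defaulting."""
    try:
        return TIERS[tier]
    except KeyError:
        raise ValueError(f"unknown deployment tier: {tier!r}") from None


def egress_allowed(policy: SandboxPolicy, host: str) -> bool:
    """Decide whether an outbound request to host is permitted under the policy."""
    if policy.network == "open":
        return True
    if policy.network == "allowlist":
        return host in policy.allowed_hosts
    return False
```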