Canvas 3 — Impact on agentic system design (architecture implications)

1) Inference architecture shifts

From: “LLM produces answer”

To: “LLM controls a computer substrate”

Required components:

Sandbox provisioner: ephemeral container per task or per session
I/O contract: fixed directories + final answer file extraction
Tool gateway: minimal, universal tools (shell + file editor + submit)
Trajectory store: persist action/observation logs for debugging + training

Design consequence:

the environment becomes the long-context buffer, scratchpad, and verifier.
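A minimal sketch of the I/O contract, tool gateway, and trajectory store follows. It assumes a local temp directory stands in for the ephemeral container. The class `Sandbox` and the tool names `shell`, `edit_file`, and `submit` are hypothetical. This version runs commands on the host and provides no isolation; a real provisioner would execute the same calls inside a container.

```python
import json
import subprocess
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Sandbox:
    """Hypothetical stand-in for an ephemeral container (NO isolation: runs on the host)."""
    root: Path = field(default_factory=lambda: Path(tempfile.mkdtemp()).resolve())
    trajectory: list = field(default_factory=list)  # action/observation log

    def __post_init__(self) -> None:
        # I/O contract: fixed directories the agent can rely on.
        for name in ("input", "documents", "output"):
            (self.root / name).mkdir(exist_ok=True)

    def _log(self, tool: str, args: dict, observation: str) -> str:
        self.trajectory.append({"tool": tool, "args": args, "observation": observation})
        return observation

    def shell(self, cmd: str, timeout: float = 30.0) -> str:
        # Universal tool: run a shell command with the sandbox root as working dir.
        proc = subprocess.run(cmd, shell=True, cwd=self.root, capture_output=True,
                              text=True, timeout=timeout)
        return self._log("shell", {"cmd": cmd}, proc.stdout + proc.stderr)

    def edit_file(self, path: str, content: str) -> str:
        # File editor: write content, refusing paths that resolve outside the root.
        target = (self.root / path).resolve()
        if not target.is_relative_to(self.root):
            return self._log("edit_file", {"path": path}, "error: path escapes sandbox")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        return self._log("edit_file", {"path": path}, f"wrote {len(content)} chars")

    def submit(self, answer: str) -> str:
        # Final answer extraction: the harness reads output/answer.txt.
        (self.root / "output" / "answer.txt").write_text(answer)
        return self._log("submit", {}, "submitted")

    def save_trajectory(self, path: Path) -> None:
        # Trajectory store: one JSON object per line (JSONL).
        path.write_text("\n".join(json.dumps(step) for step in self.trajectory) + "\n")
```

Every tool call goes through `_log`, so the trajectory is complete by construction rather than reconstructed afterwards.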

2) Context handling: prompt budget → filesystem budget

Mechanism:

keep prompt thin (instructions + pointers)
place large context as files
rely on search + selective reads
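The thin-prompt mechanism can be sketched as follows. It assumes plain-text documents and uses a Python regex as a stand-in for grep. All function names are hypothetical. The prompt carries only the task and a file listing, and content reaches the model through search hits and small line windows.

```python
import re
from pathlib import Path


def stage_documents(doc_dir: Path, docs: dict[str, str]) -> None:
    """Place large context on disk instead of in the prompt."""
    doc_dir.mkdir(parents=True, exist_ok=True)
    for name, text in docs.items():
        (doc_dir / name).write_text(text)


def build_thin_prompt(task: str, doc_dir: Path) -> str:
    """Instructions plus pointers (file names and sizes), never file contents."""
    listing = "\n".join(
        f"- {p.name} ({p.stat().st_size} bytes)" for p in sorted(doc_dir.iterdir())
    )
    return f"Task: {task}\nDocuments are in {doc_dir}/:\n{listing}\nSearch before reading."


def search(doc_dir: Path, pattern: str) -> list[tuple[str, int]]:
    """grep-like search returning (file name, 1-based line number) hits."""
    rx = re.compile(pattern)
    hits = []
    for p in sorted(doc_dir.iterdir()):
        for i, line in enumerate(p.read_text().splitlines(), start=1):
            if rx.search(line):
                hits.append((p.name, i))
    return hits


def read_window(doc_dir: Path, name: str, line: int, radius: int = 2) -> str:
    """Selective read: only `radius` lines on either side of a hit."""
    lines = (doc_dir / name).read_text().splitlines()
    lo, hi = max(0, line - 1 - radius), min(len(lines), line + radius)
    return "\n".join(lines[lo:hi])
```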

System implications:

deterministic retrieval primitives (grep, ripgrep, SQLite, embedding indices)
caching of preprocessed indices for repeated queries
document chunking as first-class preprocessing, not prompt engineering
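Chunking as preprocessing with a cached index can be sketched as follows. The sketch assumes fixed-size line chunks with overlap and a simple inverted index keyed by word. The cache key is a SHA-256 of the chunking parameters plus the text, so repeated queries over the same document skip the rebuild. The names, chunk sizes, and JSON cache layout are illustrative assumptions. An embedding index could use the same cache pattern.

```python
import hashlib
import json
import re
from pathlib import Path


def chunk_lines(text: str, size: int = 50, overlap: int = 10) -> list[dict]:
    """Split text into overlapping windows of `size` lines; `start` is 1-based."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    lines = text.splitlines()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        chunks.append({"start": start + 1, "text": "\n".join(lines[start:start + size])})
        if start + size >= len(lines):
            break  # this window already reaches the end
    return chunks


def build_index(chunks: list[dict]) -> dict[str, list[int]]:
    """Inverted index: lowercase word -> ascending list of chunk ids containing it."""
    index: dict[str, list[int]] = {}
    for cid, chunk in enumerate(chunks):
        for word in set(re.findall(r"\w+", chunk["text"].lower())):
            index.setdefault(word, []).append(cid)
    return index


def load_or_build(text: str, cache_dir: Path, size: int = 50, overlap: int = 10):
    """Return (payload, cache_hit); the key covers both parameters and content."""
    key = hashlib.sha256(f"{size}:{overlap}:".encode() + text.encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text()), True
    chunks = chunk_lines(text, size, overlap)
    payload = {"chunks": chunks, "index": build_index(chunks)}
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return payload, False
```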

3) Tooling philosophy: “meta-tools” dominate

The shell is a universal API surface:

install capabilities at runtime
run domain software
compose pipelines (grep → parse → compute → format)

System implications:

fewer bespoke tool integrations
higher need for policy (what is allowed) and guardrails (quotas)
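Policy and quotas can be enforced at the tool gateway before a command reaches the shell. The sketch below uses a hypothetical `ShellPolicy` class and an example program allowlist. It checks every program in a pipeline, blocks command substitution and (by default) `pip install`, and counts accepted commands against a budget. Splitting on operators with a regex is deliberately conservative: a quoted `|` leaves an unbalanced quote in a segment, and the command is denied.

```python
import re
import shlex
from dataclasses import dataclass, field

# Pipeline and sequencing operators; "||" and "&&" are tried before single chars.
OPERATORS = re.compile(r"\|\||&&|[|;&]")


@dataclass
class ShellPolicy:
    allowed: set = field(default_factory=lambda: {"ls", "cat", "grep", "sort", "head", "python3", "pip"})
    allow_install: bool = False
    max_commands: int = 50
    used: int = 0  # accepted commands so far; denied attempts are not counted

    def check(self, cmd: str) -> tuple[bool, str]:
        if self.used >= self.max_commands:
            return False, "quota exceeded"
        if "`" in cmd or "$(" in cmd:
            return False, "command substitution not allowed"
        for segment in OPERATORS.split(cmd):
            try:
                words = shlex.split(segment)
            except ValueError:
                return False, "unparseable segment"  # e.g. a quoted '|' was split
            if not words:
                return False, "empty segment"
            prog = words[0]
            if prog not in self.allowed:
                return False, f"program not allowed: {prog}"
            if prog == "pip" and "install" in words[1:] and not self.allow_install:
                return False, "package installs blocked by policy"
        self.used += 1
        return True, "ok"
```

Program-level allowlisting is a guardrail, not an isolation boundary. An allowed interpreter such as `python3` can still do anything the container permits, so the hard controls in section 6 remain necessary.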

4) Training pipeline implications (sandbox-native post-training)

Key idea:

train exploration using general tasks by relocating context into the sandbox

System implications:

training infrastructure must run many sandboxes concurrently
reward functions remain outcome-based; no need for step-level labels
logs become training data (trajectory replay, tool-use diagnostics)

Operational consequence:

“agentic competence” becomes a trainable, transferable skill layer.
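Outcome-based reward and the conversion of logs into training data can be sketched as follows. The sketch assumes two hypothetical conventions: the final answer lands in `answer.txt` inside the output directory, and trajectories are stored as JSON lines with `tool`, `args`, and `observation` fields. Every step in an episode shares one scalar reward. The tool counts and error counts are diagnostics, not step-level labels.

```python
import json
from collections import Counter
from pathlib import Path


def normalize(s: str) -> str:
    """Case- and whitespace-insensitive comparison form."""
    return " ".join(s.strip().lower().split())


def outcome_reward(output_dir: Path, reference: str) -> float:
    """Outcome-only reward: 1.0 if the submitted answer file matches, else 0.0."""
    answer = output_dir / "answer.txt"
    if not answer.exists():
        return 0.0
    return 1.0 if normalize(answer.read_text()) == normalize(reference) else 0.0


def to_training_record(trajectory_jsonl: str, reward: float) -> dict:
    """Attach the episode reward to the full trajectory plus tool-use diagnostics."""
    steps = [json.loads(line) for line in trajectory_jsonl.splitlines() if line.strip()]
    return {
        "steps": steps,
        "reward": reward,  # one scalar for the whole episode
        "tool_counts": dict(Counter(s["tool"] for s in steps)),
        "error_steps": sum("error" in s["observation"].lower() for s in steps),
    }
```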

5) Efficiency model changes

Token accounting:

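prompt tokens down (bulk context lives in files, not the prompt)
multi-turn overhead up (each tool call adds a round trip and appends its output to the context)
environment output tokens are processed as prefill, not decoded; prefill is usually cheaper per token than decoding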

Engineering implications:

throughput depends on:
- ratio of env tokens vs decoded tokens,
- turn count,
- command latency (install/network).
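A toy latency model makes the three throughput factors explicit. It assumes KV caching, so each turn prefills only its new environment tokens. It also uses made-up rates of 5,000 prefill tokens/s and 50 decoded tokens/s. Real numbers depend on the model, hardware, and batching, and the model ignores queueing.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    env_tokens: int         # tool output fed back to the model (prefill)
    decoded_tokens: int     # tokens the model generates this turn
    command_seconds: float  # wall time of the command itself (install, network, compute)


def episode_seconds(turns: list[Turn],
                    prefill_tok_per_s: float = 5000.0,  # assumed rate
                    decode_tok_per_s: float = 50.0) -> float:  # assumed rate
    """Toy model: assumes KV caching, so each turn prefills only its new tokens."""
    total = 0.0
    for t in turns:
        total += t.env_tokens / prefill_tok_per_s
        total += t.decoded_tokens / decode_tok_per_s
        total += t.command_seconds
    return total
```

Under these assumed rates, 20,000 environment tokens cost the same wall time as 200 decoded tokens (4 s each). Ten turns of 2,000 environment tokens, 100 decoded tokens, and 0.5 s commands come to about 29 s, of which decoding accounts for 20 s.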

Optimization surface:

parallelize safe environment steps (e.g., pre-index files)
package caching and pinned dependency layers
constrain tool output size; enforce truncation strategies
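One common truncation strategy keeps the head and tail of long tool output and reports how much was dropped. The tail matters because errors and summaries tend to appear at the end. The sketch below uses a hypothetical `truncate_output` function with character-based limits; token-based limits work the same way.

```python
def truncate_output(text: str, max_chars: int = 2000, head_frac: float = 0.5) -> str:
    """Keep the head and tail of long tool output and state what was dropped.

    The marker line itself adds a few characters beyond max_chars.
    """
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head_frac)
    tail_len = max_chars - head_len
    omitted = len(text) - head_len - tail_len
    # Slice from len(text) - tail_len, so tail_len == 0 yields "" rather than the whole text.
    tail = text[len(text) - tail_len:]
    return f"{text[:head_len]}\n[... {omitted} chars truncated ...]\n{tail}"
```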

6) Reliability + safety envelope expands

New risk classes introduced by “computer access”:

network exfiltration and data leakage
supply-chain attacks via installs
prompt injection via local files or web content
resource exhaustion (CPU/RAM/disk), fork bombs, infinite loops

Hard controls required:

network egress policy (allowlist/denylist), DNS control
CPU/memory/time quotas; syscall filtering
read-only mounts for sensitive areas; restrict host integration
package install policy: mirror + hashes, pinned versions
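The CPU, memory, and file-size quotas can be sketched with POSIX resource limits applied in the child process, plus a wall-clock timeout enforced by the parent. The sketch is POSIX-only, since the `resource` module is unavailable on Windows, and the limit values are assumptions. It does not cover syscall filtering, egress policy, process-count limits (which contain fork bombs), or read-only mounts. Those are normally configured in the container runtime rather than in application code.

```python
import resource
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    cpu_seconds: int = 10          # CPU time before the kernel signals the process
    memory_bytes: int = 2 << 30    # address-space cap (2 GiB)
    file_bytes: int = 10 << 20     # largest file the process may write (10 MiB)
    wall_seconds: float = 30.0     # enforced by the parent via timeout


def run_limited(argv: list[str], limits: Limits) -> tuple[Optional[int], str]:
    """Run argv with per-process quotas; returns (returncode or None on timeout, output)."""

    def apply_limits() -> None:
        # Runs in the child after fork, before exec (POSIX only; unsafe with threads).
        resource.setrlimit(resource.RLIMIT_CPU, (limits.cpu_seconds, limits.cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (limits.memory_bytes, limits.memory_bytes))
        resource.setrlimit(resource.RLIMIT_FSIZE, (limits.file_bytes, limits.file_bytes))

    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=limits.wall_seconds, preexec_fn=apply_limits)
    except subprocess.TimeoutExpired:
        return None, "killed: wall-clock limit"
    # A negative returncode means the child died from a signal (e.g. SIGXCPU).
    return proc.returncode, proc.stdout + proc.stderr
```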

Auditing requirements:

immutable logs of commands, files created/modified, outbound requests
reproducible runs (snapshot image + dependency lockfiles)
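A hash chain is one way to make logs of commands, file writes, and outbound requests tamper-evident. Each record includes the hash of the previous one, so editing any record breaks verification from that point on. The `AuditLog` class and its record fields are hypothetical. On its own, the chain detects edits but does not prevent deletion of the newest records. Immutability in practice also needs write-once storage or shipping the latest hash off the sandbox.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical serialization so verification recomputes the same bytes.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each record commits to the previous record's hash."""

    def __init__(self) -> None:
        self.records: list[dict] = []
        self._last_hash = GENESIS

    def append(self, kind: str, detail: dict) -> dict:
        body = {"ts": time.time(), "kind": kind, "detail": detail, "prev": self._last_hash}
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self) -> bool:
        prev = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```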

7) Product-level capability expansion (beyond text)

Sandbox makes file artifacts first-class outputs:

HTML dashboards, posters, charts, audio, video, datasets, code repos

System design implication:

outputs are “deliverables”, not prose.
evaluation can become artifact-based (render/execute/test).
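Artifact-based evaluation can start with structural checks on the output directory. The sketch assumes a hypothetical spec that maps file names to required JSON keys or CSV columns. Rendering HTML, playing media, or running a repo's tests would need a browser, decoder, or test runner on top of this.

```python
import csv
import json
from pathlib import Path


def check_artifacts(output_dir: Path, required: dict[str, dict]) -> dict[str, str]:
    """Return a status per required artifact: "ok", "missing", or a failure reason."""
    results = {}
    for name, spec in required.items():
        path = output_dir / name
        if not path.exists():
            results[name] = "missing"
            continue
        try:
            if name.endswith(".json"):
                data = json.loads(path.read_text())
                missing = [k for k in spec.get("keys", []) if k not in data]
                results[name] = f"missing keys {missing}" if missing else "ok"
            elif name.endswith(".csv"):
                with path.open(newline="") as f:
                    header = next(csv.reader(f), [])
                missing = [c for c in spec.get("columns", []) if c not in header]
                results[name] = f"missing columns {missing}" if missing else "ok"
            else:
                # Other deliverables: only check that something non-empty was produced.
                results[name] = "ok" if path.stat().st_size > 0 else "empty"
        except (ValueError, csv.Error) as exc:  # JSONDecodeError subclasses ValueError
            results[name] = f"invalid: {exc}"
    return results
```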

8) Concrete blueprint (minimal viable agentic stack)

Control plane

Task router → sandbox allocator → run loop controller → artifact collector

Data plane

Container image + runtime dependency cache
/input, /documents, /output directory contract
log sink (actions, observations, timestamps, resource metrics)

Policy plane

capability toggles (network on/off; install allowed/blocked)
quotas (turns, tokens, wall time, CPU, RAM, disk)
content filters for outbound requests and sensitive file access
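The policy plane can be expressed as a frozen configuration object that the gateway consults on every action. In this sketch the class names, quota defaults, tier names, and the mirror host `mirror.example.internal` are all hypothetical. Egress checks match exact hosts or subdomains of allowlisted domains. Path checks normalize `..` before comparing against blocked prefixes but do not resolve symlinks.

```python
import posixpath
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class Quotas:
    max_turns: int = 50
    max_tokens: int = 200_000
    wall_seconds: int = 1800
    cpu_cores: float = 2.0
    ram_mb: int = 4096
    disk_mb: int = 10_240


@dataclass(frozen=True)
class Policy:
    network: bool                       # capability toggle: network on/off
    install_allowed: bool               # capability toggle: package installs
    egress_allowlist: tuple = ()        # empty + network on means open egress
    blocked_paths: tuple = ("/etc/shadow", "/root")
    quotas: Quotas = field(default_factory=Quotas)

    def allows_url(self, url: str) -> bool:
        if not self.network:
            return False
        host = urlparse(url).hostname or ""
        if not self.egress_allowlist:
            return True
        # Exact host or a subdomain of an allowlisted domain.
        return any(host == d or host.endswith("." + d) for d in self.egress_allowlist)

    def allows_path(self, path: str) -> bool:
        path = posixpath.normpath(path)  # collapse ".." before comparing
        return not any(path == p or path.startswith(p + "/") for p in self.blocked_paths)


# Hypothetical deployment tiers; mirror.example.internal is a placeholder host.
TIERS = {
    "offline": Policy(network=False, install_allowed=False),
    "restricted": Policy(network=True, install_allowed=True,
                         egress_allowlist=("mirror.example.internal",)),
    "full": Policy(network=True, install_allowed=True),
}
```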

Outcome:

a general agent runtime where “tools” are emergent from the OS substrate.

9) What changes in “agentic system design” immediately

Treat filesystem + terminal as the default tool substrate.
Treat context placement as a systems decision, not a prompting decision.
Add first-class observability: capability-use rates (how often each tool or permission is exercised) + wandering detectors (flag loops and unproductive action sequences).
Make sandbox policy explicit per deployment tier (offline, restricted net, full net).
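Capability-use rates and a wandering detector can both be computed from trajectory logs. The sketch assumes steps are dicts with `tool` and `args` keys. The detector's heuristic (at most two distinct actions in the last six steps) and its thresholds are illustrative, not tuned values.

```python
from collections import Counter


def capability_use_rates(episodes: list[list[dict]]) -> dict[str, float]:
    """Fraction of episodes in which each tool was used at least once."""
    counts: Counter = Counter()
    for steps in episodes:
        counts.update({s["tool"] for s in steps})  # count each tool once per episode
    n = max(len(episodes), 1)
    return {tool: c / n for tool, c in sorted(counts.items())}


def is_wandering(steps: list[dict], window: int = 6, max_distinct: int = 2) -> bool:
    """Heuristic: the last `window` actions contain at most `max_distinct` unique actions."""
    if len(steps) < window:
        return False
    recent = [(s["tool"], str(s.get("args"))) for s in steps[-window:]]
    return len(set(recent)) <= max_distinct
```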
