The Pivot: From Ephemeral Sandboxes to Unified Persistent Workspaces
A Technical and Strategic Evolution
CODITECT Development Studio Architecture Transformation
Executive Summary
In January 2026, the CODITECT architecture team made a fundamental decision: abandon the ephemeral sandbox model in favor of unified persistent workspaces. This document explains why we pivoted, what changed, and what it means for the future of AI-assisted development.
| Aspect | Before (v1.0) | After (v2.0) | Impact |
|---|---|---|---|
| Architecture | 4 ephemeral sandboxes | 1 persistent workspace | Simplified, cohesive |
| Session Lifetime | 30-minute timeout | 8+ hours renewable | Uninterrupted flow |
| Cold Start | 5-10 seconds | 0 seconds | Instant access |
| Multi-Agent | External routing | In-workspace coordination | Seamless collaboration |
| Storage | R2 snapshots | GCS FUSE real-time | Strong durability |
| Cost @ 1K users | $4.20/user | $6.50/user | +55% for 10× experience |
Part 1: The Original Vision (v1.0)
What We Built
CODITECT v1.0 was architected around a simple concept: ephemeral sandboxes for each LLM.
┌─────────────────────────────────────────────────────────────────┐
│ CODITECT v1.0 Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Sandbox │ │ Sandbox │ │ Sandbox │ │
│ │ Claude │ │ Gemini │ │ Codex │ │
│ │ (30 min) │ │ (30 min) │ │ (30 min) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ┌────────────▼────────────┐ │
│ │ External Router │ │
│ │ (Routes tasks to LLMs) │ │
│ └─────────────────────────┘ │
│ │
│ Storage: R2 snapshots (periodic) │
│ State: Durable Objects (session registry) │
│ │
└─────────────────────────────────────────────────────────────────┘
The Promise
- Isolation: Each LLM in its own container
- Scalability: Spin up sandboxes on demand
- Cost Control: Pay only for active usage
- Multi-LLM: Support Claude, Gemini, Codex, Kimi
The Reality
Within weeks of internal testing, four fundamental problems emerged. Part 2 covers each in turn.
Part 2: The Problems (Why v1.0 Failed)
Problem #1: The Cold Start Penalty
User Experience:
User: "Let me start a session..."
[5 seconds pass]
"Hmm, still loading..."
[10 seconds pass]
"Okay, finally ready."
[Works for 25 minutes]
"Wait, session timed out?"
[5 more seconds to restart]
"This is frustrating."
Technical Reality:
- Each sandbox startup: 5-10 seconds
- Container image pull: 3-5 seconds
- File system mount: 2-3 seconds
- Agent initialization: 2-4 seconds
- Total: 12-22 seconds of waiting
Users expected an IDE experience. They got a "spin up a VM" experience.
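As a quick check on the arithmetic, the per-phase ranges listed above can be summed directly. This is a minimal sketch that assumes the four phases run one after another; the type and variable names are illustrative and not taken from the CODITECT codebase.

```typescript
// Ranges are the figures listed above, in seconds. Names are hypothetical.
interface LatencyRange {
  phase: string;
  minSeconds: number;
  maxSeconds: number;
}

const startupPhases: LatencyRange[] = [
  { phase: "sandbox startup", minSeconds: 5, maxSeconds: 10 },
  { phase: "container image pull", minSeconds: 3, maxSeconds: 5 },
  { phase: "file system mount", minSeconds: 2, maxSeconds: 3 },
  { phase: "agent initialization", minSeconds: 2, maxSeconds: 4 },
];

// Sum the best-case and worst-case bounds, assuming sequential phases.
function totalLatency(phases: LatencyRange[]): { minSeconds: number; maxSeconds: number } {
  return phases.reduce(
    (acc, p) => ({
      minSeconds: acc.minSeconds + p.minSeconds,
      maxSeconds: acc.maxSeconds + p.maxSeconds,
    }),
    { minSeconds: 0, maxSeconds: 0 },
  );
}

const coldStart = totalLatency(startupPhases);
console.log(`${coldStart.minSeconds}-${coldStart.maxSeconds} seconds`); // 12-22 seconds
```

If some phases overlap in practice (for example, the image pull happening as part of sandbox startup), the real wait would be shorter than this sequential bound.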
Problem #2: The 30-Minute Timeout
Scenario: Developer is deep in a complex refactoring task.
Minute 0: Start session, load codebase
Minute 10: Claude analyzes auth module
Minute 20: Gemini suggests improvements
Minute 28: Codex generates test cases
Minute 30: ⚠️ SESSION TIMEOUT
Result: All context lost. Task interrupted.
Impact:
- Long-running tasks impossible
- Context constantly lost
- Developer flow repeatedly broken
- Unusable for serious development
Problem #3: Multi-Agent Collaboration Was Broken
The v1.0 Promise: "Multiple LLMs working together"
The v1.0 Reality:
User: "Claude, refactor auth.ts"
[External router sends to Claude sandbox]
Claude: "Done!"
[File saved to R2]
User: "Gemini, review the changes"
[External router sends to Gemini sandbox]
Gemini: "What changes?"
[Gemini sandbox has stale filesystem]
User: "Wait, you can't see Claude's edits?"
The Problem:
- Each sandbox had its own R2 mount
- Changes not immediately visible to other sandboxes
- "Collaboration" required manual sync
- Agents couldn't build on each other's work
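The staleness described above can be reproduced with a small in-memory model. This is only a sketch: `SnapshotSandbox` and the `Map`-based store are hypothetical stand-ins, not the R2 API or the actual v1.0 sandbox code. Each sandbox works on a private copy of the last snapshot, so an edit stays invisible to the others until it is published and pulled.

```typescript
// In-memory stand-in for the snapshot store (assumption: path -> content).
type FileTree = Map<string, string>;

class SnapshotSandbox {
  private local: FileTree;

  // Each sandbox starts from a private copy of the last snapshot.
  constructor(snapshot: FileTree) {
    this.local = new Map(snapshot);
  }

  write(path: string, content: string): void {
    this.local.set(path, content);
  }

  read(path: string): string | undefined {
    return this.local.get(path);
  }

  // Periodic snapshot: publish local state back to the store.
  publish(target: FileTree): void {
    this.local.forEach((content, path) => target.set(path, content));
  }

  // Manual sync: replace local state with the store's current contents.
  pull(source: FileTree): void {
    this.local = new Map(source);
  }
}

const snapshotStore: FileTree = new Map([["auth.ts", "v1"]]);
const claudeSandbox = new SnapshotSandbox(snapshotStore);
const geminiSandbox = new SnapshotSandbox(snapshotStore);

claudeSandbox.write("auth.ts", "v2 (refactored)");
const beforeSync = geminiSandbox.read("auth.ts"); // "v1": Claude's edit is not visible

claudeSandbox.publish(snapshotStore);
geminiSandbox.pull(snapshotStore);
const afterSync = geminiSandbox.read("auth.ts"); // "v2 (refactored)" only after a manual sync
```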
Problem #4: Complex External Routing
Architecture Complexity:
User Request → Worker → Router → Select LLM → Spawn Sandbox → Wait for Ready → Route Request → Return Response
Failure Modes:
- Router bottlenecks
- Sandbox spawn failures
- Routing logic bugs
- State synchronization issues
- Timeout handling complexity
The Breaking Point
"I was demoing CODITECT to a potential customer. I started a session, made some edits with Claude, then asked Gemini to review. Gemini couldn't see the changes because the sandbox had stale data. The customer asked: 'Why can't they work together?' I had no good answer. We knew we needed to pivot."
— Engineering Lead, CODITECT Team
Part 3: The Epiphany
What If...
The Question: Instead of 4 separate sandboxes, what if we had 1 workspace with all 4 agents inside?
The Insight:
- Agents don't need isolation from each other
- They need shared context
- They need real-time collaboration
- They need persistent state
The New Architecture Emerges
┌─────────────────────────────────────────────────────────────────┐
│ UNIFIED PERSISTENT WORKSPACE │
│ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Claude │ │ Gemini │ │ Kimi │ │ Codex │ │
│ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │
│ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ │
│ └─────────┬─────────┬─────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ SHARED CONTEXT MANAGER │ │
│ │ • Task queue (prioritized) │ │
│ │ • File locks (prevent conflicts) │ │
│ │ • Message bus (cross-agent comms) │ │
│ └─────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ SQLITE DATABASE CLUSTER │ │
│ │ (6 databases, WAL mode, ACID) │ │
│ │ │ │
│ │ • sessions.db (500MB) │ │
│ │ • messages.db (2GB) │ │
│ │ • artifacts.db (1GB) │ │
│ │ • parsed_sessions.db (5GB) │ │
│ │ • agent_metrics.db (100MB) │ │
│ │ • workspace_idx.db (200MB) │ │
│ └─────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ GCS FUSE MOUNT (/projects) │ │
│ │ (Persistent, real-time sync) │ │
│ └─────────────────────────────────────┘ │
│ │
│ Lifetime: 8+ hours (renewable) │
│ Cold Start: 0 seconds (always warm) │
│ │
└─────────────────────────────────────────────────────────────────┘
Part 4: The v2.0 Architecture
Core Principles
- One Workspace, Multiple Agents
  - All 4 LLMs coexist in the same container
  - Shared filesystem (GCS FUSE)
  - Shared databases (SQLite cluster)
  - Shared memory (context manager)
- Persistent, Not Ephemeral
  - 8-hour default lifetime (renewable)
  - Survives browser disconnects
  - Resume exactly where you left off
  - Background sync to GCS
- In-Workspace Coordination
  - No external routing
  - Agent orchestrator manages all 4
  - File locks prevent conflicts
  - Message bus for coordination
- Strong Data Durability
  - SQLite WAL mode (ACID guarantees)
  - Real-time GCS sync
  - Append-only JSONL audit log
  - Git integration for code versioning
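To illustrate the append-only JSONL audit log named under Strong Data Durability, here is a minimal in-memory sketch. The `AuditEntry` fields and the `JsonlAuditLog` class are assumptions made for illustration; the document does not specify the real log schema, and a production log would write to a file on the workspace mount rather than to an array.

```typescript
// Hypothetical entry shape; field names are illustrative only.
interface AuditEntry {
  seq: number;
  agent: string;
  action: string;
  file: string;
}

class JsonlAuditLog {
  private lines: string[] = [];

  // Append-only: entries are added at the end and never rewritten.
  append(entry: Omit<AuditEntry, "seq">): AuditEntry {
    const record: AuditEntry = { seq: this.lines.length + 1, ...entry };
    this.lines.push(JSON.stringify(record));
    return record;
  }

  // JSONL: one JSON object per line, each line newline-terminated.
  serialize(): string {
    return this.lines.map((line) => line + "\n").join("");
  }

  // Replay a log by parsing every non-empty line.
  static parse(jsonl: string): AuditEntry[] {
    return jsonl
      .split("\n")
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as AuditEntry);
  }
}

const auditLog = new JsonlAuditLog();
auditLog.append({ agent: "claude", action: "file_modified", file: "auth.ts" });
auditLog.append({ agent: "gemini", action: "file_reviewed", file: "auth.ts" });
const replayed = JsonlAuditLog.parse(auditLog.serialize());
```

Because each line is a complete JSON object, a partially written final line can be skipped on replay without invalidating the earlier entries.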
Technical Stack
| Layer | v1.0 | v2.0 |
|---|---|---|
| Compute | Ephemeral sandboxes | Persistent containers |
| Lifetime | 30 minutes | 8+ hours |
| Storage | R2 snapshots | GCS FUSE real-time |
| Databases | Durable Objects | SQLite cluster |
| Routing | External router | In-workspace orchestrator |
| Sync | Periodic snapshots | Continuous WAL sync |
The Agent Coordinator
The heart of v2.0 is the Agent Coordinator - a service that runs inside the workspace and manages all 4 agents:
class AgentCoordinator {
  // All 4 agents live in the same workspace
  private agents: Map<AgentId, BaseAgent> = new Map<AgentId, BaseAgent>([
    ['claude', new ClaudeAgent()],
    ['gemini', new GeminiAgent()],
    ['kimi', new KimiAgent()],
    ['codex', new CodexAgent()]
  ]);

  // Shared resources
  private taskQueue: PriorityQueue<Task>;
  private lockManager: LockManager;
  private messageBus: MessageBus;
  private databases: DatabaseCluster;

  // Coordinate a multi-agent task
  async coordinate(task: MultiAgentTask): Promise<void> {
    // 1. Plan in parallel across the agents assigned to this task
    const plans = await Promise.all(
      task.agents.map(agent => this.agents.get(agent)!.plan(task))
    );

    // 2. Merge plans
    const mergedPlan = await this.mergePlans(plans);

    // 3. Execute subtasks one at a time, holding an exclusive lock per file
    for (const subtask of mergedPlan.subtasks) {
      await this.lockManager.acquire(subtask.file, subtask.agent, 'exclusive');
      try {
        await this.agents.get(subtask.agent)!.execute(subtask);
      } finally {
        // Release even if execute() throws, so a failed subtask cannot strand the lock
        await this.lockManager.release(subtask.file, subtask.agent);
      }

      // Notify other agents
      this.messageBus.broadcast({
        type: 'file_modified',
        file: subtask.file,
        by: subtask.agent
      });
    }

    // 4. Persist to SQLite
    await this.databases.messages.insert({ /* ... */ });

    // 5. Sync to GCS
    await this.syncToGCS();
  }
}
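The coordinator relies on a lock manager with `acquire` and `release`, but the document does not show one. The sketch below is one possible in-memory shape, not the CODITECT implementation: `FileLockManager`, the `"shared"` mode, and the non-blocking `tryAcquire` are all assumptions. A blocking `acquire` like the one the coordinator awaits could be layered on top by queueing callers until `tryAcquire` succeeds.

```typescript
// Assumed lock modes; the coordinator above only uses 'exclusive'.
type LockMode = "shared" | "exclusive";

interface FileLockState {
  mode: LockMode;
  holders: Set<string>;
}

class FileLockManager {
  private lockTable = new Map<string, FileLockState>();

  // Returns true if the lock was granted, false if it conflicts with current holders.
  // Not reentrant: an agent re-requesting an exclusive lock it already holds gets false.
  tryAcquire(file: string, agent: string, mode: LockMode): boolean {
    const state = this.lockTable.get(file);
    if (!state) {
      this.lockTable.set(file, { mode, holders: new Set([agent]) });
      return true;
    }
    // Shared locks are compatible only with other shared locks.
    if (state.mode === "shared" && mode === "shared") {
      state.holders.add(agent);
      return true;
    }
    return false;
  }

  // Releasing a lock the agent does not hold is a no-op.
  release(file: string, agent: string): void {
    const state = this.lockTable.get(file);
    if (!state || !state.holders.delete(agent)) return;
    if (state.holders.size === 0) this.lockTable.delete(file);
  }
}

const fileLocks = new FileLockManager();
const claudeGotLock = fileLocks.tryAcquire("auth.ts", "claude", "exclusive"); // true
const geminiBlocked = fileLocks.tryAcquire("auth.ts", "gemini", "exclusive"); // false
fileLocks.release("auth.ts", "claude");
const geminiRetry = fileLocks.tryAcquire("auth.ts", "gemini", "exclusive"); // true
```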
Part 5: The Benefits
Benefit #1: Zero Cold Start
Before:
Start session → Wait 12 seconds → Begin work
After:
Start session → Immediate → Begin work
Once provisioned, a workspace stays warm. The first connection provisions the workspace; subsequent connections (even after the browser is closed) reconnect to the running workspace without a cold start.
Benefit #2: True Multi-Agent Collaboration
Before:
Claude edits → Save to R2 → Wait for sync → Gemini sees changes
After:
Claude edits → Instant visibility → Gemini builds on changes
All agents see the same filesystem in real-time. They can:
- Edit the same files (with locks)
- Build on each other's work
- Communicate via message bus
- Share context
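As an illustration of the message bus, here is a minimal publish/subscribe sketch. The message fields (`type`, `file`, `by`) mirror the `file_modified` broadcast in the coordinator code; the `AgentMessageBus` class, the one-handler-per-agent registry, and the rule that senders do not receive their own messages are assumptions.

```typescript
// Message shape taken from the coordinator's broadcast call.
interface BusMessage {
  type: string;
  file: string;
  by: string;
}

type BusHandler = (message: BusMessage) => void;

class AgentMessageBus {
  private subscribers = new Map<string, BusHandler>();

  // One handler per agent; subscribing again replaces the previous handler.
  subscribe(agent: string, handler: BusHandler): void {
    this.subscribers.set(agent, handler);
  }

  // Deliver to every subscriber except the sender; returns the delivery count.
  broadcast(message: BusMessage): number {
    let delivered = 0;
    this.subscribers.forEach((handler, agent) => {
      if (agent !== message.by) {
        handler(message);
        delivered += 1;
      }
    });
    return delivered;
  }
}

const agentBus = new AgentMessageBus();
const geminiInbox: BusMessage[] = [];
const claudeInbox: BusMessage[] = [];
agentBus.subscribe("gemini", (m) => geminiInbox.push(m));
agentBus.subscribe("claude", (m) => claudeInbox.push(m));

const deliveredCount = agentBus.broadcast({
  type: "file_modified",
  file: "auth.ts",
  by: "claude",
});
```

Here Gemini learns that `auth.ts` changed as soon as Claude's edit completes, which is the step that required a manual sync in v1.0.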
Benefit #3: Uninterrupted Flow
Before:
[Work 25 min] → [Timeout!] → [Lose context] → [Restart] → [Re-establish context]
After:
[Work 8 hours] → [Take break] → [Resume exactly where you left off]
Sessions survive:
- Browser closes
- Laptop sleeps
- Network interruptions
- Coffee breaks
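One way to model an 8-hour renewable lifetime is a lease whose expiry is pushed forward on renewal. The sketch below is an assumption about how that could work, not the documented mechanism: `WorkspaceLease` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to check. Nothing in the lease depends on an open browser connection, which is why a disconnect does not end the session.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const EIGHT_HOURS_MS = 8 * HOUR_MS;

class WorkspaceLease {
  private readonly ttlMs: number;
  private expiresAtMs: number;

  constructor(startedAtMs: number, ttlMs: number = EIGHT_HOURS_MS) {
    this.ttlMs = ttlMs;
    this.expiresAtMs = startedAtMs + ttlMs;
  }

  isActive(nowMs: number): boolean {
    return nowMs < this.expiresAtMs;
  }

  // Renewing an active lease restarts the TTL from nowMs.
  // An expired lease is not revived; the caller would provision a new workspace.
  renew(nowMs: number): boolean {
    if (!this.isActive(nowMs)) return false;
    this.expiresAtMs = nowMs + this.ttlMs;
    return true;
  }
}

const lease = new WorkspaceLease(0);
const activeAtHour7 = lease.isActive(7 * HOUR_MS);   // true
const renewedAtHour7 = lease.renew(7 * HOUR_MS);     // true, expiry moves to hour 15
const activeAtHour14 = lease.isActive(14 * HOUR_MS); // true
const activeAtHour16 = lease.isActive(16 * HOUR_MS); // false
const renewedAtHour16 = lease.renew(16 * HOUR_MS);   // false, already expired
```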
Benefit #4: Strong Data Durability
v2.0 Data Protection:
┌─────────────────────────────────────┐
│ DATA DURABILITY STACK │
├─────────────────────────────────────┤
│ Layer 1: SQLite WAL Mode │
│ (ACID transactions) │
├─────────────────────────────────────┤
│ Layer 2: GCS FUSE Mount │
│ (Real-time sync) │
├─────────────────────────────────────┤
│ Layer 3: JSONL Archive │
│ (Immutable audit log) │
├─────────────────────────────────────┤
│ Layer 4: Git Integration │
│ (Version control) │
├─────────────────────────────────────┤
│ Target: 99.999% durability │
└─────────────────────────────────────┘
Benefit #5: Simplified Architecture
v1.0 Complexity:
- External routing service
- Sandbox lifecycle management
- Cross-sandbox synchronization
- Complex timeout handling
v2.0 Simplicity:
- Single workspace provisioner
- In-workspace coordinator
- Continuous GCS sync
- Simple hibernate/resume
Part 6: The Trade-Offs
Trade-Off #1: Higher Cost (+55%)
| Metric | v1.0 | v2.0 | Change |
|---|---|---|---|
| @ 1K users/month | $4,200 | $6,500 | +$2,300 (+55%) |
| Per user/month | $4.20 | $6.50 | +$2.30 |
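The percentage and the monthly delta in the table follow directly from the per-user figures. A short worked calculation, with illustrative variable names:

```typescript
// Percentage change from a baseline value.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

const perUserV1 = 4.2;  // $/user/month, from the table
const perUserV2 = 6.5;  // $/user/month, from the table
const userCount = 1000;

const increasePct = percentChange(perUserV1, perUserV2);
const monthlyDelta = Math.round((perUserV2 - perUserV1) * userCount);

console.log(increasePct.toFixed(1)); // 54.8, which the table rounds to +55%
console.log(monthlyDelta);           // 2300
```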
Why Higher?
- Persistent containers run 24/7
- GCS operations vs R2 snapshots
- More observability needed
- SQLite cluster overhead
Justification:
- 10× better user experience
- Competitive advantage
- Developer productivity gains
- Premium pricing support
Trade-Off #2: Implementation Complexity
New Challenges:
- GCS FUSE tuning required
- SQLite clustering (non-trivial)
- File lock management
- Resource contention (4 agents, 1 container)
Mitigations:
- 14-week implementation timeline
- Experienced team (8 engineers)
- Gradual rollout with fallbacks
- Extensive load testing
Trade-Off #3: Resource Contention
The Challenge:
- 4 agents share 2 vCPU / 4GB RAM
- Noisy neighbor problem
- One busy agent affects others
Mitigations:
- Per-agent CPU/memory limits
- cgroup isolation
- Task queue prioritization
- Auto-scaling to larger instances
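Task queue prioritization could look like the sketch below. It is a simple sorted-insert queue rather than a heap, which seems adequate for the small queues implied by 4 agents. The `QueuedTask` fields, the task names, and the convention that a lower number means more urgent are assumptions. Per-agent CPU and memory limits would be enforced separately at the container level and are not modeled here.

```typescript
interface QueuedTask {
  id: string;
  agent: string;
  priority: number; // assumed convention: lower number = more urgent
}

class TaskPriorityQueue {
  private items: QueuedTask[] = [];

  // Insert before the first task with a strictly larger priority number,
  // which keeps first-in, first-out order among tasks of equal priority.
  enqueue(task: QueuedTask): void {
    const index = this.items.findIndex((queued) => queued.priority > task.priority);
    if (index === -1) {
      this.items.push(task);
    } else {
      this.items.splice(index, 0, task);
    }
  }

  dequeue(): QueuedTask | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

const agentTasks = new TaskPriorityQueue();
agentTasks.enqueue({ id: "lint", agent: "codex", priority: 5 });
agentTasks.enqueue({ id: "user-edit", agent: "claude", priority: 1 });
agentTasks.enqueue({ id: "review", agent: "gemini", priority: 1 });

const dequeueOrder = [agentTasks.dequeue(), agentTasks.dequeue(), agentTasks.dequeue()]
  .map((task) => task?.id);
// dequeueOrder: ["user-edit", "review", "lint"]
```

Interactive work jumps ahead of background work, so one busy agent running low-priority tasks does not delay a user-facing edit.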
Part 7: The Migration Path
Phase 1: Foundation (Weeks 1-4)
- GCS FUSE setup
- SQLite cluster implementation
- Workspace container provisioning
- Basic worker integration
Phase 2: Agents (Weeks 5-7)
- Agent coordinator development
- 4 LLM adapters (Claude, Gemini, Codex, Kimi)
- File lock manager
- Message bus implementation
Phase 3: Frontend (Weeks 8-9)
- Multi-agent UI
- Agent activity panel
- File lock indicators
- Session timeline
Phase 4: Migration (Weeks 10-11)
- v1.0 → v2.0 data migration
- Blue-green deployment
- Zero-downtime cutover
- Rollback capability
Phase 5: Optimization (Weeks 12-14)
- Performance tuning
- Cost optimization (target: <$7/user)
- Load testing (10K users)
- Security audit
Part 8: Success Metrics
| Metric | v1.0 | v2.0 Target | Measurement |
|---|---|---|---|
| Cold Start | 5-10s | < 1s | Time to interactive |
| Session Timeout | 30 min | None within the 8+ hour renewable lifetime | User disconnects |
| Agent Collaboration | Broken | Seamless | Multi-agent tasks |
| Data Durability | 99.9% | 99.999% | Zero data loss |
| User Satisfaction | 3.2/5 | > 4.0/5 | Post-session survey |
| Task Completion | 65% | 90% | End-to-end tasks |
Conclusion
The Pivot Was Necessary
v1.0's ephemeral sandbox model was fundamentally flawed for collaborative AI development. The cold starts, timeouts, and broken multi-agent coordination made it unusable for serious development work.
v2.0 Is the Right Architecture
Unified persistent workspaces provide:
- ✅ Zero cold start
- ✅ True multi-agent collaboration
- ✅ Strong data durability
- ✅ Simplified architecture
- ✅ Premium user experience
The Investment Is Worth It
| Investment | Return |
|---|---|
| +55% infrastructure cost | 10× better experience |
| 14 weeks development | Market differentiation |
| $707K engineering cost | Premium pricing support |
The Bottom Line:
"We could have shipped v1.0 and called it done. But it wouldn't have been great. v2.0 is what we envisioned when we started CODITECT - multiple AI agents working together seamlessly in a persistent, collaborative environment. That's worth the pivot."
Appendix: Architecture Comparison
Side-by-Side
v1.0: Ephemeral Sandboxes v2.0: Unified Persistent Workspace
┌─────────────────────────┐ ┌─────────────────────────┐
│ 4 Separate Containers │ │ 1 Shared Container │
│ • Claude (30 min) │ │ • Claude Agent │
│ • Gemini (30 min) │ PIVOT │ • Gemini Agent │
│ • Kimi (30 min) │ ────────► │ • Kimi Agent │
│ • Codex (30 min) │ │ • Codex Agent │
│ │ │ │
│ External Router │ │ In-Workspace Coordinator│
│ R2 Snapshots │ │ GCS FUSE Real-Time │
│ Durable Objects │ │ SQLite Cluster │
│ 5-10s Cold Start │ │ 0s Cold Start │
│ 30-min Timeout │ │ 8+ Hour Lifetime │
└─────────────────────────┘ └─────────────────────────┘
Document Version: 1.0.0
Last Updated: 2026-01-31
Status: Final