ADR-160: Inter-Session Messaging Architecture
Status
Accepted (2026-02-06)
Context
CODITECT uniquely supports concurrent sessions from multiple LLM vendors (Claude, Codex, Gemini, Kimi) on a single developer machine. When 5-10 sessions run simultaneously, they have no awareness of each other, leading to:
- File conflicts -- Two sessions editing the same file without knowledge of the other
- Duplicate work -- Sessions claiming the same task independently
- No status visibility -- No session knows what others are working on
- No task routing -- Cannot direct work to the session with the right context
This is a novel problem. No competing tool (Cursor, Windsurf, Devin, Copilot Workspace, SWE-agent, Aider) attempts multi-vendor LLM session coordination. Claude Code Agent Teams (released 2026-02-05) coordinates Claude-to-Claude only.
Decision Drivers
- Zero new dependencies -- CODITECT is installed locally on customer machines. Every dependency is a customer installation requirement.
- LLM-vendor agnostic -- Must coordinate Claude, Codex, Gemini, and Kimi sessions equally.
- Crash recovery -- Sessions may be killed, crash, or run out of context at any time.
- Minimal resource overhead -- Developers run CODITECT alongside resource-intensive IDEs and LLM sessions.
- Cloud upgrade path -- CODITECT Cloud (api.coditect.ai) is a future requirement.
Evaluation Process
A formal MoE (Mixture of Experts) evaluation was conducted:
- 3 parallel research agents analyzed the Motia framework, 6 alternatives, and the industry landscape
- 7 candidates scored across 34 attributes in 5 weighted categories
- 3-judge panel (Technical Architecture, Systems Engineering, Industry Ecosystem) reviewed the scoring
- Unanimous agreement on the final ranking
Full analysis: internal/analysis/inter-session-messaging/
Decision
Use a dedicated SQLite database (messaging.db) in WAL mode with kqueue/inotify hybrid notification for inter-session coordination.
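A minimal sketch of opening messaging.db in WAL mode with Python's stdlib sqlite3 module. The helper name `open_messaging_db`, the 5-second busy timeout, and the `synchronous=NORMAL` setting are illustrative assumptions, not part of this ADR.

```python
import sqlite3


def open_messaging_db(path: str) -> sqlite3.Connection:
    """Open (or create) the messaging database and switch it to WAL mode.

    Hypothetical helper: name and tuning values are assumptions.
    """
    # timeout= makes writers wait on a locked database instead of failing at once.
    conn = sqlite3.connect(path, timeout=5.0)
    # The PRAGMA returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode (got {mode!r})")
    # Common pairing with WAL; an assumption, not a documented project setting.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

WAL mode only applies to file-backed databases. An in-memory database reports its journal mode as `memory`, so this helper would raise for `:memory:`.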
Architecture
┌──────────────────────────────────────────────────────────┐
│ Developer Machine │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐│
│ │ Claude │ │ Codex │ │ Gemini │ │ Kimi ││
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘│
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐│
│ │ MessageBus Abstraction ││
│ │ publish(channel, payload) -> None ││
│ │ subscribe(channel, callback) -> Handle ││
│ │ poll(channel, since_id) -> List[Message] ││
│ │ register_session(session_info) -> None ││
│ │ heartbeat() -> None ││
│ └────────────────────┬─────────────────────────────────┘│
│ │ │
│ ┌──────────┴──────────┐ │
│ │ messaging.db │ ← kqueue/inotify │
│ │ (WAL mode) │ watches WAL file │
│ │ < 1 MB │ for push notification│
│ └─────────────────────┘ │
│ │
│ Unchanged: platform.db | org.db | sessions.db │
└──────────────────────────────────────────────────────────┘
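The interface in the diagram could be expressed as a Python ABC roughly as follows. This is a sketch only: the `Message` dataclass, the type hints, and the toy `InMemoryMessageBus` transport are assumptions added to show how a second transport could be swapped in behind the same interface. It is not the contents of scripts/core/message_bus.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    """Hypothetical message record; field names are assumptions."""
    id: int
    channel: str
    payload: Dict[str, Any]


class MessageBus(ABC):
    """Transport-agnostic interface mirroring the five methods in the diagram."""

    @abstractmethod
    def publish(self, channel: str, payload: Dict[str, Any]) -> None: ...

    @abstractmethod
    def subscribe(self, channel: str, callback: Callable[[Message], None]) -> Any: ...

    @abstractmethod
    def poll(self, channel: str, since_id: int = 0) -> List[Message]: ...

    @abstractmethod
    def register_session(self, session_info: Dict[str, Any]) -> None: ...

    @abstractmethod
    def heartbeat(self) -> None: ...


class InMemoryMessageBus(MessageBus):
    """Toy single-process transport, for illustration only."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._subscribers: Dict[str, List[Callable[[Message], None]]] = {}
        self._sessions: Dict[str, Dict[str, Any]] = {}
        self._heartbeats = 0

    def publish(self, channel: str, payload: Dict[str, Any]) -> None:
        msg = Message(id=len(self._messages) + 1, channel=channel, payload=payload)
        self._messages.append(msg)
        # Deliver synchronously to every callback registered on this channel.
        for callback in self._subscribers.get(channel, []):
            callback(msg)

    def subscribe(self, channel: str, callback: Callable[[Message], None]) -> Any:
        self._subscribers.setdefault(channel, []).append(callback)
        return (channel, callback)  # opaque handle

    def poll(self, channel: str, since_id: int = 0) -> List[Message]:
        return [m for m in self._messages if m.channel == channel and m.id > since_id]

    def register_session(self, session_info: Dict[str, Any]) -> None:
        self._sessions[session_info["session_id"]] = dict(session_info)

    def heartbeat(self) -> None:
        self._heartbeats += 1
```

Calling code that depends only on `MessageBus` would not need to change when the concrete transport does.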
Key Design Decisions
- Separate messaging.db -- NOT added to sessions.db (18.4 GB, 80+ tables). A dedicated database avoids SQLITE_BUSY contention with existing context extraction and tool analytics workloads.
- kqueue/inotify hybrid -- OS-level filesystem watchers on the WAL file provide near-instant (<50ms) push notifications without polling overhead. Fallback to 250ms polling on unsupported platforms.
- MessageBus abstraction -- Clean Python ABC interface (scripts/core/message_bus.py) enables transport replacement for CODITECT Cloud without changing calling code.
- Advisory file locks -- Tracked in messaging.db (not kernel-level flock) for cross-LLM visibility. Sessions register files they are editing; conflict detection is advisory, not blocking.
- TTL-based message cleanup -- Messages auto-expire (default 5 minutes). No unbounded table growth. Cleanup runs on each write operation.
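The 250ms polling fallback can be sketched with nothing but `os.stat`. The function names and the (mtime, size) signature used to detect changes are assumptions. The kqueue and inotify paths are deliberately not shown here.

```python
import os
import time
from typing import Optional, Tuple


def file_signature(path: str) -> Optional[Tuple[int, int]]:
    """Return (mtime_ns, size) for path, or None if the file does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_mtime_ns, st.st_size)


def wait_for_change(path: str, timeout: float = 5.0, interval: float = 0.25) -> bool:
    """Block until path changes or timeout elapses; True means a change was seen.

    Hypothetical fallback for platforms without kqueue/inotify.
    """
    baseline = file_signature(path)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Never sleep past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
        if file_signature(path) != baseline:
            return True
    return False
```

For a database named messaging.db, SQLite keeps its write-ahead log in messaging.db-wal next to it, so that is the path a caller would pass in. A subscriber would then call `poll` with its last seen message id whenever `wait_for_change` returns True.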
Database Schema
CREATE TABLE session_registry (
session_id TEXT PRIMARY KEY,
llm_vendor TEXT NOT NULL,
llm_model TEXT,
tty TEXT,
pid INTEGER,
project_id TEXT,
task_id TEXT,
active_files TEXT, -- JSON array
heartbeat_at TEXT NOT NULL,
registered_at TEXT NOT NULL DEFAULT (datetime('now')),
status TEXT DEFAULT 'active'
);
CREATE TABLE inter_session_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sender_id TEXT NOT NULL,
channel TEXT NOT NULL,
payload TEXT NOT NULL,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
ttl_seconds INTEGER DEFAULT 300
);
CREATE TABLE file_locks (
file_path TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
lock_type TEXT DEFAULT 'advisory',
locked_at TEXT NOT NULL DEFAULT (datetime('now'))
);
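A sketch of how publish-with-cleanup and poll could run against the inter_session_messages table, assuming timestamps are stored in the `datetime('now')` text format the schema defaults to. The function names and the exact cleanup query are assumptions, not the shipped SQLiteMessageBus code.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple

MESSAGES_DDL = """
CREATE TABLE IF NOT EXISTS inter_session_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id TEXT NOT NULL,
    channel TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    ttl_seconds INTEGER DEFAULT 300
);
"""


def publish(conn: sqlite3.Connection, sender_id: str, channel: str,
            payload: Dict[str, Any], ttl_seconds: int = 300) -> int:
    """Delete expired rows, then insert the new message, in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "DELETE FROM inter_session_messages "
            "WHERE datetime(created_at, '+' || ttl_seconds || ' seconds') "
            "      <= datetime('now')"
        )
        cur = conn.execute(
            "INSERT INTO inter_session_messages "
            "(sender_id, channel, payload, ttl_seconds) VALUES (?, ?, ?, ?)",
            (sender_id, channel, json.dumps(payload), ttl_seconds),
        )
    return cur.lastrowid


def poll(conn: sqlite3.Connection, channel: str,
         since_id: int = 0) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return (id, sender_id, payload) for messages newer than since_id."""
    rows = conn.execute(
        "SELECT id, sender_id, payload FROM inter_session_messages "
        "WHERE channel = ? AND id > ? ORDER BY id",
        (channel, since_id),
    ).fetchall()
    return [(row[0], row[1], json.loads(row[2])) for row in rows]
```

Because the primary key is AUTOINCREMENT, ids are never reused after cleanup, so a subscriber can keep using its last seen id as the `since_id` cursor.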
Message Channels
| Channel | Purpose | TTL |
|---|---|---|
| state | Session status broadcasts | 60s |
| file_conflict | File edit conflict warnings | 300s |
| task_broadcast | Task routing between sessions | 600s |
| heartbeat | Session liveness detection | 30s |
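One way the 30s heartbeat window could drive liveness detection is a periodic sweep over session_registry. The function name, the 'stale' status value, and the choice to release a stale session's advisory locks are all assumptions made for illustration. The tables are trimmed copies of the schema above.

```python
import sqlite3
from typing import List

REGISTRY_DDL = """
CREATE TABLE IF NOT EXISTS session_registry (
    session_id TEXT PRIMARY KEY,
    llm_vendor TEXT NOT NULL,
    heartbeat_at TEXT NOT NULL,
    status TEXT DEFAULT 'active'
);
CREATE TABLE IF NOT EXISTS file_locks (
    file_path TEXT PRIMARY KEY,
    session_id TEXT NOT NULL
);
"""


def sweep_stale_sessions(conn: sqlite3.Connection,
                         max_age_seconds: int = 30) -> List[str]:
    """Mark sessions with an expired heartbeat as stale and drop their locks.

    Hypothetical helper; assumes heartbeat_at uses the datetime('now') format.
    """
    with conn:
        rows = conn.execute(
            "SELECT session_id FROM session_registry "
            "WHERE status = 'active' "
            "  AND datetime(heartbeat_at, '+' || ? || ' seconds') < datetime('now')",
            (max_age_seconds,),
        ).fetchall()
        stale = [row[0] for row in rows]
        params = [(session_id,) for session_id in stale]
        conn.executemany(
            "UPDATE session_registry SET status = 'stale' WHERE session_id = ?", params
        )
        # Assumption: a dead session should not keep advisory locks alive.
        conn.executemany("DELETE FROM file_locks WHERE session_id = ?", params)
    return stale
```

With heartbeats sent every 15s, a 30s window tolerates one missed heartbeat before a session is treated as gone.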
API Surface
from scripts.core.message_bus import get_message_bus
bus = get_message_bus() # Returns SQLiteMessageBus by default
# Register this session
bus.register_session(
session_id="sess-abc123",
llm_vendor="claude",
llm_model="opus-4.6",
project_id="PILOT"
)
# Publish a message
bus.publish("state", {"task_id": "H.8.1", "status": "working"})
# Subscribe with callback (uses kqueue/inotify internally, polling as fallback)
handle = bus.subscribe("file_conflict", my_conflict_handler)
# Poll for messages (fallback)
messages = bus.poll("task_broadcast", since_id=42)
# Lock a file (advisory)
bus.lock_file("scripts/core/paths.py")
bus.unlock_file("scripts/core/paths.py")
# Heartbeat (call every 15s)
bus.heartbeat()
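Underneath `lock_file` and `unlock_file`, advisory locking can lean on the PRIMARY KEY of file_locks: the first session to insert a path holds the lock, and later attempts learn who the holder is instead of blocking. The sketch below assumes that approach. The function names `try_lock_file` and `release_file` are hypothetical and not part of the MessageBus API.

```python
import sqlite3
from typing import Optional

LOCKS_DDL = """
CREATE TABLE IF NOT EXISTS file_locks (
    file_path TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    lock_type TEXT DEFAULT 'advisory',
    locked_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def try_lock_file(conn: sqlite3.Connection, file_path: str,
                  session_id: str) -> Optional[str]:
    """Register an advisory lock.

    Returns None if this session now holds (or already held) the lock,
    otherwise the session_id of the conflicting holder. Never blocks.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO file_locks (file_path, session_id) VALUES (?, ?)",
                (file_path, session_id),
            )
        return None
    except sqlite3.IntegrityError:
        # The PRIMARY KEY on file_path rejected the insert: someone holds it.
        row = conn.execute(
            "SELECT session_id FROM file_locks WHERE file_path = ?", (file_path,)
        ).fetchone()
        if row is None or row[0] == session_id:
            return None
        return row[0]


def release_file(conn: sqlite3.Connection, file_path: str, session_id: str) -> bool:
    """Release the lock only if this session is the holder."""
    with conn:
        cur = conn.execute(
            "DELETE FROM file_locks WHERE file_path = ? AND session_id = ?",
            (file_path, session_id),
        )
    return cur.rowcount > 0
```

A caller that gets back another session's id could publish a warning on the file_conflict channel rather than refusing the edit, which keeps the lock advisory.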
Alternatives Considered
Rejected: Motia Framework (Score: 40.1%)
- In-process event bus -- cannot serve external subscribers
- Elastic License v2 (changed from MIT, Nov 2025) -- managed service restriction
- Core rewrite to Rust/Go in progress -- current API will be deprecated
- 1,011 commits in 2025, only 13 in 2026 -- declining activity
Runner-Up: File-based JSON Manifest (Score: 79.6%)
- Simplest possible approach; used by Claude Code Agent Teams
- Rejected because CODITECT's multi-vendor, peer-to-peer coordination problem is harder than Agent Teams' single-vendor, hierarchical problem
- No ACID guarantees, no crash recovery, race conditions under concurrent writes
Considered: NATS.io (Score: 77.0%)
- Excellent messaging system, CNCF graduated, Apache 2.0
- Rejected because it adds a server process for a ~2 msgs/sec workload
- 10M+ msgs/sec capacity is 5 million times more than needed
Considered: Redis Pub/Sub (Score: 74.8%)
- Excellent latency (<0.1ms), true push pub/sub
- Rejected due to license instability (changed twice in 18 months), infrastructure overhead
- Redis is now AGPLv3, and Valkey/KeyDB exist as drop-in alternatives, but any of them adds operational burden
Considered: Unix Domain Sockets (Score: 75.7%)
- Best latency option (~0.05ms), zero network overhead
- Rejected because it requires building and maintaining a custom daemon (SPOF)
- CODITECT already has 4 LaunchAgent daemons; adding a 5th is feasible but adds operational complexity
Disqualified: Claude Code Agent Teams (Score: 52.1%)
- Claude-only -- cannot coordinate Codex, Gemini, or Kimi sessions
- No programmatic API -- no Python/TypeScript SDK
- Experimental with known bugs (task status lag, failed cleanup, no session resumption)
Consequences
Positive
- Zero new dependencies -- Only Python stdlib and SQLite (already in stack)
- ACID crash recovery -- Incomplete transactions auto-rollback; no orphaned state
- Sub-50ms notification -- kqueue/inotify push on WAL file changes
- Consistent architecture -- Follows ADR-118 four-tier database pattern
- Clean upgrade path -- MessageBus abstraction enables CloudMessageBus for api.coditect.ai
- Testable -- SQLite in-memory mode for unit tests, no infrastructure needed
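The rollback and testability points can be illustrated together with an in-memory database. In this sketch a Python exception stands in for a session dying mid-transaction. The helper name, the `fail_after` parameter, and the trimmed tables are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional

TRIMMED_DDL = """
CREATE TABLE session_registry (
    session_id TEXT PRIMARY KEY,
    llm_vendor TEXT NOT NULL,
    heartbeat_at TEXT NOT NULL
);
CREATE TABLE file_locks (
    file_path TEXT PRIMARY KEY,
    session_id TEXT NOT NULL
);
"""


def register_with_locks(conn: sqlite3.Connection, session_id: str, vendor: str,
                        files: List[str], fail_after: Optional[int] = None) -> None:
    """Register a session and lock its files atomically.

    fail_after simulates a failure after that many locks were written.
    """
    with conn:  # any exception inside rolls the whole transaction back
        conn.execute(
            "INSERT INTO session_registry (session_id, llm_vendor, heartbeat_at) "
            "VALUES (?, ?, datetime('now'))",
            (session_id, vendor),
        )
        for index, path in enumerate(files):
            if fail_after is not None and index == fail_after:
                raise RuntimeError("simulated failure mid-transaction")
            conn.execute(
                "INSERT INTO file_locks (file_path, session_id) VALUES (?, ?)",
                (path, session_id),
            )


def count(conn: sqlite3.Connection, table: str) -> int:
    """Row count for one of the two fixed tables above."""
    assert table in ("session_registry", "file_locks")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

After a failed call, both tables are expected to be empty: the session row and the partial set of locks disappear together, so no half-registered state is left behind.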
Negative
- SQLite is single-writer -- Concurrent writes serialize. Mitigated by separate database and TTL cleanup.
- No true pub/sub semantics -- SQLite has no notification mechanism. Mitigated by kqueue/inotify hybrid.
- macOS/Linux only -- kqueue (macOS) and inotify (Linux) are OS-specific. Fallback polling on other platforms.
- Cloud tier requires transport replacement -- SQLiteMessageBus cannot serve networked clients. CloudMessageBus must be implemented for CODITECT Cloud.
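Because writes serialize, a writer can still hit "database is locked" while another session holds the write lock. One possible complement to the mitigations listed above is a bounded retry with backoff. This is a sketch under assumptions: the helper name, attempt count, and delays are illustrative and are not settings taken from this ADR.

```python
import sqlite3
import time
from typing import Sequence


def write_with_retry(conn: sqlite3.Connection, sql: str, params: Sequence = (),
                     attempts: int = 5, base_delay: float = 0.05) -> bool:
    """Run one write statement, retrying while the database is locked.

    Returns True once the write commits, False if every attempt was blocked.
    """
    for attempt in range(attempts):
        try:
            with conn:  # rolls back the implicit transaction on failure
                conn.execute(sql, params)
            return True
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            if "locked" not in message and "busy" not in message:
                raise  # not a contention error, surface it
            # Exponential backoff: 50ms, 100ms, 200ms, ... with the defaults.
            time.sleep(base_delay * (2 ** attempt))
    return False
```

At the roughly 2 msgs/sec workload cited in this ADR, contention should be rare, so a small number of attempts is likely enough. The `timeout` argument of `sqlite3.connect` offers a simpler alternative where waiting inside SQLite is acceptable.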
Cloud Evolution Path
| Phase | Trigger | Transport |
|---|---|---|
| 1 (Launch) | Now | SQLiteMessageBus (local) |
| 2 (Team features) | Multi-machine requirement | PostgreSQL LISTEN/NOTIFY via CloudMessageBus |
| 3 (Scale) | 1000+ concurrent sessions | Optional broker (NATS/Redis) via CloudMessageBus |
Compliance
- ADR-118 (Four-Tier DB): messaging.db is a new purpose-specific database, consistent with tier separation
- ADR-053 (Cloud Sync): MessageBus abstraction provides the hook for cloud-tier sync
- ADR-089 (Data Separation): messaging.db stores ephemeral coordination data, not customer knowledge
References
- Full evaluation: internal/analysis/inter-session-messaging/evaluation-matrix.md
- Final verdict: internal/analysis/inter-session-messaging/final-verdict.md
- Judge 3 analysis: internal/analysis/inter-session-messaging/judge-3-industry-ecosystem-analysis.md
- Industry research: Cursor, Windsurf, Devin, OpenHands, SWE-agent, Aider, Copilot Workspace
- Protocol research: MCP (Nov 2025 spec), A2A (v0.3, July 2025)
- Claude Code Agent Teams: Released Feb 5, 2026 with Opus 4.6