# Inter-Session Messaging Architecture - MoE Final Verdict

**Task:** H.13 - Inter-Session Communication Layer
**Date:** 2026-02-06
**Author:** Claude (Opus 4.6)
**MoE Phase:** 4 of 4 (Final Verdict)
## Process Summary

### Research Phase (3 Parallel Agents)
| Agent | Scope | Key Finding |
|---|---|---|
| Agent 1 (Motia Deep-Dive) | Architecture, license, maturity, performance analysis of Motia | NOT SUITABLE: In-process event bus, ELv2 license, Rust rewrite incoming, 1.5/5 overall fit |
| Agent 2 (Alternatives) | 6 alternatives profiled with code, benchmarks, license analysis | SQLite best fit: Zero deps, ACID, aligns with ADR-118. File JSON is fallback |
| Agent 3 (Industry Landscape) | 8 competing tools, MCP/A2A protocols, market patterns | Industry consensus: File-based or SQLite. No leading AI coding tool uses message brokers for this |
## Evaluation Matrix (5 Weighted Sub-Tables)
7 candidates scored across 34 attributes in 5 categories:
| Category | Weight | Purpose |
|---|---|---|
| Technical Fit | 30% | Does it solve the problem? |
| Operational | 25% | How hard to deploy/maintain? |
| Strategic Fit | 20% | Does it align with CODITECT's direction? |
| Risk Assessment | 15% | What could go wrong? |
| Long-Term Value | 10% | Will it still be right in 3 years? |
## Judge Panel (3 Perspectives)
| Judge | Perspective | Model | Confidence | Agrees? |
|---|---|---|---|---|
| Judge 1 | Technical Architecture & Risk | Claude Opus 4.6 | 82% | Yes |
| Judge 2 | Systems Engineering & Integration | Kimi k2.5 | (not stated) | Yes, with caveats |
| Judge 3 | Industry Ecosystem & Longevity | Gemini 2.5 Pro | 88% | Yes |
## Final Ranking (Post-Judge Adjustments)
| Rank | Solution | Original Score | Judge-Adjusted Score | Verdict |
|---|---|---|---|---|
| 1 | SQLite Pub/Sub (messaging.db) | 94.6% | 92.7% | Clear winner |
| 2 | File-based JSON Manifest | 79.6% | 79.6% | Strong runner-up |
| 3 | NATS.io | 76.2% | 77.0% | Best "real" broker -- overkill |
| 4 | Unix Domain Sockets | 73.8% | 75.7% | Best latency -- daemon overhead |
| 5 | Redis Pub/Sub | 73.9% | 74.8% | Excellent -- license risk |
| 6 | Claude Code Agent Teams | 51.5% | 52.1% | Claude-only (disqualified) |
| 7 | Motia Framework | 40.1% | 40.1% | Wrong architecture (disqualified) |
**Winner:** SQLite with a dedicated messaging.db + kqueue/inotify hybrid notification

## Decision

### What We Will Build
A lightweight inter-session coordination layer using:

- messaging.db -- A new, small (<1 MB), dedicated SQLite database in WAL mode, separate from sessions.db (18.4 GB)
- kqueue/inotify notification -- OS-level file system watchers on the WAL file for near-instant push notifications (macOS: kqueue, Linux: inotify)
- MessageBus abstraction -- Clean Python interface enabling future transport replacement for CODITECT Cloud
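A minimal sketch of what the MessageBus abstraction could look like. The interface names (`publish`, `poll`) follow the API shown in the architecture diagram below; the `InMemoryBus` class is purely illustrative (a stand-in for the planned `SQLiteMessageBus`) to show how transports stay swappable behind the same interface:

```python
from abc import ABC, abstractmethod

class MessageBus(ABC):
    """Transport-agnostic pub/sub API; SQLite today, swappable later."""

    @abstractmethod
    def publish(self, channel: str, payload: dict, ttl_seconds: int = 300) -> int:
        """Persist a message and return its monotonically increasing id."""

    @abstractmethod
    def poll(self, channel: str, after_id: int = 0) -> list:
        """Return (id, payload) pairs newer than after_id, in send order."""

class InMemoryBus(MessageBus):
    """Toy transport for tests; a SQLiteMessageBus would mirror this shape."""

    def __init__(self):
        self._messages = []  # (id, channel, payload)

    def publish(self, channel, payload, ttl_seconds=300):
        msg_id = len(self._messages) + 1
        self._messages.append((msg_id, channel, payload))
        return msg_id

    def poll(self, channel, after_id=0):
        return [(i, p) for i, c, p in self._messages
                if c == channel and i > after_id]
```

Because callers depend only on `MessageBus`, the CODITECT Cloud upgrade path (PostgreSQL, or a broker) is a transport swap rather than an API change.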
### What We Will NOT Build
- No Motia integration (wrong architecture, ELv2 license)
- No external message broker (Redis, NATS, RabbitMQ)
- No daemon process (no SPOF, no new LaunchAgent)
- No dependency on Claude Code Agent Teams (Claude-only)
### Why SQLite Over File-Based JSON
Despite the industry pattern of file-based coordination (Agent Teams, Cursor, Aider), SQLite wins because CODITECT's problem is harder than theirs:
| Competitor Problem | CODITECT Problem |
|---|---|
| Coordinate 2-5 Claude sessions | Coordinate 5-10 sessions across 4+ LLM vendors |
| Same-vendor, same-protocol | Mixed vendors, mixed protocols |
| Single team lead, hierarchical | Peer-to-peer, decentralized |
| File conflicts are rare (worktrees) | File conflicts are the primary risk |
SQLite provides ACID guarantees, crash recovery, message ordering (ROWID), and concurrent access control that file-based patterns cannot provide. The 13-point gap (92.7% vs 79.6%) is justified by the difference in problem complexity.
## Key Judge Findings Incorporated

### From Judge 1 (Technical Architecture)
- Mandatory MessageBus abstraction -- Clean separation between API and transport for cloud upgrade
- Poll interval configuration -- Default 250ms, configurable per deployment
- SQLITE_BUSY retry logic -- Exponential backoff with jitter on all write paths
- Message TTL and cleanup -- Automatic purging to prevent unbounded growth
- Benchmark gate -- Required: 10 concurrent sessions, 250ms polling, measure CPU and p99 latency
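Judge 1's retry requirement can be sketched as a small wrapper. This is an illustrative shape, not the final implementation; the helper name `with_busy_retry` and the default budgets are assumptions:

```python
import random
import sqlite3
import time

def with_busy_retry(op, max_attempts=6, base_delay=0.05):
    """Run op(); on SQLITE_BUSY ("database is locked") retry with
    exponential backoff plus full jitter, re-raising after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return op()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) and "busy" not in str(exc):
                raise  # some other error; do not mask it
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))  # full jitter
```

The jitter matters here: with 5-10 sessions retrying in lockstep, fixed backoff intervals would make collisions recur; randomizing the sleep de-correlates the writers.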
### From Judge 2 (Systems Engineering)
- CRITICAL: Use separate messaging.db, NOT sessions.db -- Sessions.db is 18.4 GB with 80+ tables. Adding pub/sub polling would create SQLITE_BUSY contention with context extraction and tool analytics
- Hybrid kqueue notification -- Watch WAL file for changes, eliminate polling latency entirely
- No retry logic exists in codebase today -- Must be added to messaging.db at minimum
- Existing message_bus.py (RabbitMQ) is dead code -- CODITECT already tried the broker path and abandoned it
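The kqueue/inotify specifics are platform code, but the fallback path (250ms polling, per the risk table below) is simple enough to sketch portably. A hypothetical helper, watching the SQLite `-wal` file's mtime, assuming the watcher API takes and returns nanosecond timestamps:

```python
import os
import time

def wait_for_wal_change(wal_path, last_mtime_ns, timeout=5.0, interval=0.25):
    """Portable polling fallback for the kqueue/inotify watcher: block until
    the -wal file's mtime advances past last_mtime_ns or timeout elapses.
    Returns the newest mtime seen (unchanged on timeout)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            mtime = os.stat(wal_path).st_mtime_ns
        except FileNotFoundError:
            mtime = 0  # the -wal file appears only after the first write
        if mtime > last_mtime_ns:
            return mtime
        time.sleep(interval)
    return last_mtime_ns
```

In the hybrid design, a kqueue (macOS) or inotify (Linux) watcher replaces the `time.sleep` loop with a blocking OS wait on the same file, turning the 250ms worst case into near-instant wakeup.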
### From Judge 3 (Industry Ecosystem)
- Agent Teams composability -- Design as participant in coordination, not competitor
- Cloud evolution phases: SQLite (launch) -> PostgreSQL LISTEN/NOTIFY (team features) -> optional broker (1000+ sessions)
- Standards watch -- Monitor MCP and A2A for inter-session relevance (12-18 months)
- Competitive differentiation is the capability, not the transport -- Focus engineering on multi-LLM coordination features
## Architecture Overview

```
┌──────────────────────────────────────────────────────────────┐
│ Developer Machine │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Claude │ │ Codex │ │ Gemini │ ...more │
│ │ Session │ │ Session │ │ Session │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ MessageBus (Python API) │ │
│ │ publish() / subscribe() / poll() │ │
│ └─────────────────┬───────────────────────────┘ │
│ │ │
│ ┌───────────┴───────────┐ │
│ │ messaging.db │ ← kqueue/inotify watch │
│ │ (WAL mode, <1 MB) │ on WAL file changes │
│ │ │ │
│ │ inter_session_msgs │ │
│ │ session_registry │ │
│ │ file_locks │ │
│ └──────────────────────┘ │
│ │
│ Existing (unchanged): │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │platform │ │ org.db │ │sessions │ │context.db│ │
│ │.db (T1) │ │ (T2) │ │.db (T3) │ │(LEGACY) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└──────────────────────────────────────────────────────────────┘
```
## Database Schema (messaging.db)

```sql
-- Session registry
CREATE TABLE session_registry (
    session_id TEXT PRIMARY KEY,
    llm_vendor TEXT NOT NULL,    -- claude, codex, gemini, kimi
    llm_model TEXT,              -- opus-4.6, o3, 2.5-pro, k2.5
    tty TEXT,
    pid INTEGER,
    project_id TEXT,
    task_id TEXT,
    active_files TEXT,           -- JSON array of files being edited
    heartbeat_at TEXT NOT NULL,
    registered_at TEXT NOT NULL DEFAULT (datetime('now')),
    status TEXT DEFAULT 'active' -- active, idle, terminated
);

-- Inter-session messages
CREATE TABLE inter_session_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id TEXT NOT NULL,
    channel TEXT NOT NULL,           -- state, file_conflict, task_broadcast, heartbeat
    payload TEXT NOT NULL,           -- JSON
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    ttl_seconds INTEGER DEFAULT 300, -- 5 minute default TTL
    FOREIGN KEY (sender_id) REFERENCES session_registry(session_id)
);

-- File lock tracking
CREATE TABLE file_locks (
    file_path TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    lock_type TEXT DEFAULT 'advisory', -- advisory, exclusive
    locked_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (session_id) REFERENCES session_registry(session_id)
);

-- Indexes
CREATE INDEX idx_messages_channel_created ON inter_session_messages(channel, created_at);
CREATE INDEX idx_messages_ttl ON inter_session_messages(created_at, ttl_seconds);
CREATE INDEX idx_registry_status ON session_registry(status);
CREATE INDEX idx_registry_heartbeat ON session_registry(heartbeat_at);
```
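A sketch of initializing this database and publishing against it, using a trimmed copy of the schema above (the file-lock table and foreign keys are omitted for brevity). The helper names (`open_messaging_db`, `publish`, `purge_expired`) are illustrative, not the final API:

```python
import json
import sqlite3

# Trimmed copy of the schema above: just the tables a publisher touches.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session_registry (
    session_id TEXT PRIMARY KEY,
    llm_vendor TEXT NOT NULL,
    heartbeat_at TEXT NOT NULL,
    status TEXT DEFAULT 'active'
);
CREATE TABLE IF NOT EXISTS inter_session_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id TEXT NOT NULL,
    channel TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    ttl_seconds INTEGER DEFAULT 300
);
"""

def open_messaging_db(path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # concurrent readers, one writer
    conn.execute("PRAGMA busy_timeout=250")  # wait briefly before SQLITE_BUSY
    conn.executescript(SCHEMA)
    return conn

def publish(conn, sender_id, channel, payload, ttl_seconds=300):
    cur = conn.execute(
        "INSERT INTO inter_session_messages "
        "(sender_id, channel, payload, ttl_seconds) VALUES (?, ?, ?, ?)",
        (sender_id, channel, json.dumps(payload), ttl_seconds))
    conn.commit()
    return cur.lastrowid

def purge_expired(conn):
    """TTL cleanup: delete rows older than their ttl_seconds."""
    cur = conn.execute(
        "DELETE FROM inter_session_messages "
        "WHERE strftime('%s','now') - strftime('%s', created_at) > ttl_seconds")
    conn.commit()
    return cur.rowcount
```

`PRAGMA busy_timeout` gives SQLite's built-in blocking wait before surfacing SQLITE_BUSY; the application-level backoff retry then covers the residual cases.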
## Message Channels

| Channel | Purpose | TTL | Example Payload |
|---|---|---|---|
| state | Session status broadcasts | 60s | `{"task_id": "H.8.1", "status": "working"}` |
| file_conflict | File edit conflict warnings | 300s | `{"file": "paths.py", "sessions": ["s1", "s2"]}` |
| task_broadcast | Task routing between sessions | 600s | `{"task_id": "A.9.1", "action": "claim"}` |
| heartbeat | Session liveness detection | 30s | `{"session_id": "s1", "alive": true}` |
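On the consumer side, a subscriber reads a channel by remembering the last ROWID it has seen, which is what gives the message-ordering guarantee cited above. A sketch against an in-memory stand-in for messaging.db (the function name `poll_channel` is an assumption):

```python
import json
import sqlite3

def poll_channel(conn, channel, after_id=0):
    """Fetch messages on one channel newer than after_id; ROWID order
    doubles as delivery order, so the caller just keeps the last id."""
    rows = conn.execute(
        "SELECT id, sender_id, payload FROM inter_session_messages "
        "WHERE channel = ? AND id > ? ORDER BY id",
        (channel, after_id)).fetchall()
    return [(i, s, json.loads(p)) for i, s, p in rows]

# demo against an in-memory stand-in for messaging.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inter_session_messages ("
             "id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "sender_id TEXT, channel TEXT, payload TEXT)")
conn.execute("INSERT INTO inter_session_messages (sender_id, channel, payload) "
             "VALUES ('s1', 'state', ?)",
             (json.dumps({"task_id": "H.8.1", "status": "working"}),))
messages = poll_channel(conn, "state")
```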
## Implementation Plan
| Phase | Task | Effort | Milestone |
|---|---|---|---|
| 1 | Create scripts/core/message_bus.py with abstraction interface | 2 days | MessageBus ABC + SQLiteMessageBus |
| 2 | Create messaging.db initialization in paths.py | 0.5 day | get_messaging_db_path() |
| 3 | Add kqueue/inotify notification watcher | 1.5 days | Near-instant push on WAL change |
| 4 | Session registration hook (PreToolUse) | 1 day | Auto-register on first tool use |
| 5 | File conflict detection | 1 day | Advisory locks + conflict channel |
| 6 | Benchmark with 10 concurrent sessions | 0.5 day | Validate <50ms p99, <5% CPU |
| 7 | Write ADR-160 | 0.5 day | Architecture documented |
| **Total** | | **7 days** | |
## Risks and Mitigations
| Risk | Probability | Severity | Mitigation |
|---|---|---|---|
| kqueue/inotify unavailable (unsupported platform or filesystem) | Low | Medium | Fall back to 250ms polling with the watchdog library |
| SQLite BUSY under load | Low | Medium | Separate messaging.db + exponential backoff retry |
| Message table growth | Medium | Low | TTL-based cleanup cron, 5-minute default TTL |
| Schema migration | Low | Low | messaging.db version tracked in _schema_version table |
| Cloud upgrade complexity | Medium | Medium | MessageBus abstraction enables transport swap |
## Deliverables Checklist
- Evaluation matrix with 5 weighted sub-tables (34 attributes, 7 candidates)
- 3-judge MoE panel review (unanimous agreement on ranking)
- Final verdict document (this file)
- ADR-160: Inter-Session Messaging Architecture
- Session log entry with findings
- TRACK-H update with H.13 task definitions
## Documents Produced
| Document | Path |
|---|---|
| Evaluation Matrix | internal/analysis/inter-session-messaging/evaluation-matrix.md |
| Final Verdict | internal/analysis/inter-session-messaging/final-verdict.md |
| Judge 3 Analysis | internal/analysis/inter-session-messaging/judge-3-industry-ecosystem-analysis.md |
| ADR-160 | internal/architecture/adrs/ADR-160-inter-session-messaging-architecture.md (pending) |