
ADR-160: Inter-Session Messaging Architecture

Status

Accepted (2026-02-06)

Context

CODITECT uniquely supports concurrent sessions from multiple LLM vendors (Claude, Codex, Gemini, Kimi) on a single developer machine. When 5-10 sessions run simultaneously, they have no awareness of each other, leading to:

  1. File conflicts -- Two sessions editing the same file without knowledge of the other
  2. Duplicate work -- Sessions claiming the same task independently
  3. No status visibility -- No session knows what others are working on
  4. No task routing -- Cannot direct work to the session with the right context

This is a novel problem. No competing tool (Cursor, Windsurf, Devin, Copilot Workspace, SWE-agent, Aider) attempts multi-vendor LLM session coordination. Claude Code Agent Teams (released 2026-02-05) coordinates Claude-to-Claude only.

Decision Drivers

  • Zero new dependencies -- CODITECT is installed locally on customer machines. Every dependency is a customer installation requirement.
  • LLM-vendor agnostic -- Must coordinate Claude, Codex, Gemini, and Kimi sessions equally.
  • Crash recovery -- Sessions may be killed, crash, or run out of context at any time.
  • Minimal resource overhead -- Developers run CODITECT alongside resource-intensive IDEs and LLM sessions.
  • Cloud upgrade path -- CODITECT Cloud (api.coditect.ai) is a future requirement.

Evaluation Process

A formal MoE (Mixture of Experts) evaluation was conducted:

  1. 3 parallel research agents analyzed the Motia framework, 6 alternatives, and the industry landscape
  2. 7 candidates scored across 34 attributes in 5 weighted categories
  3. 3-judge panel (Technical Architecture, Systems Engineering, Industry Ecosystem) reviewed the scoring
  4. Unanimous agreement on the final ranking

Full analysis: internal/analysis/inter-session-messaging/

Decision

Use a dedicated SQLite database (messaging.db) in WAL mode with kqueue/inotify hybrid notification for inter-session coordination.

Architecture

┌──────────────────────────────────────────────────────────┐
│                    Developer Machine                     │
│                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │  Claude  │  │  Codex   │  │  Gemini  │  │   Kimi   │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  │
│       │             │             │             │        │
│       ▼             ▼             ▼             ▼        │
│ ┌──────────────────────────────────────────────────────┐ │
│ │                MessageBus Abstraction                │ │
│ │  publish(channel, payload) -> None                   │ │
│ │  subscribe(channel, callback) -> Handle              │ │
│ │  poll(channel, since_id) -> List[Message]            │ │
│ │  register_session(session_info) -> None              │ │
│ │  heartbeat() -> None                                 │ │
│ └────────────────────┬─────────────────────────────────┘ │
│                      │                                   │
│           ┌──────────┴──────────┐                        │
│           │    messaging.db     │ ← kqueue/inotify       │
│           │     (WAL mode)      │   watches WAL file     │
│           │       < 1 MB        │   for push notification│
│           └─────────────────────┘                        │
│                                                          │
│  Unchanged: platform.db | org.db | sessions.db           │
└──────────────────────────────────────────────────────────┘
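
The interface in the diagram could be expressed as a Python ABC roughly as follows. This is a minimal sketch, not the contents of scripts/core/message_bus.py: the Message dataclass fields, the dict-typed session_info, and the InMemoryMessageBus class are assumptions introduced only to exercise the five methods named above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    # Field names are assumed; they mirror the inter_session_messages columns.
    id: int
    sender_id: str
    channel: str
    payload: Dict[str, Any]


class MessageBus(ABC):
    """Transport-agnostic interface; method names follow the diagram."""

    @abstractmethod
    def publish(self, channel: str, payload: Dict[str, Any]) -> None: ...

    @abstractmethod
    def subscribe(self, channel: str, callback: Callable[[Message], None]) -> Any: ...

    @abstractmethod
    def poll(self, channel: str, since_id: int = 0) -> List[Message]: ...

    @abstractmethod
    def register_session(self, session_info: Dict[str, Any]) -> None: ...

    @abstractmethod
    def heartbeat(self) -> None: ...


class InMemoryMessageBus(MessageBus):
    """Hypothetical stand-in transport, used here only to exercise the interface."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._subscribers: Dict[str, List[Callable[[Message], None]]] = {}
        self._session: Dict[str, Any] = {}
        self.heartbeats = 0

    def register_session(self, session_info: Dict[str, Any]) -> None:
        self._session = dict(session_info)

    def publish(self, channel: str, payload: Dict[str, Any]) -> None:
        sender = self._session.get("session_id", "unknown")
        msg = Message(len(self._messages) + 1, sender, channel, payload)
        self._messages.append(msg)
        # Deliver synchronously to callbacks registered for this channel.
        for callback in self._subscribers.get(channel, []):
            callback(msg)

    def subscribe(self, channel: str, callback: Callable[[Message], None]) -> Any:
        self._subscribers.setdefault(channel, []).append(callback)
        return (channel, callback)  # opaque handle

    def poll(self, channel: str, since_id: int = 0) -> List[Message]:
        return [m for m in self._messages if m.channel == channel and m.id > since_id]

    def heartbeat(self) -> None:
        self.heartbeats += 1
```

Because callers depend only on MessageBus, a SQLite-backed or cloud-backed class can replace the in-memory one without changes at the call sites.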

Key Design Decisions

  1. Separate messaging.db -- NOT added to sessions.db (18.4 GB, 80+ tables). A dedicated database avoids SQLITE_BUSY contention with existing context extraction and tool analytics workloads.

  2. kqueue/inotify hybrid -- OS-level filesystem watchers on the WAL file provide near-instant (<50ms) push notifications without polling overhead. Fallback to 250ms polling on unsupported platforms.

  3. MessageBus abstraction -- Clean Python ABC interface (scripts/core/message_bus.py) enables transport replacement for CODITECT Cloud without changing calling code.

  4. Advisory file locks -- Tracked in messaging.db (not kernel-level flock) for cross-LLM visibility. Sessions register files they are editing; conflict detection is advisory, not blocking.

  5. TTL-based message cleanup -- Messages auto-expire (default 5 minutes). No unbounded table growth. Cleanup runs on each write operation.
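
The notification strategy in decision 2 can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the shipped watcher: wait_for_change, _append_byte, and demo are hypothetical names, the kqueue branch uses Python's select.kqueue (available on macOS/BSD), and because inotify has no stdlib binding the non-kqueue branch shows only the polling fallback. A temporary file stands in for the WAL file, which SQLite names after the database with a -wal suffix (messaging.db-wal).

```python
import os
import select
import tempfile
import threading
import time


def wait_for_change(path: str, timeout: float = 1.0, poll_interval: float = 0.25) -> bool:
    """Block until `path` is written to or `timeout` elapses; True if a change was seen."""
    if hasattr(select, "kqueue"):  # macOS / BSD: the kernel pushes the event
        fd = os.open(path, os.O_RDONLY)
        kq = select.kqueue()
        try:
            event = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND,
            )
            return len(kq.control([event], 1, timeout)) > 0
        finally:
            kq.close()
            os.close(fd)

    # Fallback: compare (mtime, size) every poll_interval seconds.
    def signature():
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    before = signature()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(min(poll_interval, max(0.0, deadline - time.monotonic())))
        if signature() != before:
            return True
    return False


def _append_byte(path: str) -> None:
    with open(path, "ab") as f:
        f.write(b"x")


def demo() -> tuple:
    """Watch a temp file standing in for the WAL file: once idle, once written to."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        idle = wait_for_change(path, timeout=0.2, poll_interval=0.05)
        writer = threading.Timer(0.1, _append_byte, args=(path,))
        writer.start()
        written = wait_for_change(path, timeout=2.0, poll_interval=0.05)
        writer.join()
        return idle, written
    finally:
        os.unlink(path)
```

A subscriber would typically loop on wait_for_change and, whenever it returns True, call poll() to fetch the rows that triggered the write.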

Database Schema

CREATE TABLE session_registry (
    session_id TEXT PRIMARY KEY,
    llm_vendor TEXT NOT NULL,
    llm_model TEXT,
    tty TEXT,
    pid INTEGER,
    project_id TEXT,
    task_id TEXT,
    active_files TEXT, -- JSON array
    heartbeat_at TEXT NOT NULL,
    registered_at TEXT NOT NULL DEFAULT (datetime('now')),
    status TEXT DEFAULT 'active'
);

CREATE TABLE inter_session_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id TEXT NOT NULL,
    channel TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    ttl_seconds INTEGER DEFAULT 300
);

CREATE TABLE file_locks (
    file_path TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    lock_type TEXT DEFAULT 'advisory',
    locked_at TEXT NOT NULL DEFAULT (datetime('now'))
);
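
As an illustration of how the file_locks table supports advisory, non-blocking conflict detection, the sketch below relies on the file_path primary key: the first session to insert a row holds the lock, and later sessions learn who the holder is instead of being blocked. The free functions lock_file and unlock_file taking a connection are hypothetical; the ADR exposes these operations as methods on the bus.

```python
import sqlite3
from typing import Optional

FILE_LOCKS_DDL = """
CREATE TABLE IF NOT EXISTS file_locks (
    file_path  TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    lock_type  TEXT DEFAULT 'advisory',
    locked_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def lock_file(conn: sqlite3.Connection, file_path: str, session_id: str) -> Optional[str]:
    """Register an advisory lock.

    Returns None if this session now holds the lock, otherwise the
    session_id of the current holder (a conflict to report, not an error).
    """
    with conn:  # one transaction: insert-if-absent, then read the holder
        conn.execute(
            "INSERT OR IGNORE INTO file_locks (file_path, session_id) VALUES (?, ?)",
            (file_path, session_id),
        )
        holder = conn.execute(
            "SELECT session_id FROM file_locks WHERE file_path = ?", (file_path,)
        ).fetchone()[0]
    return None if holder == session_id else holder


def unlock_file(conn: sqlite3.Connection, file_path: str, session_id: str) -> None:
    """Release the lock only if this session is the holder."""
    with conn:
        conn.execute(
            "DELETE FROM file_locks WHERE file_path = ? AND session_id = ?",
            (file_path, session_id),
        )
```

A non-None return value is what a caller might publish on the file_conflict channel as a warning to the other session.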

Message Channels

| Channel | Purpose | TTL |
| --- | --- | --- |
| state | Session status broadcasts | 60s |
| file_conflict | File edit conflict warnings | 300s |
| task_broadcast | Task routing between sessions | 600s |
| heartbeat | Session liveness detection | 30s |
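
The per-channel TTLs combine with the cleanup-on-write rule roughly as sketched below. This is a sketch against the inter_session_messages schema, not the actual SQLiteMessageBus code: the CHANNEL_TTL_SECONDS mapping and the free functions publish and poll are assumed names, and expiry is computed with strftime('%s', ...) so that it does not depend on a recent SQLite version.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple

# TTLs from the channel table; unknown channels fall back to the schema default.
CHANNEL_TTL_SECONDS = {"state": 60, "file_conflict": 300, "task_broadcast": 600, "heartbeat": 30}

MESSAGES_DDL = """
CREATE TABLE IF NOT EXISTS inter_session_messages (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id   TEXT NOT NULL,
    channel     TEXT NOT NULL,
    payload     TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now')),
    ttl_seconds INTEGER DEFAULT 300
);
"""


def publish(conn: sqlite3.Connection, sender_id: str, channel: str, payload: Dict[str, Any]) -> int:
    """Delete expired rows, then insert the new message; returns its id."""
    ttl = CHANNEL_TTL_SECONDS.get(channel, 300)
    with conn:  # cleanup and insert commit together
        conn.execute(
            "DELETE FROM inter_session_messages "
            "WHERE CAST(strftime('%s', 'now') AS INTEGER) "
            "    - CAST(strftime('%s', created_at) AS INTEGER) > ttl_seconds"
        )
        cur = conn.execute(
            "INSERT INTO inter_session_messages (sender_id, channel, payload, ttl_seconds) "
            "VALUES (?, ?, ?, ?)",
            (sender_id, channel, json.dumps(payload), ttl),
        )
    return cur.lastrowid


def poll(conn: sqlite3.Connection, channel: str, since_id: int = 0) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return (id, sender_id, payload) for messages on `channel` newer than since_id."""
    rows = conn.execute(
        "SELECT id, sender_id, payload FROM inter_session_messages "
        "WHERE channel = ? AND id > ? ORDER BY id",
        (channel, since_id),
    ).fetchall()
    return [(row_id, sender, json.loads(body)) for row_id, sender, body in rows]
```

Passing the highest id seen so far as since_id lets each session resume where it left off without any per-subscriber bookkeeping in the database.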

API Surface

from scripts.core.message_bus import get_message_bus

bus = get_message_bus() # Returns SQLiteMessageBus by default

# Register this session
bus.register_session(
    session_id="sess-abc123",
    llm_vendor="claude",
    llm_model="opus-4.6",
    project_id="PILOT"
)

# Publish a message
bus.publish("state", {"task_id": "H.8.1", "status": "working"})

# Subscribe with callback (uses kqueue/inotify internally, with polling as fallback)
handle = bus.subscribe("file_conflict", my_conflict_handler)

# Poll for messages (fallback)
messages = bus.poll("task_broadcast", since_id=42)

# Lock a file (advisory)
bus.lock_file("scripts/core/paths.py")
bus.unlock_file("scripts/core/paths.py")

# Heartbeat (call every 15s)
bus.heartbeat()
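
With heartbeats sent every 15s and a 30s liveness window, a crashed session could be detected by checking heartbeat_at in session_registry. The ADR does not specify how stale sessions are handled, so the following is a sketch under assumptions: the 'stale' status value, the reap_stale_sessions function, and the release of a dead session's advisory locks are all hypothetical, and the DDL shows only the subset of columns the example needs.

```python
import sqlite3
from typing import List

REGISTRY_DDL = """
CREATE TABLE IF NOT EXISTS session_registry (  -- subset of the full schema
    session_id   TEXT PRIMARY KEY,
    llm_vendor   TEXT NOT NULL,
    heartbeat_at TEXT NOT NULL,
    status       TEXT DEFAULT 'active'
);
CREATE TABLE IF NOT EXISTS file_locks (
    file_path  TEXT PRIMARY KEY,
    session_id TEXT NOT NULL
);
"""


def heartbeat(conn: sqlite3.Connection, session_id: str) -> None:
    """Refresh the liveness timestamp for one session."""
    with conn:
        conn.execute(
            "UPDATE session_registry SET heartbeat_at = datetime('now'), status = 'active' "
            "WHERE session_id = ?",
            (session_id,),
        )


def reap_stale_sessions(conn: sqlite3.Connection, max_age_seconds: int = 30) -> List[str]:
    """Mark sessions with an old heartbeat as 'stale' and drop their advisory locks."""
    cutoff = f"-{int(max_age_seconds)} seconds"
    with conn:
        stale = [
            row[0]
            for row in conn.execute(
                "SELECT session_id FROM session_registry "
                "WHERE status = 'active' AND heartbeat_at < datetime('now', ?)",
                (cutoff,),
            )
        ]
        for session_id in stale:
            conn.execute(
                "UPDATE session_registry SET status = 'stale' WHERE session_id = ?",
                (session_id,),
            )
            conn.execute("DELETE FROM file_locks WHERE session_id = ?", (session_id,))
    return stale
```

Any session can run the reaper, since the whole operation is a single SQLite transaction and is therefore safe to repeat.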

Alternatives Considered

Rejected: Motia Framework (Score: 40.1%)

  • In-process event bus -- cannot serve external subscribers
  • Elastic License v2 (changed from MIT, Nov 2025) -- managed service restriction
  • Core rewrite to Rust/Go in progress -- current API will be deprecated
  • 1,011 commits in 2025, only 13 in 2026 -- declining activity

Runner-Up: File-based JSON Manifest (Score: 79.6%)

  • Simplest possible approach; used by Claude Code Agent Teams
  • Rejected because CODITECT's multi-vendor, peer-to-peer coordination problem is harder than Agent Teams' single-vendor, hierarchical problem
  • No ACID guarantees, no crash recovery, race conditions under concurrent writes

Considered: NATS.io (Score: 77.0%)

  • Excellent messaging system, CNCF graduated, Apache 2.0
  • Rejected because it adds a server process for a ~2 msgs/sec workload
  • 10M+ msgs/sec capacity is 5 million times more than needed

Considered: Redis Pub/Sub (Score: 74.8%)

  • Excellent latency (<0.1ms), true push pub/sub
  • Rejected due to license instability (changed twice in 18 months), infrastructure overhead
  • Redis is now available under AGPLv3 and has Valkey/KeyDB as drop-in alternatives, but running any of them adds operational burden

Considered: Unix Domain Sockets (Score: 75.7%)

  • Best latency option (~0.05ms), zero network overhead
  • Rejected because it requires building and maintaining a custom daemon (SPOF)
  • CODITECT already has 4 LaunchAgent daemons; adding a 5th is feasible but adds operational complexity

Disqualified: Claude Code Agent Teams (Score: 52.1%)

  • Claude-only -- cannot coordinate Codex, Gemini, or Kimi sessions
  • No programmatic API -- no Python/TypeScript SDK
  • Experimental with known bugs (task status lag, failed cleanup, no session resumption)

Consequences

Positive

  • Zero new dependencies -- Only Python stdlib and SQLite (already in stack)
  • ACID crash recovery -- Incomplete transactions auto-rollback; no orphaned state
  • Sub-50ms notification -- kqueue/inotify push on WAL file changes
  • Consistent architecture -- Follows ADR-118 four-tier database pattern
  • Clean upgrade path -- MessageBus abstraction enables CloudMessageBus for api.coditect.ai
  • Testable -- SQLite in-memory mode for unit tests, no infrastructure needed

Negative

  • SQLite is single-writer -- Concurrent writes serialize. Mitigated by separate database and TTL cleanup.
  • No true pub/sub semantics -- SQLite has no built-in cross-process notification mechanism. Mitigated by kqueue/inotify hybrid.
  • Push notification is macOS/Linux only -- kqueue (macOS) and inotify (Linux) are OS-specific. Other platforms fall back to 250ms polling.
  • Cloud tier requires transport replacement -- SQLiteMessageBus cannot serve networked clients. CloudMessageBus must be implemented for CODITECT Cloud.

Cloud Evolution Path

| Phase | Trigger | Transport |
| --- | --- | --- |
| 1 (Launch) | Now | SQLiteMessageBus (local) |
| 2 (Team features) | Multi-machine requirement | PostgreSQL LISTEN/NOTIFY via CloudMessageBus |
| 3 (Scale) | 1000+ concurrent sessions | Optional broker (NATS/Redis) via CloudMessageBus |
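
One way the phases could be switched without touching calling code is a factory that selects the transport at startup. The sketch below is purely illustrative: the CODITECT_MESSAGE_BUS variable name, the env parameter, and the empty placeholder classes are assumptions, and only the get_message_bus name and its SQLite default come from this ADR.

```python
import os
from typing import Mapping, Optional


class SQLiteMessageBus:
    """Placeholder for the local transport named in this ADR."""


class CloudMessageBus:
    """Placeholder for the future networked transport named in this ADR."""


_TRANSPORTS = {"sqlite": SQLiteMessageBus, "cloud": CloudMessageBus}


def get_message_bus(env: Optional[Mapping[str, str]] = None):
    """Return a bus instance; defaults to SQLiteMessageBus when nothing is configured."""
    env = os.environ if env is None else env
    name = env.get("CODITECT_MESSAGE_BUS", "sqlite")  # hypothetical variable name
    if name not in _TRANSPORTS:
        raise ValueError(f"unknown message bus transport: {name!r}")
    return _TRANSPORTS[name]()
```

Callers keep writing bus = get_message_bus(); moving from Phase 1 to Phase 2 then becomes a configuration change, not a code change.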

Compliance

  • ADR-118 (Four-Tier DB): messaging.db is a new purpose-specific database, consistent with tier separation
  • ADR-053 (Cloud Sync): MessageBus abstraction provides the hook for cloud-tier sync
  • ADR-089 (Data Separation): messaging.db stores ephemeral coordination data, not customer knowledge

References

  • Full evaluation: internal/analysis/inter-session-messaging/evaluation-matrix.md
  • Final verdict: internal/analysis/inter-session-messaging/final-verdict.md
  • Judge 3 analysis: internal/analysis/inter-session-messaging/judge-3-industry-ecosystem-analysis.md
  • Industry research: Cursor, Windsurf, Devin, OpenHands, SWE-agent, Aider, Copilot Workspace
  • Protocol research: MCP (Nov 2025 spec), A2A (v0.3, July 2025)
  • Claude Code Agent Teams: Released Feb 5, 2026 with Opus 4.6