ADR-194: Codi-Watcher v3.0 — Hash-Based Change Detection

Status

Accepted — February 13, 2026

Context

Codi-watcher v2.0 (ADR-134) monitors LLM session files across Claude, Codex, Gemini, and KIMI. When context usage thresholds are reached, it copies session files to pending directories for later manual processing via /cx.

This architecture has three problems:

Data redundancy: The copy-on-threshold approach creates 22GB/week of duplicate session files in cusf-archive/. CUSF verification confirmed the copies are lossless (every copied file is byte-identical to its source), which also means they add no information beyond the originals.
Stale context: The user must manually run /cx to extract context from copied files. Between file detection and manual extraction, the context database is stale — sometimes by hours or days.
Complexity: The threshold system (context percentage parsing, 4 trigger types, cooldown management per trigger type) is fragile and requires parsing LLM-specific context usage formats that change between versions.

With ADR-181 (incremental /cx) now implemented, the unified message extractor can process only new/changed content via stat-based file classification and seek-based append extraction. The watcher no longer needs to copy files — it only needs to detect changes and trigger the extractor.
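The ADR-181 extractor is Python. The Rust sketch below is only meant to show the idea of stat-based classification followed by seek-based append extraction. The `Seen` record, the four class names, and the assumption that session files are append-only logs (so "grew" means "appended") are hypothetical and may not match the real extractor.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical record of a file as it looked on the previous extraction run.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Seen {
    pub size: u64,
    pub mtime_secs: u64,
}

#[derive(Debug, PartialEq)]
pub enum FileClass {
    New,       // never seen: extract everything
    Unchanged, // same size and mtime: skip entirely
    Appended,  // grew: extract only bytes past the old size (assumes append-only)
    Rewritten, // shrank, or same size with a new mtime: re-extract from scratch
}

/// Classify a file using only stat() data, without reading its contents.
pub fn classify(prev: Option<Seen>, now: Seen) -> FileClass {
    match prev {
        None => FileClass::New,
        Some(p) if p == now => FileClass::Unchanged,
        Some(p) if now.size > p.size => FileClass::Appended,
        Some(_) => FileClass::Rewritten,
    }
}

/// Seek past the previously processed bytes and read only the appended tail.
pub fn read_appended(path: &Path, offset: u64) -> io::Result<Vec<u8>> {
    let mut f = File::open(path)?;
    f.seek(SeekFrom::Start(offset))?;
    let mut tail = Vec::new();
    f.read_to_end(&mut tail)?;
    Ok(tail)
}
```

For an `Appended` file, the old size works as the seek offset. The cost of an incremental run then scales with the new content rather than with the total session length.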

Decision

Replace the v2.0 threshold-based export pipeline with a v3.0 hash-based change detection system that automatically triggers incremental /cx.

Architecture Change

v2.0: Poll → Detect sessions → Check context% → Copy file → Pending dir → Manual /cx
v3.0: Poll → Discover files → Detect hash changes → Auto /cx --incremental → Log results

Key Design Decisions

1. Two-phase change detection (stat + hash)

Rather than hashing every file on every poll cycle, use a two-phase approach:

Phase 1: stat() each file for size + mtime (one syscall, O(1) per file)
Phase 2: SHA-256 hash only when stat differs from stored values

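This makes the common case (no changes) extremely fast (~0.5-1s for a full poll). When stat does report a difference, the SHA-256 comparison establishes with cryptographic certainty whether the content actually changed, so an mtime-only touch does not trigger /cx.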
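Below is a minimal sketch of the two-phase check. `FileRecord` and `detect_change` are hypothetical names. The `hash_file` closure stands in for the streaming SHA-256 described in the next decision, so the sketch compiles without the sha2 crate.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical per-file record kept in watcher state.
#[derive(Clone, Debug)]
pub struct FileRecord {
    pub size: u64,
    pub mtime: SystemTime,
    pub hash: String,
}

/// Returns Ok(true) only when the file's content hash differs from the stored one.
pub fn detect_change<F>(
    path: &Path,
    state: &mut HashMap<PathBuf, FileRecord>,
    hash_file: F,
) -> io::Result<bool>
where
    F: Fn(&Path) -> io::Result<String>,
{
    // Phase 1: a single stat() for size + mtime.
    let meta = fs::metadata(path)?;
    let (size, mtime) = (meta.len(), meta.modified()?);
    if let Some(rec) = state.get(path) {
        if rec.size == size && rec.mtime == mtime {
            return Ok(false); // common case: stat matches, skip hashing
        }
    }

    // Phase 2: stat differs (or the file is new), so hash the content.
    let hash = hash_file(path)?;
    let changed = state.get(path).map_or(true, |rec| rec.hash != hash);
    // Store the new stat values even if the hash matched, so the next poll
    // does not re-hash a file that was only touched.
    state.insert(path.to_path_buf(), FileRecord { size, mtime, hash });
    Ok(changed)
}
```

Because the stat values are refreshed after every hash, a touched but unmodified file is hashed once and then returns to the fast path.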

2. SHA-256 in-process (not shell-out)

Use the sha2 Rust crate for streaming hash computation in 64KB chunks. This avoids per-file process spawn overhead and is consistent with ADR-182 (file integrity), which also uses SHA-256.
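Below is a sketch of the 64KB chunked read loop. To keep it dependency-free, it is generic over the `update` callback rather than calling sha2 directly. `stream_chunks` is a hypothetical helper name.

```rust
use std::io::{self, Read};

const CHUNK_SIZE: usize = 64 * 1024; // 64KB, as specified above

/// Feed `reader` to `update` in fixed-size chunks, so memory use stays
/// bounded no matter how large the session file is. Returns the total bytes read.
pub fn stream_chunks<R: Read, F: FnMut(&[u8])>(mut reader: R, mut update: F) -> io::Result<u64> {
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        update(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

With sha2, the callback would feed a `Sha256` hasher through the `Digest` trait: `let mut h = Sha256::new(); stream_chunks(file, |c| h.update(c))?; let hex = format!("{:x}", h.finalize());`. Because no child process is spawned, the only cost is reading and hashing the bytes.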

3. Shell-out to unified-message-extractor.py for /cx

The extractor is 3,000+ lines of battle-tested Python with SQLite integration, dedup stores, knowledge extraction, trajectory extraction, and MCP reindexing. Reimplementing in Rust would be massive scope creep. The watcher's job is detection + triggering, not extraction.
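Below is a minimal sketch of the shell-out using `std::process::Command`. The `python3` interpreter name, the way the script path is passed, and the `--incremental` argument (borrowed from the `/cx --incremental` step in the pipeline above) are assumptions. Check them against the extractor's real CLI. `build_cx_command` and `run_cx` are hypothetical names.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build (but do not spawn) the command that runs incremental extraction.
pub fn build_cx_command(script: &Path) -> Command {
    let mut cmd = Command::new("python3"); // assumed interpreter name
    cmd.arg(script).arg("--incremental"); // assumed flag
    cmd
}

/// Run the extractor and wait for it to exit. A non-zero exit comes back as
/// Ok(false), so the caller can log it and keep polling instead of crashing.
pub fn run_cx(mut cmd: Command) -> io::Result<bool> {
    let status = cmd.status()?;
    Ok(status.success())
}
```

Separating construction from execution lets a `--dry-run` path log the command without spawning it.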

4. Per-LLM cooldown (not per-trigger-type)

v2.0 had cooldowns per trigger type (context%, size, time, turns). v3.0 simplifies to one cooldown per LLM: after /cx fires for an LLM, don't re-trigger within cooldown_seconds (default: 60s). This prevents /cx storms during rapid file changes.
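Below is a sketch of one cooldown window per LLM, keyed by an LLM name such as "claude" or "codex". The `Cooldowns` type and `try_fire` method are hypothetical. Passing `now` in explicitly keeps the logic testable without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One cooldown window per LLM, not per trigger type.
pub struct Cooldowns {
    window: Duration,
    last_fired: HashMap<String, Instant>,
}

impl Cooldowns {
    pub fn new(cooldown_seconds: u64) -> Self {
        Self { window: Duration::from_secs(cooldown_seconds), last_fired: HashMap::new() }
    }

    /// Returns true and records `now` if `llm` is outside its cooldown window.
    pub fn try_fire(&mut self, llm: &str, now: Instant) -> bool {
        if let Some(&last) = self.last_fired.get(llm) {
            if now.saturating_duration_since(last) < self.window {
                return false; // still cooling down: suppress this trigger
            }
        }
        self.last_fired.insert(llm.to_string(), now);
        true
    }
}
```

A burst of changes to Claude files triggers at most one /cx per 60s, and changes to Codex files are not blocked by that Claude cooldown.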

5. Config schema v2.0.0 with backward compatibility

The new config replaces thresholds, triggers, and export blocks with a single trigger block. Old v1.0.0 config files are silently accepted (legacy fields ignored, defaults used for new fields).
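Below is a sketch of how defaults could be resolved once the JSON has been parsed. It assumes the trigger block holds a poll interval and a cooldown (30s and 60s, the defaults quoted elsewhere in this ADR). The struct and field names, including `poll_interval_seconds`, are hypothetical.

```rust
/// Effective trigger settings after defaults are applied.
#[derive(Debug, PartialEq)]
pub struct TriggerConfig {
    pub poll_interval_seconds: u64,
    pub cooldown_seconds: u64,
}

impl Default for TriggerConfig {
    fn default() -> Self {
        Self { poll_interval_seconds: 30, cooldown_seconds: 60 }
    }
}

/// The trigger block as parsed from JSON; every field is optional.
pub struct RawTrigger {
    pub poll_interval_seconds: Option<u64>,
    pub cooldown_seconds: Option<u64>,
}

/// A v1.0.0 file has no trigger block (None). Its legacy thresholds, triggers,
/// and export blocks are never read here, so they are effectively ignored.
pub fn resolve_trigger(raw: Option<RawTrigger>) -> TriggerConfig {
    let d = TriggerConfig::default();
    match raw {
        None => d,
        Some(t) => TriggerConfig {
            poll_interval_seconds: t.poll_interval_seconds.unwrap_or(d.poll_interval_seconds),
            cooldown_seconds: t.cooldown_seconds.unwrap_or(d.cooldown_seconds),
        },
    }
}
```

With serde, `#[serde(default)]` on the new fields gives the same effect, and leaving the legacy fields out of the struct makes serde ignore them, because unknown fields are skipped unless `deny_unknown_fields` is set.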

6. State v3.0.0 with migration

State tracks per-file hashes instead of session cooldowns and export history. v1→v3 and v2→v3 migration paths exist for seamless upgrade.
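Below is a sketch of one plausible migration shape. It assumes that v1 and v2 state never recorded per-file hashes, so nothing file-specific carries over, and every file is simply treated as unseen and hashed on the first v3 poll. All type and field names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-file entry in v3.0.0 state.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub size: u64,
    pub mtime_secs: u64,
    pub sha256: String,
}

/// Older state layouts, reduced to what matters for migration.
#[allow(dead_code)] // legacy fields are parsed but intentionally dropped
pub enum LegacyState {
    V1 { session_cooldowns: HashMap<String, u64> },
    V2 { session_cooldowns: HashMap<String, u64>, export_history: Vec<String> },
    V3 { files: HashMap<String, FileEntry> },
}

pub struct StateV3 {
    pub version: &'static str,
    pub files: HashMap<String, FileEntry>,
}

pub fn migrate(old: LegacyState) -> StateV3 {
    let files = match old {
        // Session cooldowns and export history have no v3 equivalent, so an
        // empty file map makes the first v3 poll hash everything once.
        LegacyState::V1 { .. } | LegacyState::V2 { .. } => HashMap::new(),
        LegacyState::V3 { files } => files,
    };
    StateV3 { version: "3.0.0", files }
}
```

Under this assumption the first poll after upgrading triggers /cx for files already processed by hand. The extractor's incremental mode is what would keep that one-time pass cheap.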

Consequences

Positive

Zero data redundancy: No file copies. 22GB/week savings.
Near real-time context: Changes are detected within one poll interval (30s default) and /cx is triggered automatically, so context DB freshness improves from hours or days to minutes.
Simpler architecture: No threshold parsing, no trigger types, no export pipeline. Just hash → detect → trigger.
Faster common case: A poll cycle with no changes takes ~0.5-1s (stat only), versus ~2-5s in v2.0 (stat + context% parsing).

Negative

Shell-out dependency: Relies on unified-message-extractor.py being available and working. If the script breaks, auto-/cx breaks.
SHA-256 cost on change: Hashing a 50MB session file takes ~200ms. Acceptable for the uncommon case but adds latency to the trigger path.

Neutral

CLI simplified: Removed --threshold, --max-threshold, --multi-llm flags. Added --dry-run, --once, --force-cx, --status.
Dead code warnings: 29 Rust dead_code warnings remain for backward-compat legacy fields and future-use methods. These are intentional and expected.
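Below is a sketch of parsing the v3.0 flag set, hand-rolled to stay dependency-free. The real binary may use a crate such as clap instead. The behavior noted beside each flag is inferred from the flag name and is an assumption. Rejecting the removed v2.0 flags with an explicit error is also a hypothetical design choice.

```rust
/// Parsed watcher flags (behavior in comments is assumed, not specified).
#[derive(Debug, Default, PartialEq)]
pub struct CliFlags {
    pub dry_run: bool,  // detect changes but do not spawn /cx
    pub once: bool,     // run a single poll cycle, then exit
    pub force_cx: bool, // trigger /cx now, ignoring cooldowns
    pub status: bool,   // print watcher state and exit
}

pub fn parse_flags<I: IntoIterator<Item = String>>(args: I) -> Result<CliFlags, String> {
    let mut f = CliFlags::default();
    for a in args {
        match a.as_str() {
            "--dry-run" => f.dry_run = true,
            "--once" => f.once = true,
            "--force-cx" => f.force_cx = true,
            "--status" => f.status = true,
            // Flags removed in v3.0 get a specific message instead of "unknown".
            "--threshold" | "--max-threshold" | "--multi-llm" => {
                return Err(format!("{a} was removed in v3.0"))
            }
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(f)
}
```

In `main`, this would be called as `parse_flags(std::env::args().skip(1))`.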

Files

File	Lines	Action
`tools/context-watcher/Cargo.toml`	49	Modified (version bump, sha2 dep)
`tools/context-watcher/src/main.rs`	580	Rewritten
`tools/context-watcher/src/trigger.rs`	487	New (replaces export.rs)
`tools/context-watcher/src/monitor.rs`	815	Rewritten
`tools/context-watcher/src/state.rs`	700	Rewritten
`tools/context-watcher/src/config.rs`	541	Rewritten
`tools/context-watcher/src/detection.rs`	619	Minor updates
`tools/context-watcher/src/paths.rs`	195	Unchanged
`config/llm-watchers.json`	94	Modified (v2.0.0 schema)

Deleted:

src/context_watcher.rs — Legacy single-LLM mode, fully superseded
src/export.rs — Threshold-based export pipeline, replaced by trigger.rs

Verification

cargo build: Compiles with 0 errors and 29 dead_code warnings (the intentional ones listed under Neutral)
cargo test: 25/25 tests pass (config: 5, detection: 2, monitor: 4, state: 7, trigger: 5, paths: 3)

Track

J.13: Codi-Watcher v3.0 — Hash-Based Change Detection
