ADR-194: Codi-Watcher v3.0 — Hash-Based Change Detection

Status

Accepted — February 13, 2026

Context

Codi-Watcher v2.0 (ADR-134) monitors LLM session files across Claude, Codex, Gemini, and KIMI. When a context usage threshold is reached, it copies the session file to a pending directory for later manual processing via /cx.

This architecture has three problems:

  1. Data redundancy: The copy-on-threshold approach writes 22GB/week of duplicate session files to cusf-archive/. CUSF verification confirms that every copied file is byte-identical to its source, so the copies add no information.

  2. Stale context: The user must manually run /cx to extract context from copied files. Between file detection and manual extraction, the context database is stale — sometimes by hours or days.

  3. Complexity: The threshold system (context percentage parsing, 4 trigger types, cooldown management per trigger type) is fragile and requires parsing LLM-specific context usage formats that change between versions.

With ADR-181 (incremental /cx) now implemented, the unified message extractor can process only new/changed content via stat-based file classification and seek-based append extraction. The watcher no longer needs to copy files — it only needs to detect changes and trigger the extractor.

Decision

Replace the v2.0 threshold-based export pipeline with a v3.0 hash-based change detection system that automatically triggers incremental /cx.

Architecture Change

v2.0: Poll → Detect sessions → Check context% → Copy file → Pending dir → Manual /cx
v3.0: Poll → Discover files → Detect hash changes → Auto /cx --incremental → Log results

Key Design Decisions

1. Two-phase change detection (stat + hash)

Rather than hashing every file on every poll cycle, use a two-phase approach:

  • Phase 1: stat() each file for size + mtime (one syscall, O(1) per file)
  • Phase 2: SHA-256 hash only when stat differs from stored values

This makes the common case (no changes) extremely fast (~0.5-1s for a full poll) while still providing cryptographic certainty when changes are detected.
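
The two phases can be sketched as follows. This is a minimal illustration in Python rather than the watcher's Rust, and FileRecord, digest, and check_file are hypothetical names, not taken from the codebase. It assumes the stored state keeps size, mtime, and the last SHA-256 for each file.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileRecord:
    """Hypothetical per-file state entry: last seen stat values and digest."""
    size: int
    mtime_ns: int
    sha256: str


def digest(path: str) -> str:
    # Whole-file read for brevity; a real watcher would stream in chunks.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def check_file(path: str, known: Optional[FileRecord]) -> Tuple[bool, FileRecord]:
    """Return (changed, updated_record) for one file."""
    st = os.stat(path)  # Phase 1: one stat call per file
    if known and (st.st_size, st.st_mtime_ns) == (known.size, known.mtime_ns):
        return False, known  # stat unchanged: skip hashing entirely
    # Phase 2: stat differs (or the file is new), so confirm with a hash.
    new_hash = digest(path)
    record = FileRecord(st.st_size, st.st_mtime_ns, new_hash)
    changed = known is None or new_hash != known.sha256
    return changed, record
```

In this sketch a file whose mtime moved but whose content did not (for example after touch) is reported as unchanged, and its stored stat values are refreshed so the next poll skips hashing again.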

2. SHA-256 in-process (not shell-out)

Use the sha2 Rust crate for streaming hash computation in 64KB chunks. This avoids per-file process spawn overhead and is consistent with ADR-182 (file integrity), which also uses SHA-256.
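
The streaming pattern looks roughly like this. The sketch uses Python's hashlib in place of the sha2 crate so that it runs without external dependencies. The read-then-update loop would have the same shape in Rust. sha256_file and CHUNK_SIZE are illustrative names.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64KB, the chunk size named in this decision


def sha256_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file incrementally so memory use stays at one chunk."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            hasher.update(chunk)
    return hasher.hexdigest()
```

Because the hasher is fed one chunk at a time, peak memory is bounded by the chunk size rather than by the size of the session file.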

3. Shell-out to unified-message-extractor.py for /cx

The extractor is 3,000+ lines of battle-tested Python with SQLite integration, dedup stores, knowledge extraction, trajectory extraction, and MCP reindexing. Reimplementing in Rust would be massive scope creep. The watcher's job is detection + triggering, not extraction.
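
A shell-out of this kind might be wrapped as below. The sketch is in Python for brevity (the watcher would do the equivalent from Rust), and it rests on assumptions: build_cx_command and run_cx are hypothetical names, the script is assumed to accept --incremental directly and to signal failure through a non-zero exit code, and the 300-second timeout is an arbitrary placeholder.

```python
import subprocess
import sys
from typing import List


def build_cx_command(extractor: str) -> List[str]:
    # Assumption: the extractor script takes --incremental on its command line.
    return [sys.executable, extractor, "--incremental"]


def run_cx(cmd: List[str], timeout_s: float = 300.0) -> bool:
    """Run the extractor as a child process; report success without raising."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing script/interpreter or a hung extractor: log and carry on.
        print(f"cx failed to run: {exc}")
        return False
    if result.returncode != 0:
        print(f"cx exited {result.returncode}: {result.stderr.strip()}")
    return result.returncode == 0
```

Returning a boolean instead of raising keeps a broken extractor from taking down the poll loop, which matters given the shell-out dependency this choice introduces.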

4. Per-LLM cooldown (not per-trigger-type)

v2.0 had cooldowns per trigger type (context%, size, time, turns). v3.0 simplifies to one cooldown per LLM: after /cx fires for an LLM, don't re-trigger within cooldown_seconds (default: 60s). This prevents /cx storms during rapid file changes.
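
A per-LLM cooldown needs little more than a map from LLM name to the time of the last trigger. The sketch below is illustrative Python, not the watcher's Rust code. LlmCooldown and try_fire are hypothetical names, and the injectable clock is an assumption made so the behavior can be tested.

```python
import time
from typing import Callable, Dict


class LlmCooldown:
    """Tracks the last /cx trigger time for each LLM."""

    def __init__(
        self,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._last_fired: Dict[str, float] = {}

    def try_fire(self, llm: str) -> bool:
        """Return True and start the cooldown if this LLM may trigger now."""
        now = self._clock()
        last = self._last_fired.get(llm)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down; a denied attempt does not reset it
        self._last_fired[llm] = now
        return True
```

A monotonic clock is used so that wall-clock adjustments cannot shorten or extend a cooldown. Each LLM has its own entry, so a burst of changes in one LLM's sessions does not delay /cx for another.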

5. Config schema v2.0.0 with backward compatibility

The new config replaces thresholds, triggers, and export blocks with a single trigger block. Old v1.0.0 config files are silently accepted (legacy fields ignored, defaults used for new fields).
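
The backward-compatible loading rule can be illustrated with a small sketch. It is written in Python for brevity and is not the contents of config.rs. cooldown_seconds and the 60s and 30s defaults come from this ADR, while the key name poll_interval_seconds, its placement inside the trigger block, and the names TriggerConfig and load_trigger_config are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class TriggerConfig:
    cooldown_seconds: int = 60       # default stated in this ADR
    poll_interval_seconds: int = 30  # default stated in this ADR; key name assumed


def load_trigger_config(raw: str) -> TriggerConfig:
    """Parse a config document, tolerating old files with no trigger block."""
    doc = json.loads(raw)
    # Legacy blocks (thresholds, triggers, export) are never read, so an old
    # file loads without error and simply falls back to the defaults.
    trigger = doc.get("trigger", {})
    defaults = TriggerConfig()
    return TriggerConfig(
        cooldown_seconds=trigger.get("cooldown_seconds", defaults.cooldown_seconds),
        poll_interval_seconds=trigger.get(
            "poll_interval_seconds", defaults.poll_interval_seconds
        ),
    )
```

For example, a legacy document containing only a thresholds block yields the defaults, while a new document that sets just cooldown_seconds keeps the default poll interval.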

6. State v3.0.0 with migration

State tracks per-file hashes instead of session cooldowns and export history. v1→v3 and v2→v3 migration paths exist for seamless upgrade.
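
One possible shape for such a migration is sketched below in Python. Every key name here (version, files, migrated_from) is hypothetical, as is the choice to treat a missing version as 1.0.0. The sketch assumes nothing from the old state needs to carry over, since v1 and v2 state held no file hashes.

```python
from typing import Any, Dict

CURRENT_VERSION = "3.0.0"


def migrate_state(old: Dict[str, Any]) -> Dict[str, Any]:
    """Upgrade a v1/v2 state dict to a v3 shape; v3 input is returned as-is."""
    version = str(old.get("version", "1.0.0"))
    if version.startswith("3."):
        return old
    # v1 and v2 tracked session cooldowns and export history, which v3 no
    # longer uses. Start with an empty hash map and let the first poll hash
    # every discovered file.
    return {"version": CURRENT_VERSION, "files": {}, "migrated_from": version}
```

Under these assumptions the first poll after an upgrade treats every file as new, so it pays the full hashing cost once before the stat-only fast path takes over.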

Consequences

Positive

  • Zero data redundancy: No file copies. 22GB/week savings.
  • Near real-time context: Changes detected within poll interval (30s default), /cx triggered automatically. Context DB freshness goes from hours/days to minutes.
  • Simpler architecture: No threshold parsing, no trigger types, no export pipeline. Just hash → detect → trigger.
  • Faster common case: A poll cycle with no changes takes ~0.5-1s (stat only), versus ~2-5s in v2.0 (stat + context% parse).

Negative

  • Shell-out dependency: Relies on unified-message-extractor.py being available and working. If the script breaks, auto-/cx breaks.
  • SHA-256 cost on change: Hashing a 50MB session file takes ~200ms. Acceptable for the uncommon case but adds latency to the trigger path.

Neutral

  • CLI simplified: Removed --threshold, --max-threshold, --multi-llm flags. Added --dry-run, --once, --force-cx, --status.
  • Dead code warnings: 29 Rust dead_code warnings remain for backward-compat legacy fields and future-use methods. These are intentional and expected.

Files

File                                    Lines  Action
tools/context-watcher/Cargo.toml        49     Modified (version bump, sha2 dep)
tools/context-watcher/src/main.rs       580    Rewritten
tools/context-watcher/src/trigger.rs    487    New (replaces export.rs)
tools/context-watcher/src/monitor.rs    815    Rewritten
tools/context-watcher/src/state.rs      700    Rewritten
tools/context-watcher/src/config.rs     541    Rewritten
tools/context-watcher/src/detection.rs  619    Minor updates
tools/context-watcher/src/paths.rs      195    Unchanged
config/llm-watchers.json                94     Modified (v2.0.0 schema)

Deleted:

  • src/context_watcher.rs — Legacy single-LLM mode, fully superseded
  • src/export.rs — Threshold-based export pipeline, replaced by trigger.rs

Verification

  • cargo build: Compiles clean (29 dead_code warnings, 0 errors)
  • cargo test: 25/25 tests pass (config: 5, detection: 2, monitor: 4, state: 7, trigger: 5, paths: 3)

Track

J.13: Codi-Watcher v3.0 — Hash-Based Change Detection