Context Watcher CUSF Pipeline Assessment
Date: 2026-02-12
Author: Claude (Opus 4.6)
Status: Complete
Recommendation: Disable the watcher's Claude CUSF export pipeline; retain direct JSONL reads
Executive Summary
The context watcher daemon (`coditect-daemon`) has been exporting Claude session files to `sessions-export-pending-anthropic/` since early February 2026. These exports are processed by `/cx` through the CUSF pipeline, which moves them to `cusf-archive/`. However, zero messages have ever been extracted from these Claude watcher exports due to a format mismatch. The result is 22 GB of useless archived files.
Findings
1. Watcher is Healthy
| Metric | Value |
|---|---|
| Process | PID 815, uptime 4d 17h (since Feb 8) |
| Binary | ~/.coditect/bin/coditect-daemon --multi-llm |
| Config | ~/.coditect/config/llm-watchers.json |
| Total exports triggered | 945 |
| Export rate | 66-163 files/day |
| Trigger threshold | 75% context usage |
| Poll interval | 30 seconds |
2. CUSF Extraction Has Never Worked for Claude Exports
Root Cause: Format mismatch between watcher output and CUSF processor.
| Component | Format | Entry Types |
|---|---|---|
| Watcher export (raw Claude JSONL) | `{type: "user", message: {...}}` | `user`, `assistant`, `file-history-snapshot` |
| CUSF processor (`unified-message-extractor.py:3827`) | `{type: "message", role: "user", content: "..."}` | Only matches `message` |
The watcher copies raw Claude session JSONL files verbatim. The CUSF processor only processes entries with `type == 'message'`. Since raw Claude JSONL uses `type == 'user'` / `type == 'assistant'`, every entry falls through without extraction. Files are archived with `total=0, new=0`.
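A minimal sketch of the mismatch (the real check lives at `unified-message-extractor.py:3827`; the function below is a hypothetical simplification, not the actual processor):

```python
import json

def extract_messages(jsonl_path):
    """Hypothetical simplification of the CUSF processor's type filter."""
    extracted = []
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            # Only CUSF-format entries pass this check; raw Claude entries
            # use type == "user" / "assistant" and fall through silently,
            # which is why every Claude watcher export archives as total=0.
            if entry.get("type") == "message":
                extracted.append({"role": entry["role"],
                                  "content": entry["content"]})
    return extracted

# A raw Claude JSONL entry never matches:
raw = {"type": "user", "message": {"content": "hello"}}
# A CUSF-format entry does:
cusf = {"type": "message", "role": "user", "content": "hello"}
```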
3. Direct JSONL Reads Already Cover Everything
| Source | Messages | Files | Sessions | Disk |
|---|---|---|---|---|
| jsonl (direct reads) | 489,569 | 3,383 | 496 | 2.7 GB (source) |
| cusf (watcher pipeline) | 994 | 14 | 1 | 22 GB (archive) |
| export (manual /export) | 54,636 | 1,186 | - | in exports-archive |
The 994 CUSF messages came from Kimi exports (which use proper CUSF format), not from the watcher's Claude exports.
`/cx` already reads every Claude session file directly from `~/.claude/projects/`. The watcher exports are 100% redundant copies.
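The direct read is straightforward; a sketch (entry types per the table above; the function name and walk are illustrative, not the actual `/cx` implementation):

```python
import json
from pathlib import Path

def count_session_messages(projects_dir):
    """Walk Claude session JSONL files and count user/assistant entries.

    Illustrative only: the real /cx reader does full extraction and
    deduplication, not just counting.
    """
    total = 0
    for jsonl in Path(projects_dir).expanduser().rglob("*.jsonl"):
        with open(jsonl) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort
                if entry.get("type") in ("user", "assistant"):
                    total += 1
    return total
```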
4. Disk Waste
| Location | Size | Files |
|---|---|---|
| `cusf-archive/` | 22 GB | 987 |
| `sessions-export-pending-anthropic/` | 47 MB | 3 (recent) |
| Total wasted | ~22 GB | 990 |
The archive is 8x larger than the source session files (2.7 GB) because the watcher creates a full session dump each time context exceeds 75%. A session that grows from 1 MB to 20 MB generates ~10 snapshots.
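The bloat arithmetic can be made concrete. Assuming a session exported at each 75% threshold crossing as it grows (the snapshot sizes below are hypothetical), the archive cost is the sum of every full dump, not the final session size:

```python
def archived_bytes(snapshot_sizes_mb):
    """Each export is a full dump at the session's current size, so the
    archive accumulates the sum of all snapshots, not just the last one."""
    return sum(snapshot_sizes_mb)

# Hypothetical session growing from 1 MB to 20 MB, exported ~10 times:
snapshots = [1, 3, 5, 7, 9, 11, 13, 15, 17, 20]
archive = archived_bytes(snapshots)  # 101 MB archived for a 20 MB session
```

The same multiplier applied across 496 sessions is how a 2.7 GB source tree becomes a 22 GB archive.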
Recommendation
Disable the watcher's Claude CUSF export pipeline. Specifically:
- Set `claude.enabled: false` in `llm-watchers.json` (or remove the export trigger)
- Clean the `cusf-archive/`: all 987 Claude exports contain zero extracted data
- Keep the Kimi/Codex/Gemini export pipeline intact (their CUSF format works correctly)
- Keep the watcher process running — it provides useful context% monitoring and session detection
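The exact schema of `llm-watchers.json` is not shown in this assessment, so the fragment below is only a guess at its shape; the point it illustrates is a per-LLM `enabled` flag with Claude off and the working pipelines left on:

```json
{
  "claude": { "enabled": false },
  "kimi":   { "enabled": true },
  "codex":  { "enabled": true },
  "gemini": { "enabled": true }
}
```

Whatever the real key names are, the change should disable only the Claude export trigger while leaving the daemon itself running for context monitoring.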
Why NOT fix the format mismatch?
- `/cx` already reads the same session files directly from source; fixing the CUSF processor to handle raw JSONL would create duplicate extractions
- The watcher creates redundant snapshots (same session at different sizes), increasing archive bloat
- The watcher has no deduplication; each export is a full session copy
- Direct JSONL reads are more efficient: they read each file once per `/cx` run
Where the watcher DOES provide value (keep these):
- Context usage monitoring (threshold warnings)
- Session detection (new session notifications)
- Token economics data in state file (export counts)
Lossless Verification Results (Phase 2)
Script: `.coditect/scripts/cusf-archive-analyzer.py`
Run: 2026-02-12T22:16Z (272 s to process 990 files, 22.1 GB)
Phase 2 Parse Results
| Metric | Value |
|---|---|
| Lines scanned | 8,236,056 |
| Extractable entries | 2,549,787 |
| User/assistant messages | 2,370,913 |
| Unique entry hashes | 81,204 |
| Entry types | 19 distinct (progress: 5.2M, assistant: 1.7M, user: 1.0M) |
Massive duplication: 8.2M lines produce only 81K unique hashes because the watcher repeatedly dumps the same growing session.
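The duplication mechanism is easy to demonstrate. A sketch of the dedup idea (the analyzer's actual normalization and hash choice are assumptions here): hashing a canonical form of each entry collapses repeated full-session dumps to a small unique set.

```python
import hashlib
import json

def unique_entry_hashes(lines):
    """Hash a canonical JSON form of each entry; repeated session dumps
    re-emit the same entries, so unique hashes stay small while raw
    line counts balloon. Normalization scheme is an assumption."""
    seen = set()
    for line in lines:
        entry = json.loads(line)
        canonical = json.dumps(entry, sort_keys=True)
        seen.add(hashlib.sha256(canonical.encode()).hexdigest())
    return seen

# Three snapshots of a growing session: 1, 2, then 3 entries = 6 lines,
# but only 3 unique entries survive hashing.
entries = [json.dumps({"type": "user", "i": i}) for i in range(3)]
lines = entries[:1] + entries[:2] + entries[:3]
```

The same effect at scale is how 8.2M archived lines reduce to 81K unique hashes.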
Phase 3 Comparison Results
| Category | Count | % |
|---|---|---|
| Already in unified store | 79,770 | 98.2% |
| Apparently unique | 1,434 | 1.8% |
The 1,434 "unique" entries came from Kimi, Gemini, and Codex exports (not Claude watcher):
| Source | Unique Entries |
|---|---|
| Kimi (LOSSLESS wire format) | 886 |
| Gemini | 401 |
| Codex (LOSSLESS) | 130 |
| Claude | 17 |
Phase 4 Final Verification
All 1,434 entries are already in `sessions.db` (586,993 hashes). The apparent uniqueness was an artifact of the JSON hash cache (`unified_hashes.json`) being out of sync with the database. The hash cache had degraded to 3 entries; `sessions.db` had the complete 586,993.
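The Phase 4 check reduces to a membership query against the database rather than the cache. A sketch (the `messages` table and `content_hash` column names are assumptions about the `sessions.db` schema):

```python
import sqlite3

def missing_hashes(db_path, candidate_hashes,
                   table="messages", column="content_hash"):
    """Return the subset of candidate hashes absent from the database.

    Table/column names are assumed; querying the authoritative store
    directly avoids the stale-cache artifact described above.
    """
    conn = sqlite3.connect(db_path)
    missing = set()
    for h in candidate_hashes:
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1", (h,)
        ).fetchone()
        if row is None:
            missing.add(h)
    conn.close()
    return missing
```

Run against `sessions.db`, this kind of query is what showed the 1,434 "unique" entries were already present.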
Verdict: Archive is 100% redundant. Zero data loss risk from cleanup.
Hash store regenerated: 588,424 hashes (`sessions.db` + content hashes).
Files Referenced
| File | Purpose |
|---|---|
| `~/.coditect/bin/coditect-daemon` | Watcher binary |
| `~/.coditect/config/llm-watchers.json` | Watcher configuration |
| `~/.coditect/logs/context-watcher.log` | Active watcher log (8 MB) |
| `~/.coditect/scripts/unified-message-extractor.py:3827` | CUSF processor type check |
| `~/PROJECTS/.coditect-data/context-storage/cusf-archive/` | 22 GB archive to clean |
| `~/PROJECTS/.coditect-data/sessions-export-pending-anthropic/` | Pending dir to clean |
| `~/PROJECTS/.coditect-data/context-storage/watcher-state.json` | Watcher state |