
Context Watcher CUSF Pipeline Assessment

Date: 2026-02-12
Author: Claude (Opus 4.6)
Status: Complete
Recommendation: Disable the watcher's Claude CUSF export pipeline; retain direct JSONL reads


Executive Summary

The context watcher daemon (coditect-daemon) has been exporting Claude session files to sessions-export-pending-anthropic/ since early February 2026. These exports are processed by /cx through the CUSF pipeline, which moves them to cusf-archive/. However, zero messages have ever been extracted from these Claude watcher exports due to a format mismatch. The result is 22 GB of useless archived files.


Findings

1. Watcher is Healthy

| Metric | Value |
|---|---|
| Process | PID 815, uptime 4d 17h (since Feb 8) |
| Binary | ~/.coditect/bin/coditect-daemon --multi-llm |
| Config | ~/.coditect/config/llm-watchers.json |
| Total exports triggered | 945 |
| Export rate | 66-163 files/day |
| Trigger threshold | 75% context usage |
| Poll interval | 30 seconds |
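Although the daemon itself is a compiled binary, the trigger rule the table implies can be sketched in a few lines. This is a hedged illustration; should_export and its arguments are hypothetical names, not the daemon's actual API:

```python
THRESHOLD = 0.75  # export when context usage crosses 75%, per the watcher config

def should_export(context_usage, already_exported):
    """Trigger rule sketch: export a full session snapshot once usage
    crosses the threshold (hypothetical helper; the real daemon is a binary)."""
    return context_usage >= THRESHOLD and not already_exported
```

Because the check re-runs on every 30-second poll and sessions keep growing past the threshold, a single session can trigger many full exports — the source of the duplication quantified in the Disk Waste section.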

2. CUSF Extraction Has Never Worked for Claude Exports

Root Cause: Format mismatch between watcher output and CUSF processor.

| Component | Format | Entry Types |
|---|---|---|
| Watcher export (raw Claude JSONL) | {type: "user", message: {...}} | user, assistant, file-history-snapshot |
| CUSF processor (unified-message-extractor.py:3827) | {type: "message", role: "user", content: "..."} | Only matches message |

The watcher copies raw Claude session JSONL files verbatim. The CUSF processor only processes entries with type == 'message'. Since raw Claude JSONL uses type == 'user'/type == 'assistant', every entry falls through without extraction. Files are archived with total=0, new=0.
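The mismatch is easy to reproduce. A minimal sketch, where is_extractable is a simplified stand-in for the real check at unified-message-extractor.py:3827 (the actual function does more than this):

```python
import json

# A raw Claude session JSONL entry, as the watcher exports it verbatim:
claude_line = '{"type": "user", "message": {"role": "user", "content": "hi"}}'

# A CUSF-format entry, the shape the processor's check expects:
cusf_line = '{"type": "message", "role": "user", "content": "hi"}'

def is_extractable(line):
    """Simplified stand-in for the processor's type check."""
    return json.loads(line).get("type") == "message"

print(is_extractable(claude_line))  # False — every watcher export entry falls through
print(is_extractable(cusf_line))    # True  — Kimi/Codex/Gemini CUSF entries match
```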

3. Direct JSONL Reads Already Cover Everything

| Source | Messages | Files | Sessions | Disk |
|---|---|---|---|---|
| jsonl (direct reads) | 489,569 | 3,383 | 496 | 2.7 GB (source) |
| cusf (watcher pipeline) | 994 | 14 | 1 | 22 GB (archive) |
| export (manual /export) | 54,636 | 1,186 | - | in exports-archive |

The 994 CUSF messages came from Kimi exports (which use proper CUSF format), not from the watcher's Claude exports.

/cx already reads every Claude session file directly from ~/.claude/projects/. The watcher exports are 100% redundant copies.
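For contrast, a direct read needs no export step at all. A minimal sketch of what such a reader does — iter_claude_messages is illustrative only; /cx's actual reader also deduplicates by hash:

```python
import json
from pathlib import Path

def iter_claude_messages(projects_dir="~/.claude/projects"):
    """Yield user/assistant entries straight from raw Claude session
    JSONL files (illustrative sketch of a direct read)."""
    for path in Path(projects_dir).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partial writes mid-session
                if entry.get("type") in ("user", "assistant"):
                    yield entry
```

Each file is read once per run, directly from source, which is why the watcher's verbatim copies add nothing.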

4. Disk Waste

| Location | Size | Files |
|---|---|---|
| cusf-archive/ | 22 GB | 987 |
| sessions-export-pending-anthropic/ | 47 MB | 3 (recent) |
| Total wasted | ~22 GB | 990 |

The archive is 8x larger than the source session files (2.7 GB) because the watcher creates a full session dump each time context exceeds 75%. A session that grows from 1 MB to 20 MB generates ~10 snapshots.
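The blow-up follows directly from full-copy snapshots of a growing file. An illustrative calculation — the snapshot sizes below are made up to match the 1 MB to 20 MB example:

```python
def archived_bytes_mb(snapshot_sizes_mb):
    """Each export is a full copy at the session's then-current size,
    so archive cost is the sum of every snapshot, not the final size."""
    return sum(snapshot_sizes_mb)

# ~10 snapshots of a session growing 1 MB -> 20 MB:
snapshots = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(archived_bytes_mb(snapshots))  # 110 -> 110 MB archived for a 20 MB source
```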


Recommendation

Disable the watcher's Claude CUSF export pipeline. Specifically:

  1. Set claude.enabled: false in llm-watchers.json (or remove the export trigger)
  2. Clean the cusf-archive/ — all 987 Claude exports contain zero extracted data
  3. Keep Kimi/Codex/Gemini export pipeline intact (their CUSF format works correctly)
  4. Keep the watcher process running — it provides useful context% monitoring and session detection
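Step 1 might look like the fragment below. Only the claude.enabled key comes from the recommendation; the surrounding shape of llm-watchers.json is an assumption, so check the actual file before editing:

```json
{
  "claude": { "enabled": false },
  "kimi":   { "enabled": true },
  "codex":  { "enabled": true },
  "gemini": { "enabled": true }
}
```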

Why NOT fix the format mismatch?

  • /cx already reads the same session files directly from source — fixing the CUSF processor to handle raw JSONL would create duplicate extractions
  • The watcher creates redundant snapshots (same session at different sizes), increasing archive bloat
  • The watcher has no deduplication — each export is a full session copy
  • Direct JSONL reads are more efficient: they read each file once per /cx run

What the watcher DOES provide value for (keep these):

  • Context usage monitoring (threshold warnings)
  • Session detection (new session notifications)
  • Token economics data in state file (export counts)

Lossless Verification Results (Phase 2)

Script: .coditect/scripts/cusf-archive-analyzer.py
Run: 2026-02-12T22:16Z (272 s to process 990 files, 22.1 GB)

Phase 2 Parse Results

| Metric | Value |
|---|---|
| Lines scanned | 8,236,056 |
| Extractable entries | 2,549,787 |
| User/assistant messages | 2,370,913 |
| Unique entry hashes | 81,204 |
| Entry types | 19 distinct (progress: 5.2M, assistant: 1.7M, user: 1.0M) |

Massive duplication: 8.2M lines produce only 81K unique hashes because the watcher repeatedly dumps the same growing session.
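The collapse from 8.2M lines to 81K hashes is what content-hashing repeated dumps looks like. A sketch — the analyzer's exact hash scheme is an assumption here:

```python
import hashlib

def unique_entry_hashes(lines):
    """Count distinct entries by content hash, collapsing repeated
    session dumps (sketch; the real analyzer's scheme may differ)."""
    seen = set()
    for line in lines:
        seen.add(hashlib.sha256(line.strip().encode()).hexdigest())
    return len(seen)

# Dumping the same session twice doubles the line count,
# not the unique-hash count:
dump = ['{"type":"user","i":1}', '{"type":"assistant","i":2}']
print(unique_entry_hashes(dump + dump))  # 2 unique hashes from 4 lines
```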

Phase 3 Comparison Results

| Category | Count | % |
|---|---|---|
| Already in unified store | 79,770 | 98.2% |
| Apparently unique | 1,434 | 1.8% |

The 1,434 "unique" entries came from Kimi, Gemini, and Codex exports (not Claude watcher):

| Source | Unique Entries |
|---|---|
| Kimi (LOSSLESS wire format) | 886 |
| Gemini | 401 |
| Codex (LOSSLESS) | 130 |
| Claude | 17 |

Phase 4 Final Verification

All 1,434 entries are already in sessions.db (586,993 hashes). The apparent uniqueness was an artifact of the JSON hash cache (unified_hashes.json) being out of sync with the database. The hash cache had degraded to 3 entries; sessions.db had the complete 586,993.
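A consistency check like the one that caught this drift can be sketched as follows. The messages table name and the flat-list cache layout are assumptions, not the actual sessions.db schema:

```python
import json
import sqlite3
from pathlib import Path

def cache_in_sync(db_path, cache_path):
    """Detect a stale JSON hash cache by comparing its entry count
    against the database (table/column names are assumptions)."""
    db_count = sqlite3.connect(db_path).execute(
        "SELECT COUNT(*) FROM messages").fetchone()[0]
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else []
    return len(cache) == db_count
```

Had a check like this run earlier, a 3-entry cache next to a 586,993-row database would have flagged the drift immediately.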

Verdict: Archive is 100% redundant. Zero data loss risk from cleanup.

Hash store regenerated: 588,424 hashes (sessions.db + content hashes).


Files Referenced

| File | Purpose |
|---|---|
| ~/.coditect/bin/coditect-daemon | Watcher binary |
| ~/.coditect/config/llm-watchers.json | Watcher configuration |
| ~/.coditect/logs/context-watcher.log | Active watcher log (8 MB) |
| ~/.coditect/scripts/unified-message-extractor.py:3827 | CUSF processor type check |
| ~/PROJECTS/.coditect-data/context-storage/cusf-archive/ | 22 GB archive to clean |
| ~/PROJECTS/.coditect-data/sessions-export-pending-anthropic/ | Pending dir to clean |
| ~/PROJECTS/.coditect-data/context-storage/watcher-state.json | Watcher state |