
Context Watcher CUSF Pipeline Assessment

Date: 2026-02-12
Author: Claude (Opus 4.6)
Status: Complete
Recommendation: Disable the watcher's Claude CUSF export pipeline; retain direct JSONL reads


Executive Summary

The context watcher daemon (coditect-daemon) has been exporting Claude session files to sessions-export-pending-anthropic/ since early February 2026. These exports are processed by /cx through the CUSF pipeline, which moves them to cusf-archive/. However, zero messages have ever been extracted from these Claude watcher exports due to a format mismatch. The result is 22 GB of useless archived files.


Findings

1. Watcher is Healthy

| Metric | Value |
|---|---|
| Process | PID 815, uptime 4d 17h (since Feb 8) |
| Binary | ~/.coditect/bin/coditect-daemon --multi-llm |
| Config | ~/.coditect/config/llm-watchers.json |
| Total exports triggered | 945 |
| Export rate | 66-163 files/day |
| Trigger threshold | 75% context usage |
| Poll interval | 30 seconds |
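Although the daemon itself is a compiled binary, the trigger rule the table implies can be sketched in a few lines. This is a hedged illustration; should_export and its arguments are hypothetical names, not the daemon's actual API:

```python
THRESHOLD = 0.75  # export when context usage crosses 75%, per the watcher config

def should_export(context_usage, already_exported):
    """Trigger rule sketch: export a full session snapshot once usage
    crosses the threshold (hypothetical helper; the real daemon is a binary)."""
    return context_usage >= THRESHOLD and not already_exported
```

Because the check re-runs on every 30-second poll and sessions keep growing past the threshold, a single session can trigger many full exports — the source of the duplication quantified in the Disk Waste section.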

2. CUSF Extraction Has Never Worked for Claude Exports

Root Cause: Format mismatch between watcher output and CUSF processor.

| Component | Format | Entry Types |
|---|---|---|
| Watcher export (raw Claude JSONL) | {type: "user", message: {...}} | user, assistant, file-history-snapshot |
| CUSF processor (unified-message-extractor.py:3827) | {type: "message", role: "user", content: "..."} | Only matches message |

The watcher copies raw Claude session JSONL files verbatim. The CUSF processor only processes entries with type == 'message'. Since raw Claude JSONL uses type == 'user'/type == 'assistant', every entry falls through without extraction. Files are archived with total=0, new=0.
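The mismatch is easy to reproduce. A minimal sketch, where is_extractable is a simplified stand-in for the real check at unified-message-extractor.py:3827 (the actual function does more than this):

```python
import json

# A raw Claude session JSONL entry, as the watcher exports it verbatim:
claude_line = '{"type": "user", "message": {"role": "user", "content": "hi"}}'

# A CUSF-format entry, the shape the processor's check expects:
cusf_line = '{"type": "message", "role": "user", "content": "hi"}'

def is_extractable(line):
    """Simplified stand-in for the processor's type check."""
    return json.loads(line).get("type") == "message"

print(is_extractable(claude_line))  # False — every watcher export entry falls through
print(is_extractable(cusf_line))    # True  — Kimi/Codex/Gemini CUSF entries match
```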

3. Direct JSONL Reads Already Cover Everything

| Source | Messages | Files | Sessions | Disk |
|---|---|---|---|---|
| jsonl (direct reads) | 489,569 | 3,383 | 496 | 2.7 GB (source) |
| cusf (watcher pipeline) | 994 | 14 | 1 | 22 GB (archive) |
| export (manual /export) | 54,636 | 1,186 | - | in exports-archive |

The 994 CUSF messages came from Kimi exports (which use proper CUSF format), not from the watcher's Claude exports.

/cx already reads every Claude session file directly from ~/.claude/projects/. The watcher exports are 100% redundant copies.
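For contrast, a direct read needs no export step at all. A minimal sketch of what such a reader does — iter_claude_messages is illustrative only; /cx's actual reader also deduplicates by hash:

```python
import json
from pathlib import Path

def iter_claude_messages(projects_dir="~/.claude/projects"):
    """Yield user/assistant entries straight from raw Claude session
    JSONL files (illustrative sketch of a direct read)."""
    for path in Path(projects_dir).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partial writes mid-session
                if entry.get("type") in ("user", "assistant"):
                    yield entry
```

Each file is read once per run, directly from source, which is why the watcher's verbatim copies add nothing.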

4. Disk Waste

| Location | Size | Files |
|---|---|---|
| cusf-archive/ | 22 GB | 987 |
| sessions-export-pending-anthropic/ | 47 MB | 3 (recent) |
| Total wasted | ~22 GB | 990 |

The archive is 8x larger than the source session files (2.7 GB) because the watcher creates a full session dump each time context exceeds 75%. A session that grows from 1 MB to 20 MB generates ~10 snapshots.
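The blow-up follows directly from full-copy snapshots of a growing file. An illustrative calculation — the snapshot sizes below are made up to match the 1 MB to 20 MB example:

```python
def archived_bytes_mb(snapshot_sizes_mb):
    """Each export is a full copy at the session's then-current size,
    so archive cost is the sum of every snapshot, not the final size."""
    return sum(snapshot_sizes_mb)

# ~10 snapshots of a session growing 1 MB -> 20 MB:
snapshots = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(archived_bytes_mb(snapshots))  # 110 -> 110 MB archived for a 20 MB source
```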


Recommendation

Disable the watcher's Claude CUSF export pipeline. Specifically:

  1. Set claude.enabled: false in llm-watchers.json (or remove the export trigger)
  2. Clean the cusf-archive/ — all 987 Claude exports contain zero extracted data
  3. Keep Kimi/Codex/Gemini export pipeline intact (their CUSF format works correctly)
  4. Keep the watcher process running — it provides useful context% monitoring and session detection
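Step 1 might look like the fragment below. Only the claude.enabled key comes from the recommendation; the surrounding shape of llm-watchers.json is an assumption, so check the actual file before editing:

```json
{
  "claude": { "enabled": false },
  "kimi":   { "enabled": true },
  "codex":  { "enabled": true },
  "gemini": { "enabled": true }
}
```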

Why NOT fix the format mismatch?

  • /cx already reads the same session files directly from source — fixing the CUSF processor to handle raw JSONL would create duplicate extractions
  • The watcher creates redundant snapshots (same session at different sizes), increasing archive bloat
  • The watcher has no deduplication — each export is a full session copy
  • Direct JSONL reads are more efficient: they read each file once per /cx run

What the watcher DOES provide value for (keep these):

  • Context usage monitoring (threshold warnings)
  • Session detection (new session notifications)
  • Token economics data in state file (export counts)

Lossless Verification Results (Phase 2)

Script: .coditect/scripts/cusf-archive-analyzer.py
Run: 2026-02-12T22:16Z (272 s to process 990 files, 22.1 GB)

Phase 2 Parse Results

| Metric | Value |
|---|---|
| Lines scanned | 8,236,056 |
| Extractable entries | 2,549,787 |
| User/assistant messages | 2,370,913 |
| Unique entry hashes | 81,204 |
| Entry types | 19 distinct (progress: 5.2M, assistant: 1.7M, user: 1.0M) |

Massive duplication: 8.2M lines produce only 81K unique hashes because the watcher repeatedly dumps the same growing session.
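The collapse from 8.2M lines to 81K hashes is what content-hashing repeated dumps looks like. A sketch — the analyzer's exact hash scheme is an assumption here:

```python
import hashlib

def unique_entry_hashes(lines):
    """Count distinct entries by content hash, collapsing repeated
    session dumps (sketch; the real analyzer's scheme may differ)."""
    seen = set()
    for line in lines:
        seen.add(hashlib.sha256(line.strip().encode()).hexdigest())
    return len(seen)

# Dumping the same session twice doubles the line count,
# not the unique-hash count:
dump = ['{"type":"user","i":1}', '{"type":"assistant","i":2}']
print(unique_entry_hashes(dump + dump))  # 2 unique hashes from 4 lines
```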

Phase 3 Comparison Results

| Category | Count | % |
|---|---|---|
| Already in unified store | 79,770 | 98.2% |
| Apparently unique | 1,434 | 1.8% |

The 1,434 "unique" entries came from Kimi, Gemini, and Codex exports (not Claude watcher):

| Source | Unique Entries |
|---|---|
| Kimi (LOSSLESS wire format) | 886 |
| Gemini | 401 |
| Codex (LOSSLESS) | 130 |
| Claude | 17 |

Phase 4 Final Verification

All 1,434 entries are already in sessions.db (586,993 hashes). The apparent uniqueness was an artifact of the JSON hash cache (unified_hashes.json) being out of sync with the database. The hash cache had degraded to 3 entries; sessions.db had the complete 586,993.
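A consistency check like the one that caught this drift can be sketched as follows. The messages table name and the flat-list cache layout are assumptions, not the actual sessions.db schema:

```python
import json
import sqlite3
from pathlib import Path

def cache_in_sync(db_path, cache_path):
    """Detect a stale JSON hash cache by comparing its entry count
    against the database (table/column names are assumptions)."""
    db_count = sqlite3.connect(db_path).execute(
        "SELECT COUNT(*) FROM messages").fetchone()[0]
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else []
    return len(cache) == db_count
```

Had a check like this run earlier, a 3-entry cache next to a 586,993-row database would have flagged the drift immediately.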

Verdict: Archive is 100% redundant. Zero data loss risk from cleanup.

Hash store regenerated: 588,424 hashes (sessions.db + content hashes).


Files Referenced

| File | Purpose |
|---|---|
| ~/.coditect/bin/coditect-daemon | Watcher binary |
| ~/.coditect/config/llm-watchers.json | Watcher configuration |
| ~/.coditect/logs/context-watcher.log | Active watcher log (8 MB) |
| ~/.coditect/scripts/unified-message-extractor.py:3827 | CUSF processor type check |
| ~/PROJECTS/.coditect-data/context-storage/cusf-archive/ | 22 GB archive to clean |
| ~/PROJECTS/.coditect-data/sessions-export-pending-anthropic/ | Pending dir to clean |
| ~/PROJECTS/.coditect-data/context-storage/watcher-state.json | Watcher state |