Process JSONL Sessions - Batch Processing Command
Analyze and process Claude Code native JSONL session files from ~/.claude/projects/ with intelligent chunking, deduplication, and watermark tracking.
Usage
# Analyze session structure
/process-jsonl-sessions --analyze SESSION_FILE
# Process single session
/process-jsonl-sessions --session SESSION_FILE
# Batch process all large sessions
/process-jsonl-sessions --batch --min-size 10
# Resume from watermark
/process-jsonl-sessions --resume SESSION_ID
# Check processing status
/process-jsonl-sessions --status SESSION_ID
System Prompt
⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions, no explanations first
- ALWAYS show full output from script/tool execution
- ALWAYS provide summary after execution completes
DO NOT:
- Say "I don't need to take action" - you ALWAYS execute when invoked
- Ask for confirmation unless requires_confirmation: true is set in frontmatter
- Skip execution even if it seems redundant - run it anyway
The user invoking the command IS the confirmation.
Use the jsonl-session-processor subagent to process Claude Code JSONL session files with chunking and deduplication.
Options
| Option | Description |
|---|---|
| --analyze FILE | Analyze session structure and recommend chunking |
| --session FILE | Process single session file |
| --batch | Process all large sessions across projects |
| --resume ID | Resume processing from last watermark |
| --status ID | Check processing status and progress |
| --min-size MB | Minimum file size for batch processing (default: 10 MB) |
| --chunk-size N | Target chunk size in lines (default: 1000) |
| --overlap N | Number of overlapping messages between chunks (default: 10) |
| --json | Output as JSON |
| --verbose | Verbose output |
What This Command Does
- Scans ~/.claude/projects for JSONL session files
- Analyzes session structure (messages, snapshots, split points)
- Chunks large files at safe boundaries (file snapshots, user messages)
- Deduplicates messages via global SHA-256 hash pool
- Tracks progress with watermarks (resume from failures)
- Reports statistics (new unique, duplicates filtered, dedup rate)
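The chunk-then-deduplicate step above can be sketched in a few lines. This is a minimal illustration assuming each JSONL line holds one message object; the function and variable names (`dedupe_messages`, `seen_hashes`) are hypothetical, not the command's actual implementation:

```python
import hashlib
import json

def dedupe_messages(lines, seen_hashes):
    """Return messages whose canonical SHA-256 hash is new to the global pool."""
    unique = []
    duplicates = 0
    for line in lines:
        msg = json.loads(line)
        # Canonicalize with sorted keys so field order never changes the hash.
        digest = hashlib.sha256(
            json.dumps(msg, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest in seen_hashes:
            duplicates += 1
        else:
            seen_hashes.add(digest)
            unique.append(msg)
    return unique, duplicates

pool = set()  # global hash pool, shared across chunks and sessions
batch = ['{"role": "user", "text": "hi"}', '{"text": "hi", "role": "user"}']
unique, dups = dedupe_messages(batch, pool)
# The two lines hash identically after canonicalization: 1 unique, 1 duplicate.
```

Because the pool is keyed on canonicalized content, the same message re-emitted across sessions (or inside a chunk overlap) is filtered exactly once.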
Use Cases
1. Analyze Large Session
Task: Understand structure and chunking strategy
/process-jsonl-sessions --analyze ~/.claude/projects/.../cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl
Result: Session metadata, safe split points, recommended chunks
2. Process Single Session
Task: Deduplicate one large session file
/process-jsonl-sessions --session ~/.claude/projects/.../SESSION.jsonl --verbose
Result: Chunks created, messages deduplicated, watermark set
3. Batch Process All Large Sessions
Task: Process all sessions >10 MB across all projects
/process-jsonl-sessions --batch --min-size 10
Result: All large sessions processed, global unique count updated
4. Resume After Failure
Task: Continue processing from last successful point
/process-jsonl-sessions --resume cbe665f8-2712-4ed6-8721-2da739cf5e7e
Result: Processing continues from watermark, no progress lost
5. Check Status
Task: View processing progress
/process-jsonl-sessions --status cbe665f8-2712-4ed6-8721-2da739cf5e7e
Result: Progress %, lines processed, chunks completed
Output Format
Analysis Output
Session Analysis: cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl
Size: 89.3 MB
Lines: 15,906
Messages: 14,617 (user: 5,142, assistant: 9,475)
File snapshots: 1,289
Safe split points: 18
Recommended chunks: 16 @ ~1000 lines each
Processing Output
Processing: cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl
Chunking: 16 chunks created
Chunk 1/16: Lines 1-1000 (847 new unique)
Chunk 2/16: Lines 990-1990 (800 new unique, 10 overlap)
... (14 more)
Results:
Messages processed: 14,617
New unique: 12,456
Duplicates filtered: 2,161 (14.8% dedup rate)
Global unique count: 22,662 (was 10,206)
✅ Session fully processed and watermarked
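The overlapping line ranges shown above follow a simple pattern. The sketch below assumes a fixed chunk size and overlap and ignores snapping to safe split points; `chunk_ranges` is a hypothetical helper, not the command's code:

```python
def chunk_ranges(total_lines, chunk_size=1000, overlap=10):
    """Inclusive 1-based line ranges, each chunk re-reading `overlap` lines."""
    ranges = [(1, min(chunk_size, total_lines))]
    while ranges[-1][1] < total_lines:
        start = ranges[-1][1] - overlap      # back up to re-read the overlap
        end = min(start + chunk_size, total_lines)
        ranges.append((start, end))
    return ranges

ranges = chunk_ranges(2500)
# → [(1, 1000), (990, 1990), (1980, 2500)]
```

The overlap gives the deduplicator shared context at every boundary; the duplicated messages it re-reads are filtered by the global hash pool, so they never inflate the unique count.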
Status Output
Session: cbe665f8-2712-4ed6-8721-2da739cf5e7e
Status: in_progress
Progress: 34%
Lines processed: 5,432 / 15,906
Chunks completed: 4
Chunks pending: 12
Last updated: 2025-11-29T12:34:56Z
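A watermark record backing the status above might look like the following. The field names and layout are assumptions for illustration; the command's actual on-disk format may differ:

```python
# Hypothetical watermark record; --resume would restart at
# lines_processed + 1 rather than re-reading the whole file.
watermark = {
    "session_id": "cbe665f8-2712-4ed6-8721-2da739cf5e7e",
    "status": "in_progress",
    "lines_total": 15906,
    "lines_processed": 5432,
    "chunks_completed": 4,
    "chunks_pending": 12,
    "last_updated": "2025-11-29T12:34:56Z",
}

progress = round(100 * watermark["lines_processed"] / watermark["lines_total"])
print(f"Progress: {progress}%")  # → Progress: 34%
```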
Integration
Works With:
- message_deduplicator - Global hash pool for deduplication
- session-analyzer - Session discovery and indexing
- /session-index - Session file inventory
Enables:
- Phase 5 multi-session continuity
- Zero catastrophic forgetting
- Complete context preservation across sessions
Performance
Typical Session (89 MB, 15,906 lines):
- Analysis: <5 seconds
- Processing: ~25 seconds
- Memory: <500 MB peak
Batch (8 sessions, 287 MB):
- Total time: <3 minutes
- Throughput: ~1000 lines/second
Error Handling
Common Issues:
- File not found: Verify session file path
- Memory exhaustion: Reduce --chunk-size (try 500)
- Invalid JSONL: Skips malformed lines (continues processing)
- Watermark corruption: Use --reset to restart
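The "skip malformed lines and continue" behavior can be sketched as a tolerant reader; `read_jsonl_tolerant` is a hypothetical helper, not the command's code:

```python
import json

def read_jsonl_tolerant(lines):
    """Parse JSONL, skipping (and counting) lines that are not valid JSON."""
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not an error in practice
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # a real run would log the line number, then continue
    return records, skipped

recs, bad = read_jsonl_tolerant(['{"ok": 1}', "{broken", '{"ok": 2}'])
# One malformed line is skipped; both valid records survive.
```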
Tips
- Start with --analyze to understand chunking strategy
- Use --verbose for detailed progress
- Batch process weekly to capture new sessions
- Check watermarks before reprocessing
- Use --json for programmatic access
Action Policy
<default_behavior> This command processes JSONL session files. Proceeds with:
- Session analysis (if --analyze)
- Chunking at safe boundaries
- Message deduplication
- Watermark tracking
- Progress reporting
Uses global hash pool for deduplication. </default_behavior>
Success Output
When process-jsonl-sessions completes:
✅ COMMAND COMPLETE: /process-jsonl-sessions
Sessions: <N> processed
New Unique: <N>
Duplicates: <N> filtered
Dedup Rate: <N>%
Status: Watermarked
Completion Checklist
Before marking complete:
- Sessions scanned
- Chunks created
- Messages deduplicated
- Watermarks set
- Statistics reported
Failure Indicators
This command has FAILED if:
- ❌ Session file not found
- ❌ Invalid JSONL
- ❌ Memory exhaustion
- ❌ Watermark corruption
When NOT to Use
Do NOT use when:
- Small sessions (<1 MB)
- Already processed (check watermark)
- Memory limited (reduce chunk size)
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skip --analyze | Wrong chunking | Analyze first |
| Large chunk size | Memory issues | Use smaller chunks |
| Ignore watermarks | Reprocessing | Check status first |
Principles
This command embodies:
- #3 Complete Execution - Full processing workflow
- #4 Idempotent - Watermark prevents reprocessing
- #9 Based on Facts - Hash-based deduplication
Full Standard: CODITECT-STANDARD-AUTOMATION.md
Version: 1.0.0 Status: Production Ready Last Updated: 2025-11-29