
Process JSONL Sessions - Batch Processing Command

Analyze and process Claude Code native JSONL session files from ~/.claude/projects/ with intelligent chunking, deduplication, and watermark tracking.

Usage

# Analyze session structure
/process-jsonl-sessions --analyze SESSION_FILE

# Process single session
/process-jsonl-sessions --session SESSION_FILE

# Batch process all large sessions
/process-jsonl-sessions --batch --min-size 10

# Resume from watermark
/process-jsonl-sessions --resume SESSION_ID

# Check processing status
/process-jsonl-sessions --status SESSION_ID

System Prompt

⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions, no explanations first
  2. ALWAYS show full output from script/tool execution
  3. ALWAYS provide summary after execution completes

DO NOT:

  • Say "I don't need to take action" - you ALWAYS execute when invoked
  • Ask for confirmation unless requires_confirmation: true in frontmatter
  • Skip execution even if it seems redundant - run it anyway

The user invoking the command IS the confirmation.


Use the jsonl-session-processor subagent to process Claude Code JSONL session files with chunking and deduplication.

Options

| Option | Description |
| --- | --- |
| --analyze FILE | Analyze session structure and recommend chunking |
| --session FILE | Process a single session file |
| --batch | Process all large sessions across projects |
| --resume ID | Resume processing from the last watermark |
| --status ID | Check processing status and progress |
| --min-size MB | Minimum file size for batch (default: 10 MB) |
| --chunk-size N | Target chunk size in lines (default: 1000) |
| --overlap N | Overlap messages between chunks (default: 10) |
| --json | Output as JSON |
| --verbose | Verbose output |

What This Command Does

  1. Scans ~/.claude/projects for JSONL session files
  2. Analyzes session structure (messages, snapshots, split points)
  3. Chunks large files at safe boundaries (file snapshots, user messages)
  4. Deduplicates messages via global SHA-256 hash pool
  5. Tracks progress with watermarks (resume from failures)
  6. Reports statistics (new unique, duplicates filtered, dedup rate)
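The deduplication in step 4 can be sketched as follows. This is a minimal illustration, assuming each message is hashed over its canonical JSON serialization; which fields the command actually feeds into the hash is not documented here:

```python
import hashlib
import json

def message_hash(message: dict) -> str:
    """Stable SHA-256 over a canonical JSON form of the message (assumed scheme)."""
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(messages, seen_hashes):
    """Keep only messages whose hash is not already in the global pool."""
    new_unique = []
    for msg in messages:
        h = message_hash(msg)
        if h not in seen_hashes:
            seen_hashes.add(h)
            new_unique.append(msg)
    return new_unique

pool = set()  # stands in for the persistent global hash pool
batch = [{"role": "user", "content": "hi"},
         {"role": "user", "content": "hi"},        # exact duplicate, filtered
         {"role": "assistant", "content": "hello"}]
unique = deduplicate(batch, pool)  # two messages survive
```

Because the pool persists across sessions, a message seen in any earlier session is filtered everywhere, which is what drives the global dedup rate.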

Use Cases

1. Analyze Large Session

Task: Understand structure and chunking strategy

/process-jsonl-sessions --analyze ~/.claude/projects/.../cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl

Result: Session metadata, safe split points, recommended chunks

2. Process Single Session

Task: Deduplicate one large session file

/process-jsonl-sessions --session ~/.claude/projects/.../SESSION.jsonl --verbose

Result: Chunks created, messages deduplicated, watermark set

3. Batch Process All Large Sessions

Task: Process all sessions >10 MB across all projects

/process-jsonl-sessions --batch --min-size 10

Result: All large sessions processed, global unique count updated

4. Resume After Failure

Task: Continue processing from last successful point

/process-jsonl-sessions --resume cbe665f8-2712-4ed6-8721-2da739cf5e7e

Result: Processing continues from watermark, no progress lost

5. Check Status

Task: View processing progress

/process-jsonl-sessions --status cbe665f8-2712-4ed6-8721-2da739cf5e7e

Result: Progress %, lines processed, chunks completed

Output Format

Analysis Output

Session Analysis: cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl
Size: 89.3 MB
Lines: 15,906
Messages: 14,617 (user: 5,142, assistant: 9,475)
File snapshots: 1,289
Safe split points: 18
Recommended chunks: 16 @ ~1000 lines each
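The recommended chunking can be approximated with fixed-size planning. This sketch assumes chunk starts step by chunk_size - overlap lines; the real command snaps boundaries to safe split points (file snapshots, user messages), so its chunk count can differ from this naive plan:

```python
def plan_chunks(total_lines: int, chunk_size: int = 1000, overlap: int = 10):
    """Return (start, end) line ranges, 1-indexed inclusive, with overlap."""
    chunks = []
    start = 1
    while start <= total_lines:
        end = min(start + chunk_size - 1, total_lines)
        chunks.append((start, end))
        if end == total_lines:
            break
        start = end - overlap + 1  # re-read the last `overlap` lines
    return chunks

ranges = plan_chunks(15906)
# first chunk (1, 1000), second (991, 1990), last ends at line 15906
```

For the 15,906-line session this naive plan yields 17 fixed-size chunks; snapping to the 18 safe split points is how the command arrives at its recommendation of 16.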

Processing Output

Processing: cbe665f8-2712-4ed6-8721-2da739cf5e7e.jsonl

Chunking: 16 chunks created
Chunk 1/16: Lines 1-1000 (847 new unique)
Chunk 2/16: Lines 991-1990 (800 new unique, 10 overlap)
... (14 more)

Results:
Messages processed: 14,617
New unique: 12,456
Duplicates filtered: 2,161 (14.8% dedup rate)
Global unique count: 22,662 (was 10,206)

✅ Session fully processed and watermarked
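The statistics in the sample output above are internally consistent, as a quick check shows:

```python
processed = 14617      # messages processed
new_unique = 12456     # messages added to the global pool
prior_global = 10206   # global unique count before this session

duplicates = processed - new_unique          # messages filtered: 2161
dedup_rate = 100 * duplicates / processed    # ~14.8%
new_global = prior_global + new_unique       # 22662
```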

Status Output

Session: cbe665f8-2712-4ed6-8721-2da739cf5e7e
Status: in_progress
Progress: 34%
Lines processed: 5,432 / 15,906
Chunks completed: 4
Chunks pending: 12
Last updated: 2025-11-29T12:34:56Z
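Watermark tracking like the status above can be sketched as an atomically written state file. The file name and JSON fields here are hypothetical; the command's actual watermark format is not documented:

```python
import json
import tempfile
from pathlib import Path

def save_watermark(path: Path, session_id: str, lines_done: int, chunks_done: int):
    """Persist resume state atomically: write a temp file, then rename over the old one."""
    state = {"session_id": session_id,
             "lines_processed": lines_done,
             "chunks_completed": chunks_done}
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # atomic rename: a crash leaves the old or new file, never a partial one

def load_watermark(path: Path) -> int:
    """Return the line count to resume from (0 if no watermark exists)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["lines_processed"]

# demo against a throwaway file
wm = Path(tempfile.mkdtemp()) / "SESSION.watermark.json"
save_watermark(wm, "SESSION", 5432, 4)
resume_line = load_watermark(wm)  # 5432
```

The atomic rename is what makes "Watermark corruption" rare: a crash mid-write cannot leave a half-written state file behind.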

Integration

Works With:

  • message_deduplicator - Global hash pool for deduplication
  • session-analyzer - Session discovery and indexing
  • /session-index - Session file inventory

Enables:

  • Phase 5 multi-session continuity
  • Zero catastrophic forgetting
  • Complete context preservation across sessions

Performance

Typical Session (89 MB, 15,906 lines):

  • Analysis: <5 seconds
  • Processing: ~25 seconds
  • Memory: <500 MB peak

Batch (8 sessions, 287 MB):

  • Total time: <3 minutes
  • Throughput: ~1000 lines/second

Error Handling

Common Issues:

  • File not found: Verify session file path
  • Memory exhaustion: Reduce --chunk-size (try 500)
  • Invalid JSONL: Skips malformed lines (continues processing)
  • Watermark corruption: Use --reset to restart
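The skip-malformed-lines behavior noted above can be sketched as a tolerant JSONL reader. This is an illustration, not the command's actual implementation:

```python
import json
import tempfile
from pathlib import Path

def read_jsonl(path):
    """Parse a JSONL file, skipping malformed lines instead of aborting."""
    records, skipped = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                skipped += 1  # count and continue, matching the documented behavior
    return records, skipped

# demo: one malformed line among three
p = Path(tempfile.mkdtemp()) / "demo.jsonl"
p.write_text('{"role": "user"}\nnot json\n{"role": "assistant"}\n')
records, skipped = read_jsonl(p)  # 2 records, 1 skipped
```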

Tips

  1. Start with --analyze to understand chunking strategy
  2. Use --verbose for detailed progress
  3. Batch process weekly to capture new sessions
  4. Check watermarks before reprocessing
  5. Use --json for programmatic access

Action Policy

<default_behavior> This command processes JSONL session files. Proceeds with:

  • Session analysis (if --analyze)
  • Chunking at safe boundaries
  • Message deduplication
  • Watermark tracking
  • Progress reporting

Uses global hash pool for deduplication. </default_behavior>

After execution, verify:

  • Sessions processed
  • Chunks created
  • Duplicates filtered
  • Watermarks set

Success Output

When process-jsonl-sessions completes:

✅ COMMAND COMPLETE: /process-jsonl-sessions
Sessions: <N> processed
New Unique: <N>
Duplicates: <N> filtered
Dedup Rate: <N>%
Status: Watermarked

Completion Checklist

Before marking complete:

  • Sessions scanned
  • Chunks created
  • Messages deduplicated
  • Watermarks set
  • Statistics reported

Failure Indicators

This command has FAILED if:

  • ❌ Session file not found
  • ❌ Invalid JSONL
  • ❌ Memory exhaustion
  • ❌ Watermark corruption

When NOT to Use

Do NOT use when:

  • Small sessions (<1 MB)
  • Already processed (check watermark)
  • Memory limited (reduce chunk size)

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Skipping --analyze | Wrong chunking | Analyze first |
| Large chunk size | Memory issues | Use smaller chunks |
| Ignoring watermarks | Reprocessing | Check status first |

Principles

This command embodies:

  • #3 Complete Execution - Full processing workflow
  • #4 Idempotent - Watermark prevents reprocessing
  • #9 Based on Facts - Hash-based deduplication

Full Standard: CODITECT-STANDARD-AUTOMATION.md


Version: 1.0.0
Status: Production Ready
Last Updated: 2025-11-29