Context Database Schema V2 Complete Data Capture

Context Database Schema V2 - Complete Data Capture

Version: 2.0.0 Date: 2025-12-23 Status: Proposed

Design Principles

Zero Data Loss - Every field from every entry type is preserved
Query Optimized - Common fields extracted and indexed
Type-Specific - Specialized tables for different data categories
Full JSON Preserved - Original entry stored for completeness

Schema Overview

┌─────────────────────────────────────────────────────────────────┐
│                        ENTRIES (master)                          │
│  All JSONL entries with common fields + full JSON blob           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │   MESSAGES   │  │   SUMMARIES  │  │   FILE_SNAPSHOTS     │   │
│  │ user/assistant│ │ conversation │  │   file backups       │   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │SYSTEM_EVENTS │  │QUEUE_OPERATIONS│ │   MESSAGE_THREADING │   │
│  │ compaction   │  │ command queue │  │   uuid relationships│   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Table Definitions

1. ENTRIES (Master Table)

Stores ALL entries with common fields extracted for indexing.

CREATE TABLE entries (
    -- Primary key
    id INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Content hash for deduplication
    hash TEXT UNIQUE NOT NULL,

    -- Entry identification
    entry_type TEXT NOT NULL,           -- user, assistant, system, queue-operation, summary, file-history-snapshot
    uuid TEXT,                          -- Entry's unique ID
    parent_uuid TEXT,                   -- Parent entry UUID (threading)
    logical_parent_uuid TEXT,           -- Logical parent (for compaction)

    -- Session context
    session_id TEXT,
    agent_id TEXT,                      -- NULL for main sessions
    slug TEXT,                          -- Human-readable session name

    -- Environment
    cwd TEXT,                           -- Working directory
    git_branch TEXT,
    version TEXT,                       -- Claude Code version

    -- Timestamps
    timestamp TEXT,
    extracted_at TEXT,

    -- Flags
    is_sidechain BOOLEAN DEFAULT FALSE,
    is_meta BOOLEAN DEFAULT FALSE,
    user_type TEXT,                     -- external, internal

    -- Content (extracted for search)
    content TEXT,                       -- Main text content
    role TEXT,                          -- user, assistant, system

    -- Full original JSON (preserves everything)
    raw_json TEXT NOT NULL,

    -- Source tracking
    source_file TEXT,
    source_line INTEGER
);

-- Indexes
CREATE INDEX idx_entries_type ON entries(entry_type);
CREATE INDEX idx_entries_uuid ON entries(uuid);
CREATE INDEX idx_entries_parent ON entries(parent_uuid);
CREATE INDEX idx_entries_session ON entries(session_id);
CREATE INDEX idx_entries_agent ON entries(agent_id);
CREATE INDEX idx_entries_timestamp ON entries(timestamp);
CREATE INDEX idx_entries_role ON entries(role);

-- Full-text search
CREATE VIRTUAL TABLE entries_fts USING fts5(
    content,
    role,
    entry_type,
    content='entries',
    content_rowid='id'
);

2. MESSAGES (Conversational Content)

Extracted from user and assistant entries with message-specific fields.

CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    -- Message identification
    message_id TEXT,                    -- API message ID (msg_xxx)
    request_id TEXT,                    -- API request ID (req_xxx)

    -- Content
    role TEXT NOT NULL,                 -- user, assistant
    content TEXT,
    content_length INTEGER,
    has_code BOOLEAN DEFAULT FALSE,
    has_markdown BOOLEAN DEFAULT FALSE,

    -- Model info (assistant only)
    model TEXT,
    stop_reason TEXT,
    stop_sequence TEXT,

    -- Token usage (assistant only)
    input_tokens INTEGER,
    output_tokens INTEGER,
    cache_read_tokens INTEGER,
    cache_creation_tokens INTEGER,
    cost_usd REAL,
    duration_ms INTEGER,

    -- Tool use linkage
    source_tool_use_id TEXT,            -- Links tool_result to tool_use
    tool_use_result TEXT,               -- Extracted tool result

    -- Thinking metadata
    thinking_metadata TEXT,             -- JSON blob

    -- Flags
    is_api_error BOOLEAN DEFAULT FALSE,
    is_compact_summary BOOLEAN DEFAULT FALSE,
    is_visible_transcript_only BOOLEAN DEFAULT FALSE,

    -- Todos state (user entries)
    todos TEXT,                         -- JSON array of todos

    -- Error info
    error TEXT,                         -- JSON blob if error

    -- Enhanced Export Fields (v4.0 - January 2026)
    tool_use TEXT,                      -- JSON blob of tool invocations
    tool_result TEXT,                   -- JSON blob of tool results with status
    export_metadata TEXT                -- JSON blob of export header metadata
);

CREATE INDEX idx_messages_entry ON messages(entry_id);
CREATE INDEX idx_messages_role ON messages(role);
CREATE INDEX idx_messages_model ON messages(model);
CREATE INDEX idx_messages_request ON messages(request_id);

-- Enhanced Export Indexes (v4.0)
CREATE INDEX idx_messages_tool_status ON messages(json_extract(tool_result, '$.status'));
CREATE INDEX idx_messages_export_model ON messages(json_extract(export_metadata, '$.model'));

Enhanced Export Field Schemas (v4.0)

tool_use - Stores tool invocations from assistant messages:

{
  "tool_name": "Edit",
  "parameters": {
    "file_path": "/path/to/file.py",
    "old_string": "...",
    "new_string": "..."
  }
}

tool_result - Stores tool execution results with status tracking:

{
  "tool_name": "Edit",
  "status": "success",           // success | error | interrupted | truncated
  "diff": {
    "lines_added": 15,
    "lines_removed": 3,
    "truncated_range": null
  },
  "error_message": null
}

export_metadata - Stores metadata from export file headers:

{
  "version": "1.0.96",
  "model": "claude-opus-4-5-20251101",
  "tier": "anthropic_business",
  "working_dir": "/path/to/project"
}

3. SUMMARIES (Conversation Summaries)

Extracted from summary entries.

CREATE TABLE summaries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    summary_text TEXT NOT NULL,
    leaf_uuid TEXT,                     -- Reference to conversation leaf

    -- For search
    UNIQUE(leaf_uuid)
);

CREATE INDEX idx_summaries_leaf ON summaries(leaf_uuid);

-- FTS for summary search
CREATE VIRTUAL TABLE summaries_fts USING fts5(
    summary_text,
    content='summaries',
    content_rowid='id'
);

4. SYSTEM_EVENTS (System Operations)

Extracted from system entries.

CREATE TABLE system_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    subtype TEXT,                       -- compact_boundary, error, etc.
    level TEXT,                         -- info, warning, error
    content TEXT,

    -- Compaction metadata
    compact_trigger TEXT,               -- auto, manual
    compact_pre_tokens INTEGER,

    -- Error/retry info
    cause TEXT,
    error TEXT,
    max_retries INTEGER,
    retry_attempt INTEGER,
    retry_in_ms INTEGER
);

CREATE INDEX idx_system_events_subtype ON system_events(subtype);
CREATE INDEX idx_system_events_level ON system_events(level);

5. QUEUE_OPERATIONS (Command Queue)

Extracted from queue-operation entries.

CREATE TABLE queue_operations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    operation TEXT NOT NULL,            -- enqueue, dequeue, clear
    content TEXT,                       -- Command content
    timestamp TEXT
);

CREATE INDEX idx_queue_ops_operation ON queue_operations(operation);

6. FILE_SNAPSHOTS (File History)

Extracted from file-history-snapshot entries.

CREATE TABLE file_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    message_id TEXT,                    -- Links to message
    snapshot_timestamp TEXT,
    is_snapshot_update BOOLEAN DEFAULT FALSE,

    -- Full snapshot data
    snapshot_data TEXT                  -- JSON blob of trackedFileBackups
);

CREATE TABLE file_backup_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL REFERENCES file_snapshots(id) ON DELETE CASCADE,

    file_path TEXT NOT NULL,
    backup_file_name TEXT,
    version INTEGER,
    backup_time TEXT
);

CREATE INDEX idx_file_backups_path ON file_backup_entries(file_path);
CREATE INDEX idx_file_backups_snapshot ON file_backup_entries(snapshot_id);

7. MESSAGE_THREADING (Conversation Graph)

Extracted relationships for graph traversal.

CREATE TABLE message_threading (
    id INTEGER PRIMARY KEY AUTOINCREMENT,

    uuid TEXT NOT NULL,
    parent_uuid TEXT,
    logical_parent_uuid TEXT,
    session_id TEXT,
    entry_type TEXT,
    timestamp TEXT,

    UNIQUE(uuid)
);

CREATE INDEX idx_threading_parent ON message_threading(parent_uuid);
CREATE INDEX idx_threading_logical ON message_threading(logical_parent_uuid);
CREATE INDEX idx_threading_session ON message_threading(session_id);

8. TOKEN_USAGE (Aggregated Stats)

Preserved from existing schema for cost tracking.

CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER REFERENCES messages(id) ON DELETE CASCADE,
    session_id TEXT,
    project_path TEXT,

    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_input_tokens INTEGER DEFAULT 0,
    cache_creation_input_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0.0,
    duration_ms INTEGER DEFAULT 0,
    model TEXT,
    timestamp TEXT
);

CREATE INDEX idx_token_session ON token_usage(session_id);
CREATE INDEX idx_token_project ON token_usage(project_path);
CREATE INDEX idx_token_timestamp ON token_usage(timestamp);

Field Mapping by Entry Type

user → entries + messages

Source Field	entries	messages
type	entry_type	-
uuid	uuid	-
parentUuid	parent_uuid	-
sessionId	session_id	-
agentId	agent_id	-
cwd	cwd	-
gitBranch	git_branch	-
version	version	-
timestamp	timestamp	-
isSidechain	is_sidechain	-
isMeta	is_meta	-
userType	user_type	-
slug	slug	-
message.role	role	role
message.content	content	content
sourceToolUseID	-	source_tool_use_id
toolUseResult	-	tool_use_result
thinkingMetadata	-	thinking_metadata
todos	-	todos
isCompactSummary	-	is_compact_summary
isVisibleInTranscriptOnly	-	is_visible_transcript_only
(full entry)	raw_json	-

assistant → entries + messages

Source Field	entries	messages
type	entry_type	-
uuid	uuid	-
parentUuid	parent_uuid	-
sessionId	session_id	-
agentId	agent_id	-
requestId	-	request_id
message.id	-	message_id
message.model	-	model
message.role	role	role
message.content	content	content
message.stop_reason	-	stop_reason
message.stop_sequence	-	stop_sequence
message.usage.*	-	input_tokens, etc.
isApiErrorMessage	-	is_api_error
error	-	error
(full entry)	raw_json	-

system → entries + system_events

Source Field	entries	system_events
type	entry_type	-
subtype	-	subtype
level	-	level
content	content	content
compactMetadata	-	compact_* fields
cause	-	cause
error	-	error
maxRetries	-	max_retries
retryAttempt	-	retry_attempt
retryInMs	-	retry_in_ms

summary → entries + summaries

Source Field	entries	summaries
type	entry_type	-
summary	content	summary_text
leafUuid	-	leaf_uuid

queue-operation → entries + queue_operations

Source Field	entries	queue_operations
type	entry_type	-
operation	-	operation
content	content	content
timestamp	timestamp	timestamp
sessionId	session_id	-

file-history-snapshot → entries + file_snapshots + file_backup_entries

Source Field	entries	file_snapshots	file_backup_entries
type	entry_type	-	-
messageId	-	message_id	-
isSnapshotUpdate	-	is_snapshot_update	-
snapshot.timestamp	timestamp	snapshot_timestamp	-
snapshot.trackedFileBackups	-	snapshot_data	(expanded rows)

Migration Strategy

Backup existing data
Create new schema alongside old
Re-extract all JSONL files with new extractor
Populate all tables in single pass
Verify counts match source
Drop old tables after verification

Query Examples

-- Find all agent messages
SELECT e.*, m.* FROM entries e
JOIN messages m ON m.entry_id = e.id
WHERE e.agent_id IS NOT NULL;

-- Get conversation thread
WITH RECURSIVE thread AS (
    SELECT * FROM message_threading WHERE uuid = ?
    UNION ALL
    SELECT mt.* FROM message_threading mt
    JOIN thread t ON mt.uuid = t.parent_uuid
)
SELECT * FROM thread;

-- Find all summaries for recall
SELECT s.summary_text, e.session_id, e.timestamp
FROM summaries s
JOIN entries e ON s.entry_id = e.id
ORDER BY e.timestamp DESC;

-- Get file history for a path
SELECT fbe.*, fs.snapshot_timestamp
FROM file_backup_entries fbe
JOIN file_snapshots fs ON fbe.snapshot_id = fs.id
WHERE fbe.file_path LIKE '%/README.md'
ORDER BY fs.snapshot_timestamp DESC;

-- Token usage by session
SELECT session_id, SUM(input_tokens) as total_input,
       SUM(output_tokens) as total_output,
       SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY session_id;

-- Enhanced Export Analytics (v4.0)
-- Get code change statistics
SELECT
    COUNT(*) as total_edits,
    SUM(json_extract(tool_result, '$.diff.lines_added')) as lines_added,
    SUM(json_extract(tool_result, '$.diff.lines_removed')) as lines_removed
FROM messages
WHERE tool_result IS NOT NULL
  AND json_extract(tool_result, '$.tool_name') = 'Edit';

-- Find failed tool invocations
SELECT
    json_extract(tool_result, '$.tool_name') as tool,
    json_extract(tool_result, '$.status') as status,
    json_extract(tool_result, '$.error_message') as error,
    timestamp
FROM messages
WHERE json_extract(tool_result, '$.status') IN ('error', 'interrupted', 'truncated')
ORDER BY timestamp DESC;

-- Group messages by Claude model
SELECT
    json_extract(export_metadata, '$.model') as model,
    COUNT(*) as message_count
FROM messages
WHERE export_metadata IS NOT NULL
GROUP BY json_extract(export_metadata, '$.model')
ORDER BY message_count DESC;

Implementation Notes

Single-pass extraction: Process each JSONL line once, insert into entries + type-specific table
raw_json preservation: Store original JSON for future schema evolution
Deduplication: Hash based on content + type + timestamp
Foreign keys: All type-specific tables reference entries(id)
FTS triggers: Auto-update FTS tables on insert/delete

Estimated Storage:

~180K entries × ~2KB avg = ~360MB entries table
Type-specific tables: ~50MB total
Indexes: ~100MB
Total: ~500MB

Last Updated: 2026-01-04 Author: CODITECT Framework

Changelog

v1.1.0 (2026-01-04)

Enhanced Export Fields (v4.0 unified-message-extractor)

Added tool_use column to messages table for tool invocation tracking
Added tool_result column with status tracking (success/error/interrupted/truncated)
Added export_metadata column for export header metadata (version, model, tier)
Added indexes for efficient JSON field querying
Added SQL query examples for enhanced analytics

v1.0.0 (2025-12-23)

Initial schema V2 design
All 6 JSONL entry types captured
Type-specific tables for optimized queries
Full JSON preservation

Design Principles​

Schema Overview​

Table Definitions​

1. ENTRIES (Master Table)​

2. MESSAGES (Conversational Content)​

Enhanced Export Field Schemas (v4.0)​

3. SUMMARIES (Conversation Summaries)​

4. SYSTEM_EVENTS (System Operations)​

5. QUEUE_OPERATIONS (Command Queue)​

6. FILE_SNAPSHOTS (File History)​

7. MESSAGE_THREADING (Conversation Graph)​

8. TOKEN_USAGE (Aggregated Stats)​

Field Mapping by Entry Type​

user → entries + messages​

assistant → entries + messages​

system → entries + system_events​

summary → entries + summaries​

queue-operation → entries + queue_operations​

file-history-snapshot → entries + file_snapshots + file_backup_entries​

Migration Strategy​

Query Examples​

Implementation Notes​

Changelog​

v1.1.0 (2026-01-04)​

v1.0.0 (2025-12-23)​