Context Database Schema V2 Complete Data Capture
Context Database Schema V2 - Complete Data Capture
Version: 2.0.0 Date: 2025-12-23 Status: Proposed
Design Principles
- Zero Data Loss - Every field from every entry type is preserved
- Query Optimized - Common fields extracted and indexed
- Type-Specific - Specialized tables for different data categories
- Full JSON Preserved - Original entry stored for completeness
Schema Overview
┌─────────────────────────────────────────────────────────────────┐
│ ENTRIES (master) │
│ All JSONL entries with common fields + full JSON blob │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ MESSAGES │ │ SUMMARIES │ │ FILE_SNAPSHOTS │ │
│ │ user/assistant│ │ conversation │ │ file backups │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │SYSTEM_EVENTS │ │QUEUE_OPERATIONS│ │ MESSAGE_THREADING │ │
│ │ compaction │ │ command queue │ │ uuid relationships│ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Table Definitions
1. ENTRIES (Master Table)
Stores ALL entries with common fields extracted for indexing.
CREATE TABLE entries (
-- Primary key
id INTEGER PRIMARY KEY AUTOINCREMENT,
-- Content hash for deduplication
hash TEXT UNIQUE NOT NULL,
-- Entry identification
entry_type TEXT NOT NULL, -- user, assistant, system, queue-operation, summary, file-history-snapshot
uuid TEXT, -- Entry's unique ID
parent_uuid TEXT, -- Parent entry UUID (threading)
logical_parent_uuid TEXT, -- Logical parent (for compaction)
-- Session context
session_id TEXT,
agent_id TEXT, -- NULL for main sessions
slug TEXT, -- Human-readable session name
-- Environment
cwd TEXT, -- Working directory
git_branch TEXT,
version TEXT, -- Claude Code version
-- Timestamps
timestamp TEXT,
extracted_at TEXT,
-- Flags
is_sidechain BOOLEAN DEFAULT FALSE,
is_meta BOOLEAN DEFAULT FALSE,
user_type TEXT, -- external, internal
-- Content (extracted for search)
content TEXT, -- Main text content
role TEXT, -- user, assistant, system
-- Full original JSON (preserves everything)
raw_json TEXT NOT NULL,
-- Source tracking
source_file TEXT,
source_line INTEGER
);
-- Indexes
CREATE INDEX idx_entries_type ON entries(entry_type);
CREATE INDEX idx_entries_uuid ON entries(uuid);
CREATE INDEX idx_entries_parent ON entries(parent_uuid);
CREATE INDEX idx_entries_session ON entries(session_id);
CREATE INDEX idx_entries_agent ON entries(agent_id);
CREATE INDEX idx_entries_timestamp ON entries(timestamp);
CREATE INDEX idx_entries_role ON entries(role);
-- Full-text search
CREATE VIRTUAL TABLE entries_fts USING fts5(
content,
role,
entry_type,
content='entries',
content_rowid='id'
);
2. MESSAGES (Conversational Content)
Extracted from user and assistant entries with message-specific fields.
CREATE TABLE messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
-- Message identification
message_id TEXT, -- API message ID (msg_xxx)
request_id TEXT, -- API request ID (req_xxx)
-- Content
role TEXT NOT NULL, -- user, assistant
content TEXT,
content_length INTEGER,
has_code BOOLEAN DEFAULT FALSE,
has_markdown BOOLEAN DEFAULT FALSE,
-- Model info (assistant only)
model TEXT,
stop_reason TEXT,
stop_sequence TEXT,
-- Token usage (assistant only)
input_tokens INTEGER,
output_tokens INTEGER,
cache_read_tokens INTEGER,
cache_creation_tokens INTEGER,
cost_usd REAL,
duration_ms INTEGER,
-- Tool use linkage
source_tool_use_id TEXT, -- Links tool_result to tool_use
tool_use_result TEXT, -- Extracted tool result
-- Thinking metadata
thinking_metadata TEXT, -- JSON blob
-- Flags
is_api_error BOOLEAN DEFAULT FALSE,
is_compact_summary BOOLEAN DEFAULT FALSE,
is_visible_transcript_only BOOLEAN DEFAULT FALSE,
-- Todos state (user entries)
todos TEXT, -- JSON array of todos
-- Error info
error TEXT, -- JSON blob if error
-- Enhanced Export Fields (v4.0 - January 2026)
tool_use TEXT, -- JSON blob of tool invocations
tool_result TEXT, -- JSON blob of tool results with status
export_metadata TEXT -- JSON blob of export header metadata
);
CREATE INDEX idx_messages_entry ON messages(entry_id);
CREATE INDEX idx_messages_role ON messages(role);
CREATE INDEX idx_messages_model ON messages(model);
CREATE INDEX idx_messages_request ON messages(request_id);
-- Enhanced Export Indexes (v4.0)
CREATE INDEX idx_messages_tool_status ON messages(json_extract(tool_result, '$.status'));
CREATE INDEX idx_messages_export_model ON messages(json_extract(export_metadata, '$.model'));
Enhanced Export Field Schemas (v4.0)
tool_use - Stores tool invocations from assistant messages:
{
"tool_name": "Edit",
"parameters": {
"file_path": "/path/to/file.py",
"old_string": "...",
"new_string": "..."
}
}
tool_result - Stores tool execution results with status tracking:
{
"tool_name": "Edit",
"status": "success", // success | error | interrupted | truncated
"diff": {
"lines_added": 15,
"lines_removed": 3,
"truncated_range": null
},
"error_message": null
}
export_metadata - Stores metadata from export file headers:
{
"version": "1.0.96",
"model": "claude-opus-4-5-20251101",
"tier": "anthropic_business",
"working_dir": "/path/to/project"
}
3. SUMMARIES (Conversation Summaries)
Extracted from summary entries.
CREATE TABLE summaries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
summary_text TEXT NOT NULL,
leaf_uuid TEXT, -- Reference to conversation leaf
-- For search
UNIQUE(leaf_uuid)
);
CREATE INDEX idx_summaries_leaf ON summaries(leaf_uuid);
-- FTS for summary search
CREATE VIRTUAL TABLE summaries_fts USING fts5(
summary_text,
content='summaries',
content_rowid='id'
);
4. SYSTEM_EVENTS (System Operations)
Extracted from system entries.
CREATE TABLE system_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
subtype TEXT, -- compact_boundary, error, etc.
level TEXT, -- info, warning, error
content TEXT,
-- Compaction metadata
compact_trigger TEXT, -- auto, manual
compact_pre_tokens INTEGER,
-- Error/retry info
cause TEXT,
error TEXT,
max_retries INTEGER,
retry_attempt INTEGER,
retry_in_ms INTEGER
);
CREATE INDEX idx_system_events_subtype ON system_events(subtype);
CREATE INDEX idx_system_events_level ON system_events(level);
5. QUEUE_OPERATIONS (Command Queue)
Extracted from queue-operation entries.
CREATE TABLE queue_operations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
operation TEXT NOT NULL, -- enqueue, dequeue, clear
content TEXT, -- Command content
timestamp TEXT
);
CREATE INDEX idx_queue_ops_operation ON queue_operations(operation);
6. FILE_SNAPSHOTS (File History)
Extracted from file-history-snapshot entries.
CREATE TABLE file_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
message_id TEXT, -- Links to message
snapshot_timestamp TEXT,
is_snapshot_update BOOLEAN DEFAULT FALSE,
-- Full snapshot data
snapshot_data TEXT -- JSON blob of trackedFileBackups
);
CREATE TABLE file_backup_entries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL REFERENCES file_snapshots(id) ON DELETE CASCADE,
file_path TEXT NOT NULL,
backup_file_name TEXT,
version INTEGER,
backup_time TEXT
);
CREATE INDEX idx_file_backups_path ON file_backup_entries(file_path);
CREATE INDEX idx_file_backups_snapshot ON file_backup_entries(snapshot_id);
7. MESSAGE_THREADING (Conversation Graph)
Extracted relationships for graph traversal.
CREATE TABLE message_threading (
id INTEGER PRIMARY KEY AUTOINCREMENT,
uuid TEXT NOT NULL,
parent_uuid TEXT,
logical_parent_uuid TEXT,
session_id TEXT,
entry_type TEXT,
timestamp TEXT,
UNIQUE(uuid)
);
CREATE INDEX idx_threading_parent ON message_threading(parent_uuid);
CREATE INDEX idx_threading_logical ON message_threading(logical_parent_uuid);
CREATE INDEX idx_threading_session ON message_threading(session_id);
8. TOKEN_USAGE (Aggregated Stats)
Preserved from existing schema for cost tracking.
CREATE TABLE token_usage (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id INTEGER REFERENCES messages(id) ON DELETE CASCADE,
session_id TEXT,
project_path TEXT,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cache_read_input_tokens INTEGER DEFAULT 0,
cache_creation_input_tokens INTEGER DEFAULT 0,
cost_usd REAL DEFAULT 0.0,
duration_ms INTEGER DEFAULT 0,
model TEXT,
timestamp TEXT
);
CREATE INDEX idx_token_session ON token_usage(session_id);
CREATE INDEX idx_token_project ON token_usage(project_path);
CREATE INDEX idx_token_timestamp ON token_usage(timestamp);
Field Mapping by Entry Type
user → entries + messages
| Source Field | entries | messages |
|---|---|---|
| type | entry_type | - |
| uuid | uuid | - |
| parentUuid | parent_uuid | - |
| sessionId | session_id | - |
| agentId | agent_id | - |
| cwd | cwd | - |
| gitBranch | git_branch | - |
| version | version | - |
| timestamp | timestamp | - |
| isSidechain | is_sidechain | - |
| isMeta | is_meta | - |
| userType | user_type | - |
| slug | slug | - |
| message.role | role | role |
| message.content | content | content |
| sourceToolUseID | - | source_tool_use_id |
| toolUseResult | - | tool_use_result |
| thinkingMetadata | - | thinking_metadata |
| todos | - | todos |
| isCompactSummary | - | is_compact_summary |
| isVisibleInTranscriptOnly | - | is_visible_transcript_only |
| (full entry) | raw_json | - |
assistant → entries + messages
| Source Field | entries | messages |
|---|---|---|
| type | entry_type | - |
| uuid | uuid | - |
| parentUuid | parent_uuid | - |
| sessionId | session_id | - |
| agentId | agent_id | - |
| requestId | - | request_id |
| message.id | - | message_id |
| message.model | - | model |
| message.role | role | role |
| message.content | content | content |
| message.stop_reason | - | stop_reason |
| message.stop_sequence | - | stop_sequence |
| message.usage.* | - | input_tokens, etc. |
| isApiErrorMessage | - | is_api_error |
| error | - | error |
| (full entry) | raw_json | - |
system → entries + system_events
| Source Field | entries | system_events |
|---|---|---|
| type | entry_type | - |
| subtype | - | subtype |
| level | - | level |
| content | content | content |
| compactMetadata | - | compact_* fields |
| cause | - | cause |
| error | - | error |
| maxRetries | - | max_retries |
| retryAttempt | - | retry_attempt |
| retryInMs | - | retry_in_ms |
summary → entries + summaries
| Source Field | entries | summaries |
|---|---|---|
| type | entry_type | - |
| summary | content | summary_text |
| leafUuid | - | leaf_uuid |
queue-operation → entries + queue_operations
| Source Field | entries | queue_operations |
|---|---|---|
| type | entry_type | - |
| operation | - | operation |
| content | content | content |
| timestamp | timestamp | timestamp |
| sessionId | session_id | - |
file-history-snapshot → entries + file_snapshots + file_backup_entries
| Source Field | entries | file_snapshots | file_backup_entries |
|---|---|---|---|
| type | entry_type | - | - |
| messageId | - | message_id | - |
| isSnapshotUpdate | - | is_snapshot_update | - |
| snapshot.timestamp | timestamp | snapshot_timestamp | - |
| snapshot.trackedFileBackups | - | snapshot_data | (expanded rows) |
Migration Strategy
- Backup existing data
- Create new schema alongside old
- Re-extract all JSONL files with new extractor
- Populate all tables in single pass
- Verify counts match source
- Drop old tables after verification
Query Examples
-- Find all agent messages
SELECT e.*, m.* FROM entries e
JOIN messages m ON m.entry_id = e.id
WHERE e.agent_id IS NOT NULL;
-- Get conversation thread
WITH RECURSIVE thread AS (
SELECT * FROM message_threading WHERE uuid = ?
UNION ALL
SELECT mt.* FROM message_threading mt
JOIN thread t ON mt.uuid = t.parent_uuid
)
SELECT * FROM thread;
-- Find all summaries for recall
SELECT s.summary_text, e.session_id, e.timestamp
FROM summaries s
JOIN entries e ON s.entry_id = e.id
ORDER BY e.timestamp DESC;
-- Get file history for a path
SELECT fbe.*, fs.snapshot_timestamp
FROM file_backup_entries fbe
JOIN file_snapshots fs ON fbe.snapshot_id = fs.id
WHERE fbe.file_path LIKE '%/README.md'
ORDER BY fs.snapshot_timestamp DESC;
-- Token usage by session
SELECT session_id, SUM(input_tokens) as total_input,
SUM(output_tokens) as total_output,
SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY session_id;
-- Enhanced Export Analytics (v4.0)
-- Get code change statistics
SELECT
COUNT(*) as total_edits,
SUM(json_extract(tool_result, '$.diff.lines_added')) as lines_added,
SUM(json_extract(tool_result, '$.diff.lines_removed')) as lines_removed
FROM messages
WHERE tool_result IS NOT NULL
AND json_extract(tool_result, '$.tool_name') = 'Edit';
-- Find failed tool invocations
SELECT
json_extract(tool_result, '$.tool_name') as tool,
json_extract(tool_result, '$.status') as status,
json_extract(tool_result, '$.error_message') as error,
timestamp
FROM messages
WHERE json_extract(tool_result, '$.status') IN ('error', 'interrupted', 'truncated')
ORDER BY timestamp DESC;
-- Group messages by Claude model
SELECT
json_extract(export_metadata, '$.model') as model,
COUNT(*) as message_count
FROM messages
WHERE export_metadata IS NOT NULL
GROUP BY json_extract(export_metadata, '$.model')
ORDER BY message_count DESC;
Implementation Notes
- Single-pass extraction: Process each JSONL line once, insert into entries + type-specific table
- raw_json preservation: Store original JSON for future schema evolution
- Deduplication: Hash based on content + type + timestamp
- Foreign keys: All type-specific tables reference entries(id)
- FTS triggers: Auto-update FTS tables on insert/delete
Estimated Storage:
- ~180K entries × ~2KB avg = ~360MB entries table
- Type-specific tables: ~50MB total
- Indexes: ~100MB
- Total: ~500MB
Last Updated: 2026-01-04 Author: CODITECT Framework
Changelog
v1.1.0 (2026-01-04)
Enhanced Export Fields (v4.0 unified-message-extractor)
- Added
tool_usecolumn to messages table for tool invocation tracking - Added
tool_resultcolumn with status tracking (success/error/interrupted/truncated) - Added
export_metadatacolumn for export header metadata (version, model, tier) - Added indexes for efficient JSON field querying
- Added SQL query examples for enhanced analytics
v1.0.0 (2025-12-23)
- Initial schema V2 design
- All 6 JSONL entry types captured
- Type-specific tables for optimized queries
- Full JSON preservation