Context Database Schema V2 - Complete Data Capture

Version: 2.0.0 Date: 2025-12-23 Status: Proposed


Design Principles

  1. Zero Data Loss - Every field from every entry type is preserved
  2. Query Optimized - Common fields extracted and indexed
  3. Type-Specific - Specialized tables for different data categories
  4. Full JSON Preserved - Original entry stored for completeness
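Principles 1 and 4 can be demonstrated with a minimal round trip: because the full entry is stored in `raw_json`, a field the current schema does not model survives intact. In this Python/sqlite3 sketch, `someFutureField` is a hypothetical field, not one that appears in real JSONL entries:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down entries table: only the columns needed for the demo.
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, entry_type TEXT, raw_json TEXT NOT NULL)"
)

entry = {"type": "user", "someFutureField": {"nested": True}}  # hypothetical field
conn.execute(
    "INSERT INTO entries (entry_type, raw_json) VALUES (?, ?)",
    (entry["type"], json.dumps(entry)),
)

# The unmapped field is recoverable even though no column exists for it.
restored = json.loads(conn.execute("SELECT raw_json FROM entries").fetchone()[0])
assert restored["someFutureField"] == {"nested": True}
```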

Schema Overview

┌─────────────────────────────────────────────────────────────────┐
│ ENTRIES (master)                                                │
│ All JSONL entries with common fields + full JSON blob           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ ┌───────────────┐ ┌─────────────────┐ ┌──────────────────────┐  │
│ │ MESSAGES      │ │ SUMMARIES       │ │ FILE_SNAPSHOTS       │  │
│ │ user/assistant│ │ conversation    │ │ file backups         │  │
│ └───────────────┘ └─────────────────┘ └──────────────────────┘  │
│                                                                 │
│ ┌───────────────┐ ┌─────────────────┐ ┌──────────────────────┐  │
│ │ SYSTEM_EVENTS │ │ QUEUE_OPERATIONS│ │ MESSAGE_THREADING    │  │
│ │ compaction    │ │ command queue   │ │ uuid relationships   │  │
│ └───────────────┘ └─────────────────┘ └──────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Table Definitions

1. ENTRIES (Master Table)

Stores ALL entries with common fields extracted for indexing.

CREATE TABLE entries (
    -- Primary key
    id INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Content hash for deduplication
    hash TEXT UNIQUE NOT NULL,

    -- Entry identification
    entry_type TEXT NOT NULL,      -- user, assistant, system, queue-operation, summary, file-history-snapshot
    uuid TEXT,                     -- Entry's unique ID
    parent_uuid TEXT,              -- Parent entry UUID (threading)
    logical_parent_uuid TEXT,      -- Logical parent (for compaction)

    -- Session context
    session_id TEXT,
    agent_id TEXT,                 -- NULL for main sessions
    slug TEXT,                     -- Human-readable session name

    -- Environment
    cwd TEXT,                      -- Working directory
    git_branch TEXT,
    version TEXT,                  -- Claude Code version

    -- Timestamps
    timestamp TEXT,
    extracted_at TEXT,

    -- Flags
    is_sidechain BOOLEAN DEFAULT FALSE,
    is_meta BOOLEAN DEFAULT FALSE,
    user_type TEXT,                -- external, internal

    -- Content (extracted for search)
    content TEXT,                  -- Main text content
    role TEXT,                     -- user, assistant, system

    -- Full original JSON (preserves everything)
    raw_json TEXT NOT NULL,

    -- Source tracking
    source_file TEXT,
    source_line INTEGER
);

-- Indexes
CREATE INDEX idx_entries_type ON entries(entry_type);
CREATE INDEX idx_entries_uuid ON entries(uuid);
CREATE INDEX idx_entries_parent ON entries(parent_uuid);
CREATE INDEX idx_entries_session ON entries(session_id);
CREATE INDEX idx_entries_agent ON entries(agent_id);
CREATE INDEX idx_entries_timestamp ON entries(timestamp);
CREATE INDEX idx_entries_role ON entries(role);

-- Full-text search
CREATE VIRTUAL TABLE entries_fts USING fts5(
    content,
    role,
    entry_type,
    content='entries',
    content_rowid='id'
);
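One subtlety: `entries_fts` is declared with `content='entries'`, making it an external-content FTS5 table, and SQLite does not populate those automatically; rows must be mirrored in by the application or by triggers. A minimal Python/sqlite3 sketch against a trimmed-down `entries` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    hash TEXT UNIQUE NOT NULL,
    entry_type TEXT NOT NULL,
    content TEXT,
    role TEXT,
    raw_json TEXT NOT NULL
);
CREATE VIRTUAL TABLE entries_fts USING fts5(
    content, role, entry_type, content='entries', content_rowid='id'
);
""")

row = ("h1", "user", "refactor the parser module", "user", "{}")
cur = conn.execute(
    "INSERT INTO entries (hash, entry_type, content, role, raw_json) VALUES (?,?,?,?,?)",
    row,
)
# External-content FTS tables are not populated automatically: mirror the row in,
# keyed by the entries rowid.
conn.execute(
    "INSERT INTO entries_fts (rowid, content, role, entry_type) VALUES (?,?,?,?)",
    (cur.lastrowid, row[2], row[3], row[1]),
)

hits = conn.execute(
    "SELECT e.id FROM entries_fts JOIN entries e ON e.id = entries_fts.rowid "
    "WHERE entries_fts MATCH 'parser'"
).fetchall()
assert hits == [(1,)]
```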

2. MESSAGES (Conversational Content)

Extracted from user and assistant entries with message-specific fields.

CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    -- Message identification
    message_id TEXT,               -- API message ID (msg_xxx)
    request_id TEXT,               -- API request ID (req_xxx)

    -- Content
    role TEXT NOT NULL,            -- user, assistant
    content TEXT,
    content_length INTEGER,
    has_code BOOLEAN DEFAULT FALSE,
    has_markdown BOOLEAN DEFAULT FALSE,

    -- Model info (assistant only)
    model TEXT,
    stop_reason TEXT,
    stop_sequence TEXT,

    -- Token usage (assistant only)
    input_tokens INTEGER,
    output_tokens INTEGER,
    cache_read_tokens INTEGER,
    cache_creation_tokens INTEGER,
    cost_usd REAL,
    duration_ms INTEGER,

    -- Tool use linkage
    source_tool_use_id TEXT,       -- Links tool_result to tool_use
    tool_use_result TEXT,          -- Extracted tool result

    -- Thinking metadata
    thinking_metadata TEXT,        -- JSON blob

    -- Flags
    is_api_error BOOLEAN DEFAULT FALSE,
    is_compact_summary BOOLEAN DEFAULT FALSE,
    is_visible_transcript_only BOOLEAN DEFAULT FALSE,

    -- Todos state (user entries)
    todos TEXT,                    -- JSON array of todos

    -- Error info
    error TEXT,                    -- JSON blob if error

    -- Enhanced Export Fields (v4.0 - January 2026)
    tool_use TEXT,                 -- JSON blob of tool invocations
    tool_result TEXT,              -- JSON blob of tool results with status
    export_metadata TEXT           -- JSON blob of export header metadata
);

CREATE INDEX idx_messages_entry ON messages(entry_id);
CREATE INDEX idx_messages_role ON messages(role);
CREATE INDEX idx_messages_model ON messages(model);
CREATE INDEX idx_messages_request ON messages(request_id);

-- Enhanced Export Indexes (v4.0)
CREATE INDEX idx_messages_tool_status ON messages(json_extract(tool_result, '$.status'));
CREATE INDEX idx_messages_export_model ON messages(json_extract(export_metadata, '$.model'));
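Note that SQLite expression indexes like these are only eligible for use when a query repeats the indexed expression verbatim (e.g. `json_extract(tool_result, '$.status')`, not `tool_result ->> '$.status'`). A small Python/sqlite3 sketch against a trimmed `messages` table:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, tool_result TEXT)")
# Expression index over the extracted status field (requires SQLite >= 3.9).
conn.execute(
    "CREATE INDEX idx_messages_tool_status "
    "ON messages(json_extract(tool_result, '$.status'))"
)

conn.executemany("INSERT INTO messages (tool_result) VALUES (?)", [
    (json.dumps({"tool_name": "Edit", "status": "success"}),),
    (json.dumps({"tool_name": "Bash", "status": "error"}),),
])

# The WHERE clause repeats the indexed expression exactly, so the planner
# can satisfy it from idx_messages_tool_status.
errors = conn.execute(
    "SELECT id FROM messages WHERE json_extract(tool_result, '$.status') = 'error'"
).fetchall()
assert errors == [(2,)]
```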

Enhanced Export Field Schemas (v4.0)

tool_use - Stores tool invocations from assistant messages:

{
  "tool_name": "Edit",
  "parameters": {
    "file_path": "/path/to/file.py",
    "old_string": "...",
    "new_string": "..."
  }
}

tool_result - Stores tool execution results with status tracking:

{
  "tool_name": "Edit",
  "status": "success",          // success | error | interrupted | truncated
  "diff": {
    "lines_added": 15,
    "lines_removed": 3,
    "truncated_range": null
  },
  "error_message": null
}

export_metadata - Stores metadata from export file headers:

{
  "version": "1.0.96",
  "model": "claude-opus-4-5-20251101",
  "tier": "anthropic_business",
  "working_dir": "/path/to/project"
}

3. SUMMARIES (Conversation Summaries)

Extracted from summary entries.

CREATE TABLE summaries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    summary_text TEXT NOT NULL,
    leaf_uuid TEXT,                -- Reference to conversation leaf

    -- One summary per conversation leaf
    UNIQUE(leaf_uuid)
);

CREATE INDEX idx_summaries_leaf ON summaries(leaf_uuid);

-- FTS for summary search
CREATE VIRTUAL TABLE summaries_fts USING fts5(
    summary_text,
    content='summaries',
    content_rowid='id'
);

4. SYSTEM_EVENTS (System Operations)

Extracted from system entries.

CREATE TABLE system_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    subtype TEXT,                  -- compact_boundary, error, etc.
    level TEXT,                    -- info, warning, error
    content TEXT,

    -- Compaction metadata
    compact_trigger TEXT,          -- auto, manual
    compact_pre_tokens INTEGER,

    -- Error/retry info
    cause TEXT,
    error TEXT,
    max_retries INTEGER,
    retry_attempt INTEGER,
    retry_in_ms INTEGER
);

CREATE INDEX idx_system_events_subtype ON system_events(subtype);
CREATE INDEX idx_system_events_level ON system_events(level);

5. QUEUE_OPERATIONS (Command Queue)

Extracted from queue-operation entries.

CREATE TABLE queue_operations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    operation TEXT NOT NULL,       -- enqueue, dequeue, clear
    content TEXT,                  -- Command content
    timestamp TEXT
);

CREATE INDEX idx_queue_ops_operation ON queue_operations(operation);

6. FILE_SNAPSHOTS (File History)

Extracted from file-history-snapshot entries.

CREATE TABLE file_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,

    message_id TEXT,               -- Links to message
    snapshot_timestamp TEXT,
    is_snapshot_update BOOLEAN DEFAULT FALSE,

    -- Full snapshot data
    snapshot_data TEXT             -- JSON blob of trackedFileBackups
);

CREATE TABLE file_backup_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL REFERENCES file_snapshots(id) ON DELETE CASCADE,

    file_path TEXT NOT NULL,
    backup_file_name TEXT,
    version INTEGER,
    backup_time TEXT
);

CREATE INDEX idx_file_backups_path ON file_backup_entries(file_path);
CREATE INDEX idx_file_backups_snapshot ON file_backup_entries(snapshot_id);
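Expanding a snapshot's JSON blob into per-file rows might look like the following sketch. The exact shape of `trackedFileBackups` (a map from file path to an object with `backupFileName`, `version`, and `backupTime`) is an assumption inferred from the column names above:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_snapshots (id INTEGER PRIMARY KEY, snapshot_data TEXT);
CREATE TABLE file_backup_entries (
    id INTEGER PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES file_snapshots(id),
    file_path TEXT NOT NULL,
    backup_file_name TEXT,
    version INTEGER,
    backup_time TEXT
);
""")

# Assumed trackedFileBackups shape: {path: {backupFileName, version, backupTime}}
snapshot = {
    "/proj/README.md": {
        "backupFileName": "b1.bak",
        "version": 2,
        "backupTime": "2025-12-23T10:00:00Z",
    },
}
cur = conn.execute(
    "INSERT INTO file_snapshots (snapshot_data) VALUES (?)", (json.dumps(snapshot),)
)
snap_id = cur.lastrowid

# One row per tracked file, keeping the blob in file_snapshots for completeness.
for path, b in snapshot.items():
    conn.execute(
        "INSERT INTO file_backup_entries "
        "(snapshot_id, file_path, backup_file_name, version, backup_time) "
        "VALUES (?,?,?,?,?)",
        (snap_id, path, b["backupFileName"], b["version"], b["backupTime"]),
    )

rows = conn.execute("SELECT file_path, version FROM file_backup_entries").fetchall()
assert rows == [("/proj/README.md", 2)]
```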

7. MESSAGE_THREADING (Conversation Graph)

Extracted relationships for graph traversal.

CREATE TABLE message_threading (
    id INTEGER PRIMARY KEY AUTOINCREMENT,

    uuid TEXT NOT NULL,
    parent_uuid TEXT,
    logical_parent_uuid TEXT,
    session_id TEXT,
    entry_type TEXT,
    timestamp TEXT,

    UNIQUE(uuid)
);

CREATE INDEX idx_threading_parent ON message_threading(parent_uuid);
CREATE INDEX idx_threading_logical ON message_threading(logical_parent_uuid);
CREATE INDEX idx_threading_session ON message_threading(session_id);
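The parent links support recursive ancestor walks (leaf up to root). A small Python/sqlite3 check of the traversal direction, using a trimmed table with a three-message chain:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_threading (uuid TEXT PRIMARY KEY, parent_uuid TEXT)")
# Chain: a (root) <- b <- c (leaf)
conn.executemany("INSERT INTO message_threading VALUES (?,?)",
                 [("a", None), ("b", "a"), ("c", "b")])

# Start at the leaf and follow parent_uuid links upward.
chain = conn.execute("""
WITH RECURSIVE thread AS (
    SELECT uuid, parent_uuid FROM message_threading WHERE uuid = ?
    UNION ALL
    SELECT mt.uuid, mt.parent_uuid
    FROM message_threading mt
    JOIN thread t ON mt.uuid = t.parent_uuid
)
SELECT uuid FROM thread
""", ("c",)).fetchall()
assert chain == [("c",), ("b",), ("a",)]
```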

8. TOKEN_USAGE (Aggregated Stats)

Preserved from existing schema for cost tracking.

CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER REFERENCES messages(id) ON DELETE CASCADE,
    session_id TEXT,
    project_path TEXT,

    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_input_tokens INTEGER DEFAULT 0,
    cache_creation_input_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0.0,
    duration_ms INTEGER DEFAULT 0,
    model TEXT,
    timestamp TEXT
);

CREATE INDEX idx_token_session ON token_usage(session_id);
CREATE INDEX idx_token_project ON token_usage(project_path);
CREATE INDEX idx_token_timestamp ON token_usage(timestamp);

Field Mapping by Entry Type

user → entries + messages

| Source Field | entries | messages |
|--------------|---------|----------|
| type | entry_type | - |
| uuid | uuid | - |
| parentUuid | parent_uuid | - |
| sessionId | session_id | - |
| agentId | agent_id | - |
| cwd | cwd | - |
| gitBranch | git_branch | - |
| version | version | - |
| timestamp | timestamp | - |
| isSidechain | is_sidechain | - |
| isMeta | is_meta | - |
| userType | user_type | - |
| slug | slug | - |
| message.role | role | role |
| message.content | content | content |
| sourceToolUseID | - | source_tool_use_id |
| toolUseResult | - | tool_use_result |
| thinkingMetadata | - | thinking_metadata |
| todos | - | todos |
| isCompactSummary | - | is_compact_summary |
| isVisibleInTranscriptOnly | - | is_visible_transcript_only |
| (full entry) | raw_json | - |

assistant → entries + messages

| Source Field | entries | messages |
|--------------|---------|----------|
| type | entry_type | - |
| uuid | uuid | - |
| parentUuid | parent_uuid | - |
| sessionId | session_id | - |
| agentId | agent_id | - |
| requestId | - | request_id |
| message.id | - | message_id |
| message.model | - | model |
| message.role | role | role |
| message.content | content | content |
| message.stop_reason | - | stop_reason |
| message.stop_sequence | - | stop_sequence |
| message.usage.* | - | input_tokens, etc. |
| isApiErrorMessage | - | is_api_error |
| error | - | error |
| (full entry) | raw_json | - |

system → entries + system_events

| Source Field | entries | system_events |
|--------------|---------|---------------|
| type | entry_type | - |
| subtype | - | subtype |
| level | - | level |
| content | content | content |
| compactMetadata | - | compact_* fields |
| cause | - | cause |
| error | - | error |
| maxRetries | - | max_retries |
| retryAttempt | - | retry_attempt |
| retryInMs | - | retry_in_ms |

summary → entries + summaries

| Source Field | entries | summaries |
|--------------|---------|-----------|
| type | entry_type | - |
| summary | content | summary_text |
| leafUuid | - | leaf_uuid |

queue-operation → entries + queue_operations

| Source Field | entries | queue_operations |
|--------------|---------|------------------|
| type | entry_type | - |
| operation | - | operation |
| content | content | content |
| timestamp | timestamp | timestamp |
| sessionId | session_id | - |

file-history-snapshot → entries + file_snapshots + file_backup_entries

| Source Field | entries | file_snapshots | file_backup_entries |
|--------------|---------|----------------|---------------------|
| type | entry_type | - | - |
| messageId | - | message_id | - |
| isSnapshotUpdate | - | is_snapshot_update | - |
| snapshot.timestamp | timestamp | snapshot_timestamp | - |
| snapshot.trackedFileBackups | - | snapshot_data | (expanded rows) |

Migration Strategy

  1. Backup existing data
  2. Create new schema alongside old
  3. Re-extract all JSONL files with new extractor
  4. Populate all tables in single pass
  5. Verify counts match source
  6. Drop old tables after verification
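Step 5 can be sketched as a simple count comparison. Note this is illustrative: with hash-based deduplication, the row count may legitimately be lower than the JSONL line count when duplicate entries exist, so the check would need to account for skipped duplicates.

```python
import sqlite3

def verify_counts(conn: sqlite3.Connection, expected_lines: int) -> bool:
    """Compare the number of rows in entries against the JSONL line count."""
    (n,) = conn.execute("SELECT COUNT(*) FROM entries").fetchone()
    return n == expected_lines

# Self-contained demo with a trimmed entries table and three fake lines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, raw_json TEXT)")
conn.executemany("INSERT INTO entries (raw_json) VALUES (?)", [("{}",)] * 3)
assert verify_counts(conn, 3)
```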

Query Examples

-- Find all agent messages
SELECT e.*, m.* FROM entries e
JOIN messages m ON m.entry_id = e.id
WHERE e.agent_id IS NOT NULL;

-- Get conversation thread
WITH RECURSIVE thread AS (
    SELECT * FROM message_threading WHERE uuid = ?
    UNION ALL
    SELECT mt.* FROM message_threading mt
    JOIN thread t ON mt.uuid = t.parent_uuid
)
SELECT * FROM thread;

-- Find all summaries for recall
SELECT s.summary_text, e.session_id, e.timestamp
FROM summaries s
JOIN entries e ON s.entry_id = e.id
ORDER BY e.timestamp DESC;

-- Get file history for a path
SELECT fbe.*, fs.snapshot_timestamp
FROM file_backup_entries fbe
JOIN file_snapshots fs ON fbe.snapshot_id = fs.id
WHERE fbe.file_path LIKE '%/README.md'
ORDER BY fs.snapshot_timestamp DESC;

-- Token usage by session
SELECT session_id,
    SUM(input_tokens) AS total_input,
    SUM(output_tokens) AS total_output,
    SUM(cost_usd) AS total_cost
FROM token_usage
GROUP BY session_id;

-- Enhanced Export Analytics (v4.0)
-- Get code change statistics
SELECT
    COUNT(*) AS total_edits,
    SUM(json_extract(tool_result, '$.diff.lines_added')) AS lines_added,
    SUM(json_extract(tool_result, '$.diff.lines_removed')) AS lines_removed
FROM messages
WHERE tool_result IS NOT NULL
  AND json_extract(tool_result, '$.tool_name') = 'Edit';

-- Find failed tool invocations
SELECT
    json_extract(tool_result, '$.tool_name') AS tool,
    json_extract(tool_result, '$.status') AS status,
    json_extract(tool_result, '$.error_message') AS error,
    timestamp
FROM messages
WHERE json_extract(tool_result, '$.status') IN ('error', 'interrupted', 'truncated')
ORDER BY timestamp DESC;

-- Group messages by Claude model
SELECT
    json_extract(export_metadata, '$.model') AS model,
    COUNT(*) AS message_count
FROM messages
WHERE export_metadata IS NOT NULL
GROUP BY json_extract(export_metadata, '$.model')
ORDER BY message_count DESC;

Implementation Notes

  1. Single-pass extraction: Process each JSONL line once, insert into entries + type-specific table
  2. raw_json preservation: Store original JSON for future schema evolution
  3. Deduplication: Hash based on content + type + timestamp
  4. Foreign keys: All type-specific tables reference entries(id)
  5. FTS triggers: Auto-update FTS tables on insert/delete

Estimated Storage:

  • ~180K entries × ~2KB avg = ~360MB entries table
  • Type-specific tables: ~50MB total
  • Indexes: ~100MB
  • Total: ~500MB

Last Updated: 2026-01-04 Author: CODITECT Framework


Changelog

v1.1.0 (2026-01-04)

Enhanced Export Fields (v4.0 unified-message-extractor)

  • Added tool_use column to messages table for tool invocation tracking
  • Added tool_result column with status tracking (success/error/interrupted/truncated)
  • Added export_metadata column for export header metadata (version, model, tier)
  • Added indexes for efficient JSON field querying
  • Added SQL query examples for enhanced analytics

v1.0.0 (2025-12-23)

  • Initial schema V2 design
  • All 6 JSONL entry types captured
  • Type-specific tables for optimized queries
  • Full JSON preservation