Skip to main content

ADR-118: Four-Tier Database Architecture

Status

ACCEPTED (2026-01-28)

Context

Problem Statement

CODITECT's original database architecture evolved organically, resulting in a single monolithic context.db that contained:

  • Component metadata (agents, skills, commands)
  • Session messages (251K+ rows)
  • Tool analytics (8.2M+ rows)
  • Token economics (17.8M+ rows)
  • Accumulated knowledge (decisions, learnings, error_solutions)

This caused several problems:

ProblemImpact
Single point of failureAll data lost if context.db corrupts
Backup complexityCan't prioritize critical vs regenerable data
Size bloat12+ GB makes backup slow and expensive
Framework updatesRisk overwriting customer data
Recovery difficultyNo way to rebuild without full backup

Previous Decisions

  • ADR-089: Separated component data into platform.db (read-only)
  • ADR-103: Conceptualized four-database split but never fully implemented

Key Insight

Not all data has equal value:

Data TypeReplaceabilityExample
IrreplaceableCannot be regeneratedDecisions, learnings, error_solutions
RegenerableCan rebuild from sourceMessages (from JSONL), analytics
DerivedComputed from other dataEmbeddings, indexes

Decision

Four-Tier Database Architecture

Split data into four tiers based on criticality and regenerability:

┌─────────────────────────────────────────────────────────────────┐
│ TIER 1: PLATFORM (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ platform.db │ │
│ │ • Component metadata (agents, skills, commands) │ │
│ │ • Capabilities, invocation patterns │ │
│ │ • Regenerable via: python3 scripts/component-indexer.py │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate in ~2 minutes) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 2: ORGANIZATION (Critical) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ org.db ⚠️ CRITICAL │ │
│ │ • skill_learnings (758K+ rows) - Accumulated improvements│ │
│ │ • decisions (1.8K+ rows) - Architectural decisions │ │
│ │ • error_solutions (475+ rows) - Problem-solution pairs │ │
│ │ • CANNOT BE REGENERATED - represents months of learning │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: DAILY with Grandfather-Father-Son (GFS) retention │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 3: SESSIONS (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ sessions.db │ │
│ │ • messages (251K+ rows) - Extracted from JSONL │ │
│ │ • tool_analytics (8.2M+ rows) - Tool usage metrics │ │
│ │ • token_economics (17.8M+ rows) - Token tracking │ │
│ │ • activity_associations - Project/track mappings │ │
│ │ • Regenerable via: python3 scripts/unified-message- │ │
│ │ extractor.py (source: unified_messages.jsonl) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: OPTIONAL (can regenerate from unified_messages.jsonl) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 4: PROJECTS (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ projects.db │ │
│ │ • projects - Registered project metadata │ │
│ │ • content_hashes - File change detection │ │
│ │ • project_embeddings - Semantic search vectors │ │
│ │ • exclude_patterns - Per-project ignore rules │ │
│ │ • project_tags, project_activity │ │
│ │ • Regenerable via: /cx --index-project <path> │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: OPTIONAL (regenerable from project source files) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 1b: PLATFORM INDEX (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ platform-index.db │ │
│ │ • component_embeddings - Vector embeddings (384-dim) │ │
│ │ • component_hashes - SHA256 change detection │ │
│ │ • embedding_models - Model metadata │ │
│ │ • Regenerable via: /cx --reindex-framework │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate from platform.db content) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 3b: MESSAGING (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ messaging.db (ADR-160) │ │
│ │ • session_registry - Active session tracking │ │
│ │ • inter_session_messages - Cross-session communication │ │
│ │ • file_locks - Distributed file locking │ │
│ │ • task_claims - Task ownership coordination │ │
│ │ • WAL mode enabled for concurrent access │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (ephemeral coordination data) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ TIER 1c: CALL GRAPH (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ call_graph.db │ │
│ │ • functions - Indexed function definitions │ │
│ │ • calls - Function call relationships │ │
│ │ • files - Source file metadata │ │
│ │ • Regenerable via: MCP call-graph index_directory │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate from source code) │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ LEGACY (Deprecated) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ context.db ❌ DEPRECATED │ │
│ │ • Being migrated to org.db + sessions.db │ │
│ │ • DO NOT USE in new code │ │
│ │ • Will be removed after migration complete │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Status: MIGRATION IN PROGRESS │
└─────────────────────────────────────────────────────────────────┘

Database Specifications

TierDatabaseSizeKey TablesBackupRecovery
1platform.db~40 MBcomponents (+component_origin), capabilitiesNoneRegenerate (~2 min)
1bplatform-index.db~100 MBcomponent_embeddings, component_hashesNoneRegenerate via /cx --reindex-framework
1ccall_graph.dbVariablefunctions, calls, filesNoneRegenerate via MCP index
2org.db~250 MBskill_learnings, decisions, error_solutionsCRITICALRestore from backup
3sessions.db~11 GBmessages, tool_analytics, token_economics, activity_associationsOptionalRegenerate via /cx
3bmessaging.db~1 MBsession_registry, inter_session_messages, file_locks, task_claimsNoneEphemeral (recreate)
4projects.dbVariableprojects, content_hashes, project_embeddings, exclude_patternsOptionalRegenerate via /cx --index-project
LEGACYcontext.db~12 GB(migrating)N/ADO NOT USE

Location

All databases stored in: ~/.coditect-data/context-storage/

(Per ADR-114 User Data Separation)

Import Pattern

from scripts.core.paths import (
get_org_db_path, # TIER 2: Critical knowledge
get_sessions_db_path, # TIER 3: Regenerable sessions
get_projects_db_path, # TIER 4: Project indexing
get_context_db_path, # LEGACY: Deprecated, do not use
)

# Correct usage
ORG_DB = get_org_db_path() # For decisions, learnings
SESSIONS_DB = get_sessions_db_path() # For messages, activities
PROJECTS_DB = get_projects_db_path() # For project metadata, embeddings

# Other databases (resolved from CODITECT_DATA_DIR/context-storage/)
PLATFORM_DB = CODITECT_DATA_DIR / 'context-storage' / 'platform.db'
PLATFORM_INDEX_DB = CODITECT_DATA_DIR / 'context-storage' / 'platform-index.db'
MESSAGING_DB = CODITECT_DATA_DIR / 'context-storage' / 'messaging.db'
CALL_GRAPH_DB = CODITECT_DATA_DIR / 'context-storage' / 'call_graph.db'

# WRONG - Never use context.db in new code
# CONTEXT_DB = get_context_db_path() # DEPRECATED

Component Origin Tracking (ADR-180)

The platform.db components table includes a component_origin column to distinguish the source of each component:

ValueSourceDescription
systemcore/agents/, core/commands/, etc.Built-in framework components
productproducts/{name}/Product-provided components
useruser-components/ (via _user symlinks)User-created components

The component-indexer.py populates this column by scanning all three locations.

Migration Status

TableSourceTargetStatus
componentscontext.dbplatform.db✅ Complete
capabilitiescontext.dbplatform.db✅ Complete
skill_learningscontext.dborg.db✅ Complete
decisionscontext.dborg.db✅ Complete
error_solutionscontext.dborg.db✅ Complete
messagescontext.dbsessions.db✅ Complete
tool_analyticscontext.dbsessions.db✅ Complete
token_economicscontext.dbsessions.db✅ Complete
activity_associations(new)sessions.db✅ Created (J.14)

Consequences

Positive

  1. Targeted Backups: Only backup critical data (org.db ~250MB vs 12GB)
  2. Fast Recovery: Regenerate Tier 1 and Tier 3 from source files
  3. Data Safety: Critical knowledge protected separately
  4. Smaller Footprint: Daily backups are 250MB not 12GB
  5. Clear Ownership: Know which database owns which data

Negative

  1. Migration Effort: One-time migration of existing data
  2. Code Updates: Scripts must use correct database
  3. Join Complexity: Cross-database queries not possible

Risks

RiskMitigation
Code uses wrong databaseDeprecation warnings, linting
Migration data lossPre-migration backup, validation
Split-brain stateTransaction boundaries at database level

Implementation

Phase 1: Schema Creation (Complete)

-- org.db tables (TIER 2)
CREATE TABLE skill_learnings (...);
CREATE TABLE decisions (...);
CREATE TABLE error_solutions (...);

-- sessions.db tables (TIER 3)
CREATE TABLE messages (...);
CREATE TABLE tool_analytics (...);
CREATE TABLE token_economics (...);
CREATE TABLE activity_associations (...); -- J.14

Phase 2: Data Migration (Complete)

# Migrate critical data to org.db
python3 scripts/migrate-to-org-db.py

# Migrate session data to sessions.db
python3 scripts/migrate-to-sessions-db.py

Phase 3: Code Updates (Complete)

All scripts updated to use correct tier-specific imports.

Phase 4: Deprecation (Complete)

get_context_db_path() emits deprecation warnings. context.db is no longer created by the install script.

Phase 5: Install Script Integration (ADR-180)

The install script (v3.0.0) creates all databases during installation:

Install StepDatabase(s)Method
Step 5org.db, sessions.db, messaging.dbinitialize_database() — schema only, no seed data
Step 13platform.dbinitialize_platform_database() — runs component-indexer.py
Step 14platform-index.dbinitialize_platform_index_database() — schema + default model config
Step 15projects.dbinitialize_projects_database() — schema only, projects registered interactively

Clean seeding (ADR-180 D7): All databases are created with schema only. No pre-populated decisions, messages, or personal data. The only non-schema inserts are:

  • schema_versions records (metadata about the schema itself)
  • embedding_models default record (model configuration)

call_graph.db is created on-demand by the MCP call-graph tools, not during installation.

Backup Strategy

Tier 2 (org.db) - CRITICAL

# Daily backup with GFS retention
# Grandfather (monthly): 12 months
# Father (weekly): 4 weeks
# Son (daily): 7 days

~/.coditect/scripts/backup-org-db.sh
# → gs://coditect-backups/org.db/daily/YYYY-MM-DD/

Tier 3 (sessions.db) - Optional

# Weekly backup (optional, regenerable)
~/.coditect/scripts/backup-sessions-db.sh
# → gs://coditect-backups/sessions.db/weekly/

Recovery Procedures

ScenarioRecovery
org.db corruptRestore from latest GCS backup
sessions.db corruptRegenerate: python3 scripts/unified-message-extractor.py --rebuild
messaging.db corruptDelete and re-run install (ephemeral coordination data)
platform.db corruptRegenerate: python3 scripts/component-indexer.py
platform-index.db corruptRegenerate: /cx --reindex-framework
call_graph.db corruptDelete and re-index via MCP call-graph tools
projects.db corruptRegenerate: python3 scripts/init_projects_db.py then re-index projects
  • ADR-080: Context Extraction System (/cx)
  • ADR-089: Component-Customer Data Separation (superseded)
  • ADR-103: Four-Database Architecture (superseded, never implemented)
  • ADR-114: User Data Separation from Framework
  • ADR-121: Path Discovery Consolidation
  • ADR-148: Database Schema Documentation Standard (extends with field-level docs)
  • ADR-149: Query Language Evolution Strategy (extends with query capabilities)

Changelog

DateChange
2026-01-28Initial version - Four-Tier architecture
2026-01-28Added activity_associations to sessions.db (J.14)
2026-01-28Added TIER 4: projects.db for project indexing (J.15.1.3)
2026-02-12ADR-180: Added messaging.db (Tier 3b), platform-index.db (Tier 1b), call_graph.db (Tier 1c)
2026-02-12ADR-180: Added component_origin column to platform.db (system/product/user)
2026-02-12ADR-180: Documented install script Phase 5 - clean database seeding
2026-02-12Updated Phase 3/4 status to Complete, added recovery procedures for all databases

Track: J.15 (Memory - Project Database) Task: J.15.1.3, N.6.15.2.3 (Install Script Update)