ADR-118: Four-Tier Database Architecture
Status
ACCEPTED (2026-01-28)
Context
Problem Statement
CODITECT's original database architecture evolved organically, resulting in a single monolithic context.db that contained:
- Component metadata (agents, skills, commands)
- Session messages (251K+ rows)
- Tool analytics (8.2M+ rows)
- Token economics (17.8M+ rows)
- Accumulated knowledge (decisions, learnings, error_solutions)
This caused several problems:
| Problem | Impact |
|---|---|
| Single point of failure | All data lost if context.db corrupts |
| Backup complexity | Can't prioritize critical vs regenerable data |
| Size bloat | 12+ GB makes backup slow and expensive |
| Framework updates | Risk overwriting customer data |
| Recovery difficulty | No way to rebuild without full backup |
Previous Decisions
- ADR-089: Separated component data into
platform.db(read-only) - ADR-103: Conceptualized four-database split but never fully implemented
Key Insight
Not all data has equal value:
| Data Type | Replaceability | Example |
|---|---|---|
| Irreplaceable | Cannot be regenerated | Decisions, learnings, error_solutions |
| Regenerable | Can rebuild from source | Messages (from JSONL), analytics |
| Derived | Computed from other data | Embeddings, indexes |
Decision
Four-Tier Database Architecture
Split data into four tiers based on criticality and regenerability:
┌─────────────────────────────────────────────────────────────────┐
│ TIER 1: PLATFORM (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ platform.db │ │
│ │ • Component metadata (agents, skills, commands) │ │
│ │ • Capabilities, invocation patterns │ │
│ │ • Regenerable via: python3 scripts/component-indexer.py │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate in ~2 minutes) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 2: ORGANIZATION (Critical) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ org.db ⚠️ CRITICAL │ │
│ │ • skill_learnings (758K+ rows) - Accumulated improvements│ │
│ │ • decisions (1.8K+ rows) - Architectural decisions │ │
│ │ • error_solutions (475+ rows) - Problem-solution pairs │ │
│ │ • CANNOT BE REGENERATED - represents months of learning │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: DAILY with Grandfather-Father-Son (GFS) retention │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 3: SESSIONS (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ sessions.db │ │
│ │ • messages (251K+ rows) - Extracted from JSONL │ │
│ │ • tool_analytics (8.2M+ rows) - Tool usage metrics │ │
│ │ • token_economics (17.8M+ rows) - Token tracking │ │
│ │ • activity_associations - Project/track mappings │ │
│ │ • Regenerable via: python3 scripts/unified-message- │ │
│ │ extractor.py (source: unified_messages.jsonl) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: OPTIONAL (can regenerate from unified_messages.jsonl) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 4: PROJECTS (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ projects.db │ │
│ │ • projects - Registered project metadata │ │
│ │ • content_hashes - File change detection │ │
│ │ • project_embeddings - Semantic search vectors │ │
│ │ • exclude_patterns - Per-project ignore rules │ │
│ │ • project_tags, project_activity │ │
│ │ • Regenerable via: /cx --index-project <path> │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: OPTIONAL (regenerable from project source files) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 1b: PLATFORM INDEX (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ platform-index.db │ │
│ │ • component_embeddings - Vector embeddings (384-dim) │ │
│ │ • component_hashes - SHA256 change detection │ │
│ │ • embedding_models - Model metadata │ │
│ │ • Regenerable via: /cx --reindex-framework │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate from platform.db content) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 3b: MESSAGING (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ messaging.db (ADR-160) │ │
│ │ • session_registry - Active session tracking │ │
│ │ • inter_session_messages - Cross-session communication │ │
│ │ • file_locks - Distributed file locking │ │
│ │ • task_claims - Task ownership coordination │ │
│ │ • WAL mode enabled for concurrent access │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (ephemeral coordination data) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIER 1c: CALL GRAPH (Regenerable) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ call_graph.db │ │
│ │ • functions - Indexed function definitions │ │
│ │ • calls - Function call relationships │ │
│ │ • files - Source file metadata │ │
│ │ • Regenerable via: MCP call-graph index_directory │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Backup: NOT REQUIRED (regenerate from source code) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ LEGACY (Deprecated) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ context.db ❌ DEPRECATED │ │
│ │ • Being migrated to org.db + sessions.db │ │
│ │ • DO NOT USE in new code │ │
│ │ • Will be removed after migration complete │ │
│ └─────────────────────────────────────────────────────────┘ │
│ Status: MIGRATION IN PROGRESS │
└─────────────────────────────────────────────────────────────────┘
Database Specifications
| Tier | Database | Size | Key Tables | Backup | Recovery |
|---|---|---|---|---|---|
| 1 | platform.db | ~40 MB | components (+component_origin), capabilities | None | Regenerate (~2 min) |
| 1b | platform-index.db | ~100 MB | component_embeddings, component_hashes | None | Regenerate via /cx --reindex-framework |
| 1c | call_graph.db | Variable | functions, calls, files | None | Regenerate via MCP index |
| 2 | org.db | ~250 MB | skill_learnings, decisions, error_solutions | CRITICAL | Restore from backup |
| 3 | sessions.db | ~11 GB | messages, tool_analytics, token_economics, activity_associations | Optional | Regenerate via /cx |
| 3b | messaging.db | ~1 MB | session_registry, inter_session_messages, file_locks, task_claims | None | Ephemeral (recreate) |
| 4 | projects.db | Variable | projects, content_hashes, project_embeddings, exclude_patterns | Optional | Regenerate via /cx --index-project |
| LEGACY | context.db | ~12 GB | (migrating) | N/A | DO NOT USE |
Location
All databases stored in: ~/.coditect-data/context-storage/
(Per ADR-114 User Data Separation)
Import Pattern
from scripts.core.paths import (
get_org_db_path, # TIER 2: Critical knowledge
get_sessions_db_path, # TIER 3: Regenerable sessions
get_projects_db_path, # TIER 4: Project indexing
get_context_db_path, # LEGACY: Deprecated, do not use
)
# Correct usage
ORG_DB = get_org_db_path() # For decisions, learnings
SESSIONS_DB = get_sessions_db_path() # For messages, activities
PROJECTS_DB = get_projects_db_path() # For project metadata, embeddings
# Other databases (resolved from CODITECT_DATA_DIR/context-storage/)
PLATFORM_DB = CODITECT_DATA_DIR / 'context-storage' / 'platform.db'
PLATFORM_INDEX_DB = CODITECT_DATA_DIR / 'context-storage' / 'platform-index.db'
MESSAGING_DB = CODITECT_DATA_DIR / 'context-storage' / 'messaging.db'
CALL_GRAPH_DB = CODITECT_DATA_DIR / 'context-storage' / 'call_graph.db'
# WRONG - Never use context.db in new code
# CONTEXT_DB = get_context_db_path() # DEPRECATED
Component Origin Tracking (ADR-180)
The platform.db components table includes a component_origin column to distinguish
the source of each component:
| Value | Source | Description |
|---|---|---|
system | core/agents/, core/commands/, etc. | Built-in framework components |
product | products/{name}/ | Product-provided components |
user | user-components/ (via _user symlinks) | User-created components |
The component-indexer.py populates this column by scanning all three locations.
Migration Status
| Table | Source | Target | Status |
|---|---|---|---|
components | context.db | platform.db | ✅ Complete |
capabilities | context.db | platform.db | ✅ Complete |
skill_learnings | context.db | org.db | ✅ Complete |
decisions | context.db | org.db | ✅ Complete |
error_solutions | context.db | org.db | ✅ Complete |
messages | context.db | sessions.db | ✅ Complete |
tool_analytics | context.db | sessions.db | ✅ Complete |
token_economics | context.db | sessions.db | ✅ Complete |
activity_associations | (new) | sessions.db | ✅ Created (J.14) |
Consequences
Positive
- Targeted Backups: Only backup critical data (org.db ~250MB vs 12GB)
- Fast Recovery: Regenerate Tier 1 and Tier 3 from source files
- Data Safety: Critical knowledge protected separately
- Smaller Footprint: Daily backups are 250MB not 12GB
- Clear Ownership: Know which database owns which data
Negative
- Migration Effort: One-time migration of existing data
- Code Updates: Scripts must use correct database
- Join Complexity: Cross-database queries not possible
Risks
| Risk | Mitigation |
|---|---|
| Code uses wrong database | Deprecation warnings, linting |
| Migration data loss | Pre-migration backup, validation |
| Split-brain state | Transaction boundaries at database level |
Implementation
Phase 1: Schema Creation (Complete)
-- org.db tables (TIER 2)
CREATE TABLE skill_learnings (...);
CREATE TABLE decisions (...);
CREATE TABLE error_solutions (...);
-- sessions.db tables (TIER 3)
CREATE TABLE messages (...);
CREATE TABLE tool_analytics (...);
CREATE TABLE token_economics (...);
CREATE TABLE activity_associations (...); -- J.14
Phase 2: Data Migration (Complete)
# Migrate critical data to org.db
python3 scripts/migrate-to-org-db.py
# Migrate session data to sessions.db
python3 scripts/migrate-to-sessions-db.py
Phase 3: Code Updates (Complete)
All scripts updated to use correct tier-specific imports.
Phase 4: Deprecation (Complete)
get_context_db_path() emits deprecation warnings. context.db is no longer created by the install script.
Phase 5: Install Script Integration (ADR-180)
The install script (v3.0.0) creates all databases during installation:
| Install Step | Database(s) | Method |
|---|---|---|
| Step 5 | org.db, sessions.db, messaging.db | initialize_database() — schema only, no seed data |
| Step 13 | platform.db | initialize_platform_database() — runs component-indexer.py |
| Step 14 | platform-index.db | initialize_platform_index_database() — schema + default model config |
| Step 15 | projects.db | initialize_projects_database() — schema only, projects registered interactively |
Clean seeding (ADR-180 D7): All databases are created with schema only. No pre-populated decisions, messages, or personal data. The only non-schema inserts are:
schema_versionsrecords (metadata about the schema itself)embedding_modelsdefault record (model configuration)
call_graph.db is created on-demand by the MCP call-graph tools, not during installation.
Backup Strategy
Tier 2 (org.db) - CRITICAL
# Daily backup with GFS retention
# Grandfather (monthly): 12 months
# Father (weekly): 4 weeks
# Son (daily): 7 days
~/.coditect/scripts/backup-org-db.sh
# → gs://coditect-backups/org.db/daily/YYYY-MM-DD/
Tier 3 (sessions.db) - Optional
# Weekly backup (optional, regenerable)
~/.coditect/scripts/backup-sessions-db.sh
# → gs://coditect-backups/sessions.db/weekly/
Recovery Procedures
| Scenario | Recovery |
|---|---|
| org.db corrupt | Restore from latest GCS backup |
| sessions.db corrupt | Regenerate: python3 scripts/unified-message-extractor.py --rebuild |
| messaging.db corrupt | Delete and re-run install (ephemeral coordination data) |
| platform.db corrupt | Regenerate: python3 scripts/component-indexer.py |
| platform-index.db corrupt | Regenerate: /cx --reindex-framework |
| call_graph.db corrupt | Delete and re-index via MCP call-graph tools |
| projects.db corrupt | Regenerate: python3 scripts/init_projects_db.py then re-index projects |
Related
- ADR-080: Context Extraction System (
/cx) - ADR-089: Component-Customer Data Separation (superseded)
- ADR-103: Four-Database Architecture (superseded, never implemented)
- ADR-114: User Data Separation from Framework
- ADR-121: Path Discovery Consolidation
- ADR-148: Database Schema Documentation Standard (extends with field-level docs)
- ADR-149: Query Language Evolution Strategy (extends with query capabilities)
Changelog
| Date | Change |
|---|---|
| 2026-01-28 | Initial version - Four-Tier architecture |
| 2026-01-28 | Added activity_associations to sessions.db (J.14) |
| 2026-01-28 | Added TIER 4: projects.db for project indexing (J.15.1.3) |
| 2026-02-12 | ADR-180: Added messaging.db (Tier 3b), platform-index.db (Tier 1b), call_graph.db (Tier 1c) |
| 2026-02-12 | ADR-180: Added component_origin column to platform.db (system/product/user) |
| 2026-02-12 | ADR-180: Documented install script Phase 5 - clean database seeding |
| 2026-02-12 | Updated Phase 3/4 status to Complete, added recovery procedures for all databases |
Track: J.15 (Memory - Project Database) Task: J.15.1.3, N.6.15.2.3 (Install Script Update)