ADR-107: Comprehensive Backup Strategy
Status
Accepted
Context
CODITECT generates and stores critical user data across multiple locations:
- Context Database (context.db) - 10GB+ of extracted session knowledge
- Unified Messages (unified_messages.jsonl) - 1.4GB+ of message history
- Claude Session Data (~/.claude/projects/) - 3GB+ of session transcripts
- Configuration Files - Settings, statusline config
- Hooks - Custom automation scripts
Previously, the backup script (backup-context-db.sh) only backed up:
- context.db, unified_messages.jsonl (context data)
- settings.json, settings.local.json, statusline-config.json (config)
This left critical data unprotected:
- ~/.claude/projects/ - 3.2GB of Claude Code session transcripts (20,000+ files)
- ~/.claude/todos/, plans/, history.jsonl - Task and execution data
- hooks/ - Custom hook scripts that enforce governance
Additionally, the backup script had a location bug: for contributors, it backed up the protected installation's context (133MB) instead of the development copy's context (10GB).
Decision
1. Expand Backup Scope
The backup script now includes all critical data:
BACKED UP FILES:
Context storage (auto-detected location):
- context.db (~10GB)
- unified_messages.jsonl (~1.4GB)
- unified_hashes.json, unified_stats.json
Claude config (~/.claude/):
- settings.json, settings.local.json
- statusline-config.json, statusline-command.sh
Claude session data (~/.claude/) [NEW]:
- projects/ (3GB+ session transcripts - CRITICAL)
- todos/ (task lists)
- plans/ (execution plans)
- history.jsonl (command history)
Framework:
- hooks/ (all hook scripts) [NEW]
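As a rough illustration of the expanded scope, the sketch below previews which of the Claude targets listed above exist under a given directory. The function name `list_backup_targets` is hypothetical. The real script keeps these names in its CLAUDE_BACKUP_DIRS and CLAUDE_BACKUP_FILES arrays, whose exact contents are not shown in this ADR, so the lists here are assumptions taken from the scope listing.

```shell
# Hypothetical preview helper, not part of backup-context-db.sh.
# Prints one line per backup target that exists under the given directory.
list_backup_targets() (
    claude_home=$1

    # Directories archived with tar + gzip (assumed from the scope listing).
    for name in projects todos plans; do
        if [ -d "$claude_home/$name" ]; then
            echo "dir  $name"
        fi
    done

    # Single files compressed with gzip (assumed from the scope listing).
    for name in history.jsonl settings.json settings.local.json \
                statusline-config.json statusline-command.sh; do
        if [ -f "$claude_home/$name" ]; then
            echo "file $name"
        fi
    done
)
```

Usage would be along the lines of `list_backup_targets "$HOME/.claude"`, which is similar in spirit to what a dry run reports.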
2. Smart Context Location Detection
For contributors who have both a development copy and protected installation, the script automatically finds the correct data:
# Priority: larger context.db wins (that's the real user data)
if [[ "$DEV_SIZE" -gt "$PROTECTED_SIZE" ]]; then
    CONTEXT_DIR="$DEV_CONTEXT_DIR"
    CONTEXT_SOURCE="development copy"
else
    CONTEXT_DIR="$PROTECTED_CONTEXT_DIR"
    CONTEXT_SOURCE="protected installation"
fi
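The excerpt above does not show how the two sizes are obtained. A minimal sketch of one way to do it follows. The helper names `file_size_bytes` and `pick_context_dir` are hypothetical, and `wc -c` is used only because it behaves the same on macOS and Linux. The actual script may measure sizes differently.

```shell
# Hypothetical helper: size of a file in bytes, or 0 when it does not exist.
file_size_bytes() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d '[:space:]'
    else
        echo 0
    fi
}

# Hypothetical helper: print whichever directory holds the larger context.db.
# A tie falls through to the protected installation, matching the else branch above.
pick_context_dir() (
    dev_dir=$1
    protected_dir=$2
    dev_size=$(file_size_bytes "$dev_dir/context.db")
    protected_size=$(file_size_bytes "$protected_dir/context.db")

    if [ "$dev_size" -gt "$protected_size" ]; then
        echo "$dev_dir"
    else
        echo "$protected_dir"
    fi
)
```

Treating a missing context.db as size 0 means a customer with only a protected installation ends up on the else branch without any special casing.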
3. Compression Strategy
| Data Type | Method | Typical Compression |
|---|---|---|
| context.db | SQLite snapshot + gzip | 75-80% |
| .jsonl files | gzip | 85-90% |
| Directories | tar + gzip | 80-85% |
Example compression results:
- projects/ (3.2GB) → 594MB (81% reduction)
- context.db (10GB) → ~2.5GB (75% reduction)
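The three methods in the table could be sketched as follows. All function names are hypothetical, and the SQLite step assumes the `sqlite3` command-line tool is installed and that its `.backup` dot-command is what produces the snapshot. The ADR does not say how the script takes the snapshot.

```shell
# Hypothetical helpers illustrating the table above; not the script's own functions.

# context.db: take a consistent snapshot first, then gzip the snapshot.
# $2 is expected to end in .gz (for example /tmp/stage/context.db.gz).
snapshot_and_compress_db() (
    src=$1
    out=$2
    snapshot=${out%.gz}
    sqlite3 "$src" ".backup '$snapshot'" && gzip -f "$snapshot"
)

# .jsonl and other single files: plain gzip, original left in place.
compress_file() {
    gzip -c "$1" > "$2"
}

# Directories: tar + gzip, with -C so the archive stores relative paths.
archive_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Size reduction as a whole-number percentage, given before and after sizes
# in the same unit.
reduction_percent() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (1 - after / before) * 100 }'
}
```

With the figures above, `reduction_percent 3200 594` prints 81 and `reduction_percent 10000 2500` prints 75.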
4. GCS Bucket Structure
gs://PROJECT_ID-context-backups/
└── coditect-core/
├── LATEST # Marker file with latest timestamp
└── YYYY-MM-DD/
└── HH-MM-SS/
├── context.db.gz
├── unified_messages.jsonl.gz
├── unified_hashes.json.gz
├── unified_stats.json.gz
├── claude-config/
│ ├── settings.json.gz
│ ├── settings.local.json.gz
│ ├── statusline-config.json.gz
│ └── statusline-command.sh.gz
├── claude-data/
│ ├── projects.tar.gz # CRITICAL - session transcripts
│ ├── todos.tar.gz
│ ├── plans.tar.gz
│ └── history.jsonl.gz
└── hooks.tar.gz
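A sketch of how a run might map onto this layout is shown below. The function names are hypothetical, and the contents of the LATEST marker (written here as `YYYY-MM-DD/HH-MM-SS`) and the use of local time are assumptions. `gsutil cp` is used for the upload, and the Google Cloud SDK is assumed to be installed and authenticated.

```shell
# Hypothetical helper: destination prefix for one backup run.
# $1 = bucket name, $2 = run stamp in the form YYYY-MM-DD/HH-MM-SS
backup_prefix() {
    printf 'gs://%s/coditect-core/%s\n' "$1" "$2"
}

# Hypothetical helper: upload a staging directory and update the LATEST marker.
# $1 = local staging directory holding the .gz and .tar.gz files, $2 = bucket name
upload_backup() (
    staging_dir=$1
    bucket=$2
    # One date call so the date and time parts cannot straddle midnight.
    stamp=$(date +%Y-%m-%d/%H-%M-%S)
    dest=$(backup_prefix "$bucket" "$stamp")

    gsutil -m cp -r "$staging_dir"/* "$dest/" &&
        printf '%s\n' "$stamp" | gsutil cp - "gs://$bucket/coditect-core/LATEST"
)
```

Writing LATEST only after the upload succeeds means a failed run never points the marker at an incomplete backup.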
5. Backup Commands
# Full backup (all data)
~/.coditect/scripts/backup-context-db.sh
# Dry run (preview)
~/.coditect/scripts/backup-context-db.sh --dry-run
# Status check
~/.coditect/scripts/backup-context-db.sh --status
# Restore from backup
~/.coditect/scripts/backup-context-db.sh --restore latest
~/.coditect/scripts/backup-context-db.sh --restore 2026-01-24
6. Retention Policy (GFS)
Implements Grandfather-Father-Son (GFS) backup rotation:
| Tier | Description | Retention | Backups Kept | Est. Size |
|---|---|---|---|---|
| Son (Daily) | Most recent recovery points | 7 days | 7 | ~23 GB |
| Father (Weekly) | Last backup of each week | 4 weeks | 4 | ~13 GB |
| Grandfather (Monthly) | Last backup of each month | 12 months | 12 | ~40 GB |
| Total | | | 23 | ~76 GB |
Comparison to flat retention:
- Previous: 90 days flat = 90 backups = ~300 GB
- GFS: 7 + 4 + 12 = 23 backups = ~76 GB (75% storage savings)
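The comparison can be reproduced with a few lines of arithmetic. The sketch assumes a per-backup size of about 3.3 GB, which is the compressed total from the size estimates in the Notes section. The function names are illustrative only.

```shell
# Number of backups kept under GFS: daily + weekly + monthly.
gfs_backup_count() {
    echo $(( $1 + $2 + $3 ))
}

# Approximate storage in GB for n backups of a given size (GB, may be fractional).
storage_gb() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.0f\n", n * size }'
}

# Savings of one retention scheme relative to another, as a percentage.
savings_percent() {
    awk -v kept="$1" -v flat="$2" \
        'BEGIN { printf "%.0f\n", (1 - kept / flat) * 100 }'
}

count=$(gfs_backup_count 7 4 12)   # 23 backups
gfs_gb=$(storage_gb "$count" 3.3)  # about 76 GB
flat_gb=$(storage_gb 90 3.3)       # about 297 GB, rounded to ~300 GB above
```

Because every backup is assumed to be the same size, the savings depend only on the backup counts: `savings_percent 23 90` gives 74, and the rounded figures `savings_percent 76 300` give 75.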
GCS Lifecycle Rules:
{
"lifecycle": {
"rule": [
{"action": {"type": "Delete"}, "condition": {"age": 7, "matchesPrefix": ["coditect-core/daily/"]}},
{"action": {"type": "Delete"}, "condition": {"age": 28, "matchesPrefix": ["coditect-core/weekly/"]}},
{"action": {"type": "Delete"}, "condition": {"age": 365, "matchesPrefix": ["coditect-core/monthly/"]}}
]
}
}
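One way these rules might be applied, assuming the JSON above is saved to a local file and `gsutil` is installed with permission to modify the bucket. The function name `apply_lifecycle` and the file name in the usage note are hypothetical. `gsutil lifecycle set` and `gsutil lifecycle get` are the standard subcommands for this.

```shell
# Hypothetical helper: apply a lifecycle config file to a bucket, then print
# the configuration the bucket reports back.
# $1 = path to the lifecycle JSON file, $2 = bucket name (without gs://)
apply_lifecycle() {
    if [ ! -f "$1" ]; then
        echo "missing lifecycle config: $1" >&2
        return 1
    fi
    gsutil lifecycle set "$1" "gs://$2" &&
        gsutil lifecycle get "gs://$2"
}
```

For example: `apply_lifecycle lifecycle.json PROJECT_ID-context-backups`. Note that the `age` condition counts days since each object was created, so a backup copied into weekly/ or monthly/ starts a fresh clock at the time of the copy.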
Bucket Structure (Updated):
gs://PROJECT_ID-context-backups/
└── coditect-core/
├── daily/YYYY-MM-DD/HH-MM-SS/... # Kept 7 days
├── weekly/YYYY-WW/... # Kept 4 weeks
└── monthly/YYYY-MM/... # Kept 12 months
Promotion Logic:
- Last daily backup of the week → promoted to weekly
- Last weekly backup of the month → promoted to monthly
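The first promotion rule can be sketched as a filter over a sorted list of daily backup dates. This is an illustration under several assumptions: the `YYYY-WW` key is taken to be the ISO week (`%G-%V`), the helper names are hypothetical, and `week_key` tries GNU `date -d` first and falls back to the BSD/macOS form.

```shell
# Hypothetical helper: ISO week key (YYYY-WW) for a YYYY-MM-DD date.
# Tries GNU date first, then the BSD/macOS syntax.
week_key() {
    date -d "$1" +%G-%V 2>/dev/null || date -j -f %Y-%m-%d "$1" +%G-%V
}

# Hypothetical helper: read sorted YYYY-MM-DD dates on stdin and print
# "<week key> <date>" for the last daily backup seen in each week.
# The final line may belong to a week that is still in progress.
last_daily_per_week() {
    prev_key=""
    prev_date=""
    while IFS= read -r day; do
        key=$(week_key "$day")
        if [ -n "$prev_key" ] && [ "$key" != "$prev_key" ]; then
            printf '%s %s\n' "$prev_key" "$prev_date"
        fi
        prev_key=$key
        prev_date=$day
    done
    if [ -n "$prev_key" ]; then
        printf '%s %s\n' "$prev_key" "$prev_date"
    fi
}
```

Each output line names a daily backup that would be copied to `weekly/<week key>/`. The same grouping, keyed on `YYYY-MM` instead of the week, would give the monthly promotion.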
Local backups before restore: Preserved in .restore-backup-TIMESTAMP/
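A minimal sketch of that safety copy follows. The function name, the timestamp format, and the choice to place the directory next to the files being restored are all assumptions. The ADR only specifies the `.restore-backup-TIMESTAMP/` naming.

```shell
# Hypothetical helper: copy the current versions of the named files or
# directories into <target_dir>/.restore-backup-<timestamp>/ before a restore
# overwrites them. Prints the path of the directory it created.
# $1 = directory being restored into, remaining args = names inside it
preserve_before_restore() (
    target_dir=$1
    shift
    stamp=$(date +%Y%m%d-%H%M%S)
    keep_dir="$target_dir/.restore-backup-$stamp"

    mkdir -p "$keep_dir" || exit 1
    for name in "$@"; do
        # Skip anything that does not exist yet (nothing to preserve).
        if [ -e "$target_dir/$name" ]; then
            cp -Rp "$target_dir/$name" "$keep_dir/" || exit 1
        fi
    done
    echo "$keep_dir"
)
```

Copying rather than moving keeps the live files in place until the restore actually replaces them.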
Consequences
Positive
- Complete data protection - All critical user data is now backed up
- Automatic location detection - Works for both contributors and customers
- Efficient storage - 75-85% compression reduces storage costs
- Quick recovery - Full restore capability from any backup point
- Hooks preserved - Governance enforcement survives reinstallation
Negative
- Larger backup size - Full backup is now ~4GB compressed vs ~50MB before
- Longer backup time - 3-5 minutes for full backup vs 30 seconds before
- GCS costs - More storage used (~4GB × 90 days = ~360GB under flat 90-day retention)
Mitigations
- Incremental backups - Future enhancement: only backup changed files
- GFS tiered retention - ✅ IMPLEMENTED: Daily 7d, Weekly 4w, Monthly 12m
- Selective restore - Can restore individual components (context, sessions, hooks)
Outstanding Gaps (MoE Analysis 2026-01-25)
- P0 - Data loss window: 24-hour backup frequency means up to 24 hours of work could be lost
  - Mitigation: Add mid-day backup (12:00 PM) for critical periods
  - Alternative: Implement continuous sync for high-value tables (skill_learnings, decisions)
- P1 - Backup failure alerting: No notification when backups fail
  - Mitigation: Add Slack/email webhook on backup failure
  - Script exit code should trigger launchd failure notification
- P2 - Redundant data: unified_hashes.json (2 MB) is derivable from JSONL
  - Recommendation: Exclude from backup, regenerate on restore
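For the P1 gap, one possible shape for failure alerting is a thin wrapper around the backup command. Everything here is a sketch: `run_with_alert`, `notify_failure`, and the `BACKUP_ALERT_WEBHOOK` environment variable are hypothetical names, and the JSON body follows the Slack incoming-webhook convention of a single `text` field.

```shell
# Hypothetical notifier: POST a short message to a webhook if one is configured.
# $1 = exit code of the failed backup
notify_failure() {
    # BACKUP_ALERT_WEBHOOK is an assumed variable name; no alert is sent without it.
    [ -n "${BACKUP_ALERT_WEBHOOK:-}" ] || return 0
    curl -fsS -X POST -H 'Content-Type: application/json' \
        -d "{\"text\": \"CODITECT backup failed with exit code $1\"}" \
        "$BACKUP_ALERT_WEBHOOK" > /dev/null
}

# Hypothetical wrapper: run any command, alert on failure, and pass the
# original exit code through so cron or launchd still sees the failure.
run_with_alert() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        notify_failure "$status" || echo "alert delivery failed" >&2
    fi
    return "$status"
}
```

A scheduled job would then call `run_with_alert ~/.coditect/scripts/backup-context-db.sh` instead of invoking the script directly. Preserving the exit code keeps the launchd-based notification mentioned above possible alongside the webhook.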
Implementation
Files Modified
- scripts/backup-context-db.sh - Main backup script
  - Added CLAUDE_BACKUP_DIRS and CLAUDE_BACKUP_FILES arrays
  - Added hooks directory backup
  - Added smart context location detection
  - Updated prepare_backup_files() for directory archiving
  - Updated run_backup() for new uploads
  - Updated restore_backup() for directory extraction
  - Updated show_status() to display all backup targets
Commit History
- 8aa833af - feat(backup): Add Claude session data and hooks to GCS backup
- 1c1bf140 - fix(backup): Find correct context-storage location automatically
Related ADRs
- ADR-053 - Cloud Context Sync Architecture
- ADR-057 - Initial Setup Architecture
- ADR-080 - Two-Database Separation
Notes
Backup Size Estimates
| Component | Uncompressed | Compressed |
|---|---|---|
| context.db | 10GB | ~2.5GB |
| unified_messages.jsonl | 1.4GB | ~200MB |
| projects/ | 3.2GB | ~600MB |
| hooks/ | 10MB | ~2MB |
| Config files | 50KB | ~10KB |
| Total | ~15GB | ~3.3GB |
Recommended Backup Schedule
- Manual: Before major changes, after significant work sessions
- Automated: Daily via cron or launchd (future enhancement)
# Example crontab entry (future)
0 3 * * * ~/.coditect/scripts/backup-context-db.sh >> ~/.coditect/logs/backup.log 2>&1