ADR-114: User Data Separation from Framework
Status
Accepted
Context
Problem Statement
Session logs, context storage, and machine-id.json currently live inside the coditect-core directory:
```text
~/Library/Application Support/CODITECT/core/
├── agents/            # Framework code
├── skills/            # Framework code
├── scripts/           # Framework code
├── session-logs/      # ❌ User data mixed with framework
├── context-storage/   # ❌ User data mixed with framework
└── machine-id.json    # ❌ User data mixed with framework
```
This causes several issues:
- Sync conflicts - When syncing framework from GitHub, user data must be carefully preserved
- Git pollution - Session logs were accidentally committed to GitHub
- Atomic updates impossible - Can't simply replace the `core/` directory without losing user data
- Unclear ownership - Framework updates might overwrite user data
Root Cause
ADR-057 (Initial Setup) placed all CODITECT files in a single directory for simplicity. ADR-058 (Machine-Specific Session Logs) added machine-specific data but kept it in the same location.
Requirements
- Clean separation - Framework code separate from user data
- Atomic framework updates - Replace entire `core/` without affecting user data
- Cross-platform - Works on macOS, Linux, Windows
- Backward compatible - Migrate existing installations
- Simple symlinks - `~/.coditect` should still work for framework access
Decision
Separate user data into a dedicated directory in PROJECTS, completely independent from the protected framework:
```text
~/Library/Application Support/CODITECT/
└── core/                        # Framework ONLY (synced from GitHub)
    ├── agents/
    ├── commands/
    ├── skills/
    ├── scripts/
    ├── hooks/
    ├── config/
    └── ...

~/PROJECTS/
├── .coditect/ → ~/Library/.../CODITECT/core/   (symlink)
├── .coditect-data/              # User data (REAL directory, NOT symlink)
│   ├── machine-id.json          # Machine identification
│   ├── session-logs/ → ../coditect-rollout-master/docs/session-logs   (for /sync-logs)
│   ├── context-storage/         # SQLite databases, JSONL files (CRITICAL)
│   │   ├── context.db           # Customer sessions, messages, decisions
│   │   ├── platform.db          # Component index
│   │   ├── unified_messages.jsonl
│   │   └── exports-*/           # Export archives
│   └── backups/                 # Local backup staging
```
Why PROJECTS instead of Library/Application Support?
| Concern | Library/.../CODITECT/data/ | ~/PROJECTS/.coditect-data/ |
|---|---|---|
| Isolation from framework | Same parent dir as core/ | Completely separate |
| Backup visibility | Hidden in Library | Visible in PROJECTS |
| Corruption risk | Could be affected by core/ operations | Zero risk |
| User ownership | System-level feel | User-level, clearly owned |
| GCS backup path | Complex path | Simple: ~/PROJECTS/.coditect-data/ |
Note: Session logs are synced to GitHub separately via the `/sync-logs` command, but they are NOT part of the framework sync (`git-push-sync`). This keeps framework updates clean while still preserving session log history.
CRITICAL: Context storage contains irreplaceable customer data. It MUST be backed up to GCS regularly and MUST NOT be affected by any framework operations.
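One way to stage a consistent copy of `context.db` in `backups/` before upload is SQLite's online backup API, which copies a database safely even while another process has it open. The sketch below is only an illustration, not the project's backup script: the function name `stage_sqlite_backup` and the timestamped file naming are assumptions, and the GCS upload step is deliberately left out.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def stage_sqlite_backup(db_path: Path, backups_dir: Path) -> Path:
    """Copy a SQLite database into backups_dir as a consistent snapshot.

    Hypothetical helper: name and file naming scheme are illustrative.
    """
    backups_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backups_dir / f"{db_path.stem}-{stamp}.db"

    # Open the source read-only so staging can never modify customer data.
    source = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)  # SQLite online backup: copies all pages
    finally:
        dest.close()
        source.close()
    return target
```

A caller would pass paths derived from the user data directory, for example `stage_sqlite_backup(data_dir / "context-storage" / "context.db", data_dir / "backups")`, and then hand the returned file to whatever upload tool is in use.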
### Platform Paths
| Platform | Framework | User Data |
|----------|-----------|-----------|
| **macOS** | `~/Library/Application Support/CODITECT/core/` | `$CODITECT_PROJECTS/.coditect-data/` |
| **Linux** | `~/.local/share/coditect/core/` | `$CODITECT_PROJECTS/.coditect-data/` |
| **Windows** | `%LOCALAPPDATA%\CODITECT\core\` | `%CODITECT_PROJECTS%\.coditect-data\` |
### Configurable PROJECTS Location
CODITECT customers can choose their own PROJECTS directory location. The user data location
is discovered dynamically rather than hardcoded to `~/PROJECTS`.
**Discovery Priority:**
1. **Environment variable** (highest priority): `$CODITECT_PROJECTS`
2. **Config file**: `~/.coditect/config/config.json` → `projects_dir`
3. **Symlink discovery**: Find parent directory of existing `.coditect` symlink
4. **Default fallback**: `~/PROJECTS`
**Discovery Algorithm (Python):**
```python
import json
import os
from pathlib import Path

HOME = Path.home()


def discover_projects_dir() -> Path:
    """Discover the PROJECTS directory location."""
    # 1. Environment variable (highest priority)
    if env_projects := os.environ.get("CODITECT_PROJECTS"):
        return Path(env_projects).expanduser()

    # 2. Config file
    config_path = HOME / ".coditect" / "config" / "config.json"
    if config_path.exists():
        with open(config_path) as f:
            config = json.load(f)
        if projects_dir := config.get("projects_dir"):
            return Path(projects_dir).expanduser()

    # 3. Symlink discovery - find where .coditect symlink lives
    for candidate in [HOME / "PROJECTS", HOME / "projects", HOME / "Dev",
                      HOME / "dev", HOME / "Development", HOME / "Code"]:
        if (candidate / ".coditect").is_symlink():
            return candidate

    # 4. Default fallback
    return HOME / "PROJECTS"
```
**Configuration Example (`~/.coditect/config/config.json`):**
```json
{
  "projects_dir": "~/Development",
  "cloud_sync": {
    "enabled": true,
    "api_url": "https://api.coditect.ai"
  }
}
```
**Impact on Architecture:**
| Concern | Impact | Mitigation |
|---|---|---|
| Path hardcoding | Scripts can't assume ~/PROJECTS | All scripts use discover_projects_dir() |
| Migration | Existing installs may vary | Migration script discovers, doesn't assume |
| GCS backups | Backup path varies | Backup script reads same config |
| Symlink creation | Parent dir varies | Initial setup prompts for PROJECTS location |
| Documentation | Examples may confuse | Use $CODITECT_PROJECTS placeholder |
**Shared Module:** All scripts import the discovery functions from a shared module:
```python
from scripts.core.paths import discover_projects_dir, get_user_data_dir

PROJECTS_DIR = discover_projects_dir()
USER_DATA_LOC = get_user_data_dir()  # PROJECTS_DIR / ".coditect-data"
```
Directory Architecture
```text
# Framework (protected, synced from GitHub)
~/.coditect            → ~/Library/.../CODITECT/core/   # Backward compat
~/PROJECTS/.coditect   → ~/Library/.../CODITECT/core/   # Primary access

# User Data (user-owned, backed up to GCS)
~/PROJECTS/.coditect-data/                              # REAL directory (not symlink)
~/.coditect-data       → ~/PROJECTS/.coditect-data/     # Backward compat
```
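The layout above could be created idempotently along the following lines. This is a sketch under stated assumptions: `ensure_symlink` and `link_layout` are hypothetical names, the caller supplies the already-resolved home, PROJECTS, and framework paths, and the sketch refuses to overwrite anything that is not already a symlink.

```python
from pathlib import Path


def ensure_symlink(link: Path, target: Path) -> None:
    """Create `link`, or repoint it, so that it resolves to `target`."""
    if link.is_symlink():
        if link.resolve() == target.resolve():
            return  # already points at the right place
        link.unlink()  # stale symlink: replace it
    elif link.exists():
        # A real file or directory is in the way; never delete it silently.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)


def link_layout(home: Path, projects_dir: Path, framework_dir: Path) -> None:
    """Set up the framework symlinks and the real user data directory."""
    user_data = projects_dir / ".coditect-data"
    user_data.mkdir(parents=True, exist_ok=True)           # REAL directory
    ensure_symlink(projects_dir / ".coditect", framework_dir)  # primary access
    ensure_symlink(home / ".coditect", framework_dir)          # backward compat
    ensure_symlink(home / ".coditect-data", user_data)         # backward compat
```

Running `link_layout` a second time changes nothing, which makes it safe to call from both initial setup and migration.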
Migration Steps
In the steps below, `data/` is shorthand for the user data directory (`$CODITECT_PROJECTS/.coditect-data/`).
- Detect existing data in `core/`
- Create the `data/` directory if it does not exist
- Move files atomically:
  - `core/machine-id.json` → `data/machine-id.json`
  - `core/session-logs/` → `data/session-logs/`
  - `core/context-storage/` → `data/context-storage/`
- Create compatibility symlinks in `core/` pointing to `data/` (temporary)
- Update references in scripts to use new paths
- Remove symlinks after transition period
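The move-and-symlink steps can be sketched as follows. This is not the contents of `scripts/migrate-user-data.py`; the function name `migrate_user_data` and its return value are assumptions made for illustration. Note that `shutil.move` is only a single rename when source and target are on the same filesystem; across filesystems it falls back to copy-then-delete, so a backup beforehand remains advisable.

```python
import shutil
from pathlib import Path

# The three user data items named in the migration steps.
USER_DATA_ITEMS = ("machine-id.json", "session-logs", "context-storage")


def migrate_user_data(core_dir: Path, data_dir: Path) -> list[str]:
    """Move user data out of core_dir, leaving compatibility symlinks behind.

    Returns the names of the items that were moved in this run.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in USER_DATA_ITEMS:
        source = core_dir / name
        target = data_dir / name
        # Skip items that are absent or already migrated (now symlinks).
        if source.is_symlink() or not source.exists():
            continue
        if target.exists() or target.is_symlink():
            # Never merge or overwrite existing user data automatically.
            raise FileExistsError(f"{target} already exists; resolve manually")
        shutil.move(str(source), str(target))
        # Temporary compatibility symlink so old paths keep working.
        source.symlink_to(target, target_is_directory=target.is_dir())
        moved.append(name)
    return moved
```

Because already-migrated items are skipped, the function can be re-run safely after a partial failure.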
Code Changes
**Path constants (Python):**

Before (all in the protected location, where `PROTECTED_LOC` is the existing protected installation path):
```python
SESSION_LOGS = PROTECTED_LOC / "session-logs"
CONTEXT_STORAGE = PROTECTED_LOC / "context-storage"
MACHINE_ID = PROTECTED_LOC / "machine-id.json"
```

After (user data in PROJECTS, framework in the protected location):
```python
import os
import sys
from pathlib import Path

from scripts.core.paths import discover_projects_dir

HOME = Path.home()

# User data root: discovered rather than hardcoded to ~/PROJECTS
PROJECTS_DIR = discover_projects_dir()
USER_DATA_LOC = PROJECTS_DIR / ".coditect-data"

# Framework (synced from GitHub)
if sys.platform == "darwin":
    FRAMEWORK_LOC = HOME / "Library" / "Application Support" / "CODITECT" / "core"
elif sys.platform == "win32":
    # Index directly: a missing LOCALAPPDATA raises a clear KeyError
    # instead of Path(None) failing with a TypeError.
    FRAMEWORK_LOC = Path(os.environ["LOCALAPPDATA"]) / "CODITECT" / "core"
else:
    FRAMEWORK_LOC = HOME / ".local" / "share" / "coditect" / "core"

# User data (in PROJECTS, backed up to GCS)
SESSION_LOGS = USER_DATA_LOC / "session-logs"
CONTEXT_STORAGE = USER_DATA_LOC / "context-storage"
MACHINE_ID = USER_DATA_LOC / "machine-id.json"
```
**Framework sync simplification:**
```python
# Before: complex preservation logic
def atomic_sync():
    # Clone new
    # Preserve session-logs
    # Preserve context-storage
    # Preserve machine-id.json
    # Atomic swap
    # Restore preserved files
    ...


# After: simple replacement
def atomic_sync():
    # Clone new
    # Remove .git
    # Atomic swap core/
    # Done - user data untouched in data/
    ...
```
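The "After" outline could look roughly like the sketch below, which assumes the fresh clone already exists on the same filesystem as `core/`. The name `swap_core` and the `.old` suffix are assumptions, not the behavior of `framework-sync.py`. Each rename is atomic on POSIX within one filesystem, but the two renames together are not a single atomic operation: there is a brief window in which `core/` does not exist, and a fully atomic switch would need a different technique such as repointing a symlink.

```python
import os
import shutil
from pathlib import Path


def swap_core(new_core: Path, core_dir: Path) -> None:
    """Replace core_dir with new_core using directory renames."""
    # "Remove .git": the installed framework carries no Git metadata.
    shutil.rmtree(new_core / ".git", ignore_errors=True)

    old_core = core_dir.with_name(core_dir.name + ".old")
    if old_core.exists():
        shutil.rmtree(old_core)  # leftover from an earlier interrupted run

    if core_dir.exists():
        os.rename(core_dir, old_core)      # move current framework aside
    try:
        os.rename(new_core, core_dir)      # move new framework into place
    except OSError:
        if old_core.exists():
            os.rename(old_core, core_dir)  # roll back to the previous version
        raise
    shutil.rmtree(old_core, ignore_errors=True)
```

No user data is read, copied, or restored here, which is the point of the separation: the swap only ever touches `core/` and its sibling staging directories.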
Consequences
Positive
- Clean separation - Framework and user data never mixed
- Simple sync - Just replace the `core/` directory
- No preservation logic - User data stays in `data/`
- Atomic updates - Swap `core/` without risk to user data
- Clear ownership - Framework team owns `core/`, user owns `data/`
Negative
- Migration required - Existing installations need update
- Two directories - Slightly more complex structure
- Symlink updates - Some scripts may need path updates
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Migration fails | Low | Data loss | Backup before migration |
| Scripts use old paths | Medium | Errors | Compatibility symlinks |
| User confusion | Low | Support tickets | Clear documentation |
Implementation
Phase 1: Create Migration Script (This Session)
- Create ADR-114
- Create `scripts/migrate-user-data.py`
- Update `CODITECT-CORE-INITIAL-SETUP.py`
Phase 2: Update Scripts
- Update `git-push-sync.py` to use new paths
- Update `framework-sync.py` to use new paths
- Update `unified-message-extractor.py`
- Update context watcher paths
Phase 3: Documentation
- Update CLAUDE.md with new paths
- Update ADR-057 with reference to ADR-114
- Update ADR-058 with reference to ADR-114
Glossary
| Term | Definition |
|---|---|
| Framework | CODITECT core code: agents, skills, commands, scripts (synced from GitHub) |
| User Data | Machine-specific files: session-logs, context-storage, machine-id.json |
| Protected Installation | Read-only framework at platform-specific location |
| Atomic Swap | Replace entire directory in single operation |
| Compatibility Symlinks | Temporary symlinks in old location pointing to new location |
References
- ADR-057: CODITECT Core Initial Setup
- ADR-058: Machine-Specific Session Logs
- ADR-113: Post-Push Protected Installation Sync
ADR-114 | Created: 2026-01-25 | Status: Accepted