
ADR-114: User Data Separation from Framework

Status

Accepted

Context

Problem Statement

Session logs, context storage, and machine-id.json currently live inside the coditect-core directory:

```text
~/Library/Application Support/CODITECT/core/
├── agents/            # Framework code
├── skills/            # Framework code
├── scripts/           # Framework code
├── session-logs/      # ❌ User data mixed with framework
├── context-storage/   # ❌ User data mixed with framework
└── machine-id.json    # ❌ User data mixed with framework
```

This causes several issues:

  1. Sync conflicts - When syncing framework from GitHub, user data must be carefully preserved
  2. Git pollution - Session logs were accidentally committed to GitHub
  3. Atomic updates impossible - Can't simply replace core/ directory without losing user data
  4. Unclear ownership - Framework updates might overwrite user data

Root Cause

ADR-057 (Initial Setup) placed all CODITECT files in a single directory for simplicity. ADR-058 (Machine-Specific Session Logs) added machine-specific data but kept it in the same location.

Requirements

  1. Clean separation - Framework code separate from user data
  2. Atomic framework updates - Replace entire core/ without affecting user data
  3. Cross-platform - Works on macOS, Linux, Windows
  4. Backward compatible - Migrate existing installations
  5. Simple symlinks - ~/.coditect should still work for framework access

Decision

Separate user data into a dedicated directory in PROJECTS, completely independent from the protected framework:

```text
~/Library/Application Support/CODITECT/
└── core/                      # Framework ONLY (synced from GitHub)
    ├── agents/
    ├── commands/
    ├── skills/
    ├── scripts/
    ├── hooks/
    ├── config/
    └── ...

~/PROJECTS/
├── .coditect/ → ~/Library/.../CODITECT/core/   (symlink)
└── .coditect-data/            # User data (REAL directory, NOT symlink)
    ├── machine-id.json        # Machine identification
    ├── session-logs/ → ../coditect-rollout-master/docs/session-logs   (for /sync-logs)
    ├── context-storage/       # SQLite databases, JSONL files (CRITICAL)
    │   ├── context.db         # Customer sessions, messages, decisions
    │   ├── platform.db        # Component index
    │   ├── unified_messages.jsonl
    │   └── exports-*/         # Export archives
    └── backups/               # Local backup staging
```
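The key property of this layout is that `.coditect-data/` is a real directory, while `.coditect` is only a symlink into the framework. The following is a minimal sketch of how setup code might enforce that. The helper name `ensure_user_data_dir` is hypothetical and is not part of the existing scripts. The sketch also skips `session-logs/`, because that entry is itself a symlink into `coditect-rollout-master`.

```python
from pathlib import Path

# Real subdirectories from the layout above. session-logs/ is left out because
# it is a symlink into coditect-rollout-master, not a directory we own here.
USER_DATA_SUBDIRS = ("context-storage", "backups")


def ensure_user_data_dir(projects_dir: Path) -> Path:
    """Create <projects_dir>/.coditect-data as a real directory (hypothetical helper)."""
    data_dir = projects_dir / ".coditect-data"
    if data_dir.is_symlink():
        # User data must never sit behind a symlink that could point into core/.
        raise RuntimeError(f"{data_dir} is a symlink; expected a real directory")
    data_dir.mkdir(parents=True, exist_ok=True)
    for name in USER_DATA_SUBDIRS:
        (data_dir / name).mkdir(exist_ok=True)
    return data_dir
```

The function is idempotent, so setup can call it on every run without harm.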

Why PROJECTS instead of Library/Application Support?

| Concern | `Library/.../CODITECT/data/` | `~/PROJECTS/.coditect-data/` |
|---------|------------------------------|------------------------------|
| Isolation from framework | Same parent dir as `core/` | Completely separate |
| Backup visibility | Hidden in Library | Visible in PROJECTS |
| Corruption risk | Could be affected by `core/` operations | Not touched by `core/` operations |
| User ownership | System-level feel | User-level, clearly owned |
| GCS backup path | Complex path | Simple: `~/PROJECTS/.coditect-data/` |

Note: Session logs are synced to GitHub separately, via the /sync-logs command. They are NOT part of the framework sync (git-push-sync). This keeps framework updates clean while still preserving session log history.

CRITICAL: Context storage contains irreplaceable customer data. It MUST be backed up to GCS regularly and MUST NOT be affected by any framework operations.
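`context.db` and `platform.db` are SQLite databases. Copying the raw files while a session is writing to them can produce an inconsistent backup. The following is a minimal sketch of staging a consistent snapshot in `backups/` using Python's built-in `sqlite3.Connection.backup`. The function name `snapshot_db` and the timestamped file naming are assumptions, and the GCS upload step is not shown.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_db(db_path: Path, backups_dir: Path) -> Path:
    """Write a consistent copy of a SQLite database into backups_dir (hypothetical helper)."""
    if not db_path.exists():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(db_path)
    backups_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backups_dir / f"{db_path.stem}-{stamp}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # The online backup API copies pages under SQLite's own locking,
        # so the snapshot is transactionally consistent.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```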


### Platform Paths

| Platform | Framework | User Data |
|----------|-----------|-----------|
| **macOS** | `~/Library/Application Support/CODITECT/core/` | `$CODITECT_PROJECTS/.coditect-data/` |
| **Linux** | `~/.local/share/coditect/core/` | `$CODITECT_PROJECTS/.coditect-data/` |
| **Windows** | `%LOCALAPPDATA%\CODITECT\core\` | `%CODITECT_PROJECTS%\.coditect-data\` |

### Configurable PROJECTS Location

CODITECT customers can choose their own PROJECTS directory location. The user data location
is discovered dynamically rather than hardcoded to `~/PROJECTS`.

**Discovery Priority:**

1. **Environment variable** (highest priority): `$CODITECT_PROJECTS`
2. **Config file**: `~/.coditect/config/config.json` → `projects_dir`
3. **Symlink discovery**: Find parent directory of existing `.coditect` symlink
4. **Default fallback**: `~/PROJECTS`

**Discovery Algorithm (Python):**
```python
import json
import os
from pathlib import Path

HOME = Path.home()


def discover_projects_dir() -> Path:
    """Discover the PROJECTS directory location."""
    # 1. Environment variable (highest priority)
    if env_projects := os.environ.get("CODITECT_PROJECTS"):
        return Path(env_projects).expanduser()

    # 2. Config file
    config_path = HOME / ".coditect" / "config" / "config.json"
    if config_path.exists():
        with open(config_path) as f:
            config = json.load(f)
        if projects_dir := config.get("projects_dir"):
            return Path(projects_dir).expanduser()

    # 3. Symlink discovery - find where the .coditect symlink lives
    for candidate in [HOME / "PROJECTS", HOME / "projects", HOME / "Dev",
                      HOME / "dev", HOME / "Development", HOME / "Code"]:
        if (candidate / ".coditect").is_symlink():
            return candidate

    # 4. Default fallback
    return HOME / "PROJECTS"
```

Configuration Example (~/.coditect/config/config.json):

```json
{
  "projects_dir": "~/Development",
  "cloud_sync": {
    "enabled": true,
    "api_url": "https://api.coditect.ai"
  }
}
```
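When initial setup records the chosen PROJECTS location, it should not clobber other settings such as `cloud_sync`. The following is a minimal sketch using a hypothetical `save_projects_dir` helper. It merges the key into any existing JSON. It then writes through a temporary file in the same directory, followed by `os.replace`, so an interrupted write never leaves a half-written `config.json`.

```python
import json
import os
import tempfile
from pathlib import Path


def save_projects_dir(config_path: Path, projects_dir: str) -> dict:
    """Set "projects_dir" in config.json, keeping every other key (hypothetical helper)."""
    config = {}
    if config_path.exists():
        with open(config_path) as f:
            config = json.load(f)
    config["projects_dir"] = projects_dir

    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file next to the target, then atomically replace it.
    fd, tmp_name = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f, indent=2)
            f.write("\n")
        os.replace(tmp_name, config_path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return config
```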

Impact on Architecture:

| Concern | Impact | Mitigation |
|---------|--------|------------|
| Path hardcoding | Scripts can't assume `~/PROJECTS` | All scripts use `discover_projects_dir()` |
| Migration | Existing installs may vary | Migration script discovers, doesn't assume |
| GCS backups | Backup path varies | Backup script reads same config |
| Symlink creation | Parent dir varies | Initial setup prompts for PROJECTS location |
| Documentation | Examples may confuse | Use `$CODITECT_PROJECTS` placeholder |
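A check can back the first mitigation by flagging scripts that still hardcode the default location instead of calling `discover_projects_dir()`. The following is a rough sketch. It assumes the two patterns below (`HOME / "PROJECTS"` and a literal `"~/PROJECTS`) are the common forms. The helper name and the regular expression are illustrative only and are not an existing CODITECT tool.

```python
import re
from pathlib import Path

# Matches HOME / "PROJECTS", Path.home() / 'PROJECTS', and "~/PROJECTS..." literals.
HARDCODED = re.compile(r"""(/\s*["']PROJECTS["'])|(["']~/PROJECTS)""")


def find_hardcoded_projects(root: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line text) for every suspicious line under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if HARDCODED.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

The default fallback inside the shared paths module itself (`return HOME / "PROJECTS"`) will match. A real check would need to allowlist that one file.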

Shared Module: All scripts import the discovery function from a shared module:

```python
from scripts.core.paths import discover_projects_dir, get_user_data_dir

PROJECTS_DIR = discover_projects_dir()
USER_DATA_LOC = get_user_data_dir()  # PROJECTS_DIR / ".coditect-data"
```

Directory Architecture

```text
# Framework (protected, synced from GitHub)
~/.coditect           → ~/Library/.../CODITECT/core/   # Backward compat
~/PROJECTS/.coditect  → ~/Library/.../CODITECT/core/   # Primary access

# User Data (user-owned, backed up to GCS)
~/PROJECTS/.coditect-data/                             # REAL directory (not symlink)
~/.coditect-data      → ~/PROJECTS/.coditect-data/     # Backward compat
```
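The three symlinks in this architecture can be created idempotently. The following is a minimal sketch. It assumes hypothetical helpers named `link_if_missing` and `create_layout_links`, and assumes the caller has already resolved the framework and PROJECTS locations. It requires Python 3.9+ for `Path.readlink`. On Windows, creating symlinks may require Developer Mode or administrator rights.

```python
from pathlib import Path


def link_if_missing(link: Path, target: Path) -> None:
    """Create link -> target unless an identical symlink already exists."""
    if link.is_symlink():
        if link.readlink() == target:
            return  # already correct; nothing to do
        raise FileExistsError(f"{link} points to {link.readlink()}, not {target}")
    if link.exists():
        # A real file or directory here may be user content; never replace it.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)


def create_layout_links(home: Path, projects_dir: Path, framework_loc: Path) -> None:
    """Create the framework and backward-compatibility links (hypothetical helper)."""
    user_data = projects_dir / ".coditect-data"
    link_if_missing(home / ".coditect", framework_loc)          # backward compat
    link_if_missing(projects_dir / ".coditect", framework_loc)  # primary access
    link_if_missing(home / ".coditect-data", user_data)         # backward compat
```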

Migration Steps

  1. Detect existing user data in core/
  2. Create $CODITECT_PROJECTS/.coditect-data/ if it does not exist
  3. Move files atomically:
    • core/machine-id.json → .coditect-data/machine-id.json
    • core/session-logs/ → .coditect-data/session-logs/
    • core/context-storage/ → .coditect-data/context-storage/
  4. Create temporary compatibility symlinks in core/ pointing to .coditect-data/
  5. Update script references to use the new paths
  6. Remove the symlinks after the transition period
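Steps 1 through 4 might look like the following minimal sketch. It assumes `core/` and the user data directory are on the same filesystem, so `Path.rename` is an atomic move. Across volumes, `shutil.move` would be needed, and that is not atomic. The function name `migrate_user_data` is hypothetical, and `scripts/migrate-user-data.py` may be structured differently.

```python
from pathlib import Path

# User-data items that previously lived inside core/
USER_DATA_ITEMS = ("machine-id.json", "session-logs", "context-storage")


def migrate_user_data(core: Path, data: Path) -> list[str]:
    """Move user data out of core/ and leave temporary compatibility symlinks."""
    data.mkdir(parents=True, exist_ok=True)                   # step 2
    moved = []
    for name in USER_DATA_ITEMS:
        src, dst = core / name, data / name
        if src.is_symlink() or not src.exists():              # step 1: migrated or absent
            continue
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")
        src.rename(dst)                                       # step 3: atomic on one filesystem
        src.symlink_to(dst, target_is_directory=dst.is_dir())  # step 4
        moved.append(name)
    return moved
```

Running the function a second time is a no-op, because each item in `core/` is by then a symlink.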

Code Changes

Path constants (Python):

```python
# Before (all user data in the protected location)
SESSION_LOGS = PROTECTED_LOC / "session-logs"
CONTEXT_STORAGE = PROTECTED_LOC / "context-storage"
MACHINE_ID = PROTECTED_LOC / "machine-id.json"
```

```python
import os
import sys
from pathlib import Path

from scripts.core.paths import discover_projects_dir

HOME = Path.home()

# After (user data in PROJECTS, framework in protected location)
PROJECTS_DIR = discover_projects_dir()  # env var, config, symlink, then ~/PROJECTS
USER_DATA_LOC = PROJECTS_DIR / ".coditect-data"

# Framework (synced from GitHub)
if sys.platform == "darwin":
    FRAMEWORK_LOC = HOME / "Library" / "Application Support" / "CODITECT" / "core"
elif sys.platform == "win32":
    # Fall back to the standard %LOCALAPPDATA% location if the variable is unset
    local_appdata = os.environ.get("LOCALAPPDATA", HOME / "AppData" / "Local")
    FRAMEWORK_LOC = Path(local_appdata) / "CODITECT" / "core"
else:
    FRAMEWORK_LOC = HOME / ".local" / "share" / "coditect" / "core"

# User data (in PROJECTS, backed up to GCS)
SESSION_LOGS = USER_DATA_LOC / "session-logs"
CONTEXT_STORAGE = USER_DATA_LOC / "context-storage"
MACHINE_ID = USER_DATA_LOC / "machine-id.json"
```

Framework sync simplification:

```python
# Before: complex preservation logic
def atomic_sync():
    # Clone new
    # Preserve session-logs
    # Preserve context-storage
    # Preserve machine-id.json
    # Atomic swap
    # Restore preserved files
    ...
```

```python
# After: simple replacement
def atomic_sync():
    # Clone new
    # Remove .git
    # Atomic swap core/
    # Done - user data untouched in .coditect-data/
    ...
```
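The simplified flow can be made runnable as a minimal sketch under a few assumptions. First, the new framework tree has already been cloned into a staging directory next to `core/`, on the same filesystem. Second, the function uses the hypothetical name `swap_in_framework`. Third, Python 3.9+ is available for `Path.is_relative_to`. POSIX `rename` cannot replace a non-empty directory, so the swap is done as two renames with rollback, and `core/` is briefly absent between them.

```python
import shutil
from pathlib import Path


def swap_in_framework(staging: Path, core: Path, user_data: Path) -> None:
    """Replace core/ with a freshly cloned tree in staging; user_data is never touched."""
    # Guard: refuse to run if user data lives inside a tree this function modifies.
    for tree in (staging, core):
        if user_data.resolve().is_relative_to(tree.resolve()):
            raise RuntimeError(f"user data {user_data} is inside {tree}")

    # Remove .git so the installed framework is not a working copy.
    shutil.rmtree(staging / ".git", ignore_errors=True)

    old = core.with_name(core.name + ".old")
    if old.exists():
        shutil.rmtree(old)  # leftover from an interrupted earlier sync
    core.rename(old)
    try:
        staging.rename(core)
    except OSError:
        old.rename(core)  # roll back to the previous framework on failure
        raise
    shutil.rmtree(old)  # the previous framework is disposable
```

The sync never needs to know which files are user data. It only needs to check that none of them live under `core/`.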

Consequences

Positive

  1. Clean separation - Framework and user data never mixed
  2. Simple sync - Just replace core/ directory
  3. No preservation logic - User data stays in .coditect-data/
  4. Atomic updates - Swap core/ without risk to user data
  5. Clear ownership - Framework team owns core/, user owns .coditect-data/

Negative

  1. Migration required - Existing installations need update
  2. Two directories - Slightly more complex structure
  3. Symlink updates - Some scripts may need path updates

Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Migration fails | Low | Data loss | Backup before migration |
| Scripts use old paths | Medium | Errors | Compatibility symlinks |
| User confusion | Low | Support tickets | Clear documentation |
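To apply the first mitigation, the migration script can copy the user-data items aside before moving anything. The following is a minimal sketch. It assumes a hypothetical `backup_user_data` helper that writes to a timestamped directory under a caller-chosen backup root. Items that are already symlinks are skipped, because their real data has already moved.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

USER_DATA_ITEMS = ("machine-id.json", "session-logs", "context-storage")


def backup_user_data(core: Path, backup_root: Path) -> Path:
    """Copy user-data items from core/ into a new timestamped backup directory."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest = backup_root / f"pre-migration-{stamp}"
    dest.mkdir(parents=True)  # fails loudly rather than overwriting an old backup
    for name in USER_DATA_ITEMS:
        src = core / name
        if src.is_symlink():
            continue  # already migrated; the real data lives elsewhere
        if src.is_dir():
            # symlinks=True copies links inside the tree as links, not their targets
            shutil.copytree(src, dest / name, symlinks=True)
        elif src.is_file():
            shutil.copy2(src, dest / name)
    return dest
```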

Implementation

Phase 1: Create Migration Script (This Session)

  • Create ADR-114
  • Create scripts/migrate-user-data.py
  • Update CODITECT-CORE-INITIAL-SETUP.py

Phase 2: Update Scripts

  • Update git-push-sync.py to use new paths
  • Update framework-sync.py to use new paths
  • Update unified-message-extractor.py
  • Update context watcher paths

Phase 3: Documentation

  • Update CLAUDE.md with new paths
  • Update ADR-057 with reference to ADR-114
  • Update ADR-058 with reference to ADR-114

Glossary

| Term | Definition |
|------|------------|
| Framework | CODITECT core code: agents, skills, commands, scripts (synced from GitHub) |
| User Data | Machine-specific files: session-logs, context-storage, machine-id.json |
| Protected Installation | Read-only framework at platform-specific location |
| Atomic Swap | Replace entire directory in a single operation |
| Compatibility Symlinks | Temporary symlinks in old location pointing to new location |


ADR-114 | Created: 2026-01-25 | Status: Accepted