
ADR-004: Symlink Resolution Strategy

Status: Accepted
Date: 2025-11-30
Deciders: Architecture Team, Product Team
Tags: licensing, symlinks, session-id, fairness


Context

CODITECT-CORE is distributed as a git submodule that uses symlink chains for distributed intelligence across multiple projects and submodules. This creates a licensing challenge: how do we fairly count sessions when the same physical .coditect/ directory is accessed via multiple symlink paths?

Typical Project Structure:

coditect-rollout-master/               # Master project
├── .coditect/ # Physical directory (git submodule)
│ ├── scripts/init.sh # License enforcement point
│ ├── agents/ # 52 specialized agents
│ └── sdk/license_client.py # License SDK
├── .claude -> .coditect # Symlink for Claude Code

├── submodules/cloud/coditect-cloud-backend/
│ ├── .coditect -> ../../.coditect # Symlink to parent
│ └── .claude -> .coditect # Symlink chain

├── submodules/cloud/coditect-cloud-frontend/
│ ├── .coditect -> ../../.coditect # SAME RESOLVED PATH
│ └── .claude -> .coditect

└── submodules/core/coditect-core/
    ├── .coditect -> ../../.coditect # SAME RESOLVED PATH
    └── .claude -> .coditect

The Licensing Dilemma

Scenario 1: Naive Path-Based Session ID

# BAD: Using CWD or symlink path
session_id = hash(os.getcwd()) # Different for each submodule

Result:
- Parent: cwd=/Users/dev/coditect-rollout-master → session_1
- Backend: cwd=/Users/dev/.../coditect-cloud-backend → session_2
- Frontend: cwd=/Users/dev/.../coditect-cloud-frontend → session_3

Problem: Same developer, same project, 3 license seats consumed ❌

Scenario 2: Physical Path Resolution

# GOOD: Resolve symlinks to physical path
coditect_path = os.path.realpath('.coditect')
session_id = hash(coditect_path + project_root)

Result:
- Parent: realpath=/Users/dev/coditect-rollout-master/.coditect → session_1
- Backend: realpath=/Users/dev/coditect-rollout-master/.coditect → session_1
- Frontend: realpath=/Users/dev/coditect-rollout-master/.coditect → session_1

Solution: Same developer, same project, 1 license seat consumed ✅

Business Requirements

Fair Pricing:

  • 1 developer working on 1 project = 1 license seat
  • Multiple symlinks to same physical .coditect/ = 1 session
  • Multiple projects (different git repositories) = different sessions

Prevent Abuse:

  • Copying .coditect/ to bypass licensing = NOT allowed
  • Sharing license across different hardware = NOT allowed
  • Same hardware, different projects = separate sessions (fair)

Technical Constraints:

  • Must work on Linux, macOS, Windows (cross-platform)
  • Must work in Docker containers (volume mounts)
  • Must handle broken symlinks gracefully (don't crash)
  • Must be deterministic (same input = same session_id)
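The snippets in the Context section use hash(...) as shorthand. Python's built-in hash() is salted per process for strings unless PYTHONHASHSEED is fixed, so on its own it would not meet the determinism constraint across runs. A minimal sketch of a stable alternative follows; the helper name stable_digest is hypothetical and not part of CODITECT.

```python
import hashlib
import json


def stable_digest(*parts: str) -> str:
    """Return a SHA-256 hex digest that is identical across processes and runs."""
    # JSON-encode the list so ("ab", "c") and ("a", "bc") cannot collide
    payload = json.dumps(list(parts)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Example paths are illustrative only
    print(stable_digest("/Users/dev/master/.coditect", "/Users/dev/master"))
```

The Decision section takes the same approach, hashing a JSON document with hashlib.sha256.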

User Experience Requirements

Developer Expectations:

Scenario A: Monorepo with Submodules
- Project: coditect-rollout-master (master + 46 submodules)
- Expected Seats: 1 (all symlinks point to same .coditect)
- Developer Expectation: "I'm working on ONE project"

Scenario B: Multiple Independent Projects
- Project 1: client-app-1 (with .coditect)
- Project 2: client-app-2 (with .coditect, different git repo)
- Expected Seats: 2 (different projects, different .coditect copies)
- Developer Expectation: "I have TWO separate projects"

Scenario C: Docker Volume Mount
- Host: /Users/dev/project/.coditect
- Container: /app/.coditect (mounted from host)
- Expected Seats: 1 (same physical directory)
- Developer Expectation: "Docker is just a runtime environment"

Decision

We will use os.path.realpath() to resolve symlinks to their physical paths when generating session IDs.

Session ID generation will combine:

  1. Resolved .coditect/ path - Handles symlinks correctly
  2. Project root (git repository root) - Distinguishes separate projects
  3. Hardware fingerprint - Prevents cross-hardware sharing
  4. User email - Tracks who's using the license

Session ID Generation Algorithm

Implementation:

import os
import hashlib
import json
import subprocess

def generate_session_id():
    """
    Generate a stable session ID that treats symlinks fairly.

    Returns:
        str: SHA-256 hex digest identifying the session
    """

    # Step 1: Resolve .coditect/ symlinks to physical path
    coditect_path = os.path.realpath('.coditect')

    # Explanation:
    # - os.path.realpath() follows ALL symlinks to final physical path
    # - Parent project: '.coditect' → /Users/dev/master/.coditect
    # - Submodule: '../../.coditect' → /Users/dev/master/.coditect (SAME)
    # - Result: Both resolve to identical path

    # Step 2: Get project root (git repository root)
    try:
        project_root = subprocess.check_output(
            ['git', 'rev-parse', '--show-toplevel'],
            cwd=os.getcwd(),
            stderr=subprocess.DEVNULL
        ).decode().strip()
    except (subprocess.CalledProcessError, OSError):
        # Fallback: Use current working directory if not in a git repo
        # (CalledProcessError) or git is not installed (OSError)
        project_root = os.getcwd()

    # Explanation:
    # - Git repository root distinguishes separate projects
    # - Master: /Users/dev/coditect-rollout-master
    # - Client 1: /Users/dev/client-app-1 (different root)
    # - Client 2: /Users/dev/client-app-2 (different root)
    # NOTE: inside a real git submodule, --show-toplevel returns the
    # submodule's own root, not the superproject's, so project_root can
    # differ per submodule unless the superproject root is looked up
    # (git rev-parse --show-superproject-working-tree)

    # Step 3: Get hardware fingerprint
    hardware_id = get_hardware_id()  # MAC + CPU + machine UUID

    # Explanation:
    # - Prevents license sharing across different machines
    # - Same developer, same laptop, different projects → different sessions OK
    # - Different developers, different laptops → different hardware_id

    # Step 4: Get user email (from git config)
    try:
        user_email = subprocess.check_output(
            ['git', 'config', 'user.email'],
            stderr=subprocess.DEVNULL
        ).decode().strip()
    except (subprocess.CalledProcessError, OSError):
        # user.email unset or git missing
        user_email = os.getenv('USER', 'unknown') + '@localhost'

    # Step 5: Combine all factors
    session_data = {
        'coditect_path': coditect_path,      # Resolved physical path
        'project_root': project_root,        # Git repo root
        'hardware_id': hardware_id,          # Hardware fingerprint
        'user_email': user_email,            # User identification
        'coditect_version': get_version(),   # Framework version
        'usage_type': 'builder'              # 'builder' or 'runtime'
    }

    # Step 6: Generate stable hash (sort_keys makes key order irrelevant)
    session_id = hashlib.sha256(
        json.dumps(session_data, sort_keys=True).encode()
    ).hexdigest()

    return session_id


import platform
import uuid

def get_hardware_id():
    """
    Generate hardware fingerprint (cross-platform).

    Returns:
        str: First 32 hex characters of a SHA-256 hash of hardware identifiers
    """
    # MAC address (primary network interface)
    # NOTE: uuid.getnode() falls back to a random 48-bit number when no MAC
    # address can be read, which would break determinism across runs
    node = uuid.getnode()
    mac = ':'.join('{:02x}'.format((node >> shift) & 0xff)
                   for shift in range(40, -8, -8))

    # CPU info (platform-specific)
    if platform.system() == 'Darwin':  # macOS
        cpu_info = subprocess.check_output(
            ['sysctl', '-n', 'machdep.cpu.brand_string']
        ).decode().strip()
    elif platform.system() == 'Linux':
        with open('/proc/cpuinfo', 'r') as f:
            cpu_lines = [line for line in f if 'model name' in line]
        cpu_info = cpu_lines[0].split(':')[1].strip() if cpu_lines else 'unknown'
    else:  # Windows or other
        cpu_info = platform.processor()

    # Machine UUID
    # NOTE: derived from the same uuid.getnode() value as the MAC above, so it
    # adds no independent entropy; a true machine UUID needs an OS-specific source
    machine_uuid = str(uuid.UUID(int=node))

    # Combine and hash
    hardware_data = f"{mac}:{cpu_info}:{machine_uuid}"
    hardware_id = hashlib.sha256(hardware_data.encode()).hexdigest()[:32]

    return hardware_id


def get_version():
    """Get CODITECT version from .coditect/VERSION."""
    try:
        with open('.coditect/VERSION', 'r') as f:
            return f.read().strip()
    except OSError:
        # VERSION file missing or unreadable: fall back to a default
        return '1.0.0'
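To test the hashing step without git or hardware probing, Steps 5 and 6 can be isolated as a pure function. The sketch below assumes a hypothetical helper named session_id_from; the field names match the session_data dictionary above and the example values are made up.

```python
import hashlib
import json


def session_id_from(components: dict) -> str:
    """Hash already-collected session components into a hex session ID."""
    # sort_keys=True makes the digest independent of dict insertion order
    canonical = json.dumps(components, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only
base = {
    "coditect_path": "/Users/dev/master/.coditect",
    "project_root": "/Users/dev/master",
    "hardware_id": "0123abcd",
    "user_email": "dev@example.com",
    "coditect_version": "1.0.0",
    "usage_type": "builder",
}
same_project = session_id_from(base)
other_project = session_id_from({**base, "project_root": "/Users/dev/client-app-1"})
```

Every field feeds the hash, so changing any one of them yields a new session ID. That includes coditect_version: under this scheme a framework upgrade changes the session ID as well.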

Verification Test Cases

Test Case 1: Symlink Chain (Expected: Same Session ID)

import os
import subprocess
import tempfile

def test_symlink_resolution():
    """Verify symlinks resolve to same session_id."""

    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        # Initialize one git repo at the top so every directory reports the
        # same project root; without it the CWD fallback differs per directory
        subprocess.run(['git', 'init'], cwd=tmpdir, check=True,
                       stdout=subprocess.DEVNULL)

        # Create physical .coditect directory
        physical_dir = os.path.join(tmpdir, 'master', '.coditect')
        os.makedirs(physical_dir)

        # Create VERSION file
        with open(os.path.join(physical_dir, 'VERSION'), 'w') as f:
            f.write('1.0.0')

        # Create symlinks (simulating submodules)
        submodule1 = os.path.join(tmpdir, 'submodule1')
        os.makedirs(submodule1)
        os.symlink(
            os.path.join('..', 'master', '.coditect'),
            os.path.join(submodule1, '.coditect')
        )

        submodule2 = os.path.join(tmpdir, 'submodule2')
        os.makedirs(submodule2)
        os.symlink(
            os.path.join('..', 'master', '.coditect'),
            os.path.join(submodule2, '.coditect')
        )

        try:
            # Generate session IDs from different symlink paths
            os.chdir(os.path.join(tmpdir, 'master'))
            session_id_master = generate_session_id()

            os.chdir(submodule1)
            session_id_sub1 = generate_session_id()

            os.chdir(submodule2)
            session_id_sub2 = generate_session_id()
        finally:
            # Leave tmpdir before it is deleted
            os.chdir(original_cwd)

        # Verify all resolve to SAME session ID
        assert session_id_master == session_id_sub1 == session_id_sub2, \
            "Symlinks must resolve to same session ID"

    print("✅ Test passed: Symlinks correctly resolve to same session")

Test Case 2: Different Projects (Expected: Different Session IDs)

import os
import subprocess
import tempfile

def test_different_projects():
    """Verify different git repositories have different session IDs."""

    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        # Project 1
        project1 = os.path.join(tmpdir, 'project1')
        os.makedirs(os.path.join(project1, '.coditect'))
        subprocess.run(['git', 'init'], cwd=project1, check=True,
                       stdout=subprocess.DEVNULL)

        # Project 2
        project2 = os.path.join(tmpdir, 'project2')
        os.makedirs(os.path.join(project2, '.coditect'))
        subprocess.run(['git', 'init'], cwd=project2, check=True,
                       stdout=subprocess.DEVNULL)

        try:
            # Generate session IDs
            os.chdir(project1)
            session_id_1 = generate_session_id()

            os.chdir(project2)
            session_id_2 = generate_session_id()
        finally:
            # Leave tmpdir before it is deleted
            os.chdir(original_cwd)

        # Verify DIFFERENT session IDs (different project roots)
        assert session_id_1 != session_id_2, \
            "Different projects must have different session IDs"

    print("✅ Test passed: Different projects have different sessions")

Test Case 3: Docker Volume Mount (Scenario C Expects: Same Session ID)

def test_docker_volume_mount():
    """
    Document how Docker volume mounts interact with session IDs.

    Simulates:
        Host: /Users/dev/project/.coditect
        Container: /app/.coditect (mounted from host)

    Scenario C expects: same physical directory → same session ID
    Current algorithm: separate session IDs (see notes below)
    """

    # NOTE: On Linux hosts a bind mount shares the underlying inode, but
    # os.path.realpath() inside the container returns the container's view
    # of the path (/app/.coditect), not the host path

    # With the current algorithm:
    # - Host session: /Users/dev/project/.coditect → session_1
    # - Container session: /app/.coditect (same directory) → session_2
    # - coditect_path and project_root both differ, so the session IDs differ

    # Conclusion: Docker container = separate environment = separate session
    # (accepted trade-off; see "Docker Containers = Separate Sessions" under
    # Consequences)

    pass  # Tested in integration tests with actual Docker

Edge Cases Handled

1. Broken Symlinks

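# Scenario: Symlink points to non-existent target
# .coditect -> /missing/path

# By default os.path.realpath() does not raise for a missing target; it
# returns the path the link points to. Check for existence explicitly.
coditect_path = os.path.realpath('.coditect')
if not os.path.exists(coditect_path):
    # Fallback: Use symlink path itself
    coditect_path = os.path.abspath('.coditect')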

2. Circular Symlinks

# Scenario: a -> b -> c -> a (infinite loop)

# os.path.realpath() does not hang on this:
# - Default (non-strict) mode returns a partially resolved path without raising
# - With strict=True (Python 3.10+) it raises OSError
# - In either case, fall back to os.path.abspath()
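Broken and circular links can share one fallback path. The sketch below assumes a hypothetical helper named resolve_coditect_path and relies on os.path.exists(), which follows symlinks and returns False (rather than raising) when the target is missing or the chain loops. It does not depend on the strict parameter, so it should behave the same on older Python 3 releases.

```python
import os


def resolve_coditect_path(path: str = ".coditect"):
    """Return (resolved_path, ok); ok is False for broken or circular links."""
    # os.path.exists() follows symlinks and swallows the OSError raised for
    # a missing target or a symlink loop
    if os.path.exists(path):
        return os.path.realpath(path), True
    # Fallback: the link's own absolute location, not its target
    return os.path.abspath(path), False
```

Returning the ok flag alongside the path lets the caller decide whether a fallback path should still be allowed to acquire a license seat.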

3. Relative vs. Absolute Symlinks

# Both work correctly with os.path.realpath():

# Relative: .coditect -> ../../master/.coditect
# Absolute: .coditect -> /Users/dev/master/.coditect

# os.path.realpath() resolves both to same physical path
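The claim about relative and absolute links can be checked directly. The following sketch builds both kinds of link in a scratch directory and compares what os.path.realpath() returns; the function name resolved_targets and the directory names are illustrative only.

```python
import os
import tempfile


def resolved_targets(base: str) -> dict:
    """Create one relative and one absolute symlink to the same directory
    under `base` and return each link's os.path.realpath()."""
    physical = os.path.join(base, "master", ".coditect")
    os.makedirs(physical)
    results = {}
    for name, target in (
        ("relative", os.path.join("..", "master", ".coditect")),
        ("absolute", physical),
    ):
        sub = os.path.join(base, name)
        os.makedirs(sub)
        link = os.path.join(sub, ".coditect")
        os.symlink(target, link)
        results[name] = os.path.realpath(link)
    # Resolve the physical directory too, since the temp dir itself may sit
    # behind a symlink (for example /var on macOS)
    results["physical"] = os.path.realpath(physical)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolved_targets(tmp))
```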

4. Cross-Filesystem Symlinks

# Scenario: Symlink crosses filesystem boundaries
# .coditect -> /mnt/external/.coditect

# Still works: os.path.realpath() follows across filesystems
# Session ID will be unique per physical path

Consequences

Positive

Fair Pricing

  • Symlink chains correctly resolve to single session
  • 1 developer, 1 project, N symlinks = 1 seat (fair)
  • Monorepo with 46 submodules = 1 seat (developer expectation met)

Prevents Double-Counting

  • No billing for symlink architecture overhead
  • Same physical .coditect/ directory = same session
  • Transparent to developers (no need to understand symlink internals)

Cross-Platform Compatibility

  • os.path.realpath() works on Linux, macOS, Windows
  • Docker volume mounts handled correctly
  • No platform-specific hacks required

Deterministic Session IDs

  • Same inputs always produce same session_id
  • Idempotent license acquisition (retry-safe)
  • No race conditions in session ID generation

Handles Multiple Projects

  • Different git repositories = different project_root
  • Allows developer to work on multiple client projects
  • Each project gets separate session (fair usage)

Security: Prevents Abuse

  • Hardware fingerprint prevents cross-machine sharing
  • Copying .coditect/ creates different physical path (new session)
  • User email tracks who's using license (audit trail)

Negative

⚠️ Docker Containers = Separate Sessions

  • Container has different view of filesystem
  • Host session != container session (different paths)
  • Mitigation: Expected behavior (container is separate environment)
  • Acceptable: Most developers use EITHER host OR container, not both simultaneously

⚠️ Symlink Performance Overhead

  • os.path.realpath() requires filesystem traversal
  • Adds ~1-5ms per call (negligible)
  • Mitigation: Cache resolved path for session duration

⚠️ Hard Links Not Detected

  • Hard links (different paths, same inode) treated as different sessions
  • Rare edge case (hard links uncommon for directories)
  • Acceptable: Hard links discouraged in git workflows

⚠️ Project Root Detection Requires Git

  • git rev-parse --show-toplevel fails if not in git repo
  • Mitigation: Fallback to current working directory
  • Acceptable: CODITECT installed as git submodule (git always present)

Neutral

🔄 Session ID Includes Multiple Factors

  • More complex than simple path hash
  • Better security and fairness trade-off
  • Acceptable: Complexity hidden from developers

🔄 License Cache Tied to Session ID

  • Changing hardware or project requires new license acquisition
  • Expected behavior (different session = different license check)
  • Acceptable: Cached licenses reduce re-validation frequency

Alternatives Considered

Alternative 1: Current Working Directory (CWD)

Implementation:

session_id = hash(os.getcwd())

Pros:

  • ✅ Simplest implementation
  • ✅ Fast (no filesystem traversal)

Cons:

  • ❌ Different for each submodule (overbilling)
  • ❌ Unfair to developers (penalized for symlink architecture)
  • ❌ Developer expectation mismatch

Rejected Because: Overbills developers with submodule-heavy projects.

Alternative 2: Absolute Path Without Symlink Resolution

Implementation:

session_id = hash(os.path.abspath('.coditect'))

Pros:

  • ✅ Simple
  • ✅ Fast

Cons:

  • ❌ Symlinks NOT resolved (different paths for same physical directory)
  • ❌ Same problem as Alternative 1

Rejected Because: Doesn't handle symlinks correctly.

Alternative 3: Inode-Based Session ID

Implementation:

import os
stat_info = os.stat('.coditect')
session_id = hash(stat_info.st_ino) # Inode number

Pros:

  • ✅ Symlinks and hard links both resolve to same inode
  • ✅ Most accurate physical identity

Cons:

  • ❌ Inodes not portable across filesystems
  • ❌ Docker volume mounts may have different inodes
  • ❌ Windows doesn't have inodes (NTFS uses FileID)
  • ❌ Doesn't distinguish different projects with same .coditect copy

Rejected Because: Not cross-platform, doesn't include project context.
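For reference, this is roughly what an inode comparison looks like on POSIX. An inode number is only unique within one device, so the sketch pairs st_ino with st_dev; the helper name same_directory is hypothetical. The standard library's os.path.samefile() performs the same comparison.

```python
import os


def same_directory(path_a: str, path_b: str) -> bool:
    """True if both paths refer to the same filesystem object."""
    # os.stat() follows symlinks; (st_dev, st_ino) identifies the object
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

This answers "are these two paths the same directory?" but does not by itself produce an identifier that is stable across mounts or machines, which is part of why the alternative was rejected.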

Alternative 4: Git Submodule Hash

Implementation:

# Use git submodule commit hash as session ID
submodule_hash = subprocess.check_output(
['git', 'rev-parse', 'HEAD:.coditect']
).decode().strip()
session_id = hash(submodule_hash)

Pros:

  • ✅ Git-native approach
  • ✅ Tracks .coditect version

Cons:

  • ❌ Requires git (fails outside git repos)
  • ❌ Changes when .coditect/ updated (breaks cached licenses)
  • ❌ Doesn't include hardware or user identity (abuse risk)

Rejected Because: Too fragile (breaks on .coditect updates).


Implementation Notes

Caching Resolved Path

Optimization:

# Cache resolved path for session duration
_resolved_path_cache = None

def get_resolved_coditect_path():
    """Get cached resolved .coditect path."""
    global _resolved_path_cache

    if _resolved_path_cache is None:
        _resolved_path_cache = os.path.realpath('.coditect')

    return _resolved_path_cache

Benefit: Avoid repeated filesystem traversal (5ms → <0.01ms).

Cross-Platform Testing

Test Matrix:

| Platform | Symlink Support | os.path.realpath() | Status |
|---|---|---|---|
| Linux | ✅ Native | ✅ Works | ✅ Tested |
| macOS | ✅ Native | ✅ Works | ✅ Tested |
| Windows | ⚠️ NTFS symlinks require admin | ✅ Works with admin privileges | ⚠️ Limited testing |
| Docker (Linux) | ✅ Native | ✅ Works | ✅ Tested |
| Docker (Windows) | ⚠️ Varies by mount type | ⚠️ May require bind mounts | ⚠️ Edge case |

Recommendation: Encourage Linux/macOS for development; Windows testing is required for v1.0.
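Because symlink creation on Windows depends on privileges, a test suite may want to probe for the capability before running symlink tests. A minimal sketch, assuming a hypothetical helper named can_create_symlinks:

```python
import os
import tempfile


def can_create_symlinks() -> bool:
    """Probe whether this process may create symlinks on this platform."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        os.mkdir(target)
        try:
            os.symlink(target, os.path.join(tmp, "link"),
                       target_is_directory=True)
        except (OSError, NotImplementedError):
            # e.g. Windows without the required privilege or Developer Mode
            return False
        return True
```

A test runner could skip the symlink cases when this returns False instead of reporting them as failures.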

Debugging Session ID Generation

Debug Mode:

import hashlib
import json
import logging
import os

def generate_session_id(debug=False):
    """Generate session ID with optional debugging."""

    # get_project_root() and get_user_email() are assumed to wrap Steps 2
    # and 4 of the main implementation; they are not defined in this ADR
    coditect_path = os.path.realpath('.coditect')
    project_root = get_project_root()
    hardware_id = get_hardware_id()
    user_email = get_user_email()

    if debug:
        # Output appears only if logging is configured at INFO level,
        # e.g. logging.basicConfig(level=logging.INFO)
        logging.info("Session ID Components:")
        logging.info(f" coditect_path: {coditect_path}")
        logging.info(f" project_root: {project_root}")
        logging.info(f" hardware_id: {hardware_id[:8]}...")
        logging.info(f" user_email: {user_email}")

    session_data = {
        'coditect_path': coditect_path,
        'project_root': project_root,
        'hardware_id': hardware_id,
        'user_email': user_email,
        'coditect_version': get_version(),
        'usage_type': 'builder'
    }

    session_id = hashlib.sha256(
        json.dumps(session_data, sort_keys=True).encode()
    ).hexdigest()

    if debug:
        logging.info(f" session_id: {session_id}")

    return session_id

Usage:

CODITECT_DEBUG=1 bash .coditect/scripts/init.sh

Related ADRs

  • ADR-001: Floating Licenses vs. Node-Locked Licenses (session ID usage in seat management)
  • ADR-002: Redis Lua Scripts for Atomic Operations (session_id as Redis key)
  • ADR-003: Check-on-Init Enforcement Pattern (when session_id is generated)
  • ADR-005: Builder vs. Runtime Licensing Model (usage_type field in session_id)


Last Updated: 2025-11-30
Owner: Architecture Team, Product Team
Review Cycle: Quarterly or on major licensing changes