ADR-005: Experiment Data Governance

Status

Proposed

Context

Brainqub3 Agent Labs generates experiment data through systematic runs that compare agent architectures. Each run produces:

Run Artifacts:

  • run_manifest.json: Content-hashed manifest with run ID, scenario, architecture, agent count, parameters
  • performance_metrics.json: Task completion time, success rate, output quality scores
  • coordination_metrics.json: Overhead %, message density, redundancy, efficiency, error amplification
  • agent_logs/: Individual agent execution logs and decisions
  • evaluations/: Evaluator judgments (pass/fail + rationale)

Agent Labs Design Principles:

  1. Immutability: Runs are never modified after creation (content-hashed)
  2. Reproducibility: Run manifest contains all parameters needed to recreate experiment
  3. Evaluator-First: No experiment without validated evaluator + passing tests
  4. Local-First: Data stored in local filesystem, optionally synced to cloud

CODITECT Context:

  • CODITECT already has data governance patterns:
    • ~/.coditect-data/ for user data (ADR-114)
    • org.db is irreplaceable (decisions, learnings, error_solutions)
    • Session logs in ~/.coditect-data/session-logs/ (ADR-155)
    • Immutable session logs with content hashing
  • CODITECT values reproducibility and audit trails
  • Multi-machine access (same user, different machines)
  • Storage growth concerns (218 PDFs = 12+ GB in UDOM pipeline)

Questions:

  1. Where should Agent Labs run data be stored?
  2. How to handle storage growth (hundreds of runs = GBs of logs)?
  3. Should run data sync across machines (like session logs)?
  4. How long to retain runs? Archival strategy?
  5. Can runs be deleted, or are they permanent like org.db decisions?

Decision

Run data is immutable, content-hashed, and stored locally with optional cloud sync, following these principles:

1. Storage Location

Primary Storage (Local):

~/.coditect-data/scaling-experiments/
├── runs/
│   ├── run-2026-02-16-abc123/
│   │   ├── run_manifest.json         # Content-hashed metadata
│   │   ├── performance_metrics.json
│   │   ├── coordination_metrics.json
│   │   ├── agent_logs/               # Per-agent execution logs
│   │   └── evaluations/              # Evaluator judgments
│   └── run-2026-02-16-def456/
├── evaluators/                       # Evaluator definitions (versioned)
├── scenarios/                        # Scenario configurations
└── scaling-models/                   # Trained models (coefficients)
    ├── coditect-v1.json
    └── calibration-2026-02-16/       # Raw calibration data

Rationale:

  • Aligns with CODITECT data pattern (~/.coditect-data/)
  • Isolated from protected installation (no accidental contamination)
  • User-owned, survives CODITECT version upgrades
  • Accessible to Agent Labs CLI (configured via environment variable)

2. Immutability Enforcement

Content Hashing:

  • Run ID = SHA256(run_manifest.json)[:16] (matches Agent Labs pattern)
  • Run directory name includes hash: run-{date}-{hash}/
  • Any modification to manifest invalidates run ID → creates new run
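A minimal sketch of the hashing scheme above (the helper name `run_id_for` is illustrative, not part of the Agent Labs API):

```python
import hashlib
import json

def run_id_for(manifest: dict) -> str:
    """Derive a 16-hex-char run ID from manifest contents.

    Serialising with sort_keys=True makes the hash independent of key
    insertion order, so identical parameters always yield the same ID.
    """
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:16]

manifest = {"scenario": "triage", "architecture": "centralised", "agent_count": 3}
print(f"run-2026-02-16-{run_id_for(manifest)}")
```

Because the ID is a pure function of the manifest, editing any parameter produces a different ID, which is what makes in-place tampering detectable.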

Write-Once:

  • Runs created in a temp directory, then moved to their final location atomically
  • No in-place edits (os.rename() is atomic when source and destination are on the same filesystem)
  • Read-only permissions after creation: chmod 444 run_manifest.json

Why Immutable:

  • Reproducibility: exact parameters preserved forever
  • Audit trail: cannot alter results retroactively
  • Scaling model validity: training data must be stable
  • Scientific integrity: matches Agent Labs research methodology

3. Cloud Sync (Optional)

Google Cloud Storage (Matches CODITECT Pattern):

gs://coditect-scaling-experiments-{user_id}/
└── runs/
    └── run-2026-02-16-abc123/
        └── [same structure as local]

Sync Behavior:

  • Manual: /scaling-sync --upload (push local → cloud), --download (pull cloud → local)
  • Automatic (opt-in): Sync on run completion if CODITECT_SCALING_SYNC=true
  • Incremental: Only upload new runs (skip existing hashes)
  • Cross-machine: Multiple machines can pull same runs for analysis
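The incremental rule reduces to a set difference over run IDs; because directory names embed the content hash, a name match implies identical contents. A sketch (`runs_to_upload` is a hypothetical helper; a real implementation would list `runs/` locally and via the GCS client or `gsutil ls`):

```python
def runs_to_upload(local_runs: set[str], remote_runs: set[str]) -> set[str]:
    """Select runs present locally but absent from the bucket.

    Run directory names embed the content hash, so a name collision
    means the run is already uploaded and can be skipped.
    """
    return local_runs - remote_runs

local = {"run-2026-02-16-abc123", "run-2026-02-16-def456"}
remote = {"run-2026-02-16-abc123"}
print(runs_to_upload(local, remote))  # only def456 still needs uploading
```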

Rationale:

  • Matches org.db backup pattern (gs://coditect-cloud-infra-context-backups)
  • Enables collaboration (share runs across team)
  • Disaster recovery (machine loss → restore from cloud)
  • Optional (offline-first still works)

4. Retention and Archival

Retention Policy:

  • Active runs (last 90 days): Keep in ~/.coditect-data/scaling-experiments/runs/
  • Archived runs (90+ days old): Move to ~/.coditect-data/scaling-experiments/archive/
  • Deletion: Never automatic; manual only with explicit user command

Archival Workflow:

# Archive old runs (move to archive/, preserve in cloud)
/scaling-archive --older-than 90d

# Prune archived runs (delete local copies, keep cloud backups)
/scaling-prune --archived-only --keep-cloud-backup

# Full deletion (local + cloud, requires confirmation)
/scaling-delete run-2026-02-16-abc123 --confirm

Archival Format:

  • Compress runs: tar -czf run-2026-02-16-abc123.tar.gz run-2026-02-16-abc123/
  • Store compressed archives in archive/ directory
  • Decompress on-demand for analysis
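On-demand decompression could look like the following sketch (`restore_run` is a hypothetical helper; the tarball layout matches the `tar -czf ... -C` archival command used elsewhere in this ADR, which stores the run directory at the archive root):

```python
import tarfile
from pathlib import Path

def restore_run(archive: Path, dest: Path) -> Path:
    """Extract an archived run into dest and return the restored run directory.

    Archives hold the run directory at the tarball root, so extraction
    recreates dest/<run-id>/ with the original layout intact.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest / archive.name.removesuffix(".tar.gz")
```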

Exemptions from Archival:

  • Runs tagged --keep-forever (e.g., benchmark baselines, paper results)
  • Runs used in active scaling models (referenced in coditect-v1.json)
  • Failed runs (deleted promptly rather than archived)

5. Storage Growth Mitigation

Strategies:

  1. Log Compression: gzip agent logs after run completion (10x reduction)
  2. Sampling: For large-scale experiments, retain only a subset of runs (e.g., every 10th, i.e., systematic sampling)
  3. Summary Metrics Only: Option to discard raw logs, keep only *_metrics.json
  4. Cleanup Command: /scaling-cleanup --dry-run shows deletable runs, confirms before action
  5. Quota Warnings: Alert when ~/.coditect-data/scaling-experiments/ exceeds 10 GB
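Strategy 1 (log compression) might be implemented as below. This is a sketch: `compress_log` is a hypothetical helper, and the 10x figure is an estimate that depends on how repetitive the logs are.

```python
import gzip
import shutil
from pathlib import Path

def compress_log(log_path: Path) -> Path:
    """Replace a plain-text agent log with a gzipped copy after run completion.

    Text logs are highly repetitive, so gzip typically shrinks them
    substantially; only the compressed copy is kept on disk.
    """
    gz_path = log_path.with_name(log_path.name + ".gz")
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    log_path.unlink()  # drop the uncompressed original
    return gz_path
```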

Disk Usage Estimation:

  • Average run size: ~5 MB (with logs)
  • 100 runs/month × 12 months = 1200 runs = ~6 GB/year
  • With compression + archival: ~2 GB/year active storage

6. Data Access and Querying

CLI Commands:

# List all runs
/scaling-list --architecture centralised --agent-count 3

# Compare runs
/scaling-compare run-abc123 run-def456

# Export run data
/scaling-export run-abc123 --format csv > results.csv

# Restore archived run
/scaling-restore run-2026-01-15-xyz789 --from-archive

Programmatic Access:

# scripts/scaling-analysis/query.py
from pathlib import Path
from typing import Iterator, Optional
import json

def load_run(run_id: str) -> dict:
    run_dir = Path.home() / ".coditect-data/scaling-experiments/runs" / run_id
    with open(run_dir / "run_manifest.json") as f:
        return json.load(f)

def query_runs(architecture: Optional[str] = None,
               min_performance: Optional[float] = None) -> Iterator[dict]:
    runs_dir = Path.home() / ".coditect-data/scaling-experiments/runs"
    for run_dir in runs_dir.iterdir():
        manifest = load_run(run_dir.name)
        if architecture and manifest["architecture"] != architecture:
            continue
        # ... filter logic
        yield manifest

Alternatives Considered

1. Mutable Results (Allow Edits)

Pros:

  • Can fix errors in run data
  • Simpler implementation

Cons:

  • Destroys reproducibility (cannot verify results)
  • Breaks content hashing (run ID no longer valid)
  • Violates scientific integrity
  • Scaling model trained on unstable data

2. Database-Only Storage (SQLite/PostgreSQL)

Pros:

  • Structured queries (SQL)
  • Indexed lookups
  • Relational joins

Cons:

  • Loses filesystem simplicity (harder to inspect runs)
  • Migration overhead (schema changes)
  • Binary format (no cat run_manifest.json)
  • Harder to sync (database replication vs file copy)
  • Agent Labs uses filesystem, would require adapter

3. Cloud-First Storage (No Local Copy)

Pros:

  • No local disk usage
  • Automatic cross-machine access

Cons:

  • Requires internet (breaks offline-first principle)
  • Latency for analysis (network round-trips)
  • Violates CODITECT local-first design
  • Cost (cloud storage + egress)

4. Ephemeral Runs (Delete After Analysis)

Pros:

  • Zero storage growth
  • No archival needed

Cons:

  • Cannot reproduce experiments
  • Scaling model retraining impossible
  • Loses audit trail
  • Violates Agent Labs immutability principle

5. External Database (e.g., MLflow, Weights & Biases)

Pros:

  • Professional experiment tracking UI
  • Built-in versioning, comparisons
  • Large community

Cons:

  • External dependency (SaaS lock-in)
  • Data sent to third party (privacy concern)
  • Requires account/auth setup
  • Overkill for CODITECT use case

Consequences

Positive

  1. Reproducibility: Immutable runs + content hashing guarantee exact recreation
  2. Audit Trail: Permanent record of all scaling experiments
  3. Scientific Integrity: Results cannot be retroactively altered
  4. Scaling Model Validity: Training data stability ensures model accuracy
  5. Disaster Recovery: Cloud sync protects against machine loss
  6. Cross-Machine Access: Share runs across team or multiple machines
  7. Simple Debugging: Filesystem-based storage is easy to inspect (cat, ls, grep)
  8. Offline-First: No internet required for experimentation
  9. CODITECT Consistency: Matches existing data governance patterns (ADR-114, ADR-155)

Negative

  1. Storage Growth: Runs accumulate, requiring archival/cleanup discipline
  2. No In-Place Fixes: Cannot edit run if parameters recorded incorrectly (must delete + rerun)
  3. Manual Archival: Requires periodic /scaling-archive invocation
  4. Disk Space Monitoring: Need to track ~/.coditect-data/ usage
  5. Cloud Sync Complexity: Manual sync commands vs automatic background sync tradeoff
  6. Deletion Friction: Immutability makes cleanup deliberate (confirmation required)

Risks

  1. Runaway Storage Growth: User forgets to archive, disk fills up

    • Mitigation: Automatic warnings at 10 GB; /scaling-cleanup recommendations
  2. Accidental Deletion: User deletes important benchmark runs

    • Mitigation: --keep-forever tagging; cloud backups; confirmation prompts
  3. Cloud Sync Conflicts: Two machines create runs with same hash (unlikely but possible)

    • Mitigation: Manifest timestamp already disambiguates most collisions; a random nonce can be added to the manifest if needed; detect conflicts on upload
  4. Lost Runs (No Cloud Backup): Machine crash before cloud sync

    • Mitigation: Encourage opt-in automatic sync; periodic backup reminders
  5. Stale Archives: Compressed runs never accessed, waste cloud storage

    • Mitigation: Audit cloud storage quarterly; delete archives unused for 1+ year
  6. Version Compatibility: Future Agent Labs version changes run format

    • Mitigation: Version manifests ("schema_version": "1.0"); migration scripts
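The schema-version mitigation could work as a chain of pure in-memory migrations, so on-disk runs stay immutable and their content hashes remain valid (the names below are illustrative, not an existing API):

```python
# Hypothetical migration table: one pure function per version step.
MIGRATIONS = {
    "1.0": lambda m: {**m, "schema_version": "1.1", "tags": m.get("tags", [])},
}

def migrate_manifest(manifest: dict, target: str = "1.1") -> dict:
    """Upgrade a manifest in memory, one version step at a time.

    The run directory is never rewritten; only the loaded view is
    upgraded, preserving immutability and the original content hash.
    """
    while manifest["schema_version"] != target:
        step = MIGRATIONS.get(manifest["schema_version"])
        if step is None:
            raise ValueError(f"no migration path from {manifest['schema_version']}")
        manifest = step(manifest)
    return manifest
```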

Implementation Notes

1. Environment Configuration

# Set Agent Labs data directory
export AGENT_LABS_DATA_DIR=~/.coditect-data/scaling-experiments

# Enable automatic cloud sync (optional)
export CODITECT_SCALING_SYNC=true
export CODITECT_SCALING_BUCKET=gs://coditect-scaling-experiments-halcasteel

2. Run Creation Workflow

# scripts/scaling-analysis/run_experiment.py

import hashlib
import json
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def create_run(scenario: str, architecture: str, agents: int) -> str:
    # Generate manifest
    manifest = {
        "scenario": scenario,
        "architecture": architecture,
        "agent_count": agents,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "schema_version": "1.0",
    }

    # Content hash (run ID)
    manifest_json = json.dumps(manifest, sort_keys=True)
    run_hash = hashlib.sha256(manifest_json.encode()).hexdigest()[:16]
    run_id = f"run-{manifest['timestamp'][:10]}-{run_hash}"

    # Create in temp, move atomically
    temp_dir = Path(tempfile.mkdtemp()) / run_id
    temp_dir.mkdir()
    with open(temp_dir / "run_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)

    # Execute experiment (populates temp_dir with results)
    execute_agent_labs_run(temp_dir, manifest)

    # Move to final location
    final_dir = Path.home() / ".coditect-data/scaling-experiments/runs" / run_id
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(temp_dir), str(final_dir))

    # Make read-only
    (final_dir / "run_manifest.json").chmod(0o444)

    # Cloud sync (if enabled)
    if os.getenv("CODITECT_SCALING_SYNC") == "true":
        sync_to_cloud(final_dir)

    return run_id

3. Archival Script

#!/bin/bash
# scripts/scaling-analysis/archive_old_runs.sh

RUNS_DIR=~/.coditect-data/scaling-experiments/runs
ARCHIVE_DIR=~/.coditect-data/scaling-experiments/archive
# 90 days ago (BSD/macOS date; on GNU/Linux use: date -d '90 days ago' +%Y-%m-%d)
CUTOFF_DATE=$(date -v-90d +%Y-%m-%d)

mkdir -p "$ARCHIVE_DIR"

for run_dir in "$RUNS_DIR"/run-*; do
    run_date=$(basename "$run_dir" | cut -d'-' -f2-4)  # Extract YYYY-MM-DD

    if [[ "$run_date" < "$CUTOFF_DATE" ]]; then
        echo "Archiving $(basename "$run_dir")..."
        tar -czf "$ARCHIVE_DIR/$(basename "$run_dir").tar.gz" -C "$RUNS_DIR" "$(basename "$run_dir")"
        rm -rf "$run_dir"
    fi
done

4. Cloud Sync Command

# /scaling-sync implementation
case "$1" in
    --upload)
        gsutil -m rsync -r ~/.coditect-data/scaling-experiments/runs/ \
            "gs://coditect-scaling-experiments-$USER/runs/"
        ;;
    --download)
        gsutil -m rsync -r "gs://coditect-scaling-experiments-$USER/runs/" \
            ~/.coditect-data/scaling-experiments/runs/
        ;;
esac

5. Storage Monitoring

# scripts/scaling-analysis/check_storage.py

from pathlib import Path

def check_storage_usage() -> float:
    experiments_dir = Path.home() / ".coditect-data/scaling-experiments"
    usage_bytes = sum(
        f.stat().st_size
        for f in experiments_dir.rglob('*')
        if f.is_file()
    )
    usage_gb = usage_bytes / (1024**3)

    if usage_gb > 10:
        print(f"WARNING: Scaling experiments using {usage_gb:.1f} GB")
        print("Run '/scaling-archive --older-than 90d' to reduce usage")

    return usage_gb

References

  • ADR-114: User Data Directory (~/.coditect-data/)
  • ADR-155: Session Log Location and Sync
  • CODITECT Backup Pattern: ~/.coditect/scripts/backup-context-db.sh
  • Agent Labs Run Format: run_manifest.json schema
  • Google Cloud Storage: gs://coditect-cloud-infra-context-backups (existing pattern)
  • Related ADRs:
    • ADR-001: Agent Labs Adoption
    • ADR-002: Integration Pattern
    • ADR-004: Scaling Model for Agent Selection

Author: Claude (Sonnet 4.5) · Date: 2026-02-16 · Track: H (Framework) · Task ID: H.0