ADR-005: Experiment Data Governance

Status

Proposed

Context

Brainqub3 Agent Labs generates experiment data through systematic runs that compare agent architectures. Each run produces:

Run Artifacts:

  • run_manifest.json: Content-hashed manifest with run ID, scenario, architecture, agent count, parameters
  • performance_metrics.json: Task completion time, success rate, output quality scores
  • coordination_metrics.json: Overhead %, message density, redundancy, efficiency, error amplification
  • agent_logs/: Individual agent execution logs and decisions
  • evaluations/: Evaluator judgments (pass/fail + rationale)

Agent Labs Design Principles:

  1. Immutability: Runs are never modified after creation (content-hashed)
  2. Reproducibility: Run manifest contains all parameters needed to recreate experiment
  3. Evaluator-First: No experiment without validated evaluator + passing tests
  4. Local-First: Data stored in local filesystem, optionally synced to cloud

CODITECT Context:

  • CODITECT already has data governance patterns:
    • ~/.coditect-data/ for user data (ADR-114)
    • org.db is irreplaceable (decisions, learnings, error_solutions)
    • Session logs in ~/.coditect-data/session-logs/ (ADR-155)
    • Immutable session logs with content hashing
  • CODITECT values reproducibility and audit trails
  • Multi-machine access (same user, different machines)
  • Storage growth concerns (218 PDFs = 12+ GB in UDOM pipeline)

Questions:

  1. Where should Agent Labs run data be stored?
  2. How to handle storage growth (hundreds of runs = GBs of logs)?
  3. Should run data sync across machines (like session logs)?
  4. How long to retain runs? Archival strategy?
  5. Can runs be deleted, or are they permanent like org.db decisions?

Decision

Run data is immutable, content-hashed, and stored locally with optional cloud sync, following these principles:

1. Storage Location

Primary Storage (Local):

~/.coditect-data/scaling-experiments/
├── runs/
│   ├── run-2026-02-16-abc123/
│   │   ├── run_manifest.json         # Content-hashed metadata
│   │   ├── performance_metrics.json
│   │   ├── coordination_metrics.json
│   │   ├── agent_logs/               # Per-agent execution logs
│   │   └── evaluations/              # Evaluator judgments
│   └── run-2026-02-16-def456/
├── evaluators/                       # Evaluator definitions (versioned)
├── scenarios/                        # Scenario configurations
└── scaling-models/                   # Trained models (coefficients)
    ├── coditect-v1.json
    └── calibration-2026-02-16/       # Raw calibration data

Rationale:

  • Aligns with CODITECT data pattern (~/.coditect-data/)
  • Isolated from protected installation (no accidental contamination)
  • User-owned, survives CODITECT version upgrades
  • Accessible to Agent Labs CLI (configured via environment variable)

2. Immutability Enforcement

Content Hashing:

  • Run ID = SHA256(run_manifest.json)[:16] (matches Agent Labs pattern)
  • Run directory name includes hash: run-{date}-{hash}/
  • Any modification to manifest invalidates run ID → creates new run
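A minimal sketch of the hashing scheme above (the helper name `run_id_for` is illustrative, not part of the Agent Labs API):

```python
import hashlib
import json

def run_id_for(manifest: dict) -> str:
    """Derive a 16-hex-char run ID from manifest contents.

    Serialising with sort_keys=True makes the hash independent of key
    insertion order, so identical parameters always yield the same ID.
    """
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:16]

manifest = {"scenario": "triage", "architecture": "centralised", "agent_count": 3}
print(f"run-2026-02-16-{run_id_for(manifest)}")
```

Because the ID is a pure function of the manifest, editing any parameter produces a different ID, which is what makes in-place tampering detectable.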

Write-Once:

  • Runs created in a temp directory, then moved to their final location atomically
  • No in-place edits (os.rename() is atomic when source and destination are on the same filesystem)
  • Read-only permissions after creation: chmod 444 run_manifest.json

Why Immutable:

  • Reproducibility: exact parameters preserved forever
  • Audit trail: cannot alter results retroactively
  • Scaling model validity: training data must be stable
  • Scientific integrity: matches Agent Labs research methodology

3. Cloud Sync (Optional)

Google Cloud Storage (Matches CODITECT Pattern):

gs://coditect-scaling-experiments-{user_id}/
└── runs/
    └── run-2026-02-16-abc123/
        └── [same structure as local]

Sync Behavior:

  • Manual: /scaling-sync --upload (push local → cloud), --download (pull cloud → local)
  • Automatic (opt-in): Sync on run completion if CODITECT_SCALING_SYNC=true
  • Incremental: Only upload new runs (skip existing hashes)
  • Cross-machine: Multiple machines can pull same runs for analysis
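The incremental rule reduces to a set difference over run IDs; because directory names embed the content hash, a name match implies identical contents. A sketch (`runs_to_upload` is a hypothetical helper; a real implementation would list `runs/` locally and via the GCS client or `gsutil ls`):

```python
def runs_to_upload(local_runs: set[str], remote_runs: set[str]) -> set[str]:
    """Select runs present locally but absent from the bucket.

    Run directory names embed the content hash, so a name collision
    means the run is already uploaded and can be skipped.
    """
    return local_runs - remote_runs

local = {"run-2026-02-16-abc123", "run-2026-02-16-def456"}
remote = {"run-2026-02-16-abc123"}
print(runs_to_upload(local, remote))  # only def456 still needs uploading
```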

Rationale:

  • Matches org.db backup pattern (gs://coditect-cloud-infra-context-backups)
  • Enables collaboration (share runs across team)
  • Disaster recovery (machine loss → restore from cloud)
  • Optional (offline-first still works)

4. Retention and Archival

Retention Policy:

  • Active runs (last 90 days): Keep in ~/.coditect-data/scaling-experiments/runs/
  • Archived runs (90+ days old): Move to ~/.coditect-data/scaling-experiments/archive/
  • Deletion: Never automatic; manual only with explicit user command

Archival Workflow:

# Archive old runs (move to archive/, preserve in cloud)
/scaling-archive --older-than 90d

# Prune archived runs (delete local copies, keep cloud backups)
/scaling-prune --archived-only --keep-cloud-backup

# Full deletion (local + cloud, requires confirmation)
/scaling-delete run-2026-02-16-abc123 --confirm

Archival Format:

  • Compress runs: tar -czf run-2026-02-16-abc123.tar.gz run-2026-02-16-abc123/
  • Store compressed archives in archive/ directory
  • Decompress on-demand for analysis
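On-demand decompression could look like the following sketch (`restore_run` is a hypothetical helper; the tarball layout matches the `tar -czf ... -C` archival command used elsewhere in this ADR, which stores the run directory at the archive root):

```python
import tarfile
from pathlib import Path

def restore_run(archive: Path, dest: Path) -> Path:
    """Extract an archived run into dest and return the restored run directory.

    Archives hold the run directory at the tarball root, so extraction
    recreates dest/<run-id>/ with the original layout intact.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest / archive.name.removesuffix(".tar.gz")
```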

Exemptions from Archival:

  • Runs tagged --keep-forever (e.g., benchmark baselines, paper results)
  • Runs used in active scaling models (referenced in coditect-v1.json)
  • Failed runs (deleted promptly rather than archived)

5. Storage Growth Mitigation

Strategies:

  1. Log Compression: gzip agent logs after run completion (10x reduction)
  2. Sampling: For large-scale experiments, retain only a subset of runs (e.g., every 10th, i.e., systematic sampling)
  3. Summary Metrics Only: Option to discard raw logs, keep only *_metrics.json
  4. Cleanup Command: /scaling-cleanup --dry-run shows deletable runs, confirms before action
  5. Quota Warnings: Alert when ~/.coditect-data/scaling-experiments/ exceeds 10 GB
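Strategy 1 (log compression) might be implemented as below. This is a sketch: `compress_log` is a hypothetical helper, and the 10x figure is an estimate that depends on how repetitive the logs are.

```python
import gzip
import shutil
from pathlib import Path

def compress_log(log_path: Path) -> Path:
    """Replace a plain-text agent log with a gzipped copy after run completion.

    Text logs are highly repetitive, so gzip typically shrinks them
    substantially; only the compressed copy is kept on disk.
    """
    gz_path = log_path.with_name(log_path.name + ".gz")
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    log_path.unlink()  # drop the uncompressed original
    return gz_path
```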

Disk Usage Estimation:

  • Average run size: ~5 MB (with logs)
  • 100 runs/month × 12 months = 1200 runs = ~6 GB/year
  • With compression + archival: ~2 GB/year active storage

6. Data Access and Querying

CLI Commands:

# List all runs
/scaling-list --architecture centralised --agent-count 3

# Compare runs
/scaling-compare run-abc123 run-def456

# Export run data
/scaling-export run-abc123 --format csv > results.csv

# Restore archived run
/scaling-restore run-2026-01-15-xyz789 --from-archive

Programmatic Access:

# scripts/scaling-analysis/query.py
from pathlib import Path
from typing import Iterator, Optional
import json

def load_run(run_id: str) -> dict:
    run_dir = Path.home() / ".coditect-data/scaling-experiments/runs" / run_id
    with open(run_dir / "run_manifest.json") as f:
        return json.load(f)

def query_runs(architecture: Optional[str] = None,
               min_performance: Optional[float] = None) -> Iterator[dict]:
    runs_dir = Path.home() / ".coditect-data/scaling-experiments/runs"
    for run_dir in runs_dir.iterdir():
        manifest = load_run(run_dir.name)
        if architecture and manifest["architecture"] != architecture:
            continue
        # ... filter logic
        yield manifest

Alternatives Considered

1. Mutable Results (Allow Edits)

Pros:

  • Can fix errors in run data
  • Simpler implementation

Cons:

  • Destroys reproducibility (cannot verify results)
  • Breaks content hashing (run ID no longer valid)
  • Violates scientific integrity
  • Scaling model trained on unstable data

2. Database-Only Storage (SQLite/PostgreSQL)

Pros:

  • Structured queries (SQL)
  • Indexed lookups
  • Relational joins

Cons:

  • Loses filesystem simplicity (harder to inspect runs)
  • Migration overhead (schema changes)
  • Binary format (no cat run_manifest.json)
  • Harder to sync (database replication vs file copy)
  • Agent Labs uses filesystem, would require adapter

3. Cloud-First Storage (No Local Copy)

Pros:

  • No local disk usage
  • Automatic cross-machine access

Cons:

  • Requires internet (breaks offline-first principle)
  • Latency for analysis (network round-trips)
  • Violates CODITECT local-first design
  • Cost (cloud storage + egress)

4. Ephemeral Runs (Delete After Analysis)

Pros:

  • Zero storage growth
  • No archival needed

Cons:

  • Cannot reproduce experiments
  • Scaling model retraining impossible
  • Loses audit trail
  • Violates Agent Labs immutability principle

5. External Database (e.g., MLflow, Weights & Biases)

Pros:

  • Professional experiment tracking UI
  • Built-in versioning, comparisons
  • Large community

Cons:

  • External dependency (SaaS lock-in)
  • Data sent to third party (privacy concern)
  • Requires account/auth setup
  • Overkill for CODITECT use case

Consequences

Positive

  1. Reproducibility: Immutable runs + content hashing guarantee exact recreation
  2. Audit Trail: Permanent record of all scaling experiments
  3. Scientific Integrity: Results cannot be retroactively altered
  4. Scaling Model Validity: Training data stability ensures model accuracy
  5. Disaster Recovery: Cloud sync protects against machine loss
  6. Cross-Machine Access: Share runs across team or multiple machines
  7. Simple Debugging: Filesystem-based storage is easy to inspect (cat, ls, grep)
  8. Offline-First: No internet required for experimentation
  9. CODITECT Consistency: Matches existing data governance patterns (ADR-114, ADR-155)

Negative

  1. Storage Growth: Runs accumulate, requiring archival/cleanup discipline
  2. No In-Place Fixes: Cannot edit run if parameters recorded incorrectly (must delete + rerun)
  3. Manual Archival: Requires periodic /scaling-archive invocation
  4. Disk Space Monitoring: Need to track ~/.coditect-data/ usage
  5. Cloud Sync Complexity: Manual sync commands vs automatic background sync tradeoff
  6. Deletion Friction: Immutability makes cleanup deliberate (confirmation required)

Risks

  1. Runaway Storage Growth: User forgets to archive, disk fills up

    • Mitigation: Automatic warnings at 10 GB; /scaling-cleanup recommendations
  2. Accidental Deletion: User deletes important benchmark runs

    • Mitigation: --keep-forever tagging; cloud backups; confirmation prompts
  3. Cloud Sync Conflicts: Two machines create runs with same hash (unlikely but possible)

    • Mitigation: Manifest timestamp already disambiguates most collisions; a random nonce can be added to the manifest if needed; detect conflicts on upload
  4. Lost Runs (No Cloud Backup): Machine crash before cloud sync

    • Mitigation: Encourage opt-in automatic sync; periodic backup reminders
  5. Stale Archives: Compressed runs never accessed, waste cloud storage

    • Mitigation: Audit cloud storage quarterly; delete archives unused for 1+ year
  6. Version Compatibility: Future Agent Labs version changes run format

    • Mitigation: Version manifests ("schema_version": "1.0"); migration scripts
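The schema-version mitigation could work as a chain of pure in-memory migrations, so on-disk runs stay immutable and their content hashes remain valid (the names below are illustrative, not an existing API):

```python
# Hypothetical migration table: one pure function per version step.
MIGRATIONS = {
    "1.0": lambda m: {**m, "schema_version": "1.1", "tags": m.get("tags", [])},
}

def migrate_manifest(manifest: dict, target: str = "1.1") -> dict:
    """Upgrade a manifest in memory, one version step at a time.

    The run directory is never rewritten; only the loaded view is
    upgraded, preserving immutability and the original content hash.
    """
    while manifest["schema_version"] != target:
        step = MIGRATIONS.get(manifest["schema_version"])
        if step is None:
            raise ValueError(f"no migration path from {manifest['schema_version']}")
        manifest = step(manifest)
    return manifest
```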

Implementation Notes

1. Environment Configuration

# Set Agent Labs data directory
export AGENT_LABS_DATA_DIR=~/.coditect-data/scaling-experiments

# Enable automatic cloud sync (optional)
export CODITECT_SCALING_SYNC=true
export CODITECT_SCALING_BUCKET=gs://coditect-scaling-experiments-halcasteel

2. Run Creation Workflow

# scripts/scaling-analysis/run_experiment.py

import hashlib
import json
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def create_run(scenario: str, architecture: str, agents: int) -> str:
    # Generate manifest
    manifest = {
        "scenario": scenario,
        "architecture": architecture,
        "agent_count": agents,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "schema_version": "1.0",
    }

    # Content hash (run ID)
    manifest_json = json.dumps(manifest, sort_keys=True)
    run_hash = hashlib.sha256(manifest_json.encode()).hexdigest()[:16]
    run_id = f"run-{manifest['timestamp'][:10]}-{run_hash}"

    # Create in temp, move atomically
    temp_dir = Path(tempfile.mkdtemp()) / run_id
    temp_dir.mkdir()
    with open(temp_dir / "run_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)

    # Execute experiment (populates temp_dir with results)
    execute_agent_labs_run(temp_dir, manifest)

    # Move to final location
    final_dir = Path.home() / ".coditect-data/scaling-experiments/runs" / run_id
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(temp_dir), str(final_dir))

    # Make read-only
    (final_dir / "run_manifest.json").chmod(0o444)

    # Cloud sync (if enabled)
    if os.getenv("CODITECT_SCALING_SYNC") == "true":
        sync_to_cloud(final_dir)

    return run_id

3. Archival Script

#!/bin/bash
# scripts/scaling-analysis/archive_old_runs.sh

RUNS_DIR=~/.coditect-data/scaling-experiments/runs
ARCHIVE_DIR=~/.coditect-data/scaling-experiments/archive
# 90 days ago (BSD/macOS date; on GNU/Linux use: date -d '90 days ago' +%Y-%m-%d)
CUTOFF_DATE=$(date -v-90d +%Y-%m-%d)

mkdir -p "$ARCHIVE_DIR"

for run_dir in "$RUNS_DIR"/run-*; do
    run_date=$(basename "$run_dir" | cut -d'-' -f2-4)  # Extract YYYY-MM-DD

    if [[ "$run_date" < "$CUTOFF_DATE" ]]; then
        echo "Archiving $(basename "$run_dir")..."
        tar -czf "$ARCHIVE_DIR/$(basename "$run_dir").tar.gz" -C "$RUNS_DIR" "$(basename "$run_dir")"
        rm -rf "$run_dir"
    fi
done

4. Cloud Sync Command

# /scaling-sync implementation
case "$1" in
    --upload)
        gsutil -m rsync -r ~/.coditect-data/scaling-experiments/runs/ \
            "gs://coditect-scaling-experiments-$USER/runs/"
        ;;
    --download)
        gsutil -m rsync -r "gs://coditect-scaling-experiments-$USER/runs/" \
            ~/.coditect-data/scaling-experiments/runs/
        ;;
esac

5. Storage Monitoring

# scripts/scaling-analysis/check_storage.py

from pathlib import Path

def check_storage_usage() -> float:
    experiments_dir = Path.home() / ".coditect-data/scaling-experiments"
    usage_bytes = sum(
        f.stat().st_size
        for f in experiments_dir.rglob('*')
        if f.is_file()
    )
    usage_gb = usage_bytes / (1024**3)

    if usage_gb > 10:
        print(f"WARNING: Scaling experiments using {usage_gb:.1f} GB")
        print("Run '/scaling-archive --older-than 90d' to reduce usage")

    return usage_gb

References

  • ADR-114: User Data Directory (~/.coditect-data/)
  • ADR-155: Session Log Location and Sync
  • CODITECT Backup Pattern: ~/.coditect/scripts/backup-context-db.sh
  • Agent Labs Run Format: run_manifest.json schema
  • Google Cloud Storage: gs://coditect-cloud-infra-context-backups (existing pattern)
  • Related ADRs:
    • ADR-001: Agent Labs Adoption
    • ADR-002: Integration Pattern
    • ADR-004: Scaling Model for Agent Selection

Author: Claude (Sonnet 4.5) · Date: 2026-02-16 · Track: H (Framework) · Task ID: H.0