ADR-005: Experiment Data Governance
Status
Proposed
Context
Brainqub3 Agent Labs generates experiment data through systematic runs that compare agent architectures. Each run produces:
Run Artifacts:
- run_manifest.json: Content-hashed manifest with run ID, scenario, architecture, agent count, parameters
- performance_metrics.json: Task completion time, success rate, output quality scores
- coordination_metrics.json: Overhead %, message density, redundancy, efficiency, error amplification
- agent_logs/: Individual agent execution logs and decisions
- evaluations/: Evaluator judgments (pass/fail + rationale)
Agent Labs Design Principles:
- Immutability: Runs are never modified after creation (content-hashed)
- Reproducibility: Run manifest contains all parameters needed to recreate experiment
- Evaluator-First: No experiment without validated evaluator + passing tests
- Local-First: Data stored in local filesystem, optionally synced to cloud
CODITECT Context:
- CODITECT already has data governance patterns:
  - `~/.coditect-data/` for user data (ADR-114)
  - `org.db` is irreplaceable (decisions, learnings, error_solutions)
  - Session logs in `~/.coditect-data/session-logs/` (ADR-155): immutable, with content hashing
- CODITECT values reproducibility and audit trails
- Multi-machine access (same user, different machines)
- Storage growth concerns (218 PDFs = 12+ GB in UDOM pipeline)
Questions:
- Where should Agent Labs run data be stored?
- How to handle storage growth (hundreds of runs = GBs of logs)?
- Should run data sync across machines (like session logs)?
- How long to retain runs? Archival strategy?
- Can runs be deleted, or are they permanent like `org.db` decisions?
Decision
Run data is immutable, content-hashed, and stored locally with optional cloud sync, following these principles:
1. Storage Location
Primary Storage (Local):
```
~/.coditect-data/scaling-experiments/
├── runs/
│   ├── run-2026-02-16-abc123/
│   │   ├── run_manifest.json         # Content-hashed metadata
│   │   ├── performance_metrics.json
│   │   ├── coordination_metrics.json
│   │   ├── agent_logs/               # Per-agent execution logs
│   │   └── evaluations/              # Evaluator judgments
│   └── run-2026-02-16-def456/
├── evaluators/                       # Evaluator definitions (versioned)
├── scenarios/                        # Scenario configurations
└── scaling-models/                   # Trained models (coefficients)
    ├── coditect-v1.json
    └── calibration-2026-02-16/       # Raw calibration data
```
Rationale:
- Aligns with CODITECT data pattern (`~/.coditect-data/`)
- Isolated from the protected installation (no accidental contamination)
- User-owned, survives CODITECT version upgrades
- Accessible to Agent Labs CLI (configured via environment variable)
2. Immutability Enforcement
Content Hashing:
- Run ID = `SHA256(run_manifest.json)[:16]` (matches Agent Labs pattern)
- Run directory name includes the hash: `run-{date}-{hash}/`
- Any modification to the manifest invalidates the run ID → creates a new run
Write-Once:
- Runs created in temp directory, moved to final location atomically
- No in-place edits (use `os.rename()` for an atomic move)
- Read-only permissions after creation: `chmod 444 run_manifest.json`
Why Immutable:
- Reproducibility: exact parameters preserved forever
- Audit trail: cannot alter results retroactively
- Scaling model validity: training data must be stable
- Scientific integrity: matches Agent Labs research methodology
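The hash-as-identity property described above can be sketched as follows; the manifest fields here are illustrative, not the full schema:

```python
import hashlib
import json

def run_id_for(manifest: dict) -> str:
    """Derive the run ID: first 16 hex chars of the SHA-256 of the canonical manifest JSON."""
    canonical = json.dumps(manifest, sort_keys=True)  # sorted keys -> deterministic hash
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

manifest = {"scenario": "triage", "architecture": "centralised", "agent_count": 3}
original_id = run_id_for(manifest)

# Any edit produces a different ID, so tampering is detectable
manifest["agent_count"] = 5
assert run_id_for(manifest) != original_id
```

Because the ID is derived from the content, "modifying" a run can only ever produce a new run; the old ID remains a stable reference to the original results.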
3. Cloud Sync (Optional)
Google Cloud Storage (Matches CODITECT Pattern):
```
gs://coditect-scaling-experiments-{user_id}/
└── runs/
    └── run-2026-02-16-abc123/
        └── [same structure as local]
```
Sync Behavior:
- Manual: `/scaling-sync --upload` (push local → cloud), `--download` (pull cloud → local)
- Automatic (opt-in): sync on run completion if `CODITECT_SCALING_SYNC=true`
- Incremental: only upload new runs (skip existing hashes)
- Cross-machine: Multiple machines can pull same runs for analysis
Rationale:
- Matches the `org.db` backup pattern (`gs://coditect-cloud-infra-context-backups`)
- Enables collaboration (share runs across team)
- Disaster recovery (machine loss → restore from cloud)
- Optional (offline-first still works)
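Since run IDs are content hashes, the incremental rule ("only upload new runs") reduces to a set difference over IDs; a minimal sketch, with `runs_to_upload` as a hypothetical helper (in practice `gsutil rsync` handles the skipping):

```python
def runs_to_upload(local_runs: set, remote_runs: set) -> set:
    """Incremental sync: only runs whose IDs are absent from the bucket need uploading."""
    return local_runs - remote_runs

local = {"run-2026-02-16-abc123", "run-2026-02-16-def456"}
remote = {"run-2026-02-16-abc123"}
assert runs_to_upload(local, remote) == {"run-2026-02-16-def456"}
```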
4. Retention and Archival
Retention Policy:
- Active runs (last 90 days): keep in `~/.coditect-data/scaling-experiments/runs/`
- Archived runs (90+ days old): move to `~/.coditect-data/scaling-experiments/archive/`
- Deletion: never automatic; manual only, with an explicit user command
Archival Workflow:
```
# Archive old runs (move to archive/, preserve in cloud)
/scaling-archive --older-than 90d

# Prune archived runs (delete local copies, keep cloud backups)
/scaling-prune --archived-only --keep-cloud-backup

# Full deletion (local + cloud, requires confirmation)
/scaling-delete run-2026-02-16-abc123 --confirm
```
Archival Format:
- Compress runs: `tar -czf run-2026-02-16-abc123.tar.gz run-2026-02-16-abc123/`
- Store compressed archives in the `archive/` directory
- Decompress on demand for analysis
Exemptions from Archival:
- Runs tagged `--keep-forever` (e.g., benchmark baselines, paper results)
- Runs used in active scaling models (referenced in `coditect-v1.json`)
- Failed runs (deleted promptly rather than archived)
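The exemption rules above can be checked before archiving a run. A sketch, assuming the scaling-model JSON lists its training runs under a hypothetical `training_runs` key:

```python
def is_exempt(run_id: str, keep_forever: set, model: dict) -> bool:
    """Exempt from archival if tagged keep-forever or referenced by the active scaling model."""
    return run_id in keep_forever or run_id in model.get("training_runs", [])

# Illustrative model contents; the real coditect-v1.json schema may differ
model = {"version": "coditect-v1", "training_runs": ["run-2026-02-16-abc123"]}
assert is_exempt("run-2026-02-16-abc123", set(), model)
assert not is_exempt("run-2026-02-16-def456", set(), model)
```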
5. Storage Growth Mitigation
Strategies:
- Log Compression: `gzip` agent logs after run completion (~10x reduction)
- Sampling: for large-scale experiments, retain only a 1-in-10 sample of runs
- Summary Metrics Only: option to discard raw logs and keep only `*_metrics.json`
- Cleanup Command: `/scaling-cleanup --dry-run` shows deletable runs and confirms before acting
- Quota Warnings: alert when `~/.coditect-data/scaling-experiments/` exceeds 10 GB
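The log-compression strategy can be sketched with the stdlib `gzip` module; the `.log` extension is an assumption, and the directory layout follows the run structure above:

```python
import gzip
import shutil
from pathlib import Path

def compress_agent_logs(run_dir: Path) -> int:
    """gzip every .log file under agent_logs/, removing the uncompressed originals."""
    compressed = 0
    for log in (run_dir / "agent_logs").glob("*.log"):
        with open(log, "rb") as src, gzip.open(str(log) + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        log.unlink()  # keep only the compressed copy
        compressed += 1
    return compressed
```

Metrics JSON files are left untouched, so `/scaling-list` and analysis queries keep working on compressed runs.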
Disk Usage Estimation:
- Average run size: ~5 MB (with logs)
- 100 runs/month × 12 months = 1200 runs = ~6 GB/year
- With compression + archival: ~2 GB/year active storage
6. Data Access and Querying
CLI Commands:
```
# List all runs
/scaling-list --architecture centralised --agent-count 3

# Compare runs
/scaling-compare run-abc123 run-def456

# Export run data
/scaling-export run-abc123 --format csv > results.csv

# Restore archived run
/scaling-restore run-2026-01-15-xyz789 --from-archive
```
Programmatic Access:
```python
# scripts/scaling-analysis/query.py
from pathlib import Path
import json

RUNS_DIR = Path.home() / ".coditect-data/scaling-experiments/runs"

def load_run(run_id: str) -> dict:
    with open(RUNS_DIR / run_id / "run_manifest.json") as f:
        return json.load(f)

def query_runs(architecture: str = None, min_performance: float = None):
    for run_dir in RUNS_DIR.iterdir():
        if not run_dir.is_dir():
            continue  # skip stray files
        manifest = load_run(run_dir.name)
        if architecture and manifest["architecture"] != architecture:
            continue
        # ... filter logic
        yield manifest

# Example: count runs for the centralised architecture
# n = sum(1 for _ in query_runs(architecture="centralised"))
```
Alternatives Considered
1. Mutable Results (Allow Edits)
Pros:
- Can fix errors in run data
- Simpler implementation
Cons:
- Destroys reproducibility (cannot verify results)
- Breaks content hashing (run ID no longer valid)
- Violates scientific integrity
- Scaling model trained on unstable data
2. Database-Only Storage (SQLite/PostgreSQL)
Pros:
- Structured queries (SQL)
- Indexed lookups
- Relational joins
Cons:
- Loses filesystem simplicity (harder to inspect runs)
- Migration overhead (schema changes)
- Binary format (no `cat run_manifest.json`)
- Harder to sync (database replication vs file copy)
- Agent Labs uses the filesystem, so this would require an adapter
3. Cloud-First Storage (No Local Copy)
Pros:
- No local disk usage
- Automatic cross-machine access
Cons:
- Requires internet (breaks offline-first principle)
- Latency for analysis (network round-trips)
- Violates CODITECT local-first design
- Cost (cloud storage + egress)
4. Ephemeral Runs (Delete After Analysis)
Pros:
- Zero storage growth
- No archival needed
Cons:
- Cannot reproduce experiments
- Scaling model retraining impossible
- Loses audit trail
- Violates Agent Labs immutability principle
5. External Database (e.g., MLflow, Weights & Biases)
Pros:
- Professional experiment tracking UI
- Built-in versioning, comparisons
- Large community
Cons:
- External dependency (SaaS lock-in)
- Data sent to third party (privacy concern)
- Requires account/auth setup
- Overkill for CODITECT use case
Consequences
Positive
- Reproducibility: Immutable runs + content hashing guarantee exact recreation
- Audit Trail: Permanent record of all scaling experiments
- Scientific Integrity: Results cannot be retroactively altered
- Scaling Model Validity: Training data stability ensures model accuracy
- Disaster Recovery: Cloud sync protects against machine loss
- Cross-Machine Access: Share runs across team or multiple machines
- Simple Debugging: filesystem-based storage is easy to inspect (`cat`, `ls`, `grep`)
- Offline-First: no internet required for experimentation
- CODITECT Consistency: Matches existing data governance patterns (ADR-114, ADR-155)
Negative
- Storage Growth: Runs accumulate, requiring archival/cleanup discipline
- No In-Place Fixes: Cannot edit run if parameters recorded incorrectly (must delete + rerun)
- Manual Archival: requires periodic `/scaling-archive` invocation
- Disk Space Monitoring: need to track `~/.coditect-data/` usage
- Cloud Sync Complexity: tradeoff between manual sync commands and automatic background sync
- Deletion Friction: Immutability makes cleanup deliberate (confirmation required)
Risks
- Runaway Storage Growth: user forgets to archive, disk fills up
  - Mitigation: automatic warnings at 10 GB; `/scaling-cleanup` recommendations
- Accidental Deletion: user deletes important benchmark runs
  - Mitigation: `--keep-forever` tagging; cloud backups; confirmation prompts
- Cloud Sync Conflicts: two machines create runs with the same hash (unlikely but possible)
  - Mitigation: content hash includes timestamp + random nonce; conflict detection on upload
- Lost Runs (No Cloud Backup): machine crash before cloud sync
  - Mitigation: encourage opt-in automatic sync; periodic backup reminders
- Stale Archives: compressed runs never accessed, wasting cloud storage
  - Mitigation: audit cloud storage quarterly; delete archives unused for 1+ year
- Version Compatibility: a future Agent Labs version changes the run format
  - Mitigation: version manifests (`"schema_version": "1.0"`); migration scripts
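The schema-version mitigation can be sketched as a stepwise migration; version "1.1" and the `tags` field below are hypothetical examples, not a committed schema:

```python
def migrate_manifest(manifest: dict) -> dict:
    """Upgrade an older manifest to the current schema, one version step at a time."""
    m = dict(manifest)  # never mutate the original: runs are immutable
    if m.get("schema_version", "1.0") == "1.0":
        # Hypothetical 1.0 -> 1.1 step: add a tags list for --keep-forever marks
        m.setdefault("tags", [])
        m["schema_version"] = "1.1"
    return m
```

Note that migrating a manifest changes its content hash, so a migrated manifest would be written as a new copy rather than edited in place, consistent with the immutability principle.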
Implementation Notes
1. Environment Configuration
```
# Set Agent Labs data directory
export AGENT_LABS_DATA_DIR=~/.coditect-data/scaling-experiments

# Enable automatic cloud sync (optional)
export CODITECT_SCALING_SYNC=true
export CODITECT_SCALING_BUCKET=gs://coditect-scaling-experiments-halcasteel
```
2. Run Creation Workflow
```python
# scripts/scaling-analysis/run_experiment.py
import hashlib
import json
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def create_run(scenario: str, architecture: str, agents: int) -> str:
    # Generate manifest
    manifest = {
        "scenario": scenario,
        "architecture": architecture,
        "agent_count": agents,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "schema_version": "1.0",
    }

    # Content hash (run ID)
    manifest_json = json.dumps(manifest, sort_keys=True)
    run_hash = hashlib.sha256(manifest_json.encode()).hexdigest()[:16]
    run_id = f"run-{manifest['timestamp'][:10]}-{run_hash}"

    # Create in a private temp directory, move atomically later
    temp_dir = Path(tempfile.mkdtemp()) / run_id
    temp_dir.mkdir()
    with open(temp_dir / "run_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)

    # Execute experiment (populates temp_dir with results)
    execute_agent_labs_run(temp_dir, manifest)

    # Move to final location
    final_dir = Path.home() / ".coditect-data/scaling-experiments/runs" / run_id
    shutil.move(str(temp_dir), str(final_dir))

    # Make read-only
    (final_dir / "run_manifest.json").chmod(0o444)

    # Cloud sync (if enabled)
    if os.getenv("CODITECT_SCALING_SYNC") == "true":
        sync_to_cloud(final_dir)

    return run_id
```
3. Archival Script
```bash
#!/bin/bash
# scripts/scaling-analysis/archive_old_runs.sh
RUNS_DIR=~/.coditect-data/scaling-experiments/runs
ARCHIVE_DIR=~/.coditect-data/scaling-experiments/archive
mkdir -p "$ARCHIVE_DIR"

# 90 days ago (BSD/macOS date; on GNU/Linux use: date -d '90 days ago' +%Y-%m-%d)
CUTOFF_DATE=$(date -v-90d +%Y-%m-%d)

for run_dir in "$RUNS_DIR"/run-*; do
  run_date=$(basename "$run_dir" | cut -d'-' -f2-4)  # Extract YYYY-MM-DD
  if [[ "$run_date" < "$CUTOFF_DATE" ]]; then
    echo "Archiving $(basename "$run_dir")..."
    tar -czf "$ARCHIVE_DIR/$(basename "$run_dir").tar.gz" -C "$RUNS_DIR" "$(basename "$run_dir")"
    rm -rf "$run_dir"
  fi
done
```
4. Cloud Sync Command
```bash
# /scaling-sync implementation
case "$1" in
  --upload)
    gsutil -m rsync -r ~/.coditect-data/scaling-experiments/runs/ \
      gs://coditect-scaling-experiments-$USER/runs/
    ;;
  --download)
    gsutil -m rsync -r gs://coditect-scaling-experiments-$USER/runs/ \
      ~/.coditect-data/scaling-experiments/runs/
    ;;
esac
```
5. Storage Monitoring
```python
# scripts/scaling-analysis/check_storage.py
from pathlib import Path

def check_storage_usage() -> float:
    experiments_dir = Path.home() / ".coditect-data/scaling-experiments"
    usage_bytes = sum(
        f.stat().st_size
        for f in experiments_dir.rglob("*")
        if f.is_file()
    )
    usage_gb = usage_bytes / (1024 ** 3)
    if usage_gb > 10:
        print(f"WARNING: Scaling experiments using {usage_gb:.1f} GB")
        print("Run '/scaling-archive --older-than 90d' to reduce usage")
    return usage_gb
```
References
- ADR-114: User Data Directory (`~/.coditect-data/`)
- ADR-155: Session Log Location and Sync
- CODITECT Backup Pattern: `~/.coditect/scripts/backup-context-db.sh`
- Agent Labs Run Format: `run_manifest.json` schema
- Google Cloud Storage: `gs://coditect-cloud-infra-context-backups` (existing pattern)
- Related ADRs:
  - ADR-001: Agent Labs Adoption
  - ADR-002: Integration Pattern
  - ADR-004: Scaling Model for Agent Selection
Author: Claude (Sonnet 4.5) | Date: 2026-02-16 | Track: H (Framework) | Task ID: H.0