/health-status - Agent Health Monitoring
Monitor autonomous agent health states using the HealthMonitoringService (ADR-110). Shows agent health, heartbeat status, circuit breaker states, and intervention history.
Usage
# Show health for specific agent
/health-status --agent-id claude-20260127-143000
# Show all agent health
/health-status --all
# Show health for task's agents
/health-status --task-id H.8.3.7
# Show circuit breaker status
/health-status --circuit-breakers
# Register new agent for monitoring
/health-status --register --task-id H.8.3.7 --agent-id claude-20260127-143000
# Record heartbeat
/health-status --heartbeat --agent-id claude-20260127-143000 --phase implementing
# JSON output
/health-status --all --json
System Prompt
EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions, no explanations first
- ALWAYS show full output from script/tool execution
- ALWAYS provide summary after execution completes
DO NOT:
- Say "I don't need to take action" - you ALWAYS execute when invoked
- Ask for confirmation unless
requires_confirmation: truein frontmatter - Skip execution even if it seems redundant - run it anyway
The user invoking the command IS the confirmation.
You are executing the CODITECT agent health monitoring system (ADR-110).
Storage Location:
- Health data:
~/PROJECTS/.coditect-data/health/ - Format: JSON with agent states, circuit breakers, intervention history
Health States (5-State Model):
- HEALTHY - Agent operating normally, heartbeats on time
- DEGRADED - Missed heartbeats but recoverable
- STUCK - No progress detected, needs intervention
- FAILING - Circuit breaker triggered, recovery attempted
- TERMINATED - Agent stopped, handoff required
Execute the health status script:
python3 scripts/core/health-status-command.py \
--agent-id "$AGENT_ID" \
--task-id "$TASK_ID" \
[options]
Options
| Option | Description |
|---|---|
--agent-id ID | Show health for specific agent |
--task-id ID | Show health for all agents on task |
--all | Show health for all registered agents |
--circuit-breakers | Show circuit breaker status |
--register | Register new agent for monitoring |
--heartbeat | Record agent heartbeat |
--phase PHASE | Execution phase for heartbeat |
--progress PCT | Progress percentage (0-100) |
--transition STATE | Manually transition agent state |
--history | Show intervention history |
--history-limit N | Limit history entries (default: 10) |
--json | Output in JSON format |
--quiet | Minimal output |
Health State Transitions
┌─────────────┐
│ HEALTHY │◀──────────────────┐
└──────┬──────┘ │
│ missed heartbeats │ recovered
▼ │
┌─────────────┐ │
│ DEGRADED │───────────────────┤
└──────┬──────┘ │
│ no progress │
▼ │
┌─────────────┐ nudge │
│ STUCK │───────────────────┘
└──────┬──────┘
│ circuit breaker
▼
┌─────────────┐ recovery
│ FAILING │───────────────────┐
└──────┬──────┘ │
│ max retries │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ TERMINATED │ │ HEALTHY │
└─────────────┘ └─────────────┘
Examples
Check Specific Agent
/health-status --agent-id claude-20260127-143000
Output:
Agent Health: claude-20260127-143000
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
State: HEALTHY ✓
Task: H.8.3.7
Phase: implementing
Progress: 45%
Last Heartbeat: 2026-01-27T14:32:00Z (2m ago)
Nudges: 0
Escalations: 0
Circuit Breaker: CLOSED
Check All Agents
/health-status --all
Output:
Agent Health Dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Agent ID | Task | State | Last HB
----------------------------|----------|----------|----------
claude-20260127-143000 | H.8.3.7 | HEALTHY | 2m ago
claude-20260127-140000 | H.8.1.8 | HEALTHY | 5m ago
claude-20260127-120000 | J.1.2.3 | DEGRADED | 15m ago
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total: 3 agents | 2 healthy | 1 degraded | 0 stuck | 0 failing
Check Circuit Breakers
/health-status --circuit-breakers
Output:
Circuit Breaker Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Circuit | State | Failures | Last Trip
----------------------|--------|----------|------------
checkpoint_creation | CLOSED | 0 | Never
cloud_sync | OPEN | 3 | 2026-01-27T14:00:00Z
intervention | CLOSED | 1 | 2026-01-26T10:00:00Z
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Register New Agent
/health-status --register --task-id H.8.5.4 --agent-id claude-20260127-150000
Output:
✅ Agent Registered
Agent ID: claude-20260127-150000
Task: H.8.5.4
Initial State: HEALTHY
Heartbeat Interval: 60s
Record Heartbeat
/health-status --heartbeat --agent-id claude-20260127-143000 --phase testing --progress 75
Output:
✅ Heartbeat Recorded
Agent: claude-20260127-143000
Phase: testing
Progress: 75%
State: HEALTHY
Next Expected: 2026-01-27T14:35:00Z
Integration
Works with:
/checkpoint --agent- Create checkpoints with health context/agent- Agent invocation with health registration- Context watcher daemon - Automated heartbeat monitoring
Scripts:
scripts/core/health-status-command.py- CLI wrapperscripts/core/ralph_wiggum/health_monitoring.py- HealthMonitoringService
Storage:
- Health data:
~/PROJECTS/.coditect-data/health/ - Circuit breakers:
~/PROJECTS/.coditect-data/health/circuit-breakers.json
Success Output
When health status displays:
✅ COMMAND COMPLETE: /health-status
Agents: <count>
Healthy: <count>
Alerts: <count>
Completion Checklist
Before marking complete:
- Health data retrieved
- State accurately displayed
- Circuit breakers checked (if requested)
- JSON format correct (if --json)
Failure Indicators
This command has FAILED if:
- ❌ Health directory not accessible
- ❌ Agent not found (when --agent-id specified)
- ❌ Invalid state transition requested
- ❌ Script execution error
When NOT to Use
Do NOT use when:
- Checking system health (use
/health-checkinstead) - No agents registered yet
- Need real-time monitoring (use context-watcher daemon)
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Manual state changes | Bypasses health protocol | Let system manage transitions |
| Ignoring DEGRADED | Escalates to STUCK | Investigate degraded agents |
| Skip heartbeats | Agent appears dead | Ensure regular heartbeats |
Principles
This command embodies:
- #3 Complete Execution - Full health state capture
- #9 Based on Facts - Heartbeat-based evidence
- #6 Clear, Understandable - Visual state indicators
Full Standard: CODITECT-STANDARD-AUTOMATION.md
Scripts: scripts/core/health-status-command.py, scripts/core/ralph_wiggum/health_monitoring.py
Version: 1.0.0
Last Updated: 2026-01-27