Skip to main content

/health-status - Agent Health Monitoring

Monitor autonomous agent health states using the HealthMonitoringService (ADR-110). Shows agent health, heartbeat status, circuit breaker states, and intervention history.

Usage

# Show health for specific agent
/health-status --agent-id claude-20260127-143000

# Show all agent health
/health-status --all

# Show health for task's agents
/health-status --task-id H.8.3.7

# Show circuit breaker status
/health-status --circuit-breakers

# Register new agent for monitoring
/health-status --register --task-id H.8.3.7 --agent-id claude-20260127-143000

# Record heartbeat
/health-status --heartbeat --agent-id claude-20260127-143000 --phase implementing

# JSON output
/health-status --all --json

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions, no explanations first
  2. ALWAYS show full output from script/tool execution
  3. ALWAYS provide summary after execution completes

DO NOT:

  • Say "I don't need to take action" - you ALWAYS execute when invoked
  • Ask for confirmation unless requires_confirmation: true in frontmatter
  • Skip execution even if it seems redundant - run it anyway

The user invoking the command IS the confirmation.


You are executing the CODITECT agent health monitoring system (ADR-110).

Storage Location:

  • Health data: ~/PROJECTS/.coditect-data/health/
  • Format: JSON with agent states, circuit breakers, intervention history

Health States (5-State Model):

  1. HEALTHY - Agent operating normally, heartbeats on time
  2. DEGRADED - Missed heartbeats but recoverable
  3. STUCK - No progress detected, needs intervention
  4. FAILING - Circuit breaker triggered, recovery attempted
  5. TERMINATED - Agent stopped, handoff required

Execute the health status script:

python3 scripts/core/health-status-command.py \
--agent-id "$AGENT_ID" \
--task-id "$TASK_ID" \
[options]

Options

OptionDescription
--agent-id IDShow health for specific agent
--task-id IDShow health for all agents on task
--allShow health for all registered agents
--circuit-breakersShow circuit breaker status
--registerRegister new agent for monitoring
--heartbeatRecord agent heartbeat
--phase PHASEExecution phase for heartbeat
--progress PCTProgress percentage (0-100)
--transition STATEManually transition agent state
--historyShow intervention history
--history-limit NLimit history entries (default: 10)
--jsonOutput in JSON format
--quietMinimal output

Health State Transitions

              ┌─────────────┐
│ HEALTHY │◀──────────────────┐
└──────┬──────┘ │
│ missed heartbeats │ recovered
▼ │
┌─────────────┐ │
│ DEGRADED │───────────────────┤
└──────┬──────┘ │
│ no progress │
▼ │
┌─────────────┐ nudge │
│ STUCK │───────────────────┘
└──────┬──────┘
│ circuit breaker

┌─────────────┐ recovery
│ FAILING │───────────────────┐
└──────┬──────┘ │
│ max retries │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ TERMINATED │ │ HEALTHY │
└─────────────┘ └─────────────┘

Examples

Check Specific Agent

/health-status --agent-id claude-20260127-143000

Output:

Agent Health: claude-20260127-143000
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
State: HEALTHY ✓
Task: H.8.3.7
Phase: implementing
Progress: 45%
Last Heartbeat: 2026-01-27T14:32:00Z (2m ago)
Nudges: 0
Escalations: 0
Circuit Breaker: CLOSED

Check All Agents

/health-status --all

Output:

Agent Health Dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Agent ID | Task | State | Last HB
----------------------------|----------|----------|----------
claude-20260127-143000 | H.8.3.7 | HEALTHY | 2m ago
claude-20260127-140000 | H.8.1.8 | HEALTHY | 5m ago
claude-20260127-120000 | J.1.2.3 | DEGRADED | 15m ago
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total: 3 agents | 2 healthy | 1 degraded | 0 stuck | 0 failing

Check Circuit Breakers

/health-status --circuit-breakers

Output:

Circuit Breaker Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Circuit | State | Failures | Last Trip
----------------------|--------|----------|------------
checkpoint_creation | CLOSED | 0 | Never
cloud_sync | OPEN | 3 | 2026-01-27T14:00:00Z
intervention | CLOSED | 1 | 2026-01-26T10:00:00Z
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Register New Agent

/health-status --register --task-id H.8.5.4 --agent-id claude-20260127-150000

Output:

✅ Agent Registered
Agent ID: claude-20260127-150000
Task: H.8.5.4
Initial State: HEALTHY
Heartbeat Interval: 60s

Record Heartbeat

/health-status --heartbeat --agent-id claude-20260127-143000 --phase testing --progress 75

Output:

✅ Heartbeat Recorded
Agent: claude-20260127-143000
Phase: testing
Progress: 75%
State: HEALTHY
Next Expected: 2026-01-27T14:35:00Z

Integration

Works with:

  • /checkpoint --agent - Create checkpoints with health context
  • /agent - Agent invocation with health registration
  • Context watcher daemon - Automated heartbeat monitoring

Scripts:

  • scripts/core/health-status-command.py - CLI wrapper
  • scripts/core/ralph_wiggum/health_monitoring.py - HealthMonitoringService

Storage:

  • Health data: ~/PROJECTS/.coditect-data/health/
  • Circuit breakers: ~/PROJECTS/.coditect-data/health/circuit-breakers.json

Success Output

When health status displays:

✅ COMMAND COMPLETE: /health-status
Agents: <count>
Healthy: <count>
Alerts: <count>

Completion Checklist

Before marking complete:

  • Health data retrieved
  • State accurately displayed
  • Circuit breakers checked (if requested)
  • JSON format correct (if --json)

Failure Indicators

This command has FAILED if:

  • ❌ Health directory not accessible
  • ❌ Agent not found (when --agent-id specified)
  • ❌ Invalid state transition requested
  • ❌ Script execution error

When NOT to Use

Do NOT use when:

  • Checking system health (use /health-check instead)
  • No agents registered yet
  • Need real-time monitoring (use context-watcher daemon)

Anti-Patterns (Avoid)

Anti-PatternProblemSolution
Manual state changesBypasses health protocolLet system manage transitions
Ignoring DEGRADEDEscalates to STUCKInvestigate degraded agents
Skip heartbeatsAgent appears deadEnsure regular heartbeats

Principles

This command embodies:

  • #3 Complete Execution - Full health state capture
  • #9 Based on Facts - Heartbeat-based evidence
  • #6 Clear, Understandable - Visual state indicators

Full Standard: CODITECT-STANDARD-AUTOMATION.md


Scripts: scripts/core/health-status-command.py, scripts/core/ralph_wiggum/health_monitoring.py Version: 1.0.0 Last Updated: 2026-01-27