ADR-027: Guardrail Engine
Status
Accepted - December 15, 2025
Context
As CODITECT moves toward semi-autonomous operation (ADR-006), we need safety mechanisms to prevent harmful actions, enforce policies, and maintain operational boundaries.
Problem Statement
- Safety: Autonomous agents could execute destructive commands (rm -rf, DROP TABLE)
- Policy Compliance: Enterprise customers require policy enforcement
- Rate Limiting: Prevent resource exhaustion and runaway operations
- Audit Trail: All actions must be logged for compliance
- Human Override: Critical operations require human approval
Requirements
- Pre-execution validation for all tool calls
- Post-execution audit logging
- Configurable policy rules per tenant/project
- Emergency stop capability
- Integration with intent classification (ADR-026)
Decision
Implement a Multi-Layer Guardrail Engine that validates actions before execution and audits after completion:
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ GUARDRAIL ENGINE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Agent Action Request │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LAYER 1: INTENT VALIDATION │ │
│ │ │ │
│ │ • Is this action consistent with stated intent? │ │
│ │ • Does it match the classified intent from ADR-026? │ │
│ │ • Is the target scope appropriate? │ │
│ │ │ │
│ │ ✓ PASS: Intent "debug_error" → Action "read_file" ✓ │ │
│ │ ✗ FAIL: Intent "read_docs" → Action "delete_file" ✗ │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LAYER 2: POLICY VALIDATION │ │
│ │ │ │
│ │ Policy Rules: │ │
│ │ • No destructive operations without explicit permission │ │
│ │ • No access to paths outside workspace │ │
│ │ • No network calls to non-whitelisted domains │ │
│ │ • Rate limits per tool type │ │
│ │ │ │
│ │ ✓ Read file in workspace: ALLOWED │ │
│ │ ✗ Delete file without permission: BLOCKED │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LAYER 3: RESOURCE VALIDATION │ │
│ │ │ │
│ │ • Token budget remaining │ │
│ │ • API rate limits │ │
│ │ • Concurrent operation limits │ │
│ │ • Session duration limits │ │
│ │ │ │
│ │ ✓ Token budget: 50K remaining (sufficient) │ │
│ │ ✗ Rate limit: 10 API calls/min exceeded │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LAYER 4: HUMAN APPROVAL │ │
│ │ │ │
│ │ Requires approval if: │ │
│ │ • Destructive operation (delete, drop, force) │ │
│ │ • Production environment │ │
│ │ • Cost exceeds threshold │ │
│ │ • Security-sensitive action │ │
│ │ │ │
│ │ → Pause execution │ │
│ │ → Request human confirmation │ │
│ │ → Timeout after 5 minutes │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ ACTION EXECUTION │ │
│ │ │ │
│ │ Execute with: │ │
│ │ • Sandboxed environment │ │
│ │ • Timeout limits │ │
│ │ • Output capture │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ POST-EXECUTION AUDIT │ │
│ │ │ │
│ │ Log to sessions.db: │ │
│ │ • Action taken │ │
│ │ • Parameters used │ │
│ │ • Result/output │ │
│ │ • Duration │ │
│ │ • Guardrail checks passed │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
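Layer 1's consistency check can be sketched as a lookup from the classified intent to the set of actions that intent may trigger. This is a minimal illustration, not the engine's actual logic: the intents `debug_error` and `read_docs` and the actions `read_file` and `delete_file` come from the diagram, while the table contents, the `run_tests` action, and the function name `intent_permits` are assumptions made for the example.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping from a classified intent to the actions it may trigger.
# Only the two intents and the read_file/delete_file actions appear in the ADR.
INTENT_ALLOWED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "debug_error": frozenset({"read_file", "run_tests"}),
    "read_docs": frozenset({"read_file"}),
}


def intent_permits(intent: str, action: str) -> bool:
    """Return True if the action is consistent with the classified intent.

    Unknown intents permit nothing, so the check fails closed.
    """
    return action in INTENT_ALLOWED_ACTIONS.get(intent, frozenset())


print(intent_permits("debug_error", "read_file"))   # True
print(intent_permits("read_docs", "delete_file"))   # False
```

Whether a mismatch blocks or only warns is a policy choice; the built-in rule GR-009 defaults to a warning.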
Policy Configuration
Policies are defined in YAML and can be customized per tenant/project:
# config/guardrail-policies.yaml
version: "1.0"

global:
  # Apply to all tenants
  destructive_operations:
    require_confirmation: true
    blocked_patterns:
      - "rm -rf /"
      - "DROP DATABASE"
      - "git push --force origin main"
  file_access:
    allowed_paths:
      - "${WORKSPACE}/**"
      - "${HOME}/.coditect/**"
    blocked_paths:
      - "${HOME}/.ssh/**"
      - "${HOME}/.aws/**"
      - "/etc/**"
      - "/var/**"
  rate_limits:
    bash_commands: 60/minute
    file_writes: 100/minute
    api_calls: 30/minute
  token_budget:
    warning_threshold: 0.8   # 80% of context
    hard_limit: 0.95         # 95% of context

tenant_overrides:
  enterprise_tenant:
    destructive_operations:
      require_confirmation: always
      require_approval_from: ["admin", "lead"]
    file_access:
      blocked_paths:
        - "**/production/**"
        - "**/secrets/**"
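The ADR does not spell out how a tenant override combines with the global policy. The sketch below shows one plausible reading, in which nested mappings merge key by key, lists are extended (so a tenant can add blocked paths but not drop global ones), and scalars are replaced. The function name `merge_policy` and these merge semantics are assumptions, and the dictionaries are trimmed copies of the YAML above.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_policy(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on a copy of `base` (assumed semantics)."""
    merged = deepcopy(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Nested sections merge key by key.
            merged[key] = merge_policy(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists are extended, skipping entries already present.
            merged[key] = current + [item for item in value if item not in current]
        else:
            # Scalars (and type mismatches) are replaced by the override.
            merged[key] = deepcopy(value)
    return merged


global_policy = {
    "destructive_operations": {"require_confirmation": True},
    "file_access": {"blocked_paths": ["${HOME}/.ssh/**", "/etc/**"]},
}
enterprise_override = {
    "destructive_operations": {
        "require_confirmation": "always",
        "require_approval_from": ["admin", "lead"],
    },
    "file_access": {"blocked_paths": ["**/secrets/**"]},
}

effective = merge_policy(global_policy, enterprise_override)
print(effective["file_access"]["blocked_paths"])
```

Extending lists rather than replacing them keeps the global blocklist as a floor that tenants can only tighten, which seems in keeping with the safety goal stated in the Context section.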
Guardrail Rules
Built-in Rules:
| Rule ID | Category | Description | Default Action |
|---|---|---|---|
| GR-001 | Destructive | rm/delete commands | Require confirmation |
| GR-002 | Destructive | Database DROP/TRUNCATE | Block without approval |
| GR-003 | Git | Force push to main/master | Block without approval |
| GR-004 | Security | Access to secret files | Block |
| GR-005 | Security | Network calls to unknown hosts | Warn |
| GR-006 | Resource | Token budget exceeded | Pause and notify |
| GR-007 | Resource | Rate limit exceeded | Throttle |
| GR-008 | Scope | Action outside workspace | Block |
| GR-009 | Intent | Action inconsistent with intent | Warn |
| GR-010 | Cost | Estimated cost exceeds threshold | Require approval |
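As an illustration of how the scope and secret-file rules (GR-004, GR-008) could be evaluated against the `allowed_paths` and `blocked_paths` lists in the policy file, here is a minimal sketch. The helper names `expand` and `path_allowed` are hypothetical, and `fnmatch` is used as an approximation: it treats `**` the same as `*`, and both match across `/`.

```python
import posixpath
from fnmatch import fnmatchcase
from string import Template
from typing import Dict, List


def expand(patterns: List[str], variables: Dict[str, str]) -> List[str]:
    """Substitute ${NAME} placeholders such as ${WORKSPACE} in each pattern."""
    return [Template(pattern).substitute(variables) for pattern in patterns]


def path_allowed(path: str, allowed: List[str], blocked: List[str]) -> bool:
    """Blocked patterns win over allowed ones; anything unmatched is denied."""
    # normpath collapses ".." segments so "/ws/../etc/passwd" cannot slip by.
    # It does not resolve symlinks; a real check would likely need realpath.
    normalized = posixpath.normpath(path)
    if any(fnmatchcase(normalized, pattern) for pattern in blocked):
        return False
    return any(fnmatchcase(normalized, pattern) for pattern in allowed)


# Illustrative values; a real engine would read these from the environment.
variables = {"WORKSPACE": "/home/dev/project", "HOME": "/home/dev"}
allowed = expand(["${WORKSPACE}/**", "${HOME}/.coditect/**"], variables)
blocked = expand(["${HOME}/.ssh/**", "/etc/**"], variables)

print(path_allowed("/home/dev/project/src/main.py", allowed, blocked))
print(path_allowed("/home/dev/project/../.ssh/id_rsa", allowed, blocked))
```

Denying anything that matches neither list makes the check fail closed, which fits the default "Block" action for GR-008.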
Implementation
import json
import sqlite3
from contextlib import closing
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class GuardrailResult(Enum):
    ALLOWED = "allowed"
    BLOCKED = "blocked"
    REQUIRES_CONFIRMATION = "requires_confirmation"
    THROTTLED = "throttled"


@dataclass
class ValidationResult:
    result: GuardrailResult
    rule_id: Optional[str]
    message: str
    details: dict


class GuardrailEngine:
    # The helpers _load_policies, _validate_intent, _validate_policy,
    # _validate_resources and _requires_human_approval are not shown here.

    def __init__(self, policy_path: str, sessions_db_path: str):
        self.policies = self._load_policies(policy_path)
        self.db_path = sessions_db_path
        # Set from the execution context in validate_action; used by log_action.
        self.current_session_id: Optional[str] = None

    def validate_action(
        self,
        action: str,
        parameters: dict,
        context: dict
    ) -> ValidationResult:
        """
        Validate an action against all guardrail layers.

        Args:
            action: The action to validate (e.g., "Bash", "Write", "Edit")
            parameters: Action parameters (e.g., {"command": "rm -rf /"})
            context: Execution context (session, intent, user, etc.)

        Returns:
            ValidationResult with allow/block decision and details
        """
        # Remember the session so the audit entry can be attributed to it.
        self.current_session_id = context.get("session_id")

        # Layer 1: Intent Validation
        result = self._validate_intent(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 2: Policy Validation
        result = self._validate_policy(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 3: Resource Validation
        result = self._validate_resources(action, context)
        if result.result in (GuardrailResult.BLOCKED, GuardrailResult.THROTTLED):
            return result

        # Layer 4: Human Approval Check
        if self._requires_human_approval(action, parameters, context):
            return ValidationResult(
                result=GuardrailResult.REQUIRES_CONFIRMATION,
                rule_id="GR-HUMAN",
                message="This action requires human approval",
                details={"action": action, "parameters": parameters}
            )

        return ValidationResult(
            result=GuardrailResult.ALLOWED,
            rule_id=None,
            message="Action allowed",
            details={}
        )

    def log_action(
        self,
        action: str,
        parameters: dict,
        result: str,
        validation: ValidationResult,
        duration_ms: int
    ):
        """Log action to sessions.db for audit trail (ADR-118 compliant)."""
        # closing() releases the connection; the inner "with conn" commits on
        # success and rolls back if the insert raises.
        with closing(sqlite3.connect(self.db_path)) as conn:
            with conn:
                conn.execute("""
                    INSERT INTO guardrail_audit_log
                    (timestamp, action, parameters, result, validation_result,
                     rule_id, duration_ms, session_id)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                """, (
                    datetime.now(timezone.utc).isoformat(),
                    action,
                    json.dumps(parameters),
                    result,
                    validation.result.value,
                    validation.rule_id,
                    duration_ms,
                    self.current_session_id
                ))
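The resource layer's rate-limit check is not shown in the class above. One way it might be approached is a sliding-window counter per tool type, driven by limit strings such as `60/minute` from the policy file. The names `parse_rate` and `SlidingWindowLimiter` are hypothetical, and the injectable `clock` parameter exists only so the example can be exercised without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

_PERIOD_SECONDS = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


def parse_rate(spec: str) -> Tuple[int, float]:
    """Parse a limit such as "60/minute" into (max_calls, window_seconds)."""
    count, _, period = spec.partition("/")
    return int(count), _PERIOD_SECONDS[period.strip()]


class SlidingWindowLimiter:
    """Allow at most max_calls within any trailing window of `window` seconds."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic):
        self.max_calls, self.window = parse_rate(spec)
        self._clock = clock
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False  # caller would map this to a THROTTLED result
        self._calls.append(now)
        return True


# Drive the limiter with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter("2/minute", clock=lambda: fake_now[0])
results = [limiter.try_acquire() for _ in range(3)]
fake_now[0] = 61.0
after_window = limiter.try_acquire()
print(results, after_window)
```

A limiter like this keeps state in memory, so it would reset whenever the process restarts; a hook that runs as a fresh process per tool call would need to persist the timestamps somewhere.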
Storage Integration (ADR-118 Compliant)
Guardrail audit logs are stored in sessions.db (Tier 3):
-- In sessions.db
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,                    -- JSON
    result TEXT,                        -- Action result/output
    validation_result TEXT NOT NULL,    -- allowed, blocked, etc.
    rule_id TEXT,                       -- Which rule triggered
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

-- IF NOT EXISTS keeps the script safe to re-run, matching the table definition
CREATE INDEX IF NOT EXISTS idx_guardrail_session ON guardrail_audit_log(session_id);
CREATE INDEX IF NOT EXISTS idx_guardrail_timestamp ON guardrail_audit_log(timestamp);
CREATE INDEX IF NOT EXISTS idx_guardrail_rule ON guardrail_audit_log(rule_id);
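To show how the audit table might be queried for compliance reporting, the sketch below builds the schema in an in-memory SQLite database and counts blocked actions per rule for one session. The rows are made-up sample data for illustration only, and the query is one possible report, not one the ADR prescribes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,
    result TEXT,
    validation_result TEXT NOT NULL,
    rule_id TEXT,
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_guardrail_session ON guardrail_audit_log(session_id);
"""

# Made-up sample rows: (timestamp, action, parameters, result,
#                       validation_result, rule_id, duration_ms, session_id)
SAMPLE_ROWS = [
    ("2025-12-15T10:00:00", "Bash", '{"command": "ls"}', "ok", "allowed", None, 12, "s1"),
    ("2025-12-15T10:00:05", "Read", '{"path": "~/.ssh/id_rsa"}', None, "blocked", "GR-004", 3, "s1"),
    ("2025-12-15T10:00:09", "Write", '{"path": "/etc/hosts"}', None, "blocked", "GR-008", 2, "s1"),
    ("2025-12-15T10:00:11", "Write", '{"path": "/var/log/x"}', None, "blocked", "GR-008", 2, "s1"),
    ("2025-12-15T10:01:00", "Bash", '{"command": "DROP TABLE t"}', None, "blocked", "GR-002", 4, "s2"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    """
    INSERT INTO guardrail_audit_log
    (timestamp, action, parameters, result, validation_result,
     rule_id, duration_ms, session_id)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """,
    SAMPLE_ROWS,
)

# Count blocked actions per rule for a single session.
blocked_by_rule = conn.execute(
    """
    SELECT rule_id, COUNT(*)
    FROM guardrail_audit_log
    WHERE session_id = ? AND validation_result = 'blocked'
    GROUP BY rule_id
    ORDER BY rule_id
    """,
    ("s1",),
).fetchall()
conn.close()

print(blocked_by_rule)
```

The `session_id` filter is served by `idx_guardrail_session`, so a per-session report like this should not need a full table scan.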
Hook Integration
The guardrail engine integrates with Claude Code hooks:
# hooks/guardrail-pre-tool.py
"""PreToolUse hook for guardrail validation."""
import json
import os
import sys

from guardrail_engine import GuardrailEngine, GuardrailResult


def main():
    input_data = json.loads(sys.stdin.read())
    tool_name = input_data.get("tool_name")
    tool_input = input_data.get("tool_input", {})

    # Neither open() nor sqlite3.connect() expands "~", so resolve it here.
    engine = GuardrailEngine(
        policy_path=os.path.expanduser("~/.coditect/config/guardrail-policies.yaml"),
        sessions_db_path=os.path.expanduser("~/.coditect-data/context-storage/sessions.db")
    )

    result = engine.validate_action(
        action=tool_name,
        parameters=tool_input,
        context={"session_id": input_data.get("session_id")}
    )

    # THROTTLED must not fall through to "allow", so it is handled like a block.
    if result.result in (GuardrailResult.BLOCKED, GuardrailResult.THROTTLED):
        print(json.dumps({
            "decision": "block",
            "message": f"Guardrail {result.rule_id}: {result.message}"
        }))
    elif result.result == GuardrailResult.REQUIRES_CONFIRMATION:
        print(json.dumps({
            "decision": "ask",
            "message": result.message
        }))
    else:
        print(json.dumps({"decision": "allow"}))


if __name__ == "__main__":
    main()
Emergency Stop
# kill_session_processes, log_emergency_stop, send_notification and
# create_emergency_checkpoint are assumed to be defined elsewhere.
def emergency_stop(session_id: str, reason: str):
    """
    Immediately halt all operations for a session.

    Used when:
    - Runaway operation detected
    - User presses emergency stop
    - System detects anomalous behavior
    """
    # 1. Kill any running subprocesses
    kill_session_processes(session_id)

    # 2. Log emergency stop
    log_emergency_stop(session_id, reason)

    # 3. Notify user
    send_notification(
        session_id,
        level="critical",
        message=f"Emergency stop triggered: {reason}"
    )

    # 4. Create a checkpoint of the halted session's state
    create_emergency_checkpoint(session_id)
Consequences
Positive
- Safety: Prevents accidental destructive operations
- Compliance: Full audit trail for enterprise requirements
- Control: Human-in-the-loop for critical operations
- Flexibility: Configurable policies per tenant/project
Negative
- Latency: Adds ~10-20ms per action validation
- False Positives: Some legitimate actions may be blocked
- Complexity: Policy management overhead
Risks
| Risk | Mitigation |
|---|---|
| Over-blocking | Configurable policies with overrides |
| Bypass | All hooks are mandatory; there is no option to disable them |
| Performance | Caching for repeated policy checks |
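The caching mitigation could look something like the following, where a stateless check (here, matching a command against the global `blocked_patterns`) is memoized with `functools.lru_cache`. The function name `matches_blocked_pattern` and the plain substring matching are assumptions made for the sketch; the pattern strings come from the policy file.

```python
from functools import lru_cache
from typing import Tuple

# Patterns taken from the global blocked_patterns list in the policy file.
BLOCKED_PATTERNS: Tuple[str, ...] = (
    "rm -rf /",
    "DROP DATABASE",
    "git push --force origin main",
)


@lru_cache(maxsize=1024)
def matches_blocked_pattern(command: str) -> bool:
    """Return True if the command contains any blocked pattern (substring match)."""
    return any(pattern in command for pattern in BLOCKED_PATTERNS)


first = matches_blocked_pattern("git status")    # computed
second = matches_blocked_pattern("git status")   # served from the cache
info = matches_blocked_pattern.cache_info()
print(first, second, info.hits, info.misses)
```

Only checks whose answer depends solely on their arguments are safe to cache this way; rate limits and token budgets change between calls and should not be memoized. The cache would also need clearing, via `matches_blocked_pattern.cache_clear()`, whenever policies are reloaded.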
Related Documents
- ADR-006: Autonomous Orchestration System
- ADR-191: Intent Classification System
- ADR-183: Governance Hook Architecture
- CODITECT-STANDARD-AUTOMATION.md
Author: CODITECT Team
Approved: December 15, 2025
Migration: Migrated from cloud-infra per ADR-150 on 2026-02-03