
ADR-027: Guardrail Engine

Status

Accepted - December 15, 2025

Context

As CODITECT moves toward semi-autonomous operation (ADR-006), we need safety mechanisms to prevent harmful actions, enforce policies, and maintain operational boundaries.

Problem Statement

  1. Safety: Autonomous agents could execute destructive commands (rm -rf, DROP TABLE)
  2. Policy Compliance: Enterprise customers require policy enforcement
  3. Rate Limiting: Prevent resource exhaustion and runaway operations
  4. Audit Trail: All actions must be logged for compliance
  5. Human Override: Critical operations require human approval

Requirements

  • Pre-execution validation for all tool calls
  • Post-execution audit logging
  • Configurable policy rules per tenant/project
  • Emergency stop capability
  • Integration with intent classification (ADR-026)

Decision

Implement a multi-layer Guardrail Engine that validates each action before execution and audits it after completion.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    GUARDRAIL ENGINE ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│                 Agent Action Request                                │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  LAYER 1: INTENT VALIDATION                                 │   │
│   │                                                             │   │
│   │  • Is this action consistent with stated intent?            │   │
│   │  • Does it match the classified intent from ADR-026?        │   │
│   │  • Is the target scope appropriate?                         │   │
│   │                                                             │   │
│   │  ✓ PASS: Intent "debug_error" → Action "read_file" ✓        │   │
│   │  ✗ FAIL: Intent "read_docs" → Action "delete_file" ✗        │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  LAYER 2: POLICY VALIDATION                                 │   │
│   │                                                             │   │
│   │  Policy Rules:                                              │   │
│   │  • No destructive operations without explicit permission    │   │
│   │  • No access to paths outside workspace                     │   │
│   │  • No network calls to non-whitelisted domains              │   │
│   │  • Rate limits per tool type                                │   │
│   │                                                             │   │
│   │  ✓ Read file in workspace: ALLOWED                          │   │
│   │  ✗ Delete file without permission: BLOCKED                  │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  LAYER 3: RESOURCE VALIDATION                               │   │
│   │                                                             │   │
│   │  • Token budget remaining                                   │   │
│   │  • API rate limits                                          │   │
│   │  • Concurrent operation limits                              │   │
│   │  • Session duration limits                                  │   │
│   │                                                             │   │
│   │  ✓ Token budget: 50K remaining (sufficient)                 │   │
│   │  ✗ Rate limit: 10 API calls/min exceeded                    │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  LAYER 4: HUMAN APPROVAL                                    │   │
│   │                                                             │   │
│   │  Requires approval if:                                      │   │
│   │  • Destructive operation (delete, drop, force)              │   │
│   │  • Production environment                                   │   │
│   │  • Cost exceeds threshold                                   │   │
│   │  • Security-sensitive action                                │   │
│   │                                                             │   │
│   │  → Pause execution                                          │   │
│   │  → Request human confirmation                               │   │
│   │  → Timeout after 5 minutes                                  │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  ACTION EXECUTION                                           │   │
│   │                                                             │   │
│   │  Execute with:                                              │   │
│   │  • Sandboxed environment                                    │   │
│   │  • Timeout limits                                           │   │
│   │  • Output capture                                           │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                         │
│                           ▼                                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  POST-EXECUTION AUDIT                                       │   │
│   │                                                             │   │
│   │  Log to sessions.db:                                        │   │
│   │  • Action taken                                             │   │
│   │  • Parameters used                                          │   │
│   │  • Result/output                                            │   │
│   │  • Duration                                                 │   │
│   │  • Guardrail checks passed                                  │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
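
Layer 1 can be illustrated with a small allowlist lookup. This is only a sketch: the `INTENT_ALLOWED_ACTIONS` mapping, the `IntentCheck` type, and the `check_intent` function are hypothetical names, and the intents and actions listed are illustrative values borrowed from the diagram, not the classifier output defined in ADR-026.

```python
from dataclasses import dataclass

# Hypothetical mapping from a classified intent to the actions it plausibly
# needs. Real intents and actions would come from ADR-026 and the tool set.
INTENT_ALLOWED_ACTIONS = {
    "debug_error": {"read_file", "run_tests", "edit_file"},
    "read_docs": {"read_file"},
}


@dataclass(frozen=True)
class IntentCheck:
    consistent: bool
    reason: str


def check_intent(intent: str, action: str) -> IntentCheck:
    """Report whether an action is expected for the given intent."""
    allowed = INTENT_ALLOWED_ACTIONS.get(intent)
    if allowed is None:
        # Unknown intents are treated as inconsistent (fail closed).
        return IntentCheck(False, f"unknown intent {intent!r}")
    if action in allowed:
        return IntentCheck(True, f"{action!r} is expected for intent {intent!r}")
    return IntentCheck(False, f"{action!r} is not expected for intent {intent!r}")
```

Whether an inconsistent result should warn or block is a policy decision; this sketch only reports consistency and leaves the outcome to the caller.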

Policy Configuration

Policies are defined in YAML and can be customized per tenant/project:

# config/guardrail-policies.yaml
version: "1.0"

global:
  # Apply to all tenants
  destructive_operations:
    require_confirmation: true
    blocked_patterns:
      - "rm -rf /"
      - "DROP DATABASE"
      - "git push --force origin main"

  file_access:
    allowed_paths:
      - "${WORKSPACE}/**"
      - "${HOME}/.coditect/**"
    blocked_paths:
      - "${HOME}/.ssh/**"
      - "${HOME}/.aws/**"
      - "/etc/**"
      - "/var/**"

  rate_limits:
    bash_commands: 60/minute
    file_writes: 100/minute
    api_calls: 30/minute

  token_budget:
    warning_threshold: 0.8  # 80% of context
    hard_limit: 0.95        # 95% of context

tenant_overrides:
  enterprise_tenant:
    destructive_operations:
      require_confirmation: always
      require_approval_from: ["admin", "lead"]

    file_access:
      blocked_paths:
        - "**/production/**"
        - "**/secrets/**"
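
One way to resolve a tenant's effective policy is a recursive merge of `tenant_overrides` onto `global`. The sketch below assumes the YAML has already been parsed into plain dictionaries (parsing is omitted to keep it dependency-free). `merge_policy` and `effective_policy` are hypothetical helper names, not part of the engine described here.

```python
import copy


def merge_policy(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied.

    Nested dicts are merged key by key; every other value (including
    lists) replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_policy(policies: dict, tenant: str) -> dict:
    """Global policy with the tenant's overrides (if any) layered on top."""
    overrides = policies.get("tenant_overrides", {}).get(tenant, {})
    return merge_policy(policies["global"], overrides)


# A trimmed-down, already-parsed version of the configuration above.
policies = {
    "global": {
        "destructive_operations": {
            "require_confirmation": True,
            "blocked_patterns": ["rm -rf /", "DROP DATABASE"],
        },
        "file_access": {"blocked_paths": ["/etc/**", "/var/**"]},
    },
    "tenant_overrides": {
        "enterprise_tenant": {
            "destructive_operations": {"require_confirmation": "always"},
            "file_access": {
                "blocked_paths": ["**/production/**", "**/secrets/**"]
            },
        }
    },
}

policy = effective_policy(policies, "enterprise_tenant")
```

Note the design choice this exposes: because lists are replaced rather than concatenated, the tenant's `blocked_paths` here would drop the global `/etc/**` and `/var/**` entries. This ADR does not state which behavior is intended; an additive merge for block lists is probably the safer default.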

Guardrail Rules

Built-in Rules:

| Rule ID | Category    | Description                      | Default Action         |
|---------|-------------|----------------------------------|------------------------|
| GR-001  | Destructive | rm/delete commands               | Require confirmation   |
| GR-002  | Destructive | Database DROP/TRUNCATE           | Block without approval |
| GR-003  | Git         | Force push to main/master        | Block without approval |
| GR-004  | Security    | Access to secret files           | Block                  |
| GR-005  | Security    | Network calls to unknown hosts   | Warn                   |
| GR-006  | Resource    | Token budget exceeded            | Pause and notify       |
| GR-007  | Resource    | Rate limit exceeded              | Throttle               |
| GR-008  | Scope       | Action outside workspace         | Block                  |
| GR-009  | Intent      | Action inconsistent with intent  | Warn                   |
| GR-010  | Cost        | Estimated cost exceeds threshold | Require approval       |
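
As an illustration of how GR-004 and GR-008 might be evaluated against the `file_access` policy, the sketch below matches a normalized path against blocked and allowed glob patterns. Several things are assumptions: the `expand` and `check_path` helpers are hypothetical, mapping `blocked_paths` hits to GR-004 is a guess at the intended rule, and the standard library's `fnmatchcase` is used as a stand-in for whatever glob semantics the engine really uses (in `fnmatch`, `*` also matches `/`, so `**` behaves like `*`).

```python
import posixpath
from fnmatch import fnmatchcase


def expand(pattern: str, variables: dict) -> str:
    """Substitute ${NAME} placeholders such as ${WORKSPACE} and ${HOME}."""
    for name, value in variables.items():
        pattern = pattern.replace("${" + name + "}", value)
    return pattern


def check_path(path: str, allowed: list, blocked: list, variables: dict):
    """Return (rule_id, action); rule_id is None when access is allowed."""
    # Normalize lexically so "a/../../etc/passwd" cannot slip past a prefix.
    # Symlinks are NOT resolved here; a real check would need to handle them.
    normalized = posixpath.normpath(path)
    if any(fnmatchcase(normalized, expand(p, variables)) for p in blocked):
        return ("GR-004", "block")
    if not any(fnmatchcase(normalized, expand(p, variables)) for p in allowed):
        return ("GR-008", "block")
    return (None, "allow")


# Illustrative values only.
VARIABLES = {"WORKSPACE": "/work/app", "HOME": "/home/dev"}
ALLOWED = ["${WORKSPACE}/**", "${HOME}/.coditect/**"]
BLOCKED = ["${HOME}/.ssh/**", "${HOME}/.aws/**", "/etc/**", "/var/**"]
```

Blocked patterns are checked before allowed ones so that a block entry always wins when both match.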

Implementation

import json
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class GuardrailResult(Enum):
    ALLOWED = "allowed"
    BLOCKED = "blocked"
    REQUIRES_CONFIRMATION = "requires_confirmation"
    THROTTLED = "throttled"


@dataclass
class ValidationResult:
    result: GuardrailResult
    rule_id: Optional[str]
    message: str
    details: dict


class GuardrailEngine:
    # The _load_policies, _validate_intent, _validate_policy,
    # _validate_resources, and _requires_human_approval helpers are
    # omitted from this ADR.

    def __init__(self, policy_path: str, sessions_db_path: str):
        self.policies = self._load_policies(policy_path)
        self.db_path = sessions_db_path
        # Must be set by the caller before log_action() is used:
        # guardrail_audit_log.session_id is NOT NULL.
        self.current_session_id: Optional[str] = None

    def validate_action(
        self,
        action: str,
        parameters: dict,
        context: dict
    ) -> ValidationResult:
        """
        Validate an action against all guardrail layers.

        Args:
            action: The action to validate (e.g., "Bash", "Write", "Edit")
            parameters: Action parameters (e.g., {"command": "rm -rf /"})
            context: Execution context (session, intent, user, etc.)

        Returns:
            ValidationResult with allow/block decision and details
        """

        # Layer 1: Intent Validation
        result = self._validate_intent(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 2: Policy Validation
        result = self._validate_policy(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 3: Resource Validation
        result = self._validate_resources(action, context)
        if result.result in [GuardrailResult.BLOCKED, GuardrailResult.THROTTLED]:
            return result

        # Layer 4: Human Approval Check
        if self._requires_human_approval(action, parameters, context):
            return ValidationResult(
                result=GuardrailResult.REQUIRES_CONFIRMATION,
                rule_id="GR-HUMAN",
                message="This action requires human approval",
                details={"action": action, "parameters": parameters}
            )

        return ValidationResult(
            result=GuardrailResult.ALLOWED,
            rule_id=None,
            message="Action allowed",
            details={}
        )

    def log_action(
        self,
        action: str,
        parameters: dict,
        result: str,
        validation: ValidationResult,
        duration_ms: int
    ):
        """Log action to sessions.db for audit trail (ADR-118 compliant)."""
        conn = sqlite3.connect(self.db_path)
        try:
            conn.execute("""
                INSERT INTO guardrail_audit_log
                (timestamp, action, parameters, result, validation_result,
                 rule_id, duration_ms, session_id)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                # Timezone-aware UTC; datetime.utcnow() is deprecated.
                datetime.now(timezone.utc).isoformat(),
                action,
                json.dumps(parameters),
                result,
                validation.result.value,
                validation.rule_id,
                duration_ms,
                self.current_session_id
            ))
            conn.commit()
        finally:
            # Always release the connection, even if the insert fails.
            conn.close()
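
The Layer 3 rate-limit check is not shown above. A minimal sketch, assuming limits are written in the `N/unit` form used in the policy file (`60/minute`), is a sliding-window counter. `parse_rate`, `UNIT_SECONDS`, and `SlidingWindowLimiter` are hypothetical names, and the in-memory state would not be shared across processes.

```python
import time
from collections import deque
from typing import Tuple

# Units accepted after the slash; only "minute" appears in the policy file.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_rate(spec: str) -> Tuple[int, int]:
    """Parse "60/minute" into (limit, window_seconds)."""
    count, unit = spec.split("/")
    return int(count), UNIT_SECONDS[unit.strip()]


class SlidingWindowLimiter:
    """Allow at most `limit` events in any trailing `window` seconds."""

    def __init__(self, spec: str, clock=time.monotonic):
        self.limit, self.window = parse_rate(spec)
        self.clock = clock      # injectable so tests can control time
        self.events = deque()   # timestamps of accepted events, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) >= self.limit:
            return False        # caller would report THROTTLED (GR-007)
        self.events.append(now)
        return True
```

A `_validate_resources` implementation could keep one limiter per tool type (`bash_commands`, `file_writes`, `api_calls`) and return a `THROTTLED` result whenever `try_acquire()` returns `False`.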

Storage Integration (ADR-118 Compliant)

Guardrail audit logs are stored in sessions.db (Tier 3):

-- In sessions.db
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,                  -- JSON
    result TEXT,                      -- Action result/output
    validation_result TEXT NOT NULL,  -- allowed, blocked, etc.
    rule_id TEXT,                     -- Which rule triggered
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

-- IF NOT EXISTS keeps the script safe to re-run, matching the table above.
CREATE INDEX IF NOT EXISTS idx_guardrail_session ON guardrail_audit_log(session_id);
CREATE INDEX IF NOT EXISTS idx_guardrail_timestamp ON guardrail_audit_log(timestamp);
CREATE INDEX IF NOT EXISTS idx_guardrail_rule ON guardrail_audit_log(rule_id);
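
To show how this table could serve a compliance query, the sketch below builds the schema in an in-memory SQLite database and counts blocked actions per rule for one session. The `blocked_counts_by_rule` helper and all sample rows are made up for illustration; they are not real audit data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,
    result TEXT,
    validation_result TEXT NOT NULL,
    rule_id TEXT,
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_guardrail_session
    ON guardrail_audit_log(session_id);
"""


def blocked_counts_by_rule(conn: sqlite3.Connection, session_id: str) -> list:
    """Return [(rule_id, count), ...] for blocked actions, most frequent first."""
    return conn.execute(
        """
        SELECT rule_id, COUNT(*) AS n
        FROM guardrail_audit_log
        WHERE session_id = ? AND validation_result = 'blocked'
        GROUP BY rule_id
        ORDER BY n DESC, rule_id
        """,
        (session_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Fabricated sample rows, for illustration only.
sample_rows = [
    ("2025-12-15T10:00:00+00:00", "Read", "{}", None, "blocked", "GR-008", 2, "s1"),
    ("2025-12-15T10:00:05+00:00", "Write", "{}", None, "blocked", "GR-008", 3, "s1"),
    ("2025-12-15T10:00:09+00:00", "Read", "{}", None, "blocked", "GR-004", 2, "s1"),
    ("2025-12-15T10:00:12+00:00", "Read", "{}", "ok", "allowed", None, 1, "s1"),
    ("2025-12-15T10:01:00+00:00", "Read", "{}", None, "blocked", "GR-004", 2, "s2"),
]
conn.executemany(
    """
    INSERT INTO guardrail_audit_log
    (timestamp, action, parameters, result, validation_result,
     rule_id, duration_ms, session_id)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """,
    sample_rows,
)
conn.commit()
```

The `WHERE session_id = ?` filter is what the `idx_guardrail_session` index is there to serve.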

Hook Integration

The guardrail engine integrates with Claude Code hooks:

# hooks/guardrail-pre-tool.py
"""PreToolUse hook for guardrail validation."""

import json
import os
import sys

from guardrail_engine import GuardrailEngine, GuardrailResult


def main():
    input_data = json.loads(sys.stdin.read())
    tool_name = input_data.get("tool_name")
    tool_input = input_data.get("tool_input", {})

    engine = GuardrailEngine(
        # "~" must be expanded explicitly; sqlite3.connect() and open()
        # treat it as a literal directory name.
        policy_path=os.path.expanduser(
            "~/.coditect/config/guardrail-policies.yaml"
        ),
        sessions_db_path=os.path.expanduser(
            "~/.coditect-data/context-storage/sessions.db"
        ),
    )

    result = engine.validate_action(
        action=tool_name,
        parameters=tool_input,
        context={"session_id": input_data.get("session_id")}
    )

    # THROTTLED is handled like BLOCKED so that a rate-limited action does
    # not fall through to "allow".
    if result.result in (GuardrailResult.BLOCKED, GuardrailResult.THROTTLED):
        print(json.dumps({
            "decision": "block",
            "message": f"Guardrail {result.rule_id}: {result.message}"
        }))
    elif result.result == GuardrailResult.REQUIRES_CONFIRMATION:
        print(json.dumps({
            "decision": "ask",
            "message": result.message
        }))
    else:
        print(json.dumps({"decision": "allow"}))


if __name__ == "__main__":
    main()

Emergency Stop

def emergency_stop(session_id: str, reason: str):
    """
    Immediately halt all operations for a session.

    Used when:
    - Runaway operation detected
    - User presses emergency stop
    - System detects anomalous behavior
    """
    # The helper functions called below are not defined in this ADR.

    # 1. Kill any running subprocesses
    kill_session_processes(session_id)

    # 2. Log emergency stop
    log_emergency_stop(session_id, reason)

    # 3. Notify user
    send_notification(
        session_id,
        level="critical",
        message=f"Emergency stop triggered: {reason}"
    )

    # 4. Create an emergency checkpoint of the halted session
    create_emergency_checkpoint(session_id)
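
Killing subprocesses stops work already in flight, but the agent must also be prevented from starting new actions. One possible complement, sketched here under that assumption, is a per-session stop flag that is checked before every action. `SessionStopRegistry` and `run_action` are hypothetical names and are not part of the function above.

```python
import threading
from typing import Callable, Optional


class SessionStopRegistry:
    """Thread-safe record of sessions that have been emergency-stopped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = {}  # session_id -> reason for the stop

    def stop(self, session_id: str, reason: str) -> None:
        with self._lock:
            # Keep the first reason if stop is triggered more than once.
            self._stopped.setdefault(session_id, reason)

    def stop_reason(self, session_id: str) -> Optional[str]:
        with self._lock:
            return self._stopped.get(session_id)


def run_action(registry: SessionStopRegistry, session_id: str,
               action: Callable[[], object]) -> object:
    """Run action() unless the session has been halted."""
    reason = registry.stop_reason(session_id)
    if reason is not None:
        raise RuntimeError(f"session {session_id} halted: {reason}")
    return action()
```

In this sketch `emergency_stop` would call `registry.stop(session_id, reason)` as its first step, so that nothing new starts while the remaining steps run.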

Consequences

Positive

  1. Safety: Prevents accidental destructive operations
  2. Compliance: Full audit trail for enterprise requirements
  3. Control: Human-in-the-loop for critical operations
  4. Flexibility: Configurable policies per tenant/project

Negative

  1. Latency: Adds ~10-20ms per action validation
  2. False Positives: Some legitimate actions may be blocked
  3. Complexity: Policy management overhead

Risks

| Risk          | Mitigation                                 |
|---------------|--------------------------------------------|
| Over-blocking | Configurable policies with overrides       |
| Bypass        | All hooks are mandatory, no disable option |
| Performance   | Caching for repeated policy checks         |
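
The caching mitigation could look like the sketch below, which memoizes a pure pattern check with `functools.lru_cache`. The `is_blocked_command` function is hypothetical. Passing the patterns as a tuple makes them part of the cache key, so a changed policy naturally produces new entries instead of stale hits.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=1024)
def is_blocked_command(command: str, patterns: Tuple[str, ...]) -> bool:
    """Return True if the command contains any blocked pattern.

    Only pure checks like this one are safe to cache. Rate-limit and
    token-budget checks depend on changing state and must always run.
    """
    return any(pattern in command for pattern in patterns)


# Patterns taken from the global blocked_patterns list in the policy file.
PATTERNS = ("rm -rf /", "DROP DATABASE", "git push --force origin main")
```

Plain substring matching also illustrates the false-positive risk listed under Negative: `rm -rf /tmp/build` contains `rm -rf /` and would be flagged by this check.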

Author: CODITECT Team
Approved: December 15, 2025
Migration: Migrated from cloud-infra per ADR-150 on 2026-02-03