ADR-027: Guardrail Engine

Status

Accepted - December 15, 2025

Context

As CODITECT moves toward semi-autonomous operation (ADR-006), we need safety mechanisms to prevent harmful actions, enforce policies, and maintain operational boundaries.

Problem Statement

Safety: Autonomous agents could execute destructive commands (rm -rf, DROP TABLE)
Policy Compliance: Enterprise customers require policy enforcement
Rate Limiting: Runaway operations can exhaust resources unless they are rate limited
Audit Trail: All actions must be logged for compliance
Human Override: Critical operations require human approval

Requirements

Pre-execution validation for all tool calls
Post-execution audit logging
Configurable policy rules per tenant/project
Emergency stop capability
Integration with intent classification (ADR-026)

Decision

Implement a Multi-Layer Guardrail Engine that validates every action before execution and audits it after completion.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                      GUARDRAIL ENGINE ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Agent Action Request                                              │
│         │                                                            │
│         ▼                                                            │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                 LAYER 1: INTENT VALIDATION                   │   │
│   │                                                              │   │
│   │  • Is this action consistent with stated intent?            │   │
│   │  • Does it match the classified intent from ADR-026?        │   │
│   │  • Is the target scope appropriate?                         │   │
│   │                                                              │   │
│   │  ✓ PASS: Intent "debug_error" → Action "read_file" ✓       │   │
│   │  ✗ FAIL: Intent "read_docs" → Action "delete_file" ✗       │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                          │
│                           ▼                                          │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                 LAYER 2: POLICY VALIDATION                   │   │
│   │                                                              │   │
│   │  Policy Rules:                                               │   │
│   │  • No destructive operations without explicit permission    │   │
│   │  • No access to paths outside workspace                     │   │
│   │  • No network calls to non-whitelisted domains              │   │
│   │  • Rate limits per tool type                                │   │
│   │                                                              │   │
│   │  ✓ Read file in workspace: ALLOWED                         │   │
│   │  ✗ Delete file without permission: BLOCKED                 │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                          │
│                           ▼                                          │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                 LAYER 3: RESOURCE VALIDATION                 │   │
│   │                                                              │   │
│   │  • Token budget remaining                                   │   │
│   │  • API rate limits                                          │   │
│   │  • Concurrent operation limits                              │   │
│   │  • Session duration limits                                  │   │
│   │                                                              │   │
│   │  ✓ Token budget: 50K remaining (sufficient)                │   │
│   │  ✗ Rate limit: 10 API calls/min exceeded                   │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                          │
│                           ▼                                          │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                 LAYER 4: HUMAN APPROVAL                      │   │
│   │                                                              │   │
│   │  Requires approval if:                                      │   │
│   │  • Destructive operation (delete, drop, force)              │   │
│   │  • Production environment                                   │   │
│   │  • Cost exceeds threshold                                   │   │
│   │  • Security-sensitive action                                │   │
│   │                                                              │   │
│   │  → Pause execution                                          │   │
│   │  → Request human confirmation                               │   │
│   │  → Timeout after 5 minutes                                  │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                          │
│                           ▼                                          │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                    ACTION EXECUTION                          │   │
│   │                                                              │   │
│   │  Execute with:                                               │   │
│   │  • Sandboxed environment                                    │   │
│   │  • Timeout limits                                           │   │
│   │  • Output capture                                           │   │
│   └───────────────────────┬─────────────────────────────────────┘   │
│                           │                                          │
│                           ▼                                          │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                  POST-EXECUTION AUDIT                        │   │
│   │                                                              │   │
│   │  Log to sessions.db:                                        │   │
│   │  • Action taken                                             │   │
│   │  • Parameters used                                          │   │
│   │  • Result/output                                            │   │
│   │  • Duration                                                 │   │
│   │  • Guardrail checks passed                                  │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
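
The Layer 4 pause-and-confirm step can be sketched as follows. This is a minimal illustration, not the engine's implementation: the ApprovalRequest class and its methods are hypothetical names, and treating a timeout as a denial is an assumption, since the ADR only states that the request times out after 5 minutes.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApprovalRequest:
    """Hypothetical pending-approval record (not part of the ADR's code)."""

    action: str
    timeout_s: float = 300.0  # 5 minutes, matching the Layer 4 timeout
    _decided: threading.Event = field(
        default_factory=threading.Event, init=False, repr=False
    )
    _approved: Optional[bool] = field(default=None, init=False)

    def resolve(self, approved: bool) -> None:
        """Called from the approver's side (UI, CLI, chat) to record a decision."""
        self._approved = approved
        self._decided.set()

    def wait(self) -> bool:
        """Block until a decision arrives.

        Assumption: a timeout is treated as a denial (fail closed).
        """
        if not self._decided.wait(self.timeout_s):
            return False
        return bool(self._approved)
```

Failing closed on timeout keeps an unattended session from proceeding with a destructive operation; a deployment that prefers to re-prompt instead would change only the wait method.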

Policy Configuration

Policies are defined in YAML and can be customized per tenant/project:

# config/guardrail-policies.yaml
version: "1.0"

global:
  # Apply to all tenants
  destructive_operations:
    require_confirmation: true
    blocked_patterns:
      - "rm -rf /"
      - "DROP DATABASE"
      - "git push --force origin main"

  file_access:
    allowed_paths:
      - "${WORKSPACE}/**"
      - "${HOME}/.coditect/**"
    blocked_paths:
      - "${HOME}/.ssh/**"
      - "${HOME}/.aws/**"
      - "/etc/**"
      - "/var/**"

  rate_limits:
    bash_commands: 60/minute
    file_writes: 100/minute
    api_calls: 30/minute

  token_budget:
    warning_threshold: 0.8  # 80% of context
    hard_limit: 0.95        # 95% of context

tenant_overrides:
  enterprise_tenant:
    destructive_operations:
      require_confirmation: always
      require_approval_from: ["admin", "lead"]

    file_access:
      blocked_paths:
        - "**/production/**"
        - "**/secrets/**"
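
One way to resolve the effective policy for a tenant is a recursive merge of the tenant override onto the global section. The sketch below is an assumption about merge semantics, which this ADR does not specify: scalars are replaced, nested mappings are merged, and lists are extended so that a tenant can add blocked paths but cannot drop global ones. The names merge_policy, effective_policy, and CONFIG are hypothetical, and CONFIG is a hand-written subset of the YAML above as it would look after parsing.

```python
import copy

# Subset of config/guardrail-policies.yaml as a parsed dictionary.
CONFIG = {
    "global": {
        "destructive_operations": {"require_confirmation": True},
        "file_access": {"blocked_paths": ["/etc/**", "/var/**"]},
    },
    "tenant_overrides": {
        "enterprise_tenant": {
            "destructive_operations": {
                "require_confirmation": "always",
                "require_approval_from": ["admin", "lead"],
            },
            "file_access": {
                "blocked_paths": ["**/production/**", "**/secrets/**"]
            },
        }
    },
}


def merge_policy(base: dict, override: dict) -> dict:
    """Return a new dict with override applied on top of base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            merged[key] = merge_policy(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            # Assumption: lists are additive, so global entries survive.
            merged[key] = current + [v for v in value if v not in current]
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_policy(config: dict, tenant: str) -> dict:
    """Global policy plus the tenant's overrides, if any."""
    overrides = config.get("tenant_overrides", {}).get(tenant, {})
    return merge_policy(config["global"], overrides)
```

With these semantics, effective_policy(CONFIG, "enterprise_tenant") carries both the global and the tenant-specific blocked paths, while an unknown tenant simply receives a copy of the global policy.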

Guardrail Rules

Built-in Rules:

| Rule ID | Category | Description | Default Action |
|---------|----------|-------------|----------------|
| `GR-001` | Destructive | rm/delete commands | Require confirmation |
| `GR-002` | Destructive | Database DROP/TRUNCATE | Block without approval |
| `GR-003` | Git | Force push to main/master | Block without approval |
| `GR-004` | Security | Access to secret files | Block |
| `GR-005` | Security | Network calls to unknown hosts | Warn |
| `GR-006` | Resource | Token budget exceeded | Pause and notify |
| `GR-007` | Resource | Rate limit exceeded | Throttle |
| `GR-008` | Scope | Action outside workspace | Block |
| `GR-009` | Intent | Action inconsistent with intent | Warn |
| `GR-010` | Cost | Estimated cost exceeds threshold | Require approval |
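
As an illustration of how the path rules might be evaluated, the sketch below checks a file path against the file_access section of the policy. It rests on several assumptions that are not stated in this ADR: a hit on blocked_paths is reported as GR-004, a path outside allowed_paths is reported as GR-008, blocked patterns take precedence over allowed ones, and patterns use fnmatch-style globbing in which `*` also matches `/`. The function names and the sample environment values are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase
from string import Template
from typing import Dict, List, Optional

# Mirrors the global file_access section of the policy file.
FILE_ACCESS = {
    "allowed_paths": ["${WORKSPACE}/**", "${HOME}/.coditect/**"],
    "blocked_paths": ["${HOME}/.ssh/**", "${HOME}/.aws/**", "/etc/**", "/var/**"],
}


def expand(patterns: List[str], env: Dict[str, str]) -> List[str]:
    """Substitute ${WORKSPACE}, ${HOME}, etc. in each pattern."""
    return [Template(p).substitute(env) for p in patterns]


def check_path(path: str, file_access: dict, env: Dict[str, str]) -> Optional[str]:
    """Return the ID of the rule that blocks the path, or None if allowed.

    The path is normalized first so that ".." segments cannot be used to
    escape the workspace. Symlinks are NOT resolved in this sketch.
    """
    normalized = posixpath.normpath(path)
    blocked = expand(file_access.get("blocked_paths", []), env)
    allowed = expand(file_access.get("allowed_paths", []), env)
    if any(fnmatchcase(normalized, p) for p in blocked):
        return "GR-004"  # assumption: blocked-path hits map to GR-004
    if not any(fnmatchcase(normalized, p) for p in allowed):
        return "GR-008"  # assumption: everything else is out of scope
    return None
```

Checking blocked patterns before allowed ones means an allowed workspace cannot re-enable a globally blocked location.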

Implementation

import json  # used by log_action to serialize parameters
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class GuardrailResult(Enum):
    ALLOWED = "allowed"
    BLOCKED = "blocked"
    REQUIRES_CONFIRMATION = "requires_confirmation"
    THROTTLED = "throttled"

@dataclass
class ValidationResult:
    result: GuardrailResult
    rule_id: Optional[str]
    message: str
    details: dict

class GuardrailEngine:
    def __init__(self, policy_path: str, sessions_db_path: str):
        self.policies = self._load_policies(policy_path)
        self.db_path = sessions_db_path

    def validate_action(
        self,
        action: str,
        parameters: dict,
        context: dict
    ) -> ValidationResult:
        """
        Validate an action against all guardrail layers.

        Args:
            action: The action to validate (e.g., "Bash", "Write", "Edit")
            parameters: Action parameters (e.g., {"command": "rm -rf /"})
            context: Execution context (session, intent, user, etc.)

        Returns:
            ValidationResult with allow/block decision and details
        """

        # Layer 1: Intent Validation
        result = self._validate_intent(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 2: Policy Validation
        result = self._validate_policy(action, parameters, context)
        if result.result == GuardrailResult.BLOCKED:
            return result

        # Layer 3: Resource Validation
        result = self._validate_resources(action, context)
        if result.result in [GuardrailResult.BLOCKED, GuardrailResult.THROTTLED]:
            return result

        # Layer 4: Human Approval Check
        if self._requires_human_approval(action, parameters, context):
            return ValidationResult(
                result=GuardrailResult.REQUIRES_CONFIRMATION,
                rule_id="GR-HUMAN",
                message="This action requires human approval",
                details={"action": action, "parameters": parameters}
            )

        return ValidationResult(
            result=GuardrailResult.ALLOWED,
            rule_id=None,
            message="Action allowed",
            details={}
        )

    def log_action(
        self,
        action: str,
        parameters: dict,
        result: str,
        validation: ValidationResult,
        duration_ms: int,
        session_id: str
    ):
        """Log action to sessions.db for audit trail (ADR-118 compliant)."""
        # session_id is passed in explicitly; the engine does not track a
        # current session itself.
        conn = sqlite3.connect(self.db_path)
        try:
            # "with conn" commits on success and rolls back on error.
            with conn:
                conn.execute("""
                    INSERT INTO guardrail_audit_log
                    (timestamp, action, parameters, result, validation_result,
                     rule_id, duration_ms, session_id)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                """, (
                    datetime.utcnow().isoformat(),
                    action,
                    json.dumps(parameters, default=str),
                    result,
                    validation.result.value,
                    validation.rule_id,
                    duration_ms,
                    session_id
                ))
        finally:
            # Always release the connection, even if the insert fails.
            conn.close()
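
The listing above leaves the layer helpers (such as _validate_resources) undefined. As one possible building block for the rate-limit part of Layer 3, the sketch below parses the "60/minute" notation from the policy file and enforces it with a sliding window. The names parse_rate and SlidingWindowLimiter are hypothetical, and the sliding-window algorithm is an assumption; the ADR does not say which limiting strategy is used.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_rate(spec: str) -> Tuple[int, int]:
    """Turn a spec like "60/minute" into (max_events, window_seconds)."""
    count, unit = spec.split("/")
    return int(count), _UNIT_SECONDS[unit.strip()]


class SlidingWindowLimiter:
    """Allow at most `limit` events in any trailing window."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic):
        self.limit, self.window = parse_rate(spec)
        self._clock = clock  # injectable so the limiter can be tested
        self._events: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record an event and return True, or return False if throttled."""
        now = self._clock()
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()
        if len(self._events) >= self.limit:
            return False
        self._events.append(now)
        return True
```

A caller would keep one limiter per tool type (bash_commands, file_writes, api_calls) and map a False return to GuardrailResult.THROTTLED under rule GR-007.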

Storage Integration (ADR-118 Compliant)

Guardrail audit logs are stored in sessions.db (Tier 3):

-- In sessions.db
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,  -- JSON
    result TEXT,      -- Action result/output
    validation_result TEXT NOT NULL,  -- allowed, blocked, etc.
    rule_id TEXT,     -- Which rule triggered
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

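-- IF NOT EXISTS keeps the script re-runnable, matching the table definition
CREATE INDEX IF NOT EXISTS idx_guardrail_session ON guardrail_audit_log(session_id);
CREATE INDEX IF NOT EXISTS idx_guardrail_timestamp ON guardrail_audit_log(timestamp);
CREATE INDEX IF NOT EXISTS idx_guardrail_rule ON guardrail_audit_log(rule_id);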
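
For compliance reporting, the audit table can be queried directly. The following sketch counts blocked actions per rule for one session. The function name blocked_counts_by_rule is hypothetical, and SCHEMA simply repeats the table definition above so that the example can run against an in-memory database.

```python
import sqlite3
from typing import List, Tuple

# Same columns as the guardrail_audit_log definition in this ADR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS guardrail_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    parameters TEXT,
    result TEXT,
    validation_result TEXT NOT NULL,
    rule_id TEXT,
    duration_ms INTEGER,
    session_id TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
)
"""


def blocked_counts_by_rule(
    conn: sqlite3.Connection, session_id: str
) -> List[Tuple[str, int]]:
    """Return (rule_id, hits) pairs for blocked actions, most frequent first."""
    rows = conn.execute(
        """
        SELECT rule_id, COUNT(*) AS hits
        FROM guardrail_audit_log
        WHERE session_id = ? AND validation_result = 'blocked'
        GROUP BY rule_id
        ORDER BY hits DESC, rule_id
        """,
        (session_id,),
    ).fetchall()
    return [(rule_id, hits) for rule_id, hits in rows]
```

The session_id filter is served by the idx_guardrail_session index defined above.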

Hook Integration

The guardrail engine integrates with Claude Code hooks:

# hooks/guardrail-pre-tool.py
"""PreToolUse hook for guardrail validation."""

import json
import os
import sys
from guardrail_engine import GuardrailEngine, GuardrailResult

def main():
    input_data = json.loads(sys.stdin.read())
    tool_name = input_data.get("tool_name")
    tool_input = input_data.get("tool_input", {})

    # Expand "~" explicitly: sqlite3.connect() does not do it.
    engine = GuardrailEngine(
        policy_path=os.path.expanduser("~/.coditect/config/guardrail-policies.yaml"),
        sessions_db_path=os.path.expanduser("~/.coditect-data/context-storage/sessions.db")
    )

    result = engine.validate_action(
        action=tool_name,
        parameters=tool_input,
        context={"session_id": input_data.get("session_id")}
    )

    # Throttled actions are refused as well, so that they do not fall
    # through to "allow".
    if result.result in (GuardrailResult.BLOCKED, GuardrailResult.THROTTLED):
        print(json.dumps({
            "decision": "block",
            "message": f"Guardrail {result.rule_id}: {result.message}"
        }))
    elif result.result == GuardrailResult.REQUIRES_CONFIRMATION:
        print(json.dumps({
            "decision": "ask",
            "message": result.message
        }))
    else:
        print(json.dumps({"decision": "allow"}))

if __name__ == "__main__":
    main()

Emergency Stop

def emergency_stop(session_id: str, reason: str):
    """
    Immediately halt all operations for a session.

    Used when:
    - Runaway operation detected
    - User presses emergency stop
    - System detects anomalous behavior
    """

    # 1. Kill any running subprocesses
    kill_session_processes(session_id)

    # 2. Log emergency stop
    log_emergency_stop(session_id, reason)

    # 3. Notify user
    send_notification(
        session_id,
        level="critical",
        message=f"Emergency stop triggered: {reason}"
    )

    # 4. Create an emergency checkpoint of the session state
    create_emergency_checkpoint(session_id)
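
The helpers called by emergency_stop are not defined in this ADR. As a sketch of what kill_session_processes might look like, the example below assumes that every subprocess is registered against its session when it is launched. The registry, the register_process function, and the grace period are all hypothetical.

```python
import subprocess
import sys
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory registry: session_id -> launched subprocesses.
_PROCESSES: Dict[str, List[subprocess.Popen]] = defaultdict(list)


def register_process(session_id: str, proc: subprocess.Popen) -> None:
    """Track a subprocess so that an emergency stop can find it."""
    _PROCESSES[session_id].append(proc)


def kill_session_processes(session_id: str, grace_s: float = 2.0) -> int:
    """Stop every live subprocess of a session; return how many were stopped.

    Each process gets a polite terminate() first and a hard kill() if it
    has not exited within grace_s seconds.
    """
    stopped = 0
    for proc in _PROCESSES.pop(session_id, []):
        if proc.poll() is not None:
            continue  # already exited on its own
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        stopped += 1
    return stopped


if __name__ == "__main__":
    # Demo: start a long sleep, then stop it through the registry.
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    register_process("demo-session", sleeper)
    print(kill_session_processes("demo-session"))
```

This sketch only covers direct children; processes spawned by those children would need process groups or a similar mechanism, which is outside what this ADR describes.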

Consequences

Positive

Safety: Prevents accidental destructive operations
Compliance: Full audit trail for enterprise requirements
Control: Human-in-the-loop for critical operations
Flexibility: Configurable policies per tenant/project

Negative

Latency: Adds ~10-20ms per action validation
False Positives: Some legitimate actions may be blocked
Complexity: Policy management overhead

Risks

| Risk | Mitigation |
|------|------------|
| Over-blocking | Configurable policies with overrides |
| Bypass | All hooks are mandatory, no disable option |
| Performance | Caching for repeated policy checks |

Author: CODITECT Team
Approved: December 15, 2025
Migration: Migrated from cloud-infra per ADR-150 on 2026-02-03
