ClawGuard AI Agent Security Ecosystem — Follow-Up Research Prompts
Date: 2026-02-18 | Generated by: research-agent (Claude Sonnet 4.6) | Status: Ready for execution — each prompt is self-contained
Context Summary for Prompt Executors
CODITECT is a multi-tenant AI agent platform running 776 agents via 118 Claude Code hooks. Research on 2026-02-18 evaluated the ClawGuard open-source AI agent security ecosystem (3 repositories: ClawGuardian/superglue-ai, JaydenBeard/clawguard, maxxie114/ClawGuard) and produced a Conditional Go recommendation: adopt patterns and architecture as reference, build CODITECT-native Python implementation.
Key decisions made (see artifacts/adrs/):
- ADR-001: Six-component security layer — SecurityGateHook, PatternEngine, RiskAnalyzer, ActionRouter, MonitorDashboard, AuditLogger
- ADR-003: Fail-open default with per-tenant fail-closed opt-in; Ralph Wiggum autonomous loops always fail-closed
- ADR-004: Hybrid risk scoring — numeric (0-100) plus categorical (critical/high/medium/low)
- ADR-005: Three-mechanism supply chain verification — binary scanning, provenance verification, trust registry
The SDD at artifacts/sdd.md specifies the full implementation (~3,200 LOC Python + ~2,000 LOC TypeScript dashboard). Estimated 18-week build across 5 phases.
Key gaps identified requiring further research:
- No multi-tenant security policy isolation in source repos (largest gap)
- No Claude Code kill switch mechanism
- No agent identity in security events
- Regex-only detection (evasion-susceptible)
- Static pattern library with no maintenance process
- PII coverage minimal (consumer-oriented, not enterprise)
These 21 prompts target the open questions. Execute independently in any order.
Category 1: Architecture Deep-Dives
Prompt AD-01: Hook Pipeline Performance Under Production Load
Context: CODITECT's SecurityGateHook (SDD-CODITECT-SEC-001) must process every PreToolUse event for all 776 agents within a 500ms p99 timeout. The PatternEngine evaluates 80+ compiled regex patterns against up to 1 MB of tool call input on each invocation. Claude Code's hook system invokes hooks as subprocesses. CODITECT has no existing performance benchmarks for security hook overhead.
Research task:
Research Python regex performance optimization techniques applicable to a security scanning pipeline that must run 80+ patterns against arbitrary text payloads within 50ms (pattern-match-only target from SDD Section 7.4). Specifically:
- Benchmark `re` vs `re2` (Google RE2 via `google-re2` or `pyre2`) for security pattern workloads — what is the real-world speedup on catastrophic backtracking candidates?
- What is the correct Python approach for compiling 80+ regex patterns at startup and sharing compiled objects across concurrent hook invocations? Evaluate `functools.lru_cache`, module-level constants, and explicit `re.compile()` caching.
- How does the `re.UNICODE | re.MULTILINE` flag combination affect performance vs `re.DOTALL` alternatives for multi-line tool input scanning?
- What is the subprocess startup overhead for Claude Code hooks invoked per-tool-call? Is there a way to run hooks as persistent long-lived processes rather than per-invocation subprocesses to eliminate startup cost?
- What are the Python `concurrent.futures.ThreadPoolExecutor` timeout semantics for enforcing a 500ms scan deadline — specifically, does `future.result(timeout=0.5)` reliably cancel a running regex match, or is signal-based interruption required?
Provide benchmark numbers where available. Recommend the optimal approach for CODITECT's 80-pattern, 1 MB payload, 500ms deadline constraint.
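As a baseline for the compilation-caching question, the following is a minimal sketch (not the SDD implementation) of module-level pattern compilation with a cooperative deadline check. The pattern strings are placeholders; note that this approach cannot interrupt a single pattern mid-backtrack, which is exactly why the RE2 question above matters.

```python
import re
import time

# Compile the pattern library once at module import so every hook
# invocation in the same process reuses the compiled objects.
# These three patterns are illustrative stand-ins for the 80+ rules.
PATTERNS = [re.compile(p) for p in (
    r"AKIA[0-9A-Z]{16}",                       # AWS-style access key id
    r"-----BEGIN (?:RSA )?PRIVATE KEY-----",   # PEM private key header
    r"\bpassword\s*=\s*\S+",                   # inline credential
)]

def scan(payload: str, deadline_s: float = 0.05) -> list[str]:
    """Run every pattern, checking a cooperative deadline between
    patterns. A pattern that is mid-backtrack cannot be interrupted
    from here; only RE2-style linear-time engines bound that cost."""
    start = time.monotonic()
    hits = []
    for pat in PATTERNS:
        if time.monotonic() - start > deadline_s:
            raise TimeoutError("scan deadline exceeded")
        if pat.search(payload):
            hits.append(pat.pattern)
    return hits
```

The key design point is that compilation cost is paid once per process, not per scan — which is why the persistent-process question above dominates the per-invocation subprocess model.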
Prompt AD-02: Pattern Engine Optimization — Regex Compilation Caching and Tenant Override Merging
Context: CODITECT's PatternEngine (SDD Section 3.2) maintains a three-layer rule model: Layer 0 (platform non-overridable), Layer 1 (platform overridable), Layer 2 (tenant custom). Rules are stored as YAML files and loaded at process startup. Tenant overrides are stored in platform.db as JSON delta patches and cached with 30-second TTL. The PatternEngine must merge these layers on every scan request for each tenant.
Research task:
Research the optimal Python architecture for a multi-tenant pattern engine where:
- Base patterns (80+) are shared across all tenants and compiled once
- Per-tenant overrides modify which patterns are enabled/disabled and their action mappings
- Rule reloads must complete within 200ms without dropping in-flight scan requests
- SQLite WAL mode is the backing store for tenant configs (CODITECT `platform.db`)
Specifically:
- What is the recommended Python pattern for a hot-reloadable config cache that invalidates per-tenant entries without restarting the process? Evaluate `threading.local`, `weakref.WeakValueDictionary`, and a `threading.RLock`-protected dict.
- How should the three-layer merge be implemented to avoid rebuilding the full merged rule set on every scan? Is a copy-on-write merged rule set per tenant with TTL appropriate, or is event-driven invalidation better?
- For tenant allowlists (tools or patterns that bypass scanning), what is the correct data structure for O(1) lookup — `frozenset`, `dict`, or a compiled trie — when checking against 50+ allowlist entries?
- Research whether `watchdog` (Python file system monitoring) or `inotify` is the correct mechanism for detecting YAML rule file changes on macOS (the current development platform) and Linux (production), given that CODITECT operates on both.
Output: Recommended caching architecture with code sketch and estimated cache miss cost.
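To make the candidates concrete, here is a sketch of one option named above: an `RLock`-protected dict holding per-tenant merged rule sets with a 30-second TTL (mirroring the SDD's tenant-override cache), plus an event-driven invalidation hook. `merge_layers` is a stand-in for the real three-layer merge.

```python
import threading
import time

# Per-tenant cache of merged rule sets: tenant_id -> (built_at, rules).
_CACHE: dict[str, tuple[float, dict]] = {}
_LOCK = threading.RLock()
TTL_S = 30.0   # matches the 30-second TTL stated in the context

def merge_layers(tenant_id: str) -> dict:
    """Placeholder for the Layer 0 + Layer 1 + Layer 2 merge."""
    return {"tenant": tenant_id, "rules": ["L0", "L1", f"L2:{tenant_id}"]}

def get_rules(tenant_id: str) -> dict:
    now = time.monotonic()
    with _LOCK:
        entry = _CACHE.get(tenant_id)
        if entry and now - entry[0] < TTL_S:
            return entry[1]                    # cache hit
        merged = merge_layers(tenant_id)       # cache miss: rebuild
        _CACHE[tenant_id] = (now, merged)
        return merged

def invalidate(tenant_id: str) -> None:
    """Event-driven alternative to pure TTL expiry: drop one tenant's
    entry when its override row in platform.db changes."""
    with _LOCK:
        _CACHE.pop(tenant_id, None)
```

The cache-miss cost here is one full three-layer merge per tenant per TTL window; the research task should quantify whether that merge fits the 200ms reload budget.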
Prompt AD-03: SecurityGate Integration with CODITECT's 776-Agent Dispatch System
Context: CODITECT dispatches 776 agents via Claude Code's hook system. The SDD specifies three hook registrations: PreAgentStart, PreToolUse, and PostToolUse. Each is a Python subprocess invoked by Claude Code before/after tool execution. The current hook system has 118 existing hooks; none perform security evaluation. The hook invocation protocol is defined in the CODITECT hook system, which uses stdin/stdout JSON communication and exit codes (0 = allow, 2 = block) as the enforcement protocol.
Research task:
Research how Claude Code's hook system handles hook conflicts, priorities, and composition when multiple hooks register for the same event:
- What is Claude Code's documented behavior when two `PreToolUse` hooks both attempt to modify tool input (REDACT action)? Is the output of the first hook passed as input to the second, or does each hook see the original input?
- What exit code semantics does Claude Code use for hook responses? The SDD specifies exit 0 (allow) and exit 2 (block) — is exit 1 treated differently? What happens if a hook exits with a non-zero code that is not 2?
- How does Claude Code's hook system handle hook timeouts? Is there a documented maximum hook execution time, and what happens when a hook exceeds it — does Claude Code kill the subprocess, or does the tool call proceed regardless?
- When the SecurityGateHook performs a REDACT action (modify tool input before execution), what is the correct mechanism to pass the modified input back to Claude Code — is STDOUT JSON the correct channel, or is there a different protocol?
- Research whether CODITECT's existing `task_id_validator.py` and `task-tracking-enforcer.py` hooks (which already implement PreToolUse blocking via exit codes) can serve as implementation templates for the SecurityGateHook.
Provide the definitive Claude Code hook protocol spec relevant to SecurityGateHook implementation.
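For reference, a minimal hook skeleton following the protocol stated in the context above (stdin JSON in, exit 0 = allow, exit 2 = block). The event field names (`tool_name`, `tool_input`) and the blocked-pattern check are illustrative assumptions, not Claude Code's documented spec — confirming the real field names and the REDACT output channel is the point of this prompt.

```python
import json
import sys

def main() -> int:
    """Read one hook event from stdin, return the exit code to emit.
    A real hook script would end with: sys.exit(main())."""
    event = json.load(sys.stdin)
    tool = event.get("tool_name", "")           # assumed field name
    args = json.dumps(event.get("tool_input", {}))
    if "rm -rf /" in args:                      # stand-in for the PatternEngine
        # Structured reason on stdout; whether stdout JSON is also the
        # REDACT channel is one of the open questions above.
        print(json.dumps({"decision": "block", "tool": tool,
                          "reason": "destructive command pattern"}))
        return 2                                # block
    return 0                                    # allow
```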
Prompt AD-04: Fail-Open vs Fail-Closed Behavior Under Edge Cases
Context: ADR-003 (at artifacts/adrs/ADR-003-fail-open-vs-fail-closed-security-gate.md) established the CODITECT fail behavior decision: fail-open by default, fail-closed configurable per tenant, always fail-closed for Ralph Wiggum autonomous loops. The SDD (Section 8) documents the failure taxonomy and circuit breaker behavior. Three unresolved edge cases were identified during ADR drafting that require further research.
Research task:
Research the behavior and correct implementation strategy for three specific edge cases in CODITECT's security gate fail-closed/fail-open decision:
- Scan timeout during fail-closed: When `fail_mode = "closed"` and a scan times out at 500ms, the tool call is blocked. However, Claude Code may re-issue the same tool call (agent retry). Research whether Claude Code has a documented retry mechanism for blocked tool calls, and whether the SecurityGateHook needs idempotency guarantees on repeated invocations for the same tool call ID.
- Partial REDACT failure: If the PatternEngine identifies 3 secrets in a tool output but the redaction function fails on the 2nd secret (e.g., regex substitution error on malformed input), should the security gate: (a) return the partially-redacted output with the 2 successful redactions applied, (b) block entirely (fail-closed on partial redaction), or (c) return the original unredacted output with a WARN? Research the security implications of each approach and identify what comparable systems do.
- Concurrent hook invocations for the same session: Ralph Wiggum loops can invoke multiple tool calls in rapid succession. If SecurityGateHook invocations for the same session_id are running concurrently (tool A still scanning while tool B arrives), how should the per-session state (e.g., user-confirmed tools list, session risk accumulation) be managed safely across concurrent Python processes? Research whether SQLite WAL mode provides sufficient isolation for this access pattern or if a separate lock file / in-memory coordination layer is needed.
Output: Recommended implementation for each edge case with security justification.
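For the concurrency edge case, one candidate answer can be sketched directly: SQLite serializes the read-modify-write of per-session state if each hook process takes the write lock up front with `BEGIN IMMEDIATE`. Schema and column names below are illustrative, not the SDD's.

```python
import sqlite3

def add_session_risk(db_path: str, session_id: str, delta: int) -> int:
    """Atomically accumulate session risk across concurrent hook
    processes. BEGIN IMMEDIATE acquires the write lock before the
    SELECT, so two hooks cannot interleave their read-modify-write."""
    # isolation_level=None puts the connection in autocommit mode so
    # transaction boundaries are fully explicit.
    conn = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("""CREATE TABLE IF NOT EXISTS session_state
                        (session_id TEXT PRIMARY KEY, risk INTEGER)""")
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT risk FROM session_state WHERE session_id = ?",
            (session_id,)).fetchone()
        risk = (row[0] if row else 0) + delta
        conn.execute("INSERT OR REPLACE INTO session_state VALUES (?, ?)",
                     (session_id, risk))
        conn.execute("COMMIT")
        return risk
    finally:
        conn.close()
```

Whether the 5-second busy timeout is acceptable inside a 500ms hook deadline is itself part of the research question — a contended lock could convert a concurrency problem into a timeout problem.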
Prompt AD-05: Fail-Open Default Reversal — Emerging Best Practices
Context: ADR-003 chose fail-open as the default with fail-closed opt-in, prioritizing developer experience. The SDD documents this as the factory default in security_gate.default.yaml. However, the AI agent security field is rapidly evolving and industry norms for security gate defaults may differ from CODITECT's chosen approach. This prompt validates or challenges the ADR-003 decision with current external evidence.
Research task:
Research current industry practices and expert guidance for AI agent security gate fail behavior defaults (as of early 2026):
- What do current AI agent security frameworks (Invariant Labs mcp-scan, LangChain security middleware, Guardrails AI) use as their default fail behavior when the security checker encounters an error? Is fail-open or fail-closed the prevailing default?
- What do OWASP, NIST, and CISA guidance documents say about fail behavior defaults for security interception layers in production systems? Is there a canonical recommendation?
- Research the operational incidents that have occurred in 2024-2025 where AI agent security tools failed open and what the consequences were — are there documented cases where a security scanner failure led to an exploitable window?
- Given CODITECT's specific context — a platform publicly launching March 11, 2026 with enterprise pilot tenants — is fail-open default a defensible security posture, or does it expose CODITECT to reputational or contractual risk in enterprise sales contexts?
Deliver: A recommendation on whether ADR-003's fail-open default should be reconsidered before the March 11 launch, with supporting evidence.
Category 2: Compliance and Regulatory
Prompt CR-01: SOC 2 Type II Evidence Generation from Security Event Logs
Context: CODITECT's AuditLogger (SDD Section 3.6) writes security events to org.db with retention policies: TOOL_BLOCKED and TOOL_REDACTED events retained 1 year, KILL_SWITCH_ACTIVATED events retained 5 years. The SDD notes that the security layer "advances SOC 2 readiness but does not complete any control" (coditect-impact.md Section 3.1). CODITECT is targeting SOC 2 Type II compliance for enterprise customer sales.
Research task:
Research what specific evidence artifacts, log formats, and operational controls are required to satisfy each SOC 2 Type II Common Criteria (CC) control that the CODITECT security layer is designed to address:
- CC6.1 (Logical access controls): What audit evidence does SOC 2 require to demonstrate that AI agent tool execution is access-controlled? Does a pre-execution security gate satisfy this criterion, or is identity-based authorization also required?
- CC7.2 (System monitoring): What format and retention requirements do SOC 2 auditors typically expect for security event logs? Is CODITECT's `org.db` SQLite-based audit log acceptable, or do auditors expect immutable append-only storage (e.g., AWS CloudTrail, a write-once S3 bucket)?
- CC7.3 (Security incident identification): What constitutes an "incident" under SOC 2 for AI agent security events? Does a TOOL_BLOCKED event constitute an incident requiring an incident response record, or only events that indicate a confirmed attack?
- CC8.1 (Change management): The SDD stores security rules as git-versioned YAML files. Does this satisfy SOC 2 change management requirements for security controls, or is a separate change advisory board (CAB) process required?
- What is the minimum viable SOC 2 evidence package that CODITECT could produce from its security event logs to support a Type II audit by the March 11, 2026 launch date?
Output: A prioritized list of SOC 2 evidence gaps and the minimum evidence set producible from the current SDD design.
Prompt CR-02: PII Handling Across Jurisdictions — GDPR, CCPA, and PIPEDA
Context: ClawGuardian's patterns/pii.ts covers four PII categories: SSN (US), credit card, email, phone. The coditect-impact.md analysis identified this as consumer-oriented and insufficient for enterprise use. CODITECT operates in multiple jurisdictions: US customers (CCPA), EU customers (GDPR), and Canadian customers (PIPEDA — noted specifically due to the active C-of-C-CANADA project in the CODITECT submodule inventory). The PII detection patterns in the SDD (rules PII-001 through PII-005) need to be extended.
Research task:
Research the PII definition differences across GDPR (EU), CCPA (California), and PIPEDA (Canada) and their implications for CODITECT's PII detection pattern library:
- GDPR scope: What categories of personal data does GDPR Article 4 and Article 9 (special categories) define that are NOT covered by ClawGuardian's current four patterns? Specifically: biometric identifiers, genetic data, health/medical data, religious/political beliefs, and EU-specific ID formats (national ID numbers, VAT numbers).
- CCPA scope: What categories does CCPA add beyond GDPR? Specifically: precise geolocation, browsing history, inferences drawn from personal information, and California driver's license formats.
- PIPEDA scope: What Canadian-specific PII patterns need to be added? Specifically: Social Insurance Number (SIN — format: 9 digits, often formatted as NNN NNN NNN), Canadian passport number, provincial health card numbers (format varies by province), and business number (BN — 9 digits).
- For each jurisdiction, what is the correct default action when PII is detected in an agent tool call — block, redact, or warn? Does GDPR's "data minimization" principle imply that redaction is mandatory rather than optional?
- Research the `phonenumbers` Python library as the recommended equivalent for ClawGuardian's `libphonenumber-js` dependency — does it support Canadian phone number validation and NANP formatting?
Output: Extended PII pattern definitions for all three jurisdictions, with severity and action mappings.
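As an example of the kind of extended pattern this prompt should produce, here is a sketch of a Canadian SIN detector using the NNN NNN NNN format stated above, with a Luhn mod-10 check to suppress false positives on arbitrary 9-digit runs (SINs are Luhn-checksummed). Severity and action mapping are deliberately left to the research output.

```python
import re

# SIN candidates: 9 digits, optionally grouped NNN NNN NNN or NNN-NNN-NNN.
SIN_PATTERN = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{3})\b")

def luhn_valid(digits: str) -> bool:
    """Standard Luhn mod-10 check over a digit string."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_sin_candidates(text: str) -> list[str]:
    """Return SIN-formatted substrings that also pass the Luhn check."""
    hits = []
    for m in SIN_PATTERN.finditer(text):
        digits = "".join(m.groups())
        if luhn_valid(digits):
            hits.append(m.group(0))
    return hits
```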
Prompt CR-03: Audit Trail Requirements for Regulated Industries
Context: The SDD specifies that security audit events are written to org.db (SQLite, ADR-118) with no UPDATE or DELETE grants (SR-07: tamper-evident). The AuditLogger writes events with scan_duration_ms, matched_rule_ids, redacted_fields (field names only, not values), and reasoning text. CODITECT's enterprise pilot customers include financial services and healthcare-adjacent organizations. Immutable audit log requirements vary by industry.
Research task:
Research audit trail requirements for three regulated industry categories relevant to CODITECT's target market, and assess whether the SDD's current AuditLogger design satisfies them:
- Financial services (SOX, PCI-DSS): What are the log immutability requirements? PCI-DSS Requirement 10 specifies audit log protection from unauthorized modification — does SQLite WAL mode with no UPDATE/DELETE grants satisfy this, or is a cryptographic hash chain or write-once storage (WORM) required?
- Healthcare (HIPAA): HIPAA's Security Rule requires audit controls (45 CFR 164.312(b)). What specific audit log fields does HIPAA require for activity monitoring of systems that process PHI? Does CODITECT's current `AuditEvent` schema include all required fields?
- US Federal (FedRAMP): If CODITECT ever serves US government customers, what are the NIST SP 800-53 AU (Audit and Accountability) control requirements? Which AU controls are relevant to AI agent security event logging?
- What is the current state of AI-specific regulatory guidance? Are there emerging frameworks (EU AI Act, NIST AI RMF, executive orders) that specify audit requirements specifically for AI agent systems that CODITECT should be aware of?
- Recommend whether CODITECT should implement cryptographic audit log integrity verification (e.g., a Merkle hash chain over `security_audit_events` rows) before the March 11 launch or defer it as a post-launch enhancement.
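To ground the integrity-verification question, here is a minimal linear hash-chain sketch (a simpler relative of the Merkle option): each row carries a digest over its own payload plus the previous row's digest, so any in-place edit breaks every later link. Field names are illustrative, not the SDD's AuditEvent schema.

```python
import hashlib
import json

GENESIS = "0" * 64   # well-known value anchoring the chain

def chain_events(events: list[dict]) -> list[dict]:
    """Append a chain_hash to each event, linking it to its predecessor."""
    prev = GENESIS
    out = []
    for ev in events:
        payload = json.dumps(ev, sort_keys=True)   # canonical serialization
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        out.append({**ev, "chain_hash": digest})
        prev = digest
    return out

def verify_chain(rows: list[dict]) -> bool:
    """Recompute every link; any tampered row invalidates the chain."""
    prev = GENESIS
    for row in rows:
        ev = {k: v for k, v in row.items() if k != "chain_hash"}
        payload = json.dumps(ev, sort_keys=True)
        if hashlib.sha256((prev + payload).encode()).hexdigest() != row["chain_hash"]:
            return False
        prev = row["chain_hash"]
    return True
```

Note that a hash chain proves tampering occurred but not who tampered; whether auditors additionally require externally anchored digests (e.g., periodic publication of the chain head) is part of the research task.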
Prompt CR-04: Data Residency Implications of Local-First Security Scanning
Context: All three ClawGuard tools are local-first — they process security events on the customer's local machine without sending data to external services. The coditect-impact.md analysis identified this as "a compliance asset, not a liability" for GDPR data residency obligations. CODITECT's platform operates as a SaaS product where customers use the Claude Code CLI locally but session data syncs to cloud infrastructure (gs://coditect-cloud-infra-context-backups). The tension between local-first scanning and cloud-synced session logs creates a data residency question.
Research task:
Research the data residency and data transfer implications of CODITECT's hybrid local/cloud architecture for AI agent security scanning:
- Under GDPR Articles 44-49 (data transfers to third countries), when CODITECT's session logs (which may contain redacted PII after security scanning) are synced to Google Cloud Storage (`gs://coditect-cloud-infra-context-backups`), what legal mechanisms apply? Standard Contractual Clauses (SCCs)? Does the GCP region matter?
- For Canadian PIPEDA compliance with C-of-C-CANADA tenant data: what are Canada's data residency requirements for business data processed by AI agents? Does PIPEDA's "accountability" principle require that processing occur on Canadian infrastructure?
- The SDD's AuditLogger writes security events to `org.db` on the user's local machine. When these events sync to the cloud (via the existing CODITECT backup infrastructure), does the `redacted_fields` list (which records field names that were redacted, but not values) itself constitute personal data under GDPR? E.g., does logging "field 'email' was redacted from tool call" constitute processing of personal data?
- Research whether "security event metadata" (tool name, risk score, matched rule IDs, scan duration) without any content is exempt from GDPR personal data restrictions, or whether context linkage (session_id → user) makes it personal data.
Output: A data residency risk assessment for the current SDD architecture and recommended mitigations.
Category 3: Multi-Agent Orchestration
Prompt MO-01: Security Policy Inheritance in Nested Agent Chains
Context: CODITECT's 776-agent system supports nested agent invocation — an orchestrator agent (e.g., senior-architect) dispatching to sub-agents (e.g., testing-specialist, security-specialist). Each agent invocation fires PreAgentStart and PreToolUse hooks. The SDD's SecurityGateHook extracts tenant_id from session context on every call. Ralph Wiggum autonomous loops (ADR-108/110/111) run nested agents in fresh-context iterations with checkpoint handoffs.
Research task:
Research the correct security policy inheritance model for nested AI agent chains, specifically for CODITECT's architecture:
- When orchestrator agent A dispatches to sub-agent B, should sub-agent B inherit A's security policy exactly, or should B's security policy be determined independently by B's agent class? For example, if A is a `senior-architect` agent with medium-risk tolerance, and B is a `devops-engineer` agent that has a legitimate need to use cloud CLIs (which are HIGH risk under JaydenBeard's patterns), what should B's effective security policy be?
- Research whether there is a standard "security context propagation" pattern in distributed systems (analogous to OpenTelemetry context propagation) that could be applied to multi-agent security policy chains.
- For Ralph Wiggum autonomous loops specifically: each loop iteration runs in a fresh context, but the loop is associated with a single task and tenant. Should the security policy for a Ralph Wiggum loop be evaluated at loop-start-time and applied immutably for the entire loop duration, or should it be re-evaluated on each fresh context iteration?
- Research the "confused deputy" problem as it applies to AI agent security: can sub-agent B be exploited to perform actions that orchestrator A's policy would block, by having B's policy evaluated at a different (less restrictive) point in the policy hierarchy?
Output: Recommended security policy inheritance model with CODITECT hook implementation sketch.
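One candidate inheritance model worth evaluating can be stated in a few lines: a sub-agent's effective policy is the most restrictive policy anywhere in its dispatch chain, which also closes the confused-deputy gap described in the last bullet (B can never be more permissive than A). The numeric strictness ordering is an assumption for illustration.

```python
# Hypothetical strictness ordering; real policies would be richer than
# a single level (per-tool risk tolerances, allowlists, etc.).
STRICTNESS = {"low": 0, "medium": 1, "high": 2}

def effective_policy(parent_chain: list[str], agent_class_policy: str) -> str:
    """Most-restrictive-wins across the whole dispatch chain: a
    sub-agent gets the strictest policy among its ancestors and its
    own agent class."""
    candidates = parent_chain + [agent_class_policy]
    return max(candidates, key=lambda p: STRICTNESS[p])
```

The trade-off this sketch makes visible: most-restrictive-wins blocks the `devops-engineer` case above (its legitimate cloud CLI use would be capped by a stricter orchestrator), so the research should also evaluate per-capability carve-outs rather than scalar strictness.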
Prompt MO-02: Cross-Agent Data Leakage Prevention
Context: CODITECT agents pass data between each other via tool results, session context, and messaging.db. The SDD's PostToolUse hook (SecurityGateHook.on_tool_result) redacts secrets and PII from tool outputs before they return to the agent context window. However, data can flow between agents via other channels: the messaging.db inter-agent messaging database, shared context in sessions.db, and tool outputs that are explicitly formatted as inputs for downstream agents.
Research task:
Research cross-agent data leakage vectors specific to CODITECT's architecture and recommend prevention controls:
- Messaging channel: CODITECT's `messaging.db` supports inter-agent messaging. Should messages passing through this channel be scanned by the security layer? What is the correct hook point — a dedicated `PreMessageSend` hook, or inspection within the Write/Read tools that access messaging.db?
- Context window accumulation: In a multi-turn agent session, a secret that was redacted from a single tool output may still exist in the agent's context window from a prior turn. How do comparable systems handle context window "memory contamination"? Once a secret has appeared in the context, redacting subsequent occurrences provides limited protection. Research whether CODITECT needs a "context scrubbing" capability.
- Structured data exfiltration: An agent that has read a sensitive file can encode its contents in tool call arguments that individually appear benign — e.g., writing a series of files with filenames that encode base64 chunks of a secret. Research whether this class of covert channel attack is addressed by any current AI agent security framework.
- Research what "information flow control" (IFC) mechanisms from traditional computer security (taint analysis, lattice-based access control) can be practically applied to AI agent pipelines without prohibitive performance overhead.
Output: Priority ranking of cross-agent leakage vectors for CODITECT and recommended detection approach for each.
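To make the IFC bullet concrete, here is a toy sketch of the taint-tracking bookkeeping shape: values from sensitive sources carry labels, derived values inherit the union of their sources' labels, and sinks refuse unauthorized labels. Propagating taint through an LLM's free-form reasoning is the hard open problem; this only shows the data-plumbing side.

```python
from dataclasses import dataclass, field

@dataclass
class Tainted:
    """A value plus the set of sensitivity labels it carries."""
    value: str
    labels: set[str] = field(default_factory=set)

def combine(*parts: Tainted) -> Tainted:
    """Derived data inherits the union of its sources' labels."""
    return Tainted("".join(p.value for p in parts),
                   set().union(*(p.labels for p in parts)))

def check_sink(data: Tainted, allowed: set[str]) -> bool:
    """A sink (e.g., an outbound tool call or inter-agent message)
    accepts the value only if all its labels are authorized."""
    return data.labels <= allowed
```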
Prompt MO-03: Security Scanning for Agent-to-Agent Communication Protocols
Context: The SDD covers security scanning for tool calls (PreToolUse/PostToolUse) and agent session starts (PreAgentStart). However, CODITECT agents also communicate via the messaging.db database and through shared context structures. The ClawGuard ecosystem was designed for single-agent monitoring and has no concepts for inter-agent security. The SDD's scope explicitly excludes "agent-to-agent trust boundaries" as out of scope for the current implementation.
Research task:
Research emerging approaches to securing inter-agent communication in multi-agent AI systems, with specific applicability to CODITECT:
- Research whether Anthropic's Model Context Protocol (MCP) specification includes any security provisions for agent-to-agent message validation, authentication, or integrity checking. Does MCP define a trust model for agent communication?
- What does the academic and industry literature (2024-2025) say about securing multi-agent LLM system communication? Search for: "multi-agent LLM security," "agent trust boundaries," "agentic AI security."
- Research Google's A2A (Agent-to-Agent) protocol proposals and OpenAI's emerging agent communication standards — do any of these include security provisions that CODITECT could adopt?
- For CODITECT's specific case: when the `messaging.db` database is used for inter-agent messaging, is a database-layer trigger (SQLite AFTER INSERT trigger that invokes a security check) a viable mechanism for scanning agent messages without requiring changes to every agent that writes messages?
Output: Assessment of inter-agent security standards maturity and recommended minimal viable control for CODITECT.
Prompt MO-04: Rate Limiting and Abuse Detection for Ralph Wiggum Autonomous Loops
Context: Ralph Wiggum loops (ADR-108/110/111) run CODITECT agents in fresh-context iterations autonomously for extended periods. The SDD notes (Section 5.3, JaydenBeard integration) that JaydenBeard's sequence detection (sequenceWindowMinutes: 5) identifies multi-step attack patterns. However, JaydenBeard has no rate limiting or alerting deduplication (identified as Gap 7.6 in coditect-impact.md): "a single misconfigured agent could generate hundreds of high-risk events per minute." Ralph Wiggum loops are the highest-risk context for this failure mode.
Research task:
Research rate limiting and abuse detection patterns applicable to autonomous AI agent loops:
- What rate limiting approaches are used by comparable systems (LangChain, AutoGPT, CrewAI) to prevent runaway agent loops from consuming resources or generating excessive security events? Is there a standard token bucket / leaky bucket implementation for AI agent rate limiting?
- For CODITECT's security layer specifically: if a Ralph Wiggum loop is generating more than N BLOCK events per minute for the same rule (e.g., a misconfigured agent repeatedly attempting the same blocked operation), what is the correct response — automatically escalate to kill-switch, pause the loop pending operator review, or rate-limit future blocks to avoid alert flood?
- Research "alert fatigue" mitigation patterns in security operations — what are the standard techniques for deduplicating, correlating, and suppressing repetitive security alerts without missing genuine attack escalation? Apply these to JaydenBeard's webhook alert system design.
- The SDD Section 11.3 identifies "False positive DoS" as a threat: "crafted content triggers excessive CONFIRM dialogs." Research how other security systems prevent CONFIRM/MFA fatigue attacks (also called "MFA bombing") and recommend a CODITECT-specific mitigation.
Output: Rate limiting and alert deduplication design for CODITECT's security layer with specific Ralph Wiggum considerations.
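Since the first bullet asks about token-bucket rate limiting specifically, here is the standard shape applied to this use case: keying one bucket per (session_id, rule_id) pair so that repeated identical BLOCK events are suppressed or escalated instead of flooding the alert channel. Capacity and refill parameters are illustrative.

```python
import time

class TokenBucket:
    """Classic token bucket: `capacity` alerts may burst, then alerts
    are admitted at `refill_per_s`. One bucket per (session, rule)."""

    def __init__(self, capacity: float, refill_per_s: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_s = refill_per_s
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # caller deduplicates/escalates instead of alerting
```

A suppressed alert should still be counted: the escalate-to-kill-switch and pause-pending-review options above both need the true event rate, not the post-suppression rate.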
Category 4: Competitive and Market Intelligence
Prompt CI-01: OpenClaw Security Ecosystem Maturity vs Claude Code vs Cursor
Context: The ClawGuard ecosystem targets OpenClaw, MoltBot, and ClawdBot as the primary AI agent runtimes. CODITECT uses Claude Code as its agent runtime. The research evaluated three ClawGuard repos but did not assess the broader security ecosystem maturity for Claude Code specifically vs OpenClaw vs Cursor. CODITECT's go-to-market is as a security-forward enterprise AI development platform — understanding the competitive security landscape is strategically important.
Research task:
Research and compare the security tooling ecosystems for the three major AI coding agent platforms as of early 2026:
- OpenClaw security ecosystem: Beyond the three ClawGuard repos evaluated, what other security tools exist for OpenClaw? What is OpenClaw's native security model — does it have any built-in security controls, or is security entirely plugin-dependent?
- Claude Code security ecosystem: What security tooling or guidance has Anthropic published for Claude Code deployments? Does Claude Code have any built-in hook-based security mechanisms beyond the hook system itself? Are there third-party Claude Code security tools beyond the ClawGuard ecosystem?
- Cursor security posture: How does Cursor IDE handle agent security — prompt injection, secret detection, destructive command prevention? Is Cursor's security model enterprise-ready compared to what CODITECT is building?
- GitHub Copilot/Copilot Workspace: What security controls does GitHub provide for its AI agent features in enterprise contexts? Are there features (content filters, secret scanning integration) that CODITECT should benchmark against?
- What is CODITECT's defensible security differentiation claim vs these competitors given the SDD's planned security layer? Where is the security layer genuinely ahead of the market vs table stakes?
Output: Competitive security posture matrix with CODITECT differentiation claims.
Prompt CI-02: Commercial AI Agent Security Products
Context: The research focused on open-source ClawGuard tools. There is a growing commercial AI agent security market that CODITECT must understand — both as potential integration targets and as competitors. The coditect-impact.md analysis notes regex-only detection as a limitation (Gap 7.7) and suggests LLM-based prompt injection detection as a complementary capability. Commercial products likely offer this.
Research task:
Research the commercial AI agent security product landscape as of early 2026:
- Invariant Labs mcp-scan: What does Invariant Labs' mcp-scan product do? What detection mechanisms does it use (regex vs semantic vs LLM-based)? Is it open-source or commercial? What is the pricing model and enterprise applicability?
- Robust Intelligence / Protect AI: What AI-specific security products do Protect AI offer? Do they address the agent/agentic-AI security use case specifically, or are they focused on model security (adversarial robustness, model scanning)?
- Lakera AI (Gandalf): Lakera specializes in prompt injection detection. What is their API-based detection approach vs regex-based detection? Is Lakera's detection accuracy measurably superior to regex-only approaches for real-world prompt injection attacks?
- Other entrants: Research any 2024-2025 funded startups specifically targeting AI agent security (not just LLM security). What are they building, and does any directly compete with CODITECT's planned security layer?
- For CODITECT specifically: is integrating a commercial prompt injection detection API (e.g., Lakera) as a complementary capability alongside the regex PatternEngine feasible within the March 11 launch timeline? What is the latency impact of an API call in the PreToolUse critical path?
Output: Commercial product comparison with build-vs-buy recommendation for the prompt injection detection gap.
Prompt CI-03: Open-Source Security Tool Consolidation Trends
Context: The ClawGuard ecosystem represents one organic community's response to AI agent security needs. The research found three independently developed tools that address overlapping concerns with no coordination between them. This fragmentation is a research risk: CODITECT is building on a snapshot of a rapidly evolving ecosystem, and consolidation may produce a more comprehensive tool that supersedes the individual repos within months.
Research task:
Research the consolidation trends in open-source AI agent security tooling as of early 2026:
- OWASP LLM Top 10 tooling: Has the OWASP LLM Top 10 project (which covers prompt injection as LLM01) produced any reference implementations or tool recommendations that have become de-facto standards?
- Emerging standard frameworks: Research whether any open-source AI agent security frameworks have emerged in 2024-2025 that aim to be comprehensive (covering prompt injection + secret detection + PII + destructive commands) rather than single-concern tools. Examples to research: `langchain-experimental`, `guardrails-ai`, `rebuff`, `llm-guard`.
- MCP security ecosystem: The Model Context Protocol (MCP) by Anthropic is rapidly gaining adoption. Research whether MCP-specific security tools are emerging that would be more directly applicable to CODITECT's Claude Code/MCP-based architecture than the OpenClaw-targeted ClawGuard tools.
- GitHub trending: Research what AI agent security repositories have the highest GitHub activity (stars, commits, issues) in the past 6 months. Is the ClawGuard ecosystem still active, or has activity shifted elsewhere?
- Given the consolidation landscape, should CODITECT's security layer be designed as a contributor to an emerging open standard, or as a proprietary internal capability?
Output: Trend assessment with recommendation on CODITECT's open-source strategy for its security layer.
Category 5: Product Feature Extraction
Prompt PF-01: ClawGuardian Plugin Architecture as CODITECT Plugin SDK Template
Context: ClawGuardian (superglue-ai, score 85/100) implements a plugin manifest (openclaw.plugin.json) that defines hook registrations, configuration schema, and peer dependency declarations. The SDD recommends adopting ClawGuardian's architecture as reference, not as a direct dependency. CODITECT has a planned skill/plugin ecosystem (445 skills currently) but no formal plugin SDK for third-party contributors. ClawGuardian's manifest-driven plugin architecture could serve as a template.
Research task:
Research what a best-practice plugin SDK architecture looks like for an AI agent platform, using ClawGuardian as a starting reference:
- Analyze ClawGuardian's `openclaw.plugin.json` manifest structure (documented in research-context.json and coditect-impact.md). What fields does it define for hook registration, configuration schema, allowlists, and peer dependencies? Reconstruct the full schema from the research context.
- Research how established plugin SDKs (VS Code extension API, Obsidian plugin API, Grafana plugin SDK) structure their manifest contracts — what is the minimal viable manifest that enables: (a) hook point declaration, (b) configuration schema for tenant customization, (c) capability declarations for marketplace trust rating, (d) dependency specification.
- What trust and verification requirements should CODITECT's plugin SDK impose on third-party plugins? Research the npm package signing model, the VS Code extension review process, and the Apple App Store review model — which provides the right balance for CODITECT's enterprise context?
- The SDD's ADR-005 (supply chain detection) describes a trust registry for third-party components. How should the CODITECT plugin SDK integrate with this trust registry — should every plugin submission require binary artifact scanning + provenance attestation as part of the SDK submission workflow?
Output: Draft plugin manifest schema for a CODITECT Plugin SDK, informed by ClawGuardian's architecture.
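As a strawman for the Output deliverable, a hypothetical manifest covering the four fields (a)-(d) above, expressed as a Python dict with a structural check. Field names are illustrative assumptions, not a reconstruction of the actual `openclaw.plugin.json` schema.

```python
# Hypothetical CODITECT plugin manifest sketch; every field name here is
# an assumption for illustration, not the ClawGuardian schema itself.
EXAMPLE_MANIFEST = {
    "name": "example-security-plugin",
    "version": "0.1.0",
    "hooks": [  # (a) hook point declarations
        {"event": "PreToolUse", "entry": "plugin.gate:main"},
    ],
    "configSchema": {  # (b) tenant-customizable settings (JSON Schema style)
        "type": "object",
        "properties": {"mode": {"enum": ["log", "warn", "block"]}},
    },
    "capabilities": ["read_tool_input"],  # (c) input to marketplace trust rating
    "peerDependencies": {"coditect-core": ">=1.0"},  # (d) dependency spec
}

REQUIRED_FIELDS = ("name", "version", "hooks", "configSchema",
                   "capabilities", "peerDependencies")

def validate_manifest(manifest: dict) -> list:
    """Return the missing required fields (empty list = structurally valid)."""
    return [f for f in REQUIRED_FIELDS if f not in manifest]
```

A real SDK would validate each field against a published JSON Schema rather than only checking presence; this sketch just fixes the minimal field set under discussion.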
Prompt PF-02: JaydenBeard Dashboard as CODITECT Security Tab
Context: JaydenBeard's dashboard (@jaydenbeard/clawguard v0.4.1) is a Node.js/Express WebSocket dashboard running on localhost:3847 with 9 REST routes and real-time security event streaming. The SDD (Section 3.5) specifies a native CODITECT MonitorDashboard (FastAPI + React) as the long-term target, with JaydenBeard as an "Integration Pattern B" interim option (link to JaydenBeard from CODITECT UI). CODITECT's trajectory dashboard is a separate system. The coditect-impact.md identifies "Integration Pattern C — Native CODITECT Security Dashboard" as the 6-month target.
Research task:
Research what the CODITECT security dashboard tab should contain and how to implement it efficiently within the March 11 launch constraint:
- What are the minimum viable dashboard capabilities needed for an enterprise security dashboard at a SaaS launch? Research what security dashboards from comparable platforms (Datadog Security, Lacework, Wiz) surface as their "default view" — are real-time event feeds, session risk scores, or pattern hit-rate charts most operationally valuable at launch?
- For the interim "Integration Pattern B" approach (surfacing JaydenBeard's dashboard via a link), research whether iframe embedding or a redirect link is the appropriate UX pattern for integrating a localhost service into a SaaS product UI. What are the security implications (CORS, CSP) of embedding a localhost service in a SaaS iframe?
- Research React + WebSocket libraries for building the real-time security event feed component described in the SDD (Section 3.5). Evaluate `socket.io-client`, the native WebSocket API, and `@tanstack/react-query` with a WebSocket adapter — which is most suitable for a high-frequency (sub-200ms) security event stream?
- The SDD specifies a kill switch at `POST /api/v1/security/gateway/{tenant_id}/kill` requiring MFA. Research React UI patterns for MFA-gated destructive actions — what is the standard UX for a "confirm with TOTP/passkey" workflow in a React dashboard?
Output: Prioritized feature list for the CODITECT security dashboard MVP, with recommended technology choices.
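Independent of the React UX question, the server-side half of the MFA gate is small. A stdlib sketch of the RFC 6238 TOTP check such a kill endpoint might perform, assuming a base32 shared secret; a production deployment would use a vetted library and accept adjacent time windows for clock skew.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, at=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA-1) over a base32-encoded shared secret."""
    at = int(time.time()) if at is None else at
    key = base64.b32decode(secret_b32, casefold=True)
    msg = struct.pack(">Q", at // step)  # time-step counter, big-endian
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def confirm_kill(submitted_code: str, secret_b32: str) -> bool:
    """Gate the destructive action on a constant-time TOTP comparison."""
    return hmac.compare_digest(submitted_code, totp(secret_b32))
```

The assertions below use the published RFC 6238 test secret (`12345678901234567890` in base32), so the sketch can be checked against known vectors.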
Prompt PF-03: Pattern Library as Standalone Open-Source Product
Context: The research recommends extracting JaydenBeard's 55+ patterns, ClawGuardian's typed pattern modules, and maxxie114's injection patterns into a versioned CODITECT-native pattern library. This pattern library — once built with full test coverage and YAML rule format — could have standalone value as an open-source community contribution project separate from CODITECT's proprietary security layer implementation.
Research task:
Research the strategic and operational considerations for open-sourcing CODITECT's AI agent security pattern library as a standalone community project:
- Community contribution model: Research how successful open-source security rule sets are governed — ClamAV signatures, Sigma detection rules, YARA rules, Semgrep rules. What governance models do these projects use for accepting community pattern contributions? Which model is most appropriate for AI agent security patterns?
- Pattern library versioning: Research semantic versioning practices for security rule sets. When a new attack vector is discovered, should the pattern library use SEMVER PATCH (new rule added = non-breaking), MINOR (rule behavior change = potentially breaking for false-positive rates), or MAJOR (rule removed = breaking)? Are there existing conventions?
- False positive management: Research how open-source security rule sets handle false positive reports from the community. What process do projects like Sigma (SIEM detection rules) use to triage, validate, and incorporate false positive corrections?
- CODITECT brand vs neutral project: Should the open-source pattern library be published under a CODITECT brand (reinforcing the security narrative for enterprise sales) or as a neutral community project (maximizing adoption and contribution)? Research comparable cases where security companies open-sourced detection components.
- What license should the pattern library use? Research whether MIT (current ClawGuard license), Apache 2.0, or a dual license (open for detection, commercial for enhanced patterns) is standard for security rule set projects.
Output: Go/no-go recommendation for open-sourcing the pattern library, with governance model if Go.
Prompt PF-04: Trust Registry Concept for CODITECT Skill Marketplace
Context: ADR-005 (artifacts/adrs/ADR-005-malicious-supply-chain-detection.md) specifies a trust-registry.yaml for CODITECT's submodule and package verification. The same trust concept — verified provenance, pinned versions, attestation records, and an untrusted-list — could be extended to CODITECT's skill marketplace (445 skills today, planned for third-party contributions). The lauty1505/clawguard trojan finding demonstrated that security tooling specifically attracts supply chain attacks.
Research task:
Research how to design a trust registry for a developer-facing skill marketplace that prevents supply chain attacks on AI agent components:
- Marketplace trust models: Research how major developer marketplaces handle trust verification — npm's package provenance (SLSA level 2), VS Code Marketplace's publisher verification, Terraform Registry's verified publisher status. What is the minimum viable trust signal for a skill marketplace at CODITECT's current scale?
- SLSA framework: Research SLSA (Supply chain Levels for Software Artifacts) — specifically SLSA Level 1 and Level 2 requirements. Can CODITECT's current git-based submodule workflow satisfy SLSA Level 1 with the attestation schema from ADR-005?
- Sigstore/cosign: Research Sigstore's cosign tool for signing software artifacts. Is this applicable to signing CODITECT skill packages (Python/YAML skill bundles)? What is the implementation complexity for adding cosign verification to CODITECT's skill installation workflow?
- Dynamic trust scoring: Research whether a dynamic trust score (based on download count, community reviews, time since last update, CVE reports against dependencies) is feasible for a skill marketplace at CODITECT's scale, or whether static binary (trusted/untrusted) is more appropriate.
- What should happen when a skill that was previously trusted is found to be malicious (analogous to the lauty1505 finding)? Research how npm handles package takedowns, PyPI handles malicious package reports, and what CODITECT's incident response protocol should be.
Output: Trust registry design for CODITECT's skill marketplace with enforcement mechanism.
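To make the enforcement mechanism concrete, a minimal sketch of pinned-digest verification against a registry with an untrusted-list. The in-memory registry shape is a hypothetical stand-in; the real trust-registry.yaml schema is defined in ADR-005 and may differ.

```python
import hashlib

# Hypothetical registry mirroring the ADR-005 concepts (pinned versions,
# untrusted-list); structure and digests here are illustrative only.
TRUST_REGISTRY = {
    "trusted": {
        # skill name -> pinned sha256 of the skill bundle
        "example-skill": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    },
    "untrusted": {"lauty1505-clone-skill"},  # known-bad, always rejected
}

def verify_skill(name: str, bundle: bytes) -> bool:
    """Allow installation only for skills whose bundle matches the pinned
    digest. Unknown skills fail closed: absent from the registry means
    not installable, which is the posture the lauty1505 finding argues for."""
    if name in TRUST_REGISTRY["untrusted"]:
        return False
    pinned = TRUST_REGISTRY["trusted"].get(name)
    return pinned is not None and hashlib.sha256(bundle).hexdigest() == pinned
```

The takedown question then reduces to registry mutation: moving a name from `trusted` to `untrusted` immediately blocks new installs, while revoking already-installed copies needs a separate mechanism.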
Category 6: Risk and Mitigation
Prompt RM-01: Supply Chain Attack Vectors Beyond Trojanized Forks
Context: ADR-005 documents the lauty1505/clawguard trojanized fork finding and proposes binary artifact scanning + provenance verification + trust registry as mitigations. The ADR explicitly notes: "Binary scanning does not detect malicious code written in interpreted languages (Python, JavaScript) that does not use binary artifacts. Source-level analysis would require static analysis tooling (a future enhancement)." This is a significant gap.
Research task:
Research supply chain attack vectors beyond binary injection that are relevant to CODITECT's AI agent skill ecosystem, and recommend mitigations:
- Source-level injection in Python skills: Research documented cases of malicious PyPI packages that executed attacks without binary artifacts — purely through Python source code (e.g., typosquatting with `__import__` calls, `setup.py` execution at install time, `atexit` hooks). What static analysis tools (Bandit, Semgrep Python rules, Meta's Pysa) are effective for detecting these patterns in CODITECT skill source files?
- Dependency confusion attacks: Research the dependency confusion attack vector (first published by Alex Birsan, 2021) — where a malicious package with a private package's name is published to a public registry. How can CODITECT's skill venv build process be hardened against dependency confusion? Is using a private PyPI mirror the correct mitigation?
- Malicious model weights / prompt injection via skill data files: CODITECT skills can include data files. Research whether embedding adversarial prompt injection in a skill's data files (e.g., a skill that reads a CSV for configuration, where the CSV contains injected instructions) is a viable attack vector and how it should be mitigated.
- CI/CD pipeline injection: Research attacks that target the CI/CD pipeline itself rather than the skill source — e.g., malicious GitHub Actions, compromised build containers. What is the current best practice for securing AI skill CI/CD pipelines against injection attacks?
- Timing-based supply chain attacks: Research "time bomb" patterns in open-source packages — code that is benign until a specific date or environment condition triggers malicious behavior. Can static analysis detect these patterns?
Output: Ranked supply chain threat model for CODITECT skills with recommended controls per vector.
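For a feel of what source-level detection involves, a minimal AST sketch flagging direct calls to commonly abused builtins. This is illustrative only; Bandit and Semgrep rule packs cover far more patterns, including attribute-based and string-obfuscated variants this sketch misses.

```python
import ast

# Call names commonly abused in source-level PyPI attacks; a deliberately
# small set for illustration, not a complete ruleset.
SUSPICIOUS_CALLS = {"__import__", "exec", "eval", "compile"}

def flag_suspicious_calls(source: str) -> list:
    """Return (lineno, call_name) for each direct call to a suspicious
    builtin in the given skill source. Only catches bare-name calls;
    e.g. builtins.__dict__['exec'] would evade it, which is exactly why
    mature SAST tooling is the research question here."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return hits
```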
Prompt RM-02: Pattern Evasion Techniques and Adversarial Prompt Injection Research
Context: The SDD (Section 11.3) lists "Rule bypass via encoding evasion" as a threat and references pattern PI-009 (encoding_evasion — Base64/URL/ROT13 encoded injection). The coditect-impact.md identifies "Pattern Evasion — Regex-Only Detection" as Gap 7.7: "Regex-based detection is bypassable. An adversarial input designed to evade known patterns — encoding variation, Unicode normalization attacks, context-splitting across tool calls — will not be caught by regex matching." CODITECT must understand the evasion landscape before deploying block-mode patterns.
Research task:
Research adversarial prompt injection evasion techniques and their implications for CODITECT's regex-based PatternEngine:
- Encoding evasion variants: Research the specific encoding techniques used to evade regex-based prompt injection detection beyond simple Base64/URL encoding. Include: Unicode confusable characters (homoglyphs), zero-width joiners, bidirectional text overrides (Trojan Source attack applied to prompts), HTML entity encoding, hex escape sequences, and multi-encoding (Base64 of URL-encoded text).
- Context splitting: Research whether "context splitting" attacks — distributing a malicious instruction across multiple tool call arguments or across multiple tool calls — are detectable by per-invocation regex scanning. Are there documented cases where this technique successfully evaded agent security tools?
- Academic research: Research the 2024-2025 academic literature on adversarial prompt injection — specifically papers studying evasion techniques against regex and semantic classifiers. What evasion success rates are reported in the literature against regex-only detection?
- Semantic detection alternatives: Research LLM-based prompt injection detection (using a separate classification model) vs transformer-based semantic similarity detection vs regex. What is the state of the art in detection accuracy vs latency for a production deployment? Specifically, is there a distilled model small enough to run inference within CODITECT's 500ms scan timeout?
- Given the evasion landscape, recommend which evasion techniques CODITECT's PI-009 (`encoding_evasion`) rule must cover at minimum for the March 11 launch, and which require a longer-term semantic detection capability.
Output: Evasion technique taxonomy with recommended regex coverage and semantic detection upgrade path.
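One mitigation the taxonomy should evaluate is canonicalization before matching. A sketch of a single normalization pass (NFKC folding, zero-width stripping, opportunistic Base64 decoding) that lets existing regex patterns see decoded content; the coverage here is deliberately partial, omitting ROT13, nested encodings, and bidi overrides.

```python
import base64
import re
import unicodedata

# Zero-width characters commonly used to split keywords past regexes.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Plausible Base64 runs (16+ chars, optional padding); heuristic only.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

def normalize(text: str) -> str:
    """One canonicalization pass before regex matching: fold compatibility
    forms via NFKC, strip zero-width characters, and append the decode of
    any plausible Base64 run so downstream patterns can match both the
    raw and decoded forms of the input."""
    text = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)
    decoded_parts = []
    for run in B64_RUN.findall(text):
        try:
            decoded_parts.append(
                base64.b64decode(run, validate=True).decode("utf-8"))
        except Exception:
            continue  # not valid Base64/UTF-8; leave the raw text alone
    return text + ("\n" + "\n".join(decoded_parts) if decoded_parts else "")
```

Appending rather than replacing decoded content means a pattern hit in either form still fires, at the cost of scanning a slightly larger payload.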
Prompt RM-03: False Positive Mitigation Strategies for Production Deployments
Context: The SDD (Section 12.3) includes a test: "No false positives on clean CODITECT operation payloads." The coditect-impact.md (Gap 7.5) notes that JaydenBeard's 55+ patterns "have unknown false positive and false negative rates in CODITECT's operational context" and recommends validation before deploying in block mode. ADR-003 chose fail-open default partly to mitigate false positive impact on developer experience. The pattern validation script (scripts/security/validate-patterns.py, Section 4 of coditect-impact.md) is specified but not yet implemented.
Research task:
Research production-grade false positive management for a security pattern library deployed against real developer workflows, specifically for an AI agent platform context:
- FP rate benchmarks: Research what false positive rates are considered acceptable for different action types in production security systems. Is a 5% FP rate acceptable for WARN actions? For REDACT? For BLOCK? What do security practitioners recommend as maximum tolerable FP rates per severity level?
- Pattern validation methodology: Research the standard methodology for measuring false positive and false negative rates of a security rule set against a corpus of real operational data. How do Sigma (SIEM rules) and Semgrep (SAST rules) projects measure and report FP rates? What is the minimum sample size needed to achieve statistical significance for CODITECT's 80+ patterns?
- Graduated rollout strategies: Research phased security rule deployment patterns — specifically the "log-only → warn → block" progression used by security teams when rolling out new detection rules. What metrics (FP rate, coverage rate, analyst workload) are used as gates for each phase transition?
- Legitimate use cases for HIGH-risk patterns: Several of CODITECT's HIGH-risk patterns (cloud CLI destructive operations, `rm -rf` on user-specified paths, `git reset --hard`) are common legitimate operations for DevOps-focused CODITECT agents. Research how other security tools differentiate between malicious and legitimate usage of these patterns — is context (agent identity, time of day, session type, prior session history) sufficient, or is per-operation user confirmation the only safe approach?
- Research whether machine learning-based anomaly detection (detecting unusual patterns relative to historical agent behavior baselines) could complement regex-based detection to reduce false positives without reducing true positive rates.
Output: False positive management playbook for CODITECT's security layer launch, including graduated rollout plan and FP measurement methodology.
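The phase-gate idea can be quantified with a confidence bound rather than a raw FP rate, so a rule with few observations cannot be promoted on lucky data. A sketch using the Wilson score upper bound; the per-phase thresholds are hypothetical placeholders, not recommended values.

```python
import math

def wilson_upper(fp: int, n: int, z: float = 1.96) -> float:
    """Upper bound of the Wilson score interval (95% by default) for an
    observed false-positive rate of fp hits over n clean samples."""
    if n == 0:
        return 1.0  # no evidence: assume the worst
    p = fp / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + margin) / denom

# Hypothetical gates for the log-only -> warn -> block progression:
# a rule advances only while its FP upper bound stays under the threshold.
PHASE_GATES = {"warn": 0.05, "block": 0.01}

def next_phase_allowed(phase: str, fp: int, n: int) -> bool:
    return wilson_upper(fp, n) <= PHASE_GATES[phase]
```

Using the upper bound makes the sample-size question from the methodology bullet explicit: even zero observed false positives over a small corpus yields a bound too high to pass a strict block gate.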
Execution Notes
Priority Ordering (Recommended)
Pre-launch critical (before March 11, 2026):
- AD-03 (Hook protocol spec — needed for implementation)
- RM-03 (False positive mitigation — blocks block-mode deployment)
- RM-02 (Evasion techniques — needed before writing PI-009)
- CR-01 (SOC 2 evidence — needed for enterprise pilot customers)
High value for implementation phase (Weeks 1-6):
- AD-01 (Performance benchmarks)
- AD-02 (Pattern caching architecture)
- CR-02 (PII patterns for GDPR/CCPA/PIPEDA)
- MO-04 (Rate limiting for Ralph Wiggum)
Post-launch or parallel research:
- AD-04, AD-05 (Architecture refinement)
- CI-01, CI-02, CI-03 (Competitive intelligence)
- CR-03, CR-04 (Compliance depth)
- MO-01, MO-02, MO-03 (Multi-agent security)
- PF-01, PF-02, PF-03, PF-04 (Product features)
- RM-01 (Supply chain depth)
Execution Instructions
Each prompt is self-contained and can be executed independently by invoking:
/agent research-agent "<paste prompt text here>"
Or via Task sub-agent:
Task(subagent_type="research-agent",
description="<prompt ID and title>",
prompt="<paste prompt text>")
The CODITECT context summary at the top of this document should be prepended to any prompt if executing in a fresh session without prior context.
Document: follow-up-prompts.md
Full Path: submodules/core/coditect-core/analyze-new-artifacts/clawguard-ai-agent-security/artifacts/follow-up-prompts.md
Source artifacts: executive-summary.md, coditect-impact.md, sdd.md, research-context.json, adrs/ADR-001 through ADR-005
Total prompts: 21 across 6 categories