CODITECT Integration Impact: Autonomous Enterprise Agent Stack
(CrewAI + LangGraph + Semantic Kernel)
Research Date: 2026-02-19
Analyst: Claude (Sonnet 4.6) — Research Impact Analyzer
Status: Draft for Engineering Review

Predecessor Documents:
- autonomous-enterprise-agent-framework-evaluation-2026-02-19.md (10-framework graded evaluation)
- autonomous-enterprise-agent-pairwise-comparison-2026-02-19.md (Condorcet tournament, complementarity assessment)
- autonomous-enterprise-agent-search-strategy-2026-02-19.md (license matrix, OWASP mapping, search strategy)
Executive Summary
The autonomous enterprise agent evaluation concluded with a 3-framework complementary stack recommendation: CrewAI (MIT) as primary orchestration for customer-facing workflows, LangGraph (MIT) for internal development workflows and stateful processing, and Semantic Kernel (MIT) as the security and Microsoft 365 integration layer. The pairwise tournament revealed a Condorcet cycle — no single framework dominates — which is the strongest possible signal that a layered stack is architecturally correct rather than a compromise.
CODITECT's current platform (776 agents, 445 skills, 377 commands, 118 hooks, Ralph Wiggum loops, 6-database context storage, MCP servers) provides natural integration anchor points for all three frameworks. However, the integration is not low-friction: CrewAI's host-process execution model directly contradicts CODITECT's compliance requirements, LangGraph's LangChain dependency surface creates significant version management risk, and Semantic Kernel's C#-primary architecture introduces a language boundary that will require sustained adapter maintenance.
Recommendation: Conditional Go
Conditions for Go:
- CrewAI MUST run in an isolated Docker container, never in the coditect-core main process
- Semantic Kernel MUST be deployed as a standalone Python microservice with MCP interface — no C# in the coditect-core process
- LangGraph MUST be pinned to a requirements file with transitive dependencies frozen before any production deployment
- An enterprise agent security ADR MUST be approved before any external API credentials are handled
Key Findings:
- CrewAI native MCP support delivers immediate zero-adapter connection to existing coditect-core MCP servers (semantic-search, call-graph, impact-analysis)
- LangGraph's interrupt-node model is the closest structural analog to Ralph Wiggum checkpoints of any evaluated framework
- Semantic Kernel is the only framework with built-in OAuth2/OIDC, audit hooks, and content safety — not optional for a compliance-first platform
- ❌ CrewAI runs all agent code in the host process with no sandboxing — prompt injection reaches every system resource
- ❌ No framework provides tenant-scoped credential isolation natively — all three must be wrapped
- ❌ LangGraph's dependency graph (LangChain ecosystem) is notoriously brittle — version pinning is mandatory, not advisory
- ⚠️ Three-framework stack triples the maintenance surface area — each framework has independent release cadences
- ⚠️ Semantic Kernel's coditect-core fit score is 2/5 — the integration effort is "High (months)" and cannot be compressed
1. Integration Architecture
Control Plane vs Data Plane
Technology Role: All three frameworks operate in both planes depending on invocation mode.
Control Plane Integration:
The coditect-core orchestration layer — hooks, Ralph Wiggum loops, the agent dispatcher, and command execution — is the control plane. All three frameworks are invoked FROM this layer, never the reverse. This is the critical architectural constraint: CrewAI, LangGraph, and Semantic Kernel must be sub-orchestrators called by coditect-core, not top-level orchestrators that call into coditect-core.
This is not how any of the three frameworks are designed. CrewAI explicitly wants to be the top-level orchestrator. LangGraph assumes it owns the state machine root. Semantic Kernel assumes it owns the plugin execution context. All three must be architecturally inverted to fit CODITECT's existing orchestration hierarchy.
- CrewAI inverted: coditect-core commands dispatch enterprise-orchestration-engine, which launches a CrewAI Crew as a subprocess, monitors via callback hooks, and receives results
- LangGraph inverted: the Ralph Wiggum loop state machine calls LangGraph graph execution as a child process; LangGraph graph nodes map to Ralph Wiggum checkpoint transitions
- Semantic Kernel inverted: exposed as an MCP server endpoint; coditect-core agents call SK functions as MCP tools, never embedding SK directly
Data Plane Integration:
The data plane is where tenant data flows: PostgreSQL (GKE), Redis, org.db/sessions.db context storage, session logs.
- CrewAI data plane: CrewAI agents read enterprise data (Gmail, Calendar, Drive) and write results back via coditect-core session logging. Intermediate CrewAI agent memory MUST NOT persist to any shared storage without tenant scoping.
- LangGraph data plane: LangGraph state (graph checkpoint objects) maps to Ralph Wiggum state files at ~/.coditect-data/ralph-loops/. Tenant context must be injected into every graph node's state dict.
- Semantic Kernel data plane: SK memory stores (vector databases for semantic memory) MUST be tenant-isolated. If SK's default volatile memory is used, isolation is process-level only — not sufficient for multi-tenant production.
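The LangGraph requirement — tenant context injected into every graph node's state dict — can be sketched with a typed state schema. This is a minimal illustration, not coditect-core code: the names `EnterpriseState` and `summarize_node` are hypothetical, and the assumption is that coditect-core supplies `tenant_id` at dispatch time.

```python
from typing import TypedDict

class EnterpriseState(TypedDict):
    """Hypothetical LangGraph state schema; tenant_id is mandatory."""
    tenant_id: str   # injected by coditect-core at dispatch time (assumed)
    task_input: str
    result: str

def summarize_node(state: EnterpriseState) -> dict:
    """Example graph node: refuses to run without tenant context."""
    if not state.get("tenant_id"):
        raise ValueError("graph node invoked without tenant context")
    # ... tenant-scoped work would happen here ...
    return {"result": f"summary for tenant {state['tenant_id']}"}
```

Making the check explicit in every node keeps a mis-wired graph from silently processing data without tenant scoping.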
CODITECT Component Touchpoints
| CODITECT Component | CrewAI Integration | LangGraph Integration | Semantic Kernel Integration |
|---|---|---|---|
| agents/ (776 agents) | New agent type: enterprise-orchestration-engine dispatches CrewAI Crews | New agent type: stateful-workflow-engine manages LangGraph graphs | New agent type: ms365-integration-service wraps SK as MCP server |
| commands/ (377 commands) | New: /enterprise-agent start, /enterprise-agent status, /enterprise-agent stop | New: /workflow-graph create, /workflow-graph resume (maps to Ralph Wiggum commands) | New: /ms365-auth, /ms365-action |
| hooks/ (118 hooks) | PreToolUse hook intercepts all CrewAI tool calls as HITL gates. PostToolUse records to audit log | PreToolUse mapped to LangGraph interrupt nodes — node execution pauses awaiting hook approval | PreToolUse validates SK function call scope against tenant OAuth2 scopes |
| skills/ (445 skills) | New skill directory: skills/enterprise-agent/ — Gmail, Calendar, Drive, Office automation skills as SKILL.md definitions | New skill: skills/stateful-workflow/SKILL.md — LangGraph graph construction patterns | New skill: skills/ms365-integration/SKILL.md |
| MCP Servers | CrewAI native MCP client connects to semantic-search and call-graph servers with zero adapter code | LangGraph via langchain-mcp-adapters — adds fragility at tool boundary | SK exposes itself AS an MCP server: sk-security-service:8080 |
| Ralph Wiggum loops | CrewAI Flows run as Ralph Wiggum managed sub-processes. RW checkpoints CrewAI Flow state | LangGraph IS Ralph Wiggum at the sub-loop level — interrupt-resume maps directly to RW checkpoint-handoff | SK is stateless from RW perspective — auth tokens managed separately |
| Session Logging | All CrewAI task completions emit to session log via PostToolUse hook | All LangGraph node transitions emit structured log entries to session log | All SK function calls with OAuth2 tokens emit audit events to session log |
| org.db | Enterprise agent decisions (e.g., "approved sending email to vendor X") stored in decisions table | Workflow graph design decisions stored in decisions table | MS365 auth configuration decisions stored in decisions table |
| sessions.db | CrewAI task run history stored in messages table | LangGraph execution traces stored in messages table | SK function call history stored in messages table |
| messaging.db | Agent-to-agent messages between coditect agents and enterprise-orchestration-engine | Workflow coordination messages during graph execution | Inter-agent messages for MS365 action coordination |
Architecture Diagram
2. Multi-Tenancy Implications
Technology Native Multi-Tenancy: No — none of the three frameworks are multi-tenant-aware.
This is the most significant architectural gap in the entire stack. All three frameworks were designed for single-tenant or developer-personal use cases. Embedding them in a multi-tenant SaaS requires CODITECT to implement the full tenant isolation layer externally.
Tenant Isolation Requirements by Framework
CrewAI Tenant Isolation:
CrewAI has no concept of tenant_id. Its agent memory, crew state, and task results are process-global. The isolation strategy is process-level, not data-level:
- Each tenant's CrewAI execution MUST run in a separate Docker container instance
- Container is provisioned on demand, receives only that tenant's credentials and context, destroyed after task completion
- No shared memory, no shared filesystem, no shared Redis between tenant containers
- CrewAI container receives tenant context via environment variables injected by coditect-core at dispatch time
The container-per-tenant model is the only safe approach given CrewAI's lack of internal isolation. It is also operationally expensive: each enterprise agent invocation requires container provisioning latency (typically 1-5 seconds for a pre-warmed Docker container, 15-30 seconds cold start).
LangGraph Tenant Isolation:
LangGraph's graph state object is in-process memory by default. When using LangGraph's built-in persistence (SQLite checkpointer), ALL tenant graphs share the same checkpoint store unless explicitly partitioned.
- MUST use thread_id scoped to tenant: config = {"configurable": {"thread_id": f"{tenant_id}:{task_id}"}}
- MUST use a PostgreSQL checkpointer (not SQLite) with row-level security policies for production multi-tenant use
- LangGraph's default MemorySaver is explicitly not safe for multi-tenant — any cross-tenant memory access is silent data leakage
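Because the colon-delimited thread_id is the only isolation boundary at the checkpoint level, a tenant_id or task_id containing the delimiter could collide with another tenant's thread keys. A defensive derivation helper (the names `tenant_thread_id` and `thread_config` are hypothetical) might look like:

```python
def tenant_thread_id(tenant_id: str, task_id: str) -> str:
    """Derive a tenant-scoped LangGraph thread_id, rejecting delimiter
    injection so one tenant cannot forge another tenant's thread key."""
    for part in (tenant_id, task_id):
        if not part or ":" in part:
            raise ValueError(f"invalid thread_id component: {part!r}")
    return f"{tenant_id}:{task_id}"

def thread_config(tenant_id: str, task_id: str) -> dict:
    """Build the config dict that LangGraph invocations receive."""
    return {"configurable": {"thread_id": tenant_thread_id(tenant_id, task_id)}}
```

Centralizing the derivation in one function also gives a single audit point for all thread-key construction.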
Semantic Kernel Tenant Isolation:
SK's plugin system uses per-plugin OAuth2 tokens. In a multi-tenant context:
- Each tenant's SK plugin invocations MUST use tokens scoped to that tenant's OAuth2 client credentials
- SK's semantic memory (if used) MUST be backed by a tenant-partitioned vector store, not the default volatile memory
- Token storage MUST NOT be shared across tenant sessions — tokens stored in sessions.db MUST be scoped to tenant_id
Credential Scoping
The enterprise agent stack will handle the most sensitive tenant credentials in the CODITECT platform: OAuth2 refresh tokens for Gmail, Calendar, Outlook, and potentially SharePoint. This is a fundamentally different risk profile from the current coditect-core credential model.
Current CODITECT credential scope: API keys for LLM providers, deployment credentials for GKE, database connection strings. These are platform-level credentials, not tenant-level.
Enterprise agent credential scope: Per-tenant OAuth2 refresh tokens for personal and organizational email/calendar/documents. Exfiltration of these credentials is a direct HIPAA/SOC2 breach.
❌ None of the three frameworks provide credential vaulting. CODITECT must implement:
- Encrypted credential storage in org.db with tenant_id foreign key and AES-256 encryption at rest
- Short-lived access token derivation (never persisting access tokens, only refresh tokens)
- Credential injection at execution time via environment variable into isolated container (CrewAI) or context parameter (LangGraph)
- Automatic credential rotation hooks when tokens expire
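The storage and derivation rules above can be sketched as a small vault abstraction. This is illustrative only: the class name is hypothetical, and the AES-256 encryption at rest is delegated to injected callables (stubbed with identity functions here), since the real cipher and org.db wiring live in coditect-core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

def _identity(data: bytes) -> bytes:
    return data

@dataclass
class CredentialVault:
    """Illustrative vault enforcing two rules from the list above:
    rows are keyed by tenant_id, and only refresh tokens are persisted."""
    encrypt: Callable[[bytes], bytes] = _identity   # stand-in for AES-256 at rest
    decrypt: Callable[[bytes], bytes] = _identity
    _rows: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def put_refresh_token(self, tenant_id: str, provider: str, token: str) -> None:
        self._rows[(tenant_id, provider)] = self.encrypt(token.encode())

    def mint_access_token(self, tenant_id: str, provider: str,
                          exchange: Callable[[str], str]) -> str:
        """Derive a short-lived access token on demand; it is handed to the
        caller (e.g. injected into a container) but never written back."""
        refresh = self.decrypt(self._rows[(tenant_id, provider)]).decode()
        return exchange(refresh)
```

The key property is structural: there is no method that persists an access token, so the "refresh tokens only" rule cannot be violated by a forgetful caller.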
Resource Quotas
CrewAI crews can spawn multiple parallel agents, each consuming LLM API calls. Without quota enforcement, a single tenant's enterprise agent workflow can saturate the platform's LLM budget.
⚠️ No framework provides per-tenant resource quota enforcement. CODITECT must implement:
- Token budget per tenant per hour (enforced at PostToolUse hook level)
- Maximum concurrent CrewAI agents per tenant
- LangGraph node execution time limits per tenant
- Rate limiting at the MCP server gateway for SK function calls
3. Compliance Surface
Auditability
What changes when enterprise agents are active:
Today, all coditect-core actions are initiated by a human (developer, admin, or authorized user) through documented commands. The audit trail is: human command -> hook -> action -> session log.
With enterprise agents, the audit trail expands: human command -> coditect-core agent -> CrewAI crew -> multiple tool calls -> external API mutations (email sent, calendar event created, document modified) -> session log. The external API mutations are the critical new surface — they occur OUTSIDE the coditect-core process boundary in external systems.
Compliance Requirements for External API Actions:
| Action Type | CODITECT Requirement | Gap Status |
|---|---|---|
| Email sent on behalf of tenant | Logged with recipient, subject, timestamp, tenant_id, user_id | ❌ CrewAI does not emit structured audit events for tool results |
| Calendar event created | Logged with attendees, time, tenant_id | ❌ CrewAI does not emit structured audit events for tool results |
| Document created/modified | Logged with document ID, change summary, tenant_id | ❌ Same gap |
| SK OAuth2 token used | Logged with scope, resource, timestamp | ✅ SK has built-in audit hooks — this is the one area where compliance works natively |
| LangGraph node transition | Logged as state transition with full state diff | ✅ LangGraph's stateful graph produces an automatic audit trace |
| HITL approval requested | Logged with context, approver, decision, timestamp | ✅ Exists via PreToolUse hooks, must be extended for enterprise actions |
Mitigation for CrewAI audit gap:
# coditect/integrations/enterprise_agent/audit_wrapper.py
from datetime import datetime, timezone
import json
from crewai import Agent, Task, Crew
from crewai.callbacks import BaseCallback
from coditect.session_log import SessionLogger
from coditect.context import get_tenant_context
class CoditactAuditCallback(BaseCallback):
"""
Wraps every CrewAI tool call with CODITECT-compliant audit emission.
Injected at Crew construction time — NOT optional for production use.
"""
def __init__(self, tenant_id: str, session_logger: SessionLogger):
self.tenant_id = tenant_id
self.session_logger = session_logger
def on_tool_start(
self,
tool_name: str,
tool_input: dict,
agent_name: str,
**kwargs
) -> None:
self.session_logger.emit_structured({
"event": "enterprise_agent.tool_start",
"tenant_id": self.tenant_id,
"tool": tool_name,
"agent": agent_name,
"input_summary": self._sanitize_input(tool_input),
"timestamp": datetime.now(timezone.utc).isoformat(),
})
def on_tool_end(
self,
tool_name: str,
tool_output: str,
agent_name: str,
**kwargs
) -> None:
self.session_logger.emit_structured({
"event": "enterprise_agent.tool_end",
"tenant_id": self.tenant_id,
"tool": tool_name,
"agent": agent_name,
"output_summary": self._sanitize_output(tool_output),
"timestamp": datetime.now(timezone.utc).isoformat(),
})
def _sanitize_input(self, tool_input: dict) -> dict:
"""Strip PII fields from audit log — HIPAA-required."""
SENSITIVE_KEYS = {"body", "content", "message", "text", "password"}
return {
k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
for k, v in tool_input.items()
}
def _sanitize_output(self, output: str) -> str:
"""Truncate output to prevent PII in audit log."""
return output[:500] + "...[truncated]" if len(output) > 500 else output
OWASP Agent Security Top 10 — CODITECT Mitigation Mapping
| OWASP Risk | Severity | coditect-core Mitigation | Gap |
|---|---|---|---|
| Excessive Agency | Critical | PreToolUse hooks enforce tool allowlist per tenant policy | ⚠️ Allowlist must be manually maintained per tenant configuration |
| Prompt Injection | Critical | SK content safety filters (if SK in stack). No native protection in CrewAI | ❌ CrewAI has no prompt injection defense — external input to agents is unrestricted |
| Insecure Tool Use | High | PreToolUse hook validates tool parameters against schema before execution | ⚠️ Schema validation must be authored per tool — not automatic |
| Data Exfiltration | Critical | CrewAI container outbound network allowlist (Docker network policy) | ❌ Allowlist not implemented by default — must be GKE NetworkPolicy |
| Privilege Escalation | High | OAuth2 scoped tokens via SK; minimum-scope enforcement | ⚠️ Scope minimization must be configured per workflow — not enforced by default |
| Insufficient Logging | Critical | CoditactAuditCallback on all CrewAI tool calls; LangGraph state trace; SK audit hooks | ⚠️ Three separate audit systems — no unified audit query surface |
| Uncontrolled Autonomy | Critical | Ralph Wiggum HITL gate at loop start; PreToolUse hooks for high-risk tools | ⚠️ HITL coverage gaps exist for mid-workflow decisions not anticipated at design time |
| Supply Chain | Medium | All three frameworks are MIT-licensed; SBOM generation required | ❌ No SBOM currently generated for enterprise agent dependencies |
| Model Manipulation | High | SK content safety filters; coditect-core output filtering at PostToolUse | ❌ No output guardrails in CrewAI — agent outputs are not validated before action execution |
| Denial of Service | Medium | Per-tenant token budget enforcement at PostToolUse | ❌ Not implemented — must be built before multi-tenant launch |
Policy Injection Points
CODITECT's compliance model requires that tenant-level policies intercept and block actions that violate that tenant's compliance posture (e.g., a HIPAA tenant must not allow PII export without explicit consent logging).
Available injection points:
- PreToolUse hook (best injection point): Fires before any tool call. Can block the call, modify parameters, or require human approval. Maps to all three frameworks if CrewAI tools are routed through coditect-core's hook system.
- LangGraph interrupt nodes (structural): A graph node can be flagged as an interrupt point. LangGraph pauses execution and awaits external input before continuing. This is the most auditable HITL mechanism in the stack.
- SK process filters (deepest): SK's filter pipeline can intercept every function call before and after execution. This is the only framework-native policy injection mechanism that can block based on content.
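A minimal decision model for the PreToolUse injection point can be sketched as follows. The dataclass and policy-dict shapes are illustrative, not the actual coditect-core hook API.

```python
from dataclasses import dataclass

@dataclass
class HookDecision:
    action: str        # "allow" | "block" | "require_approval"
    reason: str = ""

def pre_tool_use(tenant_policy: dict, tool_name: str) -> HookDecision:
    """Evaluate one tool call against a tenant's policy before execution:
    block tools outside the allowlist, route high-risk tools to HITL."""
    if tool_name not in tenant_policy.get("allowlist", set()):
        return HookDecision("block", f"{tool_name} not in tenant allowlist")
    if tool_name in tenant_policy.get("high_risk", set()):
        return HookDecision("require_approval", f"{tool_name} is high-risk")
    return HookDecision("allow")
```

The three outcomes map directly onto the mechanisms above: "block" and "require_approval" are enforceable natively in LangGraph and SK, but for CrewAI they depend on the tool implementation honoring the decision.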
Policy enforcement gap for CrewAI: CrewAI's callback system fires events but CANNOT block tool execution. If an enterprise agent is mid-workflow and a tool call would violate a tenant policy, the callback can log the violation but cannot prevent the action. Policy blocking in CrewAI requires the tool implementation itself to call back into coditect-core's policy engine before executing.
# coditect/integrations/enterprise_agent/policy_aware_tool.py
from crewai.tools import BaseTool
from coditect.policy import PolicyEngine
from coditect.exceptions import PolicyViolationError
class PolicyAwareTool(BaseTool):
"""
Base class for all enterprise agent tools.
Every tool in the enterprise agent stack MUST inherit from this,
not from crewai.tools.BaseTool directly.
"""
def __init__(self, tenant_id: str, policy_engine: PolicyEngine, **kwargs):
super().__init__(**kwargs)
self.tenant_id = tenant_id
self.policy_engine = policy_engine
def _run(self, **kwargs) -> str:
# Policy check BEFORE execution
policy_result = self.policy_engine.evaluate(
tenant_id=self.tenant_id,
action=self.name,
parameters=kwargs
)
if not policy_result.allowed:
raise PolicyViolationError(
f"Policy blocked {self.name}: {policy_result.reason}\n"
f"Tenant: {self.tenant_id}"
)
# HITL gate for high-risk actions
if policy_result.requires_human_approval:
approval = self._request_human_approval(kwargs, policy_result)
if not approval.approved:
raise PolicyViolationError(
f"Human declined {self.name}: {approval.reason}"
)
# Execute the actual tool action
result = self._execute(**kwargs)
# Policy check AFTER execution (for output filtering)
filtered_result = self.policy_engine.filter_output(
tenant_id=self.tenant_id,
action=self.name,
output=result
)
return filtered_result
def _execute(self, **kwargs) -> str:
raise NotImplementedError("Subclasses must implement _execute()")
def _request_human_approval(self, parameters: dict, policy_result) -> object:
# Dispatch to CODITECT HITL approval workflow
# Returns approval object with .approved and .reason
from coditect.hitl import HITLApprovalWorkflow
return HITLApprovalWorkflow.request(
tenant_id=self.tenant_id,
action=self.name,
parameters=parameters,
policy_context=policy_result.context,
timeout_seconds=300 # 5 minute timeout
)
E-Signatures and Evidence
Enterprise agents will execute actions that constitute legal acts: sending communications on behalf of a business, creating contracts, scheduling commitments. These require evidence trails that satisfy the same bar as CODITECT's existing e-signature requirement for platform actions.
❌ None of the three frameworks provide e-signature or evidence trail mechanisms. CODITECT must route high-risk enterprise agent actions through the existing e-signature workflow:
- Enterprise agent identifies action as high-risk (email to external party, document signing, financial commitment)
- Action is serialized and submitted to CODITECT's e-signature workflow
- Authorized user reviews and signs the pending action
- Signed evidence stored in esignature_evidence table with enterprise_agent_action_id
- Enterprise agent receives approval callback and executes the action
- Execution result stored alongside evidence record
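The submit-and-sign portion of this flow can be sketched against an in-memory evidence table. Column names mirror the steps above but are assumptions; the real esignature_evidence schema lives in coditect-core.

```python
import json
import uuid
from datetime import datetime, timezone

def submit_for_signature(evidence_table: list, tenant_id: str, action: dict) -> str:
    """Serialize a high-risk agent action into a pending evidence row
    (steps 1-2 of the flow above)."""
    action_id = str(uuid.uuid4())
    evidence_table.append({
        "enterprise_agent_action_id": action_id,
        "tenant_id": tenant_id,
        "payload": json.dumps(action, sort_keys=True),  # immutable serialized form
        "status": "pending_signature",
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    })
    return action_id

def record_signature(evidence_table: list, action_id: str, signer: str) -> dict:
    """Mark the row signed (steps 3-4); the agent's approval callback
    (step 5) would fire only after this returns."""
    row = next(r for r in evidence_table
               if r["enterprise_agent_action_id"] == action_id)
    row["status"] = "signed"
    row["signer"] = signer
    return row
```

Serializing the payload before signature is what makes the evidence meaningful: the signed bytes are exactly what the agent later executes.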
4. Observability Story
Current coditect-core Observability
coditect-core has a session-log-centric observability model: all significant actions are written to daily session log files at ~/.coditect-data/session-logs/. The sessions.db captures message history and analytics. There is no real-time metrics pipeline, no distributed tracing, and no centralized log aggregation in the current framework documentation.
What the Enterprise Agent Stack Adds
CrewAI observability:
- CrewAI emits verbose console logging by default — not structured, not queryable
- CrewAI does NOT expose Prometheus metrics or OpenTelemetry spans natively
- CrewAI's verbose output must be captured and parsed to extract structured observability data
LangGraph observability:
- LangGraph state transitions are structurally observable: every node execution produces a state diff
- LangGraph supports LangSmith tracing natively (LangChain Inc.'s commercial product — not appropriate for a privacy-first platform)
- LangGraph supports custom callbacks that can emit to any observability backend
- The graph structure itself serves as a visual execution trace
Semantic Kernel observability:
- SK has built-in telemetry via OpenTelemetry (the strongest observability story in the stack)
- SK can export spans to any OTLP-compatible backend
- SK function call durations, success/failure rates, and token counts are available as metrics
Gaps
⚠️ No unified observability surface. Three frameworks produce three different observability data streams with different formats, different telemetry protocols, and different granularities. A CODITECT operator cannot currently query "how many enterprise agent tool calls did tenant X make in the last hour" without building a custom aggregation layer.
⚠️ No per-tenant metrics attribution. All three frameworks operate at the process level. LLM token consumption, tool call counts, and execution durations are not automatically attributed to a tenant_id without custom instrumentation.
❌ CrewAI has no OpenTelemetry support. To get structured traces from CrewAI, CODITECT must implement a LangFuse-style tracing wrapper using CrewAI's callback system. LangFuse is already in the coditect ecosystem (noted in memory/agent-labs-analysis.md) — this is the recommended approach.
Recommended Observability Integration
# coditect/integrations/enterprise_agent/observability.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
import time
class EnterpriseAgentTracer:
"""
Unified OpenTelemetry tracing for all three frameworks.
CrewAI: populated via CoditactAuditCallback
LangGraph: populated via StateTransitionCallback
Semantic Kernel: native OTLP export configured in SK setup
"""
def __init__(self, tenant_id: str, otlp_endpoint: str = "http://localhost:4317"):
provider = TracerProvider()
provider.add_span_processor(
BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint))
)
trace.set_tracer_provider(provider)
self.tracer = trace.get_tracer("coditect.enterprise-agent")
self.tenant_id = tenant_id
    def trace_crewai_tool(self, tool_name: str, agent_name: str):
        """Context manager wrapping a CrewAI tool call in an OTel span."""
        # start_as_current_span parents nested spans correctly and ends
        # the span when the with-block exits.
        return self.tracer.start_as_current_span(
            f"crewai.tool.{tool_name}",
            attributes={
                "tenant.id": self.tenant_id,
                "crewai.agent": agent_name,
                "crewai.tool": tool_name,
            },
        )

    def trace_langgraph_node(self, node_name: str, graph_id: str):
        """Context manager wrapping a LangGraph node execution."""
        return self.tracer.start_as_current_span(
            f"langgraph.node.{node_name}",
            attributes={
                "tenant.id": self.tenant_id,
                "langgraph.graph_id": graph_id,
                "langgraph.node": node_name,
            },
        )
Grafana Dashboard Gaps
The following per-tenant dashboards do not exist and must be built:
- Enterprise agent task completion rate by tenant
- LLM token consumption by tenant by framework
- HITL approval latency (time from gate trigger to approval/denial)
- Policy violation count by tenant by tool
- SK OAuth2 token refresh rate by tenant
5. Multi-Agent Orchestration Fit
coditect-core's Existing 776-Agent System
coditect-core's 776 agents are prompt-based specialist agents defined as Markdown files in agents/. Each agent has a specific expertise area (e.g., senior-architect, security-specialist, database-architect). Agents are invoked by the coditect-core command system or by Ralph Wiggum loops. They do not natively interoperate with each other — coordination is done by the calling layer.
The enterprise agent stack introduces a second agent system (CrewAI's multi-agent crew) that operates at a different level of abstraction and with different execution semantics.
Conflict Analysis: coditect-core Agents vs. Framework Agents
| Dimension | coditect-core Agents | CrewAI Agents | LangGraph Nodes | SK Functions |
|---|---|---|---|---|
| Definition format | Markdown (.md files) | Python class instances | Python functions | Python plugins |
| Execution model | Invoked by command/hook system | Run within a Crew | Run as graph nodes | Called as SK functions |
| Memory | Session log + org.db | Per-agent memory (in-process) | Graph state (persistent) | Volatile (default) |
| Communication | messaging.db + hooks | Crew task delegation | State passing via edges | Function composition |
| Security | Hook-enforced approval gates | Callback-based (cannot block) | Interrupt nodes (can block) | Process filters (can block) |
| Multi-tenancy | Not natively tenant-aware | Not natively tenant-aware | Requires thread_id scoping | Requires token scoping |
❌ No agent protocol interoperability. A coditect-core senior-architect agent cannot be called by a CrewAI crew, and a CrewAI agent cannot be invoked through coditect-core's hook system. The two agent systems are fully disjoint. The integration boundary is one-directional: coditect-core dispatches framework agents; framework agents cannot invoke coditect-core agents.
⚠️ Orchestrator hierarchy conflict. When a coditect-core command invokes an enterprise agent task, that task becomes a CrewAI crew or LangGraph graph that spawns multiple sub-agents. Those sub-agents make tool calls that should go through coditect-core hooks (for audit and policy), but CrewAI agents call tools directly without knowledge of the hook system. The tool implementations must be designed to call back into coditect-core's hook system — which is an inversion of the normal dependency direction.
Ralph Wiggum Loop Integration
Ralph Wiggum (ADR-108/110/111) provides autonomous loop execution with checkpoint-based session handoff. It is the closest existing coditect-core mechanism to a stateful workflow engine.
LangGraph + Ralph Wiggum integration (natural fit):
LangGraph's graph-node model IS a state machine with explicit checkpoints. The structural mapping is:
| LangGraph Concept | Ralph Wiggum Concept |
|---|---|
| Graph state | Checkpoint payload |
| Interrupt node | Context compaction trigger |
| Graph resume | Session recovery from checkpoint |
| Thread ID | Loop ID |
| Node execution | Task step |
| Edge condition | Step outcome routing |
Ralph Wiggum currently stores checkpoint state at ~/.coditect-data/ralph-loops/{loop_id}/checkpoint.json. LangGraph's PostgreSQL checkpointer can be configured to write to the same storage pattern:
# coditect/integrations/enterprise_agent/langgraph_rw_bridge.py
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row
from coditect.paths import get_langgraph_checkpoint_db_url

def create_rw_aligned_checkpointer(tenant_id: str, loop_id: str) -> PostgresSaver:
    """
    Creates a LangGraph checkpointer that writes to the same PostgreSQL
    database used by Ralph Wiggum's loop state, with tenant_id isolation.
    Thread IDs are scoped as: {tenant_id}:{loop_id}:{task_id}
    This ensures no cross-tenant state leakage at the checkpoint level.
    """
    db_url = get_langgraph_checkpoint_db_url()
    # Note: PostgresSaver.from_conn_string() is a context manager in
    # langgraph-checkpoint-postgres and cannot be returned directly;
    # build the saver from a long-lived connection instead.
    conn = Connection.connect(db_url, autocommit=True, row_factory=dict_row)
    checkpointer = PostgresSaver(conn)
    checkpointer.setup()  # Creates checkpoint tables if they do not exist
    return checkpointer
def resume_graph_from_rw_checkpoint(
graph,
tenant_id: str,
loop_id: str,
task_id: str
):
"""
Resume a LangGraph graph execution from a Ralph Wiggum checkpoint.
Called when a Ralph Wiggum loop recovers from context compaction.
"""
thread_id = f"{tenant_id}:{loop_id}:{task_id}"
config = {"configurable": {"thread_id": thread_id}}
# LangGraph resumes from last persisted checkpoint automatically
for event in graph.stream(None, config=config):
yield event
CrewAI + Ralph Wiggum integration (requires wrapping):
CrewAI Flows have their own internal state machine, but it is not designed for external checkpoint management. Ralph Wiggum must treat a CrewAI Flow as an opaque subprocess:
# coditect/integrations/enterprise_agent/crewai_rw_manager.py
import subprocess
import json
from pathlib import Path
from coditect.paths import get_ralph_wiggum_state_dir
class CrewAIRalphWiggumManager:
"""
Manages CrewAI Flows as Ralph Wiggum autonomous sub-processes.
Ralph Wiggum cannot checkpoint inside CrewAI — it can only:
1. Start a new CrewAI Flow subprocess
2. Monitor its stdout/stderr for progress
3. Record the last known state on SIGTERM/failure
4. Restart from the last recorded state on session recovery
"""
def __init__(self, tenant_id: str, loop_id: str):
self.tenant_id = tenant_id
self.loop_id = loop_id
self.state_file = (
get_ralph_wiggum_state_dir() /
f"{tenant_id}" /
f"{loop_id}" /
"crewai_state.json"
)
self.state_file.parent.mkdir(parents=True, exist_ok=True)
def start_flow(self, flow_class: str, initial_input: dict) -> str:
"""Launch a CrewAI Flow as an isolated subprocess."""
state = {
"tenant_id": self.tenant_id,
"loop_id": self.loop_id,
"flow_class": flow_class,
"initial_input": initial_input,
"status": "running",
"last_checkpoint": None,
}
self.state_file.write_text(json.dumps(state, indent=2))
# CrewAI runs in Docker container, not in-process
proc = subprocess.Popen(
[
"docker", "run", "--rm",
"--network", f"tenant-{self.tenant_id}", # Isolated network
"--env", f"TENANT_ID={self.tenant_id}",
"--env", f"LOOP_ID={self.loop_id}",
"--env-file", f"/run/secrets/{self.tenant_id}/enterprise-agent.env",
"coditect/enterprise-agent:latest",
"python", "-m", "coditect.enterprise_agent.run_flow",
"--flow-class", flow_class,
"--input", json.dumps(initial_input),
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
return str(proc.pid)
def save_checkpoint(self, checkpoint_data: dict) -> None:
"""Called by CrewAI flow on each significant step completion."""
state = json.loads(self.state_file.read_text())
state["last_checkpoint"] = checkpoint_data
state["status"] = "checkpointed"
self.state_file.write_text(json.dumps(state, indent=2))
def resume_from_checkpoint(self) -> dict | None:
"""Returns last checkpoint for restart, or None if no checkpoint exists."""
if not self.state_file.exists():
return None
state = json.loads(self.state_file.read_text())
return state.get("last_checkpoint")
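The state-file round trip the manager relies on can be exercised in isolation. A minimal sketch — a temporary directory stands in for get_ralph_wiggum_state_dir(), and the checkpoint payload is invented for illustration:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for get_ralph_wiggum_state_dir(): a throwaway directory.
state_dir = Path(tempfile.mkdtemp())
state_file = state_dir / "tenant-a" / "loop-1" / "crewai_state.json"
state_file.parent.mkdir(parents=True, exist_ok=True)

# start_flow(): write the initial record
state = {"status": "running", "last_checkpoint": None}
state_file.write_text(json.dumps(state, indent=2))

# save_checkpoint(): merge in the latest step result
state = json.loads(state_file.read_text())
state.update(status="checkpointed", last_checkpoint={"step": "draft_done"})
state_file.write_text(json.dumps(state, indent=2))

# resume_from_checkpoint(): read it back after a simulated crash
recovered = json.loads(state_file.read_text()).get("last_checkpoint")
print(recovered)  # {'step': 'draft_done'}
```

Because the file is plain JSON on disk, the recovery path needs no cooperation from the (already dead) CrewAI subprocess.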
6. Advantages
What CODITECT concretely gains from this stack:
Advantage 1: Immediate MCP Server Utilization via CrewAI
CODITECT already operates three MCP servers: semantic-search, call-graph, and impact-analysis. CrewAI's native MCP client can connect to these servers without any adapter code — zero new code required on the MCP server side.
Practical impact: An enterprise agent workflow for "analyze this codebase and create a technical summary document" can use the call-graph MCP server to understand code structure, semantic-search to find relevant documentation, and impact-analysis to flag risky dependencies — all as natively available CrewAI tools on day one.
This is a day-one demonstration of value that requires only CrewAI configuration, not new development.
Advantage 2: Enterprise System Coverage Without Connector Development
CrewAI's 500+ first-party maintained integrations cover Google Workspace (Gmail, Calendar, Drive), Microsoft 365 (partial), Slack, Jira, Salesforce, HubSpot, and more. Without this stack, CODITECT would need to build and maintain each connector individually — an estimated 2-6 weeks per connector for quality enterprise API integration.
At 500+ connectors, the alternative-build cost is astronomical. Even considering only the 10 most common enterprise systems (Gmail, Calendar, Slack, Jira, Salesforce, HubSpot, Notion, Asana, Linear, GitHub), the build-vs-adopt decision has a very clear answer.
Advantage 3: LangGraph Closes the Ralph Wiggum Feature Gap
Ralph Wiggum (ADR-108/110/111) provides autonomous loop execution with checkpoint handoff, but it lacks fine-grained state machine primitives. Ralph Wiggum loops are defined as prompt-chain sequences with checkpoint JSON blobs — not as typed state machines with conditional routing.
LangGraph provides exactly the typed state machine layer that Ralph Wiggum needs for complex workflows. The structural mapping (interrupt nodes = checkpoint triggers, thread_id = loop_id, graph state = checkpoint payload) means LangGraph can serve as Ralph Wiggum's execution engine for complex internal development workflows — a capability gap that currently prevents automating multi-branch engineering tasks.
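The thread_id side of that mapping can be sketched concretely. These helpers are hypothetical, but they encode the composite {tenant_id}:{loop_id}:{task_id} convention that the RLS policy in the schema section splits on:

```python
# Hypothetical helpers: encode Ralph Wiggum loop identity into a LangGraph
# thread_id, and recover it for checkpoint scoping and audit correlation.

def make_thread_id(tenant_id: str, loop_id: str, task_id: str) -> str:
    for part in (tenant_id, loop_id, task_id):
        if ":" in part:
            # ':' is the delimiter; allowing it would corrupt RLS split_part()
            raise ValueError(f"':' not allowed in id component: {part!r}")
    return f"{tenant_id}:{loop_id}:{task_id}"

def parse_thread_id(thread_id: str) -> dict:
    tenant_id, loop_id, task_id = thread_id.split(":", 2)
    return {"tenant_id": tenant_id, "loop_id": loop_id, "task_id": task_id}

tid = make_thread_id("t-123", "loop-9", "doc-gen-1")
print(tid)  # t-123:loop-9:doc-gen-1
print(parse_thread_id(tid)["tenant_id"])  # t-123
```

Rejecting ':' inside components is what keeps split_part(thread_id, ':', 1) in the PostgreSQL policy unambiguous.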
Advantage 4: Semantic Kernel Eliminates Enterprise Auth Build Cost
CODITECT currently has no OAuth2/OIDC framework for tenant-delegated enterprise system access. Building this from scratch — token storage, refresh logic, scope management, per-tenant credential isolation, audit logging for token use — is a 4-8 week engineering effort at minimum.
Semantic Kernel's built-in OAuth2/OIDC per-plugin model, audit hooks, and content safety filters provide this infrastructure. The integration cost is "High (months)" but less than building the equivalent from scratch with the same compliance level.
Advantage 5: Compliance Differentiation for Enterprise Customers
For CODITECT's enterprise customer segment (SOC2, HIPAA-ready), having an autonomous agent layer with built-in audit hooks, HITL gates, and OAuth2 security (via Semantic Kernel) is a significant competitive differentiator. Competing enterprise agent platforms from vendors such as Salesforce Agentforce, Microsoft Copilot Studio, and ServiceNow use proprietary stacks. CODITECT's MIT-licensed, open-source, self-hostable stack is one of the few viable options for customers with strict data residency requirements.
This is not a performance advantage — it is a market access advantage for regulated verticals.
7. Gaps and Risks
Critical Gaps
❌ CrewAI host process execution — prompt injection reaches all system resources. CrewAI runs all agent code in the same process as the caller. An adversarial enterprise input (malicious email body, crafted calendar invite, injected tool response) can manipulate a CrewAI agent's behavior. There is no sandboxing preventing the manipulated agent from calling any available tool. Given that enterprise agent tools include Gmail send, Calendar create, and file write, a successful prompt injection is a direct path to business email compromise (BEC). This is not a theoretical risk — CrowdStrike has documented this attack class for MCP-enabled AI systems. Required mitigation before any production use: Docker container isolation for all CrewAI execution.
❌ No multi-tenant credential isolation in any framework. All three frameworks handle credentials as process-global context. In a multi-tenant deployment, a bug in tenant isolation logic leaks Tenant A's Gmail OAuth2 token to Tenant B's agent. This is a direct HIPAA/SOC2 breach. The isolation implementation is entirely CODITECT's responsibility — the frameworks provide no defense. Required mitigation: container-per-tenant isolation for CrewAI; PostgreSQL RLS for LangGraph checkpoints; per-invocation token injection for SK.
❌ No built-in output guardrails in CrewAI. CrewAI agents receive tool outputs and pass them to subsequent agents or return them as task results without any content inspection. A tool that returns PII (patient data from a health-adjacent enterprise system) will be included in CrewAI's task output and potentially logged without filtering. Required mitigation: output filtering via PolicyAwareTool base class on all enterprise tools.
❌ No SBOM generated for the enterprise agent dependency surface. The three-framework stack adds LangChain, LangGraph, CrewAI, Semantic Kernel Python, and their transitive dependencies (a conservative estimate is 150+ additional Python packages). None of these have been security-scanned. SOC2 Type II requires evidence of dependency scanning. Required: pip-audit + cyclonedx-bom integrated into CI/CD before any production deployment.
Major Gaps
⚠️ Three independent release cadences — maintenance surface tripled. CrewAI releases weekly. LangGraph releases frequently as part of LangChain. Semantic Kernel follows Microsoft's release schedule. Breaking changes in any framework require immediate response. At the current coditect-core maintenance velocity, managing three external framework dependencies is a significant ongoing burden that WILL cause delayed security patches.
⚠️ LangChain dependency graph is notoriously brittle. LangGraph depends on LangChain. LangChain's version history shows frequent breaking changes across minor versions. "Notoriously large and version-sensitive; upgrades frequently break integrations" (from the primary evaluation). Every LangGraph upgrade requires a full regression test of all LangGraph-based workflows. This is not an acceptable situation without a frozen requirements file and a dedicated upgrade testing process.
⚠️ Semantic Kernel Python SDK is a second-class citizen. SK's primary development language is C#. The Python SDK lags behind in features, documentation, and bug fixes. Features available in SK for C# take weeks to months to appear in the Python SDK. For coditect-core's Python-first architecture, this means perpetually using an SDK that is behind its primary development track.
⚠️ No unified audit query surface. CrewAI audit events, LangGraph state traces, and SK audit hooks write to different formats and different storage locations. A compliance officer asking "show me all enterprise agent actions taken on behalf of Tenant X in the last 30 days" requires querying three separate data sources and correlating by tenant_id. This is an operational gap that creates compliance reporting risk.
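Closing this gap requires a correlation layer. A sketch of the normalization step such a layer would need — the raw field names per framework are assumptions, and the output shape mirrors the enterprise_agent_audit columns in the schema section of this document:

```python
# Sketch: normalize heterogeneous audit events into one queryable row shape.
# Raw event field names ("tool", "node", "plugin", ...) are illustrative
# assumptions, not documented framework output formats.

def normalize_audit_event(framework: str, raw: dict) -> dict:
    if framework == "crewai":
        return {"framework": "crewai", "tenant_id": raw["tenant_id"],
                "tool_name": raw["tool"], "outcome": raw["result"]}
    if framework == "langgraph":
        # tenant_id is recovered from the composite thread_id prefix
        tenant_id = raw["thread_id"].split(":", 1)[0]
        return {"framework": "langgraph", "tenant_id": tenant_id,
                "tool_name": raw["node"], "outcome": raw["status"]}
    if framework == "semantic_kernel":
        return {"framework": "semantic_kernel", "tenant_id": raw["tenant_id"],
                "tool_name": f"{raw['plugin']}.{raw['function']}",
                "outcome": "success" if raw["success"] else "error"}
    raise ValueError(f"unknown framework: {framework}")

row = normalize_audit_event(
    "langgraph",
    {"thread_id": "t-1:loop-2:task-3", "node": "generate_draft",
     "status": "success"},
)
print(row["tenant_id"])  # t-1
```

With all three sources funneled through one normalizer into enterprise_agent_audit, the "all actions for Tenant X in 30 days" question becomes a single RLS-scoped query.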
⚠️ CrewAI acqui-hire risk. CrewAI Inc. is Series A funded and growing rapidly. Series A-stage AI companies with highly valued products are frequent acqui-hire targets. An acquisition could result in a license change (MIT to commercial), breaking API changes, or framework abandonment. The MIT license protects existing code but does not protect against feature freeze or deprecation of enterprise integrations.
Minor Gaps
ℹ️ LangGraph MCP support is via adapter, not native. The langchain-mcp-adapters package is a third-party adapter adding a dependency boundary. It works but adds a fragility point at the MCP tool boundary that CrewAI does not have.
ℹ️ CrewAI Flows checkpoint model is opaque. Ralph Wiggum cannot inspect CrewAI Flow state during execution — only before and after. Checkpoint granularity for long-running CrewAI workflows is limited to the step boundaries that the Flow explicitly defines, which may not align with Ralph Wiggum's checkpoint expectations.
ℹ️ Docker container cold start latency. The recommended CrewAI isolation model (one container per tenant per task) adds roughly 1-5 seconds of startup latency with pre-warmed containers and 15-30 seconds for true cold starts. For synchronous enterprise agent workflows (a user waiting on the result), this is noticeable. Asynchronous workflows (background automation) are unaffected.
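The latency tradeoff above is usually mitigated with a pre-warm pool. A generic sketch of the idea — this is not a Docker API; start_worker stands in for whatever launches and warms a container:

```python
import queue

class WarmPool:
    """Keep N pre-started workers idle so synchronous requests skip cold start."""

    def __init__(self, size: int, start_worker):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(start_worker())  # pay the startup cost up front

    def acquire(self, timeout: float = 0.1):
        """Return a warm worker, or None to signal a cold-start fallback."""
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            return None

pool = WarmPool(size=2, start_worker=lambda: object())
print(pool.acquire() is not None)  # True  (warm path)
print(pool.acquire() is not None)  # True  (warm path)
print(pool.acquire() is None)      # True  (pool drained -> cold start)
```

The design tension is cost: idle warm containers consume tenant-scoped resources, so the pool size becomes a per-tenant capacity-planning knob.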
Risks
Risk: CrewAI production security incident (Probability: High, Impact: Critical) Prompt injection via enterprise data (email body, calendar description, document content) manipulates CrewAI agents to perform unauthorized actions. The host-process execution model means there is no containment. Mitigation: Docker isolation required before production.
Risk: LangChain ecosystem version conflict (Probability: High, Impact: Medium) A LangGraph upgrade breaks LangChain MCP adapter compatibility, disabling all MCP-based tools in LangGraph workflows. Mitigation: Frozen requirements.txt with pinned transitive dependencies; dedicated upgrade testing track.
Risk: Semantic Kernel Python SDK feature lag (Probability: Medium, Impact: Medium) A required SK security feature (e.g., new OAuth2 grant type, content safety update) is available in C# SDK but delayed 6+ months in Python SDK. CODITECT cannot implement the feature without forking SK Python. Mitigation: Evaluate whether the C# SDK as a separate microservice is preferable to the Python SDK for SK's primary role.
Risk: Three-framework stack overwhelms engineering capacity (Probability: Medium, Impact: High) Each framework has distinct upgrade, testing, and security patch requirements. At current team size, maintaining three active framework integrations alongside the core platform may be infeasible. Mitigation: Designate a dedicated Enterprise Agent Platform owner role; consider phased adoption (CrewAI first, LangGraph second, SK third).
8. Integration Patterns
Pattern 1: CrewAI Enterprise Agent Invoked via coditect-core Command
This pattern shows the full invocation flow from a CODITECT command to a CrewAI crew execution, with Docker isolation, tenant context injection, audit logging, and result storage.
# commands/enterprise-agent/start.py
# Handles: /enterprise-agent start --workflow gmail-triage --tenant <tenant_id>
from pathlib import Path
from coditect.context import get_tenant_context, TenantContext
from coditect.session_log import SessionLogger
from coditect.policy import PolicyEngine
from coditect.integrations.enterprise_agent.crewai_rw_manager import CrewAIRalphWiggumManager
from coditect.integrations.enterprise_agent.audit_wrapper import CoditactAuditCallback
from coditect.integrations.enterprise_agent.docker_runner import TenantDockerRunner
def execute_enterprise_agent(
workflow_name: str,
tenant_id: str,
workflow_input: dict,
) -> dict:
"""
Entry point for /enterprise-agent start command.
Runs CrewAI workflow in isolated Docker container with full audit coverage.
"""
tenant_ctx: TenantContext = get_tenant_context(tenant_id)
logger = SessionLogger(tenant_id=tenant_id)
policy_engine = PolicyEngine(tenant_id=tenant_id)
# Pre-flight policy check
policy_result = policy_engine.evaluate(
action="enterprise_agent.start",
parameters={"workflow": workflow_name, "input": workflow_input}
)
if not policy_result.allowed:
logger.emit(f"[POLICY_BLOCK] enterprise_agent.start blocked: {policy_result.reason}")
return {"status": "blocked", "reason": policy_result.reason}
logger.emit(f"[ENTERPRISE_AGENT] Starting workflow: {workflow_name}")
# Launch isolated Docker container with tenant credentials
runner = TenantDockerRunner(tenant_id=tenant_id)
result = runner.run_crewai_workflow(
workflow_name=workflow_name,
workflow_input=workflow_input,
tenant_secrets_path=f"/run/secrets/{tenant_id}/enterprise-agent.env",
network=f"coditect-tenant-{tenant_id}",
image="coditect/enterprise-agent:latest",
)
# Store result with tenant scoping
logger.emit(f"[ENTERPRISE_AGENT] Completed: {workflow_name} status={result['status']}")
return result
# coditect/integrations/enterprise_agent/docker_runner.py
import subprocess
import json
import tempfile
import os
from typing import Any
class TenantDockerRunner:
"""
Runs CrewAI workflows in isolated Docker containers.
One container per task invocation.
Container has access ONLY to:
- That tenant's credentials (via Docker secrets)
- That tenant's network segment
- The MCP servers via allowlisted network routes
"""
def __init__(self, tenant_id: str):
self.tenant_id = tenant_id
def run_crewai_workflow(
self,
workflow_name: str,
workflow_input: dict,
tenant_secrets_path: str,
network: str,
image: str,
timeout_seconds: int = 300,
) -> dict:
with tempfile.NamedTemporaryFile(
mode='w', suffix='.json', delete=False
) as f:
json.dump(workflow_input, f)
input_file = f.name
try:
result = subprocess.run(
[
"docker", "run",
"--rm",
"--network", network,
"--env", f"TENANT_ID={self.tenant_id}",
"--env", f"WORKFLOW_NAME={workflow_name}",
"--env-file", tenant_secrets_path,
"--mount", f"type=bind,source={input_file},target=/input.json,readonly",
"--memory", "2g",
"--cpus", "1.0",
"--read-only",
"--tmpfs", "/tmp:noexec,nosuid,size=100m",
"--security-opt", "no-new-privileges",
image,
"python", "-m", "coditect.enterprise_agent.entrypoint",
],
capture_output=True,
text=True,
timeout=timeout_seconds,
)
if result.returncode != 0:
return {
"status": "error",
"error": result.stderr[-2000:], # Last 2000 chars
"workflow": workflow_name,
}
return json.loads(result.stdout)
finally:
os.unlink(input_file)
Pattern 2: LangGraph State Machine Mapped to Ralph Wiggum Checkpoints
This pattern shows a LangGraph graph for an internal development workflow (document generation with review loop) that integrates with Ralph Wiggum's checkpoint system for session-safe execution.
# coditect/integrations/langgraph/development_workflow.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage, AIMessage
from coditect.integrations.enterprise_agent.langgraph_rw_bridge import (
create_rw_aligned_checkpointer,
resume_graph_from_rw_checkpoint,
)
from coditect.session_log import SessionLogger
class DocumentWorkflowState(TypedDict):
"""
Typed state for document generation workflow.
All fields are tenant-context-aware — no cross-tenant data mixed here.
"""
tenant_id: str
task_id: str
document_topic: str
messages: Annotated[list, add_messages]
draft_content: str
review_feedback: str
revision_count: int
approved: bool
final_content: str
def build_document_workflow_graph(tenant_id: str, task_id: str):
"""
Builds a LangGraph state machine for document generation with review loop.
Maps to Ralph Wiggum's checkpoint model.
"""
checkpointer = create_rw_aligned_checkpointer(
tenant_id=tenant_id,
loop_id=task_id
)
logger = SessionLogger(tenant_id=tenant_id)
def generate_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Generate initial document draft."""
logger.emit(f"[LANGGRAPH] generate_draft node executing for task={state['task_id']}")
# Call coditect-core senior-architect agent via subprocess
draft = _invoke_coditect_agent(
"senior-architect",
f"Create a technical document about: {state['document_topic']}"
)
return {**state, "draft_content": draft, "revision_count": 0}
def review_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Review draft — this is an INTERRUPT node (HITL gate)."""
# This node is decorated as interrupt — LangGraph pauses here
# waiting for human input via the thread_id's update mechanism
logger.emit(f"[LANGGRAPH] review_draft interrupt node for task={state['task_id']}")
return state # State unchanged; human provides feedback via graph.update_state()
def revise_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Revise based on feedback."""
logger.emit(f"[LANGGRAPH] revise_draft node, revision={state['revision_count']}")
revised = _invoke_coditect_agent(
"codi-documentation-writer",
f"Revise this document based on feedback:\n"
f"Document: {state['draft_content']}\n"
f"Feedback: {state['review_feedback']}"
)
return {
**state,
"draft_content": revised,
"revision_count": state["revision_count"] + 1
}
def finalize(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Mark document as approved."""
return {**state, "final_content": state["draft_content"], "approved": True}
def should_revise(state: DocumentWorkflowState) -> str:
"""Conditional edge: route to revise or finalize based on approval."""
if state.get("approved"):
return "finalize"
if state["revision_count"] >= 3:
return "finalize" # Auto-approve after 3 revisions
return "revise"
builder = StateGraph(DocumentWorkflowState)
builder.add_node("generate_draft", generate_draft)
builder.add_node("review_draft", review_draft)
builder.add_node("revise_draft", revise_draft)
builder.add_node("finalize", finalize)
builder.set_entry_point("generate_draft")
builder.add_edge("generate_draft", "review_draft")
builder.add_conditional_edges(
"review_draft",
should_revise,
{"revise": "revise_draft", "finalize": "finalize"}
)
builder.add_edge("revise_draft", "review_draft")
builder.add_edge("finalize", END)
# interrupt_before=["review_draft"] makes review_draft a HITL gate
graph = builder.compile(
checkpointer=checkpointer,
interrupt_before=["review_draft"]
)
return graph
def _invoke_coditect_agent(agent_name: str, prompt: str) -> str:
"""Subprocess call to coditect-core agent — maintains the invocation direction."""
import subprocess
result = subprocess.run(
["python", "-m", "coditect.agent_runner", "--agent", agent_name, "--prompt", prompt],
capture_output=True, text=True, timeout=120
)
return result.stdout
Pattern 3: Semantic Kernel as MCP Security Service
This pattern deploys Semantic Kernel as a standalone Python microservice exposing an MCP interface. coditect-core agents call into it as an MCP tool, isolating the C# dependency surface entirely.
# coditect/services/sk_mcp_server/server.py
# Runs as: python -m coditect.services.sk_mcp_server --port 8080
# Deployed as a separate Docker container in GKE
import asyncio
import json
from pathlib import Path
from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins.microsoft_graph import MicrosoftGraphPlugin
from semantic_kernel.filters.auto_function_invocation.auto_function_invocation_context import (
AutoFunctionInvocationContext,
)
app = FastAPI(title="CODITECT Semantic Kernel MCP Server")
class SKActionRequest(BaseModel):
tenant_id: str
plugin_name: str
function_name: str
parameters: dict
oauth_token: str # Short-lived token injected by coditect-core, never stored
class CoditactAuditFilter:
"""
SK invocation filter that emits audit events for every function call.
This is the compliance core of the SK integration.
"""
async def on_auto_function_invocation(
self,
context: AutoFunctionInvocationContext,
next,
) -> None:
tenant_id = context.arguments.get("tenant_id", "unknown")
# Pre-invocation audit
audit_event = {
"event": "sk.function_invoke",
"tenant_id": tenant_id,
"plugin": context.function.plugin_name,
"function": context.function.name,
"parameters": {
k: "[REDACTED]" if "token" in k.lower() or "secret" in k.lower() else v
for k, v in context.arguments.items()
},
}
_emit_audit_event(audit_event)
await next(context)
# Post-invocation audit
_emit_audit_event({
"event": "sk.function_complete",
"tenant_id": tenant_id,
"plugin": context.function.plugin_name,
"function": context.function.name,
"success": not context.is_cancel_requested,
})
@app.post("/mcp/tools/invoke")
async def invoke_sk_function(
request: SKActionRequest,
x_coditect_tenant: str = Header(None),
):
"""
MCP-compatible tool invocation endpoint.
Called by coditect-core agents and CrewAI crews via MCP protocol.
"""
if x_coditect_tenant != request.tenant_id:
raise HTTPException(status_code=403, detail="Tenant context mismatch")
kernel = sk.Kernel()
# Add audit filter
kernel.add_filter("auto_function_invocation", CoditactAuditFilter())
# Configure Microsoft Graph plugin with the provided OAuth token
# Token is tenant-specific and short-lived — never persisted in this service
if request.plugin_name == "MicrosoftGraph":
plugin = MicrosoftGraphPlugin(access_token=request.oauth_token)
kernel.add_plugin(plugin, plugin_name="MicrosoftGraph")
result = await kernel.invoke(
plugin_name=request.plugin_name,
function_name=request.function_name,
**request.parameters
)
return {
"status": "success",
"result": str(result),
"tenant_id": request.tenant_id,
"plugin": request.plugin_name,
"function": request.function_name,
}
@app.get("/mcp/tools/list")
async def list_available_tools():
"""MCP tool discovery endpoint — called by CrewAI and coditect-core at startup."""
return {
"tools": [
{
"name": "MicrosoftGraph.send_email",
"description": "Send email via Microsoft Graph API (Outlook)",
"parameters": {
"to": "string",
"subject": "string",
"body": "string",
"tenant_id": "string",
},
"requires_oauth": True,
"audit_required": True,
"hitl_required": True,
},
{
"name": "MicrosoftGraph.create_calendar_event",
"description": "Create calendar event via Microsoft Graph API",
"parameters": {
"title": "string",
"start_time": "string",
"end_time": "string",
"attendees": "list[string]",
"tenant_id": "string",
},
"requires_oauth": True,
"audit_required": True,
"hitl_required": True,
},
]
}
def _emit_audit_event(event: dict) -> None:
    """Write to the CODITECT session log via stderr (captured by the coditect-core process manager)."""
    import sys
    print(json.dumps({"type": "audit", **event}), file=sys.stderr, flush=True)
Database Schema Extensions
-- Enterprise agent configuration per tenant
CREATE TABLE enterprise_agent_config (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
framework VARCHAR(50) NOT NULL CHECK (framework IN ('crewai', 'langgraph', 'semantic_kernel')),
config JSONB NOT NULL DEFAULT '{}',
enabled BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_eac_tenant ON enterprise_agent_config(tenant_id);
ALTER TABLE enterprise_agent_config ENABLE ROW LEVEL SECURITY;
CREATE POLICY eac_tenant_isolation ON enterprise_agent_config
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- Enterprise agent task audit log
CREATE TABLE enterprise_agent_audit (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
user_id UUID REFERENCES users(id),
framework VARCHAR(50) NOT NULL,
tool_name VARCHAR(255) NOT NULL,
action_type VARCHAR(100) NOT NULL,
parameters JSONB, -- PII-sanitized at write time
outcome VARCHAR(50) CHECK (outcome IN ('success', 'blocked_policy', 'blocked_hitl', 'error')),
evidence_id UUID, -- FK to esignature_evidence if HITL was required
ip_address INET,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_eaa_tenant_time ON enterprise_agent_audit(tenant_id, created_at DESC);
ALTER TABLE enterprise_agent_audit ENABLE ROW LEVEL SECURITY;
CREATE POLICY eaa_tenant_isolation ON enterprise_agent_audit
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- OAuth2 credentials store (refresh tokens only, AES-256 encrypted)
CREATE TABLE enterprise_agent_credentials (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
provider VARCHAR(100) NOT NULL, -- 'google_workspace', 'microsoft_365', etc.
encrypted_refresh_token BYTEA NOT NULL, -- AES-256-GCM encrypted
scope TEXT[] NOT NULL,
expires_hint TIMESTAMPTZ, -- Best-guess expiry, not authoritative
created_at TIMESTAMPTZ DEFAULT NOW(),
last_used_at TIMESTAMPTZ,
UNIQUE(tenant_id, provider)
);
ALTER TABLE enterprise_agent_credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY eac_creds_tenant_isolation ON enterprise_agent_credentials
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- LangGraph checkpoint table (created by PostgresSaver.setup() but needs RLS)
-- Run after PostgresSaver.setup():
ALTER TABLE langgraph_checkpoints ENABLE ROW LEVEL SECURITY;
CREATE POLICY lg_tenant_isolation ON langgraph_checkpoints
-- thread_id format: {tenant_id}:{loop_id}:{task_id}
USING (split_part(thread_id, ':', 1) = current_setting('app.current_tenant'));
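All three RLS policies above filter on current_setting('app.current_tenant'), which the application must set on each connection before any tenant-scoped query runs. A usage sketch (the UUID is a placeholder); SET LOCAL confines the setting to the current transaction, so a pooled connection cannot leak tenant context to the next request:

```sql
BEGIN;
SET LOCAL app.current_tenant = '6f1c2d3e-0000-0000-0000-000000000001';

-- RLS-protected reads/writes now see only this tenant's rows:
SELECT tool_name, outcome, created_at
FROM enterprise_agent_audit
ORDER BY created_at DESC
LIMIT 50;
COMMIT;
```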
9. Decision Framework
Weighted Scoring for CODITECT Integration
| Criterion | Weight | CrewAI Score | LangGraph Score | SK Score | Stack Score | Notes |
|---|---|---|---|---|---|---|
| Multi-Tenant Isolation | Critical | 2 | 4 | 3 | 3 | CrewAI requires full container isolation; LG needs thread_id scoping (achievable); SK needs token scoping |
| Compliance Surface | Critical | 3 | 5 | 9 | 6 | CrewAI no audit trail natively; LG structural trace is excellent; SK has built-in audit hooks — stack average is misleading; SK alone is compliance-capable |
| Observability | Important | 2 | 5 | 8 | 5 | CrewAI no OTEL; LG via callbacks (good); SK native OTEL (best in stack) |
| Agent Orchestration Fit | Important | 6 | 9 | 3 | 6 | CrewAI fits but has a hierarchy conflict; LG is the natural Ralph Wiggum analog; SK is the poorest fit |
| Performance at Scale | Important | 3 | 6 | 7 | 5 | CrewAI container startup latency is real; LG graph is efficient; SK is stateless MCP calls |
| License Compatibility | Critical | 10 | 10 | 10 | 10 | All MIT — no concerns |
Note on stack scoring: The three-framework stack is assessed as a unit, not as individual selections. A gap in one framework that is covered by another in the stack counts as covered at the stack level (e.g., SK's audit hooks compensate for CrewAI's lack of audit natively). However, gaps that require CODITECT engineering (rather than another framework in the stack) are scored against the stack as a whole.
| Criterion | Weight | Stack Score (0-10) | Weighted Score | Notes |
|---|---|---|---|---|
| Multi-Tenant Isolation | Critical (x2) | 3 | 6 | Requires full container isolation implementation |
| Compliance Surface | Critical (x2) | 6 | 12 | SK covers; CrewAI audit wrapper required |
| Observability | Important (x1) | 5 | 5 | LangFuse integration addresses most gaps |
| Agent Orchestration Fit | Important (x1) | 6 | 6 | LangGraph fit is excellent; CrewAI hierarchy manageable |
| Performance at Scale | Important (x1) | 5 | 5 | Container startup latency is the primary concern |
| License Compatibility | Critical (x2) | 10 | 20 | All MIT — maximum score |
| Total | | | 54/90 | |
Threshold: Go (>=55, all critical >=7), Conditional (35-54 OR any critical <7), No-Go (<35 OR any critical = 0)
Score: 54/90 — Conditional
Multi-Tenant Isolation scores 3 (below the critical threshold of 7) because the isolation implementation is entirely CODITECT's responsibility and has not been built. License Compatibility is 10. Compliance Surface is 6 (SK provides the mechanism; the CrewAI audit wrapper must still be built). The overall score of 54 falls at the top of the Conditional band.
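The weighted total can be recomputed directly from the table (weights: Critical = 2, Important = 1):

```python
# Criterion -> (weight, stack score); values transcribed from the scoring table.
criteria = {
    "multi_tenant_isolation":  (2, 3),
    "compliance_surface":      (2, 6),
    "observability":           (1, 5),
    "agent_orchestration_fit": (1, 6),
    "performance_at_scale":    (1, 5),
    "license_compatibility":   (2, 10),
}
total = sum(weight * score for weight, score in criteria.values())
print(total)  # 54
```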
Conditions for Conditional Go:
1. Container isolation implemented first. CrewAI MUST run in a Docker container with tenant-scoped network, secrets injection, and --read-only filesystem before any production tenant data touches the enterprise agent stack. This is a hard gate, not a suggestion.
2. LangGraph requirements freeze. requirements-enterprise-agent.txt with all LangChain + LangGraph + MCP adapter versions pinned (pip freeze output, not version ranges) must be committed before the first production deployment.
3. Enterprise agent security ADR approved. The sandboxing model, HITL gate design, credential storage architecture, and audit log schema must be documented as an ADR and approved by engineering leadership before any tenant credentials are stored.
4. Semantic Kernel deployed as an MCP microservice, not embedded. The Python SDK's second-class status relative to C# makes direct embedding high-risk. The MCP server deployment pattern isolates the SK dependency boundary.
5. CoditactAuditCallback implemented and tested. The audit wrapper for CrewAI tool calls must achieve >95% coverage of all tool invocations in integration tests before any production use.
6. Phased tenant rollout. The first production tenant on the enterprise agent stack must be an internal test tenant (CODITECT's own workflows), not an external customer. External customer enablement requires a separate gate after 30 days of internal operation without incidents.
10. Next Steps
If Conditional Go Conditions Met:
- Week 1-2: Security ADR
  - Draft ADR: Enterprise Agent Security Architecture
  - Cover: container isolation model, credential vault design, HITL gate requirements, audit schema
  - Approval required from: Engineering Lead, Compliance (SOC2 addendum review)
- Week 2-4: Infrastructure
  - Implement TenantDockerRunner with GKE NetworkPolicy for tenant isolation
  - Deploy PostgreSQL RLS policies for the LangGraph checkpoint table
  - Implement encrypted credential store for OAuth2 refresh tokens
  - Create enterprise_agent_audit table with RLS
- Week 3-5: CrewAI Integration
  - Implement CoditactAuditCallback with >95% tool call coverage
  - Implement PolicyAwareTool base class
  - Create enterprise-orchestration-engine agent definition
  - Add /enterprise-agent command suite
  - Prototype: Gmail triage crew using CrewAI + coditect MCP servers (day-one synergy)
- Week 5-7: LangGraph Integration
  - Implement langgraph_rw_bridge.py (PostgreSQL checkpointer with RLS)
  - Create stateful-workflow-engine agent definition
  - Add /workflow-graph command suite
  - Prototype: Document generation workflow with a LangGraph interrupt node as the HITL gate
- Week 7-10: Semantic Kernel MCP Server
  - Deploy sk_mcp_server as a GKE service
  - Configure Microsoft Graph plugin with OAuth2 flow
  - Implement CoditactAuditFilter
  - Add /ms365-auth command for tenant OAuth2 onboarding
- Week 10-12: Integration Testing
  - Run enterprise agent workflows against internal test tenant
  - SBOM generation and security scan of all new dependencies
  - Penetration test: prompt injection via malicious email body
  - Load test: 10 concurrent tenant enterprise agent tasks
  - SOC2 addendum review of audit log schema and retention
- Week 12+: Staged External Rollout
  - Invite 1-2 pilot customers (non-HIPAA initially)
  - Monitor for 30 days
  - Gate HIPAA customer enablement on clean 30-day record
If No-Go (conditions not met in reasonable timeline):
- Document as ADR: "Enterprise Agent Stack — Deferred Pending Security Infrastructure"
- Archive all research artifacts in internal/analysis/autonomous-enterprise-agent/
- Identify minimum viable alternative: CrewAI only, no LangGraph, no SK, with custom security wrapper
- Revisit in 90 days after security infrastructure is built
References
- Framework Evaluation: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-framework-evaluation-2026-02-19.md
- Pairwise Comparison: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-pairwise-comparison-2026-02-19.md
- Search Strategy: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-search-strategy-2026-02-19.md
- CODITECT Architecture: internal/architecture/SDD-CODITECT-MULTI-TENANT-SAAS.md
- Ralph Wiggum: docs/guides/RALPH-WIGGUM-GUIDE.md | internal/architecture/RALPH-WIGGUM-ARCHITECTURE.md
- Database Schema: docs/reference/database/DATABASE-SCHEMA.md
- Multi-Tenancy Standard: ADR reference required — see internal/architecture/adrs/
- Sidecar AGPL Analysis: internal/architecture/adrs/ADR-165.md through ADR-169.md
- OWASP Agent Security Top 10 2026: https://medium.com/@oracle_43885/owasps-ai-agent-security-top-10-agent-security-risks-2026-fc5c435e86eb
- CrowdStrike MCP Security: https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/