CODITECT Integration Impact: Autonomous Enterprise Agent Stack
(CrewAI + LangGraph + Semantic Kernel)
Research Date: 2026-02-19
Analyst: Claude (Sonnet 4.6) — Research Impact Analyzer
Status: Draft for Engineering Review

Predecessor Documents:
- autonomous-enterprise-agent-framework-evaluation-2026-02-19.md (10-framework graded evaluation)
- autonomous-enterprise-agent-pairwise-comparison-2026-02-19.md (Condorcet tournament, complementarity assessment)
- autonomous-enterprise-agent-search-strategy-2026-02-19.md (license matrix, OWASP mapping, search strategy)
Executive Summary
The autonomous enterprise agent evaluation concluded with a 3-framework complementary stack recommendation: CrewAI (MIT) as primary orchestration for customer-facing workflows, LangGraph (MIT) for internal development workflows and stateful processing, and Semantic Kernel (MIT) as the security and Microsoft 365 integration layer. The pairwise tournament revealed a Condorcet cycle — no single framework dominates — which is the strongest possible signal that a layered stack is architecturally correct rather than a compromise.
CODITECT's current platform (776 agents, 445 skills, 377 commands, 118 hooks, Ralph Wiggum loops, 6-database context storage, MCP servers) provides natural integration anchor points for all three frameworks. However, the integration is not low-friction: CrewAI's host-process execution model directly contradicts CODITECT's compliance requirements, LangGraph's LangChain dependency surface creates significant version management risk, and Semantic Kernel's C#-primary architecture introduces a language boundary that will require sustained adapter maintenance.
Recommendation: Conditional Go
Conditions for Go:
- CrewAI MUST run in an isolated Docker container, never in the coditect-core main process
- Semantic Kernel MUST be deployed as a standalone Python microservice with MCP interface — no C# in the coditect-core process
- LangGraph MUST be pinned to a requirements file with transitive dependencies frozen before any production deployment
- An enterprise agent security ADR MUST be approved before any external API credentials are handled
Key Findings:
- CrewAI native MCP support delivers immediate zero-adapter connection to existing coditect-core MCP servers (semantic-search, call-graph, impact-analysis)
- LangGraph's interrupt-node model is the closest structural analog to Ralph Wiggum checkpoints of any evaluated framework
- Semantic Kernel is the only framework with built-in OAuth2/OIDC, audit hooks, and content safety — not optional for a compliance-first platform
- ❌ CrewAI runs all agent code in the host process with no sandboxing — prompt injection reaches every system resource
- ❌ No framework provides tenant-scoped credential isolation natively — all three must be wrapped
- ❌ LangGraph's dependency graph (LangChain ecosystem) is notoriously brittle — version pinning is mandatory, not advisory
- ⚠️ Three-framework stack triples the maintenance surface area — each framework has independent release cadences
- ⚠️ Semantic Kernel's coditect-core fit score is 2/5 — the integration effort is "High (months)" and cannot be compressed
1. Integration Architecture
Control Plane vs Data Plane
Technology Role: All three frameworks operate in both planes depending on invocation mode.
Control Plane Integration:
The coditect-core orchestration layer — hooks, Ralph Wiggum loops, the agent dispatcher, and command execution — is the control plane. All three frameworks are invoked FROM this layer, never the reverse. This is the critical architectural constraint: CrewAI, LangGraph, and Semantic Kernel must be sub-orchestrators called by coditect-core, not top-level orchestrators that call into coditect-core.
This is not how any of the three frameworks are designed. CrewAI explicitly wants to be the top-level orchestrator. LangGraph assumes it owns the state machine root. Semantic Kernel assumes it owns the plugin execution context. All three must be architecturally inverted to fit CODITECT's existing orchestration hierarchy.
- CrewAI inverted: coditect-core commands dispatch enterprise-orchestration-engine, which launches a CrewAI Crew as a subprocess, monitors via callback hooks, and receives results
- LangGraph inverted: the Ralph Wiggum loop state machine calls LangGraph graph execution as a child process; LangGraph graph nodes map to Ralph Wiggum checkpoint transitions
- Semantic Kernel inverted: exposed as an MCP server endpoint; coditect-core agents call SK functions as MCP tools, never embedding SK directly
Data Plane Integration:
The data plane is where tenant data flows: PostgreSQL (GKE), Redis, org.db/sessions.db context storage, session logs.
- CrewAI data plane: CrewAI agents read enterprise data (Gmail, Calendar, Drive) and write results back via coditect-core session logging. Intermediate CrewAI agent memory MUST NOT persist to any shared storage without tenant scoping.
- LangGraph data plane: LangGraph state (graph checkpoint objects) maps to Ralph Wiggum state files at ~/.coditect-data/ralph-loops/. Tenant context must be injected into every graph node's state dict.
- Semantic Kernel data plane: SK memory stores (vector databases for semantic memory) MUST be tenant-isolated. If SK's default volatile memory is used, isolation is process-level only — not sufficient for multi-tenant production.
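The LangGraph requirement — tenant context injected into every graph node's state dict — can be sketched with a typed state schema. This is a minimal illustration, not coditect-core code: the names `EnterpriseState` and `summarize_node` are hypothetical, and the assumption is that coditect-core supplies `tenant_id` at dispatch time.

```python
from typing import TypedDict

class EnterpriseState(TypedDict):
    """Hypothetical LangGraph state schema; tenant_id is mandatory."""
    tenant_id: str   # injected by coditect-core at dispatch time (assumed)
    task_input: str
    result: str

def summarize_node(state: EnterpriseState) -> dict:
    """Example graph node: refuses to run without tenant context."""
    if not state.get("tenant_id"):
        raise ValueError("graph node invoked without tenant context")
    # ... tenant-scoped work would happen here ...
    return {"result": f"summary for tenant {state['tenant_id']}"}
```

Making the check explicit in every node keeps a mis-wired graph from silently processing data without tenant scoping.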
CODITECT Component Touchpoints
| CODITECT Component | CrewAI Integration | LangGraph Integration | Semantic Kernel Integration |
|---|---|---|---|
| agents/ (776 agents) | New agent type: enterprise-orchestration-engine dispatches CrewAI Crews | New agent type: stateful-workflow-engine manages LangGraph graphs | New agent type: ms365-integration-service wraps SK as MCP server |
| commands/ (377 commands) | New: /enterprise-agent start, /enterprise-agent status, /enterprise-agent stop | New: /workflow-graph create, /workflow-graph resume (maps to Ralph Wiggum commands) | New: /ms365-auth, /ms365-action |
| hooks/ (118 hooks) | PreToolUse hook intercepts all CrewAI tool calls as HITL gates. PostToolUse records to audit log | PreToolUse mapped to LangGraph interrupt nodes — node execution pauses awaiting hook approval | PreToolUse validates SK function call scope against tenant OAuth2 scopes |
| skills/ (445 skills) | New skill directory: skills/enterprise-agent/ — Gmail, Calendar, Drive, Office automation skills as SKILL.md definitions | New skill: skills/stateful-workflow/SKILL.md — LangGraph graph construction patterns | New skill: skills/ms365-integration/SKILL.md |
| MCP Servers | CrewAI native MCP client connects to semantic-search and call-graph servers with zero adapter code | LangGraph via langchain-mcp-adapters — adds fragility at tool boundary | SK exposes itself AS an MCP server: sk-security-service:8080 |
| Ralph Wiggum loops | CrewAI Flows run as Ralph Wiggum managed sub-processes. RW checkpoints CrewAI Flow state | LangGraph IS Ralph Wiggum at the sub-loop level — interrupt-resume maps directly to RW checkpoint-handoff | SK is stateless from RW perspective — auth tokens managed separately |
| Session Logging | All CrewAI task completions emit to session log via PostToolUse hook | All LangGraph node transitions emit structured log entries to session log | All SK function calls with OAuth2 tokens emit audit events to session log |
| org.db | Enterprise agent decisions (e.g., "approved sending email to vendor X") stored in decisions table | Workflow graph design decisions stored in decisions table | MS365 auth configuration decisions stored in decisions table |
| sessions.db | CrewAI task run history stored in messages table | LangGraph execution traces stored in messages table | SK function call history stored in messages table |
| messaging.db | Agent-to-agent messages between coditect agents and enterprise-orchestration-engine | Workflow coordination messages during graph execution | Inter-agent messages for MS365 action coordination |
Architecture Diagram
2. Multi-Tenancy Implications
Technology Native Multi-Tenancy: No — none of the three frameworks are multi-tenant-aware.
This is the most significant architectural gap in the entire stack. All three frameworks were designed for single-tenant or developer-personal use cases. Embedding them in a multi-tenant SaaS requires CODITECT to implement the full tenant isolation layer externally.
Tenant Isolation Requirements by Framework
CrewAI Tenant Isolation:
CrewAI has no concept of tenant_id. Its agent memory, crew state, and task results are process-global. The isolation strategy is process-level, not data-level:
- Each tenant's CrewAI execution MUST run in a separate Docker container instance
- Container is provisioned on demand, receives only that tenant's credentials and context, destroyed after task completion
- No shared memory, no shared filesystem, no shared Redis between tenant containers
- CrewAI container receives tenant context via environment variables injected by coditect-core at dispatch time
The container-per-tenant model is the only safe approach given CrewAI's lack of internal isolation. It is also operationally expensive: each enterprise agent invocation requires container provisioning latency (typically 1-5 seconds for a pre-warmed Docker container, 15-30 seconds cold start).
LangGraph Tenant Isolation:
LangGraph's graph state object is in-process memory by default. When using LangGraph's built-in persistence (SQLite checkpointer), ALL tenant graphs share the same checkpoint store unless explicitly partitioned.
- MUST use thread_id scoped to tenant: config = {"configurable": {"thread_id": f"{tenant_id}:{task_id}"}}
- MUST use a PostgreSQL checkpointer (not SQLite) with row-level security policies for production multi-tenant use
- LangGraph's default MemorySaver is explicitly not safe for multi-tenant — any cross-tenant memory access is silent data leakage
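Because the colon-delimited thread_id is the only isolation boundary at the checkpoint level, a tenant_id or task_id containing the delimiter could collide with another tenant's thread keys. A defensive derivation helper (the names `tenant_thread_id` and `thread_config` are hypothetical) might look like:

```python
def tenant_thread_id(tenant_id: str, task_id: str) -> str:
    """Derive a tenant-scoped LangGraph thread_id, rejecting delimiter
    injection so one tenant cannot forge another tenant's thread key."""
    for part in (tenant_id, task_id):
        if not part or ":" in part:
            raise ValueError(f"invalid thread_id component: {part!r}")
    return f"{tenant_id}:{task_id}"

def thread_config(tenant_id: str, task_id: str) -> dict:
    """Build the config dict that LangGraph invocations receive."""
    return {"configurable": {"thread_id": tenant_thread_id(tenant_id, task_id)}}
```

Centralizing the derivation in one function also gives a single audit point for all thread-key construction.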
Semantic Kernel Tenant Isolation:
SK's plugin system uses per-plugin OAuth2 tokens. In a multi-tenant context:
- Each tenant's SK plugin invocations MUST use tokens scoped to that tenant's OAuth2 client credentials
- SK's semantic memory (if used) MUST be backed by a tenant-partitioned vector store, not the default volatile memory
- Token storage MUST NOT be shared across tenant sessions — tokens stored in sessions.db MUST be scoped to tenant_id
Credential Scoping
The enterprise agent stack will handle the most sensitive tenant credentials in the CODITECT platform: OAuth2 refresh tokens for Gmail, Calendar, Outlook, and potentially SharePoint. This is a fundamentally different risk profile from the current coditect-core credential model.
Current CODITECT credential scope: API keys for LLM providers, deployment credentials for GKE, database connection strings. These are platform-level credentials, not tenant-level.
Enterprise agent credential scope: Per-tenant OAuth2 refresh tokens for personal and organizational email/calendar/documents. Exfiltration of these credentials is a direct HIPAA/SOC2 breach.
❌ None of the three frameworks provide credential vaulting. CODITECT must implement:
- Encrypted credential storage in org.db with tenant_id foreign key and AES-256 encryption at rest
- Short-lived access token derivation (never persisting access tokens, only refresh tokens)
- Credential injection at execution time via environment variable into isolated container (CrewAI) or context parameter (LangGraph)
- Automatic credential rotation hooks when tokens expire
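The storage and derivation rules above can be sketched as a small vault abstraction. This is illustrative only: the class name is hypothetical, and the AES-256 encryption at rest is delegated to injected callables (stubbed with identity functions here), since the real cipher and org.db wiring live in coditect-core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

def _identity(data: bytes) -> bytes:
    return data

@dataclass
class CredentialVault:
    """Illustrative vault enforcing two rules from the list above:
    rows are keyed by tenant_id, and only refresh tokens are persisted."""
    encrypt: Callable[[bytes], bytes] = _identity   # stand-in for AES-256 at rest
    decrypt: Callable[[bytes], bytes] = _identity
    _rows: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def put_refresh_token(self, tenant_id: str, provider: str, token: str) -> None:
        self._rows[(tenant_id, provider)] = self.encrypt(token.encode())

    def mint_access_token(self, tenant_id: str, provider: str,
                          exchange: Callable[[str], str]) -> str:
        """Derive a short-lived access token on demand; it is handed to the
        caller (e.g. injected into a container) but never written back."""
        refresh = self.decrypt(self._rows[(tenant_id, provider)]).decode()
        return exchange(refresh)
```

The key property is structural: there is no method that persists an access token, so the "refresh tokens only" rule cannot be violated by a forgetful caller.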
Resource Quotas
CrewAI crews can spawn multiple parallel agents, each consuming LLM API calls. Without quota enforcement, a single tenant's enterprise agent workflow can saturate the platform's LLM budget.
⚠️ No framework provides per-tenant resource quota enforcement. CODITECT must implement:
- Token budget per tenant per hour (enforced at PostToolUse hook level)
- Maximum concurrent CrewAI agents per tenant
- LangGraph node execution time limits per tenant
- Rate limiting at the MCP server gateway for SK function calls
3. Compliance Surface
Auditability
What changes when enterprise agents are active:
Today, all coditect-core actions are initiated by a human (developer, admin, or authorized user) through documented commands. The audit trail is: human command -> hook -> action -> session log.
With enterprise agents, the audit trail expands: human command -> coditect-core agent -> CrewAI crew -> multiple tool calls -> external API mutations (email sent, calendar event created, document modified) -> session log. The external API mutations are the critical new surface — they occur OUTSIDE the coditect-core process boundary in external systems.
Compliance Requirements for External API Actions:
| Action Type | CODITECT Requirement | Gap Status |
|---|---|---|
| Email sent on behalf of tenant | Logged with recipient, subject, timestamp, tenant_id, user_id | ❌ CrewAI does not emit structured audit events for tool results |
| Calendar event created | Logged with attendees, time, tenant_id | ❌ CrewAI does not emit structured audit events for tool results |
| Document created/modified | Logged with document ID, change summary, tenant_id | ❌ Same gap |
| SK OAuth2 token used | Logged with scope, resource, timestamp | ✅ SK has built-in audit hooks — this is the one area where compliance works natively |
| LangGraph node transition | Logged as state transition with full state diff | ✅ LangGraph's stateful graph produces an automatic audit trace |
| HITL approval requested | Logged with context, approver, decision, timestamp | ✅ Exists via PreToolUse hooks, must be extended for enterprise actions |
Mitigation for CrewAI audit gap:
# coditect/integrations/enterprise_agent/audit_wrapper.py
from datetime import datetime, timezone
import json
from crewai import Agent, Task, Crew
from crewai.callbacks import BaseCallback
from coditect.session_log import SessionLogger
from coditect.context import get_tenant_context
class CoditactAuditCallback(BaseCallback):
"""
Wraps every CrewAI tool call with CODITECT-compliant audit emission.
Injected at Crew construction time — NOT optional for production use.
"""
def __init__(self, tenant_id: str, session_logger: SessionLogger):
self.tenant_id = tenant_id
self.session_logger = session_logger
def on_tool_start(
self,
tool_name: str,
tool_input: dict,
agent_name: str,
**kwargs
) -> None:
self.session_logger.emit_structured({
"event": "enterprise_agent.tool_start",
"tenant_id": self.tenant_id,
"tool": tool_name,
"agent": agent_name,
"input_summary": self._sanitize_input(tool_input),
"timestamp": datetime.now(timezone.utc).isoformat(),
})
def on_tool_end(
self,
tool_name: str,
tool_output: str,
agent_name: str,
**kwargs
) -> None:
self.session_logger.emit_structured({
"event": "enterprise_agent.tool_end",
"tenant_id": self.tenant_id,
"tool": tool_name,
"agent": agent_name,
"output_summary": self._sanitize_output(tool_output),
"timestamp": datetime.now(timezone.utc).isoformat(),
})
def _sanitize_input(self, tool_input: dict) -> dict:
"""Strip PII fields from audit log — HIPAA-required."""
SENSITIVE_KEYS = {"body", "content", "message", "text", "password"}
return {
k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
for k, v in tool_input.items()
}
def _sanitize_output(self, output: str) -> str:
"""Truncate output to prevent PII in audit log."""
return output[:500] + "...[truncated]" if len(output) > 500 else output
OWASP Agent Security Top 10 — CODITECT Mitigation Mapping
| OWASP Risk | Severity | coditect-core Mitigation | Gap |
|---|---|---|---|
| Excessive Agency | Critical | PreToolUse hooks enforce tool allowlist per tenant policy | ⚠️ Allowlist must be manually maintained per tenant configuration |
| Prompt Injection | Critical | SK content safety filters (if SK in stack). No native protection in CrewAI | ❌ CrewAI has no prompt injection defense — external input to agents is unrestricted |
| Insecure Tool Use | High | PreToolUse hook validates tool parameters against schema before execution | ⚠️ Schema validation must be authored per tool — not automatic |
| Data Exfiltration | Critical | CrewAI container outbound network allowlist (Docker network policy) | ❌ Allowlist not implemented by default — must be GKE NetworkPolicy |
| Privilege Escalation | High | OAuth2 scoped tokens via SK; minimum-scope enforcement | ⚠️ Scope minimization must be configured per workflow — not enforced by default |
| Insufficient Logging | Critical | CoditactAuditCallback on all CrewAI tool calls; LangGraph state trace; SK audit hooks | ⚠️ Three separate audit systems — no unified audit query surface |
| Uncontrolled Autonomy | Critical | Ralph Wiggum HITL gate at loop start; PreToolUse hooks for high-risk tools | ⚠️ HITL coverage gaps exist for mid-workflow decisions not anticipated at design time |
| Supply Chain | Medium | All three frameworks are MIT-licensed; SBOM generation required | ❌ No SBOM currently generated for enterprise agent dependencies |
| Model Manipulation | High | SK content safety filters; coditect-core output filtering at PostToolUse | ❌ No output guardrails in CrewAI — agent outputs are not validated before action execution |
| Denial of Service | Medium | Per-tenant token budget enforcement at PostToolUse | ❌ Not implemented — must be built before multi-tenant launch |
Policy Injection Points
CODITECT's compliance model requires that tenant-level policies intercept and block actions that violate that tenant's compliance posture (e.g., a HIPAA tenant must not allow PII export without explicit consent logging).
Available injection points:
- PreToolUse hook (best injection point): Fires before any tool call. Can block the call, modify parameters, or require human approval. Maps to all three frameworks if CrewAI tools are routed through coditect-core's hook system.
- LangGraph interrupt nodes (structural): A graph node can be flagged as an interrupt point. LangGraph pauses execution and awaits external input before continuing. This is the most auditable HITL mechanism in the stack.
- SK process filters (deepest): SK's filter pipeline can intercept every function call before and after execution. This is the only framework-native policy injection mechanism that can block based on content.
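A minimal decision model for the PreToolUse injection point can be sketched as follows. The dataclass and policy-dict shapes are illustrative, not the actual coditect-core hook API.

```python
from dataclasses import dataclass

@dataclass
class HookDecision:
    action: str        # "allow" | "block" | "require_approval"
    reason: str = ""

def pre_tool_use(tenant_policy: dict, tool_name: str) -> HookDecision:
    """Evaluate one tool call against a tenant's policy before execution:
    block tools outside the allowlist, route high-risk tools to HITL."""
    if tool_name not in tenant_policy.get("allowlist", set()):
        return HookDecision("block", f"{tool_name} not in tenant allowlist")
    if tool_name in tenant_policy.get("high_risk", set()):
        return HookDecision("require_approval", f"{tool_name} is high-risk")
    return HookDecision("allow")
```

The three outcomes map directly onto the mechanisms above: "block" and "require_approval" are enforceable natively in LangGraph and SK, but for CrewAI they depend on the tool implementation honoring the decision.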
Policy enforcement gap for CrewAI: CrewAI's callback system fires events but CANNOT block tool execution. If an enterprise agent is mid-workflow and a tool call would violate a tenant policy, the callback can log the violation but cannot prevent the action. Policy blocking in CrewAI requires the tool implementation itself to call back into coditect-core's policy engine before executing.
# coditect/integrations/enterprise_agent/policy_aware_tool.py
from crewai.tools import BaseTool
from coditect.policy import PolicyEngine
from coditect.exceptions import PolicyViolationError
class PolicyAwareTool(BaseTool):
"""
Base class for all enterprise agent tools.
Every tool in the enterprise agent stack MUST inherit from this,
not from crewai.tools.BaseTool directly.
"""
def __init__(self, tenant_id: str, policy_engine: PolicyEngine, **kwargs):
super().__init__(**kwargs)
self.tenant_id = tenant_id
self.policy_engine = policy_engine
def _run(self, **kwargs) -> str:
# Policy check BEFORE execution
policy_result = self.policy_engine.evaluate(
tenant_id=self.tenant_id,
action=self.name,
parameters=kwargs
)
if not policy_result.allowed:
raise PolicyViolationError(
f"Policy blocked {self.name}: {policy_result.reason}\n"
f"Tenant: {self.tenant_id}"
)
# HITL gate for high-risk actions
if policy_result.requires_human_approval:
approval = self._request_human_approval(kwargs, policy_result)
if not approval.approved:
raise PolicyViolationError(
f"Human declined {self.name}: {approval.reason}"
)
# Execute the actual tool action
result = self._execute(**kwargs)
# Policy check AFTER execution (for output filtering)
filtered_result = self.policy_engine.filter_output(
tenant_id=self.tenant_id,
action=self.name,
output=result
)
return filtered_result
def _execute(self, **kwargs) -> str:
raise NotImplementedError("Subclasses must implement _execute()")
def _request_human_approval(self, parameters: dict, policy_result) -> object:
# Dispatch to CODITECT HITL approval workflow
# Returns approval object with .approved and .reason
from coditect.hitl import HITLApprovalWorkflow
return HITLApprovalWorkflow.request(
tenant_id=self.tenant_id,
action=self.name,
parameters=parameters,
policy_context=policy_result.context,
timeout_seconds=300 # 5 minute timeout
)
E-Signatures and Evidence
Enterprise agents will execute actions that constitute legal acts: sending communications on behalf of a business, creating contracts, scheduling commitments. These require evidence trails that satisfy the same bar as CODITECT's existing e-signature requirement for platform actions.
❌ None of the three frameworks provide e-signature or evidence trail mechanisms. CODITECT must route high-risk enterprise agent actions through the existing e-signature workflow:
- Enterprise agent identifies action as high-risk (email to external party, document signing, financial commitment)
- Action is serialized and submitted to CODITECT's e-signature workflow
- Authorized user reviews and signs the pending action
- Signed evidence stored in esignature_evidence table with enterprise_agent_action_id
- Enterprise agent receives approval callback and executes the action
- Execution result stored alongside evidence record
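The submit-and-sign portion of this flow can be sketched against an in-memory evidence table. Column names mirror the steps above but are assumptions; the real esignature_evidence schema lives in coditect-core.

```python
import json
import uuid
from datetime import datetime, timezone

def submit_for_signature(evidence_table: list, tenant_id: str, action: dict) -> str:
    """Serialize a high-risk agent action into a pending evidence row
    (steps 1-2 of the flow above)."""
    action_id = str(uuid.uuid4())
    evidence_table.append({
        "enterprise_agent_action_id": action_id,
        "tenant_id": tenant_id,
        "payload": json.dumps(action, sort_keys=True),  # immutable serialized form
        "status": "pending_signature",
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    })
    return action_id

def record_signature(evidence_table: list, action_id: str, signer: str) -> dict:
    """Mark the row signed (steps 3-4); the agent's approval callback
    (step 5) would fire only after this returns."""
    row = next(r for r in evidence_table
               if r["enterprise_agent_action_id"] == action_id)
    row["status"] = "signed"
    row["signer"] = signer
    return row
```

Serializing the payload before signature is what makes the evidence meaningful: the signed bytes are exactly what the agent later executes.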
4. Observability Story
Current coditect-core Observability
coditect-core has a session-log-centric observability model: all significant actions are written to daily session log files at ~/.coditect-data/session-logs/. The sessions.db captures message history and analytics. There is no real-time metrics pipeline, no distributed tracing, and no centralized log aggregation in the current framework documentation.
What the Enterprise Agent Stack Adds
CrewAI observability:
- CrewAI emits verbose console logging by default — not structured, not queryable
- CrewAI does NOT expose Prometheus metrics or OpenTelemetry spans natively
- CrewAI's verbose output must be captured and parsed to extract structured observability data
LangGraph observability:
- LangGraph state transitions are structurally observable: every node execution produces a state diff
- LangGraph supports LangSmith tracing natively (LangChain Inc.'s commercial product — not appropriate for a privacy-first platform)
- LangGraph supports custom callbacks that can emit to any observability backend
- The graph structure itself serves as a visual execution trace
Semantic Kernel observability:
- SK has built-in telemetry via OpenTelemetry (the strongest observability story in the stack)
- SK can export spans to any OTLP-compatible backend
- SK function call durations, success/failure rates, and token counts are available as metrics
Gaps
⚠️ No unified observability surface. Three frameworks produce three different observability data streams with different formats, different telemetry protocols, and different granularities. A CODITECT operator cannot currently query "how many enterprise agent tool calls did tenant X make in the last hour" without building a custom aggregation layer.
⚠️ No per-tenant metrics attribution. All three frameworks operate at the process level. LLM token consumption, tool call counts, and execution durations are not automatically attributed to a tenant_id without custom instrumentation.
❌ CrewAI has no OpenTelemetry support. To get structured traces from CrewAI, CODITECT must implement a LangFuse-style tracing wrapper using CrewAI's callback system. LangFuse is already in the coditect ecosystem (noted in memory/agent-labs-analysis.md) — this is the recommended approach.
Recommended Observability Integration
# coditect/integrations/enterprise_agent/observability.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
import time
class EnterpriseAgentTracer:
"""
Unified OpenTelemetry tracing for all three frameworks.
CrewAI: populated via CoditactAuditCallback
LangGraph: populated via StateTransitionCallback
Semantic Kernel: native OTLP export configured in SK setup
"""
def __init__(self, tenant_id: str, otlp_endpoint: str = "http://localhost:4317"):
provider = TracerProvider()
provider.add_span_processor(
BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint))
)
trace.set_tracer_provider(provider)
self.tracer = trace.get_tracer("coditect.enterprise-agent")
self.tenant_id = tenant_id
    def trace_crewai_tool(self, tool_name: str, agent_name: str):
        """Context manager wrapping a CrewAI tool call in an OTel span."""
        # start_as_current_span parents nested spans correctly and ends
        # the span when the with-block exits.
        return self.tracer.start_as_current_span(
            f"crewai.tool.{tool_name}",
            attributes={
                "tenant.id": self.tenant_id,
                "crewai.agent": agent_name,
                "crewai.tool": tool_name,
            },
        )

    def trace_langgraph_node(self, node_name: str, graph_id: str):
        """Context manager wrapping a LangGraph node execution."""
        return self.tracer.start_as_current_span(
            f"langgraph.node.{node_name}",
            attributes={
                "tenant.id": self.tenant_id,
                "langgraph.graph_id": graph_id,
                "langgraph.node": node_name,
            },
        )
Grafana Dashboard Gaps
The following per-tenant dashboards do not exist and must be built:
- Enterprise agent task completion rate by tenant
- LLM token consumption by tenant by framework
- HITL approval latency (time from gate trigger to approval/denial)
- Policy violation count by tenant by tool
- SK OAuth2 token refresh rate by tenant
5. Multi-Agent Orchestration Fit
coditect-core's Existing 776-Agent System
coditect-core's 776 agents are prompt-based specialist agents defined as Markdown files in agents/. Each agent has a specific expertise area (e.g., senior-architect, security-specialist, database-architect). Agents are invoked by the coditect-core command system or by Ralph Wiggum loops. They do not natively interoperate with each other — coordination is done by the calling layer.
The enterprise agent stack introduces a second agent system (CrewAI's multi-agent crew) that operates at a different level of abstraction and with different execution semantics.
Conflict Analysis: coditect-core Agents vs. Framework Agents
| Dimension | coditect-core Agents | CrewAI Agents | LangGraph Nodes | SK Functions |
|---|---|---|---|---|
| Definition format | Markdown (.md files) | Python class instances | Python functions | Python plugins |
| Execution model | Invoked by command/hook system | Run within a Crew | Run as graph nodes | Called as SK functions |
| Memory | Session log + org.db | Per-agent memory (in-process) | Graph state (persistent) | Volatile (default) |
| Communication | messaging.db + hooks | Crew task delegation | State passing via edges | Function composition |
| Security | Hook-enforced approval gates | Callback-based (cannot block) | Interrupt nodes (can block) | Process filters (can block) |
| Multi-tenancy | Not natively tenant-aware | Not natively tenant-aware | Requires thread_id scoping | Requires token scoping |
❌ No agent protocol interoperability. A coditect-core senior-architect agent cannot be called by a CrewAI crew, and a CrewAI agent cannot be invoked through coditect-core's hook system. The two agent systems are fully disjoint. The integration boundary is one-directional: coditect-core dispatches framework agents; framework agents cannot invoke coditect-core agents.
⚠️ Orchestrator hierarchy conflict. When a coditect-core command invokes an enterprise agent task, that task becomes a CrewAI crew or LangGraph graph that spawns multiple sub-agents. Those sub-agents make tool calls that should go through coditect-core hooks (for audit and policy), but CrewAI agents call tools directly without knowledge of the hook system. The tool implementations must be designed to call back into coditect-core's hook system — which is an inversion of the normal dependency direction.
Ralph Wiggum Loop Integration
Ralph Wiggum (ADR-108/110/111) provides autonomous loop execution with checkpoint-based session handoff. It is the closest existing coditect-core mechanism to a stateful workflow engine.
LangGraph + Ralph Wiggum integration (natural fit):
LangGraph's graph-node model IS a state machine with explicit checkpoints. The structural mapping is:
| LangGraph Concept | Ralph Wiggum Concept |
|---|---|
| Graph state | Checkpoint payload |
| Interrupt node | Context compaction trigger |
| Graph resume | Session recovery from checkpoint |
| Thread ID | Loop ID |
| Node execution | Task step |
| Edge condition | Step outcome routing |
Ralph Wiggum currently stores checkpoint state at ~/.coditect-data/ralph-loops/{loop_id}/checkpoint.json. LangGraph's PostgreSQL checkpointer can be configured to write to the same storage pattern:
# coditect/integrations/enterprise_agent/langgraph_rw_bridge.py
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row
from coditect.paths import get_langgraph_checkpoint_db_url

def create_rw_aligned_checkpointer(tenant_id: str, loop_id: str) -> PostgresSaver:
    """
    Creates a LangGraph checkpointer that writes to the same PostgreSQL
    database used by Ralph Wiggum's loop state, with tenant_id isolation.
    Thread IDs are scoped as: {tenant_id}:{loop_id}:{task_id}
    This ensures no cross-tenant state leakage at the checkpoint level.
    """
    db_url = get_langgraph_checkpoint_db_url()
    # Note: PostgresSaver.from_conn_string() is a context manager in
    # langgraph-checkpoint-postgres and cannot be returned directly;
    # build the saver from a long-lived connection instead.
    conn = Connection.connect(db_url, autocommit=True, row_factory=dict_row)
    checkpointer = PostgresSaver(conn)
    checkpointer.setup()  # Creates checkpoint tables if they do not exist
    return checkpointer
def resume_graph_from_rw_checkpoint(
graph,
tenant_id: str,
loop_id: str,
task_id: str
):
"""
Resume a LangGraph graph execution from a Ralph Wiggum checkpoint.
Called when a Ralph Wiggum loop recovers from context compaction.
"""
thread_id = f"{tenant_id}:{loop_id}:{task_id}"
config = {"configurable": {"thread_id": thread_id}}
# LangGraph resumes from last persisted checkpoint automatically
for event in graph.stream(None, config=config):
yield event
CrewAI + Ralph Wiggum integration (requires wrapping):
CrewAI Flows have their own internal state machine, but it is not designed for external checkpoint management. Ralph Wiggum must treat a CrewAI Flow as an opaque subprocess:
# coditect/integrations/enterprise_agent/crewai_rw_manager.py
import subprocess
import json
from pathlib import Path
from coditect.paths import get_ralph_wiggum_state_dir
class CrewAIRalphWiggumManager:
"""
Manages CrewAI Flows as Ralph Wiggum autonomous sub-processes.
Ralph Wiggum cannot checkpoint inside CrewAI — it can only:
1. Start a new CrewAI Flow subprocess
2. Monitor its stdout/stderr for progress
3. Record the last known state on SIGTERM/failure
4. Restart from the last recorded state on session recovery
"""
def __init__(self, tenant_id: str, loop_id: str):
self.tenant_id = tenant_id
self.loop_id = loop_id
self.state_file = (
get_ralph_wiggum_state_dir() /
f"{tenant_id}" /
f"{loop_id}" /
"crewai_state.json"
)
self.state_file.parent.mkdir(parents=True, exist_ok=True)
def start_flow(self, flow_class: str, initial_input: dict) -> str:
"""Launch a CrewAI Flow as an isolated subprocess."""
state = {
"tenant_id": self.tenant_id,
"loop_id": self.loop_id,
"flow_class": flow_class,
"initial_input": initial_input,
"status": "running",
"last_checkpoint": None,
}
self.state_file.write_text(json.dumps(state, indent=2))
# CrewAI runs in Docker container, not in-process
proc = subprocess.Popen(
[
"docker", "run", "--rm",
"--network", f"tenant-{self.tenant_id}", # Isolated network
"--env", f"TENANT_ID={self.tenant_id}",
"--env", f"LOOP_ID={self.loop_id}",
"--env-file", f"/run/secrets/{self.tenant_id}/enterprise-agent.env",
"coditect/enterprise-agent:latest",
"python", "-m", "coditect.enterprise_agent.run_flow",
"--flow-class", flow_class,
"--input", json.dumps(initial_input),
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
return str(proc.pid)
def save_checkpoint(self, checkpoint_data: dict) -> None:
"""Called by CrewAI flow on each significant step completion."""
state = json.loads(self.state_file.read_text())
state["last_checkpoint"] = checkpoint_data
state["status"] = "checkpointed"
self.state_file.write_text(json.dumps(state, indent=2))
def resume_from_checkpoint(self) -> dict | None:
"""Returns last checkpoint for restart, or None if no checkpoint exists."""
if not self.state_file.exists():
return None
state = json.loads(self.state_file.read_text())
return state.get("last_checkpoint")
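The state-file round trip the manager relies on can be exercised in isolation. A minimal sketch — a temporary directory stands in for get_ralph_wiggum_state_dir(), and the checkpoint payload is invented for illustration:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for get_ralph_wiggum_state_dir(): a throwaway directory.
state_dir = Path(tempfile.mkdtemp())
state_file = state_dir / "tenant-a" / "loop-1" / "crewai_state.json"
state_file.parent.mkdir(parents=True, exist_ok=True)

# start_flow(): write the initial record
state = {"status": "running", "last_checkpoint": None}
state_file.write_text(json.dumps(state, indent=2))

# save_checkpoint(): merge in the latest step result
state = json.loads(state_file.read_text())
state.update(status="checkpointed", last_checkpoint={"step": "draft_done"})
state_file.write_text(json.dumps(state, indent=2))

# resume_from_checkpoint(): read it back after a simulated crash
recovered = json.loads(state_file.read_text()).get("last_checkpoint")
print(recovered)  # {'step': 'draft_done'}
```

Because the file is plain JSON on disk, the recovery path needs no cooperation from the (already dead) CrewAI subprocess.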
6. Advantages
What CODITECT concretely gains from this stack:
Advantage 1: Immediate MCP Server Utilization via CrewAI
CODITECT already operates three MCP servers: semantic-search, call-graph, and impact-analysis. CrewAI's native MCP client can connect to these servers without any adapter code — zero new code required on the MCP server side.
Practical impact: An enterprise agent workflow for "analyze this codebase and create a technical summary document" can use the call-graph MCP server to understand code structure, semantic-search to find relevant documentation, and impact-analysis to flag risky dependencies — all as natively available CrewAI tools on day one.
This is a day-one demonstration of value that requires only CrewAI configuration, not new development.
Advantage 2: Enterprise System Coverage Without Connector Development
CrewAI's 500+ first-party maintained integrations cover Google Workspace (Gmail, Calendar, Drive), Microsoft 365 (partial), Slack, Jira, Salesforce, HubSpot, and more. Without this stack, CODITECT would need to build and maintain each connector individually — an estimated 2-6 weeks per connector for quality enterprise API integration.
At 500+ connectors, the alternative-build cost is astronomical. Even considering only the 10 most common enterprise systems (Gmail, Calendar, Slack, Jira, Salesforce, HubSpot, Notion, Asana, Linear, GitHub), the build-vs-adopt decision has a very clear answer.
Advantage 3: LangGraph Closes the Ralph Wiggum Feature Gap
Ralph Wiggum (ADR-108/110/111) provides autonomous loop execution with checkpoint handoff, but it lacks fine-grained state machine primitives. Ralph Wiggum loops are defined as prompt-chain sequences with checkpoint JSON blobs — not as typed state machines with conditional routing.
LangGraph provides exactly the typed state machine layer that Ralph Wiggum needs for complex workflows. The structural mapping (interrupt nodes = checkpoint triggers, thread_id = loop_id, graph state = checkpoint payload) means LangGraph can serve as Ralph Wiggum's execution engine for complex internal development workflows — a capability gap that currently prevents automating multi-branch engineering tasks.
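The thread_id side of that mapping can be sketched concretely. These helpers are hypothetical, but they encode the composite {tenant_id}:{loop_id}:{task_id} convention that the RLS policy in the schema section splits on:

```python
# Hypothetical helpers: encode Ralph Wiggum loop identity into a LangGraph
# thread_id, and recover it for checkpoint scoping and audit correlation.

def make_thread_id(tenant_id: str, loop_id: str, task_id: str) -> str:
    for part in (tenant_id, loop_id, task_id):
        if ":" in part:
            # ':' is the delimiter; allowing it would corrupt RLS split_part()
            raise ValueError(f"':' not allowed in id component: {part!r}")
    return f"{tenant_id}:{loop_id}:{task_id}"

def parse_thread_id(thread_id: str) -> dict:
    tenant_id, loop_id, task_id = thread_id.split(":", 2)
    return {"tenant_id": tenant_id, "loop_id": loop_id, "task_id": task_id}

tid = make_thread_id("t-123", "loop-9", "doc-gen-1")
print(tid)  # t-123:loop-9:doc-gen-1
print(parse_thread_id(tid)["tenant_id"])  # t-123
```

Rejecting ':' inside components is what keeps split_part(thread_id, ':', 1) in the PostgreSQL policy unambiguous.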
Advantage 4: Semantic Kernel Eliminates Enterprise Auth Build Cost
CODITECT currently has no OAuth2/OIDC framework for tenant-delegated enterprise system access. Building this from scratch — token storage, refresh logic, scope management, per-tenant credential isolation, audit logging for token use — is a 4-8 week engineering effort at minimum.
Semantic Kernel's built-in OAuth2/OIDC per-plugin model, audit hooks, and content safety filters provide this infrastructure. The integration cost is "High (months)" but less than building the equivalent from scratch with the same compliance level.
Advantage 5: Compliance Differentiation for Enterprise Customers
For CODITECT's enterprise customer segment (SOC2, HIPAA-ready), having an autonomous agent layer with built-in audit hooks, HITL gates, and OAuth2 security (via Semantic Kernel) is a significant competitive differentiator. Competing enterprise agent platforms from vendors such as Salesforce Agentforce, Microsoft Copilot Studio, and ServiceNow use proprietary stacks. CODITECT's MIT-licensed, open-source, self-hostable stack is one of the few viable options for customers with strict data residency requirements.
This is not a performance advantage — it is a market access advantage for regulated verticals.
7. Gaps and Risks
Critical Gaps
❌ CrewAI host process execution — prompt injection reaches all system resources. CrewAI runs all agent code in the same process as the caller. An adversarial enterprise input (malicious email body, crafted calendar invite, injected tool response) can manipulate a CrewAI agent's behavior. There is no sandboxing preventing the manipulated agent from calling any available tool. Given that enterprise agent tools include Gmail send, Calendar create, and file write, a successful prompt injection is a direct path to business email compromise (BEC). This is not a theoretical risk — CrowdStrike has documented this attack class for MCP-enabled AI systems. Required mitigation before any production use: Docker container isolation for all CrewAI execution.
❌ No multi-tenant credential isolation in any framework. All three frameworks handle credentials as process-global context. In a multi-tenant deployment, a bug in tenant isolation logic leaks Tenant A's Gmail OAuth2 token to Tenant B's agent. This is a direct HIPAA/SOC2 breach. The isolation implementation is entirely CODITECT's responsibility — the frameworks provide no defense. Required mitigation: container-per-tenant isolation for CrewAI; PostgreSQL RLS for LangGraph checkpoints; per-invocation token injection for SK.
❌ No built-in output guardrails in CrewAI. CrewAI agents receive tool outputs and pass them to subsequent agents or return them as task results without any content inspection. A tool that returns PII (patient data from a health-adjacent enterprise system) will be included in CrewAI's task output and potentially logged without filtering. Required mitigation: output filtering via PolicyAwareTool base class on all enterprise tools.
❌ No SBOM generated for the enterprise agent dependency surface. The three-framework stack adds LangChain, LangGraph, CrewAI, Semantic Kernel Python, and their transitive dependencies (a conservative estimate is 150+ additional Python packages). None of these have been security-scanned. SOC2 Type II requires evidence of dependency scanning. Required: pip-audit + cyclonedx-bom integrated into CI/CD before any production deployment.
Major Gaps
⚠️ Three independent release cadences — maintenance surface tripled. CrewAI releases weekly. LangGraph releases frequently as part of LangChain. Semantic Kernel follows Microsoft's release schedule. Breaking changes in any framework require immediate response. At the current coditect-core maintenance velocity, managing three external framework dependencies is a significant ongoing burden that WILL cause delayed security patches.
⚠️ LangChain dependency graph is notoriously brittle. LangGraph depends on LangChain. LangChain's version history shows frequent breaking changes across minor versions. "Notoriously large and version-sensitive; upgrades frequently break integrations" (from the primary evaluation). Every LangGraph upgrade requires a full regression test of all LangGraph-based workflows. This is not an acceptable situation without a frozen requirements file and a dedicated upgrade testing process.
⚠️ Semantic Kernel Python SDK is a second-class citizen. SK's primary development language is C#. The Python SDK lags behind in features, documentation, and bug fixes. Features available in SK for C# take weeks to months to appear in the Python SDK. For coditect-core's Python-first architecture, this means perpetually using an SDK that is behind its primary development track.
⚠️ No unified audit query surface. CrewAI audit events, LangGraph state traces, and SK audit hooks write to different formats and different storage locations. A compliance officer asking "show me all enterprise agent actions taken on behalf of Tenant X in the last 30 days" requires querying three separate data sources and correlating by tenant_id. This is an operational gap that creates compliance reporting risk.
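Closing this gap requires a correlation layer. A sketch of the normalization step such a layer would need — the raw field names per framework are assumptions, and the output shape mirrors the enterprise_agent_audit columns in the schema section of this document:

```python
# Sketch: normalize heterogeneous audit events into one queryable row shape.
# Raw event field names ("tool", "node", "plugin", ...) are illustrative
# assumptions, not documented framework output formats.

def normalize_audit_event(framework: str, raw: dict) -> dict:
    if framework == "crewai":
        return {"framework": "crewai", "tenant_id": raw["tenant_id"],
                "tool_name": raw["tool"], "outcome": raw["result"]}
    if framework == "langgraph":
        # tenant_id is recovered from the composite thread_id prefix
        tenant_id = raw["thread_id"].split(":", 1)[0]
        return {"framework": "langgraph", "tenant_id": tenant_id,
                "tool_name": raw["node"], "outcome": raw["status"]}
    if framework == "semantic_kernel":
        return {"framework": "semantic_kernel", "tenant_id": raw["tenant_id"],
                "tool_name": f"{raw['plugin']}.{raw['function']}",
                "outcome": "success" if raw["success"] else "error"}
    raise ValueError(f"unknown framework: {framework}")

row = normalize_audit_event(
    "langgraph",
    {"thread_id": "t-1:loop-2:task-3", "node": "generate_draft",
     "status": "success"},
)
print(row["tenant_id"])  # t-1
```

With all three sources funneled through one normalizer into enterprise_agent_audit, the "all actions for Tenant X in 30 days" question becomes a single RLS-scoped query.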
⚠️ CrewAI acqui-hire risk. CrewAI Inc. is Series A funded and growing rapidly. Series A-stage AI companies with highly valued products are frequent acqui-hire targets. An acquisition could result in a license change (MIT to commercial), breaking API changes, or framework abandonment. The MIT license protects existing code but does not protect against feature freeze or deprecation of enterprise integrations.
Minor Gaps
ℹ️ LangGraph MCP support is via adapter, not native. The langchain-mcp-adapters package is a third-party adapter adding a dependency boundary. It works but adds a fragility point at the MCP tool boundary that CrewAI does not have.
ℹ️ CrewAI Flows checkpoint model is opaque. Ralph Wiggum cannot inspect CrewAI Flow state during execution — only before and after. Checkpoint granularity for long-running CrewAI workflows is limited to the step boundaries that the Flow explicitly defines, which may not align with Ralph Wiggum's checkpoint expectations.
ℹ️ Docker container cold start latency. The recommended CrewAI isolation model (one container per tenant per task) adds roughly 1-5 seconds of startup latency with pre-warmed containers and 15-30 seconds for true cold starts. For synchronous enterprise agent workflows (a user waiting on the result), this is noticeable. Asynchronous workflows (background automation) are unaffected.
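The latency tradeoff above is usually mitigated with a pre-warm pool. A generic sketch of the idea — this is not a Docker API; start_worker stands in for whatever launches and warms a container:

```python
import queue

class WarmPool:
    """Keep N pre-started workers idle so synchronous requests skip cold start."""

    def __init__(self, size: int, start_worker):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(start_worker())  # pay the startup cost up front

    def acquire(self, timeout: float = 0.1):
        """Return a warm worker, or None to signal a cold-start fallback."""
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            return None

pool = WarmPool(size=2, start_worker=lambda: object())
print(pool.acquire() is not None)  # True  (warm path)
print(pool.acquire() is not None)  # True  (warm path)
print(pool.acquire() is None)      # True  (pool drained -> cold start)
```

The design tension is cost: idle warm containers consume tenant-scoped resources, so the pool size becomes a per-tenant capacity-planning knob.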
Risks
Risk: CrewAI production security incident (Probability: High, Impact: Critical) Prompt injection via enterprise data (email body, calendar description, document content) manipulates CrewAI agents to perform unauthorized actions. The host-process execution model means there is no containment. Mitigation: Docker isolation required before production.
Risk: LangChain ecosystem version conflict (Probability: High, Impact: Medium) A LangGraph upgrade breaks LangChain MCP adapter compatibility, disabling all MCP-based tools in LangGraph workflows. Mitigation: Frozen requirements.txt with pinned transitive dependencies; dedicated upgrade testing track.
Risk: Semantic Kernel Python SDK feature lag (Probability: Medium, Impact: Medium) A required SK security feature (e.g., new OAuth2 grant type, content safety update) is available in C# SDK but delayed 6+ months in Python SDK. CODITECT cannot implement the feature without forking SK Python. Mitigation: Evaluate whether the C# SDK as a separate microservice is preferable to the Python SDK for SK's primary role.
Risk: Three-framework stack overwhelms engineering capacity (Probability: Medium, Impact: High) Each framework has distinct upgrade, testing, and security patch requirements. At current team size, maintaining three active framework integrations alongside the core platform may be infeasible. Mitigation: Designate a dedicated Enterprise Agent Platform owner role; consider phased adoption (CrewAI first, LangGraph second, SK third).
8. Integration Patterns
Pattern 1: CrewAI Enterprise Agent Invoked via coditect-core Command
This pattern shows the full invocation flow from a CODITECT command to a CrewAI crew execution, with Docker isolation, tenant context injection, audit logging, and result storage.
# commands/enterprise-agent/start.py
# Handles: /enterprise-agent start --workflow gmail-triage --tenant <tenant_id>
from pathlib import Path
from coditect.context import get_tenant_context, TenantContext
from coditect.session_log import SessionLogger
from coditect.policy import PolicyEngine
from coditect.integrations.enterprise_agent.crewai_rw_manager import CrewAIRalphWiggumManager
from coditect.integrations.enterprise_agent.audit_wrapper import CoditactAuditCallback
from coditect.integrations.enterprise_agent.docker_runner import TenantDockerRunner
def execute_enterprise_agent(
workflow_name: str,
tenant_id: str,
workflow_input: dict,
) -> dict:
"""
Entry point for /enterprise-agent start command.
Runs CrewAI workflow in isolated Docker container with full audit coverage.
"""
tenant_ctx: TenantContext = get_tenant_context(tenant_id)
logger = SessionLogger(tenant_id=tenant_id)
policy_engine = PolicyEngine(tenant_id=tenant_id)
# Pre-flight policy check
policy_result = policy_engine.evaluate(
action="enterprise_agent.start",
parameters={"workflow": workflow_name, "input": workflow_input}
)
if not policy_result.allowed:
logger.emit(f"[POLICY_BLOCK] enterprise_agent.start blocked: {policy_result.reason}")
return {"status": "blocked", "reason": policy_result.reason}
logger.emit(f"[ENTERPRISE_AGENT] Starting workflow: {workflow_name}")
# Launch isolated Docker container with tenant credentials
runner = TenantDockerRunner(tenant_id=tenant_id)
result = runner.run_crewai_workflow(
workflow_name=workflow_name,
workflow_input=workflow_input,
tenant_secrets_path=f"/run/secrets/{tenant_id}/enterprise-agent.env",
network=f"coditect-tenant-{tenant_id}",
image="coditect/enterprise-agent:latest",
)
# Store result with tenant scoping
logger.emit(f"[ENTERPRISE_AGENT] Completed: {workflow_name} status={result['status']}")
return result
# coditect/integrations/enterprise_agent/docker_runner.py
import subprocess
import json
import tempfile
import os
from typing import Any
class TenantDockerRunner:
"""
Runs CrewAI workflows in isolated Docker containers.
One container per task invocation.
Container has access ONLY to:
- That tenant's credentials (via Docker secrets)
- That tenant's network segment
- The MCP servers via allowlisted network routes
"""
def __init__(self, tenant_id: str):
self.tenant_id = tenant_id
def run_crewai_workflow(
self,
workflow_name: str,
workflow_input: dict,
tenant_secrets_path: str,
network: str,
image: str,
timeout_seconds: int = 300,
) -> dict:
with tempfile.NamedTemporaryFile(
mode='w', suffix='.json', delete=False
) as f:
json.dump(workflow_input, f)
input_file = f.name
try:
result = subprocess.run(
[
"docker", "run",
"--rm",
"--network", network,
"--env", f"TENANT_ID={self.tenant_id}",
"--env", f"WORKFLOW_NAME={workflow_name}",
"--env-file", tenant_secrets_path,
"--mount", f"type=bind,source={input_file},target=/input.json,readonly",
"--memory", "2g",
"--cpus", "1.0",
"--read-only",
"--tmpfs", "/tmp:noexec,nosuid,size=100m",
"--security-opt", "no-new-privileges",
image,
"python", "-m", "coditect.enterprise_agent.entrypoint",
],
capture_output=True,
text=True,
timeout=timeout_seconds,
)
if result.returncode != 0:
return {
"status": "error",
"error": result.stderr[-2000:], # Last 2000 chars
"workflow": workflow_name,
}
return json.loads(result.stdout)
finally:
os.unlink(input_file)
Pattern 2: LangGraph State Machine Mapped to Ralph Wiggum Checkpoints
This pattern shows a LangGraph graph for an internal development workflow (document generation with review loop) that integrates with Ralph Wiggum's checkpoint system for session-safe execution.
# coditect/integrations/langgraph/development_workflow.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage, AIMessage
from coditect.integrations.enterprise_agent.langgraph_rw_bridge import (
create_rw_aligned_checkpointer,
resume_graph_from_rw_checkpoint,
)
from coditect.session_log import SessionLogger
class DocumentWorkflowState(TypedDict):
"""
Typed state for document generation workflow.
All fields are tenant-context-aware — no cross-tenant data mixed here.
"""
tenant_id: str
task_id: str
document_topic: str
messages: Annotated[list, add_messages]
draft_content: str
review_feedback: str
revision_count: int
approved: bool
final_content: str
def build_document_workflow_graph(tenant_id: str, task_id: str):
"""
Builds a LangGraph state machine for document generation with review loop.
Maps to Ralph Wiggum's checkpoint model.
"""
checkpointer = create_rw_aligned_checkpointer(
tenant_id=tenant_id,
loop_id=task_id
)
logger = SessionLogger(tenant_id=tenant_id)
def generate_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Generate initial document draft."""
logger.emit(f"[LANGGRAPH] generate_draft node executing for task={state['task_id']}")
# Call coditect-core senior-architect agent via subprocess
draft = _invoke_coditect_agent(
"senior-architect",
f"Create a technical document about: {state['document_topic']}"
)
return {**state, "draft_content": draft, "revision_count": 0}
def review_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Review draft — this is an INTERRUPT node (HITL gate)."""
# This node is decorated as interrupt — LangGraph pauses here
# waiting for human input via the thread_id's update mechanism
logger.emit(f"[LANGGRAPH] review_draft interrupt node for task={state['task_id']}")
return state # State unchanged; human provides feedback via graph.update_state()
def revise_draft(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Revise based on feedback."""
logger.emit(f"[LANGGRAPH] revise_draft node, revision={state['revision_count']}")
revised = _invoke_coditect_agent(
"codi-documentation-writer",
f"Revise this document based on feedback:\n"
f"Document: {state['draft_content']}\n"
f"Feedback: {state['review_feedback']}"
)
return {
**state,
"draft_content": revised,
"revision_count": state["revision_count"] + 1
}
def finalize(state: DocumentWorkflowState) -> DocumentWorkflowState:
"""Node: Mark document as approved."""
return {**state, "final_content": state["draft_content"], "approved": True}
def should_revise(state: DocumentWorkflowState) -> str:
"""Conditional edge: route to revise or finalize based on approval."""
if state.get("approved"):
return "finalize"
if state["revision_count"] >= 3:
return "finalize" # Auto-approve after 3 revisions
return "revise"
builder = StateGraph(DocumentWorkflowState)
builder.add_node("generate_draft", generate_draft)
builder.add_node("review_draft", review_draft)
builder.add_node("revise_draft", revise_draft)
builder.add_node("finalize", finalize)
builder.set_entry_point("generate_draft")
builder.add_edge("generate_draft", "review_draft")
builder.add_conditional_edges(
"review_draft",
should_revise,
{"revise": "revise_draft", "finalize": "finalize"}
)
builder.add_edge("revise_draft", "review_draft")
builder.add_edge("finalize", END)
# interrupt_before=["review_draft"] makes review_draft a HITL gate
graph = builder.compile(
checkpointer=checkpointer,
interrupt_before=["review_draft"]
)
return graph
def _invoke_coditect_agent(agent_name: str, prompt: str) -> str:
"""Subprocess call to coditect-core agent — maintains the invocation direction."""
import subprocess
result = subprocess.run(
["python", "-m", "coditect.agent_runner", "--agent", agent_name, "--prompt", prompt],
capture_output=True, text=True, timeout=120
)
return result.stdout
Pattern 3: Semantic Kernel as MCP Security Service
This pattern deploys Semantic Kernel as a standalone Python microservice exposing an MCP interface. coditect-core agents call into it as an MCP tool, isolating the C# dependency surface entirely.
# coditect/services/sk_mcp_server/server.py
# Runs as: python -m coditect.services.sk_mcp_server --port 8080
# Deployed as a separate Docker container in GKE
import asyncio
import json
from pathlib import Path
from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins.microsoft_graph import MicrosoftGraphPlugin
from semantic_kernel.filters.auto_function_invocation.auto_function_invocation_context import (
AutoFunctionInvocationContext,
)
app = FastAPI(title="CODITECT Semantic Kernel MCP Server")
class SKActionRequest(BaseModel):
tenant_id: str
plugin_name: str
function_name: str
parameters: dict
oauth_token: str # Short-lived token injected by coditect-core, never stored
class CoditactAuditFilter:
"""
SK invocation filter that emits audit events for every function call.
This is the compliance core of the SK integration.
"""
async def on_auto_function_invocation(
self,
context: AutoFunctionInvocationContext,
next,
) -> None:
tenant_id = context.arguments.get("tenant_id", "unknown")
# Pre-invocation audit
audit_event = {
"event": "sk.function_invoke",
"tenant_id": tenant_id,
"plugin": context.function.plugin_name,
"function": context.function.name,
"parameters": {
k: "[REDACTED]" if "token" in k.lower() or "secret" in k.lower() else v
for k, v in context.arguments.items()
},
}
_emit_audit_event(audit_event)
await next(context)
# Post-invocation audit
_emit_audit_event({
"event": "sk.function_complete",
"tenant_id": tenant_id,
"plugin": context.function.plugin_name,
"function": context.function.name,
"success": not context.is_cancel_requested,
})
@app.post("/mcp/tools/invoke")
async def invoke_sk_function(
request: SKActionRequest,
x_coditect_tenant: str = Header(None),
):
"""
MCP-compatible tool invocation endpoint.
Called by coditect-core agents and CrewAI crews via MCP protocol.
"""
if x_coditect_tenant != request.tenant_id:
raise HTTPException(status_code=403, detail="Tenant context mismatch")
kernel = sk.Kernel()
# Add audit filter
kernel.add_filter("auto_function_invocation", CoditactAuditFilter())
# Configure Microsoft Graph plugin with the provided OAuth token
# Token is tenant-specific and short-lived — never persisted in this service
if request.plugin_name == "MicrosoftGraph":
plugin = MicrosoftGraphPlugin(access_token=request.oauth_token)
kernel.add_plugin(plugin, plugin_name="MicrosoftGraph")
result = await kernel.invoke(
plugin_name=request.plugin_name,
function_name=request.function_name,
**request.parameters
)
return {
"status": "success",
"result": str(result),
"tenant_id": request.tenant_id,
"plugin": request.plugin_name,
"function": request.function_name,
}
@app.get("/mcp/tools/list")
async def list_available_tools():
"""MCP tool discovery endpoint — called by CrewAI and coditect-core at startup."""
return {
"tools": [
{
"name": "MicrosoftGraph.send_email",
"description": "Send email via Microsoft Graph API (Outlook)",
"parameters": {
"to": "string",
"subject": "string",
"body": "string",
"tenant_id": "string",
},
"requires_oauth": True,
"audit_required": True,
"hitl_required": True,
},
{
"name": "MicrosoftGraph.create_calendar_event",
"description": "Create calendar event via Microsoft Graph API",
"parameters": {
"title": "string",
"start_time": "string",
"end_time": "string",
"attendees": "list[string]",
"tenant_id": "string",
},
"requires_oauth": True,
"audit_required": True,
"hitl_required": True,
},
]
}
def _emit_audit_event(event: dict) -> None:
    """Write to the CODITECT session log via stderr (captured by the coditect-core process manager)."""
    import sys
    print(json.dumps({"type": "audit", **event}), file=sys.stderr, flush=True)
Database Schema Extensions
-- Enterprise agent configuration per tenant
CREATE TABLE enterprise_agent_config (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
framework VARCHAR(50) NOT NULL CHECK (framework IN ('crewai', 'langgraph', 'semantic_kernel')),
config JSONB NOT NULL DEFAULT '{}',
enabled BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_eac_tenant ON enterprise_agent_config(tenant_id);
ALTER TABLE enterprise_agent_config ENABLE ROW LEVEL SECURITY;
CREATE POLICY eac_tenant_isolation ON enterprise_agent_config
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- Enterprise agent task audit log
CREATE TABLE enterprise_agent_audit (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
user_id UUID REFERENCES users(id),
framework VARCHAR(50) NOT NULL,
tool_name VARCHAR(255) NOT NULL,
action_type VARCHAR(100) NOT NULL,
parameters JSONB, -- PII-sanitized at write time
outcome VARCHAR(50) CHECK (outcome IN ('success', 'blocked_policy', 'blocked_hitl', 'error')),
evidence_id UUID, -- FK to esignature_evidence if HITL was required
ip_address INET,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_eaa_tenant_time ON enterprise_agent_audit(tenant_id, created_at DESC);
ALTER TABLE enterprise_agent_audit ENABLE ROW LEVEL SECURITY;
CREATE POLICY eaa_tenant_isolation ON enterprise_agent_audit
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- OAuth2 credentials store (refresh tokens only, AES-256 encrypted)
CREATE TABLE enterprise_agent_credentials (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
provider VARCHAR(100) NOT NULL, -- 'google_workspace', 'microsoft_365', etc.
encrypted_refresh_token BYTEA NOT NULL, -- AES-256-GCM encrypted
scope TEXT[] NOT NULL,
expires_hint TIMESTAMPTZ, -- Best-guess expiry, not authoritative
created_at TIMESTAMPTZ DEFAULT NOW(),
last_used_at TIMESTAMPTZ,
UNIQUE(tenant_id, provider)
);
ALTER TABLE enterprise_agent_credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY eac_creds_tenant_isolation ON enterprise_agent_credentials
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- LangGraph checkpoint table (created by PostgresSaver.setup() but needs RLS)
-- Run after PostgresSaver.setup():
ALTER TABLE langgraph_checkpoints ENABLE ROW LEVEL SECURITY;
CREATE POLICY lg_tenant_isolation ON langgraph_checkpoints
-- thread_id format: {tenant_id}:{loop_id}:{task_id}
USING (split_part(thread_id, ':', 1) = current_setting('app.current_tenant'));
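All three RLS policies above filter on current_setting('app.current_tenant'), which the application must set on each connection before any tenant-scoped query runs. A usage sketch (the UUID is a placeholder); SET LOCAL confines the setting to the current transaction, so a pooled connection cannot leak tenant context to the next request:

```sql
BEGIN;
SET LOCAL app.current_tenant = '6f1c2d3e-0000-0000-0000-000000000001';

-- RLS-protected reads/writes now see only this tenant's rows:
SELECT tool_name, outcome, created_at
FROM enterprise_agent_audit
ORDER BY created_at DESC
LIMIT 50;
COMMIT;
```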
9. Decision Framework
Weighted Scoring for CODITECT Integration
| Criterion | Weight | CrewAI Score | LangGraph Score | SK Score | Stack Score | Notes |
|---|---|---|---|---|---|---|
| Multi-Tenant Isolation | Critical | 2 | 4 | 3 | 3 | CrewAI requires full container isolation; LG needs thread_id scoping (achievable); SK needs token scoping |
| Compliance Surface | Critical | 3 | 5 | 9 | 6 | CrewAI no audit trail natively; LG structural trace is excellent; SK has built-in audit hooks — stack average is misleading; SK alone is compliance-capable |
| Observability | Important | 2 | 5 | 8 | 5 | CrewAI no OTEL; LG via callbacks (good); SK native OTEL (best in stack) |
| Agent Orchestration Fit | Important | 6 | 9 | 3 | 6 | CrewAI fits but has a hierarchy conflict; LG is the natural Ralph Wiggum analog; SK is the poorest fit |
| Performance at Scale | Important | 3 | 6 | 7 | 5 | CrewAI container startup latency is real; LG graph is efficient; SK is stateless MCP calls |
| License Compatibility | Critical | 10 | 10 | 10 | 10 | All MIT — no concerns |
Note on stack scoring: The three-framework stack is assessed as a unit, not as individual selections. A gap in one framework that is covered by another in the stack counts as covered at the stack level (e.g., SK's audit hooks compensate for CrewAI's lack of audit natively). However, gaps that require CODITECT engineering (rather than another framework in the stack) are scored against the stack as a whole.
| Criterion | Weight | Stack Score (0-10) | Weighted Score | Notes |
|---|---|---|---|---|
| Multi-Tenant Isolation | Critical (x2) | 3 | 6 | Requires full container isolation implementation |
| Compliance Surface | Critical (x2) | 6 | 12 | SK covers; CrewAI audit wrapper required |
| Observability | Important (x1) | 5 | 5 | LangFuse integration addresses most gaps |
| Agent Orchestration Fit | Important (x1) | 6 | 6 | LangGraph fit is excellent; CrewAI hierarchy manageable |
| Performance at Scale | Important (x1) | 5 | 5 | Container startup latency is the primary concern |
| License Compatibility | Critical (x2) | 10 | 20 | All MIT — maximum score |
| Total | | | 54/90 | |
Threshold: Go (>=55, all critical >=7), Conditional (35-54 OR any critical <7), No-Go (<35 OR any critical = 0)
Score: 54/90 — Conditional
Multi-Tenant Isolation scores 3 (below the critical threshold of 7) because the isolation implementation is entirely CODITECT's responsibility and has not been built. License Compatibility is 10. Compliance Surface is 6 (SK provides the mechanism; the CrewAI audit wrapper must still be built). The overall score of 54 falls at the top of the Conditional band.
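The weighted total can be recomputed directly from the table (weights: Critical = 2, Important = 1):

```python
# Criterion -> (weight, stack score); values transcribed from the scoring table.
criteria = {
    "multi_tenant_isolation":  (2, 3),
    "compliance_surface":      (2, 6),
    "observability":           (1, 5),
    "agent_orchestration_fit": (1, 6),
    "performance_at_scale":    (1, 5),
    "license_compatibility":   (2, 10),
}
total = sum(weight * score for weight, score in criteria.values())
print(total)  # 54
```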
Conditions for Conditional Go:
1. Container isolation implemented first. CrewAI MUST run in a Docker container with tenant-scoped network, secrets injection, and --read-only filesystem before any production tenant data touches the enterprise agent stack. This is a hard gate, not a suggestion.
2. LangGraph requirements freeze. requirements-enterprise-agent.txt with all LangChain + LangGraph + MCP adapter versions pinned (pip freeze output, not version ranges) must be committed before the first production deployment.
3. Enterprise agent security ADR approved. The sandboxing model, HITL gate design, credential storage architecture, and audit log schema must be documented as an ADR and approved by engineering leadership before any tenant credentials are stored.
4. Semantic Kernel deployed as an MCP microservice, not embedded. The Python SDK's second-class status relative to C# makes direct embedding high-risk. The MCP server deployment pattern isolates the SK dependency boundary.
5. CoditactAuditCallback implemented and tested. The audit wrapper for CrewAI tool calls must achieve >95% coverage of all tool invocations in integration tests before any production use.
6. Phased tenant rollout. The first production tenant on the enterprise agent stack must be an internal test tenant (CODITECT's own workflows), not an external customer. External customer enablement requires a separate gate after 30 days of internal operation without incidents.
10. Next Steps
If Conditional Go Conditions Met:
- Week 1-2: Security ADR
  - Draft ADR: Enterprise Agent Security Architecture
  - Cover: container isolation model, credential vault design, HITL gate requirements, audit schema
  - Approval required from: Engineering Lead, Compliance (SOC2 addendum review)
- Week 2-4: Infrastructure
  - Implement TenantDockerRunner with GKE NetworkPolicy for tenant isolation
  - Deploy PostgreSQL RLS policies for the LangGraph checkpoint table
  - Implement encrypted credential store for OAuth2 refresh tokens
  - Create enterprise_agent_audit table with RLS
- Week 3-5: CrewAI Integration
  - Implement CoditactAuditCallback with >95% tool call coverage
  - Implement PolicyAwareTool base class
  - Create enterprise-orchestration-engine agent definition
  - Add /enterprise-agent command suite
  - Prototype: Gmail triage crew using CrewAI + coditect MCP servers (day-one synergy)
- Week 5-7: LangGraph Integration
  - Implement langgraph_rw_bridge.py (PostgreSQL checkpointer with RLS)
  - Create stateful-workflow-engine agent definition
  - Add /workflow-graph command suite
  - Prototype: Document generation workflow with a LangGraph interrupt node as the HITL gate
- Week 7-10: Semantic Kernel MCP Server
  - Deploy sk_mcp_server as a GKE service
  - Configure Microsoft Graph plugin with OAuth2 flow
  - Implement CoditactAuditFilter
  - Add /ms365-auth command for tenant OAuth2 onboarding
- Week 10-12: Integration Testing
  - Run enterprise agent workflows against internal test tenant
  - SBOM generation and security scan of all new dependencies
  - Penetration test: prompt injection via malicious email body
  - Load test: 10 concurrent tenant enterprise agent tasks
  - SOC2 addendum review of audit log schema and retention
- Week 12+: Staged External Rollout
  - Invite 1-2 pilot customers (non-HIPAA initially)
  - Monitor for 30 days
  - Gate HIPAA customer enablement on clean 30-day record
If No-Go (conditions not met in reasonable timeline):
- Document as ADR: "Enterprise Agent Stack — Deferred Pending Security Infrastructure"
- Archive all research artifacts in internal/analysis/autonomous-enterprise-agent/
- Identify minimum viable alternative: CrewAI only, no LangGraph, no SK, with custom security wrapper
- Revisit in 90 days after security infrastructure is built
References
- Framework Evaluation: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-framework-evaluation-2026-02-19.md
- Pairwise Comparison: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-pairwise-comparison-2026-02-19.md
- Search Strategy: internal/analysis/autonomous-enterprise-agent/autonomous-enterprise-agent-search-strategy-2026-02-19.md
- CODITECT Architecture: internal/architecture/SDD-CODITECT-MULTI-TENANT-SAAS.md
- Ralph Wiggum: docs/guides/RALPH-WIGGUM-GUIDE.md | internal/architecture/RALPH-WIGGUM-ARCHITECTURE.md
- Database Schema: docs/reference/database/DATABASE-SCHEMA.md
- Multi-Tenancy Standard: ADR reference required — see internal/architecture/adrs/
- Sidecar AGPL Analysis: internal/architecture/adrs/ADR-165.md through ADR-169.md
- OWASP Agent Security Top 10 2026: https://medium.com/@oracle_43885/owasps-ai-agent-security-top-10-agent-security-risks-2026-fc5c435e86eb
- CrowdStrike MCP Security: https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/