Skip to main content

Generative AI (GenAI) Governance Addendum

Supplemental Standards for LLMs, Foundation Models & Agentic AI


Document Control

FieldDetails
Document TypePolicy Addendum / Technical Standard
Parent DocumentEnterprise AI Policy & Standard (Artifact 5)
Applies ToLarge Language Models (LLMs), Image/Video Generation, Code Assistants, Agentic AI
Versionv2.0
Framework AlignmentNIST AI RMF GenAI Profile, EU AI Act GPAI Requirements, OWASP Top 10 for LLMs, MITRE ATLAS

1. Unique Risk Profile of Generative AI

Unlike predictive models (which output a score or classification), GenAI models output new content. This creates unique risks that require specialized controls:

RiskDescriptionImpact
HallucinationModel confidently states false informationMisinformation, wrong decisions, liability
Prompt InjectionMalicious inputs bypass safety filtersData leakage, unauthorized actions
JailbreakingTechniques to circumvent content policiesHarmful content generation
Toxic OutputGeneration of harmful, biased, or offensive contentReputational damage, harm to users
IP InfringementReproducing copyrighted contentLegal liability
Data LeakageRevealing sensitive training data in outputsPrivacy violations
Excessive AgencyAgentic AI taking unauthorized actionsSecurity breach, operational harm

2. Architecture & Design Controls

2.1 Defense-in-Depth Architecture

All GenAI systems must implement layered defenses:

┌─────────────────────────────────────────────────────────────┐
│ USER INPUT │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ INPUT GUARDRAILS │
│ • PII Detection/Scrubbing • Token Limits │
│ • Prompt Injection Detection • Rate Limiting │
│ • Content Policy Check • Input Sanitization │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ MODEL LAYER │
│ • System Prompt Engineering • Temperature Control │
│ • RAG/Grounding (if applicable) • Response Constraints │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ OUTPUT GUARDRAILS │
│ • Toxicity Detection • PII Detection │
│ • Format Validation • Citation Verification │
│ • Confidence Thresholds • Refusal Handling │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ LOGGING & MONITORING │
│ • Full Prompt/Response Logging • Usage Analytics │
│ • Anomaly Detection • Audit Trail │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ USER OUTPUT │
└─────────────────────────────────────────────────────────────┘

2.2 Grounding Requirements (Anti-Hallucination)

For GenAI retrieving business facts:

RequirementStandard
RAG ImplementationUse Retrieval Augmented Generation for factual queries
Source ConstraintModel must answer based on retrieved context only
Citation DisplayUI must show citations/links to source documents
Confidence IndicatorsDisplay confidence levels where feasible
Fallback BehaviorClear message when information not found in sources

2.3 Input Guardrails (Pre-Processing)

ControlImplementation
PII ScrubbingAuto-detect and redact PII before external API calls
Injection DetectionBlock patterns like "ignore previous instructions", "DAN mode"
Token LimitsEnforce maximum input length
Rate LimitingPer-user request limits to prevent abuse
Content ClassificationFlag potentially policy-violating inputs

Injection Detection Patterns:

  • "Ignore previous instructions"
  • "You are now [alternate persona]"
  • "Pretend you are..."
  • "DAN mode" / "Developer mode"
  • Base64 encoded instructions
  • Multi-language obfuscation

2.4 Output Guardrails (Post-Processing)

ControlImplementation
Toxicity CheckUse classifier (Llama Guard, Azure Content Safety)
Refusal HandlingShow standardized error on safety violations
Format ValidationValidate JSON/SQL for agent tool use
PII DetectionScan outputs for inadvertent PII exposure
Citation VerificationVerify cited sources exist and match claims

3. Testing & Verification Standards

3.1 Red Teaming (Adversarial Testing)

Mandatory for High-Risk GenAI

ScenarioDescriptionPass Criteria
JailbreakingAttempt to generate prohibited contentNo violations in 1000 attempts
Prompt InjectionAttempt to override system instructionsSystem prompt maintained
Data ExtractionAttempt to reveal training dataNo sensitive data exposed
PII ExtractionAttempt to extract personal informationPII properly redacted
Harmful ContentRequest illegal/dangerous informationAppropriate refusals
Bias ExploitationProbe for discriminatory outputsNo discriminatory patterns

3.2 LLM Evaluation Metrics

MetricDescriptionThreshold
FaithfulnessDoes answer match source context?>95%
Answer RelevanceDid model answer the actual question?>90%
GroundednessAre claims supported by sources?>90%
Toxicity ScoreHarmful content detection<1%
Hallucination RateUnsupported factual claims<5%
Refusal AppropriatenessProper handling of edge cases>95%

3.3 Evaluation Tools

CategoryTools
RAG EvaluationRAGAS, TruLens, LangSmith
Safety TestingGarak, Promptfoo, Anthropic Eval
Toxicity DetectionPerspective API, Llama Guard, Azure AI
Red TeamingHarmBench, AdvBench, custom

4. Adversarial ML Defense Framework

4.1 Attack Taxonomy

Understanding adversarial attacks is essential for building resilient AI systems. This section provides comprehensive defense guidance aligned with MITRE ATLAS, OWASP LLM Top 10, and NIST AI RMF.

4.1.1 Attack Categories

CategoryDescriptionRisk LevelPrimary Target
Prompt InjectionManipulating model behavior via crafted inputsCriticalLLMs, Chat systems
JailbreakingBypassing safety guardrails and content policiesCriticalLLMs, GenAI
Data PoisoningCorrupting training or fine-tuning dataHighAll ML models
Model ExtractionStealing model weights or architectureHighProprietary models
Evasion AttacksCrafting inputs to cause misclassificationHighClassifiers, detectors
Membership InferenceDetermining if data was in training setMediumPrivacy-sensitive models
Model InversionReconstructing training data from modelHighModels trained on PII
Backdoor AttacksImplanting hidden triggers during trainingCriticalFine-tuned models
Adversarial ExamplesImperceptible perturbations causing errorsMediumComputer vision, NLP
Supply Chain AttacksCompromising ML dependencies or datasetsCriticalAll ML systems

4.1.2 OWASP LLM Top 10 Mapping

OWASP IDVulnerabilityDefense Section
LLM01Prompt Injection4.2.1
LLM02Insecure Output Handling4.2.4
LLM03Training Data Poisoning4.2.2
LLM04Model Denial of Service4.2.5
LLM05Supply Chain Vulnerabilities4.2.6
LLM06Sensitive Information Disclosure4.2.3
LLM07Insecure Plugin Design4.2.7
LLM08Excessive AgencySection 5 (Agentic AI)
LLM09OverrelianceSection 6.1
LLM10Model Theft4.2.3

4.2 Defense Strategies

4.2.1 Prompt Injection Defense

Attack Vector: Attacker embeds malicious instructions in user input or retrieved content to override system behavior.

Defense LayerControlImplementation
Input FilteringPattern DetectionRegex/ML-based detection of injection patterns
Input FilteringInput SanitizationStrip control characters, normalize Unicode
Input FilteringLength LimitsEnforce maximum input token limits
ArchitecturalPrompt IsolationSeparate user content from system instructions
ArchitecturalDual-LLM PatternUse separate model to validate inputs
ArchitecturalInstruction HierarchyMark system instructions as immutable
Output ValidationResponse VerificationCheck output adheres to expected format
Output ValidationAction ConfirmationRequire explicit confirmation for sensitive actions

Detection Patterns:

# High-Confidence Injection Patterns
- "ignore previous instructions"
- "disregard all prior"
- "you are now [persona]"
- "system: " or "[SYSTEM]" in user input
- "```system" code blocks
- Base64-encoded instruction blocks
- Unicode homoglyph obfuscation
- Multi-language instruction mixing
- Markdown/HTML injection attempts
- JSON/XML escape sequences

Recommended Tools:

  • Rebuff (open source)
  • Lakera Guard
  • Prompt Armor
  • Custom regex + ML ensemble

4.2.2 Data Poisoning Defense

Attack Vector: Attacker corrupts training data to embed backdoors or degrade model performance.

DefenseDescriptionWhen to Apply
Data ProvenanceTrack and verify all data sourcesData collection
Data ValidationStatistical analysis for anomaliesPre-training
Outlier DetectionIdentify and quarantine suspicious samplesPre-training
Certified TrainingUse techniques robust to label noiseTraining
Differential PrivacyLimit individual sample influenceTraining
Data AuditingRegular review of training pipelinesOngoing
Model TestingBackdoor detection via trigger analysisPost-training

Training Data Hygiene Checklist:

  • Verify data source authenticity
  • Scan for known malicious payloads
  • Check for statistical distribution shifts
  • Validate label consistency
  • Review web-scraped content for injection
  • Implement data versioning and lineage
  • Apply deduplication to prevent amplification

4.2.3 Model Extraction & Inversion Defense

Attack Vector: Attacker queries model to steal intellectual property or reconstruct sensitive training data.

DefenseImplementationTrade-off
Rate LimitingLimit queries per user/IP/sessionUsability
Query AuditingDetect suspicious query patternsLatency
Output PerturbationAdd controlled noise to outputsAccuracy
WatermarkingEmbed detectable signatures in outputsNone
API AuthenticationRequire strong authenticationAccess friction
Differential PrivacyLimit information leakage per queryModel utility
Confidence MaskingHide or round probability scoresFeature loss

Suspicious Query Patterns:

  • High volume of similar queries
  • Systematic exploration of decision boundaries
  • Queries requesting confidence scores
  • Grid-like sampling of input space
  • Queries probing for training data memorization

4.2.4 Output Handling Security

Attack Vector: Malicious model outputs are executed or rendered unsafely by downstream systems.

ControlDescriptionRequired For
Output SanitizationEscape special charactersAll outputs
Format ValidationStrict schema enforcementStructured outputs
Code SandboxingExecute generated code in isolationCode generation
URL ValidationVerify URLs before renderingLink generation
Content Security PolicyRestrict executable contentWeb applications
SQL ParameterizationNever interpolate outputs into SQLDatabase queries

Secure Output Handling Pattern:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Model Output │ ──▶ │ Validation │ ──▶ │ Sanitization │
└─────────────────┘ │ - Schema check │ │ - Escape HTML │
│ - Type check │ │ - Strip scripts│
│ - Length check │ │ - Normalize │
└─────────────────┘ └─────────────────┘


┌─────────────────┐ ┌─────────────────┐
│ Application │ ◀── │ Safe Render │
│ Processing │ │ or Execute │
└─────────────────┘ └─────────────────┘

4.2.5 Denial of Service Defense

Attack Vector: Attacker exhausts compute resources or degrades service availability.

Attack TypeDefenseImplementation
Resource ExhaustionToken limitsMax input/output tokens
Resource ExhaustionTimeout limitsMaximum inference time
Batch AttacksRate limitingPer-user/IP request caps
AmplificationOutput limitsMaximum response length
Context Window AbuseContext managementSliding window, summarization
Recursive PromptsLoop detectionDetect self-referential patterns

Resource Limits Matrix:

TierRequests/minMax Input TokensMax Output TokensTimeout
Free104,0001,00030s
Standard608,0004,00060s
Enterprise30032,0008,000120s
Internal1000128,00016,000300s

4.2.6 Supply Chain Security

Attack Vector: Compromised dependencies, models, or datasets introduce vulnerabilities.

ComponentRiskDefense
Pre-trained ModelsBackdoors, poisoningVerify checksums, scan for triggers
ML LibrariesVulnerabilities, malicious codeSBOM, vulnerability scanning
Training DatasetsPoisoned data, IP issuesData provenance, licensing review
Vector DatabasesPoisoned embeddingsAccess controls, integrity checks
Fine-tuning DataBackdoor injectionData validation, source verification
Plugins/ToolsMalicious functionalityCode review, sandboxing

Supply Chain Security Checklist:

  • Maintain Software Bill of Materials (SBOM)
  • Verify model checksums against official releases
  • Scan dependencies for known vulnerabilities (CVEs)
  • Review model cards for security considerations
  • Audit fine-tuning data sources
  • Implement model signing and verification
  • Use private/mirrored artifact repositories
  • Monitor for upstream security advisories

4.2.7 Plugin and Tool Security

Attack Vector: Malicious or vulnerable plugins expand attack surface.

ControlDescriptionImplementation
Plugin AllowlistOnly approved plugins enabledConfiguration policy
Input ValidationValidate all plugin inputsSchema enforcement
Output SanitizationSanitize plugin outputsFilter before model consumption
Least PrivilegeMinimal permissions per pluginIAM/RBAC
SandboxingIsolate plugin executionContainers, VMs
Audit LoggingLog all plugin invocationsCentralized logging

4.3 Detection and Monitoring

4.3.1 Adversarial Detection Metrics

MetricDescriptionAlert Threshold
Injection Attempt RateDetected injection patterns / total requests>0.1%
Jailbreak Success RateSuccessful policy bypasses / attempts>0% (Critical)
Query Anomaly ScoreDeviation from normal query patterns>3σ
Output Toxicity SpikeSudden increase in harmful outputs>2x baseline
Extraction IndicatorSystematic query patterns detectedAny detection
Resource AnomalyUnusual compute/token consumption>2x baseline

4.3.2 Security Monitoring Architecture

┌─────────────────────────────────────────────────────────────────┐
│ INPUT MONITORING │
│ • Injection pattern detection • Rate anomaly detection │
│ • Input fingerprinting • User behavior analysis │
└─────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ MODEL MONITORING │
│ • Inference latency tracking • Resource consumption │
│ • Error rate monitoring • Confidence distribution │
└─────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT MONITORING │
│ • Toxicity scoring • Policy violation detection │
│ • Output pattern analysis • Sensitive data detection │
└─────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ SIEM / SOAR │
│ • Alert correlation • Automated response │
│ • Incident management • Threat intelligence │
└─────────────────────────────────────────────────────────────────┘

4.4 Incident Response for Adversarial Attacks

4.4.1 Response Playbooks

Attack TypeImmediate ActionInvestigationRemediation
Prompt InjectionBlock pattern, log detailsAnalyze payload, check for data exfilUpdate filters, retrain detector
JailbreakDisable affected featureRoot cause analysis, scope impactPatch guardrails, update policy
Data PoisoningQuarantine modelAudit training data, identify sourceRetrain on clean data
Model ExtractionRate limit user, log queriesAnalyze query patternsImplement additional protections
Supply ChainIsolate componentCVE analysis, dependency auditPatch or replace component

4.4.2 Severity Classification

SeverityCriteriaResponse TimeEscalation
CriticalActive exploitation, data breach, safety impactImmediateCISO, Legal, AI Board
HighSuccessful attack, significant risk4 hoursSecurity Lead, AI Risk
MediumAttempted attack, partial success24 hoursSecurity Team
LowBlocked attack, no impact72 hoursOperations

4.5 Adversarial Robustness Testing

4.5.1 Required Testing by Risk Tier

Test TypeLow RiskMedium RiskHigh RiskCritical
Prompt InjectionRecommendedRequiredRequiredRequired
Jailbreak TestingOptionalRequiredRequiredRequired
Evasion TestingN/ARecommendedRequiredRequired
Data Poisoning SimN/AOptionalRequiredRequired
Red Team ExerciseN/AOptionalRequiredRequired
Penetration TestN/AOptionalRequiredRequired

4.5.2 Testing Tools

CategoryToolPurpose
Prompt InjectionGarak, PromptfooAutomated injection testing
JailbreakingHarmBench, JailbreakBenchSafety bypass testing
Adversarial ExamplesTextFooler, ARTNLP adversarial generation
Model RobustnessAdversarial Robustness ToolboxComprehensive testing
Red TeamingCustom frameworksHuman-led adversarial testing
FuzzingLLM FuzzerRandom input generation

5. Agentic AI Governance

5.1 Definition

Agentic AI refers to AI systems that can:

  • Perceive their environment
  • Make decisions autonomously
  • Take actions in the real world
  • Use tools and APIs
  • Coordinate with other agents

5.2 Agentic AI Risk Categories

Risk CategoryDescription
Goal HijackingAdversary manipulates agent's objectives
Memory PoisoningCorrupting agent's context/memory
Resource ExhaustionAgent consumes excessive resources
Excessive AgencyAgent takes unauthorized actions
Cascade FailuresMulti-agent errors propagate
Tool MisuseImproper use of integrated tools

5.3 Mandatory Agentic Controls

ControlRequirementTier
Action BoundariesExplicit whitelist of permitted actionsAll
Approval GatesHuman approval for sensitive actionsMedium+
SandboxingIsolated test environmentAll
Audit TrailLog all actions with full contextAll
Kill SwitchImmediate halt mechanism (tested monthly)All
Rate LimitsMaximum actions per time periodAll
Timeout LimitsMaximum execution timeAll
Rollback CapabilityAbility to undo agent actionsHigh+

5.4 Multi-Agent System Controls

ControlRequirement
Agent RegistrationAll agents must have unique identity
Communication LoggingLog all inter-agent messages
Cascade PreventionCircuit breakers between agents
Orchestrator OversightCentral coordination point
Consensus RequirementsMulti-agent decisions require consensus (configurable)
IsolationAgents cannot modify other agents

5.5 Tool Use Controls

ControlImplementation
Tool WhitelistOnly approved tools accessible
Parameter ValidationValidate all tool inputs
Output SanitizationSanitize tool outputs before use
Access ScopingMinimum necessary permissions
Credential ManagementNo persistent credentials in agent memory
Tool Call LoggingComplete audit of tool invocations

6. User Interaction Standards

6.1 Human-in-the-Loop Requirements

Risk LevelRequirement
LowAI output may be used directly
MediumHuman review recommended
HighHuman approval required before action
CriticalDual human approval required

6.2 Code Generation Standards

RequirementStandard
Review MandateAll AI-generated code must be human-reviewed
Testing RequiredAI code must pass unit tests before merge
Security ScanAutomated security scanning required
DocumentationAI-generated sections must be documented
No Direct CommitAI cannot commit directly to production branches

6.3 Content Generation Standards

RequirementStandard
External PublicationHuman editor review required
Factual ClaimsMust be verified against authoritative sources
WatermarkingApply C2PA or similar where feasible
DisclosureLabel AI-generated content internally
Deepfake ProhibitionNo synthetic media of real persons without consent

7. Vendor & Procurement Requirements

7.1 GenAI Vendor Due Diligence

RequirementStandard
IP IndemnificationRequired for enterprise GenAI vendors
Zero Data RetentionOur data not used for training
Data Processing AgreementGDPR/CCPA compliant DPA required
SOC 2 / ISO 27001Security certification required
Incident Notification24-hour notification requirement
Model TransparencyDocumentation on capabilities and limitations

7.2 GPAI Provider Requirements (EU AI Act)

For providers of General-Purpose AI models:

RequirementObligation
Technical DocumentationTraining process, evaluation results, limitations
Transparency ReportCapabilities, intended uses, known risks
Training Data SummaryPublished summary of training content
Copyright PolicyDocumentation of copyright compliance
EU AI Act ComplianceDemonstrated compliance with Articles 50-55

7.3 Systemic Risk GPAI Additional Requirements

For GPAI models with systemic risk (≥10²⁵ FLOPS):

RequirementObligation
Model EvaluationComprehensive capability assessment
Red TeamingAdversarial testing by qualified team
Risk AssessmentSystemic risk identification and mitigation
Incident TrackingSerious incident monitoring and reporting
CybersecurityAdequate protection measures

8. Specific Prohibitions for GenAI

The following uses are explicitly prohibited without AI Governance Board waiver:

ProhibitionRationale
Automated execution without approvalRisk of unintended consequences
Medical/legal advice without professional reviewLiability and harm risk
Private key/password generationPoor entropy, security risk
Autonomous hiring/firing decisionsDiscrimination risk, legal requirement
Real-time customer decisions without human oversightFairness and accuracy concerns
Synthetic media of real personsPrivacy, consent, deepfake concerns
Direct database writes without validationData integrity risk
Unlimited agent action scopesExcessive agency risk

9. Monitoring & Incident Response

9.1 GenAI-Specific Monitoring

MetricAlert ThresholdResponse
Hallucination rate>5%Review, retrain, or disable
Toxicity rate>1%Immediate investigation
Prompt injection attemptsAny detectedSecurity investigation
User complaints>2% of sessionsUX and safety review
Cost anomalies>2x baselineRate limit or disable
Latency degradation>3x baselinePerformance investigation

9.2 Incident Classification

SeverityExamplesResponse Time
CriticalData breach, safety harm, regulatory violationImmediate (1 hour)
HighSignificant hallucination, bias incident, jailbreak success4 hours
MediumMinor inaccuracy, user complaints, performance issue24 hours
LowEdge case behavior, minor UX issues72 hours

9.3 Logging Requirements

DataRetentionAccess
Full prompts & responses (High-Risk)90 daysAudit, Security
Summarized interactions (Medium)30 daysOperations
Metadata only (Low)7 daysAnalytics
Security events1 yearSecurity, Audit

10. EU AI Act GPAI Compliance Checklist

10.1 Standard GPAI Obligations (All GPAI Models)

ObligationCompleteEvidence Location
Technical documentation maintained[ ]
Information to downstream providers[ ]
Copyright compliance policy[ ]
Training data summary published[ ]

10.2 Systemic Risk GPAI Obligations (≥10²⁵ FLOPS)

ObligationCompleteEvidence Location
Model evaluation performed[ ]
Adversarial red teaming completed[ ]
Systemic risk assessment documented[ ]
Risk mitigation measures implemented[ ]
Serious incident tracking established[ ]
Cybersecurity measures verified[ ]
AI Office notification submitted[ ]

11. Document History

VersionDateAuthorChanges
1.02025-06-15AI Governance OfficeInitial release
2.02026-01-15AI Governance OfficeAdded agentic AI controls, EU AI Act GPAI requirements, multi-agent governance
2.12026-01-16AI Governance OfficeAdded Section 4: Adversarial ML Defense Framework (OWASP LLM Top 10, MITRE ATLAS alignment)

Next Step: Proceed to Artifact 10: Executive Summary


CODITECT AI Risk Management Framework

Document ID: AI-RMF-09 | Version: 2.0.0 | Status: Active


AZ1.AI Inc. | CODITECT Platform

Framework Alignment: NIST AI RMF 2.0 | EU AI Act | ISO/IEC 42001


This document is part of the CODITECT AI Risk Management Framework. For questions or updates, contact the AI Governance Office.

Repository: coditect-ai-risk-management-framework Last Updated: 2026-01-15 Owner: AZ1.AI Inc. | Lead: Hal Casteel