Judge Persona Quick Reference Card
The 4-Step Persona Creation Process
Step 1: EXTRACT Dimensions from Domain Documents
| Source Document | Stakeholder | Dimension | Evidence |
|---|---|---|---|
| HIPAA Security Rule | Compliance Officer | Data Protection | "§164.312(a)(1)" |
| FDA 21 CFR Part 11 | Audit Specialist | Traceability | "§11.10(e)" |
| OWASP Top 10 | Security Engineer | Vulnerability Prevention | "A03:2021" |
Extraction Prompt:

```text
Given this regulatory document: [DOCUMENT]

Identify:
1. Key stakeholder roles who would evaluate compliance
2. Their primary evaluation dimensions
3. Specific evidence/requirements they would check

Output as: (Stakeholder, Dimension, Evidence Citation)
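The model's tuple output can be parsed with a few lines of Python. This is a minimal sketch, assuming the response contains one `(Stakeholder, Dimension, Evidence Citation)` tuple per line; the names here are illustrative, not a fixed API:

```python
from typing import NamedTuple

class Extraction(NamedTuple):
    stakeholder: str
    dimension: str
    evidence: str

def parse_extractions(llm_output: str) -> list[Extraction]:
    """Parse lines shaped like '(Stakeholder, Dimension, Evidence Citation)'."""
    results = []
    for line in llm_output.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            # Split on the first two commas only: citations like
            # "§164.312(a)(1)" may contain parentheses of their own.
            parts = line[1:-1].split(",", 2)
            if len(parts) == 3:
                results.append(Extraction(*(p.strip() for p in parts)))
    return results
```

Feeding it the HIPAA row above, `parse_extractions("(Compliance Officer, Data Protection, §164.312(a)(1))")` yields a single `Extraction` record.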
Step 2: CONSTRUCT the 5-Attribute Persona
| Attribute | Purpose | Example |
|---|---|---|
| Demographic | Establish credibility | "Dr. Sarah Chen, CISSP, 15 years security" |
| Evaluative Dimension | Focus area | "Security Vulnerability Assessment" |
| Domain Expertise | Knowledge scope | "OWASP, HIPAA technical safeguards" |
| Psychological Traits | Evaluation style | "Low risk tolerance, high detail" |
| Social Role | Organizational position | "Guardian of system integrity" |
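The five attributes map naturally onto a small dataclass. A sketch, with the `to_system_prompt` wording purely illustrative (adapt the template to your prompt style):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JudgePersona:
    demographic: str            # establishes credibility
    evaluative_dimension: str   # focus area
    domain_expertise: str       # knowledge scope
    psychological_traits: str   # evaluation style
    social_role: str            # organizational position

    def to_system_prompt(self) -> str:
        """Render the persona as a judge system prompt (illustrative wording)."""
        return (
            f"You are {self.demographic}, acting as {self.social_role}. "
            f"You evaluate {self.evaluative_dimension}, drawing on expertise in "
            f"{self.domain_expertise}. Your evaluation style: {self.psychological_traits}."
        )
```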
Step 3: DESIGN the Question-Specific Rubric
Rubric Structure:
1. CONTEXT: What are we evaluating?
2. DIMENSIONS: 3-5 evaluation criteria with weights
3. SCALE: 5-point with concrete examples
4. CHAIN-OF-THOUGHT: Step-by-step evaluation instructions
5. OUTPUT: JSON schema for structured response
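For the OUTPUT step, one possible JSON Schema for the judge's structured response — field names are illustrative assumptions, not a required contract:

```python
# Illustrative JSON Schema for a structured judge response (5-point scale).
RUBRIC_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},  # chain-of-thought summary
        "dimension_scores": {             # one 1-5 score per rubric dimension
            "type": "object",
            "additionalProperties": {"type": "integer", "minimum": 1, "maximum": 5},
        },
        "overall_score": {"type": "integer", "minimum": 1, "maximum": 5},
        "evidence_citations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["reasoning", "dimension_scores", "overall_score"],
}
```

Requiring `reasoning` before the scores nudges the judge into the chain-of-thought order the rubric prescribes.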
Scoring Best Practices:
- ✅ Binary (Pass/Fail) - Most reliable
- ✅ 3-point (Excellent/Acceptable/Poor) - Good balance
- ✅ 5-point with clear rubric - Acceptable
- ❌ 10+ point scales - Too granular for LLMs
Step 4: DIVERSIFY the Judge Panel
Model Family Diversity (Required):
- Judge 1: Claude family
- Judge 2: GPT family
- Judge 3: DeepSeek/Mistral family
Perspective Diversity:
- Skeptic (finds issues)
- Advocate (seeks approval)
- Neutral (balanced)
Expertise Diversity:
- Security specialist
- Compliance specialist
- Quality specialist
- Domain specialist
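The three diversity axes can be encoded as a small panel config and sanity-checked before a run. A sketch with illustrative family names and keys:

```python
PANEL = [
    {"family": "claude",   "perspective": "skeptic",  "expertise": "security"},
    {"family": "gpt",      "perspective": "advocate", "expertise": "compliance"},
    {"family": "deepseek", "perspective": "neutral",  "expertise": "quality"},
]

def is_diverse(panel: list[dict]) -> bool:
    """Require at least three model families and all three perspectives."""
    families = {j["family"] for j in panel}
    perspectives = {j["perspective"] for j in panel}
    return len(families) >= 3 and {"skeptic", "advocate", "neutral"} <= perspectives
```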
Core Judge Personas for Coditect
| Persona | Primary Focus | Key Question |
|---|---|---|
| Security Architect | Vulnerabilities | "Can this be exploited?" |
| Compliance Officer | Regulatory adherence | "Will this pass audit?" |
| Quality Engineer | Code correctness | "Does this work correctly?" |
| Domain Expert | Business logic | "Is this clinically accurate?" |
| Ethics Judge | Safety/harm | "Could this hurt someone?" |
Voting Mechanisms
For Binary Decisions:

```python
# 2/3 supermajority of boolean votes required for approval
# (compare against 2/3 exactly: a 0.67 cutoff would wrongly
# reject a 2-of-3 panel, since 2/3 ≈ 0.6667 < 0.67)
approved = (sum(votes) / len(votes)) >= 2 / 3
```

For Scoring:

```python
# Weighted average of judge scores, with a single-judge veto:
# any score <= 1 flags a critical failure regardless of the mean
total_weight = sum(judge_weights.values())
score = sum(s * judge_weights[j] for j, s in judge_scores.items()) / total_weight
has_veto = any(s <= 1 for s in judge_scores.values())
```
Debate Triggers
Trigger debate when:
- Score variance > 1.0 point
- Any judge gives score ≤ 1 (critical failure)
- Pass/fail split among judges
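The three triggers combine into one check. This sketch reads "variance" as the population variance of the scores (a max-minus-min spread is an equally valid reading):

```python
import statistics

def should_debate(scores: dict[str, float], passes: dict[str, bool]) -> bool:
    """Trigger a judge debate on high disagreement, a veto-level score, or a split verdict."""
    variance_high = statistics.pvariance(scores.values()) > 1.0
    critical_failure = any(s <= 1 for s in scores.values())
    split_verdict = len(set(passes.values())) > 1
    return variance_high or critical_failure or split_verdict
```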
Anti-Bias Checklist
- Randomize presentation order
- Use diverse model families
- Require evidence citations
- Independent first assessment (blind to others)
- Add debiasing instructions to prompts
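A minimal sketch of the first checklist item: give each judge an independent shuffle of the candidates, seeded per judge so runs stay reproducible (the string-seed scheme is an assumption, not a prescribed method):

```python
import random

def presentation_order(candidates: list[str], judge_id: str) -> list[str]:
    """Per-judge reproducible shuffle so ordering bias averages out across the panel."""
    rng = random.Random(f"order-{judge_id}")  # string seeds are deterministic in CPython
    order = list(candidates)
    rng.shuffle(order)
    return order
```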
Key Research Sources
- MAJ-EVAL (Chen et al. 2025) - Automatic persona extraction
- PoLL (Verga et al. 2024) - Diverse panel reduces bias 40-60%
- G-Eval (Liu et al. 2023) - Chain-of-thought evaluation
- Rubric Is All You Need (2025) - Question-specific rubrics
Keep this card handy when designing new judge personas.