Judge Persona Quick Reference Card

The 4-Step Persona Creation Process

Step 1: EXTRACT Dimensions from Domain Documents

Source Documents → Stakeholders → Dimensions → Evidence

HIPAA Security Rule → Compliance Officer → Data Protection → "§164.312(a)(1)"
FDA 21 CFR Part 11 → Audit Specialist → Traceability → "§11.10(e)"
OWASP Top 10 → Security Engineer → Vulnerability Prevention → "A03:2021"

Extraction Prompt:

Given this regulatory document: [DOCUMENT]

Identify:
1. Key stakeholder roles who would evaluate compliance
2. Their primary evaluation dimensions
3. Specific evidence/requirements they would check

Output as: (Stakeholder, Dimension, Evidence Citation)
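A minimal sketch of wiring this prompt up in Python. The LLM call itself is omitted; `parse_triples` and its tuple-per-line parsing are illustrative assumptions, not a fixed output contract.

```python
import re

# Template mirroring the extraction prompt above; {document} is filled per source doc
EXTRACTION_PROMPT = """Given this regulatory document: {document}

Identify:
1. Key stakeholder roles who would evaluate compliance
2. Their primary evaluation dimensions
3. Specific evidence/requirements they would check

Output as: (Stakeholder, Dimension, Evidence Citation)"""

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Pull (Stakeholder, Dimension, Evidence) tuples from a reply,
    assuming one parenthesized triple per line."""
    triples = []
    for line in llm_output.splitlines():
        m = re.match(r"\s*\(([^,]+),\s*([^,]+),\s*(.+)\)\s*$", line)
        if m:
            triples.append(tuple(s.strip() for s in m.groups()))
    return triples
```

The evidence field is matched greedily so citations containing parentheses, such as §164.312(a)(1), survive intact.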

Step 2: CONSTRUCT the 5-Attribute Persona

Attribute            | Purpose                 | Example
Demographic          | Establish credibility   | "Dr. Sarah Chen, CISSP, 15 years security"
Evaluative Dimension | Focus area              | "Security Vulnerability Assessment"
Domain Expertise     | Knowledge scope         | "OWASP, HIPAA technical safeguards"
Psychological Traits | Evaluation style        | "Low risk tolerance, high detail"
Social Role          | Organizational position | "Guardian of system integrity"
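The five attributes map naturally onto a small data structure. This sketch (the class name and prompt wording are illustrative assumptions) turns a persona into a reusable system prompt:

```python
from dataclasses import dataclass

@dataclass
class JudgePersona:
    demographic: str           # establishes credibility
    evaluative_dimension: str  # focus area
    domain_expertise: str      # knowledge scope
    psychological_traits: str  # evaluation style
    social_role: str           # organizational position

    def to_system_prompt(self) -> str:
        """Render the five attributes as a judge system prompt."""
        return (
            f"You are {self.demographic}, acting as {self.social_role}. "
            f"You evaluate {self.evaluative_dimension}, drawing on {self.domain_expertise}. "
            f"Your evaluation style: {self.psychological_traits}."
        )

security_judge = JudgePersona(
    demographic="Dr. Sarah Chen, CISSP, 15 years security",
    evaluative_dimension="Security Vulnerability Assessment",
    domain_expertise="OWASP, HIPAA technical safeguards",
    psychological_traits="low risk tolerance, high attention to detail",
    social_role="guardian of system integrity",
)
```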

Step 3: DESIGN the Question-Specific Rubric

Rubric Structure:

1. CONTEXT: What are we evaluating?
2. DIMENSIONS: 3-5 evaluation criteria with weights
3. SCALE: 5-point with concrete examples
4. CHAIN-OF-THOUGHT: Step-by-step evaluation instructions
5. OUTPUT: JSON schema for structured response
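The five rubric parts can be expressed as a single structure passed to the judge. This is one possible shape, not a prescribed schema; the context, dimension names, and weights are placeholder values:

```python
rubric = {
    "context": "Evaluate a generated code patch for HIPAA-compliant audit logging",
    "dimensions": [  # 3-5 weighted criteria
        {"name": "security", "weight": 0.40},
        {"name": "traceability", "weight": 0.35},
        {"name": "code_quality", "weight": 0.25},
    ],
    "scale": {  # 5-point scale anchored with concrete examples
        5: "exemplary: all safeguards present, cited against requirements",
        3: "acceptable: minor gaps, no regulatory exposure",
        1: "critical failure: exploitable or non-auditable",
    },
    "chain_of_thought": [
        "Restate what the artifact is supposed to do.",
        "Check each dimension against the evidence citations.",
        "Assign a per-dimension score, then justify the overall score.",
    ],
    "output_schema": {  # JSON Schema for the structured response
        "type": "object",
        "properties": {
            "scores": {"type": "object"},
            "reasoning": {"type": "string"},
            "overall": {"type": "integer", "minimum": 1, "maximum": 5},
        },
        "required": ["scores", "reasoning", "overall"],
    },
}
```

Keeping weights normalized to 1.0 makes the weighted aggregation in the voting step straightforward.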

Scoring Best Practices:

  • ✅ Binary (Pass/Fail) - Most reliable
  • ✅ 3-point (Excellent/Acceptable/Poor) - Good balance
  • ✅ 5-point with clear rubric - Acceptable
  • ❌ 10+ point scales - Too granular for LLMs

Step 4: DIVERSIFY the Judge Panel

Model Family Diversity (Required):

Judge 1: Claude family
Judge 2: GPT family
Judge 3: DeepSeek/Mistral family

Perspective Diversity:

  • Skeptic (finds issues)
  • Advocate (seeks approval)
  • Neutral (balanced)

Expertise Diversity:

  • Security specialist
  • Compliance specialist
  • Quality specialist
  • Domain specialist
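A panel covering all three diversity axes can be captured as plain configuration. Model names here are placeholders, and `check_diversity` is an illustrative guard, not part of any framework:

```python
# Illustrative panel config: one judge per model family, perspective, and specialty
panel = [
    {"model": "claude-sonnet", "perspective": "skeptic",  "expertise": "security"},
    {"model": "gpt-4o",        "perspective": "advocate", "expertise": "compliance"},
    {"model": "deepseek-chat", "perspective": "neutral",  "expertise": "quality"},
]

def check_diversity(panel: list[dict]) -> bool:
    """Require at least three distinct model families and perspectives."""
    families = {j["model"].split("-")[0] for j in panel}
    perspectives = {j["perspective"] for j in panel}
    return len(families) >= 3 and len(perspectives) >= 3
```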

Core Judge Personas for Coditect

Persona            | Primary Focus        | Key Question
Security Architect | Vulnerabilities      | "Can this be exploited?"
Compliance Officer | Regulatory adherence | "Will this pass audit?"
Quality Engineer   | Code correctness     | "Does this work correctly?"
Domain Expert      | Business logic       | "Is this clinically accurate?"
Ethics Judge       | Safety/harm          | "Could this hurt someone?"

Voting Mechanisms

For Binary Decisions:

# 2/3 supermajority for approval (integer comparison: with the
# threshold written as 0.67, a literal 2-of-3 vote would fail)
approved = 3 * sum(votes) >= 2 * len(votes)

For Scoring:

# Weighted average with veto: any critical score (<= 1) blocks approval
total_weight = sum(judge_weights.values())
score = sum(judge_scores[j] * judge_weights[j] for j in judge_scores) / total_weight
has_veto = any(s <= 1 for s in judge_scores.values())

Debate Triggers

Trigger debate when:

  • Score variance > 1.0 point
  • Any judge gives score ≤ 1 (critical failure)
  • Pass/fail split among judges
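The three triggers combine into one check. This sketch reads "variance > 1.0 point" as score spread (max minus min), which is one reasonable interpretation; the function name and signature are assumptions:

```python
def needs_debate(scores: dict[str, float], passes: dict[str, bool]) -> bool:
    """Trigger a debate round when judges disagree or flag a critical failure."""
    vals = list(scores.values())
    spread = max(vals) - min(vals)         # score disagreement, in points
    critical = any(s <= 1 for s in vals)   # any judge flags a critical failure
    split = len(set(passes.values())) > 1  # judges disagree on pass/fail
    return spread > 1.0 or critical or split
```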

Anti-Bias Checklist

  • Randomize presentation order
  • Use diverse model families
  • Require evidence citations
  • Independent first assessment (blind to others)
  • Add debiasing instructions to prompts

Key Research Sources

  1. MAJ-EVAL (Chen et al. 2025) - Automatic persona extraction
  2. PoLL (Verga et al. 2024) - Diverse panel reduces bias 40-60%
  3. G-Eval (Liu et al. 2023) - Chain-of-thought evaluation
  4. Rubric Is All You Need (2025) - Question-specific rubrics

Keep this card handy when designing new judge personas