Council Review Skill

When to Use This Skill

Use this skill when implementing council review patterns in your codebase.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

This skill provides a multi-agent code review pattern with anonymized peer evaluation, consensus scoring, and structured verdicts. It is based on the LLM Council pattern, adapted for enterprise compliance requirements.

Overview

The Council Review pattern implements a 3-stage review pipeline:

  1. Stage 1: Parallel Specialized Review - Multiple domain experts review independently
  2. Stage 2: Anonymous Cross-Evaluation - Reviewers rank each other's work (identities hidden)
  3. Stage 3: Chairman Synthesis - Final verdict with merge decision
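The three stages above can be sketched end to end. This is an illustrative outline, not the skill's actual implementation: `run_review` and `rank_peers` are stand-in stubs, and the final verdict is hard-coded where the real pipeline would apply the decision thresholds.

```python
import random

def run_review(reviewer: str, artifact: str) -> str:
    """Stage 1 stub: a specialist produces findings for the artifact."""
    return f"{reviewer} findings for {artifact}"

def rank_peers(ranker_label: str, anonymized: dict) -> list:
    """Stage 2 stub: rank the other labels (here, alphabetically)."""
    return sorted(label for label in anonymized if label != ranker_label)

def council_review(artifact: str, reviewers: list) -> dict:
    # Stage 1: independent specialized reviews (parallel in a real pipeline)
    reviews = {r: run_review(r, artifact) for r in reviewers}

    # Stage 2: anonymize identities, then collect peer rankings
    labels = random.sample(
        ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon'][:len(reviews)], len(reviews)
    )
    label_for = dict(zip(reviews, labels))
    anonymized = {label_for[r]: body for r, body in reviews.items()}
    rankings = {lbl: rank_peers(lbl, anonymized) for lbl in anonymized}

    # Stage 3: chairman synthesizes a verdict from de-anonymized context
    return {
        "decision": "approve",  # stub: real logic applies the decision thresholds
        "rankings": rankings,
        "label_mapping": {v: k for k, v in label_for.items()},
    }
```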

Key Differentiators

| Feature | Standard Review | Council Review |
| --- | --- | --- |
| Reviewers | Single agent | 4-6 specialized agents |
| Bias Prevention | None | Anonymous peer ranking |
| Consensus Signal | None | Kendall's W coefficient |
| Decision Output | Comments | Structured verdict (approve/changes/reject) |
| Audit Trail | Basic logs | Hash-chained compliance records |

Core Components

1. Anonymization Pattern

```python
import random
from typing import Dict, Tuple

# Prevent model bias by hiding reviewer identities during cross-evaluation
NEUTRAL_LABELS = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']

def anonymize_reviews(reviews: Dict[str, "ReviewResult"]) -> Tuple[Dict, Dict]:
    """
    Convert reviewer IDs to neutral labels.

    Returns:
        anonymized_reviews: Dict[label, content]
        label_mapping: Dict[label, original_reviewer_id]
    """
    # Sampling all elements of the sliced list yields a random permutation
    shuffled_labels = random.sample(NEUTRAL_LABELS[:len(reviews)], len(reviews))
    label_mapping = {}
    anonymized = {}

    for (reviewer_id, review), label in zip(reviews.items(), shuffled_labels):
        label_mapping[label] = reviewer_id
        anonymized[label] = sanitize_content(review)  # strip identifying details

    return anonymized, label_mapping
```

`ReviewResult` and `sanitize_content` are defined elsewhere in the skill's implementation.

2. Ranking Aggregation

```python
from collections import defaultdict
from typing import Dict, List

def compute_consensus(rankings: Dict[str, List[str]]) -> float:
    """
    Compute Kendall's W (coefficient of concordance).

    Returns:
        0.0-1.0, where 1.0 = perfect agreement among rankers
    """
    n_items = len(next(iter(rankings.values())))
    n_rankers = len(rankings)

    # Sum each item's position across all rankings
    rank_sums = defaultdict(int)
    for ranker, ordered_labels in rankings.items():
        for position, label in enumerate(ordered_labels, start=1):
            rank_sums[label] += position

    # W = S / max(S), where S is the sum of squared deviations of the rank sums
    mean_rank_sum = sum(rank_sums.values()) / n_items
    ss = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums.values())
    max_ss = (n_rankers ** 2 * (n_items ** 3 - n_items)) / 12

    return ss / max_ss if max_ss > 0 else 1.0
```
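As a sanity check, two rankers in perfect agreement over three items should yield W = 1.0. The computation can be done by hand with the same formula:

```python
from collections import defaultdict

# Worked example: two rankers who agree perfectly give W = 1.0
rankings = {"r1": ["Alpha", "Beta", "Gamma"], "r2": ["Alpha", "Beta", "Gamma"]}
n_rankers, n_items = len(rankings), 3

rank_sums = defaultdict(int)
for ordered in rankings.values():
    for position, label in enumerate(ordered, start=1):
        rank_sums[label] += position  # Alpha=2, Beta=4, Gamma=6

mean = sum(rank_sums.values()) / n_items                    # 4.0
ss = sum((rs - mean) ** 2 for rs in rank_sums.values())     # 8.0
max_ss = (n_rankers ** 2 * (n_items ** 3 - n_items)) / 12   # 8.0
print(ss / max_ss)  # 1.0
```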

3. Chairman Verdict Structure

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CouncilVerdict:
    """Structured output from council review."""
    decision: str                        # "approve" | "request_changes" | "reject"
    aggregate_score: float               # 0.0-1.0
    blocking_findings: List["Finding"]   # Finding is defined elsewhere
    consensus_level: float               # Kendall's W
    chairman_synthesis: str
    audit_hash: str                      # SHA-256 chain for compliance

    def to_ci_status(self) -> str:
        """Convert to a GitHub/GitLab CI status."""
        return {
            "approve": "success",
            "request_changes": "pending",
            "reject": "failure",
        }[self.decision]
```

Invocation Patterns

Via Task Tool

```python
# Full council review
Task(
    subagent_type="council-orchestrator",
    prompt="""
    Execute council review on src/auth/jwt_handler.rs with:
    - Security reviewer focus on OWASP Top 10
    - Compliance reviewer focus on HIPAA
    - Performance reviewer focus on latency
    Require consensus >= 0.6 for approval
    """
)
```

Via Command

```
/council-review src/api/handlers.rs --reviewers security,compliance,performance --threshold 0.7
```

Reviewer Types

SecurityReviewer

Focus: Injection vulnerabilities, authentication issues, data exposure, cryptographic weaknesses

```yaml
evaluation_rubric:
  injection_vulnerabilities: 0.25
  authentication_issues: 0.20
  data_exposure: 0.20
  cryptographic_weaknesses: 0.15
  input_validation: 0.10
  dependency_risks: 0.10
```
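The rubric weights sum to 1.0, so a reviewer's overall score can be computed as a weighted sum of per-criterion scores. A minimal sketch, assuming each criterion is scored 0.0-1.0 (the helper name and scoring scale are illustrative, not part of the skill's API):

```python
SECURITY_RUBRIC = {
    "injection_vulnerabilities": 0.25,
    "authentication_issues": 0.20,
    "data_exposure": 0.20,
    "cryptographic_weaknesses": 0.15,
    "input_validation": 0.10,
    "dependency_risks": 0.10,
}

def weighted_score(criterion_scores: dict, rubric: dict) -> float:
    """Combine per-criterion scores (0.0-1.0) using the rubric weights."""
    assert abs(sum(rubric.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    return sum(rubric[c] * criterion_scores.get(c, 0.0) for c in rubric)

# e.g. strong everywhere except input validation
scores = {c: 1.0 for c in SECURITY_RUBRIC}
scores["input_validation"] = 0.5
print(round(weighted_score(scores, SECURITY_RUBRIC), 2))  # 0.95
```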

ComplianceReviewer

Focus: Data handling, audit logging, access control, encryption, retention

```yaml
evaluation_rubric:
  data_handling: 0.30
  audit_logging: 0.25
  access_control: 0.20
  encryption: 0.15
  retention: 0.10

frameworks:
  - HIPAA
  - SOC2
  - FDA_21_CFR_Part_11
```

PerformanceReviewer

Focus: Algorithmic complexity, memory usage, I/O efficiency, concurrency

```yaml
evaluation_rubric:
  algorithmic_complexity: 0.30
  memory_usage: 0.25
  io_efficiency: 0.20
  concurrency: 0.15
  caching: 0.10
```

TestCoverageReviewer

Focus: Test completeness, edge cases, mock quality, integration testing

```yaml
evaluation_rubric:
  coverage_percentage: 0.30
  edge_case_handling: 0.25
  mock_quality: 0.20
  integration_tests: 0.15
  test_maintainability: 0.10
```

Pipeline Workflow

```
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: Parallel Specialized Reviews                       │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌─────────┐         │
│ │ Security │ │Compliance│ │Performance│ │ TestCov │         │
│ └────┬─────┘ └────┬─────┘ └─────┬─────┘ └────┬────┘         │
│      │            │             │            │              │
│      ▼            ▼             ▼            ▼              │
│  Findings     Findings      Findings     Findings           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 2: Anonymous Cross-Evaluation                         │
│ Each reviewer ranks the others (Alpha, Beta, Gamma, Delta)  │
│ - Identities hidden to prevent model favoritism             │
│ - Rankings aggregated via Kendall's W                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 3: Chairman Synthesis                                 │
│ - Sees de-anonymized context                                │
│ - Applies hard thresholds                                   │
│ - Generates structured verdict                              │
│ - Signs audit record                                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ Output: CouncilVerdict                                      │
│ - decision: approve | request_changes | reject              │
│ - aggregate_score: 0.0-1.0                                  │
│ - consensus_level: Kendall's W                              │
│ - audit_hash: SHA-256 chain                                 │
└─────────────────────────────────────────────────────────────┘
```

Decision Thresholds

| Condition | Action |
| --- | --- |
| Any CRITICAL finding | REJECT |
| >3 HIGH findings | REQUEST_CHANGES |
| Aggregate score < 0.7 | REQUEST_CHANGES |
| Consensus < 0.5 + blocking findings | FLAG FOR HUMAN REVIEW |
| All pass + consensus >= 0.7 | APPROVE |
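The threshold table can be read as a short decision function. This is a sketch: the finding shape (`{"severity": ...}`) is an assumption for illustration, and the precedence between "any CRITICAL finding" and "low consensus on blocking findings" is an interpretation that escalates to a human before auto-rejecting.

```python
def decide(findings: list, aggregate_score: float, consensus: float) -> str:
    critical = [f for f in findings if f["severity"] == "CRITICAL"]
    high = [f for f in findings if f["severity"] == "HIGH"]
    blocking = critical  # blocking findings are the CRITICAL ones here

    if blocking and consensus < 0.5:
        return "flag_for_human_review"  # reviewers disagree on a blocker
    if critical:
        return "reject"
    if len(high) > 3 or aggregate_score < 0.7:
        return "request_changes"
    if consensus >= 0.7:
        return "approve"
    return "request_changes"  # clean, but consensus only in [0.5, 0.7)
```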

Compliance Integration

Audit Trail Structure

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class ComplianceAuditRecord:
    timestamp: datetime
    artifact_hash: str             # SHA-256 of the reviewed code
    stage1_hashes: Dict[str, str]  # reviewer -> response hash
    stage2_hashes: Dict[str, str]  # reviewer -> ranking hash
    chairman_hash: str
    verdict_hash: str
    chain_hash: str                # SHA-256 over all of the above

    def verify(self) -> bool:
        """Verify hash chain integrity for auditors."""
        expected = compute_chain_hash(
            self.artifact_hash,
            self.stage1_hashes,
            self.stage2_hashes,
            self.chairman_hash,
            self.verdict_hash,
        )
        return expected == self.chain_hash
```
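`compute_chain_hash` could be implemented as SHA-256 over a canonical serialization of the component hashes. A minimal sketch, assuming JSON with sorted keys as the canonical form (the real scheme may differ):

```python
import hashlib
import json

def compute_chain_hash(artifact_hash, stage1_hashes, stage2_hashes,
                       chairman_hash, verdict_hash) -> str:
    payload = json.dumps(
        {
            "artifact": artifact_hash,
            "stage1": stage1_hashes,
            "stage2": stage2_hashes,
            "chairman": chairman_hash,
            "verdict": verdict_hash,
        },
        sort_keys=True,  # canonical ordering so the hash is reproducible
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Any change to any component changes the chain hash
h1 = compute_chain_hash("a", {"sec": "x"}, {"sec": "y"}, "c", "v")
h2 = compute_chain_hash("a", {"sec": "x"}, {"sec": "y"}, "c", "v2")
assert h1 != h2 and len(h1) == 64
```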

Supported Frameworks

  • FDA 21 CFR Part 11 - Electronic signatures and audit trails
  • HIPAA - PHI handling and access controls
  • SOC2 - Security and availability controls
  • GDPR - Data protection compliance

Token Economics

| Stage | Tokens | Est. Cost |
| --- | --- | --- |
| Stage 1 (4 reviewers) | ~12,000 | ~$0.036 |
| Stage 2 (4 cross-evals) | ~8,000 | ~$0.024 |
| Stage 3 (chairman) | ~5,000 | ~$0.015 |
| Total per file | ~25,000 | ~$0.075 |

For a typical PR (10 files): ~$0.75

Related Resources

  • council-orchestrator agent - Coordinates the 3-stage pipeline
  • council-chairman agent - Synthesizes final verdicts
  • /council-review command - Entry point for council reviews
  • uncertainty-orchestrator agent - Alternative for research (not code review)

Version History

| Version | Date | Changes |
| --- | --- | --- |
| 1.0.0 | 2025-12-20 | Initial implementation based on LLM Council pattern |

Origin: Adapted from Andrej Karpathy's LLM Council pattern with enterprise hardening for CODITECT compliance requirements.

Last Updated: 2025-12-20


Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: council-review

Completed:
- [x] Stage 1: {REVIEWER_COUNT} specialized reviews completed
- [x] Stage 2: Anonymous cross-evaluation rankings aggregated
- [x] Stage 3: Chairman synthesis and verdict generated
- [x] Audit trail hash chain created
- [x] Compliance report generated

Council Decision: {APPROVE|REQUEST_CHANGES|REJECT}

Verdict Details:
- Aggregate score: {SCORE} (0.0-1.0)
- Consensus level: {KENDALLS_W} (Kendall's W)
- Blocking findings: {BLOCKING_COUNT}
- Critical issues: {CRITICAL_COUNT}
- High issues: {HIGH_COUNT}

Reviewers:
- {REVIEWER_1}: Score {SCORE_1}/100
- {REVIEWER_2}: Score {SCORE_2}/100
- {REVIEWER_3}: Score {SCORE_3}/100
- {REVIEWER_4}: Score {SCORE_4}/100

Outputs:
- Council verdict: council-verdict-{TIMESTAMP}.json
- Audit record: audit-trail-{HASH}.json
- Compliance report: compliance-report-{TIMESTAMP}.md

Completion Checklist

Before marking this skill as complete, verify:

  • At least 3 specialized reviewers participated
  • All reviewers completed Stage 1 independent reviews
  • Identities anonymized in Stage 2 (Alpha, Beta, Gamma, Delta)
  • Cross-evaluation rankings collected from each reviewer
  • Kendall's W coefficient calculated (consensus metric)
  • Chairman reviewed de-anonymized context
  • Blocking findings identified (CRITICAL severity)
  • Decision threshold logic applied correctly
  • Structured verdict generated with all fields
  • Audit trail hash chain created and verified
  • CouncilVerdict JSON serialized to file
  • Compliance report generated (if required)

Failure Indicators

This skill has FAILED if:

  • ❌ Fewer than 3 reviewers participated (insufficient diversity)
  • ❌ Reviewer identities leaked during Stage 2 (bias risk)
  • ❌ Kendall's W not calculated (no consensus metric)
  • ❌ Blocking findings not surfaced in verdict
  • ❌ Decision threshold logic bypassed
  • ❌ Verdict missing required fields (decision, score, consensus)
  • ❌ Audit hash chain verification fails
  • ❌ Compliance requirements not met (HIPAA, SOC2, etc.)
  • ❌ Chairman synthesis contradicts reviewer consensus
  • ❌ Token cost exceeds budget (>30,000 tokens/file)

When NOT to Use

Do NOT use this skill when:

  • Simple code review - Use standard PR review for minor changes
  • Urgent hotfixes - Council review takes 15-30 minutes, too slow for emergencies
  • Non-critical code - Reserve for security-sensitive, compliance-required, or architectural changes
  • Low compliance requirements - Use single-agent review for internal tools
  • Budget constraints - Council review costs ~$0.75 per file (4-6 reviewers)
  • Early prototyping - Overkill for experimental code
  • Documentation-only PRs - Use doc review agent instead

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Using for every PR | Noise, review fatigue, high cost | Reserve for critical paths (auth, payments, PHI) |
| Skipping anonymization | Model bias towards certain agents | Always anonymize in Stage 2 |
| Ignoring low consensus | Reviewers disagree, verdict unreliable | Flag for human review when W < 0.5 |
| Auto-approving high scores | Score doesn't capture blocking issues | Check blocking findings first |
| No audit trail | Compliance failure, no record | Always generate hash chain |
| Chairman overriding consensus | Defeats purpose of council | Chairman synthesizes, doesn't override |
| Same reviewers every time | Stale perspectives | Rotate reviewers based on code domain |

Principles

This skill embodies:

  • #5 Eliminate Ambiguity - Structured verdict with clear decision criteria
  • #6 Clear, Understandable, Explainable - Chairman synthesis explains reasoning
  • #8 No Assumptions - Multiple reviewers catch different issues
  • #10 Security First - Security reviewer mandatory for auth/crypto code
  • #11 Quality is Non-Negotiable - Multi-agent review ensures thorough analysis
  • #12 Transparent Documentation - Audit trail provides complete record
  • #15 Compliance is Essential - Built for FDA 21 CFR Part 11, HIPAA, SOC2

Full Standard: CODITECT-STANDARD-AUTOMATION.md