Council Review Skill
When to Use This Skill
Use this skill when you need multi-agent code review with anonymized peer evaluation, consensus scoring, and a structured, auditable verdict.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Multi-agent code review pattern with anonymized peer evaluation, consensus scoring, and structured verdicts. Based on the LLM Council pattern adapted for enterprise compliance requirements.
Overview
The Council Review pattern implements a 3-stage review pipeline:
- Stage 1: Parallel Specialized Review - Multiple domain experts review independently
- Stage 2: Anonymous Cross-Evaluation - Reviewers rank each other's work (identities hidden)
- Stage 3: Chairman Synthesis - Final verdict with merge decision
Key Differentiators
| Feature | Standard Review | Council Review |
|---|---|---|
| Reviewers | Single agent | 4-6 specialized agents |
| Bias Prevention | None | Anonymous peer ranking |
| Consensus Signal | None | Kendall's W coefficient |
| Decision Output | Comments | Structured verdict (approve/changes/reject) |
| Audit Trail | Basic logs | Hash-chained compliance records |
Core Components
1. Anonymization Pattern
# Prevent model bias by hiding reviewer identities during cross-evaluation
import random
from typing import Dict, Tuple

NEUTRAL_LABELS = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']

def anonymize_reviews(reviews: Dict[str, ReviewResult]) -> Tuple[Dict, Dict]:
    """
    Convert reviewer IDs to neutral labels.

    Returns:
        anonymized_reviews: Dict[label, content]
        label_mapping: Dict[label, original_reviewer_id]
    """
    # Shuffle the first len(reviews) labels (supports up to 5 reviewers)
    shuffled_labels = random.sample(NEUTRAL_LABELS[:len(reviews)], len(reviews))
    label_mapping = {}
    anonymized = {}
    for (reviewer_id, review), label in zip(reviews.items(), shuffled_labels):
        label_mapping[label] = reviewer_id
        # sanitize_content (defined elsewhere) strips identifying details
        anonymized[label] = sanitize_content(review)
    return anonymized, label_mapping
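A standalone sketch of the anonymization round trip, using plain strings in place of `ReviewResult` objects and omitting the sanitization step (both are assumptions for illustration):

```python
import random

NEUTRAL_LABELS = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']

def anonymize(reviews):
    # Assign a random neutral label to each reviewer
    labels = random.sample(NEUTRAL_LABELS[:len(reviews)], len(reviews))
    mapping = dict(zip(labels, reviews))                    # label -> reviewer_id
    anonymized = {lbl: reviews[rid] for lbl, rid in mapping.items()}
    return anonymized, mapping

reviews = {"security": "finding: hardcoded secret", "compliance": "finding: no audit log"}
anon, mapping = anonymize(reviews)

# The mapping lets the chairman de-anonymize in Stage 3
assert sorted(mapping.values()) == sorted(reviews)
assert all(anon[lbl] == reviews[rid] for lbl, rid in mapping.items())
```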
2. Ranking Aggregation
from collections import defaultdict
from typing import Dict, List

def compute_consensus(rankings: Dict[str, List[str]]) -> float:
    """
    Compute Kendall's W (coefficient of concordance).

    Returns:
        0.0-1.0, where 1.0 = perfect agreement among rankers
    """
    # Sum the rank position each item received across all rankers
    n_items = len(next(iter(rankings.values())))
    n_rankers = len(rankings)
    rank_sums = defaultdict(int)
    for ranker, ordered_labels in rankings.items():
        for position, label in enumerate(ordered_labels, start=1):
            rank_sums[label] += position

    # W = S / max(S): sum of squared deviations of rank sums,
    # normalized by its theoretical maximum
    mean_rank_sum = sum(rank_sums.values()) / n_items
    ss = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums.values())
    max_ss = (n_rankers ** 2 * (n_items ** 3 - n_items)) / 12
    return ss / max_ss if max_ss > 0 else 1.0
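A quick sanity check of the W computation on two toy cases (a standalone copy of the logic above, with made-up ranker names):

```python
from collections import defaultdict

def kendalls_w(rankings):
    # Same computation as compute_consensus above, inlined for a runnable demo
    n_items = len(next(iter(rankings.values())))
    n_rankers = len(rankings)
    rank_sums = defaultdict(int)
    for ordered in rankings.values():
        for pos, label in enumerate(ordered, start=1):
            rank_sums[label] += pos
    mean = sum(rank_sums.values()) / n_items
    ss = sum((rs - mean) ** 2 for rs in rank_sums.values())
    max_ss = (n_rankers ** 2 * (n_items ** 3 - n_items)) / 12
    return ss / max_ss if max_ss > 0 else 1.0

# Three rankers in perfect agreement -> W = 1.0
agree = {r: ["Alpha", "Beta", "Gamma"] for r in ("r1", "r2", "r3")}
# Two rankers in exact reverse order -> W = 0.0
reverse = {"r1": ["Alpha", "Beta", "Gamma"], "r2": ["Gamma", "Beta", "Alpha"]}
print(kendalls_w(agree), kendalls_w(reverse))  # 1.0 0.0
```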
3. Chairman Verdict Structure
from dataclasses import dataclass
from typing import List

@dataclass
class CouncilVerdict:
    """Structured output from council review."""
    decision: str                      # "approve" | "request_changes" | "reject"
    aggregate_score: float             # 0.0-1.0
    blocking_findings: List[Finding]   # Finding is defined elsewhere
    consensus_level: float             # Kendall's W
    chairman_synthesis: str
    audit_hash: str                    # SHA256 chain for compliance

    def to_ci_status(self) -> str:
        """Convert the decision to a GitHub/GitLab CI status string."""
        return {
            "approve": "success",
            "request_changes": "pending",
            "reject": "failure",
        }[self.decision]
Invocation Patterns
Via Task Tool
# Full council review
Task(
    subagent_type="council-orchestrator",
    prompt="""
    Execute council review on src/auth/jwt_handler.rs with:
    - Security reviewer focus on OWASP Top 10
    - Compliance reviewer focus on HIPAA
    - Performance reviewer focus on latency
    Require consensus >= 0.6 for approval
    """
)
Via Command
/council-review src/api/handlers.rs --reviewers security,compliance,performance --threshold 0.7
Reviewer Types
SecurityReviewer
Focus: Injection vulnerabilities, authentication issues, data exposure, cryptographic weaknesses
evaluation_rubric:
  injection_vulnerabilities: 0.25
  authentication_issues: 0.20
  data_exposure: 0.20
  cryptographic_weaknesses: 0.15
  input_validation: 0.10
  dependency_risks: 0.10
ComplianceReviewer
Focus: Data handling, audit logging, access control, encryption, retention
evaluation_rubric:
  data_handling: 0.30
  audit_logging: 0.25
  access_control: 0.20
  encryption: 0.15
  retention: 0.10
frameworks:
  - HIPAA
  - SOC2
  - FDA_21_CFR_Part_11
PerformanceReviewer
Focus: Algorithmic complexity, memory usage, I/O efficiency, concurrency
evaluation_rubric:
  algorithmic_complexity: 0.30
  memory_usage: 0.25
  io_efficiency: 0.20
  concurrency: 0.15
  caching: 0.10
TestCoverageReviewer
Focus: Test completeness, edge cases, mock quality, integration testing
evaluation_rubric:
  coverage_percentage: 0.30
  edge_case_handling: 0.25
  mock_quality: 0.20
  integration_tests: 0.15
  test_maintainability: 0.10
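Each rubric's weights sum to 1.0, so a reviewer's score is a weighted sum of its per-criterion scores. A minimal sketch, using the SecurityReviewer rubric and hypothetical criterion scores:

```python
# SecurityReviewer rubric weights, copied from the rubric above
SECURITY_RUBRIC = {
    "injection_vulnerabilities": 0.25,
    "authentication_issues": 0.20,
    "data_exposure": 0.20,
    "cryptographic_weaknesses": 0.15,
    "input_validation": 0.10,
    "dependency_risks": 0.10,
}

def rubric_score(rubric, criterion_scores):
    # Weights must form a proper distribution
    assert abs(sum(rubric.values()) - 1.0) < 1e-9
    return sum(w * criterion_scores[c] for c, w in rubric.items())

# Hypothetical per-criterion scores: strong overall, weak on injection
scores = {c: 0.9 for c in SECURITY_RUBRIC}
scores["injection_vulnerabilities"] = 0.4
print(round(rubric_score(SECURITY_RUBRIC, scores), 3))  # 0.775
```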
Pipeline Workflow
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: Parallel Specialized Reviews │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Security │ │Compliance│ │Performance│ │TestCov │ │
│ └────┬────┘ └────┬────┘ └────┬─────┘ └────┬───┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Findings Findings Findings Findings │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 2: Anonymous Cross-Evaluation │
│ Each reviewer ranks others (Alpha, Beta, Gamma, Delta) │
│ - Identity hidden to prevent model favoritism │
│ - Rankings aggregated via Kendall's W │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 3: Chairman Synthesis │
│ - Sees de-anonymized context │
│ - Applies hard thresholds │
│ - Generates structured verdict │
│ - Signs audit record │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Output: CouncilVerdict │
│ - decision: approve | request_changes | reject │
│ - aggregate_score: 0.0-1.0 │
│ - consensus_level: Kendall's W │
│ - audit_hash: SHA256 chain │
└─────────────────────────────────────────────────────────────┘
Decision Thresholds
| Condition | Action |
|---|---|
| Any CRITICAL finding | REJECT |
| >3 HIGH findings | REQUEST_CHANGES |
| Aggregate score < 0.7 | REQUEST_CHANGES |
| Consensus < 0.5 + blocking findings | FLAG FOR HUMAN REVIEW |
| All pass + consensus >= 0.7 | APPROVE |
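The threshold table can be sketched as a rule cascade, evaluated top to bottom. The exact precedence when rules overlap, and the fallback for the mid-consensus band (0.5-0.7 with no blocking findings), are assumptions, not spec:

```python
def decide(critical, high, score, consensus, blocking):
    # Rules from the decision-threshold table, checked in order
    if critical > 0:
        return "reject"
    if high > 3:
        return "request_changes"
    if score < 0.7:
        return "request_changes"
    if consensus < 0.5 and blocking > 0:
        return "flag_for_human_review"
    if consensus >= 0.7:
        return "approve"
    return "flag_for_human_review"   # assumed fallback for the mid-consensus band

print(decide(critical=1, high=0, score=0.95, consensus=0.9, blocking=0))  # reject
print(decide(critical=0, high=0, score=0.85, consensus=0.8, blocking=0))  # approve
```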
Compliance Integration
Audit Trail Structure
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class ComplianceAuditRecord:
    timestamp: datetime
    artifact_hash: str              # SHA256 of reviewed code
    stage1_hashes: Dict[str, str]   # reviewer → response hash
    stage2_hashes: Dict[str, str]   # reviewer → ranking hash
    chairman_hash: str
    verdict_hash: str
    chain_hash: str                 # SHA256 of all of the above

    def verify(self) -> bool:
        """Verify hash chain integrity for auditors."""
        expected = compute_chain_hash(
            self.artifact_hash,
            self.stage1_hashes,
            self.stage2_hashes,
            self.chairman_hash,
            self.verdict_hash,
        )
        return expected == self.chain_hash
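`compute_chain_hash` is referenced above but not shown; a plausible sketch (an assumption, not the actual implementation) hashes a canonical JSON encoding of all components, so the digest is deterministic regardless of dict insertion order:

```python
import hashlib
import json

def compute_chain_hash(*components):
    # sort_keys gives a canonical encoding; default=str covers datetimes etc.
    canonical = json.dumps(components, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h1 = compute_chain_hash("abc123", {"security": "h1"}, {"security": "h2"}, "h3", "h4")
h2 = compute_chain_hash("abc123", {"security": "h1"}, {"security": "h2"}, "h3", "h4")
assert h1 == h2 and len(h1) == 64   # deterministic 64-hex-char digest
```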
Supported Frameworks
- FDA 21 CFR Part 11 - Electronic signatures and audit trails
- HIPAA - PHI handling and access controls
- SOC2 - Security and availability controls
- GDPR - Data protection compliance
Token Economics
| Stage | Tokens | Est. Cost |
|---|---|---|
| Stage 1 (4 reviewers) | ~12,000 | ~$0.036 |
| Stage 2 (4 cross-evals) | ~8,000 | ~$0.024 |
| Stage 3 (chairman) | ~5,000 | ~$0.015 |
| Total per file | ~25,000 | ~$0.075 |
For a typical PR (10 files): ~$0.75
Related Components
- council-orchestrator agent - Coordinates the 3-stage pipeline
- council-chairman agent - Synthesizes final verdicts
- /council-review command - Entry point for council reviews
- uncertainty-orchestrator agent - Alternative for research (not code review)
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-12-20 | Initial implementation based on LLM Council pattern |
Origin: Adapted from Andrej Karpathy's LLM Council pattern with enterprise hardening for CODITECT compliance requirements.
Last Updated: 2025-12-20
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: council-review
Completed:
- [x] Stage 1: {REVIEWER_COUNT} specialized reviews completed
- [x] Stage 2: Anonymous cross-evaluation rankings aggregated
- [x] Stage 3: Chairman synthesis and verdict generated
- [x] Audit trail hash chain created
- [x] Compliance report generated
Council Decision: {APPROVE|REQUEST_CHANGES|REJECT}
Verdict Details:
- Aggregate score: {SCORE}/100
- Consensus level: {KENDALLS_W} (Kendall's W)
- Blocking findings: {BLOCKING_COUNT}
- Critical issues: {CRITICAL_COUNT}
- High issues: {HIGH_COUNT}
Reviewers:
- {REVIEWER_1}: Score {SCORE_1}/100
- {REVIEWER_2}: Score {SCORE_2}/100
- {REVIEWER_3}: Score {SCORE_3}/100
- {REVIEWER_4}: Score {SCORE_4}/100
Outputs:
- Council verdict: council-verdict-{TIMESTAMP}.json
- Audit record: audit-trail-{HASH}.json
- Compliance report: compliance-report-{TIMESTAMP}.md
Completion Checklist
Before marking this skill as complete, verify:
- At least 3 specialized reviewers participated
- All reviewers completed Stage 1 independent reviews
- Identities anonymized in Stage 2 (Alpha, Beta, Gamma, Delta)
- Cross-evaluation rankings collected from each reviewer
- Kendall's W coefficient calculated (consensus metric)
- Chairman reviewed de-anonymized context
- Blocking findings identified (CRITICAL severity)
- Decision threshold logic applied correctly
- Structured verdict generated with all fields
- Audit trail hash chain created and verified
- CouncilVerdict JSON serialized to file
- Compliance report generated (if required)
Failure Indicators
This skill has FAILED if:
- ❌ Fewer than 3 reviewers participated (insufficient diversity)
- ❌ Reviewer identities leaked during Stage 2 (bias risk)
- ❌ Kendall's W not calculated (no consensus metric)
- ❌ Blocking findings not surfaced in verdict
- ❌ Decision threshold logic bypassed
- ❌ Verdict missing required fields (decision, score, consensus)
- ❌ Audit hash chain verification fails
- ❌ Compliance requirements not met (HIPAA, SOC2, etc.)
- ❌ Chairman synthesis contradicts reviewer consensus
- ❌ Token cost exceeds budget (>30,000 tokens/file)
When NOT to Use
Do NOT use this skill when:
- Simple code review - Use standard PR review for minor changes
- Urgent hotfixes - Council review takes 15-30 minutes, too slow for emergencies
- Non-critical code - Reserve for security-sensitive, compliance-required, or architectural changes
- Low compliance requirements - Use single-agent review for internal tools
- Budget constraints - Council review costs ~$0.075 per file with 4-6 reviewers (~$0.75 for a typical 10-file PR)
- Early prototyping - Overkill for experimental code
- Documentation-only PRs - Use doc review agent instead
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Using for every PR | Noise, review fatigue, high cost | Reserve for critical paths (auth, payments, PHI) |
| Skipping anonymization | Model bias towards certain agents | Always anonymize in Stage 2 |
| Ignoring low consensus | Reviewers disagree, verdict unreliable | Flag for human review when W < 0.5 |
| Auto-approving high scores | Score doesn't capture blocking issues | Check blocking findings first |
| No audit trail | Compliance failure, no record | Always generate hash chain |
| Chairman overriding consensus | Defeats purpose of council | Chairman synthesizes, doesn't override |
| Same reviewers every time | Stale perspectives | Rotate reviewers based on code domain |
Principles
This skill embodies:
- #5 Eliminate Ambiguity - Structured verdict with clear decision criteria
- #6 Clear, Understandable, Explainable - Chairman synthesis explains reasoning
- #8 No Assumptions - Multiple reviewers catch different issues
- #10 Security First - Security reviewer mandatory for auth/crypto code
- #11 Quality is Non-Negotiable - Multi-agent review ensures thorough analysis
- #12 Transparent Documentation - Audit trail provides complete record
- #15 Compliance is Essential - Built for FDA 21 CFR Part 11, HIPAA, SOC2
Full Standard: CODITECT-STANDARD-AUTOMATION.md