ADR-161: Component Quality Assurance Framework
Status: Accepted
Date: 2026-02-07
Author: Claude (Opus 4.6)
Deciders: Hal Casteel
Context
CODITECT has 3,379+ components across 7 types (agents, skills, commands, hooks, scripts, workflows, tools) but no unified QA grading system. An initial agent QA report (Feb 7, 2026) revealed critical limitations:
- Presence-only checking: Grading detected whether sections existed but not whether content was meaningful
- Single-type coverage: Only agents had a grading script; 6 other types had none
- Missing standards: Commands had no standard at all (363 commands ungoverned)
- No content quality heuristics: No specificity scoring, code completeness detection, or link validation
- No orchestration: No way to grade all component types together
Decision
Implement a layered Component QA Framework with type-specific graders, content quality heuristics, and unified orchestration.
Architecture
Layer 4: /qa command (user interface)
Layer 3: component-qa-reviewer agent (orchestration + remediation)
Layer 2: qa-grading-framework skill (patterns + methodology)
Layer 1: scripts/qa/grade-*.py (7 type-specific execution engines)
Layer 0: coditect-core-standards/ (source of truth for criteria)
Shared: scripts/qa/qa_common.py (shared utilities)
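As a rough sketch of how Layer 3 could drive Layer 1, the orchestrator can enumerate one grader invocation per component type. The `grade-<type>.py` naming follows the pattern above; the `--json` flag is an assumption, not the documented CLI.

```python
from pathlib import Path

# The seven component types from the Context section.
COMPONENT_TYPES = ["agents", "skills", "commands", "hooks",
                   "scripts", "workflows", "tools"]

def grader_commands(qa_dir: Path) -> list[list[str]]:
    """Build one grader invocation per component type (Layer 1).

    Assumes graders follow the grade-<type>.py naming pattern and accept
    a hypothetical --json flag for machine-readable output.
    """
    return [
        ["python", str(qa_dir / f"grade-{ctype}.py"), "--json"]
        for ctype in COMPONENT_TYPES
    ]
```

Each command can then be run (e.g. via `subprocess.run`) and its stdout parsed into the unified JSON schema described below.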
Grading Matrix
| Type | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Cat 5 |
|---|---|---|---|---|---|
| Agents | File Format (20%) | YAML Frontmatter (40%) | Instruction Quality (30%) | Documentation (10%) | - |
| Skills | YAML Frontmatter (40%) | Progressive Disclosure (25%) | Instruction Quality (25%) | File Structure (10%) | - |
| Commands | File Format (15%) | YAML Frontmatter (35%) | Specification Quality (30%) | Documentation (20%) | - |
| Hooks | Structure (20%) | Security (30%) | Performance (20%) | Integration (15%) | Documentation (15%) |
| Scripts | Structure (20%) | CLI Interface (20%) | Security (20%) | Error Handling (20%) | Documentation (20%) |
| Workflows | Prerequisites (15%) | Step Completeness (30%) | Examples (25%) | Integration (15%) | Troubleshooting (15%) |
| Tools | Features (20%) | Usage Examples (25%) | Architecture (20%) | Setup (20%) | Troubleshooting (15%) |
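The matrix rows above combine into a single 0-100 score as a weighted sum of category scores. A minimal sketch using the Agents row (category keys are illustrative; the real graders' internal names may differ):

```python
# Weights mirror the "Agents" row of the grading matrix.
AGENT_WEIGHTS = {
    "file_format": 0.20,
    "yaml_frontmatter": 0.40,
    "instruction_quality": 0.30,
    "documentation": 0.10,
}

def weighted_score(category_scores: dict, weights: dict) -> float:
    """Combine 0..1 category scores into a 0..100 total using matrix weights."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return 100.0 * sum(
        weights[cat] * category_scores.get(cat, 0.0) for cat in weights
    )
```

A component scoring perfectly on frontmatter but zero elsewhere would earn 40 points, reflecting that category's 40% weight.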
Unified Grading Scale
| Grade | Score | Meaning |
|---|---|---|
| A | 90-100% | Production-ready, exemplary |
| B | 80-89% | Production-ready, minor improvements |
| C | 70-79% | Functional, moderate improvements |
| D | 60-69% | Significant improvements needed |
| F | <60% | Does not meet minimum standards |
Compliance Target: Grade B (80%) minimum within 30 days of standard publication.
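The scale above maps directly to a threshold check; a minimal sketch:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to the unified A-F scale."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

def meets_compliance_target(score: float) -> bool:
    """Grade B (80%) is the minimum compliance target."""
    return score >= 80.0
```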
Content Quality Heuristics
Beyond presence-checking, graders implement:
- Specificity Score: Ratio of domain-specific terms to generic words (flag below a 0.3 threshold)
- Code Example Quality: Distinguish runnable code (imports, function calls) from pseudocode or comment-only blocks
- Link Validation: Verify that referenced files exist on disk
- Staleness Detection: Compare `updated` frontmatter date against file mtime (flag if >90 days stale)
- Instruction Density: Imperative verbs per paragraph as an effectiveness proxy
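As an illustration of the specificity heuristic, here is one possible reading of it: the share of words drawn from a domain lexicon, flagged when below the 0.3 threshold. The word lists are placeholders; real graders presumably use richer lexicons and a more nuanced ratio.

```python
def specificity_score(text: str, domain_terms: set) -> float:
    """Fraction of words that are domain-specific (one reading of the heuristic)."""
    words = [w.strip(".,:;()").lower() for w in text.split()]
    if not words:
        return 0.0
    specific = sum(1 for w in words if w in domain_terms)
    return specific / len(words)

def is_specific_enough(text: str, domain_terms: set,
                       threshold: float = 0.3) -> bool:
    """Apply the 0.3 specificity threshold from the heuristics list."""
    return specificity_score(text, domain_terms) >= threshold
```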
JSON Output Schema
All graders produce consistent JSON:
```json
{
  "summary": {
    "total_components": 0,
    "average_score": 0.0,
    "grade_distribution": {"A": 0, "B": 0, "C": 0, "D": 0, "F": 0},
    "errors": 0
  },
  "attribute_pass_rates": {
    "attribute_name": {"passed": 0, "failed": 0, "rate": 0.0}
  },
  "components": [
    {
      "name": "component-name",
      "scores": {"A1_check_name": 1},
      "category_scores": {"A_category": 0.0},
      "total_base": 0.0,
      "grade": "A"
    }
  ],
  "errors": []
}
```
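The `summary` block can be derived from the `components` array. A minimal sketch (field names follow the schema above; the aggregation logic itself is an assumption):

```python
from collections import Counter

def build_summary(components: list, errors: list) -> dict:
    """Aggregate per-component results into the schema's summary block."""
    dist = Counter(c["grade"] for c in components)
    scores = [c["total_base"] for c in components]
    return {
        "total_components": len(components),
        "average_score": round(sum(scores) / len(scores), 2) if scores else 0.0,
        "grade_distribution": {g: dist.get(g, 0) for g in "ABCDF"},
        "errors": len(errors),
    }
```

Because every grader emits this same shape, a dashboard can merge the seven per-type reports without type-specific handling.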
Consequences
Positive
- All 3,379+ components graded with type-specific criteria
- Content quality assessed beyond presence-checking
- Unified JSON output enables dashboards and trend tracking
- Standards serve as authoritative source of truth for grading criteria
- Non-destructive: hook warns but doesn't block; `--fix` mode is opt-in
Negative
- Content quality heuristics are approximations, not perfect measures
- Initial run will surface many low-scoring components requiring remediation
- Maintenance burden: standards changes require grader script updates
Risks
- False positives: Heuristics may flag valid content as low-quality
- Mitigation: Tunable thresholds and manual override capability
References
- coditect-core-standards/coditect-standard-agents.md - Agent quality criteria
- coditect-core-standards/coditect-standard-skills.md - Skill quality criteria
- coditect-core-standards/coditect-standard-commands.md - Command quality criteria (NEW)
- coditect-core-standards/coditect-standard-hooks.md - Hook quality criteria
- coditect-core-standards/coditect-standard-scripts.md - Script quality criteria
- coditect-core-standards/coditect-standard-workflows.md - Workflow quality criteria
- docs/project-management/AGENT-QA-REPORT-2026-02-07.md - Initial agent QA report