Document Frontmatter Quality Report
Date: 2026-02-17
Task: J.20 (Document Taxonomy System)
Scope: 1,446 documents indexed with taxonomy-aware frontmatter
Database: platform.db (document_frontmatter table)
Grade Distribution
| Grade | Count | Percentage | Description |
|---|---|---|---|
| A (90-100) | 499 | 34.5% | Excellent — all key fields present |
| B (70-89) | 717 | 49.6% | Good — minor missing fields |
| C (50-69) | 172 | 11.9% | Fair — multiple missing fields |
| D (30-49) | 51 | 3.5% | Poor — major fields missing |
| F (<30) | 7 | 0.5% | Critical — external artifacts |
84.1% at B or above. 15.9% need attention.
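The grade bands and the headline percentage can be reproduced with a few lines of Python. This is a minimal sketch: the function name `letter_grade` is hypothetical, and treating fractional scores such as 89.5 as falling in the lower band is an assumption, since the report only lists integer ranges.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 quality score to the letter grades used in the table above."""
    if score >= 90:
        return "A"
    if score >= 70:
        return "B"
    if score >= 50:
        return "C"
    if score >= 30:
        return "D"
    return "F"


# Counts copied from the Grade Distribution table.
counts = {"A": 499, "B": 717, "C": 172, "D": 51, "F": 7}
total = sum(counts.values())

# Share of documents graded B or better, rounded to one decimal place.
b_or_above = round(100 * (counts["A"] + counts["B"]) / total, 1)
needs_attention = round(100 - b_or_above, 1)
```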
Issue Frequency
| Issue | Count | % of Docs | Severity | Auto-fixable? |
|---|---|---|---|---|
| MISSING_AUDIENCE | 891 | 61.6% | Medium | Yes (directory-based inference) |
| MISSING_KEYWORDS | 878 | 60.7% | Low | Yes (content extraction) |
| MISSING_SUMMARY | 264 | 18.3% | High | Partial (AI-generated) |
| DRAFT_STATUS | 150 | 10.4% | Low | No (intentional) |
| MISSING_STATUS | 87 | 6.0% | Medium | Yes (default: active) |
| MISSING_TYPE | 64 | 4.4% | Medium | Yes (directory/filename) |
| AUTO_SUMMARY | 61 | 4.2% | Medium | Partial (need real summaries) |
| MISSING_TITLE | 58 | 4.0% | Critical | Yes (first heading / filename) |
| LOW_MOE_CONFIDENCE | 3 | 0.2% | Low | No (classifier limitation) |
Average Quality by Category
| Category | Avg Score | Doc Count |
|---|---|---|
| Analysis | 93.4 | 43 |
| Guides | 93.2 | 73 |
| Research | 87.0 | 171 |
| Architecture | 86.2 | 363 |
| Business | 85.0 | 6 |
| Reference | 83.2 | 540 |
| Operations | 81.1 | 36 |
| Planning | 77.0 | 214 |
Fix Plan: Priority Order
Priority 1: Missing Titles (49 core docs) — CRITICAL
Titles are displayed in sidebar navigation, so a document without one makes the navigation confusing.
Auto-fix strategy: Read first # Heading from document content.
Key files:
- 12 TRACK plan files (internal/project/plans/tracks/)
- 12 coditect-core-standards files
- 10 PCF track files
- 8 ADR files
- 7 other internal docs
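The title auto-fix could look roughly like the sketch below. It takes the first level-1 `# Heading` outside fenced code blocks and falls back to the filename, as the Issue Frequency table suggests. The function name `infer_title` and the filename-to-title formatting are assumptions, and the input is assumed to be the document body with any existing frontmatter already stripped.

```python
import re
from pathlib import Path

# Matches a level-1 ATX heading such as "# Title" (optionally closed with "#").
HEADING_RE = re.compile(r"^#\s+(.+?)\s*#*\s*$")


def infer_title(body: str, filename: str) -> str:
    """Return the first level-1 heading, or a title derived from the filename."""
    in_fence = False
    for line in body.splitlines():
        # Skip fenced code blocks so "# comment" lines are not mistaken for headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            return match.group(1).strip()
    # Fallback (assumed formatting): "adr-001-storage.md" -> "Adr 001 Storage".
    stem = Path(filename).stem
    return re.sub(r"[-_]+", " ", stem).strip().title()
```

The filename fallback produces rough titles (acronyms lose their capitalization, for example), so those results are worth flagging for manual review.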
Priority 2: Missing or Auto-generated Summaries (258 core + 61 auto = 319 total)
Summaries appear in search results and tooltips.
Auto-fix strategy: Extract first non-heading paragraph (truncated to 200 chars).
By directory:
- internal/architecture/: 70 docs
- internal/project/: 45 docs
- templates/business-documents/: 37 docs
- coditect-core-standards/: 12 docs
- config/templates/: 10 docs
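A minimal sketch of the summary strategy described above: take the first paragraph that is not a heading and cut it to 200 characters. The name `infer_summary` is hypothetical, paragraphs are assumed to be separated by blank lines, and the input is assumed to be the body without frontmatter.

```python
import re


def infer_summary(body: str, limit: int = 200) -> str:
    """Return the first non-heading paragraph, truncated to `limit` characters."""
    for block in re.split(r"\n\s*\n", body):
        lines = [line.strip() for line in block.splitlines()]
        # Drop heading lines so "# Title" directly above text is not included.
        prose = [line for line in lines if line and not line.startswith("#")]
        if not prose:
            continue
        # Join wrapped lines into one string before truncating.
        return " ".join(prose)[:limit].rstrip()
    return ""
```

Summaries produced this way are still machine-generated, so they would presumably keep counting toward AUTO_SUMMARY until someone reviews them.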
Priority 3: Missing Audience (891 docs)
Audience field controls badge colors in viewer.
Auto-fix strategy: Map from directory path:
- internal/* → "contributor"
- docs/* → "user"
- docs/reference/* → "technical"
- templates/* → "technical"
- coditect-core-standards/* → "contributor"
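One way to implement the directory mapping is a longest-prefix match, so that docs/reference/ wins over the broader docs/ rule. This is a sketch: `AUDIENCE_RULES` and `infer_audience` are hypothetical names, paths are assumed to be relative to the repository root, and returning `None` for unmatched paths is an assumption about how unmapped documents should be handled.

```python
from typing import Optional

# Prefix -> audience pairs taken from the mapping above.
AUDIENCE_RULES = [
    ("internal/", "contributor"),
    ("docs/", "user"),
    ("docs/reference/", "technical"),
    ("templates/", "technical"),
    ("coditect-core-standards/", "contributor"),
]


def infer_audience(rel_path: str) -> Optional[str]:
    """Return the audience for a repo-relative path, or None if no rule matches."""
    path = rel_path.replace("\\", "/")
    # Check longer prefixes first so the most specific rule applies.
    for prefix, audience in sorted(AUDIENCE_RULES, key=lambda r: len(r[0]), reverse=True):
        if path.startswith(prefix):
            return audience
    return None
```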
Priority 4: Missing Keywords (878 docs)
Keywords boost search relevance.
Auto-fix strategy: Derive keywords from the directory name, the frontmatter type, and the first 3-5 significant words of the title.
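The keyword strategy might be sketched as follows. The stopword list, the minimum word length, and the name `infer_keywords` are all assumptions made for illustration; the report does not define what counts as a "significant" word.

```python
import re
from pathlib import PurePosixPath

# Hypothetical stopword list; the report does not specify one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def infer_keywords(rel_path: str, doc_type: str, title: str, max_title_words: int = 5) -> list:
    """Combine directory name, frontmatter type, and significant title words."""
    candidates = [PurePosixPath(rel_path).parent.name, doc_type or ""]
    words = [
        word
        for word in re.findall(r"[a-z0-9]+", title.lower())
        if word not in STOPWORDS and len(word) > 2
    ]
    candidates.extend(words[:max_title_words])

    # De-duplicate while preserving order.
    seen, keywords = set(), []
    for candidate in candidates:
        key = candidate.strip().lower()
        if key and key not in seen:
            seen.add(key)
            keywords.append(key)
    return keywords
```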
Priority 5: Missing Status (87 docs)
Auto-fix strategy: Default to "active" for all 87 docs.
External Artifacts (Skip)
67 documents under analyze-new-artifacts/ and codanna/ are third-party research artifacts. Their frontmatter quality is not actionable — they come from external sources.
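Excluding these documents from a batch fix could be done with a simple path check. In this sketch, `EXTERNAL_DIRS` and `is_external_artifact` are hypothetical names, and matching the directory names at any depth of the path is an assumption, since the report does not say where the two directories live.

```python
from pathlib import PurePosixPath

# Directory names the report identifies as third-party artifacts.
EXTERNAL_DIRS = {"analyze-new-artifacts", "codanna"}


def is_external_artifact(rel_path: str) -> bool:
    """True if any directory component of the path is an external-artifact directory."""
    parts = PurePosixPath(rel_path.replace("\\", "/")).parts
    # Ignore the final component so a file merely named "codanna" is not skipped.
    return any(part in EXTERNAL_DIRS for part in parts[:-1])
```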
Recommendation
- Build scripts/fix-frontmatter.py to batch-fix Priorities 1-5
- Run with --dry-run first to preview changes
- Validate with component-indexer.py --frontmatter-stats after
- Re-generate publish.json to verify sidebar quality
- Skip external artifacts (analyze-new-artifacts/, codanna/)
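Since scripts/fix-frontmatter.py does not exist yet, the following is only a sketch of how its --dry-run behavior might be wired up with argparse. The helper names `build_parser` and `apply_changes`, the `(path, field, value)` change tuple, and the `write` callback are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for the proposed fix script (assumed shape)."""
    parser = argparse.ArgumentParser(
        prog="fix-frontmatter.py",
        description="Batch-fix missing frontmatter fields.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Print planned changes without writing any files.",
    )
    return parser


def apply_changes(changes, write, dry_run: bool) -> int:
    """Report each planned change; call write(path, field, value) unless dry_run.

    Returns the number of changes actually written.
    """
    written = 0
    for path, field, value in changes:
        prefix = "WOULD SET" if dry_run else "SET"
        print(f"{prefix} {path}: {field} = {value!r}")
        if not dry_run:
            write(path, field, value)
            written += 1
    return written
```

Keeping the write step behind a callback means the dry run and the real run share the same planning code, so the preview matches what is later applied.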
Estimated effort: ~1 hour automated, ~30 min review
Impact: Raises A+B from 84% to ~95%+