Document Classification Patterns
Pattern definitions, signal weights, and frontmatter templates for the CODITECT MoE document classification system.
When to Use This Skill
Use this skill when:
- Classifying new documents into CODITECT taxonomy
- Creating new documents with proper frontmatter
- Understanding classification signals and weights
- Debugging classification results
Classification Categories
Per ADR-018 and ADR-023, CODITECT documents are classified into these categories:
| Type | Description | Key Signals |
|---|---|---|
| agent | AI agent definitions | type: agent, "You are a...", system prompts |
| command | Slash commands | invocation: /xxx, command usage patterns |
| skill | Reusable patterns | SKILL.md, ## When to Use This Skill |
| guide | User tutorials | Step-by-step, troubleshooting sections |
| reference | API/architecture docs | Tables, specifications, configuration |
| workflow | Process definitions | Phases, sequenceDiagram, automation steps |
| adr | Architecture decisions | Context/Decision/Consequences sections |
Signal Patterns
Agent Signals
Frontmatter (weight: 0.6):
type: agent
component_type: agent
Content (weight: 0.3):
- "You are a" / "You are an"
- "## Capabilities"
- "## When to Use"
Path (weight: 0.2):
agents/
/agents/
Title (weight: 0.15):
*-specialist.md
*-expert.md
*-agent.md
*-orchestrator.md
Command Signals
Frontmatter (weight: 0.6):
type: command
component_type: command
invocation: /command-name
Content (weight: 0.3):
- "invocation:"
- "## Usage"
- "## System Prompt"
Path (weight: 0.2):
commands/
/commands/
Skill Signals
Frontmatter (weight: 0.6):
type: skill
component_type: skill
Content (weight: 0.3):
- "## When to Use This Skill"
- "## Pattern"
- SKILL.md filename
Path (weight: 0.2):
skills/
/skills/
Guide Signals
Frontmatter (weight: 0.6):
type: guide
component_type: guide
Content (weight: 0.3):
- "## Step 1" / "## Step"
- "## Prerequisites"
- "## Troubleshooting"
- "## Quick Start"
Path (weight: 0.2):
guides/
/guides/
docs/
Title (weight: 0.15):
*-guide.md
*-tutorial.md
*-quickstart.md
Reference Signals
Frontmatter (weight: 0.6):
type: reference
component_type: reference
Content (weight: 0.3):
- "## API"
- "## Configuration"
- "| Parameter |"
- "| Field |"
Path (weight: 0.2):
reference/
/reference/
docs/reference/
Title (weight: 0.15):
*-reference.md
*-api.md
*-spec.md
Workflow Signals
Frontmatter (weight: 0.6):
type: workflow
component_type: workflow
Content (weight: 0.3):
- "## Phase"
- "## Workflow"
- "sequenceDiagram"
- "graph TD"
Path (weight: 0.2):
workflows/
/workflows/
Title (weight: 0.15):
*-workflow.md
*-pipeline.md
ADR Signals
Frontmatter (weight: 0.6):
type: adr
component_type: adr
doc_type: adr
adr_number: 23
Content (weight: 0.3):
- "## Context"
- "## Decision"
- "## Consequences"
- "Status: Accepted"
Path (weight: 0.2):
adrs/
/adrs/
architecture/
Title (weight: 0.15):
ADR-*.md
adr-*.md
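The four signal groups above can be combined into a single score per candidate type. What follows is a minimal sketch under stated assumptions, not the CODITECT implementation: the `SIGNALS` table is abbreviated to two categories, the name `score_type` is hypothetical, path signals are matched as plain substrings, and because the listed weights sum to 1.25 the sketch assumes the total is capped at 1.0 so it stays in the documented 0.0-1.0 range.

```python
from fnmatch import fnmatchcase

# Weights as listed in this document.
WEIGHTS = {"frontmatter": 0.6, "content": 0.3, "path": 0.2, "title": 0.15}

# Hypothetical, abbreviated signal table (two of the seven categories).
SIGNALS = {
    "agent": {
        "content": ["You are a", "## Capabilities", "## When to Use"],
        "path": ["agents/"],
        "title": ["*-specialist.md", "*-expert.md", "*-agent.md", "*-orchestrator.md"],
    },
    "adr": {
        "content": ["## Context", "## Decision", "## Consequences", "Status: Accepted"],
        "path": ["adrs/", "architecture/"],
        "title": ["ADR-*.md", "adr-*.md"],
    },
}


def score_type(doc_type, frontmatter, content, path):
    """Return a 0.0-1.0 score for one candidate type (assumed capping rule)."""
    spec = SIGNALS[doc_type]
    filename = path.rsplit("/", 1)[-1]
    score = 0.0
    # Frontmatter signal: either field names the candidate type.
    if doc_type in (frontmatter.get("type"), frontmatter.get("component_type")):
        score += WEIGHTS["frontmatter"]
    # Content signal: at least one marker string appears in the body.
    if any(marker in content for marker in spec["content"]):
        score += WEIGHTS["content"]
    # Path signal: at least one directory fragment appears in the path.
    if any(fragment in path for fragment in spec["path"]):
        score += WEIGHTS["path"]
    # Title signal: the filename matches at least one glob pattern.
    if any(fnmatchcase(filename, pattern) for pattern in spec["title"]):
        score += WEIGHTS["title"]
    return min(score, 1.0)
```

In this sketch, a file at `agents/code-review-specialist.md` with `type: agent` in its frontmatter and "You are a" in its body matches all four groups and receives the capped maximum of 1.0.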
Confidence Thresholds
| Confidence | Approval Type | Action |
|---|---|---|
| ≥ 0.85 | AUTO_APPROVED | Classified automatically |
| 0.65 to < 0.85 | JUDGE_APPROVED | Validated by consistency judge |
| 0.45 to < 0.65 | DEEP_ANALYSIS_APPROVED | Required deep semantic analysis |
| < 0.45 | HUMAN_REVIEW_REQUIRED | Flagged for manual review |
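The table above amounts to a small routing function. The sketch below is illustrative only: the function name `approval_type` is hypothetical, and it assumes each band is half-open, so a score such as 0.845 falls into JUDGE_APPROVED.

```python
def approval_type(confidence):
    """Map a confidence score to the approval type from the thresholds table."""
    # NaN fails every comparison, so this check rejects it as well.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")
    if confidence >= 0.85:
        return "AUTO_APPROVED"
    if confidence >= 0.65:
        return "JUDGE_APPROVED"
    if confidence >= 0.45:
        return "DEEP_ANALYSIS_APPROVED"
    return "HUMAN_REVIEW_REQUIRED"
```

Checking the bands from the top down keeps each threshold in one place, which makes the boundaries easy to audit against the table.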
Frontmatter Templates
Agent Template
---
title: Agent Name Specialist
type: agent
component_type: agent
version: 1.0.0
audience: contributor
status: active
summary: Brief description of agent purpose
keywords:
- domain
- specialization
tokens: ~1500
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
tags:
- agent
- domain
---
Command Template
---
title: /command-name - Brief Description
component_type: command
version: 1.0.0
invocation: /command-name [args]
audience: customer
status: active
summary: What this command does
keywords:
- automation
tokens: ~1200
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
command_name: /command-name
aliases: []
usage: /command-name [options]
requires_confirmation: false
modifies_files: true
network_access: false
type: command
tags:
- command
---
Guide Template
---
title: Topic Name Guide
type: guide
component_type: guide
version: 1.0.0
audience: customer
status: active
summary: Brief description of what this guide covers
keywords:
- topic
- tutorial
tokens: ~2000
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
tags:
- guide
- tutorial
---
ADR Template
---
title: 'ADR-XXX: Decision Title'
component_type: adr
type: adr
version: 1.0.0
audience: contributor
status: accepted
summary: Brief summary of the decision
keywords:
- architecture
- decision
tokens: ~2500
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
doc_type: adr
adr_number: XXX
deciders:
- Name
categories:
- architecture
supersedes: []
superseded_by: null
when_to_read: When working on related topics
prerequisites: []
related_adrs: []
tags:
- architecture
- adr
---
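The templates above can be filled in programmatically when creating a new document. The following sketch renders the Guide Template; the function name `guide_frontmatter` and its parameters are hypothetical, the `tokens` estimate is left out because it depends on the finished document, and values containing YAML-special characters such as `:` would need quoting that this sketch does not attempt.

```python
from datetime import date


def guide_frontmatter(title, summary, keywords, today=None):
    """Render the Guide Template as a frontmatter string (simple values only)."""
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    lines = [
        "---",
        f"title: {title}",
        "type: guide",
        "component_type: guide",
        "version: 1.0.0",
        "audience: customer",
        "status: active",
        f"summary: {summary}",
        "keywords:",
        *[f"- {keyword}" for keyword in keywords],
        f"created: '{stamp}'",
        f"updated: '{stamp}'",
        "tags:",
        "- guide",
        "- tutorial",
        "---",
    ]
    return "\n".join(lines) + "\n"
```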
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: document-classification-patterns
Completed:
- [x] Document type classified: {type}
- [x] Confidence score: {score}
- [x] Approval type: {AUTO_APPROVED|JUDGE_APPROVED|DEEP_ANALYSIS_APPROVED|HUMAN_REVIEW_REQUIRED}
- [x] Frontmatter updated with classification metadata
- [x] Signal weights applied: frontmatter (0.6), content (0.3), path (0.2), title (0.15)
Outputs:
- Document: {file_path}
- Type: {agent|command|skill|guide|reference|workflow|adr}
- Confidence: {0.0-1.0}
- Approval: {approval_type}
- Keywords: {auto_generated_keywords}
- Tags: {auto_generated_tags}
Completion Checklist
Before marking this skill as complete, verify:
- Document type classified into one of 7 categories (agent, command, skill, guide, reference, workflow, adr)
- Confidence score calculated (0.0-1.0) based on weighted signals
- Approval type determined based on confidence threshold
- Frontmatter added/updated with type, component_type, moe_confidence, moe_classified
- Keywords extracted and added to frontmatter
- Tags generated based on document type and content
- Classification date stamped (YYYY-MM-DD format)
- Signal patterns matched (frontmatter, content, path, title)
- Low confidence (<0.45) flagged for human review
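The frontmatter update in the checklist above might look like the following sketch. It is a hedged illustration: the function name `apply_classification` is hypothetical, and it assumes `moe_classified` holds the classification date, which this document does not state explicitly.

```python
from datetime import date


def apply_classification(frontmatter, doc_type, confidence, classified_on=None):
    """Return a copy of the frontmatter with the classification fields set."""
    updated = dict(frontmatter)  # leave the caller's mapping untouched
    updated["type"] = doc_type
    updated["component_type"] = doc_type
    updated["moe_confidence"] = round(confidence, 2)
    # Assumption: moe_classified stores the classification date (YYYY-MM-DD).
    updated["moe_classified"] = (classified_on or date.today()).isoformat()
    return updated
```

Returning a copy keeps the original metadata available for comparison, which helps when a failed update must not corrupt existing fields.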
Failure Indicators
This skill has FAILED if:
- ❌ Confidence score is NaN or out of range (0.0-1.0)
- ❌ Document type is None or invalid (not in 7 categories)
- ❌ Frontmatter update fails or corrupts existing metadata
- ❌ Signal detection returns no matches (all weights = 0)
- ❌ Classification date format invalid
- ❌ Approval type logic produces incorrect threshold assignment
- ❌ Multiple equally strong type signals create ambiguity without resolution
- ❌ File path or content encoding issues prevent analysis
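Several of the failure indicators above can be checked mechanically before a result is written back. The sketch below covers three of them (invalid type, out-of-range confidence, invalid date); the function name `classification_errors` is hypothetical and the remaining indicators are outside its scope.

```python
import math
import re
from datetime import date

VALID_TYPES = {"agent", "command", "skill", "guide", "reference", "workflow", "adr"}
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _is_iso_date(value):
    """True for a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def classification_errors(doc_type, confidence, classified_on):
    """Return a list of failure messages; an empty list means the result passes."""
    errors = []
    if doc_type not in VALID_TYPES:
        errors.append(f"invalid document type: {doc_type!r}")
    in_range = (
        isinstance(confidence, (int, float))
        and not math.isnan(confidence)
        and 0.0 <= confidence <= 1.0
    )
    if not in_range:
        errors.append(f"confidence out of range: {confidence!r}")
    if not _is_iso_date(classified_on):
        errors.append(f"invalid classification date: {classified_on!r}")
    return errors
```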
When NOT to Use
Do NOT use this skill when:
- Document already has valid frontmatter with type and high moe_confidence (>0.85)
- File is not a Markdown document (.md extension)
- Document is auto-generated build artifact (no persistent classification needed)
- File is template or example documentation (not actual component)
- Binary files or non-text content (images, PDFs, executables)
- Temporary scratch files or session logs
- External documentation outside CODITECT framework
Use alternatives:
- Manual frontmatter: For templates and examples
- Skip classification: For build artifacts and temporary files
- External tool: For non-Markdown documentation formats
- Batch reclassification: When standards change requiring re-analysis
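Two of the skip conditions above are easy to test up front. The sketch below checks only the file extension and an existing high-confidence classification; the function name `should_classify` is hypothetical, and detecting build artifacts, templates, or external documentation would need project-specific rules that are not shown here.

```python
def should_classify(path, frontmatter):
    """Return False for files this skill should skip (partial check only)."""
    if not path.lower().endswith(".md"):
        return False  # not a Markdown document
    has_type = bool(frontmatter.get("type"))
    confidence = frontmatter.get("moe_confidence")
    if has_type and isinstance(confidence, (int, float)) and confidence > 0.85:
        return False  # already classified with high confidence
    return True
```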
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Overwriting existing high-confidence classification | Loss of validated metadata | Check existing moe_confidence before reclassifying |
| Relying solely on path signals | Misclassifies moved/reorganized files | Use multi-signal weighted approach |
| Ignoring frontmatter conflicts | Inconsistent metadata | Validate against existing type and component_type |
| Auto-approving low confidence | Poor quality classifications | Enforce threshold gates (≥0.85 for AUTO_APPROVED) |
| Missing content signal patterns | Incomplete classification | Scan for all 3+ signal types per category |
| Generic keyword extraction | Low-value metadata | Extract domain-specific keywords from content |
| No reclassification on content change | Stale classifications | Track content hash and retrigger on change |
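The last row of the table suggests tracking a content hash. One possible sketch uses SHA-256 from the standard library; the field name `moe_content_hash` and both function names are hypothetical and do not appear in the CODITECT frontmatter templates.

```python
import hashlib


def content_hash(text):
    """SHA-256 hex digest of the document body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reclassification(text, frontmatter):
    """True when no hash is stored or the content no longer matches it."""
    # Assumption: the previous hash lives in a hypothetical moe_content_hash field.
    return frontmatter.get("moe_content_hash") != content_hash(text)
```

Hashing only the body, not the frontmatter, would avoid retriggering classification when the classifier itself updates the metadata.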
Principles
This skill embodies the following CODITECT principles:
- #5 Eliminate Ambiguity - Multi-signal weighted classification with confidence scoring
- #6 Clear, Understandable, Explainable - Transparent signal weights and threshold logic
- #8 No Assumptions - Explicit confidence scoring triggers human review for low confidence
- Trust & Transparency - Classification metadata fully exposed in frontmatter
- Factual Grounding - Signal detection based on concrete patterns (keywords, paths, structure)
- MoE Architecture - Confidence thresholds route to appropriate approval paths (auto/judge/deep/human)
Version: 1.1.0 | Created: 2025-12-28 | Updated: 2026-01-04 | Author: CODITECT Core Team | Framework: CODITECT v1.7.2