Document Classification Patterns

Pattern definitions, signal weights, and frontmatter templates for the CODITECT MoE document classification system.

When to Use This Skill

Use this skill when:

  • Classifying new documents into the CODITECT taxonomy
  • Creating new documents with proper frontmatter
  • Understanding classification signals and weights
  • Debugging classification results

Classification Categories

Per ADR-018 and ADR-023, CODITECT documents are classified into these categories:

| Type | Description | Key Signals |
| --- | --- | --- |
| agent | AI agent definitions | type: agent, "You are a...", system prompts |
| command | Slash commands | invocation: /xxx, command usage patterns |
| skill | Reusable patterns | SKILL.md, ## When to Use This Skill |
| guide | User tutorials | Step-by-step, troubleshooting sections |
| reference | API/architecture docs | Tables, specifications, configuration |
| workflow | Process definitions | Phases, sequenceDiagram, automation steps |
| adr | Architecture decisions | Context/Decision/Consequences sections |

Signal Patterns

Agent Signals

Frontmatter (weight: 0.6):

type: agent
component_type: agent

Content (weight: 0.3):

  • "You are a" / "You are an"
  • "## Capabilities"
  • "## When to Use"

Path (weight: 0.2):

  • agents/
  • /agents/

Title (weight: 0.15):

  • *-specialist.md
  • *-expert.md
  • *-agent.md
  • *-orchestrator.md
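
A minimal sketch of how the four agent signal groups above might be combined into one score. It assumes that a group contributes its full weight when any one of its patterns matches, and that the total is capped at 1.0 (the listed weights sum to 1.25, and this document does not say how they are normalized). The function name `score_agent` and its parameters are hypothetical, not part of the CODITECT implementation.

```python
from fnmatch import fnmatchcase

# Weights as listed in this section.
WEIGHTS = {"frontmatter": 0.6, "content": 0.3, "path": 0.2, "title": 0.15}

# "You are a" also matches "You are an" as a substring.
AGENT_CONTENT = ("You are a", "## Capabilities", "## When to Use")
AGENT_TITLES = ("*-specialist.md", "*-expert.md", "*-agent.md", "*-orchestrator.md")


def score_agent(path: str, frontmatter: dict, body: str) -> float:
    """Return a hypothetical agent confidence in [0.0, 1.0]."""
    score = 0.0
    if "agent" in (frontmatter.get("type"), frontmatter.get("component_type")):
        score += WEIGHTS["frontmatter"]
    if any(marker in body for marker in AGENT_CONTENT):
        score += WEIGHTS["content"]
    # Prefixing "/" lets one check cover both "agents/" and "/agents/".
    if "/agents/" in "/" + path:
        score += WEIGHTS["path"]
    filename = path.rsplit("/", 1)[-1]
    if any(fnmatchcase(filename, pattern) for pattern in AGENT_TITLES):
        score += WEIGHTS["title"]
    # Assumption: cap at 1.0 because the raw weights sum to 1.25.
    return min(score, 1.0)
```

The same shape would apply to the other categories by swapping in their pattern lists.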

Command Signals

Frontmatter (weight: 0.6):

type: command
component_type: command
invocation: /command-name

Content (weight: 0.3):

  • "invocation:"
  • "## Usage"
  • "## System Prompt"

Path (weight: 0.2):

  • commands/
  • /commands/

Skill Signals

Frontmatter (weight: 0.6):

type: skill
component_type: skill

Content (weight: 0.3):

  • "## When to Use This Skill"
  • "## Pattern"
  • SKILL.md filename

Path (weight: 0.2):

  • skills/
  • /skills/

Guide Signals

Frontmatter (weight: 0.6):

type: guide
component_type: guide

Content (weight: 0.3):

  • "## Step 1" / "## Step"
  • "## Prerequisites"
  • "## Troubleshooting"
  • "## Quick Start"

Path (weight: 0.2):

  • guides/
  • /guides/
  • docs/

Title (weight: 0.15):

  • *-guide.md
  • *-tutorial.md
  • *-quickstart.md

Reference Signals

Frontmatter (weight: 0.6):

type: reference
component_type: reference

Content (weight: 0.3):

  • "## API"
  • "## Configuration"
  • "| Parameter |"
  • "| Field |"

Path (weight: 0.2):

  • reference/
  • /reference/
  • docs/reference/

Title (weight: 0.15):

  • *-reference.md
  • *-api.md
  • *-spec.md

Workflow Signals

Frontmatter (weight: 0.6):

type: workflow
component_type: workflow

Content (weight: 0.3):

  • "## Phase"
  • "## Workflow"
  • "sequenceDiagram"
  • "graph TD"

Path (weight: 0.2):

  • workflows/
  • /workflows/

Title (weight: 0.15):

  • *-workflow.md
  • *-pipeline.md

ADR Signals

Frontmatter (weight: 0.6):

type: adr
component_type: adr
doc_type: adr
adr_number: 23

Content (weight: 0.3):

  • "## Context"
  • "## Decision"
  • "## Consequences"
  • "Status: Accepted"

Path (weight: 0.2):

  • adrs/
  • /adrs/
  • architecture/

Title (weight: 0.15):

  • ADR-*.md
  • adr-*.md
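
Once every category has a score, one type has to be selected. The sketch below assumes the per-category scores are already computed and treats a tie between the top two as unresolved rather than guessing. `pick_type` and its `tolerance` parameter are hypothetical names used for illustration only.

```python
from typing import Optional

CATEGORIES = ("agent", "command", "skill", "guide", "reference", "workflow", "adr")


def pick_type(scores: dict[str, float], tolerance: float = 1e-9) -> Optional[str]:
    """Return the top-scoring category, or None if nothing matched or the top two tie."""
    ranked = sorted(((scores.get(c, 0.0), c) for c in CATEGORIES), reverse=True)
    (best_score, best), (second_score, _) = ranked[0], ranked[1]
    if best_score <= 0.0:
        return None  # no signal matched at all
    if best_score - second_score <= tolerance:
        return None  # ambiguous: leave resolution to a later stage
    return best
```

Returning `None` for a tie keeps the ambiguity visible so that a later stage, or a human reviewer, can resolve it.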

Confidence Thresholds

| Confidence | Approval Type | Action |
| --- | --- | --- |
| ≥ 0.85 | AUTO_APPROVED | Classified automatically |
| 0.65 - 0.84 | JUDGE_APPROVED | Validated by consistency judge |
| 0.45 - 0.64 | DEEP_ANALYSIS_APPROVED | Required deep semantic analysis |
| < 0.45 | HUMAN_REVIEW_REQUIRED | Flagged for manual review |
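
The threshold table can be expressed as a small routing function. This is a sketch that assumes the bands are half-open, so a score such as 0.845 falls into JUDGE_APPROVED; the table's two-decimal ranges do not state this explicitly. The function name `approval_type` is hypothetical.

```python
def approval_type(confidence: float) -> str:
    """Map a confidence score to an approval type using the thresholds above."""
    # NaN fails this comparison too, so it is rejected along with out-of-range values.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within 0.0-1.0, got {confidence!r}")
    if confidence >= 0.85:
        return "AUTO_APPROVED"
    if confidence >= 0.65:  # assumption: band is [0.65, 0.85)
        return "JUDGE_APPROVED"
    if confidence >= 0.45:  # assumption: band is [0.45, 0.65)
        return "DEEP_ANALYSIS_APPROVED"
    return "HUMAN_REVIEW_REQUIRED"
```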

Frontmatter Templates

Agent Template

---
title: Agent Name Specialist
type: agent
component_type: agent
version: 1.0.0
audience: contributor
status: active
summary: Brief description of agent purpose
keywords:
- domain
- specialization
tokens: ~1500
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
tags:
- agent
- domain
---

Command Template

---
title: /command-name - Brief Description
component_type: command
version: 1.0.0
invocation: /command-name [args]
audience: customer
status: active
summary: What this command does
keywords:
- automation
tokens: ~1200
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
command_name: /command-name
aliases: []
usage: /command-name [options]
requires_confirmation: false
modifies_files: true
network_access: false
type: command
tags:
- command
---

Guide Template

---
title: Topic Name Guide
type: guide
component_type: guide
version: 1.0.0
audience: customer
status: active
summary: Brief description of what this guide covers
keywords:
- topic
- tutorial
tokens: ~2000
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
tags:
- guide
- tutorial
---

ADR Template

---
title: 'ADR-XXX: Decision Title'
component_type: adr
type: adr
version: 1.0.0
audience: contributor
status: accepted
summary: Brief summary of the decision
keywords:
- architecture
- decision
tokens: ~2500
created: 'YYYY-MM-DD'
updated: 'YYYY-MM-DD'
doc_type: adr
adr_number: XXX
deciders:
- Name
categories:
- architecture
supersedes: []
superseded_by: null
when_to_read: When working on related topics
prerequisites: []
related_adrs: []
tags:
- architecture
- adr
---
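
To illustrate how the `type` and `component_type` fields in the templates above could be read back and compared, here is a deliberately minimal sketch. It handles only top-level `key: value` lines inside the leading `---` block and keeps values as raw strings; it is not a YAML parser, and a real implementation would presumably use one. Both function names are hypothetical.

```python
def read_frontmatter_scalars(text: str) -> dict[str, str]:
    """Read top-level `key: value` pairs from a leading '---' block.

    List items, indented lines, and keys with empty values are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        if line.startswith(("-", " ", "#")) or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        if value:
            fields[key.strip()] = value
    return {}  # no closing '---': treat as missing frontmatter


def types_agree(fields: dict[str, str]) -> bool:
    """True when both type fields are present and identical."""
    return "type" in fields and fields["type"] == fields.get("component_type")
```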

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: document-classification-patterns

Completed:
- [x] Document type classified: {type}
- [x] Confidence score: {score}
- [x] Approval type: {AUTO_APPROVED|JUDGE_APPROVED|DEEP_ANALYSIS_APPROVED|HUMAN_REVIEW_REQUIRED}
- [x] Frontmatter updated with classification metadata
- [x] Signal weights applied: frontmatter (0.6), content (0.3), path (0.2), title (0.15)

Outputs:
- Document: {file_path}
- Type: {agent|command|skill|guide|reference|workflow|adr}
- Confidence: {0.0-1.0}
- Approval: {approval_type}
- Keywords: {auto_generated_keywords}
- Tags: {auto_generated_tags}

Completion Checklist

Before marking this skill as complete, verify:

  • Document type classified into one of 7 categories (agent, command, skill, guide, reference, workflow, adr)
  • Confidence score calculated (0.0-1.0) based on weighted signals
  • Approval type determined based on confidence threshold
  • Frontmatter added/updated with type, component_type, moe_confidence, moe_classified
  • Keywords extracted and added to frontmatter
  • Tags generated based on document type and content
  • Classification date stamped (YYYY-MM-DD format)
  • Signal patterns matched (frontmatter, content, path, title)
  • Low confidence (<0.45) flagged for human review

Failure Indicators

This skill has FAILED if:

  • ❌ Confidence score is NaN or outside the valid range (0.0-1.0)
  • ❌ Document type is None or invalid (not in 7 categories)
  • ❌ Frontmatter update fails or corrupts existing metadata
  • ❌ Signal detection returns no matches (all weights = 0)
  • ❌ Classification date format invalid
  • ❌ Approval type logic produces incorrect threshold assignment
  • ❌ Multiple equally strong type signals create ambiguity without resolution
  • ❌ File path or content encoding issues prevent analysis
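
Several of these indicators can be checked mechanically. The sketch below covers three of them (invalid type, out-of-range confidence, bad date format) under the assumption that a classification result is available as three plain values. `find_failures` and its parameter names are hypothetical.

```python
import math
from datetime import datetime

VALID_TYPES = {"agent", "command", "skill", "guide", "reference", "workflow", "adr"}


def find_failures(doc_type, confidence, classified_on) -> list[str]:
    """Return a list of failure descriptions; an empty list means no failure was found."""
    failures = []
    if doc_type not in VALID_TYPES:
        failures.append("document type is missing or not one of the 7 categories")
    if (
        not isinstance(confidence, (int, float))
        or math.isnan(confidence)
        or not 0.0 <= confidence <= 1.0
    ):
        failures.append("confidence is NaN or outside 0.0-1.0")
    try:
        # Checks that the value parses as a year-month-day calendar date.
        datetime.strptime(classified_on, "%Y-%m-%d")
    except (TypeError, ValueError):
        failures.append("classification date is not in YYYY-MM-DD format")
    return failures
```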

When NOT to Use

Do NOT use this skill when:

  • Document already has valid frontmatter with type and a high moe_confidence (>0.85)
  • File is not a Markdown document (no .md extension)
  • Document is an auto-generated build artifact (no persistent classification needed)
  • File is a template or example document (not an actual component)
  • File is binary or otherwise non-text (images, PDFs, executables)
  • File is a temporary scratch file or session log
  • Document is external documentation outside the CODITECT framework

Use alternatives:

  • Manual frontmatter: For templates and examples
  • Skip classification: For build artifacts and temporary files
  • External tool: For non-Markdown documentation formats
  • Batch reclassification: When standards change and documents require re-analysis
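
The first two exclusions above lend themselves to a simple pre-check. This sketch assumes frontmatter has already been parsed into a dictionary with a numeric `moe_confidence`; the remaining exclusions (build artifacts, templates, scratch files) depend on project-specific knowledge and are not covered. `should_classify` is a hypothetical name.

```python
def should_classify(path: str, frontmatter: dict) -> bool:
    """Return False for files this skill should skip, based on extension and prior confidence."""
    if not path.endswith(".md"):
        return False  # not a Markdown document
    confidence = frontmatter.get("moe_confidence")
    has_type = bool(frontmatter.get("type"))
    # Strictly greater than 0.85, matching the ">0.85" wording above.
    if has_type and isinstance(confidence, (int, float)) and confidence > 0.85:
        return False  # already classified with high confidence
    return True
```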

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Overwriting existing high-confidence classification | Loss of validated metadata | Check existing moe_confidence before reclassifying |
| Relying solely on path signals | Misclassifies moved/reorganized files | Use multi-signal weighted approach |
| Ignoring frontmatter conflicts | Inconsistent metadata | Validate against existing type and component_type |
| Auto-approving low confidence | Poor quality classifications | Enforce threshold gates (≥0.85 for AUTO_APPROVED) |
| Missing content signal patterns | Incomplete classification | Scan for all 3+ signal types per category |
| Generic keyword extraction | Low-value metadata | Extract domain-specific keywords from content |
| No reclassification on content change | Stale classifications | Track content hash and retrigger on change |
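
The "track content hash" remedy in the last row could look roughly like the following. It assumes a hash recorded at classification time is stored somewhere retrievable, which this document does not specify; the function names are hypothetical, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Return a hex SHA-256 digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reclassification(text: str, stored_hash: Optional[str]) -> bool:
    """True when no hash was stored or the content has changed since it was recorded."""
    return stored_hash != content_hash(text)
```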

Principles

This skill embodies the following CODITECT principles:

  • #5 Eliminate Ambiguity - Multi-signal weighted classification with confidence scoring
  • #6 Clear, Understandable, Explainable - Transparent signal weights and threshold logic
  • #8 No Assumptions - Explicit confidence scoring triggers human review for low confidence
  • Trust & Transparency - Classification metadata fully exposed in frontmatter
  • Factual Grounding - Signal detection based on concrete patterns (keywords, paths, structure)
  • MoE Architecture - Confidence thresholds route to appropriate approval paths (auto/judge/deep/human)

Version: 1.1.0 | Created: 2025-12-28 | Updated: 2026-01-04 | Author: CODITECT Core Team | Framework: CODITECT v1.7.2