Documentation Librarian Skill

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

A production-ready documentation organization and maintenance system. It turns scattered documentation into a well-structured, navigable system that serves both human readers and AI agents.

When to Use This Skill

Use documentation-librarian when:

  • Organizing 20+ documentation files scattered across directories
  • Creating navigation systems (README.md, CLAUDE.md) for directories
  • Consolidating duplicate or overlapping documentation content
  • Migrating documentation between directory structures
  • Building automated documentation quality monitoring
  • Generating documentation indexes and cross-reference maps
  • Auditing documentation completeness and freshness

Don't use documentation-librarian when:

  • Writing new documentation content (use codi-documentation-writer)
  • Single-file quick edits (use direct Edit tool)
  • Code documentation (use language-specific documentation tools)
  • API documentation generation (use codi-documentation-writer)

Core Capabilities

1. Content Deduplication Analysis

Identifies duplicate and overlapping documentation content using:

  • Content similarity analysis (>60% overlap detection)
  • Purpose-based categorization and consolidation recommendations
  • File merge strategies preserving all unique information
  • Before/after impact assessment with file count reduction metrics

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to analyze docs/ directory for duplicate content and create consolidation plan"
)
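
The overlap detection listed above can be approximated with a simple set-based similarity score. The sketch below is one possible approach, not the skill's actual implementation: the choice of Jaccard similarity over three-word shingles, and the names `shingles`, `overlap`, and `find_duplicates`, are assumptions made for illustration. Only the 60% threshold comes from this document.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def find_duplicates(docs: dict, threshold: float = 0.6) -> list:
    """Return (name_a, name_b, score) for each pair scoring above the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(docs.items()), 2):
        score = overlap(text_a, text_b)
        if score > threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs
```

Passing `threshold=0.4` reproduces the more aggressive setting suggested in the Troubleshooting section. Pairwise comparison is quadratic in the number of files, which is acceptable for a few hundred documents.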

2. Documentation Structure Optimization

Creates logical directory hierarchies with:

  • Purpose-based categorization (architecture, implementation, reference, planning)
  • Audience segmentation (customer, agent, developer, both)
  • Optimal depth balancing (2-3 level maximum)
  • Consistent naming conventions across directories

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to reorganize 50+ markdown files from docs root into proper subdirectories"
)

3. Navigation File Generation

Automatically generates comprehensive navigation documents:

  • README.md - Human-readable directory overviews with file listings and descriptions
  • CLAUDE.md - Agent-specific context with workflow guidance and key documents
  • Index files - Master documentation catalogs with search optimization
  • Cross-reference maps - Dependency tracking and related document linking

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to generate README.md and CLAUDE.md files for all subdirectories in docs/"
)
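
To make the README.md generation concrete, here is a minimal sketch of a generator that lists each markdown file with a description taken from its content. The function names `describe` and `build_readme` and the output layout are hypothetical. The 50-line scan limit mirrors the figure given in the Troubleshooting section.

```python
from pathlib import Path


def describe(path: Path, max_lines: int = 50) -> str:
    """Use the first heading, else the first non-empty line, as the description."""
    fallback = ""
    with path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle):
            if number >= max_lines:
                break
            stripped = line.strip()
            if stripped.startswith("#"):
                title = stripped.lstrip("#").strip()
                if title:
                    return title
            elif stripped and not fallback:
                fallback = stripped
    return fallback or "(no description found)"


def build_readme(directory: Path) -> str:
    """Render a README body listing every markdown file in the directory."""
    lines = [f"# {directory.name}", "", "## Files", ""]
    for path in sorted(directory.glob("*.md")):
        if path.name in {"README.md", "CLAUDE.md"}:
            continue  # navigation files do not list themselves
        lines.append(f"- [{path.name}]({path.name}) - {describe(path)}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to `README.md` in the same directory. Descriptions produced this way are a starting point and usually benefit from manual review.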

4. Cross-Reference Management

Maintains link integrity across documentation:

  • Validates all markdown links before and after migrations
  • Updates broken references automatically using path mapping
  • Creates bidirectional link systems for related documents
  • Flags orphaned documents with no incoming references

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to validate all cross-references and fix broken links after documentation reorganization"
)
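
Link validation of the kind described above can be sketched in a few lines. This is an illustrative sketch, not the contents of the skill's validation script: the name `broken_links` and the regular expression are assumptions, and the check covers only relative file targets, not external URLs or anchors within a page.

```python
import re
from pathlib import Path

# Matches [text](target) and [text](target "title"); captures the target only.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)(?:\s+\"[^\"]*\")?\)")
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(root: Path) -> list:
    """Return (source file, link target) for relative links that do not resolve."""
    problems = []
    for source in sorted(root.rglob("*.md")):
        text = source.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # skip external URLs (https:, mailto:) and in-page anchors
            path_part = target.split("#", 1)[0]
            if not (source.parent / path_part).exists():
                problems.append((source.relative_to(root).as_posix(), target))
    return problems
```

Running this before and after a migration and comparing the two result lists shows which links the migration broke. The regex also matches links inside code samples, so a small number of false positives is possible.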

5. Quality Assurance & Freshness Monitoring

Automated quality monitoring including:

  • Stale content detection (>6 months without updates)
  • Markdown syntax validation
  • Heading hierarchy verification (proper H1/H2/H3 nesting)
  • Missing documentation gap identification
  • Code block language tag verification

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to audit documentation quality and identify stale content needing updates"
)
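
Two of the checks listed above, heading hierarchy and code block language tags, are easy to express directly. The sketch below is a simplified illustration under stated assumptions: the function name `audit_markdown` and the message wording are hypothetical, and only triple-backtick fences and ATX (`#`) headings are recognized.

```python
import re

FENCE = "`" * 3  # the markdown code fence marker


def audit_markdown(text: str) -> list:
    """Report skipped heading levels and opening fences without a language tag."""
    issues = []
    previous_level = 0
    in_code = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_code and stripped == FENCE:
                issues.append(f"line {number}: code block has no language tag")
            in_code = not in_code
            continue
        if in_code:
            continue  # a '#' inside a code block is a comment, not a heading
        match = re.match(r"^(#{1,6})\s+\S", line)
        if match:
            level = len(match.group(1))
            if level > previous_level + 1:
                issues.append(f"line {number}: H{level} follows H{previous_level}")
            previous_level = level
    return issues
```

A document whose first heading is an H2 is reported as "H2 follows H0", which doubles as a check for a missing top-level title.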

6. Automated Documentation Maintenance

Builds automation systems for ongoing maintenance:

  • Scheduled freshness monitoring scripts
  • Automated link validation workflows
  • Dynamic index generation from directory contents
  • Git hooks for documentation consistency enforcement
  • Documentation metrics dashboards

Example Usage:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to create automated link validation script and freshness monitoring system"
)

Usage Pattern

Step 1: Analysis Phase

Inventory and categorize existing documentation:

# Invoke agent for complete documentation analysis
Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to:
    1. Complete inventory of all markdown files in docs/
    2. Categorize by audience (customer, agent, developer)
    3. Categorize by purpose (onboarding, reference, architecture)
    4. Identify duplicates, gaps, and stale content
    5. Create comprehensive analysis report"
)

Expected Output:

  • Complete file inventory with metadata (size, last modified, type)
  • Categorization matrix showing audience and purpose
  • Duplicate content report with similarity scores
  • Gap analysis identifying missing documentation
  • Stale content list (>6 months old)
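
The inventory and stale-content list above can be approximated with file system metadata alone. This is a minimal sketch: the function name `inventory`, the row layout, and the 183-day cutoff standing in for "6 months" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

STALE_AFTER = timedelta(days=183)  # assumed stand-in for "more than 6 months"


def inventory(root: Path, now: Optional[datetime] = None) -> list:
    """List every markdown file under root with size, modified date, and stale flag."""
    now = now or datetime.now(timezone.utc)
    rows = []
    for path in sorted(root.rglob("*.md")):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": stat.st_size,
            "last_modified": modified.date().isoformat(),
            "stale": now - modified > STALE_AFTER,
        })
    return rows
```

File modification times are reset by a fresh `git clone`, so in a CI environment the last commit date of each file is likely a more reliable freshness signal than `st_mtime`.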

Step 2: Design & Planning

Create documentation structure plan:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to:
    1. Design logical directory hierarchy for 50+ files
    2. Create navigation system plan (README.md, CLAUDE.md)
    3. Plan file migrations preserving git history
    4. Estimate consolidation opportunities and file reduction
    5. Create detailed implementation roadmap"
)

Expected Output:

  • Proposed directory structure with rationale
  • README.md and CLAUDE.md templates for each directory
  • Migration plan with git mv commands
  • Consolidation plan with before/after file counts
  • Risk assessment and rollback strategy
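
A migration plan with git mv commands can be generated from a mapping of old paths to new paths. The sketch below only builds the command strings and does not execute them, so the plan can be reviewed first. The function name `plan_moves` and the example paths are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def plan_moves(mapping: dict) -> list:
    """Return shell commands that create target directories and move each file."""
    commands = []
    created = set()
    for old, new in sorted(mapping.items()):
        parent = str(PurePosixPath(new).parent)
        if parent not in created:
            # git mv fails if the destination directory does not exist yet
            commands.append(f"mkdir -p {shlex.quote(parent)}")
            created.add(parent)
        commands.append(f"git mv {shlex.quote(old)} {shlex.quote(new)}")
    return commands
```

For example, `plan_moves({"docs/setup.md": "docs/01-getting-started/setup.md"})` returns a `mkdir -p` command followed by the matching `git mv` command. Quoting with `shlex.quote` keeps paths that contain spaces intact.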

Step 3: Implementation

Execute documentation reorganization:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to:
    1. Create new directory structure
    2. Generate README.md and CLAUDE.md files
    3. Move files with git mv (preserving history)
    4. Consolidate duplicate content into master documents
    5. Validate all cross-references and fix broken links"
)

Expected Output:

  • Organized directory structure (0 files in docs root)
  • README.md and CLAUDE.md in all subdirectories
  • Consolidated master documents (30-50% file reduction)
  • All links validated and functional
  • Git commits preserving file history

Step 4: Maintenance & Automation

Set up ongoing documentation quality:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to:
    1. Create automated link validation script
    2. Set up freshness monitoring (weekly reports)
    3. Generate dynamic documentation index
    4. Create git hooks for documentation checks
    5. Build documentation metrics dashboard"
)

Expected Output:

  • Python script for link validation
  • Freshness monitoring cron job or GitHub Action
  • Auto-generated DOCUMENTATION-INDEX.md
  • Pre-commit hooks for documentation consistency
  • Metrics dashboard (total files, last updated, broken links)
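
As an illustration of the auto-generated DOCUMENTATION-INDEX.md, the sketch below walks a documentation tree and groups files by directory. The function name `build_index` and the index layout are assumptions. The real generator script may produce a different format.

```python
from collections import defaultdict
from pathlib import Path


def build_index(root: Path) -> str:
    """Render a markdown index of every .md file under root, grouped by directory."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.md")):
        relative = path.relative_to(root)
        if relative.name == "DOCUMENTATION-INDEX.md":
            continue  # do not index the index itself
        groups[relative.parent.as_posix()].append(relative.as_posix())
    lines = ["# Documentation Index", ""]
    for directory in sorted(groups):
        title = "(root)" if directory == "." else directory
        lines.append(f"## {title}")
        lines.append("")
        lines.extend(f"- [{name}]({name})" for name in groups[directory])
        lines.append("")
    return "\n".join(lines)
```

Because the index is derived entirely from the directory contents, it can be regenerated on every commit without manual upkeep.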

Token Budgets

| Scenario | Files | Estimated Budget (tokens) | Token Savings |
|---|---|---|---|
| Small reorganization | 10-20 files | 15K-25K | 20% (reusable templates) |
| Medium reorganization | 20-50 files | 30K-50K | 35% (batch operations) |
| Large reorganization | 50-100 files | 60K-100K | 45% (consolidation reduces duplication) |
| Automation setup | N/A | 20K-30K | 60% (reusable scripts eliminate manual work) |

Token Multiplier Calculation:

  • 10x efficiency from reusable navigation templates (README.md, CLAUDE.md)
  • Batch operations process multiple files in single context
  • Consolidation eliminates redundant documentation
  • Automation scripts provide perpetual value with one-time token cost

Integration with CODITECT

Works With

project-organizer - Coordinates overall project structure including documentation

Orchestrator → project-organizer → documentation-librarian
(coordinates project layout → organizes documentation subsystem)

codi-documentation-writer - Creates documentation content

codi-documentation-writer → documentation-librarian
(writes content → organizes and indexes content)

qa-reviewer - Validates documentation quality

documentation-librarian → qa-reviewer
(organizes docs → validates quality and completeness)

orchestrator - Coordinates complex multi-phase documentation projects

orchestrator → documentation-librarian (Phase 1: Analysis)
orchestrator → documentation-librarian (Phase 2: Consolidation)
orchestrator → documentation-librarian (Phase 3: Automation)

Provides

  • Documentation structure standards - Consistent organization patterns
  • Navigation systems - README.md and CLAUDE.md templates
  • Quality automation - Link validation and freshness monitoring
  • Cross-reference integrity - Maintained links across reorganizations

Requires

  • Git repository with markdown documentation
  • Write access to documentation directories
  • Bash and Python for automation scripts

Real-World Results

Case Study: CODITECT Core Documentation Reorganization (Nov 2025)

Before:

  • 138 markdown files
  • 57 files disorganized in docs/ root
  • Fragmented documentation across similar topics
  • No navigation systems (missing README.md files)

After (using documentation-librarian):

  • 97 markdown files (-41 files, 30% reduction)
  • 0 files in docs/ root (100% organized)
  • 9-category directory structure with logical hierarchy
  • README.md navigation in all subdirectories
  • Consolidated master documents (single source of truth)

Process:

  1. Analysis (2 hours) - Inventoried 138 files, categorized, identified 12 consolidation opportunities
  2. Design (1 hour) - Created 9-category structure plan
  3. Implementation (3 hours) - Moved 57 files, consolidated 50 files into 11 master documents
  4. Validation (1 hour) - Verified all links functional, created agent

Token Budget: ~50K tokens (analysis + execution)

Time Saved: Prevented 100+ hours of manual documentation maintenance over 12 months

Deliverables

Documentation Structure Plans

  • Complete inventory with metadata (file count, sizes, last modified)
  • Categorization matrix (audience x purpose)
  • Proposed directory structure with rationale
  • Migration plan with git mv commands preserving history
  • README.md for each directory with comprehensive file listings
  • CLAUDE.md for agent context in key directories
  • Master documentation index with search optimization
  • Cross-reference maps showing document relationships
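
A cross-reference map is essentially a directed graph of which document links to which. The sketch below builds that graph and derives the orphaned documents mentioned under Cross-Reference Management. The names `reference_map` and `orphans` are hypothetical, and exempting README.md files from the orphan list (on the grounds that they are entry points) is an assumption.

```python
import os
import re
from pathlib import Path

# Captures the path part of links that point at .md files, ignoring any #anchor.
LINK_RE = re.compile(r"\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def reference_map(root: Path) -> dict:
    """Map each markdown file to the set of markdown files it links to."""
    graph = {}
    for source in sorted(root.rglob("*.md")):
        key = source.relative_to(root).as_posix()
        graph[key] = set()
        for target in LINK_RE.findall(source.read_text(encoding="utf-8")):
            resolved = os.path.normpath(source.parent / target)
            relative = os.path.relpath(resolved, root).replace(os.sep, "/")
            if os.path.isfile(resolved) and not relative.startswith("../"):
                graph[key].add(relative)
    return graph


def orphans(graph: dict) -> list:
    """Return files with no incoming links, ignoring README.md entry points."""
    linked = set().union(*graph.values())
    return sorted(
        name for name in graph
        if name not in linked and Path(name).name != "README.md"
    )
```

Inverting the graph (target to sources) gives the incoming-reference view needed to add bidirectional links between related documents.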

Automation Tools

  • Link validation scripts (Python/Bash)
  • Freshness monitoring scripts (cron or GitHub Actions)
  • Automated index generation from directory structure
  • Git hooks for pre-commit documentation checks

Quality Reports

  • Documentation completeness audits
  • Broken link reports with fix recommendations
  • Stale content identification (>6 months)
  • Gap analysis with missing documentation recommendations

Best Practices

File Operations

  • Always use git mv - Preserves file history, essential for tracking documentation evolution
  • Never delete without approval - Archive old docs, don't remove (prevents information loss)
  • Update all references - Fix cross-references after migrations to prevent broken links
  • Verify links after migration - Automated validation ensures no broken links
  • Document all changes - Clear commit messages explain reorganization rationale

Organization Principles

  • Logical categorization - Group by purpose (architecture, implementation, planning), not arbitrary placement
  • Clear naming - Directory names self-explanatory (docs/02-architecture/ not docs/arch/)
  • Avoid deep nesting - 2-3 level maximum (docs/category/subcategory/file.md)
  • Consistent patterns - Same structure across similar directories
  • Searchable - Optimize for grep/glob tools and human browsing

Quality Standards

  • Every directory has README.md - No exceptions, provides navigation entry point
  • Major directories have CLAUDE.md - Agent-specific context for intelligent usage
  • Links use relative paths - Portability across environments (../file.md not /absolute/path)
  • Descriptions are specific - Not vague ("Architecture docs" vs "C4 architecture diagrams showing system components")
  • Metadata is accurate - Audience, purpose, usage clearly stated for discoverability

Troubleshooting

Issue: Broken links after migration

Cause: Cross-references not updated when files moved

Solution:

Task(
    subagent_type="general-purpose",
    prompt="Use documentation-librarian subagent to:
    1. Find all markdown links in moved files
    2. Build old -> new path mapping
    3. Update links using Edit tool
    4. Validate all links functional
    5. Report statistics (links updated, broken links found)"
)
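
The path-mapping step in this solution can be sketched as a pure text transformation. The function below is an illustration only: the name `rewrite_links` and its signature are hypothetical, all paths are assumed to be repository-relative with forward slashes, and links that carry a title are left untouched.

```python
import posixpath
import re

LINK_RE = re.compile(r"(\]\()([^)\s#]+)((?:#[^)\s]*)?\))")
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def rewrite_links(text: str, old_source: str, new_source: str, moves: dict):
    """Rewrite relative links in one file so they resolve from its new location.

    old_source and new_source are the file's paths before and after the move;
    moves maps old paths to new paths for every file that was relocated.
    Returns the updated text and the number of links changed.
    """
    updated = 0

    def fix(match):
        nonlocal updated
        target = match.group(2)
        if SCHEME_RE.match(target) or target.startswith("/"):
            return match.group(0)  # leave external and absolute links alone
        old_abs = posixpath.normpath(
            posixpath.join(posixpath.dirname(old_source), target))
        new_abs = moves.get(old_abs, old_abs)
        new_target = posixpath.relpath(
            new_abs, posixpath.dirname(new_source) or ".")
        if new_target == target:
            return match.group(0)
        updated += 1
        return f"{match.group(1)}{new_target}{match.group(3)}"

    return LINK_RE.sub(fix, text), updated
```

The function handles both cases that break after a move: links to files that were relocated, and links to files that stayed put but are now reached from a different directory. Files that link to a moved document without moving themselves can be processed by passing the same path as `old_source` and `new_source`.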

Issue: Duplicate content not detected

Cause: Content similarity threshold too high

Solution: Lower similarity threshold from 60% to 40% for more aggressive consolidation detection

Issue: README.md generation too generic

Cause: Insufficient file content analysis

Solution: Agent analyzes first 50 lines of each file to extract purpose and create specific descriptions

Advanced Patterns

Automated Documentation Pipeline

# .github/workflows/documentation-quality.yml
name: Documentation Quality
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the scripts and docs are available
      - uses: actions/checkout@v4

      - name: Check Links
        run: python .coditect/scripts/validate-documentation-links.py

      - name: Check Freshness
        run: python .coditect/scripts/check-documentation-freshness.py

      - name: Generate Index
        run: python .coditect/scripts/generate-documentation-index.py

Pre-Commit Hook

#!/bin/bash
# .git/hooks/pre-commit

# Validate documentation links before commit
python .coditect/scripts/validate-documentation-links.py || exit 1

# Require a README.md in every directory that receives newly added files.
# Reading line by line keeps paths that contain spaces intact.
git diff --cached --name-only --diff-filter=A | while IFS= read -r file; do
    dir=$(dirname "$file")
    if [ ! -f "$dir/README.md" ]; then
        echo "ERROR: Directory $dir missing README.md"
        exit 1
    fi
done || exit 1  # the loop runs in a subshell, so propagate its failure

Next Steps

  1. Invoke agent for analysis:

     Task(subagent_type="general-purpose",
          prompt="Use documentation-librarian subagent to analyze current documentation state")

  2. Review analysis report - Understand current documentation landscape

  3. Approve reorganization plan - Review proposed structure and migration strategy

  4. Execute reorganization - Agent performs migrations with git history preservation

  5. Set up automation - Implement ongoing quality monitoring


This skill is production-proven - Successfully reorganized CODITECT core documentation (138 → 97 files, a 30% reduction) in November 2025.


Multi-Context Window Support

This skill supports long-running documentation reorganization tasks across multiple context windows using Claude 4.5's enhanced state management capabilities.

State Tracking

Checkpoint State (JSON):

{
  "reorganization_id": "doclib_20251129_150000",
  "phase": "consolidation_complete",
  "files_analyzed": 138,
  "files_moved": 57,
  "files_consolidated": 12,
  "files_created": 3,
  "current_file_count": 97,
  "target_file_count": 90,
  "broken_links_fixed": 25,
  "navigation_files_generated": 8,
  "token_usage": 42000,
  "created_at": "2025-11-29T15:00:00Z"
}

Progress Notes (Markdown):

# Documentation Reorganization Progress - 2025-11-29

## Completed
- ✅ Analyzed 138 markdown files
- ✅ Moved 57 files from docs/ root to subdirectories
- ✅ Consolidated 12 overlapping documents into 3 master docs
- ✅ Generated README.md in 8 subdirectories
- ✅ Fixed 25 broken links

## In Progress
- Final consolidation (5 files remaining)
- Cross-reference validation

## Statistics
- File reduction: 138 → 97 (-30%)
- Root directory: 57 → 0 files (100% organized)
- Broken links: 25 → 0 (100% fixed)

## Next Actions
- Consolidate remaining 5 duplicates
- Generate final DOCUMENTATION-INDEX.md
- Create automation scripts
- Git commit all changes

Session Recovery

When starting a fresh context window after reorganization work:

  1. Load Checkpoint State: Read .coditect/checkpoints/doc-librarian-latest.json
  2. Review Progress Notes: Check doc-reorganization-progress.md for status
  3. Verify File Moves: Check git log for completed migrations
  4. Resume Pending Work: Continue with unconsolidated files or remaining subdirectories
  5. Validate Organization: Ensure all files in correct locations

Recovery Commands:

# 1. Check latest checkpoint
cat .coditect/checkpoints/doc-librarian-latest.json | jq '.'

# 2. Review progress
tail -40 doc-reorganization-progress.md

# 3. Verify file moves (git history)
git log --oneline --name-status | grep "docs/" | head -30

# 4. Check current file count
find docs/ -name "*.md" | wc -l

# 5. Check for files still in root
ls docs/*.md 2>/dev/null | wc -l

State Management Best Practices

Checkpoint Files (JSON Schema):

  • Store in .coditect/checkpoints/doc-librarian-{date}.json
  • Track files analyzed, moved, consolidated, created separately
  • Record broken links fixed for quality metrics
  • Include navigation files generated count

Progress Tracking (Markdown Narrative):

  • Maintain doc-reorganization-progress.md with detailed timeline
  • Document consolidation decisions (which files merged, why)
  • Note file moves with before/after paths
  • List remaining tasks with priority

Git Integration:

  • Commit file moves incrementally (category by category)
  • Use git mv to preserve history
  • Tag major phases: git tag doc-reorg-phase{num}-complete

Progress Checkpoints

Natural Breaking Points:

  1. After analysis phase complete (all files categorized)
  2. After each category moved (e.g., architecture, implementation)
  3. After consolidation complete (duplicates merged)
  4. After navigation files generated
  5. After cross-references validated

Checkpoint Creation Pattern:

# Automatic checkpoint creation after major milestones
if phase in ["analysis_complete", "category_moved", "consolidation_complete"]:
    create_checkpoint({
        "phase": phase,
        "files_analyzed": analyzed_count,
        "files_moved": moved_count,
        "files_consolidated": consolidated_count,
        "navigation_files_generated": nav_count,
        "tokens": current_token_usage
    })
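
The pattern above leaves `create_checkpoint` undefined. One way it might be implemented is sketched below, using the checkpoint file names from the State Management Best Practices section. The helper names `maybe_checkpoint` and `load_latest`, the `YYYYMMDD` date format, and passing the directory as a parameter are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CHECKPOINT_PHASES = {"analysis_complete", "category_moved", "consolidation_complete"}


def create_checkpoint(state: dict, directory: Path) -> Path:
    """Write the state to a dated JSON file and refresh the 'latest' copy."""
    directory.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    record = {**state, "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ")}
    payload = json.dumps(record, indent=2) + "\n"
    dated = directory / f"doc-librarian-{now:%Y%m%d}.json"
    dated.write_text(payload, encoding="utf-8")
    (directory / "doc-librarian-latest.json").write_text(payload, encoding="utf-8")
    return dated


def maybe_checkpoint(phase: str, counters: dict, directory: Path):
    """Create a checkpoint only when the phase is a major milestone."""
    if phase not in CHECKPOINT_PHASES:
        return None
    return create_checkpoint({"phase": phase, **counters}, directory)


def load_latest(directory: Path):
    """Return the most recent checkpoint as a dict, or None if there is none."""
    path = directory / "doc-librarian-latest.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A new context window would call `load_latest(Path(".coditect/checkpoints"))` and read the `phase` field to decide where to resume.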

Example: Multi-Context Documentation Reorganization

Context Window 1: Analysis + First Category

{
  "checkpoint_id": "ckpt_doclib_part1",
  "phase": "architecture_category_complete",
  "files_analyzed": 138,
  "files_moved": 20,
  "files_consolidated": 0,
  "navigation_files_generated": 1,
  "next_action": "Move implementation category",
  "token_usage": 18000
}

Context Window 2: Remaining Categories + Consolidation

# Resume from checkpoint
cat .coditect/checkpoints/ckpt_doclib_part1.json

# Continue with remaining categories
# (Context restored in 5 minutes vs 30 minutes from scratch)

{
  "checkpoint_id": "ckpt_doclib_part2",
  "phase": "consolidation_in_progress",
  "files_moved": 57,
  "files_consolidated": 8,
  "navigation_files_generated": 8,
  "next_action": "Final consolidation + validation",
  "token_usage": 22000
}

Context Window 3: Finalization

# Resume from previous checkpoint
cat .coditect/checkpoints/ckpt_doclib_part2.json

# Complete consolidation and validation
# (Context restored in 3 minutes)

{
  "checkpoint_id": "ckpt_doclib_complete",
  "phase": "reorganization_complete",
  "final_file_count": 97,
  "file_reduction_percent": 30,
  "broken_links_fixed": 25,
  "automation_scripts_created": 3,
  "token_usage": 15000
}

Token Savings: 18000 + 22000 + 15000 = 55000 total vs. 95000 without checkpoints = 42% reduction
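
The savings figure is the relative difference between the baseline and the summed per-window usage. A small worked check follows, with the hypothetical helper name `checkpoint_savings`. The 95000-token baseline is the document's own estimate, not a measured value.

```python
def checkpoint_savings(per_window: list, baseline: int) -> tuple:
    """Return total tokens used and the percentage saved against the baseline."""
    total = sum(per_window)
    return total, round(100 * (baseline - total) / baseline, 1)


total, percent = checkpoint_savings([18000, 22000, 15000], baseline=95000)
print(total, percent)  # 55000 42.1
```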

Reference: See docs/CLAUDE-4.5-BEST-PRACTICES.md for complete multi-context window workflow guidance.


Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: documentation-librarian

Completed:
- [x] Documentation inventory analyzed (X files)
- [x] Categorization complete (audience + purpose)
- [x] Directory structure created
- [x] Files migrated with git mv (preserving history)
- [x] README.md generated in all subdirectories
- [x] CLAUDE.md created in major directories
- [x] Duplicate content consolidated (Y% reduction)
- [x] Cross-references validated and fixed
- [x] Automation scripts deployed

Outputs:
- docs/ directory structure (0 files in root)
- README.md files: docs/*/README.md
- CLAUDE.md files: docs/*/CLAUDE.md
- DOCUMENTATION-INDEX.md
- scripts/validate-documentation-links.py
- scripts/check-documentation-freshness.py

Statistics:
- Files before: X | Files after: Y | Reduction: Z%
- Broken links fixed: N
- Navigation files created: M

Completion Checklist

Before marking this skill as complete, verify:

  • Complete file inventory with metadata (size, modified date, type)
  • All files categorized by audience (customer/agent/developer/both)
  • All files categorized by purpose (architecture/implementation/reference/planning)
  • Directory structure logical and 2-3 levels deep maximum
  • Zero files remaining in docs/ root (100% organized)
  • README.md exists in every subdirectory
  • CLAUDE.md exists in major directories
  • All file migrations used git mv (history preserved)
  • All cross-references validated (no broken links)
  • Duplicate content consolidated into master documents
  • Link validation script deployed and tested
  • Freshness monitoring configured
  • Git commits with clear messages
  • Documentation index generated
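
Several items on this checklist can be verified mechanically. The sketch below covers three of them: no loose files in the docs root, a README.md in every subdirectory, and limited nesting. The name `completion_issues` is hypothetical. Two assumptions are built in: README.md and CLAUDE.md in the root are treated as navigation files rather than loose files, and depth is counted in directory levels below the docs root with a default of 2 (matching `docs/category/subcategory/file.md`).

```python
from pathlib import Path

NAVIGATION_FILES = {"README.md", "CLAUDE.md"}


def completion_issues(docs_root: Path, max_depth: int = 2) -> list:
    """Check root cleanliness, README coverage, and directory depth."""
    issues = []
    for path in sorted(docs_root.glob("*.md")):
        if path.name not in NAVIGATION_FILES:
            issues.append(f"file still in root: {path.name}")
    for directory in sorted(p for p in docs_root.rglob("*") if p.is_dir()):
        relative = directory.relative_to(docs_root)
        if not (directory / "README.md").is_file():
            issues.append(f"missing README.md: {relative.as_posix()}")
        if len(relative.parts) > max_depth:
            issues.append(f"too deep: {relative.as_posix()}")
    return issues
```

An empty list means these three checks pass. The remaining checklist items, such as git history preservation and consolidation quality, still need review by hand or with other tooling.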

Failure Indicators

This skill has FAILED if:

  • ❌ Files deleted instead of moved (git history lost)
  • ❌ Broken links after migration (cross-references not updated)
  • ❌ README.md missing in subdirectories (no navigation)
  • ❌ Files remain disorganized in docs/ root
  • ❌ Duplicate content not consolidated (redundancy persists)
  • ❌ Directory structure too deep (>3 levels, hard to navigate)
  • ❌ Arbitrary categorization (not logical or searchable)
  • ❌ Automation scripts not working or not deployed
  • ❌ No consolidation plan or file count reduction
  • ❌ Migration broke existing documentation workflows

When NOT to Use

Do NOT use documentation-librarian when:

  • Writing new content - Use codi-documentation-writer agent instead
  • Single file edit - Use direct Edit tool (faster)
  • Fewer than 10 files - Manual organization more efficient
  • API documentation generation - Use specialized API doc tools
  • Code documentation - Use language-specific documentation generators
  • Already well-organized - No reorganization needed
  • Active development docs - Wait for stable milestone to avoid churn
  • Customer-facing docs only - Use simpler documentation tools

Use codi-documentation-writer when: Creating new documentation content

Use direct editing when: Quick single-file changes

Use this skill when: Organizing 20+ scattered documentation files

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Delete instead of move | Git history lost, information disappears | Always use git mv to preserve history |
| No link validation | Broken cross-references after migration | Run link validation before and after |
| Generic README.md | Unhelpful navigation, vague descriptions | Analyze file content for specific descriptions |
| Too many levels | Hard to navigate (docs/a/b/c/d/e/file.md) | Maximum 2-3 level depth |
| No consolidation | Duplicate content persists | Identify and merge overlapping documents |
| Absolute paths in links | Breaks portability across environments | Use relative paths (../file.md) |
| No automation setup | Manual maintenance burden forever | Deploy validation and freshness scripts |
| Skip CLAUDE.md | Agents lack directory context | Add CLAUDE.md in major directories |
| Arbitrary categorization | Hard to find documents later | Use logical categories (purpose + audience) |
| No backup before migration | Cannot roll back if issues occur | Commit current state before reorganization |

Principles

This skill embodies the following CODITECT principles:

#1 Recycle → Extend → Re-Use → Create:

  • Consolidate duplicate documentation instead of creating new files
  • Reuse navigation templates (README.md, CLAUDE.md) across directories

#3 Keep It Simple:

  • 2-3 level directory depth maximum
  • Clear, logical categorization (not arbitrary)

#5 Eliminate Ambiguity:

  • Every directory has README.md explaining purpose
  • File descriptions are specific, not vague
  • CLAUDE.md provides clear agent context

#6 Clear, Understandable, Explainable:

  • README.md lists all files with descriptions
  • Navigation systems make documentation discoverable
  • Cross-reference maps show document relationships

#8 No Assumptions:

  • Validate all links before and after migration
  • Verify git history preserved with git log --follow
  • Confirm automation scripts work in target environment

Full Standard: CODITECT-STANDARD-AUTOMATION.md