Documentation Librarian Skill
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Production-ready documentation organization and maintenance system that transforms scattered documentation into well-structured, navigable systems serving both human users and AI agents.
When to Use This Skill
✅ Use documentation-librarian when:
- Organizing 20+ documentation files scattered across directories
- Creating navigation systems (README.md, CLAUDE.md) for directories
- Consolidating duplicate or overlapping documentation content
- Migrating documentation between directory structures
- Building automated documentation quality monitoring
- Generating documentation indexes and cross-reference maps
- Auditing documentation completeness and freshness
❌ Don't use documentation-librarian when:
- Writing new documentation content (use codi-documentation-writer)
- Single-file quick edits (use direct Edit tool)
- Code documentation (use language-specific documentation tools)
- API documentation generation (use codi-documentation-writer)
Core Capabilities
1. Content Deduplication Analysis
Identifies duplicate and overlapping documentation content using:
- Content similarity analysis (>60% overlap detection)
- Purpose-based categorization and consolidation recommendations
- File merge strategies preserving all unique information
- Before/after impact assessment with file count reduction metrics
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to analyze docs/ directory for duplicate content and create consolidation plan"
)
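The overlap detection described above can be approximated with a line-level similarity ratio. The following is a minimal sketch, not the skill's actual analysis: it assumes Python's standard `difflib` as the similarity measure, and the names `similarity` and `find_overlaps` are hypothetical. The 0.6 default mirrors the >60% threshold mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two documents, compared line by line."""
    lines_a = [line.strip() for line in text_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in text_b.splitlines() if line.strip()]
    return SequenceMatcher(None, lines_a, lines_b).ratio()


def find_overlaps(docs: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """List document pairs whose similarity exceeds the threshold, highest score first."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(docs.items()), 2):
        score = similarity(text_a, text_b)
        if score > threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Passing a lower `threshold` (for example 0.4) surfaces more candidate pairs at the cost of more false positives, so each reported pair should still be reviewed before merging.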
2. Documentation Structure Optimization
Creates logical directory hierarchies with:
- Purpose-based categorization (architecture, implementation, reference, planning)
- Audience segmentation (customer, agent, developer, both)
- Optimal depth balancing (2-3 level maximum)
- Consistent naming conventions across directories
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to reorganize 50+ markdown files from docs root into proper subdirectories"
)
3. Navigation File Generation
Automatically generates comprehensive navigation documents:
- README.md - Human-readable directory overviews with file listings and descriptions
- CLAUDE.md - Agent-specific context with workflow guidance and key documents
- Index files - Master documentation catalogs with search optimization
- Cross-reference maps - Dependency tracking and related document linking
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to generate README.md and CLAUDE.md files for all subdirectories in docs/"
)
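As an illustration of what a generated README.md might contain, here is a small sketch that renders a directory overview from a mapping of file names to descriptions. The function name `render_readme` and the exact layout are assumptions for illustration, not the template the skill emits.

```python
def render_readme(directory: str, purpose: str, entries: dict[str, str]) -> str:
    """Build README.md text listing each file with its description, sorted by name."""
    lines = [f"# {directory}", "", purpose, "", "## Files", ""]
    for filename in sorted(entries):
        # Relative link so the listing stays valid if the whole directory moves
        lines.append(f"- [{filename}]({filename}) - {entries[filename]}")
    return "\n".join(lines) + "\n"
```

Sorting the entries keeps regenerated files stable, which avoids noisy diffs when the README is rebuilt.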
4. Cross-Reference Management
Maintains link integrity across documentation:
- Validates all markdown links before and after migrations
- Updates broken references automatically using path mapping
- Creates bidirectional link systems for related documents
- Flags orphaned documents with no incoming references
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to validate all cross-references and fix broken links after documentation reorganization"
)
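Link validation of the kind listed above can be sketched as a scan for inline markdown links whose relative targets do not exist on disk. This is a simplified sketch under stated assumptions: the name `find_broken_links` is hypothetical, only inline `[text](target)` links are handled (not reference-style links), and external URLs are skipped rather than fetched.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [text](target); image links match too.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(docs_root: Path) -> list[tuple[str, str]]:
    """Return (source file, link target) pairs whose relative target does not exist."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue  # external links and same-page anchors are out of scope here
            path_part = target.split("#", 1)[0]  # drop any heading anchor
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(docs_root).as_posix(), target))
    return broken
```

Running it once before a migration and once after gives a simple before/after comparison of broken references.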
5. Quality Assurance & Freshness Monitoring
Automated quality monitoring including:
- Stale content detection (>6 months without updates)
- Markdown syntax validation
- Heading hierarchy verification (proper H1/H2/H3 nesting)
- Missing documentation gap identification
- Code block language tag verification
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to audit documentation quality and identify stale content needing updates"
)
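The heading hierarchy check can be illustrated with a short sketch that flags headings which skip a level (for example an H1 followed directly by an H3). The function name `heading_level_jumps` is hypothetical, and the sketch only recognizes ATX-style `#` headings and triple-backtick fences.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")


def heading_level_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line number, previous level, level) wherever a heading skips a level."""
    problems = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' comments inside fenced code
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level = len(match.group(1))
            if level > previous + 1:
                problems.append((number, previous, level))
            previous = level
    return problems
```

Because `previous` starts at 0, a document whose first heading is an H2 or deeper is also reported, which doubles as a missing-H1 check.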
6. Automated Documentation Maintenance
Builds automation systems for ongoing maintenance:
- Scheduled freshness monitoring scripts
- Automated link validation workflows
- Dynamic index generation from directory contents
- Git hooks for documentation consistency enforcement
- Documentation metrics dashboards
Example Usage:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to create automated link validation script and freshness monitoring system"
)
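A freshness monitor could be as simple as comparing file modification times against a cutoff. The sketch below assumes file mtimes are meaningful and approximates "6 months" as 183 days; the name `find_stale_docs` is hypothetical. Note that a fresh git checkout (as in CI) resets mtimes, so a production script would more likely read the last commit date per file from git instead.

```python
import time
from pathlib import Path
from typing import Optional

SIX_MONTHS_SECONDS = 183 * 24 * 60 * 60  # assumed approximation of "6 months"


def find_stale_docs(docs_root: Path, now: Optional[float] = None,
                    max_age_seconds: int = SIX_MONTHS_SECONDS) -> list[str]:
    """Return markdown files whose modification time is older than the cutoff."""
    current = time.time() if now is None else now
    stale = []
    for md_file in sorted(docs_root.rglob("*.md")):
        age = current - md_file.stat().st_mtime
        if age > max_age_seconds:
            stale.append(md_file.relative_to(docs_root).as_posix())
    return stale
```

The `now` parameter exists so the check can be run against a fixed reference time, which makes weekly reports reproducible.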
Usage Pattern
Step 1: Analysis Phase
Inventory and categorize existing documentation:
# Invoke agent for complete documentation analysis
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to:
1. Complete inventory of all markdown files in docs/
2. Categorize by audience (customer, agent, developer)
3. Categorize by purpose (onboarding, reference, architecture)
4. Identify duplicates, gaps, and stale content
5. Create comprehensive analysis report"
)
Expected Output:
- Complete file inventory with metadata (size, last modified, type)
- Categorization matrix showing audience and purpose
- Duplicate content report with similarity scores
- Gap analysis identifying missing documentation
- Stale content list (>6 months old)
Step 2: Design & Planning
Create documentation structure plan:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to:
1. Design logical directory hierarchy for 50+ files
2. Create navigation system plan (README.md, CLAUDE.md)
3. Plan file migrations preserving git history
4. Estimate consolidation opportunities and file reduction
5. Create detailed implementation roadmap"
)
Expected Output:
- Proposed directory structure with rationale
- README.md and CLAUDE.md templates for each directory
- Migration plan with git mv commands
- Consolidation plan with before/after file counts
- Risk assessment and rollback strategy
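A migration plan "with git mv commands" can be derived mechanically from an old-to-new path mapping. The following sketch shows one way to do that; the name `git_mv_commands` and the mapping format are assumptions. It emits `mkdir -p` first because `git mv` fails when the destination directory does not exist, and quotes paths so names with spaces survive the shell.

```python
import shlex
from pathlib import PurePosixPath


def git_mv_commands(mapping: dict[str, str]) -> list[str]:
    """Turn an old -> new path mapping into shell-safe mkdir and git mv commands."""
    commands = []
    created = set()
    for old, new in sorted(mapping.items()):
        parent = str(PurePosixPath(new).parent)
        if parent not in created:
            commands.append(f"mkdir -p {shlex.quote(parent)}")
            created.add(parent)
        commands.append(f"git mv {shlex.quote(old)} {shlex.quote(new)}")
    return commands
```

Emitting the commands as text rather than executing them keeps the plan reviewable before anything moves.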
Step 3: Implementation
Execute documentation reorganization:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to:
1. Create new directory structure
2. Generate README.md and CLAUDE.md files
3. Move files with git mv (preserving history)
4. Consolidate duplicate content into master documents
5. Validate all cross-references and fix broken links"
)
Expected Output:
- Organized directory structure (0 files in docs root)
- README.md and CLAUDE.md in all subdirectories
- Consolidated master documents (30-50% file reduction)
- All links validated and functional
- Git commits preserving file history
Step 4: Maintenance & Automation
Set up ongoing documentation quality:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to:
1. Create automated link validation script
2. Set up freshness monitoring (weekly reports)
3. Generate dynamic documentation index
4. Create git hooks for documentation checks
5. Build documentation metrics dashboard"
)
Expected Output:
- Python script for link validation
- Freshness monitoring cron job or GitHub Action
- Auto-generated DOCUMENTATION-INDEX.md
- Pre-commit hooks for documentation consistency
- Metrics dashboard (total files, last updated, broken links)
Token Budgets
| Scenario | Files | Estimated Budget | Token Savings |
|---|---|---|---|
| Small reorganization | 10-20 files | 15K-25K | 20% (reusable templates) |
| Medium reorganization | 20-50 files | 30K-50K | 35% (batch operations) |
| Large reorganization | 50-100 files | 60K-100K | 45% (consolidation reduces duplication) |
| Automation setup | N/A | 20K-30K | 60% (reusable scripts eliminate manual work) |
Token Multiplier Calculation:
- 10x efficiency from reusable navigation templates (README.md, CLAUDE.md)
- Batch operations process multiple files in single context
- Consolidation eliminates redundant documentation
- Automation scripts provide perpetual value with one-time token cost
Integration with CODITECT
Works With
project-organizer - Coordinates overall project structure including documentation
Orchestrator → project-organizer → documentation-librarian
(coordinates project layout → organizes documentation subsystem)
codi-documentation-writer - Creates documentation content
codi-documentation-writer → documentation-librarian
(writes content → organizes and indexes content)
qa-reviewer - Validates documentation quality
documentation-librarian → qa-reviewer
(organizes docs → validates quality and completeness)
orchestrator - Coordinates complex multi-phase documentation projects
orchestrator → documentation-librarian (Phase 1: Analysis)
orchestrator → documentation-librarian (Phase 2: Consolidation)
orchestrator → documentation-librarian (Phase 3: Automation)
Provides
- Documentation structure standards - Consistent organization patterns
- Navigation systems - README.md and CLAUDE.md templates
- Quality automation - Link validation and freshness monitoring
- Cross-reference integrity - Maintained links across reorganizations
Requires
- Git repository with markdown documentation
- Write access to documentation directories
- Bash and Python for automation scripts
Real-World Results
Case Study: CODITECT Core Documentation Reorganization (Nov 2025)
Before:
- 138 markdown files
- 57 files disorganized in docs/ root
- Fragmented documentation across similar topics
- No navigation systems (missing README.md files)
After (using documentation-librarian):
- 97 markdown files (-41 files, 30% reduction)
- 0 files in docs/ root (100% organized)
- 9-category directory structure with logical hierarchy
- README.md navigation in all subdirectories
- Consolidated master documents (single source of truth)
Process:
- Analysis (2 hours) - Inventoried 138 files, categorized, identified 12 consolidation opportunities
- Design (1 hour) - Created 9-category structure plan
- Implementation (3 hours) - Moved 57 files, consolidated 50 files into 11 master documents
- Validation (1 hour) - Verified all links functional, created agent
Token Budget: ~50K tokens (analysis + execution)
Time Saved: Prevented 100+ hours of manual documentation maintenance over 12 months
Deliverables
Documentation Structure Plans
- Complete inventory with metadata (file count, sizes, last modified)
- Categorization matrix (audience x purpose)
- Proposed directory structure with rationale
- Migration plan with git mv commands preserving history
Navigation Documents
- README.md for each directory with comprehensive file listings
- CLAUDE.md for agent context in key directories
- Master documentation index with search optimization
- Cross-reference maps showing document relationships
Automation Tools
- Link validation scripts (Python/Bash)
- Freshness monitoring scripts (cron or GitHub Actions)
- Automated index generation from directory structure
- Git hooks for pre-commit documentation checks
Quality Reports
- Documentation completeness audits
- Broken link reports with fix recommendations
- Stale content identification (>6 months)
- Gap analysis with missing documentation recommendations
Best Practices
File Operations
- Always use git mv - Preserves file history, essential for tracking documentation evolution
- Never delete without approval - Archive old docs, don't remove (prevents information loss)
- Update all references - Fix cross-references after migrations to prevent broken links
- Verify links after migration - Automated validation ensures no broken links
- Document all changes - Clear commit messages explain reorganization rationale
Organization Principles
- Logical categorization - Group by purpose (architecture, implementation, planning), not arbitrary placement
- Clear naming - Directory names self-explanatory (docs/02-architecture/ not docs/arch/)
- Avoid deep nesting - 2-3 level maximum (docs/category/subcategory/file.md)
- Consistent patterns - Same structure across similar directories
- Searchable - Optimize for grep/glob tools and human browsing
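The nesting limit can be checked automatically. This sketch counts directory levels beneath the docs root, under the assumption that docs/category/subcategory/file.md counts as depth 2; the names `nesting_depth` and `too_deep` are hypothetical.

```python
from pathlib import PurePosixPath


def nesting_depth(path: str, root: str = "docs") -> int:
    """Count directory levels between the docs root and the file itself."""
    relative = PurePosixPath(path).relative_to(root)
    return len(relative.parts) - 1  # subtract the file name


def too_deep(paths: list[str], max_depth: int = 3, root: str = "docs") -> list[str]:
    """Return the paths nested more than max_depth directories below the root."""
    return [p for p in paths if nesting_depth(p, root) > max_depth]
```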
Quality Standards
- Every directory has README.md - No exceptions, provides navigation entry point
- Major directories have CLAUDE.md - Agent-specific context for intelligent usage
- Links use relative paths - Portability across environments (../file.md not /absolute/path)
- Descriptions are specific - Prefer "C4 architecture diagrams showing system components" over a vague "Architecture docs"
- Metadata is accurate - Audience, purpose, usage clearly stated for discoverability
Troubleshooting
Issue: Broken links after migration
Cause: Cross-references not updated when files moved
Solution:
Task(
subagent_type="general-purpose",
prompt="Use documentation-librarian subagent to:
1. Find all markdown links in moved files
2. Build old -> new path mapping
3. Update links using Edit tool
4. Validate all links functional
5. Report statistics (links updated, broken links found)"
)
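Steps 2 and 3 above (build an old -> new path mapping, then update links) can be sketched as a pure text transformation. This is an illustrative sketch rather than the agent's actual procedure: the name `rewrite_links` is hypothetical, paths are assumed to be repo-relative with forward slashes, and only inline `[text](target)` links are rewritten.

```python
import posixpath
import re

# Groups: link opening, path, optional #anchor, closing parenthesis
LINK_PATTERN = re.compile(r"(\[[^\]]*\]\()([^)\s#]+)(#[^)\s]*)?(\))")


def rewrite_links(text: str, old_source: str, new_source: str, mapping: dict[str, str]) -> str:
    """Rewrite relative links in a moved file so they still resolve after the migration.

    old_source/new_source are the file's paths before and after the move;
    mapping holds old -> new paths for every moved file.
    """
    old_dir = posixpath.dirname(old_source)
    new_dir = posixpath.dirname(new_source)

    def fix(match: re.Match) -> str:
        target = match.group(2)
        if re.match(r"[a-z][a-z0-9+.-]*:", target, re.IGNORECASE) or target.startswith("/"):
            return match.group(0)  # leave external and absolute links untouched
        resolved = posixpath.normpath(posixpath.join(old_dir, target))
        destination = mapping.get(resolved, resolved)  # unmoved targets keep their path
        relative = posixpath.relpath(destination, new_dir or ".")
        return f"{match.group(1)}{relative}{match.group(3) or ''}{match.group(4)}"

    return LINK_PATTERN.sub(fix, text)
```

Links in files that did not move but point at moved files can be handled by calling the same function with `old_source` equal to `new_source`.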
Issue: Duplicate content not detected
Cause: Content similarity threshold too high
Solution: Lower similarity threshold from 60% to 40% for more aggressive consolidation detection
Issue: README.md generation too generic
Cause: Insufficient file content analysis
Solution: Agent analyzes first 50 lines of each file to extract purpose and create specific descriptions
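One way to derive a specific description from the first 50 lines is to combine the first heading with the first line of body text. The sketch below shows that heuristic; the name `describe` is hypothetical and the heuristic itself is an assumption, since the source does not say how the agent extracts a file's purpose.

```python
def describe(markdown: str, max_lines: int = 50) -> str:
    """Return 'Title: first body line' drawn from the first max_lines lines."""
    title = ""
    summary = ""
    for line in markdown.splitlines()[:max_lines]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            if not title:
                title = stripped.lstrip("#").strip()  # keep only the first heading
            continue
        summary = stripped
        break
    return f"{title}: {summary}" if title and summary else title or summary
```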
Advanced Patterns
Automated Documentation Pipeline
# .github/workflows/documentation-quality.yml
name: Documentation Quality
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest  # assumed runner; adjust to your environment
    steps:
      # Check out the repository so the scripts and docs are available
      - uses: actions/checkout@v4
      - name: Check Links
        run: python .coditect/scripts/validate-documentation-links.py
      - name: Check Freshness
        run: python .coditect/scripts/check-documentation-freshness.py
      - name: Generate Index
        run: python .coditect/scripts/generate-documentation-index.py
Pre-Commit Hook
#!/bin/bash
# .git/hooks/pre-commit
# Validate documentation links before commit
python .coditect/scripts/validate-documentation-links.py || exit 1
# Check for README.md in every directory that receives a newly added file
for dir in $(git diff --cached --name-only --diff-filter=A | xargs dirname | sort -u); do
    if [ ! -f "$dir/README.md" ]; then
        echo "ERROR: New directory $dir missing README.md"
        exit 1
    fi
done
Next Steps
- Invoke agent for analysis:
  Task(subagent_type="general-purpose",
  prompt="Use documentation-librarian subagent to analyze current documentation state")
- Review analysis report - Understand current documentation landscape
- Approve reorganization plan - Review proposed structure and migration strategy
- Execute reorganization - Agent performs migrations with git history preservation
- Set up automation - Implement ongoing quality monitoring
This skill is production-proven - Successfully reorganized CODITECT core documentation (138 → 97 files, a 30% reduction) in November 2025.
Multi-Context Window Support
This skill supports long-running documentation reorganization tasks across multiple context windows using Claude 4.5's enhanced state management capabilities.
State Tracking
Checkpoint State (JSON):
{
"reorganization_id": "doclib_20251129_150000",
"phase": "consolidation_complete",
"files_analyzed": 138,
"files_moved": 57,
"files_consolidated": 12,
"files_created": 3,
"current_file_count": 97,
"target_file_count": 90,
"broken_links_fixed": 25,
"navigation_files_generated": 8,
"token_usage": 42000,
"created_at": "2025-11-29T15:00:00Z"
}
Progress Notes (Markdown):
# Documentation Reorganization Progress - 2025-11-29
## Completed
- ✅ Analyzed 138 markdown files
- ✅ Moved 57 files from docs/ root to subdirectories
- ✅ Consolidated 12 overlapping documents into 3 master docs
- ✅ Generated README.md in 8 subdirectories
- ✅ Fixed 25 broken links
## In Progress
- Final consolidation (5 files remaining)
- Cross-reference validation
## Statistics
- File reduction: 138 → 97 (-30%)
- Root directory: 57 → 0 files (100% organized)
- Broken links: 25 → 0 (100% fixed)
## Next Actions
- Consolidate remaining 5 duplicates
- Generate final DOCUMENTATION-INDEX.md
- Create automation scripts
- Git commit all changes
Session Recovery
When starting a fresh context window after reorganization work:
- Load Checkpoint State: Read .coditect/checkpoints/doc-librarian-latest.json
- Review Progress Notes: Check doc-reorganization-progress.md for status
- Verify File Moves: Check git log for completed migrations
- Resume Pending Work: Continue with unconsolidated files or remaining subdirectories
- Validate Organization: Ensure all files are in the correct locations
Recovery Commands:
# 1. Check latest checkpoint
cat .coditect/checkpoints/doc-librarian-latest.json | jq '.'
# 2. Review progress
tail -40 doc-reorganization-progress.md
# 3. Verify file moves (git history)
git log --oneline --name-status | grep "docs/" | head -30
# 4. Check current file count
find docs/ -name "*.md" | wc -l
# 5. Check for files still in root
ls docs/*.md 2>/dev/null | wc -l
State Management Best Practices
Checkpoint Files (JSON Schema):
- Store in .coditect/checkpoints/doc-librarian-{date}.json
- Track files analyzed, moved, consolidated, created separately
- Record broken links fixed for quality metrics
- Include navigation files generated count
Progress Tracking (Markdown Narrative):
- Maintain doc-reorganization-progress.md with detailed timeline
- Document consolidation decisions (which files merged, why)
- Note file moves with before/after paths
- List remaining tasks with priority
Git Integration:
- Commit file moves incrementally (category by category)
- Use git mv to preserve history
- Tag major phases: git tag doc-reorg-phase{num}-complete
Progress Checkpoints
Natural Breaking Points:
- After analysis phase complete (all files categorized)
- After each category moved (e.g., architecture, implementation)
- After consolidation complete (duplicates merged)
- After navigation files generated
- After cross-references validated
Checkpoint Creation Pattern:
# Automatic checkpoint creation after major milestones
if phase in ["analysis_complete", "category_moved", "consolidation_complete"]:
    create_checkpoint({
        "phase": phase,
        "files_analyzed": analyzed_count,
        "files_moved": moved_count,
        "files_consolidated": consolidated_count,
        "navigation_files_generated": nav_count,
        "tokens": current_token_usage
    })
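The pattern above calls a `create_checkpoint` helper that is not defined in this document. A minimal sketch of what such a helper might look like follows, assuming the checkpoint directory and file names described under State Management Best Practices; the date format in the file name and the extra `directory` parameter are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_checkpoint(state: dict, directory: Path = Path(".coditect/checkpoints")) -> Path:
    """Write the state to a dated checkpoint file and refresh the '-latest' copy."""
    directory.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    # Stamp the record so a later session can tell how old the checkpoint is
    record = {**state, "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ")}
    payload = json.dumps(record, indent=2)
    dated = directory / f"doc-librarian-{now:%Y%m%d}.json"
    dated.write_text(payload, encoding="utf-8")
    (directory / "doc-librarian-latest.json").write_text(payload, encoding="utf-8")
    return dated
```

Writing both a dated file and a fixed "latest" file means a new session can always read one known path while older checkpoints remain available for rollback.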
Example: Multi-Context Documentation Reorganization
Context Window 1: Analysis + First Category
{
"checkpoint_id": "ckpt_doclib_part1",
"phase": "architecture_category_complete",
"files_analyzed": 138,
"files_moved": 20,
"files_consolidated": 0,
"navigation_files_generated": 1,
"next_action": "Move implementation category",
"token_usage": 18000
}
Context Window 2: Remaining Categories + Consolidation
# Resume from checkpoint
cat .coditect/checkpoints/ckpt_doclib_part1.json
# Continue with remaining categories
# (Context restored in 5 minutes vs 30 minutes from scratch)
{
"checkpoint_id": "ckpt_doclib_part2",
"phase": "consolidation_in_progress",
"files_moved": 57,
"files_consolidated": 8,
"navigation_files_generated": 8,
"next_action": "Final consolidation + validation",
"token_usage": 22000
}
Context Window 3: Finalization
# Resume from previous checkpoint
cat .coditect/checkpoints/ckpt_doclib_part2.json
# Complete consolidation and validation
# (Context restored in 3 minutes)
{
"checkpoint_id": "ckpt_doclib_complete",
"phase": "reorganization_complete",
"final_file_count": 97,
"file_reduction_percent": 30,
"broken_links_fixed": 25,
"automation_scripts_created": 3,
"token_usage": 15000
}
Token Savings: 18000 + 22000 + 15000 = 55000 total vs. 95000 without checkpoints = 42% reduction
Reference: See docs/CLAUDE-4.5-BEST-PRACTICES.md for complete multi-context window workflow guidance.
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: documentation-librarian
Completed:
- [x] Documentation inventory analyzed (X files)
- [x] Categorization complete (audience + purpose)
- [x] Directory structure created
- [x] Files migrated with git mv (preserving history)
- [x] README.md generated in all subdirectories
- [x] CLAUDE.md created in major directories
- [x] Duplicate content consolidated (Y% reduction)
- [x] Cross-references validated and fixed
- [x] Automation scripts deployed
Outputs:
- docs/ directory structure (0 files in root)
- README.md files: docs/*/README.md
- CLAUDE.md files: docs/*/CLAUDE.md
- DOCUMENTATION-INDEX.md
- scripts/validate-documentation-links.py
- scripts/check-documentation-freshness.py
Statistics:
- Files before: X | Files after: Y | Reduction: Z%
- Broken links fixed: N
- Navigation files created: M
Completion Checklist
Before marking this skill as complete, verify:
- Complete file inventory with metadata (size, modified date, type)
- All files categorized by audience (customer/agent/developer/both)
- All files categorized by purpose (architecture/implementation/reference/planning)
- Directory structure logical and 2-3 levels deep maximum
- Zero files remaining in docs/ root (100% organized)
- README.md exists in every subdirectory
- CLAUDE.md exists in major directories
- All file migrations used git mv (history preserved)
- All cross-references validated (no broken links)
- Duplicate content consolidated into master documents
- Link validation script deployed and tested
- Freshness monitoring configured
- Git commits with clear messages
- Documentation index generated
Failure Indicators
This skill has FAILED if:
- ❌ Files deleted instead of moved (git history lost)
- ❌ Broken links after migration (cross-references not updated)
- ❌ README.md missing in subdirectories (no navigation)
- ❌ Files remain disorganized in docs/ root
- ❌ Duplicate content not consolidated (redundancy persists)
- ❌ Directory structure too deep (>3 levels, hard to navigate)
- ❌ Arbitrary categorization (not logical or searchable)
- ❌ Automation scripts not working or not deployed
- ❌ No consolidation plan or file count reduction
- ❌ Migration broke existing documentation workflows
When NOT to Use
Do NOT use documentation-librarian when:
- Writing new content - Use codi-documentation-writer agent instead
- Single file edit - Use direct Edit tool (faster)
- Fewer than 10 files - Manual organization more efficient
- API documentation generation - Use specialized API doc tools
- Code documentation - Use language-specific documentation generators
- Already well-organized - No reorganization needed
- Active development docs - Wait for stable milestone to avoid churn
- Customer-facing docs only - Use simpler documentation tools
Use codi-documentation-writer when: Creating new documentation content
Use direct editing when: Quick single-file changes
Use this skill when: Organizing 20+ scattered documentation files
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Delete instead of move | Git history lost, information disappears | Always use git mv to preserve history |
| No link validation | Broken cross-references after migration | Run link validation before and after |
| Generic README.md | Unhelpful navigation, vague descriptions | Analyze file content for specific descriptions |
| Too many levels | Hard to navigate (docs/a/b/c/d/e/file.md) | Maximum 2-3 level depth |
| No consolidation | Duplicate content persists | Identify and merge overlapping documents |
| Absolute paths in links | Breaks portability across environments | Use relative paths (../file.md) |
| No automation setup | Manual maintenance burden forever | Deploy validation and freshness scripts |
| Skip CLAUDE.md | Agents lack directory context | Add CLAUDE.md in major directories |
| Arbitrary categorization | Hard to find documents later | Use logical categories (purpose + audience) |
| No backup before migration | Cannot rollback if issues occur | Commit current state before reorganization |
Principles
This skill embodies the following CODITECT principles:
#1 Recycle → Extend → Re-Use → Create:
- Consolidate duplicate documentation instead of creating new files
- Reuse navigation templates (README.md, CLAUDE.md) across directories
#3 Keep It Simple:
- 2-3 level directory depth maximum
- Clear, logical categorization (not arbitrary)
#5 Eliminate Ambiguity:
- Every directory has README.md explaining purpose
- File descriptions are specific, not vague
- CLAUDE.md provides clear agent context
#6 Clear, Understandable, Explainable:
- README.md lists all files with descriptions
- Navigation systems make documentation discoverable
- Cross-reference maps show document relationships
#8 No Assumptions:
- Validate all links before and after migration
- Verify git history preserved with git log --follow
- Confirm automation scripts work in target environment
Full Standard: CODITECT-STANDARD-AUTOMATION.md