/skill-from-docs

Purpose

Generate a Claude Code skill from a documentation website. Implements smart scraping, categorization, pattern extraction, and AI enhancement.

Syntax

/skill-from-docs <url> [--name <skill-name>] [--max-pages <n>] [--output <path>]

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | string | Yes | - | Documentation website URL |
| `--name` | string | No | derived | Skill name |
| `--max-pages` | integer | No | 500 | Maximum pages to scrape |
| `--output` | string | No | `~/.coditect/skills/{name}/` | Output directory |
| `--async` | flag | No | true | Enable async scraping |
| `--workers` | integer | No | 8 | Async worker count |
| `--estimate-only` | flag | No | false | Only estimate page count |
| `--skip-enhance` | flag | No | false | Skip AI enhancement |
| `--target` | string | No | claude | Target platform (claude/gemini/openai/markdown) |
| `--rate-limit` | float | No | 0.5 | Requests per second |

Examples

Basic Usage

# Generate skill from React documentation
/skill-from-docs https://react.dev/

# With custom name
/skill-from-docs https://fastapi.tiangolo.com/ --name fastapi

Advanced Usage

# Estimate pages first
/skill-from-docs https://docs.python.org/ --estimate-only

# Large documentation with limits
/skill-from-docs https://godotengine.org/docs/ --max-pages 2000 --workers 8

# Custom output location
/skill-from-docs https://flask.palletsprojects.com/ --output ~/my-skills/flask/

# For Google Gemini
/skill-from-docs https://react.dev/ --target gemini

Workflow Examples

# 1. Estimate → Scrape → Enhance → Package
/skill-from-docs https://react.dev/ --estimate-only
# Output: ~180 pages estimated

/skill-from-docs https://react.dev/
# Output: SKILL.md + references/ created

# 2. Quick iteration (skip enhancement)
/skill-from-docs https://flask.palletsprojects.com/ --skip-enhance

# 3. Multi-platform generation
/skill-from-docs https://react.dev/ --target claude
/skill-from-docs https://react.dev/ --target gemini
/skill-from-docs https://react.dev/ --target openai

Output

~/.coditect/skills/{name}/
├── SKILL.md              # AI-enhanced (300+ lines)
├── references/
│   ├── index.md          # Category index
│   ├── getting_started/  # Categorized content
│   ├── api_reference/
│   ├── tutorials/
│   └── patterns/
├── examples/
│   ├── basic/
│   └── advanced/
└── metadata.json

metadata.json records the scraping metrics:

{
  "source_url": "...",
  "pages_scraped": 180,
  "scrape_date": "2026-01-23",
  "quality_score": 8.7
}
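The metadata fields shown above can be sanity-checked with a short script. This is an illustrative sketch, not part of the command itself; `check_metadata` is a hypothetical helper, and only the four field names from the example are assumed.

```python
import json
from pathlib import Path

# Fields taken from the metadata.json example above.
REQUIRED = {"source_url", "pages_scraped", "scrape_date", "quality_score"}

def check_metadata(path: str) -> dict:
    """Load metadata.json and verify the expected fields are present."""
    meta = json.loads(Path(path).read_text())
    missing = REQUIRED - meta.keys()
    if missing:
        raise ValueError(f"metadata.json missing fields: {sorted(missing)}")
    return meta
```

Run it against the generated file (e.g. `check_metadata("~/.coditect/skills/react/metadata.json")` after expanding the home directory) to catch a truncated or partial run early.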

What Happens

  1. URL Validation: Verify URL is accessible
  2. Strategy Detection: Check for llms.txt or a sitemap; fall back to BFS crawling
  3. Page Scraping: Extract content with rate limiting
  4. Categorization: Smart category assignment
  5. Pattern Extraction: Find code patterns and examples
  6. AI Enhancement: Generate comprehensive SKILL.md
  7. Quality Check: Validate against CODITECT standards
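The strategy-detection step (2) can be sketched roughly as follows. This is an illustrative assumption about the probing order, not the command's actual implementation; `fetch` is a hypothetical injected helper so the logic is testable without network access.

```python
from urllib.parse import urljoin

def detect_strategy(base_url: str, fetch) -> str:
    """Pick a scraping strategy for a docs site.

    `fetch` is any callable that returns the response body for a URL,
    or None on a non-200 response (hypothetical helper, injected here
    instead of a real HTTP client).
    """
    # Best case: llms.txt is a curated page list, so no crawling is needed.
    if fetch(urljoin(base_url, "/llms.txt")):
        return "llms.txt"
    # Next best: a sitemap enumerates every page up front.
    if fetch(urljoin(base_url, "/sitemap.xml")):
        return "sitemap"
    # Fallback: breadth-first crawl outward from the landing page.
    return "bfs"
```

The ordering mirrors the speed claim in the success output below: a curated page list avoids crawling entirely, a sitemap avoids link discovery, and BFS is the slowest general-purpose fallback.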

Success Output

✅ COMMAND COMPLETE: /skill-from-docs

Scraping Summary:
- [x] URL: https://react.dev/
- [x] Strategy: llms.txt (10x faster)
- [x] Pages: 180/180 (100%)
- [x] Categories: 6
- [x] Patterns: 23

Output: ~/.coditect/skills/react/
- SKILL.md (342 lines, AI-enhanced)
- references/ (6 categories)
- examples/ (12 examples)

Quality Score: 8.7/10

When NOT to Use

Do NOT use this command when:

  • Documentation requires authentication (OAuth, API keys, login) - use manual download instead
  • Target site has aggressive anti-bot protection (Cloudflare, captchas) - will fail
  • Documentation is behind a paywall - not supported
  • Source is a video tutorial - not supported
  • Documentation is a single PDF file - use /pdf skill instead
  • Need to combine docs with code analysis - use skill-generator-orchestrator instead
  • Documentation is not in English - limited support
  • Quick one-off lookup - just ask Claude directly

Completion Checklist

Before marking this command as complete, verify:

  • URL validated and accessible
  • Scraping strategy determined (llms.txt, sitemap, or BFS)
  • All pages scraped within configured limits
  • Content categorized into correct directories
  • Code blocks extracted with language detection
  • Patterns identified and documented
  • SKILL.md generated (300+ lines if AI-enhanced)
  • Output directory structure matches specification
  • metadata.json created with scraping metrics

Failure Indicators

  • ❌ URL not accessible (404, 403, timeout)
  • ❌ robots.txt blocks scraping
  • ❌ Zero pages scraped
  • ❌ Rate limited by target site
  • ❌ SKILL.md under 100 lines
  • ❌ Categorization confidence below 50%
  • ❌ Output directory not created

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| No rate limiting | IP blocked, incomplete scrape | Use `--rate-limit 0.5` minimum |
| Ignoring robots.txt | Legal issues, site blocks | Command respects robots.txt by default |
| Scraping auth pages | Empty content | Use manual download + PDF analysis |
| Too many workers | Server overload, blocks | Limit `--workers` to 8 maximum |
| No max-pages limit | Hours of scraping | Set a reasonable `--max-pages` |
| Skipping estimation | Unexpectedly large sites | Run `--estimate-only` first |
| Skipping AI enhancement | Generic output | Only use `--skip-enhance` for iteration |
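The rate-limiting advice above amounts to enforcing a minimum gap between requests. A minimal limiter for 0.5 requests/second might look like this; it is a sketch of the general technique, not the command's internals:

```python
import time

class RateLimiter:
    """Block so that wait() returns at most `rate` times per second."""

    def __init__(self, rate: float = 0.5):
        self.min_interval = 1.0 / rate  # seconds between requests
        self.last_request = 0.0

    def wait(self):
        # Sleep off whatever remains of the minimum gap, then stamp the time.
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()
```

Calling `limiter.wait()` before each page fetch spaces requests at least two seconds apart at the default rate, which is what keeps the scrape polite and unblocked.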

Verification

After execution, verify success:

# 1. Check output directory exists
ls -la ~/.coditect/skills/{name}/

# 2. Verify SKILL.md length (should be 300+ lines if AI-enhanced)
wc -l ~/.coditect/skills/{name}/SKILL.md

# 3. Check categorization
ls -la ~/.coditect/skills/{name}/references/

# 4. Validate metadata
cat ~/.coditect/skills/{name}/metadata.json | python3 -m json.tool

# 5. Count extracted code blocks
grep -r '```' ~/.coditect/skills/{name}/references/ | wc -l

See Also

  • Agent: doc-to-skill-converter
  • Orchestrator: skill-generator-orchestrator
  • Companion: /skill-from-repo
  • Skill: multi-source-skill-generation