# Skill Improvement Tracker

Continuously track, analyze, and improve skills based on real session outcomes. This skill implements the Continual Learning Loop, in which every session compounds into better future skills.
## When to Use
Use this skill when:
- Ending a significant work session (run retrospective)
- A skill invocation failed or underperformed
- During optimization sprints or quality reviews
- Before major deployments (validate skill health)
- After noticing repeated issues with a skill
**Don't use when:**
- Quick 1-2 turn interactions
- Non-skill-related tasks
- Already in a retrospective session
## Core Concepts

### The Continual Learning Loop
```
                CONTINUAL LEARNING LOOP

SESSION START
        │
        ▼
┌───────────────┐
│ Load Baseline │◀──────────────────────────────┐
│   Learnings   │                               │
└───────┬───────┘                               │
        │                                       │
        ▼                                       │
┌───────────────┐                               │
│  Apply Known  │                               │
│ Optimizations │                               │
└───────┬───────┘                               │
        │                                       │
        ▼                                       │
┌───────────────┐                               │
│  Track Skill  │                               │
│  Invocations  │                               │
└───────┬───────┘                               │
        │                                       │
        ▼                                       │
┌───────────────┐                               │
│    Detect     │──▶ Real-time alerts           │
│   Patterns    │                               │
└───────┬───────┘                               │
        │                                       │
  SESSION END                                   │
        │                                       │
        ▼                                       │
┌───────────────┐                               │
│      Run      │                               │
│ Retrospective │                               │
└───────┬───────┘                               │
        │                                       │
        ▼                                       │
┌───────────────┐      ┌──────────────────────┐ │
│   Generate    │─────▶│        Update        │─┘
│ Improvements  │      │ skill-learnings.json │
└───────────────┘      └──────────────────────┘
```
### Skill Effectiveness Scoring
Skills are scored based on invocation outcomes:
| Metric | Weight | Description |
|---|---|---|
| Success Rate | 40% | % of invocations completing successfully |
| Error Rate | 30% | % of invocations with errors |
| Retry Rate | 20% | % requiring multiple attempts |
| User Satisfaction | 10% | Explicit feedback (if available) |
**Score Thresholds:**

- **90-100%**: Excellent - maintain current state
- **70-89%**: Good - minor optimizations
- **50-69%**: Needs Work - improve triggers and examples
- **<50%**: Critical - major revision required
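As a concrete illustration, the weighted score can be computed like this (a minimal sketch; the function names and rate inputs are illustrative, not the retrospective hook's actual API):

```python
def effectiveness_score(success_rate, error_rate, retry_rate, satisfaction=1.0):
    """Combine invocation outcomes into a 0-100 effectiveness score.

    All rates are fractions in [0, 1]. Error and retry rates count
    against the score, so they are inverted before weighting.
    """
    score = (0.40 * success_rate
             + 0.30 * (1 - error_rate)
             + 0.20 * (1 - retry_rate)
             + 0.10 * satisfaction)
    return round(score * 100, 1)

def score_band(score):
    """Map a 0-100 score to the threshold bands listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Needs Work"
    return "Critical"

# 80% success, 20% errors, 40% retries, neutral satisfaction:
# 0.4*0.8 + 0.3*0.8 + 0.2*0.6 + 0.1*1.0 = 0.78 -> 78.0, "Good"
```

With these weights, a skill that always succeeds but always retries still loses 20 points, which matches the intent of penalizing unclear instructions.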
## Learning Loop Workflow

### Phase 1: Session Start - Load Baselines

At session start, load accumulated learnings:

```bash
# Check skill health before starting
python3 hooks/session-retrospective.py --analyze-skills --json
```
**What to Load:**

- `skill-learnings.json` - Historical success/failure patterns
- `skill-patterns.json` - Known anti-patterns to avoid
- Recent session summaries - Context from recent work
**Apply Optimizations:**
- If a skill has known failure modes, add explicit warnings
- If a skill has declining performance, suggest alternatives
- If a skill has recurring errors, include workarounds
### Phase 2: Real-Time Tracking

During the session, track skill invocations:

**Signals to Monitor:**

- Skill invocation commands (`/skill-name`, "Use X skill")
- Success indicators (checkmarks, "completed", "done")
- Failure indicators (errors, retries, "failed")
- User frustration signals (restarts, clarifications)
**Anti-Pattern Detection:**
| Anti-Pattern | Signal | Intervention |
|---|---|---|
| Excessive Retries | 3+ attempts | Suggest narrower scope |
| Context Confusion | Misunderstood task | Clarify skill boundaries |
| Tool Misuse | Wrong tool selected | Review trigger criteria |
| Incomplete Output | Missing elements | Add completion checklist |
| Hallucination Risk | Ungrounded claims | Add verification steps |
### Phase 3: Session End - Retrospective

At session end, run the full retrospective:

```bash
# Automatic retrospective
python3 hooks/session-retrospective.py --event session.end

# Manual with specific session
python3 hooks/session-retrospective.py --manual --session-id SESSION_ID
```
**Retrospective Outputs:**

- **Skill Scores** - Effectiveness rating per skill
- **Anti-patterns** - Detected problematic patterns
- **Improvements** - Specific recommendations
- **Learnings Update** - Updates to `skill-learnings.json`
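To make the learnings update concrete, here is a sketch of merging one retrospective result into `skill-learnings.json`. The schema shown (a per-skill score history plus an error-to-solution map) is an assumption for illustration; the hook's real file layout may differ.

```python
import json
import tempfile
from pathlib import Path

def update_learnings(path, skill, score, error_solutions):
    """Merge one retrospective result into a learnings file.

    NOTE: illustrative schema, not the hook's documented format.
    """
    path = Path(path)
    data = json.loads(path.read_text()) if path.exists() else {}
    entry = data.setdefault(skill, {"scores": [], "error_solutions": {}})
    entry["scores"].append(score)                 # keep history; never delete
    entry["error_solutions"].update(error_solutions)
    path.write_text(json.dumps(data, indent=2))
    return data

# Example: record an 80% run for a hypothetical "code-editor" skill
with tempfile.TemporaryDirectory() as tmp:
    learnings = update_learnings(
        Path(tmp) / "skill-learnings.json",
        "code-editor", 80,
        {"stale buffer": "re-read the file before editing"},
    )
```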
### Phase 4: Apply Improvements

For skills needing improvement:

```bash
# Generate improvement recommendations
python3 hooks/session-retrospective.py --optimize-now

# View specific skill learnings
python3 hooks/session-retrospective.py --analyze-skills --skill SKILL_NAME
```
**Improvement Actions:**

1. **Update "When to Use" Section**
   - Add explicit triggers that worked
   - Add explicit anti-triggers for failures
   - Narrow scope if confusion occurred
2. **Add Examples**
   - Working examples from successful invocations
   - Error cases with solutions
   - Edge cases discovered
3. **Add Troubleshooting**
   - Common errors and solutions
   - Known limitations
   - Workarounds
4. **Update Metadata**
   - Adjust token estimates
   - Update confidence scores
   - Add learned keywords
## Anti-Pattern Detection

### Detection Patterns

```python
anti_patterns = {
    "excessive_retries": {
        "pattern": r"(?:retry|again|once more|try again)",
        "threshold": 3,
        "description": "Multiple retries indicate unclear instructions",
        "recommendation": "Add clearer examples and edge case handling"
    },
    "context_confusion": {
        "pattern": r"(?:confused|unclear|ambiguous|misunderstood)",
        "threshold": 1,
        "description": "Context confusion indicates poor skill scoping",
        "recommendation": "Narrow skill scope and add explicit boundaries"
    },
    "tool_misuse": {
        "pattern": r"(?:wrong tool|incorrect tool|should have used)",
        "threshold": 1,
        "description": "Tool selection issues indicate weak skill triggers",
        "recommendation": "Improve 'When to Use' section with clearer triggers"
    },
    "incomplete_output": {
        "pattern": r"(?:incomplete|partial|missing|forgot)",
        "threshold": 2,
        "description": "Incomplete outputs indicate missing checklist items",
        "recommendation": "Add completion checklist to skill"
    },
    "hallucination_risk": {
        "pattern": r"(?:assumed|guessed|made up|invented)",
        "threshold": 1,
        "description": "Assumptions indicate insufficient grounding",
        "recommendation": "Add verification steps and source references"
    }
}
```
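A minimal sketch of how these patterns could be applied to a session transcript (the `detect_anti_patterns` function and the inlined two-entry dict are illustrative, not the hook's internals):

```python
import re

# Two of the patterns above, inlined so the example is self-contained.
ANTI_PATTERNS = {
    "excessive_retries": {"pattern": r"(?:retry|again|once more|try again)",
                          "threshold": 3},
    "hallucination_risk": {"pattern": r"(?:assumed|guessed|made up|invented)",
                           "threshold": 1},
}

def detect_anti_patterns(transcript):
    """Return {name: count} for patterns whose signal count meets the threshold."""
    hits = {}
    for name, spec in ANTI_PATTERNS.items():
        count = len(re.findall(spec["pattern"], transcript, re.IGNORECASE))
        if count >= spec["threshold"]:
            hits[name] = count
    return hits

log = "Let me retry. Trying again... I'll try again once more. I assumed the path."
hits = detect_anti_patterns(log)  # both thresholds are met in this transcript
```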
### Intervention Strategies

**For Excessive Retries:**

```markdown
## Learned: Common Retry Scenarios

When this skill is invoked and requires retry:

1. Check if context is complete (all required files read)
2. Verify tool permissions are available
3. Confirm scope is narrow enough for single execution

If retry needed:

- Document what was missing
- Add to skill's prerequisites list
```
**For Context Confusion:**

```markdown
## Skill Boundaries (Learned)

This skill handles:

- [Specific scope learned from success]

This skill does NOT handle:

- [Learned from confusion cases]

If unclear, use the /which command to find the correct skill.
```
## Skill Update Protocol

### When to Update Skills

1. **Automatic Updates** (via hook)
   - Add learned anti-patterns
   - Update effectiveness scores
   - Append error solutions
2. **Manual Updates** (via PR)
   - Major scope changes
   - New examples from sessions
   - Structural improvements
### Update Format

```markdown
## Learned Improvements

**Last Updated:** [ISO timestamp]
**Sessions Analyzed:** [count]
**Effectiveness Score:** [X]%

### Success Patterns (from learnings)

- [Pattern that worked well]
- [Context that led to success]

### Failure Patterns (avoid these)

- [Pattern that caused failures]
- [Context that led to confusion]

### Common Errors & Solutions

1. **Error:** [error message]
   **Solution:** [learned solution]
2. **Error:** [error message]
   **Solution:** [learned solution]
```
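For automatic updates, a block like this can be rendered from learnings data; a hedged sketch (the function and field names are illustrative, and only part of the template is shown):

```python
from datetime import datetime, timezone

# Abridged version of the update-format template above.
TEMPLATE = """## Learned Improvements

**Last Updated:** {ts}
**Sessions Analyzed:** {sessions}
**Effectiveness Score:** {score}%

### Common Errors & Solutions

{errors}
"""

def render_update(sessions, score, error_solutions):
    """Fill the update template; error_solutions maps error -> learned fix."""
    errors = "\n".join(
        f"{i}. **Error:** {err}\n   **Solution:** {fix}"
        for i, (err, fix) in enumerate(error_solutions.items(), 1)
    )
    return TEMPLATE.format(
        ts=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        sessions=sessions,
        score=score,
        errors=errors,
    )

block = render_update(12, 84, {"timeout on push": "increase retry backoff"})
```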
## Integration Points

### With `/cx` and `/cxq`

```bash
# Capture session for retrospective
/cx          # Saves to context.db

# Query past learnings
/cxq --recall "skill optimization patterns"
```
### With MoE Task Execution

```bash
# During /orient, load skill baselines
/orient

# Step 1: Check skill learnings
# Step 2: Apply known optimizations
# Step 3: Begin task execution
```
### With Session Hooks

Configure in `settings.json`:

```json
{
  "hooks": {
    "PostSession": [
      {
        "command": "python3 hooks/session-retrospective.py --event session.end",
        "async": true,
        "timeout": 60
      }
    ]
  }
}
```
## Quick Reference

### Commands

| Command | Purpose |
|---|---|
| `/optimize-skills` | Real-time skill optimization |
| `/retrospective` | Run session retrospective |
| `python3 hooks/session-retrospective.py --analyze-skills` | View all skill learnings |
| `python3 hooks/session-retrospective.py --optimize-now` | Get improvement recommendations |
### Files

| File | Purpose |
|---|---|
| `context-storage/skill-learnings.json` | Accumulated skill learnings |
| `context-storage/skill-patterns.json` | Detected patterns |
| `hooks/session-retrospective.py` | Main retrospective hook |
### Metrics to Track
| Metric | Target | Warning |
|---|---|---|
| Overall Success Rate | >85% | <70% |
| Anti-patterns/Session | <2 | >5 |
| Skills Needing Work | <20% | >40% |
| Improvement Backlog | <10 | >25 |
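The warning column can be checked mechanically at retrospective time; a small sketch (the metric names and value conventions are assumptions):

```python
# Warning thresholds from the table above; metric names are illustrative.
WARNINGS = {
    "overall_success_rate": lambda v: v < 0.70,      # warn below 70%
    "anti_patterns_per_session": lambda v: v > 5,
    "skills_needing_work_pct": lambda v: v > 0.40,
    "improvement_backlog": lambda v: v > 25,
}

def health_warnings(metrics):
    """Return the names of tracked metrics that are in their warning range."""
    return [name for name, in_warning in WARNINGS.items()
            if name in metrics and in_warning(metrics[name])]

warnings = health_warnings({
    "overall_success_rate": 0.65,    # below the 70% warning floor
    "anti_patterns_per_session": 2,  # within target
    "improvement_backlog": 30,       # over the 25-item warning level
})
```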
## Example Session

```markdown
# Session Retrospective Example

## Session: 2025-01-01-1430

Duration: 45 minutes
Skills Invoked: 8

## Skill Performance

| Skill | Invocations | Success | Score |
|-------|-------------|---------|-------|
| git-workflow | 3 | 3 | 100% |
| code-editor | 5 | 4 | 80% |
| deployment | 2 | 1 | 50% |

## Anti-patterns Detected

1. **excessive_retries** (3 occurrences)
   - Skill: deployment
   - Cause: Missing prerequisite context
   - Action: Add pre-flight checklist

## Improvements Generated

1. [HIGH] deployment skill needs work
   - Add Kubernetes context verification
   - Include rollback examples
   - Document common GKE errors
2. [MEDIUM] code-editor partial failure
   - Add multi-file coordination examples
   - Include undo/rollback patterns

## Learnings Applied

- Updated skill-learnings.json
- Added 3 new error solutions
- Tracked 2 new anti-pattern occurrences
```
## Best Practices

1. **Run Retrospective After Significant Sessions**
   - 30+ minute sessions
   - Multiple skill invocations
   - Any failures or confusion
2. **Act on High-Priority Improvements**
   - Fix skills with <70% success rate
   - Address recurring anti-patterns
   - Update frequently-used skills first
3. **Maintain Learning History**
   - Don't delete old learnings
   - Track trends over time
   - Build institutional knowledge
4. **Close the Loop**
   - Apply learnings at session start
   - Verify improvements in next session
   - Measure impact of changes
## Success Output

When successful, this skill MUST output:

```
✅ SKILL COMPLETE: skill-improvement-tracker

Completed:
- [x] Session retrospective executed
- [x] Skill performance metrics calculated
- [x] Anti-patterns detected and categorized
- [x] Improvement recommendations generated
- [x] skill-learnings.json updated
- [x] Skill health dashboard reviewed

Outputs:
- context-storage/skill-learnings.json (updated)
- Retrospective report with skill scores
- List of skills needing improvement
- Actionable improvement recommendations

Metrics:
- Skills analyzed: X
- Overall success rate: Y%
- Anti-patterns detected: Z
- Improvements recommended: N
```
## Completion Checklist

Before marking this skill as complete, verify:

- [ ] Retrospective hook executed successfully
- [ ] All skill invocations analyzed
- [ ] Success/failure markers detected
- [ ] Skill scores calculated (0-100%)
- [ ] Anti-patterns categorized by type
- [ ] skill-learnings.json file updated
- [ ] Improvement recommendations generated
- [ ] High-priority skills identified (<70% score)
- [ ] Trends analyzed across multiple sessions
- [ ] Next session baseline prepared
## Failure Indicators
This skill has FAILED if:
- ❌ Cannot access context database (context.db)
- ❌ skill-learnings.json is corrupted or missing
- ❌ No skill invocations detected in session
- ❌ Anti-pattern detection produces false positives
- ❌ Retrospective hook not configured properly
- ❌ Skill scores calculation errors
- ❌ Cannot write to skill-learnings.json
- ❌ Circular dependencies in skill improvement logic
- ❌ Memory/performance issues analyzing large sessions
## When NOT to Use
Do NOT use this skill when:
- Quick 1-2 turn conversations (insufficient data)
- No skills were invoked during session
- Session focused on non-skill tasks (chat, exploration)
- Already in a retrospective/analysis session (avoid recursion)
- Testing new skills before baseline exists
- **Alternative:** Use manual skill review for one-off debugging

Use alternative approaches for:

- Real-time skill optimization → `/optimize-skills` command
- Individual skill debugging → Manual skill inspection
- Performance profiling → Separate profiling tools
- Quality assessment → Skill validation checklist
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Running on every session | Noise, wasted compute on trivial sessions | Only run on 30+ min sessions with multiple skills |
| Ignoring recommendations | Skills never improve, same errors repeat | Act on high-priority improvements within 1 week |
| Over-optimizing metrics | Focus on scores instead of real improvement | Balance quantitative scores with qualitative feedback |
| Deleting old learnings | Lose trend data, can't track improvement | Archive old data, never delete |
| Manual skill updates without retrospective | Changes not data-driven, may worsen performance | Always run retrospective before major skill changes |
| Recursive analysis | Skill analyzing itself causes loops | Exclude skill-improvement-tracker from its own analysis |
| No verification loop | Changes not validated in next session | Always measure impact of improvements |
| Treating all anti-patterns equally | Miss critical vs minor issues | Prioritize by severity and frequency |
## Principles
This skill embodies these CODITECT principles:
- #10 Iterative Refinement - Continuous skill improvement through data
- #6 Clear, Understandable, Explainable - Transparent reasoning chains for improvements
- #8 No Assumptions - Data-driven decisions, not hunches
- #5 Eliminate Ambiguity - Explicit success/failure markers
- Automation - Autonomous learning loop with minimal human intervention
- Compound Intelligence - Every session makes future sessions better
Related Standards:
**Status:** Production-ready
**Integration:** Hooks, Commands, Skills
**Automation Level:** Semi-autonomous (human review for skill updates)