/perf-profile Command
Run comprehensive performance profiling against an application to identify CPU bottlenecks, memory issues, and I/O constraints, with optional flame graph generation for visual analysis.
System Prompt
⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions, no explanations first
- ALWAYS show full output from script/tool execution
- ALWAYS provide summary after execution completes
DO NOT:
- Say "I don't need to take action" - you ALWAYS execute when invoked
- Ask for confirmation unless `requires_confirmation: true` is set in frontmatter
- Skip execution even if it seems redundant - run it anyway
The user invoking the command IS the confirmation.
Usage
/perf-profile [options] <target>
Options:
--type <types> Profile types: cpu,memory,io,async (default: cpu)
--duration <secs> Profiling duration in seconds (default: 30)
--output <format> Output: terminal,json,flamegraph,html (default: terminal)
--threshold <ms> Minimum time threshold for reporting (default: 10ms)
--compare <file> Compare with baseline profile
--language <lang> Target language: python,node,rust,go (auto-detect)
--pid <pid> Attach to a running process instead of launching <target>
--sampling <rate> Sampling rate for low-overhead profiling
Profile Types
CPU Profiling
/perf-profile --type cpu <command>
# Python: py-spy, cProfile
# Node.js: clinic.js doctor, 0x
# Rust: cargo-flamegraph, perf
# Go: pprof
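For the Python path, a minimal sketch of what cProfile-based collection looks like under the hood (the function name is illustrative, not part of this command):

```python
import cProfile
import io
import pstats

def hot_function():
    # Illustrative hotspot: CPU-bound busy work
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
hot_function()
profiler.disable()

# Render the five most expensive calls, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The sampled tools (py-spy, perf, pprof) attach externally instead, which is what makes `--pid` style attachment possible without code changes.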
Memory Profiling
/perf-profile --type memory <command>
# Detects:
# - Memory leaks
# - High memory allocations
# - Object retention issues
# - Garbage collection pressure
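A rough sketch of the kind of allocation tracking behind these detections, using the stdlib `tracemalloc` (the allocation burst is deliberate, standing in for a leak suspect):

```python
import tracemalloc

tracemalloc.start()

# Deliberate retained allocations: ~1 MB held in a list
retained = [bytearray(10_000) for _ in range(100)]

current, peak = tracemalloc.get_traced_memory()
top_stat = tracemalloc.take_snapshot().statistics("lineno")[0]
tracemalloc.stop()

print(f"current={current} bytes, peak={peak} bytes")
print(f"largest allocation site: {top_stat}")
```

Object retention shows up as `current` staying close to `peak` after the workload finishes; a healthy run releases most of it.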
I/O Profiling
/perf-profile --type io <command>
# Monitors:
# - File system operations
# - Network I/O
# - Database queries
# - API call latency
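A minimal sketch of per-call latency measurement, the basic primitive behind all four monitors (the `timed_call` helper is hypothetical, not part of this command):

```python
import os
import tempfile
import time

def timed_call(fn, *args, **kwargs):
    """Return (result, elapsed_ms) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0

# Time a file-system write as a stand-in for the I/O operations listed above
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        _, write_ms = timed_call(f.write, b"x" * 1_000_000)
    print(f"write took {write_ms:.2f} ms")
finally:
    os.remove(path)
```

The `--threshold <ms>` option works against exactly this kind of measurement: calls below the threshold are dropped from the report.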
Async Profiling
/perf-profile --type async <command>
# Analyzes:
# - Event loop blocking
# - Promise/Future bottlenecks
# - Concurrent task performance
# - Thread pool utilization
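Event loop blocking, the first item above, can be demonstrated with plain `asyncio`: in debug mode the loop logs any callback that runs longer than `slow_callback_duration`.

```python
import asyncio
import time

def blocking_work():
    time.sleep(0.2)  # synchronous sleep: the event loop cannot run anything else

async def main():
    loop = asyncio.get_running_loop()
    # In debug mode, callbacks slower than this are logged as blocking
    loop.slow_callback_duration = 0.1
    start = time.monotonic()
    blocking_work()  # runs on the loop thread, stalling every other task
    return time.monotonic() - start

blocked_for = asyncio.run(main(), debug=True)
print(f"event loop blocked for {blocked_for:.2f}s")
```

Async profilers surface the same pattern as long gaps where only one coroutine makes progress.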
Examples
Basic CPU Profile
# Profile a Python application
/perf-profile --type cpu python app.py
# Profile with flame graph output
/perf-profile --type cpu --output flamegraph --duration 60 node server.js
Memory Analysis
# Memory profile with leak detection
/perf-profile --type memory --duration 120 python process_data.py
# Memory comparison (before/after optimization)
/perf-profile --type memory --compare baseline.json python app.py
Production Profiling
# Attach to running process (Python)
/perf-profile --type cpu --pid 12345
# Sample-based profiling (low overhead)
/perf-profile --type cpu --sampling 100 cargo run --release
Load Test Profiling
# Profile under load
k6 run load-test.js &
/perf-profile --type cpu,memory --duration 300 python api_server.py
Output Formats
Terminal Summary
Performance Profile Results
===========================
Duration: 30.0s | Samples: 3000 | CPU: 45% avg
Top 10 Functions by CPU Time:
1. process_batch 28.5% 8550ms src/processor.py:45
2. serialize_json 15.2% 4560ms src/utils.py:123
3. db_query 12.8% 3840ms src/database.py:78
4. validate_input 8.3% 2490ms src/validators.py:34
5. parse_request 6.1% 1830ms src/api.py:56
Memory:
Peak: 512MB | Average: 380MB | GC Pauses: 12
Recommendations:
- Consider caching serialize_json results (15.2% CPU)
- Batch db_query calls to reduce overhead
- Review process_batch algorithm complexity
Flame Graph
/perf-profile --output flamegraph python app.py
# Generates: profile-flamegraph.svg
# Open in browser for interactive analysis
JSON Export
/perf-profile --output json python app.py > profile.json
# Structure:
# {
# "duration_ms": 30000,
# "samples": 3000,
# "hotspots": [...],
# "memory": {...},
# "recommendations": [...]
# }
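A minimal sketch of consuming this export downstream; the field names mirror the structure above, and the payload is fabricated for illustration:

```python
import json

# Hypothetical export matching the structure sketched above
profile_json = """
{
  "duration_ms": 30000,
  "samples": 3000,
  "hotspots": [
    {"function": "process_batch", "cpu_pct": 28.5, "time_ms": 8550},
    {"function": "serialize_json", "cpu_pct": 15.2, "time_ms": 4560}
  ],
  "memory": {"peak_mb": 512},
  "recommendations": ["Batch db_query calls"]
}
"""

profile = json.loads(profile_json)
top = max(profile["hotspots"], key=lambda h: h["cpu_pct"])
print(f'{top["function"]}: {top["cpu_pct"]}% over {profile["duration_ms"]} ms')
```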
Language-Specific Tools
Python
# py-spy (sampling profiler)
/perf-profile --language python --type cpu app.py
# memory_profiler
/perf-profile --language python --type memory app.py
Node.js
# clinic.js (automatic analysis)
/perf-profile --language node --type cpu server.js
# 0x (flame graphs)
/perf-profile --language node --output flamegraph server.js
Rust
# cargo-flamegraph
/perf-profile --language rust --output flamegraph cargo run
# perf + FlameGraph
/perf-profile --language rust --type cpu target/release/app
Go
# pprof (built-in)
/perf-profile --language go --type cpu,memory ./app
Optimization Workflow
1. Baseline: /perf-profile --output json app > baseline.json
2. Identify hotspots from the top functions list
3. Optimize the targeted functions
4. Compare: /perf-profile --compare baseline.json app
5. Verify the improvement percentage
6. Document the optimization results
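The compare step reduces to a percentage-improvement calculation per function; a minimal sketch with hypothetical timings from two runs:

```python
def improvement_pct(baseline_ms: float, current_ms: float) -> float:
    """Percent reduction in time relative to the baseline."""
    return (baseline_ms - current_ms) / baseline_ms * 100.0

# Hypothetical per-function timings from baseline and optimized profiles
baseline = {"process_batch": 8550, "serialize_json": 4560}
current = {"process_batch": 6100, "serialize_json": 4500}

for fn, before_ms in baseline.items():
    after_ms = current[fn]
    print(f"{fn}: {improvement_pct(before_ms, after_ms):.1f}% faster")
```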
Integration
CI/CD Performance Gates
- name: Performance Regression Test
run: |
/perf-profile --type cpu --duration 60 --output json app > profile.json
python scripts/check-perf-regression.py profile.json --threshold 10%
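`check-perf-regression.py` is project-specific; a minimal sketch of the gate logic it might implement, assuming the JSON export's `duration_ms` field:

```python
def regression_pct(baseline: dict, current: dict) -> float:
    """How much the run time grew relative to the baseline, in percent."""
    base = baseline["duration_ms"]
    return (current["duration_ms"] - base) / base * 100.0

def gate_exit_code(baseline: dict, current: dict, threshold_pct: float = 10.0) -> int:
    """Return a CI exit code: 0 to pass the gate, 1 to fail the build."""
    growth = regression_pct(baseline, current)
    print(f"duration grew {growth:+.1f}% (threshold {threshold_pct}%)")
    return 0 if growth <= threshold_pct else 1

# Hypothetical profiles: 5% slower is within a 10% budget
baseline = {"duration_ms": 30000}
current = {"duration_ms": 31500}
code = gate_exit_code(baseline, current)
```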
Continuous Profiling
# Schedule periodic profiling in production
/perf-profile --type cpu --duration 300 --output json \
--pid $(pgrep -f "python app.py") > profiles/$(date +%Y%m%d-%H%M).json
Related Components
- performance-profiler-agent - Detailed profiling strategies
- optimization-patterns-skill - Code optimization patterns
- load-testing-skill - Load testing for performance validation
- /dependency-audit - Performance-impacting dependencies
Action Policy
<default_behavior> This command executes performance profiling. It proceeds with:
- Language detection
- Profile collection
- Hotspot analysis
- Recommendation generation
It generates a report in the specified format. </default_behavior>
Success Output
When perf-profile completes:
✅ COMMAND COMPLETE: /perf-profile
Target: <command>
Duration: <N>s
Type: <cpu|memory|io|async>
Hotspots: <N> identified
Output: <format>
Completion Checklist
Before marking complete:
- Target profiled
- Data collected
- Analysis complete
- Report generated
Failure Indicators
This command has FAILED if:
- ❌ No target specified
- ❌ Profiling failed
- ❌ No data collected
- ❌ Report generation failed
When NOT to Use
Do NOT use when:
- Only quick timing is needed (use `time` instead)
- The target is a production system already under heavy load (profiling adds overhead)
- No profiling tools are installed for the target language
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Profile in debug mode | Misleading data | Use release builds |
| Short duration | Insufficient samples | Use 30s+ duration |
| Skip baseline | No comparison | Create baseline first |
Principles
This command embodies:
- #3 Complete Execution - Full profiling workflow
- #9 Based on Facts - Data-driven analysis
- #6 Clear, Understandable - Clear reports
Full Standard: CODITECT-STANDARD-AUTOMATION.md