/perf-profile Command

Run comprehensive performance profiling on an application to identify CPU bottlenecks, memory issues, and I/O constraints, with visual flame graph generation.

System Prompt

⚠️ EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions, no explanations first
  2. ALWAYS show full output from script/tool execution
  3. ALWAYS provide summary after execution completes

DO NOT:

  • Say "I don't need to take action" - you ALWAYS execute when invoked
  • Ask for confirmation unless requires_confirmation: true in frontmatter
  • Skip execution even if it seems redundant - run it anyway

The user invoking the command IS the confirmation.


Usage

/perf-profile [options] <target>

Options:
--type <types> Profile types: cpu,memory,io,async (default: cpu)
--duration <secs> Profiling duration in seconds (default: 30)
--output <format> Output: terminal,json,flamegraph,html (default: terminal)
--threshold <ms> Minimum time threshold for reporting (default: 10ms)
--compare <file> Compare with baseline profile
--language <lang> Target language: python,node,rust,go (auto-detect)

Profile Types

CPU Profiling

/perf-profile --type cpu <command>

# Python: py-spy, cProfile
# Node.js: clinic.js doctor, 0x
# Rust: cargo-flamegraph, perf
# Go: pprof
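
For Python targets, the listed tools include cProfile from the standard library. A minimal sketch of the kind of collection and reporting this profile type performs (the workload function here is illustrative, not part of the command):

```python
import cProfile
import io
import pstats

def process_batch(items):
    # CPU-bound work standing in for a real application hotspot
    return [sum(i * i for i in range(n)) for n in items]

profiler = cProfile.Profile()
profiler.enable()
process_batch([1000] * 50)
profiler.disable()

# Rank functions by cumulative time, as the terminal summary does
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sampling profilers like py-spy collect the same kind of data by periodically capturing stack traces, which keeps overhead low enough for production use.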

Memory Profiling

/perf-profile --type memory <command>

# Detects:
# - Memory leaks
# - High memory allocations
# - Object retention issues
# - Garbage collection pressure
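
Leak detection of this kind can be illustrated with the standard-library tracemalloc module, which diffs allocation snapshots to find growing allocation sites (the deliberately leaky function below is a stand-in, not real application code):

```python
import tracemalloc

# Simulated leak: a module-level list that retains every allocation
_retained = []

def leaky(n):
    _retained.extend(bytearray(1024) for _ in range(n))

tracemalloc.start()
before = tracemalloc.take_snapshot()
leaky(100)
after = tracemalloc.take_snapshot()

# Top allocation sites by size delta, analogous to a leak report
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```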

I/O Profiling

/perf-profile --type io <command>

# Monitors:
# - File system operations
# - Network I/O
# - Database queries
# - API call latency
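
At its simplest, this kind of monitoring wraps each I/O call and records its wall-clock latency. A minimal sketch (the `timed` helper is hypothetical, for illustration only):

```python
import os
import tempfile
import time

def timed(label, fn):
    """Measure one I/O operation's wall-clock latency in milliseconds."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.2f} ms")
    return result

# Time a file-system write and read, the simplest monitored operations
fd, path = tempfile.mkstemp()
os.close(fd)

def write_file():
    with open(path, "w") as f:
        f.write("x" * 4096)

def read_file():
    with open(path) as f:
        return f.read()

timed("fs write", write_file)
data = timed("fs read", read_file)
os.remove(path)
```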

Async Profiling

/perf-profile --type async <command>

# Analyzes:
# - Event loop blocking
# - Promise/Future bottlenecks
# - Concurrent task performance
# - Thread pool utilization
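
Event-loop blocking, the first item above, can be surfaced in Python with asyncio's debug mode, which logs any callback slower than `loop.slow_callback_duration`. A small sketch using a deliberately blocking call:

```python
import asyncio
import time

async def blocking_task():
    time.sleep(0.05)  # synchronous sleep blocks the event loop

async def main():
    loop = asyncio.get_running_loop()
    # In debug mode, asyncio logs callbacks exceeding this threshold
    loop.set_debug(True)
    loop.slow_callback_duration = 0.01
    start = time.perf_counter()
    await blocking_task()
    blocked_ms = (time.perf_counter() - start) * 1000
    print(f"event loop blocked for {blocked_ms:.1f} ms")

asyncio.run(main())
```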

Examples

Basic CPU Profile

# Profile a Python application
/perf-profile --type cpu python app.py

# Profile with flame graph output
/perf-profile --type cpu --output flamegraph --duration 60 node server.js

Memory Analysis

# Memory profile with leak detection
/perf-profile --type memory --duration 120 python process_data.py

# Memory comparison (before/after optimization)
/perf-profile --type memory --compare baseline.json python app.py

Production Profiling

# Attach to running process (Python)
/perf-profile --type cpu --pid 12345

# Sample-based profiling (low overhead)
/perf-profile --type cpu --sampling 100 cargo run --release

Load Test Profiling

# Profile under load
k6 run load-test.js &
/perf-profile --type cpu,memory --duration 300 python api_server.py

Output Formats

Terminal Summary

Performance Profile Results
===========================

Duration: 30.0s | Samples: 3000 | CPU: 45% avg

Top 10 Functions by CPU Time:
1. process_batch 28.5% 8550ms src/processor.py:45
2. serialize_json 15.2% 4560ms src/utils.py:123
3. db_query 12.8% 3840ms src/database.py:78
4. validate_input 8.3% 2490ms src/validators.py:34
5. parse_request 6.1% 1830ms src/api.py:56

Memory:
Peak: 512MB | Average: 380MB | GC Pauses: 12

Recommendations:
- Consider caching serialize_json results (15.2% CPU)
- Batch db_query calls to reduce overhead
- Review process_batch algorithm complexity

Flame Graph

/perf-profile --output flamegraph python app.py

# Generates: profile-flamegraph.svg
# Open in browser for interactive analysis

JSON Export

/perf-profile --output json python app.py > profile.json

# Structure:
# {
# "duration_ms": 30000,
# "samples": 3000,
# "hotspots": [...],
# "memory": {...},
# "recommendations": [...]
# }
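
The JSON export is convenient for scripted analysis. A sketch of consuming it, using the documented top-level fields with illustrative values (the hotspot field names below are assumptions, not a guaranteed schema):

```python
import json

raw = """{
  "duration_ms": 30000,
  "samples": 3000,
  "hotspots": [
    {"function": "process_batch", "cpu_pct": 28.5, "self_ms": 8550},
    {"function": "serialize_json", "cpu_pct": 15.2, "self_ms": 4560}
  ],
  "memory": {"peak_mb": 512},
  "recommendations": ["Batch db_query calls"]
}"""

profile = json.loads(raw)
# Pull out the single worst hotspot for a regression check or report
top = max(profile["hotspots"], key=lambda h: h["cpu_pct"])
print(f"top hotspot: {top['function']} ({top['cpu_pct']}% CPU)")
```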

Language-Specific Tools

Python

# py-spy (sampling profiler)
/perf-profile --language python --type cpu app.py

# memory_profiler
/perf-profile --language python --type memory app.py

Node.js

# clinic.js (automatic analysis)
/perf-profile --language node --type cpu server.js

# 0x (flame graphs)
/perf-profile --language node --output flamegraph server.js

Rust

# cargo-flamegraph
/perf-profile --language rust --output flamegraph cargo run

# perf + FlameGraph
/perf-profile --language rust --type cpu target/release/app

Go

# pprof (built-in)
/perf-profile --language go --type cpu,memory ./app

Optimization Workflow

  1. Baseline: /perf-profile --output json app > baseline.json
  2. Identify hotspots from top functions list
  3. Optimize targeted functions
  4. Compare: /perf-profile --compare baseline.json app
  5. Verify improvement percentage
  6. Document optimization results
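
Step 5's improvement percentage is just the relative reduction in per-function time between the baseline and the new profile. A minimal sketch with illustrative numbers:

```python
def improvement(baseline_ms, current_ms):
    """Percentage improvement over baseline (positive = faster)."""
    return (baseline_ms - current_ms) / baseline_ms * 100

# Hypothetical per-function timings from two profile runs
baseline = {"process_batch": 8550, "serialize_json": 4560}
optimized = {"process_batch": 5100, "serialize_json": 4400}

for fn in baseline:
    print(f"{fn}: {improvement(baseline[fn], optimized[fn]):+.1f}%")
```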

Integration

CI/CD Performance Gates

- name: Performance Regression Test
  run: |
    /perf-profile --type cpu --duration 60 --output json app > profile.json
    python scripts/check-perf-regression.py profile.json --threshold 10%

Continuous Profiling

# Schedule periodic profiling in production
/perf-profile --type cpu --duration 300 --output json \
--pid $(pgrep -f "python app.py") > profiles/$(date +%Y%m%d-%H%M).json

Related

  • performance-profiler-agent - Detailed profiling strategies
  • optimization-patterns-skill - Code optimization patterns
  • load-testing-skill - Load testing for performance validation
  • /dependency-audit - Performance-impacting dependencies

Action Policy

<default_behavior> This command executes performance profiling. It proceeds with:

  • Language detection
  • Profile collection
  • Hotspot analysis
  • Recommendation generation

Generates reports in specified format. </default_behavior>

After execution, verify:

  • Profile collected
  • Hotspots identified
  • Report generated
  • Recommendations provided

Success Output

When perf-profile completes:

✅ COMMAND COMPLETE: /perf-profile
Target: <command>
Duration: <N>s
Type: <cpu|memory|io|async>
Hotspots: <N> identified
Output: <format>

Completion Checklist

Before marking complete:

  • Target profiled
  • Data collected
  • Analysis complete
  • Report generated

Failure Indicators

This command has FAILED if:

  • ❌ No target specified
  • ❌ Profiling failed
  • ❌ No data collected
  • ❌ Report generation failed

When NOT to Use

Do NOT use when:

  • Quick timing needed (use time)
  • Production with high load
  • No profiling tools installed

Anti-Patterns (Avoid)

Anti-Pattern            Problem               Solution
Profile in debug mode   Misleading data       Use release builds
Short duration          Insufficient samples  Use 30s+ duration
Skip baseline           No comparison         Create baseline first

Principles

This command embodies:

  • #3 Complete Execution - Full profiling workflow
  • #9 Based on Facts - Data-driven analysis
  • #6 Clear, Understandable - Clear reports

Full Standard: CODITECT-STANDARD-AUTOMATION.md