# Agent Skills Framework Extension

## Debugging Patterns Skill
### When to Use This Skill

Use this skill when applying systematic debugging patterns in your codebase: systematic debugging strategies, root cause analysis, and production-grade troubleshooting.

### How to Use This Skill

- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
## Core Capabilities
- Root Cause Analysis - Systematic problem isolation and diagnosis
- Performance Profiling - CPU, memory, I/O bottleneck identification
- Memory Debugging - Leak detection, use-after-free, buffer overflows
- Distributed Debugging - Tracing, correlation, causality analysis
- Production Debugging - Live system troubleshooting, minimal impact
## Systematic Debugging Framework
```python
# debugging/systematic_approach.py
"""
Structured debugging methodology
"""
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class DebugPhase(Enum):
    REPRODUCE = "reproduce"
    ISOLATE = "isolate"
    ANALYZE = "analyze"
    FIX = "fix"
    VERIFY = "verify"


@dataclass
class DebugHypothesis:
    """Testable hypothesis about the bug's cause"""
    description: str
    likelihood: float  # 0.0 to 1.0
    test_procedure: str
    expected_outcome: str
    actual_outcome: Optional[str] = None
    validated: Optional[bool] = None


@dataclass
class DebugSession:
    """Track debugging session state"""
    issue_description: str
    reproduction_steps: List[str]
    phase: DebugPhase
    hypotheses: List[DebugHypothesis]
    observations: List[str]
    fix_applied: Optional[str] = None


class SystematicDebugger:
    """Framework for systematic debugging"""

    def __init__(self):
        self.session: Optional[DebugSession] = None

    def start_session(self, issue: str) -> DebugSession:
        """Initialize a debugging session"""
        self.session = DebugSession(
            issue_description=issue,
            reproduction_steps=[],
            phase=DebugPhase.REPRODUCE,
            hypotheses=[],
            observations=[],
        )
        return self.session

    def reproduce(self, steps: List[str]) -> bool:
        """Attempt to reproduce the issue"""
        self.session.reproduction_steps = steps
        self.session.phase = DebugPhase.ISOLATE
        print("Reproduction steps:")
        for i, step in enumerate(steps, 1):
            print(f"  {i}. {step}")
        # Execute the reproduction; return True if successfully reproduced
        return True

    def isolate_component(self) -> List[str]:
        """Narrow down which component is failing"""
        self.session.phase = DebugPhase.ISOLATE
        # Binary search through system components:
        # divide and conquer to find the failing subsystem
        isolation_strategies = [
            "Remove external dependencies",
            "Run with minimal configuration",
            "Test individual modules in isolation",
            "Check with known-good data",
            "Bisect git history (git bisect)",
        ]
        return isolation_strategies

    def form_hypotheses(self, observations: List[str]) -> List[DebugHypothesis]:
        """Generate testable hypotheses"""
        self.session.observations.extend(observations)
        self.session.phase = DebugPhase.ANALYZE

        hypotheses = []
        # Based on the observations, form hypotheses.
        # Example: if seeing a null pointer error:
        hypotheses.append(DebugHypothesis(
            description="Uninitialized variable accessed",
            likelihood=0.7,
            test_procedure="Add null checks, examine initialization",
            expected_outcome="Crash at initialization site",
        ))
        self.session.hypotheses.extend(hypotheses)
        return hypotheses

    def test_hypothesis(self, hypothesis: DebugHypothesis) -> bool:
        """Execute a hypothesis test"""
        print(f"Testing: {hypothesis.description}")
        print(f"Procedure: {hypothesis.test_procedure}")
        # Execute the test, record the actual outcome,
        # then validate or reject the hypothesis
        return True

    def apply_fix(self, fix_description: str, code_changes: str):
        """Apply and verify the fix"""
        self.session.phase = DebugPhase.FIX
        self.session.fix_applied = fix_description
        print(f"Applying fix: {fix_description}")
        print(f"Changes:\n{code_changes}")

    def verify_fix(self) -> bool:
        """Verify the fix resolves the issue"""
        self.session.phase = DebugPhase.VERIFY
        # Re-run the reproduction steps: the issue should no longer reproduce.
        # Run regression tests to ensure no new issues were introduced.
        return True

    def generate_report(self) -> Dict[str, Any]:
        """Generate a debugging session report"""
        return {
            'issue': self.session.issue_description,
            'reproduction_steps': self.session.reproduction_steps,
            'observations': self.session.observations,
            'hypotheses': [
                {
                    'description': h.description,
                    'validated': h.validated,
                    'likelihood': h.likelihood,
                }
                for h in self.session.hypotheses
            ],
            'fix': self.session.fix_applied,
        }
```
## Performance Profiling
```rust
// src/profiling/performance.rs
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// CPU profiling with flame graph generation
pub struct Profiler {
    samples: Vec<StackSample>,
    sampling_rate: Duration,
}

struct StackSample {
    timestamp: Instant,
    stack_trace: Vec<String>,
}

impl Profiler {
    pub fn new(sampling_rate_ms: u64) -> Self {
        Self {
            samples: Vec::new(),
            sampling_rate: Duration::from_millis(sampling_rate_ms),
        }
    }

    pub fn start_sampling(&mut self) {
        // Start a background thread to sample stack traces.
        // Use the `backtrace` crate for stack unwinding.
    }

    pub fn stop_and_analyze(&self) -> ProfilingReport {
        // Aggregate samples per function
        let mut function_time: HashMap<String, usize> = HashMap::new();
        for sample in &self.samples {
            for func in &sample.stack_trace {
                *function_time.entry(func.clone()).or_insert(0) += 1;
            }
        }

        // Convert counts to percentages of total samples
        let total_samples = self.samples.len() as f64;
        let mut hot_spots: Vec<_> = function_time
            .into_iter()
            .map(|(func, count)| {
                let percentage = (count as f64 / total_samples) * 100.0;
                (func, percentage)
            })
            .collect();
        hot_spots.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

        ProfilingReport {
            total_samples: self.samples.len(),
            hot_spots,
        }
    }
}

pub struct ProfilingReport {
    total_samples: usize,
    hot_spots: Vec<(String, f64)>,
}

impl ProfilingReport {
    pub fn print(&self) {
        println!("=== Profiling Report ===");
        println!("Total samples: {}", self.total_samples);
        println!("\nTop 10 hot spots:");
        for (i, (func, pct)) in self.hot_spots.iter().take(10).enumerate() {
            println!("  {}. {:<50} {:>6.2}%", i + 1, func, pct);
        }
    }
}

/// Memory leak detection
pub struct MemoryTracker {
    allocations: HashMap<usize, AllocationInfo>,
}

struct AllocationInfo {
    size: usize,
    stack_trace: Vec<String>,
    timestamp: Instant,
}

impl MemoryTracker {
    pub fn new() -> Self {
        Self {
            allocations: HashMap::new(),
        }
    }

    pub fn track_allocation(&mut self, ptr: usize, size: usize) {
        let stack_trace = vec!["TODO: capture backtrace".to_string()];
        self.allocations.insert(ptr, AllocationInfo {
            size,
            stack_trace,
            timestamp: Instant::now(),
        });
    }

    pub fn track_deallocation(&mut self, ptr: usize) {
        self.allocations.remove(&ptr);
    }

    pub fn report_leaks(&self) -> Vec<String> {
        let mut leaks = Vec::new();
        for (ptr, info) in &self.allocations {
            let age = info.timestamp.elapsed();
            // Report allocations older than the threshold
            if age > Duration::from_secs(300) { // 5 minutes
                leaks.push(format!(
                    "Leak: {:?} - {} bytes - age: {:?}",
                    ptr, info.size, age
                ));
            }
        }
        leaks
    }
}

// Benchmark macro for performance testing
#[macro_export]
macro_rules! benchmark {
    ($name:expr, $code:block) => {{
        let start = std::time::Instant::now();
        let result = $code;
        let duration = start.elapsed();
        println!("{}: {:?}", $name, duration);
        result
    }};
}
```
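The aggregation in `stop_and_analyze` — count how often each function appears across sampled stacks, then convert counts to percentages — is easy to replicate for offline analysis of captured samples. A minimal Python sketch (the sample data is hypothetical):

```python
from collections import Counter
from typing import List, Tuple


def hot_spots(samples: List[List[str]]) -> List[Tuple[str, float]]:
    """Aggregate sampled stack traces into per-function sample percentages.

    Each sample is the list of function names on the stack at that instant.
    Like the Rust version, every frame occurrence is counted.
    """
    counts = Counter(func for stack in samples for func in stack)
    total = len(samples)
    return sorted(
        ((func, 100.0 * n / total) for func, n in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


samples = [
    ["main", "parse"],
    ["main", "parse", "tokenize"],
    ["main", "render"],
    ["main", "parse"],
]
top = hot_spots(samples)
```

Because counting is per frame occurrence, a function that appears in every sampled stack (such as `main` here) reports 100%, which is the inclusive time a flame graph would show for the root frame.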
## Production Debugging Tools
```bash
#!/bin/bash
# scripts/production_debug.sh
# Safe production debugging toolkit

set -euo pipefail

# Configuration
NAMESPACE="production"
POD_SELECTOR="app=myapp"
LOG_LINES=1000

print_section() {
    echo "======================================"
    echo "$1"
    echo "======================================"
}

# 1. Health Check
health_check() {
    print_section "Health Check"
    kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR"
    # Check pod status
    kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# 2. Resource Usage
resource_usage() {
    print_section "Resource Usage"
    # Live usage comes from the metrics API
    kubectl top pods -n "$NAMESPACE" -l "$POD_SELECTOR"
    # Configured requests (live usage is not part of pod status)
    kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}CPU req: {.spec.containers[0].resources.requests.cpu}{"\t"}Memory req: {.spec.containers[0].resources.requests.memory}{"\n"}{end}'
}

# 3. Recent Logs
recent_logs() {
    print_section "Recent Logs"
    POD=$(kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath='{.items[0].metadata.name}')
    kubectl logs -n "$NAMESPACE" "$POD" --tail="$LOG_LINES"
    # Error analysis (grep may match nothing; don't abort under pipefail)
    echo -e "\nError Summary:"
    kubectl logs -n "$NAMESPACE" "$POD" --tail="$LOG_LINES" | \
        { grep -i error || true; } | sort | uniq -c | sort -rn | head -10
}

# 4. Application Metrics
app_metrics() {
    print_section "Application Metrics"
    POD=$(kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath='{.items[0].metadata.name}')
    # Scrape the /metrics endpoint
    kubectl exec -n "$NAMESPACE" "$POD" -- \
        curl -s http://localhost:8080/metrics
}

# 5. Network Connectivity
network_check() {
    print_section "Network Connectivity"
    POD=$(kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath='{.items[0].metadata.name}')
    # Check DNS
    kubectl exec -n "$NAMESPACE" "$POD" -- nslookup google.com
    # Check service endpoints
    kubectl get endpoints -n "$NAMESPACE"
}

# 6. Configuration Verification
config_check() {
    print_section "Configuration Check"
    # Check ConfigMaps
    kubectl get configmaps -n "$NAMESPACE"
    # Check Secrets (names only, not values)
    kubectl get secrets -n "$NAMESPACE"
}

# 7. Event Log
events() {
    print_section "Recent Events"
    kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' | tail -20
}

# Main execution
main() {
    health_check
    resource_usage
    recent_logs
    app_metrics
    network_check
    config_check
    events
}

main "$@"
```
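The error-summary pipeline in `recent_logs` (`grep -i error | sort | uniq -c | sort -rn | head`) is also useful offline, against a log file pulled from the cluster. A minimal Python equivalent (the sample log is hypothetical):

```python
from collections import Counter
from typing import List, Tuple


def error_summary(log_text: str, top: int = 10) -> List[Tuple[str, int]]:
    """Count identical error lines and return the most frequent first,
    mirroring `grep -i error | sort | uniq -c | sort -rn | head`."""
    errors = [line for line in log_text.splitlines() if "error" in line.lower()]
    return Counter(errors).most_common(top)


log = """\
INFO request served
ERROR db timeout
ERROR db timeout
WARN slow query
ERROR cache miss storm
"""
summary = error_summary(log)
```

Grouping identical lines surfaces the dominant failure mode quickly; for logs with variable fields (timestamps, request ids), normalize those fields first or the counts will fragment.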
## Distributed Tracing Debug
```rust
// src/tracing/distributed_debug.rs
use opentelemetry::global;
use tracing::{info, info_span, instrument};
use uuid::Uuid;

/// Distributed request tracing for debugging.
/// Spans are exported through an OpenTelemetry-backed subscriber
/// (e.g. a tracing-opentelemetry layer installed at startup).
pub struct DistributedDebugger {
    tracer: global::BoxedTracer,
}

impl DistributedDebugger {
    #[instrument(skip(self))]
    pub async fn trace_request(
        &self,
        request_id: Uuid,
        service_name: &str,
    ) -> Result<(), String> {
        // Create a span for this operation.
        // Note: holding an entered span guard across .await is discouraged;
        // prefer `Future::instrument(span)` in real async code.
        let span = info_span!(
            "process_request",
            request_id = %request_id,
            service = %service_name
        );
        let _guard = span.enter();

        info!("Processing request in {}", service_name);

        // Simulate work against downstream dependencies
        self.call_dependency("database", request_id).await?;
        self.call_dependency("cache", request_id).await?;

        info!("Request completed successfully");
        Ok(())
    }

    #[instrument(skip(self))]
    async fn call_dependency(
        &self,
        dependency: &str,
        request_id: Uuid,
    ) -> Result<(), String> {
        info!("Calling dependency: {}", dependency);
        // Propagate trace context to the downstream service
        // via HTTP headers (W3C Trace Context)
        Ok(())
    }
}
```
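`call_dependency` notes that trace context is propagated to downstream services via W3C Trace Context headers. A minimal sketch of building and parsing the `traceparent` header that carries that context (`version-trace_id-parent_id-flags`):

```python
import re
import secrets
from typing import Optional, Tuple


def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` header:
    version (2 hex) - trace_id (32 hex) - parent_id (16 hex) - flags (2 hex)."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"  # bit 0 = sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str, str]]:
    """Split a traceparent header into its fields; None if malformed."""
    m = re.fullmatch(
        r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header
    )
    return m.groups() if m else None


hdr = make_traceparent()
version, trace_id, parent_id, flags = parse_traceparent(hdr)
```

In practice each service attaches this header to outbound requests and continues the trace from the incoming one, which is what lets correlated spans from different services be stitched into a single causal timeline.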
## Usage Examples

### Systematic Debugging

Apply the debugging-patterns skill to debug a production memory leak using a systematic approach.

### Performance Profiling

Apply the debugging-patterns skill to profile CPU hot spots and generate a flame graph.

### Production Debugging

Apply the debugging-patterns skill to diagnose a production issue with minimal system impact.
## Integration Points
- error-debugging-patterns - Error analysis
- rust-development-patterns - Rust-specific debugging
- rust-qa-patterns - Testing integration
## Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: debugging-patterns
Completed:
- [x] Issue reproduced with documented steps
- [x] Component isolation completed (failing subsystem identified)
- [x] Hypotheses formed and tested systematically
- [x] Root cause identified with evidence
- [x] Fix applied and verified with regression tests
- [x] Debugging session report generated
Outputs:
- debug-session-report.json (issue, hypotheses, observations, fix)
- reproduction-steps.md (exact steps to reproduce issue)
- profiling-report.txt (performance hot spots if applicable)
- memory-leak-report.txt (allocation tracking if applicable)
Debug Session Summary:
Issue: [Description of original issue]
Root Cause: [Identified root cause with evidence]
Fix Applied: [Description of fix with code changes]
Verification: [Test results showing fix resolves issue]
Performance Metrics (if applicable):
- Top hot spot: [Function name] - XX.X% CPU time
- Memory leak: [Allocation site] - XXX bytes leaked
## Completion Checklist
Before marking this skill as complete, verify:
- Issue successfully reproduced with documented steps
- Reproduction steps run consistently (not intermittent)
- Component isolation performed (binary search through subsystems)
- Hypotheses documented with likelihood estimates
- At least one hypothesis tested and validated/rejected
- Root cause identified with concrete evidence
- Fix applied with code changes documented
- Fix verified by re-running reproduction steps (no longer reproduces)
- Regression tests added to prevent recurrence
- Debugging session report generated with all findings
## Failure Indicators
This skill has FAILED if:
- ❌ Issue cannot be reproduced consistently (intermittent or environmental)
- ❌ Component isolation incomplete (failing subsystem not identified)
- ❌ No hypotheses formed (skipped systematic analysis)
- ❌ Root cause not identified (guessing at fixes)
- ❌ Fix not verified (assumed fixed without re-testing)
- ❌ Regression tests not added (issue will recur)
- ❌ Debugging session report missing (no documentation of findings)
- ❌ Production debugging performed without safety checks (system impacted)
## When NOT to Use
Do NOT use this skill when:
- Issue is a known bug with documented fix (apply fix directly)
- Problem is configuration error (use configuration validation instead)
- Issue is user error or misuse (provide documentation/training)
- Bug is in third-party library (report upstream, apply workaround)
- Use error-debugging-patterns for simple error message analysis
- Use performance-optimization-patterns for optimization without bugs
Alternative skills for different debugging needs:
- error-debugging-patterns - Error message analysis and stack trace interpretation
- performance-profiling-patterns - CPU/memory profiling for optimization
- distributed-tracing-patterns - Debugging across microservices
- production-debugging-patterns - Safe production troubleshooting
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skipping reproduction | Cannot verify fix works | Always reproduce issue first before debugging |
| Guessing at fixes | Wastes time, introduces new bugs | Form testable hypotheses, validate systematically |
| Not isolating components | Debug entire system, inefficient | Binary search to isolate failing subsystem |
| Changing multiple things at once | Cannot determine what fixed issue | Change one variable at a time, test hypothesis |
| Assuming fix works | Issue recurs later | Always verify fix with reproduction steps |
| No regression tests | Bug returns in future | Add tests to prevent recurrence |
| Debugging in production without backups | Data loss, system crashes | Use read-only debugging, staging environments |
| Skipping documentation | Knowledge lost, issue repeats | Document findings, root cause, and fix |
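The "binary search to isolate failing subsystem" remedy above (and the `git bisect` strategy listed earlier) reduces a linear scan over candidates to O(log n) checks. A minimal sketch with hypothetical commit ids:

```python
from typing import Callable, List


def bisect_first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first failing commit, as `git bisect` does.

    Assumes commits are ordered oldest-to-newest and that once a commit
    is bad, every later commit is also bad (a monotone failure)."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


commits = ["c1", "c2", "c3", "c4", "c5", "c6"]
first_bad = bisect_first_bad(commits, lambda c: c >= "c4")
```

The monotonicity assumption is what makes bisection valid: if the failure is intermittent, each `is_bad` check must be repeated enough times to be trustworthy, or bisection will converge on the wrong commit.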
## Principles

This skill embodies CODITECT foundational principles:

### #2 First Principles Thinking
- Systematic debugging methodology (reproduce → isolate → analyze → fix → verify)
- Binary search for component isolation (divide and conquer)
- Hypothesis-driven testing (not random trial and error)
### #5 Eliminate Ambiguity
- Explicit reproduction steps (no "it just crashes sometimes")
- Testable hypotheses with expected outcomes
- Clear root cause identification with evidence
### #6 Clear, Understandable, Explainable
- Debugging session report documents entire investigation
- Reproduction steps written for any developer to follow
- Root cause explained with reasoning chain
### #7 Measurable Outcomes
- Performance profiling with quantitative metrics (CPU %, memory usage)
- Memory leak detection with byte counts and allocation sites
- Fix verification with before/after test results
### #8 No Assumptions
- Reproduce issue before debugging (don't assume cause)
- Test hypotheses systematically (don't assume first guess is correct)
- Verify fix works (don't assume it's fixed without testing)
Full Principles: CODITECT-STANDARD-AUTOMATION.md
Version: 1.1.0 | Updated: 2026-01-04 | Author: CODITECT Team