Defect Escape Detector

Purpose

  1. Analyzes every new bug report created in the issue tracker to determine where the defect escaped QA
  2. Identifies the escape point along the quality gates: unit test, integration test, staging, canary, production
  3. Tags the bug with its escape point classification for root cause analysis
  4. Analyzes bug characteristics to suggest severity level (S1-S4)
  5. Adds triage metadata: affected systems, components, potential impact
  6. Auto-notifies triage channel for S1/S2 bugs requiring immediate attention
  7. Feeds escape data into continuous quality metrics for QA effectiveness tracking

Trigger

Event Type: New Bug Report Creation

Source: Issue tracker webhook (Jira, GitHub Issues, Linear)

Blocking: No (non-blocking background job)

Timeout: 30 seconds

Trigger Condition: New issue created with type: Bug or label: bug in configured projects
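
The trigger condition above can be sketched as a small filter. This is a minimal sketch, not the hook's actual code: the normalized `issue` dictionary (with `project`, `type`, and `labels` keys) and the function name `should_trigger` are assumptions made for illustration, since real Jira, GitHub Issues, and Linear webhook payloads each have their own shape. Case-insensitive matching is also an assumption.

```python
from typing import Any, Mapping, Sequence


def should_trigger(issue: Mapping[str, Any], project_keys: Sequence[str]) -> bool:
    """Return True when a newly created issue matches the trigger condition.

    `issue` is a hypothetical, already-normalized payload; a real webhook
    body would need tracker-specific parsing first.
    """
    # Only issues from configured projects are considered.
    if issue.get("project") not in project_keys:
        return False

    # Condition from the spec: type "Bug" OR label "bug".
    is_bug_type = str(issue.get("type", "")).lower() == "bug"
    labels = {str(label).lower() for label in issue.get("labels", [])}
    return is_bug_type or "bug" in labels
```

For example, `should_trigger({"project": "PROD", "type": "Bug", "labels": []}, ["PROD", "BACKEND"])` returns `True`, while an issue from a project outside `project_keys` is ignored.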

Behavior

When Triggered

  1. Receives bug report webhook with:
    • Title, description, steps to reproduce
    • Reporter name, affected system/component
    • Affected version if specified
    • Initial severity or other tags
  2. Analyzes description and metadata to determine escape point:
    • Unit Test Escape: Bug is fundamental logic error, simple to reproduce
    • Integration Test Escape: Bug involves module interactions or data flow
    • Staging Escape: Bug manifests under realistic test data volumes or concurrent scenarios
    • Canary Escape: Bug manifests in limited production rollout (1-10% traffic)
    • Production Escape: Bug detected in full production rollout (above 10% of traffic, up to 100%)
  3. Uses heuristics to classify escape point:
    • Keywords in description: "unit", "integration", "staging", "production"
    • Reproducibility: Deterministic → likely pre-production, intermittent → likely production
    • Scope: Single function/method → unit, cross-module → integration, infrastructure → production
    • Time to discovery: Immediate (unit), delayed (staging/prod)
  4. Suggests severity based on:
    • Impact keywords: critical, blocking, data loss → S1/S2
    • Affected system criticality: core, auth, payments → S1/S2
    • User count: broad → S1, specific users → S3/S4
    • Workaround availability: none → higher severity, workaround exists → lower severity
  5. Updates bug with:
    • Escape point tag: escape/unit-test, escape/integration-test, escape/staging, escape/canary, escape/production
    • Suggested severity: severity/S1, severity/S2, severity/S3, severity/S4
    • Triage metadata: affected components, systems, estimated impact
    • Escape analysis summary in comment
  6. For S1/S2 bugs:
    • Post to #bug-triage channel with summary and metadata
    • Tag on-call engineer and triage lead
    • Include escape analysis and suggested action items
  7. Links bug to defect escape database for trend analysis
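
The keyword heuristic in step 3 could look roughly like the sketch below. The keyword lists come from the `escape_points` section of the configuration and the 0.75 threshold from `escape_point_detection.confidence_threshold`. The scoring rule (hits for the best gate divided by total hits) and the name `classify_escape_point` are assumptions, since the hook's actual scoring is not specified here. The fallback to `escape/production` mirrors the Failure Handling defaults.

```python
from typing import Dict, List, Tuple

# Keyword lists copied from the "escape_points" configuration section,
# keyed by the tag suffix used in the escape/* labels.
ESCAPE_KEYWORDS: Dict[str, List[str]] = {
    "unit-test": ["unit", "test", "simple", "logic"],
    "integration-test": ["integration", "module", "interaction", "cross-service"],
    "staging": ["staging", "test data", "volume", "concurrent", "load"],
    "canary": ["canary", "partial rollout", "1%", "10%"],
    "production": ["production", "prod", "live", "users affected"],
}


def classify_escape_point(text: str, threshold: float = 0.75) -> Tuple[str, float]:
    """Return (escape tag, confidence) using naive substring keyword counts."""
    lowered = text.lower()
    hits = {
        point: sum(1 for keyword in keywords if keyword in lowered)
        for point, keywords in ESCAPE_KEYWORDS.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No signal at all: use the documented default.
        return "escape/production", 0.0

    best = max(hits, key=hits.get)
    confidence = hits[best] / total  # assumed scoring rule, not from the spec
    if confidence < threshold:
        # Low confidence: fall back to the documented default.
        return "escape/production", confidence
    return f"escape/{best}", confidence
```

Substring matching is deliberately crude: "prod" also matches inside "production", and "test" matches inside "test data", so a real implementation would likely want word-boundary matching or weighting.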

Configuration

# .coditect/config/defect-escape-detector.json
{
  "enabled": true,
  "timeout_seconds": 30,
  "issue_tracker": "jira",
  "issue_tracker_url": "https://jira.company.com",
  "project_keys": [
    "PROD",
    "BACKEND",
    "FRONTEND"
  ],
  "escape_point_detection": {
    "enabled": true,
    "use_description_keywords": true,
    "use_ml_classification": false,
    "confidence_threshold": 0.75
  },
  "severity_suggestion": {
    "enabled": true,
    "use_impact_keywords": true,
    "use_affected_systems": true,
    "confidence_threshold": 0.8
  },
  "severity_mapping": {
    "critical_keywords": [
      "data loss",
      "security",
      "authentication failure",
      "payment",
      "data corruption"
    ],
    "high_impact_systems": [
      "auth-service",
      "payment-processor",
      "core-api",
      "database"
    ],
    "broad_user_count": 1000,
    "system_criticality": {
      "CRITICAL": "S1",
      "HIGH": "S2",
      "MEDIUM": "S3",
      "LOW": "S4"
    }
  },
  "escape_points": {
    "unit_test": {
      "keywords": ["unit", "test", "simple", "logic"],
      "reproducibility": "deterministic"
    },
    "integration_test": {
      "keywords": ["integration", "module", "interaction", "cross-service"],
      "reproducibility": "deterministic"
    },
    "staging": {
      "keywords": ["staging", "test data", "volume", "concurrent", "load"],
      "reproducibility": "intermittent"
    },
    "canary": {
      "keywords": ["canary", "partial rollout", "1%", "10%"],
      "reproducibility": "intermittent"
    },
    "production": {
      "keywords": ["production", "prod", "live", "users affected"],
      "reproducibility": "random"
    }
  },
  "notifications": {
    "enabled": true,
    "slack_channel": "#bug-triage",
    "notify_for_severities": [
      "S1",
      "S2"
    ],
    "tag_on_call": true,
    "include_escape_analysis": true
  },
  "metadata_fields": {
    "issue_type": "Bug",
    "labels": [
      "bug",
      "escape-detected"
    ]
  }
}
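
To illustrate how the `severity_mapping` values might feed the severity suggestion, here is a minimal sketch. The way the signals are combined is an assumption: the rules only say which signals push severity up or down, not how they interact. The function name `suggest_severity`, its parameters, and treating `broad_user_count` as an "at least this many affected users" threshold are likewise hypothetical.

```python
from typing import Iterable, Optional

# Values copied from the "severity_mapping" configuration section.
CRITICAL_KEYWORDS = ["data loss", "security", "authentication failure",
                     "payment", "data corruption"]
HIGH_IMPACT_SYSTEMS = {"auth-service", "payment-processor", "core-api", "database"}
BROAD_USER_COUNT = 1000


def suggest_severity(description: str,
                     affected_systems: Iterable[str],
                     affected_users: Optional[int] = None,
                     has_workaround: bool = False) -> str:
    """Suggest a severity tag; the combination rule is an illustrative assumption."""
    text = description.lower()
    critical_hit = any(keyword in text for keyword in CRITICAL_KEYWORDS)
    system_hit = any(system in HIGH_IMPACT_SYSTEMS for system in affected_systems)
    broad = affected_users is not None and affected_users >= BROAD_USER_COUNT

    if broad:
        level = 1          # broad user impact -> S1
    elif critical_hit or system_hit:
        level = 2          # impact keyword or critical system -> S1/S2 range
    else:
        level = 4          # limited impact -> S3/S4 range

    # No workaround raises severity by one step (never above S1).
    if not has_workaround and level > 1:
        level -= 1
    return f"severity/S{level}"
```

Under these assumed rules, a bug in `auth-service` with a known workaround is suggested as `severity/S2`, and the same bug without a workaround becomes `severity/S1`.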

Integration

Integrates with:

  • Issue tracker (Jira, GitHub Issues, Linear) API
  • Slack API for notifications
  • Internal database for defect escape metrics
  • ML service (optional) for classification
  • System criticality database for severity suggestion

Output

Updated Issue Fields:

  • Escape Point Tag: escape/unit-test, escape/integration-test, escape/staging, escape/canary, or escape/production
  • Severity Tag: severity/S1, severity/S2, severity/S3, or severity/S4
  • Component Tags: component/{system_name} for affected systems
  • Labels: escape-detected, triage-required (for S1/S2)
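
The field updates listed above amount to assembling a list of labels. A minimal sketch follows, assuming a hypothetical helper `build_labels` that takes the escape point suffix, the severity level, and the affected system names. The tracker API call that would actually apply the labels is out of scope here.

```python
from typing import List, Sequence


def build_labels(escape_point: str, severity: str, systems: Sequence[str]) -> List[str]:
    """Assemble the labels described under "Updated Issue Fields"."""
    labels = [f"escape/{escape_point}", f"severity/{severity}"]
    # One component tag per affected system.
    labels += [f"component/{name}" for name in systems]
    labels.append("escape-detected")
    # triage-required is only added for S1/S2 bugs.
    if severity in ("S1", "S2"):
        labels.append("triage-required")
    return labels
```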

Escape Analysis Comment:

## Defect Escape Analysis

**Escape Point:** Production (detected in live environment)

**Severity Suggested:** S2 (High - affects payment processing)

**Reasoning:**
- Keywords suggest production environment detection
- Affected system: payment-processor (critical system)
- Impact: All users attempting payments
- Deterministic reproduction steps provided

**Metadata:**
- Affected System: Payment Processing
- Components: checkout-service, payment-gateway
- Estimated User Impact: 100%
- Time to Discovery: 2 hours after release
- Potential RCA: Insufficient load testing in integration tests

**Recommended Action:**
1. Immediate: Escalate to on-call engineer
2. Short-term: Hotfix and deploy
3. Root Cause: Add integration test for payment scenarios
4. Prevention: Increase canary monitoring for payment endpoints

Slack Notification (S1/S2):

BUG TRIAGE ALERT - S{severity} Defect Escape

Title: {bug_title}
Escape Point: {escape_point}
Severity: {severity}
Status: {status}

Affected System: {system}
Components: {components}
User Impact: {impact_estimate}

Issue Link: {issue_url}
Escape Analysis: {summary}

Assigned to: @{triage_lead}, @{on_call}
Action Required: Review and create mitigation plan
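
Rendering the notification could be as simple as filling the template with `str.format`. The sketch below uses an abbreviated copy of the template above and gates on `notify_for_severities`. The function name `render_alert` is hypothetical, and passing only the numeric level into `{severity}` is an assumption based on the headline already prefixing the placeholder with `S`. Posting the text to Slack is not shown.

```python
from typing import Mapping, Optional

# From the "notifications.notify_for_severities" configuration value.
NOTIFY_FOR = ("S1", "S2")

# Abbreviated copy of the notification template; placeholders as documented.
TEMPLATE = (
    "BUG TRIAGE ALERT - S{severity} Defect Escape\n"
    "\n"
    "Title: {bug_title}\n"
    "Escape Point: {escape_point}\n"
    "Issue Link: {issue_url}\n"
    "Assigned to: @{triage_lead}, @{on_call}"
)


def render_alert(severity: str, fields: Mapping[str, str]) -> Optional[str]:
    """Return the alert text for S1/S2 bugs, or None when no alert is due.

    `fields` must not contain a "severity" key; it is supplied separately.
    """
    if severity not in NOTIFY_FOR:
        return None
    # Strip the leading "S" because the template adds it back (assumption).
    return TEMPLATE.format(severity=severity[1:], **fields)
```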

Database Entry: Logged to defect escape database with:

  • Bug ID, title, description
  • Escape point classification
  • Severity level
  • Affected systems/components
  • Time to discovery
  • Reporter
  • RCA (when available)
  • Fix status and date
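
One way to picture the database entry is as a small record type. The dataclass below is only a sketch of the fields listed above: the class name, field names, types, and defaults are all assumptions, and the actual schema of the defect escape database is not described here.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List, Optional


@dataclass
class DefectEscapeRecord:
    """Hypothetical shape of one defect escape database entry."""
    bug_id: str
    title: str
    description: str
    escape_point: str                      # e.g. "escape/production"
    severity: str                          # e.g. "severity/S2"
    affected_systems: List[str]
    reporter: str
    time_to_discovery_hours: Optional[float] = None
    rca: Optional[str] = None              # filled in when available
    fix_status: str = "open"               # assumed default
    fix_date: Optional[date] = None


record = DefectEscapeRecord(
    bug_id="PROD-0000",                    # made-up identifier for illustration
    title="Payment fails at checkout",
    description="All payment attempts return an error.",
    escape_point="escape/production",
    severity="severity/S2",
    affected_systems=["payment-processor"],
    reporter="example-reporter",
)
row = asdict(record)  # plain dict, ready for whatever storage layer is used
```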

Failure Handling

| Failure Scenario | Handling |
| --- | --- |
| Issue tracker API unreachable | Retry with exponential backoff (3 attempts), alert #platform-oncall |
| Escape point detection fails | Default to escape/production, continue with severity analysis |
| Severity suggestion fails | Default to severity/S2, alert QA team |
| Slack notification fails | Log error, continue (non-fatal) |
| Classification confidence too low | Use default values, add comment noting low confidence |
| Metadata field update fails | Log error, notify QA team for manual remediation |

Retry Logic: 3 retries with exponential backoff (1s, 2s, 5s)

Alert Channel: #platform-oncall for critical failures

Manual Fallback: QA team notified if automation cannot complete
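
The retry behavior can be sketched as a small wrapper. This is an illustration under assumptions: the helper name `call_with_retries` is hypothetical, the delays (1s, 2s, 5s) are taken from the retry logic above, and "3 retries" is read as one initial attempt plus up to three more. Alerting #platform-oncall after the final failure is left to the caller.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      delays: Sequence[float] = (1, 2, 5),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying once after each delay in `delays`.

    The `sleep` parameter is injectable so the wrapper can be tested
    without actually waiting.
    """
    for delay in delays:
        try:
            return operation()
        except Exception:
            # Wait before the next attempt; the delay grows with each retry.
            sleep(delay)
    # Final attempt: any error propagates so the caller can alert or fall back.
    return operation()
```

A caller would wrap the tracker update, for example `call_with_retries(lambda: update_issue(issue_id, labels))`, where `update_issue` stands in for whatever client function performs the API call.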

Related Hooks

| Hook | Relationship | Purpose |
| --- | --- | --- |
| bug-severity-classifier | Complementary | Validates severity classification |
| qa-effectiveness-metrics | Downstream | Aggregates escape data for QA trends |
| release-quality-gate | Related | Uses escape data to assess release readiness |
| incident-correlation | Related | Correlates production escapes with incidents |
| regression-detector | Related | Identifies patterns in escaped defects |

Principles

  1. Escape Detection First: Identify the quality gate each defect slipped past to drive systemic improvement
  2. Immediate Triage: S1/S2 bugs trigger urgent notifications
  3. Data-Driven: Escape metrics guide QA investment decisions
  4. Preventive Mindset: Analysis includes prevention recommendations
  5. Transparent Classification: Reasoning visible in comments for team learning
  6. Continuous Learning: Escape patterns tracked for trend analysis
  7. Actionable Output: Every tag includes context for decision making
  8. Severity Consistency: Standardized rules ensure consistent severity assignment across team