Stale Feature Flag Detector

Purpose

  1. Scans all active feature flags weekly to identify staleness conditions
  2. Detects flags past their configured TTL (time-to-live)
  3. Identifies flags at 100% rollout for more than 14 days (these should be promoted to permanent features or removed)
  4. Identifies flags at 0% rollout for more than 30 days (legacy flags, likely forgotten)
  5. Detects flags with missing owner assignment (orphaned flags)
  6. Auto-creates cleanup tickets with recommendations for flag disposition
  7. Maintains flag hygiene and prevents technical debt accumulation

Trigger

Event Type: Scheduled Cron

Schedule: Weekly, Monday 9:00 AM UTC

Blocking: No (non-blocking background job)

Timeout: 60 seconds

Frequency: Once per week (configurable to daily if needed)

Behavior

When Triggered

  1. Connects to feature flag management system (Unleash, LaunchDarkly, or custom)
  2. Fetches all flags with metadata (creation date, last modified, TTL, current rollout %, owner)
  3. Evaluates each flag against staleness criteria:
    • Flag age > TTL (if TTL is set)
    • Flag at 100% rollout for > 14 days
    • Flag at 0% rollout for > 30 days
    • Flag missing owner field or owner is inactive
  4. For each stale flag detected:
    • Calculates staleness severity (1-3: low, medium, high)
    • Generates recommendation (promote to permanent feature, remove, or assign owner)
    • Creates cleanup ticket with stale flag metadata
  5. Batches tickets by recommendation type
  6. Sends summary report to #flag-hygiene Slack channel with counts by category
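
The evaluation in steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the detector's actual code: the `Flag` record, its field names, the reason keys, and the severity rule (number of criteria met, capped at 3) are all assumptions made for the example. Only the thresholds and the recommendation labels come from this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Flag:
    # Hypothetical flag record; real providers expose different field names.
    name: str
    created_at: datetime
    rollout_percent: int
    rollout_changed_at: datetime  # when the rollout percentage last changed
    ttl_days: Optional[int] = None
    owner: Optional[str] = None
    owner_active: bool = True


def staleness_reasons(flag: Flag, now: datetime,
                      max_rollout_days: int = 14,
                      min_rollout_days: int = 30) -> List[str]:
    """Return the staleness criteria this flag meets (empty list if healthy)."""
    reasons = []
    if flag.ttl_days is not None and now - flag.created_at > timedelta(days=flag.ttl_days):
        reasons.append("ttl_exceeded")
    time_at_rollout = now - flag.rollout_changed_at
    if flag.rollout_percent == 100 and time_at_rollout > timedelta(days=max_rollout_days):
        reasons.append("full_rollout")
    if flag.rollout_percent == 0 and time_at_rollout > timedelta(days=min_rollout_days):
        reasons.append("zero_rollout")
    if not flag.owner or not flag.owner_active:
        reasons.append("missing_owner")
    return reasons


def recommend(reasons: List[str]) -> Optional[str]:
    """Map the reasons to one recommendation label (assumed precedence)."""
    if not reasons:
        return None
    if "full_rollout" in reasons:
        return "promote-to-permanent"
    if "zero_rollout" in reasons or "ttl_exceeded" in reasons:
        return "remove-flag"
    return "orphaned-flag"


def severity(reasons: List[str]) -> int:
    """Assumed rule: one point per criterion met, clamped to the 1-3 range."""
    return max(1, min(3, len(reasons)))


def batch_by_recommendation(flags: List[Flag], now: datetime) -> Dict[str, List[str]]:
    """Group the names of stale flags by recommendation type (step 5)."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for flag in flags:
        label = recommend(staleness_reasons(flag, now))
        if label is not None:
            batches[label].append(flag.name)
    return dict(batches)


now = datetime(2024, 6, 3, 9, 0, tzinfo=timezone.utc)
flags = [
    Flag("new_checkout", now - timedelta(days=90), 100, now - timedelta(days=20), owner="alice"),
    Flag("old_banner", now - timedelta(days=10), 0, now - timedelta(days=5), ttl_days=7),
]
print(batch_by_recommendation(flags, now))
```

Passing `now` explicitly instead of calling `datetime.now()` inside the functions keeps the checks deterministic, which makes the thresholds easy to test.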

Configuration

# .coditect/config/stale-flag-detector.json
{
  "enabled": true,
  "schedule": "0 9 ? * MON",
  "timeout_seconds": 60,
  "feature_flag_provider": "unleash",
  "feature_flag_api_url": "https://flags.api.internal/",
  "staleness_criteria": {
    "ttl_exceeded": true,
    "max_rollout_days": 14,
    "min_rollout_days": 30,
    "require_owner": true,
    "owner_inactivity_threshold_days": 30
  },
  "ticket_creation": {
    "enabled": true,
    "project_key": "INFRA",
    "label": "flag-cleanup",
    "priority": "LOW",
    "assignee_team": "platform"
  },
  "notifications": {
    "slack_channel": "#flag-hygiene",
    "include_details": true,
    "include_metrics": true
  },
  "exclusions": {
    "flag_name_patterns": [
      "internal_*",
      "experiment_*"
    ]
  }
}
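
As an illustration of how this file might be consumed, the sketch below parses the configuration, fails fast on invalid values, and applies the exclusion patterns. It assumes the patterns are shell-style globs (matched here with Python's `fnmatch`), that the file on disk is plain JSON without the leading `#` path comment, and that the listed keys are the required ones. The function names `load_config` and `is_excluded` are hypothetical.

```python
import json
from fnmatch import fnmatchcase

# Assumed set of required top-level keys; the page does not state which are mandatory.
REQUIRED_KEYS = ("enabled", "schedule", "feature_flag_provider", "staleness_criteria")


def load_config(text: str) -> dict:
    """Parse the detector config and raise ValueError on obviously invalid values."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    criteria = config["staleness_criteria"]
    for key in ("max_rollout_days", "min_rollout_days"):
        value = criteria.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")
    return config


def is_excluded(flag_name: str, config: dict) -> bool:
    """True if the flag name matches any exclusion pattern (case-sensitive glob)."""
    patterns = config.get("exclusions", {}).get("flag_name_patterns", [])
    return any(fnmatchcase(flag_name, pattern) for pattern in patterns)


sample = """
{
  "enabled": true,
  "schedule": "0 9 ? * MON",
  "feature_flag_provider": "unleash",
  "staleness_criteria": {"max_rollout_days": 14, "min_rollout_days": 30},
  "exclusions": {"flag_name_patterns": ["internal_*", "experiment_*"]}
}
"""
config = load_config(sample)
print(is_excluded("internal_debug_panel", config))
print(is_excluded("new_checkout", config))
```

Raising on the first invalid value matches the "fail fast" handling described for invalid configuration. The caller would be responsible for routing the error to the alert channel.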

Integration

Integrates with:

  • Feature flag management platform (Unleash, LaunchDarkly, custom API)
  • Issue tracking system (Jira, GitHub Issues) for ticket creation
  • Slack API for notifications
  • Internal database for historical flag metadata

Output

Ticket Fields:

  • Title: [FLAG CLEANUP] {flag_name} - {recommendation_type}
  • Description: Markdown report with:
    • Flag name, creation date, last modified date
    • Current rollout percentage and audience
    • TTL status (if applicable)
    • Owner name and last activity
    • Recommendation with disposal options
    • Estimated cleanup time
  • Labels: flag-cleanup, {recommendation_type} (e.g., promote-to-permanent, remove-flag, orphaned-flag)
  • Priority: LOW (MEDIUM if an S1/S2 system is affected)
  • Assignee: Flag owner or Platform team
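
A minimal sketch of how these fields could be assembled into a ticket payload. The dictionary shape, the `build_ticket` name, and the `critical_system` boolean (standing in for "S1/S2 system affected") are assumptions for illustration. A real issue tracker such as Jira or GitHub Issues expects its own request format.

```python
from typing import Optional


def build_ticket(flag_name: str, recommendation: str,
                 owner: Optional[str] = None,
                 critical_system: bool = False) -> dict:
    """Assemble the ticket fields listed above into a plain dictionary."""
    return {
        "title": f"[FLAG CLEANUP] {flag_name} - {recommendation}",
        "labels": ["flag-cleanup", recommendation],
        # MEDIUM only when an S1/S2 system is affected, otherwise LOW.
        "priority": "MEDIUM" if critical_system else "LOW",
        # Fall back to the platform team when the flag has no owner.
        "assignee": owner or "platform",
    }


ticket = build_ticket("new_checkout", "promote-to-permanent", owner="alice")
print(ticket["title"])
```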

Slack Notification Format:

Weekly Flag Hygiene Report (Mon 9am)

Total flags scanned: 487
Stale flags detected: 23
- Past TTL: 8
- At 100% > 14d: 7
- At 0% > 30d: 5
- Missing owner: 3

Tickets created: 23
Action required: Review and prioritize cleanup tasks

Top candidates for removal: {flag_names}
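
The report text above could be rendered from the detector's findings along these lines. This is a sketch under assumptions: each stale flag is counted under a single primary category (consistent with the sample, where the four counts sum to 23), the category keys are hypothetical, and the thresholds in the labels are hard-coded rather than read from the configuration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical category keys mapped to the labels used in the sample report.
CATEGORY_LABELS = {
    "ttl_exceeded": "Past TTL",
    "full_rollout": "At 100% > 14d",
    "zero_rollout": "At 0% > 30d",
    "missing_owner": "Missing owner",
}


def render_report(total_scanned: int,
                  findings: List[Tuple[str, str]],
                  tickets_created: int,
                  top_candidates: List[str]) -> str:
    """Build the Slack message body from (flag_name, category) pairs."""
    counts = Counter(category for _, category in findings)
    lines = [
        "Weekly Flag Hygiene Report (Mon 9am)",
        "",
        f"Total flags scanned: {total_scanned}",
        f"Stale flags detected: {len(findings)}",
    ]
    # Always print every category, including those with a count of zero.
    lines += [f"- {label}: {counts[key]}" for key, label in CATEGORY_LABELS.items()]
    lines += [
        "",
        f"Tickets created: {tickets_created}",
        "Action required: Review and prioritize cleanup tasks",
        "",
        f"Top candidates for removal: {', '.join(top_candidates)}",
    ]
    return "\n".join(lines)


findings = [("old_banner", "ttl_exceeded"), ("legacy_nav", "ttl_exceeded"), ("beta_search", "missing_owner")]
print(render_report(120, findings, tickets_created=3, top_candidates=["old_banner", "legacy_nav"]))
```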

Failure Handling

| Failure Scenario | Handling |
| --- | --- |
| Feature flag API unreachable | Retry with exponential backoff (3 attempts), skip run if all fail, alert #platform-oncall |
| Issue tracker connection fails | Log error, defer ticket creation, send alert to team lead |
| Slack notification fails | Log error, continue (non-fatal) |
| No stale flags detected | Send success notification with "0 issues found" |
| Configuration invalid | Fail fast, alert to #platform-oncall with error details |

Retry Logic: 3 retries with exponential backoff (2s, 5s, 10s)

Alert Channel: #platform-oncall for failures
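
The retry behaviour could look roughly like the sketch below. It reads "3 retries" as three retries after an initial attempt, using the listed delays of 2s, 5s, and 10s as given. If three total attempts are intended, the last delay would be dropped. The `call_with_retries` helper is hypothetical, and the `sleep` parameter exists only so the waiting can be stubbed out in tests.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

BACKOFF_DELAYS = (2, 5, 10)  # seconds to wait before each retry


def call_with_retries(operation: Callable[[], T],
                      delays: Sequence[float] = BACKOFF_DELAYS,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation once, then retry once per delay; re-raise the last error."""
    try:
        return operation()
    except Exception as error:
        last_error = error
    for delay in delays:
        sleep(delay)
        try:
            return operation()
        except Exception as error:
            last_error = error
    # Every attempt failed: the caller skips the run and alerts the on-call channel.
    raise last_error


attempts = []


def flaky_fetch() -> str:
    """Simulated flag API call that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("feature flag API unreachable")
    return "ok"


waits = []
print(call_with_retries(flaky_fetch, sleep=waits.append), waits)
```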

Related Hooks

| Hook | Relationship | Purpose |
| --- | --- | --- |
| feature-flag-deployment-validator | Complementary | Validates flags during deployment |
| flag-rollout-monitor | Upstream | Monitors active flag rollouts in real-time |
| feature-flag-cleanup-scheduler | Downstream | Executes actual cleanup based on detector findings |
| incident-flag-correlation | Related | Correlates flags with incident patterns |

Principles

  1. Flag Hygiene First: Proactive detection prevents debt from accumulating
  2. Actionable Output: Every detection includes clear recommendation for remediation
  3. Minimal False Positives: Strict criteria and exclusion patterns to avoid noise
  4. Non-Blocking: Background job never blocks critical deployments or services
  5. Transparent Reporting: Summary reports visible to entire team for accountability
  6. Configuration-Driven: All criteria and thresholds configurable per environment
  7. Gradual Enforcement: Starts with detection and notifications before automation enforcement
  8. Audit Trail: All flag state changes logged for compliance and root cause analysis