GCP Resource Cleanup Skill

How to Use This Skill

Review the patterns and examples below
Apply the relevant patterns to your implementation
Follow the best practices outlined in this skill

Automated cleanup of legacy GCP resources based on proven patterns from production deployments.

When to Use This Skill

✅ Use this skill when:

Deploying new API/service version and need to clean up old version
Sprint ends and legacy resources need cleanup
Cost optimization review identifies orphaned resources
After failed deployments leave zombie resources
You want to cut cleanup time from about 30 minutes to about 2 minutes per operation (28 minutes saved)
You want to reuse a proven pattern: it saved $50-100/month in Cloud Run costs

❌ Don't use this skill when:

Resources are less than 7 days old (safety check prevents deletion)
Active ingress still references the resource (prevents breaking traffic)
Production services without backup manifests
Cost savings unclear or minimal (< $10/month)

What It Automates

Before: (30+ minutes, 15+ commands)

kubectl get deployments -n coditect-app
kubectl delete deployment OLD-API -n coditect-app
kubectl delete service OLD-API -n coditect-app
gcloud run services list
gcloud run services delete SERVICE-1 --region=us-central1 --quiet
gcloud run services delete SERVICE-2 --region=us-central1 --quiet
# ... repeat 8 times
gcloud run services list  # verify
kubectl get deployments -n coditect-app  # verify

After: (2 minutes, 1 command)

./core/cleanup.sh --target=legacy-v2 --namespace=coditect-app --dry-run
./core/cleanup.sh --target=legacy-v2 --namespace=coditect-app  # execute

Usage

Cleanup Legacy API Version (GKE)

cd .claude/skills/gcp-resource-cleanup
./core/cleanup.sh --target=gke-api --name=coditect-api-v2 --namespace=coditect-app

Cleanup Orphaned Cloud Run Services

./core/cleanup.sh --target=cloud-run-orphans --region=us-central1

Cleanup Old Artifact Registry Images

./core/cleanup.sh --target=images --age-days=30 --keep-count=5
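
One plausible reading of how --age-days and --keep-count combine is that the newest keep-count images are always retained, and of the remainder only those older than age-days are removed. The Python sketch below illustrates that selection rule. The function name select_images_to_delete and the (digest, created_at) tuple shape are hypothetical and not taken from core/cleanup.sh, which may combine the two flags differently.

```python
from datetime import datetime, timedelta, timezone


def select_images_to_delete(images, age_days=30, keep_count=5, now=None):
    """Return digests that are outside the newest `keep_count` images
    AND older than `age_days`.

    `images` is a list of (digest, created_at) tuples with timezone-aware
    datetimes. This shape is an assumption made for illustration.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=age_days)
    # Newest first, so the first `keep_count` entries are always retained.
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    candidates = newest_first[keep_count:]
    return [digest for digest, created in candidates if created < cutoff]


if __name__ == "__main__":
    now = datetime(2025, 10, 19, tzinfo=timezone.utc)
    # Eight hypothetical images, created 0, 10, 20, ... 70 days ago.
    images = [(f"sha256:{i:02d}", now - timedelta(days=10 * i)) for i in range(8)]
    print(select_images_to_delete(images, now=now))
```

Requiring both conditions is the conservative choice: a repository with fewer than keep-count images never loses any, however old they are.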

Dry Run (Safe Preview)

./core/cleanup.sh --target=cloud-run-orphans --region=us-central1 --dry-run

Safety Checks

Automatic validations:

✅ Resource age > 7 days (prevents accidental deletion of new deployments)
✅ No active ingress references (prevents breaking live traffic)
✅ No dependent services (checks configmaps, secrets, PVCs)
✅ Backup manifest creation (enables rollback)
✅ Dry-run mode (preview before execution)
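
The validations above can be pictured as a single gate that collects every failure and lets deletion proceed only when none are found. The following is a minimal Python sketch of that idea, not the shell code in core/cleanup.sh. The name verify_safe_to_delete comes from this skill's function list, but the signature, the parameter names, and the returned list of messages are assumptions.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(days=7)  # age threshold stated in the safety checks


def verify_safe_to_delete(created_at, ingress_refs, dependents,
                          now=None, force_age_check=False):
    """Return a list of failure messages; an empty list means safe to delete.

    created_at      -- timezone-aware creation time of the resource
    ingress_refs    -- names of ingresses that still route to the resource
    dependents      -- ConfigMaps, Secrets, or PVCs that reference it
    force_age_check -- hypothetical mirror of the --force-age-check override
    """
    now = now or datetime.now(timezone.utc)
    failures = []
    if not force_age_check and now - created_at <= MIN_AGE:
        failures.append("Resource too recent (< 7 days)")
    if ingress_refs:
        failures.append("Resource has active ingress")
    if dependents:
        failures.append("Dependent resources found")
    return failures


if __name__ == "__main__":
    created = datetime.now(timezone.utc) - timedelta(days=2)
    # Fail closed: any reported failure should abort the deletion.
    print(verify_safe_to_delete(created, ["main-ingress"], []))
```

Returning all failures at once, rather than stopping at the first, lets a dry run show everything that blocks a deletion in one pass.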

Cost Tracking

Automatic cost calculation:

Cloud Run: $0.40 per million requests + idle charges
GKE pods: Resource requests × duration × pricing
Artifact Registry: Storage costs per GB/month

Example output:

Found 8 Cloud Run services to delete:
  - coditect-api-v2 (idle 30d) → ~$5/month
  - coditect-frontend (idle 20d) → ~$8/month
  ...
Total estimated savings: $52/month

Proceed with deletion? [y/N]
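
The GKE rule in the cost list (resource requests × duration × pricing) and the total in the report can be sketched as follows. This is an illustration only. The function names are hypothetical, and the hourly rates in the usage example are placeholders rather than real GCP prices, so substitute the current rates for your region.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def estimate_pod_monthly_cost(cpu_request, memory_gib, replicas,
                              cpu_hourly_rate, memory_gib_hourly_rate):
    """Resource requests x duration x pricing for one month of uptime.

    cpu_request is in vCPUs and memory_gib in GiB, both per pod.
    The two rates are per-hour prices supplied by the caller.
    """
    hourly = cpu_request * cpu_hourly_rate + memory_gib * memory_gib_hourly_rate
    return round(hourly * HOURS_PER_MONTH * replicas, 2)


def total_savings(estimates):
    """Sum per-resource monthly estimates into the figure shown in the report."""
    return round(sum(estimates.values()), 2)


if __name__ == "__main__":
    # Placeholder rates for illustration only.
    pods = estimate_pod_monthly_cost(0.5, 1, replicas=3,
                                     cpu_hourly_rate=0.03,
                                     memory_gib_hourly_rate=0.004)
    print(f"GKE pods: ~${pods}/month")
    print(f"Total: ${total_savings({'coditect-api-v2': 5, 'coditect-frontend': 8})}/month")
```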

Implementation

See: core/cleanup.sh for complete implementation

Key functions:

cleanup_gke_deployment() - Delete deployment + service + configmap
cleanup_cloud_run_orphans() - Detect and delete orphaned Cloud Run services
cleanup_old_images() - Remove old/untagged Artifact Registry images
verify_safe_to_delete() - Safety checks before deletion
create_backup_manifest() - Export resource YAML for rollback
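
To make the backup step concrete, here is a hedged Python sketch of what exporting a resource for rollback could look like. It relies only on the standard kubectl get ... -o yaml export. The helper names backup_command and backup_path, the file naming scheme, and the injectable run parameter are assumptions, not the contents of core/cleanup.sh.

```python
import subprocess
from pathlib import Path


def backup_command(kind, name, namespace):
    """Build the kubectl command that exports one resource as YAML."""
    return ["kubectl", "get", kind, name, "-n", namespace, "-o", "yaml"]


def backup_path(backup_dir, kind, name):
    """Hypothetical naming scheme: <backup_dir>/<kind>-<name>.yaml."""
    return Path(backup_dir) / f"{kind}-{name}.yaml"


def create_backup_manifest(kind, name, namespace, backup_dir, run=subprocess.run):
    """Export the resource to a file and return the file path.

    check=True raises CalledProcessError if kubectl fails, so the caller
    can abort before anything is deleted.
    """
    result = run(backup_command(kind, name, namespace),
                 capture_output=True, text=True, check=True)
    path = backup_path(backup_dir, kind, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(result.stdout)
    return path
```

Under this scheme a rollback would be kubectl apply -f on the saved file, which is why the export has to succeed before any delete runs.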

Validation Checklist

Test 1: Dry-run mode shows correct resources
Test 2: Age filter works (>7 days only)
Test 3: Ingress check prevents breaking live traffic
Test 4: Backup manifests created before deletion
Test 5: Cost calculation accurate

Metrics

Usage Statistics:

Times used: 1 (Oct 19, 2025)
Time saved: 28 minutes (30 min → 2 min)
Errors prevented: 2 (almost deleted active service)
Cost savings: $50-100/month

Success criteria:

✅ Zero accidental deletions of active resources
✅ 90%+ time savings vs manual cleanup
✅ Audit trail created for all deletions

Real-World Example (Oct 19, 2025)

Cleanup legacy V2 API:

# Detected and deleted:
GKE:
  - coditect-api-v2 deployment (freed 3 pods)
  - coditect-api-v2 service

Cloud Run (8 services):
  - coditect-api-v2
  - coditect-v5-api (mistaken deployment)
  - coditect-frontend
  - coditect-frontend-gke
  - day2-user-tenant-api
  - websocket-gateway
  - websocket-gateway-memory-test
  - websocket-proxy

Result: Cloud Run empty (0 services), GKE clean
Cost savings: ~$50-100/month

Troubleshooting

Error: "Resource has active ingress"

Check: kubectl get ingress --all-namespaces -o yaml | grep RESOURCE_NAME
Fix: Update ingress to point to new service, then delete old

Error: "Resource too recent (< 7 days)"

Override: --force-age-check (use with caution!)
Reason: Prevents accidental deletion of recent deployments

Error: "Dependent resources found"

Check: ConfigMaps, Secrets, PVCs referencing the resource
Fix: Delete or update dependents first
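
The grep in the ingress check above matches any line containing the name, so it would also flag a service called coditect-api-v2-canary when you search for coditect-api-v2. A stricter alternative, sketched here in Python, parses the JSON output of kubectl get ingress --all-namespaces -o json and compares backend service names exactly. The function name ingresses_referencing is hypothetical. The field layout is that of networking.k8s.io/v1 Ingress objects.

```python
def ingresses_referencing(service_name, ingress_list):
    """Return 'namespace/name' for every Ingress that routes to `service_name`.

    `ingress_list` is the parsed (json.loads) output of
    `kubectl get ingress --all-namespaces -o json`.
    """
    matches = []
    for item in ingress_list.get("items", []):
        spec = item.get("spec", {})
        # Collect the default backend plus every per-path backend.
        backends = [spec.get("defaultBackend") or {}]
        for rule in spec.get("rules") or []:
            for path in (rule.get("http") or {}).get("paths") or []:
                backends.append(path.get("backend") or {})
        if any((b.get("service") or {}).get("name") == service_name
               for b in backends):
            meta = item["metadata"]
            matches.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return matches


if __name__ == "__main__":
    sample = {"items": [{
        "metadata": {"name": "main", "namespace": "coditect-app"},
        "spec": {"rules": [{"http": {"paths": [
            {"backend": {"service": {"name": "coditect-api-v2"}}}
        ]}}]},
    }]}
    print(ingresses_referencing("coditect-api-v2", sample))
```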

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: gcp-resource-cleanup

Completed:
- [x] Resource scan complete
- [x] Safety checks passed
- [x] Backup manifests created
- [x] Resources deleted successfully
- [x] Cost savings calculated

Outputs:
- Backup manifests: .coditect/backups/cleanup-{date}/
- Cleanup report: .coditect/reports/cleanup-{date}.md
- Cost savings: ${amount}/month

Resources Deleted:
- GKE deployments: {count}
- GKE services: {count}
- Cloud Run services: {count}
- Artifact images: {count}

Completion Checklist

Before marking this skill as complete, verify:

Dry-run mode executed and reviewed
All safety checks passed (age > 7 days, no active ingress)
Backup manifests created for all resources
Resources deleted without errors
No dependent services broken
Cost savings documented
Cleanup report generated
All validation steps completed

Failure Indicators

This skill has FAILED if:

❌ Safety check failed (resource too recent, active ingress found)
❌ Backup manifest creation failed
❌ Deletion command returned errors
❌ Dependent resources broke after deletion
❌ Cost calculation incorrect or missing
❌ Resources still exist after deletion
❌ Rollback not possible due to missing backups

When NOT to Use

Do NOT use this skill when:

Resources are less than 7 days old (override with --force-age-check only if certain)
Active ingress routes reference the resources (will break live traffic)
No backup strategy exists (production services without manifests)
Cost savings are unclear or minimal (< $10/month - manual cleanup faster)
Cleanup is not authorized (require approval for production deletions)
You need to clean up a single resource (use kubectl/gcloud directly)
Resources are not in GCP (use aws-resource-cleanup or azure-resource-cleanup instead)

Anti-Patterns (Avoid)

Anti-Pattern	Problem	Solution
Skipping dry-run	Deletes wrong resources	Always run --dry-run first
Ignoring safety checks	Breaks live traffic	Review all safety check failures
No backup manifests	Cannot rollback	Always create backups before deletion
Running in production without approval	Unauthorized deletions	Require approval for prod cleanups
Not verifying cost savings	Deleting wrong resources	Calculate and review cost estimates
Force-overriding age check	Deletes new deployments	Only override with explicit confirmation
Not checking dependent services	Breaks ConfigMaps, Secrets, PVCs	Run dependency check before deletion
Deleting all Cloud Run services	Removes active services	Filter by age and usage metrics

Principles

This skill embodies:

#1 Full Automation - Manual 30-minute cleanup → 2-minute automated script
#2 Self-Provisioning - Creates backup manifests, generates cost reports automatically
#5 Eliminate Ambiguity - Clear safety checks prevent accidental deletions
#6 Clear, Understandable, Explainable - Detailed dry-run preview and audit trail
#8 No Assumptions - Explicit safety checks for age, ingress, dependencies
#10 Fail Closed - Abort on any safety check failure, require explicit override

Full Standard: CODITECT-STANDARD-AUTOMATION.md

Multi-Context Window Support

This skill supports long-running cleanup operations across multiple context windows using Claude 4.5's enhanced state management capabilities.

State Tracking

Checkpoint State (JSON):

{
  "cleanup_id": "cleanup_20251129_150000",
  "cleanup_scope": "legacy_v2_api",
  "phase": "scan_complete",
  "resources_identified": {
    "gke_deployments": 2,
    "gke_services": 2,
    "cloud_run_services": 8,
    "artifact_images": 0
  },
  "resources_deleted": {
    "gke_deployments": 0,
    "gke_services": 0,
    "cloud_run_services": 0,
    "artifact_images": 0
  },
  "estimated_savings_monthly": 52,
  "safety_checks_passed": false,
  "backup_manifests_created": false,
  "token_usage": 4800,
  "created_at": "2025-11-29T15:00:00Z"
}

Progress Notes (Markdown):

# GCP Resource Cleanup Progress - 2025-11-29

## Completed
- ✅ Scanned GKE namespace (coditect-app)
- ✅ Identified legacy V2 resources
  - 2 deployments (coditect-api-v2, old-frontend)
  - 2 services
  - 8 Cloud Run services
- ✅ Estimated savings: $52/month

## In Progress
- Safety checks pending
- Backup manifest creation

## Next Actions
- Create backup manifests for all resources
- Dry-run deletion to verify safety
- Execute deletion if approved
- Monitor for broken dependencies

Session Recovery

When starting a fresh context window after cleanup work:

Load Checkpoint State: Read .coditect/checkpoints/gcp-cleanup-latest.json
Review Progress Notes: Check cleanup-progress.md for scan results
Verify Resources Identified: Re-list resources to confirm still present
Resume Deletion: Continue with pending deletions
Validate Cleanup: Confirm resources deleted and no issues

Recovery Commands:

# 1. Check latest checkpoint
jq '.resources_identified' .coditect/checkpoints/gcp-cleanup-latest.json

# 2. Review progress
tail -25 cleanup-progress.md

# 3. Verify resources still exist
kubectl get deployments -n coditect-app | grep -E "(v2|old)"
gcloud run services list --region=us-central1

# 4. Check backup manifests
ls -la .coditect/backups/cleanup-20251129/

# 5. Check deletion status
jq '.resources_deleted' .coditect/checkpoints/gcp-cleanup-latest.json
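
Steps 1 and 5 of the recovery commands amount to comparing what was identified with what has already been deleted. A small Python sketch of that comparison is shown below, assuming the checkpoint JSON uses the resources_identified and resources_deleted keys from the State Tracking example. The function names pending_deletions and load_checkpoint are hypothetical.

```python
import json
from pathlib import Path


def load_checkpoint(path=".coditect/checkpoints/gcp-cleanup-latest.json"):
    """Read the latest checkpoint file into a dict."""
    return json.loads(Path(path).read_text())


def pending_deletions(checkpoint):
    """Return {resource_type: remaining_count} for types not yet fully deleted."""
    identified = checkpoint.get("resources_identified", {})
    deleted = checkpoint.get("resources_deleted", {})
    pending = {}
    for kind, count in identified.items():
        # A type missing from resources_deleted counts as zero deleted.
        remaining = count - deleted.get(kind, 0)
        if remaining > 0:
            pending[kind] = remaining
    return pending


if __name__ == "__main__":
    checkpoint = {
        "resources_identified": {"gke_deployments": 2, "cloud_run_services": 8},
        "resources_deleted": {"gke_deployments": 2, "cloud_run_services": 3},
    }
    print(pending_deletions(checkpoint))
```

The counts only say how much work is left. Re-listing the live resources, as in step 3, is still needed because something may have changed between sessions.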

State Management Best Practices

Checkpoint Files (JSON Schema):

Store in .coditect/checkpoints/gcp-cleanup-{scope}.json
Track resources identified vs deleted separately
Record estimated cost savings for reporting
Include safety check results

Progress Tracking (Markdown Narrative):

Maintain cleanup-progress.md with scan and deletion results
Document resources deleted with timestamps
Note any errors or warnings during deletion
List cost savings achieved

Git Integration:

Save backup manifests to .coditect/backups/cleanup-{date}/
Create cleanup report in .coditect/reports/cleanup-{date}.md
Tag cleanup operations: git tag cleanup-{scope}-{date}

Progress Checkpoints

Natural Breaking Points:

After resource scan complete
After safety checks passed
After backup manifests created
After each resource type deleted (GKE, Cloud Run, Images)
After cleanup validated

Checkpoint Creation Pattern:

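# Checkpoint after each deletion batch, or after every fifth deleted resource.
# The count > 0 guard avoids writing a checkpoint before anything is deleted.
if deletion_batch_complete or (
    resources_deleted_count > 0 and resources_deleted_count % 5 == 0
):
    create_checkpoint({
        "cleanup_scope": scope,
        "phase": current_phase,
        "resources_identified": identified_counts,
        "resources_deleted": deleted_counts,
        "token_usage": current_token_usage,  # same key as the checkpoint JSON
    })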
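
The pattern above leaves create_checkpoint undefined. One way it could be written is sketched below, as an assumption rather than this skill's actual implementation. It writes to the gcp-cleanup-{scope}.json path described under State Management Best Practices and uses a temporary file plus rename so an interrupted session never leaves a half-written checkpoint. The helper should_checkpoint and the added created_at field handling are illustrative.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def should_checkpoint(batch_complete, deleted_count, every=5):
    """True after a finished batch or after every `every`-th deletion."""
    return batch_complete or (deleted_count > 0 and deleted_count % every == 0)


def create_checkpoint(state, directory=".coditect/checkpoints"):
    """Atomically write `state` as gcp-cleanup-{scope}.json; return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    record = {**state, "created_at": datetime.now(timezone.utc).isoformat()}
    target = directory / f"gcp-cleanup-{state['cleanup_scope']}.json"
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle, indent=2)
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = create_checkpoint(
            {"cleanup_scope": "legacy_v2_api", "phase": "scan_complete"},
            directory=tmp,
        )
        print(path.name)
```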

Example: Multi-Context Cleanup

Context Window 1: Scan + Safety Checks

{
  "checkpoint_id": "ckpt_cleanup_part1",
  "phase": "safety_checks_complete",
  "resources_identified": {
    "gke": 4,
    "cloud_run": 8,
    "images": 15
  },
  "safety_checks_passed": true,
  "backup_manifests_created": true,
  "next_action": "Execute deletion",
  "token_usage": 4800
}

Context Window 2: Execute Deletion + Validation

# Resume from checkpoint
cat .coditect/checkpoints/ckpt_cleanup_part1.json

# Continue with deletion
# (Context restored in 2 minutes vs 10 minutes from scratch)

{
  "checkpoint_id": "ckpt_cleanup_complete",
  "phase": "cleanup_validated",
  "resources_deleted": {
    "gke": 4,
    "cloud_run": 8,
    "images": 15
  },
  "estimated_savings_realized": 52,
  "no_issues_detected": true,
  "token_usage": 3200
}

Token savings: 4,800 (first context) + 3,200 (second context) = 8,000 tokens in total, versus 14,000 without checkpoints. That is a reduction of about 43% (6,000 / 14,000).

Reference: See docs/CLAUDE-4.5-BEST-PRACTICES.md for complete multi-context window workflow guidance.
