
Deployment Archeology Skill


How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Purpose: Find and restore previous successful deployment configurations by analyzing git history, Cloud Build logs, and Kubernetes deployments.

When to Use

Use this skill when:

  • The current deployment is failing and you need to find what worked before
  • You need to understand how a service was originally deployed
  • You are investigating a deployment regression
  • You are recovering from accidental configuration changes

Process

Step 1: Identify Current Deployment Date

# Get deployment creation timestamp from Kubernetes
kubectl get deployment <DEPLOYMENT_NAME> -n <NAMESPACE> -o jsonpath='{.metadata.creationTimestamp}'

Step 2: Search Cloud Build History

# List builds around the deployment date
gcloud builds list \
  --filter="createTime>='YYYY-MM-DDT00:00:00Z' AND createTime<='YYYY-MM-DDT23:59:59Z'" \
  --format="table(id,status,createTime)" \
  --limit=50

Step 3: Analyze Successful Build

# Get build configuration from successful build closest to deployment time
gcloud builds describe <BUILD_ID> --format="yaml(steps,substitutions,options)"

Key things to extract:

  • Dockerfile name (check args for -f flag)
  • Machine type (options.machineType)
  • Environment variables (options.env)
  • Build steps and deployment method
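These fields can be extracted from a saved build description instead of reading the full YAML by eye. The snippet below is a sketch: a stand-in `build.json` is created inline so the commands are self-contained (in practice, write it with `gcloud builds describe "$BUILD_ID" --format=json > build.json`), and the grep-based parsing assumes compact single-line JSON; `jq` is more robust if available.

```shell
# Stand-in for: gcloud builds describe "$BUILD_ID" --format=json > build.json
cat > build.json <<'EOF'
{"steps":[{"name":"gcr.io/cloud-builders/docker","args":["build","-f","Dockerfile.local-test","-t","img","."]}],"options":{"machineType":"E2_HIGHCPU_32","env":["NODE_OPTIONS=--max_old_space_size=8192"]}}
EOF

# Dockerfile name: the value following the -f flag in the step args
grep -o '"-f","[^"]*"' build.json | cut -d'"' -f4            # -> Dockerfile.local-test

# Machine type
grep -o '"machineType":"[^"]*"' build.json | cut -d'"' -f4   # -> E2_HIGHCPU_32

# Build-time environment variables
grep -o 'NODE_OPTIONS=[^"]*' build.json
```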

Step 4: Search Git History

# Find commits around deployment date
git log --since="YYYY-MM-DD" --until="YYYY-MM-DD" --oneline --all

# Check if files were archived
git log --all --full-history -- <FILENAME>

Step 5: Restore Configuration

# Find archived files
find . -name "<FILENAME>" -type f

# Check git history for file content at specific date
git show <COMMIT>:<PATH>

# Restore from archive directory if needed
cp docs/99-archive/deployment-obsolete/<FILE> ./<FILE>

Example: Combined Service Recovery (Oct 18, 2025)

Problem: Dockerfile.combined was failing to build

Investigation:

  1. Found deployment created: 2025-10-13T09:58:29Z
  2. Found successful build: 6e95a4d9-2f19-456c-bba8-5a1ed7a8fdf7 at 09:50:07Z
  3. Build used: Dockerfile.local-test (not Dockerfile.combined!)
  4. Machine: E2_HIGHCPU_32 with NODE_OPTIONS=--max_old_space_size=8192
  5. File archived in commit 04ef4b4 to docs/99-archive/deployment-obsolete/

Recovery:

# Restore working Dockerfile
cp docs/99-archive/deployment-obsolete/Dockerfile.local-test ./

# Update cloudbuild config
# Change: Dockerfile.combined -> Dockerfile.local-test
# Change: N1_HIGHCPU_8 -> E2_HIGHCPU_32
# Add: NODE_OPTIONS=--max_old_space_size=8192

# Rebuild with proven config
gcloud builds submit --config cloudbuild-combined.yaml .
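The three config edits above can be scripted rather than applied by hand. This is a sketch: a minimal stand-in cloudbuild-combined.yaml is created inline so the commands are self-contained, and the exact keys in your real config may differ, so verify against the successful build's YAML before submitting.

```shell
# Stand-in config for illustration; your real cloudbuild-combined.yaml will differ
cat > cloudbuild-combined.yaml <<'EOF'
steps:
  - name: gcr.io/cloud-builders/docker
    args: ['build', '-f', 'Dockerfile.combined', '-t', 'gcr.io/$PROJECT_ID/combined', '.']
options:
  machineType: N1_HIGHCPU_8
EOF

# Apply the proven configuration
sed -i.bak \
  -e 's/Dockerfile\.combined/Dockerfile.local-test/g' \
  -e 's/N1_HIGHCPU_8/E2_HIGHCPU_32/g' \
  cloudbuild-combined.yaml

# NODE_OPTIONS goes under options.env (appended here because options is last)
printf "  env:\n    - 'NODE_OPTIONS=--max_old_space_size=8192'\n" >> cloudbuild-combined.yaml

diff cloudbuild-combined.yaml.bak cloudbuild-combined.yaml || true  # review the changes
```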

Automation Script

#!/bin/bash
# deployment-archeology.sh - Find previous successful build config

DEPLOYMENT=$1
NAMESPACE=${2:-default}

if [ -z "$DEPLOYMENT" ]; then
  echo "Usage: $0 <deployment-name> [namespace]" >&2
  exit 1
fi

echo "=== Deployment Archeology ==="

# Step 1: Get deployment date
DEPLOY_DATE=$(kubectl get deployment "$DEPLOYMENT" -n "$NAMESPACE" -o jsonpath='{.metadata.creationTimestamp}')
SEARCH_DATE=$(date -d "$DEPLOY_DATE" '+%Y-%m-%d')  # GNU date; on macOS use: date -j -f '%Y-%m-%dT%H:%M:%SZ' "$DEPLOY_DATE" '+%Y-%m-%d'

echo "Deployment created: $DEPLOY_DATE"
echo "Searching builds on: $SEARCH_DATE"

# Step 2: Find builds on that date
echo ""
echo "=== Cloud Build History ==="
gcloud builds list \
  --filter="createTime>='${SEARCH_DATE}T00:00:00Z' AND createTime<='${SEARCH_DATE}T23:59:59Z'" \
  --format="table(id,status,createTime)" \
  --limit=20

# Step 3: Show git commits around that date
echo ""
echo "=== Git History ==="
git log --since="$SEARCH_DATE" --until="$(date -d "$SEARCH_DATE + 1 day" '+%Y-%m-%d')" --oneline --all | head -20

echo ""
echo "Next steps:"
echo "1. Identify successful build ID (STATUS=SUCCESS)"
echo "2. Run: gcloud builds describe <BUILD_ID> --format='yaml(steps,options)'"
echo "3. Check for archived files: find . -name 'Dockerfile*' | grep archive"
echo "4. Compare current config vs successful build config"

Tips

  1. Look for BUILD_ID vs SHORT_SHA: Manual builds use $BUILD_ID, git triggers use $SHORT_SHA
  2. Check machine type: Theia builds need high CPU (E2_HIGHCPU_32)
  3. Node memory: Webpack builds often need 8GB+ heap (NODE_OPTIONS=--max_old_space_size=8192)
  4. Archive directories: Check docs/99-archive/ and archive/ for old configs
  5. Git submodules: May contain reference implementations

Common Gotchas

  • ❌ Assuming current files match deployed version
  • ❌ Not checking environment variables in Cloud Build options
  • ❌ Forgetting to check for archived/moved files
  • ❌ Using wrong Dockerfile (may have multiple variants)
  • ❌ Missing build prerequisites (like pre-built dist/ directory)

Integration with Other Skills

  • codebase-locator: Find all Dockerfile variants
  • thoughts-locator: Find deployment session exports
  • web-search-researcher: Research Cloud Build error messages

Multi-Context Window Support

This skill supports long-running deployment investigation tasks across multiple context windows using Claude 4.5's enhanced state management capabilities.

State Tracking

Checkpoint State (JSON):

{
  "investigation_id": "archeology_20251129_150000",
  "target_deployment": "coditect-combined",
  "phase": "build_history_analyzed",
  "deployment_date": "2025-10-13T09:58:29Z",
  "successful_builds_found": [
    {"build_id": "6e95a4d9-2f19-456c-bba8-5a1ed7a8fdf7", "date": "2025-10-13T09:50:07Z"},
    {"build_id": "abc123-def456-789", "date": "2025-10-12T15:30:00Z"}
  ],
  "config_differences": [],
  "archived_files_located": ["docs/99-archive/deployment-obsolete/Dockerfile.local-test"],
  "recovery_plan_created": false,
  "token_usage": 8200,
  "created_at": "2025-11-29T15:00:00Z"
}

Progress Notes (Markdown):

# Deployment Archeology Progress - 2025-11-29

## Completed
- ✅ Found deployment timestamp: 2025-10-13T09:58:29Z
- ✅ Searched Cloud Build history around that date
- ✅ Found 2 successful builds (6e95a4d9, abc123)
- ✅ Located archived Dockerfile.local-test

## In Progress
- Analyzing build configuration differences
- Creating recovery plan

## Key Findings
- Successful build used Dockerfile.local-test (not Dockerfile.combined)
- Machine type: E2_HIGHCPU_32 (not N1_HIGHCPU_8)
- NODE_OPTIONS=--max_old_space_size=8192

## Next Actions
- Compare current config vs successful build
- Create step-by-step recovery plan
- Test recovery with dry-run build

Session Recovery

When starting a fresh context window after deployment investigation:

  1. Load Checkpoint State: Read .coditect/checkpoints/deployment-archeology-latest.json
  2. Review Progress Notes: Check archeology-progress.md for findings
  3. Verify Build History: Confirm successful builds identified
  4. Resume Analysis: Continue with config comparison or recovery plan
  5. Apply Recovery: Execute recovery plan if ready

Recovery Commands:

# 1. Check latest checkpoint
cat .coditect/checkpoints/deployment-archeology-latest.json | jq '.successful_builds_found'

# 2. Review progress
tail -30 archeology-progress.md

# 3. Verify successful build details
gcloud builds describe 6e95a4d9-2f19-456c-bba8-5a1ed7a8fdf7 --format=yaml

# 4. Check archived files
ls -la docs/99-archive/deployment-obsolete/

# 5. Compare configs
diff current-config.yaml successful-build-config.yaml

State Management Best Practices

Checkpoint Files (JSON Schema):

  • Store in .coditect/checkpoints/deployment-archeology-{target}.json
  • Track successful builds found with timestamps
  • Record config differences for comparison
  • Include archived file locations for quick access

Progress Tracking (Markdown Narrative):

  • Maintain archeology-progress.md with investigation timeline
  • Document key findings and insights
  • Note configuration differences discovered
  • List recovery steps with commands

Git Integration:

  • Save recovery plan to .coditect/reports/recovery-plan-{target}-{date}.md
  • Tag investigations: git tag archeology-{target}-{date}

Progress Checkpoints

Natural Breaking Points:

  1. After deployment timestamp identified
  2. After Cloud Build history searched
  3. After successful build configurations analyzed
  4. After recovery plan created
  5. After recovery plan tested/executed

Checkpoint Creation Pattern:

# Automatic checkpoint creation after each phase
if phase in ["deployment_found", "builds_analyzed", "recovery_plan_created"]:
    create_checkpoint({
        "target_deployment": deployment_name,
        "phase": phase,
        "successful_builds_found": builds_list,
        "config_differences": diffs_list,
        "tokens": current_token_usage,
    })
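A plain-shell equivalent of the pattern above can be dropped into the automation script. This is a sketch using only coreutils; the field names follow the checkpoint schema earlier in this skill, and the simplified JSON (no jq) assumes the arguments contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch: write a checkpoint JSON after each investigation phase

create_checkpoint() {
  local target="$1" phase="$2" tokens="${3:-0}"
  local dir=".coditect/checkpoints"
  mkdir -p "$dir"
  cat > "$dir/deployment-archeology-$target.json" <<EOF
{
  "target_deployment": "$target",
  "phase": "$phase",
  "token_usage": $tokens,
  "created_at": "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
}
EOF
  # Stable pointer for session recovery in a fresh context window
  cp "$dir/deployment-archeology-$target.json" "$dir/deployment-archeology-latest.json"
}

create_checkpoint "coditect-combined" "builds_analyzed" 8200
```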

Example: Multi-Context Investigation

Context Window 1: Discovery Phase

{
  "checkpoint_id": "ckpt_archeology_part1",
  "phase": "builds_found",
  "deployment_date": "2025-10-13T09:58:29Z",
  "successful_builds_found": 2,
  "archived_files_located": 1,
  "next_action": "Analyze build configurations",
  "token_usage": 8200
}

Context Window 2: Analysis & Recovery Plan

# Resume from checkpoint
cat .coditect/checkpoints/ckpt_archeology_part1.json

# Continue with config analysis
# (Context restored in 2 minutes vs 12 minutes from scratch)

{
  "checkpoint_id": "ckpt_archeology_complete",
  "phase": "recovery_plan_ready",
  "config_differences_identified": 3,
  "recovery_plan_created": true,
  "plan_path": ".coditect/reports/recovery-plan-combined-2025-11-29.md",
  "token_usage": 6500
}

Token savings: 8,200 (first context) + 6,500 (second context) = 14,700 tokens total, vs. roughly 25,000 without checkpoints, a ~41% reduction

Reference: See docs/CLAUDE-4.5-BEST-PRACTICES.md for complete multi-context window workflow guidance.


Success Output

When this skill completes successfully, output:

✅ SKILL COMPLETE: deployment-archeology

Completed:
- [x] Deployment timestamp identified from Kubernetes
- [x] Cloud Build history searched for successful builds
- [x] Successful build configuration analyzed
- [x] Git history searched for archived files
- [x] Configuration files restored
- [x] Recovery plan created and validated

Outputs:
- Deployment date: 2025-10-13T09:58:29Z
- Successful build ID: 6e95a4d9-2f19-456c-bba8-5a1ed7a8fdf7
- Restored files: Dockerfile.local-test, cloudbuild config
- Recovery plan: .coditect/reports/recovery-plan-combined-2025-11-29.md
- Configuration differences documented

Completion Checklist

Before marking this skill as complete, verify:

  • Deployment creation timestamp retrieved from Kubernetes
  • Cloud Build history searched around deployment date
  • At least one successful build identified
  • Build configuration extracted (Dockerfile, machine type, env vars)
  • Git history searched for commits around deployment date
  • Archived files located (check docs/99-archive/)
  • Configuration differences documented
  • Recovery plan created with step-by-step commands
  • Checkpoint saved to .coditect/checkpoints/

Failure Indicators

This skill has FAILED if:

  • ❌ Cannot retrieve deployment timestamp from Kubernetes
  • ❌ No Cloud Build history found for deployment date range
  • ❌ No successful builds found (all builds failed or not found)
  • ❌ Build configuration incomplete (missing Dockerfile, machine type, etc.)
  • ❌ Git history search returns no relevant commits
  • ❌ Cannot locate archived configuration files
  • ❌ Configuration differences not identified
  • ❌ Recovery plan missing or incomplete

When NOT to Use

Do NOT use this skill when:

  • Deployment is currently working (no investigation needed)
  • Issue is not deployment-related (use error-debugging-patterns instead)
  • You have the exact working configuration already
  • Problem is with application code, not deployment config
  • Building a new service from scratch (use deployment-automation instead)
  • Only need to check current deployment status (use kubectl get deployment)
  • Working with local development environment (not production/staging)

Use alternatives:

  • error-debugging-patterns - For application errors/bugs
  • deployment-automation - For new deployments
  • kubernetes-troubleshooting - For current deployment issues
  • git-workflow-automation - For general git history searches

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Assuming current files match deployed | Files may have been archived/modified | Always check Cloud Build logs for the actual config used |
| Ignoring environment variables | Missing critical config (NODE_OPTIONS, etc.) | Check the Cloud Build options.env field |
| Not checking for archived files | Miss working configurations | Search docs/99-archive/ and git log --all --full-history |
| Using wrong Dockerfile | Multiple variants exist | Verify from the -f flag in Cloud Build args |
| Forgetting build prerequisites | Deployment may need pre-built artifacts | Check if dist/ or other artifacts are required |
| Single date search | Build may be a day before/after | Search ±1 day from the deployment date |
| Ignoring machine type | Low-resource machine causes failures | Check Cloud Build options.machineType |
| Not documenting recovery plan | Knowledge lost if context cleared | Always create recovery-plan-{target}-{date}.md |
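The "single date search" anti-pattern is easy to avoid by computing a ±1 day window before querying Cloud Build. A sketch using GNU date syntax (on macOS, use `date -v-1d` / `date -v+1d` instead):

```shell
DEPLOY_DATE=2025-10-13  # date portion of the deployment creationTimestamp

START=$(date -d "$DEPLOY_DATE - 1 day" '+%Y-%m-%d')
END=$(date -d "$DEPLOY_DATE + 1 day" '+%Y-%m-%d')
FILTER="createTime>='${START}T00:00:00Z' AND createTime<='${END}T23:59:59Z'"
echo "$FILTER"

# Then: gcloud builds list --filter="$FILTER" --format="table(id,status,createTime)" --limit=50
```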

Principles

This skill embodies these CODITECT principles:

  • #1 Search Before Create - Find what worked before instead of rebuilding from scratch
  • #2 Evidence-Based Recovery - Use actual Cloud Build logs, not assumptions
  • #3 Historical Context Matters - Git history and deployment dates provide critical clues
  • #5 Eliminate Ambiguity - Document exact configuration differences discovered
  • #6 Clear, Understandable - Recovery plan with step-by-step commands
  • #7 Systematic Investigation - Follow 5-step process (timestamp → builds → analyze → git → restore)
  • #8 No Assumptions - Verify every configuration detail from actual logs

Reference: CODITECT-STANDARD-AUTOMATION.md