WF-072: Database Backup Flow
Priority: P0 (Critical) | Phase: Phase 1D - Security & Operations | Effort: 12 hours
Overview
Automated daily backup of the PostgreSQL database on GCP Cloud SQL at 2am UTC, with a weekly restore test (Mondays), 30-day retention with automatic cleanup, and email reporting of the outcome.
Trigger: Scheduled (daily 2am UTC) | Duration: ~10-15 minutes
Flow
- Create GCP Cloud SQL snapshot (automated backup)
- Poll backup status (max 10 min timeout)
- If successful:
  - Weekly (Monday): Test restore to test instance
  - List all backups
  - Identify backups > 30 days old
  - Delete old backups
  - Log backup success
  - Report via email
- If failed: Alert ops team via email
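The polling and branching steps above can be sketched as follows. This is a minimal illustration rather than the workflow's actual implementation: `get_status` stands in for whatever call returns the backup state, and the status strings, the 15-second poll interval, and the step names returned by `next_steps` are all assumptions made for the example. A timeout is treated the same as a failure.

```python
import time
from datetime import date
from typing import Callable, List

# Assumed status strings; adjust to whatever the backup API actually reports.
TERMINAL_STATES = ("SUCCESSFUL", "FAILED")


def wait_for_backup(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,   # "max 10 min timeout" from the flow
    interval_s: float = 15.0,   # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the backup reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return "TIMED_OUT"
        sleep(interval_s)


def next_steps(status: str, run_date: date) -> List[str]:
    """Return the ordered follow-up steps (hypothetical names) for a run."""
    if status != "SUCCESSFUL":
        # Failure and timeout both go to the ops alert.
        return ["alert_ops_email"]
    steps = []
    if run_date.weekday() == 0:  # Monday
        steps.append("restore_test")
    steps += ["list_backups", "delete_expired", "log_success", "report_email"]
    return steps
```

Injecting `clock` and `sleep` keeps the timeout logic testable without waiting ten real minutes.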
Backup Strategy
- Frequency: Daily at 2am UTC
- Retention: 30 days
- Restore Testing: Weekly (every Monday)
- Verification: Automated restore test to separate instance
- Storage: GCP Cloud SQL managed backups (replicated cross-region)
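The retention and restore-test rules above reduce to two small date checks. The sketch below assumes each backup is represented by an ID and a timezone-aware creation timestamp; the `Backup` type and both function names are hypothetical and not part of any GCP client library.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple

RETENTION = timedelta(days=30)  # "Retention: 30 days"


class Backup(NamedTuple):
    backup_id: str
    created_at: datetime  # must be timezone-aware


def expired_backups(backups: List[Backup], now: datetime) -> List[Backup]:
    """Return backups strictly older than the retention window."""
    cutoff = now - RETENTION
    return [b for b in backups if b.created_at < cutoff]


def is_restore_test_day(now: datetime) -> bool:
    """True when the run falls on a Monday in UTC."""
    return now.astimezone(timezone.utc).weekday() == 0
```

A backup that is exactly 30 days old is kept, matching the "> 30 days" rule in the flow.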
Business Impact
- RTO (Recovery Time Objective): < 2 hours
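- RPO (Recovery Point Objective): ≤ 24 hours (changes made since the last daily backup may be lost)
- Restore Success Rate: target 100% (verified by the weekly restore test)
- Storage Cost: ~$50/month (estimate; 30 retained backups × 50GB avg)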
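As a sanity check on the storage figure, the arithmetic can be written out. The per-GB rate below is derived from the numbers in this section, not taken from a price list. Actual billing may differ, for example if backups are stored incrementally rather than as 30 full copies.

```python
RETAINED_BACKUPS = 30      # 30-day retention, one backup per day
AVG_BACKUP_GB = 50         # average backup size from the estimate
MONTHLY_COST_USD = 50.0    # quoted monthly storage cost

# Upper bound on stored volume if every retained backup were a full copy.
stored_gb = RETAINED_BACKUPS * AVG_BACKUP_GB      # 1500 GB

# Rate implied by the estimate, in USD per GB per month (about 0.033).
implied_rate = MONTHLY_COST_USD / stored_gb
```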
Testing
- Daily backup executes at 2am UTC
- Backup completes successfully
- Weekly restore test works (Monday)
- Old backups deleted (> 30 days)
- Success reported via email
- Failure alerts sent to ops team
- Backup log updated
Status: ✅ Ready for Implementation