
WF-072: Database Backup Flow

Priority: P0 (Critical) | Phase: Phase 1D - Security & Operations | Effort: 12 hours

Overview

Automated daily PostgreSQL snapshot at 2am UTC, weekly restore testing (Mondays), 30-day retention with automatic cleanup, and Slack reporting.

Trigger: Scheduled (daily 2am UTC) | Duration: ~10-15 minutes

Flow

Create GCP Cloud SQL snapshot (automated backup)
Poll backup status until it completes (10-minute timeout)
If successful:
- Weekly (Monday): Test restore to test instance
- List all backups
- Identify backups > 30 days old
- Delete old backups
- Log backup success
- Report to Slack
If failed: Alert ops team via Slack
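
The polling step above can be sketched as a small loop with a deadline. This is only an illustration: the status strings, the 15-second interval, and the `get_status` callable are assumptions, not part of the workflow definition. A real implementation would wrap whatever call reports the Cloud SQL backup state.

```python
import time
from typing import Callable

# Hypothetical status values; the real backup API may use different names.
SUCCESS = "SUCCESSFUL"
FAILED = "FAILED"


def wait_for_backup(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,   # 10-minute limit from the flow
    interval_s: float = 15.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the backup succeeds, False if it fails or times out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == SUCCESS:
            return True
        if status == FAILED:
            return False
        if clock() >= deadline:
            # Treat a timeout like a failure so the ops alert path runs.
            return False
        sleep(interval_s)
```

A `False` result would lead to the "If failed" branch (Slack alert). A `True` result continues with the restore test, cleanup, logging, and reporting steps. The `clock` and `sleep` parameters are injectable so the loop can be tested without waiting.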

Backup Strategy

Frequency: Daily at 2am UTC
Retention: 30 days
Restore Testing: Weekly (every Monday)
Verification: Automated restore test to separate instance
Storage: GCP Cloud SQL managed backups (replicated cross-region)
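
As a rough sketch of the two scheduling rules above (Monday restore test, 30-day retention), the decisions reduce to simple date arithmetic. The function names and the `(backup_id, created_at)` tuple shape are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

RETENTION = timedelta(days=30)  # retention period from the strategy


def is_restore_test_day(now: datetime) -> bool:
    """True on Mondays in UTC; datetime.weekday() returns 0 for Monday."""
    return now.astimezone(timezone.utc).weekday() == 0


def expired_backups(
    backups: Iterable[Tuple[str, datetime]], now: datetime
) -> List[str]:
    """Return IDs of backups created more than 30 days before `now`."""
    cutoff = now - RETENTION
    return [backup_id for backup_id, created_at in backups if created_at < cutoff]


# Example: a run on Monday 2024-01-08 at 02:00 UTC.
now = datetime(2024, 1, 8, 2, 0, tzinfo=timezone.utc)
backups = [
    ("a", now - timedelta(days=31)),
    ("b", now - timedelta(days=30)),
    ("c", now - timedelta(days=1)),
]
print(is_restore_test_day(now))       # True
print(expired_backups(backups, now))  # ['a']
```

A backup that is exactly 30 days old is kept in this sketch; only strictly older backups are selected for deletion.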

Business Impact

RTO (Recovery Time Objective): < 2 hours
RPO (Recovery Point Objective): < 24 hours
Restore Success Rate: 100% target (verified by the weekly restore test)
Storage Cost: ~$50/month (30 daily backups × 50 GB average)
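
One way to monitor the RPO target is to compare the age of the newest successful backup against the 24-hour limit. This is a minimal sketch. The function name is hypothetical and the timestamp is assumed to come from the backup log.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RPO = timedelta(hours=24)  # recovery point objective from the targets above


def rpo_met(last_success: Optional[datetime], now: datetime) -> bool:
    """True if the newest successful backup is less than 24 hours old."""
    if last_success is None:
        # No successful backup on record: the objective cannot be met.
        return False
    return now - last_success < RPO


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
print(rpo_met(datetime(2024, 1, 8, 2, 10, tzinfo=timezone.utc), now))  # True
print(rpo_met(datetime(2024, 1, 7, 2, 10, tzinfo=timezone.utc), now))  # False
```

With a daily schedule, the newest backup is close to 24 hours old just before the next run finishes, so a check like this is probably best scheduled shortly after the daily backup window.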

Testing

Daily backup executes at 2am UTC
Backup completes successfully
Weekly restore test works (Monday)
Old backups deleted (> 30 days)
Success reported to Slack
Failure alerts sent to ops team
Backup log updated

Status: ✅ Ready for Implementation
