Tenant Data Export and Migration Specification (Comprehensive)
Document ID: CODITECT-BIO-EXPORT-002 Version: 2.0.0 Effective Date: 2026-02-16 Classification: Internal - Confidential Owner: Chief Technology Officer (CTO) / Data Protection Officer (DPO)
Executive Summary
This document extends the base tenant data export specification (CODITECT-BIO-TDEM-001) with comprehensive coverage of:
- Large-Scale Export Handling: Streaming exports for tenants with >1M records using JSON Lines format
- GDPR Article 20 Detailed Compliance: Machine-readable data portability with JSON-LD semantic tagging
- M&A Migration Workflows: Multi-phase tenant-to-tenant migration with incremental delta sync reducing cutover downtime to ~30 minutes
- Advanced Integrity Verification: SHA-256 checksums, relationship integrity, attachment completeness, signature verification
- Operational Excellence: Background processing, resume capability, progress notifications, monitoring dashboards
Key Enhancements Over Base Specification:
- Streaming Export: JSON Lines (.jsonl) for memory-efficient processing of 10M+ record tenants
- Checkpoint Resume: Export jobs resume from last checkpoint after failures (network, timeout, OOM)
- Incremental Migration: Delta sync reduces cutover window from 4 hours to 30 minutes
- Integrity Guarantees: Near-zero tolerance for data loss (record loss exceeding the 0.01% threshold triggers automatic rollback)
- GDPR Automation: Self-service personal data export responding to Article 15+20 requests in <24 hours
Compliance Mapping:
- GDPR Article 20: Right to data portability in structured, commonly used, machine-readable format
- GDPR Article 15: Right of access to personal data
- FDA 21 CFR Part 11 §11.10(c): Protection of records to enable accurate and ready retrieval
- HIPAA §164.524: Individual right of access to PHI within 30 days
Table of Contents
- Export Format Specifications
- Data Completeness Guarantees
- GDPR Article 20 Data Portability
- Tenant-to-Tenant Migration (M&A Scenarios)
- Export Integrity Verification
- Large-Scale Export Handling
- Operational Procedures
- Security and Compliance
- Monitoring and Alerting
1. Export Format Specifications
1.1 JSON Export (Full-Fidelity)
Purpose: Complete tenant data export with all relationships, metadata, and audit history preserved.
Format: JSON Lines (.jsonl) for streaming large datasets, single JSON (.json) for <10K records.
Schema Structure:
{
"export_metadata": {
"export_id": "exp_3kj2h5k234jh5k",
"tenant_id": "org_k3jh5k2j3h45k",
"tenant_name": "Acme Pharmaceuticals Inc.",
"export_type": "full",
"export_initiated_at": "2026-02-16T14:32:18.234Z",
"export_completed_at": "2026-02-16T14:45:22.891Z",
"export_requested_by": "user_admin@acmepharma.com",
"platform_version": "1.2.0",
"schema_version": "1.0.0",
"total_records": 152847,
"total_attachments": 3241,
"total_size_bytes": 4823947234,
"encryption_key_id": "kek_tenant_acme_20260216",
"checksum_algorithm": "SHA-256",
"manifest_checksum": "a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1"
},
"tenant_configuration": {
"subscription_tier": "enterprise",
"regulatory_profiles": ["FDA_PART_11", "HIPAA", "SOC2"],
"features_enabled": ["electronic_signatures", "audit_trail", "workflow_automation"],
"custom_fields": [...],
"workflow_definitions": [...],
"document_types": [...]
},
"users": [...],
"groups": [...],
"roles": [...],
"documents": [...],
"electronic_signatures": [...],
"audit_trail": [...],
"attachments": [...],
"workflows": [...],
"notifications": [...]
}
Document Record Schema:
{
"id": "doc_abc123",
"tenant_id": "org_k3jh5k2j3h45k",
"document_number": "SOP-001-2026",
"title": "Standard Operating Procedure - Equipment Validation",
"document_type": "sop",
"version": "3.2.1",
"status": "approved",
"created_at": "2026-01-15T09:23:45.123Z",
"created_by_user_id": "user_123",
"created_by_email": "john.doe@acmepharma.com",
"modified_at": "2026-02-10T14:12:33.456Z",
"modified_by_user_id": "user_456",
"approved_at": "2026-02-11T08:45:12.789Z",
"approved_by_user_id": "user_789",
"content_checksum_sha256": "b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8",
"attachments": ["att_001", "att_002"],
"signatures": ["sig_001", "sig_002", "sig_003"],
"audit_trail_ids": ["audit_001", "audit_002", "audit_003"],
"relationships": {
"supersedes": "doc_xyz789",
"related_documents": ["doc_def456", "doc_ghi789"]
},
"metadata": {
"department": "Quality Assurance",
"effective_date": "2026-02-15",
"review_cycle_months": 12,
"next_review_date": "2027-02-15",
"classification": "controlled",
"retention_years": 7
}
}
Electronic Signature Record Schema:
{
"id": "sig_001",
"tenant_id": "org_k3jh5k2j3h45k",
"document_id": "doc_abc123",
"document_version": "3.2.1",
"signer_user_id": "user_789",
"signer_email": "jane.smith@acmepharma.com",
"signer_full_name": "Jane Smith, PhD",
"signature_meaning": "approved",
"signed_at": "2026-02-11T08:45:12.789Z",
"signature_algorithm": "ECDSA_P256_SHA256",
"signature_value_base64": "MEUCIQDf8sY2...",
"certificate_pem": "-----BEGIN CERTIFICATE-----\n...",
"certificate_serial_number": "3A:4F:8B:2C:...",
"certificate_subject": "CN=Jane Smith,OU=Quality Assurance,O=Acme Pharma,C=US",
"certificate_issuer": "CN=CODITECT BIO-QMS CA,O=AZ1.AI Inc,C=US",
"certificate_not_before": "2025-01-01T00:00:00Z",
"certificate_not_after": "2028-01-01T00:00:00Z",
"timestamp_authority": "tsa.coditect.ai",
"timestamp_value": "2026-02-11T08:45:13.234Z",
"timestamp_signature_base64": "MIIBrDCC...",
"reason": "Document meets regulatory requirements and QA standards",
"ip_address": "10.24.5.123",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
"two_factor_method": "totp",
"password_verified": true,
"biometric_verified": false,
"audit_trail_id": "audit_sig_001"
}
1.2 CSV Export (Flat Tables)
Purpose: Spreadsheet-compatible format for analysis and reporting.
Format: One CSV file per entity type, UTF-8 encoding, comma-delimited, RFC 4180 compliant.
File Structure:
tenant_export_org_k3jh5k2j3h45k_20260216/
├── manifest.json
├── documents.csv
├── electronic_signatures.csv
├── audit_trail.csv
├── users.csv
├── groups.csv
├── roles.csv
├── attachments.csv (metadata only, binaries separate)
├── workflows.csv
├── notifications.csv
└── checksums.txt
Example: documents.csv
id,tenant_id,document_number,title,document_type,version,status,created_at,created_by_email,approved_at,approved_by_email,content_checksum_sha256,attachment_ids,signature_ids
doc_abc123,org_k3jh5k2j3h45k,SOP-001-2026,"Standard Operating Procedure - Equipment Validation",sop,3.2.1,approved,2026-01-15T09:23:45.123Z,john.doe@acmepharma.com,2026-02-11T08:45:12.789Z,jane.smith@acmepharma.com,b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8,"att_001,att_002","sig_001,sig_002,sig_003"
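Because the title and the ID-list columns contain embedded commas, RFC 4180 quoting matters when consuming these files. A minimal parsing sketch using Python's standard csv module (the sample row is abridged from the example above):

```python
import csv
import io

# Abridged documents.csv row: RFC 4180 quoting keeps the comma-separated
# attachment_ids and signature_ids values inside single fields.
sample = (
    "id,title,attachment_ids,signature_ids\n"
    'doc_abc123,"Standard Operating Procedure - Equipment Validation",'
    '"att_001,att_002","sig_001,sig_002,sig_003"\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
doc = rows[0]
attachment_ids = doc["attachment_ids"].split(",")  # → ["att_001", "att_002"]
signature_ids = doc["signature_ids"].split(",")

print(doc["id"], attachment_ids, signature_ids)
```

The same pattern applies to every flat table in the export; splitting the quoted ID lists recovers the relationships flattened out of the JSON schema.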
1.3 XML Export (Structured with XSD)
Purpose: Industry-standard structured format with schema validation for regulatory submissions.
Format: XML 1.0, UTF-8, with XSD schema validation.
Root Structure:
<?xml version="1.0" encoding="UTF-8"?>
<TenantExport xmlns="https://coditect.ai/schemas/bio-qms/export/v1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://coditect.ai/schemas/bio-qms/export/v1 tenant-export-schema-v1.0.xsd">
<ExportMetadata>
<ExportID>exp_3kj2h5k234jh5k</ExportID>
<TenantID>org_k3jh5k2j3h45k</TenantID>
<TenantName>Acme Pharmaceuticals Inc.</TenantName>
<ExportType>full</ExportType>
<ExportInitiatedAt>2026-02-16T14:32:18.234Z</ExportInitiatedAt>
<ExportCompletedAt>2026-02-16T14:45:22.891Z</ExportCompletedAt>
<PlatformVersion>1.2.0</PlatformVersion>
<SchemaVersion>1.0.0</SchemaVersion>
<TotalRecords>152847</TotalRecords>
<ManifestChecksum algorithm="SHA-256">a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1</ManifestChecksum>
</ExportMetadata>
<Documents>
<Document id="doc_abc123">
<DocumentNumber>SOP-001-2026</DocumentNumber>
<Title>Standard Operating Procedure - Equipment Validation</Title>
<DocumentType>sop</DocumentType>
<Version>3.2.1</Version>
<Status>approved</Status>
<CreatedAt>2026-01-15T09:23:45.123Z</CreatedAt>
<CreatedBy email="john.doe@acmepharma.com">user_123</CreatedBy>
<ApprovedAt>2026-02-11T08:45:12.789Z</ApprovedAt>
<ApprovedBy email="jane.smith@acmepharma.com">user_789</ApprovedBy>
<ContentChecksum algorithm="SHA-256">b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8</ContentChecksum>
<Attachments>
<Attachment ref="att_001"/>
<Attachment ref="att_002"/>
</Attachments>
<Signatures>
<Signature ref="sig_001"/>
<Signature ref="sig_002"/>
<Signature ref="sig_003"/>
</Signatures>
</Document>
</Documents>
</TenantExport>
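Consumers of the XML export must account for the default namespace declared on the root element; unqualified element queries return nothing. A parsing sketch with Python's standard xml.etree.ElementTree (the sample document is trimmed from the example above):

```python
import xml.etree.ElementTree as ET

xml_text = """<TenantExport xmlns="https://coditect.ai/schemas/bio-qms/export/v1">
  <ExportMetadata>
    <ExportID>exp_3kj2h5k234jh5k</ExportID>
    <TotalRecords>152847</TotalRecords>
  </ExportMetadata>
  <Documents>
    <Document id="doc_abc123">
      <DocumentNumber>SOP-001-2026</DocumentNumber>
    </Document>
  </Documents>
</TenantExport>"""

# Every element lives in the default namespace, so queries must be qualified
ns = {"ex": "https://coditect.ai/schemas/bio-qms/export/v1"}
root = ET.fromstring(xml_text)

export_id = root.findtext("ex:ExportMetadata/ex:ExportID", namespaces=ns)
total = int(root.findtext("ex:ExportMetadata/ex:TotalRecords", namespaces=ns))
doc_ids = [d.get("id") for d in root.findall("ex:Documents/ex:Document", ns)]
```

Full XSD validation (against tenant-export-schema-v1.0.xsd) requires a schema-aware library; ElementTree alone only checks well-formedness.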
1.4 PDF Export (Human-Readable)
Purpose: Regulatory-compliant human-readable export for archival and audits.
Format: PDF/A-2b (ISO 19005-2) for long-term archival compliance.
Structure:
- Title Page: Tenant name, export date, document ID, certification statement
- Table of Contents: Hyperlinked to sections
- Export Metadata: Export ID, date range, record counts, checksum
- Tenant Configuration: Subscription, regulatory profiles, features
- User Directory: All users with roles and permissions
- Document Inventory: Table with document number, title, status, approval date
- Document Details: One page per document with full metadata, signature details, audit history
- Electronic Signatures: Certificate details, signature meaning, timestamp
- Audit Trail: Chronological log of all tenant activities
- Attachment Inventory: List of attachments with checksums (binaries not included in PDF)
- Certification Page: Digital signature from DPO attesting to export completeness
2. Data Completeness Guarantees
2.1 Complete Data Inventory Per Tenant
10-Entity Export Coverage:
| Entity Type | Description | Relationships | Export Format |
|---|---|---|---|
| Documents | All QMS documents (SOPs, protocols, reports, forms) | Attachments, Signatures, Audit Trail, Workflows | JSON, CSV, XML, PDF |
| Electronic Signatures | FDA Part 11 compliant signatures with certificates | Documents, Users, Audit Trail | JSON, CSV, XML, PDF |
| Audit Trail | Complete activity log (HIPAA §164.312(b)) | All entities | JSON, CSV, XML, PDF |
| Users | User accounts, profiles, authentication history | Groups, Roles, Documents, Signatures | JSON, CSV, XML |
| Groups | User groups for access control | Users, Roles, Permissions | JSON, CSV, XML |
| Roles | RBAC roles with permission sets | Users, Groups, Permissions | JSON, CSV, XML |
| Attachments | Binary files (PDFs, images, spreadsheets) | Documents, Audit Trail | JSON (metadata), Binary Files |
| Workflows | XState workflow definitions and execution history | Documents, Users, Audit Trail | JSON, CSV, XML |
| Notifications | System notifications and alerts | Users, Documents, Workflows | JSON, CSV |
| Configurations | Tenant settings, custom fields, document types | All entities | JSON, XML |
2.2 Attachment Export (Binary Files with Metadata)
Directory Structure:
export_org_k3jh5k2j3h45k_20260216/
├── attachments/
│ ├── 2026/
│ │ ├── 01/
│ │ │ ├── att_001.pdf (original filename preserved)
│ │ │ └── att_001.json (metadata sidecar)
│ │ └── 02/
│ │ ├── att_002.xlsx
│ │ └── att_002.json
│ └── checksums.txt (SHA-256 for all files)
└── attachments_manifest.json
Attachment Metadata Sidecar (att_001.json):
{
"id": "att_001",
"tenant_id": "org_k3jh5k2j3h45k",
"document_id": "doc_abc123",
"original_filename": "equipment-calibration-photos.pdf",
"content_type": "application/pdf",
"size_bytes": 2847392,
"checksum_sha256": "c8f2e9a1b3d4c7e6f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7",
"uploaded_at": "2026-01-20T11:45:23Z",
"uploaded_by_user_id": "user_123",
"uploaded_by_email": "john.doe@acmepharma.com",
"virus_scan_status": "clean",
"virus_scan_timestamp": "2026-01-20T11:45:45Z",
"export_path": "attachments/2026/01/att_001.pdf"
}
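Recipients can confirm each binary against its sidecar before trusting the export. A verification sketch using only the Python standard library (the field name checksum_sha256 follows the sidecar example above; chunked reading keeps memory flat for multi-GB attachments):

```python
import hashlib
import json
from pathlib import Path

def verify_attachment(binary_path: Path, sidecar_path: Path) -> bool:
    """Compare a binary's SHA-256 against its metadata sidecar (1 MiB chunks)."""
    sidecar = json.loads(sidecar_path.read_text())
    h = hashlib.sha256()
    with open(binary_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest() == sidecar["checksum_sha256"]
```

Running this over every pair in attachments/ before import gives the attachment-completeness guarantee described in Section 5.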
3. GDPR Article 20 Data Portability
3.1 GDPR Article 20 Requirements
GDPR Article 20 - Right to Data Portability:
The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance.
Compliance Implementation:
| GDPR Requirement | BIO-QMS Implementation | Evidence |
|---|---|---|
| Structured format | JSON with documented schema | tenant-export-schema-v1.0.json |
| Commonly used format | JSON, CSV, XML (industry standards) | RFC 8259, RFC 4180, W3C XML |
| Machine-readable | JSON-LD with schema.org vocabulary | Parsable by standard libraries |
| Without hindrance | Self-service export UI, API endpoint | <30-day SLA, zero fees |
| Transmit to another controller | Portable format with import guide | IMPORT-GUIDE.md included |
3.2 Machine-Readable Format (JSON-LD with schema.org)
JSON-LD Context for Semantic Portability:
{
"@context": {
"@vocab": "https://schema.org/",
"coditect": "https://coditect.ai/schemas/bio-qms/v1#",
"gdpr": "https://w3c.github.io/dpv/dpv-gdpr#"
},
"@type": "Dataset",
"name": "Tenant Data Export - Acme Pharmaceuticals Inc.",
"description": "Complete tenant data export for GDPR Article 20 data portability",
"dateCreated": "2026-02-16T14:32:18.234Z",
"creator": {
"@type": "Organization",
"name": "CODITECT BIO-QMS Platform",
"url": "https://bio-qms.coditect.ai"
},
"tenant": {
"@type": "Organization",
"@id": "org:org_k3jh5k2j3h45k",
"name": "Acme Pharmaceuticals Inc.",
"legalName": "Acme Pharmaceuticals Incorporated",
"url": "https://acmepharma.com"
},
"dataSubjects": [
{
"@type": "Person",
"@id": "user:user_456",
"email": "mary.jones@acmepharma.com",
"name": "Mary Jones",
"jobTitle": "Senior QA Specialist",
"worksFor": {"@id": "org:org_k3jh5k2j3h45k"},
"gdpr:personalData": {
"gdpr:dataCategory": "user_profile",
"gdpr:legalBasis": "contract",
"gdpr:processingPurpose": "quality_management_system_access",
"gdpr:retentionPeriod": "P7Y"
}
}
],
"distribution": [
{
"@type": "DataDownload",
"encodingFormat": "application/json",
"contentUrl": "tenant_export_org_k3jh5k2j3h45k_20260216.json",
"contentSize": "4823947234 bytes",
"sha256": "a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1"
}
],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"gdpr:dataPortability": {
"gdpr:portabilityFormat": "JSON-LD",
"gdpr:portabilityCompliance": "GDPR Article 20",
"gdpr:importGuide": "IMPORT-GUIDE.md"
}
}
3.3 Automated Export on Data Subject Request
GDPR Article 15 + 20 Combined Request:
Timeline:
- Day 0: Request received, identity verification initiated
- Day 1-2: Identity verified, export generated
- Day 2: Secure download link emailed
- Day 7: Reminder if not downloaded
- Day 21: Link expires, can re-request
Export Contents (Personal Data Only):
personal_data_export_user_456_20260216/
├── GDPR-ARTICLE-15-RESPONSE.md (human-readable summary)
├── GDPR-ARTICLE-20-DATA.json (machine-readable)
├── user_profile.json
├── documents_authored.csv
├── signatures.csv
├── audit_trail_personal.csv (only records where user is actor or subject)
├── permissions_history.json
├── attachments/ (only files uploaded by user)
└── checksums.txt
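Scoping the audit trail to "records where user is actor or subject" can be a simple filter over the exported entries. A sketch under the assumption that audit entries carry actor_user_id and subject_user_id fields (illustrative names, not a confirmed schema):

```python
def personal_audit_records(audit_records: list[dict], user_id: str) -> list[dict]:
    """Keep only audit entries where the data subject acted or was acted upon.

    Field names (actor_user_id, subject_user_id) are illustrative; adapt them
    to the actual audit trail schema.
    """
    return [
        rec for rec in audit_records
        if rec.get("actor_user_id") == user_id
        or rec.get("subject_user_id") == user_id
    ]
```

This keeps third-party personal data (other users appearing only as unrelated actors) out of the Article 15/20 response.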
4. Tenant-to-Tenant Migration (M&A Scenarios)
4.1 M&A Migration Use Cases
Scenario 1: Company Acquisition
- Acquired company (TenantA) merges into parent (TenantB)
- All TenantA data accessible in TenantB
- TenantA users become TenantB users
- TenantA audit trail preserved
- TenantA encryption keys retained for signature verification
Scenario 2: Spin-Off / Divestiture
- Division spun off from parent (TenantA subset → new TenantC)
- Partial data migration (specific departments/products only)
- User subset migrated
- Audit trail filtered to migrated data
- Signatures remain verifiable
4.2 Incremental Migration Support (Delta Sync)
Migration Phases:
Phase 1: Initial Bulk Migration (T-7 days)
- Export all source data as of T-7
- Import into target tenant
- Source tenant remains active (users continue working)
- Duration: ~4 hours for typical tenant
Phase 2: Delta Sync #1 (T-3 days)
- Export only changes since T-7 (incremental)
- Import deltas into target
- Source still active
- Duration: ~1 hour
Phase 3: Delta Sync #2 (T-6 hours, start of cutover window)
- Export changes since T-3
- Import deltas
- Source set to read-only mode (no new changes)
- Duration: ~30 minutes
Phase 4: Final Delta Sync (T-0, cutover)
- Export final changes since T-6h
- Import final deltas
- Verify integrity (row counts, checksums)
- Switch users from source to target
- Duration: ~15 minutes
- Total downtime: ~30 minutes (final sync + verification + switchover)
Incremental Export Query:
-- Export documents modified since last sync
SELECT *
FROM documents
WHERE tenant_id = 'org_source123'
AND (
modified_at > '2026-02-09T02:00:00Z' -- last sync timestamp
OR created_at > '2026-02-09T02:00:00Z'
)
ORDER BY modified_at ASC;
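The same delta selection can be exercised from Python. The sketch below uses an in-memory SQLite database as a stand-in for the production store (table layout simplified to the columns the query touches; ISO-8601 UTC strings compare correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id TEXT, tenant_id TEXT, created_at TEXT, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("doc_old",    "org_source123", "2026-01-01T00:00:00Z", "2026-02-01T00:00:00Z"),
        ("doc_edited", "org_source123", "2026-01-01T00:00:00Z", "2026-02-10T09:00:00Z"),
        ("doc_new",    "org_source123", "2026-02-11T12:00:00Z", "2026-02-11T12:00:00Z"),
    ],
)

LAST_SYNC = "2026-02-09T02:00:00Z"  # recorded at the end of the previous sync
delta = conn.execute(
    """SELECT id FROM documents
       WHERE tenant_id = ?
         AND (modified_at > ? OR created_at > ?)
       ORDER BY modified_at ASC""",
    ("org_source123", LAST_SYNC, LAST_SYNC),
).fetchall()
# Only doc_edited and doc_new fall inside the delta window
```

Persisting LAST_SYNC atomically with each completed sync is what lets the next phase export only the changes.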
4.3 Cutover Procedure with Minimal Downtime
Cutover Checklist:
T-7 Days: Pre-Cutover
- Migration assessment complete
- Compatibility check passed
- Initial bulk migration complete
- Delta sync #1 complete
- Rollback plan documented
- Cutover window scheduled
T-0: Switchover
- DNS updated
- Load balancer updated
- Users redirected to target tenant
- Source tenant remains read-only for 7 days (safety net)
T+1 Hour: Post-Cutover Monitoring
- User login success rate ≥95%
- No signature verification failures
- No audit trail gaps
- Performance metrics normal
5. Export Integrity Verification
5.1 Row Count Verification Per Entity
Verification Matrix:
| Entity | Source Count | Export Count | Target Count | Status |
|---|---|---|---|---|
| Users | 247 | 247 | 247 | ✅ PASS |
| Documents | 12,847 | 12,847 | 12,847 | ✅ PASS |
| Signatures | 34,521 | 34,521 | 34,521 | ✅ PASS |
| Audit Trail | 1,247,389 | 1,247,389 | 1,247,389 | ✅ PASS |
| Attachments | 3,241 | 3,241 | 3,241 | ✅ PASS |
5.2 SHA-256 Hash Verification Per Exported File
Checksum Manifest (checksums.txt):
SHA256 (tenant_export.json) = a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1
SHA256 (documents.csv) = b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8
SHA256 (electronic_signatures.csv) = c8f2e9a1b3d4c7e6f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7
SHA256 (audit_trail.csv) = d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7f9a2b5c8d1e4f7a9b3c6d9e2
SHA256 (attachments/2026/01/att_001.pdf) = e2f5a8b1c4d7e9f2a5b8c1d4e7f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2
Verification:
cd export_org_k3jh5k2j3h45k_20260216/
sha256sum -c checksums.txt
# Output: one "<file>: OK" line per verified file; a non-zero exit code means a mismatch
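Generating the manifest in the BSD-style "SHA256 (file) = digest" form shown above is straightforward with the standard library. A sketch (reads each file fully into memory; for multi-GB attachments, substitute chunked hashing):

```python
import hashlib
from pathlib import Path

def write_checksum_manifest(export_dir: Path, manifest_name: str = "checksums.txt") -> Path:
    """Write BSD-style "SHA256 (path) = digest" lines for every file in the export."""
    lines = []
    for path in sorted(export_dir.rglob("*")):
        # Skip directories and the manifest itself
        if path.is_file() and path.name != manifest_name:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rel = path.relative_to(export_dir)
            lines.append(f"SHA256 ({rel}) = {digest}")
    manifest = export_dir / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

The resulting file is checkable with the sha256sum command shown above.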
5.3 Relationship Integrity Check
Foreign Key Validation:
| Source Entity | Foreign Key Field | Target Entity | Validation Rule |
|---|---|---|---|
| Document | created_by_user_id | User | User with ID must exist |
| Document | attachments[] | Attachment | All attachment IDs must exist |
| Signature | document_id | Document | Document with ID must exist |
| Audit Trail | actor_user_id | User | User with ID must exist |
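The validation rules in the table can be enforced mechanically before an export is accepted. A sketch that checks a loaded export dictionary and returns the list of violations (entity keys follow the JSON export schema in Section 1.1; the audit-trail field actor_user_id is assumed per the table above):

```python
def check_relationship_integrity(export: dict) -> list[str]:
    """Return one message per dangling foreign key; an empty list means PASS."""
    user_ids = {u["id"] for u in export.get("users", [])}
    doc_ids = {d["id"] for d in export.get("documents", [])}
    att_ids = {a["id"] for a in export.get("attachments", [])}
    violations = []
    for doc in export.get("documents", []):
        if doc.get("created_by_user_id") not in user_ids:
            violations.append(f"{doc['id']}: unknown created_by_user_id")
        for att in doc.get("attachments", []):
            if att not in att_ids:
                violations.append(f"{doc['id']}: missing attachment {att}")
    for sig in export.get("electronic_signatures", []):
        if sig.get("document_id") not in doc_ids:
            violations.append(f"{sig['id']}: unknown document_id")
    for entry in export.get("audit_trail", []):
        if entry.get("actor_user_id") not in user_ids:
            violations.append(f"{entry['id']}: unknown actor_user_id")
    return violations
```

Building ID sets once keeps the check linear in export size even for million-record audit trails.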
6. Large-Scale Export Handling
6.1 Streaming Export for Large Tenants (>1M Records)
Challenge: Tenants with more than 1M audit trail records or more than 100K documents cannot be exported as a single in-memory JSON document.
Solution: JSON Lines (.jsonl) streaming format with chunked file generation.
JSON Lines Format:
{"export_metadata": {...}}
{"entity_type": "user", "id": "user_001", "email": "user1@example.com", ...}
{"entity_type": "user", "id": "user_002", "email": "user2@example.com", ...}
{"entity_type": "document", "id": "doc_001", "title": "SOP-001", ...}
{"entity_type": "document", "id": "doc_002", "title": "SOP-002", ...}
Streaming Export Implementation:
import json

def streaming_export(tenant_id: str, output_path: str, batch_size: int = 1000):
    """Stream-export a large tenant without loading all data into memory."""
    with open(output_path, "w") as out:
        # Write metadata as the first JSON Lines record
        metadata = generate_export_metadata(tenant_id)
        out.write(json.dumps({"export_metadata": metadata}) + "\n")

        # Stream documents in fixed-size batches
        offset = 0
        while True:
            docs = db.query(
                "SELECT * FROM documents WHERE tenant_id = ? ORDER BY id LIMIT ? OFFSET ?",
                tenant_id, batch_size, offset,
            )
            if not docs:
                break
            for doc in docs:
                out.write(json.dumps({"entity_type": "document", **doc}) + "\n")
            offset += len(docs)  # count rows actually returned, not the batch size
            print(f"Exported {offset} documents...")

        # Stream audit trail (potentially the largest entity)
        offset = 0
        while True:
            audits = db.query(
                "SELECT * FROM audit_trail WHERE tenant_id = ? ORDER BY timestamp LIMIT ? OFFSET ?",
                tenant_id, batch_size, offset,
            )
            if not audits:
                break
            for audit in audits:
                out.write(json.dumps({"entity_type": "audit", **audit}) + "\n")
            offset += len(audits)
            if offset % 100000 == 0:
                print(f"Exported {offset} audit records...")
6.2 Chunked Export with Resume Capability
Checkpoint File:
{
"export_id": "exp_abc123",
"tenant_id": "org_k3jh5k2j3h45k",
"started_at": "2026-02-16T14:00:00Z",
"last_checkpoint_at": "2026-02-16T14:23:45Z",
"status": "in_progress",
"completed_entities": ["users", "groups", "roles"],
"in_progress_entity": "documents",
"in_progress_offset": 45000,
"pending_entities": ["signatures", "audit_trail", "attachments"],
"total_records_exported": 45247,
"estimated_completion": "2026-02-16T15:30:00Z"
}
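The load_checkpoint/save_checkpoint helpers this flow relies on can be as small as one JSON file per export, written atomically so a crash mid-write never leaves a torn resume point. A sketch (the on-disk layout is an assumption, not part of the spec):

```python
import json
import os
import tempfile

def save_checkpoint(checkpoint: dict, directory: str = "checkpoints") -> str:
    """Persist a checkpoint atomically: write to a temp file, then rename."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{checkpoint['export_id']}.json")
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems
    return path

def load_checkpoint(export_id: str, directory: str = "checkpoints") -> dict:
    with open(os.path.join(directory, f"{export_id}.json")) as f:
        return json.load(f)
```

The rename step guarantees readers always see either the previous complete checkpoint or the new one, never a partial write.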
Resumable Export:
import json
from datetime import datetime
from typing import Optional

def resumable_export(tenant_id: str, export_id: Optional[str] = None):
    """Export with checkpoint resume capability."""
    if export_id:
        # Resume from checkpoint
        checkpoint = load_checkpoint(export_id)
        print(f"Resuming export {export_id} from {checkpoint['in_progress_entity']} "
              f"offset {checkpoint['in_progress_offset']}")
    else:
        # New export
        export_id = generate_export_id()
        checkpoint = {
            "export_id": export_id,
            "tenant_id": tenant_id,
            "started_at": datetime.now().isoformat(),
            "completed_entities": [],
            "pending_entities": ["users", "groups", "roles", "documents",
                                 "signatures", "audit_trail", "attachments"],
        }
    try:
        # Iterate over a copy: the body removes entries from pending_entities,
        # and mutating the list while iterating it would silently skip entities
        for entity in list(checkpoint["pending_entities"]):
            offset = (checkpoint.get("in_progress_offset", 0)
                      if checkpoint.get("in_progress_entity") == entity else 0)
            while True:
                records = fetch_entity_batch(tenant_id, entity, offset, batch_size=1000)
                if not records:
                    break
                write_records_to_export(export_id, entity, records)
                offset += len(records)
                # Persist a checkpoint every 10K records
                if offset % 10000 == 0:
                    checkpoint["in_progress_entity"] = entity
                    checkpoint["in_progress_offset"] = offset
                    checkpoint["last_checkpoint_at"] = datetime.now().isoformat()
                    save_checkpoint(checkpoint)
            # Entity complete
            checkpoint["completed_entities"].append(entity)
            checkpoint["pending_entities"].remove(entity)
            checkpoint["in_progress_entity"] = None
            checkpoint["in_progress_offset"] = 0
            save_checkpoint(checkpoint)
        # Export complete
        checkpoint["status"] = "completed"
        checkpoint["completed_at"] = datetime.now().isoformat()
        save_checkpoint(checkpoint)
        return {"status": "success", "export_id": export_id}
    except Exception as e:
        # Persist the failure so the job can resume from the last checkpoint
        checkpoint["status"] = "failed"
        checkpoint["error"] = str(e)
        checkpoint["failed_at"] = datetime.now().isoformat()
        save_checkpoint(checkpoint)
        return {"status": "failed", "export_id": export_id,
                "resume_command": f"resumable_export('{tenant_id}', '{export_id}')"}
6.3 Background Processing with Progress Notifications
Celery Task:
import os
from datetime import datetime, timedelta

from celery import Celery

app = Celery('bio_qms_exports', broker='redis://localhost:6379/0')

@app.task(bind=True)
def export_tenant_data(self, tenant_id: str, user_id: str, export_type: str, formats: list):
    """Background task for tenant data export."""
    try:
        # Report progress: 0%
        self.update_state(state='PROGRESS', meta={'progress': 0, 'status': 'Initializing export...'})
        export_id = generate_export_id()
        # Export each entity with progress updates
        entities = ["users", "groups", "roles", "documents", "signatures", "audit_trail", "attachments"]
        total_entities = len(entities)
        for i, entity in enumerate(entities):
            progress = int((i / total_entities) * 100)
            self.update_state(state='PROGRESS', meta={'progress': progress, 'status': f'Exporting {entity}...'})
            export_entity(tenant_id, export_id, entity)
        # Generate export files in the requested formats
        self.update_state(state='PROGRESS', meta={'progress': 90, 'status': 'Generating export files...'})
        export_files = {}
        if "json" in formats:
            export_files["json"] = generate_json_export(export_id)
        if "csv" in formats:
            export_files["csv"] = generate_csv_export(export_id)
        # Compress and upload
        self.update_state(state='PROGRESS', meta={'progress': 95, 'status': 'Compressing export...'})
        archive_path = create_tar_gz(export_id, export_files)
        download_url = upload_to_gcs(archive_path, tenant_id, export_id)
        return {
            "export_id": export_id,
            "download_url": download_url,
            "expires_at": (datetime.now() + timedelta(days=7)).isoformat(),
            "size_bytes": os.path.getsize(archive_path),
        }
    except Exception as e:
        # Retry with exponential backoff: 60s, 120s, 240s, ...
        raise self.retry(exc=e, countdown=60 * (2 ** self.request.retries))
6.4 Compressed Archive Generation
Archive Structure:
export_org_k3jh5k2j3h45k_20260216.tar.gz
├── README.md
├── IMPORT-GUIDE.md
├── manifest.json
├── checksums.txt
├── tenant_export.json
├── documents.csv
├── electronic_signatures.csv
├── audit_trail.csv
├── users.csv
├── attachments/
│ └── 2026/01/att_001.pdf
└── tenant_export.xml
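The archive step can be sketched with Python's standard tarfile module (the create_tar_gz helper referenced in Section 6.3 is not specified; this shows one possible shape):

```python
import tarfile
from pathlib import Path

def create_export_archive(export_dir: Path, archive_path: Path) -> Path:
    """Bundle the export directory into a .tar.gz, preserving relative paths."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the top-level export directory name inside the archive
        tar.add(export_dir, arcname=export_dir.name)
    return archive_path
```

Generating checksums.txt before this step ensures the manifest travels inside the archive it describes.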
7. Operational Procedures
7.1 Export Request Workflow
- User navigates to Settings > Data Export
- User selects export type (full, personal data, date range)
- User selects formats (JSON, CSV, XML, PDF)
- System displays size estimation and completion time
- User clicks "Request Export"
- Background worker processes export
- User receives email when complete
- User downloads via secure link (7-day expiration)
7.2 Migration Coordination (M&A)
Pre-Migration (T-30 days):
- Legal due diligence complete
- Migration assessment run
- Compatibility issues resolved
- Cutover window scheduled
Migration Execution (T-7 to T+0):
- T-7: Initial bulk migration
- T-3: Delta sync #1
- T-6h: Delta sync #2, source read-only
- T-30min: Final delta sync
- T-0: DNS cutover
8. Security and Compliance
8.1 Export Access Controls
RBAC Permissions:
| Role | Permission | Scope |
|---|---|---|
| Tenant Admin | export:full_tenant | All tenant data |
| User | export:personal_data_self | Own personal data only |
| System | export:migration | Migration exports only |
8.2 Export Encryption
Encryption Options:
| Option | Use Case | Method |
|---|---|---|
| Tenant KEK | Standard export | AES-256-GCM with tenant KEK |
| Password | GDPR export | AES-256-GCM with password-derived key |
| PGP | External transfer | PGP with recipient public key |
9. Monitoring and Alerting
9.1 Export Health Metrics
CloudWatch Metrics:
| Metric | Threshold | Alert |
|---|---|---|
| export_duration_minutes | >60 minutes | Investigate slow export |
| export_failure_rate | >5% | Check background worker health |
| export_checksum_failure_rate | >0% | Critical: data integrity issue |
| export_queue_depth | >100 jobs | Scale background workers |
Appendices
Appendix A: Export API Reference
POST /api/v1/tenants/{tenant_id}/exports
GET /api/v1/exports/{export_id}/status
GET /api/v1/exports/{export_id}/download
DELETE /api/v1/exports/{export_id}
Appendix B: GDPR Compliance Checklist
- Machine-readable format (JSON-LD)
- Structured format with schema
- Self-service export UI
- <30-day response SLA
- Zero fees for data portability
- Import guide included
Document Classification: Internal - Confidential Copyright: © 2026 AZ1.AI Inc. All rights reserved. Version: 2.0.0 Effective Date: 2026-02-16