
Tenant Data Export and Migration Specification (Comprehensive)

Document ID: CODITECT-BIO-EXPORT-002
Version: 2.0.0
Effective Date: 2026-02-16
Classification: Internal - Confidential
Owner: Chief Technology Officer (CTO) / Data Protection Officer (DPO)


Executive Summary

This document extends the base tenant data export specification (CODITECT-BIO-TDEM-001) with comprehensive coverage of:

  • Large-Scale Export Handling: Streaming exports for tenants with >1M records using JSON Lines format
  • GDPR Article 20 Detailed Compliance: Machine-readable data portability with JSON-LD semantic tagging
  • M&A Migration Workflows: Multi-phase tenant-to-tenant migration with incremental delta sync and <4-hour cutover
  • Advanced Integrity Verification: SHA-256 checksums, relationship integrity, attachment completeness, signature verification
  • Operational Excellence: Background processing, resume capability, progress notifications, monitoring dashboards

Key Enhancements Over Base Specification:

  • Streaming Export: JSON Lines (.jsonl) for memory-efficient processing of 10M+ record tenants
  • Checkpoint Resume: Export jobs resume from last checkpoint after failures (network, timeout, OOM)
  • Incremental Migration: Delta sync reduces cutover window from 4 hours to 30 minutes
  • Integrity Guarantees: Near-zero tolerance for data loss; record loss exceeding the 0.01% threshold triggers automatic rollback
  • GDPR Automation: Self-service personal data export responding to Article 15 and Article 20 requests in <24 hours

Compliance Mapping:

  • GDPR Article 20: Right to data portability in structured, commonly used, machine-readable format
  • GDPR Article 15: Right of access to personal data
  • FDA 21 CFR Part 11 §11.10(c): Protection of records to enable accurate and ready retrieval
  • HIPAA §164.524: Individual right of access to PHI within 30 days

Table of Contents

  1. Export Format Specifications
  2. Data Completeness Guarantees
  3. GDPR Article 20 Data Portability
  4. Tenant-to-Tenant Migration (M&A Scenarios)
  5. Export Integrity Verification
  6. Large-Scale Export Handling
  7. Operational Procedures
  8. Security and Compliance
  9. Monitoring and Alerting

1. Export Format Specifications

1.1 JSON Export (Full-Fidelity)

Purpose: Complete tenant data export with all relationships, metadata, and audit history preserved.

Format: JSON Lines (.jsonl) for streaming large datasets, single JSON (.json) for <10K records.

Schema Structure:

{
  "export_metadata": {
    "export_id": "exp_3kj2h5k234jh5k",
    "tenant_id": "org_k3jh5k2j3h45k",
    "tenant_name": "Acme Pharmaceuticals Inc.",
    "export_type": "full",
    "export_initiated_at": "2026-02-16T14:32:18.234Z",
    "export_completed_at": "2026-02-16T14:45:22.891Z",
    "export_requested_by": "user_admin@acmepharma.com",
    "platform_version": "1.2.0",
    "schema_version": "1.0.0",
    "total_records": 152847,
    "total_attachments": 3241,
    "total_size_bytes": 4823947234,
    "encryption_key_id": "kek_tenant_acme_20260216",
    "checksum_algorithm": "SHA-256",
    "manifest_checksum": "a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1"
  },
  "tenant_configuration": {
    "subscription_tier": "enterprise",
    "regulatory_profiles": ["FDA_PART_11", "HIPAA", "SOC2"],
    "features_enabled": ["electronic_signatures", "audit_trail", "workflow_automation"],
    "custom_fields": [...],
    "workflow_definitions": [...],
    "document_types": [...]
  },
  "users": [...],
  "groups": [...],
  "roles": [...],
  "documents": [...],
  "electronic_signatures": [...],
  "audit_trail": [...],
  "attachments": [...],
  "workflows": [...],
  "notifications": [...]
}
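Importers should fail fast when a file is missing required metadata rather than discovering gaps mid-import. A minimal validation sketch, assuming the field names from the schema above; `validate_export_metadata` and the inline sample are illustrative, not part of the platform API:

```python
import json

# Required top-level metadata fields per the export schema above (illustrative subset)
REQUIRED_METADATA_FIELDS = {
    "export_id", "tenant_id", "export_type",
    "total_records", "checksum_algorithm", "manifest_checksum",
}

def validate_export_metadata(raw_json: str) -> dict:
    """Parse an export file and verify the required metadata fields exist."""
    export = json.loads(raw_json)
    metadata = export.get("export_metadata", {})
    missing = REQUIRED_METADATA_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"export_metadata missing fields: {sorted(missing)}")
    return metadata

sample = (
    '{"export_metadata": {"export_id": "exp_1", "tenant_id": "org_1", '
    '"export_type": "full", "total_records": 2, '
    '"checksum_algorithm": "SHA-256", "manifest_checksum": "abc"}}'
)
meta = validate_export_metadata(sample)
print(meta["export_id"])  # exp_1
```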

Document Record Schema:

{
  "id": "doc_abc123",
  "tenant_id": "org_k3jh5k2j3h45k",
  "document_number": "SOP-001-2026",
  "title": "Standard Operating Procedure - Equipment Validation",
  "document_type": "sop",
  "version": "3.2.1",
  "status": "approved",
  "created_at": "2026-01-15T09:23:45.123Z",
  "created_by_user_id": "user_123",
  "created_by_email": "john.doe@acmepharma.com",
  "modified_at": "2026-02-10T14:12:33.456Z",
  "modified_by_user_id": "user_456",
  "approved_at": "2026-02-11T08:45:12.789Z",
  "approved_by_user_id": "user_789",
  "content_checksum_sha256": "b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8",
  "attachments": ["att_001", "att_002"],
  "signatures": ["sig_001", "sig_002", "sig_003"],
  "audit_trail_ids": ["audit_001", "audit_002", "audit_003"],
  "relationships": {
    "supersedes": "doc_xyz789",
    "related_documents": ["doc_def456", "doc_ghi789"]
  },
  "metadata": {
    "department": "Quality Assurance",
    "effective_date": "2026-02-15",
    "review_cycle_months": 12,
    "next_review_date": "2027-02-15",
    "classification": "controlled",
    "retention_years": 7
  }
}

Electronic Signature Record Schema:

{
  "id": "sig_001",
  "tenant_id": "org_k3jh5k2j3h45k",
  "document_id": "doc_abc123",
  "document_version": "3.2.1",
  "signer_user_id": "user_789",
  "signer_email": "jane.smith@acmepharma.com",
  "signer_full_name": "Jane Smith, PhD",
  "signature_meaning": "approved",
  "signed_at": "2026-02-11T08:45:12.789Z",
  "signature_algorithm": "ECDSA_P256_SHA256",
  "signature_value_base64": "MEUCIQDf8sY2...",
  "certificate_pem": "-----BEGIN CERTIFICATE-----\n...",
  "certificate_serial_number": "3A:4F:8B:2C:...",
  "certificate_subject": "CN=Jane Smith,OU=Quality Assurance,O=Acme Pharma,C=US",
  "certificate_issuer": "CN=CODITECT BIO-QMS CA,O=AZ1.AI Inc,C=US",
  "certificate_not_before": "2025-01-01T00:00:00Z",
  "certificate_not_after": "2028-01-01T00:00:00Z",
  "timestamp_authority": "tsa.coditect.ai",
  "timestamp_value": "2026-02-11T08:45:13.234Z",
  "timestamp_signature_base64": "MIIBrDCC...",
  "reason": "Document meets regulatory requirements and QA standards",
  "ip_address": "10.24.5.123",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
  "two_factor_method": "totp",
  "password_verified": true,
  "biometric_verified": false,
  "audit_trail_id": "audit_sig_001"
}

1.2 CSV Export (Flat Tables)

Purpose: Spreadsheet-compatible format for analysis and reporting.

Format: One CSV file per entity type, UTF-8 encoding, comma-delimited, RFC 4180 compliant.

File Structure:

tenant_export_org_k3jh5k2j3h45k_20260216/
├── manifest.json
├── documents.csv
├── electronic_signatures.csv
├── audit_trail.csv
├── users.csv
├── groups.csv
├── roles.csv
├── attachments.csv (metadata only, binaries separate)
├── workflows.csv
├── notifications.csv
└── checksums.txt

Example: documents.csv

id,tenant_id,document_number,title,document_type,version,status,created_at,created_by_email,approved_at,approved_by_email,content_checksum_sha256,attachment_ids,signature_ids
doc_abc123,org_k3jh5k2j3h45k,SOP-001-2026,"Standard Operating Procedure - Equipment Validation",sop,3.2.1,approved,2026-01-15T09:23:45.123Z,john.doe@acmepharma.com,2026-02-11T08:45:12.789Z,jane.smith@acmepharma.com,b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8,"att_001,att_002","sig_001,sig_002,sig_003"
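Because multi-valued columns (attachment_ids, signature_ids) are embedded as quoted, comma-joined lists, consumers should rely on an RFC 4180 parser rather than naive comma splitting. A minimal sketch using Python's csv module; the truncated column set below is illustrative:

```python
import csv
import io

# Sample row in the documents.csv layout above, truncated to a few columns
sample = (
    "id,tenant_id,document_number,attachment_ids,signature_ids\n"
    'doc_abc123,org_k3jh5k2j3h45k,SOP-001-2026,"att_001,att_002","sig_001,sig_002,sig_003"\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
doc = rows[0]
# Quoted fields survive as single values; split them into ID lists afterwards
attachment_ids = doc["attachment_ids"].split(",")
signature_ids = doc["signature_ids"].split(",")
print(attachment_ids)       # ['att_001', 'att_002']
print(len(signature_ids))   # 3
```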

1.3 XML Export (Structured with XSD)

Purpose: Industry-standard structured format with schema validation for regulatory submissions.

Format: XML 1.0, UTF-8, with XSD schema validation.

Root Structure:

<?xml version="1.0" encoding="UTF-8"?>
<TenantExport xmlns="https://coditect.ai/schemas/bio-qms/export/v1"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="https://coditect.ai/schemas/bio-qms/export/v1 tenant-export-schema-v1.0.xsd">

  <ExportMetadata>
    <ExportID>exp_3kj2h5k234jh5k</ExportID>
    <TenantID>org_k3jh5k2j3h45k</TenantID>
    <TenantName>Acme Pharmaceuticals Inc.</TenantName>
    <ExportType>full</ExportType>
    <ExportInitiatedAt>2026-02-16T14:32:18.234Z</ExportInitiatedAt>
    <ExportCompletedAt>2026-02-16T14:45:22.891Z</ExportCompletedAt>
    <PlatformVersion>1.2.0</PlatformVersion>
    <SchemaVersion>1.0.0</SchemaVersion>
    <TotalRecords>152847</TotalRecords>
    <ManifestChecksum algorithm="SHA-256">a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1</ManifestChecksum>
  </ExportMetadata>

  <Documents>
    <Document id="doc_abc123">
      <DocumentNumber>SOP-001-2026</DocumentNumber>
      <Title>Standard Operating Procedure - Equipment Validation</Title>
      <DocumentType>sop</DocumentType>
      <Version>3.2.1</Version>
      <Status>approved</Status>
      <CreatedAt>2026-01-15T09:23:45.123Z</CreatedAt>
      <CreatedBy email="john.doe@acmepharma.com">user_123</CreatedBy>
      <ApprovedAt>2026-02-11T08:45:12.789Z</ApprovedAt>
      <ApprovedBy email="jane.smith@acmepharma.com">user_789</ApprovedBy>
      <ContentChecksum algorithm="SHA-256">b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8</ContentChecksum>
      <Attachments>
        <Attachment ref="att_001"/>
        <Attachment ref="att_002"/>
      </Attachments>
      <Signatures>
        <Signature ref="sig_001"/>
        <Signature ref="sig_002"/>
        <Signature ref="sig_003"/>
      </Signatures>
    </Document>
  </Documents>

</TenantExport>
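Consumers must account for the default namespace when parsing the XML export: bare tag names will not match namespaced elements. A minimal sketch with Python's xml.etree.ElementTree; the inline sample mirrors (and abridges) the structure above:

```python
import xml.etree.ElementTree as ET

# Prefix map for the export's default namespace
NS = {"exp": "https://coditect.ai/schemas/bio-qms/export/v1"}

sample = """<?xml version="1.0" encoding="UTF-8"?>
<TenantExport xmlns="https://coditect.ai/schemas/bio-qms/export/v1">
  <ExportMetadata>
    <ExportID>exp_3kj2h5k234jh5k</ExportID>
    <TotalRecords>152847</TotalRecords>
  </ExportMetadata>
  <Documents>
    <Document id="doc_abc123">
      <DocumentNumber>SOP-001-2026</DocumentNumber>
    </Document>
  </Documents>
</TenantExport>"""

root = ET.fromstring(sample)
# Namespace-qualified paths are required for every lookup
export_id = root.findtext("exp:ExportMetadata/exp:ExportID", namespaces=NS)
doc_numbers = [d.findtext("exp:DocumentNumber", namespaces=NS)
               for d in root.findall("exp:Documents/exp:Document", NS)]
print(export_id)    # exp_3kj2h5k234jh5k
print(doc_numbers)  # ['SOP-001-2026']
```

Full XSD validation requires a schema-aware library (for example, the third-party lxml package); the standard library only checks well-formedness.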

1.4 PDF Export (Human-Readable)

Purpose: Regulatory-compliant human-readable export for archival and audits.

Format: PDF/A-2b (ISO 19005-2) for long-term archival compliance.

Structure:

  • Title Page: Tenant name, export date, document ID, certification statement
  • Table of Contents: Hyperlinked to sections
  • Export Metadata: Export ID, date range, record counts, checksum
  • Tenant Configuration: Subscription, regulatory profiles, features
  • User Directory: All users with roles and permissions
  • Document Inventory: Table with document number, title, status, approval date
  • Document Details: One page per document with full metadata, signature details, audit history
  • Electronic Signatures: Certificate details, signature meaning, timestamp
  • Audit Trail: Chronological log of all tenant activities
  • Attachment Inventory: List of attachments with checksums (binaries not included in PDF)
  • Certification Page: Digital signature from DPO attesting to export completeness

2. Data Completeness Guarantees

2.1 Complete Data Inventory Per Tenant

10-Entity Export Coverage:

| Entity Type | Description | Relationships | Export Format |
|---|---|---|---|
| Documents | All QMS documents (SOPs, protocols, reports, forms) | Attachments, Signatures, Audit Trail, Workflows | JSON, CSV, XML, PDF |
| Electronic Signatures | FDA Part 11 compliant signatures with certificates | Documents, Users, Audit Trail | JSON, CSV, XML, PDF |
| Audit Trail | Complete activity log (HIPAA §164.312(b)) | All entities | JSON, CSV, XML, PDF |
| Users | User accounts, profiles, authentication history | Groups, Roles, Documents, Signatures | JSON, CSV, XML |
| Groups | User groups for access control | Users, Roles, Permissions | JSON, CSV, XML |
| Roles | RBAC roles with permission sets | Users, Groups, Permissions | JSON, CSV, XML |
| Attachments | Binary files (PDFs, images, spreadsheets) | Documents, Audit Trail | JSON (metadata), Binary Files |
| Workflows | XState workflow definitions and execution history | Documents, Users, Audit Trail | JSON, CSV, XML |
| Notifications | System notifications and alerts | Users, Documents, Workflows | JSON, CSV |
| Configurations | Tenant settings, custom fields, document types | All entities | JSON, XML |

2.2 Attachment Export (Binary Files with Metadata)

Directory Structure:

export_org_k3jh5k2j3h45k_20260216/
├── attachments/
│   ├── 2026/
│   │   ├── 01/
│   │   │   ├── att_001.pdf (original filename preserved)
│   │   │   └── att_001.json (metadata sidecar)
│   │   └── 02/
│   │       ├── att_002.xlsx
│   │       └── att_002.json
│   └── checksums.txt (SHA-256 for all files)
└── attachments_manifest.json

Attachment Metadata Sidecar (att_001.json):

{
  "id": "att_001",
  "tenant_id": "org_k3jh5k2j3h45k",
  "document_id": "doc_abc123",
  "original_filename": "equipment-calibration-photos.pdf",
  "content_type": "application/pdf",
  "size_bytes": 2847392,
  "checksum_sha256": "c8f2e9a1b3d4c7e6f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7",
  "uploaded_at": "2026-01-20T11:45:23Z",
  "uploaded_by_user_id": "user_123",
  "uploaded_by_email": "john.doe@acmepharma.com",
  "virus_scan_status": "clean",
  "virus_scan_timestamp": "2026-01-20T11:45:45Z",
  "export_path": "attachments/2026/01/att_001.pdf"
}
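An importer can confirm each binary against its sidecar before accepting it, checking both size and SHA-256. A minimal sketch; the throwaway temp file below stands in for a real attachment such as att_001.pdf:

```python
import hashlib
import json
import os
import tempfile

def verify_attachment(binary_path: str, sidecar: dict) -> bool:
    """Compare a file's size and SHA-256 digest against its metadata sidecar."""
    h = hashlib.sha256()
    with open(binary_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    size_ok = os.path.getsize(binary_path) == sidecar["size_bytes"]
    return size_ok and h.hexdigest() == sidecar["checksum_sha256"]

# Demo with a throwaway file standing in for the exported binary
payload = b"%PDF-1.4 demo bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

sidecar = {
    "size_bytes": len(payload),
    "checksum_sha256": hashlib.sha256(payload).hexdigest(),
}
result = verify_attachment(path, sidecar)
print(result)  # True
os.remove(path)
```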

3. GDPR Article 20 Data Portability

3.1 GDPR Article 20 Requirements

GDPR Article 20 - Right to Data Portability:

The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance.

Compliance Implementation:

| GDPR Requirement | BIO-QMS Implementation | Evidence |
|---|---|---|
| Structured format | JSON with documented schema | tenant-export-schema-v1.0.json |
| Commonly used format | JSON, CSV, XML (industry standards) | RFC 8259, RFC 4180, W3C XML |
| Machine-readable | JSON-LD with schema.org vocabulary | Parsable by standard libraries |
| Without hindrance | Self-service export UI, API endpoint | <30-day SLA, zero fees |
| Transmit to another controller | Portable format with import guide | IMPORT-GUIDE.md included |

3.2 Machine-Readable Format (JSON-LD with schema.org)

JSON-LD Context for Semantic Portability:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "coditect": "https://coditect.ai/schemas/bio-qms/v1#",
    "gdpr": "https://w3c.github.io/dpv/dpv-gdpr#"
  },
  "@type": "Dataset",
  "name": "Tenant Data Export - Acme Pharmaceuticals Inc.",
  "description": "Complete tenant data export for GDPR Article 20 data portability",
  "dateCreated": "2026-02-16T14:32:18.234Z",
  "creator": {
    "@type": "Organization",
    "name": "CODITECT BIO-QMS Platform",
    "url": "https://bio-qms.coditect.ai"
  },
  "tenant": {
    "@type": "Organization",
    "@id": "org:org_k3jh5k2j3h45k",
    "name": "Acme Pharmaceuticals Inc.",
    "legalName": "Acme Pharmaceuticals Incorporated",
    "url": "https://acmepharma.com"
  },
  "dataSubjects": [
    {
      "@type": "Person",
      "@id": "user:user_456",
      "email": "mary.jones@acmepharma.com",
      "name": "Mary Jones",
      "jobTitle": "Senior QA Specialist",
      "worksFor": {"@id": "org:org_k3jh5k2j3h45k"},
      "gdpr:personalData": {
        "gdpr:dataCategory": "user_profile",
        "gdpr:legalBasis": "contract",
        "gdpr:processingPurpose": "quality_management_system_access",
        "gdpr:retentionPeriod": "P7Y"
      }
    }
  ],
  "distribution": [
    {
      "@type": "DataDownload",
      "encodingFormat": "application/json",
      "contentUrl": "tenant_export_org_k3jh5k2j3h45k_20260216.json",
      "contentSize": "4823947234 bytes",
      "sha256": "a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1"
    }
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": true,
  "gdpr:dataPortability": {
    "gdpr:portabilityFormat": "JSON-LD",
    "gdpr:portabilityCompliance": "GDPR Article 20",
    "gdpr:importGuide": "IMPORT-GUIDE.md"
  }
}

3.3 Automated Export on Data Subject Request

GDPR Article 15 + 20 Combined Request:

Timeline:

  • Day 0: Request received, identity verification initiated
  • Day 1-2: Identity verified, export generated
  • Day 2: Secure download link emailed
  • Day 7: Reminder if not downloaded
  • Day 21: Link expires, can re-request

Export Contents (Personal Data Only):

personal_data_export_user_456_20260216/
├── GDPR-ARTICLE-15-RESPONSE.md (human-readable summary)
├── GDPR-ARTICLE-20-DATA.json (machine-readable)
├── user_profile.json
├── documents_authored.csv
├── signatures.csv
├── audit_trail_personal.csv (only records where user is actor or subject)
├── permissions_history.json
├── attachments/ (only files uploaded by user)
└── checksums.txt

4. Tenant-to-Tenant Migration (M&A Scenarios)

4.1 M&A Migration Use Cases

Scenario 1: Company Acquisition

  • Acquired company (TenantA) merges into parent (TenantB)
  • All TenantA data accessible in TenantB
  • TenantA users become TenantB users
  • TenantA audit trail preserved
  • TenantA encryption keys retained for signature verification

Scenario 2: Spin-Off / Divestiture

  • Division spun off from parent (TenantA subset → new TenantC)
  • Partial data migration (specific departments/products only)
  • User subset migrated
  • Audit trail filtered to migrated data
  • Signatures remain verifiable

4.2 Incremental Migration Support (Delta Sync)

Migration Phases:

Phase 1: Initial Bulk Migration (T-7 days)

  • Export all source data as of T-7
  • Import into target tenant
  • Source tenant remains active (users continue working)
  • Duration: ~4 hours for typical tenant

Phase 2: Delta Sync #1 (T-3 days)

  • Export only changes since T-7 (incremental)
  • Import deltas into target
  • Source still active
  • Duration: ~1 hour

Phase 3: Delta Sync #2 (T-6 hours, start of cutover window)

  • Export changes since T-3
  • Import deltas
  • Source set to read-only mode (no new changes)
  • Duration: ~30 minutes

Phase 4: Final Delta Sync (T-0, cutover)

  • Export final changes since T-6h
  • Import final deltas
  • Verify integrity (row counts, checksums)
  • Switch users from source to target
  • Duration: ~15 minutes
  • Total downtime: ~30 minutes (final sync + verification + switchover)

Incremental Export Query:

-- Export documents created or modified since the last sync
SELECT *
FROM documents
WHERE tenant_id = 'org_source123'
  AND (
    modified_at > '2026-02-09T02:00:00Z'  -- last sync timestamp
    OR created_at > '2026-02-09T02:00:00Z'
  )
ORDER BY modified_at ASC;
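The same cutoff logic can be applied client-side when assembling a delta batch. A minimal in-memory sketch; the record dicts and `delta_records` helper are illustrative, not part of the migration tooling:

```python
from datetime import datetime, timezone

def delta_records(records: list, last_sync: datetime) -> list:
    """Keep records created or modified after the previous sync timestamp."""
    out = []
    for r in records:
        created = datetime.fromisoformat(r["created_at"])
        modified = datetime.fromisoformat(r["modified_at"])
        if created > last_sync or modified > last_sync:
            out.append(r)
    # Mirror the SQL's ORDER BY modified_at ASC
    return sorted(out, key=lambda r: r["modified_at"])

records = [
    {"id": "doc_1", "created_at": "2026-02-01T00:00:00+00:00",
     "modified_at": "2026-02-01T00:00:00+00:00"},
    {"id": "doc_2", "created_at": "2026-02-10T00:00:00+00:00",
     "modified_at": "2026-02-12T00:00:00+00:00"},
]
last_sync = datetime(2026, 2, 9, 2, 0, tzinfo=timezone.utc)
print([r["id"] for r in delta_records(records, last_sync)])  # ['doc_2']
```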

4.3 Cutover Procedure with Minimal Downtime

Cutover Checklist:

T-7 Days: Pre-Cutover

  • Migration assessment complete
  • Compatibility check passed
  • Initial bulk migration complete
  • Delta sync #1 complete
  • Rollback plan documented
  • Cutover window scheduled

T-0: Switchover

  • DNS updated
  • Load balancer updated
  • Users redirected to target tenant
  • Source tenant remains read-only for 7 days (safety net)

T+1 Hour: Post-Cutover Monitoring

  • User login success rate ≥95%
  • No signature verification failures
  • No audit trail gaps
  • Performance metrics normal

5. Export Integrity Verification

5.1 Row Count Verification Per Entity

Verification Matrix:

| Entity | Source Count | Export Count | Target Count | Status |
|---|---|---|---|---|
| Users | 247 | 247 | 247 | ✅ PASS |
| Documents | 12,847 | 12,847 | 12,847 | ✅ PASS |
| Signatures | 34,521 | 34,521 | 34,521 | ✅ PASS |
| Audit Trail | 1,247,389 | 1,247,389 | 1,247,389 | ✅ PASS |
| Attachments | 3,241 | 3,241 | 3,241 | ✅ PASS |
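The verification matrix reduces to a three-way count comparison per entity. A minimal sketch; the count dictionaries and `verify_row_counts` helper are illustrative:

```python
def verify_row_counts(source: dict, export: dict, target: dict) -> dict:
    """Return PASS/FAIL per entity; any mismatch across the three stages fails."""
    results = {}
    for entity in source:
        ok = source[entity] == export.get(entity) == target.get(entity)
        results[entity] = "PASS" if ok else "FAIL"
    return results

source = {"users": 247, "documents": 12847}
export = {"users": 247, "documents": 12847}
target = {"users": 247, "documents": 12846}  # one document lost in transit
print(verify_row_counts(source, export, target))
# {'users': 'PASS', 'documents': 'FAIL'}
```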

5.2 SHA-256 Hash Verification Per Exported File

Checksum Manifest (checksums.txt):

SHA256 (tenant_export.json) = a3f5b8c2d9e1f4a7b6c3d8e2f5a9b7c4d1e8f2a6b9c5d3e7f1a4b8c2d9e6f3a1
SHA256 (documents.csv) = b4f7c9d3e1a5b8c2d9e6f3a7b1c4d8e2f5a9b6c3d7e1f4a8b2c9d6e3f7a1b5c8
SHA256 (electronic_signatures.csv) = c8f2e9a1b3d4c7e6f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7
SHA256 (audit_trail.csv) = d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2a5b8c1d4e7f9a2b5c8d1e4f7a9b3c6d9e2
SHA256 (attachments/2026/01/att_001.pdf) = e2f5a8b1c4d7e9f2a5b8c1d4e7f9a2b5c8d1e4f7a9b3c6d9e2f5a8b1c4d7e9f2

Verification:

cd export_org_k3jh5k2j3h45k_20260216/
sha256sum -c checksums.txt
# Output: one "<filename>: OK" line per verified file
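The same check can run in Python where coreutils is unavailable. A minimal sketch that parses the BSD-style `SHA256 (name) = hash` lines shown above; the `parse_manifest` helper and inline manifest are illustrative:

```python
import hashlib
import re

# BSD-style tagged checksum line: SHA256 (filename) = <64 hex chars>
LINE_RE = re.compile(r"^SHA256 \((?P<name>.+)\) = (?P<digest>[0-9a-f]{64})$")

def parse_manifest(text: str) -> dict:
    """Map each listed file name to its expected SHA-256 digest."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries[m.group("name")] = m.group("digest")
    return entries

manifest = "SHA256 (hello.txt) = " + hashlib.sha256(b"hello").hexdigest()
expected = parse_manifest(manifest)
print(expected["hello.txt"] == hashlib.sha256(b"hello").hexdigest())  # True
```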

5.3 Relationship Integrity Check

Foreign Key Validation:

| Source Entity | Foreign Key Field | Target Entity | Validation Rule |
|---|---|---|---|
| Document | created_by_user_id | User | User with ID must exist |
| Document | attachments[] | Attachment | All attachment IDs must exist |
| Signature | document_id | Document | Document with ID must exist |
| Audit Trail | actor_user_id | User | User with ID must exist |
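The rules above amount to checking every foreign key against the exported ID sets. A minimal sketch covering the first two rules; the in-memory record lists and `check_relationships` helper are illustrative:

```python
def check_relationships(documents, users, attachments):
    """Return a list of dangling-reference errors; an empty list means integrity holds."""
    user_ids = {u["id"] for u in users}
    attachment_ids = {a["id"] for a in attachments}
    errors = []
    for doc in documents:
        if doc["created_by_user_id"] not in user_ids:
            errors.append(f"{doc['id']}: unknown user {doc['created_by_user_id']}")
        for att in doc.get("attachments", []):
            if att not in attachment_ids:
                errors.append(f"{doc['id']}: unknown attachment {att}")
    return errors

users = [{"id": "user_123"}]
attachments = [{"id": "att_001"}]
documents = [{"id": "doc_abc123", "created_by_user_id": "user_123",
              "attachments": ["att_001", "att_999"]}]
print(check_relationships(documents, users, attachments))
# ['doc_abc123: unknown attachment att_999']
```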

6. Large-Scale Export Handling

6.1 Streaming Export for Large Tenants (>1M Records)

Challenge: Tenants with >1M audit trail records or >100K documents cannot be exported as a single in-memory JSON document.

Solution: JSON Lines (.jsonl) streaming format with chunked file generation.

JSON Lines Format:

{"export_metadata": {...}}
{"entity_type": "user", "id": "user_001", "email": "user1@example.com", ...}
{"entity_type": "user", "id": "user_002", "email": "user2@example.com", ...}
{"entity_type": "document", "id": "doc_001", "title": "SOP-001", ...}
{"entity_type": "document", "id": "doc_002", "title": "SOP-002", ...}

Streaming Export Implementation:

import json

def streaming_export(tenant_id: str, output_path: str):
    """Stream-export a large tenant without loading all data into memory."""
    with open(output_path, "w") as out:
        # Write metadata as the first JSON Lines record
        metadata = generate_export_metadata(tenant_id)
        out.write(json.dumps({"export_metadata": metadata}) + "\n")

        batch_size = 1000

        # Stream documents in fixed-size batches
        offset = 0
        while True:
            docs = db.query(
                "SELECT * FROM documents WHERE tenant_id = ? ORDER BY id LIMIT ? OFFSET ?",
                tenant_id, batch_size, offset
            )
            if not docs:
                break
            for doc in docs:
                out.write(json.dumps({"entity_type": "document", **doc}) + "\n")
            offset += len(docs)
            print(f"Exported {offset} documents...")

        # Stream audit trail (potentially huge)
        offset = 0
        while True:
            audits = db.query(
                "SELECT * FROM audit_trail WHERE tenant_id = ? ORDER BY timestamp LIMIT ? OFFSET ?",
                tenant_id, batch_size, offset
            )
            if not audits:
                break
            for audit in audits:
                out.write(json.dumps({"entity_type": "audit", **audit}) + "\n")
            offset += len(audits)
            if offset % 100000 == 0:
                print(f"Exported {offset} audit records...")

6.2 Chunked Export with Resume Capability

Checkpoint File:

{
  "export_id": "exp_abc123",
  "tenant_id": "org_k3jh5k2j3h45k",
  "started_at": "2026-02-16T14:00:00Z",
  "last_checkpoint_at": "2026-02-16T14:23:45Z",
  "status": "in_progress",
  "completed_entities": ["users", "groups", "roles"],
  "in_progress_entity": "documents",
  "in_progress_offset": 45000,
  "pending_entities": ["signatures", "audit_trail", "attachments"],
  "total_records_exported": 45247,
  "estimated_completion": "2026-02-16T15:30:00Z"
}

Resumable Export:

from datetime import datetime

def resumable_export(tenant_id: str, export_id: str = None):
    """Export with checkpoint resume capability."""
    if export_id:
        # Resume from checkpoint
        checkpoint = load_checkpoint(export_id)
        print(f"Resuming export {export_id} from "
              f"{checkpoint['in_progress_entity']} offset {checkpoint['in_progress_offset']}")
    else:
        # New export
        export_id = generate_export_id()
        checkpoint = {
            "export_id": export_id,
            "tenant_id": tenant_id,
            "started_at": datetime.now().isoformat(),
            "completed_entities": [],
            "pending_entities": ["users", "groups", "roles", "documents",
                                 "signatures", "audit_trail", "attachments"]
        }

    try:
        # Iterate over a copy: entities are removed from pending_entities inside the loop
        for entity in list(checkpoint["pending_entities"]):
            offset = (checkpoint.get("in_progress_offset", 0)
                      if checkpoint.get("in_progress_entity") == entity else 0)

            while True:
                records = fetch_entity_batch(tenant_id, entity, offset, batch_size=1000)
                if not records:
                    break
                write_records_to_export(export_id, entity, records)
                offset += len(records)

                # Update checkpoint every 10K records
                if offset % 10000 == 0:
                    checkpoint["in_progress_entity"] = entity
                    checkpoint["in_progress_offset"] = offset
                    checkpoint["last_checkpoint_at"] = datetime.now().isoformat()
                    save_checkpoint(checkpoint)

            # Entity complete
            checkpoint["completed_entities"].append(entity)
            checkpoint["pending_entities"].remove(entity)
            checkpoint["in_progress_entity"] = None
            checkpoint["in_progress_offset"] = 0
            save_checkpoint(checkpoint)

        # Export complete
        checkpoint["status"] = "completed"
        checkpoint["completed_at"] = datetime.now().isoformat()
        save_checkpoint(checkpoint)
        return {"status": "success", "export_id": export_id}

    except Exception as e:
        # Persist checkpoint on failure so the export can be resumed
        checkpoint["status"] = "failed"
        checkpoint["error"] = str(e)
        checkpoint["failed_at"] = datetime.now().isoformat()
        save_checkpoint(checkpoint)
        return {"status": "failed", "export_id": export_id,
                "resume_command": f"resume_export('{export_id}')"}

6.3 Background Processing with Progress Notifications

Celery Task:

import os
from datetime import datetime, timedelta

from celery import Celery

app = Celery('bio_qms_exports', broker='redis://localhost:6379/0')

@app.task(bind=True)
def export_tenant_data(self, tenant_id: str, user_id: str, export_type: str, formats: list):
    """Background task for tenant data export."""
    try:
        # Update progress: 0%
        self.update_state(state='PROGRESS', meta={'progress': 0, 'status': 'Initializing export...'})
        export_id = generate_export_id()

        # Export each entity with progress updates
        entities = ["users", "groups", "roles", "documents", "signatures", "audit_trail", "attachments"]
        total_entities = len(entities)
        for i, entity in enumerate(entities):
            progress = int((i / total_entities) * 100)
            self.update_state(state='PROGRESS', meta={'progress': progress, 'status': f'Exporting {entity}...'})
            export_entity(tenant_id, export_id, entity)

        # Generate export files
        self.update_state(state='PROGRESS', meta={'progress': 90, 'status': 'Generating export files...'})
        export_files = {}
        if "json" in formats:
            export_files["json"] = generate_json_export(export_id)
        if "csv" in formats:
            export_files["csv"] = generate_csv_export(export_id)

        # Compress and upload
        self.update_state(state='PROGRESS', meta={'progress': 95, 'status': 'Compressing export...'})
        archive_path = create_tar_gz(export_id, export_files)
        download_url = upload_to_gcs(archive_path, tenant_id, export_id)

        return {
            "export_id": export_id,
            "download_url": download_url,
            "expires_at": (datetime.now() + timedelta(days=7)).isoformat(),
            "size_bytes": os.path.getsize(archive_path)
        }

    except Exception as e:
        # Retry with exponential backoff: 60s, 120s, 240s, ...
        raise self.retry(exc=e, countdown=60 * (2 ** self.request.retries))

6.4 Compressed Archive Generation

Archive Structure:

export_org_k3jh5k2j3h45k_20260216.tar.gz
├── README.md
├── IMPORT-GUIDE.md
├── manifest.json
├── checksums.txt
├── tenant_export.json
├── documents.csv
├── electronic_signatures.csv
├── audit_trail.csv
├── users.csv
├── attachments/
│ └── 2026/01/att_001.pdf
└── tenant_export.xml
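Building the archive needs nothing beyond the standard library. A minimal sketch that assembles a tiny gzip-compressed tarball in a temporary directory; the stand-in file names are illustrative:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-ins for the real export files
    for name in ("manifest.json", "checksums.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write("{}" if name.endswith(".json") else "")

    archive_path = os.path.join(workdir, "export_demo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in ("manifest.json", "checksums.txt"):
            # arcname keeps paths relative inside the archive
            tar.add(os.path.join(workdir, name), arcname=name)

    # Read the archive back and list its members
    with tarfile.open(archive_path, "r:gz") as tar:
        members = sorted(m.name for m in tar.getmembers())

print(members)  # ['checksums.txt', 'manifest.json']
```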

7. Operational Procedures

7.1 Export Request Workflow

  1. User navigates to Settings > Data Export
  2. User selects export type (full, personal data, date range)
  3. User selects formats (JSON, CSV, XML, PDF)
  4. System displays size estimation and completion time
  5. User clicks "Request Export"
  6. Background worker processes export
  7. User receives email when complete
  8. User downloads via secure link (7-day expiration)

7.2 Migration Coordination (M&A)

Pre-Migration (T-30 days):

  • Legal due diligence complete
  • Migration assessment run
  • Compatibility issues resolved
  • Cutover window scheduled

Migration Execution (T-7 to T+0):

  • T-7: Initial bulk migration
  • T-3: Delta sync #1
  • T-6h: Delta sync #2, source read-only
  • T-30min: Final delta sync
  • T-0: DNS cutover

8. Security and Compliance

8.1 Export Access Controls

RBAC Permissions:

| Role | Permission | Scope |
|---|---|---|
| Tenant Admin | export:full_tenant | All tenant data |
| User | export:personal_data_self | Own personal data only |
| System | export:migration | Migration exports only |

8.2 Export Encryption

Encryption Options:

| Option | Use Case | Method |
|---|---|---|
| Tenant KEK | Standard export | AES-256-GCM with tenant KEK |
| Password | GDPR export | AES-256-GCM with password-derived key |
| PGP | External transfer | PGP with recipient public key |

9. Monitoring and Alerting

9.1 Export Health Metrics

CloudWatch Metrics:

| Metric | Threshold | Alert |
|---|---|---|
| export_duration_minutes | >60 minutes | Investigate slow export |
| export_failure_rate | >5% | Check background worker health |
| export_checksum_failure_rate | >0% | Critical: data integrity issue |
| export_queue_depth | >100 jobs | Scale background workers |

Appendices

Appendix A: Export API Reference

POST /api/v1/tenants/{tenant_id}/exports
GET /api/v1/exports/{export_id}/status
GET /api/v1/exports/{export_id}/download
DELETE /api/v1/exports/{export_id}

Appendix B: GDPR Compliance Checklist

  • Machine-readable format (JSON-LD)
  • Structured format with schema
  • Self-service export UI
  • <30-day response SLA
  • Zero fees for data portability
  • Import guide included

Document Classification: Internal - Confidential Copyright: © 2026 AZ1.AI Inc. All rights reserved. Version: 2.0.0 Effective Date: 2026-02-16