ADR-012: CODITECT Data Isolation Strategy

Status

ACCEPTED (2026-02-03)

Context

CODITECT serves multiple customer organizations (tenants) on shared infrastructure while requiring:

Complete data separation between tenants
Compliance readiness for SOC 2, GDPR, HIPAA
Defense in depth - multiple isolation layers
Performance - isolation without overhead
Auditability - prove isolation to customers

Threat Model

Threat	Impact	Mitigation Required
Cross-tenant data leak	Critical	Database-level enforcement
SQL injection bypass	Critical	RLS policies (not app-level)
Insider access abuse	High	Audit logging, IAM
Shared resource contention	Medium	Resource quotas
Backup data exposure	High	Encrypted, tenant-scoped

Decision

Architecture: Defense in Depth Isolation

Implement four layers of data isolation to ensure no single point of failure:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CODITECT DATA ISOLATION LAYERS                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   LAYER 1: NETWORK ISOLATION                                                │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  GCP VPC with private subnets                                       │   │
│   │  Cloud NAT for egress-only internet                                 │   │
│   │  Private Service Connect for GCP services                           │   │
│   │  No direct database access from internet                            │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   LAYER 2: APPLICATION ISOLATION                                            │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  Django middleware sets tenant context from JWT                     │   │
│   │  django-multitenant auto-filters all queries                       │   │
│   │  Request-scoped tenant_id (cannot change mid-request)              │   │
│   │  API endpoints validate tenant ownership                            │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   LAYER 3: DATABASE ISOLATION (PostgreSQL RLS)                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  Row-Level Security policies on ALL tenant tables                  │   │
│   │  Session variable: SET app.tenant_id = 'uuid'                      │   │
│   │  RLS policy: USING (tenant_id = current_setting('app.tenant_id'))  │   │
│   │  Database enforces even if app layer bypassed                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   LAYER 4: STORAGE ISOLATION                                                │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  Per-tenant GCS buckets (gs://coditect-{tenant-slug}-data)         │   │
│   │  Per-tenant service accounts with bucket-only access               │   │
│   │  Customer-managed encryption keys (CMEK) optional                  │   │
│   │  Cross-tenant bucket access impossible via IAM                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

PostgreSQL Row-Level Security Implementation

Base Policy Pattern:

-- Enable RLS on tenant table
ALTER TABLE context_messages ENABLE ROW LEVEL SECURITY;

-- Force RLS even for table owners (security best practice)
ALTER TABLE context_messages FORCE ROW LEVEL SECURITY;

-- SELECT policy - users can only read their tenant's data
CREATE POLICY tenant_isolation_select ON context_messages
    FOR SELECT
    USING (tenant_id = current_setting('app.tenant_id')::UUID);

-- INSERT policy - users can only insert to their tenant
CREATE POLICY tenant_isolation_insert ON context_messages
    FOR INSERT
    WITH CHECK (tenant_id = current_setting('app.tenant_id')::UUID);

-- UPDATE policy - users can only update their tenant's data
CREATE POLICY tenant_isolation_update ON context_messages
    FOR UPDATE
    USING (tenant_id = current_setting('app.tenant_id')::UUID)
    WITH CHECK (tenant_id = current_setting('app.tenant_id')::UUID);

-- DELETE policy - users can only delete their tenant's data
CREATE POLICY tenant_isolation_delete ON context_messages
    FOR DELETE
    USING (tenant_id = current_setting('app.tenant_id')::UUID);

Django Middleware Integration:

class TenantIsolationMiddleware:
    """
    Sets PostgreSQL session variable for RLS enforcement.

    Security: Even if application code has bugs, database
    enforces tenant isolation at the row level.
    """

    def __call__(self, request):
        # Extract tenant_id from validated JWT token
        tenant_id = request.auth.get('tenant_id')
        user_id = request.auth.get('user_id')

        if not tenant_id:
            raise PermissionDenied("No tenant context")

        # Set PostgreSQL session variables for RLS
        with connection.cursor() as cursor:
            cursor.execute(
                "SET app.tenant_id = %s; SET app.user_id = %s;",
                [str(tenant_id), str(user_id)]
            )

        # Attach to request for application use
        request.tenant_id = tenant_id
        request.user_id = user_id

        return self.get_response(request)

RLS Policy Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                    RLS POLICY ENFORCEMENT FLOW                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   1. Request arrives with JWT token                                         │
│      │                                                                       │
│      ▼                                                                       │
│   2. Middleware extracts tenant_id from JWT                                 │
│      │                                                                       │
│      ▼                                                                       │
│   3. SET app.tenant_id = 'tenant-uuid'                                      │
│      │                                                                       │
│      ▼                                                                       │
│   4. Application executes query: SELECT * FROM messages                     │
│      │                                                                       │
│      ▼                                                                       │
│   5. PostgreSQL RLS automatically rewrites to:                              │
│      SELECT * FROM messages WHERE tenant_id = 'tenant-uuid'                 │
│      │                                                                       │
│      ▼                                                                       │
│   6. Only tenant's data returned (database-enforced)                        │
│                                                                              │
│   ════════════════════════════════════════════════════════════════════════  │
│                                                                              │
│   ATTEMPTED BYPASS (SQL Injection):                                         │
│                                                                              │
│   Attacker tries: SELECT * FROM messages WHERE tenant_id = 'other-tenant'   │
│      │                                                                       │
│      ▼                                                                       │
│   RLS policy STILL enforced:                                                │
│   SELECT * FROM messages                                                     │
│     WHERE tenant_id = 'other-tenant'                                        │
│     AND tenant_id = current_setting('app.tenant_id')  ← RLS adds this      │
│      │                                                                       │
│      ▼                                                                       │
│   Result: Empty set (conditions contradict)                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Tables with RLS Enforcement

Table	tenant_id Column	RLS Enabled	Indexed
`organizations`	`id` (self)	Yes	Primary
`teams`	`tenant_id`	Yes	Yes
`users`	`tenant_id`	Yes	Yes
`projects`	`tenant_id`	Yes	Yes
`sessions`	`tenant_id`	Yes	Yes
`messages`	`tenant_id` (via session)	Yes	Yes
`context_messages`	`tenant_id`	Yes	Yes
`task_tracking`	`tenant_id`	Yes	Yes
`sync_cursors`	`tenant_id`	Yes	Yes
`decisions`	`tenant_id`	Yes	Yes
`skill_learnings`	`tenant_id`	Yes	Yes

Local Database Isolation (ADR-118)

For local installations, isolation is per-machine via filesystem:

~/PROJECTS/.coditect-data/context-storage/
├── platform.db       # Tier 1: Read-only component data (shared)
├── org.db            # Tier 2: User's organization data (isolated)
├── sessions.db       # Tier 3: User's session data (isolated)
└── projects.db       # Tier 4: Project-specific data (isolated)

Local Isolation Properties:

File permissions: 0600 (owner read/write only)
No network exposure (local-only)
Machine-specific UUIDs prevent cross-device conflicts
Cloud sync requires authentication (ADR-053)

GCS Storage Isolation

┌─────────────────────────────────────────────────────────────────────────────┐
│                    GCS BUCKET ISOLATION                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Per-Tenant Bucket Structure:                                              │
│                                                                              │
│   gs://coditect-acme-corp-data/           ← Tenant: Acme Corp               │
│   ├── exports/                                                              │
│   ├── backups/                                                              │
│   └── artifacts/                                                            │
│                                                                              │
│   gs://coditect-beta-inc-data/            ← Tenant: Beta Inc                │
│   ├── exports/                                                              │
│   ├── backups/                                                              │
│   └── artifacts/                                                            │
│                                                                              │
│   IAM Policy (per bucket):                                                  │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  Principal: coditect-acme-corp-sa@project.iam.gserviceaccount.com  │   │
│   │  Role: roles/storage.objectAdmin                                    │   │
│   │  Resource: gs://coditect-acme-corp-data/*                          │   │
│   │                                                                      │   │
│   │  NO access to gs://coditect-beta-inc-data/* (different SA)         │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Audit Logging

All data access is logged for compliance:

# Audit log entry structure
{
    "timestamp": "2026-02-03T12:00:00Z",
    "tenant_id": "tenant-uuid",
    "user_id": "user-uuid",
    "action": "SELECT",
    "table": "context_messages",
    "row_count": 42,
    "source_ip": "10.0.0.1",
    "user_agent": "CODITECT/2.0"
}

Audit Storage:

Cloud Logging (30-day retention, searchable)
BigQuery export (long-term compliance)
Immutable audit trail

Consequences

Positive

Database-enforced isolation - Cannot be bypassed by application bugs
Defense in depth - Four independent layers
Compliance-ready - SOC 2, GDPR, HIPAA evidence
Auditability - Complete access logging
Performance - RLS uses efficient index scans
Simplicity - django-multitenant handles complexity

Negative

Complexity - Multiple isolation mechanisms to maintain
Testing overhead - Must verify isolation in all layers
Migration complexity - Adding RLS to existing tables
Debugging difficulty - Must set tenant context for queries

Risks

Risk	Mitigation
RLS policy misconfiguration	Automated tests, security audit
Session variable not set	Middleware enforces before any query
Backup contains other tenant data	Per-tenant backup jobs
Admin bypasses isolation	Separate admin audit log

Testing Strategy

Isolation Test Suite

class TenantIsolationTests(TestCase):
    """Verify tenant isolation at all layers."""

    def test_rls_prevents_cross_tenant_select(self):
        """Tenant A cannot read Tenant B's data."""
        tenant_a = create_tenant("Acme")
        tenant_b = create_tenant("Beta")

        # Create message for Tenant A
        set_tenant(tenant_a)
        msg = Message.objects.create(content="secret")

        # Switch to Tenant B
        set_tenant(tenant_b)

        # Should not find Tenant A's message
        self.assertEqual(Message.objects.count(), 0)

        # Direct query should also fail
        with connection.cursor() as c:
            c.execute(f"SELECT * FROM messages WHERE id = '{msg.id}'")
            self.assertEqual(c.fetchall(), [])

    def test_rls_prevents_cross_tenant_update(self):
        """Tenant A cannot modify Tenant B's data."""
        # ... similar pattern

    def test_gcs_bucket_isolation(self):
        """Service account cannot access other tenant's bucket."""
        # ... IAM verification

Implementation

Phase 1: Database RLS (Complete)

PostgreSQL RLS policies on all tenant tables
django-multitenant integration
Middleware for session variable

Phase 2: Storage Isolation (Complete)

Per-tenant GCS buckets
Per-tenant service accounts
IAM policy automation

Phase 3: Audit Logging (In Progress)

Cloud Logging integration
BigQuery export
Compliance dashboards

Phase 4: Compliance Certification (Planned)

SOC 2 Type II audit
Penetration testing
Security documentation

ADR-009: Multi-Tenant Architecture
ADR-010: Cloud Workstations Architecture
ADR-011: Authentication Strategy
ADR-044: Custom REST Sync Architecture
ADR-053: Cloud Context Sync Architecture
ADR-089: Two-Database Architecture
ADR-103: Four-Database Separation
ADR-118: Four-Tier Database Architecture

Diagrams

diagrams/mermaid/adr012-isolation-layers.mmd
diagrams/mermaid/adr012-rls-policy-flow.mmd
diagrams/mermaid/adr012-tenant-boundary.mmd
diagrams/mermaid/adr012-sharding-strategy.mmd

Track: D (Security Hardening) Task: F.12.2.4

Status​

Context​

Threat Model​

Decision​

Architecture: Defense in Depth Isolation​

PostgreSQL Row-Level Security Implementation​

RLS Policy Flow​

Tables with RLS Enforcement​

Local Database Isolation (ADR-118)​

GCS Storage Isolation​

Audit Logging​

Consequences​

Positive​

Negative​

Risks​

Testing Strategy​

Isolation Test Suite​

Implementation​

Phase 1: Database RLS (Complete)​

Phase 2: Storage Isolation (Complete)​

Phase 3: Audit Logging (In Progress)​

Phase 4: Compliance Certification (Planned)​

Related​

Diagrams​