Prompt 09: Architecture Decision Record - Multi-Tenancy Architecture
Context
You are a principal architect designing the multi-tenancy model for CODITECT-COMPLIANCE. This ADR establishes how customer organizations are isolated, how data is partitioned, and how the platform scales across thousands of tenants.
Output Specification
Generate a comprehensive Architecture Decision Record (ADR) following the standard ADR format. The document should be 3,000-4,500 words (9,000-14,000 tokens).
Document Structure
ADR-005: Multi-Tenancy Architecture
# ADR-005: Multi-Tenancy Architecture and Data Isolation
## Status
Proposed | Accepted | Deprecated | Superseded
## Date
[Current Date]
## Decision Makers
- [Role: Chief Architect]
- [Role: Security Architect]
- [Role: Platform Engineering Lead]
## Context
### Problem Statement
CODITECT-COMPLIANCE must support thousands of customer organizations with:
- **Data Isolation**: Customer A cannot access Customer B's data
- **Performance Isolation**: One customer's load cannot impact others
- **Compliance Isolation**: Each customer has independent compliance postures
- **Administrative Isolation**: Customer admins manage only their organization
### Scale Requirements
| Metric | Year 1 | Year 3 | Year 5 |
|--------|--------|--------|--------|
| Organizations | 100 | 1,000 | 10,000 |
| Users per Org | 10-50 | 10-500 | 10-1000 |
| Controls per Org | 500-2000 | 500-5000 | 500-10000 |
| Evidence Items | 10K-100K | 100K-1M | 1M-10M |
| Integrations | 5-15 | 15-50 | 50-100 |
### Compliance Requirements
- SOC 2 Type II: Logical access controls, data segregation
- HIPAA: PHI isolation for healthcare customers
- GDPR: Data residency and right to erasure
- Enterprise: Single-tenant deployment option
### Technical Constraints
- FoundationDB as primary database (supports keyspace isolation)
- Neo4j for graph data (supports multi-database or label isolation)
- GCS for blob storage (bucket-level or prefix isolation)
- Kubernetes for compute (namespace isolation)
- Single codebase, multi-tenant deployment
## Decision Drivers
1. **Security**: Cryptographic guarantees of data isolation
2. **Cost Efficiency**: Shared infrastructure where safe
3. **Operational Simplicity**: Manageable at scale
4. **Performance**: Predictable latency regardless of neighbor load
5. **Compliance**: Meet regulatory isolation requirements
6. **Scalability**: Support 10,000+ organizations
## Options Considered
### Option 1: Single Database with Logical Isolation
**Description**: All tenants in one database, isolated by organization_id column.
```
┌─────────────────────────────────────────┐
│ Shared Database                         │
│  ┌──────────────────────────────────┐   │
│  │ controls                         │   │
│  │ - organization_id (index)        │   │
│  │ - control_id                     │   │
│  │ - ...                            │   │
│  └──────────────────────────────────┘   │
└─────────────────────────────────────────┘
```
**Pros**:
- Simplest implementation
- Lowest infrastructure cost
- Easy schema migrations
**Cons**:
- No isolation for noisy neighbors
- Cross-tenant bugs possible
- Single compliance boundary
- Difficult to provide single-tenant option
### Option 2: Database-per-Tenant
**Description**: Each tenant gets dedicated database instances.
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Org A DB   │ │  Org B DB   │ │  Org C DB   │
└─────────────┘ └─────────────┘ └─────────────┘
```
**Pros**:
- Complete isolation
- Per-tenant performance
- Easy data residency
- Clean deletion
**Cons**:
- Expensive at scale (10,000 databases)
- Complex connection management
- Schema migration complexity
- High operational overhead
### Option 3: Hybrid - Shared Infrastructure with Isolation Boundaries
**Description**: Shared databases with strong isolation guarantees via keyspace partitioning, encryption, and access controls.
```
┌─────────────────────────────────────────────────────────────────┐
│ Shared Infrastructure                                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ FoundationDB Cluster                                      │  │
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐          │  │
│  │  │ Keyspace:   │ │ Keyspace:   │ │ Keyspace:   │          │  │
│  │  │ org_a/...   │ │ org_b/...   │ │ org_c/...   │          │  │
│  │  │ (encrypted) │ │ (encrypted) │ │ (encrypted) │          │  │
│  │  └─────────────┘ └─────────────┘ └─────────────┘          │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Neo4j Cluster                                             │  │
│  │ Multi-database OR Label-based isolation                   │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ GCS Buckets                                               │  │
│  │ gs://coditect-evidence-{region}/{org_id}/...              │  │
│  │ (per-org encryption keys)                                 │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
**Pros**:
- Cost-efficient shared infrastructure
- Strong isolation via keyspace + encryption
- Scalable to 10,000+ tenants
- Optional dedicated infrastructure for enterprise
**Cons**:
- More complex access control implementation
- Requires discipline in query construction
- Mixed compliance boundary
### Option 4: Cell-Based Architecture
**Description**: Tenants grouped into "cells" with dedicated infrastructure per cell.
```
┌─────────────────────────┐ ┌─────────────────────────┐
│ Cell 1                  │ │ Cell 2                  │
│  ┌──────────────────┐   │ │  ┌──────────────────┐   │
│  │ Orgs 1-100       │   │ │  │ Orgs 101-200     │   │
│  │ Dedicated DB     │   │ │  │ Dedicated DB     │   │
│  │ Dedicated Cache  │   │ │  │ Dedicated Cache  │   │
│  └──────────────────┘   │ │  └──────────────────┘   │
└─────────────────────────┘ └─────────────────────────┘
```
**Pros**:
- Blast radius limited to cell
- Easier capacity planning
- Natural compliance boundaries
- Supports data residency
**Cons**:
- More complex routing
- Cross-cell operations difficult
- Cell sizing challenges
## Decision
**Chosen Option**: A combination of Option 3 (shared infrastructure with isolation boundaries) as the default and Option 4 (cell-based deployment) for the Enterprise tier
### Rationale
1. **Default**: Shared infrastructure with keyspace isolation for cost efficiency
2. **Enterprise Tier**: Cell-based deployment for customers requiring dedicated infrastructure
3. **Compliance**: Per-org encryption keys for data-at-rest
4. **Data Residency**: Regional cells for GDPR/data sovereignty requirements
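The rationale implies a routing rule that every service applies before touching storage: Enterprise tenants go to their cell, everyone else to a shared regional pool. A minimal sketch, assuming a hypothetical `resolve_target` helper and a hypothetical `shared-{region}` naming convention for the shared pool (neither is defined elsewhere in this ADR):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TenantTier(Enum):
    STARTER = "starter"
    GROWTH = "growth"
    ENTERPRISE = "enterprise"


@dataclass(frozen=True)
class DeploymentTarget:
    """Where a tenant's requests and data are served from."""
    cell: str        # dedicated cell ID, or the shared regional pool
    region: str      # data-residency region ("us", "eu", "apac")
    dedicated: bool


def resolve_target(tier: TenantTier, region: str, cell_id: Optional[str]) -> DeploymentTarget:
    """Map a tenant to infrastructure per the chosen hybrid option."""
    if tier is TenantTier.ENTERPRISE:
        if cell_id is None:
            # Fail closed: never fall back to shared infrastructure
            raise ValueError("enterprise tenant has no cell assignment")
        return DeploymentTarget(cell=cell_id, region=region, dedicated=True)
    # Starter and Growth share a regional pool (hypothetical naming)
    return DeploymentTarget(cell=f"shared-{region}", region=region, dedicated=False)


print(resolve_target(TenantTier.GROWTH, "eu", None))
# DeploymentTarget(cell='shared-eu', region='eu', dedicated=False)
```

Failing closed when an Enterprise tenant has no cell keeps a misconfiguration from silently placing dedicated-tier data on shared infrastructure.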
## Detailed Design
### Tenant Context Model
```python
from contextvars import ContextVar, Token
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional


class TenantContextError(RuntimeError):
    """Raised when tenant-scoped code runs without a tenant context."""


class TenantTier(Enum):
    STARTER = "starter"        # Shared infrastructure
    GROWTH = "growth"          # Shared with priority
    ENTERPRISE = "enterprise"  # Dedicated cell


class DataResidency(Enum):
    US = "us"
    EU = "eu"
    APAC = "apac"


@dataclass
class Organization:
    """Organization (tenant) entity."""
    id: str
    name: str
    slug: str
    tier: TenantTier
    data_residency: DataResidency
    # Infrastructure assignment
    cell_id: Optional[str]  # Set only for the enterprise tier
    encryption_key_id: str
    # Limits
    max_users: int
    max_integrations: int
    max_frameworks: int
    # Metadata
    created_at: datetime
    settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TenantContext:
    """
    Context for the current request/operation.
    Propagated through all service calls.
    """
    organization_id: str
    user_id: str
    roles: List[str]
    permissions: List[str]
    tier: TenantTier
    cell_id: Optional[str]
    encryption_key_id: str

    def has_permission(self, permission: str) -> bool:
        return permission in self.permissions


# Context variable for request-scoped tenant context
_tenant_context: ContextVar[Optional[TenantContext]] = ContextVar(
    "tenant_context",
    default=None,
)


def get_tenant_context() -> TenantContext:
    """Get the current tenant context; fail closed if none is set."""
    ctx = _tenant_context.get()
    if ctx is None:
        raise TenantContextError("No tenant context available")
    return ctx


def set_tenant_context(ctx: Optional[TenantContext]) -> Token:
    """Set tenant context for the current request; returns a token for reset()."""
    return _tenant_context.set(ctx)
```
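Because `_tenant_context` is a `ContextVar`, each asyncio task works on its own copy of the context, so concurrent requests for different organizations cannot observe each other's tenant. A self-contained sketch of that behavior, using a simplified string-valued variable (`current_org`, hypothetical) in place of `TenantContext`:

```python
import asyncio
from contextvars import ContextVar
from typing import List, Optional

# Simplified stand-in for _tenant_context (a str instead of TenantContext)
current_org: ContextVar[Optional[str]] = ContextVar("current_org", default=None)


async def handle_request(org_id: str, seen: List[str]) -> None:
    current_org.set(org_id)  # each task runs in its own copy of the context
    await asyncio.sleep(0)   # yield so the other task interleaves
    seen.append(f"{org_id}:{current_org.get()}")


async def main() -> List[str]:
    seen: List[str] = []
    # gather() wraps each coroutine in a Task, copying the context at creation
    await asyncio.gather(handle_request("org_a", seen), handle_request("org_b", seen))
    return seen


results = asyncio.run(main())
print(sorted(results))    # ['org_a:org_a', 'org_b:org_b']
print(current_org.get())  # None: values set inside tasks never leak to the caller
```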
### Data Isolation Layers
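```python
from typing import Awaitable, Callable

# Assumes a Starlette/FastAPI-style HTTP middleware signature
from starlette.requests import Request
from starlette.responses import JSONResponse, Response


class TenantIsolationMiddleware:
    """
    Middleware to establish tenant context from the JWT.
    """

    async def __call__(
        self,
        request: Request,
        call_next: Callable[[Request], Awaitable[Response]],
    ) -> Response:
        # Extract tenant from JWT; fail closed if no bearer token is present
        auth_header = request.headers.get("Authorization", "")
        if not auth_header.startswith("Bearer "):
            return JSONResponse({"detail": "Missing bearer token"}, status_code=401)
        claims = await self.verify_token(auth_header.removeprefix("Bearer "))

        # Build tenant context
        org = await self.org_repo.get(claims["org_id"])
        user = await self.user_repo.get(claims["sub"])
        context = TenantContext(
            organization_id=org.id,
            user_id=user.id,
            roles=user.roles,
            permissions=self._expand_permissions(user.roles),
            tier=org.tier,
            cell_id=org.cell_id,
            encryption_key_id=org.encryption_key_id,
        )

        # Set context for this request; keep the token so the previous
        # value (not just None) is restored afterwards
        token = _tenant_context.set(context)
        try:
            return await call_next(request)
        finally:
            _tenant_context.reset(token)
```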
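Background jobs and agent tasks do not pass through the HTTP middleware, so they need their own way to enter a tenant scope. A minimal sketch, assuming a hypothetical `tenant_scope` context manager and a simplified string-valued variable in place of `TenantContext`; restoring with `ContextVar.reset(token)` rather than `set(None)` lets nested scopes unwind correctly:

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Simplified stand-in for _tenant_context (a str instead of TenantContext)
_tenant: ContextVar[Optional[str]] = ContextVar("tenant", default=None)


@contextmanager
def tenant_scope(organization_id: str) -> Iterator[str]:
    """Run a block, e.g. a queued agent task, as `organization_id`."""
    token = _tenant.set(organization_id)
    try:
        yield organization_id
    finally:
        # reset() restores whatever was set before, so an inner scope
        # cannot wipe out an enclosing scope's tenant the way set(None) would
        _tenant.reset(token)


with tenant_scope("org_a"):
    with tenant_scope("org_b"):
        inner = _tenant.get()
    outer = _tenant.get()
after = _tenant.get()
print(inner, outer, after)  # org_b org_a None
```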
#### FoundationDB Keyspace Design
```python
class TenantKeyspace:
    """
    FoundationDB keyspace design for multi-tenancy.
    Key structure:
        /{org_id}/{entity_type}/{entity_id}
    Example:
        /org_abc123/controls/ctrl_xyz789
        /org_abc123/evidence/evid_123456
        /org_abc123/frameworks/frm_soc2v2
    """

    @staticmethod
    def org_prefix(organization_id: str) -> bytes:
        """Get keyspace prefix for organization."""
        # The trailing "/" keeps org_a's prefix from also matching org_ab
        return f"/{organization_id}/".encode()

    @staticmethod
    def entity_key(
        organization_id: str,
        entity_type: str,
        entity_id: str
    ) -> bytes:
        """Build full key for entity."""
        return f"/{organization_id}/{entity_type}/{entity_id}".encode()

    @staticmethod
    def entity_range(
        organization_id: str,
        entity_type: str
    ) -> tuple[bytes, bytes]:
        """Get key range for all entities of type in org."""
        prefix = f"/{organization_id}/{entity_type}/".encode()
        # b"\xff" sorts after every byte of an ASCII ID, so it is an exclusive end
        return (prefix, prefix + b"\xff")
```
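String keys are only safe if IDs can contain neither the `/` separator nor the `\xff` terminator. A minimal sketch, assuming a hypothetical rule that key components match `[a-z0-9_]+`, showing that the trailing `/` keeps `org_ab` out of `org_a`'s range:

```python
import re
from typing import Tuple

# Hypothetical rule: key components are lowercase ASCII letters, digits, "_"
_KEY_COMPONENT = re.compile(r"[a-z0-9_]+")


def entity_range(organization_id: str, entity_type: str) -> Tuple[bytes, bytes]:
    """Same layout as TenantKeyspace.entity_range, plus component validation."""
    for part in (organization_id, entity_type):
        if not _KEY_COMPONENT.fullmatch(part):
            # A "/" inside an ID could splice one tenant's keys into another's range
            raise ValueError(f"invalid key component: {part!r}")
    prefix = f"/{organization_id}/{entity_type}/".encode()
    return prefix, prefix + b"\xff"


def in_range(key: bytes, key_range: Tuple[bytes, bytes]) -> bool:
    start, end = key_range
    return start <= key < end


org_a_controls = entity_range("org_a", "controls")
print(in_range(b"/org_a/controls/ctrl_1", org_a_controls))   # True
print(in_range(b"/org_ab/controls/ctrl_1", org_a_controls))  # False
```

FoundationDB's tuple layer (`fdb.tuple.pack`) and `fdb.Subspace` encode key components with escaping and are an alternative worth evaluating instead of hand-built separators.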
```python
import base64
from typing import List, Optional, Tuple

# Entity, FoundationDBClient, KeyVault and Cipher are project types defined elsewhere


class TenantIsolationError(PermissionError):
    """Raised when an operation would cross a tenant boundary."""


class TenantAwareRepository:
    """
    Base repository that enforces tenant isolation.
    Subclasses set `entity_type` and implement `_serialize`/`_deserialize`.
    """

    entity_type: str  # e.g. "controls"; set by each subclass

    def __init__(
        self,
        fdb_client: FoundationDBClient,
        key_vault: KeyVault,  # resolves encryption_key_id to key material
        cipher: Cipher,       # symmetric cipher with encrypt/decrypt(data, key)
    ):
        self.fdb = fdb_client
        self.key_vault = key_vault
        self.cipher = cipher

    @property
    def organization_id(self) -> str:
        """Get organization_id from tenant context."""
        return get_tenant_context().organization_id

    async def get(self, entity_id: str) -> Optional[Entity]:
        """
        Get entity by ID, scoped to current tenant.
        """
        key = TenantKeyspace.entity_key(
            self.organization_id,
            self.entity_type,
            entity_id
        )
        data = await self.fdb.get(key)
        if data is None:
            return None
        decrypted = await self._decrypt(data)
        return self._deserialize(decrypted)

    async def list(
        self,
        limit: int = 100,
        cursor: Optional[str] = None
    ) -> Tuple[List[Entity], Optional[str]]:
        """
        List entities for current tenant, one page at a time.
        """
        start, end = TenantKeyspace.entity_range(
            self.organization_id,
            self.entity_type
        )
        if cursor:
            resume_key = base64.urlsafe_b64decode(cursor)
            # Cursors come from clients: never let one move the scan
            # outside this tenant's key range
            if not (start <= resume_key < end):
                raise TenantIsolationError("Cursor is outside tenant keyspace")
            start = resume_key
        # Fetch one extra row to learn whether another page exists
        results = await self.fdb.get_range(start, end, limit=limit + 1)
        entities = []
        for row in results[:limit]:
            # Stored values are encrypted, so decrypt before deserializing
            entities.append(self._deserialize(await self._decrypt(row.value)))
        next_cursor = None
        if len(results) > limit:
            next_cursor = base64.urlsafe_b64encode(results[limit].key).decode()
        return entities, next_cursor

    async def save(self, entity: Entity) -> None:
        """
        Save entity, automatically scoped to tenant.
        """
        # Ensure entity belongs to current tenant
        if entity.organization_id != self.organization_id:
            raise TenantIsolationError(
                "Cannot save entity for different organization"
            )
        key = TenantKeyspace.entity_key(
            self.organization_id,
            self.entity_type,
            entity.id
        )
        data = self._serialize(entity)
        encrypted = await self._encrypt(data)
        await self.fdb.set(key, encrypted)

    async def _encrypt(self, data: bytes) -> bytes:
        """Encrypt data with tenant's key."""
        ctx = get_tenant_context()
        key = await self.key_vault.get_key(ctx.encryption_key_id)
        return self.cipher.encrypt(data, key)

    async def _decrypt(self, data: bytes) -> bytes:
        """Decrypt data with tenant's key."""
        ctx = get_tenant_context()
        key = await self.key_vault.get_key(ctx.encryption_key_id)
        return self.cipher.decrypt(data, key)
```
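The `limit + 1` pagination used by `list` can be exercised without a database. A minimal sketch, assuming a hypothetical in-memory sorted store (`InMemoryKV`) in place of FoundationDB, that also shows why a client-supplied cursor has to be checked against the tenant's range before it becomes the scan start:

```python
import base64
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class InMemoryKV:
    """Sorted in-memory stand-in for FoundationDB range reads (hypothetical)."""

    def __init__(self, items: Dict[bytes, bytes]):
        self._items = dict(items)
        self._keys = sorted(self._items)

    def get_range(self, start: bytes, end: bytes, limit: int) -> List[Tuple[bytes, bytes]]:
        """Return up to `limit` (key, value) pairs with start <= key < end."""
        i = bisect_left(self._keys, start)
        rows: List[Tuple[bytes, bytes]] = []
        while i < len(self._keys) and self._keys[i] < end and len(rows) < limit:
            rows.append((self._keys[i], self._items[self._keys[i]]))
            i += 1
        return rows


def list_page(kv: InMemoryKV, start: bytes, end: bytes, limit: int,
              cursor: Optional[str] = None) -> Tuple[List[bytes], Optional[str]]:
    if cursor:
        resume = base64.urlsafe_b64decode(cursor)
        # A forged cursor must not move the scan into another tenant's range
        if not (start <= resume < end):
            raise PermissionError("cursor outside tenant keyspace")
        start = resume
    rows = kv.get_range(start, end, limit=limit + 1)  # one extra row: is there a next page?
    next_cursor = None
    if len(rows) > limit:
        next_cursor = base64.urlsafe_b64encode(rows[limit][0]).decode()
    return [value for _, value in rows[:limit]], next_cursor


kv = InMemoryKV({
    b"/org_a/controls/c1": b"a1",
    b"/org_a/controls/c2": b"a2",
    b"/org_a/controls/c3": b"a3",
    b"/org_b/controls/c1": b"b1",
})
start = b"/org_a/controls/"
end = start + b"\xff"
page1, cur = list_page(kv, start, end, limit=2)
page2, cur2 = list_page(kv, start, end, limit=2, cursor=cur)
print(page1, page2, cur2)  # [b'a1', b'a2'] [b'a3'] None
```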
#### Neo4j Multi-Tenancy
```python
import re
from typing import Any, Dict, List, Optional

import neo4j

# A bare node pattern right after MATCH, e.g. "MATCH (c)" or "MATCH (c:Control)"
_MATCH_FIRST_NODE = re.compile(r"\bMATCH\s*\(\s*(\w+)((?::\w+)*)\s*\)", re.IGNORECASE)
_MATCH_KEYWORD = re.compile(r"\bMATCH\b", re.IGNORECASE)


class TenantAwareGraphRepository:
    """
    Neo4j repository with tenant isolation.
    Isolation strategy:
    - All nodes have organization_id property
    - All queries filter by organization_id
    - Query builder prevents cross-tenant access
    """

    def __init__(self, driver: neo4j.AsyncDriver):
        self.driver = driver

    @property
    def organization_id(self) -> str:
        return get_tenant_context().organization_id

    async def query(
        self,
        cypher: str,
        parameters: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        """
        Execute Cypher query with tenant isolation.
        Automatically injects organization_id filter.
        """
        # Validate query doesn't bypass isolation
        self._validate_query(cypher)
        # Copy the caller's parameters; the tenant value is added last so a
        # caller-supplied "_organization_id" can never override it
        params = {**(parameters or {}), "_organization_id": self.organization_id}
        # Wrap query with tenant filter
        isolated_query = self._wrap_with_tenant_filter(cypher)
        async with self.driver.session() as session:
            result = await session.run(isolated_query, params)
            return await result.data()

    def _validate_query(self, cypher: str) -> None:
        """
        Validate query doesn't attempt cross-tenant access.
        Simple string checks; production would use a Cypher parser.
        """
        if "organization_id:" in cypher.lower():
            raise TenantIsolationError(
                "Direct organization_id literals not allowed"
            )
        if re.search(r"\bUNION\b", cypher, re.IGNORECASE):
            raise TenantIsolationError("UNION queries are not allowed")

    def _wrap_with_tenant_filter(self, cypher: str) -> str:
        """
        Inject the tenant filter into the first node of every MATCH.
        Simplified: only bare patterns such as (c) or (c:Control) are
        rewritten and anything else is rejected (fail closed). Production
        should rewrite the query AST and constrain every bound node.
        """
        rewritten, count = _MATCH_FIRST_NODE.subn(
            r"MATCH (\1\2 {organization_id: $_organization_id})", cypher
        )
        if count != len(_MATCH_KEYWORD.findall(cypher)):
            raise TenantIsolationError("Unsupported MATCH pattern for tenant filter")
        return rewritten
```
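For Enterprise cells, the Neo4j constraint listed earlier also allows a database per tenant instead of property filtering. Database names are stricter than the `org_abc123` IDs used in this ADR: per Neo4j's documented naming rules (3 to 63 characters, an ASCII letter first, then letters, digits, dots or dashes; verify against your Neo4j version), underscores are not allowed. A minimal sketch of a hypothetical `tenant_database_name` mapping, assuming organization IDs always have the `org_<lowercase alphanumerics>` shape:

```python
import re

# Organization ID shape used in this ADR's examples, e.g. "org_abc123" (assumed)
_ORG_ID = re.compile(r"org_[a-z0-9]+")
# Neo4j naming rules: 3-63 chars, ASCII letter first, then letters, digits, "." or "-"
_NEO4J_DB_NAME = re.compile(r"[a-z][a-z0-9.-]{2,62}")


def tenant_database_name(organization_id: str) -> str:
    """Map an organization ID to a per-tenant Neo4j database name."""
    if not _ORG_ID.fullmatch(organization_id):
        raise ValueError(f"unexpected organization ID: {organization_id!r}")
    # "_" is not allowed in database names; with the fixed "org_" shape,
    # replacing it with "-" cannot make two IDs collide
    name = organization_id.replace("_", "-")
    if not _NEO4J_DB_NAME.fullmatch(name):
        raise ValueError(f"derived name is not a valid database name: {name!r}")
    return name


print(tenant_database_name("org_abc123"))  # org-abc123
```

A session would then be opened with `driver.session(database=tenant_database_name(org_id))`, so isolation no longer depends on rewriting every query.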
#### GCS Blob Isolation
```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

from google.cloud import storage


class TenantBlobStorage:
    """
    GCS storage with tenant isolation.
    Structure:
        gs://coditect-evidence-{region}/{org_id}/{evidence_type}/{year}/{month}/{file}
    Security:
    - Per-org encryption keys (CMEK)
    - Signed URLs with short expiry
    - Audit logging for all access
    """

    BUCKET_TEMPLATE = "coditect-evidence-{region}"

    def __init__(
        self,
        gcs_client: storage.Client,
        org_repo: OrganizationRepository  # project type: looks up Organization by ID
    ):
        self.gcs = gcs_client
        self.org_repo = org_repo

    def _get_bucket_name(self, region: str) -> str:
        return self.BUCKET_TEMPLATE.format(region=region)

    def _get_blob_path(
        self,
        organization_id: str,
        evidence_type: str,
        filename: str
    ) -> str:
        now = datetime.now(timezone.utc)
        return f"{organization_id}/{evidence_type}/{now.year}/{now.month:02d}/{filename}"

    async def upload(
        self,
        evidence_type: str,
        content: bytes,
        filename: str,
        content_type: str
    ) -> str:
        """Upload blob for current tenant."""
        ctx = get_tenant_context()
        org = await self.org_repo.get(ctx.organization_id)
        bucket_name = self._get_bucket_name(org.data_residency.value)
        blob_path = self._get_blob_path(
            ctx.organization_id,
            evidence_type,
            filename
        )
        bucket = self.gcs.bucket(bucket_name)
        # CMEK: GCS encrypts the object with the tenant's Cloud KMS key.
        # Assumes encryption_key_id holds the full KMS key resource name
        # (projects/.../locations/.../keyRings/.../cryptoKeys/...)
        blob = bucket.blob(blob_path, kms_key_name=org.encryption_key_id)
        # The client library is synchronous; offload to a worker thread in production
        blob.upload_from_string(content, content_type=content_type)
        return f"gs://{bucket_name}/{blob_path}"

    async def get_signed_url(
        self,
        blob_uri: str,
        expiry_minutes: int = 15
    ) -> str:
        """Generate signed URL for blob access."""
        ctx = get_tenant_context()
        bucket_name, blob_path = self._parse_uri(blob_uri)
        # Verify blob belongs to current tenant
        if not blob_path.startswith(f"{ctx.organization_id}/"):
            raise TenantIsolationError(
                "Cannot access blob from different organization"
            )
        bucket = self.gcs.bucket(bucket_name)
        blob = bucket.blob(blob_path)
        return blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=expiry_minutes),
            method="GET"
        )

    @staticmethod
    def _parse_uri(blob_uri: str) -> Tuple[str, str]:
        """Split gs://bucket/path into (bucket, path)."""
        if not blob_uri.startswith("gs://"):
            raise ValueError(f"Not a GCS URI: {blob_uri!r}")
        bucket_name, _, blob_path = blob_uri[len("gs://"):].partition("/")
        return bucket_name, blob_path
```
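The prefix check in `get_signed_url` trusts whatever bucket the URI names. A stricter check, sketched below as a hypothetical standalone helper (`check_tenant_blob`), also requires the bucket to be the tenant's own residency bucket, so a URI pointing at another bucket with a matching prefix is rejected:

```python
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://bucket/path into (bucket, path), rejecting malformed URIs."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, sep, path = uri[len("gs://"):].partition("/")
    if not bucket or not sep or not path:
        raise ValueError(f"incomplete gs:// URI: {uri!r}")
    return bucket, path


def check_tenant_blob(uri: str, organization_id: str, region: str) -> Tuple[str, str]:
    """Return (bucket, path) only if the blob is in the tenant's bucket and prefix."""
    bucket, path = parse_gcs_uri(uri)
    # Bucket naming follows BUCKET_TEMPLATE = "coditect-evidence-{region}"
    if bucket != f"coditect-evidence-{region}":
        raise PermissionError("blob is not in the tenant's residency bucket")
    # The trailing "/" stops org_a from matching org_ab's objects
    if not path.startswith(f"{organization_id}/"):
        raise PermissionError("blob belongs to a different organization")
    return bucket, path
```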
### Resource Limits
```python
from dataclasses import dataclass


class TenantLimitExceeded(Exception):
    """Raised when an operation would exceed the tenant's tier limits."""


@dataclass(frozen=True)
class TenantLimits:
    """Resource limits per tenant tier."""
    max_users: int
    max_integrations: int
    max_frameworks: int
    max_controls: int
    max_evidence_items: int
    max_storage_gb: int
    api_rate_limit: int    # requests per minute
    agent_task_limit: int  # concurrent agent tasks


TIER_LIMITS = {
    TenantTier.STARTER: TenantLimits(
        max_users=10,
        max_integrations=5,
        max_frameworks=3,
        max_controls=1000,
        max_evidence_items=50000,
        max_storage_gb=10,
        api_rate_limit=100,
        agent_task_limit=2
    ),
    TenantTier.GROWTH: TenantLimits(
        max_users=100,
        max_integrations=20,
        max_frameworks=10,
        max_controls=5000,
        max_evidence_items=500000,
        max_storage_gb=100,
        api_rate_limit=500,
        agent_task_limit=10
    ),
    TenantTier.ENTERPRISE: TenantLimits(
        max_users=1000,
        max_integrations=100,
        max_frameworks=50,
        max_controls=20000,
        max_evidence_items=5000000,
        max_storage_gb=1000,
        api_rate_limit=2000,
        agent_task_limit=50
    ),
}


class TenantLimitEnforcer:
    """Enforce resource limits per tenant."""

    async def check_limit(
        self,
        resource_type: str,
        increment: int = 1
    ) -> bool:
        """
        Check if operation would exceed limit.
        resource_type names a max_* field, e.g. "users" -> max_users.
        """
        ctx = get_tenant_context()
        limits = TIER_LIMITS[ctx.tier]
        current = await self._get_current_usage(
            ctx.organization_id,
            resource_type
        )
        max_allowed = getattr(limits, f"max_{resource_type}")
        return current + increment <= max_allowed

    async def enforce_limit(
        self,
        resource_type: str,
        increment: int = 1
    ) -> None:
        """
        Raise exception if limit exceeded.
        Check-then-write is racy under concurrent requests; production should
        reserve quota atomically, e.g. by updating a usage counter in the same
        FoundationDB transaction as the write.
        """
        if not await self.check_limit(resource_type, increment):
            ctx = get_tenant_context()
            raise TenantLimitExceeded(
                f"Organization {ctx.organization_id} has reached "
                f"{resource_type} limit for {ctx.tier.value} tier"
            )
```
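`api_rate_limit` is expressed in requests per minute but is not covered by `TenantLimitEnforcer`, which only handles `max_*` counts. A minimal per-tenant token-bucket sketch, assuming a single process (a fleet of API replicas would need a shared store) and a hypothetical `TenantRateLimiter` name; the clock is injectable so the refill logic can be tested:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


@dataclass
class TenantRateLimiter:
    """Per-tenant token bucket that refills per_minute tokens every 60 seconds."""
    clock: Callable[[], float] = time.monotonic
    _buckets: Dict[str, _Bucket] = field(default_factory=dict, init=False)

    def allow(self, organization_id: str, per_minute: int) -> bool:
        """Consume one token for this tenant; False means reject (e.g. HTTP 429)."""
        now = self.clock()
        bucket = self._buckets.get(organization_id)
        if bucket is None:
            # New tenants start with a full bucket, allowing a burst of per_minute
            bucket = self._buckets[organization_id] = _Bucket(float(per_minute), now)
        # Refill continuously at per_minute / 60 tokens per second, capped at capacity
        elapsed = now - bucket.updated
        bucket.tokens = min(float(per_minute), bucket.tokens + elapsed * per_minute / 60.0)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


limiter = TenantRateLimiter()
# per_minute would come from TIER_LIMITS[ctx.tier].api_rate_limit
print(limiter.allow("org_abc123", per_minute=100))  # True
```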
### Tenant Deletion
```python
class TenantDeletionService:
    """
    Handle complete tenant data deletion for GDPR compliance.
    """

    async def delete_tenant(
        self,
        organization_id: str,
        requester: str,
        reason: str
    ) -> DeletionReceipt:
        """
        Delete all data for a tenant.
        Steps:
        1. Verify authorization
        2. Create deletion audit record
        3. Delete FoundationDB keyspace
        4. Delete Neo4j nodes/relationships
        5. Delete GCS blobs
        6. Revoke encryption keys
        7. Return receipt
        """
        # Verify authorization
        await self._verify_deletion_authorization(organization_id, requester)
        # Create audit record
        deletion_id = await self._create_deletion_record(
            organization_id, requester, reason
        )
        try:
            # Delete FoundationDB data
            await self._delete_fdb_keyspace(organization_id)
            # Delete Neo4j data
            await self._delete_graph_data(organization_id)
            # Delete GCS blobs
            await self._delete_blob_storage(organization_id)
            # Revoke encryption keys
            await self._revoke_encryption_keys(organization_id)
            # Mark deletion complete
            return await self._complete_deletion(deletion_id)
        except Exception as e:
            # Mark deletion failed, trigger manual review
            await self._fail_deletion(deletion_id, str(e))
            raise

    async def _delete_fdb_keyspace(self, organization_id: str) -> None:
        """Delete all FoundationDB keys for organization."""
        prefix = TenantKeyspace.org_prefix(organization_id)
        # A range clear is recorded as a single mutation, so one transaction
        # suffices regardless of how many keys the tenant has
        await self.fdb.clear_range_startswith(prefix)

    async def _delete_graph_data(self, organization_id: str) -> None:
        """Delete all Neo4j nodes for organization."""
        async with self.neo4j.session() as session:
            # Delete in batches: a single huge DETACH DELETE can exhaust memory.
            # CALL { ... } IN TRANSACTIONS needs an auto-commit query (session.run)
            result = await session.run(
                """
                MATCH (n {organization_id: $org_id})
                CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 10000 ROWS
                """,
                org_id=organization_id,
            )
            # Consume the result so errors surface here rather than later
            await result.consume()

    async def _delete_blob_storage(self, organization_id: str) -> None:
        """Delete all GCS blobs for organization."""
        for region in DataResidency:
            bucket_name = f"coditect-evidence-{region.value}"
            bucket = self.gcs.bucket(bucket_name)
            # versions=True also lists noncurrent object versions, which would
            # otherwise survive deletion if object versioning is enabled
            blobs = bucket.list_blobs(prefix=f"{organization_id}/", versions=True)
            for blob in blobs:
                blob.delete()
```
## Consequences
### Positive
- **Strong Isolation**: Per-tenant encryption plus keyspace isolation
- **Cost Efficiency**: Shared infrastructure for most tenants
- **Compliance**: Supports GDPR deletion and data residency
- **Scalability**: Designed to handle 10,000+ tenants
- **Flexibility**: Enterprise tier gets dedicated infrastructure
### Negative
- **Complexity**: Multiple isolation mechanisms to maintain
- **Performance**: Per-tenant encryption adds latency
- **Debugging**: Cross-tenant issues are harder to diagnose
### Mitigations
- Comprehensive tenant context logging
- Performance benchmarks per operation type
- Automated isolation testing in CI/CD
## Implementation Plan
### Phase 1: Core Model (Week 1)
- TenantContext implementation
- Middleware for context propagation
- FoundationDB keyspace design
### Phase 2: Data Layer (Weeks 2-3)
- TenantAwareRepository base class
- Neo4j isolation wrapper
- GCS tenant storage
### Phase 3: Limits & Quotas (Week 4)
- Resource limit enforcement
- Quota monitoring
- Alerts when tenants approach their limits
### Phase 4: Deletion (Week 5)
- Tenant deletion workflow
- Audit trail
- Verification tests
## Validation Criteria
- **Isolation**: No cross-tenant data access is possible, verified by automated isolation tests
- **Deletion**: Complete data removal within 24 hours of a deletion request
- **Performance**: < 5% latency impact from encryption
- **Audit**: 100% of data access logged
- **Limits**: Enforcement prevents resource exhaustion
## References
## Acceptance Criteria
1. **Isolation Model**: Complete design for all data layers
2. **Context Propagation**: Request-scoped tenant context
3. **Encryption**: Per-tenant key management
4. **Deletion**: GDPR-compliant tenant removal
5. **Limits**: Resource quota enforcement
## Token Budget
- Target: 10,000-16,000 tokens
- Priority: Data isolation and deletion sections
## Dependencies
- Input: SDD multi-tenant requirements
- Input: ADR-001 (Control Graph) for graph isolation
- Output: Feeds into all component build prompts
## Integration Points
This ADR establishes patterns used by:
- All repository implementations
- API authentication middleware
- Agent task execution context
- Evidence storage service