PROMPT 05: Architecture Decision Record - Control Graph Data Model
Prompt Metadata
prompt_id: CODITECT-COMPLIANCE-ADR-001
prompt_type: architecture_decision_record
output_artifact: ADR-001-control-graph-data-model.md
estimated_tokens: 8,000-12,000
dependencies:
- 01-product-definition-prompt.md (completed)
- 02-product-requirements-prompt.md (completed)
- 03-software-design-document-prompt.md (completed)
System Context
You are a Principal Architect at CODITECT, tasked with documenting the critical architecture decision for the Control Graph data model in the CODITECT-COMPLIANCE module. This ADR will guide all downstream implementation of the compliance control relationships, framework mappings, and regulatory requirement tracking.
Input Documents Required
Before generating this ADR, ensure you have access to:
- Product Requirements Document (PRD) - specifically FR-CG-* requirements
- Software Design Document (SDD) - Section 4 (Component Design) and Section 5 (Data Architecture)
- Vanta compliance features analysis document
Output Specification
Generate a comprehensive Architecture Decision Record following this exact structure:
Document Structure
# ADR-001: Control Graph Data Model Architecture
## Status
[Proposed | Accepted | Deprecated | Superseded]
## Date
[YYYY-MM-DD]
## Decision Makers
- [Role]: [Responsibility in decision]
## Context
### Business Context
[Describe the business problem driving this decision]
### Technical Context
[Describe the technical landscape and constraints]
### Current State
[If applicable, describe what exists today]
## Decision Drivers
### Primary Drivers
1. [Driver with priority and rationale]
2. [Driver with priority and rationale]
...
### Constraints
1. [Hard constraint that cannot be violated]
2. [Soft constraint that should be respected]
...
## Considered Options
### Option 1: [Name]
[Detailed description]
**Pros:**
- [Advantage]
**Cons:**
- [Disadvantage]
**Estimated Effort:** [T-shirt size]
**Risk Level:** [Low/Medium/High]
### Option 2: [Name]
[Repeat structure]
### Option 3: [Name]
[Repeat structure]
## Decision
### Selected Option
[State the chosen option]
### Rationale
[Explain why this option was selected]
### Trade-offs Accepted
[Document what we're giving up]
## Consequences
### Positive Consequences
1. [Benefit]
### Negative Consequences
1. [Drawback and mitigation]
### Neutral Consequences
1. [Observation]
## Implementation Guidelines
### Phase 1: [Name]
[Implementation steps]
### Phase 2: [Name]
[Implementation steps]
## Validation Criteria
### Success Metrics
1. [Measurable outcome]
### Acceptance Criteria
1. [Testable criterion]
## Related Decisions
- [Link to related ADRs]
## References
- [External references]
Content Requirements
Section: Context
Generate comprehensive context covering:
Business Context Requirements:
- Compliance frameworks have hierarchical, overlapping structures
- Single control can satisfy multiple framework requirements
- Organizations need to map their implementations to framework controls
- Auditors require evidence linked to specific controls
- Cross-framework efficiency is a key differentiator (implement once, satisfy many)
Technical Context Requirements:
- Graph relationships are central to the domain
- Query patterns include: ancestor/descendant traversal, shortest path, subgraph extraction
- Write patterns: bulk framework import, incremental control updates
- Read-heavy workload (95% reads, 5% writes)
- Multi-tenant isolation required
Quantitative Context:
expected_scale:
frameworks: 50+ (30+ at launch)
controls_per_framework: 50-500
total_controls: 5,000-15,000
control_mappings: 50,000-200,000 (many-to-many)
organizations: 1,000+
implementations_per_org: 500-5,000
query_patterns:
framework_hierarchy_traversal: 40%
cross_framework_mapping: 25%
implementation_status_lookup: 20%
gap_analysis: 10%
audit_evidence_linking: 5%
performance_requirements:
hierarchy_traversal: <100ms P95
cross_framework_query: <500ms P95
gap_analysis_full: <5s P95
bulk_import: <30s for 500 controls
Section: Decision Drivers
Generate decision drivers including:
Primary Drivers (in priority order):
-
Relationship Query Performance
- Graph traversals must be efficient
- Multi-hop queries (control → mapping → control → framework) common
- Priority: CRITICAL
-
Schema Flexibility
- New frameworks added frequently
- Control structures vary by framework
- Custom attributes per framework
- Priority: HIGH
-
Cross-Framework Mapping Efficiency
- Many-to-many relationships with metadata
- Mapping confidence scores
- Bidirectional traversal
- Priority: HIGH
-
Multi-Tenant Data Isolation
- Organization-specific implementations
- Shared framework definitions
- Tenant data never crosses boundaries
- Priority: CRITICAL
-
Audit Trail Requirements
- All changes must be tracked
- Point-in-time queries for audits
- Evidence chain integrity
- Priority: HIGH
-
Integration with CODITECT-CORE
- Must work with FoundationDB operational store
- Event sourcing compatibility
- Consistent transaction boundaries
- Priority: MEDIUM
Constraints:
- HARD: Multi-tenant isolation - Tenant data must never leak
- HARD: ACID for implementations - Organization data changes must be transactional
- HARD: Sub-second read latency - Common queries under 500ms
- SOFT: Minimize operational complexity - Prefer managed services
- SOFT: Cost efficiency - Scale costs linearly with usage
Section: Considered Options
Generate detailed analysis for these options:
Option 1: Neo4j (Dedicated Graph Database)
description: |
Use Neo4j as dedicated graph database for all control graph data.
Frameworks, controls, and mappings stored as nodes and relationships.
Organization implementations stored as separate subgraphs.
architecture:
primary_store: Neo4j
query_language: Cypher
deployment: Neo4j AuraDB (managed) or self-hosted
data_model:
nodes:
- Framework (id, name, version, category, status)
- Control (id, framework_id, code, title, description, attributes)
- ControlMapping (id, confidence, rationale, created_at)
- Implementation (id, org_id, control_id, status, evidence_refs)
relationships:
- (Framework)-[:CONTAINS]->(Control)
- (Control)-[:PARENT_OF]->(Control)
- (Control)-[:MAPS_TO {confidence, rationale}]->(Control)
- (Implementation)-[:IMPLEMENTS]->(Control)
query_examples:
hierarchy: |
MATCH (f:Framework {id: $framework_id})-[:CONTAINS]->(c:Control)
OPTIONAL MATCH (c)-[:PARENT_OF*]->(child:Control)
RETURN c, collect(child) as children
cross_mapping: |
MATCH (c1:Control {id: $control_id})-[:MAPS_TO*1..2]-(c2:Control)
WHERE c2.framework_id <> c1.framework_id
RETURN c2, relationships(path) as mappings
gap_analysis: |
MATCH (f:Framework {id: $framework_id})-[:CONTAINS]->(c:Control)
OPTIONAL MATCH (i:Implementation {org_id: $org_id})-[:IMPLEMENTS]->(c)
RETURN c, i IS NOT NULL as implemented
pros:
- Native graph query performance (index-free adjacency)
- Expressive Cypher query language
- Built-in graph algorithms (shortest path, community detection)
- Schema flexibility (add properties without migration)
- Visualization tools for debugging
cons:
- Additional operational complexity (another database)
- Transaction boundaries don't align with FoundationDB
- Multi-tenancy requires careful modeling
- Cost scales with data size
- Learning curve for Cypher
effort: Medium
risk: Medium
Option 2: FoundationDB with Graph Layer
description: |
Implement graph semantics on top of FoundationDB using custom layer.
Leverage FoundationDB's ordered key-value model for adjacency lists.
Single database for all CODITECT-COMPLIANCE data.
architecture:
primary_store: FoundationDB
query_language: Custom Python API
deployment: Existing CODITECT-CORE FoundationDB cluster
data_model:
key_patterns:
# Node storage
node: "graph/{tenant}/node/{type}/{id}"
node_props: "graph/{tenant}/node/{type}/{id}/props"
# Edge storage (adjacency lists)
outgoing: "graph/{tenant}/edge/{from_type}/{from_id}/out/{rel_type}/{to_id}"
incoming: "graph/{tenant}/edge/{to_type}/{to_id}/in/{rel_type}/{from_id}"
# Indexes
by_framework: "graph/{tenant}/idx/framework/{framework_id}/{control_id}"
by_status: "graph/{tenant}/idx/status/{status}/{type}/{id}"
traversal_implementation: |
async def get_children(node_id: str, depth: int = 1) -> List[Node]:
prefix = f"graph/{tenant}/edge/control/{node_id}/out/PARENT_OF/"
children = []
async for key, value in db.get_range_startswith(prefix):
child_id = key.split('/')[-1]
child = await get_node(child_id)
children.append(child)
if depth > 1:
children.extend(await get_children(child_id, depth - 1))
return children
pros:
- Single database (operational simplicity)
- ACID transactions across all data
- Native multi-tenancy via key prefixes
- Consistent with CODITECT-CORE patterns
- No additional infrastructure
cons:
- Complex traversal implementation
- No built-in graph algorithms
- Performance degrades with hop count
- More code to maintain
- No visualization tools
effort: High
risk: High
Option 3: Hybrid (Neo4j for Graph + FoundationDB for Operations)
description: |
Use Neo4j for framework/control graph (shared reference data).
Use FoundationDB for organization-specific implementations.
Synchronize via event-driven updates.
architecture:
graph_store: Neo4j (frameworks, controls, mappings)
operational_store: FoundationDB (implementations, evidence, audit)
sync_mechanism: Redis Streams events
data_distribution:
neo4j:
- Framework definitions (shared)
- Control definitions (shared)
- Control mappings (shared)
- Control hierarchy (shared)
foundationdb:
- Organization profiles
- Implementation records
- Evidence items
- Check results
- Audit logs
sync_patterns:
# On implementation change
- Event: implementation.updated
- Action: Update Neo4j implementation node for queries
- Consistency: Eventual (< 1s)
# On framework import
- Event: framework.imported
- Action: Bulk load to Neo4j
- Consistency: Strong (wait for completion)
query_routing:
neo4j_queries:
- Framework hierarchy traversal
- Cross-framework mapping discovery
- Gap analysis (control coverage)
- Graph visualizations
foundationdb_queries:
- Implementation CRUD
- Evidence retrieval
- Check result history
- Audit trail
pros:
- Best tool for each job
- Graph queries remain fast
- Transactional integrity for org data
- Clear separation of concerns
- Scalable independently
cons:
- Two databases to operate
- Sync complexity
- Eventual consistency for some queries
- More complex deployment
- Cross-store queries require orchestration
effort: Medium-High
risk: Medium
Section: Decision
Generate decision section with:
Selected Option: Option 3 - Hybrid (Neo4j for Graph + FoundationDB for Operations)
Rationale:
-
Query Pattern Alignment
- Graph traversals (40% of queries) need native graph performance
- Operational queries (CRUD, history) fit key-value model
- Neither database alone optimally serves both patterns
-
Data Ownership Clarity
- Shared reference data (frameworks) in Neo4j makes sense
- Organization-specific data in FoundationDB aligns with CODITECT-CORE
- Clear boundaries reduce confusion
-
Performance Optimization
- Neo4j handles complex traversals in <100ms
- FoundationDB handles transactional writes with ACID
- Neither compromises on its strength
-
Operational Trade-off Acceptance
- Additional database is worth the query performance
- Event-driven sync is well-understood pattern
- Managed Neo4j (AuraDB) reduces operational burden
Trade-offs Accepted:
- Eventual consistency between stores (mitigated by <1s sync)
- Additional operational complexity (mitigated by managed services)
- Cross-store queries require orchestration (mitigated by clear routing)
Section: Implementation Guidelines
Generate implementation phases:
Phase 1: Neo4j Foundation (Week 1-2)
# Neo4j Schema Setup
CREATE CONSTRAINT framework_id ON (f:Framework) ASSERT f.id IS UNIQUE;
CREATE CONSTRAINT control_id ON (c:Control) ASSERT c.id IS UNIQUE;
CREATE INDEX control_framework FOR (c:Control) ON (c.framework_id);
CREATE INDEX control_status FOR (c:Control) ON (c.status);
# Python Driver Setup
from neo4j import AsyncGraphDatabase
class ControlGraphRepository:
def __init__(self, uri: str, auth: tuple):
self.driver = AsyncGraphDatabase.driver(uri, auth=auth)
async def create_framework(self, framework: Framework) -> str:
async with self.driver.session() as session:
result = await session.run("""
CREATE (f:Framework {
id: $id,
name: $name,
version: $version,
category: $category,
status: $status,
created_at: datetime()
})
RETURN f.id
""", framework.dict())
return result.single()['f.id']
Phase 2: FoundationDB Integration (Week 2-3)
# Implementation Repository
class ImplementationRepository:
def __init__(self, db: FoundationDB):
self.db = db
async def create_implementation(
self,
org_id: str,
implementation: Implementation
) -> str:
key = f"compliance/{org_id}/impl/{implementation.control_id}"
@fdb.transactional
async def do_create(tr):
# Store implementation
tr[key] = implementation.to_bytes()
# Publish event for Neo4j sync
event = ImplementationCreatedEvent(
org_id=org_id,
implementation=implementation
)
await self.event_publisher.publish(event)
return implementation.id
return await do_create(self.db)
Phase 3: Sync Mechanism (Week 3-4)
# Event Handler for Neo4j Sync
class Neo4jSyncHandler:
async def handle_implementation_created(
self,
event: ImplementationCreatedEvent
):
async with self.neo4j.session() as session:
await session.run("""
MATCH (c:Control {id: $control_id})
MERGE (i:Implementation {
id: $impl_id,
org_id: $org_id
})
SET i.status = $status,
i.updated_at = datetime()
MERGE (i)-[:IMPLEMENTS]->(c)
""", {
'control_id': event.implementation.control_id,
'impl_id': event.implementation.id,
'org_id': event.org_id,
'status': event.implementation.status
})
Phase 4: Query Router (Week 4-5)
# Unified Query Interface
class ControlGraphService:
def __init__(
self,
neo4j_repo: ControlGraphRepository,
fdb_repo: ImplementationRepository
):
self.neo4j = neo4j_repo
self.fdb = fdb_repo
async def get_gap_analysis(
self,
org_id: str,
framework_id: str
) -> GapAnalysis:
# Get required controls from Neo4j
controls = await self.neo4j.get_framework_controls(framework_id)
# Get implementations from FoundationDB
implementations = await self.fdb.get_implementations(
org_id,
[c.id for c in controls]
)
# Calculate gaps
impl_map = {i.control_id: i for i in implementations}
gaps = []
for control in controls:
if control.id not in impl_map:
gaps.append(GapItem(
control=control,
status='missing'
))
elif impl_map[control.id].status != 'compliant':
gaps.append(GapItem(
control=control,
status='incomplete',
implementation=impl_map[control.id]
))
return GapAnalysis(
framework_id=framework_id,
total_controls=len(controls),
implemented=len(implementations),
gaps=gaps
)
Section: Validation Criteria
Success Metrics:
-
Graph Query Performance
- Hierarchy traversal: <100ms P95 for 5-level depth
- Cross-framework mapping: <200ms P95 for 2-hop queries
- Gap analysis: <2s P95 for 500-control framework
-
Sync Latency
- Implementation changes reflected in Neo4j: <1s P95
- Framework imports fully synced: <30s for 500 controls
-
Data Consistency
- Zero cross-tenant data leaks (verified by audit)
- 100% eventual consistency within SLA
Acceptance Criteria:
- Neo4j cluster deployed and accessible
- All framework definitions loadable via import
- Control hierarchy queries return correct results
- Cross-framework mappings queryable bidirectionally
- Implementation sync events processed reliably
- Gap analysis returns accurate results
- Multi-tenant isolation verified by security audit
- Performance benchmarks meet targets
Section: Related Decisions
- ADR-002: Agent Orchestration Architecture (agents query control graph)
- ADR-003: Evidence Collection Pipeline (evidence links to controls)
- ADR-005: Multi-Tenancy Strategy (isolation patterns)
Output Format Requirements
- Generate complete ADR in Markdown format
- Include all code examples with proper syntax highlighting
- Include Mermaid diagrams for architecture visualization
- Use tables for comparison matrices
- Include YAML/JSON for configuration examples
- Total length: 4,000-6,000 words
Quality Criteria
The generated ADR must:
- Clearly state the decision and rationale
- Document all considered alternatives fairly
- Include concrete implementation guidance
- Provide measurable success criteria
- Reference related architectural decisions
- Be understandable by both technical and non-technical stakeholders
- Include diagrams for complex concepts
- Provide enough detail for implementation to begin
Execution Instructions
- Read the SDD Section 5 (Data Architecture) for context
- Generate the complete ADR following the structure above
- Ensure all code examples are syntactically correct
- Include at least 2 Mermaid diagrams (data model, sync flow)
- Output as a single Markdown file
- Self-validate against quality criteria before completion