ADR-021-v4: Unified Hybrid Graph System (Part 1: Narrative)
Document: ADR-021-v4-unified-hybrid-graph-system-part1-narrative
Version: 2.0.0
Purpose: Define unified hybrid graph architecture combining FoundationDB and Qdrant for human understanding
Audience: Business stakeholders, product managers, developers
Date Created: 2025-08-30
Date Modified: 2025-09-01
QA Reviewed: Pending
Status: DRAFT
Supersedes: v1.0.0
Table of Contents
- 1. Document Information
- 2. Purpose of this ADR
- 3. User Story Context
- 4. Executive Summary
- 5. Visual Overview
- 6. Background & Problem
- 7. Decision
- 8. Implementation Blueprint
- 9. Testing Strategy
- 10. Security Considerations
- 11. Performance Characteristics
- 12. Operational Considerations
- 13. Migration Strategy
- 14. Consequences
- 15. References & Standards
- 16. Review & Approval
- 17. Appendix
- 18. QA Review Block
1. Document Information 🔴 REQUIRED
| Field | Value |
|---|---|
| ADR Number | ADR-021 |
| Title | Unified Hybrid Graph System |
| Status | Draft |
| Date Created | 2025-08-30 |
| Last Modified | 2025-09-01 |
| Version | 2.0.0 |
| Decision Makers | CTO, Chief Architect, Data Science Lead |
| Stakeholders | All CODITECT teams, customers, AI agents |
2. Purpose of this ADR 🔴 REQUIRED
This ADR serves dual purposes:
- For Humans 👥: Understand how CODITECT unifies diverse knowledge domains through a hybrid graph system
- For AI Agents 🤖: Implement a universal graph abstraction combining FoundationDB's ACID properties with Qdrant's semantic search
3. User Story Context 🔴 REQUIRED
As a developer debugging production issues,
I want intelligent log analysis that finds patterns across time and services,
So that I can resolve issues 80% faster with proven solutions.
As an AI engineer building prompts,
I want access to proven prompt patterns that work,
So that I can achieve 50% better results without reinventing the wheel.
As a code reviewer,
I want intelligent code analysis that shows hidden dependencies,
So that I can prevent 70% of refactoring-related bugs.
📋 Acceptance Criteria:
- Single API for all graph operations across domains
- Sub-100ms query response for common patterns
- Cross-domain intelligence discovery
- Multi-tenant data isolation
- Semantic and structural queries in one system
- Real-time graph updates with ACID guarantees
- Privacy-preserving cross-tenant learning
4. Executive Summary 🔴 REQUIRED
🏢 For Business Stakeholders
Imagine if your company's knowledge was like a vast library where:
- Books (data) are perfectly organized on shelves (FoundationDB)
- A librarian with perfect memory (Qdrant) can find any book by meaning, not just title
- All books are connected by invisible threads showing relationships
- Every reader's insights make the library smarter for everyone
This is CODITECT's Unified Hybrid Graph System - combining the best of structured databases with AI-powered understanding.
Business Value:
- 80% faster issue resolution through intelligent log analysis
- 50% better AI ROI through prompt intelligence
- 70% fewer bugs through code intelligence
- 100% knowledge retention - nothing learned is ever lost
Key Decision: Implement a hybrid architecture using FoundationDB for structure and Qdrant for semantics.
💻 For Technical Readers
Technical Summary: Generic graph trait system allowing any domain (logs, prompts, code, etc.) to leverage both FoundationDB's ACID transactions and Qdrant's vector similarity search through a unified API.
5. Visual Overview 🔴 REQUIRED
5.1 Hybrid Architecture
5.2 Cross-Domain Intelligence Flow
5.3 Data Flow Architecture
6. Background & Problem 🔴 REQUIRED
6.1 Business Context
Why this matters:
- Knowledge Silos: Each domain (logs, code, prompts) uses different storage systems
- Lost Insights: Valuable patterns hidden because systems don't talk to each other
- Duplicate Effort: Teams solve the same problems repeatedly
- Scaling Costs: Multiple specialized databases increase complexity and cost
User impact:
- Developers waste 40% of time on manual correlation
- AI engineers rebuild identical patterns
- Code reviewers miss critical dependencies
- Operations teams react to incidents instead of predicting them
Cost of inaction:
- $2.5M/year in lost developer productivity
- 23% of bugs from missed dependencies
- 60% redundant AI prompt development
- Customer churn from slow issue resolution
6.2 Technical Context
Current state in industry:
- Graph databases: Neo4j (relationships, no vectors)
- Vector databases: Pinecone (semantics, no structure)
- Hybrid attempts: Complex ETL pipelines
- Result: Fragmented, inconsistent, expensive
Limitations:
- Can't ask "find errors semantically similar to this across all time"
- Can't track how solutions propagate through systems
- Can't learn from patterns across domains
- Can't maintain consistency across stores
6.3 Constraints
| Type | Constraint | Impact |
|---|---|---|
| ⏰ Time | 3-month implementation | Phased rollout by domain |
| 💰 Budget | Use existing FDB + Qdrant | No new infrastructure |
| 👥 Resources | Current team only | Leverage trait patterns |
| 🔧 Technical | Multi-tenant isolation | Tenant-aware design |
| 📜 Compliance | Data residency rules | Region-aware storage |
7. Decision 🔴 REQUIRED
7.1 Y-Statement Format
In the context of managing diverse knowledge domains at scale,
facing fragmented storage systems and lost cross-domain insights,
we decided for a unified hybrid graph system built on FoundationDB + Qdrant
and neglected separate specialized databases and complex ETL pipelines,
to achieve 80% faster insights and 50% cost reduction,
accepting initial implementation complexity and learning curve,
because unified intelligence drives competitive advantage.
7.2 What We're Doing
Implementing a comprehensive hybrid graph system:
- Universal Graph Abstraction
  - Single API for all domains
  - Generic trait system
  - Type-safe operations
  - Async-first design
- Dual Storage Backend
  - FoundationDB: Structure, ACID, time-series
  - Qdrant: Vectors, similarity, semantics
  - Unified query engine
  - Automatic data routing
- Domain Implementations
  - Log Intelligence System
  - Prompt Intelligence Engine
  - Code Intelligence Graph
  - Extensible to any domain
- Cross-Domain Features
  - Pattern discovery across domains
  - Knowledge transfer
  - Privacy-preserving learning
  - Real-time updates
7.3 Why This Approach
Hybrid architecture benefits:
- Best of both worlds (structure + semantics)
- No compromises on ACID or vectors
- Single source of truth
- Consistent operations
Generic trait system advantages:
- Any domain can plug in
- Consistent behavior
- Code reuse
- Type safety
This approach balances:
- Performance vs flexibility
- Consistency vs distribution
- Simplicity vs power
- Current vs future needs
7.4 Alternatives Considered 🟡 OPTIONAL
Option A: Separate Specialized Databases
| Aspect | Details |
|---|---|
| Description | Different database for each domain |
| ✅ Pros | • Best tool for each job • Independent scaling • Proven solutions |
| ❌ Cons | • Complex synchronization • No cross-domain insights • High operational cost |
| Rejection Reason | Perpetuates silos, prevents unified intelligence |
Option B: Single Graph Database
| Aspect | Details |
|---|---|
| Description | Use Neo4j or similar for everything |
| ✅ Pros | • Single system • Good relationships • Mature tooling |
| ❌ Cons | • No vector operations • Limited semantic search • Performance at scale |
| Rejection Reason | Can't handle AI/semantic requirements |
8. Implementation Blueprint 🔴 REQUIRED
8.1 Architecture Overview
The system consists of three layers:
Application Layer: Domain-specific implementations (logs, prompts, code)
Graph Abstraction Layer: Universal API and trait system
Storage Layer: FoundationDB + Qdrant with unified query engine
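The three layers can be illustrated with a minimal sketch. It is written in Python for brevity and uses in-memory stand-ins for both backends; every class and method name here (`Node`, `StructureStore`, `VectorStore`, `GraphDomain`, `create_node`) is hypothetical and not taken from the CODITECT codebase or from the FoundationDB or Qdrant client libraries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class Node:
    """A graph node; the embedding is optional semantic data."""
    node_id: str
    tenant_id: str
    payload: dict
    embedding: Optional[List[float]] = None


class StructureStore(Protocol):
    """Hypothetical interface for the FoundationDB-backed side."""
    def put(self, key: str, value: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class VectorStore(Protocol):
    """Hypothetical interface for the Qdrant-backed side."""
    def upsert(self, point_id: str, vector: List[float]) -> None: ...


@dataclass
class InMemoryStructureStore:
    data: Dict[str, dict] = field(default_factory=dict)

    def put(self, key: str, value: dict) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self.data.get(key)


@dataclass
class InMemoryVectorStore:
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def upsert(self, point_id: str, vector: List[float]) -> None:
        self.vectors[point_id] = vector


class GraphDomain:
    """Graph abstraction layer: one API that routes data to both stores."""

    def __init__(self, domain: str, structure: StructureStore, vectors: VectorStore):
        self.domain = domain
        self.structure = structure
        self.vectors = vectors

    def create_node(self, node: Node) -> str:
        key = f"{node.tenant_id}/{self.domain}/{node.node_id}"
        self.structure.put(key, node.payload)          # structural record
        if node.embedding is not None:
            self.vectors.upsert(key, node.embedding)   # semantic index
        return key
```

An application-layer domain would then be a `GraphDomain("logs", ...)` or `GraphDomain("prompts", ...)` instance sharing the same stores. The sketch does not address what happens if the second write fails after the first succeeds; that coordination belongs to the unified query engine.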
8.2 Use Case 1: Log Intelligence System
Problem: Teams waste 40% of debugging time manually correlating errors
Solution Architecture:
- FoundationDB stores: `{tenant_id}/logs/{timestamp}/{log_id}`
- Qdrant indexes: Error embeddings, stack traces, patterns
- Graph links: Related errors, solutions, propagation chains
Value: 80% faster issue resolution with proven solutions
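Because FoundationDB keeps keys in sorted order, the `{tenant_id}/logs/{timestamp}/{log_id}` layout allows time-window reads as a single range scan, provided the timestamp sorts lexicographically. The sketch below assumes a zero-padded millisecond timestamp and plain string keys; both are illustrative choices, and the helper names `log_key` and `log_range` are hypothetical.

```python
from typing import Tuple

TS_WIDTH = 13  # assumption: 13 digits of epoch milliseconds


def _check(part: str) -> str:
    # "/" is the separator, so it may not appear inside a key component.
    if not part or "/" in part:
        raise ValueError(f"invalid key component: {part!r}")
    return part


def log_key(tenant_id: str, timestamp_ms: int, log_id: str) -> str:
    """Build a key whose string order matches chronological order."""
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    return f"{_check(tenant_id)}/logs/{timestamp_ms:0{TS_WIDTH}d}/{_check(log_id)}"


def log_range(tenant_id: str, start_ms: int, end_ms: int) -> Tuple[str, str]:
    """Return (begin, end) bounds covering start_ms <= timestamp < end_ms."""
    prefix = f"{_check(tenant_id)}/logs/"
    return (f"{prefix}{start_ms:0{TS_WIDTH}d}/", f"{prefix}{end_ms:0{TS_WIDTH}d}/")
```

A range read with these bounds returns only keys of one tenant and one time window, which is what the "patterns across time" queries need.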
8.3 Use Case 2: Prompt Intelligence Engine
Problem: AI engineers rebuild identical patterns, wasting 60% of their time
Solution Architecture:
- FoundationDB stores: `{tenant_id}/prompts/{domain}/{technique_id}`
- Qdrant indexes: Prompt embeddings, intent vectors
- Graph links: Successful techniques, evolution chains
Value: 50% better AI ROI through pattern reuse
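The "find a proven prompt similar to this one" lookup is a nearest-neighbour search over embeddings. The sketch below shows the idea with cosine similarity over an in-memory dictionary; it stands in for the vector search that Qdrant would perform and does not use the Qdrant client. The function names and the toy three-dimensional vectors are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], prompts: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (technique_id, score) pairs, best match first."""
    scored = [(technique_id, cosine(query, vec)) for technique_id, vec in prompts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from an embedding model.
PROMPTS = {
    "few-shot-classify": [1.0, 0.0, 0.0],
    "chain-of-thought": [0.0, 1.0, 0.0],
    "few-shot-extract": [0.9, 0.1, 0.0],
}
```

Calling `top_k([1.0, 0.05, 0.0], PROMPTS, k=2)` ranks the two few-shot techniques ahead of the unrelated one.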
8.4 Use Case 3: Code Intelligence Graph
Problem: Developers spend 75% of their time understanding existing code
Solution Architecture:
- FoundationDB stores: `{tenant_id}/code/{commit}/{file_path}`
- Qdrant indexes: Code embeddings, documentation vectors
- Graph links: Dependencies, refactoring impacts, review insights
Value: 70% fewer bugs through dependency awareness
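Dependency awareness comes down to a bounded traversal over "depends on me" edges. As a rough sketch, a breadth-first search can list every file affected by a change together with its hop distance. The `dependents` mapping and the `impacted` function are hypothetical; the default limit of three hops mirrors the traversal benchmark in section 9.2.

```python
from collections import deque
from typing import Dict, List


def impacted(dependents: Dict[str, List[str]], changed: str, max_hops: int = 3) -> Dict[str, int]:
    """Breadth-first search over reverse dependency edges.

    dependents maps a file to the files that import or call it.
    Returns {file: hop_distance} for files within max_hops of `changed`.
    """
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in dependents.get(current, []):
            if nxt not in distance:  # also guards against cycles
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    del distance[changed]
    return distance
```

A reviewer-facing tool could sort the result by hop distance to show direct dependents first.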
8.5 Configuration
All managed through environment variables:
- FDB cluster configuration
- Qdrant cloud credentials
- Domain-specific settings
- Performance tuning parameters
See Part 2 (to be created) for details.
8.6 API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/graph/{domain}/nodes` | POST | Create node |
| `/graph/{domain}/search` | POST | Hybrid search |
| `/graph/{domain}/traverse` | POST | Graph traversal |
| `/graph/{domain}/insights` | GET | Cross-domain patterns |
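The endpoint table maps onto a small routing rule: the path carries the domain and the operation, and each operation accepts exactly one HTTP method. The sketch below encodes only what the table states; the function name `resolve` and the error messages are hypothetical, and request and response bodies are left to Part 2.

```python
from typing import Tuple

# Operation -> allowed HTTP method, copied from the endpoint table.
ROUTES = {
    "nodes": "POST",
    "search": "POST",
    "traverse": "POST",
    "insights": "GET",
}


def resolve(method: str, path: str) -> Tuple[str, str]:
    """Parse /graph/{domain}/{operation} and check the HTTP method.

    Returns (domain, operation) or raises ValueError.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "graph" or not parts[1]:
        raise ValueError(f"unknown path: {path}")
    _, domain, operation = parts
    if operation not in ROUTES:
        raise ValueError(f"unknown operation: {operation}")
    if method.upper() != ROUTES[operation]:
        raise ValueError(f"{operation} requires {ROUTES[operation]}")
    return domain, operation
```

For example, `resolve("POST", "/graph/logs/search")` yields `("logs", "search")`, while a `GET` on the same path is rejected.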
8.7 Logging Requirements
All operations logged with:
- Domain context
- Query performance
- Cross-domain discoveries
- Error conditions
- Audit trail
Detailed patterns are in Part 2 (to be created).
8.8 Error Handling
Graph errors must:
- Preserve partial results
- Indicate which backend failed
- Provide fallback options
- Log detailed diagnostics
See Part 2 (to be created) for implementation.
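One way to meet the first two requirements (preserve partial results, indicate which backend failed) is a result type that carries both. This is only a sketch under assumed names: `HybridResult`, `hybrid_query` and the backend labels are hypothetical, and the two callables stand in for real FoundationDB and Qdrant lookups.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

log = logging.getLogger("graph.query")


@dataclass
class HybridResult:
    structural: List[dict] = field(default_factory=list)
    semantic: List[dict] = field(default_factory=list)
    failed_backends: Dict[str, str] = field(default_factory=dict)  # name -> error

    @property
    def complete(self) -> bool:
        return not self.failed_backends


def hybrid_query(
    structural_fn: Callable[[], Iterable[dict]],
    semantic_fn: Callable[[], Iterable[dict]],
) -> HybridResult:
    """Run both lookups; a failure in one does not discard the other."""
    result = HybridResult()
    backends = (
        ("foundationdb", structural_fn, "structural"),
        ("qdrant", semantic_fn, "semantic"),
    )
    for name, fn, attr in backends:
        try:
            setattr(result, attr, list(fn()))
        except Exception as exc:  # broad on purpose: record and continue
            log.warning("backend %s failed: %r", name, exc)
            result.failed_backends[name] = repr(exc)
    return result
```

Callers can check `result.complete` and decide whether the surviving half is good enough or whether to take a fallback path.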
9. Testing Strategy 🔴 REQUIRED
9.1 Test Scenarios
- Domain Tests
  - Log pattern matching
  - Prompt similarity search
  - Code dependency traversal
  - Cross-domain discovery
  - Multi-tenant isolation
- Performance Tests
  - 100k nodes insertion
  - Million-node traversal
  - Concurrent operations
  - Vector search latency
  - Hybrid query optimization
- Integration Tests
  - FDB + Qdrant coordination
  - Transaction consistency
  - Failover handling
  - Cache coherence
  - Real-time updates
9.2 Performance Benchmarks
| Test | Target | Method |
|---|---|---|
| Node creation | <10ms | Single operation |
| Pattern search | <100ms | 1M nodes |
| Graph traversal | <50ms | 3 hops |
| Cross-domain | <200ms | Multi-index |
9.3 Test Coverage Requirements
| Component | Unit | Integration | E2E |
|---|---|---|---|
| Graph Traits | ≥95% | ≥85% | ≥75% |
| Query Engine | ≥90% | ≥80% | ≥70% |
| Domain Impls | ≥85% | ≥75% | ≥65% |
| Storage Layer | ≥90% | ≥85% | ≥75% |
10. Security Considerations 🔴 REQUIRED
10.1 Multi-Tenant Isolation
Data segregation:
- Tenant ID prefix on all keys
- No cross-tenant queries
- Separate vector collections
- Audit all access
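The key-prefix and per-tenant collection rules can be enforced at a single choke point. The sketch below assumes keys are strings separated by `/` and that collections are named per tenant and domain; the `TenantScope` class and the `tenant__domain` naming scheme are hypothetical.

```python
class TenantScope:
    """Restricts key and collection access to a single tenant."""

    def __init__(self, tenant_id: str):
        if not tenant_id or "/" in tenant_id:
            raise ValueError("tenant_id must be non-empty and contain no '/'")
        self.tenant_id = tenant_id
        # The trailing slash matters: "acme/" must not match "acme-corp/...".
        self.prefix = f"{tenant_id}/"

    def owns(self, key: str) -> bool:
        return key.startswith(self.prefix)

    def require(self, key: str) -> str:
        """Return the key unchanged, or raise if it belongs to another tenant."""
        if not self.owns(key):
            raise PermissionError(f"key outside tenant {self.tenant_id!r}")
        return key

    def collection(self, domain: str) -> str:
        """Name of this tenant's vector collection for a domain."""
        return f"{self.tenant_id}__{domain}"
```

Routing every read and write through `require` also gives one place to emit the audit record.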
10.2 Privacy Protection
Cross-tenant learning:
- Aggregate patterns only
- No raw data sharing
- Differential privacy
- Opt-in mechanisms
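As an illustration of "aggregate patterns only" combined with differential privacy, the Laplace mechanism adds calibrated noise to a count before it leaves a tenant boundary. This ADR does not specify which mechanism will be used, so the sketch is an assumption; `noisy_count` and its parameters are hypothetical names.

```python
import random


def noisy_count(true_count: float, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

A smaller `epsilon` means more noise and stronger privacy; the right value is a policy decision outside the scope of this sketch.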
10.3 Threat Model
| Threat | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Data leakage | Low | Critical | Tenant isolation |
| Pattern inference | Medium | High | Privacy algorithms |
| Query injection | Low | High | Input validation |
| DoS attacks | Medium | High | Rate limiting |
11. Performance Characteristics 🔴 REQUIRED
11.1 Expected Metrics
| Operation | Target | Actual | Notes |
|---|---|---|---|
| Write throughput | 10k/sec | TBD | Per tenant |
| Read latency | <10ms | TBD | P99 |
| Search time | <100ms | TBD | 1M nodes |
| Memory usage | <8GB | TBD | Per service |
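Once measurements replace the TBD entries, the P99 read-latency target can be checked with a nearest-rank percentile. The sketch below is one common definition of a percentile, not necessarily the one the monitoring stack will use; the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_target(samples_ms: Sequence[float], target_ms: float, pct: int = 99) -> bool:
    """True if the pct-th percentile latency is strictly below the target."""
    return percentile(samples_ms, pct) < target_ms
```

For the read-latency row, the check would be `meets_target(latencies_ms, 10)`.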
11.2 Scalability
Horizontal scaling:
- FDB automatic sharding
- Qdrant cluster mode
- Stateless services
- Load balancing
Bottlenecks:
- Vector computation
- Complex traversals
- Real-time indexing
- Cross-region sync
12. Operational Considerations 🔴 REQUIRED
12.1 Monitoring
| Metric | Alert Threshold | Action |
|---|---|---|
| Query latency | >200ms | Check indices |
| Error rate | >1% | Review logs |
| Storage growth | >80% | Add capacity |
| Cross-domain | <10/min | Check discovery |
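The alert table can be expressed as data and evaluated in one pass. The thresholds and actions below are copied from the table; the metric key names (`query_latency_ms` and so on) and the `evaluate` function are hypothetical.

```python
import operator
from typing import Dict, List, Tuple

# metric name -> (comparison, threshold, action)
ALERT_RULES = {
    "query_latency_ms": (operator.gt, 200, "Check indices"),
    "error_rate_pct": (operator.gt, 1, "Review logs"),
    "storage_growth_pct": (operator.gt, 80, "Add capacity"),
    "cross_domain_per_min": (operator.lt, 10, "Check discovery"),
}


def evaluate(metrics: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (metric, action) for every rule whose threshold is breached."""
    triggered = []
    for name, (breached, threshold, action) in ALERT_RULES.items():
        if name in metrics and breached(metrics[name], threshold):
            triggered.append((name, action))
    return triggered
```

Note that the cross-domain rule fires when the rate drops below its threshold, unlike the other three.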
12.2 Maintenance
Regular tasks:
- Reindex vectors monthly
- Prune old relationships
- Optimize query patterns
- Update embeddings
- Backup critical graphs
12.3 Emergency Procedures
Backend failure:
- Automatic failover
- Degraded mode ops
- Queue updates
- Restore when available
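The "queue updates, restore when available" steps can be sketched as a writer that buffers items while a backend is unreachable and replays them in order once it returns. `BufferedWriter` is a hypothetical name, the `send` callable stands in for a real backend write, and a production version would need a durable queue rather than process memory.

```python
from collections import deque
from typing import Any, Callable


class BufferedWriter:
    """Buffers writes during an outage and replays them in order."""

    def __init__(self, send: Callable[[Any], None]):
        self.send = send
        self.pending = deque()

    def write(self, item: Any) -> bool:
        """Queue the item, then try to drain. Returns True if nothing is left pending."""
        self.pending.append(item)
        return self.flush()

    def flush(self) -> bool:
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False  # backend still down; keep items queued
            self.pending.popleft()  # remove only after a successful send
        return True
```

Because an item is removed only after `send` succeeds, a failure mid-replay never drops or reorders updates.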
13. Migration Strategy 🔴 REQUIRED
13.1 Phase 1: Foundation (Month 1)
- Deploy graph abstraction layer
- Configure FDB + Qdrant
- Implement core traits
- Setup monitoring
13.2 Phase 2: Log Intelligence (Month 2)
- Migrate log storage
- Build pattern matching
- Train initial models
- Launch beta
13.3 Phase 3: Prompt & Code (Month 3)
- Add prompt domain
- Implement code graph
- Cross-domain features
- Full production
13.4 Rollback Plan
If issues arise:
- Keep legacy systems
- Dual-write period
- Gradual migration
- Instant fallback
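A dual-write period with instant fallback can be sketched as a thin wrapper: the legacy store stays authoritative, the new store receives a copy, and a flag controls which one serves reads. All names here (`DualWriter`, `read_from_new`, `failed_keys`) are hypothetical, and dict-like objects stand in for the real stores.

```python
from typing import Any, List, MutableMapping


class DualWriter:
    """Writes to both stores; reads fall back to the legacy store."""

    def __init__(self, legacy: MutableMapping, new: MutableMapping,
                 read_from_new: bool = False):
        self.legacy = legacy
        self.new = new
        self.read_from_new = read_from_new  # set False again for instant fallback
        self.failed_keys: List[str] = []    # keys to backfill into the new store

    def write(self, key: str, value: Any) -> None:
        self.legacy[key] = value            # legacy remains the source of truth
        try:
            self.new[key] = value
        except Exception:
            # A new-store failure must not fail the request during migration.
            self.failed_keys.append(key)

    def read(self, key: str) -> Any:
        if self.read_from_new and key in self.new:
            return self.new[key]
        return self.legacy[key]
```

Gradual migration then amounts to enabling `read_from_new` for a growing share of traffic while `failed_keys` is backfilled.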
14. Consequences 🔴 REQUIRED
14.1 Positive Outcomes
✅ Unified intelligence:
- Cross-domain insights
- Knowledge preservation
- Pattern discovery
- Competitive advantage
✅ Developer productivity:
- 80% faster debugging
- 50% better AI results
- 70% fewer bugs
- Happier teams
✅ Business benefits:
- Reduced costs
- Faster delivery
- Better quality
- Customer satisfaction
14.2 Negative Impacts
⚠️ Increased complexity:
- Two storage systems
- New abstractions
- Learning curve
- Integration effort
⚠️ Operational overhead:
- More monitoring
- Dual backups
- Sync coordination
- Version management
⚠️ Migration effort:
- Data movement
- Team training
- Process changes
- Temporary duplication
15. References & Standards 🔴 REQUIRED
15.1 Related ADRs
- ADR-001-v4: Container architecture
- ADR-002-v4: Storage patterns
- ADR-006-v4: Data modeling
- ADR-022-v4: Log analysis patterns
15.2 External Standards
- FoundationDB Architecture: ACID guarantees
- Qdrant Documentation: Vector operations
- GraphQL Spec: Query language
- OpenTelemetry: Observability
15.3 Best Practices
- Domain-Driven Design: Bounded contexts
- Event Sourcing: State management
- CQRS Pattern: Read/write separation
16. Review & Approval 🔴 REQUIRED
Approval Signatures
| Role | Name | Date | Signature |
|---|---|---|---|
| CTO | _______ | _______ | ___________ |
| Chief Architect | _______ | _______ | ___________ |
| Data Science Lead | _______ | _______ | ___________ |
| Security Officer | _______ | _______ | ___________ |
Review History
| Version | Date | Reviewer | Status | Comments |
|---|---|---|---|---|
| 1.0.0 | 2025-08-30 | Initial | DRAFT | Original version |
| 2.0.0 | 2025-09-01 | SESSION4 | DRAFT | Complete v4.2 rewrite |
Approval Workflow
17. Appendix
17.1 Glossary
| Term | Definition |
|---|---|
| Hybrid Graph | System combining structural and semantic storage |
| Vector Embedding | Numeric representation of semantic meaning |
| Graph Traversal | Following relationships between nodes |
| Cross-Domain | Insights spanning multiple knowledge areas |
| ACID | Atomicity, Consistency, Isolation, Durability |
| Tenant Isolation | Complete data separation between customers |
| Semantic Search | Finding by meaning, not just keywords |
| Pattern Discovery | Automatic identification of recurring structures |
17.2 Performance Comparison
| Feature | Traditional | Hybrid Graph |
|---|---|---|
| Query Speed | O(n) scan | O(log n) + vectors |
| Relationships | Foreign keys | Native graph |
| Semantics | External NLP | Built-in vectors |
| Scale | Vertical | Horizontal |
| Consistency | ACID | ACID + eventual |
17.3 Implementation Timeline
18. QA Review Block
Status: AWAITING INDEPENDENT QA REVIEW
This section will be completed by an independent QA reviewer according to ADR-QA-REVIEW-GUIDE-v4.2.
Document ready for review as of: 2025-09-01
Version ready for review: 2.0.0
Next: See Part 2: Technical Implementation (to be created) for complete implementation details.