
ADR-021-v4: Unified Hybrid Graph System (Part 1: Narrative)

Document: ADR-021-v4-unified-hybrid-graph-system-part1-narrative
Version: 2.0.0
Purpose: Define the unified hybrid graph architecture that combines FoundationDB and Qdrant, explained for human readers
Audience: Business stakeholders, product managers, developers
Date Created: 2025-08-30
Date Modified: 2025-09-01
QA Reviewed: Pending
Status: DRAFT
Supersedes: v1.0.0


1. Document Information 🔴 REQUIRED

Field | Value
ADR Number | ADR-021
Title | Unified Hybrid Graph System
Status | Draft
Date Created | 2025-08-30
Last Modified | 2025-09-01
Version | 2.0.0
Decision Makers | CTO, Chief Architect, Data Science Lead
Stakeholders | All CODITECT teams, customers, AI agents

2. Purpose of this ADR 🔴 REQUIRED

This ADR serves dual purposes:

  • For Humans 👥: Understand how CODITECT unifies diverse knowledge domains through a hybrid graph system
  • For AI Agents 🤖: Implement a universal graph abstraction combining FoundationDB's ACID properties with Qdrant's semantic search

3. User Story Context 🔴 REQUIRED

As a developer debugging production issues,
I want intelligent log analysis that finds patterns across time and services,
So that I can resolve issues 80% faster with proven solutions.

As an AI engineer building prompts,
I want access to proven prompt patterns that work,
So that I can achieve 50% better results without reinventing the wheel.

As a code reviewer,
I want intelligent code analysis that shows hidden dependencies,
So that I can prevent 70% of refactoring-related bugs.

📋 Acceptance Criteria:

  • Single API for all graph operations across domains
  • Sub-100ms query response for common patterns
  • Cross-domain intelligence discovery
  • Multi-tenant data isolation
  • Semantic and structural queries in one system
  • Real-time graph updates with ACID guarantees
  • Privacy-preserving cross-tenant learning

4. Executive Summary 🔴 REQUIRED

🏢 For Business Stakeholders

Imagine your company's knowledge as a vast library where:

  • Books (data) are perfectly organized on shelves (FoundationDB)
  • A librarian with perfect memory (Qdrant) can find any book by meaning, not just by title
  • All books are connected by invisible threads showing relationships
  • Every reader's insights make the library smarter for everyone

This is CODITECT's Unified Hybrid Graph System, which combines the best of structured databases with AI-powered understanding.

Business Value:

  • 80% faster issue resolution through intelligent log analysis
  • 50% better AI ROI through prompt intelligence
  • 70% fewer bugs through code intelligence
  • 100% knowledge retention - nothing learned is ever lost

Key Decision: Implement a hybrid architecture using FoundationDB for structure and Qdrant for semantics.

💻 For Technical Readers

Technical Summary: Generic graph trait system allowing any domain (logs, prompts, code, etc.) to leverage both FoundationDB's ACID transactions and Qdrant's vector similarity search through a unified API.
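A generic trait system of this kind might look roughly like the following Rust sketch. It is illustrative only: the names `GraphNode`, `InMemoryGraph` and `LogEntry` are hypothetical, the API is synchronous rather than async, and in-memory maps stand in for the FoundationDB and Qdrant clients.

```rust
use std::collections::HashMap;

/// Hypothetical trait a domain type implements to join the shared graph layer.
pub trait GraphNode {
    /// Domain name used as a key segment, e.g. "logs".
    fn domain() -> &'static str;
    /// Identifier that is unique within the domain.
    fn id(&self) -> String;
    /// Vector that a semantic store such as Qdrant would index.
    fn embedding(&self) -> Vec<f32>;
}

/// In-memory stand-in for both backends: `records` plays FoundationDB
/// (structure), `vectors` plays Qdrant (semantics), `edges` holds links.
#[derive(Default)]
pub struct InMemoryGraph {
    records: HashMap<String, String>,
    vectors: HashMap<String, Vec<f32>>,
    edges: HashMap<String, Vec<String>>,
}

impl InMemoryGraph {
    /// Writes a node to both stand-in backends under a tenant-prefixed key.
    pub fn insert<N: GraphNode>(&mut self, tenant: &str, node: &N, payload: &str) -> String {
        let key = format!("{}/{}/{}", tenant, N::domain(), node.id());
        self.records.insert(key.clone(), payload.to_string());
        self.vectors.insert(key.clone(), node.embedding());
        key
    }

    /// Returns the stored payload for a key, if any.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.records.get(key).map(|s| s.as_str())
    }

    /// Returns the stored embedding for a key, if any.
    pub fn embedding_of(&self, key: &str) -> Option<&[f32]> {
        self.vectors.get(key).map(|v| v.as_slice())
    }

    /// Records a directed relationship between two stored keys.
    pub fn link(&mut self, from: &str, to: &str) {
        self.edges.entry(from.to_string()).or_default().push(to.to_string());
    }

    /// Lists the keys directly linked from `key`.
    pub fn neighbors(&self, key: &str) -> &[String] {
        self.edges.get(key).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Example domain type: a log entry with a toy embedding.
pub struct LogEntry {
    pub id: String,
    pub embedding: Vec<f32>,
}

impl GraphNode for LogEntry {
    fn domain() -> &'static str {
        "logs"
    }
    fn id(&self) -> String {
        self.id.clone()
    }
    fn embedding(&self) -> Vec<f32> {
        self.embedding.clone()
    }
}
```

A domain joins the system by implementing the trait. The shared layer then builds the tenant-prefixed key and indexes the embedding without needing any other knowledge of the domain.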


5. Visual Overview 🔴 REQUIRED

5.1 Hybrid Architecture

5.2 Cross-Domain Intelligence Flow

5.3 Data Flow Architecture


6. Background & Problem 🔴 REQUIRED

6.1 Business Context

Why this matters:

  • Knowledge Silos: Each domain (logs, code, prompts) uses different storage systems
  • Lost Insights: Valuable patterns hidden because systems don't talk to each other
  • Duplicate Effort: Teams solve the same problems repeatedly
  • Scaling Costs: Multiple specialized databases increase complexity and cost

User impact:

  • Developers waste 40% of their time on manual correlation
  • AI engineers rebuild identical patterns
  • Code reviewers miss critical dependencies
  • Operations teams react instead of predict

Cost of inaction:

  • $2.5M/year in lost developer productivity
  • 23% of bugs from missed dependencies
  • 60% redundant AI prompt development
  • Customer churn from slow issue resolution

6.2 Technical Context

Current state in industry:

  • Graph databases: Neo4j (relationships first, limited vector support)
  • Vector databases: Pinecone (semantic similarity, no graph structure)
  • Hybrid attempts: Complex ETL pipelines
  • Result: Fragmented, inconsistent, expensive

Limitations:

  • Can't ask "find errors semantically similar to this across all time"
  • Can't track how solutions propagate through systems
  • Can't learn from patterns across domains
  • Can't maintain consistency across stores

6.3 Constraints

Type | Constraint | Impact
Time | 3-month implementation | Phased rollout by domain
💰 Budget | Use existing FDB + Qdrant | No new infrastructure
👥 Resources | Current team only | Leverage trait patterns
🔧 Technical | Multi-tenant isolation | Tenant-aware design
📜 Compliance | Data residency rules | Region-aware storage


7. Decision 🔴 REQUIRED

7.1 Y-Statement Format

In the context of managing diverse knowledge domains at scale,
facing fragmented storage systems and lost cross-domain insights,
we decided for a unified hybrid graph system with FoundationDB + Qdrant
and neglected separate specialized databases and complex ETL pipelines,
to achieve 80% faster insights and 50% cost reduction,
accepting initial implementation complexity and learning curve,
because unified intelligence drives competitive advantage.

7.2 What We're Doing

Implementing a comprehensive hybrid graph system:

  1. Universal Graph Abstraction

    • Single API for all domains
    • Generic trait system
    • Type-safe operations
    • Async-first design
  2. Dual Storage Backend

    • FoundationDB: Structure, ACID, time-series
    • Qdrant: Vectors, similarity, semantics
    • Unified query engine
    • Automatic data routing
  3. Domain Implementations

    • Log Intelligence System
    • Prompt Intelligence Engine
    • Code Intelligence Graph
    • Extensible to any domain
  4. Cross-Domain Features

    • Pattern discovery across domains
    • Knowledge transfer
    • Privacy-preserving learning
    • Real-time updates
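One way to model the unified query engine and automatic data routing is a small routing function. The following is a minimal sketch. The query shapes (`Query::Structural`, `Query::Semantic`, `Query::Hybrid`) are hypothetical, and the sketch also assumes something this ADR does not state: that FoundationDB is the system of record and confirms candidates returned by vector search.

```rust
/// Hypothetical query shapes the unified query engine might accept.
#[derive(Debug)]
pub enum Query {
    /// Exact lookups and range scans over tenant-prefixed keys.
    Structural { key_prefix: String },
    /// Nearest-neighbor search over embeddings.
    Semantic { vector: Vec<f32>, top_k: usize },
    /// Vector search constrained by a structural filter.
    Hybrid { key_prefix: String, vector: Vec<f32>, top_k: usize },
}

/// Backends a query can be routed to.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Backend {
    FoundationDb,
    Qdrant,
}

/// Decides which backend(s) must be consulted, in order.
pub fn route(query: &Query) -> Vec<Backend> {
    match query {
        Query::Structural { .. } => vec![Backend::FoundationDb],
        Query::Semantic { .. } => vec![Backend::Qdrant],
        // Assumption: fetch vector candidates first, then confirm and
        // hydrate them from FoundationDB as the system of record.
        Query::Hybrid { .. } => vec![Backend::Qdrant, Backend::FoundationDb],
    }
}
```

Because routing depends only on the query shape, it is easy to test, and a new domain that adds a query type only needs to extend the match.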

7.3 Why This Approach

Hybrid architecture benefits:

  • Best of both worlds (structure + semantics)
  • No compromises on ACID or vectors
  • Single source of truth
  • Consistent operations

Generic trait system advantages:

  • Any domain can plug in
  • Consistent behavior
  • Code reuse
  • Type safety

This approach balances:

  • Performance vs flexibility
  • Consistency vs distribution
  • Simplicity vs power
  • Current vs future needs

7.4 Alternatives Considered 🟡 OPTIONAL

Option A: Separate Specialized Databases

Aspect | Details
Description | Different database for each domain
✅ Pros | Best tool for each job; independent scaling; proven solutions
❌ Cons | Complex synchronization; no cross-domain insights; high operational cost
Rejection Reason | Perpetuates silos, prevents unified intelligence

Option B: Single Graph Database

Aspect | Details
Description | Use Neo4j or a similar graph database for everything
✅ Pros | Single system; good relationship modeling; mature tooling
❌ Cons | Limited vector operations; limited semantic search; performance at scale
Rejection Reason | Can't handle AI/semantic requirements


8. Implementation Blueprint 🔴 REQUIRED

8.1 Architecture Overview

The system consists of three layers:

Application Layer: Domain-specific implementations (logs, prompts, code)
Graph Abstraction Layer: Universal API and trait system
Storage Layer: FoundationDB + Qdrant with unified query engine

8.2 Use Case 1: Log Intelligence System

Problem: Teams waste 40% of debugging time manually correlating errors

Solution Architecture:

  • FoundationDB stores: {tenant_id}/logs/{timestamp}/{log_id}
  • Qdrant indexes: Error embeddings, stack traces, patterns
  • Graph links: Related errors, solutions, propagation chains

Value: 80% faster issue resolution with proven solutions
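The key layout `{tenant_id}/logs/{timestamp}/{log_id}` supports time-range scans only if keys sort in chronological order. FoundationDB orders keys by their raw bytes, so the timestamp must have a fixed width. FoundationDB's tuple layer is the usual way to get order-preserving encodings. For readability, the sketch below uses plain strings instead and rests on several assumptions: the helper names are hypothetical, timestamps are Unix epoch milliseconds, and IDs contain no '/'.

```rust
/// Builds a key following `{tenant_id}/logs/{timestamp}/{log_id}`.
/// The timestamp is zero-padded to 20 digits (enough for any u64) so that
/// byte-wise key order matches chronological order.
pub fn log_key(tenant_id: &str, timestamp_ms: u64, log_id: &str) -> String {
    format!("{}/logs/{:020}/{}", tenant_id, timestamp_ms, log_id)
}

/// Returns the half-open key range [start, end) covering one tenant's logs
/// with timestamps in [from_ms, to_ms), suitable for a range read.
pub fn log_range(tenant_id: &str, from_ms: u64, to_ms: u64) -> (String, String) {
    (
        format!("{}/logs/{:020}", tenant_id, from_ms),
        format!("{}/logs/{:020}", tenant_id, to_ms),
    )
}
```

With this layout, a single range read returns exactly one tenant's logs for a given time window.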

8.3 Use Case 2: Prompt Intelligence Engine

Problem: AI engineers rebuild identical patterns, wasting 60% of time

Solution Architecture:

  • FoundationDB stores: {tenant_id}/prompts/{domain}/{technique_id}
  • Qdrant indexes: Prompt embeddings, intent vectors
  • Graph links: Successful techniques, evolution chains

Value: 50% better AI ROI through pattern reuse
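At its core, finding prompt patterns that already work is a nearest-neighbor search over prompt embeddings. To make the idea visible, the sketch below ranks an in-memory list exactly, using brute-force cosine similarity. Qdrant would answer the same question approximately, using an HNSW index with its Cosine distance. The function names and the `(id, embedding)` layout are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the ids of the `k` stored prompts most similar to `query`, best first.
pub fn top_k<'a>(query: &[f32], prompts: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = prompts
        .iter()
        .map(|(id, v)| (cosine(query, v), id.as_str()))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order over f32.
    scored.sort_by(|x, y| y.0.total_cmp(&x.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

Brute force costs O(n) per query, so the 1M-node search target relies on an index rather than on a loop like this one.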

8.4 Use Case 3: Code Intelligence Graph

Problem: Developers spend 75% of time understanding existing code

Solution Architecture:

  • FoundationDB stores: {tenant_id}/code/{commit}/{file_path}
  • Qdrant indexes: Code embeddings, documentation vectors
  • Graph links: Dependencies, refactoring impacts, review insights

Value: 70% fewer bugs through dependency awareness
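Dependency awareness reduces to a bounded traversal of the reverse import graph: when a file changes, which files import it, directly or transitively? The following is a minimal sketch. It assumes a hypothetical `depends_on` map from each file to the files it imports. The hop limit mirrors the 3-hop traversal benchmark in section 9.2.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every file that transitively depends on `changed`, at most
/// `max_hops` edges away, sorted by name.
pub fn impacted_files(
    depends_on: &HashMap<String, Vec<String>>,
    changed: &str,
    max_hops: usize,
) -> Vec<String> {
    // Invert the edges: for each file, which files import it?
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (file, deps) in depends_on {
        for dep in deps {
            dependents.entry(dep.as_str()).or_default().push(file.as_str());
        }
    }
    // Breadth-first search from the changed file, bounded by max_hops.
    let mut seen: HashSet<&str> = HashSet::from([changed]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(changed, 0)]);
    let mut impacted = Vec::new();
    while let Some((file, hops)) = queue.pop_front() {
        if hops == max_hops {
            continue;
        }
        for &next in dependents.get(file).map(|v| v.as_slice()).unwrap_or(&[]) {
            if seen.insert(next) {
                impacted.push(next.to_string());
                queue.push_back((next, hops + 1));
            }
        }
    }
    impacted.sort();
    impacted
}
```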

8.5 Configuration

All configuration is managed through environment variables:

  • FDB cluster configuration
  • Qdrant cloud credentials
  • Domain-specific settings
  • Performance tuning parameters

Details will be provided in Part 2 (to be created).
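A configuration loader along these lines could read the settings listed above. This ADR does not define the variable names, so `CODITECT_FDB_CLUSTER_FILE`, `CODITECT_QDRANT_URL`, `CODITECT_QDRANT_API_KEY` and `CODITECT_QUERY_TIMEOUT_MS` are illustrative assumptions. The 100 ms default timeout is also an assumption.

```rust
use std::time::Duration;

/// Hypothetical settings for the hybrid graph service.
#[derive(Debug, PartialEq)]
pub struct GraphConfig {
    pub fdb_cluster_file: String,
    pub qdrant_url: String,
    pub qdrant_api_key: Option<String>,
    pub query_timeout: Duration,
}

impl GraphConfig {
    /// Reads settings through `get`, so tests can inject values instead of
    /// touching the real process environment.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let required = |name: &str| get(name).ok_or_else(|| format!("missing {name}"));
        let timeout_ms = match get("CODITECT_QUERY_TIMEOUT_MS") {
            Some(raw) => raw
                .parse::<u64>()
                .map_err(|e| format!("bad CODITECT_QUERY_TIMEOUT_MS: {e}"))?,
            None => 100, // assumed default, matching the sub-100ms query target
        };
        Ok(Self {
            fdb_cluster_file: required("CODITECT_FDB_CLUSTER_FILE")?,
            qdrant_url: required("CODITECT_QDRANT_URL")?,
            qdrant_api_key: get("CODITECT_QDRANT_API_KEY"),
            query_timeout: Duration::from_millis(timeout_ms),
        })
    }

    /// Reads settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

Taking a lookup function, rather than reading `std::env` directly, keeps the parser testable without changing the process environment.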

8.6 API Endpoints

Endpoint | Method | Purpose
/graph/{domain}/nodes | POST | Create node
/graph/{domain}/search | POST | Hybrid search
/graph/{domain}/traverse | POST | Graph traversal
/graph/{domain}/insights | GET | Cross-domain patterns
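The endpoint table maps cleanly onto a small router. The sketch below makes two assumptions: the domains are fixed to the three use cases in section 8 (via a hypothetical `DOMAINS` constant), and any method/path combination not in the table is rejected.

```rust
/// Operations exposed under `/graph/{domain}/...`.
#[derive(Debug, PartialEq, Eq)]
pub enum GraphOp {
    CreateNode, // POST /graph/{domain}/nodes
    Search,     // POST /graph/{domain}/search
    Traverse,   // POST /graph/{domain}/traverse
    Insights,   // GET  /graph/{domain}/insights
}

/// Hypothetical fixed domain list; a real service might load it from a registry.
const DOMAINS: [&str; 3] = ["logs", "prompts", "code"];

/// Maps an HTTP method and path to (domain, operation), or None if the
/// request matches none of the documented endpoints.
pub fn parse_route<'a>(method: &str, path: &'a str) -> Option<(&'a str, GraphOp)> {
    let mut parts = path.trim_matches('/').split('/');
    let (root, domain, action) = (parts.next()?, parts.next()?, parts.next()?);
    let known_domain = DOMAINS.iter().any(|d| *d == domain);
    if root != "graph" || parts.next().is_some() || !known_domain {
        return None;
    }
    let op = match (method, action) {
        ("POST", "nodes") => GraphOp::CreateNode,
        ("POST", "search") => GraphOp::Search,
        ("POST", "traverse") => GraphOp::Traverse,
        ("GET", "insights") => GraphOp::Insights,
        _ => return None,
    };
    Some((domain, op))
}
```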

8.7 Logging Requirements

All operations are logged with:

  • Domain context
  • Query performance
  • Cross-domain discoveries
  • Error conditions
  • Audit trail

Detailed logging patterns will be provided in Part 2 (to be created).

8.8 Error Handling

Graph errors must:

  • Preserve partial results
  • Indicate which backend failed
  • Provide fallback options
  • Log detailed diagnostics

Implementation details will be provided in Part 2 (to be created).
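The four rules above suggest a result type that carries partial results together with per-backend failures. The following is a minimal sketch with hypothetical names (`HybridOutcome`, `BackendError`, `merge`). Writing to stderr stands in for the structured diagnostics described in section 8.7.

```rust
/// Backends that can contribute to a hybrid query.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    FoundationDb,
    Qdrant,
}

/// Which backend failed, and why.
#[derive(Debug, Clone, PartialEq)]
pub struct BackendError {
    pub backend: Backend,
    pub message: String,
}

/// Outcome of a hybrid query: whatever results were obtained, plus the
/// failures of any backend that could not contribute.
#[derive(Debug)]
pub struct HybridOutcome<T> {
    pub results: Vec<T>,
    pub failures: Vec<BackendError>,
}

impl<T> HybridOutcome<T> {
    /// True when at least one backend failed but partial results may remain.
    pub fn is_degraded(&self) -> bool {
        !self.failures.is_empty()
    }
}

/// Combines per-backend results, keeping successes even if another backend failed.
pub fn merge<T>(parts: Vec<(Backend, Result<Vec<T>, String>)>) -> HybridOutcome<T> {
    let mut outcome = HybridOutcome { results: Vec::new(), failures: Vec::new() };
    for (backend, part) in parts {
        match part {
            Ok(items) => outcome.results.extend(items),
            Err(message) => {
                // Stand-in for a structured diagnostic log entry.
                eprintln!("graph backend {backend:?} failed: {message}");
                outcome.failures.push(BackendError { backend, message });
            }
        }
    }
    outcome
}
```

A caller can check `is_degraded()` and decide whether to serve the partial results, retry, or fall back. This keeps it explicit which backend failed.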


9. Testing Strategy 🔴 REQUIRED

9.1 Test Scenarios

  1. Domain Tests

    • Log pattern matching
    • Prompt similarity search
    • Code dependency traversal
    • Cross-domain discovery
    • Multi-tenant isolation
  2. Performance Tests

    • Insertion of 100k nodes
    • Million-node traversal
    • Concurrent operations
    • Vector search latency
    • Hybrid query optimization
  3. Integration Tests

    • FDB + Qdrant coordination
    • Transaction consistency
    • Failover handling
    • Cache coherence
    • Real-time updates

9.2 Performance Benchmarks

Test | Target | Conditions
Node creation | <10ms | Single operation
Pattern search | <100ms | 1M nodes
Graph traversal | <50ms | 3 hops
Cross-domain | <200ms | Multi-index

9.3 Test Coverage Requirements

Component | Unit | Integration | E2E
Graph Traits | ≥95% | ≥85% | ≥75%
Query Engine | ≥90% | ≥80% | ≥70%
Domain Impls | ≥85% | ≥75% | ≥65%
Storage Layer | ≥90% | ≥85% | ≥75%


10. Security Considerations 🔴 REQUIRED

10.1 Multi-Tenant Isolation

Data segregation:

  • Tenant ID prefix on all keys
  • No cross-tenant queries
  • Separate vector collections
  • Audit all access
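A key-prefix check applied before every read and write can enforce tenant isolation. The sketch below assumes tenant IDs never contain '/', and its helper names are hypothetical. Note that the trailing '/' in the prefix matters: without it, tenant `t1` could read keys that belong to `t10`. `collection_name` shows one possible naming scheme for separate vector collections; it is not a documented convention.

```rust
/// Rejects any key that does not live under the caller's tenant prefix.
pub fn authorize_key<'k>(tenant_id: &str, key: &'k str) -> Result<&'k str, String> {
    if tenant_id.is_empty() || tenant_id.contains('/') {
        return Err(format!("invalid tenant id {tenant_id:?}"));
    }
    // Include the separator so "t1" does not match "t10/...".
    let prefix = format!("{tenant_id}/");
    if key.starts_with(prefix.as_str()) {
        Ok(key)
    } else {
        Err(format!("key {key:?} is outside tenant {tenant_id:?}"))
    }
}

/// Hypothetical per-tenant, per-domain vector collection name.
pub fn collection_name(tenant_id: &str, domain: &str) -> String {
    format!("{tenant_id}__{domain}")
}
```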

10.2 Privacy Protection

Cross-tenant learning:

  • Aggregate patterns only
  • No raw data sharing
  • Differential privacy
  • Opt-in mechanisms
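Differential privacy for aggregate patterns is commonly achieved with the Laplace mechanism. It adds noise drawn from Laplace(0, Δf/ε) to a numeric aggregate whose sensitivity is Δf. The sketch below uses hypothetical function names and samples that noise by inverse CDF from a caller-supplied uniform value `u`, which keeps it deterministic. A production system should instead use a vetted DP library and a secure random source.

```rust
/// Laplace noise for an epsilon-differentially-private release of an aggregate.
/// `u` must be a uniform sample in (0, 1), supplied by the caller.
pub fn laplace_noise(u: f64, sensitivity: f64, epsilon: f64) -> f64 {
    assert!(u > 0.0 && u < 1.0, "u must lie in (0, 1)");
    assert!(sensitivity > 0.0 && epsilon > 0.0);
    let scale = sensitivity / epsilon; // b = sensitivity / epsilon
    let centered = u - 0.5;
    // Inverse CDF of Laplace(0, b): -b * sgn(u - 0.5) * ln(1 - 2|u - 0.5|)
    -scale * centered.signum() * (1.0 - 2.0 * centered.abs()).ln()
}

/// Releases a count of tenants exhibiting a pattern. Adding or removing one
/// tenant changes the count by at most 1, so the sensitivity is 1.
pub fn private_count(true_count: u64, epsilon: f64, u: f64) -> f64 {
    true_count as f64 + laplace_noise(u, 1.0, epsilon)
}
```

A smaller ε gives stronger privacy and more noise: halving ε doubles the noise scale.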

10.3 Threat Model

Threat | Likelihood | Impact | Mitigation
Data leakage | Low | Critical | Tenant isolation
Pattern inference | Medium | High | Privacy algorithms
Query injection | Low | High | Input validation
DoS attacks | Medium | High | Rate limiting
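The rate-limiting mitigation for DoS attacks is often implemented as a token bucket. The following is a minimal sketch. It assumes one bucket per tenant, a policy this ADR does not specify, and takes explicit timestamps in seconds so its behavior is deterministic.

```rust
/// Token-bucket rate limiter (hypothetical; one instance per tenant assumed).
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill: f64,
}

impl TokenBucket {
    /// Starts full, holding `capacity` tokens at time `now` (seconds).
    pub fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        Self { capacity, refill_per_sec, tokens: capacity, last_refill: now }
    }

    /// Refills by elapsed time, then consumes one token if available.
    pub fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```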


11. Performance Characteristics 🔴 REQUIRED

11.1 Expected Metrics

Operation | Target | Actual | Notes
Write throughput | 10k/sec | TBD | Per tenant
Read latency | <10ms | TBD | P99
Search time | <100ms | TBD | 1M nodes
Memory usage | <8GB | TBD | Per service
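The P99 note on read latency implies a percentile calculation over measured samples. The sketch below uses the nearest-rank method, which is one common definition, with hypothetical function names. Interpolating definitions also exist and can give slightly different values on small samples.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`% of
/// samples are less than or equal to it. None for empty input or p outside (0, 100].
pub fn percentile(samples_ms: &[f64], p: f64) -> Option<f64> {
    if samples_ms.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(p/100 * n), converted to a zero-based index below.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Checks a batch of read latencies against the <10ms P99 target.
pub fn meets_read_target(samples_ms: &[f64]) -> bool {
    percentile(samples_ms, 99.0).map_or(false, |p99| p99 < 10.0)
}
```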

11.2 Scalability

Horizontal scaling:

  • FDB automatic sharding
  • Qdrant cluster mode
  • Stateless services
  • Load balancing

Bottlenecks:

  • Vector computation
  • Complex traversals
  • Real-time indexing
  • Cross-region sync


12. Operational Considerations 🔴 REQUIRED

12.1 Monitoring

Metric | Alert Threshold | Action
Query latency | >200ms | Check indices
Error rate | >1% | Review logs
Storage usage | >80% of capacity | Add capacity
Cross-domain discoveries | <10/min | Check discovery pipeline
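The alert thresholds above can be evaluated as a list of rules over a metrics snapshot. The sketch below uses hypothetical field names and assumes two things: the storage threshold means percent of capacity used, and discoveries are counted per minute.

```rust
/// One observation window of the monitored metrics (hypothetical fields).
pub struct Snapshot {
    pub query_latency_ms: f64,
    pub error_rate_pct: f64,
    pub storage_used_pct: f64,
    pub cross_domain_per_min: f64,
}

/// Returns the recommended action for every threshold the snapshot breaches.
pub fn alerts(s: &Snapshot) -> Vec<&'static str> {
    let rules: [(bool, &'static str); 4] = [
        (s.query_latency_ms > 200.0, "Query latency: check indices"),
        (s.error_rate_pct > 1.0, "Error rate: review logs"),
        (s.storage_used_pct > 80.0, "Storage: add capacity"),
        (s.cross_domain_per_min < 10.0, "Cross-domain discoveries: check discovery pipeline"),
    ];
    rules.iter().filter(|(hit, _)| *hit).map(|(_, action)| *action).collect()
}
```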

12.2 Maintenance

Regular tasks:

  • Reindex vectors monthly
  • Prune old relationships
  • Optimize query patterns
  • Update embeddings
  • Backup critical graphs

12.3 Emergency Procedures

Backend failure:

  1. Automatic failover
  2. Operate in degraded mode
  3. Queue updates
  4. Restore normal operation once the backend is available
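The degraded-mode, queue and restore steps can be sketched as a writer that buffers updates while a backend is failing and replays them in order once the backend recovers. `DegradableWriter` and its methods are hypothetical. Failover itself is left to the backends.

```rust
use std::collections::VecDeque;

/// Hypothetical write path for one backend that queues updates while the
/// backend is failing and replays them in order once it recovers.
pub struct DegradableWriter<W: FnMut(&str) -> Result<(), String>> {
    write: W,
    pending: VecDeque<String>,
    degraded: bool,
}

impl<W: FnMut(&str) -> Result<(), String>> DegradableWriter<W> {
    pub fn new(write: W) -> Self {
        Self { write, pending: VecDeque::new(), degraded: false }
    }

    /// Attempts a write; on failure, enters degraded mode and queues the update.
    pub fn submit(&mut self, update: String) {
        if !self.degraded {
            match (self.write)(update.as_str()) {
                Ok(()) => return,
                Err(_) => self.degraded = true,
            }
        }
        self.pending.push_back(update);
    }

    /// Replays queued updates in order; stops at the first failure and keeps the rest.
    pub fn restore(&mut self) -> usize {
        let mut replayed = 0;
        while let Some(update) = self.pending.front() {
            if (self.write)(update.as_str()).is_err() {
                return replayed;
            }
            self.pending.pop_front();
            replayed += 1;
        }
        self.degraded = false;
        replayed
    }

    pub fn is_degraded(&self) -> bool {
        self.degraded
    }

    pub fn pending(&self) -> usize {
        self.pending.len()
    }
}
```

Once the writer is degraded, new updates are queued even if the backend has already recovered. This guarantees that replay preserves the original write order.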


13. Migration Strategy 🔴 REQUIRED

13.1 Phase 1: Foundation (Month 1)

  • Deploy graph abstraction layer
  • Configure FDB + Qdrant
  • Implement core traits
  • Set up monitoring

13.2 Phase 2: Log Intelligence (Month 2)

  • Migrate log storage
  • Build pattern matching
  • Train initial models
  • Launch beta

13.3 Phase 3: Prompt & Code (Month 3)

  • Add prompt domain
  • Implement code graph
  • Cross-domain features
  • Full production

13.4 Rollback Plan

If issues arise:

  1. Keep legacy systems
  2. Dual-write period
  3. Gradual migration
  4. Instant fallback


14. Consequences 🔴 REQUIRED

14.1 Positive Outcomes

Unified intelligence:

  • Cross-domain insights
  • Knowledge preservation
  • Pattern discovery
  • Competitive advantage

Developer productivity:

  • 80% faster debugging
  • 50% better AI results
  • 70% fewer bugs
  • Happier teams

Business benefits:

  • Reduced costs
  • Faster delivery
  • Better quality
  • Customer satisfaction

14.2 Negative Impacts

⚠️ Increased complexity:

  • Two storage systems
  • New abstractions
  • Learning curve
  • Integration effort

⚠️ Operational overhead:

  • More monitoring
  • Dual backups
  • Sync coordination
  • Version management

⚠️ Migration effort:

  • Data movement
  • Team training
  • Process changes
  • Temporary duplication


15. References & Standards 🔴 REQUIRED

15.2 External Standards

15.3 Best Practices


16. Review & Approval 🔴 REQUIRED

Approval Signatures

Role | Name | Date | Signature
CTO | ________ | ________ | ________
Chief Architect | ________ | ________ | ________
Data Science Lead | ________ | ________ | ________
Security Officer | ________ | ________ | ________

Review History

Version | Date | Reviewer | Status | Comments
1.0.0 | 2025-08-30 | Initial | DRAFT | Original version
2.0.0 | 2025-09-01 | SESSION4 | DRAFT | Complete v4.2 rewrite

Approval Workflow


17. Appendix

17.1 Glossary

Term | Definition
Hybrid Graph | System combining structural and semantic storage
Vector Embedding | Numeric representation of semantic meaning
Graph Traversal | Following relationships between nodes
Cross-Domain | Insights spanning multiple knowledge areas
ACID | Atomicity, Consistency, Isolation, Durability
Tenant Isolation | Complete data separation between customers
Semantic Search | Finding by meaning, not just keywords
Pattern Discovery | Automatic identification of recurring structures

17.2 Performance Comparison

Feature | Traditional | Hybrid Graph
Query Speed | O(n) scan | O(log n) + vectors
Relationships | Foreign keys | Native graph
Semantics | External NLP | Built-in vectors
Scale | Vertical | Horizontal
Consistency | ACID | ACID + eventual

17.3 Implementation Timeline


18. QA Review Block

Status: AWAITING INDEPENDENT QA REVIEW

This section will be completed by an independent QA reviewer according to ADR-QA-REVIEW-GUIDE-v4.2.

Document ready for review as of: 2025-09-01
Version ready for review: 2.0.0


Next: Part 2: Technical Implementation (to be created) will provide complete implementation details.