Skip to main content

ADR-003: Social Graph Storage Strategy

Status

Accepted

Context

The CRM needs to store and query complex relationship networks efficiently. We need to support shortest path queries, influence scoring, and community detection for millions of contacts while maintaining sub-second query performance.

Decision

Use PostgreSQL with recursive CTEs for graph storage, enhanced with Redis caching for hot paths and materialized views for analytics.

Storage Architecture

-- Primary storage in PostgreSQL
relationships (
from_contact_id,
to_contact_id,
strength,
relationship_type
)

-- Query with recursive CTEs
WITH RECURSIVE paths AS (...)

-- Cache hot paths in Redis
graph:path:{from}:{to} → JSON

Rationale

  1. PostgreSQL Maturity: Proven, ACID compliant, familiar
  2. Recursive CTEs: Native graph traversal support
  3. No Additional Database: Reduces operational complexity
  4. Good Enough Performance: <500ms for 3-degree searches
  5. Redis Caching: Solves hot path performance

Implementation Strategy

PostgreSQL Optimizations

-- Bidirectional index for fast traversal
CREATE INDEX idx_relationships_bidirectional ON relationships(from_contact_id, to_contact_id);

-- Covering index for common queries
CREATE INDEX idx_relationships_strength ON relationships(from_contact_id, to_contact_id) INCLUDE (strength);

-- Materialized view for influence scores
CREATE MATERIALIZED VIEW contact_influence AS
WITH RECURSIVE influence_calc AS (...);

Redis Cache Structure

# Path cache (TTL: 1 hour)
graph:path:{user}:{from}:{to} → {"nodes": [...], "distance": 3}

# Network stats (TTL: 30 minutes)
graph:stats:{user}:{contact} → {"degree": 156, "influence": 0.73}

# Hot contacts (TTL: 5 minutes)
graph:hot:{user} → [contact_ids]

Consequences

Positive

  • Single database technology
  • ACID compliance maintained
  • Familiar SQL queries
  • Built-in backup/recovery
  • Cost effective

Negative

  • Performance limits at 10M+ edges
  • Complex recursive queries
  • No native graph algorithms
  • Cache invalidation complexity

Migration Path

If performance becomes an issue:

  1. Phase 1: Current PostgreSQL + Redis
  2. Phase 2: Add read-only Neo4j replica
  3. Phase 3: Gradual migration to Neo4j
  4. Phase 4: PostgreSQL for OLTP, Neo4j for OLAP

Alternatives Considered

  1. Neo4j: Rejected - operational complexity for MVP
  2. DGraph: Rejected - immature ecosystem
  3. Amazon Neptune: Rejected - vendor lock-in
  4. Pure Redis: Rejected - no durability
  5. FoundationDB: Rejected - overkill for graph data