
ADR-043: Bidirectional Context Memory Sync Architecture

Status: SUPERSEDED (previously Accepted)
Date: 2025-12-29
Superseded By: ADR-044 - Custom REST Sync Architecture
Author: Hal Casteel
Stakeholders: Platform Engineering, Backend, Frontend, DevOps

⚠️ SUPERSESSION NOTICE (2025-12-29)

This ADR has been superseded by ADR-044 due to FSL license concerns.

Key Finding: PowerSync uses the Functional Source License (FSL) which includes a "Competing Use" restriction. JourneyApps (PowerSync owner) offers a mobile app builder with offline sync - potentially in competition with CODITECT's development platform.

Legal Risk: Using PowerSync in CODITECT SaaS could constitute license violation.

Replacement Architecture: Custom REST API sync (see ADR-044)

  • Same SQLite client-side storage
  • Django/FastAPI REST endpoints for push/pull
  • PostgreSQL as source of truth
  • Zero third-party sync dependencies

5-Year TCO Savings: $12,900 (Custom REST $41,400 vs PowerSync $54,300)


Context

CODITECT manages AI session context memory across multiple machines, teams, and tenants. The context memory originates from Claude Code sessions stored locally as JSONL files, processed into a local SQLite database, and must be synchronized bidirectionally with a central PostgreSQL database that serves as the primary record of truth.

The Problem

Users run Claude Code on multiple machines (desktop, laptop, cloud workstations). Each machine generates:

  • ~/.claude/projects/*/*.jsonl - Raw Claude Code session files
  • CODITECT processes these into unique messages in local context.db (SQLite)

Without synchronization:

  • Context memory is fragmented across machines
  • Teams cannot share learned patterns and decisions
  • Multi-tenant SaaS requires centralized data management

Requirements

  1. Local → Cloud Push: SQLite changes push to PostgreSQL (cloud master)
  2. Cloud → Local Sync: PostgreSQL syncs to all SQLite clients
  3. PostgreSQL as Source of Truth: Central authority for all context data
  4. Multi-client Merge: Multiple SQLite instances write to same PostgreSQL
  5. Multi-tenant Isolation: Tenant data completely separated
  6. Offline Support: Local-first with sync on reconnect
  7. Conflict Resolution: Handle concurrent writes from multiple clients
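
Requirement 6 implies a local-first write path: every write lands in SQLite immediately and is flagged for a later push. A minimal sketch, assuming the `sync_status` column and UUID primary keys described later in this ADR (the helper name `record_message` is illustrative):

```python
import sqlite3
import uuid

def record_message(db: sqlite3.Connection, role: str, content: str) -> str:
    """Write locally first; a sync client pushes 'pending' rows later.

    UUID primary keys mean records created offline on different
    machines can never collide (requirements 4 and 7).
    """
    msg_id = str(uuid.uuid4())
    db.execute(
        """
        INSERT INTO messages (id, role, content, sync_status, updated_at)
        VALUES (?, ?, ?, 'pending', CURRENT_TIMESTAMP)
        """,
        [msg_id, role, content],
    )
    db.commit()
    return msg_id
```

The write succeeds with no network at all; the row stays `pending` until the push endpoint acknowledges it.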

Decision

Implement a bidirectional sync architecture using:

  • PowerSync for PostgreSQL → SQLite downstream sync
  • REST API for SQLite → PostgreSQL upstream writes
  • PostgreSQL as the central source of truth

Architecture

Complete Data Flow

┌────────────────────────────────────────────────────────────────────┐
│ LOCAL CLIENT (per machine)                                         │
│                                                                    │
│ ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐ │
│ │ ~/.claude/       │   │ CODITECT         │   │ Local SQLite     │ │
│ │ projects/        │──▶│ Session          │──▶│ context.db       │ │
│ │ *.jsonl          │   │ Processor        │   │                  │ │
│ │                  │   │                  │   │ - messages       │ │
│ │ (Claude Code     │   │ - Deduplication  │   │ - decisions      │ │
│ │  raw sessions)   │   │ - Extraction     │   │ - patterns       │ │
│ └──────────────────┘   │ - Enrichment     │   │ - errors         │ │
│                        └──────────────────┘   └────────┬─────────┘ │
│                                                        │           │
│                                              PUSH      │    PULL   │
│                                              (API)     │(PowerSync)│
│                                                        │           │
└────────────────────────────────────────────────────────┼───────────┘
                                                         │
           ┌─────────────────────────────┬───────────────┘
           │                             │
           ▼                             ▼
┌────────────────────────────────────────────────────────────────────┐
│ GKE CLOUD CLUSTER                                                  │
│                                                                    │
│ ┌──────────────────┐          ┌──────────────────┐                 │
│ │ CODITECT API     │          │ PostgreSQL       │                 │
│ │ (Backend)        │─────────▶│ (Cloud SQL)      │◀──┐             │
│ │                  │  writes  │                  │   │             │
│ │ POST /api/sync   │          │ PRIMARY RECORD   │   │ Logical     │
│ │ - messages       │          │ OF TRUTH         │   │ Replication │
│ │ - decisions      │          │                  │   │             │
│ │ - patterns       │          │ - All tenants    │   │             │
│ └──────────────────┘          │ - All users      │   │             │
│                               │ - All machines   │   │             │
│                               │ - Merged data    │   │             │
│                               └────────┬─────────┘   │             │
│                                        │             │             │
│                                        ▼             │             │
│                               ┌──────────────────┐   │             │
│                               │ PowerSync        │───┘             │
│                               │ Service          │                 │
│                               │                  │                 │
│                               │ sync.coditect.ai │                 │
│                               └────────┬─────────┘                 │
│                                        │                           │
└────────────────────────────────────────┼───────────────────────────┘
                                         │
                                         │ Sync Protocol (downstream)
                                         │
                 ┌───────────────────────┼───────────────────────┐
                 │                       │                       │
                 ▼                       ▼                       ▼
        ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
        │ Machine A       │     │ Machine B       │     │ Machine C       │
        │ (Desktop)       │     │ (Laptop)        │     │ (Team Member)   │
        │                 │     │                 │     │                 │
        │ SQLite          │     │ SQLite          │     │ SQLite          │
        │ context.db      │     │ context.db      │     │ context.db      │
        │                 │     │                 │     │                 │
        │ Same user,      │     │ Same user,      │     │ Same tenant,    │
        │ different       │     │ different       │     │ different       │
        │ machine         │     │ machine         │     │ user            │
        └─────────────────┘     └─────────────────┘     └─────────────────┘

Sequence Diagram

Multi-Tenant Data Flow

Implementation Details

1. Upstream Push (SQLite → PostgreSQL)

Endpoint: POST /api/sync/push

# Client-side push logic
async def push_to_cloud(local_db: sqlite3.Connection):
    # Get unpushed changes since last sync; fetchall() materializes the
    # rows so they can be both serialized and marked synced afterwards
    local_db.row_factory = sqlite3.Row
    changes = local_db.execute("""
        SELECT * FROM messages
        WHERE updated_at > ? AND sync_status = 'pending'
        ORDER BY updated_at
        LIMIT 1000
    """, [last_sync_timestamp]).fetchall()

    # Push to cloud API
    response = await api.post('/api/sync/push', {
        'tenant_id': config.tenant_id,
        'user_id': config.user_id,
        'machine_id': config.machine_id,
        'changes': [dict(row) for row in changes],
    })

    # Mark as synced, recording the server-assigned cloud_id
    for change in changes:
        local_db.execute("""
            UPDATE messages SET sync_status = 'synced',
                   cloud_id = ? WHERE id = ?
        """, [response.cloud_ids[change['id']], change['id']])
    local_db.commit()

Server-side merge logic:

# Backend API handler
@app.post('/api/sync/push')
async def handle_push(request: SyncPushRequest):
    for change in request.changes:
        # Upsert with conflict resolution
        await db.execute("""
            INSERT INTO messages (id, tenant_id, user_id, content, ...)
            VALUES ($1, $2, $3, $4, ...)
            ON CONFLICT (tenant_id, content_hash) DO UPDATE SET
                updated_at = GREATEST(messages.updated_at, EXCLUDED.updated_at),
                -- Merge metadata, keep newer
                metadata = messages.metadata || EXCLUDED.metadata
            WHERE EXCLUDED.updated_at > messages.updated_at
        """, change)

    return {'status': 'ok', 'merged': len(request.changes)}
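
The upsert above dedupes on `(tenant_id, content_hash)`, which only works if every client computes the hash the same way. A sketch of one deterministic scheme; the choice of hashed fields and the canonical-JSON encoding are assumptions, not the shipped implementation:

```python
import hashlib
import json

def content_hash(message: dict) -> str:
    """Deterministic hash over a message's identity fields.

    Hashing a canonical JSON form (sorted keys, fixed separators)
    makes the same message produce the same hash on every machine,
    so PostgreSQL can dedupe on (tenant_id, content_hash).
    """
    canonical = json.dumps(
        {
            "role": message["role"],
            "content": message["content"],
            "session_id": message["session_id"],
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two machines replaying the same JSONL session produce identical hashes, so the `ON CONFLICT` path merges rather than duplicates.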

2. Downstream Sync (PostgreSQL → SQLite)

PowerSync Sync Rules (sync_rules.yaml):

bucket_definitions:
  # User's own messages (all machines)
  - name: user_messages
    parameters:
      - name: tenant_id
      - name: user_id
    data:
      - SELECT id, content, role, tokens, model, session_id,
               created_at, updated_at
        FROM messages
        WHERE tenant_id = bucket.tenant_id
          AND user_id = bucket.user_id
        ORDER BY created_at DESC

  # User's decisions
  - name: user_decisions
    parameters:
      - name: tenant_id
      - name: user_id
    data:
      - SELECT id, title, description, context, status,
               created_at, updated_at
        FROM decisions
        WHERE tenant_id = bucket.tenant_id
          AND user_id = bucket.user_id

  # Shared team patterns (read-only for team members)
  - name: team_patterns
    parameters:
      - name: tenant_id
    data:
      - SELECT id, pattern, language, frequency, last_seen_at
        FROM code_patterns
        WHERE tenant_id = bucket.tenant_id
          AND is_shared = true

3. Conflict Resolution Strategy

| Scenario | Resolution |
| --- | --- |
| Same message from multiple machines | Deduplicate by content_hash |
| Concurrent edits to same record | Last-write-wins by updated_at |
| Delete vs Update conflict | Update wins (soft delete) |
| New record with same ID | Use UUID, no collision |
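
The table above can be expressed as a small pure function. A minimal sketch of last-write-wins merging with update-beats-delete semantics; the `deleted` and `updated_at` field names are assumptions consistent with the schema sketches in this ADR:

```python
def resolve(local: dict, remote: dict) -> dict:
    """Pick the winning version of one record seen on two clients.

    - Update wins over soft delete (per the conflict table above).
    - Otherwise last-write-wins by updated_at; ties keep local.
    """
    # Update wins over soft delete, regardless of timestamps
    if local.get("deleted") and not remote.get("deleted"):
        return remote
    if remote.get("deleted") and not local.get("deleted"):
        return local
    # Last-write-wins by timestamp
    return remote if remote["updated_at"] > local["updated_at"] else local
```

Deduplication by `content_hash` happens earlier, at insert time, so `resolve` only ever sees two versions of the same logical record.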

4. Sync State Tracking

-- Local SQLite: Track sync state
CREATE TABLE sync_state (
    id INTEGER PRIMARY KEY,
    last_push_at TIMESTAMP,
    last_pull_at TIMESTAMP,
    last_push_cursor TEXT,
    last_pull_cursor TEXT,
    pending_count INTEGER DEFAULT 0
);

-- PostgreSQL: Track per-client sync
CREATE TABLE client_sync_state (
    tenant_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    machine_id TEXT NOT NULL,
    last_seen_at TIMESTAMP DEFAULT NOW(),
    last_push_at TIMESTAMP,
    sync_cursor TEXT,
    PRIMARY KEY (tenant_id, user_id, machine_id)
);
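
Reading and advancing the local `sync_state` row might look like the following sketch (stdlib `sqlite3`; the single-row `id = 1` convention and helper names are assumptions):

```python
import sqlite3

def init_sync_state(db: sqlite3.Connection) -> None:
    db.execute("""
        CREATE TABLE IF NOT EXISTS sync_state (
            id INTEGER PRIMARY KEY,
            last_push_at TIMESTAMP,
            last_pull_at TIMESTAMP,
            last_push_cursor TEXT,
            last_pull_cursor TEXT,
            pending_count INTEGER DEFAULT 0
        )
    """)
    # Single-row convention: id = 1 holds this client's state
    db.execute("INSERT OR IGNORE INTO sync_state (id) VALUES (1)")

def get_push_cursor(db: sqlite3.Connection) -> "str | None":
    row = db.execute(
        "SELECT last_push_cursor FROM sync_state WHERE id = 1").fetchone()
    return row[0] if row else None

def advance_push_cursor(db: sqlite3.Connection, cursor: str) -> None:
    # Only advance after the server has acknowledged the batch
    db.execute("""
        UPDATE sync_state
        SET last_push_cursor = ?, last_push_at = CURRENT_TIMESTAMP
        WHERE id = 1
    """, [cursor])
    db.commit()
```

The cursor is advanced only after a successful push, so a crash mid-sync re-sends the last batch rather than losing it; the server-side upsert makes the replay harmless.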

Component Roles

| Component | Role | Direction |
| --- | --- | --- |
| Claude Code | Generate raw session data | Source |
| CODITECT Processor | Deduplicate, extract, enrich | Local transform |
| Local SQLite | Local context storage | Local cache |
| CODITECT API | Handle push requests | Upstream gateway |
| PostgreSQL | Primary record of truth | Central authority |
| PowerSync | Stream changes to clients | Downstream sync |
| Litestream | Backup local SQLite to GCS | Disaster recovery |

API Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/sync/push | POST | Push local changes to cloud |
| /api/sync/status | GET | Get sync state for client |
| /api/sync/resolve | POST | Manually resolve conflicts |
| sync.coditect.ai/* | WS | PowerSync streaming endpoint |

Rationale

Why This Architecture?

  1. PostgreSQL as master - Proven ACID compliance, rich querying, tenant isolation
  2. PowerSync for downstream - Purpose-built for PostgreSQL → SQLite sync
  3. REST API for upstream - Simple, debuggable, controllable write path
  4. Local SQLite - Fast local queries, offline support

Why Not Pure PowerSync Bidirectional?

PowerSync supports writes via their SDK, but we chose REST API for upstream because:

  • More control over conflict resolution
  • Easier to add business logic (validation, enrichment)
  • Better audit logging
  • Simpler debugging

Consequences

Positive

  • Unified context across all user machines
  • Team collaboration via shared patterns
  • Multi-tenant isolation at database level
  • Offline-first with eventual consistency
  • Scalable with PostgreSQL + PowerSync

Negative

  • Eventual consistency (not real-time for writes)
  • Complexity of sync state management
  • Infrastructure cost (PostgreSQL + PowerSync service + MongoDB, which the PowerSync service uses for its internal sync-bucket storage)

Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Data loss during sync | Litestream backup, sync acknowledgments |
| Sync conflicts | Content-hash dedup, last-write-wins |
| Network partitions | Queue writes locally, sync on reconnect |
| Large datasets | Pagination, incremental sync |
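
The network-partition mitigation ("queue writes locally, sync on reconnect") can be sketched as a retry loop with exponential backoff. The `push_batch` callable and batch shape are assumptions for illustration:

```python
import random
import time

def sync_with_backoff(push_batch, batches, max_retries=5):
    """Push queued batches in order; back off and retry on failure.

    Batches that cannot be delivered stay queued (and are returned)
    so a later reconnect resumes where this attempt left off.
    """
    remaining = list(batches)
    while remaining:
        batch = remaining[0]
        for attempt in range(max_retries):
            try:
                push_batch(batch)
                remaining.pop(0)  # delivered; advance the queue
                break
            except ConnectionError:
                # Exponential backoff with jitter: ~1s, 2s, 4s, ...
                time.sleep((2 ** attempt) + random.random())
        else:
            return remaining  # still offline; keep the rest queued
    return []
```

Because the server-side upsert is idempotent (dedup by content_hash), re-sending a batch after an ambiguous failure is safe.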

Files

| File | Purpose |
| --- | --- |
| config/powersync/powersync.yaml | PowerSync service config |
| config/powersync/sync_rules.yaml | Multi-tenant sync rules |
| config/powersync/kubernetes/ | GKE deployment manifests |
| scripts/context-sync.py | Local sync client (TODO) |
| submodules/integrations/powersync-service | PowerSync source |
References

  • ADR-042: Litestream for SQLite backup to GCS
  • ADR-014: Multi-Product Commerce Architecture
  • ADR-CX: Context Extraction System