
ADR-043: Bidirectional Context Memory Sync Architecture

Status: SUPERSEDED (previously Accepted)
Date: 2025-12-29
Superseded By: ADR-044 - Custom REST Sync Architecture
Author: Hal Casteel
Stakeholders: Platform Engineering, Backend, Frontend, DevOps

⚠️ SUPERSESSION NOTICE (2025-12-29)

This ADR has been superseded by ADR-044 due to FSL license concerns.

Key Finding: PowerSync uses the Functional Source License (FSL) which includes a "Competing Use" restriction. JourneyApps (PowerSync owner) offers a mobile app builder with offline sync - potentially in competition with CODITECT's development platform.

Legal Risk: Using PowerSync in CODITECT SaaS could constitute license violation.

Replacement Architecture: Custom REST API sync (see ADR-044)

  • Same SQLite client-side storage
  • Django/FastAPI REST endpoints for push/pull
  • PostgreSQL as source of truth
  • Zero third-party sync dependencies

5-Year TCO Savings: $12,900 (Custom REST $41,400 vs PowerSync $54,300)


Context

CODITECT manages AI session context memory across multiple machines, teams, and tenants. The context memory originates from Claude Code sessions stored locally as JSONL files, processed into a local SQLite database, and must be synchronized bidirectionally with a central PostgreSQL database that serves as the primary record of truth.

The Problem

Users run Claude Code on multiple machines (desktop, laptop, cloud workstations). Each machine generates:

  • ~/.claude/projects/*/*.jsonl - Raw Claude Code session files
  • CODITECT processes these into unique messages in local context.db (SQLite)

Without synchronization:

  • Context memory is fragmented across machines
  • Teams cannot share learned patterns and decisions
  • Multi-tenant SaaS requires centralized data management

Requirements

  1. Local → Cloud Push: SQLite changes push to PostgreSQL (cloud master)
  2. Cloud → Local Sync: PostgreSQL syncs to all SQLite clients
  3. PostgreSQL as Source of Truth: Central authority for all context data
  4. Multi-client Merge: Multiple SQLite instances write to same PostgreSQL
  5. Multi-tenant Isolation: Tenant data completely separated
  6. Offline Support: Local-first with sync on reconnect
  7. Conflict Resolution: Handle concurrent writes from multiple clients
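
Requirement 6 implies a local-first write path: every write lands in SQLite immediately and is flagged for a later push. A minimal sketch, assuming the `sync_status` column and UUID primary keys described later in this ADR (the helper name `record_message` is illustrative):

```python
import sqlite3
import uuid

def record_message(db: sqlite3.Connection, role: str, content: str) -> str:
    """Write locally first; a sync client pushes 'pending' rows later.

    UUID primary keys mean records created offline on different
    machines can never collide (requirements 4 and 7).
    """
    msg_id = str(uuid.uuid4())
    db.execute(
        """
        INSERT INTO messages (id, role, content, sync_status, updated_at)
        VALUES (?, ?, ?, 'pending', CURRENT_TIMESTAMP)
        """,
        [msg_id, role, content],
    )
    db.commit()
    return msg_id
```

The write succeeds with no network at all; the row stays `pending` until the push endpoint acknowledges it.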

Decision

Implement a bidirectional sync architecture using:

  • PowerSync for PostgreSQL → SQLite downstream sync
  • REST API for SQLite → PostgreSQL upstream writes
  • PostgreSQL as the central source of truth

Architecture

Complete Data Flow

┌────────────────────────────────────────────────────────────────────┐
│ LOCAL CLIENT (per machine)                                         │
│                                                                    │
│ ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐ │
│ │ ~/.claude/       │   │ CODITECT         │   │ Local SQLite     │ │
│ │ projects/        │──▶│ Session          │──▶│ context.db       │ │
│ │ *.jsonl          │   │ Processor        │   │                  │ │
│ │                  │   │                  │   │ - messages       │ │
│ │ (Claude Code     │   │ - Deduplication  │   │ - decisions      │ │
│ │  raw sessions)   │   │ - Extraction     │   │ - patterns       │ │
│ └──────────────────┘   │ - Enrichment     │   │ - errors         │ │
│                        └──────────────────┘   └────────┬─────────┘ │
│                                                        │           │
│                                              PUSH      │    PULL   │
│                                              (API)     │(PowerSync)│
│                                                        │           │
└────────────────────────────────────────────────────────┼───────────┘
                                                         │
           ┌─────────────────────────────┬───────────────┘
           │                             │
           ▼                             ▼
┌────────────────────────────────────────────────────────────────────┐
│ GKE CLOUD CLUSTER                                                  │
│                                                                    │
│ ┌──────────────────┐          ┌──────────────────┐                 │
│ │ CODITECT API     │          │ PostgreSQL       │                 │
│ │ (Backend)        │─────────▶│ (Cloud SQL)      │◀──┐             │
│ │                  │  writes  │                  │   │             │
│ │ POST /api/sync   │          │ PRIMARY RECORD   │   │ Logical     │
│ │ - messages       │          │ OF TRUTH         │   │ Replication │
│ │ - decisions      │          │                  │   │             │
│ │ - patterns       │          │ - All tenants    │   │             │
│ └──────────────────┘          │ - All users      │   │             │
│                               │ - All machines   │   │             │
│                               │ - Merged data    │   │             │
│                               └────────┬─────────┘   │             │
│                                        │             │             │
│                                        ▼             │             │
│                               ┌──────────────────┐   │             │
│                               │ PowerSync        │───┘             │
│                               │ Service          │                 │
│                               │                  │                 │
│                               │ sync.coditect.ai │                 │
│                               └────────┬─────────┘                 │
│                                        │                           │
└────────────────────────────────────────┼───────────────────────────┘
                                         │
                                         │ Sync Protocol (downstream)
                                         │
                 ┌───────────────────────┼───────────────────────┐
                 │                       │                       │
                 ▼                       ▼                       ▼
        ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
        │ Machine A       │     │ Machine B       │     │ Machine C       │
        │ (Desktop)       │     │ (Laptop)        │     │ (Team Member)   │
        │                 │     │                 │     │                 │
        │ SQLite          │     │ SQLite          │     │ SQLite          │
        │ context.db      │     │ context.db      │     │ context.db      │
        │                 │     │                 │     │                 │
        │ Same user,      │     │ Same user,      │     │ Same tenant,    │
        │ different       │     │ different       │     │ different       │
        │ machine         │     │ machine         │     │ user            │
        └─────────────────┘     └─────────────────┘     └─────────────────┘

Sequence Diagram

Multi-Tenant Data Flow

Implementation Details

1. Upstream Push (SQLite → PostgreSQL)

Endpoint: POST /api/sync/push

# Client-side push logic
async def push_to_cloud(local_db: sqlite3.Connection):
    # Get unpushed changes since last sync; fetchall() materializes the
    # rows so they can be both serialized and marked synced afterwards
    local_db.row_factory = sqlite3.Row
    changes = local_db.execute("""
        SELECT * FROM messages
        WHERE updated_at > ? AND sync_status = 'pending'
        ORDER BY updated_at
        LIMIT 1000
    """, [last_sync_timestamp]).fetchall()

    # Push to cloud API
    response = await api.post('/api/sync/push', {
        'tenant_id': config.tenant_id,
        'user_id': config.user_id,
        'machine_id': config.machine_id,
        'changes': [dict(row) for row in changes],
    })

    # Mark as synced, recording the server-assigned cloud_id
    for change in changes:
        local_db.execute("""
            UPDATE messages SET sync_status = 'synced',
                   cloud_id = ? WHERE id = ?
        """, [response.cloud_ids[change['id']], change['id']])
    local_db.commit()

Server-side merge logic:

# Backend API handler
@app.post('/api/sync/push')
async def handle_push(request: SyncPushRequest):
    for change in request.changes:
        # Upsert with conflict resolution
        await db.execute("""
            INSERT INTO messages (id, tenant_id, user_id, content, ...)
            VALUES ($1, $2, $3, $4, ...)
            ON CONFLICT (tenant_id, content_hash) DO UPDATE SET
                updated_at = GREATEST(messages.updated_at, EXCLUDED.updated_at),
                -- Merge metadata, keep newer
                metadata = messages.metadata || EXCLUDED.metadata
            WHERE EXCLUDED.updated_at > messages.updated_at
        """, change)

    return {'status': 'ok', 'merged': len(request.changes)}
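
The upsert above dedupes on `(tenant_id, content_hash)`, which only works if every client computes the hash the same way. A sketch of one deterministic scheme; the choice of hashed fields and the canonical-JSON encoding are assumptions, not the shipped implementation:

```python
import hashlib
import json

def content_hash(message: dict) -> str:
    """Deterministic hash over a message's identity fields.

    Hashing a canonical JSON form (sorted keys, fixed separators)
    makes the same message produce the same hash on every machine,
    so PostgreSQL can dedupe on (tenant_id, content_hash).
    """
    canonical = json.dumps(
        {
            "role": message["role"],
            "content": message["content"],
            "session_id": message["session_id"],
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two machines replaying the same JSONL session produce identical hashes, so the `ON CONFLICT` path merges rather than duplicates.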

2. Downstream Sync (PostgreSQL → SQLite)

PowerSync Sync Rules (sync_rules.yaml):

bucket_definitions:
  # User's own messages (all machines)
  - name: user_messages
    parameters:
      - name: tenant_id
      - name: user_id
    data:
      - SELECT id, content, role, tokens, model, session_id,
               created_at, updated_at
        FROM messages
        WHERE tenant_id = bucket.tenant_id
          AND user_id = bucket.user_id
        ORDER BY created_at DESC

  # User's decisions
  - name: user_decisions
    parameters:
      - name: tenant_id
      - name: user_id
    data:
      - SELECT id, title, description, context, status,
               created_at, updated_at
        FROM decisions
        WHERE tenant_id = bucket.tenant_id
          AND user_id = bucket.user_id

  # Shared team patterns (read-only for team members)
  - name: team_patterns
    parameters:
      - name: tenant_id
    data:
      - SELECT id, pattern, language, frequency, last_seen_at
        FROM code_patterns
        WHERE tenant_id = bucket.tenant_id
          AND is_shared = true

3. Conflict Resolution Strategy

| Scenario | Resolution |
| --- | --- |
| Same message from multiple machines | Deduplicate by content_hash |
| Concurrent edits to same record | Last-write-wins by updated_at |
| Delete vs Update conflict | Update wins (soft delete) |
| New record with same ID | Use UUID, no collision |
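
The table above can be expressed as a small pure function. A minimal sketch of last-write-wins merging with update-beats-delete semantics; the `deleted` and `updated_at` field names are assumptions consistent with the schema sketches in this ADR:

```python
def resolve(local: dict, remote: dict) -> dict:
    """Pick the winning version of one record seen on two clients.

    - Update wins over soft delete (per the conflict table above).
    - Otherwise last-write-wins by updated_at; ties keep local.
    """
    # Update wins over soft delete, regardless of timestamps
    if local.get("deleted") and not remote.get("deleted"):
        return remote
    if remote.get("deleted") and not local.get("deleted"):
        return local
    # Last-write-wins by timestamp
    return remote if remote["updated_at"] > local["updated_at"] else local
```

Deduplication by `content_hash` happens earlier, at insert time, so `resolve` only ever sees two versions of the same logical record.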

4. Sync State Tracking

-- Local SQLite: Track sync state
CREATE TABLE sync_state (
    id INTEGER PRIMARY KEY,
    last_push_at TIMESTAMP,
    last_pull_at TIMESTAMP,
    last_push_cursor TEXT,
    last_pull_cursor TEXT,
    pending_count INTEGER DEFAULT 0
);

-- PostgreSQL: Track per-client sync
CREATE TABLE client_sync_state (
    tenant_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    machine_id TEXT NOT NULL,
    last_seen_at TIMESTAMP DEFAULT NOW(),
    last_push_at TIMESTAMP,
    sync_cursor TEXT,
    PRIMARY KEY (tenant_id, user_id, machine_id)
);
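
Reading and advancing the local `sync_state` row might look like the following sketch (stdlib `sqlite3`; the single-row `id = 1` convention and helper names are assumptions):

```python
import sqlite3

def init_sync_state(db: sqlite3.Connection) -> None:
    db.execute("""
        CREATE TABLE IF NOT EXISTS sync_state (
            id INTEGER PRIMARY KEY,
            last_push_at TIMESTAMP,
            last_pull_at TIMESTAMP,
            last_push_cursor TEXT,
            last_pull_cursor TEXT,
            pending_count INTEGER DEFAULT 0
        )
    """)
    # Single-row convention: id = 1 holds this client's state
    db.execute("INSERT OR IGNORE INTO sync_state (id) VALUES (1)")

def get_push_cursor(db: sqlite3.Connection) -> "str | None":
    row = db.execute(
        "SELECT last_push_cursor FROM sync_state WHERE id = 1").fetchone()
    return row[0] if row else None

def advance_push_cursor(db: sqlite3.Connection, cursor: str) -> None:
    # Only advance after the server has acknowledged the batch
    db.execute("""
        UPDATE sync_state
        SET last_push_cursor = ?, last_push_at = CURRENT_TIMESTAMP
        WHERE id = 1
    """, [cursor])
    db.commit()
```

The cursor is advanced only after a successful push, so a crash mid-sync re-sends the last batch rather than losing it; the server-side upsert makes the replay harmless.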

Component Roles

| Component | Role | Direction |
| --- | --- | --- |
| Claude Code | Generate raw session data | Source |
| CODITECT Processor | Deduplicate, extract, enrich | Local transform |
| Local SQLite | Local context storage | Local cache |
| CODITECT API | Handle push requests | Upstream gateway |
| PostgreSQL | Primary record of truth | Central authority |
| PowerSync | Stream changes to clients | Downstream sync |
| Litestream | Backup local SQLite to GCS | Disaster recovery |

API Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/sync/push | POST | Push local changes to cloud |
| /api/sync/status | GET | Get sync state for client |
| /api/sync/resolve | POST | Manually resolve conflicts |
| sync.coditect.ai/* | WS | PowerSync streaming endpoint |

Rationale

Why This Architecture?

  1. PostgreSQL as master - Proven ACID compliance, rich querying, tenant isolation
  2. PowerSync for downstream - Purpose-built for PostgreSQL → SQLite sync
  3. REST API for upstream - Simple, debuggable, controllable write path
  4. Local SQLite - Fast local queries, offline support

Why Not Pure PowerSync Bidirectional?

PowerSync supports writes via their SDK, but we chose REST API for upstream because:

  • More control over conflict resolution
  • Easier to add business logic (validation, enrichment)
  • Better audit logging
  • Simpler debugging

Consequences

Positive

  • Unified context across all user machines
  • Team collaboration via shared patterns
  • Multi-tenant isolation at database level
  • Offline-first with eventual consistency
  • Scalable with PostgreSQL + PowerSync

Negative

  • Eventual consistency (not real-time for writes)
  • Complexity of sync state management
  • Infrastructure cost (PostgreSQL + PowerSync service + MongoDB, which the PowerSync service uses for its internal sync-bucket storage)

Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Data loss during sync | Litestream backup, sync acknowledgments |
| Sync conflicts | Content-hash dedup, last-write-wins |
| Network partitions | Queue writes locally, sync on reconnect |
| Large datasets | Pagination, incremental sync |
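
The network-partition mitigation ("queue writes locally, sync on reconnect") can be sketched as a retry loop with exponential backoff. The `push_batch` callable and batch shape are assumptions for illustration:

```python
import random
import time

def sync_with_backoff(push_batch, batches, max_retries=5):
    """Push queued batches in order; back off and retry on failure.

    Batches that cannot be delivered stay queued (and are returned)
    so a later reconnect resumes where this attempt left off.
    """
    remaining = list(batches)
    while remaining:
        batch = remaining[0]
        for attempt in range(max_retries):
            try:
                push_batch(batch)
                remaining.pop(0)  # delivered; advance the queue
                break
            except ConnectionError:
                # Exponential backoff with jitter: ~1s, 2s, 4s, ...
                time.sleep((2 ** attempt) + random.random())
        else:
            return remaining  # still offline; keep the rest queued
    return []
```

Because the server-side upsert is idempotent (dedup by content_hash), re-sending a batch after an ambiguous failure is safe.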

Files

| File | Purpose |
| --- | --- |
| config/powersync/powersync.yaml | PowerSync service config |
| config/powersync/sync_rules.yaml | Multi-tenant sync rules |
| config/powersync/kubernetes/ | GKE deployment manifests |
| scripts/context-sync.py | Local sync client (TODO) |
| submodules/integrations/powersync-service | PowerSync source |
References

  • ADR-042: Litestream for SQLite backup to GCS
  • ADR-014: Multi-Product Commerce Architecture
  • ADR-CX: Context Extraction System