Skip to main content

ADR-149: Query Language Evolution Strategy

Status

ACCEPTED (2026-02-03)

Context

Problem Statement

The /cxq command has grown to support 94 tables across 4 databases with 100+ command flags. As capabilities expand, the current approach faces limitations:

GapExample NeedCurrent Limitation
Complex Filters"Decisions about APIs made in last 7 days with confidence > 0.8"Requires chained flags or raw SQL
Cross-Table Joins"Errors that occurred during tasks with low success scores"No cross-table filtering
Aggregations"Token cost by model by week"Limited to pre-built views
Temporal Patterns"Sessions where tool failures increased over time"No trend detection
Graph Traversal"All components that invoke security agents"No graph query syntax
Pipeline"Find errors → extract patterns → suggest fixes"Multi-step manual process

Current Query Capabilities

The /cxq command supports:

  1. FTS5 Full-Text Search: word1 word2, word1 OR word2, "phrase", word*, NEAR(a b, 5), -word
  2. Filter Flags: --role, --llm, --today, --since, --session
  3. Knowledge Queries: --decisions, --patterns, --errors, --learnings
  4. Analytics: --tokens, --cost, --tools, --traces
  5. Semantic Search: --semantic, --recall
  6. Multi-Database: --multi-db, --scope-framework, --scope-project
  7. Views: 18 pre-built database views

Options Considered

Option A: Continue Flag-Based Expansion

Add more --where, --group-by, --join flags.

Pros: Familiar, backward compatible Cons: Combinatorial explosion of flags, hard to compose

Option B: Full Domain-Specific Language (DSL)

Create CODITECT Query Language (CQL):

FROM decisions d
JOIN knowledge_graph kg ON d.id = kg.source_id
WHERE d.decision_type = 'architecture'
AND d.confidence > 0.7
ORDER BY d.created_at DESC
LIMIT 10

Pros: Powerful, expressive Cons: Learning curve, implementation complexity

Three-phase evolution:

  1. Phase 1: Enhanced flag syntax (immediate)
  2. Phase 2: Query templates/presets (medium-term)
  3. Phase 3: Optional DSL (if needed)

Pros: Incremental value, low risk, preserves compatibility Cons: May still need full DSL eventually

Decision

Implement Option C: Progressive Enhancement Strategy with three phases.

Phase 1: Enhanced Flag Syntax (Immediate - Q1 2026)

Add structured filtering without full DSL:

# Compound filters with --where
/cxq --decisions --where "confidence > 0.8 AND decision_type = 'api'" --since 7d

# Aggregation flags
/cxq --cost --group-by model,week --sort cost_desc

# Graph traversal shortcuts
/cxq --component security-specialist --traversal invokes:2 # 2 hops

# Field-specific search
/cxq --field "role:assistant,content:error"

# Cross-database correlation
/cxq --correlate "decisions,messages" --by session_id

Implementation:

  • Add --where flag with simple expression parser
  • Add --group-by, --sort flags for aggregations
  • Add --traversal flag for knowledge graph
  • No new dependencies, pure Python

Phase 2: Query Templates (Medium-term - Q2 2026)

Create reusable query definitions:

# ~/.coditect-data/queries/high-cost-sessions.yaml
name: high-cost-sessions
description: Sessions with unusual token costs
version: 1.0.0
author: hal@az1.ai

query:
from: token_economics
join:
- table: messages
on: session_id
where:
- "cost_total_usd > 1.0"
group_by: session_id
order_by: cost_total_usd DESC
limit: 20

output:
format: table
columns:
- session_id
- total_cost
- message_count

Usage:

# Execute saved query
/cxq --query high-cost-sessions

# List available queries
/cxq --queries

# Create query interactively
/cxq --create-query

# Share query (export to clipboard)
/cxq --export-query high-cost-sessions

Implementation:

  • YAML query definition schema
  • Query template registry in org.db
  • Validation against database schema

Phase 3: Optional DSL (Future - if needed)

If complexity warrants, introduce CODITECT Query Language (CQL):

-- CQL Example
FROM decisions d
LEFT JOIN knowledge_graph kg ON d.id = kg.source_id
WHERE d.decision_type IN ('architecture', 'api')
AND d.confidence > @threshold
AND d.created_at > @recent(7d)
AND kg.relationship = 'affects'
ORDER BY d.confidence DESC, d.created_at DESC
LIMIT @limit

-- Variables
@threshold = 0.7
@limit = 20

CQL Design Principles:

  1. SQL-Compatible Core: Leverage existing SQLite knowledge
  2. Domain Shorthands: @recent(7d), @high-confidence, @my-decisions
  3. Graph Extensions: TRAVERSE component -> invokes -> 2 hops
  4. Temporal Functions: TREND(cost_usd, daily, last_30d)
  5. Natural Language Bridge: "decisions about APIs" → CQL

LLM-Assisted Query Generation:

# Natural language to CQL
/cxq "find all architecture decisions that affected the authentication system"
# → Generates CQL, shows for approval, then executes

Decision Criteria for Phase 3

Move to Phase 3 (full DSL) if:

  • Query template usage exceeds 50 unique templates
  • Users frequently request joins across 3+ tables
  • Complex temporal analysis becomes common
  • Graph traversal depth regularly exceeds 2 hops

Implementation

Phase 1 Tasks (J.4.7)

TaskDescriptionPriority
J.4.7.1Add --where flag with expression parserP1
J.4.7.2Add --group-by and --sort flagsP1
J.4.7.3Add --traversal for knowledge graphP2
J.4.7.4Add --field for field-specific searchP2
J.4.7.5Add --correlate for cross-table queriesP3
J.4.7.6Update cxq.md documentationP1

Phase 2 Tasks (J.4.8)

TaskDescriptionPriority
J.4.8.1Create query template YAML schemaP1
J.4.8.2Implement query template registryP1
J.4.8.3Add --query, --queries commandsP1
J.4.8.4Create 20 standard query templatesP2
J.4.8.5Add --create-query interactive modeP3

Standard Query Templates (Phase 2)

TemplatePurpose
daily-cost-reportToken costs by model for day
weekly-activityMessages, decisions, patterns by day
error-frequencyMost common errors with solutions
high-cost-sessionsSessions exceeding cost threshold
stuck-tasksTasks with high retry counts
model-comparisonCost/performance by LLM model
component-usageMost invoked agents/skills
decision-impactDecisions affecting most components
learning-gapsSkills with low effectiveness
session-healthSessions with high error rates

Consequences

Positive

  1. Incremental value - Phase 1 delivers immediate improvements
  2. Low risk - No breaking changes to existing commands
  3. User feedback - Phase 2 templates capture common patterns
  4. Deferred complexity - Full DSL only if proven necessary
  5. LLM compatibility - Natural language can generate queries

Negative

  1. Multiple syntaxes - Users may need to learn flags + templates + (optionally) CQL
  2. Migration - Complex queries may need rewriting as capabilities evolve

Mitigations

  • Document clear guidance on when to use each approach
  • Provide migration tools between query formats
  • LLM assistance for query generation reduces learning curve

Query Capability Matrix

CapabilityCurrentPhase 1Phase 2Phase 3
FTS5 searchYesYesYesYes
Role filteringYesYesYesYes
Date filteringYesYesYesYes
LLM filteringYesYesYesYes
Semantic searchYesYesYesYes
Compound WHERENoYesYesYes
AggregationsLimitedYesYesYes
Graph traversalNoBasicYesFull
Saved queriesNoNoYesYes
Cross-table joinsNoBasicYesFull
Temporal analysisNoNoLimitedFull
Natural languageNoNoNoYes
  • ADR-118: Four-Tier Database Architecture
  • ADR-020: Context Extraction System (/cx)
  • ADR-021: Context Query System (/cxq)
  • ADR-148: Database Schema Documentation Standard

Changelog

DateChange
2026-02-03Initial version - three-phase strategy

Track: J.4 (Enhanced /cxq Query System) Tasks: J.4.7, J.4.8