
ADR-101: Database Usage Patterns for Discovery and Analytics

Status

Accepted - January 23, 2026

Context

CODITECT's two-database architecture (ADR-089) captures rich data about development sessions, tool usage, decisions, and code patterns. However, this data is only valuable if users and agents can effectively query it for:

  1. Discovery - Finding relevant past work, decisions, and solutions
  2. Analytics - Understanding patterns, costs, and productivity
  3. Learning - Improving agent performance through trajectory analysis
  4. Insights - Surfacing actionable information for developers

The Value Proposition Problem

Raw data in SQLite tables has limited value. The true value of CODITECT's memory system comes from queryable insights that:

  • Save developers from re-solving problems
  • Provide context for better decisions
  • Enable continuous improvement of agents
  • Surface patterns that humans miss

Decision

Document and standardize database usage patterns that unlock the value of CODITECT's memory system.

Value Proposition: Why This Matters

1. Anti-Forgetting Memory

Problem: AI assistants forget everything between sessions.

CODITECT Solution:

-- What did we decide about authentication?
SELECT decision, rationale, created_at
FROM decisions
WHERE decision LIKE '%auth%' OR tags LIKE '%auth%'
ORDER BY confidence DESC, created_at DESC
LIMIT 10;

Value: Never re-litigate decided issues. Build on past work.

2. Error-Solution Cache

Problem: Same errors waste time repeatedly.

CODITECT Solution:

-- Have we solved this error before?
SELECT error_signature, solution, solution_code, success_count
FROM error_solutions
WHERE error_type = 'TypeError'
AND error_signature LIKE '%cannot read property%'
ORDER BY success_count DESC
LIMIT 5;

Value: Instant solutions for known problems. Learning from failures.

3. Tool Usage Intelligence

Problem: No visibility into which tools agents use, or how efficiently they run.

CODITECT Solution:

-- Tool success rates by category
SELECT
tool_category,
tool_name,
COUNT(*) as total_calls,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 1) as success_rate,
ROUND(AVG(execution_time_ms), 0) as avg_time_ms
FROM tool_analytics
WHERE created_at > datetime('now', '-7 days')
GROUP BY tool_category, tool_name
ORDER BY total_calls DESC;

Value: Identify slow tools, failure patterns, optimization opportunities.

4. Cost Attribution

Problem: No understanding of where API costs go.

CODITECT Solution:

-- Cost by agent type (deduplicate agent/session pairs first to avoid join fan-out)
WITH agent_sessions AS (
SELECT DISTINCT agent_name, session_id
FROM tool_analytics
WHERE created_at > datetime('now', '-30 days')
)
SELECT
a.agent_name,
COUNT(DISTINCT tu.session_id) as sessions,
SUM(tu.cost_usd) as total_cost,
ROUND(AVG(tu.cost_usd), 4) as avg_cost_per_call
FROM agent_sessions a
JOIN token_usage tu ON a.session_id = tu.session_id
GROUP BY a.agent_name
ORDER BY total_cost DESC;

Value: Understand ROI. Optimize expensive operations.

5. Pattern Discovery

Problem: Best practices scattered, not captured.

CODITECT Solution:

-- Most reused code patterns
SELECT
language,
pattern_name,
pattern_type,
usage_count,
SUBSTR(code, 1, 200) as code_preview
FROM code_patterns
WHERE usage_count > 5
ORDER BY usage_count DESC
LIMIT 20;

Value: Build on proven patterns. Standardize best practices.


Usage Patterns by Category

Discovery Patterns

1. Find Past Decisions

-- Decisions about a topic (full-text search)
SELECT d.*, m.session_id
FROM decisions d
JOIN decisions_fts ON decisions_fts.rowid = d.id
LEFT JOIN messages m ON d.message_id = m.id
WHERE decisions_fts MATCH 'database migration'
ORDER BY d.confidence DESC, d.created_at DESC
LIMIT 10;

2. Find Sessions by Topic

-- Sessions discussing a topic
SELECT DISTINCT
session_id,
COUNT(*) as message_count,
MIN(timestamp) as started,
MAX(timestamp) as ended
FROM messages
JOIN messages_fts ON messages_fts.rowid = messages.id
WHERE messages_fts MATCH 'kubernetes deployment'
GROUP BY session_id
ORDER BY started DESC
LIMIT 10;

3. Find Code Examples

-- Python async patterns
SELECT pattern_name, code, description
FROM code_patterns
WHERE language = 'python'
AND (pattern_type = 'async' OR code LIKE '%async%await%')
ORDER BY usage_count DESC
LIMIT 10;

Analytics Patterns

4. Session Productivity

-- Messages and tools per session
SELECT
m.session_id,
COUNT(DISTINCT m.id) as messages,
COUNT(DISTINCT ta.id) as tool_calls,
ROUND(COUNT(DISTINCT ta.id) * 1.0 / COUNT(DISTINCT m.id), 2) as tools_per_message
FROM messages m
LEFT JOIN tool_analytics ta ON m.session_id = ta.session_id
WHERE m.timestamp > datetime('now', '-7 days')
GROUP BY m.session_id
ORDER BY messages DESC
LIMIT 20;

5. Tool Performance Over Time

-- Daily tool performance trends
SELECT
DATE(created_at) as day,
tool_name,
COUNT(*) as calls,
ROUND(AVG(execution_time_ms), 0) as avg_ms,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failures
FROM tool_analytics
WHERE created_at > datetime('now', '-14 days')
GROUP BY day, tool_name
ORDER BY day DESC, calls DESC;

6. Error Hotspots

-- Most frequent errors with solutions
SELECT
error_type,
error_signature,
success_count,
failure_count,
ROUND(100.0 * success_count / (success_count + failure_count), 1) as solution_rate
FROM error_solutions
WHERE success_count + failure_count > 3
ORDER BY success_count + failure_count DESC
LIMIT 15;

Trajectory Analysis Patterns

7. Tool Sequences (What Comes After What)

-- Common tool sequences within sessions
WITH tool_sequence AS (
SELECT
session_id,
tool_name,
LAG(tool_name) OVER (PARTITION BY session_id ORDER BY created_at) as prev_tool
FROM tool_analytics
WHERE trajectory_hash IS NOT NULL
)
SELECT
prev_tool || ' → ' || tool_name as sequence,
COUNT(*) as frequency
FROM tool_sequence
WHERE prev_tool IS NOT NULL
GROUP BY prev_tool, tool_name
HAVING COUNT(*) > 100
ORDER BY frequency DESC
LIMIT 20;

8. Agent Efficiency

-- Agent comparison: tools per task
SELECT
agent_name,
COUNT(DISTINCT task_id) as tasks,
COUNT(*) as total_tools,
ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT task_id), 1) as tools_per_task,
ROUND(AVG(execution_time_ms), 0) as avg_tool_time_ms
FROM tool_analytics
WHERE agent_name IS NOT NULL
AND task_id IS NOT NULL
AND created_at > datetime('now', '-7 days')
GROUP BY agent_name
HAVING tasks > 5
ORDER BY tools_per_task ASC;

9. Task Completion Patterns

-- Tools used to complete tasks (by track)
SELECT
SUBSTR(task_id, 1, 1) as track,
tool_category,
COUNT(*) as uses,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY SUBSTR(task_id, 1, 1)), 1) as pct
FROM tool_analytics
WHERE task_id IS NOT NULL
GROUP BY track, tool_category
ORDER BY track, uses DESC;

MCP Server Query Patterns

10. Semantic Search Integration

-- For MCP semantic search server (hybrid search)
WITH fts_results AS (
SELECT rowid as id, bm25(messages_fts) as fts_score
FROM messages_fts
WHERE messages_fts MATCH ?
LIMIT 100
),
vector_results AS (
SELECT message_id, similarity
FROM embeddings
WHERE embedding MATCH ? -- Vector search
LIMIT 100
)
SELECT
m.*,
-- bm25() scores are negative (lower = better match), so negate before weighting
COALESCE(-f.fts_score, 0) * 0.4 + COALESCE(v.similarity, 0) * 0.6 as hybrid_score
FROM messages m
LEFT JOIN fts_results f ON m.id = f.id
LEFT JOIN vector_results v ON m.id = v.message_id
WHERE f.id IS NOT NULL OR v.message_id IS NOT NULL
ORDER BY hybrid_score DESC
LIMIT 20;
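The weighted score blend above is one way to fuse the two result lists. Classic reciprocal rank fusion (RRF) instead combines by rank position, which sidesteps normalizing bm25 and cosine scores against each other. A minimal Python sketch (the message IDs are illustrative):

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score = better; documents in multiple lists rise to the top.
    return sorted(scores, key=scores.get, reverse=True)

# FTS and vector search each return message IDs, best first (illustrative).
fts_hits = ["m3", "m1", "m7"]
vector_hits = ["m1", "m9", "m3"]
print(rrf_merge([fts_hits, vector_hits]))  # → ['m1', 'm3', 'm9', 'm7']
```

Because RRF only consumes rank positions, it works unchanged if one backend is swapped out, at the cost of discarding score magnitudes.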

11. Call Graph Function Lookup

-- For MCP call graph server
SELECT
f.name as function_name,
f.file_path,
f.line_number,
f.signature,
COUNT(e.id) as caller_count
FROM call_graph_functions f
LEFT JOIN call_graph_edges e ON f.id = e.callee_id
WHERE f.name LIKE ?
GROUP BY f.id
ORDER BY caller_count DESC;

Dashboard Queries (What Powers /cxq)

/cxq --stats

SELECT
'Messages' as category, COUNT(*) as count FROM messages
UNION ALL SELECT 'Sessions', COUNT(DISTINCT session_id) FROM messages
UNION ALL SELECT 'Decisions', COUNT(*) FROM decisions
UNION ALL SELECT 'Code Patterns', COUNT(*) FROM code_patterns
UNION ALL SELECT 'Error Solutions', COUNT(*) FROM error_solutions
UNION ALL SELECT 'Tool Calls', COUNT(*) FROM tool_analytics
UNION ALL SELECT 'Token Records', COUNT(*) FROM token_usage;

/cxq --recent 20

SELECT
role,
SUBSTR(content, 1, 200) as preview,
session_id,
timestamp
FROM messages
ORDER BY timestamp DESC
LIMIT 20;

/cxq --decisions

SELECT
decision_type,
decision,
rationale,
confidence,
created_at
FROM decisions
WHERE confidence > 0.6
ORDER BY created_at DESC
LIMIT 20;

Implementation Notes

Query Performance

  1. Use indexes - Filter and sort on indexed columns in analytical queries
  2. Limit results - Always include LIMIT to prevent runaway queries
  3. Date filters - Use datetime('now', '-N days') for time-bounded queries
  4. FTS for text - Use the _fts virtual tables for content searches; LIKE '%...%' cannot use an index
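Point 1 can be checked mechanically: SQLite's EXPLAIN QUERY PLAN reports whether a query hits an index. A minimal Python sketch (the table here is a pared-down stand-in, not the full CODITECT schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tool_analytics (
        id INTEGER PRIMARY KEY,
        tool_name TEXT,
        created_at TEXT
    )
""")
# Index the column used in the WHERE clause of time-bounded queries.
conn.execute("CREATE INDEX idx_tool_analytics_created ON tool_analytics(created_at)")

# EXPLAIN QUERY PLAN shows whether the range filter uses the index.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT tool_name FROM tool_analytics
    WHERE created_at > datetime('now', '-7 days')
    LIMIT 100
""").fetchall()
print(plan[0][3])  # e.g. "SEARCH tool_analytics USING INDEX idx_tool_analytics_created (created_at>?)"
```

A "SCAN" in the plan instead of "SEARCH ... USING INDEX" is the signal to add or adjust an index.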

MCP Server Integration

The following MCP servers query these databases:

Server                     Database    Primary Tables
coditect-semantic-search   context.db  messages, embeddings, decisions
coditect-call-graph        context.db  call_graph_functions, call_graph_edges
coditect-impact-analysis   context.db  call_graph_*, decisions
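As a sketch of how one of these servers might execute pattern 11 with a bound parameter, here is a Python example against an in-memory stand-in for context.db (the schema and rows shown are illustrative; the real tables may have more columns):

```python
import sqlite3

def find_function(conn, name_pattern):
    """Run the call-graph lookup (pattern 11) with a bound LIKE parameter."""
    return conn.execute("""
        SELECT f.name, f.file_path, f.line_number, COUNT(e.id) AS caller_count
        FROM call_graph_functions f
        LEFT JOIN call_graph_edges e ON f.id = e.callee_id
        WHERE f.name LIKE ?
        GROUP BY f.id
        ORDER BY caller_count DESC
    """, (name_pattern,)).fetchall()

# Illustrative in-memory schema, not the actual context.db layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE call_graph_functions (id INTEGER PRIMARY KEY, name TEXT, file_path TEXT, line_number INTEGER);
    CREATE TABLE call_graph_edges (id INTEGER PRIMARY KEY, caller_id INTEGER, callee_id INTEGER);
    INSERT INTO call_graph_functions VALUES
        (1, 'parse_config', 'config.py', 10),
        (2, 'parse_args', 'cli.py', 5),
        (3, 'main', 'cli.py', 1);
    INSERT INTO call_graph_edges VALUES (1, 2, 1), (2, 3, 1);
""")
for name, path, line, callers in find_function(conn, 'parse%'):
    print(name, path, line, callers)
```

Binding the pattern with `?` rather than string interpolation keeps the server safe from SQL injection in user-supplied search terms.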

Adding New Analytics

When adding new analytics:

  1. Check if existing tables suffice
  2. Add indexes for query patterns
  3. Create views for common aggregations
  4. Document in DATABASE-SCHEMA.md
  5. Update /cxq if user-facing
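Steps 2 and 3 can be sketched with SQLite directly. Below is a hypothetical v_daily_tool_stats view over a pared-down tool_analytics table (the view name, columns, and sample rows are illustrative, not part of the current schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative schema; the real tool_analytics table has more columns.
conn.execute("""
    CREATE TABLE tool_analytics (
        id INTEGER PRIMARY KEY, tool_name TEXT, status TEXT, created_at TEXT
    )
""")

# Step 3: a view captures a common aggregation so callers don't repeat the SQL.
conn.execute("""
    CREATE VIEW v_daily_tool_stats AS
    SELECT DATE(created_at) AS day, tool_name,
           COUNT(*) AS calls,
           SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failures
    FROM tool_analytics
    GROUP BY day, tool_name
""")

conn.executemany(
    "INSERT INTO tool_analytics (tool_name, status, created_at) VALUES (?, ?, ?)",
    [("Read", "success", "2026-01-20 10:00"), ("Read", "failed", "2026-01-20 11:00")],
)
print(conn.execute("SELECT * FROM v_daily_tool_stats").fetchall())
# → [('2026-01-20', 'Read', 2, 1)]
```

Views keep dashboard and /cxq code to a simple SELECT while the aggregation logic lives, versioned, in the schema.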

Consequences

Positive

  1. Clear value proposition - Users understand why memory matters
  2. Reusable queries - Copy-paste patterns for common needs
  3. MCP integration guide - Agents can query effectively
  4. Analytics foundation - Base for dashboards and insights

Negative

  1. Query complexity - Some patterns require SQL knowledge
  2. Performance considerations - Large tables need careful querying

Mitigations

  1. Views - Pre-built views for complex aggregations
  2. /cxq wrappers - User-friendly command wrappers
  3. Documentation - This ADR and DATABASE-SCHEMA.md

Author: CODITECT Architecture Team
Reviewers: Architecture Council