ADR-101: Database Usage Patterns for Discovery and Analytics
Status
Accepted - January 23, 2026
Context
CODITECT's two-database architecture (ADR-089) captures rich data about development sessions, tool usage, decisions, and code patterns. However, this data is only valuable if users and agents can effectively query it for:
- Discovery - Finding relevant past work, decisions, and solutions
- Analytics - Understanding patterns, costs, and productivity
- Learning - Improving agent performance through trajectory analysis
- Insights - Surfacing actionable information for developers
The Value Proposition Problem
Raw data in SQLite tables has limited value. The true value of CODITECT's memory system comes from queryable insights that:
- Save developers from re-solving problems
- Provide context for better decisions
- Enable continuous improvement of agents
- Surface patterns that humans miss
Decision
Document and standardize database usage patterns that unlock the value of CODITECT's memory system.
Value Proposition: Why This Matters
1. Anti-Forgetting Memory
Problem: AI assistants forget everything between sessions.
CODITECT Solution:
-- What did we decide about authentication?
SELECT decision, rationale, created_at
FROM decisions
WHERE decision LIKE '%auth%' OR tags LIKE '%auth%'
ORDER BY confidence DESC, created_at DESC
LIMIT 10;
Value: Never re-litigate decided issues. Build on past work.
2. Error-Solution Cache
Problem: Same errors waste time repeatedly.
CODITECT Solution:
-- Have we solved this error before?
SELECT error_signature, solution, solution_code, success_count
FROM error_solutions
WHERE error_type = 'TypeError'
AND error_signature LIKE '%cannot read property%'
ORDER BY success_count DESC
LIMIT 5;
Value: Instant solutions for known problems. Learning from failures.
3. Tool Usage Intelligence
Problem: No visibility into which tools agents use or how efficiently they use them.
CODITECT Solution:
-- Tool success rates by category
SELECT
tool_category,
tool_name,
COUNT(*) as total_calls,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 1) as success_rate,
ROUND(AVG(execution_time_ms), 0) as avg_time_ms
FROM tool_analytics
WHERE created_at > datetime('now', '-7 days')
GROUP BY tool_category, tool_name
ORDER BY total_calls DESC;
Value: Identify slow tools, failure patterns, optimization opportunities.
4. Cost Attribution
Problem: No understanding of where API costs go.
CODITECT Solution:
-- Cost by agent type
-- (pre-aggregate token_usage per session; a direct join would multiply cost rows by tool rows and inflate the sums)
WITH session_costs AS (
  SELECT session_id, SUM(cost_usd) as session_cost, COUNT(*) as call_count
  FROM token_usage
  GROUP BY session_id
),
agent_sessions AS (
  SELECT DISTINCT agent_name, session_id
  FROM tool_analytics
  WHERE created_at > datetime('now', '-30 days')
)
SELECT
  a.agent_name,
  COUNT(*) as sessions,
  ROUND(SUM(sc.session_cost), 2) as total_cost,
  ROUND(SUM(sc.session_cost) / SUM(sc.call_count), 4) as avg_cost_per_call
FROM agent_sessions a
JOIN session_costs sc ON a.session_id = sc.session_id
GROUP BY a.agent_name
ORDER BY total_cost DESC;
Value: Understand ROI. Optimize expensive operations.
5. Pattern Discovery
Problem: Best practices scattered, not captured.
CODITECT Solution:
-- Most reused code patterns
SELECT
language,
pattern_name,
pattern_type,
usage_count,
SUBSTR(code, 1, 200) as code_preview
FROM code_patterns
WHERE usage_count > 5
ORDER BY usage_count DESC
LIMIT 20;
Value: Build on proven patterns. Standardize best practices.
Usage Patterns by Category
Discovery Patterns
1. Find Past Decisions
-- Decisions about a topic (full-text search)
-- (join the FTS table so MATCH applies; its rowid mirrors decisions.id)
SELECT d.*, m.session_id
FROM decisions d
JOIN decisions_fts ON decisions_fts.rowid = d.id
LEFT JOIN messages m ON d.message_id = m.id
WHERE decisions_fts MATCH 'database migration'
ORDER BY d.confidence DESC, d.created_at DESC
LIMIT 10;
2. Find Related Sessions
-- Sessions discussing a topic
-- (join the FTS table so MATCH applies; its rowid mirrors messages.id)
SELECT
  m.session_id,
  COUNT(*) as message_count,
  MIN(m.timestamp) as started,
  MAX(m.timestamp) as ended
FROM messages m
JOIN messages_fts ON messages_fts.rowid = m.id
WHERE messages_fts MATCH 'kubernetes deployment'
GROUP BY m.session_id
ORDER BY started DESC
LIMIT 10;
3. Find Code Examples
-- Python async patterns
SELECT pattern_name, code, description
FROM code_patterns
WHERE language = 'python'
AND (pattern_type = 'async' OR code LIKE '%async%await%')
ORDER BY usage_count DESC
LIMIT 10;
Analytics Patterns
4. Session Productivity
-- Messages and tools per session
SELECT
m.session_id,
COUNT(DISTINCT m.id) as messages,
COUNT(DISTINCT ta.id) as tool_calls,
ROUND(COUNT(DISTINCT ta.id) * 1.0 / COUNT(DISTINCT m.id), 2) as tools_per_message
FROM messages m
LEFT JOIN tool_analytics ta ON m.session_id = ta.session_id
WHERE m.timestamp > datetime('now', '-7 days')
GROUP BY m.session_id
ORDER BY messages DESC
LIMIT 20;
5. Tool Performance Over Time
-- Daily tool performance trends
SELECT
DATE(created_at) as day,
tool_name,
COUNT(*) as calls,
ROUND(AVG(execution_time_ms), 0) as avg_ms,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failures
FROM tool_analytics
WHERE created_at > datetime('now', '-14 days')
GROUP BY day, tool_name
ORDER BY day DESC, calls DESC;
6. Error Hotspots
-- Most frequent errors with solutions
SELECT
error_type,
error_signature,
success_count,
failure_count,
ROUND(100.0 * success_count / (success_count + failure_count), 1) as solution_rate
FROM error_solutions
WHERE success_count + failure_count > 3
ORDER BY success_count + failure_count DESC
LIMIT 15;
Trajectory Analysis Patterns
7. Tool Sequences (What Comes After What)
-- Common tool sequences within sessions
WITH tool_sequence AS (
SELECT
session_id,
tool_name,
LAG(tool_name) OVER (PARTITION BY session_id ORDER BY created_at) as prev_tool
FROM tool_analytics
WHERE trajectory_hash IS NOT NULL
)
SELECT
prev_tool || ' → ' || tool_name as sequence,
COUNT(*) as frequency
FROM tool_sequence
WHERE prev_tool IS NOT NULL
GROUP BY prev_tool, tool_name
HAVING COUNT(*) > 100
ORDER BY frequency DESC
LIMIT 20;
8. Agent Efficiency
-- Agent comparison: tools per task
SELECT
agent_name,
COUNT(DISTINCT task_id) as tasks,
COUNT(*) as total_tools,
ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT task_id), 1) as tools_per_task,
ROUND(AVG(execution_time_ms), 0) as avg_tool_time_ms
FROM tool_analytics
WHERE agent_name IS NOT NULL
AND task_id IS NOT NULL
AND created_at > datetime('now', '-7 days')
GROUP BY agent_name
HAVING tasks > 5
ORDER BY tools_per_task ASC;
9. Task Completion Patterns
-- Tools used to complete tasks (by track)
SELECT
SUBSTR(task_id, 1, 1) as track,
tool_category,
COUNT(*) as uses,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY SUBSTR(task_id, 1, 1)), 1) as pct
FROM tool_analytics
WHERE task_id IS NOT NULL
GROUP BY track, tool_category
ORDER BY track, uses DESC;
MCP Server Query Patterns
10. Semantic Search Integration
-- For MCP semantic search server (hybrid search)
-- (bm25() scores are lower-is-better, so negate before combining;
-- the weights assume both scores are on comparable scales)
WITH fts_results AS (
  SELECT rowid as id, -bm25(messages_fts) as fts_score
  FROM messages_fts
  WHERE messages_fts MATCH ?
  ORDER BY rank
  LIMIT 100
),
vector_results AS (
  SELECT message_id, similarity
  FROM embeddings
  WHERE embedding MATCH ? -- Vector search
  LIMIT 100
)
SELECT
  m.*,
  COALESCE(f.fts_score, 0) * 0.4 + COALESCE(v.similarity, 0) * 0.6 as hybrid_score
FROM messages m
LEFT JOIN fts_results f ON m.id = f.id
LEFT JOIN vector_results v ON m.id = v.message_id
WHERE f.id IS NOT NULL OR v.message_id IS NOT NULL
ORDER BY hybrid_score DESC
LIMIT 20;
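The weighted combination above fuses raw scores, which live on different scales (bm25 vs. cosine similarity). An alternative that sidesteps scale mismatch is reciprocal rank fusion, which uses only each result's rank. A minimal sketch in Python; the function and the `k` constant are illustrative, not part of CODITECT:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists by summing 1/(k + rank) per item.

    result_lists: iterable of lists of item ids, best-first.
    Returns item ids sorted by fused score, best first.
    """
    scores = {}
    for results in result_lists:
        for rank, item_id in enumerate(results, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse hypothetical FTS and vector result id lists
fts_ids = [3, 1, 7]     # best-first from messages_fts
vector_ids = [1, 9, 3]  # best-first from embeddings
fused = reciprocal_rank_fusion([fts_ids, vector_ids])  # → [1, 3, 9, 7]
```

Items appearing high in both lists (here id 1) win without any score normalization, which is why RRF is a common default for hybrid search.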
11. Call Graph Function Lookup
-- For MCP call graph server
SELECT
f.name as function_name,
f.file_path,
f.line_number,
f.signature,
COUNT(e.id) as caller_count
FROM call_graph_functions f
LEFT JOIN call_graph_edges e ON f.id = e.callee_id
WHERE f.name LIKE ?
GROUP BY f.id
ORDER BY caller_count DESC;
Dashboard Queries (What Powers /cxq)
/cxq --stats
SELECT
'Messages' as category, COUNT(*) as count FROM messages
UNION ALL SELECT 'Sessions', COUNT(DISTINCT session_id) FROM messages
UNION ALL SELECT 'Decisions', COUNT(*) FROM decisions
UNION ALL SELECT 'Code Patterns', COUNT(*) FROM code_patterns
UNION ALL SELECT 'Error Solutions', COUNT(*) FROM error_solutions
UNION ALL SELECT 'Tool Calls', COUNT(*) FROM tool_analytics
UNION ALL SELECT 'Token Records', COUNT(*) FROM token_usage;
/cxq --recent 20
SELECT
role,
SUBSTR(content, 1, 200) as preview,
session_id,
timestamp
FROM messages
ORDER BY timestamp DESC
LIMIT 20;
/cxq --decisions
SELECT
decision_type,
decision,
rationale,
confidence,
created_at
FROM decisions
WHERE confidence > 0.6
ORDER BY created_at DESC
LIMIT 20;
Implementation Notes
Query Performance
- Use indexes - All analytical queries should use indexed columns
- Limit results - Always include LIMIT to prevent runaway queries
- Date filters - Use datetime('now', '-N days') for time-bounded queries
- FTS for text - Use the _fts tables for content searches, not LIKE
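One way to verify the "use indexes" guideline is SQLite's EXPLAIN QUERY PLAN, which reports whether a query scans a table or searches an index. A minimal sketch; the index name here is illustrative, not part of the CODITECT schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tool_analytics (
        id INTEGER PRIMARY KEY,
        tool_name TEXT,
        created_at TEXT
    );
    CREATE INDEX idx_tool_analytics_created ON tool_analytics(created_at);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT tool_name FROM tool_analytics "
    "WHERE created_at > datetime('now', '-7 days')"
).fetchall()

# Each row's last column is the plan detail; an indexed query
# mentions the index (e.g. "SEARCH ... USING INDEX idx_...")
for row in plan:
    print(row[-1])
```

A detail line starting with SCAN instead of SEARCH is the signal that an analytical query is missing an index.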
MCP Server Integration
The following MCP servers query these databases:
| Server | Database | Primary Tables |
|---|---|---|
| coditect-semantic-search | context.db | messages, embeddings, decisions |
| coditect-call-graph | context.db | call_graph_functions, call_graph_edges |
| coditect-impact-analysis | context.db | call_graph_*, decisions |
Adding New Analytics
When adding new analytics:
- Check if existing tables suffice
- Add indexes for query patterns
- Create views for common aggregations
- Document in DATABASE-SCHEMA.md
- Update /cxq if user-facing
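Steps 2 and 3 above can be sketched as follows; the table columns, index name, and view name are illustrative, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tool_analytics (
        id INTEGER PRIMARY KEY,
        tool_name TEXT,
        status TEXT,
        created_at TEXT DEFAULT (datetime('now'))
    );

    -- Step 2: index matching the query pattern (filter/group on tool_name)
    CREATE INDEX idx_tool_analytics_name ON tool_analytics(tool_name, status);

    -- Step 3: view for a common aggregation
    CREATE VIEW v_tool_success_rates AS
    SELECT
        tool_name,
        COUNT(*) as total_calls,
        SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes
    FROM tool_analytics
    GROUP BY tool_name;
""")

conn.executemany(
    "INSERT INTO tool_analytics (tool_name, status) VALUES (?, ?)",
    [("Read", "success"), ("Read", "success"), ("Bash", "failed")],
)
rows = conn.execute(
    "SELECT * FROM v_tool_success_rates ORDER BY tool_name"
).fetchall()
# rows → [('Bash', 1, 0), ('Read', 2, 2)]
```

Once the view exists, consumers (and /cxq) query it by name instead of repeating the aggregation.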
Consequences
Positive
- Clear value proposition - Users understand why memory matters
- Reusable queries - Copy-paste patterns for common needs
- MCP integration guide - Agents can query effectively
- Analytics foundation - Base for dashboards and insights
Negative
- Query complexity - Some patterns require SQL knowledge
- Performance considerations - Large tables need careful querying
Mitigations
- Views - Pre-built views for complex aggregations
- /cxq wrappers - User-friendly command wrappers
- Documentation - This ADR and DATABASE-SCHEMA.md
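A /cxq-style wrapper can be a thin function mapping a flag to a canned query. A minimal sketch; the flag-to-SQL mapping here is illustrative, not the actual /cxq implementation:

```python
import sqlite3

# Hypothetical flag -> canned query mapping
QUERIES = {
    "--stats": "SELECT 'Messages' as category, COUNT(*) as count FROM messages",
    "--recent": (
        "SELECT role, SUBSTR(content, 1, 200), session_id, timestamp "
        "FROM messages ORDER BY timestamp DESC LIMIT ?"
    ),
}

def cxq(conn, flag, *params):
    """Run the canned query for a /cxq flag against an open connection."""
    if flag not in QUERIES:
        raise ValueError(f"unknown flag: {flag}")
    return conn.execute(QUERIES[flag], params).fetchall()

# Demonstration against a minimal in-memory schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, "
    "content TEXT, session_id TEXT, timestamp TEXT)"
)
conn.execute(
    "INSERT INTO messages (role, content, session_id, timestamp) "
    "VALUES ('user', 'hello', 's1', '2026-01-23T10:00:00')"
)
stats = cxq(conn, "--stats")        # → [('Messages', 1)]
recent = cxq(conn, "--recent", 20)
```

Keeping the SQL in one table of canned queries makes the user-facing surface reviewable in a single place.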
References
- DATABASE-SCHEMA.md - Complete schema reference
- ADR-020 - /cx extraction pipeline
- ADR-021 - /cxq query interface
- ADR-079 - Trajectory extraction
- ADR-089 - Two-database architecture
Author: CODITECT Architecture Team
Reviewers: Architecture Council