CODITECT Metadata Gaps Analysis
What's missing to enable full visibility into multi-agent workflows
Executive Summary
Current CODITECT infrastructure captures roughly 70% of the data needed for full workflow visibility. Critical gaps exist in:
- Agent Identity & Handoffs (30% gap)
- Intent Signals (40% gap)
- Session Context (25% gap)
- Causality Links (50% gap)
- Real-time State (60% gap)
Current State: What We Have ✅
1. Message Content
Source: unique_messages.jsonl
{
"hash": "abc123...",
"message": {
"role": "user" | "assistant",
"content": "text content"
},
"first_seen": "2025-11-17T20:00:00Z",
"checkpoint": "checkpoint-id"
}
What's good:
- ✅ Message content preserved
- ✅ Role (user/assistant)
- ✅ Timestamp
- ✅ Checkpoint linkage
What's missing:
- ❌ Which specific AI agent (if assistant)
- ❌ Session ID (can't group related messages)
- ❌ Parent message reference (conversation threading)
- ❌ Intent/purpose of message
- ❌ User identity beyond "user"
2. Git Data
Source: .git/ repository
git log --format="%H|%an|%ae|%at|%s"
abc123|Hal Casteel|hal@az1.ai|1700237400|Add OAuth2 middleware
What's good:
- ✅ Commit hash, author, email, timestamp, message
- ✅ File changes (git show --name-status)
- ✅ Diff stats (insertions/deletions)
What's missing:
- ❌ Link to conversation that led to commit
- ❌ Link to TASKLIST item being addressed
- ❌ Link to PROJECT-PLAN goal
- ❌ AI agent assistance metadata (which agent helped)
- ❌ Session ID during which commit was made
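The `git log` format shown above is pipe-delimited, so each line can be split into a record before any linkage metadata is attached. A minimal sketch, assuming one commit per line; the function name `parse_git_log_line` and the output keys are illustrative, not part of CODITECT:

```python
from datetime import datetime, timezone


def parse_git_log_line(line: str) -> dict:
    """Parse one line produced by: git log --format="%H|%an|%ae|%at|%s"."""
    # Split at most 4 times so a "|" inside the subject stays in the subject.
    # Assumption: author names and emails never contain "|".
    commit_hash, author, email, epoch, subject = line.rstrip("\n").split("|", 4)
    return {
        "hash": commit_hash,
        "author": author,
        "email": email,
        # %at is the author date as a Unix timestamp in seconds.
        "timestamp": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "subject": subject,
    }


record = parse_git_log_line(
    "abc123|Hal Casteel|hal@az1.ai|1700237400|Add OAuth2 middleware"
)
```

The record has no field for a conversation, task, or session, which is the gap listed above.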
3. Strategic Planning
Source: project-plan.md
## Phase 1: Foundation
- [ ] Setup authentication system
- [ ] Database schema design
What's good:
- ✅ Goals and phases
- ✅ Checkbox state
- ✅ Structured hierarchy
What's missing:
- ❌ Timestamps (when goal was added/completed)
- ❌ Owner/assignee
- ❌ Priority/urgency
- ❌ Dependencies between goals
- ❌ Estimated vs actual effort
- ❌ Link to related TASKLIST items
4. Tactical Execution
Source: tasklist.md
- [x] Setup OAuth2 middleware
- [ ] Add JWT token validation
What's good:
- ✅ Task description
- ✅ Checkbox state
What's missing:
- ❌ Task ID (stable identifier)
- ❌ Created/started/completed timestamps
- ❌ Owner/assignee
- ❌ AI agents used
- ❌ Related files
- ❌ Parent PROJECT-PLAN goal
- ❌ Estimated effort
- ❌ Actual effort spent
- ❌ Blockers/dependencies
Critical Metadata Gaps
Gap #1: Agent Identity & Tracking
Problem: Can't distinguish between different AI agents
Current state:
{
"role": "assistant" // Which assistant? Claude? Specialized agent?
}
Needed:
{
"role": "assistant",
"agent": {
"type": "specialized",
"name": "rust-expert-developer",
"invocation_method": "Task tool proxy",
"capabilities": ["async Rust", "tokio", "actix-web"]
}
}
Impact: HIGH
- Can't measure agent effectiveness
- Can't track agent-to-agent handoffs
- Can't recommend best agent for task
Implementation needed:
- Agent detection from content patterns
- Explicit agent metadata in messages
- Agent registry/catalog
Gap #2: Session Context
Problem: Can't group related messages into coherent sessions
Current state:
- Messages are timestamped but not grouped
- No session boundaries
- Can't distinguish "planning session" from "implementation session"
Needed:
{
"session": {
"id": "session-2025-11-17-oauth2-impl",
"type": "implementation",
"started_at": "2025-11-17T14:00:00Z",
"ended_at": "2025-11-17T16:30:00Z",
"context": {
"project": "coditect-cloud-backend",
"feature": "OAuth2 authentication",
"phase": "Phase 1: Foundation"
},
"participants": [
{"type": "human", "name": "Hal Casteel"},
{"type": "ai_agent", "name": "rust-expert-developer"}
]
}
}
Impact: HIGH
- Can't understand context shifts
- Can't measure session productivity
- Can't resume interrupted sessions
Implementation needed:
- Session detection algorithm
- Session metadata storage
- Session boundary markers in exports
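One simple form a session detection algorithm could take is an idle-gap heuristic: start a new session whenever the time between consecutive messages exceeds a threshold. The sketch below assumes messages carry the `hash` and `first_seen` fields from unique_messages.jsonl; the 30-minute threshold and the function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_into_sessions(messages, max_gap=timedelta(minutes=30)):
    """Split messages into sessions wherever the idle gap exceeds max_gap."""
    ordered = sorted(messages, key=lambda m: parse_ts(m["first_seen"]))
    sessions = []
    for msg in ordered:
        ts = parse_ts(msg["first_seen"])
        if sessions and ts - sessions[-1]["ended_at"] <= max_gap:
            # Close enough to the previous message: extend the open session.
            sessions[-1]["messages"].append(msg["hash"])
            sessions[-1]["ended_at"] = ts
        else:
            sessions.append(
                {"started_at": ts, "ended_at": ts, "messages": [msg["hash"]]}
            )
    return sessions


messages = [
    {"hash": "m1", "first_seen": "2025-11-17T14:00:00Z"},
    {"hash": "m2", "first_seen": "2025-11-17T14:10:00Z"},
    {"hash": "m3", "first_seen": "2025-11-17T16:30:00Z"},
]
sessions = group_into_sessions(messages)
```

A time gap alone cannot tell a planning session from an implementation session, so the session `type` would still need a separate signal.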
Gap #3: Intent Signals
Problem: Can't automatically determine WHY work is being done
Current state:
- Must infer intent from content
- No explicit goal linkage
- No activity classification
Needed:
{
"intent": {
"activity_type": "implementing",
"feature": "OAuth2 middleware",
"component": "authentication",
"linked_task": "TASK-003",
"linked_goal": "Setup authentication system",
"priority": "P0"
}
}
Impact: MEDIUM
- Requires fuzzy matching (error-prone)
- Manual classification needed
- Weak task→goal correlation
Implementation needed:
- Structured intent metadata in messages
- Auto-detection from message patterns
- User-prompted intent clarification
Gap #4: Causality Links
Problem: Can't trace cause-effect relationships
Current state:
- Messages exist in isolation
- Commits exist in isolation
- No explicit links between:
- Conversation → Code change
- Task completion → Checkpoint
- Problem → Solution
Needed:
{
"message_id": "msg-789",
"caused_by": ["msg-785", "msg-786"], // Parent messages
"caused": [
{"type": "git_commit", "hash": "abc123"},
{"type": "task_update", "task_id": "TASK-003", "status": "completed"}
]
}
Impact: VERY HIGH
- Can't build causal graph
- Can't trace decisions to outcomes
- Can't answer "why did we do this?"
Implementation needed:
- Message threading/parent references
- Explicit causality metadata
- Event correlation engine
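Once messages carry `caused_by` references, answering "why did we do this?" becomes a graph traversal. The following is a minimal sketch that walks those links backwards; `trace_causes` is a hypothetical helper, and it assumes every message is a dict with `message_id` and an optional `caused_by` list as in the example above.

```python
def trace_causes(messages, message_id):
    """Return the sorted IDs of every ancestor reachable via caused_by links."""
    by_id = {m["message_id"]: m for m in messages}
    seen = set()
    stack = list(by_id.get(message_id, {}).get("caused_by", []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # Already visited; also guards against cycles.
        seen.add(current)
        stack.extend(by_id.get(current, {}).get("caused_by", []))
    seen.discard(message_id)  # A cycle must not list a message as its own cause.
    return sorted(seen)


messages = [
    {"message_id": "msg-785", "caused_by": []},
    {"message_id": "msg-786", "caused_by": ["msg-785"]},
    {"message_id": "msg-789", "caused_by": ["msg-785", "msg-786"]},
]
ancestors = trace_causes(messages, "msg-789")
```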
Gap #5: Real-time State
Problem: Current state must be inferred from historical data
Current state:
- "What's being worked on now?" requires querying latest messages
- "Which task is active?" requires parsing TASKLIST
- No live state indicator
Needed:
{
"current_state": {
"active_session": "session-123",
"active_tasks": ["TASK-003", "TASK-005"],
"active_files": ["src/middleware/jwt.rs"],
"active_agents": ["rust-expert-developer"],
"context": {
"feature": "JWT validation",
"activity": "implementing",
"phase": "Phase 1: Foundation"
},
"last_updated": "2025-11-17T20:30:00Z"
}
}
Impact: HIGH
- Dashboard must reconstruct state (slow)
- No "right now" indicator
- Can't detect stalled work
Implementation needed:
- Live state tracking service
- State snapshot on every event
- State persistence mechanism
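"State snapshot on every event" can be read as a small reducer: each event produces a new copy of the state instead of mutating the old one. This is only a sketch; the event types (`task_started`, `task_completed`, `file_edit`) and their fields are assumptions, not an existing CODITECT event format.

```python
from copy import deepcopy


def apply_event(state, event):
    """Return a new state snapshot with one event applied."""
    new_state = deepcopy(state)  # Keep earlier snapshots unchanged.
    kind = event["type"]
    if kind == "task_started" and event["task_id"] not in new_state["active_tasks"]:
        new_state["active_tasks"].append(event["task_id"])
    elif kind == "task_completed" and event["task_id"] in new_state["active_tasks"]:
        new_state["active_tasks"].remove(event["task_id"])
    elif kind == "file_edit" and event["path"] not in new_state["active_files"]:
        new_state["active_files"].append(event["path"])
    new_state["last_updated"] = event["timestamp"]
    return new_state


state = {
    "active_tasks": ["TASK-003"],
    "active_files": [],
    "last_updated": "2025-11-17T20:00:00Z",
}
state_after = apply_event(
    state,
    {"type": "task_started", "task_id": "TASK-005", "timestamp": "2025-11-17T20:30:00Z"},
)
```

Because each snapshot records `last_updated`, stalled work could be flagged by comparing that value against the current time.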
Gap #6: Multi-Agent Workflow Metadata
Problem: No metadata for agent-to-agent collaboration
Current state:
- Can't track agent handoffs
- Can't measure agent coordination
- No workflow state machine
Needed:
{
"workflow": {
"id": "workflow-oauth2-impl",
"type": "feature_implementation",
"stages": [
{
"name": "planning",
"agent": "orchestrator",
"started": "2025-11-17T14:00:00Z",
"completed": "2025-11-17T14:30:00Z",
"output": "Implementation plan with 5 tasks"
},
{
"name": "implementation",
"agent": "rust-expert-developer",
"started": "2025-11-17T14:30:00Z",
"status": "in_progress",
"handed_off_from": "orchestrator"
}
]
}
}
Impact: CRITICAL for multi-agent workflows
- Can't visualize agent workflows
- Can't optimize agent coordination
- Can't detect handoff failures
Implementation needed:
- Workflow state machine
- Agent handoff metadata
- Workflow orchestration tracking
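With stage records like the ones above, one basic handoff check is to compare each stage with its predecessor. The sketch below assumes every stage after the first records `handed_off_from` and that a finished stage has a `completed` timestamp; `find_handoff_problems` is a hypothetical name.

```python
def find_handoff_problems(workflow):
    """Return human-readable problems found in a workflow's stage sequence."""
    problems = []
    stages = workflow["stages"]
    for previous, current in zip(stages, stages[1:]):
        if "completed" not in previous:
            problems.append(
                f"{current['name']}: started before {previous['name']} completed"
            )
        if current.get("handed_off_from") != previous["agent"]:
            problems.append(
                f"{current['name']}: handed_off_from does not match {previous['agent']}"
            )
    return problems


workflow = {
    "id": "workflow-oauth2-impl",
    "stages": [
        {"name": "planning", "agent": "orchestrator",
         "started": "2025-11-17T14:00:00Z", "completed": "2025-11-17T14:30:00Z"},
        {"name": "implementation", "agent": "rust-expert-developer",
         "started": "2025-11-17T14:30:00Z", "status": "in_progress",
         "handed_off_from": "orchestrator"},
    ],
}
problems = find_handoff_problems(workflow)
```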
Gap #7: Effort & Time Tracking
Problem: No time/effort metadata
Current state:
- Timestamps exist, but durations are not recorded
- No effort estimates vs actuals
- Can't measure productivity
Needed:
{
"task": {
"id": "TASK-003",
"estimated_hours": 4,
"actual_hours": 5.5,
"started_at": "2025-11-17T14:00:00Z",
"completed_at": "2025-11-17T19:30:00Z",
"time_breakdown": {
"planning": 0.5,
"implementation": 3.5,
"testing": 1.0,
"debugging": 0.5
}
}
}
Impact: MEDIUM
- Can't predict future work
- Can't measure efficiency
- Can't identify bottlenecks
Implementation needed:
- Time tracking integration
- Effort estimation framework
- Activity classification
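If `started_at` and `completed_at` are recorded, elapsed time and estimate overrun follow directly. A minimal sketch using the example task above; note the assumption that wall-clock time between the two timestamps equals effort, which only holds when the work was uninterrupted.

```python
from datetime import datetime


def hours_between(started_at: str, completed_at: str) -> float:
    """Elapsed hours between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(completed_at.replace("Z", "+00:00"))
    return (end - start).total_seconds() / 3600


task = {
    "id": "TASK-003",
    "estimated_hours": 4,
    "started_at": "2025-11-17T14:00:00Z",
    "completed_at": "2025-11-17T19:30:00Z",
    "time_breakdown": {
        "planning": 0.5, "implementation": 3.5, "testing": 1.0, "debugging": 0.5,
    },
}

elapsed = hours_between(task["started_at"], task["completed_at"])  # 5.5
overrun = elapsed - task["estimated_hours"]                        # 1.5
breakdown_total = sum(task["time_breakdown"].values())             # 5.5
```

Checking that the breakdown sums to the elapsed time is a cheap consistency test for tracked data.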
Gap #8: File → Task → Goal Mapping
Problem: Must infer relationships via fuzzy matching
Current state:
- File changes not explicitly linked to tasks
- Tasks not explicitly linked to goals
- Weak correlation (60-70% accuracy)
Needed:
{
"file": "src/middleware/oauth2.rs",
"linked_tasks": ["TASK-003", "TASK-004"],
"linked_goals": ["Setup authentication system"],
"linked_conversations": ["msg-785", "msg-786", "msg-789"]
}
Impact: HIGH
- Impact analysis inaccurate
- Task completion detection weak
- Can't auto-update TASKLIST
Implementation needed:
- Explicit metadata in git commits
- Task ID references in commit messages
- Automated linkage on file save
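Task ID references in commit messages make the file-to-task link explicit rather than inferred. The sketch below builds an index from commits to files, assuming IDs follow the `TASK-<number>` form used in this document and that each commit is already available as a dict with `message` and `files`; both the dict shape and `build_file_task_index` are illustrative.

```python
import re
from collections import defaultdict

TASK_ID = re.compile(r"\bTASK-\d+\b")


def build_file_task_index(commits):
    """Map each changed file to the task IDs named in the commits touching it."""
    index = defaultdict(set)
    for commit in commits:
        task_ids = TASK_ID.findall(commit["message"])
        for path in commit["files"]:
            index[path].update(task_ids)
    # Sort for stable output; files with no referenced task map to [].
    return {path: sorted(ids) for path, ids in index.items()}


commits = [
    {"message": "feat(auth): Add OAuth2 middleware\nTask: TASK-003",
     "files": ["src/middleware/oauth2.rs", "src/middleware/mod.rs"]},
    {"message": "fix(auth): Token expiry check (TASK-004)",
     "files": ["src/middleware/oauth2.rs"]},
]
index = build_file_task_index(commits)
```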
Proposed Metadata Schema
Enhanced Message Format
{
"message_id": "msg-1234",
"hash": "content-hash",
"session_id": "session-2025-11-17-oauth2",
"timestamp": "2025-11-17T15:30:00Z",
"parent_message_id": "msg-1233",
"actor": {
"type": "human" | "ai_agent",
"name": "Hal Casteel" | "rust-expert-developer",
"email": "hal@az1.ai"
},
"content": {
"role": "user" | "assistant",
"text": "Implementing OAuth2 middleware",
"intent": {
"activity": "implementing",
"feature": "OAuth2 middleware",
"component": "authentication"
}
},
"context": {
"project": "coditect-cloud-backend",
"session_type": "implementation",
"active_files": ["src/middleware/oauth2.rs"],
"linked_task": "TASK-003",
"linked_goal": "Setup authentication system",
"phase": "Phase 1: Foundation"
},
"causality": {
"caused_by": ["msg-1230", "msg-1231"],
"caused": [
{"type": "git_commit", "hash": "abc123"},
{"type": "file_edit", "path": "src/middleware/oauth2.rs"}
]
},
"checkpoint": "checkpoint-id"
}
Enhanced Task Format
{
"task_id": "TASK-003",
"title": "Implement OAuth2 middleware",
"description": "Add OAuth2 token validation middleware",
"status": "completed",
"created_at": "2025-11-15T10:00:00Z",
"started_at": "2025-11-17T14:00:00Z",
"completed_at": "2025-11-17T19:30:00Z",
"owner": {
"type": "human",
"name": "Hal Casteel"
},
"agents_used": [
{"name": "rust-expert-developer", "messages": 15}
],
"linked_goal": {
"phase": "Phase 1: Foundation",
"goal": "Setup authentication system"
},
"effort": {
"estimated_hours": 4,
"actual_hours": 5.5
},
"artifacts": {
"files_created": ["src/middleware/oauth2.rs"],
"files_modified": ["src/middleware/mod.rs", "Cargo.toml"],
"commits": ["abc123", "def456"],
"conversations": ["session-2025-11-17-oauth2"]
}
}
Implementation Priority
P0 - Critical (Needed for MVP)
- Session ID - Group related messages
- Agent Identity - Track which agent
- Task IDs - Stable task references
- File→Task links - Connect code to tasks
P1 - High Value (Needed for 360° view)
- Intent metadata - Activity classification
- Causality links - Message threading
- Current state - Real-time tracking
- Timestamps - Created/started/completed
P2 - Nice to Have (Analytics)
- Effort tracking - Time estimates vs actuals
- Workflow stages - Multi-agent orchestration
- Dependencies - Task/goal relationships
Integration Strategy
Phase 1: Minimal Metadata (Week 1-2)
Add to existing systems without breaking changes:
- Session ID in exports (file naming convention)
- Agent detection from content patterns
- Task IDs embedded in TASKLIST comments and extracted by regex
Phase 2: Enhanced Metadata (Week 3-4)
Structured metadata additions:
- JSON front-matter in tasklist.md
- Git commit message format standardization
- Session metadata in checkpoints
Phase 3: Full Schema (Week 5-6)
Complete metadata integration:
- Enhanced message format
- Real-time state tracking
- Causality graph database
Success Metrics
Targets once the metadata enhancements are in place:
- ✅ 95%+ agent identification accuracy
- ✅ 90%+ file→task linkage accuracy
- ✅ 100% session grouping coverage
- ✅ <5s state reconstruction time
- ✅ Complete causality tracing
Appendix: Quick Wins
Quick Win #1: Agent Detection (2 hours)
Use regex patterns to detect agents from content:
agent_patterns = {
'rust-expert-developer': r'async.*Rust|tokio|actix',
'database-architect': r'PostgreSQL|schema.*design'
}
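The patterns above could be applied with a small lookup function. This is a sketch rather than a finished detector: `detect_agents` is a hypothetical name, matching is case-sensitive as the patterns are written, and a match only indicates that the content touches an agent's topic, not that the agent produced it.

```python
import re

agent_patterns = {
    'rust-expert-developer': r'async.*Rust|tokio|actix',
    'database-architect': r'PostgreSQL|schema.*design',
}


def detect_agents(content: str) -> list:
    """Return the names of all agents whose pattern matches the content."""
    return [
        name
        for name, pattern in agent_patterns.items()
        if re.search(pattern, content)
    ]


rust_hits = detect_agents("Use tokio for the async runtime")
db_hits = detect_agents("Review the PostgreSQL schema design")
no_hits = detect_agents("Update the README")
```

Returning a list keeps ambiguous messages visible, so a message that matches several agents can be flagged for review instead of being assigned silently.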
Quick Win #2: Session from Filename (30 minutes)
Extract session context from export filenames:
"2025-11-17-EXPORT-oauth2-implementation.txt"
→ session_id: "2025-11-17-oauth2-implementation"
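A regular expression is enough for this extraction. The sketch assumes every export follows the `<date>-EXPORT-<slug>.txt` shape of the example; filenames that do not match return `None` instead of a guessed ID.

```python
import re

EXPORT_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})-EXPORT-(.+)\.txt$")


def session_id_from_filename(filename: str):
    """Derive a session ID from an export filename, or None if it doesn't match."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    date, slug = match.groups()
    return f"{date}-{slug}"


session_id = session_id_from_filename("2025-11-17-EXPORT-oauth2-implementation.txt")
```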
Quick Win #3: Task ID in Comments (1 hour)
Add task IDs to TASKLIST:
- [ ] Implement OAuth2 middleware <!-- TASK-003 -->
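Because the ID sits in an HTML comment, it stays invisible in rendered markdown but is easy to read back. A minimal parsing sketch, assuming one task per line with the comment at the end; `parse_task_line` is a hypothetical helper.

```python
import re

TASK_LINE = re.compile(r"^- \[( |x|X)\] (.*?)\s*<!--\s*(TASK-\d+)\s*-->\s*$")


def parse_task_line(line: str):
    """Parse a TASKLIST checkbox line that ends with a task ID comment."""
    match = TASK_LINE.match(line)
    if match is None:
        return None  # Not a checkbox line, or no task ID comment present.
    state, title, task_id = match.groups()
    return {"task_id": task_id, "title": title, "done": state.lower() == "x"}


task = parse_task_line("- [ ] Implement OAuth2 middleware <!-- TASK-003 -->")
```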
Quick Win #4: Commit Message Format (1 hour)
Standardize git commits:
feat(auth): Add OAuth2 middleware
Task: TASK-003
Phase: Phase 1 - Foundation
Files: src/middleware/oauth2.rs
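A commit message in this format can be read back into structured fields with plain string handling. The sketch below assumes the first line is the subject and that `Task`, `Phase`, and `Files` each appear as a `Key: value` line; the function name and return shape are illustrative.

```python
KNOWN_FIELDS = {"Task", "Phase", "Files"}


def parse_commit_message(message: str) -> dict:
    """Split a standardized commit message into its subject and metadata fields."""
    lines = message.strip().splitlines()
    parsed = {"subject": lines[0] if lines else "", "fields": {}}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        # Only accept the known keys so ordinary body text is ignored.
        if sep and key.strip() in KNOWN_FIELDS:
            parsed["fields"][key.strip()] = value.strip()
    return parsed


parsed = parse_commit_message(
    "feat(auth): Add OAuth2 middleware\n"
    "Task: TASK-003\n"
    "Phase: Phase 1 - Foundation\n"
    "Files: src/middleware/oauth2.rs\n"
)
```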
Next Steps:
- Review metadata gaps with team
- Prioritize P0 items for MVP
- Implement quick wins (4.5 hours total)
- Design full metadata schema
- Roll out incrementally