CODITECT Metadata Gaps Analysis

What's missing to enable full visibility into multi-agent workflows

Executive Summary

The current CODITECT infrastructure captures about 70% of the data needed for full visibility. Critical gaps exist in:

Agent Identity & Handoffs (30% gap)
Intent Signals (40% gap)
Session Context (25% gap)
Causality Links (50% gap)
Real-time State (60% gap)

Current State: What We Have ✅

1. Message Content

Source: unique_messages.jsonl

{
  "hash": "abc123...",
  "message": {
    "role": "user" | "assistant",
    "content": "text content"
  },
  "first_seen": "2025-11-17T20:00:00Z",
  "checkpoint": "checkpoint-id"
}

What's good:

✅ Message content preserved
✅ Role (user/assistant)
✅ Timestamp
✅ Checkpoint linkage

What's missing:

❌ Which specific AI agent (if assistant)
❌ Session ID (can't group related messages)
❌ Parent message reference (conversation threading)
❌ Intent/purpose of message
❌ User identity beyond "user"

2. Git Data

Source: .git/ repository

git log --format="%H|%an|%ae|%at|%s"
abc123|Hal Casteel|hal@az1.ai|1700237400|Add OAuth2 middleware

What's good:

✅ Commit hash, author, email, timestamp, message
✅ File changes (git show --name-status)
✅ Diff stats (insertions/deletions)

What's missing:

❌ Link to conversation that led to commit
❌ Link to TASKLIST item being addressed
❌ Link to PROJECT-PLAN goal
❌ AI agent assistance metadata (which agent helped)
❌ Session ID during which commit was made
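
The pipe-delimited git log format shown in this section can be split into named fields before any linking is attempted. This is a minimal sketch, not existing CODITECT code: Commit and parse_log_line are hypothetical names, and it assumes author names and emails contain no "|" character.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    hash: str
    author: str
    email: str
    timestamp: datetime
    subject: str


def parse_log_line(line: str) -> Commit:
    """Parse one line of `git log --format="%H|%an|%ae|%at|%s"` output."""
    # maxsplit=4 keeps any '|' inside the subject (the last field) intact.
    commit_hash, author, email, epoch, subject = line.rstrip("\n").split("|", 4)
    return Commit(
        hash=commit_hash,
        author=author,
        email=email,
        # %at is a Unix epoch in seconds; store it as an aware UTC datetime.
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        subject=subject,
    )


commit = parse_log_line("abc123|Hal Casteel|hal@az1.ai|1700237400|Add OAuth2 middleware")
print(commit.author, "-", commit.subject)
```

Parsed records like this give later steps (session linkage, task linkage) a typed object to attach metadata to.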

3. Strategic Planning

Source: project-plan.md

## Phase 1: Foundation
- [ ] Setup authentication system
- [ ] Database schema design

What's good:

✅ Goals and phases
✅ Checkbox state
✅ Structured hierarchy

What's missing:

❌ Timestamps (when goal was added/completed)
❌ Owner/assignee
❌ Priority/urgency
❌ Dependencies between goals
❌ Estimated vs actual effort
❌ Link to related TASKLIST items

4. Tactical Execution

Source: tasklist.md

- [x] Setup OAuth2 middleware
- [ ] Add JWT token validation

What's good:

✅ Task description
✅ Checkbox state

What's missing:

❌ Task ID (stable identifier)
❌ Created/started/completed timestamps
❌ Owner/assignee
❌ AI agents used
❌ Related files
❌ Parent PROJECT-PLAN goal
❌ Estimated effort
❌ Actual effort spent
❌ Blockers/dependencies

Critical Metadata Gaps

Gap #1: Agent Identity & Tracking

Problem: Can't distinguish between different AI agents

Current state:

{
  "role": "assistant"  // Which assistant? Claude? Specialized agent?
}

Needed:

{
  "role": "assistant",
  "agent": {
    "type": "specialized",
    "name": "rust-expert-developer",
    "invocation_method": "Task tool proxy",
    "capabilities": ["async Rust", "tokio", "actix-web"]
  }
}

Impact: HIGH

Can't measure agent effectiveness
Can't track agent-to-agent handoffs
Can't recommend best agent for task

Implementation needed:

Agent detection from content patterns
Explicit agent metadata in messages
Agent registry/catalog

Gap #2: Session Context

Problem: Can't group related messages into coherent sessions

Current state:

Messages are timestamped but not grouped
No session boundaries
Can't distinguish "planning session" from "implementation session"

Needed:

{
  "session": {
    "id": "session-2025-11-17-oauth2-impl",
    "type": "implementation",
    "started_at": "2025-11-17T14:00:00Z",
    "ended_at": "2025-11-17T16:30:00Z",
    "context": {
      "project": "coditect-cloud-backend",
      "feature": "OAuth2 authentication",
      "phase": "Phase 1: Foundation"
    },
    "participants": [
      {"type": "human", "name": "Hal Casteel"},
      {"type": "ai_agent", "name": "rust-expert-developer"}
    ]
  }
}

Impact: HIGH

Can't understand context shifts
Can't measure session productivity
Can't resume interrupted sessions

Implementation needed:

Session detection algorithm
Session metadata storage
Session boundary markers in exports
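
One simple candidate for the session detection algorithm is gap-based grouping: start a new session whenever the pause between consecutive messages exceeds a threshold. The sketch below assumes messages carry the first_seen field from unique_messages.jsonl; the 30-minute threshold and the function names are illustrative assumptions, not measured or existing values.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2025-11-17T20:00:00Z'."""
    # Replace the trailing 'Z' so fromisoformat also works on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_into_sessions(messages, max_gap=timedelta(minutes=30)):
    """Split messages into sessions wherever the time gap exceeds max_gap."""
    ordered = sorted(messages, key=lambda m: parse_ts(m["first_seen"]))
    sessions = []
    for msg in ordered:
        ts = parse_ts(msg["first_seen"])
        if sessions and ts - parse_ts(sessions[-1][-1]["first_seen"]) <= max_gap:
            sessions[-1].append(msg)  # close enough in time: same session
        else:
            sessions.append([msg])    # first message or long pause: new session
    return sessions


messages = [
    {"hash": "a", "first_seen": "2025-11-17T14:00:00Z"},
    {"hash": "b", "first_seen": "2025-11-17T14:10:00Z"},
    {"hash": "c", "first_seen": "2025-11-17T16:00:00Z"},
]
sessions = group_into_sessions(messages)
print([[m["hash"] for m in s] for s in sessions])  # [['a', 'b'], ['c']]
```

Time gaps alone cannot tell a planning session from an implementation session, so the session type would still need explicit metadata or a separate classifier.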

Gap #3: Intent Signals

Problem: Can't automatically determine WHY work is being done

Current state:

Must infer intent from content
No explicit goal linkage
No activity classification

Needed:

{
  "intent": {
    "activity_type": "implementing",
    "feature": "OAuth2 middleware",
    "component": "authentication",
    "linked_task": "TASK-003",
    "linked_goal": "Setup authentication system",
    "priority": "P0"
  }
}

Impact: MEDIUM

Requires fuzzy matching (error-prone)
Manual classification needed
Weak task→goal correlation

Implementation needed:

Structured intent metadata in messages
Auto-detection from message patterns
User-prompted intent clarification

Gap #4: Causality Links

Problem: Can't trace cause-effect relationships

Current state:

Messages exist in isolation
Commits exist in isolation
No explicit links between:
- Conversation → Code change
- Task completion → Checkpoint
- Problem → Solution

Needed:

{
  "message_id": "msg-789",
  "caused_by": ["msg-785", "msg-786"],  // Parent messages
  "caused": [
    {"type": "git_commit", "hash": "abc123"},
    {"type": "task_update", "task_id": "TASK-003", "status": "completed"}
  ]
}

Impact: VERY HIGH

Can't build causal graph
Can't trace decisions to outcomes
Can't answer "why did we do this?"

Implementation needed:

Message threading/parent references
Explicit causality metadata
Event correlation engine
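
Once messages carry caused_by references, answering "why did we do this?" becomes a graph walk over those links. The following sketch assumes each record has message_id and caused_by fields as in the example above; trace_causes is a hypothetical helper, and msg-780 is an invented ID used only for illustration.

```python
def trace_causes(messages, message_id):
    """Return all transitive ancestors of message_id via caused_by links."""
    parents = {m["message_id"]: m.get("caused_by", []) for m in messages}
    seen = []
    stack = list(parents.get(message_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also guards against cycles
        seen.append(current)
        stack.extend(parents.get(current, []))
    return seen


messages = [
    {"message_id": "msg-780", "caused_by": []},  # hypothetical root message
    {"message_id": "msg-785", "caused_by": ["msg-780"]},
    {"message_id": "msg-786", "caused_by": ["msg-780"]},
    {"message_id": "msg-789", "caused_by": ["msg-785", "msg-786"]},
]
print(sorted(trace_causes(messages, "msg-789")))
# ['msg-780', 'msg-785', 'msg-786']
```

The same walk run in the other direction (over the caused entries) would trace a message forward to the commits and task updates it produced.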

Gap #5: Real-time State

Problem: Current state must be inferred from historical data

Current state:

"What's being worked on now?" requires querying latest messages
"Which task is active?" requires parsing TASKLIST
No live state indicator

Needed:

{
  "current_state": {
    "active_session": "session-123",
    "active_tasks": ["TASK-003", "TASK-005"],
    "active_files": ["src/middleware/jwt.rs"],
    "active_agents": ["rust-expert-developer"],
    "context": {
      "feature": "JWT validation",
      "activity": "implementing",
      "phase": "Phase 1: Foundation"
    },
    "last_updated": "2025-11-17T20:30:00Z"
  }
}

Impact: HIGH

Dashboard must reconstruct state (slow)
No "right now" indicator
Can't detect stalled work

Implementation needed:

Live state tracking service
State snapshot on every event
State persistence mechanism
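
"State snapshot on every event" can be read as a small reducer: each event produces a new copy of the current state. This is a sketch under stated assumptions: the event types (task_started, task_completed, file_opened) and the apply_event function are hypothetical, and only a subset of the current_state fields shown above is modeled.

```python
import copy


def apply_event(state, event):
    """Return a new state snapshot with one event applied (input is not mutated)."""
    new_state = copy.deepcopy(state)
    kind = event["type"]
    if kind == "task_started":
        if event["task_id"] not in new_state["active_tasks"]:
            new_state["active_tasks"].append(event["task_id"])
    elif kind == "task_completed":
        if event["task_id"] in new_state["active_tasks"]:
            new_state["active_tasks"].remove(event["task_id"])
    elif kind == "file_opened":
        if event["path"] not in new_state["active_files"]:
            new_state["active_files"].append(event["path"])
    # Every event, known or not, refreshes the freshness marker.
    new_state["last_updated"] = event["timestamp"]
    return new_state


state = {"active_tasks": ["TASK-003"], "active_files": [], "last_updated": None}
events = [
    {"type": "task_started", "task_id": "TASK-005", "timestamp": "2025-11-17T20:00:00Z"},
    {"type": "file_opened", "path": "src/middleware/jwt.rs", "timestamp": "2025-11-17T20:10:00Z"},
    {"type": "task_completed", "task_id": "TASK-003", "timestamp": "2025-11-17T20:30:00Z"},
]
for event in events:
    state = apply_event(state, event)
print(state)
```

Because last_updated is refreshed on every event, comparing it with the current time is one way a dashboard could flag stalled work.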

Gap #6: Multi-Agent Workflow Metadata

Problem: No metadata for agent-to-agent collaboration

Current state:

Can't track agent handoffs
Can't measure agent coordination
No workflow state machine

Needed:

{
  "workflow": {
    "id": "workflow-oauth2-impl",
    "type": "feature_implementation",
    "stages": [
      {
        "name": "planning",
        "agent": "orchestrator",
        "started": "2025-11-17T14:00:00Z",
        "completed": "2025-11-17T14:30:00Z",
        "output": "Implementation plan with 5 tasks"
      },
      {
        "name": "implementation",
        "agent": "rust-expert-developer",
        "started": "2025-11-17T14:30:00Z",
        "status": "in_progress",
        "handed_off_from": "orchestrator"
      }
    ]
  }
}

Impact: CRITICAL for multi-agent workflows

Can't visualize agent workflows
Can't optimize agent coordination
Can't detect handoff failures

Implementation needed:

Workflow state machine
Agent handoff metadata
Workflow orchestration tracking
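
With stage records like the ones above, basic handoff checks can be automated. The sketch below uses two illustrative rules that are assumptions rather than an agreed definition of a handoff failure: each stage should name the previous stage's agent in handed_off_from, and the previous stage should have a completed timestamp. find_handoff_problems is a hypothetical helper.

```python
def find_handoff_problems(workflow):
    """Return human-readable problems found between consecutive stages."""
    problems = []
    stages = workflow["stages"]
    for prev, cur in zip(stages, stages[1:]):
        # Rule 1 (assumed): the receiving stage names the previous agent.
        if cur.get("handed_off_from") != prev["agent"]:
            problems.append(
                f"{cur['name']}: expected handoff from {prev['agent']}, "
                f"got {cur.get('handed_off_from')}"
            )
        # Rule 2 (assumed): the previous stage finished before the handoff.
        if "completed" not in prev:
            problems.append(
                f"{prev['name']}: not completed before {cur['name']} started"
            )
    return problems


workflow = {
    "id": "workflow-oauth2-impl",
    "stages": [
        {"name": "planning", "agent": "orchestrator",
         "started": "2025-11-17T14:00:00Z", "completed": "2025-11-17T14:30:00Z"},
        {"name": "implementation", "agent": "rust-expert-developer",
         "started": "2025-11-17T14:30:00Z", "status": "in_progress",
         "handed_off_from": "orchestrator"},
    ],
}
print(find_handoff_problems(workflow))  # [] for the example workflow
```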

Gap #7: Effort & Time Tracking

Problem: No time/effort metadata

Current state:

Timestamps exist but no duration
No effort estimates vs actuals
Can't measure productivity

Needed:

{
  "task": {
    "id": "TASK-003",
    "estimated_hours": 4,
    "actual_hours": 5.5,
    "started_at": "2025-11-17T14:00:00Z",
    "completed_at": "2025-11-17T19:30:00Z",
    "time_breakdown": {
      "planning": 0.5,
      "implementation": 3.5,
      "testing": 1.0,
      "debugging": 0.5
    }
  }
}

Impact: MEDIUM

Can't predict future work
Can't measure efficiency
Can't identify bottlenecks

Implementation needed:

Time tracking integration
Effort estimation framework
Activity classification
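
As a worked example of the arithmetic, the actual_hours value in the sample task can be derived from its own timestamps and compared with the estimate. This is a sketch; elapsed_hours is a hypothetical helper, and wall-clock time is only an upper bound on effort because it includes breaks.

```python
from datetime import datetime


def elapsed_hours(started_at: str, completed_at: str) -> float:
    """Wall-clock hours between two ISO-8601 timestamps."""
    # Replace the trailing 'Z' so fromisoformat also works on Python < 3.11.
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(completed_at.replace("Z", "+00:00"))
    return (end - start).total_seconds() / 3600


task = {
    "estimated_hours": 4,
    "started_at": "2025-11-17T14:00:00Z",
    "completed_at": "2025-11-17T19:30:00Z",
}
actual = elapsed_hours(task["started_at"], task["completed_at"])
overrun = (actual - task["estimated_hours"]) / task["estimated_hours"]
print(f"actual={actual:.1f}h, overrun={overrun:.1%}")
# actual=5.5h, overrun=37.5%
```

The result matches the time_breakdown in the sample: 0.5 + 3.5 + 1.0 + 0.5 = 5.5 hours.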

Gap #8: File → Task → Goal Mapping

Problem: Must infer relationships via fuzzy matching

Current state:

File changes not explicitly linked to tasks
Tasks not explicitly linked to goals
Weak correlation (60-70% accuracy)

Needed:

{
  "file": "src/middleware/oauth2.rs",
  "linked_tasks": ["TASK-003", "TASK-004"],
  "linked_goals": ["Setup authentication system"],
  "linked_conversations": ["msg-785", "msg-786", "msg-789"]
}

Impact: HIGH

Impact analysis inaccurate
Task completion detection weak
Can't auto-update TASKLIST

Implementation needed:

Explicit metadata in git commits
Task ID references in commit messages
Automated linkage on file save

Proposed Metadata Schema

Enhanced Message Format

{
  "message_id": "msg-1234",
  "hash": "content-hash",
  "session_id": "session-2025-11-17-oauth2",
  "timestamp": "2025-11-17T15:30:00Z",
  "parent_message_id": "msg-1233",

  "actor": {
    "type": "human" | "ai_agent",
    "name": "Hal Casteel" | "rust-expert-developer",
    "email": "hal@az1.ai"
  },

  "content": {
    "role": "user" | "assistant",
    "text": "Implementing OAuth2 middleware",
    "intent": {
      "activity": "implementing",
      "feature": "OAuth2 middleware",
      "component": "authentication"
    }
  },

  "context": {
    "project": "coditect-cloud-backend",
    "session_type": "implementation",
    "active_files": ["src/middleware/oauth2.rs"],
    "linked_task": "TASK-003",
    "linked_goal": "Setup authentication system",
    "phase": "Phase 1: Foundation"
  },

  "causality": {
    "caused_by": ["msg-1230", "msg-1231"],
    "caused": [
      {"type": "git_commit", "hash": "abc123"},
      {"type": "file_edit", "path": "src/middleware/oauth2.rs"}
    ]
  },

  "checkpoint": "checkpoint-id"
}

Enhanced Task Format

{
  "task_id": "TASK-003",
  "title": "Implement OAuth2 middleware",
  "description": "Add OAuth2 token validation middleware",

  "status": "completed",
  "created_at": "2025-11-15T10:00:00Z",
  "started_at": "2025-11-17T14:00:00Z",
  "completed_at": "2025-11-17T19:30:00Z",

  "owner": {
    "type": "human",
    "name": "Hal Casteel"
  },

  "agents_used": [
    {"name": "rust-expert-developer", "messages": 15}
  ],

  "linked_goal": {
    "phase": "Phase 1: Foundation",
    "goal": "Setup authentication system"
  },

  "effort": {
    "estimated_hours": 4,
    "actual_hours": 5.5
  },

  "artifacts": {
    "files_created": ["src/middleware/oauth2.rs"],
    "files_modified": ["src/middleware/mod.rs", "Cargo.toml"],
    "commits": ["abc123", "def456"],
    "conversations": ["session-2025-11-17-oauth2"]
  }
}

Implementation Priority

P0 - Critical (Needed for MVP)

Session ID - Group related messages
Agent Identity - Track which agent
Task IDs - Stable task references
File→Task links - Connect code to tasks

P1 - High Value (Needed for 360° view)

Intent metadata - Activity classification
Causality links - Message threading
Current state - Real-time tracking
Timestamps - Created/started/completed

P2 - Nice to Have (Analytics)

Effort tracking - Time estimates vs actuals
Workflow stages - Multi-agent orchestration
Dependencies - Task/goal relationships

Integration Strategy

Phase 1: Minimal Metadata (Weeks 1-2)

Add to existing systems without breaking changes:

Session ID in exports (file naming convention)
Agent detection from content patterns
Task ID regex in TASKLIST comments

Phase 2: Enhanced Metadata (Weeks 3-4)

Structured metadata additions:

JSON front-matter in tasklist.md
Git commit message format standardization
Session metadata in checkpoints

Phase 3: Full Schema (Weeks 5-6)

Complete metadata integration:

Enhanced message format
Real-time state tracking
Causality graph database

Success Metrics

Targets once the metadata enhancements are in place:

✅ 95%+ agent identification accuracy
✅ 90%+ file→task linkage accuracy
✅ 100% session grouping coverage
✅ <5s state reconstruction time
✅ Complete causality tracing

Appendix: Quick Wins

Quick Win #1: Agent Detection (2 hours)

Use regex patterns to detect agents from content:

agent_patterns = {
    'rust-expert-developer': r'async.*Rust|tokio|actix',
    'database-architect': r'PostgreSQL|schema.*design'
}
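
The patterns above could be applied to message content roughly as follows. This is a sketch: detect_agents is a hypothetical helper, matching is case-sensitive as written, and a keyword hit indicates the topic of a message rather than proving which agent wrote it.

```python
import re

# Patterns copied from the snippet above.
agent_patterns = {
    'rust-expert-developer': r'async.*Rust|tokio|actix',
    'database-architect': r'PostgreSQL|schema.*design',
}


def detect_agents(content: str) -> list[str]:
    """Return the names of all agents whose pattern matches the content."""
    return [
        name
        for name, pattern in agent_patterns.items()
        if re.search(pattern, content)
    ]


print(detect_agents("Use tokio for the async runtime"))      # ['rust-expert-developer']
print(detect_agents("Review the PostgreSQL schema design"))  # ['database-architect']
print(detect_agents("Update the README"))                    # []
```

Returning a list rather than a single name keeps ambiguous messages visible, so a later step can decide how to break ties.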

Quick Win #2: Session from Filename (30 minutes)

Extract session context from export filenames:

"2025-11-17-EXPORT-oauth2-implementation.txt"
→ session_id: "2025-11-17-oauth2-implementation"
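
A sketch of the extraction, assuming every export follows the date-EXPORT-slug.txt pattern inferred from the single example above; session_id_from_filename is a hypothetical helper.

```python
import re

# Assumed naming convention: <YYYY-MM-DD>-EXPORT-<slug>.txt
EXPORT_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})-EXPORT-(.+)\.txt$")


def session_id_from_filename(filename: str):
    """Return '<date>-<slug>' for an export filename, or None if it does not match."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    date, slug = match.groups()
    return f"{date}-{slug}"


print(session_id_from_filename("2025-11-17-EXPORT-oauth2-implementation.txt"))
# 2025-11-17-oauth2-implementation
```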

Quick Win #3: Task ID in Comments (1 hour)

Add task IDs to TASKLIST:

- [ ] Implement OAuth2 middleware  <!-- TASK-003 -->
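
Task lines annotated this way can be read back with a single regular expression. The sketch below assumes the ID always appears in a trailing HTML comment of the form TASK-NNN; parse_tasklist is a hypothetical helper, and TASK-002 in the sample is an invented ID for illustration.

```python
import re

TASK_LINE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<title>.*?)\s*<!--\s*(?P<task_id>TASK-\d+)\s*-->\s*$"
)


def parse_tasklist(text: str):
    """Return one dict per checkbox line that carries a task ID comment."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append({
                "task_id": match["task_id"],
                "title": match["title"],
                "done": match["done"].lower() == "x",
            })
    return tasks


# TASK-002 is a hypothetical ID; the last line shows an unannotated task.
sample = """\
- [x] Setup OAuth2 middleware  <!-- TASK-002 -->
- [ ] Implement OAuth2 middleware  <!-- TASK-003 -->
- [ ] Task without an ID
"""
for task in parse_tasklist(sample):
    print(task)
```

Lines without an ID comment are skipped rather than rejected, so the tasklist can be annotated incrementally.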

Quick Win #4: Commit Message Format (1 hour)

Standardize git commits:

feat(auth): Add OAuth2 middleware

Task: TASK-003
Phase: Phase 1 - Foundation
Files: src/middleware/oauth2.rs
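
A commit message in this format can be parsed back into structured metadata. This is a sketch: parse_commit_message is a hypothetical helper, only the three keys shown above are recognized, and treating Files as a comma-separated list is an assumption.

```python
def parse_commit_message(message: str) -> dict:
    """Split a commit message into its subject and 'Key: value' metadata lines."""
    lines = message.strip().splitlines()
    metadata = {}
    for line in lines[1:]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in {"Task", "Phase", "Files"}:
            metadata[key.strip().lower()] = value.strip()
    # Assumption: several files are listed comma-separated on one line.
    if "files" in metadata:
        metadata["files"] = [p.strip() for p in metadata["files"].split(",")]
    return {"subject": lines[0], **metadata}


message = """\
feat(auth): Add OAuth2 middleware

Task: TASK-003
Phase: Phase 1 - Foundation
Files: src/middleware/oauth2.rs
"""
print(parse_commit_message(message))
```

The resulting task value is what would let a commit be linked to its TASKLIST item without fuzzy matching.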

Next Steps:

Review metadata gaps with team
Prioritize P0 items for MVP
Implement quick wins (4.5 hours total)
Design full metadata schema
Roll out incrementally
