ADR-147: Submodule Observability Dashboard Architecture

Status: Accepted
Date: 2026-02-02
Author: Hal Casteel
Tracks: T.5.1 (MCP Server), C.14 (HTML Dashboard)


Context

CODITECT operates as a monorepo-of-polyrepos with 74 git submodules spanning 11 categories (cloud, core, dev, docs, enterprise-processes, gtm, integrations, investors, labs, ops, products). Each submodule has its own commit history, branch state, symlink configuration, and health characteristics.

Problem

Without centralized observability, the following failure modes occur regularly:

  1. Silent drift - Submodules fall behind remote by weeks without anyone noticing (2 stale submodules detected at time of implementation)
  2. Unpushed accumulation - Local commits pile up across 50+ submodules, risking data loss
  3. Symlink breakage - The .coditect and .claude symlinks that power distributed intelligence break silently
  4. No LLM awareness - AI agents working in the codebase have no way to query submodule health, leading to operations on stale or broken modules
  5. Manual inspection - Checking health requires running git status in 74 directories individually

Prior Art

  • scripts/submodule-health-check.py - Existing CLI tool with HealthStatus dataclass, scoring algorithm, and symlink verification. Solid data collection but text-only output.
  • scripts/sync-all-submodules.sh - Bulk sync operations (remediation, not observability)
  • tools/mcp-skill-server/server.py - Established MCP server pattern with CLI fallback

Requirements

| Requirement | Priority |
|---|---|
| LLM agents can query submodule health programmatically | P0 |
| Humans can view ecosystem health at a glance | P0 |
| Reuse existing health-check logic (no duplication) | P0 |
| Self-contained output (no CDN dependencies) | P1 |
| Zero external Python dependencies | P1 |
| Category-based filtering | P1 |
| Sortable, searchable interface | P2 |
| CI/CD integration capability | P2 |

Decision

Implement a dual-interface observability system with shared data collection:

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Data Collection Layer                    │
│                  submodule-health-check.py                  │
│  ┌──────────────┐  ┌───────────────┐  ┌───────────────────┐ │
│  │find_rollout_ │  │check_sub-     │  │calculate_health_  │ │
│  │master_root() │  │module_health()│  │score()            │ │
│  └──────────────┘  └───────────────┘  └───────────────────┘ │
│            ▲                                 ▲              │
└────────────┼─────────────────────────────────┼──────────────┘
             │                                 │
    ┌────────┴────────┐             ┌──────────┴──────────┐
    │ MCP Server      │             │ HTML Dashboard      │
    │ (T.5.1)         │             │ Generator (C.14)    │
    │                 │             │                     │
    │ 5 MCP Tools:    │             │ Self-contained      │
    │ • list          │             │ HTML output:        │
    │ • status        │             │ • Summary cards     │
    │ • summary       │             │ • Category tabs     │
    │ • dirty         │             │ • Sortable table    │
    │ • stale         │             │ • Search filter     │
    │                 │             │ • Health bars       │
    │ CLI Fallback    │             │ • Dark theme        │
    └────────┬────────┘             └──────────┬──────────┘
             │                                 │
    ┌────────▼────────┐             ┌──────────▼──────────┐
    │ LLM Consumers   │             │ Human Consumers     │
    │                 │             │                     │
    │ Claude Code     │             │ Browser             │
    │ Codex           │             │ CI/CD artifacts     │
    │ Gemini          │             │ Slack/email attach  │
    │ Any MCP client  │             │ Static hosting      │
    └─────────────────┘             └─────────────────────┘

Component 1: MCP Git-Status Server (T.5.1)

Location: tools/mcp-git-status/server.py

Exposes submodule health data via Model Context Protocol, enabling any MCP-capable LLM to query repository health without manual inspection.

5 MCP Tools:

| Tool | Purpose | Parameters |
|---|---|---|
| list_submodules | List all submodules with basic status | category (optional filter) |
| get_submodule_status | Detailed status for one submodule | name (required) |
| get_health_summary | Aggregate dashboard metrics | None |
| get_dirty_submodules | Only submodules needing attention | None |
| get_stale_submodules | Submodules inactive > N days | days (default 30) |
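
The filtering tools above can be sketched as pure functions over a list of health records. This is an illustration only, not the server's actual code: the record layout, the ISO-8601 format of last_commit, and the rule that "dirty" means uncommitted changes or unpushed commits are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical records; field names mirror HealthStatus, but the
# ISO-8601 format of last_commit is an assumption.
SAMPLE = [
    {"name": "cloud-api", "category": "cloud", "is_clean": True,
     "unpushed": 0, "last_commit": "2026-01-30T10:00:00+00:00"},
    {"name": "core-lib", "category": "core", "is_clean": False,
     "unpushed": 2, "last_commit": "2026-01-15T09:00:00+00:00"},
    {"name": "labs-old", "category": "labs", "is_clean": True,
     "unpushed": 0, "last_commit": "2025-11-01T12:00:00+00:00"},
]


def list_submodules(records, category=None):
    """Return all records, or only those in the given category."""
    if category is None:
        return list(records)
    return [r for r in records if r["category"] == category]


def get_dirty_submodules(records):
    """Return records with uncommitted changes or unpushed commits (assumed rule)."""
    return [r for r in records if not r["is_clean"] or r["unpushed"] > 0]


def get_stale_submodules(records, days=30, now=None):
    """Return records whose last commit is older than `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [r for r in records
            if datetime.fromisoformat(r["last_commit"]) < cutoff]
```

Passing `now` explicitly keeps the staleness check deterministic, which helps when the same logic is exercised in tests.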

Design Decisions:

  • MCP + CLI dual mode - The --mcp flag starts the MCP server; other CLI flags run queries directly. Follows the mcp-skill-server pattern.
  • Optional MCP dependency - Server works as CLI tool without the mcp Python package installed. Graceful degradation.
  • 30-second git timeout - Prevents hung submodule checks from blocking the server.
  • Structured logging - Writes to ~/.coditect/logs/mcp-git-status.log for debugging.
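
A minimal sketch of the optional-dependency check and the 30-second timeout described above. The helper names run_command and choose_mode are hypothetical, and exiting with a message when --mcp is requested without the package is one possible reading of "graceful degradation", not necessarily what server.py does.

```python
import importlib.util
import subprocess

# Detect the optional `mcp` package without importing it.
HAS_MCP = importlib.util.find_spec("mcp") is not None

GIT_TIMEOUT_SECONDS = 30  # per-command limit taken from the design decision above


def run_command(args, cwd=None, timeout=GIT_TIMEOUT_SECONDS):
    """Run a command and return its stripped stdout, or None on any failure."""
    try:
        result = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True,
            timeout=timeout, check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError,
            FileNotFoundError):
        # Hung, failing, or missing executables all degrade to "no data".
        return None
    return result.stdout.strip()


def choose_mode(argv):
    """Return 'mcp' when --mcp is requested and available, else 'cli'."""
    if "--mcp" in argv:
        if not HAS_MCP:
            raise SystemExit("--mcp requires the optional 'mcp' package")
        return "mcp"
    return "cli"
```

For example, `run_command(["git", "status", "--porcelain"], cwd=path)` would return None rather than block if a submodule check hangs past the limit.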

Component 2: HTML Dashboard Generator (C.14)

Location: scripts/submodule-dashboard-html.py

Generates a single self-contained HTML file with interactive visualization of submodule health across the entire ecosystem.

Dashboard Layout:

┌─────────────────────────────────────────────────────────────┐
│ CODITECT Submodule Health Dashboard          Generated: ... │
├──────────┬──────────┬──────────┬──────────┬────────┬────────┤
│ Total    │ Clean    │ Dirty    │ Stale    │Unpushed│ Avg    │
│ 74       │ 20       │ 42       │ 2        │ 50     │ 85.9   │
├──────────┴──────────┴──────────┴──────────┴────────┴────────┤
│ [All] [Cloud] [Core] [Dev] [Docs] [GTM] [Integrations] ...  │
├─────────────────────────────────────────────────────────────┤
│ [🔍 Search submodules...]                                   │
├──────┬─────────┬────────┬────────┬──────────┬───────────────┤
│ Name │Category │ Branch │ Status │Uncommit. │ Health Score  │
├──────┼─────────┼────────┼────────┼──────────┼───────────────┤
│ repo │ Cloud   │ main   │ Clean  │ 0        │ ████████ 95   │
│ repo │ Core    │ dev    │ Dirty  │ 3        │ ██████▒▒ 72   │
│ repo │ Dev     │ main   │ Clean  │ 0        │ ██████████100 │
└──────┴─────────┴────────┴────────┴──────────┴───────────────┘

Design Decisions:

  • Self-contained HTML - All CSS and JS inlined. No CDN dependencies. Works offline, in CI artifacts, and behind firewalls.
  • Vanilla JavaScript - No React, no frameworks. Sort in <50ms, filter in <20ms, render in <100ms.
  • Dark theme - Matches developer tooling aesthetic. Color-coded health: green (80-100), yellow (50-79), red (0-49).
  • Dynamic import - Uses importlib.util to load submodule-health-check.py as a module, avoiding code duplication.
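
The dynamic import mentioned above is needed because a hyphenated filename such as submodule-health-check.py cannot be loaded with a plain import statement. A minimal sketch of the standard importlib.util recipe follows; the helper name load_module_from_path and the module alias are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path, module_name):
    """Load a Python source file as a module, whatever its filename."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing: dataclasses defined in the loaded file
    # may look their own module up in sys.modules.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

A caller might then write `health = load_module_from_path(scripts_dir / "submodule-health-check.py", "submodule_health_check")`, where scripts_dir is whatever directory holds the script.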

Shared: Health Scoring Algorithm

Both components consume the same scoring from submodule-health-check.py:

| Factor | Deduction | Max |
|---|---|---|
| Uncommitted changes | -5 per file | -20 |
| Unpushed commits | -5 per commit | -20 |
| Behind remote | -10 | -10 |
| Detached HEAD | -15 | -15 |
| Not on main/master | -5 | -5 |
| Missing .coditect symlink | -20 | -20 |
| Broken .coditect symlink | -20 | -20 |
| Broken framework symlink | -10 | -10 |
Score ranges: Excellent (90-100), Good (70-89), Fair (50-69), Poor (0-49)
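
The deduction table can be read as the following sketch. It is not the code in submodule-health-check.py: the parameter names are hypothetical, and three choices are assumptions (the score starts at 100, the branch deduction is skipped when HEAD is detached, and the result is floored at 0 as a defensive measure).

```python
def calculate_health_score(uncommitted=0, unpushed=0, behind=0,
                           detached=False, branch="main",
                           coditect_symlink="ok",
                           framework_symlink_broken=False):
    """Apply the table's deductions to a starting score of 100 (assumed)."""
    score = 100
    score -= min(5 * uncommitted, 20)   # -5 per file, capped at -20
    score -= min(5 * unpushed, 20)      # -5 per commit, capped at -20
    if behind > 0:
        score -= 10                     # flat deduction when behind remote
    if detached:
        score -= 15
    elif branch not in ("main", "master"):
        score -= 5                      # assumption: not combined with detached
    if coditect_symlink in ("missing", "broken"):
        score -= 20                     # missing and broken are mutually exclusive
    if framework_symlink_broken:
        score -= 10
    return max(score, 0)                # defensive floor (assumption)


def score_label(score):
    """Map a score to the ranges listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```

Under these assumptions, three uncommitted files on a feature branch give 100 - 15 - 5 = 80, which falls in the Good range.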


Technical Design

Data Flow

git submodule foreach  ──▶  check_git_status()  ──▶  HealthStatus dataclass
                                                               │
                                  ┌────────────────────────────┤
                                  │                            │
                        ┌─────────▼─────────┐      ┌───────────▼───────────┐
                        │ MCP JSON response │      │ HTML template render  │
                        │ (stdio transport) │      │ (string interpolation)│
                        └───────────────────┘      └───────────────────────┘
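
As a rough illustration of the two output paths in the data flow, the sketch below renders one health record both as a JSON string and as an HTML table row built by string interpolation. The function names and record keys are hypothetical; the color thresholds follow the dashboard's stated ranges (green 80-100, yellow 50-79, red 0-49).

```python
import html
import json


def health_color(score):
    """Map a score to the dashboard's color bands."""
    if score >= 80:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"


def to_json_response(record):
    """Serialize one record for an MCP-style JSON reply."""
    return json.dumps(record, sort_keys=True)


def to_html_row(record):
    """Render one record as a table row; text fields are HTML-escaped."""
    status = "Clean" if record["is_clean"] else "Dirty"
    color = health_color(record["health_score"])
    return (
        "<tr>"
        f"<td>{html.escape(record['name'])}</td>"
        f"<td>{html.escape(record['category'])}</td>"
        f"<td>{html.escape(record['branch'])}</td>"
        f"<td>{status}</td>"
        f"<td>{record['uncommitted']}</td>"
        f'<td class="{color}">{record["health_score"]}</td>'
        "</tr>"
    )
```

Escaping names and branches matters here because both come from repository state rather than from the generator itself.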

File Structure

coditect-core/
├── scripts/
│   ├── submodule-health-check.py      # Shared data collection (existing)
│   └── submodule-dashboard-html.py    # HTML generator (new, C.14)
├── tools/
│   └── mcp-git-status/
│       ├── server.py                  # MCP server (new, T.5.1)
│       └── README.md                  # Integration docs

Integration Points

Claude Code (~/.claude/settings.json):

{
  "mcpServers": {
    "coditect-git-status": {
      "command": "python3",
      "args": ["~/.coditect/tools/mcp-git-status/server.py", "--mcp"]
    }
  }
}

CI/CD (GitHub Actions):

- name: Generate Health Dashboard
  run: python3 scripts/submodule-dashboard-html.py --output dashboard.html
- uses: actions/upload-artifact@v4
  with:
    name: submodule-dashboard
    path: dashboard.html
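
A pipeline could also fail the build when health drops too low. The sketch below is hypothetical and not part of the shipped tools: it assumes health records are available as a list of dicts with name and health_score keys, and it borrows the threshold of 70 from the example SLO in the customer journey.

```python
import sys


def find_failures(records, threshold=70):
    """Return the sorted names of submodules scoring below the threshold."""
    return sorted(r["name"] for r in records if r["health_score"] < threshold)


def gate(records, threshold=70, out=sys.stderr):
    """Report failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = find_failures(records, threshold)
    for name in failures:
        print(f"FAIL: {name} is below health {threshold}", file=out)
    return 1 if failures else 0
```

A CI step would call `sys.exit(gate(records))` after loading the records from whatever source the pipeline has available.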

Cron (daily at 9 AM):

0 9 * * * cd ~/PROJECTS/coditect-rollout-master && python3 .coditect/scripts/submodule-dashboard-html.py --output ~/public_html/dashboard.html

Software Design

Class Diagram

┌─────────────────────────┐
│      HealthStatus       │  (from submodule-health-check.py)
├─────────────────────────┤
│ name: str               │
│ path: Path              │
│ category: str           │
│ branch: str             │
│ is_clean: bool          │
│ uncommitted: int        │
│ unpushed: int           │
│ behind: int             │
│ ahead: int              │
│ last_commit: str        │
│ health_score: int       │
│ coditect_symlink: bool  │
│ framework_symlink: bool │
├─────────────────────────┤
│ to_dict() -> dict       │
└────────────┬────────────┘
             │
      ┌──────┴──────┐
      │             │
┌─────▼──────┐ ┌────▼─────────────┐
│ MCP Server │ │ HTML Generator   │
│            │ │                  │
│ 5 tools    │ │ generate_html()  │
│ CLI mode   │ │ generate_cards() │
│ JSON out   │ │ generate_table() │
│ logging    │ │ generate_js()    │
└────────────┘ └──────────────────┘
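
The HealthStatus box above maps naturally onto a Python dataclass. The sketch below mirrors the listed fields; the body of to_dict() is an assumption, chosen so that the result is JSON-serializable for the MCP path.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class HealthStatus:
    name: str
    path: Path
    category: str
    branch: str
    is_clean: bool
    uncommitted: int
    unpushed: int
    behind: int
    ahead: int
    last_commit: str
    health_score: int
    coditect_symlink: bool
    framework_symlink: bool

    def to_dict(self):
        """Return a plain dict; Path is converted because json cannot encode it."""
        data = asdict(self)
        data["path"] = str(self.path)
        return data
```

With this shape, `json.dumps(status.to_dict())` works without a custom encoder, and the HTML generator can read the same dict.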

Error Handling Strategy

| Failure Mode | MCP Server Response | Dashboard Response |
|---|---|---|
| Git not installed | Error JSON + log | Exit code 1 + stderr |
| Submodule unreachable | Skip + warning in result | Row with "N/A" values |
| Timeout (>30s) | Partial results | Partial dashboard |
| No rollout-master root | Error: "Not in ecosystem" | Error: "Not in ecosystem" |
| Health-check import fail | Exit code 1 | Exit code 1 |
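
The "skip + warning" and partial-results rows can be sketched as a collection loop that never lets one bad submodule abort the scan. The names collect_health and na_row are hypothetical, and `check` stands in for whatever per-submodule function the real tools call.

```python
import subprocess


def collect_health(names, check):
    """Check each submodule; on failure, skip it and record a warning."""
    results, warnings = [], []
    for name in names:
        try:
            results.append(check(name))
        except (OSError, subprocess.SubprocessError) as exc:
            # Covers unreachable paths and subprocess timeouts alike.
            warnings.append(f"{name}: {type(exc).__name__}")
    return results, warnings


def na_row(name):
    """Placeholder row a dashboard could show for a failed submodule."""
    return {"name": name, "branch": "N/A", "health_score": "N/A"}
```

The MCP path would return `results` together with `warnings`, while the dashboard path could add `na_row(name)` for each skipped submodule.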

Performance Characteristics

| Metric | MCP Server | HTML Dashboard |
|---|---|---|
| Startup | <1s | <1s |
| 10 submodules | ~2s | ~2s |
| 74 submodules | ~15s | ~15s |
| Output size | ~5-50 KB JSON | ~16-34 KB HTML |
| Browser render | N/A | <100ms |
| Sort operation | N/A | <50ms |
| Filter operation | N/A | <20ms |

Value Proposition

Customer Journey

┌─────────────────────────────────────────────────────────────────┐
│                      CUSTOMER JOURNEY MAP                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. AWARENESS                                                   │
│    ┌─────────────────────────────────────────────────────┐      │
│    │ "Our 30+ microservice repos are a mess.             │      │
│    │  No one knows which are stale, which have           │      │
│    │  unpushed work, or which symlinks are broken."      │      │
│    └─────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  2. DISCOVERY                                                   │
│    ┌─────────────────────────────────────────────────────┐      │
│    │ Customer runs: python3 scripts/submodule-dashboard  │      │
│    │ Opens dashboard.html → sees 12 repos at "Poor"      │      │
│    │ health, 3 with broken symlinks, 8 stale.            │      │
│    └─────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  3. REMEDIATION                                                 │
│    ┌─────────────────────────────────────────────────────┐      │
│    │ Sorts by health score → fixes worst first.          │      │
│    │ Filters by category → tackles "integrations"        │      │
│    │ that are all behind. LLM uses MCP to check          │      │
│    │ status before every operation.                      │      │
│    └─────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  4. CONTINUOUS MONITORING                                       │
│    ┌─────────────────────────────────────────────────────┐      │
│    │ Daily cron generates dashboard. CI/CD uploads       │      │
│    │ as artifact. LLM agents query MCP before commits.   │      │
│    │ Health score trends visible over time.              │      │
│    └─────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  5. OPTIMIZATION                                                │
│    ┌─────────────────────────────────────────────────────┐      │
│    │ Team establishes SLOs: "No repo below 70 health."   │      │
│    │ Dashboard becomes daily standup artifact.           │      │
│    │ MCP server integrated into automated workflows.     │      │
│    └─────────────────────────────────────────────────────┘      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Value Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Time to check all repos | ~30 min (manual) | 15 sec (automated) | 120x faster |
| Stale repos discovered | Accidentally | Proactively | Prevention vs reaction |
| LLM context awareness | None | Real-time via MCP | New capability |
| Symlink breakage detection | Manual ls -la | Automatic health score | Zero manual effort |
| Unpushed work visibility | Unknown | Dashboard summary card | Risk elimination |
| CI/CD integration | None | HTML artifact + MCP | Automated governance |

Target Personas

| Persona | Primary Interface | Key Value |
|---|---|---|
| Platform Engineer | HTML Dashboard | Visual ecosystem overview, category filtering |
| AI Agent (LLM) | MCP Server | Programmatic health queries before operations |
| Engineering Manager | HTML Dashboard (daily) | Health SLO compliance, trend tracking |
| DevOps Engineer | Both | CLI for scripting, dashboard for reporting |
| CI/CD Pipeline | Dashboard + MCP | Automated health gates, artifact generation |

Competitive Differentiation

| Feature | CODITECT | GitHub | GitLab | Monorepo tools |
|---|---|---|---|---|
| LLM-native MCP interface | Yes | No | No | No |
| Self-contained HTML output | Yes | No (SaaS) | No (SaaS) | Varies |
| Symlink health tracking | Yes | No | No | No |
| Offline-capable dashboard | Yes | No | No | Some |
| Zero dependencies | Yes | N/A | N/A | Rarely |
| Health scoring algorithm | Custom | None | None | Basic |
| Category-based filtering | Yes | Labels | Groups | Varies |

Consequences

Positive

  1. LLM-first observability - AI agents can now query submodule health before performing operations, reducing errors on stale/broken repos
  2. Visual ecosystem awareness - One-click dashboard generation replaces 74 manual git status commands
  3. Zero new dependencies - Both tools use Python stdlib only (MCP package optional for MCP mode)
  4. Code reuse - Both tools share submodule-health-check.py data collection, eliminating duplication
  5. CI/CD ready - Dashboard HTML is a single artifact for pipelines; MCP server enables health gates
  6. Offline capable - Self-contained HTML works without network access

Negative

  1. 15-second generation time - Scanning 74 submodules via git subprocess calls takes about 15 seconds. Caching would cut this but add complexity.
  2. No historical data - Current implementation is point-in-time. Trend analysis requires external storage.
  3. Dynamic import fragility - Dashboard uses importlib.util to load health-check module; path changes break it.

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Health-check API changes | Low | Medium | Pin interface, add version check |
| MCP protocol evolution | Low | Low | MCP is versioned, server declares version |
| Submodule count growth | Medium | Low | Linear scaling, 15s for 74 is acceptable |
| Git lock contention | Medium | Low | 30-second timeout per submodule |

Future Enhancements

Phase 2 (Planned)

  1. Historical data tracking - Write scores to SQLite for trend analysis
  2. Trend graphs - Chart.js integration for health over time
  3. GitHub API enrichment - PR counts, issue counts per submodule
  4. Alert thresholds - Email/Slack when health drops below SLO
  5. MCP server registration - Auto-register in ~/.claude/settings.json during initial setup (ADR-057)
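
The planned SQLite history could look roughly like the sketch below. Nothing here exists in the current implementation: the table name, columns, and function names are all hypothetical, and only the stdlib sqlite3 module is used, which keeps the zero-dependency requirement.

```python
import sqlite3

# Hypothetical schema: one row per submodule per recording time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS health_history (
    recorded_at  TEXT    NOT NULL,
    name         TEXT    NOT NULL,
    health_score INTEGER NOT NULL,
    PRIMARY KEY (recorded_at, name)
)
"""


def record_scores(conn, recorded_at, scores):
    """Store a {name: score} snapshot taken at `recorded_at`."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO health_history VALUES (?, ?, ?)",
        [(recorded_at, name, score) for name, score in scores.items()],
    )
    conn.commit()


def score_trend(conn, name):
    """Return (recorded_at, health_score) pairs for one submodule, oldest first."""
    cursor = conn.execute(
        "SELECT recorded_at, health_score FROM health_history "
        "WHERE name = ? ORDER BY recorded_at",
        (name,),
    )
    return cursor.fetchall()
```

Storing timestamps as ISO-8601 text keeps them sortable with a plain ORDER BY, which is what the trend query relies on.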

Phase 3 (Future)

  1. Real-time updates - WebSocket-based live dashboard
  2. Multi-ecosystem support - Monitor multiple rollout-master instances
  3. Remediation actions - "Fix" buttons that trigger git push, git pull, symlink repair
  4. Export formats - CSV, JSON, PDF report generation


Decision Date: 2026-02-02
Review Date: 2026-05-02
Implementation: Complete (T.5.1 + C.14)