ADR-147: Submodule Observability Dashboard Architecture
Status: Accepted
Date: 2026-02-02
Author: Hal Casteel
Tracks: T.5.1 (MCP Server), C.14 (HTML Dashboard)
Context
CODITECT operates as a monorepo-of-polyrepos with 74 git submodules spanning 11 categories (cloud, core, dev, docs, enterprise-processes, gtm, integrations, investors, labs, ops, products). Each submodule has independent commit history, branch state, symlink configuration, and health characteristics.
Problem
Without centralized observability, the following failure modes occur regularly:
- Silent drift - Submodules fall behind remote by weeks without anyone noticing (2 stale submodules detected at time of implementation)
- Unpushed accumulation - Local commits pile up across 50+ submodules, risking data loss
- Symlink breakage - The `.coditect` and `.claude` symlinks that power distributed intelligence break silently
- No LLM awareness - AI agents working in the codebase have no way to query submodule health, leading to operations on stale or broken modules
- Manual inspection - Checking health requires running `git status` in 74 directories individually
Prior Art
- `scripts/submodule-health-check.py` - Existing CLI tool with `HealthStatus` dataclass, scoring algorithm, and symlink verification. Solid data collection but text-only output.
- `scripts/sync-all-submodules.sh` - Bulk sync operations (remediation, not observability)
- `tools/mcp-skill-server/server.py` - Established MCP server pattern with CLI fallback
Requirements
| Requirement | Priority |
|---|---|
| LLM agents can query submodule health programmatically | P0 |
| Humans can view ecosystem health at a glance | P0 |
| Reuse existing health-check logic (no duplication) | P0 |
| Self-contained output (no CDN dependencies) | P1 |
| Zero external Python dependencies | P1 |
| Category-based filtering | P1 |
| Sortable, searchable interface | P2 |
| CI/CD integration capability | P2 |
Decision
Implement a dual-interface observability system with shared data collection:
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                    Data Collection Layer                    │
│                  submodule-health-check.py                  │
│  ┌──────────────┐ ┌───────────────┐ ┌───────────────────┐   │
│  │find_rollout_ │ │check_sub-     │ │calculate_health_  │   │
│  │master_root() │ │module_health()│ │score()            │   │
│  └──────────────┘ └───────────────┘ └───────────────────┘   │
│              ▲                    ▲                         │
└──────────────┼────────────────────┼─────────────────────────┘
               │                    │
    ┌──────────┴──────┐   ┌─────────┴──────────┐
    │ MCP Server      │   │ HTML Dashboard     │
    │ (T.5.1)         │   │ Generator (C.14)   │
    │                 │   │                    │
    │ 5 MCP Tools:    │   │ Self-contained     │
    │ • list          │   │ HTML output:       │
    │ • status        │   │ • Summary cards    │
    │ • summary       │   │ • Category tabs    │
    │ • dirty         │   │ • Sortable table   │
    │ • stale         │   │ • Search filter    │
    │                 │   │ • Health bars      │
    │ CLI Fallback    │   │ • Dark theme       │
    └────────┬────────┘   └─────────┬──────────┘
             │                      │
    ┌────────▼────────┐   ┌─────────▼──────────┐
    │ LLM Consumers   │   │ Human Consumers    │
    │                 │   │                    │
    │ Claude Code     │   │ Browser            │
    │ Codex           │   │ CI/CD artifacts    │
    │ Gemini          │   │ Slack/email attach │
    │ Any MCP client  │   │ Static hosting     │
    └─────────────────┘   └────────────────────┘
Component 1: MCP Git-Status Server (T.5.1)
Location: tools/mcp-git-status/server.py
Exposes submodule health data via Model Context Protocol, enabling any MCP-capable LLM to query repository health without manual inspection.
5 MCP Tools:
| Tool | Purpose | Parameters |
|---|---|---|
| `list_submodules` | List all submodules with basic status | `category` (optional filter) |
| `get_submodule_status` | Detailed status for one submodule | `name` (required) |
| `get_health_summary` | Aggregate dashboard metrics | None |
| `get_dirty_submodules` | Only submodules needing attention | None |
| `get_stale_submodules` | Submodules inactive > N days | `days` (default 30) |
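The `days` parameter of `get_stale_submodules` can be illustrated with a small filter. This is only a sketch: the record shape, the ISO 8601 format of `last_commit`, and the helper name `filter_stale` are assumptions for illustration, not the server's actual code.

```python
from datetime import datetime, timedelta, timezone


def filter_stale(records, days=30, now=None):
    """Return the records whose last commit is older than `days` days.

    Assumes each record is a dict with an ISO 8601 `last_commit` string
    (hypothetical shape; the real HealthStatus field may differ).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    stale = []
    for record in records:
        last = datetime.fromisoformat(record["last_commit"])
        if last.tzinfo is None:
            # Treat naive timestamps as UTC so the comparison is well defined.
            last = last.replace(tzinfo=timezone.utc)
        if last < cutoff:
            stale.append(record)
    return stale
```

Passing `now` explicitly keeps the filter deterministic in tests; a caller would normally omit it.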
Design Decisions:
- MCP + CLI dual mode - `--mcp` flag for MCP server, CLI flags for direct use. Follows `mcp-skill-server` pattern.
- Optional MCP dependency - Server works as CLI tool without the `mcp` Python package installed. Graceful degradation.
- 30-second git timeout - Prevents hung submodule checks from blocking the server.
- Structured logging - Writes to `~/.coditect/logs/mcp-git-status.log` for debugging.
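The dual-mode and optional-dependency decisions above might look roughly like the following. It is a sketch under assumptions: only the `--mcp` flag comes from this ADR, while the `--summary` flag, the function names, and the placeholder output are hypothetical.

```python
import argparse
import json
import sys

try:
    import mcp  # noqa: F401  (optional; only needed for --mcp mode)
    MCP_AVAILABLE = True
except ImportError:
    MCP_AVAILABLE = False


def build_parser():
    parser = argparse.ArgumentParser(description="Submodule status (sketch)")
    parser.add_argument("--mcp", action="store_true",
                        help="run as an MCP server over stdio")
    # Hypothetical CLI flag, used here only to show the fallback path.
    parser.add_argument("--summary", action="store_true",
                        help="print a JSON summary and exit")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.mcp:
        if not MCP_AVAILABLE:
            # Graceful degradation: explain the problem instead of crashing.
            print("mcp package not installed; CLI flags still work",
                  file=sys.stderr)
            return 1
        # A real server would start its stdio transport here (omitted).
        return 0
    if args.summary:
        print(json.dumps({"mode": "cli"}))  # placeholder payload
    return 0
```

A script built this way would typically end with `sys.exit(main())` under an `if __name__ == "__main__":` guard.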
Component 2: HTML Dashboard Generator (C.14)
Location: scripts/submodule-dashboard-html.py
Generates a single self-contained HTML file with interactive visualization of submodule health across the entire ecosystem.
Dashboard Layout:
┌─────────────────────────────────────────────────────────────┐
│ CODITECT Submodule Health Dashboard          Generated: ... │
├──────────┬──────────┬──────────┬──────────┬────────┬────────┤
│ Total    │ Clean    │ Dirty    │ Stale    │Unpushed│ Avg    │
│ 74       │ 20       │ 42       │ 2        │ 50     │ 85.9   │
├──────────┴──────────┴──────────┴──────────┴────────┴────────┤
│ [All] [Cloud] [Core] [Dev] [Docs] [GTM] [Integrations] ...  │
├─────────────────────────────────────────────────────────────┤
│ [🔍 Search submodules...]                                   │
├──────┬─────────┬────────┬────────┬──────────┬───────────────┤
│ Name │Category │ Branch │ Status │Uncommit. │ Health Score  │
├──────┼─────────┼────────┼────────┼──────────┼───────────────┤
│ repo │ Cloud   │ main   │ Clean  │ 0        │ ████████ 95   │
│ repo │ Core    │ dev    │ Dirty  │ 3        │ ██████▒▒ 72   │
│ repo │ Dev     │ main   │ Clean  │ 0        │ ██████████100 │
└──────┴─────────┴────────┴────────┴──────────┴───────────────┘
Design Decisions:
- Self-contained HTML - All CSS and JS inlined. No CDN dependencies. Works offline, in CI artifacts, and behind firewalls.
- Vanilla JavaScript - No React, no frameworks. Sort in <50ms, filter in <20ms, render in <100ms.
- Dark theme - Matches developer tooling aesthetic. Color-coded health: green (80-100), yellow (50-79), red (0-49).
- Dynamic import - Uses `importlib.util` to load `submodule-health-check.py` as a module, avoiding code duplication.
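Because `submodule-health-check.py` contains hyphens, it cannot be loaded with a plain `import` statement. The sketch below shows one common way to do the dynamic import with `importlib.util`. The helper name and the module alias are assumptions, and the generator's real loading code may differ.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path, module_name="submodule_health_check"):
    """Load a Python source file as a module, whatever its file name."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so dataclasses and similar code that look
    # the module up in sys.modules can find it.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        sys.modules.pop(module_name, None)  # do not leave a half-loaded module
        raise
    return module
```

If the file moves, the load fails with an exception, which is the path fragility this approach carries.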
Shared: Health Scoring Algorithm
Both components consume the same scoring from submodule-health-check.py:
| Factor | Deduction | Max |
|---|---|---|
| Uncommitted changes | -5 per file | -20 |
| Unpushed commits | -5 per commit | -20 |
| Behind remote | -10 | -10 |
| Detached HEAD | -15 | -15 |
| Not on main/master | -5 | -5 |
| Missing .coditect symlink | -20 | -20 |
| Broken .coditect symlink | -20 | -20 |
| Broken framework symlink | -10 | -10 |
Score ranges: Excellent (90-100), Good (70-89), Fair (50-69), Poor (0-49)
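The deduction table and score ranges can be read as the following sketch. It assumes a starting score of 100, a floor of 0, and that each row applies independently; the function names are hypothetical, and `calculate_health_score()` in `submodule-health-check.py` may combine factors differently.

```python
def score_from_table(uncommitted=0, unpushed=0, behind=False, detached=False,
                     on_main=True, coditect_missing=False,
                     coditect_broken=False, framework_broken=False):
    """Apply the deduction table to an assumed starting score of 100."""
    score = 100
    score -= min(5 * uncommitted, 20)        # -5 per file, capped at -20
    score -= min(5 * unpushed, 20)           # -5 per commit, capped at -20
    score -= 10 if behind else 0             # behind remote
    score -= 15 if detached else 0           # detached HEAD
    score -= 5 if not on_main else 0         # not on main/master
    score -= 20 if coditect_missing else 0   # missing .coditect symlink
    score -= 20 if coditect_broken else 0    # broken .coditect symlink
    score -= 10 if framework_broken else 0   # broken framework symlink
    return max(score, 0)                     # assumed floor of 0


def rating(score):
    """Map a score to the ranges listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```

For example, a submodule with three uncommitted files and nothing else wrong scores 85 under these assumptions, which falls in the Good range.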
Technical Design
Data Flow
git submodule foreach ──▶ check_git_status() ──▶ HealthStatus dataclass
                                                           │
                              ┌────────────────────────────┤
                              │                            │
                    ┌─────────▼─────────┐       ┌──────────▼─────────────┐
                    │ MCP JSON response │       │ HTML template render   │
                    │ (stdio transport) │       │ (string interpolation) │
                    └───────────────────┘       └────────────────────────┘
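The HTML branch of this flow renders by string interpolation. A minimal sketch of that step follows, assuming records are plain dicts and using hypothetical CSS class names; the color thresholds are the ones stated in the dashboard design decisions. Escaping matters because submodule names end up inside markup.

```python
import html

# Hypothetical row template; the real generator's markup is not shown here.
ROW_TEMPLATE = (
    "<tr><td>{name}</td><td>{category}</td>"
    '<td class="{css}">{score}</td></tr>'
)


def score_class(score):
    """Color bands from the dashboard design: green, yellow, red."""
    if score >= 80:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"


def render_rows(records):
    """Render one table row per record, escaping all text fields."""
    return "\n".join(
        ROW_TEMPLATE.format(
            name=html.escape(record["name"]),
            category=html.escape(record["category"]),
            css=score_class(record["health_score"]),
            score=int(record["health_score"]),
        )
        for record in records
    )
```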
File Structure
coditect-core/
├── scripts/
│   ├── submodule-health-check.py      # Shared data collection (existing)
│   └── submodule-dashboard-html.py    # HTML generator (new, C.14)
├── tools/
│   └── mcp-git-status/
│       ├── server.py                  # MCP server (new, T.5.1)
│       └── README.md                  # Integration docs
Integration Points
Claude Code (~/.claude/settings.json):
{
  "mcpServers": {
    "coditect-git-status": {
      "command": "python3",
      "args": ["~/.coditect/tools/mcp-git-status/server.py", "--mcp"]
    }
  }
}
CI/CD (GitHub Actions):
- name: Generate Health Dashboard
  run: python3 scripts/submodule-dashboard-html.py --output dashboard.html
- uses: actions/upload-artifact@v4
  with:
    name: submodule-dashboard
    path: dashboard.html
Cron (daily at 9 AM):
0 9 * * * cd ~/PROJECTS/coditect-rollout-master && python3 .coditect/scripts/submodule-dashboard-html.py --output ~/public_html/dashboard.html
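The CI/CD step above uploads the dashboard but does not fail a build. A health gate would need a separate check, sketched here under assumptions: the input is a JSON list of objects with `name` and `health_score` fields (a hypothetical export shape), and the default threshold of 70 mirrors the example SLO in the customer journey.

```python
import json
import sys


def failing_submodules(records, threshold=70):
    """Names of submodules whose health score is below the threshold."""
    return sorted(r["name"] for r in records if r["health_score"] < threshold)


def gate(json_text, threshold=70):
    """Return a process exit code: 0 if all pass, 1 if any fall below."""
    failing = failing_submodules(json.loads(json_text), threshold)
    for name in failing:
        print(f"health below {threshold}: {name}", file=sys.stderr)
    return 1 if failing else 0
```

A pipeline step could pass the returned code to `sys.exit` so that a low score blocks the job.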
Software Design
Class Diagram
┌─────────────────────────┐
│      HealthStatus       │  (from submodule-health-check.py)
├─────────────────────────┤
│ name: str               │
│ path: Path              │
│ category: str           │
│ branch: str             │
│ is_clean: bool          │
│ uncommitted: int        │
│ unpushed: int           │
│ behind: int             │
│ ahead: int              │
│ last_commit: str        │
│ health_score: int       │
│ coditect_symlink: bool  │
│ framework_symlink: bool │
├─────────────────────────┤
│ to_dict() -> dict       │
└────────────┬────────────┘
             │
      ┌──────┴──────┐
      │             │
  ┌───▼──────┐  ┌───▼────────────┐
  │MCP Server│  │HTML Generator  │
  │          │  │                │
  │5 tools   │  │generate_html() │
  │CLI mode  │  │generate_cards()│
  │JSON out  │  │generate_table()│
  │logging   │  │generate_js()   │
  └──────────┘  └────────────────┘
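As a rough illustration of the diagram, the dataclass might be declared as below. The field names and types are taken from the diagram; the body of `to_dict()` is an assumption, written so that the `Path` field survives JSON serialization.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class HealthStatus:
    name: str
    path: Path
    category: str
    branch: str
    is_clean: bool
    uncommitted: int
    unpushed: int
    behind: int
    ahead: int
    last_commit: str
    health_score: int
    coditect_symlink: bool
    framework_symlink: bool

    def to_dict(self) -> dict:
        """Return a JSON-serializable dict (assumed behavior)."""
        data = asdict(self)
        data["path"] = str(self.path)  # Path objects are not JSON-serializable
        return data
```

Both consumers can then work from the same dict: the MCP server passes it to `json.dumps`, and the HTML generator interpolates its values into markup.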
Error Handling Strategy
| Failure Mode | MCP Server Response | Dashboard Response |
|---|---|---|
| Git not installed | Error JSON + log | Exit code 1 + stderr |
| Submodule unreachable | Skip + warning in result | Row with "N/A" values |
| Timeout (>30s) | Partial results | Partial dashboard |
| No rollout-master root | Error: "Not in ecosystem" | Error: "Not in ecosystem" |
| Health-check import fail | Exit code 1 | Exit code 1 |
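The "skip + warning" and "partial results" rows can be sketched as follows. The helper names and return shapes are hypothetical; only the 30-second timeout and the skip-on-failure behavior come from this ADR.

```python
import subprocess

GIT_TIMEOUT_SECONDS = 30  # per-submodule limit stated in this ADR


def run_git(args, cwd, timeout=GIT_TIMEOUT_SECONDS):
    """Run one git command; return (stdout, None) or (None, error message)."""
    try:
        result = subprocess.run(["git", *args], cwd=cwd, capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, f"timeout after {timeout}s"
    except OSError as exc:  # git not installed, or cwd missing
        return None, f"could not run git: {exc}"
    if result.returncode != 0:
        return None, result.stderr.strip() or f"git exited {result.returncode}"
    return result.stdout.strip(), None


def collect_partial(paths, check):
    """Check every path and keep whatever succeeded, plus warnings."""
    results, warnings = {}, []
    for path in paths:
        output, error = check(path)
        if error is not None:
            warnings.append(f"{path}: {error}")  # skip, but record why
            continue
        results[path] = output
    return results, warnings
```

A caller might use `collect_partial(paths, lambda p: run_git(["status", "--porcelain"], cwd=p))`, so that one hung submodule costs at most one timeout and the remaining results are still returned.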
Performance Characteristics
| Metric | MCP Server | HTML Dashboard |
|---|---|---|
| Startup | <1s | <1s |
| 10 submodules | ~2s | ~2s |
| 97 submodules | ~15s | ~15s |
| Output size | ~5-50 KB JSON | ~16-34 KB HTML |
| Browser render | N/A | <100ms |
| Sort operation | N/A | <50ms |
| Filter operation | N/A | <20ms |
Value Proposition
Customer Journey
┌─────────────────────────────────────────────────────────────────┐
│                      CUSTOMER JOURNEY MAP                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. AWARENESS                                                   │
│     ┌────────────────────────────────────────────────────┐      │
│     │ "Our 30+ microservice repos are a mess.            │      │
│     │  No one knows which are stale, which have          │      │
│     │  unpushed work, or which symlinks are broken."     │      │
│     └────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  2. DISCOVERY                                                   │
│     ┌────────────────────────────────────────────────────┐      │
│     │ Customer runs: python3 scripts/submodule-dashboard │      │
│     │ Opens dashboard.html → sees 12 repos at "Poor"     │      │
│     │ health, 3 with broken symlinks, 8 stale.           │      │
│     └────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  3. REMEDIATION                                                 │
│     ┌────────────────────────────────────────────────────┐      │
│     │ Sorts by health score → fixes worst first.         │      │
│     │ Filters by category → tackles "integrations"       │      │
│     │ that are all behind. LLM uses MCP to check         │      │
│     │ status before every operation.                     │      │
│     └────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  4. CONTINUOUS MONITORING                                       │
│     ┌────────────────────────────────────────────────────┐      │
│     │ Daily cron generates dashboard. CI/CD uploads      │      │
│     │ as artifact. LLM agents query MCP before commits.  │      │
│     │ Health score trends visible over time.             │      │
│     └────────────────────────────────────────────────────┘      │
│                               │                                 │
│                               ▼                                 │
│  5. OPTIMIZATION                                                │
│     ┌────────────────────────────────────────────────────┐      │
│     │ Team establishes SLOs: "No repo below 70 health."  │      │
│     │ Dashboard becomes daily standup artifact.          │      │
│     │ MCP server integrated into automated workflows.    │      │
│     └────────────────────────────────────────────────────┘      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Value Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time to check all repos | ~30 min (manual) | 15 sec (automated) | 120x faster |
| Stale repos discovered | Accidentally | Proactively | Prevention vs reaction |
| LLM context awareness | None | Real-time via MCP | New capability |
| Symlink breakage detection | Manual ls -la | Automatic health score | Zero manual effort |
| Unpushed work visibility | Unknown | Dashboard summary card | Risk elimination |
| CI/CD integration | None | HTML artifact + MCP | Automated governance |
Target Personas
| Persona | Primary Interface | Key Value |
|---|---|---|
| Platform Engineer | HTML Dashboard | Visual ecosystem overview, category filtering |
| AI Agent (LLM) | MCP Server | Programmatic health queries before operations |
| Engineering Manager | HTML Dashboard (daily) | Health SLO compliance, trend tracking |
| DevOps Engineer | Both | CLI for scripting, dashboard for reporting |
| CI/CD Pipeline | Dashboard + MCP | Automated health gates, artifact generation |
Competitive Differentiation
| Feature | CODITECT | GitHub | GitLab | Monorepo tools |
|---|---|---|---|---|
| LLM-native MCP interface | Yes | No | No | No |
| Self-contained HTML output | Yes | No (SaaS) | No (SaaS) | Varies |
| Symlink health tracking | Yes | No | No | No |
| Offline-capable dashboard | Yes | No | No | Some |
| Zero dependencies | Yes | N/A | N/A | Rarely |
| Health scoring algorithm | Custom | None | None | Basic |
| Category-based filtering | Yes | Labels | Groups | Varies |
Consequences
Positive
- LLM-first observability - AI agents can now query submodule health before performing operations, reducing errors on stale/broken repos
- Visual ecosystem awareness - One-click dashboard generation replaces 74 manual `git status` commands
- Zero new dependencies - Both tools use Python stdlib only (MCP package optional for MCP mode)
- Code reuse - Both tools share `submodule-health-check.py` data collection, eliminating duplication
- CI/CD ready - Dashboard HTML is a single artifact for pipelines; MCP server enables health gates
- Offline capable - Self-contained HTML works without network access
Negative
- 15-second generation time - Scanning 97 submodules via `git` subprocess calls takes time. Caching would add complexity.
- No historical data - Current implementation is point-in-time. Trend analysis requires external storage.
- Dynamic import fragility - Dashboard uses `importlib.util` to load health-check module; path changes break it.
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Health-check API changes | Low | Medium | Pin interface, add version check |
| MCP protocol evolution | Low | Low | MCP is versioned, server declares version |
| Submodule count growth | Medium | Low | Linear scaling, 15s for 74 is acceptable |
| Git lock contention | Medium | Low | 30-second timeout per submodule |
Future Enhancements
Phase 2 (Planned)
- Historical data tracking - Write scores to SQLite for trend analysis
- Trend graphs - Chart.js integration for health over time
- GitHub API enrichment - PR counts, issue counts per submodule
- Alert thresholds - Email/Slack when health drops below SLO
- MCP server registration - Auto-register in `~/.claude/settings.json` during initial setup (ADR-057)
Phase 3 (Future)
- Real-time updates - WebSocket-based live dashboard
- Multi-ecosystem support - Monitor multiple rollout-master instances
- Remediation actions - "Fix" buttons that trigger `git push`, `git pull`, symlink repair
- Export formats - CSV, JSON, PDF report generation
References
- submodule-health-check.py - Shared data collection
- mcp-skill-server/server.py - MCP pattern reference
- MCP Specification - Model Context Protocol
- ADR-057 - Initial setup (future MCP registration)
- ADR-116 - Track-based architecture
Decision Date: 2026-02-02
Review Date: 2026-05-02
Implementation: Complete (T.5.1 + C.14)