Implementation Requirements: QA Agent Browser Automation
Document ID: IMPL-REQ-002
Priority: P0 (Critical Path)
Target ADR: ADR-109 (Proposed)
Estimated Effort: 2 Sprints
Dependencies: QA Agent, MCP Infrastructure, Playwright
1. Overview
1.1 Problem Statement
The Ralph Wiggum community has empirically validated that self-verification is non-negotiable for autonomous development loops. Without the ability to verify UI/UX changes, agents cannot:
- Confirm features work end-to-end
- Detect visual regressions
- Validate user flows
- Generate compliance evidence of testing
Current QA Agent capabilities are limited to unit and integration tests, which is insufficient for full-stack autonomous development.
1.2 Objective
Integrate Playwright browser automation into the QA Agent, enabling:
- Headless browser testing within autonomous loops
- Screenshot capture for visual validation
- Console log analysis for error detection
- E2E test execution with result reporting
1.3 Success Criteria
| Metric | Target |
|---|---|
| E2E test execution time | < 30s per test |
| Screenshot capture reliability | > 99% |
| False positive rate | < 5% |
| Integration with checkpoint system | 100% |
2. Functional Requirements
2.1 Playwright MCP Server Integration
FR-001: MCP Server Configuration
MUST integrate Playwright MCP server with capabilities:
├── browser_launch(browser_type, options) → session_id
├── page_navigate(session_id, url) → void
├── page_screenshot(session_id, options) → base64_image
├── page_click(session_id, selector) → void
├── page_fill(session_id, selector, value) → void
├── page_evaluate(session_id, script) → result
├── page_wait_for_selector(session_id, selector, timeout) → void
├── page_get_console_logs(session_id) → [LogEntry]
├── browser_close(session_id) → void
└── network_intercept(session_id, patterns) → [NetworkEvent]
Configuration:
server_name: "playwright-mcp"
transport: "stdio"
browser_defaults:
  headless: true
  viewport: { width: 1280, height: 720 }
  timeout: 30000
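A typical verification call sequence against these capabilities can be sketched against a generic `callTool` dispatcher. The dispatcher signature below is a stand-in for illustration, not the real MCP client API; the tool names and argument shapes follow FR-001:

```typescript
// Generic stand-in for an MCP client; the real client API will differ.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Launch -> navigate -> screenshot -> close, always cleaning up the
// session even if navigation or capture fails (see NFR-009).
async function screenshotPage(callTool: CallTool, url: string): Promise<unknown> {
  const sessionId = await callTool('browser_launch', {
    browser_type: 'chromium',
    options: { headless: true },
  });
  try {
    await callTool('page_navigate', { session_id: sessionId, url });
    return await callTool('page_screenshot', {
      session_id: sessionId,
      options: { full_page: true },
    });
  } finally {
    await callTool('browser_close', { session_id: sessionId });
  }
}
```

The `try`/`finally` ensures `browser_close` runs on every path, which is what makes session cleanup on agent termination tractable.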
2.2 QA Agent Tool Definitions
FR-002: QA Agent Browser Tools
tools:
  - name: "verify_page_loads"
    description: "Navigate to URL and verify page loads successfully"
    parameters:
      url: string
      expected_title: string (optional)
      timeout_ms: integer (default: 10000)
    returns:
      success: boolean
      screenshot: base64
      load_time_ms: integer
      console_errors: array
  - name: "verify_element_exists"
    description: "Check if element exists on page"
    parameters:
      url: string
      selector: string
      should_be_visible: boolean (default: true)
    returns:
      exists: boolean
      visible: boolean
      screenshot: base64
  - name: "verify_user_flow"
    description: "Execute multi-step user flow and verify outcome"
    parameters:
      start_url: string
      steps: array of FlowStep
      success_condition: string (selector or text)
    returns:
      success: boolean
      steps_completed: integer
      failure_step: integer (if failed)
      screenshots: array of base64
      console_logs: array
  - name: "capture_visual_baseline"
    description: "Capture screenshot for visual regression baseline"
    parameters:
      url: string
      name: string
      selectors: array (optional, for component screenshots)
    returns:
      baseline_id: string
      screenshots: array of { name, base64 }
  - name: "compare_visual_regression"
    description: "Compare current state against visual baseline"
    parameters:
      url: string
      baseline_id: string
      threshold: float (default: 0.01)
    returns:
      passed: boolean
      diff_percentage: float
      diff_image: base64 (if failed)
  - name: "analyze_console_errors"
    description: "Check page for JavaScript errors"
    parameters:
      url: string
      ignore_patterns: array (optional)
    returns:
      has_errors: boolean
      errors: array of { message, source, line }
      warnings: array
2.3 Flow Step Definition
FR-003: User Flow Step Schema
interface FlowStep {
  action: 'navigate' | 'click' | 'fill' | 'select' | 'wait' | 'assert' | 'screenshot';
  // For navigate
  url?: string;
  // For click, fill, select, wait, assert
  selector?: string;
  // For fill
  value?: string;
  // For select
  option?: string;
  // For wait
  timeout_ms?: number;
  // For assert
  assertion?: {
    type: 'exists' | 'visible' | 'text_contains' | 'value_equals';
    expected?: string;
  };
  // For screenshot
  name?: string;
  // Common
  description?: string; // Human-readable step description
}
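A login flow expressed with this schema might look like the sketch below. The URL, selectors, and credentials are illustrative placeholders, not part of the specification:

```typescript
// Local copy of the FlowStep schema from FR-003.
type FlowStep = {
  action: 'navigate' | 'click' | 'fill' | 'select' | 'wait' | 'assert' | 'screenshot';
  url?: string;
  selector?: string;
  value?: string;
  option?: string;
  timeout_ms?: number;
  assertion?: { type: 'exists' | 'visible' | 'text_contains' | 'value_equals'; expected?: string };
  name?: string;
  description?: string;
};

// Hypothetical login flow: every selector and URL is a placeholder.
const loginFlow: FlowStep[] = [
  { action: 'navigate', url: 'http://localhost:3000/login', description: 'Open login page' },
  { action: 'fill', selector: '#email', value: 'qa@example.com', description: 'Enter email' },
  { action: 'fill', selector: '#password', value: 'secret', description: 'Enter password' },
  { action: 'click', selector: 'button[type=submit]', description: 'Submit form' },
  { action: 'wait', selector: '.dashboard', timeout_ms: 5000, description: 'Wait for dashboard' },
  { action: 'assert', selector: '.welcome-banner', assertion: { type: 'text_contains', expected: 'Welcome' } },
  { action: 'screenshot', name: 'post-login', description: 'Capture evidence' },
];
```

Note that the discriminated `action` field decides which optional fields are meaningful at each step; implementations should reject steps whose fields do not match their action.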
2.4 Test Result Reporting
FR-004: Browser Test Results Integration
Test results MUST integrate with:
├── Checkpoint system (include in checkpoint metrics)
├── Compliance audit trail (evidence of testing)
├── CI/CD pipeline (JUnit XML output)
└── Agent decision making (pass/fail affects next steps)
Result schema:
  test_run_id: string
  timestamp: datetime
  duration_ms: integer
  status: 'passed' | 'failed' | 'error' | 'skipped'
  test_cases:
    - name: string
      status: 'passed' | 'failed' | 'error'
      duration_ms: integer
      screenshots: array
      console_logs: array
      failure_reason: string (if failed)
  summary:
    total: integer
    passed: integer
    failed: integer
    error: integer
    skipped: integer
  artifacts:
    screenshots: array of { name, path, base64 }
    videos: array of { name, path } (optional)
    traces: array of { name, path } (optional)
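The summary block can be derived mechanically from the test cases. A minimal sketch, using only the status names defined in the schema above (the `skipped` count is passed separately since skipped cases carry no per-case record in this schema):

```typescript
type CaseStatus = 'passed' | 'failed' | 'error';

interface TestCase {
  name: string;
  status: CaseStatus;
  duration_ms: number;
}

interface Summary {
  total: number;
  passed: number;
  failed: number;
  error: number;
  skipped: number;
}

// Tally per-case statuses into the summary block of the result schema.
function summarize(cases: TestCase[], skipped = 0): Summary {
  const summary: Summary = { total: cases.length + skipped, passed: 0, failed: 0, error: 0, skipped };
  for (const c of cases) summary[c.status] += 1;
  return summary;
}
```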
3. Non-Functional Requirements
3.1 Performance
| Requirement | Specification |
|---|---|
| NFR-001 | Browser launch < 3s |
| NFR-002 | Page navigation < 10s (timeout configurable) |
| NFR-003 | Screenshot capture < 500ms |
| NFR-004 | Concurrent sessions: up to 5 |
| NFR-005 | Memory usage < 512MB per browser instance |
3.2 Reliability
| Requirement | Specification |
|---|---|
| NFR-006 | Auto-retry on transient failures (3x) |
| NFR-007 | Graceful timeout handling |
| NFR-008 | Browser crash recovery |
| NFR-009 | Session cleanup on agent termination |
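NFR-006's transient-failure retry can be sketched as a generic async wrapper. The three-attempt limit matches NFR-006; the linear backoff interval is an assumption, not a mandated value:

```typescript
// Retry an async operation up to `attempts` times with linear backoff.
// 3 attempts matches NFR-006; the 250ms backoff base is an assumption.
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  backoffMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the final one.
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, backoffMs * (i + 1)));
    }
  }
  throw lastError;
}
```

Browser tools would wrap individual page operations (navigate, click, screenshot) rather than whole flows, so a retry never replays already-completed steps.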
3.3 Security
| Requirement | Specification |
|---|---|
| NFR-010 | Sandbox browser execution |
| NFR-011 | No access to host filesystem |
| NFR-012 | Network isolation (configurable allowlist) |
| NFR-013 | No credential persistence |
3.4 Compliance
| Requirement | Specification |
|---|---|
| NFR-014 | Screenshot evidence for FDA 21 CFR Part 11 |
| NFR-015 | Test execution logs for SOC2 audit |
| NFR-016 | Timestamp verification for test evidence |
| NFR-017 | Hash verification of screenshot integrity |
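NFR-017's integrity check reduces to hashing the raw screenshot bytes at capture time and re-computing the hash at audit time. A sketch using Node's built-in `crypto` module (the choice of SHA-256 is an assumption; any collision-resistant hash would satisfy the requirement):

```typescript
import { createHash } from 'node:crypto';

// Hash raw screenshot bytes at capture time; re-compute at audit
// time and compare to detect tampering (NFR-017).
function evidenceHash(bytes: Uint8Array | string): string {
  return createHash('sha256').update(bytes).digest('hex');
}

function verifyEvidence(bytes: Uint8Array | string, recordedHash: string): boolean {
  return evidenceHash(bytes) === recordedHash;
}
```

The recorded hash belongs in the audit trail entry (NFR-015) alongside the timestamp, so evidence and metadata can be cross-checked independently.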
4. Implementation Steps
Phase 1: MCP Server Setup (Week 1)
Step 1.1: Playwright MCP Server Installation
├── Add playwright-mcp to MCP server configuration
├── Configure browser binaries (Chromium default)
├── Set up headless execution environment
├── Configure resource limits (memory, CPU)
└── Validate MCP protocol communication
Step 1.2: MCP Client Integration
├── Update QA Agent MCP client configuration
├── Add playwright-mcp to allowed servers
├── Implement tool discovery for browser tools
├── Add error handling for MCP communication
└── Test basic connectivity
Step 1.3: Environment Configuration
├── Docker container setup for browser isolation
├── X virtual framebuffer (Xvfb) for headless
├── Font installation for consistent rendering
├── Timezone configuration for timestamp consistency
└── Network configuration for test environments
Phase 2: Core Browser Tools (Week 2)
Step 2.1: Basic Navigation Tools
├── Implement verify_page_loads
│ ├── Launch browser session
│ ├── Navigate to URL
│ ├── Wait for load event
│ ├── Capture screenshot
│ ├── Collect console logs
│ └── Return result with metrics
├── Implement verify_element_exists
│ ├── Navigate to page
│ ├── Query selector
│ ├── Check visibility
│ └── Capture element screenshot
└── Add timeout and retry logic
Step 2.2: Interaction Tools
├── Implement click action
│ ├── Wait for element
│ ├── Scroll into view
│ ├── Click with retry
│ └── Wait for navigation (if applicable)
├── Implement fill action
│ ├── Clear existing value
│ ├── Type with realistic delay
│ └── Verify value entered
└── Implement select action
├── Find select element
├── Choose option
└── Verify selection
Step 2.3: User Flow Execution
├── Implement verify_user_flow
│ ├── Parse flow steps
│ ├── Execute steps sequentially
│ ├── Capture screenshot after each step
│ ├── Collect console logs throughout
│ ├── Detect failures and stop
│ └── Return comprehensive result
├── Add step-level error handling
└── Implement flow timeout (aggregate)
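The execute-sequentially-and-stop-on-failure behavior in Step 2.3 can be sketched independently of the browser layer by injecting a per-step action handler. The handler signature and 1-based `failureStep` convention are assumptions chosen to mirror `verify_user_flow`'s result shape:

```typescript
interface FlowStep {
  action: string;
  description?: string;
}

interface FlowResult {
  success: boolean;
  stepsCompleted: number;
  failureStep?: number; // 1-based index of the failing step
  failureReason?: string;
}

// Run steps in order; stop at the first failure and report which
// step failed, mirroring verify_user_flow's result fields.
async function runFlow(
  steps: FlowStep[],
  execute: (step: FlowStep) => Promise<void>,
): Promise<FlowResult> {
  for (let i = 0; i < steps.length; i++) {
    try {
      await execute(steps[i]);
    } catch (err) {
      return {
        success: false,
        stepsCompleted: i,
        failureStep: i + 1,
        failureReason: err instanceof Error ? err.message : String(err),
      };
    }
  }
  return { success: true, stepsCompleted: steps.length };
}
```

In the real tool the injected handler would also capture a screenshot and drain console logs after each step before returning.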
Phase 3: Visual Testing (Week 3)
Step 3.1: Baseline Management
├── Implement capture_visual_baseline
│ ├── Navigate and wait for stability
│ ├── Capture full-page screenshot
│ ├── Capture component screenshots (if selectors)
│ ├── Store with baseline_id
│ └── Hash for integrity
├── Design baseline storage:
│ ├── FoundationDB for metadata
│ ├── Object storage for images
│ └── Versioning for baseline updates
└── Implement baseline retrieval
Step 3.2: Visual Comparison
├── Implement compare_visual_regression
│ ├── Capture current screenshot
│ ├── Retrieve baseline
│ ├── Pixel-diff comparison
│ ├── Calculate diff percentage
│ ├── Generate diff image (highlight changes)
│ └── Return pass/fail with evidence
├── Configure comparison thresholds
│ ├── Global default (1%)
│ ├── Per-component overrides
│ └── Ignore regions (dynamic content)
└── Add anti-aliasing tolerance
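At its core, the diff-percentage calculation in Step 3.2 is a per-pixel comparison over two equally sized images. A simplified sketch on RGBA byte buffers; the `tolerance` parameter is a crude stand-in for the anti-aliasing handling that dedicated libraries (e.g. pixelmatch) do properly:

```typescript
// Fraction of differing pixels between two same-size RGBA buffers.
// `tolerance` is the max per-channel byte difference still counted
// as equal (a rough stand-in for anti-aliasing tolerance).
function diffPercentage(a: Uint8Array, b: Uint8Array, tolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('images must be same-size RGBA buffers');
  }
  const pixels = a.length / 4;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[p * 4 + c] - b[p * 4 + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixels;
}

// Pass/fail against the configured threshold (global default 1%).
const passes = (a: Uint8Array, b: Uint8Array, threshold = 0.01) =>
  diffPercentage(a, b) <= threshold;
```

Ignore regions would be implemented by zeroing the same rectangles in both buffers before comparison.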
Step 3.3: Visual Test Integration
├── Add visual checks to verify_user_flow
├── Auto-capture baselines on first run
├── Flag visual regressions in results
└── Generate visual diff reports
Phase 4: Integration and Testing (Week 4)
Step 4.1: Checkpoint Integration
├── Add browser test results to checkpoint schema
├── Include screenshots in checkpoint artifacts
├── Link test runs to agent iterations
└── Enable recovery with test state
Step 4.2: Agent Decision Integration
├── Update QA Agent prompt to use browser tools
├── Define verification strategies:
│ ├── After implementation changes
│ ├── Before marking task complete
│ └── On-demand from orchestrator
├── Implement pass/fail decision logic
└── Add retry on failure (with fixes)
Step 4.3: Compliance Evidence
├── Generate timestamped test reports
├── Hash all screenshot evidence
├── Create audit trail entries
├── Export JUnit XML for CI/CD
└── Archive evidence per retention policy
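The JUnit XML export in Step 4.3 only needs the per-case name, status, and duration. A minimal generator sketch, with the attribute set trimmed to what common CI parsers read (real reports often add `timestamp`, `hostname`, and `<system-out>` sections):

```typescript
interface CaseResult {
  name: string;
  status: 'passed' | 'failed' | 'error';
  duration_ms: number;
  failure_reason?: string;
}

const escapeXml = (s: string) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');

// Emit a single <testsuite> element consumable by typical CI parsers.
function toJUnitXml(suiteName: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => c.status === 'failed').length;
  const errors = cases.filter((c) => c.status === 'error').length;
  const body = cases
    .map((c) => {
      const open = `<testcase name="${escapeXml(c.name)}" time="${(c.duration_ms / 1000).toFixed(3)}"`;
      if (c.status === 'passed') return `${open}/>`;
      const tag = c.status === 'failed' ? 'failure' : 'error';
      return `${open}><${tag} message="${escapeXml(c.failure_reason ?? '')}"/></testcase>`;
    })
    .join('\n  ');
  return `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}" errors="${errors}">\n  ${body}\n</testsuite>`;
}
```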
Step 4.4: Testing
├── Unit tests:
│ ├── Tool parameter validation
│ ├── Result schema compliance
│ └── Error handling
├── Integration tests:
│ ├── Full browser automation flows
│ ├── Visual regression detection
│ ├── Concurrent session handling
│ └── Recovery from browser crashes
└── E2E tests:
├── QA Agent using browser tools
├── Checkpoint with test results
└── Compliance evidence generation
5. QA Agent Prompt Enhancement
### Browser Verification Guidelines
When verifying UI/UX changes, use browser automation tools:
1. **After implementing UI changes:**
- Use `verify_page_loads` to confirm page renders
- Use `verify_element_exists` for new components
- Check console for JavaScript errors
2. **For user-facing features:**
- Define user flow with `verify_user_flow`
- Include critical path steps
- Verify success conditions
3. **For visual changes:**
- Capture baseline with `capture_visual_baseline`
- Compare changes with `compare_visual_regression`
- Review diff images for unintended changes
4. **Before marking task complete:**
- All browser tests must pass
- No console errors (or documented exceptions)
- Visual regressions reviewed and approved
5. **Evidence requirements:**
- Screenshots at key verification points
- Console logs for debugging
- Test results in checkpoint
6. Configuration Schema
# coditect-config.yaml
qa_agent:
  browser_automation:
    enabled: true
    mcp_server:
      name: "playwright-mcp"
      transport: "stdio"
      command: "npx"
      args: ["playwright-mcp"]
    browser:
      type: "chromium"  # chromium, firefox, webkit
      headless: true
      viewport:
        width: 1280
        height: 720
      timeout_ms: 30000
    screenshots:
      format: "png"
      full_page: true
      quality: 90
    visual_regression:
      threshold: 0.01
      ignore_antialiasing: true
      ignore_colors: false
    network:
      allowed_hosts:
        - "localhost"
        - "*.test.coditect.io"
      block_external: true
    resource_limits:
      max_sessions: 5
      memory_mb: 512
      timeout_ms: 60000
    compliance:
      screenshot_hashing: true
      evidence_retention_days: 365
      audit_logging: true
7. API Specification
interface BrowserAutomationService {
  // Session management
  launchBrowser(options?: BrowserOptions): Promise<SessionId>;
  closeBrowser(sessionId: SessionId): Promise<void>;

  // Navigation
  navigate(sessionId: SessionId, url: string): Promise<NavigationResult>;

  // Interactions
  click(sessionId: SessionId, selector: string): Promise<void>;
  fill(sessionId: SessionId, selector: string, value: string): Promise<void>;
  select(sessionId: SessionId, selector: string, option: string): Promise<void>;

  // Verification
  verifyPageLoads(params: VerifyPageParams): Promise<PageVerificationResult>;
  verifyElementExists(params: ElementVerifyParams): Promise<ElementVerificationResult>;
  verifyUserFlow(params: FlowVerifyParams): Promise<FlowVerificationResult>;

  // Visual testing
  captureBaseline(params: BaselineCaptureParams): Promise<BaselineResult>;
  compareVisual(params: VisualCompareParams): Promise<VisualCompareResult>;

  // Diagnostics
  getConsoleLogs(sessionId: SessionId): Promise<ConsoleLog[]>;
  getNetworkLogs(sessionId: SessionId): Promise<NetworkLog[]>;
  captureScreenshot(sessionId: SessionId, options?: ScreenshotOptions): Promise<Screenshot>;
}

interface FlowVerificationResult {
  success: boolean;
  stepsCompleted: number;
  totalSteps: number;
  failureStep?: number;
  failureReason?: string;
  screenshots: Screenshot[];
  consoleLogs: ConsoleLog[];
  duration_ms: number;
  evidence: {
    hash: string;
    timestamp: string;
  };
}
8. Dependencies
| Dependency | Type | Status |
|---|---|---|
| Playwright | Library | ✅ Available (npm) |
| playwright-mcp | MCP Server | ✅ Available |
| QA Agent | Platform | 🔄 Requires update |
| MCP Infrastructure | Platform | ✅ Available |
| Object Storage (screenshots) | Infrastructure | ⚠️ May need setup |
| Checkpoint Service | Platform | 🔄 IMPL-REQ-001 |
9. Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Browser crashes | Test failures | Medium | Auto-restart, session isolation |
| Flaky tests | Spurious failures (false positives) | High | Retry logic, wait strategies |
| Resource exhaustion | System instability | Medium | Limits, cleanup on termination |
| Visual diff false positives | Noise | Medium | Ignore regions, thresholds |
| Slow test execution | Agent delays | Medium | Parallelization, timeouts |
10. Acceptance Criteria
- QA Agent can launch headless browser via MCP
- Page load verification works with screenshot capture
- User flow execution completes multi-step scenarios
- Visual regression detects intentional and unintentional changes
- Console error detection identifies JavaScript issues
- Test results integrate with checkpoint system
- Compliance evidence meets FDA/SOC2 requirements
- Performance targets achieved under load
- Documentation complete with examples
Document Version: 1.0 | Last Updated: January 24, 2026