Implementation Requirements: QA Agent Browser Automation

Document ID: IMPL-REQ-002
Priority: P0 (Critical Path)
Target ADR: ADR-109 (Proposed)
Estimated Effort: 2 Sprints
Dependencies: QA Agent, MCP Infrastructure, Playwright


1. Overview

1.1 Problem Statement

The Ralph Wiggum community has empirically validated that self-verification is non-negotiable for autonomous development loops. Without the ability to verify UI/UX changes, agents cannot:

  • Confirm features work end-to-end
  • Detect visual regressions
  • Validate user flows
  • Generate compliance evidence of testing

Current QA Agent capabilities are limited to unit and integration tests, which is insufficient for full-stack autonomous development.

1.2 Objective

Integrate Playwright browser automation into the QA Agent, enabling:

  • Headless browser testing within autonomous loops
  • Screenshot capture for visual validation
  • Console log analysis for error detection
  • E2E test execution with result reporting

1.3 Success Criteria

| Metric | Target |
|---|---|
| E2E test execution time | < 30s per test |
| Screenshot capture reliability | > 99% |
| False positive rate | < 5% |
| Integration with checkpoint system | 100% |

2. Functional Requirements

2.1 Playwright MCP Server Integration

FR-001: MCP Server Configuration

MUST integrate Playwright MCP server with capabilities:
├── browser_launch(browser_type, options) → session_id
├── page_navigate(session_id, url) → void
├── page_screenshot(session_id, options) → base64_image
├── page_click(session_id, selector) → void
├── page_fill(session_id, selector, value) → void
├── page_evaluate(session_id, script) → result
├── page_wait_for_selector(session_id, selector, timeout) → void
├── page_get_console_logs(session_id) → [LogEntry]
├── browser_close(session_id) → void
└── network_intercept(session_id, patterns) → [NetworkEvent]

Configuration:

server_name: "playwright-mcp"
transport: "stdio"
browser_defaults:
  headless: true
  viewport: { width: 1280, height: 720 }
  timeout: 30000
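The capability set and defaults above imply a simple session lifecycle: launch, navigate, capture, collect logs, close. A minimal TypeScript sketch, with `callTool` as a hypothetical stand-in for the real MCP client call (stubbed here so the flow is runnable end-to-end):

```typescript
// `callTool` is a hypothetical MCP dispatcher; a real client would proxy
// these calls to playwright-mcp over stdio. Stubbed responses for the sketch.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

const callTool: ToolCall = async (name, _args) => {
  if (name === "browser_launch") return "session-1";
  if (name === "page_screenshot") return "iVBORw0KGgo="; // base64 placeholder
  if (name === "page_get_console_logs") return [];
  return undefined;
};

async function verifyHomePage(url: string): Promise<{ screenshot: string; errors: unknown[] }> {
  const sessionId = (await callTool("browser_launch", {
    browser_type: "chromium",
    options: { headless: true },
  })) as string;
  try {
    await callTool("page_navigate", { session_id: sessionId, url });
    const screenshot = (await callTool("page_screenshot", {
      session_id: sessionId,
      options: { full_page: true },
    })) as string;
    const errors = (await callTool("page_get_console_logs", { session_id: sessionId })) as unknown[];
    return { screenshot, errors };
  } finally {
    // Always release the session, even on failure (NFR-009: cleanup on termination).
    await callTool("browser_close", { session_id: sessionId });
  }
}
```

The `try/finally` around the session mirrors the cleanup requirement: a crashed navigation must not leak a browser instance.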

2.2 QA Agent Tool Definitions

FR-002: QA Agent Browser Tools

tools:
  - name: "verify_page_loads"
    description: "Navigate to URL and verify page loads successfully"
    parameters:
      url: string
      expected_title: string (optional)
      timeout_ms: integer (default: 10000)
    returns:
      success: boolean
      screenshot: base64
      load_time_ms: integer
      console_errors: array

  - name: "verify_element_exists"
    description: "Check if element exists on page"
    parameters:
      url: string
      selector: string
      should_be_visible: boolean (default: true)
    returns:
      exists: boolean
      visible: boolean
      screenshot: base64

  - name: "verify_user_flow"
    description: "Execute multi-step user flow and verify outcome"
    parameters:
      start_url: string
      steps: array of FlowStep
      success_condition: string (selector or text)
    returns:
      success: boolean
      steps_completed: integer
      failure_step: integer (if failed)
      screenshots: array of base64
      console_logs: array

  - name: "capture_visual_baseline"
    description: "Capture screenshot for visual regression baseline"
    parameters:
      url: string
      name: string
      selectors: array (optional, for component screenshots)
    returns:
      baseline_id: string
      screenshots: array of { name, base64 }

  - name: "compare_visual_regression"
    description: "Compare current state against visual baseline"
    parameters:
      url: string
      baseline_id: string
      threshold: float (default: 0.01)
    returns:
      passed: boolean
      diff_percentage: float
      diff_image: base64 (if failed)

  - name: "analyze_console_errors"
    description: "Check page for JavaScript errors"
    parameters:
      url: string
      ignore_patterns: array (optional)
    returns:
      has_errors: boolean
      errors: array of { message, source, line }
      warnings: array
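Because tool responses cross a process boundary, an agent should validate shapes before acting on them. A sketch of a runtime guard for the `verify_page_loads` result; the `PageLoadResult` type name and guard are illustrative, not part of the spec:

```typescript
// Hypothetical TypeScript mirror of the verify_page_loads return shape,
// plus a runtime guard to reject malformed MCP responses.
interface PageLoadResult {
  success: boolean;
  screenshot: string; // base64
  load_time_ms: number;
  console_errors: string[];
}

function isPageLoadResult(v: unknown): v is PageLoadResult {
  const o = v as Record<string, unknown>;
  return (
    typeof o === "object" &&
    o !== null &&
    typeof o.success === "boolean" &&
    typeof o.screenshot === "string" &&
    typeof o.load_time_ms === "number" &&
    Array.isArray(o.console_errors)
  );
}
```

A malformed response then fails fast at the guard instead of propagating `undefined` into pass/fail decisions.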

2.3 Flow Step Definition

FR-003: User Flow Step Schema

interface FlowStep {
  action: 'navigate' | 'click' | 'fill' | 'select' | 'wait' | 'assert' | 'screenshot';

  // For navigate
  url?: string;

  // For click, fill, select, wait, assert
  selector?: string;

  // For fill
  value?: string;

  // For select
  option?: string;

  // For wait
  timeout_ms?: number;

  // For assert
  assertion?: {
    type: 'exists' | 'visible' | 'text_contains' | 'value_equals';
    expected?: string;
  };

  // For screenshot
  name?: string;

  // Common
  description?: string; // Human-readable step description
}
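As an illustration, a hypothetical login flow expressed against this schema; every URL and selector below is invented for the example:

```typescript
// Compact copy of the FlowStep schema so the example is self-contained.
type FlowStep = {
  action: "navigate" | "click" | "fill" | "select" | "wait" | "assert" | "screenshot";
  url?: string;
  selector?: string;
  value?: string;
  option?: string;
  timeout_ms?: number;
  assertion?: { type: "exists" | "visible" | "text_contains" | "value_equals"; expected?: string };
  name?: string;
  description?: string;
};

// Hypothetical login flow; selectors/URLs are illustrative only.
const loginFlow: FlowStep[] = [
  { action: "navigate", url: "http://localhost:3000/login", description: "Open login page" },
  { action: "fill", selector: "#email", value: "qa@example.com", description: "Enter email" },
  { action: "fill", selector: "#password", value: "secret", description: "Enter password" },
  { action: "click", selector: "button[type=submit]", description: "Submit credentials" },
  { action: "wait", selector: ".dashboard", timeout_ms: 5000, description: "Wait for dashboard" },
  { action: "assert", selector: ".welcome-banner", assertion: { type: "text_contains", expected: "Welcome" } },
  { action: "screenshot", name: "post-login" },
];
```

Note that each step carries only the fields its `action` needs; the optional `description` gives the agent human-readable context for failure reports.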

2.4 Test Result Reporting

FR-004: Browser Test Results Integration

Test results MUST integrate with:
├── Checkpoint system (include in checkpoint metrics)
├── Compliance audit trail (evidence of testing)
├── CI/CD pipeline (JUnit XML output)
└── Agent decision making (pass/fail affects next steps)

Result schema:

test_run_id: string
timestamp: datetime
duration_ms: integer
status: 'passed' | 'failed' | 'error' | 'skipped'

test_cases:
  - name: string
    status: 'passed' | 'failed' | 'error'
    duration_ms: integer
    screenshots: array
    console_logs: array
    failure_reason: string (if failed)

summary:
  total: integer
  passed: integer
  failed: integer
  error: integer
  skipped: integer

artifacts:
  screenshots: array of { name, path, base64 }
  videos: array of { name, path } (optional)
  traces: array of { name, path } (optional)
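The `summary` block can be derived mechanically from `test_cases`. A sketch; run-level `skipped` stays at zero here because the per-case status union above does not include it:

```typescript
// Minimal mirror of the per-case result schema.
interface TestCase {
  name: string;
  status: "passed" | "failed" | "error";
  duration_ms: number;
}

// Derive the summary counts from the case list.
function summarize(cases: TestCase[]) {
  const summary = { total: cases.length, passed: 0, failed: 0, error: 0, skipped: 0 };
  for (const c of cases) summary[c.status] += 1;
  return summary;
}
```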

3. Non-Functional Requirements

3.1 Performance

| Requirement | Specification |
|---|---|
| NFR-001 | Browser launch < 3s |
| NFR-002 | Page navigation < 10s (timeout configurable) |
| NFR-003 | Screenshot capture < 500ms |
| NFR-004 | Concurrent sessions: up to 5 |
| NFR-005 | Memory usage < 512MB per browser instance |

3.2 Reliability

| Requirement | Specification |
|---|---|
| NFR-006 | Auto-retry on transient failures (3x) |
| NFR-007 | Graceful timeout handling |
| NFR-008 | Browser crash recovery |
| NFR-009 | Session cleanup on agent termination |

3.3 Security

| Requirement | Specification |
|---|---|
| NFR-010 | Sandbox browser execution |
| NFR-011 | No access to host filesystem |
| NFR-012 | Network isolation (configurable allowlist) |
| NFR-013 | No credential persistence |

3.4 Compliance

| Requirement | Specification |
|---|---|
| NFR-014 | Screenshot evidence for FDA 21 CFR Part 11 |
| NFR-015 | Test execution logs for SOC2 audit |
| NFR-016 | Timestamp verification for test evidence |
| NFR-017 | Hash verification of screenshot integrity |

4. Implementation Steps

Phase 1: MCP Server Setup (Week 1)

Step 1.1: Playwright MCP Server Installation
├── Add playwright-mcp to MCP server configuration
├── Configure browser binaries (Chromium default)
├── Set up headless execution environment
├── Configure resource limits (memory, CPU)
└── Validate MCP protocol communication

Step 1.2: MCP Client Integration
├── Update QA Agent MCP client configuration
├── Add playwright-mcp to allowed servers
├── Implement tool discovery for browser tools
├── Add error handling for MCP communication
└── Test basic connectivity

Step 1.3: Environment Configuration
├── Docker container setup for browser isolation
├── X virtual framebuffer (Xvfb) for headless
├── Font installation for consistent rendering
├── Timezone configuration for timestamp consistency
└── Network configuration for test environments

Phase 2: Core Browser Tools (Week 2)

Step 2.1: Basic Navigation Tools
├── Implement verify_page_loads
│ ├── Launch browser session
│ ├── Navigate to URL
│ ├── Wait for load event
│ ├── Capture screenshot
│ ├── Collect console logs
│ └── Return result with metrics
├── Implement verify_element_exists
│ ├── Navigate to page
│ ├── Query selector
│ ├── Check visibility
│ └── Capture element screenshot
└── Add timeout and retry logic

Step 2.2: Interaction Tools
├── Implement click action
│ ├── Wait for element
│ ├── Scroll into view
│ ├── Click with retry
│ └── Wait for navigation (if applicable)
├── Implement fill action
│ ├── Clear existing value
│ ├── Type with realistic delay
│ └── Verify value entered
└── Implement select action
    ├── Find select element
    ├── Choose option
    └── Verify selection
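The "click with retry" behavior above (and NFR-006's three attempts) can be factored into a single helper wrapping any flaky interaction. A sketch; the linear backoff schedule is an assumption, not a spec requirement:

```typescript
// Retry an async interaction up to `retries` times, backing off linearly
// between attempts (NFR-006: auto-retry on transient failures).
async function withRetry<T>(
  attempt: () => Promise<T>,
  retries = 3,
  delayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < retries; i++) {
    try {
      return await attempt();
    } catch (e) {
      lastError = e;
      // Only sleep if another attempt remains.
      if (i < retries - 1) await new Promise((r) => setTimeout(r, delayMs * (i + 1)));
    }
  }
  throw lastError;
}
```

Usage would be `withRetry(() => click(sessionId, selector))`, so every interaction tool gets the same transient-failure policy without duplicating loop logic.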

Step 2.3: User Flow Execution
├── Implement verify_user_flow
│ ├── Parse flow steps
│ ├── Execute steps sequentially
│ ├── Capture screenshot after each step
│ ├── Collect console logs throughout
│ ├── Detect failures and stop
│ └── Return comprehensive result
├── Add step-level error handling
└── Implement flow timeout (aggregate)
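The sequential execute-and-stop-on-failure loop in Step 2.3 can be sketched as follows, with each step reduced to an async thunk (the mapping from `FlowStep` to a thunk is omitted for brevity):

```typescript
// Run steps in order; on the first failure, stop and report which step
// failed and why (1-based index, matching the failure_step field).
async function executeFlow(
  steps: Array<() => Promise<void>>,
): Promise<{ success: boolean; stepsCompleted: number; failureStep?: number; failureReason?: string }> {
  for (let i = 0; i < steps.length; i++) {
    try {
      await steps[i]();
    } catch (e) {
      return {
        success: false,
        stepsCompleted: i,
        failureStep: i + 1,
        failureReason: String(e),
      };
    }
  }
  return { success: true, stepsCompleted: steps.length };
}
```

Per-step screenshots and console-log collection would hook into the same loop body, before and after each `await`.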

Phase 3: Visual Testing (Week 3)

Step 3.1: Baseline Management
├── Implement capture_visual_baseline
│ ├── Navigate and wait for stability
│ ├── Capture full-page screenshot
│ ├── Capture component screenshots (if selectors)
│ ├── Store with baseline_id
│ └── Hash for integrity
├── Design baseline storage:
│ ├── FoundationDB for metadata
│ ├── Object storage for images
│ └── Versioning for baseline updates
└── Implement baseline retrieval

Step 3.2: Visual Comparison
├── Implement compare_visual_regression
│ ├── Capture current screenshot
│ ├── Retrieve baseline
│ ├── Pixel-diff comparison
│ ├── Calculate diff percentage
│ ├── Generate diff image (highlight changes)
│ └── Return pass/fail with evidence
├── Configure comparison thresholds
│ ├── Global default (1%)
│ ├── Per-component overrides
│ └── Ignore regions (dynamic content)
└── Add anti-aliasing tolerance
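A naive per-pixel comparison illustrates how `diff_percentage` might be computed; production tools (e.g. pixelmatch) add the anti-aliasing tolerance and ignore-region handling listed above, which this sketch omits:

```typescript
// Compare two same-size images given as flat RGBA byte arrays and return
// the fraction of pixels that differ by more than the per-channel tolerance.
function diffPercentage(a: Uint8Array, b: Uint8Array, perChannelTolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("images must be same-size RGBA buffers");
  }
  const pixels = a.length / 4;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[p * 4 + c] - b[p * 4 + c]) > perChannelTolerance) {
        differing++; // count the pixel once and move on
        break;
      }
    }
  }
  return differing / pixels;
}
```

A comparison then passes when `diffPercentage(current, baseline) <= threshold` (default 0.01, i.e. 1% of pixels).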

Step 3.3: Visual Test Integration
├── Add visual checks to verify_user_flow
├── Auto-capture baselines on first run
├── Flag visual regressions in results
└── Generate visual diff reports

Phase 4: Integration and Testing (Week 4)

Step 4.1: Checkpoint Integration
├── Add browser test results to checkpoint schema
├── Include screenshots in checkpoint artifacts
├── Link test runs to agent iterations
└── Enable recovery with test state

Step 4.2: Agent Decision Integration
├── Update QA Agent prompt to use browser tools
├── Define verification strategies:
│ ├── After implementation changes
│ ├── Before marking task complete
│ └── On-demand from orchestrator
├── Implement pass/fail decision logic
└── Add retry on failure (with fixes)

Step 4.3: Compliance Evidence
├── Generate timestamped test reports
├── Hash all screenshot evidence
├── Create audit trail entries
├── Export JUnit XML for CI/CD
└── Archive evidence per retention policy
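Hashing screenshot evidence (NFR-017) can be as simple as a SHA-256 over the decoded screenshot bytes paired with an ISO-8601 timestamp. A sketch using Node's built-in crypto; storage and audit-trail wiring are out of scope here:

```typescript
import { createHash } from "crypto";

// Produce a tamper-evident evidence record for one base64 screenshot.
function evidenceEntry(base64Screenshot: string): { hash: string; timestamp: string } {
  const hash = createHash("sha256")
    .update(Buffer.from(base64Screenshot, "base64")) // hash the raw image bytes
    .digest("hex");
  return { hash, timestamp: new Date().toISOString() };
}
```

Re-hashing the archived image at audit time and comparing against the stored hash verifies integrity without trusting the archive.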

Step 4.4: Testing
├── Unit tests:
│ ├── Tool parameter validation
│ ├── Result schema compliance
│ └── Error handling
├── Integration tests:
│ ├── Full browser automation flows
│ ├── Visual regression detection
│ ├── Concurrent session handling
│ └── Recovery from browser crashes
└── E2E tests:
    ├── QA Agent using browser tools
    ├── Checkpoint with test results
    └── Compliance evidence generation

5. QA Agent Prompt Enhancement

### Browser Verification Guidelines

When verifying UI/UX changes, use browser automation tools:

1. **After implementing UI changes:**
   - Use `verify_page_loads` to confirm page renders
   - Use `verify_element_exists` for new components
   - Check console for JavaScript errors

2. **For user-facing features:**
   - Define user flow with `verify_user_flow`
   - Include critical path steps
   - Verify success conditions

3. **For visual changes:**
   - Capture baseline with `capture_visual_baseline`
   - Compare changes with `compare_visual_regression`
   - Review diff images for unintended changes

4. **Before marking task complete:**
   - All browser tests must pass
   - No console errors (or documented exceptions)
   - Visual regressions reviewed and approved

5. **Evidence requirements:**
   - Screenshots at key verification points
   - Console logs for debugging
   - Test results in checkpoint

6. Configuration Schema

# coditect-config.yaml
qa_agent:
  browser_automation:
    enabled: true

    mcp_server:
      name: "playwright-mcp"
      transport: "stdio"
      command: "npx"
      args: ["playwright-mcp"]

    browser:
      type: "chromium"  # chromium, firefox, webkit
      headless: true
      viewport:
        width: 1280
        height: 720
      timeout_ms: 30000

    screenshots:
      format: "png"
      full_page: true
      quality: 90

    visual_regression:
      threshold: 0.01
      ignore_antialiasing: true
      ignore_colors: false

    network:
      allowed_hosts:
        - "localhost"
        - "*.test.coditect.io"
      block_external: true

    resource_limits:
      max_sessions: 5
      memory_mb: 512
      timeout_ms: 60000

    compliance:
      screenshot_hashing: true
      evidence_retention_days: 365
      audit_logging: true
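A typed view of the `browser` section with a defaults merge keeps partially specified configs valid. The `BrowserConfig` type below is a hypothetical mirror of the YAML above, not a published schema:

```typescript
// Hypothetical typed mirror of the browser config section.
interface BrowserConfig {
  type: "chromium" | "firefox" | "webkit";
  headless: boolean;
  viewport: { width: number; height: number };
  timeout_ms: number;
}

// Defaults matching the YAML above.
const BROWSER_DEFAULTS: BrowserConfig = {
  type: "chromium",
  headless: true,
  viewport: { width: 1280, height: 720 },
  timeout_ms: 30000,
};

// Any omitted key falls back to its default.
function resolveBrowserConfig(partial: Partial<BrowserConfig>): BrowserConfig {
  return { ...BROWSER_DEFAULTS, ...partial };
}
```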

7. API Specification

interface BrowserAutomationService {
  // Session management
  launchBrowser(options?: BrowserOptions): Promise<SessionId>;
  closeBrowser(sessionId: SessionId): Promise<void>;

  // Navigation
  navigate(sessionId: SessionId, url: string): Promise<NavigationResult>;

  // Interactions
  click(sessionId: SessionId, selector: string): Promise<void>;
  fill(sessionId: SessionId, selector: string, value: string): Promise<void>;
  select(sessionId: SessionId, selector: string, option: string): Promise<void>;

  // Verification
  verifyPageLoads(params: VerifyPageParams): Promise<PageVerificationResult>;
  verifyElementExists(params: ElementVerifyParams): Promise<ElementVerificationResult>;
  verifyUserFlow(params: FlowVerifyParams): Promise<FlowVerificationResult>;

  // Visual testing
  captureBaseline(params: BaselineCaptureParams): Promise<BaselineResult>;
  compareVisual(params: VisualCompareParams): Promise<VisualCompareResult>;

  // Diagnostics
  getConsoleLogs(sessionId: SessionId): Promise<ConsoleLog[]>;
  getNetworkLogs(sessionId: SessionId): Promise<NetworkLog[]>;
  captureScreenshot(sessionId: SessionId, options?: ScreenshotOptions): Promise<Screenshot>;
}

interface FlowVerificationResult {
  success: boolean;
  stepsCompleted: number;
  totalSteps: number;
  failureStep?: number;
  failureReason?: string;
  screenshots: Screenshot[];
  consoleLogs: ConsoleLog[];
  duration_ms: number;
  evidence: {
    hash: string;
    timestamp: string;
  };
}

8. Dependencies

| Dependency | Type | Status |
|---|---|---|
| Playwright | Library | ✅ Available (npm) |
| playwright-mcp | MCP Server | ✅ Available |
| QA Agent | Platform | ✅ Requires update |
| MCP Infrastructure | Platform | ✅ Available |
| Object Storage (screenshots) | Infrastructure | ⚠️ May need setup |
| Checkpoint Service | Platform | 🔄 IMPL-REQ-001 |

9. Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Browser crashes | Test failures | Medium | Auto-restart, session isolation |
| Flaky tests | False negatives | High | Retry logic, wait strategies |
| Resource exhaustion | System instability | Medium | Limits, cleanup on termination |
| Visual diff false positives | Noise | Medium | Ignore regions, thresholds |
| Slow test execution | Agent delays | Medium | Parallelization, timeouts |

10. Acceptance Criteria

  • QA Agent can launch headless browser via MCP
  • Page load verification works with screenshot capture
  • User flow execution completes multi-step scenarios
  • Visual regression detects intentional and unintentional changes
  • Console error detection identifies JavaScript issues
  • Test results integrate with checkpoint system
  • Compliance evidence meets FDA/SOC2 requirements
  • Performance targets achieved under load
  • Documentation complete with examples

Document Version: 1.0 | Last Updated: January 24, 2026