Implementation Requirements: QA Agent Browser Automation
Document ID: IMPL-REQ-002
Priority: P0 (Critical Path)
Target ADR: ADR-109 (Proposed)
Estimated Effort: 2 Sprints
Dependencies: QA Agent, MCP Infrastructure, Playwright
1. Overview
1.1 Problem Statement
The Ralph Wiggum community has empirically validated that self-verification is non-negotiable for autonomous development loops. Without the ability to verify UI/UX changes, agents cannot:
- Confirm features work end-to-end
- Detect visual regressions
- Validate user flows
- Generate compliance evidence of testing
Current QA Agent capabilities are limited to unit and integration tests, which is insufficient for full-stack autonomous development.
1.2 Objective
Integrate Playwright browser automation into the QA Agent, enabling:
- Headless browser testing within autonomous loops
- Screenshot capture for visual validation
- Console log analysis for error detection
- E2E test execution with result reporting
1.3 Success Criteria
| Metric | Target |
|---|---|
| E2E test execution time | < 30s per test |
| Screenshot capture reliability | > 99% |
| False positive rate | < 5% |
| Integration with checkpoint system | 100% |
2. Functional Requirements
2.1 Playwright MCP Server Integration
FR-001: MCP Server Configuration
MUST integrate Playwright MCP server with capabilities:
├── browser_launch(browser_type, options) → session_id
├── page_navigate(session_id, url) → void
├── page_screenshot(session_id, options) → base64_image
├── page_click(session_id, selector) → void
├── page_fill(session_id, selector, value) → void
├── page_evaluate(session_id, script) → result
├── page_wait_for_selector(session_id, selector, timeout) → void
├── page_get_console_logs(session_id) → [LogEntry]
├── browser_close(session_id) → void
└── network_intercept(session_id, patterns) → [NetworkEvent]
Configuration:
server_name: "playwright-mcp"
transport: "stdio"
browser_defaults:
  headless: true
  viewport: { width: 1280, height: 720 }
  timeout: 30000
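A typical verification call sequence against these capabilities can be sketched against a generic `callTool` dispatcher. The dispatcher signature below is a stand-in for illustration, not the real MCP client API; the tool names and argument shapes follow FR-001:

```typescript
// Generic stand-in for an MCP client; the real client API will differ.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Launch -> navigate -> screenshot -> close, always cleaning up the
// session even if navigation or capture fails (see NFR-009).
async function screenshotPage(callTool: CallTool, url: string): Promise<unknown> {
  const sessionId = await callTool('browser_launch', {
    browser_type: 'chromium',
    options: { headless: true },
  });
  try {
    await callTool('page_navigate', { session_id: sessionId, url });
    return await callTool('page_screenshot', {
      session_id: sessionId,
      options: { full_page: true },
    });
  } finally {
    await callTool('browser_close', { session_id: sessionId });
  }
}
```

The `try`/`finally` ensures `browser_close` runs on every path, which is what makes session cleanup on agent termination tractable.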
2.2 QA Agent Tool Definitions
FR-002: QA Agent Browser Tools
tools:
  - name: "verify_page_loads"
    description: "Navigate to URL and verify page loads successfully"
    parameters:
      url: string
      expected_title: string (optional)
      timeout_ms: integer (default: 10000)
    returns:
      success: boolean
      screenshot: base64
      load_time_ms: integer
      console_errors: array
  - name: "verify_element_exists"
    description: "Check if element exists on page"
    parameters:
      url: string
      selector: string
      should_be_visible: boolean (default: true)
    returns:
      exists: boolean
      visible: boolean
      screenshot: base64
  - name: "verify_user_flow"
    description: "Execute multi-step user flow and verify outcome"
    parameters:
      start_url: string
      steps: array of FlowStep
      success_condition: string (selector or text)
    returns:
      success: boolean
      steps_completed: integer
      failure_step: integer (if failed)
      screenshots: array of base64
      console_logs: array
  - name: "capture_visual_baseline"
    description: "Capture screenshot for visual regression baseline"
    parameters:
      url: string
      name: string
      selectors: array (optional, for component screenshots)
    returns:
      baseline_id: string
      screenshots: array of { name, base64 }
  - name: "compare_visual_regression"
    description: "Compare current state against visual baseline"
    parameters:
      url: string
      baseline_id: string
      threshold: float (default: 0.01)
    returns:
      passed: boolean
      diff_percentage: float
      diff_image: base64 (if failed)
  - name: "analyze_console_errors"
    description: "Check page for JavaScript errors"
    parameters:
      url: string
      ignore_patterns: array (optional)
    returns:
      has_errors: boolean
      errors: array of { message, source, line }
      warnings: array
2.3 Flow Step Definition
FR-003: User Flow Step Schema
interface FlowStep {
  action: 'navigate' | 'click' | 'fill' | 'select' | 'wait' | 'assert' | 'screenshot';
  // For navigate
  url?: string;
  // For click, fill, select, wait, assert
  selector?: string;
  // For fill
  value?: string;
  // For select
  option?: string;
  // For wait
  timeout_ms?: number;
  // For assert
  assertion?: {
    type: 'exists' | 'visible' | 'text_contains' | 'value_equals';
    expected?: string;
  };
  // For screenshot
  name?: string;
  // Common
  description?: string; // Human-readable step description
}
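A login flow expressed with this schema might look like the sketch below. The URL, selectors, and credentials are illustrative placeholders, not part of the specification:

```typescript
// Local copy of the FlowStep schema from FR-003.
type FlowStep = {
  action: 'navigate' | 'click' | 'fill' | 'select' | 'wait' | 'assert' | 'screenshot';
  url?: string;
  selector?: string;
  value?: string;
  option?: string;
  timeout_ms?: number;
  assertion?: { type: 'exists' | 'visible' | 'text_contains' | 'value_equals'; expected?: string };
  name?: string;
  description?: string;
};

// Hypothetical login flow: every selector and URL is a placeholder.
const loginFlow: FlowStep[] = [
  { action: 'navigate', url: 'http://localhost:3000/login', description: 'Open login page' },
  { action: 'fill', selector: '#email', value: 'qa@example.com', description: 'Enter email' },
  { action: 'fill', selector: '#password', value: 'secret', description: 'Enter password' },
  { action: 'click', selector: 'button[type=submit]', description: 'Submit form' },
  { action: 'wait', selector: '.dashboard', timeout_ms: 5000, description: 'Wait for dashboard' },
  { action: 'assert', selector: '.welcome-banner', assertion: { type: 'text_contains', expected: 'Welcome' } },
  { action: 'screenshot', name: 'post-login', description: 'Capture evidence' },
];
```

Note that the discriminated `action` field decides which optional fields are meaningful at each step; implementations should reject steps whose fields do not match their action.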
2.4 Test Result Reporting
FR-004: Browser Test Results Integration
Test results MUST integrate with:
├── Checkpoint system (include in checkpoint metrics)
├── Compliance audit trail (evidence of testing)
├── CI/CD pipeline (JUnit XML output)
└── Agent decision making (pass/fail affects next steps)
Result schema:
  test_run_id: string
  timestamp: datetime
  duration_ms: integer
  status: 'passed' | 'failed' | 'error' | 'skipped'
  test_cases:
    - name: string
      status: 'passed' | 'failed' | 'error'
      duration_ms: integer
      screenshots: array
      console_logs: array
      failure_reason: string (if failed)
  summary:
    total: integer
    passed: integer
    failed: integer
    error: integer
    skipped: integer
  artifacts:
    screenshots: array of { name, path, base64 }
    videos: array of { name, path } (optional)
    traces: array of { name, path } (optional)
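The summary block can be derived mechanically from the test cases. A minimal sketch, using only the status names defined in the schema above (the `skipped` count is passed separately since skipped cases carry no per-case record in this schema):

```typescript
type CaseStatus = 'passed' | 'failed' | 'error';

interface TestCase {
  name: string;
  status: CaseStatus;
  duration_ms: number;
}

interface Summary {
  total: number;
  passed: number;
  failed: number;
  error: number;
  skipped: number;
}

// Tally per-case statuses into the summary block of the result schema.
function summarize(cases: TestCase[], skipped = 0): Summary {
  const summary: Summary = { total: cases.length + skipped, passed: 0, failed: 0, error: 0, skipped };
  for (const c of cases) summary[c.status] += 1;
  return summary;
}
```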
3. Non-Functional Requirements
3.1 Performance
| Requirement | Specification |
|---|---|
| NFR-001 | Browser launch < 3s |
| NFR-002 | Page navigation < 10s (timeout configurable) |
| NFR-003 | Screenshot capture < 500ms |
| NFR-004 | Concurrent sessions: up to 5 |
| NFR-005 | Memory usage < 512MB per browser instance |
3.2 Reliability
| Requirement | Specification |
|---|---|
| NFR-006 | Auto-retry on transient failures (3x) |
| NFR-007 | Graceful timeout handling |
| NFR-008 | Browser crash recovery |
| NFR-009 | Session cleanup on agent termination |
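NFR-006's transient-failure retry can be sketched as a generic async wrapper. The three-attempt limit matches NFR-006; the linear backoff interval is an assumption, not a mandated value:

```typescript
// Retry an async operation up to `attempts` times with linear backoff.
// 3 attempts matches NFR-006; the 250ms backoff base is an assumption.
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  backoffMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the final one.
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, backoffMs * (i + 1)));
    }
  }
  throw lastError;
}
```

Browser tools would wrap individual page operations (navigate, click, screenshot) rather than whole flows, so a retry never replays already-completed steps.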
3.3 Security
| Requirement | Specification |
|---|---|
| NFR-010 | Sandbox browser execution |
| NFR-011 | No access to host filesystem |
| NFR-012 | Network isolation (configurable allowlist) |
| NFR-013 | No credential persistence |
3.4 Compliance
| Requirement | Specification |
|---|---|
| NFR-014 | Screenshot evidence for FDA 21 CFR Part 11 |
| NFR-015 | Test execution logs for SOC2 audit |
| NFR-016 | Timestamp verification for test evidence |
| NFR-017 | Hash verification of screenshot integrity |
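NFR-017's integrity check reduces to hashing the raw screenshot bytes at capture time and re-computing the hash at audit time. A sketch using Node's built-in `crypto` module (the choice of SHA-256 is an assumption; any collision-resistant hash would satisfy the requirement):

```typescript
import { createHash } from 'node:crypto';

// Hash raw screenshot bytes at capture time; re-compute at audit
// time and compare to detect tampering (NFR-017).
function evidenceHash(bytes: Uint8Array | string): string {
  return createHash('sha256').update(bytes).digest('hex');
}

function verifyEvidence(bytes: Uint8Array | string, recordedHash: string): boolean {
  return evidenceHash(bytes) === recordedHash;
}
```

The recorded hash belongs in the audit trail entry (NFR-015) alongside the timestamp, so evidence and metadata can be cross-checked independently.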
4. Implementation Steps
Phase 1: MCP Server Setup (Week 1)
Step 1.1: Playwright MCP Server Installation
├── Add playwright-mcp to MCP server configuration
├── Configure browser binaries (Chromium default)
├── Set up headless execution environment
├── Configure resource limits (memory, CPU)
└── Validate MCP protocol communication
Step 1.2: MCP Client Integration
├── Update QA Agent MCP client configuration
├── Add playwright-mcp to allowed servers
├── Implement tool discovery for browser tools
├── Add error handling for MCP communication
└── Test basic connectivity
Step 1.3: Environment Configuration
├── Docker container setup for browser isolation
├── X virtual framebuffer (Xvfb) for headless
├── Font installation for consistent rendering
├── Timezone configuration for timestamp consistency
└── Network configuration for test environments
Phase 2: Core Browser Tools (Week 2)
Step 2.1: Basic Navigation Tools
├── Implement verify_page_loads
│ ├── Launch browser session
│ ├── Navigate to URL
│ ├── Wait for load event
│ ├── Capture screenshot
│ ├── Collect console logs
│ └── Return result with metrics
├── Implement verify_element_exists
│ ├── Navigate to page
│ ├── Query selector
│ ├── Check visibility
│ └── Capture element screenshot
└── Add timeout and retry logic
Step 2.2: Interaction Tools
├── Implement click action
│ ├── Wait for element
│ ├── Scroll into view
│ ├── Click with retry
│ └── Wait for navigation (if applicable)
├── Implement fill action
│ ├── Clear existing value
│ ├── Type with realistic delay
│ └── Verify value entered
└── Implement select action
├── Find select element
├── Choose option
└── Verify selection
Step 2.3: User Flow Execution
├── Implement verify_user_flow
│ ├── Parse flow steps
│ ├── Execute steps sequentially
│ ├── Capture screenshot after each step
│ ├── Collect console logs throughout
│ ├── Detect failures and stop
│ └── Return comprehensive result
├── Add step-level error handling
└── Implement flow timeout (aggregate)
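The execute-sequentially-and-stop-on-failure behavior in Step 2.3 can be sketched independently of the browser layer by injecting a per-step action handler. The handler signature and 1-based `failureStep` convention are assumptions chosen to mirror `verify_user_flow`'s result shape:

```typescript
interface FlowStep {
  action: string;
  description?: string;
}

interface FlowResult {
  success: boolean;
  stepsCompleted: number;
  failureStep?: number; // 1-based index of the failing step
  failureReason?: string;
}

// Run steps in order; stop at the first failure and report which
// step failed, mirroring verify_user_flow's result fields.
async function runFlow(
  steps: FlowStep[],
  execute: (step: FlowStep) => Promise<void>,
): Promise<FlowResult> {
  for (let i = 0; i < steps.length; i++) {
    try {
      await execute(steps[i]);
    } catch (err) {
      return {
        success: false,
        stepsCompleted: i,
        failureStep: i + 1,
        failureReason: err instanceof Error ? err.message : String(err),
      };
    }
  }
  return { success: true, stepsCompleted: steps.length };
}
```

In the real tool the injected handler would also capture a screenshot and drain console logs after each step before returning.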
Phase 3: Visual Testing (Week 3)
Step 3.1: Baseline Management
├── Implement capture_visual_baseline
│ ├── Navigate and wait for stability
│ ├── Capture full-page screenshot
│ ├── Capture component screenshots (if selectors)
│ ├── Store with baseline_id
│ └── Hash for integrity
├── Design baseline storage:
│ ├── FoundationDB for metadata
│ ├── Object storage for images
│ └── Versioning for baseline updates
└── Implement baseline retrieval
Step 3.2: Visual Comparison
├── Implement compare_visual_regression
│ ├── Capture current screenshot
│ ├── Retrieve baseline
│ ├── Pixel-diff comparison
│ ├── Calculate diff percentage
│ ├── Generate diff image (highlight changes)
│ └── Return pass/fail with evidence
├── Configure comparison thresholds
│ ├── Global default (1%)
│ ├── Per-component overrides
│ └── Ignore regions (dynamic content)
└── Add anti-aliasing tolerance
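At its core, the diff-percentage calculation in Step 3.2 is a per-pixel comparison over two equally sized images. A simplified sketch on RGBA byte buffers; the `tolerance` parameter is a crude stand-in for the anti-aliasing handling that dedicated libraries (e.g. pixelmatch) do properly:

```typescript
// Fraction of differing pixels between two same-size RGBA buffers.
// `tolerance` is the max per-channel byte difference still counted
// as equal (a rough stand-in for anti-aliasing tolerance).
function diffPercentage(a: Uint8Array, b: Uint8Array, tolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('images must be same-size RGBA buffers');
  }
  const pixels = a.length / 4;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[p * 4 + c] - b[p * 4 + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixels;
}

// Pass/fail against the configured threshold (global default 1%).
const passes = (a: Uint8Array, b: Uint8Array, threshold = 0.01) =>
  diffPercentage(a, b) <= threshold;
```

Ignore regions would be implemented by zeroing the same rectangles in both buffers before comparison.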
Step 3.3: Visual Test Integration
├── Add visual checks to verify_user_flow
├── Auto-capture baselines on first run
├── Flag visual regressions in results
└── Generate visual diff reports
Phase 4: Integration and Testing (Week 4)
Step 4.1: Checkpoint Integration
├── Add browser test results to checkpoint schema
├── Include screenshots in checkpoint artifacts
├── Link test runs to agent iterations
└── Enable recovery with test state
Step 4.2: Agent Decision Integration
├── Update QA Agent prompt to use browser tools
├── Define verification strategies:
│ ├── After implementation changes
│ ├── Before marking task complete
│ └── On-demand from orchestrator
├── Implement pass/fail decision logic
└── Add retry on failure (with fixes)
Step 4.3: Compliance Evidence
├── Generate timestamped test reports
├── Hash all screenshot evidence
├── Create audit trail entries
├── Export JUnit XML for CI/CD
└── Archive evidence per retention policy
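The JUnit XML export in Step 4.3 only needs the per-case name, status, and duration. A minimal generator sketch, with the attribute set trimmed to what common CI parsers read (real reports often add `timestamp`, `hostname`, and `<system-out>` sections):

```typescript
interface CaseResult {
  name: string;
  status: 'passed' | 'failed' | 'error';
  duration_ms: number;
  failure_reason?: string;
}

const escapeXml = (s: string) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');

// Emit a single <testsuite> element consumable by typical CI parsers.
function toJUnitXml(suiteName: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => c.status === 'failed').length;
  const errors = cases.filter((c) => c.status === 'error').length;
  const body = cases
    .map((c) => {
      const open = `<testcase name="${escapeXml(c.name)}" time="${(c.duration_ms / 1000).toFixed(3)}"`;
      if (c.status === 'passed') return `${open}/>`;
      const tag = c.status === 'failed' ? 'failure' : 'error';
      return `${open}><${tag} message="${escapeXml(c.failure_reason ?? '')}"/></testcase>`;
    })
    .join('\n  ');
  return `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}" errors="${errors}">\n  ${body}\n</testsuite>`;
}
```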
Step 4.4: Testing
├── Unit tests:
│ ├── Tool parameter validation
│ ├── Result schema compliance
│ └── Error handling
├── Integration tests:
│ ├── Full browser automation flows
│ ├── Visual regression detection
│ ├── Concurrent session handling
│ └── Recovery from browser crashes
└── E2E tests:
├── QA Agent using browser tools
├── Checkpoint with test results
└── Compliance evidence generation
5. QA Agent Prompt Enhancement
### Browser Verification Guidelines
When verifying UI/UX changes, use browser automation tools:
1. **After implementing UI changes:**
- Use `verify_page_loads` to confirm page renders
- Use `verify_element_exists` for new components
- Check console for JavaScript errors
2. **For user-facing features:**
- Define user flow with `verify_user_flow`
- Include critical path steps
- Verify success conditions
3. **For visual changes:**
- Capture baseline with `capture_visual_baseline`
- Compare changes with `compare_visual_regression`
- Review diff images for unintended changes
4. **Before marking task complete:**
- All browser tests must pass
- No console errors (or documented exceptions)
- Visual regressions reviewed and approved
5. **Evidence requirements:**
- Screenshots at key verification points
- Console logs for debugging
- Test results in checkpoint
6. Configuration Schema
# coditect-config.yaml
qa_agent:
  browser_automation:
    enabled: true
    mcp_server:
      name: "playwright-mcp"
      transport: "stdio"
      command: "npx"
      args: ["playwright-mcp"]
    browser:
      type: "chromium"  # chromium, firefox, webkit
      headless: true
      viewport:
        width: 1280
        height: 720
      timeout_ms: 30000
    screenshots:
      format: "png"
      full_page: true
      quality: 90
    visual_regression:
      threshold: 0.01
      ignore_antialiasing: true
      ignore_colors: false
    network:
      allowed_hosts:
        - "localhost"
        - "*.test.coditect.io"
      block_external: true
    resource_limits:
      max_sessions: 5
      memory_mb: 512
      timeout_ms: 60000
    compliance:
      screenshot_hashing: true
      evidence_retention_days: 365
      audit_logging: true
7. API Specification
interface BrowserAutomationService {
  // Session management
  launchBrowser(options?: BrowserOptions): Promise<SessionId>;
  closeBrowser(sessionId: SessionId): Promise<void>;

  // Navigation
  navigate(sessionId: SessionId, url: string): Promise<NavigationResult>;

  // Interactions
  click(sessionId: SessionId, selector: string): Promise<void>;
  fill(sessionId: SessionId, selector: string, value: string): Promise<void>;
  select(sessionId: SessionId, selector: string, option: string): Promise<void>;

  // Verification
  verifyPageLoads(params: VerifyPageParams): Promise<PageVerificationResult>;
  verifyElementExists(params: ElementVerifyParams): Promise<ElementVerificationResult>;
  verifyUserFlow(params: FlowVerifyParams): Promise<FlowVerificationResult>;

  // Visual testing
  captureBaseline(params: BaselineCaptureParams): Promise<BaselineResult>;
  compareVisual(params: VisualCompareParams): Promise<VisualCompareResult>;

  // Diagnostics
  getConsoleLogs(sessionId: SessionId): Promise<ConsoleLog[]>;
  getNetworkLogs(sessionId: SessionId): Promise<NetworkLog[]>;
  captureScreenshot(sessionId: SessionId, options?: ScreenshotOptions): Promise<Screenshot>;
}

interface FlowVerificationResult {
  success: boolean;
  stepsCompleted: number;
  totalSteps: number;
  failureStep?: number;
  failureReason?: string;
  screenshots: Screenshot[];
  consoleLogs: ConsoleLog[];
  duration_ms: number;
  evidence: {
    hash: string;
    timestamp: string;
  };
}
8. Dependencies
| Dependency | Type | Status |
|---|---|---|
| Playwright | Library | ✅ Available (npm) |
| playwright-mcp | MCP Server | ✅ Available |
| QA Agent | Platform | 🔄 Requires update |
| MCP Infrastructure | Platform | ✅ Available |
| Object Storage (screenshots) | Infrastructure | ⚠️ May need setup |
| Checkpoint Service | Platform | 🔄 IMPL-REQ-001 |
9. Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Browser crashes | Test failures | Medium | Auto-restart, session isolation |
| Flaky tests | Spurious failures (false positives) | High | Retry logic, wait strategies |
| Resource exhaustion | System instability | Medium | Limits, cleanup on termination |
| Visual diff false positives | Noise | Medium | Ignore regions, thresholds |
| Slow test execution | Agent delays | Medium | Parallelization, timeouts |
10. Acceptance Criteria
- QA Agent can launch headless browser via MCP
- Page load verification works with screenshot capture
- User flow execution completes multi-step scenarios
- Visual regression detects intentional and unintentional changes
- Console error detection identifies JavaScript issues
- Test results integrate with checkpoint system
- Compliance evidence meets FDA/SOC2 requirements
- Performance targets achieved under load
- Documentation complete with examples
Document Version: 1.0 | Last Updated: January 24, 2026