Lite Browser QA Skill
A lightweight, tiered approach to browser-based visual QA — from quick screenshots to full interaction testing, without the flakiness of traditional browser automation.
What problem does this solve?
Every time a developer changes CSS, HTML, or a UI component, someone needs to verify it actually looks right in a browser. Traditional approaches have two extremes:
Manual checking
Open the browser, click around, eyeball everything. Slow, error-prone, and impossible to reproduce consistently.
Full E2E automation
Playwright/Selenium suites that are expensive to maintain, break on minor DOM changes, and take minutes to run.
The Lite Browser QA skill sits in the sweet spot: it gives AI agents (and humans) a fast, reliable way to verify UI changes using the lightest tool that fits the job.
Think of it like a medical triage system. A nurse doesn't order an MRI for every patient — a quick check, a thermometer reading, and a stethoscope handle 90% of cases. The MRI is reserved for when it's truly needed. This skill works the same way: quick screenshot for most checks, scripted interaction when needed, full browser automation only as a last resort.
The three-tier architecture
The skill picks the lightest tool that fits the task, organised into three tiers (plus a GIF-recording sub-tier, Tier 1b). Each tier adds capability, but also complexity and execution time.
Tier 1: Static Screenshot
Headless Puppeteer captures a PNG of the page. Handles desktop, mobile, tablet, and element-specific shots. Takes 3–5 seconds.
Tier 1b: GIF Recording
Captures sequential frames over a duration, stitches into an animated GIF. For animations, transitions, hover effects, and loading states.
Tier 2: Scripted Interaction
Inline Puppeteer scripts that click buttons, fill forms, navigate between states, and screenshot at each step.
Tier 3: Full Browser Automation
Claude-in-Chrome, Playwright MCP, or Antigravity Browser for live interaction in the user’s actual browser session.
Tier selection decision flow
When an agent needs to verify a UI change, it walks through this decision tree to pick the right tier:
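The tree is simple enough to sketch in code. A minimal reconstruction from the tier descriptions on this page — the function and flag names are illustrative, not part of the skill:

```js
// Illustrative reconstruction of the tier-selection logic; not skill code.
function pickTier({ needsLiveSession, needsInteraction, isAnimated }) {
  if (needsLiveSession) return 'Tier 3: Full Browser Automation'; // auth flows, OAuth popups, cookies (~2% of cases)
  if (needsInteraction) return 'Tier 2: Scripted Interaction';    // clicks, forms, multi-state flows
  if (isAnimated)       return 'Tier 1b: GIF Recording';          // transitions, hover effects, loading states
  return 'Tier 1: Static Screenshot';                             // default: the lightest tool wins
}
```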
BPMN process diagram (swimlane)
The same decision flow rendered as a formal BPMN 2.0 process diagram with the QA Agent as a swimlane, generated using the bpmn-creator skill with ISO/IEC 19510-compliant XML:
How each tier works
Tier 1: Static screenshot
The workhorse of the skill. A single Node.js script (screenshot.mjs) launches an isolated headless Chrome instance via Puppeteer, navigates to the target URL, waits for rendering, and captures a PNG.
Key design choice: rather than installing a separate ~300 MB Chromium, the script reuses the Puppeteer bundled with @mermaid-js/mermaid-cli that is already installed on the machine. Zero extra disk footprint.
| Flag | Default | Description |
|---|---|---|
| --url | required | Page URL to screenshot |
| --output | ./screenshot.png | Output PNG path |
| --full-page | false | Capture full scrollable page |
| --preset | — | mobile (375x812) / tablet (768x1024) / laptop (1280x720) / desktop (1400x900) |
| --selector | — | CSS selector for element-level screenshot |
| --wait | 2000 | Milliseconds to wait after load (for JS rendering) |
| --wait-for | — | CSS selector to wait for before capture |
| --multi-viewport | false | Take shots at all 4 preset sizes |
| --check-console | false | Report JS errors and warnings to stdout |
| --timeout | 30000 | Navigation timeout in ms |
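Typical invocations composed from the flags above. The scripts/ path and localhost URL are illustrative; use wherever the skill installs screenshot.mjs:

```bash
# Basic sanity check: a desktop-size PNG of the dev server
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/home.png

# Mobile viewport; wait for the app shell before capturing; report JS errors
node scripts/screenshot.mjs --url http://localhost:3000 \
  --preset mobile --wait-for "#app" --check-console --output qa/home-mobile.png

# Element-level shot of a single component
node scripts/screenshot.mjs --url http://localhost:3000/settings \
  --selector ".billing-card" --output qa/billing-card.png

# One shot at each of the four preset viewports
node scripts/screenshot.mjs --url http://localhost:3000 --multi-viewport --output qa/home.png
```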
Tier 1b: GIF recording
When the change involves animation, transitions, or time-based effects, a single PNG is not enough. Tier 1b captures sequential frames at a configurable FPS and stitches them into an animated GIF using ImageMagick.
| Flag | Default | Description |
|---|---|---|
| --record <ms> | — | Recording duration in milliseconds |
| --fps <n> | 5 | Frames per second (8-10 for smooth playback) |
| --record-action <js> | — | JavaScript to execute mid-recording (hover, click) |
| --action-at <ms> | 500 | When to fire the action (ms after start) |
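For example, capturing a hover transition. This assumes the same screenshot.mjs entry point accepts the recording flags above; the selector and hover-event details are illustrative:

```bash
# Record 3 s at 10 fps, firing a synthetic hover on the nav button 500 ms in;
# the frames are stitched into an animated GIF via ImageMagick
node scripts/screenshot.mjs --url http://localhost:3000 \
  --record 3000 --fps 10 \
  --record-action "document.querySelector('.nav-btn').dispatchEvent(new MouseEvent('mouseover', { bubbles: true }))" \
  --action-at 500 \
  --output qa/nav-hover.gif
```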
Tier 2: Scripted interaction
For changes that require clicking buttons, filling forms, or navigating between UI states, agents write an inline Puppeteer script. This is still lightweight — no test framework, no config files, just a Node.js one-liner.
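A minimal sketch of such a script. The shape is standard Puppeteer; the import is simplified (the skill actually resolves Puppeteer from mermaid-cli's bundle, see the coverage matrix below), and the URL and selectors are illustrative:

```js
import puppeteer from 'puppeteer'; // the skill resolves this from mermaid-cli's bundle

const browser = await puppeteer.launch({ headless: true, args: ['--no-sandbox'] });
const page = await browser.newPage();
await page.goto('http://localhost:3000/settings', { waitUntil: 'networkidle0' });

// State 1: initial render
await page.screenshot({ path: 'qa/settings-1-initial.png' });

// State 2: open the billing tab and wait for its panel to appear
await page.click('[data-tab="billing"]');
await page.waitForSelector('.billing-panel', { visible: true });
await page.screenshot({ path: 'qa/settings-2-billing.png' });

// State 3: fill a form field and capture the validation state
await page.type('#invoice-email', 'qa@example.com');
await page.screenshot({ path: 'qa/settings-3-form.png' });

await browser.close();
```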
Analysis tools pipeline
Screenshots alone answer “what does it look like?” but not “what changed?” or “what elements are present?”. Three analysis tools extend the skill beyond simple capture:
BPMN collaboration diagram (swimlanes by role)
This formal BPMN 2.0 collaboration diagram shows how the three analysis tools operate as independent pools, each receiving a screenshot from the QA Agent orchestrator via message flows. The parallel gateway in the QA Agent pool fans out to all three tools simultaneously:
| Tool | What it does | Output | Speed |
|---|---|---|---|
| Visual Diff | Pixel-level comparison between two screenshots using ImageMagick compare | Diff overlay image, similarity %, verdict (5 levels, IDENTICAL through SIGNIFICANT) | <1s |
| YOLO v11 | Detects UI elements (buttons, inputs, icons, checkboxes) with bounding boxes | Annotated image with 18+ detected elements and confidence scores | 264ms |
| OmniParser v2 | Full screen understanding — extracts a structured list of every UI element | 44+ icons/elements with positions and avg 68% confidence | 169ms |
Visual diff in detail
The visual-diff.sh script compares two screenshots pixel-by-pixel and produces a diff overlay image with changed areas highlighted in red:
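The heart of the script is a single ImageMagick compare call. A minimal sketch of the flow, assuming a 2% fuzz threshold (the real script makes this configurable, and its interface may differ):

```bash
#!/usr/bin/env bash
# Sketch of the visual-diff flow. ImageMagick's `compare` paints
# differing pixels red by default and prints the count to stderr.
before="$1"; after="$2"; overlay="${3:-diff.png}"

pixels=$(compare -metric AE -fuzz 2% "$before" "$after" "$overlay" 2>&1)
read -r w h <<< "$(identify -format "%w %h" "$before")"
pct=$(awk -v p="$pixels" -v w="$w" -v h="$h" 'BEGIN { printf "%.3f", 100 * p / (w * h) }')

# Classify using the thresholds from the table below
verdict=$(awk -v p="$pixels" -v pct="$pct" 'BEGIN {
  if      (p == 0)   print "IDENTICAL"
  else if (p < 100)  print "NEGLIGIBLE"
  else if (pct < 1)  print "MINOR"
  else if (pct <= 5) print "NOTABLE"
  else               print "SIGNIFICANT"
}')
echo "$verdict: $pixels differing pixels ($pct% of frame)"
```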
The verdict classification:
| Verdict | Threshold | Meaning |
|---|---|---|
| IDENTICAL | 0 pixels | No visual change detected |
| NEGLIGIBLE | <100 pixels | Sub-pixel rendering differences only |
| MINOR | <1% | Small localised change |
| NOTABLE | 1–5% | Review recommended |
| SIGNIFICANT | >5% | Visual regression likely |
Complete QA workflow
This sequence diagram shows the full end-to-end workflow when an agent uses the Lite Browser QA skill to verify a UI change:
Architecture mindmap
The full skill architecture at a glance, showing how all components relate:
Real-world test results
The skill was tested against the BACON Mesh Hub v0.2 (a D3.js force-directed network visualisation). Two UI states were captured and analysed with all tools:
Source screenshots
Analysis results
GIF recordings
Interactive comparison dashboard
The full interactive dashboard with tabbed views, slider comparison, and zooming was generated during the QA demo session. It includes all screenshots, analysis overlays, and side-by-side comparisons:
Integration with 7-step UX validation
The skill maps directly to the BACON-AI mandatory 7-Step UX Validation Checklist:
Tool coverage matrix
The table below maps every tool defined in the bacon-ai-lite-browser-qa skill (SKILL.md v2.0.0) against what this documentation page covers, and highlights any gaps.
| Tool / Capability | Use Case | Frequency | Docs Coverage | Gap Notes |
|---|---|---|---|---|
| Tier 1: Static Screenshot | Verify UI renders correctly after CSS/HTML/component changes. Desktop, mobile, tablet, element-specific, multi-viewport, and console error checking. | Every change | Full | — |
| Tier 1b: GIF Recording | Verify CSS animations, particle effects, loading states, hover transitions, and time-based visual behaviour. Supports mid-recording JavaScript actions. | Situational | Full | — |
| Tier 2: Scripted Interaction | Click buttons, open menus, fill forms, navigate between UI states via inline Puppeteer scripts. Multi-state screenshot sequences. | Situational | Full | — |
| Tier 3: Full Browser Automation | Live browser interaction via claude-in-chrome, Playwright MCP, or antigravity-browser. For auth flows, OAuth popups, cookie-dependent sessions, live demos. | Rare (~2%) | Covered | Mentioned in tier cards and decision flow; no dedicated section with examples (intentional — tool has own docs) |
| Tool A: Visual Diff | Pixel-level before/after regression detection using ImageMagick compare. 5 verdict levels from IDENTICAL to SIGNIFICANT. Configurable fuzz threshold. | Every CSS change | Full | — |
| Tool B: YOLO v11 Detection | Fast bounding-box detection of UI elements (buttons, inputs, icons, checkboxes). Structural sanity check after refactors. Compare element counts before/after. | After refactors | Full | — |
| Tool C: OmniParser v2.0 | Full screen parsing — extracts structured list of all interactive elements with positions and confidence scores. For unfamiliar pages, accessibility audits, handoff documentation. | Rare / specialized | Full | — |
| Claude Vision (Read PNG) | Natural-language UI analysis — “does this look right?”, “is the spacing balanced?”. Qualitative assessment via Claude’s multimodal capability. | Subjective checks | Partial | Mentioned in sequence diagram and quick reference, but no dedicated section explaining usage patterns or example prompts |
| QA Workflows by Change Type | 4 prescriptive step-by-step workflows: CSS change, component refactor, new feature, full page audit. Each specifies which tools to run and in what order. | Reference guide | Full | — |
| Per-Tool Frequency Guidance | Daily / situational / rare usage classification with “When to use” and “When NOT to use” guidelines and tool limitations. | Reference guide | Partial | Frequency shown in this table. Skill has detailed “When to use / When NOT to use / Limitation” prose for each tool — not fully reflected |
| Tool Selection Quick Reference | Question-based lookup: “Did my CSS break?” → Pixel Diff, “Does the animation look right?” → GIF Recording, etc. | Reference guide | Full | — |
| Puppeteer Import Path & Notes | Technical detail: reuses mermaid-cli's bundled Puppeteer to avoid a 300 MB separate Chromium install. Requires the --no-sandbox flag. | Technical ref | Partial | Key design choice box mentions it. Full import path and rationale from skill not reproduced (intentional for human-facing docs) |
Coverage summary
The three “partial” items are intentional: Claude Vision usage is shown in context (sequence diagram + quick ref) rather than duplicating its own docs; per-tool frequency is shown in the table’s Frequency column; and the Puppeteer import path is a machine-specific technical detail omitted from human-facing documentation.
ML models evaluated
During development, six HuggingFace models were evaluated for UI analysis. Two were selected for integration:
| Model | Task | Size | Status |
|---|---|---|---|
| macpaw-research/yolov11l | UI element detection (bounding boxes) | 49 MB | Installed |
| microsoft/OmniParser-v2.0 | Full screen parsing + element extraction | 1.1 GB | Installed |
| microsoft/Florence-2-base | General visual QA | 930 MB | Evaluated |
| Qwen/Qwen2.5-VL-3B-Instruct | Vision-language model | ~7 GB | Evaluated |
| BAAI/BGE-VL-Screenshot | Screenshot similarity scoring | ~7 GB | Evaluated |
| microsoft/layoutlmv3-base | Document layout understanding | ~500 MB | Evaluated |
Which tool do I use?
Start with the question you need to answer, not the tool. This table maps the most common QA questions to the right tool:
| Question | Tool | Speed |
|---|---|---|
| Did my CSS change break anything? | Pixel Diff — screenshot before, change, screenshot after, diff | <1s |
| Does the page render correctly? | Screenshot (Tier 1) — quickest sanity check | ~3s |
| Does this animation look right? | GIF Recording (Tier 1b) — capture the motion | ~5s |
| Are all buttons/inputs still there? | YOLO — structural element count after refactors | ~300ms |
| What is every interactive element? | OmniParser — full inventory of unfamiliar pages | ~200ms |
| Does this “look good”? (subjective) | Claude Vision — Read the PNG, ask a qualitative question | N/A |
| Show the user a live demo | Full Browser (Tier 3) — only when live interaction is needed | Varies |
QA workflows by change type
Four prescriptive, step-by-step recipes. Pick the one that matches what you just changed:
1. CSS or styling change
The most common QA workflow. A simple before/after comparison catches both intentional changes and accidental side-effects.
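A sketch of the recipe (paths and URL illustrative):

```bash
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/before.png
# ... apply the CSS change ...
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/after.png
./scripts/visual-diff.sh qa/before.png qa/after.png qa/diff.png
# Expect MINOR, confined to the styled element; SIGNIFICANT elsewhere means a side-effect
```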
2. Component refactor
Refactors risk removing or repositioning elements without realising. Compare element counts before and after:
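A sketch of the workflow shape. The yolo-detect.py wrapper is a hypothetical name standing in for the skill's YOLO v11 integration:

```bash
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/before.png
python yolo-detect.py qa/before.png > qa/counts-before.txt   # hypothetical wrapper; e.g. "18 elements: 6 button, 4 input, ..."
# ... refactor the component ...
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/after.png
python yolo-detect.py qa/after.png > qa/counts-after.txt
diff qa/counts-before.txt qa/counts-after.txt   # counts should match unless elements were meant to change
```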
3. New feature (adding a UI element)
When adding something new, the pixel diff should show change only in the area of the new element — red anywhere else means a side-effect:
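A sketch (selector illustrative):

```bash
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/before.png
# ... add the new UI element ...
node scripts/screenshot.mjs --url http://localhost:3000 --output qa/after.png
./scripts/visual-diff.sh qa/before.png qa/after.png qa/diff.png
# Red in qa/diff.png should be confined to the new element's area.
node scripts/screenshot.mjs --url http://localhost:3000 \
  --selector ".new-widget" --output qa/new-widget.png   # close-up of the addition
```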
4. Full page audit or handoff
When documenting a page for another agent, or doing a comprehensive QA pass before a release:
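A sketch of the audit pass. The omniparser-parse.py wrapper is a hypothetical name standing in for the skill's OmniParser v2 integration:

```bash
# All four preset viewports plus a console error report in one pass
node scripts/screenshot.mjs --url http://localhost:3000 \
  --multi-viewport --check-console --output qa/audit.png
# Full scrollable page for the long view
node scripts/screenshot.mjs --url http://localhost:3000 --full-page --output qa/audit-full.png
# Structured element inventory for the handoff document
python omniparser-parse.py qa/audit-full.png > qa/element-inventory.json   # hypothetical wrapper
```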
Quick reference commands
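The most common invocations, condensed to one line each (paths illustrative, as throughout this page):

```bash
node scripts/screenshot.mjs --url "$URL" --output shot.png                         # Tier 1: render check
node scripts/screenshot.mjs --url "$URL" --check-console --output shot.png         # Tier 1 + JS error report
node scripts/screenshot.mjs --url "$URL" --record 3000 --fps 10 --output ui.gif    # Tier 1b: animation
./scripts/visual-diff.sh before.png after.png diff.png                             # Tool A: pixel diff
```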
Never claim “UI tested” without at least a Tier 1 screenshot. The Lite Browser QA skill exists precisely to prevent the anti-pattern of declaring changes work without visual evidence. Take a screenshot, read it into the conversation, and verify before reporting success.