Test Summary
Two UI states captured: State 1 (default view, no room selected) vs State 2 (Brainstorm room open + attachment menu visible).
Three analysis tools ran on the results.
Pixels changed (diff): 21.3%
OmniParser inference: 169ms
State 1: Default (no room) – 1400x900
State 2: Brainstorm + Attach Menu – 1400x900
Tool Comparison
| Property | Visual Diff | YOLO UI Detection | OmniParser v2.0 |
| --- | --- | --- | --- |
| Purpose | Pixel-level regression | UI element bounding boxes | Full screen understanding |
| Speed | Instant (ImageMagick) | 264ms (CPU) | 169ms (CPU) |
| Model Size | None (no ML) | 49MB | 1.1GB |
| Elements Found | 267,882 changed pixels | 18 (15 buttons, 3 images) | 44 icons/elements |
| Output | Red overlay diff image | Annotated screenshot | Annotated screenshot |
| Best For | Before/after comparison | "Are buttons present?" | Full UI inventory |
| License | N/A | AGPL-3.0 | MIT |
Visual Diff – Pixel Regression Analysis
Red areas show every pixel that changed between State 1 and State 2.
267,882 out of 1,260,000 pixels changed (21.3%). Verdict: SIGNIFICANT.
Diff output (red = changed pixels) – generated by visual-diff.sh
What Changed
- Chat panel (right) – went from an empty placeholder to a message history
- Sidebar – Brainstorm room highlighted
- Attach menu – popup appeared bottom-left with 3 options
- Input bar – text changed from "Select a room" to "Message #Brainstorm"
- Canvas – nodes drifted (physics simulation, expected)
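The pixel-regression check can be sketched in a few lines. This is a minimal stand-in that assumes frames decoded into rows of RGB tuples (the report's actual pipeline ran through visual-diff.sh with ImageMagick); the 5% significance cutoff here is an illustrative assumption, not the tool's configured value.

```python
# Minimal sketch of a pixel-regression check. Images are represented as
# rows of RGB tuples; real usage would decode PNGs first. The threshold
# is an assumption for illustration.

def diff_stats(before, after, threshold=0.05):
    """Count pixels that differ between two equally sized frames."""
    changed = 0
    total = 0
    for row_a, row_b in zip(before, after):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                changed += 1
    ratio = changed / total if total else 0.0
    verdict = "SIGNIFICANT" if ratio > threshold else "MINOR"
    return changed, total, ratio, verdict

# Toy 2x3 frames: two of six pixels differ (33.3% changed).
before = [[(0, 0, 0)] * 3, [(255, 255, 255)] * 3]
after = [[(0, 0, 0), (255, 0, 0), (0, 0, 0)],
         [(255, 255, 255), (255, 255, 255), (0, 255, 0)]]
print(diff_stats(before, after))  # (2, 6, 0.333..., 'SIGNIFICANT')
```

At the report's scale the same arithmetic gives 267,882 / 1,260,000 = 21.3%, well over any reasonable regression threshold.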
YOLO UI Element Detection
macpaw-research/yolov11l-ui-elements-detection (49MB) – trained on the Screen2AX-Element dataset.
Detects buttons, inputs, images, and other interactive elements with bounding boxes.
Confidence threshold: 0.20
YOLO annotated – bounding boxes on detected elements (ultralytics YOLOv11)
Detected Elements
- AXButton 82% – JOIN MESH
- AXButton 73% – Save Layout
- AXButton 55% – Share File Path
- AXButton 51% – Share URL
- AXButton 36% – Send
- AXButton 33% – Upload File
- AXButton 33% – OFFLINE toggle
- AXButton 31% – Gear icon
- AXButton 29% – BACON-AI Docs
- AXButton 25% – Rooms
- + 5 more buttons
- AXImage 39% – Win11 PC node
- AXImage 32% – Phone Sonnet
- AXImage 28% – Canvas element
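In practice the annotated screenshot comes from the ultralytics package (roughly `YOLO(weights).predict(image, conf=0.20)`); the sketch below only reproduces the post-detection tally at the 0.20 threshold. The (class, confidence) pairs are transcribed from the list above, not live model output.

```python
# Hedged sketch: tally YOLO-style detections per class, keeping only
# those at or above the confidence threshold used in the report (0.20).
from collections import Counter

def summarize(detections, conf_threshold=0.20):
    """Count detections per class above the confidence threshold."""
    kept = Counter()
    for cls, conf in detections:
        if conf >= conf_threshold:
            kept[cls] += 1
    return dict(kept)

# Subset transcribed from the report's detection list.
detections = [
    ("AXButton", 0.82), ("AXButton", 0.73), ("AXButton", 0.55),
    ("AXButton", 0.51), ("AXButton", 0.36), ("AXButton", 0.33),
    ("AXButton", 0.33), ("AXButton", 0.31), ("AXButton", 0.29),
    ("AXButton", 0.25),
    ("AXImage", 0.39), ("AXImage", 0.32), ("AXImage", 0.28),
]
print(summarize(detections))  # {'AXButton': 10, 'AXImage': 3}
```

Raising the threshold to 0.50 would keep only the four highest-confidence buttons, which is why a low cutoff like 0.20 is used for presence checks on faint UI elements.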
OmniParser v2.0 – Full Screen Understanding
microsoft/OmniParser-v2.0 (1.1GB, MIT) – Microsoft's purpose-built screen parser for AI agents.
Detects all interactive icons, buttons, text fields, and navigational elements.
OmniParser annotated – all interactive elements mapped (microsoft/OmniParser-v2.0)
Coverage Comparison
OmniParser found 2.4x more elements than YOLO (44 vs 18).
It detected individual mesh nodes, sidebar items, and chat message components that YOLO missed.
OmniParser is designed for screen agent navigation, making it the most thorough for UI auditing.
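The "elements YOLO missed" claim can be checked mechanically by matching the two detectors' boxes with intersection-over-union (IoU) and reporting OmniParser boxes that have no YOLO counterpart. The sketch below uses illustrative boxes, not the report's actual coordinates.

```python
# Hedged sketch: find detections present in one set but absent from
# another, using IoU matching. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def unmatched(omni_boxes, yolo_boxes, thresh=0.5):
    """OmniParser boxes with no YOLO box overlapping above thresh."""
    return [o for o in omni_boxes
            if all(iou(o, y) < thresh for y in yolo_boxes)]

yolo = [(0, 0, 100, 40)]                        # e.g. one detected button
omni = [(0, 0, 100, 40), (200, 200, 240, 240)]  # same button + a mesh node
print(unmatched(omni, yolo))  # [(200, 200, 240, 240)]
```

Run over the full outputs (44 OmniParser boxes vs 18 YOLO boxes), this yields the extra mesh nodes, sidebar items, and chat components described above.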
OmniParser (44 elements):
- 44 icons (avg 68% conf)
- Sidebar items
- Mesh nodes
- Chat elements
- Input components
Before / After Slider
Drag the slider to compare State 1 (before) and State 2 (after). Left = before, right = after.
YOLO / OmniParser Slider
Compare detection granularity: YOLO (left, 18 elements) vs OmniParser (right, 44 elements).
GIF Demos – Animated Interaction Recordings
Animated GIF recordings captured during the QA tools demo session. These show real-time browser interactions
including mesh hub navigation and room click handling.
Mesh Hub Animation
3-second animation of the BACON-AI mesh hub network topology.
gif-mesh-animation-3s.gif (1.5 MB · 3s loop)
Room Click Interaction
Click interaction recording showing room entry and navigation behavior.
gif-room-click-interaction.gif (1.8 MB · interaction)