Test Summary
Two UI states captured: State 1 (default view, no room selected) vs State 2 (Brainstorm room open + attachment menu visible).
Three analysis tools ran on the results.
Pixels changed (diff): 21.3%
OmniParser inference: 169ms
State 1: Default (no room) – 1400x900
State 2: Brainstorm + Attach Menu – 1400x900
Tool Comparison
| Property | Visual Diff | YOLO UI Detection | OmniParser v2.0 |
| --- | --- | --- | --- |
| Purpose | Pixel-level regression | UI element bounding boxes | Full screen understanding |
| Speed | Instant (ImageMagick) | 264ms (CPU) | 169ms (CPU) |
| Model Size | None (no ML) | 49MB | 1.1GB |
| Elements Found | 267,882 changed pixels | 18 (15 buttons, 3 images) | 44 icons/elements |
| Output | Red overlay diff image | Annotated screenshot | Annotated screenshot |
| Best For | Before/after comparison | "Are buttons present?" | Full UI inventory |
| License | N/A | AGPL-3.0 | MIT |
Visual Diff – Pixel Regression Analysis
Red areas show every pixel that changed between State 1 and State 2.
267,882 out of 1,260,000 pixels changed (21.3%). Verdict: SIGNIFICANT.
Diff output (red = changed pixels) – generated by visual-diff.sh
What Changed
- Chat panel (right) – went from an empty placeholder to a message history
- Sidebar – Brainstorm room highlighted
- Attach menu – popup appeared bottom-left with 3 options
- Input bar – text changed from "Select a room" to "Message #Brainstorm"
- Canvas – nodes drifted (physics simulation, expected)
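The pixel-regression check can be sketched in a few lines. This is a minimal stand-in that assumes frames decoded into rows of RGB tuples (the report's actual pipeline ran through visual-diff.sh with ImageMagick); the 5% significance cutoff here is an illustrative assumption, not the tool's configured value.

```python
# Minimal sketch of a pixel-regression check. Images are represented as
# rows of RGB tuples; real usage would decode PNGs first. The threshold
# is an assumption for illustration.

def diff_stats(before, after, threshold=0.05):
    """Count pixels that differ between two equally sized frames."""
    changed = 0
    total = 0
    for row_a, row_b in zip(before, after):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                changed += 1
    ratio = changed / total if total else 0.0
    verdict = "SIGNIFICANT" if ratio > threshold else "MINOR"
    return changed, total, ratio, verdict

# Toy 2x3 frames: two of six pixels differ (33.3% changed).
before = [[(0, 0, 0)] * 3, [(255, 255, 255)] * 3]
after = [[(0, 0, 0), (255, 0, 0), (0, 0, 0)],
         [(255, 255, 255), (255, 255, 255), (0, 255, 0)]]
print(diff_stats(before, after))  # (2, 6, 0.333..., 'SIGNIFICANT')
```

At the report's scale the same arithmetic gives 267,882 / 1,260,000 = 21.3%, well over any reasonable regression threshold.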
YOLO UI Element Detection
macpaw-research/yolov11l-ui-elements-detection (49MB) – trained on the Screen2AX-Element dataset.
Detects buttons, inputs, images, and other interactive elements with bounding boxes.
Confidence threshold: 0.20
YOLO annotated – bounding boxes on detected elements (ultralytics YOLOv11)
Detected Elements
- AXButton 82% – JOIN MESH
- AXButton 73% – Save Layout
- AXButton 55% – Share File Path
- AXButton 51% – Share URL
- AXButton 36% – Send
- AXButton 33% – Upload File
- AXButton 33% – OFFLINE toggle
- AXButton 31% – Gear icon
- AXButton 29% – BACON-AI Docs
- AXButton 25% – Rooms
- + 5 more buttons
- AXImage 39% – Win11 PC node
- AXImage 32% – Phone Sonnet
- AXImage 28% – Canvas element
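In practice the annotated screenshot comes from the ultralytics package (roughly `YOLO(weights).predict(image, conf=0.20)`); the sketch below only reproduces the post-detection tally at the 0.20 threshold. The (class, confidence) pairs are transcribed from the list above, not live model output.

```python
# Hedged sketch: tally YOLO-style detections per class, keeping only
# those at or above the confidence threshold used in the report (0.20).
from collections import Counter

def summarize(detections, conf_threshold=0.20):
    """Count detections per class above the confidence threshold."""
    kept = Counter()
    for cls, conf in detections:
        if conf >= conf_threshold:
            kept[cls] += 1
    return dict(kept)

# Subset transcribed from the report's detection list.
detections = [
    ("AXButton", 0.82), ("AXButton", 0.73), ("AXButton", 0.55),
    ("AXButton", 0.51), ("AXButton", 0.36), ("AXButton", 0.33),
    ("AXButton", 0.33), ("AXButton", 0.31), ("AXButton", 0.29),
    ("AXButton", 0.25),
    ("AXImage", 0.39), ("AXImage", 0.32), ("AXImage", 0.28),
]
print(summarize(detections))  # {'AXButton': 10, 'AXImage': 3}
```

Raising the threshold to 0.50 would keep only the four highest-confidence buttons, which is why a low cutoff like 0.20 is used for presence checks on faint UI elements.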
OmniParser v2.0 – Full Screen Understanding
microsoft/OmniParser-v2.0 (1.1GB, MIT) – Microsoft's purpose-built screen parser for AI agents.
Detects all interactive icons, buttons, text fields, and navigational elements.
OmniParser annotated – all interactive elements mapped (microsoft/OmniParser-v2.0)
Coverage Comparison
OmniParser found 2.4x more elements than YOLO (44 vs 18).
It detected individual mesh nodes, sidebar items, and chat message components that YOLO missed.
OmniParser is designed for screen agent navigation, making it the most thorough for UI auditing.
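The "elements YOLO missed" claim can be checked mechanically by matching the two detectors' boxes with intersection-over-union (IoU) and reporting OmniParser boxes that have no YOLO counterpart. The sketch below uses illustrative boxes, not the report's actual coordinates.

```python
# Hedged sketch: find detections present in one set but absent from
# another, using IoU matching. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def unmatched(omni_boxes, yolo_boxes, thresh=0.5):
    """OmniParser boxes with no YOLO box overlapping above thresh."""
    return [o for o in omni_boxes
            if all(iou(o, y) < thresh for y in yolo_boxes)]

yolo = [(0, 0, 100, 40)]                        # e.g. one detected button
omni = [(0, 0, 100, 40), (200, 200, 240, 240)]  # same button + a mesh node
print(unmatched(omni, yolo))  # [(200, 200, 240, 240)]
```

Run over the full outputs (44 OmniParser boxes vs 18 YOLO boxes), this yields the extra mesh nodes, sidebar items, and chat components described above.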
OmniParser (44 elements):
- 44 icons (avg 68% conf)
- Sidebar items
- Mesh nodes
- Chat elements
- Input components
Before / After Slider
Drag the slider to compare State 1 (before) and State 2 (after). Left = before, right = after.
YOLO / OmniParser Slider
Compare detection granularity: YOLO (left, 18 elements) vs OmniParser (right, 44 elements).
GIF Demos – Animated Interaction Recordings
Animated GIF recordings captured during the QA tools demo session. These show real-time browser interactions
including mesh hub navigation and room click handling.
Mesh Hub Animation
3-second animation of the BACON-AI mesh hub network topology.
gif-mesh-animation-3s.gif (1.5 MB · 3s loop)
Room Click Interaction
Click interaction recording showing room entry and navigation behavior.
gif-room-click-interaction.gif (1.8 MB · interaction)