Can models click UI elements on a desktop computer?
In this benchmark, LLMs are given a screenshot of a computer screen and must predict the pixel coordinates of a specified UI element. The target element is described by a natural-language instruction, for example, “click the Get in Touch button”.
Performance is evaluated based on two criteria: first, the frequency with which the predicted coordinates fall within the bounding box of the target UI element; and second, the distance from the predicted coordinates to the nearest edge of the bounding box when predictions fall outside it.
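The two criteria can be illustrated with a minimal Python sketch. This is not the benchmark's actual scoring code; the names `Box`, `is_hit`, `distance_to_box`, and `score` are hypothetical. It assumes pixel coordinates with the origin at the top-left, treats a prediction on the box boundary as a hit, measures the miss distance as the Euclidean distance to the closest point on the box, and averages that distance over misses only. The benchmark may define or aggregate these differently.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical layout, origin top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def is_hit(x: float, y: float, box: Box) -> bool:
    """True if the predicted point lies inside the box (boundary counts as inside)."""
    return box.left <= x <= box.right and box.top <= y <= box.bottom


def distance_to_box(x: float, y: float, box: Box) -> float:
    """Euclidean distance from the point to the closest point on the box; 0.0 if inside."""
    dx = max(box.left - x, 0.0, x - box.right)
    dy = max(box.top - y, 0.0, y - box.bottom)
    return hypot(dx, dy)


def score(predictions, boxes):
    """Return (hit rate, mean miss distance over predictions outside their box)."""
    hits = 0
    miss_distances = []
    for (x, y), box in zip(predictions, boxes):
        if is_hit(x, y, box):
            hits += 1
        else:
            miss_distances.append(distance_to_box(x, y, box))
    hit_rate = hits / len(predictions) if predictions else 0.0
    mean_miss = sum(miss_distances) / len(miss_distances) if miss_distances else 0.0
    return hit_rate, mean_miss


if __name__ == "__main__":
    button = Box(left=100, top=200, right=300, bottom=260)
    preds = [(150, 220), (310, 220), (303, 264)]
    print(score(preds, [button] * 3))
```

In this example, the first prediction lands inside the button, the second misses by 10 pixels to the right, and the third misses diagonally by 5 pixels from the bottom-right corner.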
Custom evaluations for your agent
We believe in the future of general-purpose agents, and computer use is a significant part of that vision. We are actively developing more complex, multi-step benchmarks. If you need to evaluate your agent, contact us at founders@andonlabs.com.