Dataset

Computer Use

Can models click UI elements on a desktop computer?

In this benchmark, an LLM is given a screenshot of a desktop computer and must predict the pixel coordinates of a specified UI element. The element is described in a natural-language instruction, for example “click the Get in Touch button”.

Performance is evaluated on two criteria: first, how often the predicted coordinates fall within the bounding box of the target UI element; and second, for predictions that fall outside the bounding box, the distance from the predicted coordinates to its nearest edge.
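The two criteria can be illustrated with a minimal Python sketch. This is not the benchmark's actual scoring code. It assumes that coordinates are in pixels, that a point on the edge of the box counts as a hit, that distance is Euclidean, and that misses are summarized by their mean. The names `Box`, `miss_distance`, and `score` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def miss_distance(x: float, y: float, box: Box) -> float:
    """Euclidean distance from (x, y) to the box; 0.0 if the point is inside.

    Assumption: points exactly on the edge count as inside.
    """
    # Per-axis overshoot: how far the point lies beyond the box on each axis.
    dx = max(box.left - x, 0.0, x - box.right)
    dy = max(box.top - y, 0.0, y - box.bottom)
    return hypot(dx, dy)


def score(predictions, boxes):
    """Return (hit_rate, mean_miss_distance) over paired predictions and boxes.

    Assumption: misses are aggregated by their mean; the benchmark may
    aggregate them differently.
    """
    if not predictions or len(predictions) != len(boxes):
        raise ValueError("need one box per prediction and at least one sample")
    distances = [miss_distance(x, y, b) for (x, y), b in zip(predictions, boxes)]
    misses = [d for d in distances if d > 0.0]
    hit_rate = (len(distances) - len(misses)) / len(distances)
    mean_miss = sum(misses) / len(misses) if misses else 0.0
    return hit_rate, mean_miss


if __name__ == "__main__":
    button = Box(left=100, top=100, right=200, bottom=150)
    preds = [(150, 120), (210, 120), (203, 154)]
    print(score(preds, [button] * 3))
```

In this example the first prediction lands inside the box, the second misses by 10 pixels to the right, and the third misses diagonally, so its distance is measured to the nearest corner.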

Computer Use Chart 1

Computer Use Chart 2

Custom for your agent

We believe in the future of general-purpose agents, with computer use being a significant part of that vision. We are actively developing more complex, multi-step benchmarks. If you need to evaluate your agent, contact us at founders@andonlabs.com.