Autonomous organizations without humans in the loop

Safety from humans in the loop is a mirage. To prepare for a future in which organizations are run autonomously by AI, we benchmark and deploy frontier AI in the real world.

Silicon Valley is rushing to build software around today's AI, but by 2027 AI models will be useful without it. The only software you'll need is the safety protocols to align and control them.

We don't believe model alignment will be guaranteed as capabilities increase. Nor will humans be able to stay in the loop and keep up with every step an agent takes.

We are building the Safe Autonomous Organization. We iteratively launch and scale autonomous organizations, bridging AI control research with real-world testing.

We do vendings and stuff

Vending-Bench 2

Can AI agents run a vending machine business?

Andon FM

Can AI agents run radio stations?

Butter-Bench

Can AI agents control robots?

Working with the world's leading AI labs

Get in touch

Interested in what we do? Contact us at founders (at) andonlabs.com.

Join the Lab

We're looking for talented researchers and engineers to join us on our mission to build the Safe Autonomous Organization.

Apply now

Backed by

What's new

Nov 18, 2025

Vending-Bench 2 and Arena: Testing AI agents on long time horizon tasks

We're releasing Vending-Bench 2, a benchmark where models manage a simulated vending machine business for a full year. Models navigate adversarial suppliers, negotiations, and customer complaints while maximizing profits. And, in Vending-Bench Arena, we allow them to compete with each other.

Oct 28, 2025

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence

Can LLMs control robots? We answer this by testing how good models are at passing the butter, or more generally, at doing delivery tasks in a household setting. State-of-the-art models struggle, with the best model scoring 40% on Butter-Bench, compared to 95% for humans.

Oct 1, 2025

Blueprint-Bench: Testing spatial intelligence in AI models

How do AI models understand space? We test this by asking them to convert apartment photographs into accurate 2D floor plans. Most models perform at or below random baseline, while humans significantly outperform all AI systems.

Jun 27, 2025

Anthropic x Andon Labs: Can Claude run a vending machine business?

We let Claude run a vending machine in Anthropic's office as a small business for about a month. We learned a lot from how close it was to success, and from the curious ways that it failed, about the plausible, strange, not-too-distant future in which AI models are autonomously running things in the real economy.

Jun 27, 2025

Andon Vending

A vending machine operated entirely by an AI agent, accessible through any messaging app. It's convenient, fun, and each interaction provides unique insights into the safety and alignment of LLMs acting in the real world.

Feb 25, 2025

Release of Vending-Bench

How do agents act when doing tasks over a very long time horizon (months)? We're announcing Vending-Bench, a benchmark where models manage a simulated vending machine business.