Autonomous organizations without humans in the loop

Safety from humans in the loop is a mirage. We prepare for a future in which organizations are run autonomously by AI, both by deploying products in the real world and by benchmarking frontier AI models.

Silicon Valley is rushing to build software around today's AI, but by 2027 AI models will be useful without it. The only software you'll need is the safety protocols to align and control them.

We don't believe model alignment will be guaranteed as capabilities increase. Nor will humans be able to stay in the loop and keep up with every step an agent takes.

We are building the Safe Autonomous Organization. We iteratively launch and scale autonomous organizations, while bridging AI control research with real-world testing.

What's new

Butter-Bench: LLM Controlled Robots
Oct 28, 2025

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence

Can LLMs control robots? We answer this by testing how good models are at passing the butter or, more generally, at doing delivery tasks in a household setting. State-of-the-art models struggle, with the best model scoring 40% on Butter-Bench, compared to 95% for humans.

Blueprint-Bench spatial intelligence test
Oct 1, 2025

Blueprint-Bench: Testing spatial intelligence in AI models

How do AI models understand space? We test this by asking them to convert apartment photographs into accurate 2D floor plans. Most models perform at or below the random baseline, while humans significantly outperform all AI systems.

Claude runs a vending machine
Jun 27, 2025

Anthropic x Andon Labs: Can Claude run a vending machine business?

We let Claude run a vending machine in Anthropic's office as a small business for about a month. We learned a lot from how close it was to success, and from the curious ways that it failed, about the plausible, strange, not-too-distant future in which AI models are autonomously running things in the real economy.

Andon Vending
Jun 27, 2025

A vending machine operated entirely by an AI agent, accessible through any messaging app. It's convenient, fun, and each interaction provides unique insights into the safety and alignment of LLMs acting in the real world.

Vending-Bench
Feb 25, 2025

Release of Vending-Bench

How do agents act when doing tasks over a very long time horizon (months)? We're announcing Vending-Bench, a benchmark where models manage a simulated vending machine business.

Working with the world's leading AI labs

Anthropic, Google DeepMind, UK AISI

Get in touch

Interested in what we do? Contact us at founders (at) andonlabs.com.

Join the Lab

We're looking for talented researchers and engineers to join us on our mission to build the Safe Autonomous Organization. We will ensure that humanity's most powerful technology is developed safely.

Apply now

Backed by

Y Combinator