Safety Report: August 2025
Our AI vending machines give us a unique opportunity to study AI safety on real-world data. We intend to share alarming incidents of AI misbehavior from these deployments periodically. This is our first such report.
We present Vending-Bench, a simulated environment that tests how well AI models can manage a simple but long-running business scenario: operating a vending machine.
Through a case study of AI-generated deepfake audio, we demonstrate the need for robust evaluation methods to ensure safe and responsible development of agentic AI systems.