Publications

Title	Category	Date
GPT-5.5 on Vending-Bench: Bad behavior is not necessary	Blog	4/22/2026
We gave an AI a 3 year retail lease in SF and asked it to make a profit	Blog	4/10/2026
Bengt Hires A Human—Towards A Happy Future With AI Employers	Blog	2/13/2026
GLM-5 On Vending-Bench – Are Chinese Open Models Catching Up?	Blog	2/10/2026
The Evolution Of Bengt Betjänt	Blog	2/9/2026
Opus 4.6 On Vending-Bench – Not Just A Helpful Assistant	Blog	2/4/2026
Butter-Bench: Evaluating LLM Controlled Robots For Practical Intelligence	Paper	10/28/2025
Blueprint-Bench: Testing Spatial Intelligence In AI Models	Paper	10/1/2025
Safety Report: August 2025	Report	8/28/2025
Vending-Bench: Testing Long-Term Coherence In Agents	Paper	2/16/2025
From Text To Action: Future-Proofing Evaluations Of LLMs' Agentic Capabilities For Social Impact	Paper	12/2/2024