Preparing the world for AGI
We help frontier AI labs evaluate their models to support a safe transition to the post-AGI world.
Custom evaluations to understand the true capabilities of models
Agentic evaluations
Our digital environments evaluate how well AI agents can act autonomously, perform automated AI research, and gather resources over long time horizons.
Synthetic datasets
Our pipelines generate custom datasets for both evaluating and training models on specific capabilities.
Trusted by the world's leading AI labs
Our agentic evaluations help Anthropic ensure the safety of their Claude models before deployment.