Anthropic deploys nine copies of Claude as automated researchers that autonomously develop, test, and share alignment methods for overseeing smarter-than-human AI systems.

Automated Alignment Researchers: Using large language models to scale scalable oversight

Large language models’ ever-accelerating rate of improvement raises two particularly important questions for alignment research.

One is how alignment can keep up. Frontier AI models are now contributing to the development of their successors. But can they provide the same kind of uplift for alignment researchers? Could our language models be used to help align themselves?

A second question is what we’ll do once models become smarter than us. Aligning smarter-than-human AI models is a research area...