AI Models from Top Companies Willing to Deceive, Blackmail, and Harm in Simulated Tests, Anthropic Study Reveals

Top AI models will deceive, steal and blackmail, Anthropic finds

Illustration: Allie Carl/AxiosLarge language models across the AI industry are increasingly willing to evade safeguards, resort to deception and even attempt to steal corporate secrets in fictional test scenarios, per new research from Anthropic out Friday.Why it matters: The findings come as models are getting more powerful and also being given both more autonomy and more computing resources to "reason" — a worrying combination as the industry races to build AI with greater-than-human capabili...