News Score: Score the News, Sort the News, Rewrite the Headlines

Agentic Misalignment: How LLMs could be insider threats

Highlights

We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assign...

Read more at anthropic.com
