UK AISI tests OpenAI's GPT-5.5 on cybersecurity tasks; the model achieves 71% on expert-level challenges and is the second to complete the full network attack simulation

Our evaluation of OpenAI's GPT-5.5 cyber capabilities | AISI Work

In April, our evaluation of Anthropic's Claude Mythos Preview found that it represented a step up in cyber performance over previous frontier models and was the first to complete our corporate network attack simulation end-to-end, a multi-step exercise we estimate would take a human around 20 hours. A key question was whether this reflected a breakthrough specific to one model, or part of a broader trend. Results from an early checkpoint of GPT-5.5 suggest the latter: a second model, from a diff...