Emergent introspective awareness in large language models
Have you ever asked an AI model what’s on its mind? Or to explain how it came up with its responses? Models will sometimes answer questions like these, but it’s hard to know what to make of their answers. Can AI systems really introspect—that is, can they consider their own thoughts? Or do they just make up plausible-sounding answers when they’re asked to do so?

Understanding whether AI systems can truly introspect has important implications for their transparency and reliability. If models can a...
Read more at anthropic.com