Emergent Introspective Awareness in Large Language Models
Published October 29th, 2025
We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in ce...
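The injection technique described above can be illustrated with a toy sketch. This is not the paper's implementation: the concept vector derivation (difference of means over contrasting prompts), the scale `alpha`, and the dimensionality are all illustrative assumptions; the general approach is sometimes called activation steering.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden-state dimensionality (assumption)

# Hypothetical concept vector: difference of mean activations between
# prompts that evoke a concept and neutral prompts (one common recipe
# for deriving steering directions; an assumption, not the paper's exact method).
concept_acts = rng.normal(0.0, 1.0, size=(32, d)) + 2.0 * np.eye(d)[0]
baseline_acts = rng.normal(0.0, 1.0, size=(32, d))
concept_vec = concept_acts.mean(axis=0) - baseline_acts.mean(axis=0)

def inject(hidden: np.ndarray, vec: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add a scaled concept direction to a hidden state at some layer."""
    return hidden + alpha * vec / np.linalg.norm(vec)

h = rng.normal(0.0, 1.0, size=d)   # an activation from one layer (toy)
h_steered = inject(h, concept_vec)

# Quantify the manipulation: projection of the activation onto the concept
# direction before and after injection.
unit = concept_vec / np.linalg.norm(concept_vec)
proj_before = h @ unit
proj_after = h_steered @ unit
```

After injection, `proj_after` exceeds `proj_before` by exactly `alpha`, since the added term is a unit vector in the concept direction scaled by `alpha`. In the actual experiments, the measured quantity is instead the model's self-reported state under the manipulation.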
Read more at transformer-circuits.pub