Task-free intelligence testing of LLMs (Part 1)
Introduction
I recently wrote about the apparently narrow focus of LLM evaluation on "task-based" testing. The typical eval consists of a set of tasks, questions, or problems to be solved, and a model is scored on how many it answers correctly. Such tests treat the model as an input/output system, a "function approximator", which is great for confirming that LLMs can learn any task but limited for probing the nature of intelligence.
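To make that pattern concrete, here is a minimal sketch of a task-based eval loop of the kind described above. Everything in it is a hypothetical stand-in: the model_answer function, the example tasks, and the exact-match scoring are illustrative, not drawn from any particular benchmark or harness.

    # Minimal sketch of a task-based eval: a fixed set of (prompt, expected)
    # pairs, scored by exact-match accuracy. `model_answer` is a hypothetical
    # placeholder for a call to some LLM; the tasks are illustrative only.

    def model_answer(prompt: str) -> str:
        """Placeholder for an LLM call, e.g. an API request."""
        raise NotImplementedError

    TASKS = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
    ]

    def run_eval(tasks):
        # Count tasks where the model's answer exactly matches the expected one.
        correct = sum(
            model_answer(prompt).strip() == expected
            for prompt, expected in tasks
        )
        # The score is simply the fraction answered correctly.
        return correct / len(tasks)

The whole setup reduces the model to a function from prompt to answer, which is exactly the input/output framing the post is questioning.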
I'm interested in interactions...
Read more at marble.onl