AI Researcher Achieves Record 79.6% Score on ARC-AGI v1 Using English Instructions; Sets New 29.4% Benchmark on v2

How I got the highest score on ARC-AGI again by swapping Python for English

I think ARC-AGI is still the most important benchmark we have today. It’s surprising that LLMs can win the math olympiad but struggle with simple puzzles that humans can solve easily.

This highlights a core limitation of current LLMs: they struggle to reason about things they weren’t trained on. They struggle to generalize. But they are getting better, fast.

Last December, I got first place on ARC-AGI v1 with a score of 53.6%. A lot has changed since then. Thinking models had just come out and the...