How I got the highest score on ARC-AGI again swapping Python for English
I think ARC-AGI is still the most important benchmark we have today. It’s surprising that LLMs can win the math olympiad but struggle with simple puzzles that humans can solve easily. This highlights a core limitation of current LLMs: they struggle to reason about things they weren't trained on. They struggle to generalize. But they are getting better, fast. Last December, I got first place on ARC-AGI v1 with a score of 53.6%. A lot has changed since then. Thinking models had just come out and the...
Read more at jeremyberman.substack.com