News Score: Score the News, Sort the News, Rewrite the Headlines

How I got the highest score on ARC-AGI again swapping Python for English

I think ARC-AGI is still the most important benchmark we have today. It’s surprising that LLMs can win the math olympiad but struggle with simple puzzles that humans can solve easily.

This highlights a core limitation of current LLMs: they struggle to reason about things they weren't trained on. They struggle to generalize. But they are getting better, fast.

Last December, I got first place on ARC-AGI v1 with a score of 53.6%. A lot has changed since then. Thinking models had just come out and the...

Read more at jeremyberman.substack.com
