News Score: Score the News, Sort the News, Rewrite the Headlines

CPUs Aren't Dead. Gemma 2B Just Scored Higher Than GPT-3.5 Turbo on the Test That Made It Famous — Your Laptop Can Run It, or Cloudflare for $5/Mo.

Gemma 2B scored ~8.0 on MT-Bench. GPT-3.5 Turbo scored 7.94. An 87-times-smaller model on a laptop CPU, no GPU anywhere in the stack. We published the full tape — every question, every turn, every score — so anyone can verify it. We found seven failure classes. Not hallucinations. Specific patterns: arithmetic where it computed correctly but committed the wrong number first, logic puzzles where it proved the right answer then shipped the wrong one, constraints it drifted on, personas it broke, q...

Read more at seqpu.com
