News Score: Score the News, Sort the News, Rewrite the Headlines

Two different tricks for fast LLM inference

Anthropic and OpenAI both recently announced “fast mode”: a way to interact with their best coding models at significantly higher speeds. The two versions of fast mode are very different. Anthropic’s offers up to 2.5x the tokens per second (so around 170, up from Opus 4.6’s 65). OpenAI’s offers more than 1000 tokens per second (up from GPT-5.3-Codex’s 65 tokens per second, so about 15x). That makes OpenAI’s fast mode roughly six times faster than Anthropic’s. However, Anthropic’s big advantage is that they’re servin...

Read more at seangoedecke.com
