Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22%
In a recent post, we introduced the Tau² benchmark, a framework for benchmarking LLMs. Today we’re sharing a surprising discovery we made while using it: a simple prompt rewrite boosted a small model’s success rate by over 20%. This post is a deep dive into how we found and fixed this performance bottleneck by making subtle changes to agent policies.
Benchmarking LLMs with Tau²
At the recent OpenAI Summer Update, we saw that the GPT-5 model has made significant strides in agentic tasks. To valida...
Read more at quesma.com