Expensively Quadratic: the LLM Agent Cost Curve
Pop quiz: at what context length do a coding agent’s cache reads start
costing you half of each API call? By 50,000 tokens, cache reads are
probably dominating your conversation’s costs.
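
Here’s a back-of-the-envelope sketch of that claim. The prices below are assumptions, roughly in line with current frontier-model pricing (about $3 per million fresh input tokens, cache reads at a tenth of that, about $15 per million output tokens); your provider’s actual rates will differ.

```python
# Back-of-the-envelope cost of one API call with a long cached prefix.
# All prices are assumptions (dollars per token), not any vendor's actual rates.
INPUT_PRICE = 3.00 / 1e6       # fresh (uncached) input tokens
CACHE_READ_PRICE = 0.30 / 1e6  # cached input tokens, ~10x cheaper than fresh
OUTPUT_PRICE = 15.00 / 1e6     # output tokens

def call_cost(cached_tokens: int, fresh_in: int = 2_000, out: int = 1_000):
    """Return (cache-read cost, total cost) for a single API call."""
    cache = cached_tokens * CACHE_READ_PRICE
    total = cache + fresh_in * INPUT_PRICE + out * OUTPUT_PRICE
    return cache, total

for n in (10_000, 50_000, 200_000):
    cache, total = call_cost(n)
    print(f"{n:>7,} cached tokens: ${cache:.4f} of ${total:.4f} is cache reads ({cache / total:.0%})")
```

Under these assumptions, cache reads are already about 40% of each call at 50,000 cached tokens, and their share keeps climbing: the cached prefix grows with every turn while the fresh input and output per turn stay roughly flat.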
Let’s take a step back. We’ve previously
written about how coding agents work:
they post the conversation thus far to the LLM, and continue doing that in
a loop as long as the LLM is requesting tool calls. When there are no
more tools to run, the loop waits for user input, and the whole cycle repeats.
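
In Python, that loop looks something like this. This is a minimal sketch; `llm.complete` and `run_tool` are hypothetical stand-ins for a real model client and tool executor, not any specific vendor’s API.

```python
def agent_loop(llm, run_tool, messages: list) -> list:
    while True:
        # Every call re-sends the entire conversation so far.
        response = llm.complete(messages)
        messages.append(response.message)
        # No tool calls requested: stop and wait for user input.
        if not response.tool_calls:
            return messages
        # Otherwise run each requested tool and append its result,
        # which becomes part of the next (even longer) request.
        for call in response.tool_calls:
            messages.append({"role": "tool", "content": run_tool(call)})
```

Because every iteration re-posts the whole transcript, the tokens sent per call grow linearly with conversation length, and the total across the conversation grows quadratically; hence the title.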
Read more at blog.exe.dev