Expensively Quadratic: the LLM Agent Cost Curve
Pop quiz: at what context length do a coding agent’s cache reads start
costing you half of each API call? By 50,000 tokens, cache reads are
probably dominating your conversation’s costs.
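
Here’s a back-of-the-envelope sketch of that claim. The prices below are assumptions, roughly in line with current frontier-model pricing (about $3 per million fresh input tokens, cache reads at a tenth of that, about $15 per million output tokens); your provider’s actual rates will differ.

```python
# Back-of-the-envelope cost of one API call with a long cached prefix.
# All prices are assumptions (dollars per token), not any vendor's actual rates.
INPUT_PRICE = 3.00 / 1e6       # fresh (uncached) input tokens
CACHE_READ_PRICE = 0.30 / 1e6  # cached input tokens, ~10x cheaper than fresh
OUTPUT_PRICE = 15.00 / 1e6     # output tokens

def call_cost(cached_tokens: int, fresh_in: int = 2_000, out: int = 1_000):
    """Return (cache-read cost, total cost) for a single API call."""
    cache = cached_tokens * CACHE_READ_PRICE
    total = cache + fresh_in * INPUT_PRICE + out * OUTPUT_PRICE
    return cache, total

for n in (10_000, 50_000, 200_000):
    cache, total = call_cost(n)
    print(f"{n:>7,} cached tokens: ${cache:.4f} of ${total:.4f} is cache reads ({cache / total:.0%})")
```

Under these assumptions, cache reads are already about 40% of each call at 50,000 cached tokens, and their share keeps climbing: the cached prefix grows with every turn while the fresh input and output per turn stay roughly flat.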
Let’s take a step back. We’ve previously
written about how coding agents work:
they post the conversation thus far to the LLM, and continue doing that in
a loop as long as the LLM is requesting tool calls. When there are no
more tools to run, the loop waits for user input, and the whole cycle repeats.
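
In Python, that loop looks something like this. This is a minimal sketch; `llm.complete` and `run_tool` are hypothetical stand-ins for a real model client and tool executor, not any specific vendor’s API.

```python
def agent_loop(llm, run_tool, messages: list) -> list:
    while True:
        # Every call re-sends the entire conversation so far.
        response = llm.complete(messages)
        messages.append(response.message)
        # No tool calls requested: stop and wait for user input.
        if not response.tool_calls:
            return messages
        # Otherwise run each requested tool and append its result,
        # which becomes part of the next (even longer) request.
        for call in response.tool_calls:
            messages.append({"role": "tool", "content": run_tool(call)})
```

Because every iteration re-posts the whole transcript, the tokens sent per call grow linearly with conversation length, and the total across the conversation grows quadratically; hence the title.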
Read more at blog.exe.dev