News Score: Score the News, Sort the News, Rewrite the Headlines

Measuring AI Ability to Complete Long Tasks

Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks. The length of tasks (measured by how long they take human professionals) ...
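
To make the doubling-time arithmetic in the summary concrete, here is a minimal sketch of the extrapolation. It is not taken from the source. The 7-month doubling time comes from the summary, while the function names, the 1-hour starting horizon, and the 40-hour target are hypothetical values chosen only for illustration.

```python
import math

# From the summary: task length doubles roughly every 7 months.
DOUBLING_MONTHS = 7.0


def horizon_after(months, start_hours=1.0, doubling_months=DOUBLING_MONTHS):
    """Task length (in human hours) after `months` of exponential growth.

    `start_hours` is a hypothetical starting point, not a figure from the source.
    """
    return start_hours * 2 ** (months / doubling_months)


def months_to_reach(target_hours, start_hours=1.0, doubling_months=DOUBLING_MONTHS):
    """Months until the task length grows from `start_hours` to `target_hours`."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    # Number of doublings implied by 6 years of growth at this rate.
    print(f"doublings in 6 years: {6 * 12 / DOUBLING_MONTHS:.1f}")
    # Hypothetical: from 1-hour tasks to a 40-hour work week of tasks.
    print(f"months to reach 40 h: {months_to_reach(40.0):.1f}")
```

Under these assumptions, one doubling period turns a 1-hour horizon into a 2-hour one, and any target length is reached after a number of months proportional to the base-2 logarithm of the growth factor.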

Read more at metr.org
