PA Bench: Evaluating Web Agents on Real World Personal Assistant Workflows
Introduction
Browser-based and computer-use agents are increasingly popular for automating consumer workflows that involve interacting with web applications through clicks, typing, and navigation. Many of these workflows mirror how humans use personal assistant tools today, coordinating information across multiple applications such as email, calendars, and booking platforms. However, it remains unclear whether current frontier computer-use agents are capable of reliably completing such...
Read more at vibrantlabs.com