Salesforce Study: LLM Agents Fail CRM Tests, Struggle with Confidentiality; Only 58% Success on Simple Tasks, 35% on Complex

Salesforce study finds LLM agents flunk CRM and confidentiality tests

A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality. A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that on a new benchmark built on synthetic data, LLM agents achieve a success rate of around 58 percent on single-step tasks, those that can be completed without follow-up actions or further information. Using the benchmark tool CRMArena-Pro, the team als...