Salesforce study finds LLM agents flunk CRM and confidentiality tests
A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality.
A team led by Kung-Hsiang Huang, a Salesforce AI researcher, used a new benchmark built on synthetic data to show that LLM agents achieve a success rate of around 58 percent on tasks that can be completed in a single step, without follow-up actions or additional information.
Using the benchmark tool CRMArena-Pro, the team als...
Read more at theregister.com