{"ID":2890233,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.00889","arxiv_id":"2508.00889","title":"FECT: Factuality Evaluation of Interpretive AI-Generated Claims in Contact Center Conversation Transcripts","abstract":"Large language models (LLMs) are known to hallucinate, producing natural language outputs that are not grounded in the input, reference materials, or real-world knowledge. In enterprise applications where AI features support business decisions, such hallucinations can be particularly detrimental. LLMs that analyze and summarize contact center conversations introduce a unique set of challenges for factuality evaluation, because ground-truth labels often do not exist for analytical interpretations about sentiments captured in the conversation and root causes of the business problems. To remedy this, we first introduce a \\textbf{3D} -- \\textbf{Decompose, Decouple, Detach} -- paradigm in the human annotation guideline and the LLM-judges' prompt to ground the factuality labels in linguistically-informed evaluation criteria. We then introduce \\textbf{FECT}, a novel benchmark dataset for \\textbf{F}actuality \\textbf{E}valuation of Interpretive AI-Generated \\textbf{C}laims in Contact Center Conversation \\textbf{T}ranscripts, labeled under our 3D paradigm. Lastly, we report our findings from aligning LLM-judges on the 3D paradigm. Overall, our findings contribute a new approach for automatically evaluating the factuality of outputs generated by an AI system for analyzing contact center conversations.","short_abstract":"Large language models (LLMs) are known to hallucinate, producing natural language outputs that are not grounded in the input, reference materials, or real-world knowledge. In enterprise applications where AI features support business decisions, such hallucinations can be particularly detrimental. LLMs that analyze and...","url_abs":"https://arxiv.org/abs/2508.00889","url_pdf":"https://arxiv.org/pdf/2508.00889v1","authors":"[\"Hagyeong Shin\",\"Binoy Robin Dalal\",\"Iwona Bialynicka-Birula\",\"Navjot Matharu\",\"Ryan Muir\",\"Xingwei Yang\",\"Samuel W. K. Wong\"]","published":"2025-07-26T18:14:18Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
