{"ID":2890974,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.18523","arxiv_id":"2507.18523","title":"The Moral Gap of Large Language Models","abstract":"Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear. This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis. Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts. These findings demonstrate that task-specific fine-tuning remains superior to prompting for moral reasoning applications.","short_abstract":"Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear. This study provides the first comprehensive comparison between state-of-the-art LLMs...","url_abs":"https://arxiv.org/abs/2507.18523","url_pdf":"https://arxiv.org/pdf/2507.18523v1","authors":"[\"Maciej Skorski\",\"Alina Landowska\"]","published":"2025-07-24T15:49:06Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.CY\",\"cs.HC\",\"cs.LG\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}