{"ID":2844719,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07477","arxiv_id":"2511.07477","title":"The Polite Liar: Epistemic Pathology in Language Models","abstract":"Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite liar, is a structural consequence of reinforcement learning from human feedback (RLHF). Building on Frankfurt's analysis of bullshit as communicative indifference to truth, I show that this pathology is not deception but structural indifference: a reward architecture that optimizes for perceived sincerity over evidential accuracy. Current alignment methods reward models for being helpful, harmless, and polite, but not for being epistemically grounded. As a result, systems learn to maximize user satisfaction rather than truth, performing conversational fluency as a virtue. I analyze this behavior through the lenses of epistemic virtue theory, speech-act philosophy, and cognitive alignment, showing that RLHF produces agents trained to mimic epistemic confidence without access to epistemic justification. The polite liar thus reveals a deeper alignment tension between linguistic cooperation and epistemic integrity. The paper concludes with an \"epistemic alignment\" principle: reward justified confidence over perceived fluency.","short_abstract":"Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite liar, is a structural consequence of reinforcement learning from human feedback (RLHF). \nBuilding on Frankfurt's analysis of bullshit...","url_abs":"https://arxiv.org/abs/2511.07477","url_pdf":"https://arxiv.org/pdf/2511.07477v1","authors":"[\"Bentley DeVilling\"]","published":"2025-11-08T21:02:52Z","proceeding":"cs.CY","tasks":"[\"cs.CY\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Language Model\",\"RLHF\"]","has_code":false}
