Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution

Sep 19, 2025 cs.LG arXiv:2509.19372

Abstract

We critically assess the efficacy of the current SOTA in hallucination detection and find that its performance on the RAGTruth dataset is largely driven by a spurious correlation with data. Controlling for this effect, the state of the art performs no better than supervised linear probes, while requiring extensive hyperparameter tuning across datasets. Out-of-distribution generalization is currently out of reach: all of the analyzed methods perform close to random. We propose a set of guidelines for hallucination detection and its evaluation.
