{"ID":2871326,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11208","arxiv_id":"2509.11208","title":"Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication","abstract":"Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers (``hallucinations'' under a Bernoulli predicate). We treat evidence order as a nuisance variable and show that next-token training minimizes expected conditional description length over orderings. This objective can be close to Bayes-optimal in expectation while deviating under any fixed ordering. We quantify this expectation--realization gap via a Quantified Martingale Violation (QMV) bound that predicts $\\mathcal{O}(\\log n)$ growth in permutation dispersion under harmonic positional sensitivity. We then derive the Expectation-level Decompression Law (EDFL), relating expected information budget to achievable reliability for Bernoulli predicates, and use it to define \\emph{Bits-to-Trust} (B2T), \\emph{Risk-of-Hallucination} (RoH), and the \\emph{Information Sufficiency Ratio} (ISR), together with a fixed ISR-gating rule for answer/abstain decisions under permutation mixtures. On 3,059 grounded items from a five-benchmark evidence-grounded QA suite (FEVER, HotpotQA, NQ-Open, PopQA, and Controls), we observe logarithmic dispersion and Jensen gains from uniform permutation mixtures. In a pre-specified held-out audit (528 items), an ISR $= 1$ gate attains 0.0--0.7\\% hallucination with 20.6--27.9\\% abstention (95\\% confidence intervals).","short_abstract":"Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers (``hallucinations'' under a Bernoulli predicate...","url_abs":"https://arxiv.org/abs/2509.11208","url_pdf":"https://arxiv.org/pdf/2509.11208v2","authors":"[\"Leon Chlon\",\"Ahmed Karim\",\"Maggie Chlon\",\"MarcAntonio Awada\"]","published":"2025-09-14T10:32:59Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
