{"ID":3050105,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T11:11:21.995702784Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04727","arxiv_id":"2606.04727","title":"EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking","abstract":"Large Language Models show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) the global confidence score designed for question answering fails to reveal which positions in a ranking list are unreliable; (2) fine-grained confidence extracted from model internals exhibits uniformly low values across all positions, making it impossible to filter unreliable predictions. To tackle these challenges, we propose an evidence-based confidence estimation method for LLM-based ranking (EviRank). We extract three complementary types of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, we recognize that ranking positions are inherently unequal, and introduce a position-aware calibration. Lastly, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.","short_abstract":"Large Language Models show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. 
Existing uncertainty quantification methods face two fundamental challenges: (1) the global confidence score designed for question answering fails to reveal which posi...","url_abs":"https://arxiv.org/abs/2606.04727","url_pdf":"https://arxiv.org/pdf/2606.04727v1","authors":"[\"Meng Yan\",\"Cai Xv\",\"Xujing Wang\",\"Ziyu Guan\",\"Wei Zhao\"]","published":"2026-06-03T11:11:30Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}