{"ID":2859783,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04633","arxiv_id":"2510.04633","title":"Topic-Specific Classifiers are Better Relevance Judges than Prompted LLMs","abstract":"The unjudged document problem, where systems that did not contribute to the original judgment pool may retrieve documents without a relevance judgment, is a key obstacle to the reusability of test collections in information retrieval. While the de facto standard to deal with the problem is to treat unjudged documents as non-relevant, many alternatives have been proposed, such as the use of large language models (LLMs) as a relevance judge (LLM-as-a-judge). However, this has been criticized, among other things, as circular, since the same LLM can be used as the ranker and the judge. We propose to train topic-specific relevance classifiers instead: By finetuning monoT5 with independent LoRA weight adaptation on the judgments of a single assessor for a single topic's pool, we align it to that assessor's notion of relevance for the topic. The system rankings obtained through our classifier's relevance judgments achieve a Spearman's $ρ$ correlation of $\u003e0.94$ with ground truth system rankings. As little as 128 initial human judgments per topic suffice to improve the comparability of models, compared to treating unjudged documents as non-relevant, while achieving more reliability than existing LLM-as-a-judge approaches. Topic-specific relevance classifiers are thus a lightweight and straightforward way to tackle the unjudged document problem, while maintaining human judgments as the gold standard for retrieval evaluation. 
Code, models, and data are made openly available.","short_abstract":"The unjudged document problem, where systems that did not contribute to the original judgment pool may retrieve documents without a relevance judgment, is a key obstacle to the reusability of test collections in information retrieval. While the de facto standard to deal with the problem is to treat unjudged document...","url_abs":"https://arxiv.org/abs/2510.04633","url_pdf":"https://arxiv.org/pdf/2510.04633v2","authors":"[\"Lukas Gienapp\",\"Martin Potthast\",\"Andrew Yates\",\"Harrisen Scells\",\"Eugene Yang\"]","published":"2025-10-06T09:38:13Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
