{"ID":2854720,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14824","arxiv_id":"2510.14824","title":"Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking","abstract":"In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label prediction of relevance vs. irrelevance). For BERT-style encoders, various studies have shown that contrastive learning (CL) can be more effective than discriminative (classification) learning. However, for large language models (LLMs), classification via supervised fine-tuning (SFT), which predicts ''yes'' (resp. ''no'') token for relevant (resp. irrelevant) pairs, appears more promising as it aligns well with the generative nature of LLMs. This divergence raises a central question: which objective is intrinsically better suited to LLM-based reranking, and what mechanism underlies the difference? In this work, we conduct a comprehensive comparison and analysis between CL and SFT for reranking, taking the universal multimodal retrieval (UMR) as the experimental playground. We first decompose the objectives into two components: weight, which controls the magnitude of those updates, and direction, which guides the model updates, then present a unified framework for understanding their interactions. Through probing experiments, we find that SFT provides a substantially stronger weighting scheme than CL, whereas the preferred scoring direction shows no clear winner. Taken together, these results point to a consistent advantage of SFT over CL for LLM reranking. To further validate our findings, we conduct large-scale training with SFT and present new state-of-the-art rerankers on the MRB benchmark. We also provide ablations on SFT settings and expect our findings to benefit future research and applications in this area.","short_abstract":"In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label prediction of relevance vs. irrelevance). For BERT-style encoders, various studies hav...","url_abs":"https://arxiv.org/abs/2510.14824","url_pdf":"https://arxiv.org/pdf/2510.14824v1","authors":"[\"Ziqi Dai\",\"Xin Zhang\",\"Mingxin Li\",\"Yanzhao Zhang\",\"Dingkun Long\",\"Pengjun Xie\",\"Meishan Zhang\",\"Wenjie Li\",\"Min Zhang\"]","published":"2025-10-16T16:02:27Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.CV\",\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
