{"ID":2831778,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07738","arxiv_id":"2512.07738","title":"HLTCOE Evaluation Team at TREC 2025: VQA Track","abstract":"The HLTCOE Evaluation team participated in TREC VQA's Answer Generation (AG) task, for which we developed a listwise learning framework that aims to improve semantic precision and ranking consistency in answer generation. Given a video-question pair, a base multimodal model first generates multiple candidate answers, which are then reranked using a model trained with a novel Masked Pointer Cross-Entropy Loss with Rank Weights. This objective integrates pointer-based candidate selection, rank-dependent weighting, and masked cross-entropy under vocabulary restriction, enabling stable and interpretable listwise optimization. By bridging generative modeling with discriminative ranking, our method produces coherent, fine-grained answer lists. Experiments reveal consistent gains in accuracy and ranking stability, especially for questions requiring temporal reasoning and semantic disambiguation.","short_abstract":"The HLTCOE Evaluation team participated in TREC VQA's Answer Generation (AG) task, for which we developed a listwise learning framework that aims to improve semantic precision and ranking consistency in answer generation. Given a video-question pair, a base multimodal model first generates multiple candidate answers, w...","url_abs":"https://arxiv.org/abs/2512.07738","url_pdf":"https://arxiv.org/pdf/2512.07738v1","authors":"[\"Dengjia Zhang\",\"Charles Weng\",\"Katherine Guerrerio\",\"Yi Lu\",\"Kenton Murray\",\"Alexander Martin\",\"Reno Kriz\",\"Benjamin Van Durme\"]","published":"2025-12-08T17:25:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}