{"ID":3053308,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-06T00:47:47.621308184Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04323","arxiv_id":"2606.04323","title":"Answer Self-Consistency with Margin-Triggered Question Re-Arbitration for the CVPR 2026 VidLLMs Challenge","abstract":"In this report, we present our solution for Track 2 of the CVPR 2026 VidLLMs Challenge. This track evaluates visual relational reasoning in videos, where models must infer relations that are not always explicitly visible. We propose Answer Self-Consistency with Margin-Triggered Question Re-Arbitration (ASC-MQRA), a training-free test-time reasoning framework built on a multimodal reasoning model. The core ASC component performs multiple stochastic video question-answering runs and aggregates their answer choices through answer-level self-consistency. This substantially improves over single-pass inference and forms our final test submission. We further study MQRA, a conditional re-arbitration module for low-margin examples where the first-stage vote distribution indicates uncertainty. Our vote-margin analysis shows that low-margin examples often retain the ground-truth answer among the top candidates, motivating MQRA to narrow the candidate set and re-watch the video only over the retained candidates. On validation, MQRA further improves over ASC, indicating that low-margin vote distributions can provide a useful uncertainty signal. On test, however, MQRA slightly degrades performance relative to ASC, suggesting that re-arbitration is sensitive to the size and category distribution of the triggered subset. Our final test submission therefore uses ASC without re-arbitration, achieving 72.73 average accuracy and 78.34 category-wise macro average accuracy on validation, and 81.16 average accuracy and 80.91 category-wise macro average accuracy on test.\nThis report details our prompting strategy, implementation setup, ablation studies, and diagnostic analyses. The code is available at https://github.com/data-analytics-labo/ASC-MQRA","short_abstract":"In this report, we present our solution for Track 2 of the CVPR 2026 VidLLMs Challenge. This track evaluates visual relational reasoning in videos, where models must infer relations that are not always explicitly visible. We propose Answer Self-Consistency with Margin-Triggered Question Re-Arbitration (ASC-MQRA), a tra...","url_abs":"https://arxiv.org/abs/2606.04323","url_pdf":"https://arxiv.org/pdf/2606.04323v1","authors":"[\"Tomoya Miyazawa\",\"Hiroyasu Okuno\"]","published":"2026-06-03T00:51:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":612806,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-04T04:41:36.695875263Z","DeletedAt":null,"paper_id":3053308,"paper_url":"https://arxiv.org/abs/2606.04323","paper_title":"Answer Self-Consistency with Margin-Triggered Question Re-Arbitration for the CVPR 2026 VidLLMs Challenge","repo_url":"https://github.com/data-analytics-labo/ASC-MQRA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
