{"ID":2839749,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15923","arxiv_id":"2511.15923","title":"RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification","abstract":"Vision Language Models (VLMs) are becoming increasingly integral to multimedia understanding; however, they often struggle with domain-specific video classification tasks, particularly in cases with limited data. This stems from a critical \\textit{rationale gap}, where sparse domain data is insufficient to bridge the semantic distance between complex spatio-temporal content and abstract classification labels. We propose a two-stage self-improvement paradigm to bridge this gap without new annotations. First, we prompt the VLMs to generate detailed textual rationales for each video, compelling them to articulate the domain-specific logic. The VLM is then fine-tuned on these self-generated rationales, utilizing this intermediate supervision to align its representations with the nuances of the target domain. Second, conventional supervised fine-tuning (SFT) is performed on the task labels, achieving markedly higher effectiveness as a result of the model's pre-acquired domain reasoning. Extensive experiments on diverse datasets demonstrate that our method significantly outperforms direct SFT, validating self-generated rationale as an effective, annotation-efficient paradigm for adapting VLMs to domain-specific video analysis.","short_abstract":"Vision Language Models (VLMs) are becoming increasingly integral to multimedia understanding; however, they often struggle with domain-specific video classification tasks, particularly in cases with limited data. This stems from a critical \\textit{rationale gap}, where sparse domain data is insufficient to bridge the s...","url_abs":"https://arxiv.org/abs/2511.15923","url_pdf":"https://arxiv.org/pdf/2511.15923v1","authors":"[\"Meilong Xu\",\"Di Fu\",\"Jiaxing Zhang\",\"Gong Yu\",\"Jiayu Zheng\",\"Xiaoling Hu\",\"Dongdi Zhao\",\"Feiyang Li\",\"Chao Chen\",\"Yong Cao\"]","published":"2025-11-19T23:12:18Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
