{"ID":2887299,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01711","arxiv_id":"2508.01711","title":"GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video Retrieval","abstract":"Text-to-video retrieval requires precise alignment between language and temporally rich audio-video signals. However, existing methods often emphasize visual cues while underutilizing audio semantics or relying on coarse fusion strategies, resulting in suboptimal multimodal representations. We introduce GAIS, a retrieval framework that strengthens multimodal alignment from both representation and regularization perspectives. First, a Frame-level Gated Fusion (FGF) module adaptively integrates audio-visual features under textual guidance, enabling fine-grained temporal selection of informative frames. Second, a Semantic Variance-Scaled Perturbation (SVSP) mechanism regularizes the text embedding space by controlling perturbation magnitude in a semantics-aware manner. These two modules are complementary: FGF minimizes modality gaps through selective fusion, while SVSP improves embedding stability and discrimination. Extensive experiments on MSR-VTT, DiDeMo, LSMDC, and VATEX demonstrate that GAIS consistently outperforms strong baselines across multiple retrieval metrics while maintaining notable computational efficiency.","short_abstract":"Text-to-video retrieval requires precise alignment between language and temporally rich audio-video signals. However, existing methods often emphasize visual cues while underutilizing audio semantics or relying on coarse fusion strategies, resulting in suboptimal multimodal representations. We introduce GAIS, a retriev...","url_abs":"https://arxiv.org/abs/2508.01711","url_pdf":"https://arxiv.org/pdf/2508.01711v2","authors":"[\"Bowen Yang\",\"Yun Cao\",\"Chen He\",\"Xiaosu Su\"]","published":"2025-08-03T10:44:24Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false}
