{"ID":2842184,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10134","arxiv_id":"2511.10134","title":"Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction","abstract":"Dense video captioning jointly localizes and captions salient events in untrimmed videos. Recent methods primarily focus on leveraging additional prior knowledge and advanced multi-task architectures to achieve competitive performance. However, these pipelines rely on implicit modeling that uses frame-level or fragmented video features, failing to capture the temporal coherence across event sequences and comprehensive semantics within visual contexts. To address this, we propose an explicit temporal-semantic modeling framework called Context-Aware Cross-Modal Interaction (CACMI), which leverages both latent temporal characteristics within videos and linguistic semantics from text corpus. Specifically, our model consists of two core components: Cross-modal Frame Aggregation aggregates relevant frames to extract temporally coherent, event-aligned textual features through cross-modal retrieval; and Context-aware Feature Enhancement utilizes query-guided attention to integrate visual dynamics with pseudo-event semantics. Extensive experiments on the ActivityNet Captions and YouCook2 datasets demonstrate that CACMI achieves the state-of-the-art performance on dense video captioning task.","short_abstract":"Dense video captioning jointly localizes and captions salient events in untrimmed videos. Recent methods primarily focus on leveraging additional prior knowledge and advanced multi-task architectures to achieve competitive performance. However, these pipelines rely on implicit modeling that uses frame-level or fragment...","url_abs":"https://arxiv.org/abs/2511.10134","url_pdf":"https://arxiv.org/pdf/2511.10134v1","authors":"[\"Mingda Jia\",\"Weiliang Meng\",\"Zenghuang Fu\",\"Yiheng Li\",\"Qi Zeng\",\"Yifan Zhang\",\"Ju Xin\",\"Rongtao Xu\",\"Jiguang Zhang\",\"Xiaopeng Zhang\"]","published":"2025-11-13T09:48:12Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}