{"ID":2841445,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10866","arxiv_id":"2511.10866","title":"Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling","abstract":"This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip fully utilizes all frames to preserve temporal continuity, enabling precise recognition of rapid violent events. Experiments demonstrate that the proposed method achieves 95.25% accuracy on RWF-2000 and significantly improves performance on long videos (UCF-Crime: 83.25%), confirming its strong generalization and real-time applicability in intelligent surveillance systems.","short_abstract":"This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datas...","url_abs":"https://arxiv.org/abs/2511.10866","url_pdf":"https://arxiv.org/pdf/2511.10866v1","authors":"[\"Seoik Jung\",\"Taekyung Song\",\"Yangro Lee\",\"Sungjun Lee\"]","published":"2025-11-14T00:29:31Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}