{"ID":2836001,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22470","arxiv_id":"2511.22470","title":"Hybrid, Unified and Iterative: A Novel Framework for Text-based Person Anomaly Retrieval","abstract":"Text-based person anomaly retrieval has emerged as a challenging task, with most existing approaches relying on complex deep-learning techniques. This raises a research question: How can the model be optimized to achieve greater fine-grained features? To address this, we propose a Local-Global Hybrid Perspective (LHP) module integrated with a Vision-Language Model (VLM), designed to explore the effectiveness of incorporating both fine-grained features alongside coarse-grained features. Additionally, we investigate a Unified Image-Text (UIT) model that combines multiple objective loss functions, including Image-Text Contrastive (ITC), Image-Text Matching (ITM), Masked Language Modeling (MLM), and Masked Image Modeling (MIM) loss. Beyond this, we propose a novel iterative ensemble strategy, by combining iteratively instead of using model results simultaneously like other ensemble methods. To take advantage of the superior performance of the LHP model, we introduce a novel feature selection algorithm based on its guidance, which helps improve the model's performance. Extensive experiments demonstrate the effectiveness of our method in achieving state-of-the-art (SOTA) performance on PAB dataset, compared with previous work, with a 9.70\\% improvement in R@1, 1.77\\% improvement in R@5, and 1.01\\% improvement in R@10.","short_abstract":"Text-based person anomaly retrieval has emerged as a challenging task, with most existing approaches relying on complex deep-learning techniques. This raises a research question: How can the model be optimized to achieve greater fine-grained features? To address this, we propose a Local-Global Hybrid Perspective (LHP)...","url_abs":"https://arxiv.org/abs/2511.22470","url_pdf":"https://arxiv.org/pdf/2511.22470v1","authors":"[\"Tien-Huy Nguyen\",\"Huu-Loc Tran\",\"Huu-Phong Phan-Nguyen\",\"Quang-Vinh Dinh\"]","published":"2025-11-27T14:00:53Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}