{"ID":2886000,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.04868","arxiv_id":"2508.04868","title":"Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications","abstract":"Transformer-based object detectors often struggle with occlusions, fine-grained localization, and computational inefficiency caused by fixed queries and dense attention. We propose DAMM, Dual-stream Attention with Multi-Modal queries, a novel framework introducing both query adaptation and structured cross-attention for improved accuracy and efficiency. DAMM capitalizes on three types of queries: appearance-based queries from vision-language models, positional queries using polygonal embeddings, and random learned queries for general scene coverage. Furthermore, a dual-stream cross-attention module separately refines semantic and spatial features, boosting localization precision in cluttered scenes. We evaluated DAMM on four challenging benchmarks, and it achieved state-of-the-art performance in average precision (AP) and recall, demonstrating the effectiveness of multi-modal query adaptation and dual-stream attention. Source code is at: https://github.com/DET-LIP/DAMM.","short_abstract":"Transformer-based object detectors often struggle with occlusions, fine-grained localization, and computational inefficiency caused by fixed queries and dense attention. We propose DAMM, Dual-stream Attention with Multi-Modal queries, a novel framework introducing both query adaptation and structured cross-attention fo...","url_abs":"https://arxiv.org/abs/2508.04868","url_pdf":"https://arxiv.org/pdf/2508.04868v1","authors":"[\"Noreen Anwar\",\"Guillaume-Alexandre Bilodeau\",\"Wassim Bouachir\"]","published":"2025-08-06T20:37:24Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611265,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886000,"paper_url":"https://arxiv.org/abs/2508.04868","paper_title":"Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications","repo_url":"https://github.com/DET-LIP/DAMM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
