{"ID":2832028,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06838","arxiv_id":"2512.06838","title":"SparseCoop: Cooperative Perception with Kinematic-Grounded Queries","abstract":"Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module for robust fusion; and a cooperative instance denoising task to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this with superior computational efficiency, low transmission cost, and strong robustness to communication latency. Code is available at https://github.com/wang-jh18-SVM/SparseCoop.","short_abstract":"Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of fle...","url_abs":"https://arxiv.org/abs/2512.06838","url_pdf":"https://arxiv.org/pdf/2512.06838v1","authors":"[\"Jiahao Wang\",\"Zhongwei Jiang\",\"Wenchao Sun\",\"Jiaru Zhong\",\"Haibao Yu\",\"Yuner Zhang\",\"Chenyang Lu\",\"Chuang Zhang\",\"Lei He\",\"Shaobing Xu\",\"Jianqiang Wang\"]","published":"2025-12-07T13:22:06Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":606189,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2832028,"paper_url":"https://arxiv.org/abs/2512.06838","paper_title":"SparseCoop: Cooperative Perception with Kinematic-Grounded Queries","repo_url":"https://github.com/wang-jh18-SVM/SparseCoop","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}