{"ID":2882583,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.09428","arxiv_id":"2508.09428","title":"What-Meets-Where: Unified Learning of Action and Contact Localization in Images","abstract":"People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider \\textbf{what} action is occurring and \\textbf{where} it is happening. Current methodologies, however, often inadequately capture this duality, typically failing to jointly model both action semantics and their spatial contextualization within scenes. To bridge this gap, we introduce a novel vision task that simultaneously predicts high-level action semantics and fine-grained body-part contact regions. Our proposed framework, PaIR-Net, comprises three key components: the Contact Prior Aware Module (CPAM) for identifying contact-relevant body parts, the Prior-Guided Concat Segmenter (PGCS) for pixel-wise contact segmentation, and the Interaction Inference Module (IIM) responsible for integrating global interaction relationships. To facilitate this task, we present PaIR (Part-aware Interaction Representation), a comprehensive dataset containing 13,979 images that encompass 654 actions, 80 object categories, and 17 body parts. Experimental evaluation demonstrates that PaIR-Net significantly outperforms baseline approaches, while ablation studies confirm the efficacy of each architectural component. The code and dataset will be released upon publication.","short_abstract":"People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider \\textbf{what} action is occurring and \\textbf{where} it is happening. Current methodologies, however, often inadequately capture this du...","url_abs":"https://arxiv.org/abs/2508.09428","url_pdf":"https://arxiv.org/pdf/2508.09428v2","authors":"[\"Yuxiao Wang\",\"Yu Lei\",\"Wolin Liang\",\"Weiying Xue\",\"Zhenao Wei\",\"Nan Zhuang\",\"Qi Liu\"]","published":"2025-08-13T02:06:33Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false}
