{"ID":2833469,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03724","arxiv_id":"2512.03724","title":"PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention","abstract":"The Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still struggle to produce consistent and precise target-oriented actions, as they often generate redundant or unstable motions along trajectories, limiting their applicability in time-sensitive scenarios.In this work, we attribute these redundant actions to the spatially uniform perception field of existing VLAs, which causes them to be distracted by target-irrelevant objects, especially in complex environments.To address this issue, we propose an efficient PosA-VLA framework that anchors visual attention via pose-conditioned supervision, consistently guiding the model's perception toward task-relevant regions. The pose-conditioned anchor attention mechanism enables the model to better align instruction semantics with actionable visual cues, thereby improving action generation precision and efficiency. Moreover, our framework adopts a lightweight architecture and requires no auxiliary perception modules (e.g., segmentation or grounding networks), ensuring efficient inference. Extensive experiments verify that our method executes embodied tasks with precise and time-efficient behavior across diverse robotic manipulation benchmarks and shows robust generalization in a variety of challenging environments.","short_abstract":"The Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still struggle to produce consistent and precise target-oriented actions, as they often generate redundant or unstable motions along traject...","url_abs":"https://arxiv.org/abs/2512.03724","url_pdf":"https://arxiv.org/pdf/2512.03724v2","authors":"[\"Ziwen Li\",\"Xin Wang\",\"Hanlue Zhang\",\"Runnan Chen\",\"Runqi Lin\",\"Xiao He\",\"Han Huang\",\"Yandong Guo\",\"Fakhri Karray\",\"Tongliang Liu\",\"Mingming Gong\"]","published":"2025-12-03T12:14:29Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[]","has_code":false}
