{"ID":3083541,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T08:27:56.979384103Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06485","arxiv_id":"2606.06485","title":"PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding","abstract":"Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part-aware 3D-MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes. To enable training and evaluation of part-aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. We further develop Part-Aware 3D Representation Learning to enrich 3D visual representations with fine-grained part-level semantics, and propose Hierarchical Segmentation Query Generation to ground part targets via hierarchical object-part queries. Extensive experiments show that our method substantially improves part-level question answering and referring segmentation, while also achieving strong performance across object-level vision-language tasks.","short_abstract":"Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. \nHowever, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part stru...","url_abs":"https://arxiv.org/abs/2606.06485","url_pdf":"https://arxiv.org/pdf/2606.06485v1","authors":"[\"Shaohui Dai\",\"Yansong Qu\",\"You Shen\",\"Shengchuan Zhang\",\"Liujuan Cao\"]","published":"2026-06-04T17:59:04Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
