{"ID":2891399,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17585","arxiv_id":"2507.17585","title":"From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding","abstract":"Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their annotations. We propose a unified annotation integration using USD, with application-specific USD flavors. We identify challenges in utilizing holistic real-world scan datasets and present mitigation strategies. The efficacy of our approach is demonstrated through two downstream applications: LLM-based scene editing, enabling effective LLM understanding and adaptation of the data (80% success), and robotic simulation, achieving an 87% success rate in policy learning.","short_abstract":"Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their an...","url_abs":"https://arxiv.org/abs/2507.17585","url_pdf":"https://arxiv.org/pdf/2507.17585v1","authors":"[\"Anna-Maria Halacheva\",\"Jan-Nico Zaech\",\"Sombit Dey\",\"Luc Van Gool\",\"Danda Pani Paudel\"]","published":"2025-07-23T15:20:31Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[\"Large Language Model\"]","has_code":false}