{"ID":2859426,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06060","arxiv_id":"2510.06060","title":"Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information","abstract":"The generation of sounding videos has seen significant advancements with the advent of diffusion models. However, existing methods often lack the fine-grained control needed to generate viewpoint-specific content from larger, immersive 360-degree environments. This limitation restricts the creation of audio-visual experiences that are aware of off-camera events. To the best of our knowledge, this is the first work to introduce a framework for controllable audio-visual generation, addressing this unexplored gap. Specifically, we propose a diffusion model by introducing a set of powerful conditioning signals derived from the full 360-degree space: a panoramic saliency map to identify regions of interest, a bounding-box-aware signed distance map to define the target viewpoint, and a descriptive caption of the entire scene. By integrating these controls, our model generates spatially-aware viewpoint videos and audios that are coherently influenced by the broader, unseen environmental context, introducing a strong controllability that is essential for realistic and immersive audio-visual generation. We show audiovisual examples proving the effectiveness of our framework.","short_abstract":"The generation of sounding videos has seen significant advancements with the advent of diffusion models. However, existing methods often lack the fine-grained control needed to generate viewpoint-specific content from larger, immersive 360-degree environments. This limitation restricts the creation of audio-visual expe...","url_abs":"https://arxiv.org/abs/2510.06060","url_pdf":"https://arxiv.org/pdf/2510.06060v1","authors":"[\"Christian Marinoni\",\"Riccardo Fosco Gramaccioni\",\"Eleonora Grassucci\",\"Danilo Comminiello\"]","published":"2025-10-07T15:53:31Z","proceeding":"cs.MM","tasks":"[\"cs.MM\",\"cs.AI\",\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
