{"ID":2867893,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18272","arxiv_id":"2509.18272","title":"StereoFoley: Object-Aware Stereo Audio Generation from Video","abstract":"We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver object-aware stereo imaging, constrained by the lack of professionally mixed, spatially accurate video-to-audio datasets. First, we develop a base model that generates stereo audio from video, achieving performance on par with state-of-the-art V2A models in both semantic accuracy and synchronization. Next, to overcome dataset limitations, we introduce a synthetic data generation pipeline that combines video analysis, object tracking, and audio synthesis with dynamic panning and distance-based loudness controls, enabling spatially accurate object-aware sound. Finally, we fine-tune the base model on this synthetic dataset, yielding clear object-audio correspondence. Since no established metrics exist, we introduce a stereo object-awareness metric and report it alongside a human listening study; the two evaluations exhibit consistent trends. This work establishes the first end-to-end framework for stereo object-aware video-to-audio generation, addressing a critical gap in the field.","short_abstract":"We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver ob...","url_abs":"https://arxiv.org/abs/2509.18272","url_pdf":"https://arxiv.org/pdf/2509.18272v4","authors":"[\"Tornike Karchkhadze\",\"Kuan-Lin Chen\",\"Mojtaba Heydari\",\"Robert Henzel\",\"Alessandro Toso\",\"Mehrez Souden\",\"Joshua Atkins\"]","published":"2025-09-22T18:00:54Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.MM\",\"eess.AS\"]","methods":"[]","has_code":false}
