{"ID":2887737,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.00240","arxiv_id":"2508.00240","title":"Ambisonics Super-Resolution Using A Waveform-Domain Neural Network","abstract":"Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally one would be able to take the efficiency of a FOA format without its limitations. We have devised a data-driven spatial audio solution that retains the efficiency of the FOA format but achieves quality that surpasses conventional renderers. Utilizing a fully convolutional time-domain audio neural network (Conv-TasNet), we created a solution that takes a FOA input and provides a higher order Ambisonics (HOA) output. This data driven approach is novel when compared to typical physics and psychoacoustic based renderers. Quantitative evaluations showed a 0.6dB average positional mean squared error difference between predicted and actual 3rd order HOA. The median qualitative rating showed an 80% improvement in perceived quality over the traditional rendering approach.","short_abstract":"Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally one would be able to take the efficiency of a FOA format without its limitations. \nWe have devised a...","url_abs":"https://arxiv.org/abs/2508.00240","url_pdf":"https://arxiv.org/pdf/2508.00240v1","authors":"[\"Ismael Nawfal\",\"Symeon Delikaris Manias\",\"Mehrez Souden\",\"Juha Merimaa\",\"Joshua Atkins\",\"Elisabeth McMullin\",\"Shadi Pirhosseinloo\",\"Daniel Phillips\"]","published":"2025-08-01T00:51:47Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[]","has_code":false}
