{"ID":2866130,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21447","arxiv_id":"2509.21447","title":"ARTI-6: Towards Six-dimensional Articulatory Speech Encoding","abstract":"We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. ARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the vocal tract; (2) an articulatory inversion model, which predicts articulatory features from speech acoustics leveraging speech foundation models, achieving a prediction correlation of 0.87; and (3) an articulatory synthesis model, which reconstructs intelligible speech directly from articulatory features, showing that even a low-dimensional representation can generate natural-sounding speech. Together, ARTI-6 provides an interpretable, computationally efficient, and physiologically grounded framework for advancing articulatory inversion, synthesis, and broader speech technology applications. The source code and speech samples are publicly available.","short_abstract":"We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. \nARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the...","url_abs":"https://arxiv.org/abs/2509.21447","url_pdf":"https://arxiv.org/pdf/2509.21447v2","authors":"[\"Jihwan Lee\",\"Sean Foley\",\"Thanathai Lertpetchpun\",\"Kevin Huang\",\"Yoonjeong Lee\",\"Tiantian Feng\",\"Louis Goldstein\",\"Dani Byrd\",\"Shrikanth Narayanan\"]","published":"2025-09-25T19:18:35Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\"]","methods":"[]","has_code":false}
