{"ID":2826909,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.17228","arxiv_id":"2512.17228","title":"LUMIA: A Handheld Vision-to-Music System for Real-Time, Embodied Composition","abstract":"Most digital music tools emphasize precision and control, but often lack support for tactile, improvisational workflows grounded in environmental interaction. Lumia addresses this by enabling users to \"compose through looking\"--transforming visual scenes into musical phrases using a handheld, camera-based interface and large multimodal models. A vision-language model (GPT-4V) analyzes captured imagery to generate structured prompts, which, combined with user-selected instrumentation, guide a text-to-music pipeline (Stable Audio). This real-time process allows users to frame, capture, and layer audio interactively, producing loopable musical segments through embodied interaction. The system supports a co-creative workflow where human intent and model inference shape the musical outcome. By embedding generative AI within a physical device, Lumia bridges perception and composition, introducing a new modality for creative exploration that merges vision, language, and sound. It repositions generative music not as a task of parameter tuning, but as an improvisational practice driven by contextual, sensory engagement.","short_abstract":"Most digital music tools emphasize precision and control, but often lack support for tactile, improvisational workflows grounded in environmental interaction. \nLumia addresses this by enabling users to \"compose through looking\"--transforming visual scenes into musical phrases using a handheld, camera-based interface and...","url_abs":"https://arxiv.org/abs/2512.17228","url_pdf":"https://arxiv.org/pdf/2512.17228v1","authors":"[\"Chung-Ta Huang\",\"Connie Cheng\",\"Vealy Lai\"]","published":"2025-12-19T04:27:59Z","proceeding":"cs.HC","tasks":"[\"cs.HC\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}
