{"ID":2846131,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.02378","arxiv_id":"2511.02378","title":"Revisiting put-that-there, context aware window interactions via LLMs","abstract":"We revisit Bolt's classic \"Put-That-There\" concept for modern head-mounted displays by pairing Large Language Models (LLMs) with XR sensor and tech stack. The agent fuses (i) a semantically segmented 3-D environment, (ii) live application metadata, and (iii) users' verbal, pointing, and head-gaze cues to issue JSON window-placement actions. As a result, users can manage a panoramic workspace through: (1) explicit commands (\"Place Google Maps on the coffee table\"), (2) deictic speech plus gestures (\"Put that there\"), or (3) high-level goals (\"I need to send a message\"). Unlike traditional explicit interfaces, our system supports one-to-many action mappings and goal-centric reasoning, allowing the LLM to dynamically infer relevant applications and layout decisions, including interrelationships across tools. This enables seamless, intent-driven interaction without manual window juggling in immersive XR environments.","short_abstract":"We revisit Bolt's classic \"Put-That-There\" concept for modern head-mounted displays by pairing Large Language Models (LLMs) with XR sensor and tech stack. The agent fuses (i) a semantically segmented 3-D environment, (ii) live application metadata, and (iii) users' verbal, pointing, and head-gaze cues to issue JSON win...","url_abs":"https://arxiv.org/abs/2511.02378","url_pdf":"https://arxiv.org/pdf/2511.02378v1","authors":"[\"Riccardo Bovo\",\"Daniele Giunchi\",\"Pasquale Cascarano\",\"Eric J. Gonzalez\",\"Mar Gonzalez-Franco\"]","published":"2025-11-04T08:58:30Z","proceeding":"cs.HC","tasks":"[\"cs.HC\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
