{"ID":2871134,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12423","arxiv_id":"2509.12423","title":"Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition","abstract":"Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach: first, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.","short_abstract":"Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models which can run on-dev...","url_abs":"https://arxiv.org/abs/2509.12423","url_pdf":"https://arxiv.org/pdf/2509.12423v1","authors":"[\"Danielle Cohen\",\"Yoni Halpern\",\"Noam Kahlon\",\"Joel Oren\",\"Omri Berkovitch\",\"Sapir Caduri\",\"Ido Dagan\",\"Anatoly Efros\"]","published":"2025-09-15T20:20:30Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}