{"ID":2835809,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22154","arxiv_id":"2511.22154","title":"WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios","abstract":"We introduce WearVQA, the first benchmark specifically designed to evaluate the Visual Question Answering (VQA) capabilities of multimodal AI assistants on wearable devices like smart glasses. Unlike prior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique challenges of ego-centric interaction, where visual inputs may be occluded, poorly lit, unzoomed, or blurry, and questions are grounded in realistic wearable use cases. The benchmark comprises 2,520 carefully curated image-question-answer triplets, spanning 7 diverse image domains including both text-centric and general scenes, 10 cognitive task types ranging from basic recognition to various forms of reasoning, and 6 common wearables-specific image quality issues. All questions are designed to be answerable using only the visual input and common sense. WearVQA is paired with a rigorous LLM-as-a-judge evaluation framework with 96% labeling accuracy. Open-source and proprietary multimodal LLMs achieved QA accuracy of only 24-52% on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks. These observations position WearVQA as a comprehensive and challenging benchmark for guiding technical advancement towards robust, real-world multimodal wearable AI systems.","short_abstract":"We introduce WearVQA, the first benchmark specifically designed to evaluate the Visual Question Answering (VQA) capabilities of multimodal AI assistants on wearable devices like smart glasses. 
Unlike prior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique challenges of ego-centric...","url_abs":"https://arxiv.org/abs/2511.22154","url_pdf":"https://arxiv.org/pdf/2511.22154v2","authors":"[\"Eun Chang\",\"Zhuangqun Huang\",\"Yiwei Liao\",\"Sagar Ravi Bhavsar\",\"Amogh Param\",\"Tammy Stark\",\"Adel Ahmadyan\",\"Xiao Yang\",\"Jiaqi Wang\",\"Ahsan Abdullah\",\"Giang Nguyen\",\"Akil Iyer\",\"David Hall\",\"Elissa Li\",\"Shane Moon\",\"Nicolas Scheffer\",\"Kirmani Ahmed\",\"Babak Damavandi\",\"Rakesh Wanga\",\"Anuj Kumar\",\"Rohit Patel\",\"Xin Luna Dong\"]","published":"2025-11-27T06:44:49Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
