{"ID":3005080,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03331","arxiv_id":"2606.03331","title":"Evaluating LLMs' Effectiveness on Real-World Consumer Device Repair Questions","abstract":"Consumer device repair is an important but underexplored testbed for large language models (LLMs). Repair tasks require reasoning over incomplete problem descriptions, hardware-specific diagnostics, actionable troubleshooting, and safety-critical decisions, where incorrect advice can cause device damage, battery hazards, or permanent data loss. We introduce a benchmark of 991 real-world repair questions from Reddit spanning phone repair, computer repair, and data recovery, each paired with technician-written reference solutions, and provide Bangla translations to evaluate cross-lingual performance. We evaluate six state-of-the-art LLMs in English and Bangla using four repair-specific criteria: correctness, completeness, practicality, and safety. Our results show that while LLMs can provide useful repair assistance, they remain unreliable for high-risk real-world repair tasks without rigorous evaluation and explicit safety safeguards. Phone repair is the most difficult and safety-sensitive domain, and all models make substantial errors in board-level diagnosis, repair prioritization, and safe recovery procedures. Across domains and models, Bangla responses consistently perform worse than English responses. Among the evaluated models, GPT-5.4 performs best overall.","short_abstract":"Consumer device repair is an important but underexplored testbed for large language models (LLMs). Repair tasks require reasoning over incomplete problem descriptions, hardware-specific diagnostics, actionable troubleshooting, and safety-critical decisions, where incorrect advice can cause device damage, battery hazard...","url_abs":"https://arxiv.org/abs/2606.03331","url_pdf":"https://arxiv.org/pdf/2606.03331v1","authors":"[\"Atm Mizanur Rahman\",\"Md Arid Hasan\",\"Syed Ishtiaque Ahmed\",\"Sharifa Sultana\"]","published":"2026-06-02T08:40:47Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
