{"ID":2849704,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23458","arxiv_id":"2510.23458","title":"BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents","abstract":"Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents have the ability to communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task compared to outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence while having near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.","short_abstract":"Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents have the ability to communicate their own...","url_abs":"https://arxiv.org/abs/2510.23458","url_pdf":"https://arxiv.org/pdf/2510.23458v2","authors":"[\"Litu Ou\",\"Kuan Li\",\"Huifeng Yin\",\"Liwen Zhang\",\"Zhongwang Zhang\",\"Xixi Wu\",\"Rui Ye\",\"Zile Qiao\",\"Pengjun Xie\",\"Jingren Zhou\",\"Yong Jiang\"]","published":"2025-10-27T15:58:51Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
