{"ID":2865606,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22876","arxiv_id":"2509.22876","title":"HEART: Emotionally-Driven Test-Time Scaling of Language Models","abstract":"Test-time scaling has significantly improved how AI models solve problems, yet current methods often get stuck in repetitive, incorrect patterns of thought. We introduce HEART, a framework that uses emotional cues to guide the model's focus, much like how feelings contribute to human decision-making. By alternating between critical tones to sharpen error detection and encouraging tones to spark new ideas, HEART helps the model break out of dead-end reasoning and find the right solution. We evaluate HEART across seven high-difficulty benchmarks--including Humanity's Last Exam, GPQA Diamond, and LiveCodeBench--demonstrating robustness across diverse models. Results show that emotion facilitates deeper reasoning, yielding consistent accuracy gains over affect-sterile baselines. These findings suggest that the next frontier in machine reasoning lies in the strategic integration of affective regulation to guide logical synthesis.","short_abstract":"Test-time scaling has significantly improved how AI models solve problems, yet current methods often get stuck in repetitive, incorrect patterns of thought. We introduce HEART, a framework that uses emotional cues to guide the model's focus, much like how feelings contribute to human decision-making. By alternating bet...","url_abs":"https://arxiv.org/abs/2509.22876","url_pdf":"https://arxiv.org/pdf/2509.22876v6","authors":"[\"Gabriela Pinto\",\"Palash Goyal\",\"Mihir Parmar\",\"Yiwen Song\",\"Souradip Chakraborty\",\"Zifeng Wang\",\"Jinsung Yoon\",\"Tomas Pfister\",\"Hamid Palangi\"]","published":"2025-09-26T19:41:00Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false}