{"ID":2863193,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24156","arxiv_id":"2509.24156","title":"Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models","abstract":"Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively \"hacking\" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities. The code is available: https://github.com/ZJUWYH/FARL.","short_abstract":"Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. \nWe hypothesize that this inconsistency stems from two competing mechanisms for gener...","url_abs":"https://arxiv.org/abs/2509.24156","url_pdf":"https://arxiv.org/pdf/2509.24156v2","authors":"[\"Yuhui Wang\",\"Changjiang Li\",\"Guangke Chen\",\"Jiacheng Liang\",\"Ting Wang\"]","published":"2025-09-29T01:13:33Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":608983,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2863193,"paper_url":"https://arxiv.org/abs/2509.24156","paper_title":"Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models","repo_url":"https://github.com/ZJUWYH/FARL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
