{"ID":2834651,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.01970","arxiv_id":"2512.01970","title":"Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies","abstract":"Does Reinforcement Learning (RL) merely amplify existing skills, or synthesize novel skills? We investigate this question through the lens of Complementary Reasoning: the critical practical capability of integrating internal knowledge with external context, a prerequisite for reliable Continual Learning and Retrieval-Augmented Generation. To avoid pre-training contamination, we construct a controlled semanticsynthetic dataset of biographies and decompose this capability into two atomic skills: Parametric Reasoning (retrieving facts encoded in model weights) and Contextual Reasoning (processing novel in-context information). We present two findings. First, models supervised directly on the composite task reach high accuracy on seen facts and reasoning paths (90%) but collapse on novel facts and reasoning paths (18%), indicating that Supervised Fine-Tuning (SFT) relies on rote memorization rather than genuine skill integration. Second, RL bridges this generalization gap, acting as a skill synthesizer rather than a mere amplifier--but only under a strict prerequisite: it synthesizes new composite strategies only when the base model has first mastered the independent atomic skills via SFT. These results suggest that decoupled atomic training followed by RL offers a scalable path to complex novel reasoning.","short_abstract":"Does Reinforcement Learning (RL) merely amplify existing skills, or synthesize novel skills? We investigate this question through the lens of Complementary Reasoning: the critical practical capability of integrating internal knowledge with external context, a prerequisite for reliable Continual Learning and Retrieval-A...","url_abs":"https://arxiv.org/abs/2512.01970","url_pdf":"https://arxiv.org/pdf/2512.01970v3","authors":"[\"Sitao Cheng\",\"Xunjian Yin\",\"Ruiwen Zhou\",\"Yuxuan Li\",\"Xinyi Wang\",\"Liangming Pan\",\"William Yang Wang\",\"Victor Zhong\"]","published":"2025-12-01T18:27:25Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"RAG\",\"Reinforcement Learning\"]","has_code":false}
