{"ID":2868381,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16596","arxiv_id":"2509.16596","title":"Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels","abstract":"Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.","short_abstract":"Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned...","url_abs":"https://arxiv.org/abs/2509.16596","url_pdf":"https://arxiv.org/pdf/2509.16596v2","authors":"[\"Junjie Ye\",\"Yuming Yang\",\"Yang Nan\",\"Shuo Li\",\"Qi Zhang\",\"Tao Gui\",\"Xuanjing Huang\",\"Peng Wang\",\"Zhongchao Shi\",\"Jianping Fan\"]","published":"2025-09-20T09:40:32Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
