{"ID":2868224,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17196","arxiv_id":"2509.17196","title":"Evolution of Concepts in Language Model Pre-Training","abstract":"Language models obtain extensive capabilities through pre-training. However, the pre-training process remains a black box. In this work, we track linear interpretable feature evolution across pre-training snapshots using a sparse dictionary learning method called crosscoders. We find that most features begin to form around a specific point, while more complex patterns emerge in later training stages. Feature attribution analyses reveal causal connections between feature evolution and downstream performance. Our feature-level observations are highly consistent with previous findings on Transformer's two-stage learning process, which we term a statistical learning phase and a feature learning phase. Our work opens up the possibility to track fine-grained representation progress during language model learning dynamics.","short_abstract":"Language models obtain extensive capabilities through pre-training. However, the pre-training process remains a black box. In this work, we track linear interpretable feature evolution across pre-training snapshots using a sparse dictionary learning method called crosscoders. We find that most features begin to form ar...","url_abs":"https://arxiv.org/abs/2509.17196","url_pdf":"https://arxiv.org/pdf/2509.17196v2","authors":"[\"Xuyang Ge\",\"Wentao Shu\",\"Jiaxing Wu\",\"Yunhua Zhou\",\"Zhengfu He\",\"Xipeng Qiu\"]","published":"2025-09-21T18:53:12Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
