{"ID":2833366,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03537","arxiv_id":"2512.03537","title":"Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins","abstract":"Continual learning requires models to learn continuously while preserving prior knowledge under evolving data streams. Distillation-based methods are appealing for retaining past knowledge in a shared single-model framework with low storage overhead. However, they remain constrained by the stability-plasticity dilemma: knowledge acquisition and preservation are still optimized through coupled objectives, and existing enhancement methods do not alter this underlying bottleneck. To address this issue, we propose a plugin extension paradigm termed Distillation-aware Lightweight Components (DLC) for distillation-based CL. DLC deploys lightweight residual plugins into the base feature extractor's classifier-proximal layer, enabling semantic-level residual correction for better classification accuracy while minimizing disruption to the overall feature extraction process. During inference, plugin-enhanced representations are aggregated to produce classification predictions. To mitigate interference from non-target plugins, we further introduce a lightweight weighting unit that learns to assign importance scores to different plugin-enhanced representations. DLC could deliver a significant 8% accuracy gain on large-scale benchmarks while introducing only a 4% increase in backbone parameters, highlighting its exceptional efficiency. Moreover, DLC is compatible with other plug-and-play CL enhancements and delivers additional gains when combined with them.","short_abstract":"Continual learning requires models to learn continuously while preserving prior knowledge under evolving data streams. Distillation-based methods are appealing for retaining past knowledge in a shared single-model framework with low storage overhead. However, they remain constrained by the stability-plasticity dilemma:...","url_abs":"https://arxiv.org/abs/2512.03537","url_pdf":"https://arxiv.org/pdf/2512.03537v3","authors":"[\"Zhiming Xu\",\"Baile Xu\",\"Jian Zhao\",\"Furao Shen\",\"Suorong Yang\"]","published":"2025-12-03T07:57:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[]","has_code":false}
