{"ID":2851778,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19675","arxiv_id":"2510.19675","title":"Study of Training Dynamics for Memory-Constrained Fine-Tuning","abstract":"Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two key insights: layer importance for updates is architecture-dependent and determinable a priori, while dynamic stochastic channel selection provides superior gradient approximation compared to static approaches. We introduce a dynamic channel selection approach that stochastically resamples channels between epochs within preselected layers. Extensive experiments demonstrate TraDy achieves state-of-the-art performance across various downstream tasks and architectures while maintaining strict memory constraints, achieving up to 99% activation sparsity, 95% weight derivative sparsity, and 97% reduction in FLOPs for weight derivative computation.","short_abstract":"Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two key insights: layer importance for updates is architecture-dependent and determina...","url_abs":"https://arxiv.org/abs/2510.19675","url_pdf":"https://arxiv.org/pdf/2510.19675v2","authors":"[\"Aël Quélennec\",\"Nour Hezbri\",\"Pavlo Mozharovskyi\",\"Van-Tam Nguyen\",\"Enzo Tartaglione\"]","published":"2025-10-22T15:21:05Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false}