{"ID":2861859,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00504","arxiv_id":"2510.00504","title":"A universal compression theory for lottery ticket hypothesis and neural scaling laws","abstract":"When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\\operatorname{polylog} d$ objects with vanishing error, which is proved to be the optimal compression rate. This theorem yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the loss landscape of the corresponding model unchanged. Implication (Ia) directly establishes a proof of the dynamical lottery ticket hypothesis, which states that any ordinary network can be strongly compressed such that the learning dynamics and result remain unchanged. (Ib) shows that a neural scaling law of the form $L\\sim d^{-α}$ can be boosted to an arbitrarily fast power law decay, and ultimately to $\\exp(-α' \\sqrt[m]{d})$.","short_abstract":"When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. \nIn this w...","url_abs":"https://arxiv.org/abs/2510.00504","url_pdf":"https://arxiv.org/pdf/2510.00504v2","authors":"[\"Hong-Yi Wang\",\"Di Luo\",\"Tomaso Poggio\",\"Isaac L. Chuang\",\"Liu Ziyin\"]","published":"2025-10-01T04:35:23Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cond-mat.dis-nn\",\"cs.IT\",\"cs.LG\"]","methods":"[]","has_code":false}
