{"ID":2828600,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19720","arxiv_id":"2512.19720","title":"Per-Axis Weight Deltas for Frequent Model Updates","abstract":"Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base model by relatively small structured residuals, a natural approach is to represent them as compressed deltas. We propose a simple 1-bit delta scheme that stores only the sign of the weight difference together with lightweight per-axis (row/column) FP16 scaling factors, learned from a small calibration set. This design preserves the compactness of 1-bit deltas while more accurately capturing variation across weight dimensions, leading to improved reconstruction quality over scalar alternatives. From a systems perspective, a streamlined loader that transfers packed deltas in a single operation per module reduces cold-start latency and storage overhead, with artifacts several times smaller than a full FP16 checkpoint. The method is drop-in, requires minimal calibration data, and maintains inference efficiency by avoiding dense reconstruction. Our experimental setup and source code are available at https://github.com/kuiumdjiev/Per-Axis-Weight-Deltas-for-Frequent-Model-Updates.","short_abstract":"Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base model by relatively small structured residuals, a natural approach is to represent them as compressed deltas. \nWe propose a simple...","url_abs":"https://arxiv.org/abs/2512.19720","url_pdf":"https://arxiv.org/pdf/2512.19720v1","authors":"[\"Stefan Kuyumdzhiev\",\"Radostin Cholakov\"]","published":"2025-12-16T16:46:28Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605890,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2828600,"paper_url":"https://arxiv.org/abs/2512.19720","paper_title":"Per-Axis Weight Deltas for Frequent Model Updates","repo_url":"https://github.com/kuiumdjiev/Per-Axis-Weight-Deltas-for-Frequent-Model-Updates","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
