{"ID":2851756,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19634","arxiv_id":"2510.19634","title":"Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them","abstract":"This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To release its potential, we derive custom gradients that transform the solver into a differentiable operator, like a neural network layer, enabling many diverse applications. Empirically, we demonstrate: (i) scalability by enforcing weight sparsity on a 50 million parameter model; (ii) imposing conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. By doing this, our work represents the next iteration in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.","short_abstract":"This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To release its potential, we derive custom gradients that transform the solver into a differentiable operator, like a neural network layer, enabl...","url_abs":"https://arxiv.org/abs/2510.19634","url_pdf":"https://arxiv.org/pdf/2510.19634v1","authors":"[\"Hrittik Roy\",\"Søren Hauberg\",\"Nicholas Krämer\"]","published":"2025-10-22T14:31:51Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.NA\"]","methods":"[]","has_code":false}