{"ID":2840188,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14827","arxiv_id":"2511.14827","title":"Implicit Bias of the JKO Scheme","abstract":"Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $η\u003e0$ a sequence of probability distributions $ρ_k^η$ that approximate to first order in $η$ Wasserstein gradient flow on $J$. But the JKO scheme also has many other remarkable properties not shared by other first order integrators, e.g. it preserves energy dissipation and exhibits unconditional stability for $λ$-geodesically convex functionals $J$. To better understand the JKO scheme we characterize its implicit bias at second order in $η$. We show that $ρ_k^η$ are approximated to order $η^2$ by Wasserstein gradient flow on a modified energy \\[ J^η(ρ) = J(ρ) - \\frac{η}{4}\\int_M \\Big\\lVert \\nabla_g \\frac{δJ}{δρ} (ρ) \\Big\\rVert_{2}^{2} \\,ρ(dx), \\] obtained by subtracting from $J$ the squared metric curvature of $J$ times $η/4$. The JKO scheme therefore adds at second order in $η$ a deceleration in directions where the metric curvature of $J$ is rapidly changing. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL-divergence it is the Fisher-Hyvärinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^η$ we study JKO-Flow, Wasserstein gradient flow on $J^η$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.","short_abstract":"Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $η\u003e0$ a sequence of probability distributions $...","url_abs":"https://arxiv.org/abs/2511.14827","url_pdf":"https://arxiv.org/pdf/2511.14827v3","authors":"[\"Peter Halmos\",\"Boris Hanin\"]","published":"2025-11-18T18:48:37Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.AI\",\"cs.LG\",\"math.AP\"]","methods":"[]","has_code":false}
