{"ID":2882699,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.09628","arxiv_id":"2508.09628","title":"Attention's forward pass and Frank-Wolfe","abstract":"We study the hardmax limit of self-attention dynamics for token embeddings obtained in the zero-temperature ($β\\to+\\infty$) regime, and relate it to the finite-$β$ setting. In this limit, the update rule can be viewed as a Frank-Wolfe step for a quadratic objective over the convex hull of the current token embeddings. When the key-query matrix is negative semidefinite, the method linearly contracts all tokens to a single cluster at the origin. When it is positive semidefinite, extending the hardmax rule to the entire convex hull induces a Voronoi diagram: vertices are stationary, interior points remain in their initial cells, and each token moves along a straight line toward its cell's vertex, yielding (super-)exponential convergence. As a byproduct, we also establish well-posedness of the associated ODE limit in this regime. Returning to the finite-$β$ regime, we model self-attention dynamics as a Markov chain and prove dynamic metastability: with high probability, interior tokens reach near-vertex configurations in a constant number of steps and remain within a small neighborhood for times that grow exponentially in the inverse temperature $β$, before ultimately collapsing to the origin. Thus, the hardmax dynamics accurately approximate the finite-$β$ process over exponentially long time horizons.","short_abstract":"We study the hardmax limit of self-attention dynamics for token embeddings obtained in the zero-temperature ($β\\to+\\infty$) regime, and relate it to the finite-$β$ setting. In this limit, the update rule can be viewed as a Frank-Wolfe step for a quadratic objective over the convex hull of the current token embeddings....","url_abs":"https://arxiv.org/abs/2508.09628","url_pdf":"https://arxiv.org/pdf/2508.09628v1","authors":"[\"Albert Alcalde\",\"Borjan Geshkovski\",\"Domènec Ruiz-Balet\"]","published":"2025-08-13T08:59:13Z","proceeding":"math.OC","tasks":"[\"math.OC\"]","methods":"[]","has_code":false}
