{"ID":2841875,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11553","arxiv_id":"2511.11553","title":"Multistability of Self-Attention Dynamics in Transformers","abstract":"In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix corresponding for transformers to the value matrix. We classify the equilibria of the \"single-head\" self-attention system into four classes: consensus, bipartite consensus, clustering and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.","short_abstract":"In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix corresponding for tra...","url_abs":"https://arxiv.org/abs/2511.11553","url_pdf":"https://arxiv.org/pdf/2511.11553v1","authors":"[\"Claudio Altafini\"]","published":"2025-11-14T18:45:22Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"eess.SY\",\"math.DS\"]","methods":"[\"Transformer\"]","has_code":false}
