{"ID":2863776,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25040","arxiv_id":"2509.25040","title":"A multiscale analysis of mean-field transformers in the moderate interaction regime","abstract":"In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number $N$ of tokens is large and the inverse temperature parameter $β$ of the model scales together with $N$. In this regime, the dynamics of the system displays a multiscale behavior: a fast phase, where the token empirical measure collapses on a low-dimensional space, an intermediate phase, where the measure further collapses into clusters, and a slow one, where such clusters sequentially merge into a single one. We provide a rigorous characterization of the limiting dynamics in each of these phases and prove convergence in the above-mentioned limit, exemplifying our results with some simulations.","short_abstract":"In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where...","url_abs":"https://arxiv.org/abs/2509.25040","url_pdf":"https://arxiv.org/pdf/2509.25040v1","authors":"[\"Giuseppe Bruno\",\"Federico Pasqualotto\",\"Andrea Agazzi\"]","published":"2025-09-29T16:57:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.PR\",\"stat.ML\"]","methods":"[\"Transformer\"]","has_code":false}
