{"ID":2877777,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.20211","arxiv_id":"2508.20211","title":"What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture","abstract":"In the 1940s, Wiener introduced a linear predictor, where the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor where the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case when the probabilistic model is a hidden Markov model (HMM). In part, this paper is in an attempt to bridge the classical nonlinear filtering theory with modern inference architectures.","short_abstract":"In the 1940s, Wiener introduced a linear predictor, where the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor where the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilisti...","url_abs":"https://arxiv.org/abs/2508.20211","url_pdf":"https://arxiv.org/pdf/2508.20211v1","authors":"[\"Heng-Sheng Chang\",\"Prashant G. Mehta\"]","published":"2025-08-27T18:37:55Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"eess.SY\",\"math.PR\"]","methods":"[\"Transformer\"]","has_code":false}