{"ID":2833265,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03377","arxiv_id":"2512.03377","title":"Nexus: Higher-Order Attention Mechanisms in Transformers","abstract":"Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. However, the standard first-order attention mechanism is often limited by a low-rank bottleneck, struggling to capture intricate, multi-hop relationships within a single layer. In this paper, we propose the Nexus, a novel architecture designed to enhance representational power through a recursive framework. Unlike standard approaches that use static linear projections for Queries and Keys, Nexus dynamically refines these representations via nested self-attention mechanisms. Specifically, the Query and Key vectors are themselves outputs of inner attention loops, allowing tokens to aggregate global context and model high-order correlations \\textit{prior} to the final attention computation. We enforce a parameter-efficient weight-sharing strategy across recursive steps, ensuring that this enhanced expressivity incurs $\\mathcal{O}(1)$ additional parameters. We provide theoretical analysis demonstrating that our method breaks the linear bottleneck of standard attention. Empirically, Nexus outperforms standard Transformers on multiple benchmarks.","short_abstract":"Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. However, the standard first-order attention mechanism is often limited by a low-rank bottleneck, struggling to capture intricate, multi-hop relationships within a single layer. In this paper, we pro...","url_abs":"https://arxiv.org/abs/2512.03377","url_pdf":"https://arxiv.org/pdf/2512.03377v2","authors":"[\"Hanting Chen\",\"Chong Zhu\",\"Kai Han\",\"Yuchuan Tian\",\"Yuchen Liang\",\"Tianyu Guo\",\"Xinghao Chen\",\"Dacheng Tao\",\"Yunhe Wang\"]","published":"2025-12-03T02:25:38Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Transformer\"]","has_code":false}
