{"ID":2853145,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16816","arxiv_id":"2510.16816","title":"Efficient High-Accuracy PDEs Solver with the Linear Attention Neural Operator","abstract":"Neural operators offer a powerful data-driven framework for learning mappings between function spaces, in which the transformer-based neural operator architecture faces a fundamental scalability-accuracy trade-off: softmax attention provides excellent fidelity but incurs quadratic complexity $\\mathcal{O}(N^2 d)$ in the number of mesh points $N$ and hidden dimension $d$, while linear attention variants reduce cost to $\\mathcal{O}(N d^2)$ but often suffer significant accuracy degradation. To address the aforementioned challenge, in this paper, we present a novel type of neural operators, Linear Attention Neural Operator (LANO), which achieves both scalability and high accuracy by reformulating attention through an agent-based mechanism. LANO resolves this dilemma by introducing a compact set of $M$ agent tokens $(M \\ll N)$ that mediate global interactions among $N$ tokens. This agent attention mechanism yields an operator layer with linear complexity $\\mathcal{O}(MN d)$ while preserving the expressive power of softmax attention. Theoretically, we demonstrate the universal approximation property, thereby demonstrating improved conditioning and stability properties. Empirically, LANO surpasses current state-of-the-art neural PDE solvers, including Transolver with slice-based softmax attention, achieving average $19.5\\%$ accuracy improvement across standard benchmarks. By bridging the gap between linear complexity and softmax-level performance, LANO establishes a scalable, high-accuracy foundation for scientific machine learning applications.","short_abstract":"Neural operators offer a powerful data-driven framework for learning mappings between function spaces, in which the transformer-based neural operator architecture faces a fundamental scalability-accuracy trade-off: softmax attention provides excellent fidelity but incurs quadratic complexity $\\mathcal{O}(N^2 d)$ in the...","url_abs":"https://arxiv.org/abs/2510.16816","url_pdf":"https://arxiv.org/pdf/2510.16816v1","authors":"[\"Ming Zhong\",\"Zhenya Yan\"]","published":"2025-10-19T13:03:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"math-ph\",\"physics.comp-ph\"]","methods":"[\"Transformer\"]","has_code":false}
