{"ID":2840203,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14852","arxiv_id":"2511.14852","title":"PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants","abstract":"Kolmogorov-Arnold Networks (KANs) promise higher expressive capability and stronger interpretability than Multi-Layer Perceptron, particularly in the domain of AI for Science. However, practical adoption has been hindered by low GPU utilization of existing parallel implementations. To address this challenge, we present a GPU-accelerated operator library, named PolyKAN which is the first general open-source implementation of KAN and its variants. PolyKAN fuses the forward and backward passes of polynomial KAN layers into a concise set of optimized CUDA kernels. Four orthogonal techniques underpin the design: (i) \\emph{lookup-table} with linear interpolation that replaces runtime expensive math-library functions; (ii) \\emph{2D tiling} to expose thread-level parallelism with preserving memory locality; (iii) a \\emph{two-stage reduction} scheme converting scattered atomic updates into a single controllable merge step; and (iv) \\emph{coefficient-layout reordering} yielding unit-stride reads under the tiled schedule. Using a KAN variant, Chebyshev KAN, as a case-study, PolyKAN delivers $1.2$--$10\\times$ faster inference and $1.4$--$12\\times$ faster training than a Triton + cuBLAS baseline, with identical accuracy on speech, audio-enhancement, and tabular-regression workloads on both highend GPU and consumer-grade GPU.","short_abstract":"Kolmogorov-Arnold Networks (KANs) promise higher expressive capability and stronger interpretability than Multi-Layer Perceptron, particularly in the domain of AI for Science. However, practical adoption has been hindered by low GPU utilization of existing parallel implementations. To address this challenge, we present...","url_abs":"https://arxiv.org/abs/2511.14852","url_pdf":"https://arxiv.org/pdf/2511.14852v1","authors":"[\"Mingkun Yu\",\"Heming Zhong\",\"Dan Huang\",\"Yutong Lu\",\"Jiazhi Jiang\"]","published":"2025-11-18T19:05:16Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AI\"]","methods":"[]","has_code":false}
