{"ID":2826261,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19367","arxiv_id":"2512.19367","title":"Sprecher Networks: A Parameter-Efficient Kolmogorov-Arnold Architecture","abstract":"We introduce Sprecher Networks (SNs), a family of trainable architectures derived from David Sprecher's 1965 constructive form of the Kolmogorov-Arnold representation. Each SN block implements a \"sum of shifted univariate functions\" using only two shared learnable splines per block, a monotone inner spline $φ$ and a general outer spline $Φ$, together with a learnable shift parameter $η$ and a mixing vector $λ$ shared across all output dimensions. Stacking these blocks yields deep, compositional models; for vector-valued outputs we append an additional non-summed output block. We also propose an optional lateral mixing operator enabling intra-block communication between output channels with only $O(d_{\\mathrm{out}})$ additional parameters. Owing to the vector (not matrix) mixing weights and spline sharing, SNs scale linearly in width, approximately $O(\\sum_{\\ell}(d_{\\ell-1}+d_{\\ell}+G))$ parameters for $G$ spline knots, versus $O(\\sum_{\\ell} d_{\\ell-1}d_{\\ell})$ for dense MLPs and $O(G\\sum_{\\ell} d_{\\ell-1}d_{\\ell})$ for edge-spline KANs. This linear width-scaling is particularly attractive for extremely wide, shallow models, where low depth can translate into low inference latency. Finally, we describe a sequential forward implementation that avoids materializing the $d_{\\mathrm{in}}\\times d_{\\mathrm{out}}$ shifted-input tensor, reducing peak forward-intermediate memory from quadratic to linear in layer width, relevant for memory-constrained settings such as on-device/edge inference; we demonstrate deployability via fixed-point real-time digit classification on a resource-constrained embedded device with only 4 MB RAM.\nWe provide empirical demonstrations on supervised regression, Fashion-MNIST classification (including stable training at 25 hidden layers with residual connections and normalization), and a Poisson PINN, with controlled comparisons to MLP and KAN baselines.","short_abstract":"We introduce Sprecher Networks (SNs), a family of trainable architectures derived from David Sprecher's 1965 constructive form of the Kolmogorov-Arnold representation. Each SN block implements a \"sum of shifted univariate functions\" using only two shared learnable splines per block, a monotone inner spline $φ$ and a ge...","url_abs":"https://arxiv.org/abs/2512.19367","url_pdf":"https://arxiv.org/pdf/2512.19367v2","authors":"[\"Christian Hägg\",\"Kathlén Kohn\",\"Giovanni Luca Marchetti\",\"Boris Shapiro\"]","published":"2025-12-22T13:09:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"math.NA\"]","methods":"[]","has_code":false}
