Approximation with SiLU Networks: Constant Depth and Exponential Rates for Basic Operations

cs.LG arXiv:2512.12132

Abstract

We present SiLU network constructions whose approximation efficiency depends critically on proper hyperparameter tuning. For the square function $x^2$, with an optimally chosen shift $a$ and scale $\beta$, we achieve approximation error $\varepsilon$ using a two-layer network of constant width whose weights scale as $\beta^{\pm k}$ with $k = \mathcal{O}(\ln(1/\varepsilon))$. We then extend this approach through functional composition to Sobolev spaces, obtaining networks with depth $\mathcal{O}(1)$ and $\mathcal{O}(\varepsilon^{-d/n})$ parameters for $d$-variate functions of smoothness $n$ under optimal hyperparameter settings. Our work highlights the trade-off between architectural depth and activation parameter optimization in neural network approximation theory.
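
The abstract does not spell out the square-function construction. The following is a minimal sketch, assuming the standard central-second-difference trick for smooth activations. This trick may differ from the authors' actual choice of $a$ and $\beta$. A width-3 hidden layer computes $\mathrm{SiLU}(a \pm hx)$ and $\mathrm{SiLU}(a)$. The output combines them as $(\mathrm{SiLU}(a+hx) + \mathrm{SiLU}(a-hx) - 2\,\mathrm{SiLU}(a)) / (h^2\,\mathrm{SiLU}''(a))$, whose error is $\mathcal{O}(h^2)$. With the assumed shift $a = 0$ the network reduces to $2x\tanh(hx/2)/h$, with error at most $h^2/12$ on $[-1, 1]$. Taking $h = \beta^{-k}$ therefore gives inner weights $\beta^{-k}$, outer weights of order $\beta^{2k}$, and $k = \mathcal{O}(\ln(1/\varepsilon))$. All function names (`sigmoid`, `silu`, `silu_second_derivative`, `square_net`, `choose_k`) are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def silu(z: float) -> float:
    """SiLU activation: z * sigmoid(z)."""
    return z * sigmoid(z)


def silu_second_derivative(a: float) -> float:
    """Closed form SiLU''(a) = s(1 - s)(2 + a(1 - 2s)) with s = sigmoid(a).

    It equals 0.5 at a = 0 but vanishes near a = +/-2.4, so shifts
    close to those roots make the construction below unusable.
    """
    s = sigmoid(a)
    return s * (1.0 - s) * (2.0 + a * (1.0 - 2.0 * s))


def square_net(x: float, h: float, a: float = 0.0) -> float:
    """Width-3, one-hidden-layer SiLU network approximating x**2 (a sketch).

    Hidden units: silu(a + h*x), silu(a - h*x) and the constant unit silu(a).
    Output weights (1, 1, -2) are scaled by 1 / (h**2 * SiLU''(a)).
    This is a central second difference, so odd Taylor terms cancel
    and the error is O(h**2) for a fixed shift a.
    """
    numerator = silu(a + h * x) + silu(a - h * x) - 2.0 * silu(a)
    return numerator / (h * h * silu_second_derivative(a))


def choose_k(eps: float, beta: float) -> int:
    """Hypothetical exponent rule, valid only for a = 0 and |x| <= 1.

    With a = 0 the network equals 2 * x * tanh(h * x / 2) / h, whose error
    is at most h**2 / 12 on [-1, 1]. Setting h = beta**(-k) and requiring
    beta**(-2k) / 12 <= eps gives k = O(ln(1 / eps)).
    """
    return max(0, math.ceil(math.log(1.0 / (12.0 * eps)) / (2.0 * math.log(beta))))


if __name__ == "__main__":
    eps, beta = 1e-6, 2.0
    k = choose_k(eps, beta)
    h = beta ** (-k)  # inner weights ~ beta**(-k), outer weights ~ beta**(2k)
    grid = [i / 100 for i in range(-100, 101)]
    max_err = max(abs(square_net(x, h) - x * x) for x in grid)
    print(f"k = {k}, max error on [-1, 1] = {max_err:.2e}")
```

With `eps = 1e-6` and `beta = 2`, `choose_k` returns `k = 9`, so the inner weights are $2^{-9}$ and the outer weights are $2 \cdot 2^{18}$. In floating point, a very small `h` also causes cancellation in the numerator, so this sketch is only meaningful for moderate `k`.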
