{"ID":2863574,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24728","arxiv_id":"2509.24728","title":"Beyond Softmax: A Natural Parameterization for Categorical Random Variables","abstract":"Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant challenges to gradient-descent learning algorithms. While a substantial body of work has offered improved gradient estimation techniques, we take a complementary approach. Specifically, we: 1) revisit the ubiquitous $\\textit{softmax}$ function and demonstrate its limitations from an information-geometric perspective; 2) replace the $\\textit{softmax}$ with the $\\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits; we prove that this choice offers significant advantages to gradient descent due to the resulting diagonal Fisher Information Matrix. A rich set of experiments - including graph structure learning, variational autoencoders, and reinforcement learning - empirically show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance. $\\textit{Catnat}$ is simple to implement and seamlessly integrates into existing codebases. Moreover, it remains compatible with standard training stabilization techniques and, as such, offers a better alternative to the $\\textit{softmax}$ function.","short_abstract":"Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant c...","url_abs":"https://arxiv.org/abs/2509.24728","url_pdf":"https://arxiv.org/pdf/2509.24728v2","authors":"[\"Alessandro Manenti\",\"Cesare Alippi\"]","published":"2025-09-29T12:55:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[\"Graph Neural Network\",\"Reinforcement Learning\"]","has_code":false}
