{"ID":2899527,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00518","arxiv_id":"2507.00518","title":"Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling","abstract":"This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp involves initially sampling a state embedding representation using a von Mises-Fisher distribution, then exploring this representation's nearest neighbors, which scales to virtually unlimited numbers of candidate actions. We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that, nonetheless, suffers from scalability issues as it requires computing softmax values for each action. Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings. Experiments on simulated data, real-world public data, and the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service empirically validate the key properties of the proposed method.","short_abstract":"This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp involves initially sampling a state embedding representation using a von Mises-Fisher distribution, t...","url_abs":"https://arxiv.org/abs/2507.00518","url_pdf":"https://arxiv.org/pdf/2507.00518v1","authors":"[\"Walid Bendada\",\"Guillaume Salha-Galvan\",\"Romain Hennequin\",\"Théo Bontempelli\",\"Thomas Bouabça\",\"Tristan Cazenave\"]","published":"2025-07-01T07:32:54Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IR\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
