{"ID":2874632,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04194","arxiv_id":"2509.04194","title":"Stochastic Matching Bandits with Rare Optimization Updates","abstract":"We introduce a bandit framework for stochastic matching under the multinomial logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its assigned pool according to unknown preferences and yields a corresponding reward over a horizon $T$. The objective is to minimize regret by maximizing the cumulative revenue from successful matches. A naive approach requires solving an NP-hard combinatorial optimization problem at every round, resulting in a prohibitive computational cost. To address this challenge, we propose batched algorithms that strategically limit the number of times matching assignments are updated to $Θ(\\log\\log T)$ over the entire horizon. By invoking expensive combinatorial optimization only on a vanishing fraction of rounds, our algorithms substantially reduce overall computational overhead while still achieving a regret bound of $\\widetilde{\\mathcal{O}}(\\sqrt{T})$.","short_abstract":"We introduce a bandit framework for stochastic matching under the multinomial logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its assigned pool according to unknown preferences and yields a corresponding rew...","url_abs":"https://arxiv.org/abs/2509.04194","url_pdf":"https://arxiv.org/pdf/2509.04194v2","authors":"[\"Jung-hun Kim\",\"Min-hwan Oh\"]","published":"2025-09-04T13:16:32Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[]","has_code":false}
