{"ID":2884713,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06247","arxiv_id":"2508.06247","title":"Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits","abstract":"The combinatorial multi-armed bandit (CMAB) is a cornerstone of sequential decision-making framework, dominated by two algorithmic families: UCB-based and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from additional regret factor $\\log T$ that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O\\big( (\\log k)\\sqrt{kmT}\\big )$ when $k\\leq \\frac{m}{2}$ and $O\\big((m-k)\\sqrt{\\log k\\log(m-k)T}\\big )$ when $k\u003e\\frac{m}{2}$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the maximum cardinality of a feasible action. Crucially, this result eliminates the dependency on $\\log T$ and matches the established lower bounds of $Ω\\big(\\sqrt{kmT}\\big)$ when $k\\leq \\frac{m}{2}$ and $Ω\\big((m-k)\\sqrt{\\log (\\frac{m}{m-k}) T}\\big)$ when $k\u003e\\frac{m}{2}$ up to logarithmic terms of $k$ and $m$. We then extend our analysis to show that CMOSS is also applicable to cascading feedback. Experiments on synthetic and real-world datasets validate that CMOSS consistently outperforms benchmark algorithms in both regret and runtime efficiency.","short_abstract":"The combinatorial multi-armed bandit (CMAB) is a cornerstone of sequential decision-making framework, dominated by two algorithmic families: UCB-based and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from addit...","url_abs":"https://arxiv.org/abs/2508.06247","url_pdf":"https://arxiv.org/pdf/2508.06247v2","authors":"[\"Zichun Ye\",\"Runqi Wang\",\"Xutong Liu\",\"Shuai Li\"]","published":"2025-08-08T12:01:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.DS\",\"stat.ML\"]","methods":"[]","has_code":false}
