{"ID":2921677,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01159","arxiv_id":"2606.01159","title":"Fairness in two-player zero-sum games with bandit feedback","abstract":"We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $α/m$. Existing instance-dependent results target $\\textit{pure}$ Nash equilibria, while fairness generically produces $\\textit{mixed}$ equilibria, a harder learning target. Our key technical tool is a reparametrization: every fair strategy decomposes as $p = (α/m)\\mathbf{1} + (1-α)\\widetilde{p}$ with $\\widetilde{p} \\in Δ_m$, and substituting into the payoff form yields $p^{\\top}Aq = \\widetilde{p}^{\\top}\\widetilde{A} q$ for a fair payoff matrix $\\widetilde{A} := (1-α)A + α\\mathbf{1} c^{\\top}$, where $c_j = \\tfrac{1}{m}\\sum_i A(i,j)$ is the column-mean vector. The fair game on $A$ is then equivalent to a standard zero-sum game on $\\widetilde{A}$, so equilibrium existence, KKT structure, and LP basis stability reduce to classical results applied to $\\widetilde{A}$. We derive the fair minimax value, fair Nash equilibrium, fair regret, and a clean dual representation showing the price of fairness is at most $α(1-1/m)$ and vanishes whenever the unconstrained equilibrium already has full support. Our main result is an $\\widetilde{O}(T^{2/3})$ regret bound for an Explore-Then-Commit algorithm, $\\texttt{Fair-ETC-TPZSG}$, applicable to general mixed fair equilibria, together with a discussion of why naive action elimination does not readily improve it. When the fair equilibrium has a single dominant action, equivalently when $\\widetilde{p}^{\\star}$ is a vertex of $Δ_m$, the bound sharpens to instance-dependent $\\widetilde{O}(1/\\widetildeΔ(α)^{2})$, where $\\widetildeΔ(α)$ is the LP-margin gap.","short_abstract":"We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $α/m$. Existing instance-dependent results target $\\textit{pure}$ Nash equilibria, while fairness generically produces $\\textit{mixed}$ equilibria, a harder learning...","url_abs":"https://arxiv.org/abs/2606.01159","url_pdf":"https://arxiv.org/pdf/2606.01159v1","authors":"[\"S Akash\",\"Pratik Gajane\"]","published":"2026-05-31T11:06:06Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.GT\"]","methods":"[]","has_code":false}
