{"ID":2875190,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03734","arxiv_id":"2509.03734","title":"How fast can you find a good hypothesis?","abstract":"In the hypothesis selection problem, we are given sample and query access to finite set of candidate distributions (hypotheses), $\\mathcal{H} = \\{H_1, \\ldots, H_n\\}$, and samples from an unknown distribution $P$, both over a domain $\\mathcal{X}$. The goal is to output a distribution $Q$ whose distance to $P$ is comparable to that of the nearest hypothesis in $\\mathcal{H}$. Specifically, if the minimum distance is $\\mathsf{OPT}$, we aim to output $Q$ such that, with probability at least $1-δ$, its total variation distance to $P$ is at most $C \\cdot \\mathsf{OPT} + \\varepsilon$. The optimal approximation for proper algorithms (where $Q \\in \\mathcal{H}$) is $C=3$ using $Θ(\\log(n/δ)/\\varepsilon^2)$ samples from $P$ and for improper algorithms (where $Q$ is not necessarily in $\\mathcal{H}$) is $C=2$ using $\\tildeΘ(\\log(n/δ)/\\varepsilon^2)$ samples from $P$. In the improper setting, the algorithm achieving $C=2$ [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time which grows polynomially with $|\\mathcal{X}|$ -- it does not run in finite time for real-valued distributions. A promising path towards improved runtime is to consider improper algorithms which output a mixture $Q$ of the hypotheses as such a distribution can be represented in $n$ words of memory. We show (1) a lower bound that no algorithm which outputs a mixture can achieve approximation better than $C = 3-2/n$ unless the number of samples is polynomial in $|\\mathcal{X}|$, as well as (2) an algorithm which runs in time $\\text{poly}(n)$ and achieves the same approximation guarantee. In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with $C=3$ running in $\\tilde{O}(n/(δ^3\\varepsilon^3))$ time. We improve this time complexity to $\\tilde{O}(n/(δ\\varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters.","short_abstract":"In the hypothesis selection problem, we are given sample and query access to finite set of candidate distributions (hypotheses), $\\mathcal{H} = \\{H_1, \\ldots, H_n\\}$, and samples from an unknown distribution $P$, both over a domain $\\mathcal{X}$. The goal is to output a distribution $Q$ whose distance to $P$ is compara...","url_abs":"https://arxiv.org/abs/2509.03734","url_pdf":"https://arxiv.org/pdf/2509.03734v2","authors":"[\"Anders Aamand\",\"Maryam Aliakbarpour\",\"Justin Y. Chen\",\"Sandeep Silwal\"]","published":"2025-09-03T21:44:48Z","proceeding":"cs.DS","tasks":"[\"cs.DS\",\"cs.LG\"]","methods":"[]","has_code":false}
