{"ID":2842933,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09493","arxiv_id":"2511.09493","title":"Consensus Sampling for Safer Generative AI","abstract":"Motivated by undetectable risks in generative AI, we study a general robust aggregation problem: how to aggregate several probability distributions to boost safety. We present consensus sampling, a black-box algorithm that, given k distributions, has risk competitive with the average risk of the safest $s$ while abstaining when there is insufficient agreement. This yields an architecture-agnostic approach to generative-model safety when the distributions are induced by models that can sample and evaluate output probabilities. We formalize the guarantee through R-robustness, which also bounds information leakage and adversarial influence. Inspired by robust statistics and the provable copyright protection algorithm of Vyas et al (2023), we show that while a standard mixture is vulnerable to one unsafe constituent, a pointwise-median construction provides robust intuition, and our efficient sampler is Pareto-optimal for the tradeoff between worst-case risk and abstention. Experiments on synthetic distributions and image generation illustrate the general mechanism and its motivating safety application. The method requires overlap among safe distributions, but it provides a model-agnostic way to inherit guarantees from an unknown reliable subset.","short_abstract":"Motivated by undetectable risks in generative AI, we study a general robust aggregation problem: how to aggregate several probability distributions to boost safety. We present consensus sampling, a black-box algorithm that, given k distributions, has risk competitive with the average risk of the safest $s$ while abstai...","url_abs":"https://arxiv.org/abs/2511.09493","url_pdf":"https://arxiv.org/pdf/2511.09493v2","authors":"[\"Adam Tauman Kalai\",\"Yael Tauman Kalai\",\"Or Zamir\"]","published":"2025-11-12T17:09:45Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}
