{"ID":2848835,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23965","arxiv_id":"2510.23965","title":"The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity","abstract":"Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naïve probabilistic model to pairwise comparison data (say over prompt-completion pairs) yields an inconsistent estimate of the population-average utility, a canonical measure of social welfare. We propose a new method, dubbed the sign estimator, that provides a simple, provably consistent, and efficient estimator by replacing cross-entropy with binary classification loss in the aggregation step. This simple modification recovers consistent ordinal alignment under mild assumptions and achieves the first polynomial finite-sample error bounds in this setting. In realistic simulations of LLM alignment using digital twins, the sign estimator substantially reduces preference distortion over a panel of simulated personas, cutting (angular) estimation error by nearly 35% and decreasing disagreement with true population preferences from 12% to 8% compared to standard RLHF. Our method also compares favorably to panel data heuristics that explicitly model user heterogeneity and require tracking individual-level preference data, all while maintaining the implementation simplicity of existing LLM alignment pipelines.","short_abstract":"Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naïve probabilistic model to pairwise comparison data (say over prompt-completion pairs) yields an inconsistent estimate of the population-average utility, a canonical measure of social welfare. We propose a new method, du...","url_abs":"https://arxiv.org/abs/2510.23965","url_pdf":"https://arxiv.org/pdf/2510.23965v2","authors":"[\"Ali Aouad\",\"Aymane El Gadarri\",\"Vivek F. Farias\"]","published":"2025-10-28T00:42:38Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"stat.ML\"]","methods":"[\"Large Language Model\",\"RLHF\"]","has_code":false}
