{"ID":2871380,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11311","arxiv_id":"2509.11311","title":"Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble","abstract":"Large language models are increasingly used as proxies for human subjects in social science research, yet external validity requires that synthetic agents faithfully reflect the preferences of target human populations. We introduce *preference reconstruction theory*, a framework that formalizes preference alignment as a representation learning problem: constructing a functional basis of proxy agents and recovering population preferences through weighted aggregation. We implement this via *Prompts to Proxies* ($\\texttt{P2P}$), a modular two-stage system. Stage 1 uses structured prompting with entropy-based adaptive sampling to construct a diverse agent pool spanning the latent preference space. Stage 2 employs L1-regularized regression to select a compact ensemble whose aggregate response distributions align with observed data from the target population. $\\texttt{P2P}$ requires no finetuning and no access to sensitive demographic data, incurring only API inference costs. We validate the approach on 14 waves of the American Trends Panel, achieving an average test MSE of 0.014 across diverse topics at approximately 0.8 USD per survey. We additionally test it on the World Values Survey, demonstrating its potential to generalize across locales. When stress-tested against an SFT-aligned baseline, $\\texttt{P2P}$ achieves competitive performance using less than 3% of the training data.","short_abstract":"Large language models are increasingly used as proxies for human subjects in social science research, yet external validity requires that synthetic agents faithfully reflect the preferences of target human populations. We introduce *preference reconstruction theory*, a framework that formalizes preference alignment as...","url_abs":"https://arxiv.org/abs/2509.11311","url_pdf":"https://arxiv.org/pdf/2509.11311v2","authors":"[\"Bingchen Wang\",\"Zi-Yu Khoo\",\"Jingtan Wang\"]","published":"2025-09-14T15:08:45Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CY\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}