{"ID":2900830,"CreatedAt":"2026-06-01T05:51:17.9442275Z","UpdatedAt":"2026-06-01T05:51:17.9442275Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2605.30873","arxiv_id":"2605.30873","title":"Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences","abstract":"Federated Learning (FL) offers a privacy-preserving pathway for aligning Large Language Models (LLMs); however, existing frameworks typically enforce a monolithic reward model, inevitably averaging out inherently conflicting user preferences (e.g., helpfulness vs. harmlessness). While Variational Preference Learning (VPL) offers a pathway to personalization, adapting it to decentralized settings presents a fundamental challenge: posterior collapse driven by severe local data scarcity and heterogeneity. In this paper, we propose Federated Variational Preference Alignment with Gumbel-Softmax Prior (FedVPA-GP), a framework designed to disentangle diverse preferences without compromising privacy. To stabilize variational inference, we introduce a Federated Mixture Prior that enables clients to leverage the aggregate population distribution as a dynamic prior. Furthermore, we incorporate an Orthogonal Loss that explicitly enforces the separation of preference prototypes in the latent space. Experiments on the HH-RLHF dataset demonstrate that FedVPA-GP significantly outperforms monolithic baselines, successfully disentangling conflicting user intents and enabling dynamic preference switching.","short_abstract":"Federated Learning (FL) offers a privacy-preserving pathway for aligning Large Language Models (LLMs); however, existing frameworks typically enforce a monolithic reward model, inevitably averaging out inherently conflicting user preferences (e.g., helpfulness vs. harmlessness). While Variational Preference Learning (V...","url_abs":"https://arxiv.org/abs/2605.30873","url_pdf":"https://arxiv.org/pdf/2605.30873v1","authors":"[\"Jabin Koo\",\"Hoyoung Kim\",\"Minwoo Jang\",\"Jungseul Ok\"]","published":"2026-05-29T05:52:21Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.DC\"]","methods":"[\"Large Language Model\",\"Language Model\",\"RLHF\"]","has_code":false}
