{"ID":2844263,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.06222","arxiv_id":"2511.06222","title":"SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization","abstract":"In high-stakes scenarios, such as self-harm, legal, or medical queries, LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new alignment paradigm that enforces a strict \"trustworthy-before-helpful\" ordering: optimization of helpfulness is conditioned on first meeting trustworthy thresholds (e.g., harmlessness or honesty). To realize this, we introduce Self-Priority Alignment (SPA), a fully unsupervised framework that generates diverse responses, self-evaluates them and refines them by the model itself, and applies dual-criterion denoising to remove inconsistency and control variance. From this, SPA constructs lexicographically ordered preference pairs and fine-tunes the model using an uncertainty-weighted alignment loss that emphasizes high-confidence, high-gap decisions. Experiments across multiple benchmarks show that SPA improves helpfulness without compromising safety, outperforming strong baselines while preserving general capabilities. Our results demonstrate that SPA provides a scalable and interpretable alignment strategy for critical LLM applications.","short_abstract":"In high-stakes scenarios, such as self-harm, legal, or medical queries, LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new alignment paradigm that enforces a strict \"trustworthy-before-helpful\" ordering: optimization of helpfulness is conditioned on first...","url_abs":"https://arxiv.org/abs/2511.06222","url_pdf":"https://arxiv.org/pdf/2511.06222v1","authors":"[\"Yue Huang\",\"Xiangqi Wang\",\"Xiangliang Zhang\"]","published":"2025-11-09T04:43:32Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.CY\"]","methods":"[\"Large Language Model\"]","has_code":false}
