{"ID":2855997,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13907","arxiv_id":"2510.13907","title":"LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization","abstract":"Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization based on pairwise preference feedback from an LLM judge. PDO casts prompt selection as a dueling-bandit problem and combines (i) Double Thompson Sampling to prioritize informative comparisons under a fixed judge budget, with (ii) top-performer guided mutation to expand the candidate pool while pruning weak prompts. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently identifies stronger prompts than label-free baselines, while offering favorable quality-cost trade-offs under constrained comparison budgets.","short_abstract":"Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimizati...","url_abs":"https://arxiv.org/abs/2510.13907","url_pdf":"https://arxiv.org/pdf/2510.13907v3","authors":"[\"Yuanchen Wu\",\"Saurabh Verma\",\"Justin Lee\",\"Fangzhou Xiong\",\"Poppy Zhang\",\"Amel Awadelkarim\",\"Xu Chen\",\"Yubai Yuan\",\"Shawndra Hill\"]","published":"2025-10-14T22:23:08Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"stat.ML\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
