{"ID":2875546,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02510","arxiv_id":"2509.02510","title":"Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation","abstract":"Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-\\$p\\$ (nucleus) sampling, and min-\\$p\\$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-\\$p\\$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-\\$p\\$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.","short_abstract":"Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling,...","url_abs":"https://arxiv.org/abs/2509.02510","url_pdf":"https://arxiv.org/pdf/2509.02510v2","authors":"[\"Erfan Baghaei Potraghloo\",\"Seyedarmin Azizi\",\"Souvik Kundu\",\"Massoud Pedram\"]","published":"2025-09-02T17:02:29Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"stat.ML\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610224,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2875546,"paper_url":"https://arxiv.org/abs/2509.02510","paper_title":"Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation","repo_url":"https://github.com/ErfanBaghaei/Top-H-Decoding","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
