{"ID":2841202,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12036","arxiv_id":"2511.12036","title":"Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys","abstract":"We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on heuristic or human-in-the-loop feedback (costly), our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.","short_abstract":"We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with po...","url_abs":"https://arxiv.org/abs/2511.12036","url_pdf":"https://arxiv.org/pdf/2511.12036v1","authors":"[\"Satanu Ghosh\",\"Collin Holgate\",\"Neal R. Brodnik\",\"Doug Downey\",\"Samantha Daly\",\"Tresa M. Pollock\",\"Samuel Carton\"]","published":"2025-11-15T05:08:22Z","proceeding":"cs.CE","tasks":"[\"cs.CE\",\"cond-mat.mtrl-sci\",\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Language Model\",\"LoRA\",\"Generative Adversarial Network\"]","has_code":false}