{"ID":2835387,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00126","arxiv_id":"2512.00126","title":"RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding","abstract":"Protein inverse folding, the design of an amino acid sequence based on a target protein structure, is a fundamental problem of computational protein engineering. Existing methods either generate sequences without leveraging external knowledge or rely on protein language models~(PLMs). The former omits the knowledge stored in natural protein data, while the latter is parameter-inefficient and inflexible to adapt to ever-growing protein data. To overcome the above drawbacks, in this paper we propose a novel method, called $\\underline{\\text{r}}$etrieval-$\\underline{\\text{a}}$ugmented $\\underline{\\text{d}}$enoising $\\underline{\\text{diff}}$usion~($\\mbox{RadDiff}$), for protein inverse folding. In RadDiff, a novel retrieval-augmentation mechanism is designed to capture the up-to-date protein knowledge. We further design a knowledge-aware diffusion model that integrates this protein knowledge into the diffusion process via a lightweight module. Experimental results on the CATH, TS50, and PDB2022 datasets show that $\\mbox{RadDiff}$ consistently outperforms existing methods, improving sequence recovery rate by up to 19\\%. Experimental results also demonstrate that RadDiff generates highly foldable sequences and scales effectively with database size.","short_abstract":"Protein inverse folding, the design of an amino acid sequence based on a target protein structure, is a fundamental problem of computational protein engineering. Existing methods either generate sequences without leveraging external knowledge or rely on protein language models~(PLMs). The former omits the knowledge...","url_abs":"https://arxiv.org/abs/2512.00126","url_pdf":"https://arxiv.org/pdf/2512.00126v2","authors":"[\"Jin Han\",\"Tianfan Fu\",\"Wu-Jun Li\"]","published":"2025-11-28T07:32:15Z","proceeding":"q-bio.QM","tasks":"[\"q-bio.QM\",\"cs.AI\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}
