{"ID":2870588,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18153","arxiv_id":"2509.18153","title":"A deep reinforcement learning platform for antibiotic discovery","abstract":"Antimicrobial resistance (AMR) is projected to cause up to 10 million deaths annually by 2050, underscoring the urgent need for new antibiotics. Here we present ApexAmphion, a deep-learning framework for de novo design of antibiotics that couples a 6.4-billion-parameter protein language model with reinforcement learning. The model is first fine-tuned on curated peptide data to capture antimicrobial sequence regularities, then optimised with proximal policy optimization against a composite reward that combines predictions from a learned minimum inhibitory concentration (MIC) classifier with differentiable physicochemical objectives. In vitro evaluation of 100 designed peptides showed low MIC values (nanomolar range in some cases) for all candidates (100% hit rate). Moreover, 99 out of 100 compounds exhibited broad-spectrum antimicrobial activity against at least two clinically relevant bacteria. The lead molecules killed bacteria primarily by potently targeting the cytoplasmic membrane. By unifying generation, scoring and multi-objective optimization with deep reinforcement learning in a single pipeline, our approach rapidly produces diverse, potent candidates, offering a scalable route to peptide antibiotics and a platform for iterative steering toward potency and developability within hours.","short_abstract":"Antimicrobial resistance (AMR) is projected to cause up to 10 million deaths annually by 2050, underscoring the urgent need for new antibiotics. 
Here we present ApexAmphion, a deep-learning framework for de novo design of antibiotics that couples a 6.4-billion-parameter protein language model with reinforcement learnin...","url_abs":"https://arxiv.org/abs/2509.18153","url_pdf":"https://arxiv.org/pdf/2509.18153v1","authors":"[\"Hanqun Cao\",\"Marcelo D. T. Torres\",\"Jingjie Zhang\",\"Zijun Gao\",\"Fang Wu\",\"Chunbin Gu\",\"Jure Leskovec\",\"Yejin Choi\",\"Cesar de la Fuente-Nunez\",\"Guangyong Chen\",\"Pheng-Ann Heng\"]","published":"2025-09-16T18:21:42Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"q-bio.BM\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}