{"ID":2848907,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24053","arxiv_id":"2510.24053","title":"Low-N Protein Activity Optimization with FolDE","abstract":"Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting the best improvements and iteratively testing mutants to inform predictions. However, existing ALDE methods face a critical limitation: selecting the highest-predicted mutants in each round yields homogeneous training data insufficient for accurate prediction models in subsequent rounds. Here we present FolDE, an ALDE method designed to maximize end-of-campaign success. In simulations across 20 protein targets, FolDE discovers 23% more top 10% mutants than the best baseline ALDE method (p=0.005) and is 55% more likely to find top 1% mutants. FolDE achieves this primarily through naturalness-based warm-starting, which augments limited activity measurements with protein language model outputs to improve activity prediction. We also introduce a constant-liar batch selector, which improves batch diversity; this is important in multi-mutation campaigns but had limited effect in our benchmarks. The complete workflow is freely available as open-source software, making efficient protein optimization accessible to any laboratory.","short_abstract":"Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting the best improvements and iteratively testing mutants to inform predictions. However, existing ALDE methods face a critical limitati...","url_abs":"https://arxiv.org/abs/2510.24053","url_pdf":"https://arxiv.org/pdf/2510.24053v1","authors":"[\"Jacob B. Roberts\",\"Catherine R. Ji\",\"Isaac Donnell\",\"Thomas D. Young\",\"Allison N. Pearson\",\"Graham A. Hudson\",\"Leah S. Keiser\",\"Mia Wesselkamper\",\"Peter H. Winegar\",\"Janik Ludwig\",\"Sarah H. Klass\",\"Isha V. Sheth\",\"Ezechinyere C. Ukabiala\",\"Maria C. T. Astolfi\",\"Benjamin Eysenbach\",\"Jay D. Keasling\"]","published":"2025-10-28T04:24:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"q-bio.QM\"]","methods":"[\"Language Model\"]","has_code":false}
