{"ID":2833761,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.02315","arxiv_id":"2512.02315","title":"Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training","abstract":"Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without large task-specific datasets. By encoding sequence information, auxiliary zero-shot predictions, and sparse experimental labels from many assays as a unified token set in a pre-training masked-language modeling paradigm, PRIMO learns to prioritize promising variants through a preference-based loss function. Across diverse protein families and properties, including both substitution and indel mutations, PRIMO outperforms zero-shot and fully supervised baselines. This work underscores the power of combining large-scale pre-training with efficient test-time adaptation to tackle challenging protein design tasks where data collection is expensive and label availability is limited.","short_abstract":"Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without larg...","url_abs":"https://arxiv.org/abs/2512.02315","url_pdf":"https://arxiv.org/pdf/2512.02315v1","authors":"[\"Felix Teufel\",\"Aaron W. Kollasch\",\"Yining Huang\",\"Ole Winther\",\"Kevin K. Yang\",\"Pascal Notin\",\"Debora S. Marks\"]","published":"2025-12-02T01:20:40Z","proceeding":"q-bio.BM","tasks":"[\"q-bio.BM\",\"cs.LG\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
