{"ID":2837750,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19423","arxiv_id":"2511.19423","title":"Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design","abstract":"We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities -- literature-grounded reasoning through retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential calculations, and machine-learning prediction of redox properties -- into a unified agentic workflow. By coupling natural-language reasoning with data-driven and physics-based computation, the system generates mechanistically interpretable, testable hypotheses linking sequence, structure, and function. In proof-of-concept demonstrations, Genie-CAT autonomously identifies residue-level modifications near [Fe--S] clusters that affect redox tuning, reproducing expert-derived hypotheses in a fraction of the time. The framework highlights how AI agents combining language models with domain-specific tools can bridge symbolic reasoning and numerical simulation, transforming LLMs from conversational assistants into partners for computational discovery.","short_abstract":"We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities -- literature-grounded reasoning through retrieval-augmented generation (R...","url_abs":"https://arxiv.org/abs/2511.19423","url_pdf":"https://arxiv.org/pdf/2511.19423v1","authors":"[\"Bruno Jacob\",\"Khushbu Agarwal\",\"Marcel Baer\",\"Peter Rice\",\"Simone Raugei\"]","published":"2025-11-24T18:57:07Z","proceeding":"q-bio.QM","tasks":"[\"q-bio.QM\",\"cs.AI\"]","methods":"[\"RAG\",\"Large Language Model\",\"Language Model\"]","has_code":false}
