{"ID":2861269,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01632","arxiv_id":"2510.01632","title":"BioBlobs: Unsupervised Discovery of Functional Substructures for Protein Function Prediction","abstract":"Protein function is driven by cohesive substructures, such as catalytic triads, binding pockets, and structural motifs, that occupy only a small fraction of a protein's residues. Yet existing pipelines built on protein encoders do not model proteins at the substructure level, leaving the central biological question unanswered: which substructure of a protein is responsible for its function? We introduce BioBlobs, an encoder-agnostic, end-to-end differentiable framework that compresses a protein into a small set of cohesive substructures (blobs) and predicts function from these blobs alone, so that each blob corresponds to a candidate functional region. Across diverse protein function prediction tasks and multiple sequence- and structure-based encoders, BioBlobs matches or exceeds strong baselines while operating on only a small fraction of residues. The discovered blobs adapt their spatial scale to the task, ranging from local catalytic sites to entire structural domains. Trained only on protein-level labels, BioBlobs recovers experimentally annotated catalytic sites in the M-CSA database, demonstrating unsupervised functional substructure discovery and opening a path to large-scale functional site discovery across the unannotated proteome.","short_abstract":"Protein function is driven by cohesive substructures, such as catalytic triads, binding pockets, and structural motifs, that occupy only a small fraction of a protein's residues. \nYet existing pipelines built on protein encoders do not model proteins at the substructure level, leaving the central biological question una...","url_abs":"https://arxiv.org/abs/2510.01632","url_pdf":"https://arxiv.org/pdf/2510.01632v3","authors":"[\"Xin Wang\",\"Kaiwen Shi\",\"Carlos Oliver\"]","published":"2025-10-02T03:25:02Z","proceeding":"q-bio.BM","tasks":"[\"q-bio.BM\",\"cs.AI\"]","methods":"[]","has_code":false}
