{"ID":2850847,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22008","arxiv_id":"2510.22008","title":"A Multimodal Human Protein Embeddings Database: DeepDrug Protein Embeddings Bank (DPEB)","abstract":"Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual amino acid patterns (ESM-2: Evolutionary Scale Modeling), and sequence-based n-gram statistics (ProtVec). AlphaFold2 protein structures are available through public databases (e.g., AlphaFold2 Protein Structure Database), but the internal neural network embeddings are not. DPEB addresses this gap by providing AlphaFold2-derived embeddings for computational modeling. Our benchmark evaluations show GraphSAGE with BioEmbedding achieved the highest PPI prediction performance (87.37% AUROC, 79.16% accuracy). The framework also achieved 77.42% accuracy for enzyme classification and 86.04% accuracy for protein family classification. DPEB supports multiple graph neural network methods for PPI prediction, enabling applications in systems biology, drug target identification, pathway analysis, and disease mechanism studies.","short_abstract":"Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual ami...","url_abs":"https://arxiv.org/abs/2510.22008","url_pdf":"https://arxiv.org/pdf/2510.22008v1","authors":"[\"Md Saiful Islam Sajol\",\"Magesh Rajasekaran\",\"Hayden Gemeinhardt\",\"Adam Bess\",\"Chris Alvin\",\"Supratik Mukhopadhyay\"]","published":"2025-10-24T20:22:17Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"q-bio.MN\"]","methods":"[\"Graph Neural Network\",\"Transformer\"]","has_code":false}
