{"ID":2835130,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00376","arxiv_id":"2512.00376","title":"Layer Probing Improves Kinase Functional Prediction with Protein Language Models","abstract":"Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.","short_abstract":"Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsu...","url_abs":"https://arxiv.org/abs/2512.00376","url_pdf":"https://arxiv.org/pdf/2512.00376v1","authors":"[\"Ajit Kumar\",\"IndraPrakash Jha\"]","published":"2025-11-29T08:06:11Z","proceeding":"q-bio.QM","tasks":"[\"q-bio.QM\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
