{"ID":2842641,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09026","arxiv_id":"2511.09026","title":"DeepVRegulome: DNABERT-based deep-learning framework for predicting the functional impact of short genomic variants on the human regulome","abstract":"Whole-genome sequencing (WGS) has revealed numerous non-coding short variants whose functional impacts remain poorly understood. Despite recent advances in deep-learning genomic approaches, accurately predicting and prioritizing clinically relevant mutations in gene regulatory regions remains a major challenge. Here we introduce Deep VRegulome, a deep-learning method for prediction and interpretation of functionally disruptive variants in the human regulome, which combines 700 DNABERT fine-tuned models, trained on vast amounts of ENCODE gene regulatory regions, with variant scoring, motif analysis, attention-based visualization, and survival analysis. We showcase its application on TCGA glioblastoma WGS dataset in prioritizing survival-associated mutations and regulatory regions. The analysis identified 572 splice-disrupting and 9,837 transcription-factor binding site altering mutations occurring in greater than 10% of glioblastoma samples. Survival analysis linked 1352 mutations and 563 disrupted regulatory regions to patient outcomes, enabling stratification via non-coding mutation signatures. All the code, fine-tuned models, and an interactive data portal are publicly available.","short_abstract":"Whole-genome sequencing (WGS) has revealed numerous non-coding short variants whose functional impacts remain poorly understood. Despite recent advances in deep-learning genomic approaches, accurately predicting and prioritizing clinically relevant mutations in gene regulatory regions remains a major challenge. Here we...","url_abs":"https://arxiv.org/abs/2511.09026","url_pdf":"https://arxiv.org/pdf/2511.09026v1","authors":"[\"Pratik Dutta\",\"Matthew Obusan\",\"Rekha Sathian\",\"Max Chao\",\"Pallavi Surana\",\"Nimisha Papineni\",\"Yanrong Ji\",\"Zhihan Zhou\",\"Han Liu\",\"Alisa Yurovsky\",\"Ramana V Davuluri\"]","published":"2025-11-12T06:25:31Z","proceeding":"q-bio.GN","tasks":"[\"q-bio.GN\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}
