Transformers for Multimodal Brain State Decoding: Integrating Functional Magnetic Resonance Imaging Data and Medical Metadata

cs.LG arXiv:2512.08462

Abstract

Decoding brain states from functional magnetic resonance imaging (fMRI) data is vital for advancing neuroscience and clinical applications. Traditional machine learning and deep learning approaches have made progress in handling high-dimensional, complex fMRI data, but they often fail to use the contextual information carried by Digital Imaging and Communications in Medicine (DICOM) metadata. This paper presents a novel framework that integrates transformer-based architectures with multimodal inputs, including fMRI data and DICOM metadata. Using attention mechanisms, the proposed method captures intricate spatiotemporal patterns and contextual relationships, enhancing model accuracy, interpretability, and robustness. Potential applications of the framework span clinical diagnostics, cognitive neuroscience, and personalized medicine. Limitations, such as metadata variability and computational demands, are addressed, and future directions for improving scalability and generalizability are discussed.
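To make the idea of attention over multimodal inputs concrete, the following is a minimal sketch and not the architecture of the paper, which the abstract does not specify. It assumes that each fMRI time point has already been reduced to a small embedding vector, that a few numeric DICOM attributes (RepetitionTime, EchoTime, MagneticFieldStrength) are scaled into a single extra "metadata token", and that queries, keys, and values are the tokens themselves (no learned projections). The function names, scaling constants, and toy values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-length vectors."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def encode_metadata(meta, scales):
    """Hypothetical encoder: divide each numeric field by a fixed scale."""
    return [meta[name] / scale for name, scale in scales]


# Toy fMRI embeddings, one 3-dimensional vector per time point (made-up values).
fmri_tokens = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.1]]

# Illustrative acquisition parameters (ms, ms, tesla); scales are assumptions.
metadata = {"RepetitionTime": 2000.0, "EchoTime": 30.0, "MagneticFieldStrength": 3.0}
scales = [("RepetitionTime", 1000.0), ("EchoTime", 100.0), ("MagneticFieldStrength", 3.0)]
meta_token = encode_metadata(metadata, scales)

# Append the metadata token so every fMRI token can attend to it, and vice versa.
tokens = fmri_tokens + [meta_token]
fused, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` shows how strongly one token attends to every other token, including the metadata token in the last column; inspecting such weights is one common route to the kind of interpretability the abstract mentions. A practical model would add learned query, key, and value projections, multiple heads, and positional information.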
