{"ID":2839583,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15464","arxiv_id":"2511.15464","title":"SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome","abstract":"Recent advances in computational pathology have leveraged vision-language models to learn joint representations of Hematoxylin and Eosin (HE) images with spatial transcriptomic (ST) profiles. However, existing approaches typically align HE tiles with their corresponding ST profiles at a single scale, overlooking fine-grained cellular structures and their spatial organization. To address this, we propose Sigmma, a multi-modal contrastive alignment framework for learning hierarchical representations of HE images and spatial transcriptome profiles across multiple scales. Sigmma introduces multi-scale contrastive alignment, ensuring that representations learned at different scales remain coherent across modalities. Furthermore, by representing cell interactions as a graph and integrating inter- and intra-subgraph relationships, our approach effectively captures cell-cell interactions, ranging from fine to coarse, within the tissue microenvironment. We demonstrate that Sigmma learns representations that better capture cross-modal correspondences, leading to average improvements of 9.78\\% in the gene-expression prediction task and 26.93\\% in the cross-modal retrieval task across datasets. We further show that it learns meaningful multi-tissue organization in downstream analyses.","short_abstract":"Recent advances in computational pathology have leveraged vision-language models to learn joint representations of Hematoxylin and Eosin (HE) images with spatial transcriptomic (ST) profiles. However, existing approaches typically align HE tiles with their corresponding ST profiles at a single scale, overlooking fine-g...","url_abs":"https://arxiv.org/abs/2511.15464","url_pdf":"https://arxiv.org/pdf/2511.15464v3","authors":"[\"Dabin Jeong\",\"Amirhossein Vahidi\",\"Ciro Ramírez-Suástegui\",\"Marie Moullet\",\"Kevin Ly\",\"Mohammad Vali Sanian\",\"Sebastian Birk\",\"Yinshui Chang\",\"Adam Boxall\",\"Daniyal Jafree\",\"Lloyd Steele\",\"Vijaya Baskar MS\",\"Muzlifah Haniffa\",\"Mohammad Lotfollahi\"]","published":"2025-11-19T14:22:23Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}