{"ID":2857717,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09767","arxiv_id":"2510.09767","title":"HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network","abstract":"Graph Transformers have recently achieved remarkable progress in graph representation learning by capturing long-range dependencies through self-attention. However, their quadratic computational complexity and inability to effectively model heterogeneous semantics severely limit their scalability and generalization on real-world heterogeneous graphs. To address these issues, we propose HeSRN, a novel Heterogeneous Slot-aware Retentive Network for efficient and expressive heterogeneous graph representation learning. HeSRN introduces a slot-aware structure encoder that explicitly disentangles node-type semantics by projecting heterogeneous features into independent slots and aligning their distributions through slot normalization and retention-based fusion, effectively mitigating the semantic entanglement caused by forced feature-space unification in previous Transformer-based models. Furthermore, we replace the self-attention mechanism with a retention-based encoder, which models structural and contextual dependencies in linear time complexity while maintaining strong expressive power. A heterogeneous retentive encoder is further employed to jointly capture both local structural signals and global heterogeneous semantics through multi-scale retention layers. Extensive experiments on four real-world heterogeneous graph datasets demonstrate that HeSRN consistently outperforms state-of-the-art heterogeneous graph neural networks and Graph Transformer baselines on node classification tasks, achieving superior accuracy with significantly lower computational complexity.","short_abstract":"Graph Transformers have recently achieved remarkable progress in graph representation learning by capturing long-range dependencies through self-attention. However, their quadratic computational complexity and inability to effectively model heterogeneous semantics severely limit their scalability and generalization on...","url_abs":"https://arxiv.org/abs/2510.09767","url_pdf":"https://arxiv.org/pdf/2510.09767v2","authors":"[\"Yifan Lu\",\"Ziyun Zou\",\"Belal Alsinglawi\",\"Islam Al-Qudah\",\"Izzat Alsmadi\",\"Feilong Tang\",\"Pengfei Jiao\",\"Shoaib Jameel\",\"Imran Razzak\"]","published":"2025-10-10T18:18:06Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Graph Neural Network\",\"Transformer\"]","has_code":false}