{"ID":2868405,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16629","arxiv_id":"2509.16629","title":"Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features","abstract":"Positional encoding is essential for supplementing transformer with positional information of tokens. Existing positional encoding methods demand predefined token/feature order, rendering them unsuitable for real-world data with non-sequential yet causally-related features. To address this limitation, we propose CAPE, a novel method that identifies underlying causal structure over non-sequential features as a weighted directed acyclic graph (DAG) using generalized structural equation modeling. The DAG is then embedded in hyperbolic space where its geometric structure is well-preserved using a hyperboloid model-based approach that effectively captures two important causal graph properties (causal strength \u0026 causal specificity). This step yields causality-aware positional encodings for the features, which are converted into their rotary form for integrating with transformer's self-attention mechanism. Theoretical analysis reveals that CAPE-generated rotary positional encodings possess three valuable properties for enhanced self-attention, including causal distance-induced attenuation, causal generality-induced attenuation, and robustness to positional disturbances. We evaluate CAPE over both synthetic and real-world datasets, empirically demonstrating its theoretical properties and effectiveness in enhancing transformer for data with non-sequential features. Our code is available at https://github.com/Catchxu/CAPE.","short_abstract":"Positional encoding is essential for supplementing transformer with positional information of tokens. Existing positional encoding methods demand predefined token/feature order, rendering them unsuitable for real-world data with non-sequential yet causally-related features. To address this limitation, we propose CAPE,...","url_abs":"https://arxiv.org/abs/2509.16629","url_pdf":"https://arxiv.org/pdf/2509.16629v2","authors":"[\"Kaichen Xu\",\"Yihang Du\",\"Mianpeng Liu\",\"Zimu Yu\",\"Xiaobo Sun\"]","published":"2025-09-20T11:08:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"q-bio.QM\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":609583,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868405,"paper_url":"https://arxiv.org/abs/2509.16629","paper_title":"Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features","repo_url":"https://github.com/Catchxu/CAPE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}