{"ID":2831805,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07805","arxiv_id":"2512.07805","title":"Group Representational Position Encoding","abstract":"We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\\mathrm{GL}$. In Multiplicative GRAPE, a position $n \\in \\mathbb{Z}$ (or $t \\in \\mathbb{R}$) acts as $\\mathbf{G}(n) = \\exp(n \\, ω\\, \\mathbf{L})$ with a rank-2 skew-symmetric generator $\\mathbf{L} \\in \\mathbb{R}^{d \\times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Overall, GRAPE provides a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project page: https://github.com/model-architectures/GRAPE.","short_abstract":"We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent...","url_abs":"https://arxiv.org/abs/2512.07805","url_pdf":"https://arxiv.org/pdf/2512.07805v6","authors":"[\"Yifan Zhang\",\"Zixiang Chen\",\"Yifeng Liu\",\"Zhen Qin\",\"Huizhuo Yuan\",\"Kangping Xu\",\"Yang Yuan\",\"Quanquan Gu\",\"Andrew Chi-Chih Yao\"]","published":"2025-12-08T18:39:13Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":606169,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2831805,"paper_url":"https://arxiv.org/abs/2512.07805","paper_title":"Group Representational Position Encoding","repo_url":"https://github.com/model-architectures/GRAPE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}