{"ID":2829776,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.11411","arxiv_id":"2512.11411","title":"Sliced ReLU attention: Quasi-linear contextual expressivity via sorting","abstract":"We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and its approximation alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key--query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log(n)) through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential practical interest of this kernel in small to medium-scale experiments.","short_abstract":"We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and its approximation alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key--query differences and leverage sorting to obtain quasi-linear compl...","url_abs":"https://arxiv.org/abs/2512.11411","url_pdf":"https://arxiv.org/pdf/2512.11411v2","authors":"[\"François-Xavier Vialard\",\"Siwan Boufadène\"]","published":"2025-12-12T09:39:14Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}