{"ID":2836815,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20038","arxiv_id":"2511.20038","title":"Softmax Transformers are Turing-Complete","abstract":"Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which correspond to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (more generally, for letter-bounded languages). While we show this is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theory by training transformers for languages requiring complex (non-linear) arithmetic reasoning.","short_abstract":"Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More pre...","url_abs":"https://arxiv.org/abs/2511.20038","url_pdf":"https://arxiv.org/pdf/2511.20038v1","authors":"[\"Hongjian Jiang\",\"Michael Hahn\",\"Georg Zetzsche\",\"Anthony Widjaja Lin\"]","published":"2025-11-25T08:08:39Z","proceeding":"cs.FL","tasks":"[\"cs.FL\",\"cs.LG\",\"cs.LO\"]","methods":"[\"Transformer\"]","has_code":false}
