{"ID":2855413,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14095","arxiv_id":"2510.14095","title":"Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning","abstract":"Systematic, compositional generalization beyond the training distribution remains a core challenge in machine learning -- and a critical bottleneck for the emergent reasoning abilities of modern language models. This work investigates out-of-distribution (OOD) generalization in Transformer networks using a GSM8K-style modular arithmetic on computational graphs task as a testbed. We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization: (i) input-adaptive recurrence; (ii) algorithmic supervision; (iii) anchored latent representations via a discrete bottleneck; and (iv) an explicit error-correction mechanism. Collectively, these mechanisms yield an architectural approach for native and scalable latent space reasoning in Transformer networks with robust algorithmic generalization capabilities. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.","short_abstract":"Systematic, compositional generalization beyond the training distribution remains a core challenge in machine learning -- and a critical bottleneck for the emergent reasoning abilities of modern language models. This work investigates out-of-distribution (OOD) generalization in Transformer networks using a GSM8K-style...","url_abs":"https://arxiv.org/abs/2510.14095","url_pdf":"https://arxiv.org/pdf/2510.14095v1","authors":"[\"Awni Altabaa\",\"Siyu Chen\",\"John Lafferty\",\"Zhuoran Yang\"]","published":"2025-10-15T21:03:59Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}