{"ID":2895706,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.08567","arxiv_id":"2507.08567","title":"AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling","abstract":"We introduce the Autoregressive Block-Based Iterative Encoder (AbbIE), a novel recursive generalization of the encoder-only Transformer architecture, which achieves better perplexity than a standard Transformer and allows for the dynamic scaling of compute resources at test time. This simple, recursive approach is a complement to scaling large language model (LLM) performance through parameter and token counts. AbbIE performs its iterations in latent space, but unlike latent reasoning models, does not require a specialized dataset or training protocol. We show that AbbIE upward generalizes (ability to generalize to arbitrary iteration lengths) at test time by only using 2 iterations during train time, far outperforming alternative iterative methods. AbbIE's ability to scale its computational expenditure based on the complexity of the task gives it an up to \\textbf{12\\%} improvement in zero-shot in-context learning tasks versus other iterative and standard methods and up to 5\\% improvement in language perplexity. The results from this study open a new avenue to Transformer performance scaling. We perform all of our evaluations on model sizes up to 350M parameters.","short_abstract":"We introduce the Autoregressive Block-Based Iterative Encoder (AbbIE), a novel recursive generalization of the encoder-only Transformer architecture, which achieves better perplexity than a standard Transformer and allows for the dynamic scaling of compute resources at test time. This simple, recursive approach is a co...","url_abs":"https://arxiv.org/abs/2507.08567","url_pdf":"https://arxiv.org/pdf/2507.08567v2","authors":"[\"Preslav Aleksandrov\",\"Meghdad Kurmanji\",\"Fernando Garcia Redondo\",\"David O'Shea\",\"William Shen\",\"Alex Iacob\",\"Lorenzo Sani\",\"Xinchi Qiu\",\"Nicola Cancedda\",\"Nicholas D. Lane\"]","published":"2025-07-11T13:11:11Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}