{"ID":2899484,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00453","arxiv_id":"2507.00453","title":"Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling","abstract":"We present a Transformer architecture for long-context language modeling that combines global attention with two biologically inspired components: chunked local attention and a gated FIFO memory mechanism. This unified attention block allows the model to efficiently handle both short-range and long-range dependencies without increasing attention cost quadratically. The memory module persistently stores past token representations using a gated update mechanism inspired by recurrent networks. Rotary positional encoding is applied per attention head to enable directionally disentangled, scale-invariant positional signals. The architecture is implemented entirely from scratch in PyTorch, with no reliance on high-level libraries, enabling transparent and modular experimentation. Our model offers a lightweight and extensible design for tasks such as dialogue modeling, code completion, and document understanding.","short_abstract":"We present a Transformer architecture for long-context language modeling that combines global attention with two biologically inspired components: chunked local attention and a gated FIFO memory mechanism. This unified attention block allows the model to efficiently handle both short-range and long-range dependencies w...","url_abs":"https://arxiv.org/abs/2507.00453","url_pdf":"https://arxiv.org/pdf/2507.00453v1","authors":"[\"Ankit Kashyap\"]","published":"2025-07-01T06:11:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}