{"ID":2823894,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.23966","arxiv_id":"2512.23966","title":"Efficient Context Scaling with LongCat ZigZag Attention","abstract":"We introduce LongCat ZigZag Attention (LoZA), which is a sparse attention scheme designed to transform any existing full-attention models into sparse versions with rather limited compute budget. In long-context scenarios, LoZA can achieve significant speed-ups both for prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) cases. Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.","short_abstract":"We introduce LongCat ZigZag Attention (LoZA), which is a sparse attention scheme designed to transform any existing full-attention models into sparse versions with rather limited compute budget. In long-context scenarios, LoZA can achieve significant speed-ups both for prefill-intensive (e.g., retrieval-augmented gener...","url_abs":"https://arxiv.org/abs/2512.23966","url_pdf":"https://arxiv.org/pdf/2512.23966v2","authors":"[\"Chen Zhang\",\"Yang Bai\",\"Jiahuan Li\",\"Anchun Gui\",\"Keheng Wang\",\"Feifan Liu\",\"Guanyu Wu\",\"Yuwei Jiang\",\"Defei Bu\",\"Li Wei\",\"Haihang Jing\",\"Hongyin Tang\",\"Xin Chen\",\"Xiangzhou Huang\",\"Fengcun Li\",\"Rongxiang Weng\",\"Yulei Qian\",\"Yifan Lu\",\"Yerui Sun\",\"Jingang Wang\",\"Yuchen Xie\",\"Xunliang Cai\"]","published":"2025-12-30T03:39:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"RAG\"]","has_code":false}